Thread 'time to complete is months away and growing'

old_user160722
Joined: 14 Feb 06
Posts: 19
Credit: 28,513
RAC: 0
Message 21259 - Posted: 14 Mar 2006, 18:27:17 UTC

I'm running a 1.7GHz P4 with 785 MB memory. When I downloaded a run it said it would take 2496 hrs to complete. This gave me an "earliest complete" date of 5th June. Since then I have run 180 processing hours and the time to complete is now greater than when it started and stands at 2561 hrs. This gives me an "earliest complete" date of 28th June. I am (supposedly) 6.2% of the way through this run. This just does not add up, so someone must have put some funny startup figures in.

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 21262 - Posted: 14 Mar 2006, 20:22:33 UTC

The 'completion time' is very approximate; you're best off using the 6.2% to estimate the completion time.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
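
As a rough illustration of the percentage-based estimate suggested above, here is a minimal Python sketch. It assumes the 180 processing hours and the 6.2% from the first post refer to the same point in the run, and that the rest of the run proceeds at the same rate; the function name is made up for this example.

```python
def estimate_from_progress(cpu_hours_done, fraction_done):
    """Extrapolate total and remaining CPU hours from the progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    total_hours = cpu_hours_done / fraction_done
    return total_hours, total_hours - cpu_hours_done


# Figures quoted in the first post: 180 processing hours, 6.2% done.
total, remaining = estimate_from_progress(180, 0.062)
print(f"total about {total:.0f} h, remaining about {remaining:.0f} h")
```

This counts CPU hours only. If the machine is switched off part of the time or shares BOINC time with other projects, the calendar date will be later than these hours suggest.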

old_user160722
Joined: 14 Feb 06
Posts: 19
Credit: 28,513
RAC: 0
Message 21264 - Posted: 14 Mar 2006, 20:42:36 UTC
Last modified: 14 Mar 2006, 20:47:25 UTC

"Approximate" is not the word I would use. If one computes "earliest complete" based on "percentage through", this one will still be going in August, and that's assuming I never turn the machine off, nor do anything else while it's on.

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 21265 - Posted: 14 Mar 2006, 21:24:58 UTC
Last modified: 14 Mar 2006, 21:26:14 UTC

'Approximate' was the polite word :-)

Your CPU time for 75614 timesteps is 714877 seconds, hence the total will be 141 days of processing time. This is fairly typical for a 1.7GHz machine.
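
The arithmetic behind that figure can be sketched in Python as below. The total number of timesteps is not given in this thread; the value used here (75 model years of 360 days at 48 timesteps per day) is purely an assumption, picked because it lands close to the 141 days quoted.

```python
SECONDS_PER_DAY = 86_400


def total_cpu_days(cpu_seconds, timesteps_done, total_timesteps):
    """Scale the measured seconds per timestep up to the whole run."""
    seconds_per_step = cpu_seconds / timesteps_done
    return seconds_per_step * total_timesteps / SECONDS_PER_DAY


# ASSUMPTION: total timesteps is not stated in the thread; this is a guess.
ASSUMED_TOTAL_TIMESTEPS = 75 * 360 * 48

days = total_cpu_days(714_877, 75_614, ASSUMED_TOTAL_TIMESTEPS)
print(f"{714_877 / 75_614:.2f} s per timestep, {days:.1f} days of CPU time")
```

With those inputs the result falls between 141 and 142 days, in line with the figure above. A different run length would change the answer proportionally.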

The deadlines can be ignored, since the scientists are gathering the run information over a period of years.

It is a good idea not to run the screensaver, since this takes a lot of CPU time.
I'm a volunteer and my views are my own.
News and Announcements and FAQ

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 21267 - Posted: 14 Mar 2006, 23:04:57 UTC

I'm on the last stages of a 'spinup' model: 200 model years, nearly 3000 hours.
Even when it was over half way, it showed less than 50%.
BOINC just isn't very good at estimating long models; it's been optimised for short ones, like SETI, LHC, Einstein, etc.
It'll get there.

old_user160722
Joined: 14 Feb 06
Posts: 19
Credit: 28,513
RAC: 0
Message 21313 - Posted: 15 Mar 2006, 19:56:50 UTC

Then I suspect this particular run will never be completed. Right now I'm running it 24*7 and the end date (whichever way you calculate it) goes out by at least 18 hours per 23.5 hrs computation time (i.e. 24 hrs powered-up time). I'm OK with running it like that at the moment as the heat wasted helps warm the house. In a month or so, I'll not do that but have the computer on only when I need to do something. That means it will be on about an hour a day, which means it will get about 10-15 minutes BOINC processing time (max) per day. On that basis, the processing time will be just 48 hrs for the entire summer (i.e. 2 current days running). Even if I ran it 24*7 and did nothing else on the computer, the current end date is September 2006. Minimising the power loss during the "warm month" will put this back by at least 6 months. Add in the time that it is "going out" and it will not be near completion until the middle of 2007, but I'll not be running it then as it's "warm" again, so my current estimate for completing this run is late 2007 or early 2008 (that is two years away). Which I note is more than 12 months beyond the "reporting deadline".
I should also say that I don't anticipate keeping this computer that long.
Les, you say your run took 3000 hours of processing. How long is this in elapsed time?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 21316 - Posted: 15 Mar 2006, 21:41:39 UTC

but have the computer on only when I need to do something. That means it will be on about an hour a day, which means it will get about 10-15 minutes BOINC processing time(max) per day.


10-15 minutes a day may not be long enough for the model to reach a new checkpoint. So next day it will just repeat the previous day's processing.
Even if your model only gets halfway (or a bit past it), it will still be very useful.
The 1st half is Hindsight (checking that it has produced a reasonable replica of the past), and the 2nd is Foresight (seeing what the parameter combination used produces in the way of climate).
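
The checkpoint point above can be illustrated with a toy Python model. It assumes work is only kept at fixed checkpoint intervals measured in minutes of processing, and that anything done after the last checkpoint is lost at shutdown. The interval values are hypothetical; the real model's checkpoint spacing is not given in this thread.

```python
def minutes_retained(session_minutes, checkpoint_interval, sessions):
    """Minutes of processing that survive when each session ends in a shutdown."""
    retained = 0
    for _ in range(sessions):
        # Each session restarts from the last checkpoint, so only whole
        # checkpoint intervals completed within the session are kept.
        whole_intervals = session_minutes // checkpoint_interval
        retained += whole_intervals * checkpoint_interval
    return retained


# 30 daily sessions of 15 minutes each, with two hypothetical intervals.
print(minutes_retained(15, 20, 30))  # checkpoint every 20 min: nothing is kept
print(minutes_retained(15, 10, 30))  # checkpoint every 10 min: 10 min kept per day
```

In this sketch a session shorter than the checkpoint interval makes no lasting progress at all, however many days it is repeated.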

****

Spinups take about 4 months on a P4 3.2GHz machine (or AMD equivalent), running 24/7, with very little else running.
Mine is now at 406 hours to go. I had a few problems along the way, and used backups to recover, which has slowed down the finish.

Spinups were the test models for the TCMs, as used in the BBC experiment, and now also here at cpdn.

old_user160722
Joined: 14 Feb 06
Posts: 19
Credit: 28,513
RAC: 0
Message 21349 - Posted: 16 Mar 2006, 20:17:06 UTC

It somehow does not seem to be a good starting premise, knowing/expecting only a few runs ever to complete in their entirety. I seem to recall an earlier thread about retention of users. I can imagine people being turned off by the length of time to get "a result" and therefore cancelling the job and not accepting any more. What about all those jobs that manage to get through Phase I but then get lost (for whatever reason)? Is there a way of another job picking up on these and running with them?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 21353 - Posted: 16 Mar 2006, 20:59:57 UTC

Is there a way of another job picking up on these and running with them?

It's not possible to continue someone else's failed job from the point where it failed.
1) The program isn't designed in a way where this could be done.
2) Some parameter combinations will fail anyway.

What is being run is a desktop version of the Met Office's 64-bit programs that run on their supercomputers. The source code is 50+ Megabytes in size, has 1 million+ lines of Fortran, and took nearly 2 years to convert to 32-bit code and get it to run stably.

This project is the result of an attempt by Dr Myles Allen of the Atmospheric, Oceanic & Planetary Physics dept. at Oxford University to see if it's possible to improve climate forecasting by running lots of models with slightly different parameters and combining the successful results into a huge ensemble of results.

Which is all described in the pages and texts in the Climate Science section to the left of here.

It is known that there will be a lot of people who are put off by the long model times, but there are enough left who are willing to plod on with research.
And this is the best that can be hoped for by a university dept.

And don't forget that phase 1 of sulphur models have extra data extracted and sent back for inclusion in experiment 2, which has just begun. Or that experiment 2, the Coupled Ocean model, has more data sent back more frequently.

old_user160722
Joined: 14 Feb 06
Posts: 19
Credit: 28,513
RAC: 0
Message 21402 - Posted: 18 Mar 2006, 19:39:55 UTC

Les, thanks for that.
Re failure, I was meaning more that someone "gave up" rather than the program crashed. It would seem a shame if someone "gave up" half way through Phase III (say), as the fact that Phases I & II completed would surely mean that the parameters were (generally) OK. Yes, you will have the feedback from Phases I & II but will not have the progression.

Is there anywhere I can find out the detail of what is computed in each phase?

Re put off, I'm sure that there are many. I have read the side notes and it talks of 1.4GHz processor times (in slightly vague terms). Knowing what I know now, I would respectfully suggest that a Climate Prediction run should NOT be attempted on anything less than a 2.0GHz machine (preferably thoroughly defragged and with at least 512MB memory and 15GB disk space free).
I am, however, determined to get this run done. That said, progress will be slow after the weather warms up (assuming it does!?!), and not speed up again until the cooler months. Right now I'm at 7.94% after 1 month of 24/7 CPU up time.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 21405 - Posted: 18 Mar 2006, 20:10:31 UTC

Re failure

If the server hasn't heard from a model for about 6 weeks, it assumes that the model is lost, and marks that data set for possible reissue.
The Oxford people have been working on this for years, and they HAVE considered all the possibilities.


Is there anywhere I can find out the detail of what is computed in each phase?

In the Climate Science pages, via the link in the blue menu to the left of here.

Re put off

One person on the BBC site was trying to use a 192MHz computer with 64 Megs of RAM. (Or something close to that.)
Another wanted to know the algorithm so that he could work it out by hand as a challenge, because he had a GCSE in maths, and "it couldn't be all that hard to do."

Those of us who have been at this for a long time had long discussions about making the documents simpler, and increasing the speed requirement before the BBC launch, but were hampered both by a media embargo on public discussion, and by the BBC desire to make the project available to the masses.

Best to just crunch, and let the Oxford people worry about the details.

old_user160722
Joined: 14 Feb 06
Posts: 19
Credit: 28,513
RAC: 0
Message 21408 - Posted: 18 Mar 2006, 22:07:22 UTC

'nuff said
I'll keep the thread updated with my "progress"

old_user160722
Joined: 14 Feb 06
Posts: 19
Credit: 28,513
RAC: 0
Message 21642 - Posted: 27 Mar 2006, 18:56:04 UTC

Update as promised
1.7GHz tower: start date 21/2/06, estimated work hours 2496
3500+ 64bit laptop: start date 17/2/06, estimated work hours 1230

Machine  CPU hrs  % thro  To complete  Earliest complete  Estimated complete
Tower    333      10.36   2598 hrs     13/7/06            7/8/06
Laptop   431      31.71   1134 hrs     13/5/06            22/5/06

Notes:
"Earliest complete" = "now" + "hours to complete"
"Estimated complete" = "now" + "proportion work done" * "time taken to do current work"
Both assume machines are on 24/7 (which will not be true during "warm month")
Tower is sharing BOINC time with SETI and Einstein
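
For anyone wanting to check the table, here is a small Python sketch of the two date calculations. Two things in it are assumptions, not stated in the post: "now" is taken as midnight on the posting date (27/3/06), and "Estimated complete" is computed as "now" plus CPU hours so far divided by the fraction done, since that is the reading under which the tabulated dates come out.

```python
from datetime import datetime, timedelta


def earliest_complete(now, hours_to_complete):
    """'Earliest complete': now plus the hours-to-complete figure shown by BOINC."""
    return now + timedelta(hours=hours_to_complete)


def estimated_complete(now, cpu_hours_done, fraction_done):
    """'Estimated complete' as inferred from the table (an assumption):
    now plus the extrapolated total run time, cpu_hours_done / fraction_done."""
    return now + timedelta(hours=cpu_hours_done / fraction_done)


now = datetime(2006, 3, 27)  # posting date; midnight is assumed

# name: (CPU hours so far, fraction done, hours to complete)
machines = {"Tower": (333, 0.1036, 2598), "Laptop": (431, 0.3171, 1134)}
for name, (cpu_hours, fraction, to_complete) in machines.items():
    print(name,
          earliest_complete(now, to_complete).date(),
          estimated_complete(now, cpu_hours, fraction).date())
```

Note that adding the whole extrapolated run time to "now" counts the hours already done a second time. Adding only the remaining hours (total minus hours done) would give an earlier date.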

old_user160722
Joined: 14 Feb 06
Posts: 19
Credit: 28,513
RAC: 0
Message 22149 - Posted: 17 Apr 2006, 15:51:40 UTC

Update as promised
Machine  CPU hrs  % thro  To complete  Earliest complete  Estimated complete  Disk    Phase  Date
Tower    638      20.79   2656 hrs     5/8/06             23/8/06             0.55GB  2      6th Jul 1826
Laptop   623      45.90   1048 hrs     30/5/06            12/6/06             1.13GB  3      10th Jun 1845

Now starting to warm up, so neither machine on more than a couple of hrs/day

old_user160722
Joined: 14 Feb 06
Posts: 19
Credit: 28,513
RAC: 0
Message 22475 - Posted: 29 Apr 2006, 17:20:16 UTC

Update as promised.
Machine  CPU hrs  % thro  To complete  Earliest complete  Estimated complete  Disk    Phase date
Tower    860      28.38   2844 hrs     25th Aug           2nd Sept            0.98GB  13 Mar 1832
Laptop   688      50.73   2336 hrs     4th Aug            24th June           1.4GB   18 Dec 1848

Laptop not run much since last update, but Hrs to complete seems to have doubled somehow. Tower run about 50% of the time.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 22478 - Posted: 29 Apr 2006, 17:52:02 UTC

The laptop hours will have increased because BOINC is basing part of its 'assumptions' of your computer usage on the fact that the computer isn't on for very long.


old_user160722
Joined: 14 Feb 06
Posts: 19
Credit: 28,513
RAC: 0
Message 22487 - Posted: 29 Apr 2006, 20:02:24 UTC

What is it they say about "assume"? It makes an "ASS" out of "U" and "ME". Many say that "modeling" is based on "assumes" (i.e. assumptions).

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 22489 - Posted: 29 Apr 2006, 21:08:18 UTC

OK. How about: does its best to calculate the time to completion based on numerous different factors?
All of which is probably described in the BOINC Wiki.
