Message boards : climateprediction.net Science : time to complete is months away and growing
Author | Message |
---|---|
Send message Joined: 14 Feb 06 Posts: 19 Credit: 28,513 RAC: 0 |
I'm running a 1.7GHz P4 with 785 MB memory. When I downloaded a run it said it would take 2496 hrs to complete. This gave me an "earliest complete" date of 5th June. Since then I have run 180 processing hours and the time to complete is now greater than when it started and stands at 2561 hrs. This gives me an "earliest complete" time of 28th June. I am (supposedly) 6.2% of the way through this run. This just does not add up, so someone must have put some funny startup figures in. |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
The 'completion time' is very approximate; you're best off using the 6.2% to estimate the completion time. I'm a volunteer and my views are my own. |
Send message Joined: 14 Feb 06 Posts: 19 Credit: 28,513 RAC: 0 |
"Approximate" is not the word I would use. If one computes "earliest complete" based on "percentage through", this one will still be going in August, and that's assuming I never turn the machine off, nor do anything else while it's on. |
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
'Approximate' was the polite word :-) Your CPU time for 75614 timesteps is 714877 seconds, hence the total will be 141 days of processing time. This is fairly typical for a 1.7GHz machine. The deadlines can be ignored, since the scientists are gathering the run information over a period of years. It is a good idea not to run the screensaver, since this takes a lot of CPU time. |
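For anyone who wants to redo this kind of estimate themselves, here is a minimal sketch in Python. It assumes progress is roughly linear in CPU time, which is only an approximation. The function names are made up for illustration and are not part of BOINC or the CPDN client. The first calculation uses the timestep figures quoted above; the second pairs the 180 processing hours and 6.2% from the opening post, which were taken at a different moment, so the two results should not be expected to match the 141-day figure exactly.

```python
SECONDS_PER_HOUR = 3_600
HOURS_PER_DAY = 24


def seconds_per_timestep(cpu_seconds: float, timesteps_done: int) -> float:
    """Average CPU seconds spent on each completed timestep."""
    if timesteps_done <= 0:
        raise ValueError("timesteps_done must be positive")
    return cpu_seconds / timesteps_done


def projected_total_hours(cpu_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: total CPU hours = hours so far / fraction done."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return cpu_hours / fraction_done


# Figures quoted in the post above.
rate = seconds_per_timestep(714_877, 75_614)

# Figures from the opening post: 180 processing hours at 6.2% done.
total_hours = projected_total_hours(180, 0.062)
remaining_hours = total_hours - 180
remaining_days = remaining_hours / HOURS_PER_DAY

print(f"{rate:.2f} CPU seconds per timestep")
print(f"projected total: {total_hours:.0f} h, remaining: {remaining_days:.0f} days of CPU time")
```

The remaining figure is CPU time, not calendar time: if BOINC only gets part of each day, the calendar date moves out accordingly.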
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I'm on the last stages of a 'spinup' model: 200 model years, nearly 3000 hours. Even when it was over halfway, it showed less than 50%. BOINC just isn't very good at estimating long models; it's been optimised for short ones, like SETI, LHC, Einstein, etc. It'll get there. |
Send message Joined: 14 Feb 06 Posts: 19 Credit: 28,513 RAC: 0 |
Then I suspect this particular run will never be completed. Right now I'm running it 24*7 and the end date (whichever way you calculate it) goes out by at least 18 hours per 23.5 hrs computation time (i.e. 24 hrs powered up time). I'm OK with running it like that at the moment as the heat wasted helps warm the house. In a month or so, I'll not do that but have the computer on only when I need to do something. That means it will be on about an hour a day, which means it will get about 10-15 minutes BOINC processing time (max) per day. On that basis, the processing time will be just 48 hrs for the entire summer (i.e. 2 current days running). Even if I ran it 24*7 and did nothing else on the computer, the current end date is September 2006. Minimising the power loss during the "warm month" will put this back by at least 6 months. Add in the time that it is "going out" and it will not be near completion until the middle of 2007, but I'll not be running it then as it's "warm" again, so my current estimate for completing this run is late in 2007, early 2008 (that is two years away). Which I note is more than 12 months beyond the "reporting deadline". I should also say that I don't anticipate keeping this computer that long. Les, you say your run took 3000 hours of processing. How long is this in elapsed time? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
"but have the computer on only when I need to do something. That means it will be on about an hour a day, which means it will get about 10-15 minutes BOINC processing time(max) per day." 10-15 minutes a day may not be long enough for the model to reach a new checkpoint. So next day it will just repeat the previous day's processing. Even if your model only gets halfway (or a bit past it), it will still be very useful. The 1st half is Hindsight (checking that it has produced a reasonable replica of the past), and the 2nd is Foresight (seeing what the parameter combination used produces in the way of climate). **** Spinups take about 4 months on a P4 3.2GHz machine (or AMD equivalent), running 24/7, with very little else running. Mine is now at 406 hours to go. I had a few problems along the way, and used backups to recover, which has slowed down the finish. Spinups were the test models for the TCMs, as used in the BBC experiment, and now also here at cpdn. |
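The checkpoint point can be illustrated with a small Python sketch. It assumes that work done since the last checkpoint is lost when the machine is switched off, and that each session starts from a checkpoint. The 20-minute checkpoint interval is a purely hypothetical number chosen for the example; the real interval depends on the model and the machine and is not given in this thread.

```python
def retained_minutes(session_minutes: int, checkpoint_interval_minutes: int) -> int:
    """CPU minutes from one session that survive a shutdown.

    Only whole checkpoint intervals are kept; any partial interval
    after the last checkpoint has to be redone next time.
    """
    if checkpoint_interval_minutes <= 0:
        raise ValueError("checkpoint_interval_minutes must be positive")
    completed_checkpoints = session_minutes // checkpoint_interval_minutes
    return completed_checkpoints * checkpoint_interval_minutes


HYPOTHETICAL_INTERVAL = 20  # minutes between checkpoints; an assumption, not a CPDN figure

for session in (15, 50, 60):
    kept = retained_minutes(session, HYPOTHETICAL_INTERVAL)
    print(f"{session} min session -> {kept} min of progress kept")
```

Under those assumptions, a daily session shorter than the interval keeps nothing, which is the situation described above.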
Send message Joined: 14 Feb 06 Posts: 19 Credit: 28,513 RAC: 0 |
It somehow does not seem to be a good starting premise, knowing/expecting only a few runs to ever complete in their entirety. I seem to recall an earlier thread about retention of users. I can imagine people being turned off by the length of time to get "a result" and therefore cancelling the job and not accepting any more. What about all those jobs that manage to get through Phase I, but then get lost (for whatever reason)? Is there a way of another job picking up on these and running with them? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
"Is there a way of another job picking up on these and running with them?" It's not possible to continue someone else's failed job from the point where it failed. 1) The program isn't designed in a way where this could be done. 2) Some parameter combinations will fail anyway. What is being run is a desktop version of the Met Office's 64 bit programs that run on their supercomputers. The source code is 50+ Megabytes in size, has 1 million+ lines of Fortran, and took nearly 2 years to convert to 32 bit code and get it to run stably. This project is the result of an attempt by Dr Myles Allen of the Atmospheric, Oceanic & Planetary Physics dept. at Oxford University to see if it's possible to improve climate forecasting by running lots of models with slightly different parameters and combining the successful results into a huge ensemble of results. This is all described in the pages and texts in the Climate Science section to the left of here. It is known that there will be a lot of people who are put off by the long model times, but there are enough left who are willing to plod on with research. And this is the best that can be hoped for by a university dept. And don't forget that phase 1 of sulphur models have extra data extracted and sent back for inclusion in experiment 2, which has just begun. Or that experiment 2, the Coupled Ocean model, has more data sent back more frequently. |
Send message Joined: 14 Feb 06 Posts: 19 Credit: 28,513 RAC: 0 |
Les, thanks for that. Re failure, I was meaning more that someone "gave up" rather than the program crashed. It would seem a shame if someone "gave up" halfway through Phase III (say), as the fact that Phases I & II completed would surely mean that the parameters were (generally) OK. Yes, you will have the feedback from Phases I & II but will not have the progression. Is there anywhere I can find out the detail of what is computed in each phase? Re put off, I'm sure that there are many. I have read the side notes and it talks of 1.4GHz processor times (in slightly vague terms). Knowing what I know now, I would respectfully suggest that a Climate Prediction run should NOT be attempted on anything less than a 2.0GHz machine (preferably thoroughly defragged and with at least 512MB memory and 15GB disk space free). I am, however, determined to get this run done. That said, progress will be slow after the weather warms up (assuming it does!?!), and will not speed up again until the cooler months. Right now I'm at 7.94% after 1 month of 24/7 CPU up time. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Re failure: If the server hasn't heard from a model for about 6 weeks, it assumes that the model is lost, and marks that data set for possible reissue. The Oxford people have been working on this for years, and they HAVE considered all the possibilities.
In the Climate Science pages, via the link in the blue menu to the left of here. Re put off: One person on the BBC site was trying to use a 192MHz computer with 64 Megs of RAM. (Or something close to that.) Another wanted to know the algorithm so that he could work it out by hand as a challenge, because he had a GCSE in maths, and "it couldn't be all that hard to do." Those of us who have been at this for a long time had long discussions about making the documents simpler, and increasing the speed requirement before the BBC launch, but were hampered both by a media embargo on public discussion, and by the BBC desire to make the project available to the masses. Best to just crunch, and let the Oxford people worry about the details. |
Send message Joined: 14 Feb 06 Posts: 19 Credit: 28,513 RAC: 0 |
'Nuff said. I'll keep the thread updated with my "progress". |
Send message Joined: 14 Feb 06 Posts: 19 Credit: 28,513 RAC: 0 |
Update as promised.
1.7GHz tower: start date 21/2/06, estimated work hours 2496.
3500+ 64bit laptop: start date 17/2/06, estimated work hours 1230.

Machine | CPU hrs | % thro | To complete | Earliest complete | Estimated complete |
---|---|---|---|---|---|
Tower | 333 | 10.36 | 2598 hrs | 13/7/06 | 7/8/06 |
Laptop | 431 | 31.71 | 1134 hrs | 13/5/06 | 22/5/06 |

Notes:
"Earliest complete" = "now" + "hours to complete"
"Estimated complete" = "now" + "proportion work done" * "time taken to do current work"
Both assume machines are on 24/7 (which will not be true during "warm month").
Tower is sharing BOINC time with SETI and Einstein. |
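As a rough cross-check on the figures in this update, the sketch below compares BOINC's reported hours to complete with a straight proportional extrapolation from CPU hours and percent done. The proportional formula (remaining = CPU hours × (1 − p) / p) is an assumption made for this example and may not be exactly what was used to produce the "Estimated complete" dates; the function name is likewise hypothetical.

```python
def remaining_hours_from_progress(cpu_hours: float, percent_done: float) -> float:
    """Remaining CPU hours, assuming progress is linear in CPU time."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    fraction = percent_done / 100
    return cpu_hours * (1 - fraction) / fraction


# (CPU hours so far, percent done, BOINC "to complete" hours) from the update above.
machines = {
    "Tower": (333, 10.36, 2598),
    "Laptop": (431, 31.71, 1134),
}

for name, (cpu_hours, percent_done, boinc_hours) in machines.items():
    projected = remaining_hours_from_progress(cpu_hours, percent_done)
    print(f"{name}: BOINC says {boinc_hours} h, progress implies about {projected:.0f} h")
```

Both numbers are CPU hours with the machine running flat out; neither accounts for time the machine is off or shared with other projects.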
Send message Joined: 14 Feb 06 Posts: 19 Credit: 28,513 RAC: 0 |
Update as promised.

Mach'n | CPUHrs | %Thro | To Compl | Earl Com | Est Compl | Disk | Phase | Date |
---|---|---|---|---|---|---|---|---|
Tower | 638 | 20.79 | 2656 hrs | 5/8/06 | 23/8/06 | 0.55GB | 2 | 6th Jul 1826 |
Laptop | 623 | 45.90 | 1048 hrs | 30/5/06 | 12/6/06 | 1.13GB | 3 | 10th Jun 1845 |

Now starting to warm up, so neither machine on more than a couple of hrs/day. |
Send message Joined: 14 Feb 06 Posts: 19 Credit: 28,513 RAC: 0 |
Update as promised.

Mach'n | CPUHrs | %Thro | To Compl | Earl Com | Est Compl | Disk | Phase date |
---|---|---|---|---|---|---|---|
Tower | 860 | 28.38 | 2844hrs | 25th Aug | 2nd Sept | 0.98GB | 13 Mar 1832 |
Laptop | 688 | 50.73 | 2336hrs | 4th Aug | 24th June | 1.4 GB | 18 Dec 1848 |

Laptop not run much since last update, but Hrs to complete seems to have doubled somehow. Tower run about 50% of the time. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The laptop hours will have increased because BOINC is basing part of its 'assumptions' of your computer usage on the fact that the computer isn't on for very long. |
Send message Joined: 14 Feb 06 Posts: 19 Credit: 28,513 RAC: 0 |
What is it they say about "assume"? It makes an "ASS" out of "U" and "ME". Many say that "modeling" is based on "assumes" (i.e. assumptions). |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
OK. How about: does its best to calculate the time to completion based on numerous different factors? All of which is probably described in the BOINC Wiki. |