HadCM3 160-year runtime estimate wrong?
Author | Message |
---|---|
Joined: 1 Jan 07 Posts: 1061 Credit: 36,716,561 RAC: 8,355 |
I've just downloaded two new HadCM3 models. I got 160-year models (7177229, 7177246), replacing 80-year models which have just finished on both machines. Yet both machines are telling me that the models will take about 48 days, the same as the 80-year models. That estimate is about right for 80 years at 2.0/2.1 s/TS, but it should be 96 days for 160 years. BOINC is v5.10.13 under Windows XP. |
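The 48-day versus 96-day arithmetic can be checked with a short calculation. This is a sketch only: the number of timesteps per model year is not given in the thread, so the value below is back-calculated from the 48-day figure for 80 years at 2.0 s/TS and should be treated as an assumption.

```python
SECONDS_PER_DAY = 86_400

# ASSUMPTION: ~25,920 timesteps per model year. Not stated in the thread;
# back-calculated from 48 days * 86,400 s/day / 2.0 s/TS / 80 years.
TIMESTEPS_PER_MODEL_YEAR = 25_920


def runtime_days(model_years: int, seconds_per_timestep: float) -> float:
    """Wall-clock days to run a model at a fixed speed in seconds per timestep."""
    total_steps = model_years * TIMESTEPS_PER_MODEL_YEAR
    return total_steps * seconds_per_timestep / SECONDS_PER_DAY


for years in (80, 160):
    for s_per_ts in (2.0, 2.1):
        print(f"{years:>3} years at {s_per_ts} s/TS: {runtime_days(years, s_per_ts):5.1f} days")
```

Because runtime scales linearly with model years, doubling the length doubles the time at any fixed s/TS, which is why a 160-year estimate equal to the 80-year one must be wrong.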
Joined: 3 Oct 06 Posts: 43 Credit: 8,017,057 RAC: 0 |
Quote: "I've just downloaded two new HadCM3 models. I got 160-year models (7177229, 7177246), replacing 80-year models which have just finished on both machines." I also got a 160-year model, and it also shows about the same time to completion as the 80-year models I crunched previously. The deadline is far enough away that it is probably not much of a problem for the BOINC Manager. Yes, I do know deadlines don't matter on CPDN, but the Manager doesn't know that. ;) |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Last I heard, the deadline for the 160-year models was going to be set at just under 3 years. (There's a BOINC problem if it's made longer than 3 years.) Best to leave it for a year before worrying. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,716,561 RAC: 8,355 |
The deadline for this morning's models is 10 July 2010 (2.5 years away), which sounds about right. It's the "To completion" estimate that seems to have gone wrong, whether as a consequence of the deadline or otherwise. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Ahhh! I just thought of something. If there is only one "result duration correction factor" (RDCF) for ALL TCMs, then it may be set for the 80-year models. If so, BOINC is in for a rude shock, and you should prepare for lots of error messages as you near the current "estimated time". I'll ask about it. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Ah HAAA! Early reply: there's only one RDCF per host per project, no matter what kinds of applications run on that host. Also, the RDCF can be edited, BUT ...... if, like me, people run several different types of models on one computer (in my case, a quad core), this is not a solution. Luckily, I'm running one HADAM3 and three HADSM3s, which have similar completion times, so BOINC won't get too flustered. At the moment, all the advice I can offer is Don't Panic (in large, friendly letters). No doubt more later. |
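Why a single per-project factor causes trouble can be illustrated with a toy model. This is a simplified sketch, not BOINC's actual code: the update rule (jump straight up when a task runs long, otherwise drift slowly toward the observed ratio) and all runtimes are assumptions chosen for illustration. Both model types start with the same raw estimate, as in the client_state.xml snippets quoted in this thread, but the 160-year model really takes twice as long.

```python
SECONDS_PER_DAY = 86_400


def update_rdcf(rdcf: float, actual_s: float, raw_estimate_s: float, weight: float = 0.1) -> float:
    """ASSUMED simplified rule: jump up at once if a task ran longer than
    predicted, otherwise move a fraction `weight` toward the observed ratio."""
    ratio = actual_s / raw_estimate_s
    if ratio > rdcf:
        return ratio
    return (1 - weight) * rdcf + weight * ratio


# Hypothetical values: one raw estimate shared by both model lengths.
RAW_ESTIMATE_S = 7.0e6
ACTUAL_80_S = 4.2e6   # hypothetical 80-year runtime
ACTUAL_160_S = 8.4e6  # a 160-year model takes about twice as long

rdcf = ACTUAL_80_S / RAW_ESTIMATE_S  # factor already tuned to 80-year models
for label, actual in (("160-year", ACTUAL_160_S), ("80-year", ACTUAL_80_S), ("80-year", ACTUAL_80_S)):
    rdcf = update_rdcf(rdcf, actual, RAW_ESTIMATE_S)
    est_days = RAW_ESTIMATE_S * rdcf / SECONDS_PER_DAY
    print(f"after a {label} task: RDCF = {rdcf:.3f}, every new estimate = {est_days:.1f} days")
```

With one shared factor, whichever model type finished last drags the estimate for every other task on the host, so no single value fits both an 80-year and a 160-year model that claim the same size.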
Joined: 1 Jan 07 Posts: 1061 Credit: 36,716,561 RAC: 8,355 |
Then, with apologies, we have to get technical. Here are two snippets from client_state.xml: `<workunit> <name>hadcm3iozn_cpcx_2000_80_135898634</name> <app_name>hadcm3i</app_name> <version_num>544</version_num> <rsc_fpops_est>7000000000000000.000000</rsc_fpops_est> <rsc_fpops_bound>21000000000000000.000000</rsc_fpops_bound> <rsc_memory_bound>134217728.000000</rsc_memory_bound> <rsc_disk_bound>629145600.000000</rsc_disk_bound> .....` and `<workunit> <name>hadcm3istd_00uh_1920_160_15922618</name> <app_name>hadcm3i</app_name> <version_num>544</version_num> <rsc_fpops_est>7000000000000000.000000</rsc_fpops_est> <rsc_fpops_bound>21000000000000000.000000</rsc_fpops_bound> <rsc_memory_bound>134217728.000000</rsc_memory_bound> <rsc_disk_bound>629145600.000000</rsc_disk_bound> .....` Those are the two active tasks on host 788869. You can see from the names that the first is an 80-year model and the second is a 160-year model. Yet they both have the same `<rsc_fpops_est>` (which is what BOINC uses, along with the RDCF and benchmarks, to calculate 'time to completion'). More alarmingly, they have the same `<rsc_fpops_bound>`, which could, in extreme cases, cause the tasks to fail with a 'maximum CPU time exceeded' error, or whatever the exact wording is. Something needs a bit of a tweak in the workunit generation department. |
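The relationship described above (size estimate, benchmark and RDCF giving 'time to completion', and the bound giving a CPU time limit) can be sketched as follows. This is a simplified model under stated assumptions: the host benchmark speed and RDCF values are hypothetical, and the exact formulas inside the BOINC 5.10 client may differ in detail.

```python
SECONDS_PER_DAY = 86_400

HOST_FLOPS = 1.0e9  # HYPOTHETICAL benchmark result for one host
RDCF = 0.6          # HYPOTHETICAL per-project correction factor


def estimated_runtime_s(rsc_fpops_est: float, host_flops: float, rdcf: float) -> float:
    """Assumed model: size estimate / benchmark speed, scaled by the RDCF."""
    return rsc_fpops_est / host_flops * rdcf


def cpu_time_limit_s(rsc_fpops_bound: float, host_flops: float) -> float:
    """Assumed model: the bound divided by benchmark speed gives the CPU time
    after which the client would abort the task."""
    return rsc_fpops_bound / host_flops


workunits = {
    "80-year (as issued)": (7.0e15, 21.0e15),
    "160-year (as issued)": (7.0e15, 21.0e15),
    "160-year (corrected)": (14.0e15, 42.0e15),
}

for name, (est, bound) in workunits.items():
    est_days = estimated_runtime_s(est, HOST_FLOPS, RDCF) / SECONDS_PER_DAY
    limit_days = cpu_time_limit_s(bound, HOST_FLOPS) / SECONDS_PER_DAY
    print(f"{name:22} estimate {est_days:6.1f} d, limit {limit_days:6.1f} d")
```

Since both quantities are proportional to the fpops values, a 160-year workunit issued with 80-year numbers gets half the estimate and half the headroom it should have.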
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I've sent a PM to the project people to make them aware of the situation. |
Joined: 5 Aug 04 Posts: 173 Credit: 1,843,046 RAC: 0 |
You're absolutely right. The initial duration estimate should be double for the 160-year WU. I apologize for this mistake on our part; I've corrected it. You can edit the client state file to reflect the correct values below (i.e. just for the 160-year WU). 80-year WU: `<rsc_fpops_est> "7000000000000000" </rsc_fpops_est>` `<rsc_fpops_bound> "21000000000000000" </rsc_fpops_bound>` 160-year WU: `<rsc_fpops_est> "14000000000000000" </rsc_fpops_est>` `<rsc_fpops_bound> "42000000000000000" </rsc_fpops_bound>` Sorry once again, and thanks for pointing this out. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,716,561 RAC: 8,355 |
Quote: "You're absolutely right. The initial duration estimate should be double for the 160 WU." Thanks, Tolu. Memo to other users: stop BOINC completely, and take an extra backup for luck, before attempting to edit client_state.xml. [edit: and keep the formatting as in my message 32106, i.e. no quotation marks, and six decimal places.] |
Joined: 1 Feb 07 Posts: 26 Credit: 885,216 RAC: 0 |
Quote: "You're absolutely right. The initial duration estimate should be double for the 160 WU." Good spot, Richard. An interesting sideline to this one is that, for the past 12 hours, the "Time to complete" in BOINC Manager for my 160-year model had been slowly rising (i.e. before I edited the .xml). After 36 hours of crunching it was showing about 706 hours to complete; after 46 hours (just before the edit) it was showing 710 hours. Following the edit (still 46 hours in) it is showing 1376 hours to complete, and reducing as expected. Perhaps there is a mechanism that would have prevented the Armageddon that Les was warning of? F. |
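One plausible mechanism for the slowly rising estimate is that the client blends the static (fpops-based) estimate with one derived from the model's reported fraction done, trusting progress more as the task advances. The sketch below assumes a weighting of (1 - fraction done)^2 on the static part; that weighting is an assumption, not something confirmed in this thread, and the hour figures are hypothetical.

```python
def remaining_hours(cpu_h: float, fraction_done: float, static_total_h: float) -> float:
    """ASSUMED blend of two remaining-time estimates:
    - static: the fpops-based total minus CPU time used so far
    - progress: extrapolated from CPU time and the reported fraction done
    The static estimate gets weight (1 - fraction_done)**2, so it dominates
    early in the run and fades as the task progresses."""
    static_left = max(static_total_h - cpu_h, 0.0)
    if fraction_done <= 0:
        return static_left
    progress_left = cpu_h / fraction_done - cpu_h
    w_static = (1.0 - fraction_done) ** 2
    return w_static * static_left + (1.0 - w_static) * progress_left


# HYPOTHETICAL: the static estimate says 700 h, but the model really needs 1400 h.
STATIC_TOTAL_H = 700.0
TRUE_TOTAL_H = 1400.0
for cpu_h in (36.0, 46.0, 200.0, 700.0):
    fd = cpu_h / TRUE_TOTAL_H
    print(f"{cpu_h:6.0f} h done: about {remaining_hours(cpu_h, fd, STATIC_TOTAL_H):7.1f} h to go")
```

Under this assumption the displayed figure is gradually pulled toward the progress-based value, which would partly, but only slowly, correct an undersized estimate.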
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Done mine: it looks a lot more sensible now. Note that there is a group of bounds for each model, and it's only the fpops values (`rsc_fpops_est` and `rsc_fpops_bound`) that need changing, and only for the 160-year HADCM3 models. Thanks, all. |
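For anyone with several 160-year models, the edit described above could in principle be scripted. This is a sketch only, under assumptions: it treats any workunit whose name starts with `hadcm3` and contains `_160_` as a 160-year model (inferred from the two names quoted earlier in the thread), and it writes Tolu's corrected values in the six-decimal format recommended above. The advice in this thread still applies: stop BOINC completely and back up client_state.xml first. The function works on text so everything outside the matched workunit blocks is left byte-for-byte unchanged.

```python
import re

# Corrected values for 160-year HadCM3 workunits, in client_state.xml's
# own format (no quotation marks, six decimal places).
TARGETS = {
    "rsc_fpops_est": "14000000000000000.000000",
    "rsc_fpops_bound": "42000000000000000.000000",
}

WORKUNIT_RE = re.compile(r"<workunit>.*?</workunit>", re.DOTALL)
NAME_RE = re.compile(r"<name>(.*?)</name>")


def fix_160_year_fpops(xml_text: str) -> str:
    """Rewrite only the fpops values inside 160-year HadCM3 workunit blocks."""

    def fix_block(match: re.Match) -> str:
        block = match.group(0)
        name = NAME_RE.search(block)
        # ASSUMPTION: 160-year workunit names look like hadcm3..._160_<id>.
        if not name or not name.group(1).startswith("hadcm3") or "_160_" not in name.group(1):
            return block
        for tag, value in TARGETS.items():
            block = re.sub(rf"<{tag}>[^<]*</{tag}>", f"<{tag}>{value}</{tag}>", block)
        return block

    return WORKUNIT_RE.sub(fix_block, xml_text)
```

Used on the whole file contents (read, pass through the function, write back), it sets absolute values rather than doubling them, so running it twice does no harm.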
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
There's a click-by-click explanation of how to edit numbers in the XML file in this post (http://www.climateprediction.net/board/viewtopic.php?t=7215). Don't bother reading the explanation at the beginning of that post if you don't want to; it was in any case written for members moving their model to a faster computer, but the editing method is the same. Go straight to the "How to fix it" section of the post. If you find that the values for your model are already what Tolu says they should be four posts above this, you don't need to edit anything. |
©2024 cpdn.org