Message boards : Number crunching : deadline too short for these models ?
Author | Message |
---|---|
Send message Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
Do these deadlines (3 months) seem too short for these ClimatePrediction.net models?
|
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,827,799 RAC: 5,038 |
As far as I can tell there is no hyperthreaded Xeon processor operating at ~1.8 GHz, so that isn't the cause. The models must therefore be slow because:
1. RAM is rather low for 4 x HADCM3N
2. 1.8 GHz is actually a bit slow by modern standards
3. 4 similar models slow each other down
4. The models are not really slow, but BOINC thinks they are.
Option #1 may be fixable depending on the machine build, but you may be reluctant to do that for an old machine (with possibly expensive memory). Upgrading out of option #2 is unlikely for cost reasons. Option #3 can be tested by suspending two of the models and checking for a significant increase in speed. Option #4 is easy to check: just leave them for a bit and check whether the time-to-go reduces by more than the increase in the time-already-gone. Start with option #4, then #3 and see how things develop.
I wouldn't normally advise anyone to abort a viable model, but the sub-project running the HADCM3N models does have a pressing schedule: the data will be accepted by CPDN whenever it's submitted but a late submission may be too late for the science. That isn't usually the case with CPDN, but the RAPID-RAPIT case is unusual. |
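The option #4 check can be sketched in a few lines of Python. This is only an illustration: the function name and the sample readings are hypothetical, and the numbers are meant to be copied by hand from the BOINC Manager at two different times.

```python
def estimate_is_converging(elapsed_before_h, remaining_before_h,
                           elapsed_after_h, remaining_after_h):
    """Return True if time-to-go fell by more than time-already-gone rose."""
    elapsed_gain = elapsed_after_h - elapsed_before_h
    remaining_drop = remaining_before_h - remaining_after_h
    if elapsed_gain <= 0:
        raise ValueError("the second reading must be taken later than the first")
    return remaining_drop > elapsed_gain


# Hypothetical readings taken 12 hours apart:
# elapsed went from 10 h to 22 h, time-to-go went from 2140 h to 2100 h.
print(estimate_is_converging(10.0, 2140.0, 22.0, 2100.0))  # True
```

If this returns True over several readings, BOINC's estimate is probably pessimistic rather than the model being genuinely slow.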
Send message Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
"#4. The models are not really slow, but BOINC thinks they are." Yes, thank you Iain. I think your #4 may be the case, because as I watch the BOINC GUI the time remaining is declining rapidly, and by a large factor, so I think I will be fine. I have set no new work and will just let these four (4) models crunch away 24/7 on this slow computer. Best Wishes Byron |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Dear Byron, it is usual for the “to completion” time at the start of a model to be twice the real running time. I presently have a CM3n that I downloaded yesterday running in “high priority” mode. Completion time is listed as 2140 hours. The CM3n spin-up model that I recently finished said just about the same when it started. It finished in less than 1000 hours. That was on a 2.2 GHz processor, so yours will take a little longer. So don’t worry, be happy. |
Send message Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
Thank you Jim for that information. Best Wishes Byron |
Send message Joined: 12 Feb 08 Posts: 66 Credit: 4,877,652 RAC: 0 |
CPDN needs the results of the third batch by mid-August, so the deadline doesn’t seem too short at all, quite the contrary actually! You can get a much better estimate of the actual total run time by dividing the elapsed time by the percentage done, e.g. 11.9 h / 0.408 % ≈ 2900 hours, and the estimate will improve as your runtime gets longer. |
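The arithmetic above can be written as a small Python sketch. The function name is made up for illustration. The inputs are the "elapsed" and "progress" figures shown by the BOINC Manager, and the result assumes the model runs at a roughly constant speed.

```python
def estimate_total_hours(elapsed_hours, percent_done):
    """Extrapolate total run time from elapsed time and percentage complete."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in the range (0, 100]")
    return elapsed_hours / (percent_done / 100.0)


total = estimate_total_hours(11.9, 0.408)
print(round(total))         # 2917, i.e. roughly 2900 hours
print(round(total - 11.9))  # hours still to go on this estimate
```

Very early in a run the percentage is tiny, so small rounding errors in it swing the result a lot, which is why the estimate gets more trustworthy later on.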
Send message Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
"The cpdn needs the results of the third batch by mid-August, so the deadline doesn’t seem too short at all, quite the contrary actually! You can get a much better estimate of the actual total run time by dividing the elapsed time by the percentage done, eg. 11.9 h / 0.408 % = 2900 hours, but you'll get a better estimate when your runtime gets longer." Thank you 3rkko for that information: dividing the elapsed time by the percentage done gives the approximate run time. A little update and good news on the four (4) models that my four (4) Intel Xeon 1.80 GHz CPUs are running on this computer, which is dedicated 24/7 to ClimatePrediction.net: my first trickle-up claimed credit of 311.04, so I estimate that my slow computer will complete these four (4) models by the first part of August 2011.
|
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
The two tasks I am running both suggest over 1000 hours to go. An estimate based on percentage and elapsed time for both suggests a total time of under 700 hours, with both tasks a little over 10% complete. A link to how the estimated time to completion is worked out would be interesting, though I suspect the answer would lose me! |
Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,374,828 RAC: 10,749 |
The two tasks I am running both suggest over 1000 hours to go. An estimate based on percentage and elapsed time for both suggests a total time of under 700 Hours with both tasks a little over 10% complete. The short answer is: That's BOINC for you. Time 'To completion' will get itself sorted out over time. |
Send message Joined: 3 Oct 06 Posts: 43 Credit: 8,017,057 RAC: 0 |
If I remember correctly, the estimate is based on the runtime of previous tasks. The term "duration correction factor" may sound familiar to you. If that info is not available for the task type, I believe a project estimate will be used. In this case, apparently 1000 hours. |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
The formula to work out the estimated elapsed time uses the following XML tags from client_state.xml:
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Send message Joined: 14 Sep 10 Posts: 11 Credit: 1,812,972 RAC: 0 |
On my Intel E5520-based system, BOINC usually estimates that HADCM3N models will take ~1400 hours, but they usually end up taking only ~550 hours to run to completion. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Thanks for that - not as lost as I thought I might be. |
Send message Joined: 27 Feb 08 Posts: 4 Credit: 960,510 RAC: 0 |
Just a heads up, I was crunching 2 of these models but they both crashed at around the same % complete. One crunched for a little over 98 hours and another for almost 98 hours. I think the completion % was around 25% at that time.
Task 1: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=12993951
Task 2: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=12992822
Of course I don't know if all the models from that batch will error out there but I thought it was quite suspicious that they both did so at around the same % complete. |
Send message Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0 |
Why do they have to run in high priority? My deadline is September 26. Tullio |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
"Why do they have to run in high priority?" That's just the way that BOINC works when you have a mix of other projects with very short run times, and then add really long models such as these. BOINC will eventually "learn", and then it should go back to normal. But it may take a week or two, so it's best just to let BOINC get on with the learning process. Backups: Here |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
The only problem with just letting it run for a week or two while it learns that you are not going to miss the deadline is that, while it is running in “high priority”, CPDN is building a large debt to the other projects. Since there does not seem to be any way to cancel this debt, it can cause problems downloading new models later. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Life's tough in the multi-project lane. :) The average cruncher will just have to accept a bit of a delay in their short projects' work. Experienced crunchers can (carefully) edit their client_state file to adjust the values. But debt isn't as bad as it may seem at 'first sight'. And I DID post in the News thread, (everyone remember that thread? :) ), that once the loooong RAPIT models are running, there'll be shorter regional models again. Which is still the case. Backups: Here |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
Why do they have to run in high priority? Mine deadline is September 26. It's probably happening because <rsc_fpops_est> is set too high (as I mentioned here). "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Send message Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
My duration correction factor used to be near 1.0 for CPDN. All of my other projects are within +/-25% of 1. However, since running these very long 80 year models, my DCF is over 1.7. The estimated time to completion always rises, and it's well over 1400 hours for a single task. Inside my client_state.xml:
<workunit>
  <name>hadcm3n_t5wu_1940_40_007316366</name>
  <app_name>hadcm3n</app_name>
  <version_num>607</version_num>
  <rsc_fpops_est>8457924000000000.000000</rsc_fpops_est>
  <rsc_fpops_bound>84579240000000000.000000</rsc_fpops_bound>
  <rsc_memory_bound>124000000.000000</rsc_memory_bound>
  <rsc_disk_bound>1887436800.000000</rsc_disk_bound>
  <command_line> hadcm3n_t5wu_1940_40_007316366 ocean_o5wu_1940_40_007316366_0 atmos_o5wu_1940_40 </command_line>
  ...
</workunit>
If you look real close, you'll see the rsc_fpops_bound is exactly 10x the rsc_fpops_est value. Looks like someone missed a zero. What I also don't understand is this particular task is a 40 year model, yet BOINC thinks it will take longer than an 80 year model I recently finished. Shouldn't the 80 year models have roughly twice the FLOPS as the 40 year models? |
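To see how these numbers might turn into an hours figure, here is a rough Python sketch. It assumes the client's estimate is approximately `rsc_fpops_est` divided by the host's benchmarked floating-point speed, then multiplied by the duration correction factor. That relation is an assumption made for illustration and is not spelled out in this thread. The host speed `HOST_FPOPS` is a made-up value, so the hours printed are not a prediction for any real machine.

```python
import xml.etree.ElementTree as ET

# Fragment of the <workunit> entry quoted in the post above.
WORKUNIT_XML = """
<workunit>
  <name>hadcm3n_t5wu_1940_40_007316366</name>
  <rsc_fpops_est>8457924000000000.000000</rsc_fpops_est>
  <rsc_fpops_bound>84579240000000000.000000</rsc_fpops_bound>
</workunit>
"""


def estimated_hours(fpops_est, host_fpops, dcf):
    """Assumed relation: seconds = fpops_est / host_fpops * dcf."""
    return fpops_est / host_fpops * dcf / 3600.0


wu = ET.fromstring(WORKUNIT_XML)
fpops_est = float(wu.findtext("rsc_fpops_est"))
fpops_bound = float(wu.findtext("rsc_fpops_bound"))

# Confirms the 10x ratio between the bound and the estimate.
print(fpops_bound / fpops_est)

HOST_FPOPS = 2.0e9  # hypothetical benchmark result, in operations per second
DCF = 1.7           # duration correction factor reported in the post

print(round(estimated_hours(fpops_est, HOST_FPOPS, 1.0)))  # with DCF = 1.0
print(round(estimated_hours(fpops_est, HOST_FPOPS, DCF)))  # with DCF = 1.7
```

Under this assumption the estimate scales linearly with both `rsc_fpops_est` and the DCF, so an overstated `rsc_fpops_est` and a DCF above 1 would each push the displayed time-to-completion upward.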
©2024 cpdn.org