climateprediction.net (CPDN) home page
Thread 'deadline too short for these models ?'

Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 42435 - Posted: 21 Jun 2011, 14:27:14 UTC
Last modified: 21 Jun 2011, 14:48:50 UTC

.




do these deadlines (3 months) seem too short for these ClimatePrediction.net models ?


  • sent ............ 21 Jun 2011
  • deadline ...... 20 Sep 2011




On my slow computer I have only one project running, ClimatePrediction.net, and I run it 24/7/365.


hadcm3n_s6lt_1940_40_007300590


http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=7498014


can you "see" the following screen shots ?








I think so far BOINC says it will make deadline of 20 Sep 2011 ... not sure ?


because I think BOINC will usually say something like ... "these tasks will not make their deadline ... you may want to consider aborting " ... not sure ?



6/21/2011 6:41:32 AM | | Unrecognized tag in cc_config.xml:
6/21/2011 6:41:44 AM | | Starting BOINC client version 6.12.26 for windows_intelx86
6/21/2011 6:41:44 AM | | log flags: file_xfer, sched_ops, task, state_debug, task_debug
6/21/2011 6:41:44 AM | | Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.5
6/21/2011 6:41:44 AM | | Data directory: C:\Documents and Settings\All Users\Application Data\BOINC
6/21/2011 6:41:44 AM | | Running under account byron leigh hatch
6/21/2011 6:41:44 AM | | Processor: 4 GenuineIntel Intel(R) Xeon(TM) CPU 1.80GHz [Family 15 Model 2 Stepping 7]
6/21/2011 6:41:44 AM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pbe
6/21/2011 6:41:44 AM | | OS: Microsoft Windows XP: Professional x86 Edition, Service Pack 3, (05.01.2600.00)
6/21/2011 6:41:44 AM | | Memory: 1.50 GB physical, 2.10 GB virtual
6/21/2011 6:41:44 AM | | Disk: 74.47 GB total, 57.68 GB free
6/21/2011 6:41:44 AM | | Local time is UTC -7 hours
6/21/2011 6:41:44 AM | | No usable GPUs found
6/21/2011 6:41:44 AM | | [state] Client state summary:
6/21/2011 6:41:44 AM | | 1 projects:
6/21/2011 6:41:44 AM | | http://climateprediction.net/ min RPC -30678.687500.0 seconds from now
6/21/2011 6:41:44 AM | | 92 file_infos:

...

<snip>
...

6/21/2011 6:41:44 AM | | 4 workunits
6/21/2011 6:41:44 AM | | hadcm3n_s6lt_1940_40_007300590
6/21/2011 6:41:44 AM | | hadcm3n_s78s_1940_40_007300804
6/21/2011 6:41:44 AM | | hadcm3n_s4ba_1940_40_007301302
6/21/2011 6:41:44 AM | | hadcm3n_s1hi_1940_40_007301498
6/21/2011 6:41:44 AM | | 4 results
6/21/2011 6:41:44 AM | | hadcm3n_s6lt_1940_40_007300590_0 state:2
6/21/2011 6:41:44 AM | | hadcm3n_s78s_1940_40_007300804_2 state:2
6/21/2011 6:41:44 AM | | hadcm3n_s4ba_1940_40_007301302_1 state:2
6/21/2011 6:41:44 AM | | hadcm3n_s1hi_1940_40_007301498_1 state:2
6/21/2011 6:41:44 AM | | 0 persistent file xfers
6/21/2011 6:41:44 AM | | 4 active tasks
6/21/2011 6:41:44 AM | | hadcm3n_s6lt_1940_40_007300590_0
6/21/2011 6:41:44 AM | | hadcm3n_s78s_1940_40_007300804_2
6/21/2011 6:41:44 AM | | hadcm3n_s4ba_1940_40_007301302_1
6/21/2011 6:41:44 AM | | hadcm3n_s1hi_1940_40_007301498_1
6/21/2011 6:41:44 AM | climateprediction.net | URL http://climateprediction.net/; Computer ID 948812; resource share 100
6/21/2011 6:41:44 AM | | General prefs: from http://einstein.phys.uwm.edu/ (last modified 22-May-2011 20:44:15)
6/21/2011 6:41:44 AM | | Computer location: work
6/21/2011 6:41:44 AM | | General prefs: no separate prefs for work; using your defaults
6/21/2011 6:41:44 AM | | Reading preferences override file
6/21/2011 6:41:44 AM | | Preferences:
6/21/2011 6:41:44 AM | | max memory usage when active: 1535.01MB
6/21/2011 6:41:44 AM | | max memory usage when idle: 1535.01MB
6/21/2011 6:41:44 AM | | max disk usage: 60.14GB
6/21/2011 6:41:44 AM | | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
6/21/2011 6:41:44 AM | | Not using a proxy
6/21/2011 6:42:23 AM | climateprediction.net | [task] task_state=EXECUTING for hadcm3n_s6lt_1940_40_007300590_0 from start
6/21/2011 6:42:23 AM | climateprediction.net | Restarting task hadcm3n_s6lt_1940_40_007300590_0 using hadcm3n version 607
6/21/2011 6:42:23 AM | climateprediction.net | [task] task_state=EXECUTING for hadcm3n_s78s_1940_40_007300804_2 from start
6/21/2011 6:42:23 AM | climateprediction.net | Restarting task hadcm3n_s78s_1940_40_007300804_2 using hadcm3n version 607
6/21/2011 6:42:23 AM | climateprediction.net | [task] task_state=EXECUTING for hadcm3n_s4ba_1940_40_007301302_1 from start
6/21/2011 6:42:23 AM | climateprediction.net | Restarting task hadcm3n_s4ba_1940_40_007301302_1 using hadcm3n version 607
6/21/2011 6:42:23 AM | climateprediction.net | [task] task_state=EXECUTING for hadcm3n_s1hi_1940_40_007301498_1 from start
6/21/2011 6:42:23 AM | climateprediction.net | Restarting task hadcm3n_s1hi_1940_40_007301498_1 using hadcm3n version 607

Thank you for any advice or help.
Best Wishes
Byron


.

ID: 42435
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,841,902
RAC: 5,047
Message 42436 - Posted: 21 Jun 2011, 15:11:40 UTC

As far as I can tell there is no hyperthreaded Xeon processor operating at ~1.8 GHz, so that isn't the cause. The models must therefore be slow because:

1. RAM is rather low for 4 x HADCM3N

2. 1.8 GHz is actually a bit slow by modern standards

3. 4 similar models slow each other down

4. The models are not really slow, but BOINC thinks they are.

Option #1 may be fixable depending on the machine build, but you may be reluctant to do that for an old machine (with possibly expensive memory). Upgrading out of option #2 is unlikely for cost reasons. Option #3 can be tested by suspending two of the models and checking for a significant increase in speed. Option #4 is easy to check: just leave them for a bit and check whether the time-to-go reduces by more than the increase in the time-already-gone.

Start with option #4, then #3 and see how things develop.
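The option #4 check can be written down as a small comparison of two readings taken from the BOINC Manager some hours apart. This is only a sketch: the function name and the sample readings are hypothetical, and the numbers are illustrative rather than taken from any real task.

```python
def boinc_overestimates(elapsed_before_h, remaining_before_h,
                        elapsed_after_h, remaining_after_h):
    """Return True if 'to completion' fell by more than elapsed time rose.

    All four arguments are hours read off the task list at two moments.
    """
    elapsed_gain = elapsed_after_h - elapsed_before_h
    remaining_drop = remaining_before_h - remaining_after_h
    if elapsed_gain <= 0:
        raise ValueError("second reading must be taken later than the first")
    # If the estimate shrinks faster than the clock advances, BOINC was
    # pessimistic and the model is faster than first reported.
    return remaining_drop > elapsed_gain


# Hypothetical readings taken 24 hours apart (illustrative numbers only).
print(boinc_overestimates(12.0, 2900.0, 36.0, 2840.0))
```

A True result suggests case #4 (BOINC's estimate is too high); a False result points back to options #1 to #3.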

I wouldn't normally advise anyone to abort a viable model, but the sub-project running the HADCM3N models does have a pressing schedule: the data will be accepted by CPDN whenever it's submitted but a late submission may be too late for the science. That isn't usually the case with CPDN, but the RAPID-RAPIT case is unusual.
ID: 42436
Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 42437 - Posted: 21 Jun 2011, 15:55:36 UTC - in response to Message 42436.  

.


#4. The models are not really slow, but BOINC thinks they are.



Yes thank you Iain

Yes I think your #4 may be the case.

Because as I watch the BOINC GUI ... the time remaining ... to deadline ... is rapidly declining.

and by a large factor.

so I think I will be fine.

I have set no new work.

and will just let these four (4) models on this slow computer crunch away 24/7/365


Best Wishes
Byron

.
ID: 42437
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 42438 - Posted: 21 Jun 2011, 20:01:37 UTC

Dear Byron

It is usual for the “to completion” time at the start of a model to be twice the real running time. I presently have a CM3n that I downloaded yesterday running in “high priority” mode. Completion time is listed as 2140 hours. The CM3n spin-up model that I recently finished said just about the same when it started. It finished in less than 1000 hours. That was on a 2.2 GHz processor so yours will take a little longer.

So don’t worry, be happy.



ID: 42438
Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 42442 - Posted: 22 Jun 2011, 4:40:29 UTC - in response to Message 42438.  

Thank you Jim for that information.

Best Wishes
Byron
ID: 42442
3rkko
Joined: 12 Feb 08
Posts: 66
Credit: 4,877,652
RAC: 0
Message 42457 - Posted: 24 Jun 2011, 20:02:57 UTC

CPDN needs the results of the third batch by mid-August, so the deadline doesn’t seem too short at all, quite the contrary actually! You can get a much better estimate of the actual total run time by dividing the elapsed time by the percentage done, e.g. 11.9 h / 0.408 % ≈ 2900 hours, but you'll get a better estimate when your runtime gets longer.
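The elapsed-time-over-percentage rule is simple enough to script. A minimal sketch, assuming the percentage is read as shown in the BOINC Manager (0 to 100); the function name is made up for illustration:

```python
def estimate_total_hours(elapsed_hours, percent_done):
    """Extrapolate total run time from progress so far.

    percent_done is a percentage (0.408 means 0.408 %), not a fraction.
    """
    if percent_done <= 0:
        raise ValueError("percent_done must be greater than zero")
    return elapsed_hours / (percent_done / 100.0)


# The figures quoted in the post: 11.9 hours elapsed at 0.408 % done.
total = estimate_total_hours(11.9, 0.408)
print(f"estimated total: {total:.0f} h, still to go: {total - 11.9:.0f} h")
```

As the post notes, the extrapolation is noisy this early in a run and settles down as the elapsed time grows.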
ID: 42457
Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 42463 - Posted: 25 Jun 2011, 0:34:31 UTC - in response to Message 42457.  
Last modified: 25 Jun 2011, 0:51:39 UTC

.


CPDN needs the results of the third batch by mid-August, so the deadline doesn’t seem too short at all, quite the contrary actually! You can get a much better estimate of the actual total run time by dividing the elapsed time by the percentage done, e.g. 11.9 h / 0.408 % ≈ 2900 hours, but you'll get a better estimate when your runtime gets longer.



Thank you 3rkko for that information.

dividing the elapsed time .. by .. the percentage done = approximate run time

A little update and Good news on the four (4) models

that my four (4) Intel Xeon CPU 1.80GHz are running

on this 24/7 full time ... dedicated computer to ... Climate Prediction.net

my first trickle up Claimed credit of: 311.04

so I estimate on my slow computer to complete these four (4) models by first part of August 2011

  • sent ............ 21 Jun 2011
  • deadline ...... 20 Sep 2011


All tasks for computer 948812

Best Wishes
Byron


.

ID: 42463
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 42474 - Posted: 27 Jun 2011, 8:12:28 UTC - in response to Message 42463.  

The two tasks I am running both suggest over 1000 hours to go. An estimate based on percentage and elapsed time for both suggests a total time of under 700 Hours with both tasks a little over 10% complete.
A link to how the estimated time to completion is worked out would be interesting, though I suspect the answer would lose me!
ID: 42474
wateroakley
Joined: 6 Aug 04
Posts: 195
Credit: 28,405,498
RAC: 10,268
Message 42476 - Posted: 27 Jun 2011, 16:03:46 UTC - in response to Message 42474.  

The two tasks I am running both suggest over 1000 hours to go. An estimate based on percentage and elapsed time for both suggests a total time of under 700 Hours with both tasks a little over 10% complete.
A link to how the estimated time to completion is worked out would be interesting, though I suspect the answer would lose me!

The short answer is: That's BOINC for you. Time 'To completion' will get itself sorted out over time.
ID: 42476
transient
Joined: 3 Oct 06
Posts: 43
Credit: 8,017,057
RAC: 0
Message 42477 - Posted: 27 Jun 2011, 16:43:14 UTC - in response to Message 42474.  

If I remember correctly, the estimate is based on the runtime of previous tasks. The term duration correction factor may sound familiar to you.
If that info is not available for the task type, I believe a project estimate will be used. In this case apparently 1000 hours.
ID: 42477
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 42478 - Posted: 27 Jun 2011, 22:05:08 UTC
Last modified: 28 Jun 2011, 8:45:41 UTC

The formula to work out the estimated elapsed time uses the following XML tags from client_state.xml:

  • <duration_correction_factor> from the <project> section for CPDN.

  • <flops> from the hadcm3n v6.07 <app_version> section.

  • <rsc_fpops_est> from the <workunit> section (fixed at 8457924000000000.000000).


The value for <flops> depends on your benchmark and will be different on each system (it's 2360018545.768535 on my C2Q Q6600 XP system).

The estimated elapsed time is calculated using the formula

<rsc_fpops_est> / <flops> * <duration_correction_factor>


On my Q6600 the formula for estimated elapsed time with those values would be

8457924000000000 / 2360018545.768535 * 1.023501 = 3668062 seconds (1018:54:22)

If I divide <rsc_fpops_est> by 2 the formula changes to

4228962000000000 / 2360018545.768535 * 1.023501 = 1834031 seconds (509:27:11)

That's much closer to the 507 hours my last HadCM3N task took.
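The arithmetic above can be reproduced with a few lines of Python. This is a sketch, not BOINC's own code: the function names are invented for illustration, and the inputs are the values quoted in this post, which will differ on other machines.

```python
def estimated_seconds(rsc_fpops_est, flops, duration_correction_factor):
    """Apply the formula <rsc_fpops_est> / <flops> * <duration_correction_factor>."""
    return rsc_fpops_est / flops * duration_correction_factor


def format_hms(seconds):
    """Format a duration in seconds as hours:minutes:seconds."""
    whole = int(round(seconds))
    hours, remainder = divmod(whole, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Values quoted above for a Q6600 (taken from client_state.xml on that host).
RSC_FPOPS_EST = 8457924000000000.0
FLOPS = 2360018545.768535
DCF = 1.023501

full = estimated_seconds(RSC_FPOPS_EST, FLOPS, DCF)
halved = estimated_seconds(RSC_FPOPS_EST / 2, FLOPS, DCF)
print(format_hms(full), format_hms(halved))
```

Swapping in the `<flops>` and `<duration_correction_factor>` values from another host's client_state.xml should give the figure that host displays at the start of a task.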


"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 42478
old_user633787
Joined: 14 Sep 10
Posts: 11
Credit: 1,812,972
RAC: 0
Message 42479 - Posted: 27 Jun 2011, 23:07:37 UTC

On my Intel E5520-based system, BOINC usually estimates that HADCM3N models will take ~1400 hours, but they usually end up taking only ~550 hours to run to completion.
ID: 42479
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 42480 - Posted: 28 Jun 2011, 4:43:03 UTC - in response to Message 42479.  

Thanks for that - not as lost as I thought I might be.
ID: 42480
Urglab
Joined: 27 Feb 08
Posts: 4
Credit: 960,510
RAC: 0
Message 42481 - Posted: 28 Jun 2011, 7:49:25 UTC

Just a heads up, I was crunching 2 of these models but they both crashed at around the same % complete. One crunched for a little over 98 hours and another for almost 98 hours. I think the completion % was around 25% at that time.

Task 1: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=12993951

Task 2: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=12992822

Of course I don't know if all the models from that batch will error out there but I thought it was quite suspicious that they both did so at around the same % complete.
ID: 42481
tullio
Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 42482 - Posted: 28 Jun 2011, 12:18:59 UTC

Why do they have to run in high priority? My deadline is September 26.
Tullio
ID: 42482
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42483 - Posted: 28 Jun 2011, 12:42:30 UTC - in response to Message 42482.  

Why do they have to run in high priority?

That's just the way that BOINC works when you have a mix of other projects with very short run times, and then add really long models such as these.
BOINC will eventually "learn", and then it should go back to normal. But it may take a week or two, so it's best just to let BOINC get on with the learning process.


Backups: Here
ID: 42483
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 42484 - Posted: 28 Jun 2011, 19:51:03 UTC

The only problem with just letting it run for a week or two while it learns that you are not going to miss the deadline is that while it is running in “high priority” CPDN is building a large debt to the other projects. Since there does not seem to be any way to cancel this debt it can cause problems downloading new models later.

ID: 42484
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42485 - Posted: 28 Jun 2011, 20:54:24 UTC - in response to Message 42484.  

Life's tough in the multi-project lane. :)

The average cruncher will just have to accept a bit of a delay in their short projects' work. Experienced crunchers can (carefully) edit their client_state file to adjust the values.
But debt isn't as bad as it may seem at 'first sight'.

And I DID post in the News thread, (everyone remember that thread? :) ), that once the loooong RAPIT models are running, that there'll be shorter regional models again.
Which is still the case.


Backups: Here
ID: 42485
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 42494 - Posted: 29 Jun 2011, 16:52:09 UTC - in response to Message 42482.  

Why do they have to run in high priority? My deadline is September 26.
Tullio

It's probably happening because <rsc_fpops_est> is set too high (as I mentioned here).
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 42494
DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 42500 - Posted: 30 Jun 2011, 1:14:48 UTC

My duration correction factor used to be near 1.0 for CPDN. All of my other projects are within +/-25% of 1. However, since running these very long 80 year models, my DCF is over 1.7. The estimated time to completion always rises, and it's well over 1400 hours for a single task.

Inside my client_state.xml:
<workunit>
    <name>hadcm3n_t5wu_1940_40_007316366</name>
    <app_name>hadcm3n</app_name>
    <version_num>607</version_num>
    <rsc_fpops_est>8457924000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>84579240000000000.000000</rsc_fpops_bound>
    <rsc_memory_bound>124000000.000000</rsc_memory_bound>
    <rsc_disk_bound>1887436800.000000</rsc_disk_bound>
    <command_line>
hadcm3n_t5wu_1940_40_007316366 ocean_o5wu_1940_40_007316366_0 atmos_o5wu_1940_40
    </command_line>
...
</workunit>


If you look real close, you'll see the rsc_fpops_bound is exactly 10x the rsc_fpops_est value. Looks like someone missed a zero.
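The 10x relationship can be checked without counting zeros by eye. The sketch below parses a trimmed copy of the `<workunit>` fragment above (only the fields needed, since the full entry is elided in the post) and prints the ratio. It only measures the ratio; it does not say whether 10x is a mistake or intended. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <workunit> entry quoted above; other fields omitted.
WORKUNIT_XML = """
<workunit>
    <name>hadcm3n_t5wu_1940_40_007316366</name>
    <rsc_fpops_est>8457924000000000.000000</rsc_fpops_est>
    <rsc_fpops_bound>84579240000000000.000000</rsc_fpops_bound>
</workunit>
"""


def bound_to_estimate_ratio(xml_text):
    """Return rsc_fpops_bound divided by rsc_fpops_est for one workunit."""
    workunit = ET.fromstring(xml_text.strip())
    estimate = float(workunit.findtext("rsc_fpops_est"))
    bound = float(workunit.findtext("rsc_fpops_bound"))
    return bound / estimate


print(bound_to_estimate_ratio(WORKUNIT_XML))
```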

What I also don't understand is this particular task is a 40 year model, yet BOINC thinks it will take longer than an 80 year model I recently finished. Shouldn't the 80 year models have roughly twice the FLOPS as the 40 year models?
ID: 42500