climateprediction.net (CPDN) home page
Thread 'downloaded job with report deadline in the past?'


Questions and Answers : Windows : downloaded job with report deadline in the past?
old_user15411
Joined: 8 Sep 04
Posts: 5
Credit: 98,508
RAC: 0
Message 18405 - Posted: 19 Dec 2005, 10:04:32 UTC

Dear All,

Recently my BOINC manager downloaded the sulphur cycle application and a job:

app:sulphur_cycle 4.22
job: sulphur_gj1z_000771191_0

After a few days I noticed that all my other BOINC projects had stopped crunching. When I looked into it, I found that because the climateprediction job had a deadline in the past, BOINC thinks it is overcommitted and does not download anything else.

What should I do? Abort this climateprediction job and see if the next download has a "good" deadline?

cheers
ID: 18405
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 18408 - Posted: 19 Dec 2005, 11:38:40 UTC
Last modified: 19 Dec 2005, 11:39:10 UTC

The deadline for that job is 29 Nov 2006 5:11:20 UTC. My guess from your trickle history is that you're running CPDN at quite a low resource share on a relatively slow processor. If you're also being hit by BOINC's frequent over-estimation of completion times, that could easily cause BOINC to think it's overcommitted.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 18408
old_user94880
Joined: 27 Aug 05
Posts: 156
Credit: 112,423
RAC: 0
Message 18416 - Posted: 19 Dec 2005, 18:37:14 UTC

EDF (earliest deadline first) is caused by:

1) A deadline within 24 hours.
2) A deadline within 2 * the connect time.
3) A failure of the Round Robin simulator to finish a result within 90% of its deadline.

A project not requesting work is caused by:
1) A host that is in NWF (no work fetch).
2) A project that has enough work on a host that has enough work.
3) A project that has an LTD (long-term debt) that is negative enough.

NWF (no work fetch) is caused by:
1) A failure of the Round Robin simulator to get a result done within 90% of a deadline if the resource share of the next project to request work from is added to the Round Robin simulation.

Work will always be requested from somewhere, even if that somewhere has a very negative LTD and/or the host is in NWF (no work fetch) if there is a CPU that is idle and there is a network connection.
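
As a rough illustration, the three EDF triggers listed above can be written as a single check. This is only a sketch of the rules as stated in this post, not BOINC's actual scheduler code. The function name, its parameters, and the choice to measure the 90% window from the current time are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def edf_triggered(now, deadline, connect_interval, simulated_finish):
    """Return True if any of the three EDF conditions from the post holds.

    All names here are hypothetical. `simulated_finish` stands in for the
    finish time predicted by the round-robin simulator.
    """
    time_left = deadline - now

    # 1) A deadline within 24 hours.
    if time_left <= timedelta(hours=24):
        return True

    # 2) A deadline within 2 * the connect time.
    if time_left <= 2 * connect_interval:
        return True

    # 3) The simulated finish falls outside 90% of the time to the deadline.
    #    ASSUMPTION: the 90% window is measured from `now`.
    latest_allowed = now + 0.9 * time_left
    return simulated_finish > latest_allowed
```

For example, a result predicted to finish 330 days from now against a deadline about 345 days away would trip the third condition, even though it would still finish before the deadline itself.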
BOINC Wiki
ID: 18416
old_user15411
Joined: 8 Sep 04
Posts: 5
Credit: 98,508
RAC: 0
Message 18422 - Posted: 19 Dec 2005, 19:42:54 UTC

My mistake. In fact the report deadline is in 2006... and since we are in December, my mind is already in 2006. Sorry about that...

But the fact is that BOINC stopped downloading work for other projects and is giving me this message:

19/12/2005 11:14:27||Suspending work fetch because computer is overcommitted.

I'm running (as I was before, when there was no such problem) on a Pentium M 730 processor (2 MB L2 cache, 1.6 GHz, 533 MHz FSB) with 512 MB memory. The resource share is set to 20%, but since there is no other work (no downloads because of the overcommitment), CPDN is using 100% of the CPU.

I have been thinking that it could be because the job is still relatively early in the crunching process. Maybe if I let it crunch the data a bit longer, BOINC will make a better estimate of the time needed to finish the job? What do you think?

Thanks,


ID: 18422
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 18428 - Posted: 19 Dec 2005, 21:03:30 UTC - in response to Message 18422.  
Last modified: 19 Dec 2005, 21:07:33 UTC

I'm running (as I was before, when there was no such problem) on a Pentium M 730 processor (2 MB L2 cache, 1.6 GHz, 533 MHz FSB) with 512 MB memory. The resource share is set to 20%, but since there is no other work (no downloads because of the overcommitment), CPDN is using 100% of the CPU.

I have a similar system, but running at 2.8 GHz and dedicated to CPDN. Its sulphur cycle 4.22 model is running at 7.22 secs/TS. Assuming everything else is the same (very unlikely!), I reckon that would translate to 12.64 secs/TS on your system.

Sulphur cycle models have 1,296,240 timesteps. At the rate I've worked out, that's 190 days of CPU time, or 950 days at a 20% resource share. You'd need to be averaging less than 4.67 secs/TS to complete the model within its deadline (4.2 secs/TS to meet it and avoid going into EDF mode).
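
The arithmetic behind those figures can be sketched as follows. This is only an illustration: the function names are made up for the example, the clock-speed scaling is the same crude "everything else equal" assumption as above, and the 350 days of wall-clock time to the deadline is an assumed round figure chosen because it roughly reproduces the 4.67 secs/TS limit.

```python
SECONDS_PER_DAY = 86_400
TIMESTEPS = 1_296_240  # sulphur cycle model length, as stated above

def scale_secs_per_ts(ref_secs_per_ts, ref_ghz, target_ghz):
    """Scale a measured rate to another clock speed (assumes all else equal)."""
    return ref_secs_per_ts * ref_ghz / target_ghz

def cpu_days(secs_per_ts, timesteps=TIMESTEPS):
    """CPU time in days needed to run every timestep at the given rate."""
    return secs_per_ts * timesteps / SECONDS_PER_DAY

def max_secs_per_ts(wall_days, share, timesteps=TIMESTEPS, margin=1.0):
    """Slowest rate that still finishes within margin * wall_days at this share."""
    return margin * wall_days * SECONDS_PER_DAY * share / timesteps

est = scale_secs_per_ts(7.22, ref_ghz=2.8, target_ghz=1.6)  # roughly 12.6 secs/TS
days = cpu_days(est)                                        # roughly 190 days of CPU time
wall = days / 0.20                                          # roughly 950 days at a 20% share

# ASSUMPTION: about 350 days of wall-clock time remain until the deadline.
limit = max_secs_per_ts(350, share=0.20)                    # roughly 4.67 secs/TS
edf_limit = max_secs_per_ts(350, share=0.20, margin=0.9)    # roughly 4.2 secs/TS

print(f"{est:.2f} secs/TS, {days:.0f} CPU days, {wall:.0f} days at 20%")
print(f"limit {limit:.2f} secs/TS, EDF limit {edf_limit:.2f} secs/TS")
```

The gap between the estimated rate and the limit is what matters here: the machine would have to run the model nearly three times faster, or give it a much larger share, to finish in time.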

You were previously running a slab model, which runs faster and only has 777,744 timesteps.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 18428
old_user15411
Joined: 8 Sep 04
Posts: 5
Credit: 98,508
RAC: 0
Message 18445 - Posted: 19 Dec 2005, 23:47:05 UTC - in response to Message 18428.  
Last modified: 19 Dec 2005, 23:47:58 UTC


[snip]
Sulphur cycle models have 1,296,240 timesteps. At the rate I've worked out, that's 190 days of CPU time, or 950 days at a 20% resource share. You'd need to be averaging less than 4.67 secs/TS to complete the model within its deadline (4.2 secs/TS to meet it and avoid going into EDF mode).

You were previously running a slab model, which runs faster and only has 777,744 timesteps.


Ok. That makes sense.

So I have 8 projects (100/8 = 12.5% each) and I'm already giving a 20% resource share to CP. It seems to me that this sulphur model is incompatible with running other projects. I don't think my computer is as slow as an 800 MHz Pentium III or something :)

If I increase the share, I start to have problems with other projects :(

Hmm... I'm going to think about it. Maybe I have to leave CP for a dedicated machine and not run it on this one...

Maybe they should extend the deadline?

Thanks for all the help, Thyme!

Cheers,


ID: 18445
old_user15411
Joined: 8 Sep 04
Posts: 5
Credit: 98,508
RAC: 0
Message 18507 - Posted: 20 Dec 2005, 23:50:04 UTC - in response to Message 18445.  

I don't really understand.

Now I have 25h of CPU time (1.47% progress), 1501h to completion, and a report deadline of 29/11/2006.

Still, the message is "Allowing work to fetch again". But 1 second later:
"Resuming round-robin CPU scheduling", and another second later:
"Suspending work fetch because computer is overcommitted".

As I understand it, 1501h is about 60 days of work, and the report deadline is in November 2006... strange.
I'll wait a few more days before aborting CP...


:(


ID: 18507
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 18509 - Posted: 21 Dec 2005, 0:06:09 UTC - in response to Message 18507.  

I don't really understand.

Now I have 25h of CPU time (1.47% progress), 1501h to completion, and a report deadline of 29/11/2006.

Still, the message is "Allowing work to fetch again". But 1 second later:
"Resuming round-robin CPU scheduling", and another second later:
"Suspending work fetch because computer is overcommitted".

As I understand it, 1501h is about 60 days of work, and the report deadline is in November 2006... strange.
I'll wait a few more days before aborting CP...

You may be right about the 60 days, but since you are running 8 projects, BOINC may not be allocating a large enough percentage to CPDN to do 60 days' worth of work before the deadline.

But that sequence of messages about allowing and then suspending work fetch was probably because you were right on the edge of being overcommitted. The completion time estimates for the projects changed, so BOINC allowed work fetch, and then another work unit was downloaded that put it back into the overcommitted state. This type of thing might happen frequently as BOINC tries to juggle 8 different projects.
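
A rough way to see how close to the edge this could be, using the figures quoted above. This is a sketch under assumptions, not what BOINC actually computes: it assumes CPDN really receives a steady 20% share, takes the 1501h estimate at face value, and applies the 90% margin mentioned earlier in the thread. The function name is hypothetical.

```python
from datetime import datetime

def wall_days_needed(cpu_hours_left, share):
    """Wall-clock days needed to deliver the remaining CPU hours at a given share."""
    return cpu_hours_left / 24 / share

now = datetime(2005, 12, 20, 23, 50, 4)       # time of the quoted post (UTC)
deadline = datetime(2006, 11, 29, 5, 11, 20)  # deadline given earlier in the thread
days_left = (deadline - now).total_seconds() / 86_400

# ASSUMPTION: a steady 20% resource share for CPDN.
needed = wall_days_needed(1501, share=0.20)

meets_deadline = needed <= days_left
meets_90_percent = needed <= 0.9 * days_left

print(f"needed {needed:.1f} days, available {days_left:.1f} days")
print(f"meets deadline: {meets_deadline}, within 90% margin: {meets_90_percent}")
```

Under these assumptions the job would finish before the deadline but not inside the 90% margin, so a small change in the time estimate is enough to flip the result either way.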
ID: 18509
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 18517 - Posted: 21 Dec 2005, 1:07:33 UTC

Best to let BOINC do its own thing. Looking over its shoulder all the time to fiddle will make things worse.

ID: 18517
old_user15411
Joined: 8 Sep 04
Posts: 5
Credit: 98,508
RAC: 0
Message 18574 - Posted: 21 Dec 2005, 19:31:27 UTC - in response to Message 18517.  

Best to let BOINC do its own thing. Looking over its shoulder all the time to fiddle will make things worse.



Problem solved by itself!!!! :-)

Thanks, that's true :) and it is usually what I do.
It's just that I recently finished a CP job, and the download of a new job seemed to have some kind of strange effect.

But like I said before, it was a good idea not to delete the CP job. After running "alone" for 35h, BOINC recalculated its predictions and started to download work for the other projects.

I have been running these 8 projects for a while and my computer seems to handle them. I never had the overcommitted problem before.

Thanks for all the answers, you were very kind,

cheers,
ID: 18574
