Thread 'Cannot get work'

old_user2379

Joined: 28 Aug 04
Posts: 5
Credit: 60,231
RAC: 0
Message 10678 - Posted: 11 Mar 2005, 6:50:30 UTC

Anybody having problems getting work? I seem to have lost all work from the project and cannot get more.
old_user2379

Joined: 28 Aug 04
Posts: 5
Credit: 60,231
RAC: 0
Message 10679 - Posted: 11 Mar 2005, 7:04:28 UTC

Here are the sections from my log file
--------------------------------

10/03/2005 02:31:55 PM|climateprediction.net|Restarting result 0rco_000055827_0 using hadsm3 version 4.10
10/03/2005 02:56:10 PM|climateprediction.net|Unrecoverable error for result 0rco_000055827_0 ( - exit code -5 (0xfffffffb))
10/03/2005 02:56:10 PM|climateprediction.net|Deferring communication with project for 1 minutes and 0 seconds
10/03/2005 02:56:10 PM|climateprediction.net|Computation for result 0rco_000055827 finished


======
10/03/2005 04:03:23 PM|climateprediction.net|Unrecoverable error for result 0ruv_000056489_0 ( - exit code -5 (0xfffffffb))
10/03/2005 04:03:23 PM|climateprediction.net|Deferring communication with project for 1 minutes and 0 seconds
10/03/2005 04:03:23 PM|climateprediction.net|Computation for result 0ruv_000056489 finished

=====
10/03/2005 04:03:24 PM|climateprediction.net|Started upload of 0ruv_000056489_0_1.zip
10/03/2005 04:03:24 PM|climateprediction.net|Started upload of 0ruv_000056489_0_2.zip
10/03/2005 04:03:26 PM|climateprediction.net|Finished upload of 0ruv_000056489_0_1.zip
10/03/2005 04:03:26 PM|climateprediction.net|Throughput 32256 bytes/sec
10/03/2005 04:03:26 PM|climateprediction.net|Finished upload of 0ruv_000056489_0_2.zip
10/03/2005 04:03:26 PM|climateprediction.net|Throughput 458080 bytes/sec
10/03/2005 04:03:26 PM|climateprediction.net|Started upload of 0ruv_000056489_0_3.zip
10/03/2005 04:03:26 PM|climateprediction.net|Started upload of 0ruv_000056489_0_4.zip
10/03/2005 04:03:28 PM|climateprediction.net|Finished upload of 0ruv_000056489_0_3.zip
10/03/2005 04:03:28 PM|climateprediction.net|Throughput 32256 bytes/sec
10/03/2005 04:03:28 PM|climateprediction.net|Finished upload of 0ruv_000056489_0_4.zip
10/03/2005 04:03:28 PM|climateprediction.net|Throughput 16128 bytes/sec
10/03/2005 04:03:28 PM|climateprediction.net|Started upload of 0ruv_000056489_0_5.zip
10/03/2005 04:03:30 PM|climateprediction.net|Finished upload of 0ruv_000056489_0_5.zip
10/03/2005 04:03:30 PM|climateprediction.net|Throughput 32256 bytes/sec
10/03/2005 04:04:23 PM||Insufficient work; requesting more
10/03/2005 04:04:23 PM|climateprediction.net|Requesting 86400.00 seconds of work
10/03/2005 04:04:23 PM|climateprediction.net|Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
10/03/2005 04:04:24 PM|climateprediction.net|Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
10/03/2005 04:04:25 PM|climateprediction.net|Started download of 0rzs_000056667.zip
10/03/2005 04:04:26 PM|climateprediction.net|Finished download of 0rzs_000056667.zip
10/03/2005 04:04:26 PM|climateprediction.net|Throughput 50662 bytes/sec

=======

10/03/2005 05:04:35 PM|climateprediction.net|Unrecoverable error for result 0rzs_000056667_0 ( - exit code -5 (0xfffffffb))
10/03/2005 05:04:35 PM|climateprediction.net|Deferring communication with project for 1 minutes and 0 seconds

=======

10/03/2005 05:05:36 PM|climateprediction.net|Message from server: No work available (daily quota exceeded)
10/03/2005 05:05:36 PM|climateprediction.net|No work from project
10/03/2005 05:05:36 PM|climateprediction.net|Deferring communication with project for 1 hours, 0 minutes, and 0 seconds
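For anyone wanting to pull the failures out of a longer log, here is a minimal Python sketch. It assumes the pipe-separated layout shown above (timestamp, project, message) and the exact "Unrecoverable error" wording from these lines; the function names are made up for illustration and are not part of BOINC. It also shows that 0xfffffffb is simply -5 read as an unsigned 32-bit number.

```python
import re
from collections import Counter

# Matches the "Unrecoverable error" message text as it appears in the log above.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (\S+) "
    r"\( - exit code (-?\d+) \((0x[0-9a-fA-F]+)\)\)"
)


def to_signed32(value):
    """Interpret an unsigned 32-bit value as a two's-complement signed integer."""
    return value - (1 << 32) if value >= (1 << 31) else value


def find_failures(lines):
    """Return (timestamp, result name, exit code) for each failed result."""
    failures = []
    for line in lines:
        parts = line.strip().split("|", 2)  # timestamp | project | message
        if len(parts) != 3:
            continue
        timestamp, _project, message = parts
        match = ERROR_RE.search(message)
        if match:
            name, code, _hex_code = match.groups()
            failures.append((timestamp, name, int(code)))
    return failures


# Sample lines copied from the log excerpt above.
sample = [
    "10/03/2005 02:56:10 PM|climateprediction.net|Unrecoverable error for result 0rco_000055827_0 ( - exit code -5 (0xfffffffb))",
    "10/03/2005 04:03:23 PM|climateprediction.net|Unrecoverable error for result 0ruv_000056489_0 ( - exit code -5 (0xfffffffb))",
    "10/03/2005 04:04:23 PM||Insufficient work; requesting more",
    "10/03/2005 05:04:35 PM|climateprediction.net|Unrecoverable error for result 0rzs_000056667_0 ( - exit code -5 (0xfffffffb))",
]

failures = find_failures(sample)
print(Counter(code for _, _, code in failures))  # tally of failures per exit code
print(to_signed32(0xFFFFFFFB))                   # the hex form as a signed value
```

Reading a real log would just mean passing the open file object to `find_failures` instead of the sample list.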




Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 10680 - Posted: 11 Mar 2005, 7:43:57 UTC - in response to Message 10679.  
Last modified: 11 Mar 2005, 7:44:23 UTC

> 10/03/2005 05:04:35 PM|climateprediction.net|Unrecoverable error for result
> 0rzs_000056667_0 ( - exit code -5 (0xfffffffb))
> 10/03/2005 05:04:35 PM|climateprediction.net|Deferring communication with
> project for 1 minutes and 0 seconds
>
> =======
>
> 10/03/2005 05:05:36 PM|climateprediction.net|Message from server: No work
> available (daily quota exceeded)
> 10/03/2005 05:05:36 PM|climateprediction.net|No work from project
> 10/03/2005 05:05:36 PM|climateprediction.net|Deferring communication with
> project for 1 hours, 0 minutes, and 0 seconds

There is a daily limit of 4 models per host, which you are definitely hitting. Looking at the results for that system (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=3574), it seems to have been running fine until the early hours of yesterday morning. Since then there have been 9 failed results, all but 1 of them with exit status -5. All the failed results indicate that you were running BOINC 4.24 and (from the workunit names) it looks like the 8 that never got started were all HADSM3 version 4.10 models.

Could you check the stderr.* and stdout.* files in your BOINC and projects/climateprediction.net/{WU name} directories to see if any of them give an indication of what the problem might be?
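If it helps, the check can be scripted. This is only a rough Python sketch: it assumes the layout described above (stderr.* and stdout.* in the BOINC directory and in each workunit folder under projects/climateprediction.net), and the install path in the last section is a placeholder to replace with your own. The function names are invented for the example.

```python
from pathlib import Path


def find_diagnostic_files(boinc_dir):
    """Return stderr.* and stdout.* files found in the BOINC directory and in
    each workunit folder under projects/climateprediction.net."""
    root = Path(boinc_dir)
    if not root.is_dir():
        return []
    search_dirs = [root]
    project = root / "projects" / "climateprediction.net"
    if project.is_dir():
        search_dirs += sorted(p for p in project.iterdir() if p.is_dir())
    found = []
    for directory in search_dirs:
        for pattern in ("stderr.*", "stdout.*"):
            found.extend(sorted(p for p in directory.glob(pattern) if p.is_file()))
    return found


def tail(path, max_lines=20):
    """Return the last few lines of a text file, tolerating odd bytes."""
    text = Path(path).read_text(errors="replace")
    return text.splitlines()[-max_lines:]


if __name__ == "__main__":
    # Placeholder location (an assumption); substitute your own BOINC directory.
    for path in find_diagnostic_files("C:/Program Files/BOINC"):
        print(f"--- {path}")
        print("\n".join(tail(path)))
```

The last few lines of each file are usually where any error message ends up, which is why only the tail is printed.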

Also, would it be possible for you to revert to BOINC 4.19 before the system downloads more work tomorrow to eliminate one of the factors that's changed?
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 10681 - Posted: 11 Mar 2005, 7:51:53 UTC
Last modified: 11 Mar 2005, 7:53:35 UTC

Deleted. Thyme Lawn got there first.

Les
old_user2379

Joined: 28 Aug 04
Posts: 5
Credit: 60,231
RAC: 0
Message 10683 - Posted: 11 Mar 2005, 8:16:04 UTC - in response to Message 10681.  

I reset the project before going to the message board, so there may be bits missing. However, I have looked in the workunit folders and see only three files: a zip file, an XML file and a lock file. There is no relevant data in the XML and lock files; I did not look in the day file.

I am running Seti and ProteinPredictor on this machine as well and neither is having a problem. The machine is seen as a dual processor and will be running one of these apps at the same time as Climate.

I have looked in the BOINC error files but see no further information beyond what I have already posted.


old_user2379

Joined: 28 Aug 04
Posts: 5
Credit: 60,231
RAC: 0
Message 10790 - Posted: 13 Mar 2005, 8:36:58 UTC

I still have this problem and have been unable to run the application for more than a few minutes before getting the error detailed above. This is on a machine that has been running with no problems for 6 months or so.

I have upgraded to BOINC 4.25 and have rebooted, but no luck. Is it worth clearing the CP directory of all files? Has anyone any other thoughts that may help?

Peter

old_user2379

Joined: 28 Aug 04
Posts: 5
Credit: 60,231
RAC: 0
Message 10989 - Posted: 16 Mar 2005, 6:47:51 UTC

Well, I do not know if anyone is interested, but I have finally solved my problem. I looked into the CP directory and found quite a few files and folders which were the remnants of old work that was either complete or had failed through some sort of error. There were also the applications to run, old and new, plus a few data files. I deleted the lot and, well, it all started working again.

If there is any lesson here it might be that the Boinc/CP team should look at how they clean up after a run fails with an error and how they remove applications that may not be needed.

My CP directory size fell from over 1 GB to around 600 MB.
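For anyone who wants to see what is taking up the space before deleting anything, a small read-only Python sketch along these lines might help. It only measures sizes and removes nothing; the function names and the path in the last section are assumptions for illustration, not anything BOINC provides.

```python
import os


def dir_size(path):
    """Total size in bytes of all regular files below path (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            file_path = os.path.join(dirpath, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def largest_entries(project_dir, top=10):
    """Return (size, name) for the biggest files and folders directly inside
    project_dir, largest first."""
    entries = []
    with os.scandir(project_dir) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                size = dir_size(entry.path)
            else:
                size = entry.stat(follow_symlinks=False).st_size
            entries.append((size, entry.name))
    return sorted(entries, reverse=True)[:top]


if __name__ == "__main__":
    # Placeholder location (an assumption); point this at your own CP directory.
    cp_dir = "C:/Program Files/BOINC/projects/climateprediction.net"
    if os.path.isdir(cp_dir):
        print(f"Total: {dir_size(cp_dir) / 1e6:.0f} MB")
        for size, name in largest_entries(cp_dir):
            print(f"{size / 1e6:10.1f} MB  {name}")
```

Running it before and after a clean-up would show the kind of drop described above.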

Peter



©2024 cpdn.org