Message boards : Number crunching : Cannot get work
Author | Message |
---|---|
Joined: 28 Aug 04 Posts: 5 Credit: 60,231 RAC: 0 |
Anybody having problems getting work? I seem to have lost all work from the project and cannot get more. |
Joined: 28 Aug 04 Posts: 5 Credit: 60,231 RAC: 0 |
Here are the sections from my log file
--------------------------------
10/03/2005 02:31:55 PM|climateprediction.net|Restarting result 0rco_000055827_0 using hadsm3 version 4.10
10/03/2005 02:56:10 PM|climateprediction.net|Unrecoverable error for result 0rco_000055827_0 ( - exit code -5 (0xfffffffb))
10/03/2005 02:56:10 PM|climateprediction.net|Deferring communication with project for 1 minutes and 0 seconds
10/03/2005 02:56:10 PM|climateprediction.net|Computation for result 0rco_000055827 finished
======
10/03/2005 04:03:23 PM|climateprediction.net|Unrecoverable error for result 0ruv_000056489_0 ( - exit code -5 (0xfffffffb))
10/03/2005 04:03:23 PM|climateprediction.net|Deferring communication with project for 1 minutes and 0 seconds
10/03/2005 04:03:23 PM|climateprediction.net|Computation for result 0ruv_000056489 finished
=====
10/03/2005 04:03:24 PM|climateprediction.net|Started upload of 0ruv_000056489_0_1.zip
10/03/2005 04:03:24 PM|climateprediction.net|Started upload of 0ruv_000056489_0_2.zip
10/03/2005 04:03:26 PM|climateprediction.net|Finished upload of 0ruv_000056489_0_1.zip
10/03/2005 04:03:26 PM|climateprediction.net|Throughput 32256 bytes/sec
10/03/2005 04:03:26 PM|climateprediction.net|Finished upload of 0ruv_000056489_0_2.zip
10/03/2005 04:03:26 PM|climateprediction.net|Throughput 458080 bytes/sec
10/03/2005 04:03:26 PM|climateprediction.net|Started upload of 0ruv_000056489_0_3.zip
10/03/2005 04:03:26 PM|climateprediction.net|Started upload of 0ruv_000056489_0_4.zip
10/03/2005 04:03:28 PM|climateprediction.net|Finished upload of 0ruv_000056489_0_3.zip
10/03/2005 04:03:28 PM|climateprediction.net|Throughput 32256 bytes/sec
10/03/2005 04:03:28 PM|climateprediction.net|Finished upload of 0ruv_000056489_0_4.zip
10/03/2005 04:03:28 PM|climateprediction.net|Throughput 16128 bytes/sec
10/03/2005 04:03:28 PM|climateprediction.net|Started upload of 0ruv_000056489_0_5.zip
10/03/2005 04:03:30 PM|climateprediction.net|Finished upload of 0ruv_000056489_0_5.zip
10/03/2005 04:03:30 PM|climateprediction.net|Throughput 32256 bytes/sec
10/03/2005 04:04:23 PM||Insufficient work; requesting more
10/03/2005 04:04:23 PM|climateprediction.net|Requesting 86400.00 seconds of work
10/03/2005 04:04:23 PM|climateprediction.net|Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
10/03/2005 04:04:24 PM|climateprediction.net|Scheduler RPC to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi succeeded
10/03/2005 04:04:25 PM|climateprediction.net|Started download of 0rzs_000056667.zip
10/03/2005 04:04:26 PM|climateprediction.net|Finished download of 0rzs_000056667.zip
10/03/2005 04:04:26 PM|climateprediction.net|Throughput 50662 bytes/sec
=======
10/03/2005 05:04:35 PM|climateprediction.net|Unrecoverable error for result 0rzs_000056667_0 ( - exit code -5 (0xfffffffb))
10/03/2005 05:04:35 PM|climateprediction.net|Deferring communication with project for 1 minutes and 0 seconds
=======
10/03/2005 05:05:36 PM|climateprediction.net|Message from server: No work available (daily quota exceeded)
10/03/2005 05:05:36 PM|climateprediction.net|No work from project
10/03/2005 05:05:36 PM|climateprediction.net|Deferring communication with project for 1 hours, 0 minutes, and 0 seconds |
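For anyone wanting to tally failures like these without reading the log by eye, here is a minimal Python sketch. It assumes the log lines keep the `date|project|message` layout and the exact "Unrecoverable error for result" wording shown above; the function name `failed_results` and the `sample` string are made up for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# Matches the "Unrecoverable error" message text as it appears in the log above.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (\S+) \( - exit code (-?\d+) \((0x[0-9a-fA-F]+)\)\)"
)

def failed_results(log_text):
    """Return a list of (result_name, exit_code) for each unrecoverable error."""
    return [(m.group(1), int(m.group(2))) for m in ERROR_RE.finditer(log_text)]

# Three lines copied from the posted log, used here as sample input.
sample = """\
10/03/2005 02:56:10 PM|climateprediction.net|Unrecoverable error for result 0rco_000055827_0 ( - exit code -5 (0xfffffffb))
10/03/2005 04:03:23 PM|climateprediction.net|Unrecoverable error for result 0ruv_000056489_0 ( - exit code -5 (0xfffffffb))
10/03/2005 05:04:35 PM|climateprediction.net|Unrecoverable error for result 0rzs_000056667_0 ( - exit code -5 (0xfffffffb))
"""

failures = failed_results(sample)
by_code = Counter(code for _, code in failures)

for name, code in failures:
    print(name, code)
print(dict(by_code))
```

Grouping by exit code makes it easy to see whether every failure is the same kind, which is the case for the three errors in this log.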
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
> 10/03/2005 05:04:35 PM|climateprediction.net|Unrecoverable error for result 0rzs_000056667_0 ( - exit code -5 (0xfffffffb))
> 10/03/2005 05:04:35 PM|climateprediction.net|Deferring communication with project for 1 minutes and 0 seconds
>
> =======
>
> 10/03/2005 05:05:36 PM|climateprediction.net|Message from server: No work available (daily quota exceeded)
> 10/03/2005 05:05:36 PM|climateprediction.net|No work from project
> 10/03/2005 05:05:36 PM|climateprediction.net|Deferring communication with project for 1 hours, 0 minutes, and 0 seconds

There is a daily limit of 4 models per host, which you are definitely hitting.

Looking at the results for that system (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=3574) it seems to have been running fine until the early hours of yesterday morning. Since then there have been 9 failed results, all but 1 of them with exit status -5. All the failed results indicate that you were running BOINC 4.24 and (from the workunit names) it looks like the 8 that never got started were all HADSM3 version 4.10 models.

Could you check the stderr.* and stdout.* files in your BOINC and projects/climateprediction.net/{WU name} directories to see if any of them give an indication of what the problem might be?

Also, would it be possible for you to revert to BOINC 4.19 before the system downloads more work tomorrow to eliminate one of the factors that's changed?

"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
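As a side note on the log text, "exit code -5 (0xfffffffb)" shows one value written two ways: -5 as a signed number and the same 32 bits printed as unsigned hex. The short Python sketch below only demonstrates that arithmetic; the helper names are invented for this example and it says nothing about what the code means to BOINC or the model.

```python
def as_unsigned32(code):
    """Reinterpret a signed exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF

def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed (two's complement) number."""
    return value - 0x100000000 if value >= 0x80000000 else value

print(hex(as_unsigned32(-5)))   # 0xfffffffb
print(as_signed32(0xFFFFFFFB))  # -5
```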
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Deleted. Thyme Lawn got there first. Les |
Joined: 28 Aug 04 Posts: 5 Credit: 60,231 RAC: 0 |
I reset the project before going to the message board, so there may be bits missing. However, I have looked in the workunit folders and see only three files: a zip file, an XML file and a lock file. There is no relevant data in the XML and lock files; I did not look in the day file. I am running Seti and ProteinPredictor on this machine as well and neither is having a problem. The machine is seen as a dual processor and will be running one of these apps at the same time as Climate. I have looked in the Boinc error files but see no further information beyond what I have already posted. |
Joined: 28 Aug 04 Posts: 5 Credit: 60,231 RAC: 0 |
I still have this problem and have been unable to run the application for more than a few minutes before getting the error detailed above. This is on a machine that has been running with no problems for 6 months or so. I have upgraded to Boinc 4.25 and have rebooted, but no luck. Is it worth clearing the CP directory of all files? Has anyone any other thoughts that may help? Peter |
Joined: 28 Aug 04 Posts: 5 Credit: 60,231 RAC: 0 |
Well, I do not know if anyone is interested, but I have finally solved my problem. I looked into the CP directory and found quite a few files and folders which were the remnants of old work that was either complete or had failed through some sort of error. There were also the applications, old and new, plus a few data files. I deleted the lot and, well, it all started working again. If there is any lesson here, it might be that the Boinc/CP team should look at how they clean up after a run fails with an error and how they remove applications that may no longer be needed. My CP directory size fell from over 1 GB to around 600 MB. Peter |
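Before deleting anything by hand, it can help to see which leftovers are actually taking up the space. The following is a rough Python sketch, not a BOINC tool: `directory_usage` is a made-up helper, and the demo builds a throwaway temporary directory with invented file names in place of a real project directory, whose path would need to be substituted. It only reports sizes and deletes nothing.

```python
import os
import tempfile

def directory_usage(root):
    """Return (total_bytes, per_entry), where per_entry maps each
    top-level file or folder name under root to its size in bytes."""
    per_entry = {}
    for name in os.listdir(root):
        path = os.path.join(root, name)
        size = 0
        if os.path.isdir(path) and not os.path.islink(path):
            # Sum every regular file below this folder.
            for dirpath, _dirs, files in os.walk(path):
                for fname in files:
                    fpath = os.path.join(dirpath, fname)
                    if not os.path.islink(fpath):
                        size += os.path.getsize(fpath)
        elif os.path.isfile(path):
            size = os.path.getsize(path)
        per_entry[name] = size
    return sum(per_entry.values()), per_entry

# Demo on a temporary directory; the names below are invented for illustration.
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "old_workunit"))
    with open(os.path.join(demo_root, "old_workunit", "leftover.zip"), "wb") as fh:
        fh.write(b"x" * 2048)
    with open(os.path.join(demo_root, "old_app.exe"), "wb") as fh:
        fh.write(b"y" * 1024)

    total, per_entry = directory_usage(demo_root)
    # Largest entries first, so stale folders stand out.
    for name, size in sorted(per_entry.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{size:>10} bytes  {name}")
    print(f"{total:>10} bytes  total")
```

Running a report like this before and after a cleanup would show the kind of drop in directory size described in the post.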