climateprediction.net (CPDN) home page
Thread 'CPDN crashing on completion'


Message boards : Number crunching : CPDN crashing on completion

Mike.Gibson

Joined: 2 May 07
Posts: 20
Credit: 657,542
RAC: 0
Message 31652 - Posted: 10 Dec 2007, 23:22:36 UTC

Hi, folks.

10/12/2007 22:39:14|climateprediction.net|Computation for task hadsm3fub_0361_005911833_8 finished
10/12/2007 22:39:14|climateprediction.net|Output file hadsm3fub_0361_005911833_8_3.zip for task hadsm3fub_0361_005911833_8 absent

Does anyone know what has happened here, please? How can it be avoided in future? And can the unit be resurrected?

Cheers
ID: 31652
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 31658 - Posted: 11 Dec 2007, 3:01:32 UTC


It sounds a little like you may have done an Update shortly after the zip upload to get it to "Report".

If this message is received too early, the zip might still be on the Upload server, waiting to be transferred to the storage server. Hence the message (paraphrased): "What are you talking about? There's no final zip file here."

If you DIDN'T click on Update, then I'm not sure what happened.

As for recovering the unit, the usual advice applies: only by rerunning the last bit of the model from a backup made before the model finished.


Backups: Here
ID: 31658
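
For anyone reading later, the kind of backup mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and both paths are hypothetical, and it assumes BOINC has been shut down completely before the copy is made, since copying the folder while a model is running may give an inconsistent backup.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_data(data_dir, backup_root):
    """Copy the whole BOINC data directory into a new timestamped folder.

    Assumes BOINC is not running while the copy is made.
    Returns the path of the new backup folder.
    """
    src = Path(data_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"BOINC data directory not found: {src}")

    # A timestamp in the folder name keeps older backups from being overwritten.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"boinc-backup-{stamp}"

    # copytree creates dest (and any missing parents) and fails if dest exists.
    shutil.copytree(src, dest)
    return dest


# Hypothetical usage; substitute your own BOINC data folder and backup drive:
# backup_boinc_data(r"C:\path\to\BOINC", r"D:\boinc-backups")
```

Restoring would be the reverse: with BOINC stopped, put the saved folder back in place of the current one and restart, so the model resumes from the point at which the backup was taken.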
MikeMarsUK
Volunteer moderator

Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31660 - Posted: 11 Dec 2007, 8:39:20 UTC


This was the result being talked about:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6981207

I think it crashed just prior to the end? The last successful trickle was phase 3, 248,446 (very near the end but still a few hours to go).


And the error message was an exit code 3:
<core_client_version>5.10.28</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
CPDN Monitor - Quit request from BOINC...
Not a JPEG file: starts with 0x01 0xda
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...
CPDN Monitor - Quit request from BOINC...



What was happening on the PC at 22:42 yesterday? Where is the model installed: on your PC's local hard disk, or on a network / removable disk / USB key / etc.? (Just guessing from the error message.)


As Les says, if you have a recent backup, you could resume this model.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 31660
Mike.Gibson

Joined: 2 May 07
Posts: 20
Credit: 657,542
RAC: 0
Message 31665 - Posted: 11 Dec 2007, 11:01:21 UTC

Thanks, Les & Mike.

At 22:42 I did do an Update, but the model had crashed by then. It had come up with a computation error message and a time error. The time to go had stuck at 3 seconds - so nearly there! However, it had been attempting to trickle up between 22:29 & 22:38 and then a new model started at 22:38. Could it be that BOINC started the new model because of a time error on the trickle up and then the crash occurred as a consequence of the 2 models overloading my PC?

BOINC/CPDN runs on my PC (Vista with dual-core 3800+).

I haven't been doing back-ups and the PC had been running for 6 days continuously.

10/12/2007 22:29:29|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
10/12/2007 22:29:35|climateprediction.net|Scheduler request succeeded: got 0 new tasks
10/12/2007 22:29:50|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
10/12/2007 22:29:55|climateprediction.net|Scheduler request succeeded: got 0 new tasks

Lots more of these every 5/6 seconds until ........

10/12/2007 22:38:01|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
10/12/2007 22:38:06|climateprediction.net|Scheduler request succeeded: got 0 new tasks
10/12/2007 22:38:54|climateprediction.net|Starting hadsm3fub_0305_005913862_6
10/12/2007 22:38:54|climateprediction.net|Starting task hadsm3fub_0305_005913862_6 using hadsm3 version 506
10/12/2007 22:39:07|lhcathome|Sending scheduler request: To fetch work. Requesting 158849 seconds of work, reporting 0 completed tasks
10/12/2007 22:39:12|lhcathome|Scheduler request succeeded: got 0 new tasks
10/12/2007 22:39:14|climateprediction.net|Computation for task hadsm3fub_0361_005911833_8 finished
10/12/2007 22:39:14|climateprediction.net|Output file hadsm3fub_0361_005911833_8_3.zip for task hadsm3fub_0361_005911833_8 absent
10/12/2007 22:39:14|SETI@home|Resuming task 16no06aa.7435.8661.10.6.137_1 using setiathome_enhanced version 527
10/12/2007 22:40:19|World Community Grid|Resuming task dddt0201m0751_ZINC00068317-0000_00_0 using dddt version 510
10/12/2007 22:40:19|World Community Grid|Resuming task dddt0201m0754_ZINC05090329-0000_00_0 using dddt version 510
10/12/2007 22:40:28|World Community Grid|Resuming task dddt0201m0754_ZINC04687985-0000_00_0 using dddt version 510
10/12/2007 22:41:43|rosetta@home|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 2 completed tasks
10/12/2007 22:41:48|rosetta@home|Scheduler request succeeded: got 0 new tasks
10/12/2007 22:42:33|climateprediction.net|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
10/12/2007 22:42:38|climateprediction.net|Scheduler request succeeded: got 0 new tasks

Regards

Mike
ID: 31665
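
The order of events in the log above is easier to check with a small script. The following is a rough sketch, assuming the message log uses the `DD/MM/YYYY HH:MM:SS|project|message` layout seen in the pasted lines; the helper names `parse_boinc_line` and `first_time` are made up for this example and are not part of BOINC.

```python
from datetime import datetime


def parse_boinc_line(line):
    """Split 'DD/MM/YYYY HH:MM:SS|project|message' into (datetime, project, message)."""
    stamp, project, message = line.strip().split("|", 2)
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), project, message


def first_time(lines, project, needle):
    """Return the timestamp of the first line for `project` whose message contains `needle`."""
    for line in lines:
        when, proj, message = parse_boinc_line(line)
        if proj == project and needle in message:
            return when
    return None


# Lines copied from the log posted above.
log = """\
10/12/2007 22:38:54|climateprediction.net|Starting task hadsm3fub_0305_005913862_6 using hadsm3 version 506
10/12/2007 22:39:14|climateprediction.net|Computation for task hadsm3fub_0361_005911833_8 finished
10/12/2007 22:39:14|climateprediction.net|Output file hadsm3fub_0361_005911833_8_3.zip for task hadsm3fub_0361_005911833_8 absent
10/12/2007 22:42:33|climateprediction.net|Sending scheduler request: Requested by user. Requesting 0 seconds of work, reporting 1 completed tasks
""".splitlines()

cpdn = "climateprediction.net"
absent = first_time(log, cpdn, "absent")
update = first_time(log, cpdn, "Requested by user")

# Seconds between the "absent" message and the manual Update.
gap = (update - absent).total_seconds()
print(gap)  # 199.0
```

On these lines the manual Update comes 199 seconds after the "absent" message, which matches the account that the model had already failed before Update was clicked.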
MikeMarsUK
Volunteer moderator

Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31666 - Posted: 11 Dec 2007, 12:25:36 UTC
Last modified: 11 Dec 2007, 12:26:01 UTC

The scheduler messages might indicate that it's something to do with networking? There is a bug in BOINC which can cause crashes if the local network fails (e.g., when dialing in, or a firewall crash).

The second model would have started to download+run when the first crashed.
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 31666

