climateprediction.net (CPDN) home page
Thread 'Resuming computation from Backup after WorkUnit Error ?'

Message boards : Number crunching : Resuming computation from Backup after WorkUnit Error ?

old_user3434
Joined: 30 Aug 04
Posts: 77
Credit: 1,785,934
RAC: 0
Message 36164 - Posted: 18 Feb 2009, 9:03:32 UTC
Last modified: 18 Feb 2009, 9:03:57 UTC

Hi,

I have one system that trashed two WorkUnits after considerable runtime due to a system failure.
After that, I reverted to a working backup, which resumed computation of both WorkUnits from an early stage; they have run without problems so far.

The only problem: the host had already reported the two WorkUnits as errored before I could intervene.

Question : Will the System realize that a backup is rerunning and eventually change WorkUnit status again, or is it futile as the "Computing Error" has fixed the WorkUnit Status to "Over" ?

(the Host is sending its Trickles without any Error messages, thus I assume the server is still accepting them)
Scientific Network : 44800 MHz - 77824 MB - 1970 GB
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 36165 - Posted: 18 Feb 2009, 10:00:06 UTC - in response to Message 36164.  
Last modified: 18 Feb 2009, 10:01:28 UTC

Question : Will the System realize that a backup is rerunning and eventually change WorkUnit status again ... ?
No. The work unit pages will show the first error received. BOINC doesn't really understand the concept of a backup, presumably because most BOINC projects have short work units that aren't worth backing up. CPDN has unusually long work units that are worth backing up.

(the Host is sending its Trickles without any Error messages, thus I assume the server is still accepting them)
Yes. Trickles and Zip file uploads will continue to be accepted by the server and credits awarded, as if there had never been a crash.
old_user3434
Joined: 30 Aug 04
Posts: 77
Credit: 1,785,934
RAC: 0
Message 36172 - Posted: 18 Feb 2009, 20:08:47 UTC - in response to Message 36165.  

Cool, then I'll keep everything running. Thanks for the quick info :)

Scientific Network : 44800 MHz - 77824 MB - 1970 GB
metalius
Joined: 28 Nov 06
Posts: 89
Credit: 12,002,945
RAC: 3,153
Message 36173 - Posted: 19 Feb 2009, 13:33:58 UTC
Last modified: 19 Feb 2009, 13:47:32 UTC

I accidentally detached my host from CPDN (doing an unhappy "experiment" with BAM). I had 2 tasks in progress; both are now marked "Over, Client detached".
I restored the tasks from a backup and am now trying to continue them.
Question!
Is my attempt correct?
Is it possible to finish my tasks? Is it the same as restoring after "Client error, Compute error"?
Will the project server accept the trickles etc.?
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 36174 - Posted: 19 Feb 2009, 15:08:55 UTC
Last modified: 19 Feb 2009, 15:09:31 UTC

Is my attempt correct?
Yes.
Is it possible to finish my tasks?
Yes.
Is it the same as restoring after "Client error, Compute error"?
Yes.
Will the project server accept the trickles etc.?
Yes.
metalius
Joined: 28 Nov 06
Posts: 89
Credit: 12,002,945
RAC: 3,153
Message 36175 - Posted: 19 Feb 2009, 15:09:10 UTC
Last modified: 19 Feb 2009, 15:12:03 UTC

19/02/2009 16:56:43|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
19/02/2009 16:56:48|climateprediction.net|Scheduler request succeeded: got 0 new tasks

The trickle is sent without an error message, but I don't see it on the task's page.

-Edit-

Les, you were faster than me! :-) Thank you!
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 36176 - Posted: 19 Feb 2009, 15:11:13 UTC

Please be patient. It sometimes takes a while for trickles to show up if the server is busy.

metalius
Joined: 28 Nov 06
Posts: 89
Credit: 12,002,945
RAC: 3,153
Message 36177 - Posted: 19 Feb 2009, 15:14:47 UTC

It's a deal! I will be patient. :-) Thank you again.

©2024 cpdn.org