Thread 'Any way to resume a project after crash?'

old_user194039

Joined: 28 Jul 06
Posts: 2
Credit: 48,989
RAC: 0
Message 24328 - Posted: 17 Sep 2006, 11:34:42 UTC
Last modified: 17 Sep 2006, 11:35:44 UTC

My PC crashed, and the model BOINC was working on failed after 1000 CPU hours...
Is there any way to resume the model from some point? It had reached model year 1989 :/

Guess I'll do some regular backups from now on... I think BOINC should incorporate this in its program...
MikeMarsUK
Volunteer moderator

Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 24337 - Posted: 18 Sep 2006, 7:16:00 UTC

Only if you have a backup to work from. On the more positive side, the model will have been uploading a summary of its progress at yearly and decade intervals, and at 1960 a much more detailed summary was uploaded (a 'restart dump').
I'm a volunteer and my views are my own.
old_user194039

Joined: 28 Jul 06
Posts: 2
Credit: 48,989
RAC: 0
Message 24346 - Posted: 18 Sep 2006, 21:07:37 UTC - in response to Message 24337.  

Only if you have a backup to work from. On the more positive side, the model will have been uploading a summary of its progress at yearly and decade intervals, and at 1960 a much more detailed summary was uploaded (a 'restart dump').


So is it possible to restart from there? The data is still on my hard drive...
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 24353 - Posted: 19 Sep 2006, 15:18:45 UTC

So is it possible to restart from there? The data is still on my hard drive...

Sometime in the future, when the software gets written and tested.

But it's unlikely that the needed data is still on your computer; just other remnants of the crash.

B. B. Stanfield

Joined: 28 Nov 05
Posts: 7
Credit: 2,629,585
RAC: 0
Message 24378 - Posted: 21 Sep 2006, 23:55:14 UTC - in response to Message 24353.  

Hello folks.

This is about the closest thread I can find to match my issues/questions. I have two systems running BOINC. The home system is fine and stable. The other is my laptop, which has the problem. It's an older system but capable if I don't forget and overtax it. Today I got the Windows XP Pro blue screen of death. I had no idea what happened until I got it back up. It turns out the video driver appears to be unstable. After extensive searching I found no updates or further fixes; the latest driver is not "bad", just old. I am stuck with the driver, and the hardware is showing signs of stress under the greater loads it's seeing after 5+ years.

When BOINC restarted after the crash, CPDN automatically downloaded new work. The old model was about 7% complete, doing about 1% for each 100 hours. I was unaware of the backup option mentioned. I'll implement that for this system, probably both.

OKAY, with all that background my questions are:

Is there any move afoot to fix the issue of lost work on a client system, other than users finding out the hard way, after losing so much effort, that a backup option is available?

Secondly, and probably a bigger issue, on checking my work on the CPDN site there are a lot of server and client errors, with few work units showing credit. It seems a waste of time to run this project on this system. Is this fixable, or is it symptomatic of something I'm not seeing? I hate to live with this error rate, so either the program is just too much for this machine or I'm doing something that is causing errors.

After reviewing other posts I'm inclined to think the advice would be to continue CPDN on the home system, terminate it on the laptop and increase the work from other projects on the laptop to compensate. Does this sound reasonable?

Thanks guys,

B. B. Stanfield III
old_user193545

Joined: 22 Jul 06
Posts: 3
Credit: 88,290
RAC: 0
Message 24379 - Posted: 22 Sep 2006, 8:51:24 UTC
Last modified: 22 Sep 2006, 8:50:39 UTC

I've lost 3 projects in the last 3 months. None have got past the 1940s. Chances are the same thing will happen to this project.

So if the project reports at various intervals, and a project crashes, will anyone anywhere ever resume the unfinished project? And can I resume someone else's unfinished project?

I would like to see one project completed one day.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 24380 - Posted: 22 Sep 2006, 9:09:00 UTC

Andrew

When a model crashes, the server flags that dataset for possible re-issue. (Up to a total of 4 re-issues.)
Whether or not it DOES get re-issued is another matter, as there are millions of combinations of values for the parameters.
You can see if a model is new or a re-issue by looking at the last digit in the name. If it's zero, it's a previously unused set; any other digit marks a re-issue.
You can also see who has previously tried to crunch that dataset by looking at the Work unit ID, as displayed on the page listing all of your models.

Also, when the program gets written and tested, it will be possible to start part way through a failed model, from restart dumps which are uploaded every 40 years.
These, no doubt, will also be issued at random, although at this point only the most basic information is known about how it will work.
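
For anyone who wants to check a list of model names without reading each one by eye, the last-digit rule above could be scripted roughly as follows. This is only a sketch under assumptions: it supposes the name ends in an underscore followed by the issue number, counted from zero in the usual BOINC way, and the function names and the example name are made up for illustration.

```python
import re


def issue_index(task_name: str) -> int:
    """Return the number after the last underscore in a task name.

    Assumption: names end in "_<number>", e.g. "example_model_0001_2".
    """
    match = re.search(r"_(\d+)$", task_name)
    if match is None:
        raise ValueError(f"no trailing issue number in {task_name!r}")
    return int(match.group(1))


def is_reissue(task_name: str, first_issue_index: int = 0) -> bool:
    """True if the trailing number differs from the first-issue index.

    Assumption: the first issue is numbered 0; pass a different
    first_issue_index if the naming turns out to work otherwise.
    """
    return issue_index(task_name) != first_issue_index


if __name__ == "__main__":
    # Hypothetical names, not real CPDN work units.
    for name in ("example_model_0001_0", "example_model_0001_2"):
        kind = "re-issue" if is_reissue(name) else "first issue"
        print(f"{name}: {kind}")
```

The first-issue index is a parameter rather than a fixed value so the check is easy to adjust if the numbering differs from what is assumed here.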

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 24381 - Posted: 22 Sep 2006, 9:14:16 UTC

B. B. Stanfield III

Yes, it would be best to try other projects on your laptop.
One of the basic recommendations for this project is that people DON'T run it on laptops. (Unless they have good fire insurance.)
These computers were never intended to run such a maths-intensive program for so long. Their cooling isn't up to it, their HDs are more prone to failure, and their power supplies tend to get very hot with continuous use.

old_user18828

Joined: 17 Sep 04
Posts: 3
Credit: 1,671,744
RAC: 0
Message 25293 - Posted: 25 Nov 2006, 20:39:50 UTC - in response to Message 24380.  
Last modified: 25 Nov 2006, 20:48:59 UTC

Andrew

When a model crashes, the server flags that dataset for possible re-issue. (Up to a total of 4 re-issues.)
Whether or not it DOES get re-issued is another matter, as there are millions of combinations of values for the parameters.
You can see if a model is new or a re-issue by looking at the last digit in the name. If it's zero, it's a previously unused set; any other digit marks a re-issue.
You can also see who has previously tried to crunch that dataset by looking at the Work unit ID, as displayed on the page listing all of your models.

Also, when the program gets written and tested, it will be possible to start part way through a failed model, from restart dumps which are uploaded every 40 years.
These, no doubt, will also be issued at random, although at this point only the most basic information is known about how it will work.




I wish I'd seen this before agonising about a nearly finished model that got lost in mysterious circumstances. Have now restarted an earlier one that had been in suspended animation for a couple of months... going fine so far....

There is so much material on the user forums (BOINC and CPDN) that it's easy to miss something quite fundamental like this. Maybe it's in one of the "stickies"? Must have a look.

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25294 - Posted: 25 Nov 2006, 20:53:56 UTC

The 4 README files here contain lots of useful hints, tips, and advice.

Several ways to create and use backups are in the one called Backup and Restore.
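
As a rough idea of what a manual backup amounts to, here is a minimal sketch that copies a whole BOINC data folder into a time-stamped backup folder. It is not taken from the README and is not the README's method. It assumes BOINC has been shut down completely before the copy is made, and the function name and both folder paths are placeholders to be replaced with your own.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_boinc_data(data_dir, backup_root, now=None):
    """Copy the whole BOINC data folder into a new time-stamped folder.

    Assumptions: BOINC is not running while this copies, and data_dir
    is wherever your own installation keeps its data.
    Returns the path of the backup that was created.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"BOINC data folder not found: {data_dir}")

    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"boinc-backup-{stamp}"

    # copytree creates dest (and any missing parent folders) and
    # raises an error if dest already exists, so nothing is overwritten.
    shutil.copytree(data_dir, dest)
    return dest


if __name__ == "__main__":
    # Placeholder paths; substitute the real locations on your machine.
    created = backup_boinc_data("path/to/BOINC", "path/to/backups")
    print(f"Backup written to {created}")
```

Restoring would be the reverse: with BOINC stopped, copy the contents of a backup folder back over the data folder. Keeping each backup in its own dated folder means an older copy is still there if the newest one turns out to be damaged.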

