Message boards : climateprediction.net Science : Any way to resume a project after crash?
Author | Message |
---|---|
Joined: 28 Jul 06 Posts: 2 Credit: 48,989 RAC: 0 | My PC crashed, and now the project BOINC was working on has failed after 1000 CPU hours... Is there any way to resume the project at some point? It had reached 1989 :/ Guess I'll do some regular backups from now on... I think BOINC should build this into its program... |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 | Only if you have a backup to work from. On the more positive side, the model will have been uploading a summary of its progress at yearly and decade intervals, and at 1960 a much more detailed summary was uploaded (a 'restart dump'). I'm a volunteer and my views are my own. |
Joined: 28 Jul 06 Posts: 2 Credit: 48,989 RAC: 0 | Quote: "at 1960 a much more detailed summary was uploaded (a 'restart dump')." So is it possible to restart from there? The data is still on my hard drive... |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 | Quote: "So is it possible to restart from there? The data is still on my hard drive..." Sometime in the future, once the software has been written and tested. But it's unlikely that the needed data is still on your computer; what's left is more likely just remnants of the crash. |
Joined: 28 Nov 05 Posts: 7 Credit: 2,629,585 RAC: 0 | Hello folks. This is the closest thread I can find to my issues and questions. I have two systems running BOINC. The home system is fine and stable. The other is my laptop, which has the problem. It's an older system, but capable as long as I don't forget and overtax it. Today I got the Windows XP Pro blue screen of death. I had no idea what happened until I got it back up; it turns out the video driver appears to be unstable. After extensive searching I found no updates or further fixes, and the latest driver is not "bad", just old. So I am stuck with that driver, and with hardware showing signs of stress under the greater loads it's seeing after 5+ years. When BOINC restarted after the crash, CPDN automatically downloaded new work. The old project was about 7% complete, progressing at about 1% per 100 hours. I was unaware of the backup option mentioned above. I'll implement it on this system, and probably on both. OKAY, with all that background, my questions are: Is there any move afoot to fix the problem of lost work on client systems, so that people don't find out the hard way, after losing so much effort, that a backup option exists? Secondly, and probably a bigger issue: when I check my work on the CPDN site, I see a lot of server and client errors and few work units showing credit. Running this project on this system seems a waste of time. Is this fixable, or is it symptomatic of something I'm not seeing? I hate to live with this error rate, so either the program is just too much for this machine or I'm doing something that causes errors. After reviewing other posts, I'm inclined to think the advice would be to continue CPDN on the home system, stop it on the laptop, and increase the work from other projects on the laptop to compensate. Does this sound reasonable? Thanks guys, B. B. Stanfield III |
Joined: 22 Jul 06 Posts: 3 Credit: 88,290 RAC: 0 | I've lost 3 projects in the last 3 months. None got past the 1940s. Chances are the same thing will happen to this project. So if the project reports at various intervals and then crashes, will anyone anywhere ever resume the unfinished project? And can I resume someone else's unfinished project? I would like to see one project finish one day. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 | Andrew: When a model crashes, the server flags that dataset for possible re-issue (up to a total of 4 re-issues). Whether or not it DOES get re-issued is another matter, as there are millions of combinations of values for the parameters. You can see if a model is new or a re-issue by looking at the last digit in the name. If it's zero, it's a previously used set. You can also see who has previously tried to crunch that dataset by looking at the Work unit ID, as displayed on the page listing all of your models. Also, once the program has been written and tested, it will be possible to start part way through a failed model, from the restart dumps that are uploaded every 40 model years. These, no doubt, will also be issued at random, although at this point only the most basic information is known about how it will work. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 | B. B. Stanfield III: Yes, it would be best to try other projects on your laptop. One of the basic recommendations for this project is that people DON'T run it on laptops (unless they have good fire insurance). These computers were never intended to run such a maths-intensive program for such long periods. Their cooling isn't up to it, their hard drives are more prone to failure, and their power supplies tend to get very hot with continuous use. |
Joined: 17 Sep 04 Posts: 3 Credit: 1,671,744 RAC: 0 | Andrew: I wish I'd seen this before agonising over a nearly finished model that got lost in mysterious circumstances. I have now restarted an earlier one that had been in suspended animation for a couple of months... going fine so far... There is so much material on the user forums (BOINC and CPDN) that it's easy to miss something quite fundamental like this. Maybe it's in one of the "stickies"? Must have a look. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 | The 4 README files here contain lots of useful hints, tips, and advice. Several ways to create and use backups are described in the one called Backup and Restore. |
©2024 cpdn.org