Questions and Answers : Windows : Why the crash?
Author | Message |
---|---|
Joined: 31 Aug 04 Posts: 42 Credit: 547,031 RAC: 0 |
Hi. Today I came back from work to find that the hadcm3istd model I was crunching had crashed :( http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=8099696 The long list of "stderr out" error messages doesn't mean a great deal to me, although the last few lines say: "Model crashed: umshell1.f: TRANSO2A: Missing data in ocean UV fields" I haven't changed anything on my PC (software etc.); the only thing I can think of is that I've been "locking" my PC when leaving the house (Start -> Lock). It also crashed at roughly the time the next trickle was due... Was there anything I could have done to prevent the crash? Thanks for any insights. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Chances are that it is terminal. However, if you have a recent backup, and given that the run had progressed to 2007, you might try restoring the backup on the chance that the problem was transient and the model might get through next time. Sometimes it works. As to Start/Lock, I don't recall the issue coming up before. If that's a long-standing habit, and processing and communication with the CPDN servers have worked successfully over time, it isn't likely to have affected the model. Perhaps someone who uses that feature will weigh in. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
"Perhaps someone who uses that feature will weigh in." All of my systems are regularly locked in three different ways: manually, automatically when a remote access session finishes, and automatically on resume from screen blanking. If locking were a problem I'd have had a lot more failed models. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Joined: 31 Aug 04 Posts: 42 Credit: 547,031 RAC: 0 |
Thanks for the fast replies. I do have a recent backup, but BOINC downloaded and started another model. I'm not sure that killing one model to try to save another is worth it. I think I'll leave the option to "allow new tasks" on as default. It would be nice if BOINC gave you the option after a crash to try a backup file before starting another model. Btw, it's a funny way of thinking when an option button stating "allow new tasks" means exactly the opposite of what it says. To quote from an old comedy show... "Confused? You will be..." |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
As the model was so well advanced I still think it would be worth trying your backup. But before restoring it, you could back up the BOINC folder (or the BOINC Data folder if you have version 6) with your new model in it. Then if the crashed model fails again at the same point, you'd just abandon it and restore the new one. That way you wouldn't have to start a third model and you'd keep the progress you've made on the new one. Just name the backup folders in such a way that you'll know which backup is which. Regarding the allowing of new tasks, it's what is listed in the project status column that matters. That's the current situation. The button has to say the opposite to allow you to change the project's status. Cpdn news |
Joined: 31 Aug 04 Posts: 42 Credit: 547,031 RAC: 0 |
Thanks for the reply. So do I just stop BOINC in the normal way and then, after backing up the current model, apply the previous model's backup, or do I have to "suspend" the model first? I'll definitely give it a go... |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Suspend BOINC (in the menu) first, and then WAIT until it SAYS Suspended in the Tasks tab. That way, when you restart that model later, BOINC will wait for you to start it. This will prevent any problems if the restore doesn't work. The less the server knows about your fiddling the better, error-label wise. As for the button label, think of it this way: there is a big machine which is started and stopped by a single push button in a different room (because of the noise). So that you can tell whether pushing the button will start or stop the machine, there are two message lights, each of which tells you what will happen when you push the button. Pushing the button also switches the lights ready for next time. Backups: Here |
Joined: 31 Aug 04 Posts: 42 Credit: 547,031 RAC: 0 |
Job done successfully. I'll see what happens in about 24 hours... Thanks again for all your help. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Well done! Let us know please whether the model gets past the earlier crash point. Cpdn news |
Joined: 31 Aug 04 Posts: 42 Credit: 547,031 RAC: 0 |
Success :) BOINC successfully crunched past the previous sticking/crash point. Thank goodness I had backed up just the day before. I used to let it run for 3 or 4 days between backups... not any more. It'll be every other day from now on. It's a good feeling being able to fix something successfully - your help was invaluable. Thank you. A Merry Christmas to all. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Glad you got it progressing again. Merry Christmas to you. |
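
For anyone following the same route, the "back up the new model, then restore the old backup" steps described in this thread could be scripted roughly as below. This is only a sketch under assumptions: the function names `backup_folder` and `restore_folder`, the label scheme, and every path are hypothetical and not part of BOINC or CPDN. It also assumes BOINC has already been suspended and shut down by hand, since nothing here talks to BOINC itself.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_folder(data_dir: Path, backup_root: Path, label: str) -> Path:
    """Copy data_dir to backup_root/<label>_<timestamp> and return the copy's path.

    The label (for example "new_model") is what lets you tell backups apart later.
    """
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = backup_root / f"{label}_{stamp}"
    # copytree creates the target (and any missing parents) and fails if it exists.
    shutil.copytree(data_dir, target)
    return target


def restore_folder(backup_dir: Path, data_dir: Path) -> None:
    """Replace data_dir with a copy of backup_dir.

    Only call this while BOINC is fully stopped; the current contents are deleted.
    """
    if data_dir.exists():
        shutil.rmtree(data_dir)
    shutil.copytree(backup_dir, data_dir)


# Hypothetical usage (paths are placeholders, adjust to your own installation):
#   boinc = Path(r"C:\path\to\BOINC")
#   backups = Path(r"D:\path\to\backups")
#   backup_folder(boinc, backups, "new_model")   # save the freshly started model
#   restore_folder(backups / "name_of_old_backup", boinc)  # bring back the crashed one
```

If the restored model crashes again at the same point, calling `restore_folder` with the "new_model" copy puts the newer model back, which is the fallback suggested above.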
©2025 cpdn.org