Questions and Answers : Macintosh : System crashed, on restart BOINC downloads new model, won't work on old one
Author | Message |
---|---|
Joined: 12 Jan 05 Posts: 13 Credit: 1,884,525 RAC: 0 |
I downloaded my first model and was happily crunching for a few days, up to about timestep 21,000. My system (G4 tower, 10.3.7) froze for reasons unrelated to BOINC. After rebooting, I restarted BOINC; but it downloaded a new model rather than resuming the old one. It also interspersed the new model's files in with the old, creating a new projects directory inside the 'jobs' directory of the old one, among other things. I eventually tossed the entire project directory and restarted BOINC. It downloaded a third model and everything is fine again, although I guess the 21,000 timesteps already done were pointless. For future reference, can somebody please let me know how to get BOINC to resume the old model after a crash? It would be appreciated if this were explained in terms comprehensible to somebody who doesn't typically use a command line interface. EH |
Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
Sadly, there is no way to return to the old model unless you kept a backup of the BOINC folder. The program is designed to rewind itself if there is a possible computing error, but in some situations it will simply crash and there is nothing the user can do about it. It can be very frustrating, so let us hope that you were unlucky this time and the experience does not repeat. |
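For anyone who would rather not copy the folder by hand, here is a minimal sketch of the kind of backup mentioned above. It is only an illustration: the function name `backup_boinc_folder` and both folder arguments are made up for this example, and it assumes BOINC is not writing to the folder while the copy is being made.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(boinc_dir, backup_root):
    """Copy the whole BOINC folder into a new timestamped folder and return its path.

    Both arguments are placeholders; point them at wherever your own
    BOINC folder and backup disk actually live.
    """
    boinc_dir = Path(boinc_dir)
    if not boinc_dir.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {boinc_dir}")

    # A timestamp in the name keeps successive backups apart.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{boinc_dir.name}-{stamp}"

    # copytree refuses to write into an existing folder,
    # so an older backup is never silently overwritten.
    shutil.copytree(boinc_dir, target, symlinks=True)
    return target
```

A call would look something like `backup_boinc_folder("/path/to/BOINC", "/path/to/backups")`, with both paths replaced by your own.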
Joined: 29 Dec 04 Posts: 9 Credit: 32,552 RAC: 0 |
Hello, I have just experienced the same situation, which is indeed rather frustrating considering the time it takes to complete a model. The Mac had been running quite happily for just over a month. I had restarted it a few times and it always managed to resume without any particular problem. Tonight, however, after installing the Mac OS X updates and restarting the machine, I got the following when starting the program:

[HomeG4:~/documents/dev/climateprediction] davidcar% ./boinc
2005-02-10 19:50:07 [---] Starting BOINC client version 4.13 for powerpc-apple-darwin
2005-02-10 19:50:07 [climateprediction.net] Project prefs: no separate prefs for home; using your defaults
2005-02-10 19:50:07 [climateprediction.net] Host ID is 81718
2005-02-10 19:50:07 [---] General prefs: from climateprediction.net (last modified 2005-01-05 22:57:39)
2005-02-10 19:50:07 [---] General prefs: no separate prefs for home; using your defaults
2005-02-10 19:50:07 [climateprediction.net] Resuming computation for result 2s21_100150982_2 using hadsm3 version 4.03
Starting model in /Users/dc/Documents/Dev/ClimatePrediction/projects/climateprediction.net...
Created shared memory region key = 24545
Env Used=DYLD_LIBRARY_PATH=/Users/dc/Documents/Dev/ClimatePrediction/projects/climateprediction.net:../
Starting model ID 2s21_100150982 Phase 2
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
2s21_100150982 - PH 2 TS 091873 - 00/00/0000 00:00 - H:M:S=0931:23:14 AVG= 9.55 DLT= 0.00
Model crashed...retrying...restart level 0
Preparing for restart...
Rewinding a model-day...
Starting model ID 2s21_100150982 Phase 2
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
2s21_100150982 - PH 2 TS 091873 - 00/00/0000 00:00 - H:M:S=0931:23:14 AVG= 9.55 DLT= 0.00
Model crashed...retrying...restart level 1
Preparing for restart...
Rewinding a model-month...
Copying restart files for model retry...
Starting model ID 2s21_100150982 Phase 2
Waiting for model startup, this may take a minute...
Stack size=48.00 MB
2s21_100150982 - PH 2 TS 091873 - 00/00/0000 00:00 - H:M:S=0931:23:14 AVG= 9.55 DLT= 0.00
Model crashed...retrying...restart level 2
Preparing for restart...
Rewinding a model-year...
Copying restart files for model retry...
Starting model ID 2s21_100150982 Phase 2
Waiting for model startup, this may take a minute...
Stack size=48.00 MB
2s21_100150982 - PH 2 TS 091873 - 00/00/0000 00:00 - H:M:S=0931:23:14 AVG= 9.55 DLT= 0.00
Model crashed...retrying...restart level 3
Preparing for restart...
Error: Restart files for not found
Giving up, this result exceeded crash count for available restart files.
adding: 2s21aa.pa.gmts.x1.nc (deflated 35%)
adding: 2s21aa.pa.rmts.x1.nc (deflated 36%)
adding: 2s21aa.pc.gmts.x1.nc (deflated 53%)
adding: 2s21aa.pc.rmts.x1.nc (deflated 40%)
adding: 2s21aa.pd.gmts.x1.nc (deflated 53%)
........

and then it started from scratch on a new model! I am only posting this in case it helps find a solution to this issue. It seems a great waste to lose the benefit of over a month of computation just because of a "crash". It looks like I have indeed lost the files, as the size of the project folder is reduced to 400MB (instead of about 600 before the event), so I guess there is no way back. I can hardly believe this has happened. Should there not be a way of preventing this? Surely the program should never replace an existing set of data without at least giving the operator the opportunity to make that decision. Any comment or potential solution will be highly appreciated. Thank you. David. |
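Reading the log in the post above, the client appears to retry with a larger rewind after each crash (a model-day, then a model-month, then a model-year) and gives up at restart level 3. The sketch below only mimics that visible pattern. It is a guess based on the log messages, not the project's actual code, and `run_with_rewinds`, `start_model` and `REWIND_LEVELS` are names invented for this example.

```python
# Rewind steps in the order they appear in the log; the list itself is an assumption.
REWIND_LEVELS = ["model-day", "model-month", "model-year"]


def run_with_rewinds(start_model):
    """Try to start a model, rewinding further after each crash.

    start_model is a hypothetical callable that returns True if the
    model starts and False if it crashes. Returns the list of events.
    """
    events = []
    for level, name in enumerate(REWIND_LEVELS):
        if start_model():
            events.append("model running")
            return events
        events.append(f"crashed, restart level {level}")
        events.append(f"rewinding a {name}")

    # One last attempt after the largest rewind.
    if start_model():
        events.append("model running")
        return events
    events.append(f"crashed, restart level {len(REWIND_LEVELS)}")
    events.append("giving up, no restart files left")
    return events
```

With a `start_model` that always fails, the events follow the same order as the log: three crash-and-rewind rounds, a crash at level 3, and then the model is abandoned.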
Joined: 29 Nov 04 Posts: 7 Credit: 66,811 RAC: 0 |
> Sadly, there is no way to return to the old model unless you kept a backup of the BOINC folder. The program is designed to rewind itself if there is a possible computing error, but in some situations it will simply crash and there is nothing the user can do about it. It can be very frustrating, so let us hope that you were unlucky this time and the experience does not repeat.

It seems to me there are two situations: (1) where the computer crashes, and (2), much more frequently, where there is a need to shut down the program in order to power off or restart, for example to complete the installation of some other program. Over a couple of months I was able to shut off Boinc 2.13 repeatedly by using control-C, and the run always restarted successfully. But I have just downloaded 2.19 to work on a new dual-processor computer, and each of the two times I have shut it off in the same way, the models have crashed, leaving a message much like the one described by Frenchy (9101). (I have not been able to find a log file on my computer; is there one?) I am obviously worried that the same will happen again, but at least I could plan by backing up the BOINC folder; could you describe the process of restoring from it? Presumably one would have to do the backup before shutting down BOINC. Thank you |
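On the restore question, here is a minimal sketch of one way to put a saved copy of the folder back. It is an assumption-laden illustration, not an official procedure: `restore_boinc_folder` and its arguments are made-up names, it assumes BOINC is not running while the files are moved, and it deals only with the files on disk, not with whether the project will still accept the restored model.

```python
import shutil
from pathlib import Path


def restore_boinc_folder(backup_dir, boinc_dir):
    """Put a backup copy back in place, keeping the current folder under a new name.

    Returns the path the current folder was renamed to, or None if
    there was no current folder. Both arguments are placeholders.
    """
    backup_dir = Path(backup_dir)
    boinc_dir = Path(boinc_dir)
    if not backup_dir.is_dir():
        raise FileNotFoundError(f"Backup not found: {backup_dir}")

    set_aside = None
    if boinc_dir.exists():
        # Rename rather than delete, so the current state can still be recovered.
        set_aside = boinc_dir.with_name(boinc_dir.name + ".replaced")
        if set_aside.exists():
            raise FileExistsError(f"Please move {set_aside} out of the way first")
        boinc_dir.rename(set_aside)

    # Copy (not move) the backup, so the backup itself stays untouched.
    shutil.copytree(backup_dir, boinc_dir, symlinks=True)
    return set_aside
```

Copying instead of moving means the same backup can be restored again if the next restart also fails.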
Joined: 25 Nov 05 Posts: 1 Credit: 23,295 RAC: 0 |
Hi, I still have some data from my model on my hard drive, but I don't know what to change. It seems that all the data are still there, and that only a status has to be changed. Is there a brief description of the meaning of all the files?

> Sadly, there is no way to return to the old model unless you kept a backup of the BOINC folder. The program is designed to rewind itself if there is a possible computing error, but in some situations it will simply crash and there is nothing the user can do about it. It can be very frustrating, so let us hope that you were unlucky this time and the experience does not repeat. |
©2024 cpdn.org