Questions and Answers : Windows : Can Someone help me get back to where I was?
Author | Message |
---|---|
Joined: 7 Jan 08 Posts: 1 Credit: 217,502 RAC: 0 |
When I powered up this morning, I discovered that my climate prediction model had been reset to zero progress. The strange thing is that the CPU run time is where it left off when I powered down last night, 138:37:30. I also started up the graphic model to check on the model year. The model year was set back to 1810. Right before I powered down last night, I was already in Winter 1825. I would like to know if there is any way I can get back to where I was, Winter 1825. I would like to thank anyone who can help me solve this problem... |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Hi RoadWarrior and welcome to the message board. This 'slab' model, 7093797, appears to be the one. It was just approaching the end of the first of three phases when it rewound to the start. The slab model appears to be rather sensitive at the phase change if it is interrupted before the phase post-processing is complete. One of my models did exactly that: it rewound and then finished quite happily. If you have a backup, then that can be restored. Otherwise the only thing to do is to let it run through phase 1 again and it'll carry on as before. There isn't much point aborting the model, because you would only have to start another one at the beginning. It might be an idea to watch for the phase change at timestep 259,248 and let it run for a couple of hours afterwards, to clear the Zip file upload and any trickles (there can be quite a few trickles at the end of a phase). Iain PS This model, 7100731, is the one of mine that misbehaved. You can see that the seconds/timestep doubles at the end of phase 1. That's because the CPU time continues to accumulate, as you noticed. |
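The doubling of seconds/timestep mentioned in the PS is just arithmetic: the timestep count starts again, but the CPU clock does not. A rough sketch in Python, using the run time from the first post and the phase length quoted above. Treating the rewind as happening right at the end of phase 1 is an assumption made only for illustration.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert a BOINC-style 'H:MM:SS' run time to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


PHASE_TIMESTEPS = 259_248  # phase change timestep quoted in the post

cpu_before_rewind = hms_to_seconds("138:37:30")  # run time from the first post

# Assumption: the model rewound at (roughly) the end of phase 1.
rate_first_pass = cpu_before_rewind / PHASE_TIMESTEPS

# The model repeats phase 1 at the same speed, but the CPU time keeps
# counting from where it left off, so the total is about twice as large.
cpu_after_second_pass = 2 * cpu_before_rewind
rate_reported = cpu_after_second_pass / PHASE_TIMESTEPS

print(f"first pass:   {rate_first_pass:.2f} s/timestep")
print(f"after rewind: {rate_reported:.2f} s/timestep")
```

The model itself is not running any slower; only the reported average changes.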
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There are 2 possibilities: 1) The model has rewound, in which case it will get back to where it was by itself. Eventually. 2) The model has crashed, in which case only restoring from a backup made BEFORE the failure can get it back. Check the links in my sig below; one is about making backups, the other is a set of README files, each containing links to posts on different subjects. The set for Crashes and other problems may help you. Post again if you need more help. Edit: I see that Iain's already onto it. :) Backups: Here |
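For anyone who wants to script the backup step, here is a minimal sketch in Python. It is not the procedure from the linked backup guide; it simply copies the whole BOINC folder into a new timestamped directory. The paths and the function name `backup_boinc_folder` are hypothetical, and it assumes BOINC has been shut down completely before the copy so that the model files are not changing underneath it.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(boinc_dir: Path, backup_root: Path) -> Path:
    """Copy the whole BOINC folder into a new timestamped directory."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"boinc-backup-{stamp}"
    # copytree creates the target (and any missing parents) and raises
    # an error if the target already exists, so old backups are never
    # overwritten.
    shutil.copytree(boinc_dir, target)
    return target


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever BOINC keeps its data.
    src = Path(r"C:\Program Files\BOINC")
    dst = Path(r"D:\Backups")
    if src.is_dir():
        print("Backup written to", backup_boinc_folder(src, dst))
    else:
        print("BOINC folder not found at", src)
```

Restoring is the reverse: with BOINC shut down, empty the BOINC folder and copy the backup's contents back into it.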
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Hi, everyone. This seems to be the place to post about unintended model resets. I am running 2 HadCM3’s on a dual core machine with 2GB of RAM. This morning the models were crunching their way through 1942. When I checked on them tonight they had reset to 1921, about 8000 timesteps into the model! Since both reset, I don’t think it is likely that there is a flaw in the WU’s that is causing them to loop. I don’t think that I could have accidentally clicked the reset tab in “Projects”. Fortunately, I made a backup this morning so I was able to empty the BOINC folder and refill it with the backup copy. I am back in 1942 and I only lost about 10 hours of crunching. I don’t know how this happened and I hope it doesn’t happen again. |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Jim, it looks as if the HadCM3 models are these two: 7202708 and 7202703. These two models appear to have crashed for some reason and two more were downloaded, so they didn't really reset, though the progress indicators for the new models would certainly have started from zero. Well done for taking and restoring a backup. To prevent new models arriving when models crash, just press the 'No new tasks' button in BOINC Manager (and press it again when the models finish, to allow new tasks to be downloaded). Iain PS When a backup is restored a duplicate computer appears in the computer list (here). If you display the 'computer summary' page by clicking on one of your computer links, then there is a 'merge this computer' link at the bottom of the page, which will merge duplicate computer records. |
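The 'No new tasks' button also has a command-line equivalent in the `boinccmd` tool that ships with current BOINC clients (older clients may name the tool differently). A minimal Python sketch that only builds the command; the project URL below is an assumption and must match the URL the client has registered for the project, and the helper name is hypothetical.

```python
import subprocess  # only needed if the run() line below is uncommented

# Assumption: the project URL exactly as registered in the local BOINC client.
PROJECT_URL = "http://climateprediction.net/"


def new_tasks_command(project_url: str, allow: bool = False) -> list[str]:
    """Build the boinccmd call that blocks (or re-allows) new tasks."""
    operation = "allowmorework" if allow else "nomorework"
    return ["boinccmd", "--project", project_url, operation]


if __name__ == "__main__":
    cmd = new_tasks_command(PROJECT_URL)
    print("Would run:", " ".join(cmd))
    # Uncomment on a machine where boinccmd is on the PATH and the
    # BOINC client is running:
    # subprocess.run(cmd, check=True)
```

Calling `new_tasks_command(PROJECT_URL, allow=True)` gives the matching command to allow downloads again once the models have finished.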
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
The two crashes show an error code often associated either with the graphics being shut down when 'not responding' or with the Vista shutdown process. So I'd suggest firstly disabling the screensaver (use 'blank' instead), and secondly shutting down BOINC prior to shutting down Vista. There are also some system settings which will reduce the chance of Vista killing the model (see the 'crashes and other problems' readme, in the 'vista' section; the link is in my signature). I'm a volunteer and my views are my own. News and Announcements and FAQ |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Dear Mike and Iain: I guess you’re right. I checked “Projects” and the “Allow New Tasks” tab was clicked. I don’t know how this happened because I am sure that I clicked “No New Tasks” when I downloaded the crashed WU’s, to stop automatic downloads of new WU’s in the event of a crash. I don’t know why they crashed because I always exit the manager before shutting down. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The error code for both of your failed models is: exit code 1073807364 (0x40010004) Codes starting with 107 are Windows "stop" errors (there are 4 or 5 of these, with the last few numbers being different), and, as Mike said, can be associated with a graphics problem. Updating the drivers for the graphics card (from the card maker's web site) often fixes the problem. Backups: Here |
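The decimal and hexadecimal forms of the exit code are the same number, which is easy to check. The Python sketch below does the conversion and also splits the value into fields, assuming it follows the usual Windows NTSTATUS layout (severity in the top two bits, facility in bits 16 to 27, code in the low 16 bits). That interpretation is an assumption on my part, not something the task output states.

```python
EXIT_CODE = 1073807364  # decimal exit code reported for the failed models

# Same value written in hexadecimal.
as_hex = f"0x{EXIT_CODE:08X}"
print(as_hex)

# Assumption: the value follows the Windows NTSTATUS bit layout.
severity = EXIT_CODE >> 30            # top two bits
facility = (EXIT_CODE >> 16) & 0xFFF  # bits 16 to 27
code = EXIT_CODE & 0xFFFF             # low 16 bits
print(f"severity={severity} facility={facility} code={code}")

# Neighbouring codes share the leading decimal digits, which is why
# "the last few numbers" are the only part that differs.
for neighbour in range(0x40010003, 0x40010006):
    print(f"0x{neighbour:08X} = {neighbour}")
```

Under that layout the severity field is 1 (informational rather than error), which fits the posts here describing the process as being terminated from outside rather than failing on its own.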
Joined: 5 Aug 04 Posts: 250 Credit: 93,274 RAC: 0 |
The error code for both of your failed models is: exit code 1073807364 (0x40010004) On Windows Vista these errors also occur if you shut down or reboot Windows without exiting BOINC first. Vista's fast shutdown mode ignores any programs still running and doesn't allow them to write their state to disk, which corrupts them. See this BOINC FAQ for workarounds. Jord. |
©2025 cpdn.org