Message boards : Number crunching : Computation Error
Author | Message |
---|---|
Joined: 27 Mar 12 Posts: 1 Credit: 3,191 RAC: 0 |
Both of my climate prediction tasks just failed with the error above. Can they be recovered and restarted, or do new tasks have to be downloaded? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
|
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
As all three models on that computer have failed, it may be worth checking some of the known causes of crashes before downloading any more: running memtest to rule out faulty memory that only shows up under the intensive load CPDN puts on some machines, and making sure the BOINC data directory is excluded from any virus scans, as a scan can lock a file just when BOINC needs to write to it. Also, on my machine the odds improve if I suspend work units and exit BOINC before turning the machine off. Not quite sure if this last one is relevant under Windows or just Linux. Dave |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Suspending and exiting from BOINC before shutdown is relevant in Windows. You can lose a model if the shutdown catches it at a crucial moment, such as when it is writing to the disk. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Oh for an intelligent OS that will shut everything down cleanly before closing itself down! |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Unfortunately, that would require "slow and careful", and people want "Faster! Faster!", which is what is happening with each new version of Windows. :( Sometimes people just have to include themselves in the loop. The problem with this project is that there are a LOT of ancillary files open, all of which need to be closed. I think that the problem mainly occurs if the shutdown happens while the files are being saved at a checkpoint, and only some have been saved. Then some of them are out of sync with the others, and the program can't restart. |
Joined: 28 Jun 07 Posts: 31 Credit: 4,341,796 RAC: 624 |
I normally do 'File | Exit BOINC' and tell it that I do want to stop running science applications. That seems to work OK, but maybe I have been lucky. Do I need to do something else before doing that (suspend the CPDN tasks in the task tab or the CPDN project in the project tab)? Thanks |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
That's prudent and proper, so boinc should "do right" every time. As an extra measure of caution, I developed the habit, years ago, of clicking "Suspend" in boinc "Activity" before "File/Exit." (As you are aware, having been around since 2007, we used to run some really long tasks and there was no such thing as being too safe.) Cheers, Randi. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 28 Jun 07 Posts: 31 Credit: 4,341,796 RAC: 624 |
Thank you! |
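
For anyone who would rather script the "suspend, then exit" habit described in this thread than click through the Manager, here is a minimal sketch in Python. It assumes the `boinccmd` command-line tool is on the PATH and can authenticate to the local client (for example, when run from the BOINC data directory). The function names and the `settle_seconds` pause are illustrative choices, not anything the project prescribes, and the pause length is a guess rather than a tested value.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# boinccmd options used here: "--set_run_mode never" suspends computation
# (roughly what Activity > Suspend does) and "--quit" tells the client to exit.
SUSPEND_CMD = ["boinccmd", "--set_run_mode", "never"]
QUIT_CMD = ["boinccmd", "--quit"]


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def suspend_then_quit(
    runner: Callable[[Sequence[str]], int] = run_command,
    settle_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Suspend computation, wait briefly, then ask the client to exit.

    Returns the commands that were issued, in order. If the suspend step
    fails (for instance because the client is not running), the quit
    step is skipped.
    """
    issued: List[List[str]] = [list(SUSPEND_CMD)]
    if runner(SUSPEND_CMD) != 0:
        return issued

    # Hypothetical settle time so tasks can stop before the client exits.
    sleep(settle_seconds)

    issued.append(list(QUIT_CMD))
    runner(QUIT_CMD)
    return issued
```

Calling `suspend_then_quit()` just before shutting the machine down would issue the two commands in order. Note that `--set_run_mode never` stays in effect until it is changed back, so computation has to be resumed (Activity menu, or `boinccmd --set_run_mode auto`) after the next start.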
©2024 cpdn.org