Computation error when BOINC halts

flensr
Joined: 17 Oct 18
Posts: 8
Credit: 1,667,803
RAC: 3,199
Message 69349 - Posted: 17 Jul 2023, 14:06:05 UTC

Hello,

I'm running BOINC manager 7.22.2 (x64), with a handful of projects running. I have had a couple of climateprediction.net tasks complete recently but I've also noticed that if I have to reboot my computer or shut down BOINC for any reason with a climateprediction.net task in progress, when I restart BOINC the task status changes to computation error.

Is there a setting I need to check or something to keep this from happening? It's happened a few times recently and it's kind of sad because I have had a few days of work done on those tasks when they failed out with the computation error. It's only the climateprediction.net tasks that are failing with the computation error whenever BOINC is halted and restarted; all tasks from other projects seem to be able to pick up where they left off when BOINC restarted or the computer was rebooted or whatever.

Thanks for any info or advice...

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 69351 - Posted: 17 Jul 2023, 14:45:14 UTC - in response to Message 69349.  

I currently have 8 tasks running and have been shutting down at night without losing any tasks. What I do is suspend computation for each task, wait about three minutes, then exit BOINC and wait another three minutes before shutting down.

My situation may not be completely analogous though as I run my Windows client under WINE on a Linux box. As other moderators could tell you, this is not foolproof under either OS but does improve the odds.
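
The routine above could be scripted roughly as follows. This is only a sketch under assumptions: it uses BOINC's command-line tool `boinccmd` with its `--set_run_mode never` and `--quit` options, it assumes `boinccmd` is on the PATH and is allowed to talk to the local client (it may need to be run from the BOINC data directory or given a password), and it suspends all computation at once rather than each task individually. The function name `stop_boinc_gently` and the `run`/`sleep` parameters are made up for the example.

```python
import subprocess
import time


def stop_boinc_gently(wait_seconds=180, run=subprocess.run, sleep=time.sleep):
    """Suspend computation, pause, ask the client to exit, then pause again."""
    # Assumption: boinccmd is on PATH and can authenticate with the local client.
    run(["boinccmd", "--set_run_mode", "never"], check=True)
    sleep(wait_seconds)  # give running tasks time to settle after suspension
    run(["boinccmd", "--quit"], check=True)
    sleep(wait_seconds)  # give the client time to exit before the OS shuts down
```

Calling `stop_boinc_gently()` before shutting the machine down mirrors the three-minute waits described above. As with doing it by hand, it only improves the odds.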

rob
Joined: 5 Jun 09
Posts: 97
Credit: 3,673,031
RAC: 4,752
Message 69352 - Posted: 17 Jul 2023, 17:39:19 UTC - in response to Message 69349.  

There is a problem with a batch of CPDN tasks which affects some, but not all users - lots of discussion on their "number crunching" forum https://www.cpdn.org/cpdnboinc/forum_thread.php?id=9149

It is particularly prevalent with tasks starting with "WAH2_EAS2", but may not be limited to them.

Taking care to shut down BOINC (not just close it) before shutting down the computer does appear to improve the situation a little.
(Windows - right-click on BOINC in the task bar, select "EXIT", then "stop all running tasks".)

flensr
Joined: 17 Oct 18
Posts: 8
Credit: 1,667,803
RAC: 3,199
Message 69353 - Posted: 17 Jul 2023, 18:01:52 UTC - in response to Message 69352.  

Thanks I think I'll try suspending the tasks before closing, although that doesn't help with automatic reboots for windows updates... Oh well hopefully it'll get fixed someday.

wateroakley
Joined: 6 Aug 04
Posts: 195
Credit: 28,192,402
RAC: 10,436
Message 69354 - Posted: 18 Jul 2023, 7:51:33 UTC - in response to Message 69349.  

Is there a setting I need to check or something to keep this from happening?
Pause 'Windows Update' for the maximum amount of time; that stops Windows annoyingly doing an update while CPDN tasks are running. 'Resume updates' after the tasks have finished to keep the OS up to date, and then pause again.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 69355 - Posted: 18 Jul 2023, 8:11:44 UTC - in response to Message 69354.  

Is there a setting I need to check or something to keep this from happening?

Blocking the Microsoft domains in your router is how I would stop random updates.

Glenn Carver
Joined: 29 Oct 17
Posts: 1044
Credit: 16,196,312
RAC: 12,647
Message 69356 - Posted: 18 Jul 2023, 8:14:44 UTC - in response to Message 69353.  

Thanks I think I'll try suspending the tasks before closing, although that doesn't help with automatic reboots for windows updates... Oh well hopefully it'll get fixed someday.
The problem is caused by the size of the checkpoint files that the task needs to do a restart. They are larger than those of other projects. If the task is writing to those checkpoints when the machine is suddenly shut down, the files are not written correctly and the task can't restart. It's just bad luck that sometimes the shutdown happens when the task is at the point of writing those files.
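
To illustrate the failure mode in general terms, here is a minimal sketch of one common way programs guard against half-written files: write to a temporary file, flush it to disk, then swap it into place. This is not how the CPDN models write their checkpoints; the function name `write_checkpoint_atomically` and the file names are hypothetical.

```python
import os
import tempfile


def write_checkpoint_atomically(path, data):
    """Write bytes so that `path` holds either the old or the new content."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the bytes to disk
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # Clean up the partial temporary file; the old checkpoint is untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

If the machine goes down mid-write, only the temporary file is damaged and the previous checkpoint is still there to restart from. The larger the checkpoint, the longer the write takes, which is why big checkpoint files are more exposed to an ill-timed shutdown.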

I would recommend turning on 'Leave non-GPU tasks in memory while suspended', under 'Disk & memory' in boincmgr. This will stop the task having to restart from checkpoint files if it's suspended for any reason (not a shutdown).
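
For anyone who prefers to set this outside the Manager, the sketch below edits the text of a local preferences override file. It rests on assumptions that should be checked against the BOINC documentation: that the override file is `global_prefs_override.xml` in the BOINC data directory, that its root element is `<global_preferences>`, and that the setting is stored as `<leave_apps_in_memory>`. The function name `set_leave_apps_in_memory` is made up for the example.

```python
import xml.etree.ElementTree as ET


def set_leave_apps_in_memory(override_xml, enabled=True):
    """Return override XML text with leave_apps_in_memory switched on or off."""
    if override_xml.strip():
        root = ET.fromstring(override_xml)
    else:
        # No override file yet: start from an empty preferences element.
        root = ET.Element("global_preferences")
    if root.tag != "global_preferences":
        raise ValueError("unexpected root element: " + root.tag)
    node = root.find("leave_apps_in_memory")
    if node is None:
        node = ET.SubElement(root, "leave_apps_in_memory")
    node.text = "1" if enabled else "0"
    return ET.tostring(root, encoding="unicode")
```

The function only transforms text; reading and writing the actual file, and getting the client to re-read it, are left out. Ticking the box in boincmgr remains the simpler route.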

I also agree that turning off or suspending Windows auto-updates helps. I do the same for these long running CPDN tasks.
---
CPDN Visiting Scientist

rob
Joined: 5 Jun 09
Posts: 97
Credit: 3,673,031
RAC: 4,752
Message 69357 - Posted: 18 Jul 2023, 13:01:57 UTC - in response to Message 69353.  

Suspending and halting (stopping) are not the same - The safer option is to halt the processing, which forces the "resume" file to be written instantly to disk; suspend on the other hand may not even produce a resume file (worst case), or will defer its creation for some time.

As for Windows automatic updates - they are an absolute pain, and should be blocked - others have suggested ways of doing this.

Glenn Carver
Joined: 29 Oct 17
Posts: 1044
Credit: 16,196,312
RAC: 12,647
Message 69358 - Posted: 18 Jul 2023, 15:30:09 UTC - in response to Message 69357.  

Suspending and halting (stopping) are not the same - The safer option is to halt the processing, which forces the "resume" file to be written instantly to disk; suspend on the other hand may not even produce a resume file (worst case), or will defer its creation for some time.
That's not necessarily true. boinc can't 'force' the model to write the file. Writing the file is under the control of the model, not boinc, and the OS takes responsibility for flushing the file to disk. I know this from coding up the OpenIFS model to work under boinc. The MetO models work the same way.

wateroakley
Joined: 6 Aug 04
Posts: 195
Credit: 28,192,402
RAC: 10,436
Message 69368 - Posted: 19 Jul 2023, 19:51:18 UTC - in response to Message 69357.  

As for Windows automatic updates - they are an absolute pain, and should be blocked - others have suggested ways of doing this.
I fully agree for experienced CPDN users. However, more than 50 years of computing experience tells me that there are many, many users for whom forcing OS and application updates, especially security updates, is imperative.

Alan K
Joined: 22 Feb 06
Posts: 490
Credit: 30,766,944
RAC: 10,886
Message 69370 - Posted: 19 Jul 2023, 22:26:22 UTC - in response to Message 69357.  

As for Windows automatic updates - they are an absolute pain, and should be blocked - others have suggested ways of doing this.


This can be done using group policies in the registry so that Windows has to ask you to do the updates - if you feel up to it. Search the web for details.

Ryan Munro
Joined: 9 Nov 20
Posts: 6
Credit: 6,907,448
RAC: 3,441
Message 69749 - Posted: 10 Oct 2023, 19:54:21 UTC

Getting this myself. It looks like the WUs will take around 20 days each for me, and it's hard to run Windows and not reboot in that time. Is there any solution yet?

rob
Joined: 5 Jun 09
Posts: 97
Credit: 3,673,031
RAC: 4,752
Message 69753 - Posted: 10 Oct 2023, 21:35:05 UTC - in response to Message 69749.  

A couple of things:
First, the initial estimates of task duration are often very pessimistic, but in time they get a bit better. Do a quick calculation yourself of the time left to run; once the progress has got beyond about 10%, your "guess" will be a lot more accurate than BOINC's.

Second, as these are recent tasks they will belong to the 966 batch. There's a thread running about a couple of issues with these tasks, but no solutions have arrived yet (apart from not shutting down to avoid the "fails on restart" type error, which is a real pain for those that suffer Windows forced reboots, or power cuts, or shut down at night, or suspend to do something else...).
https://www.cpdn.org/cpdnboinc/forum_thread.php?id=9222
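
For the quick calculation mentioned above, a simple proportional estimate is usually enough. The sketch assumes the task progresses at a roughly constant rate; the function name `estimate_remaining_hours` is just for illustration, and the inputs are the elapsed time and the progress fraction shown in the BOINC Manager.

```python
def estimate_remaining_hours(elapsed_hours, fraction_done):
    """Estimate the hours left, assuming progress continues at the same rate."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be greater than 0 and at most 1")
    # Time per unit of progress so far, multiplied by the progress still to go.
    return elapsed_hours * (1.0 - fraction_done) / fraction_done
```

For example, a task that has run for 30 hours and shows 25% progress gives `estimate_remaining_hours(30, 0.25)`, which is 90.0 hours still to run.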