climateprediction.net (CPDN) home page
Thread 'WAH2 not checkpointing?'

Questions and Answers : Windows : WAH2 not checkpointing?

keputnam
Joined: 31 Aug 04
Posts: 29
Credit: 3,972,828
RAC: 132
Message 71683 - Posted: 25 Oct 2024, 19:39:57 UTC

I had to shut down BOINC to apply some Windows maintenance.

The time remaining on my WAH2 WU, after restart, went from 10D 12H to 12D 15H.

Is this expected behavior with an orderly shutdown?
ID: 71683
wujj123456
Joined: 14 Sep 08
Posts: 127
Credit: 43,925,559
RAC: 52,842
Message 71685 - Posted: 26 Oct 2024, 6:54:10 UTC - in response to Message 71683.  

I don't trust the time remaining at all. It gets updated for various reasons and can go back and forth. The best way to check if you've lost work is by tracking the CPU time or fraction done in the properties page. Usually you should only lose the "CPU time since last checkpoint" when the BOINC client restarts.
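
As a rough illustration of that check, the sketch below compares two snapshots of a task, one noted before shutting BOINC down and one after the restart. The `TaskSnapshot` type, its field names, and the numbers are all hypothetical; the values are assumed to be copied by hand from the task's properties page rather than read through any BOINC API.

```python
from dataclasses import dataclass


@dataclass
class TaskSnapshot:
    """Values copied by hand from a task's properties page (hypothetical type)."""
    cpu_time_s: float      # CPU time, in seconds
    fraction_done: float   # fraction done, from 0.0 to 1.0


def work_lost(before: TaskSnapshot, after: TaskSnapshot) -> tuple:
    """Return (CPU seconds lost, fraction of the task lost) across a restart.

    (0.0, 0.0) means the task resumed at or beyond the point where it stopped.
    """
    cpu_lost = max(0.0, before.cpu_time_s - after.cpu_time_s)
    fraction_lost = max(0.0, before.fraction_done - after.fraction_done)
    return cpu_lost, fraction_lost


# Made-up numbers: 50 h of CPU time before shutdown, 48.5 h after the restart.
before = TaskSnapshot(cpu_time_s=50 * 3600, fraction_done=0.320)
after = TaskSnapshot(cpu_time_s=48.5 * 3600, fraction_done=0.310)
cpu_lost, fraction_lost = work_lost(before, after)
print(f"Lost {cpu_lost / 3600:.1f} h of CPU time, {fraction_lost:.1%} of the task")
```

If both numbers come out small compared with the checkpoint interval, the jump in "time remaining" is most likely just the estimate being recalculated.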
ID: 71685
rob
Joined: 5 Jun 09
Posts: 97
Credit: 3,746,817
RAC: 869
Message 71686 - Posted: 26 Oct 2024, 7:15:31 UTC - in response to Message 71683.  

Checkpoints have nothing to do with the guesstimate of remaining time. CPDN does save a checkpoint at regular intervals (hours, not minutes). The expected run duration is only recalculated after a trickle-up has occurred, which is at about 5% progress intervals, or after a restart. Until you have completed a number of similar tasks, BOINC can't accurately calculate the expected run duration for a task - it uses a very crude estimate of your computer's performance, which is initially quite pessimistic, and hence overestimates the expected duration.
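
To see why a recalculated figure can differ from the one shown before, here is a naive estimator that simply extrapolates from progress so far. It is an illustrative assumption only, not BOINC's actual algorithm; the function name and the numbers are hypothetical.

```python
def naive_remaining_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Extrapolate the remaining run time, assuming a constant rate of progress."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


# Made-up example: 3 days elapsed at 20% done.
remaining = naive_remaining_seconds(3 * 86400, 0.20)
print(f"{remaining / 86400:.1f} days remaining")
```

Any estimate of this kind depends on the rate assumed at the moment it is computed, so it can move in either direction whenever it is recalculated, without any work having been lost.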
ID: 71686
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4542
Credit: 19,039,635
RAC: 18,944
Message 71687 - Posted: 26 Oct 2024, 8:06:59 UTC

> it uses a very crude estimate for your computer's performance, which is initially quite pessimistic,

Or very, as opposed to quite, pessimistic if on a new instance of BOINC you haven't run CPU benchmarks from the Tools menu. (Running them after you have downloaded tasks doesn't update the estimate; you have to do it first. I usually forget!)
ID: 71687
keputnam
Joined: 31 Aug 04
Posts: 29
Credit: 3,972,828
RAC: 132
Message 71766 - Posted: 31 Oct 2024, 21:54:44 UTC - in response to Message 71685.  
Last modified: 31 Oct 2024, 21:55:06 UTC

Thanks for the response.

> Usually you should only lose the "CPU time since last checkpoint" when BOINC client restarts.

That value is blank in Windows 10 Pro with BOINC Manager 8.0.2 x64.

> The best way to check if you've lost work is by tracking the CPU time

That value agrees with what is displayed in BOINC Manager, and is reset each time BOINC restarts (I had more Windows maintenance last night).

I used to be a programmer before I got into the systems side of things. If I had ever written a program that did not checkpoint/save state upon a normal shutdown, I'd have been told to go back and do it again.

Another thing: since I noticed this, I've been monitoring WAH2 a little closer.

There are three processes, and Resource Manager shows NONE of the three getting any CPU time or disk activity.
ID: 71766
Glenn Carver
Joined: 29 Oct 17
Posts: 1052
Credit: 16,817,940
RAC: 12,877
Message 71769 - Posted: 31 Oct 2024, 22:17:00 UTC - in response to Message 71766.  

> Another thing Since I noticed this, I've been monitoring WAH2 a little closer:
> There are three processes, and Resource Manager shows NONE of the three getting any CPU time or Disk activity

Check in boincmgr that the task is actually running and not suspended.

There is a monitor process that communicates between the models and BOINC, which always runs though it doesn't consume much CPU. Then the two model processes take it in turns to consume 100% CPU.
---
CPDN Visiting Scientist
ID: 71769
PDW
Joined: 29 Nov 17
Posts: 82
Credit: 16,421,131
RAC: 47,543
Message 71770 - Posted: 31 Oct 2024, 23:59:06 UTC - in response to Message 71769.  

Usually, if the task is not happy when it is restarted after a reboot, it will abort and fail. But on several occasions after a reboot I have seen a task that appears to be running and does not abort, yet uses no CPU; this can be seen in the Properties of the task. The CPU time reported stays at whatever point it had got up to before it crashed. These are good for WUProp hours :D
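
A minimal sketch of that check, assuming you note the task's CPU time from its Properties a few times, some minutes apart, while the task is shown as running. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def looks_stalled(cpu_time_samples_s: Sequence[float], min_samples: int = 3) -> bool:
    """Return True if the CPU time has not advanced at all across the samples.

    With fewer than `min_samples` readings there is not enough evidence,
    so the function returns False.
    """
    if len(cpu_time_samples_s) < min_samples:
        return False
    return max(cpu_time_samples_s) == min(cpu_time_samples_s)


# Made-up readings, in seconds of CPU time.
print(looks_stalled([180000.0, 180000.0, 180000.0]))  # CPU time frozen
print(looks_stalled([180000.0, 180290.0, 180610.0]))  # CPU time advancing
```

This only makes sense for a task that is not suspended, since a suspended task will also show no change in CPU time.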
ID: 71770
keputnam
Joined: 31 Aug 04
Posts: 29
Credit: 3,972,828
RAC: 132
Message 71774 - Posted: 1 Nov 2024, 7:11:16 UTC - in response to Message 71770.  

Thanks.

I've gone ahead and aborted that WU.

ID: 71774
Glenn Carver
Joined: 29 Oct 17
Posts: 1052
Credit: 16,817,940
RAC: 12,877
Message 71777 - Posted: 1 Nov 2024, 9:17:36 UTC - in response to Message 71770.  
Last modified: 1 Nov 2024, 9:18:07 UTC

> Usually after a reboot if the task is not happy when it is restarted it will abort and fail but I have seen on several occasions after a reboot that the task appears to be running and not abort, no CPU gets used and this can be seen in the Properties of the task. The CPU time reported stays at whatever point it had got up to before it crashed. These are good for Wuprop hours :D

Yes, that can happen occasionally. It's a very specific bug that I've got on my list of things to fix. All that can be done is to abort the task in that situation.
---
CPDN Visiting Scientist
ID: 71777

©2024 cpdn.org