climateprediction.net (CPDN) home page
Thread 'BOINC quitting'

Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,705,793
RAC: 9,655
Message 71170 - Posted: 3 Aug 2024, 21:07:54 UTC

Just checking my remote monitoring before preparing to turn in for the night, and found this:

03-Aug-2024 17:39:49 [Einstein@Home] Task Ter5_1_dns_cfbf00023_segment_6_dms_200_40000_13_9100000_1 exited with zero status but no 'finished' file
03-Aug-2024 17:39:49 [Einstein@Home] If this happens repeatedly you may need to reset the project.
03-Aug-2024 17:39:49 [Einstein@Home] Task Ter5_1_dns_cfbf00023_segment_6_dms_200_40000_175_2800000_1 exited with zero status but no 'finished' file
03-Aug-2024 17:39:49 [Einstein@Home] If this happens repeatedly you may need to reset the project.
03-Aug-2024 17:39:49 [NumberFields@home] Task wu_sf6_DS-14x11_Grp2546298of12000000_1 exited with zero status but no 'finished' file
03-Aug-2024 17:39:49 [NumberFields@home] If this happens repeatedly you may need to reset the project.
03-Aug-2024 17:39:49 [Einstein@Home] [cpu_sched] Restarting task Ter5_1_dns_cfbf00023_segment_6_dms_200_40000_13_9100000_1 using einsteinbinary_BRP7 version 12 (BRP7-cuda55) in slot 0
03-Aug-2024 17:39:49 [Einstein@Home] [cpu_sched] Restarting task Ter5_1_dns_cfbf00023_segment_6_dms_200_40000_175_2800000_1 using einsteinbinary_BRP7 version 12 (BRP7-cuda55) in slot 1
03-Aug-2024 17:39:49 [NumberFields@home] [cpu_sched] Restarting task wu_sf6_DS-14x11_Grp2546298of12000000_1 using GetDecics version 400 (default) in slot 3
03-Aug-2024 17:39:50 [---] Exiting
Came down to my workroom, and found the machine was entirely dark. I powered it up, and it started normally - no warnings about an unexpected shutdown, no offer of safe mode, all seems normal. I get this in the new event log:

03/08/2024 21:43:25 |  | Checking presence of 311 project files
03/08/2024 21:43:25 | climateprediction.net | [cpu_sched] Restarting task wah2_eas25_n07y_200912_24_1022_012312461_0 using wah2_ri version 832 in slot 4
03/08/2024 21:43:25 | climateprediction.net | [cpu_sched] Restarting task wah2_eas25_g0e2_201012_24_1023_012317721_0 using wah2_ri version 832 in slot 2
03/08/2024 21:43:26 | Einstein@Home | [cpu_sched] Restarting task Ter5_1_dns_cfbf00023_segment_6_dms_200_40000_13_9100000_1 using einsteinbinary_BRP7 version 12 (BRP7-cuda55) in slot 0
03/08/2024 21:43:26 | Einstein@Home | [cpu_sched] Restarting task Ter5_1_dns_cfbf00023_segment_6_dms_200_40000_175_2800000_1 using einsteinbinary_BRP7 version 12 (BRP7-cuda55) in slot 1
03/08/2024 21:43:27 | NumberFields@home | [cpu_sched] Restarting task wu_sf6_DS-14x11_Grp2546298of12000000_1 using GetDecics version 400 (default) in slot 3
No other warnings in the log, apart from a few leftovers from experiments in various test projects years ago (must clean those up someday...)

Machine is Intel i5, running Windows 7 x64 and BOINC v 7.24.1 - it's host 1390516 here. The CPDN tasks seem to have restarted at around the expected point (about 38%), and one of them has uploaded a _9.zip while I've been typing. No problems on other machines, weather is fine (no thunderstorms), no obvious power problems. Further investigations in the morning.
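The pattern in the first log excerpt (tasks exiting with zero status, then the client exiting a second later) can be picked out of a longer log mechanically. A minimal sketch in Python, assuming the `DD-Mon-YYYY HH:MM:SS [project] message` layout shown above and English month abbreviations; the function names and the five-second window are illustrative choices, not anything BOINC provides.

```python
import re
from datetime import datetime

# Matches lines like: 03-Aug-2024 17:39:49 [Einstein@Home] message text
LINE_RE = re.compile(
    r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) \[(.*?)\] (.*)$"
)

def parse_line(line):
    """Return (timestamp, project, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%d-%b-%Y %H:%M:%S")
    return stamp, m.group(2), m.group(3)

def exits_before_client_exit(lines, window_seconds=5):
    """List (project, message) for 'exited with zero status' entries
    logged at most window_seconds before the last client 'Exiting' entry."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    exit_times = [t for t, proj, msg in parsed if proj == "---" and msg == "Exiting"]
    if not exit_times:
        return []
    last_exit = exit_times[-1]
    return [
        (proj, msg)
        for t, proj, msg in parsed
        if "exited with zero status" in msg
        and 0 <= (last_exit - t).total_seconds() <= window_seconds
    ]

# Three lines copied from the log excerpt in this post.
sample = [
    "03-Aug-2024 17:39:49 [NumberFields@home] Task wu_sf6_DS-14x11_Grp2546298of12000000_1 exited with zero status but no 'finished' file",
    "03-Aug-2024 17:39:49 [NumberFields@home] [cpu_sched] Restarting task wu_sf6_DS-14x11_Grp2546298of12000000_1 using GetDecics version 400 (default) in slot 3",
    "03-Aug-2024 17:39:50 [---] Exiting",
]

hits = exits_before_client_exit(sample)
for project, message in hits:
    print(project, "|", message)
```

Run over the full excerpt, this kind of filter would also show which projects logged nothing at all in the window before the exit.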
ID: 71170
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,833,502
RAC: 19,744
Message 71171 - Posted: 4 Aug 2024, 0:37:11 UTC - in response to Message 71170.  

Mine didn't have any errors like that before "Exiting". Prior to "Exiting", the entries are the start and finish of uploads for Einstein BRP7 (CUDA) tasks. Upon BOINC start, the entries are the normal BOINC startup entries.

Anything in the Event Viewer or Reliability Monitor? Not sure if Windows 7 has those features.

I have an i7-4790 with Windows 10, but it's not the PC that has this issue.
ID: 71171
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,705,793
RAC: 9,655
Message 71176 - Posted: 4 Aug 2024, 11:03:45 UTC - in response to Message 71171.  

This copy of Windows 7 is the Pro edition, so it's got the full array of Administrative tools - including Event Viewer, but not Reliability Monitor. But I find those are really designed to work with a Windows Domain Controller and a Group Policy setup. I'll look, but don't expect much. [I've worked with small business networks previously, but I'm retired now - this is just a home hobbyist network.]

I've looked further back in the BOINC Event Log, but there wasn't a peep from CPDN since the last Trickle / Upload pair. I think that's a possible area for investigation - "The dog that didn't bark in the night-time".

Both NumberFields and Einstein reacted as if they'd received a 'shutting down' signal - but CPDN didn't.
Then BOINC tried to restart the other two projects, as if it hadn't heard the shutdown signal either.
All of that within a single second.
And the next second, BOINC exited. Strange timing.

The machine ran smoothly overnight, and returned another trickle from both tasks. There's nothing unusual in stderr.txt - just a normal restart.
ID: 71176
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,705,793
RAC: 9,655
Message 71177 - Posted: 4 Aug 2024, 11:28:24 UTC

OK - system event viewer says it all started with:

The process C:\Windows\system32\winlogon.exe (*****) has initiated the power off of computer ***** on behalf of user *****\Richard Haselgrove for the following reason: No title for this reason could be found
Reason Code: 0x500ff
Shutdown Type: power off
Comment:
That wasn't me, guv - at least, not consciously. I did come down here to update my laptop on the same network - a quarter of an hour later, after it finished the first pair of CPDN tasks this run, before starting the next pair. But I didn't have to go anywhere near the power button for the desktop.
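The reason code in that event can be split into its fields. A rough sketch in Python, assuming the bit layout and constant values of the Windows SDK header `reason.h` as recalled here (worth verifying against the SDK before relying on them); `decode_reason` is an illustrative name, not a Windows API.

```python
# Major reason values (SHTDN_REASON_MAJOR_*), assumed from reason.h.
MAJOR_REASONS = {
    0x00000000: "OTHER",
    0x00010000: "HARDWARE",
    0x00020000: "OPERATINGSYSTEM",
    0x00030000: "SOFTWARE",
    0x00040000: "APPLICATION",
    0x00050000: "SYSTEM",
    0x00060000: "POWER",
    0x00070000: "LEGACY_API",
}

FLAG_PLANNED = 0x80000000       # SHTDN_REASON_FLAG_PLANNED (assumed value)
FLAG_USER_DEFINED = 0x40000000  # SHTDN_REASON_FLAG_USER_DEFINED (assumed value)
MINOR_NONE = 0x000000FF         # SHTDN_REASON_MINOR_NONE (assumed value)

def decode_reason(code):
    """Split a shutdown reason code into flags, major reason and minor value."""
    minor = code & 0x0000FFFF
    return {
        "planned": bool(code & FLAG_PLANNED),
        "user_defined": bool(code & FLAG_USER_DEFINED),
        "major": MAJOR_REASONS.get(code & 0x00FF0000, "unknown"),
        "minor": minor,
        "minor_is_none": minor == MINOR_NONE,
    }

decoded = decode_reason(0x500FF)
print(decoded)
```

Under those assumptions, 0x500ff decodes to major reason SYSTEM with no minor reason given and no "planned" flag, which would fit the "No title for this reason could be found" text. It does not by itself say what triggered the power off.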
ID: 71177
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,833,502
RAC: 19,744
Message 71186 - Posted: 5 Aug 2024, 3:56:37 UTC - in response to Message 71177.  

So also a controlled shutdown of unknown initiation, but of the entire system. I'm not sure what BOINC logs look like if there's a controlled shutdown of the system without exiting BOINC first.
ID: 71186
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71193 - Posted: 5 Aug 2024, 8:39:29 UTC - in response to Message 71186.  

Yep, it's still a puzzle. I've had brownouts before that can leave the machine in a funny state with multiple processes affected. But yours only seems to be affecting the boinc client?

Is it possible it's related to the system running updates?
---
CPDN Visiting Scientist
ID: 71193
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,833,502
RAC: 19,744
Message 71199 - Posted: 5 Aug 2024, 13:40:26 UTC - in response to Message 71193.  

> Yep, it's still a puzzle. I've had brownouts before that can leave the machine in a funny state with multiple processes affected. But yours only seems to be affecting the boinc client?
>
> Is it possible it's related to the system running updates?

Yes, as far as I can tell it's just BOINC manager & client.

Not sure why it would be, as I pause updates while CPDN tasks are running and control when they install. When there are no CPDN tasks and I notice that a restart is pending due to updates, I shut down BOINC before restarting the PC for the updates.
ID: 71199
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,833,502
RAC: 19,744
Message 71274 - Posted: 16 Aug 2024, 8:11:55 UTC

I found the culprit behind the BOINC shutdowns - Task Scheduler. I have BOINC starting via Task Scheduler after the CPU & GPU tuning utilities, so that the undervolt and GPU settings are applied first. However, there's a setting that was checked to shut down the task if it runs longer than 3 days. So every time there was a PC restart, BOINC would shut down 3 days later, but I didn't notice that pattern as I don't restart the PC often. I didn't try to investigate the shutdowns until recently, upon starting this thread. Earlier I looked at the Task Scheduler settings and started suspecting it; today I was able to confirm it.

I'm not sure why that setting was on, but I've got it turned off now. I suspect that at some point earlier this year a Windows update made that unwanted change on its own, which I've heard tends to happen in general.
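For anyone wanting to check the same option without clicking through the dialog: in an exported task definition, the run-time limit appears as the `ExecutionTimeLimit` element, an ISO 8601 duration, with `PT0S` meaning no limit. A minimal sketch in Python, assuming the standard Task Scheduler XML namespace; the sample task below is a hypothetical, cut-down definition, not an export of the actual BOINC task.

```python
import xml.etree.ElementTree as ET

# Namespace used by Task Scheduler task definitions.
NS = {"t": "http://schemas.microsoft.com/windows/2004/02/mit/task"}

def execution_time_limit(task_xml):
    """Return the ExecutionTimeLimit text from a task definition, or None if absent."""
    root = ET.fromstring(task_xml)
    node = root.find("t:Settings/t:ExecutionTimeLimit", NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()

def is_unlimited(limit):
    """PT0S is the duration that disables the run-time limit."""
    return limit == "PT0S"

# Hypothetical, minimal task definition with a 3-day limit.
SAMPLE_TASK = """<Task version="1.2" xmlns="http://schemas.microsoft.com/windows/2004/02/mit/task">
  <Settings>
    <ExecutionTimeLimit>P3D</ExecutionTimeLimit>
  </Settings>
</Task>"""

limit = execution_time_limit(SAMPLE_TASK)
if limit is None:
    print("No ExecutionTimeLimit element; the scheduler's default applies")
elif is_unlimited(limit):
    print("No run-time limit set")
else:
    print("Task will be stopped after", limit)
```

A real export normally starts with an XML declaration and is typically saved as UTF-16, so reading it from disk with `ET.parse(path)` is likely to be more robust than passing a string to `ET.fromstring`.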
ID: 71274
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71276 - Posted: 16 Aug 2024, 9:30:51 UTC - in response to Message 71274.  

Great you found it. That must have taken some time.
ID: 71276