Thread 'BOINC quitting'

AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,816,935
RAC: 19,934
Message 71136 - Posted: 29 Jul 2024, 7:40:27 UTC

Has anyone else experienced BOINC seemingly randomly quitting on its own? It happens to me at least once a month I'd say. It started happening earlier this year.
ID: 71136
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71137 - Posted: 29 Jul 2024, 8:53:28 UTC - in response to Message 71136.  

What version of boinc?
ID: 71137
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,700,823
RAC: 9,977
Message 71138 - Posted: 29 Jul 2024, 9:36:58 UTC - in response to Message 71136.  

And which part of BOINC is quitting - Manager, Client, or both? Any other symptoms you can describe?
ID: 71138
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 71139 - Posted: 29 Jul 2024, 10:12:12 UTC - in response to Message 71136.  
Last modified: 29 Jul 2024, 10:12:53 UTC

On both machines or just on one of the two? Also, are the machines actually running Windows, or are they using WINE under Linux? In the latter case, I find the Manager freezes from time to time, and I use xkill and restart it. The client seems unaffected. I don't notice the behaviour in a VM running Tiny10.
ID: 71139
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 71142 - Posted: 29 Jul 2024, 13:07:04 UTC - in response to Message 71138.  

I am running boinc manager and boinc client on both my Linux and my Windows machines.

My Linux machine is

Computer 1511241
Computer information

Created 	14 Nov 2020, 15:37:02 UTC
Total credit 	12,455,953
Average credit 	1,698.73
CPU type 	GenuineIntel
Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16
Coprocessors 	---
Virtualization 	None
Operating System 	Linux Red Hat Enterprise Linux
Red Hat Enterprise Linux 8.10 (Ootpa) [4.18.0-553.8.1.el8_10.x86_64|libc 2.28]
BOINC version 	7.20.2


I am not aware that the boinc client has ever exited unless I tell it to.
The boinc manager has started exiting when I click the Notices tab. BOINC version 7.20.2 is where it started doing this, but it did not always do this.

My Windows machine has been running like this, and does not seem to have problems with boinc client or boinc manager.

Computer 1512658
Computer information
Created 	19 Dec 2020, 22:21:58 UTC
Total credit 	644,168
Average credit 	3,045.80
CPU type 	GenuineIntel
11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz [Family 6 Model 140 Stepping 1]
Number of processors 	8
Operating System 	Microsoft Windows 11
Core x64 Edition, (10.00.22631.00)
BOINC version 	8.0.2

ID: 71142
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,816,935
RAC: 19,934
Message 71143 - Posted: 29 Jul 2024, 19:41:44 UTC

The Ryzen9 5900X one. It's a Windows 10 system which I keep pretty well updated. BOINC is 8.0.2, which I also keep updated. Both Manager and Client quit. What makes me notice is that the CPU temperature reading is too low (have Core Temp in the system tray). Haven't noticed any symptoms unfortunately. It's not me closing it by accident either as I have an Exit Confirmation window popping up to prevent accidental closures. Luckily recent versions of WAH2 don't crash on BOINC shutdown but earlier in the year I have lost tasks from this happening. It's puzzling.
ID: 71143
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71144 - Posted: 29 Jul 2024, 21:51:20 UTC - in response to Message 71143.  
Last modified: 29 Jul 2024, 21:52:22 UTC

Check if there are any errors listed for boinc in the system messages log.

Are you undervolting the 5900X by any chance?
ID: 71144
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4537
Credit: 19,001,532
RAC: 21,726
Message 71145 - Posted: 30 Jul 2024, 5:28:00 UTC - in response to Message 71143.  
Last modified: 30 Jul 2024, 5:31:51 UTC

earlier in the year I have lost tasks from this happening.
That answers one of Richard's questions. The client must be going down or you would not have lost tasks. I think the next stage is to look at the logs. It might also be worth posting in the BOINC forums. Is there anything in stderrdae.txt? On a default installation this will be in c:\ProgramData\BOINC
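As a rough illustration of checking those logs, here is a minimal Python sketch that prints the last few lines of the client's two log files. It assumes the default data directory named above; the helper names `tail` and `show_daemon_logs` are made up for this example and are not part of BOINC.

```python
from collections import deque
from pathlib import Path

# Assumption: the default Windows data directory mentioned in the post.
BOINC_DATA_DIR = Path(r"C:\ProgramData\BOINC")


def tail(path, n=20):
    """Return the last n lines of a text file, reading it line by line."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def show_daemon_logs(data_dir=BOINC_DATA_DIR, n=20):
    """Print the tail of the client's stdout and stderr logs, if present."""
    for name in ("stdoutdae.txt", "stderrdae.txt"):
        log = Path(data_dir) / name
        if log.exists():
            print(f"--- last {n} lines of {log} ---")
            for line in tail(log, n):
                print(line)
        else:
            print(f"{log} not found")
```

Calling `show_daemon_logs()` right after noticing that BOINC has gone shows what the client wrote last, before a restart adds new lines.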

I am sure Richard or Glenn will add something if there is somewhere else worth looking in as well.
Edit: I see Glenn has already posted. I didn't refresh the page from last night to check before posting.
ID: 71145
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,816,935
RAC: 19,934
Message 71146 - Posted: 30 Jul 2024, 7:27:42 UTC - in response to Message 71144.  

Check if there are any errors listed for boinc in the system messages log.

Are you undervolting the 5900X by any chance?

How would I check those logs? I did some searching in the Event Viewer and the only thing that came up are entries related to BOINC installation when I upgraded to 8.0.2.

I am undervolting the CPU. I've had it at the current setting for over 2 years without issues though. I did have to up the voltage a couple of times after BIOS updates as the system kept rebooting right after those updates and upping the voltage seemed to resolve it.
ID: 71146
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71147 - Posted: 31 Jul 2024, 15:19:23 UTC - in response to Message 71146.  

Yep, I was thinking of the Windows Event Viewer, in either the Application or System logs under 'Windows Logs'. I've just tried a right-click 'Find' for boinc, and it's still searching after 20 minutes because my logs are huge. It might be worth clearing the logs, rebooting, and then having another look if it fails again.
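If the Event Viewer's 'Find' is too slow, one possible alternative is to dump only the newest events with the `wevtutil` command-line tool and filter them in a script. The Python sketch below is a hedged example: the function names are invented for illustration, and it assumes that each event in `wevtutil`'s text output starts with a line beginning `Event[`.

```python
import subprocess


def wevtutil_command(log_name, count=500):
    """Build a wevtutil call that returns the newest `count` events as text."""
    return ["wevtutil", "qe", log_name, f"/c:{count}", "/rd:true", "/f:text"]


def split_events(text):
    """Split text output into one string per event.

    Assumption: each event starts with a line beginning 'Event['.
    """
    events, current = [], []
    for line in text.splitlines():
        if line.startswith("Event[") and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        events.append("\n".join(current))
    return events


def events_mentioning(text, needle="boinc"):
    """Keep only the events whose text mentions `needle` (case-insensitive)."""
    return [e for e in split_events(text) if needle.lower() in e.lower()]


def query_log(log_name, needle="boinc", count=500):
    """Run wevtutil (Windows only) and return the matching events."""
    result = subprocess.run(wevtutil_command(log_name, count),
                            capture_output=True, text=True, check=False)
    return events_mentioning(result.stdout, needle)
```

On Windows, `query_log("Application")` and `query_log("System")` would cover the two logs mentioned above. Limiting the query to the newest few hundred events keeps it quick even when the logs are huge.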

Is this the only app that's failing, as far as you know? If it's not, I'd suspect the machine itself, especially if you've had stability issues. But as it's the new boinc version, I wonder if that's a reason?
ID: 71147
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,816,935
RAC: 19,934
Message 71148 - Posted: 31 Jul 2024, 20:46:55 UTC - in response to Message 71147.  

I searched all of the sections of the Windows Logs; my logs don't seem to be that big, as it didn't take long at all. The only things that came up are installation entries. I wonder if there's a way to monitor a specific program in the background?

Yes, that's the only app that's done this, as far as I know. I think BOINC has been updated twice this year and it started earlier in the year, so it's happened to the last 2 or 3 versions.
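One simple way to monitor a program in the background is a small polling script. The Python sketch below is only an illustration: it assumes the client's process is named `boinc.exe`, uses the Windows `tasklist` command to look for it, and writes to a hypothetical log file `boinc_watch.log`. It records when the client stopped, not why.

```python
import subprocess
import time
from datetime import datetime


def image_is_listed(tasklist_output, image_name):
    """Return True if image_name appears as a process name in tasklist output."""
    wanted = image_name.lower()
    for line in tasklist_output.splitlines():
        fields = line.split()
        if fields and fields[0].lower() == wanted:
            return True
    return False


def boinc_running(image_name="boinc.exe"):
    """Ask tasklist (Windows only) whether a process with this name exists."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}"],
        capture_output=True, text=True, check=False)
    return image_is_listed(result.stdout, image_name)


def watch(log_path="boinc_watch.log", interval=30):
    """Poll forever; append a timestamped line whenever the state changes."""
    previous = None
    while True:
        running = boinc_running()
        if running != previous:
            state = "running" if running else "NOT running"
            stamp = datetime.now().isoformat(timespec="seconds")
            with open(log_path, "a", encoding="utf-8") as fh:
                fh.write(f"{stamp} boinc.exe {state}\n")
            previous = running
        time.sleep(interval)
```

Leaving `watch()` running in a console window gives a timestamp, accurate to within the polling interval, that can be compared against the Event Viewer and the BOINC logs.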
ID: 71148
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,816,935
RAC: 19,934
Message 71154 - Posted: 1 Aug 2024, 9:25:51 UTC

It happened again. From stdoutdae.txt in the BOINC directory I got a timestamp, and from stderr.txt in the CPDN slot directories it seems the quit request came from BOINC. The last 2 lines are:
CPDN Monitor - Quit request from BOINC...
Detaching shared memory... Done.

There was nothing helpful in the Event Viewer that I could tell.
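For anyone wanting to repeat this check, the following Python sketch collects the last two lines of `stderr.txt` from every slot directory. It assumes the slots live in a `slots` folder under the BOINC data directory; the function names are invented for this example.

```python
from collections import deque
from pathlib import Path


def last_lines(path, n=2):
    """Return the last n lines of a text file."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=n)]


def slot_stderr_tails(data_dir, n=2):
    """Map each slot directory name to the last n lines of its stderr.txt."""
    tails = {}
    for path in sorted(Path(data_dir).glob("slots/*/stderr.txt")):
        tails[path.parent.name] = last_lines(path, n)
    return tails


def quit_requested_by_boinc(lines):
    """True if the tail contains the monitor's quit-request message."""
    return any("Quit request from BOINC" in line for line in lines)
```

Running `slot_stderr_tails(r"C:\ProgramData\BOINC")` and passing each value to `quit_requested_by_boinc` shows whether every running task saw the same quit request.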
ID: 71154
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71157 - Posted: 1 Aug 2024, 11:56:53 UTC - in response to Message 71154.  
Last modified: 1 Aug 2024, 11:57:33 UTC

OK, that suggests the client shut down cleanly rather than seg faulting. Otherwise I wouldn't expect to see that particular message. I will check the code to see exactly what triggers that message. If the client has suddenly disappeared, the message is something like 'client heartbeat not found' instead. But I'll check.

This is Richard's domain more than mine. Are there other reports of the client shutting down seemingly all by itself?
---
CPDN Visiting Scientist
ID: 71157
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,700,823
RAC: 9,977
Message 71158 - Posted: 1 Aug 2024, 12:27:03 UTC - in response to Message 71157.  

I seem to have a dim memory of some reports like this appearing in the early days after the initial test release of v8.0.0 Alpha - I think as GitHub issues.

I had a quick look after this thread appeared, but couldn't find them. I'll try doing a more systematic search later when I have time, but I don't find the GitHub search tools very helpful unless you have a very specific target phrase - like an error message in code - to look for. And the interface keeps changing...
ID: 71158
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71160 - Posted: 1 Aug 2024, 13:34:58 UTC - in response to Message 71154.  

From stdoutdae.txt in BOINC directory I got a timestamp ..

Was there just a timestamp, no text of any kind?

I searched the forums; the relevant threads were rather old, but one mentioned problems after a Microsoft update. I wonder if it might be worth deleting and reinstalling the client?
---
CPDN Visiting Scientist
ID: 71160
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,700,823
RAC: 9,977
Message 71161 - Posted: 1 Aug 2024, 15:53:38 UTC

I've looked through the issues and PRs at GitHub, but couldn't find any reference to generalised client quitting. A few edge cases, like WSL failing to load at startup and leaving a message of 'client has failed to run three times in succession'.

Like Glenn, I associate random crashes with thermal issues or bad power supplies - but they usually take down the entire system. I can't think of anything which would take out the BOINC client selectively.

Except possibly Windows 11's tendency to restart at random intervals to install updates. The curious thing there is that I have BOINC installed under my user name: I have a password set so that my laptop can't be restarted without manual intervention: and yet BOINC seems to respond to remote monitoring after a restart, but before local login. I must investigate that sometime.

Maybe Windows 11 tries to restore previously running apps after an automatic restart? But doesn't quite get it right for BOINC? Just scratching my head.
ID: 71161
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,816,935
RAC: 19,934
Message 71163 - Posted: 1 Aug 2024, 20:45:44 UTC

There was a message in stdoutdae.txt in BOINC directory, here's the last line:
01-Aug-2024 00:31:13 [---] Exiting

It does look like a controlled shutdown.
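To line these shutdowns up against other events, the 'Exiting' timestamps can be pulled out of stdoutdae.txt with a short script. This Python sketch assumes every log line follows the `DD-Mon-YYYY HH:MM:SS [project] message` layout of the line quoted above; `parse_line` and `exit_times` are names invented for the example.

```python
import re
from datetime import datetime

# Layout assumed from the quoted line: 01-Aug-2024 00:31:13 [---] Exiting
LINE_RE = re.compile(
    r"^(\d{1,2})-([A-Za-z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2}) \[(.*?)\] (.*)$")

MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_line(line):
    """Return (timestamp, project, message), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if not match:
        return None
    day, mon, year, hh, mm, ss, project, message = match.groups()
    month = MONTHS.get(mon.title())
    if month is None:
        return None
    stamp = datetime(int(year), month, int(day), int(hh), int(mm), int(ss))
    return stamp, project, message


def exit_times(lines):
    """Collect the timestamps of every 'Exiting' message."""
    times = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[2] == "Exiting":
            times.append(parsed[0])
    return times
```

A list of exit times gathered over a few months would show whether the shutdowns cluster at a particular time of day, which might point to a scheduled task or update.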

While I never deleted and reinstalled BOINC, I have upgraded it a couple of times since the problem started.

It's both client and manager that are quitting. I have Windows 10 with the latest updates. Windows rarely restarts on its own for updates; I usually notice that an update is pending and control it, or pause updates for a short time and update when I'm ready. I also know when the PC has restarted, as the first thing that comes up is turning on the undervolt, which I have to do manually.
ID: 71163
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71164 - Posted: 2 Aug 2024, 11:53:27 UTC - in response to Message 71163.  
Last modified: 2 Aug 2024, 11:55:20 UTC

There was a message in stdoutdae.txt in BOINC directory, here's the last line:
01-Aug-2024 00:31:13 [---] Exiting
It does look like a controlled shutdown.

I had a look in the boinc 8.0.2 code. That's definitely a controlled shutdown. The string 'Exiting' is found in the main_loop() function in the client code. It's triggered when the state flag 'requested_exit' goes true. It looks as if something on the system told the boinc client to exit.
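The pattern described here, a signal handler that only sets a flag and a main loop that notices the flag and exits cleanly, can be sketched in a few lines. The following is a toy Python illustration of that general pattern, not the BOINC source (which is C++); `ToyClient` and everything in it apart from the names `requested_exit` and `main_loop` is made up.

```python
import signal
import time


class ToyClient:
    """Toy model of a flag-driven shutdown; not BOINC's actual code."""

    def __init__(self):
        self.requested_exit = False
        self.log = []

    def handle_signal(self, signum, frame):
        """The handler only sets the flag; the loop performs the shutdown."""
        self.requested_exit = True

    def install_handlers(self):
        """Route interrupt and terminate signals to the flag-setting handler."""
        for sig in (signal.SIGINT, signal.SIGTERM):
            signal.signal(sig, self.handle_signal)

    def main_loop(self, poll_interval=0.01, max_iterations=None):
        """Run until the flag is set, or until max_iterations if given."""
        iterations = 0
        while not self.requested_exit:
            time.sleep(poll_interval)  # stand-in for real scheduling work
            iterations += 1
            if max_iterations is not None and iterations >= max_iterations:
                break
        if self.requested_exit:
            self.log.append("Exiting")  # logged only on a requested exit
        return iterations
```

The point of the sketch is that 'Exiting' is written only when something outside the loop has asked for the exit. A crash would never reach that line, which is consistent with reading the log entry as a controlled shutdown.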

There might be a way to get the client to dump out some more debugging information - without recompiling it. Richard might know? I'm not sure what else to suggest. Try a reinstall?
---
CPDN Visiting Scientist
ID: 71164
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,700,823
RAC: 9,977
Message 71168 - Posted: 2 Aug 2024, 15:12:53 UTC - in response to Message 71164.  

I take it you found that in L455? I read that as setting requested_exit if any of a small group of signals is received (L157). But I can't see any options around that code for using a debug print option to write extra messages into stderr.txt or stdoutdae.txt.

I looked at the list of event log options too, and the only one that is remotely close is 'heartbeat_debug' - but that mainly concerns science apps stopping or failing to stop in unexpected ways.
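For reference, event log options like this one are switched on through a `log_flags` section in `cc_config.xml` in the BOINC data directory. The Python sketch below just builds such a file's text; the helper name `cc_config_with_flags` is invented, and it returns a string rather than writing a file, so that an existing cc_config.xml is not overwritten by accident. As noted above, this flag may well not reveal why the client exits.

```python
def cc_config_with_flags(flags):
    """Return the text of a minimal cc_config.xml enabling the given log flags."""
    lines = ["<cc_config>", "  <log_flags>"]
    lines += [f"    <{name}>1</{name}>" for name in flags]
    lines += ["  </log_flags>", "</cc_config>"]
    return "\n".join(lines) + "\n"


# Example: the one flag mentioned in the post above.
print(cc_config_with_flags(["heartbeat_debug"]))
```

If a cc_config.xml already exists, the flag should be merged into its existing `log_flags` section by hand rather than replacing the file.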

We have had major problems with Microsoft changing the API for closing / minimising / hiding windows and icons through XP / 7 / 10 / 11 (I was fortunate enough to avoid 8) - see #5164. But I couldn't even reproduce the original problem when asked to test #5174. I never found out what was meant by "Close window with middle mouse button" - but it feels related.
ID: 71168
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,816,935
RAC: 19,934
Message 71169 - Posted: 3 Aug 2024, 6:32:51 UTC

What could signal BOINC to quit? It'd seem to me there would be very few possibilities.

I do vaguely remember, in times past, closing programs with the middle mouse button when it shouldn't have been possible. The wheel is sometimes also a button and used to be more configurable than it is now, judging by current mouse configuration options. It seems that sometimes things didn't work right and the wheel would close programs.
ID: 71169

©2024 cpdn.org