Message boards : Number crunching : BOINC quitting
Author | Message |
---|---|
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,925,468 RAC: 12,903 |
Has anyone else experienced BOINC seemingly randomly quitting on its own? It happens to me at least once a month I'd say. It started happening earlier this year. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,578,380 RAC: 15,009 |
What version of boinc? |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,743,089 RAC: 6,177 |
And which part of BOINC is quitting - Manager, Client, or both? Any other symptoms you can describe? |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
On both machines or just on one of the two? Also, are the machines actually running Windows, or are they using WINE under Linux? In the latter case, I find the manager freezes from time to time and I use xkill and restart it. The client seems unaffected. I don't notice the behaviour in a VM running Tiny10. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I am running boinc manager and boinc client on both my Linux and my Windows machines. My Linux machine is Computer 1511241. Computer information: Created 14 Nov 2020, 15:37:02 UTC; Total credit 12,455,953; Average credit 1,698.73; CPU type GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]; Number of processors 16; Coprocessors ---; Virtualization None; Operating System Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.10 (Ootpa) [4.18.0-553.8.1.el8_10.x86_64|libc 2.28]; BOINC version 7.20.2. I am not aware that the boinc client has ever exited unless I tell it to. The boinc manager has started exiting when I click the Notices tab. BOINC version 7.20.2 is where it started doing this, but it did not always do this. My Windows machine has been running like this, and does not seem to have problems with boinc client or boinc manager. Computer 1512658. Computer information: Created 19 Dec 2020, 22:21:58 UTC; Total credit 644,168; Average credit 3,045.80; CPU type GenuineIntel 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz [Family 6 Model 140 Stepping 1]; Number of processors 8; Operating System Microsoft Windows 11 Core x64 Edition, (10.00.22631.00); BOINC version 8.0.2. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,925,468 RAC: 12,903 |
The Ryzen9 5900X one. It's a Windows 10 system which I keep pretty well updated. BOINC is 8.0.2, which I also keep updated. Both Manager and Client quit. What makes me notice is that the CPU temperature reading is too low (have Core Temp in the system tray). Haven't noticed any symptoms unfortunately. It's not me closing it by accident either as I have an Exit Confirmation window popping up to prevent accidental closures. Luckily recent versions of WAH2 don't crash on BOINC shutdown but earlier in the year I have lost tasks from this happening. It's puzzling. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,578,380 RAC: 15,009 |
Check if there are any errors listed for boinc in the system messages log. Are you undervolting the 5900X by any chance? |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"earlier in the year I have lost tasks from this happening." That answers one of Richard's questions. The client must be going down or you would not have lost tasks. I think the next stage is to look at the logs. It might also be worth posting in the BOINC forums. Anything in stderrdae.txt? On a default installation this will be in c:\ProgramData\BOINC. I am sure Richard or Glenn will add something if there is somewhere else worth looking as well. Edit: I see Glenn has already posted. I didn't refresh the page from last night to check before posting. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,925,468 RAC: 12,903 |
"Check if there are any errors listed for boinc in the system messages log." How would I check those logs? I did some searching in the Event Viewer and the only thing that came up are entries related to BOINC installation when I upgraded to 8.0.2. I am undervolting the CPU. I've had it at the current setting for over 2 years without issues though. I did have to up the voltage a couple of times after BIOS updates as the system kept rebooting right after those updates and upping the voltage seemed to resolve it. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,578,380 RAC: 15,009 |
Yep, I was thinking of Windows Event Viewer, in either the Application or System logs under 'Windows Logs'. I've just tried doing a right-click 'Find' for "boinc" and it's still searching after 20 minutes as my logs are huge. Might be worth clearing the logs, rebooting and then having another look if it fails again? Is this the only app that's failing as far as you know? If it's not, I'd suspect the machine itself, especially if you've had stability issues. But as it's the new boinc version I wonder if that's a reason? |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,925,468 RAC: 12,903 |
I searched all of the sections of the Windows Logs. My logs don't seem to be that big, as it didn't take long at all. The only things that came up are installation entries. I wonder if there's a way to monitor a specific program in the background? Yes, that's the only app that's done this, as far as I know. I think BOINC has been updated twice this year and it started earlier in the year, so it's happened to the last 2 or 3 versions. |
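One way to watch a specific program in the background is a small polling script. The following is a minimal sketch, not a BOINC feature. It assumes the Windows `tasklist` command and that the client and Manager run as `boinc.exe` and `boincmgr.exe` (check Task Manager for the actual names on your system). The function names, the log file name `boinc_watch.log` and the 30-second poll interval are all arbitrary choices for illustration.

```python
import csv
import io
import subprocess
import time
from datetime import datetime

# Assumed image names; verify against Task Manager on your own machine.
WATCHED = ["boinc.exe", "boincmgr.exe"]


def running_images(tasklist_csv: str) -> set:
    """Return lower-cased image names found in `tasklist /FO CSV /NH` output."""
    names = set()
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Informational lines (e.g. "INFO: No tasks ...") have no numeric PID column.
        if len(row) >= 2 and row[1].strip().isdigit():
            names.add(row[0].lower())
    return names


def state_changes(previous: set, current: set, watched=WATCHED) -> list:
    """Describe which watched programs started or stopped between two polls."""
    events = []
    for name in watched:
        if name in previous and name not in current:
            events.append(f"{name} stopped")
        elif name not in previous and name in current:
            events.append(f"{name} started")
    return events


def poll_once() -> set:
    """Ask Windows for the current process list (Windows only)."""
    out = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=False,
    ).stdout
    return running_images(out)


if __name__ == "__main__":
    previous = poll_once()
    while True:
        time.sleep(30)  # arbitrary poll interval
        current = poll_once()
        for event in state_changes(previous, current):
            with open("boinc_watch.log", "a") as log:
                log.write(f"{datetime.now().isoformat()} {event}\n")
        previous = current
```

A script like this only records roughly when a program disappeared, not why, so its timestamps would still need to be compared against the BOINC logs and Event Viewer.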
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,925,468 RAC: 12,903 |
It happened again. From stdoutdae.txt in the BOINC directory I got a timestamp, and from stderr.txt in the CPDN slot directories it seems like the request came from BOINC. The last 2 lines are "CPDN Monitor - Quit request from BOINC..." and "Detaching shared memory... Done." There was nothing helpful in the Event Viewer that I could tell. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,578,380 RAC: 15,009 |
Ok, that suggests the client shut down cleanly rather than seg faulting. Otherwise I wouldn't expect to see that particular message. I will check the code to see exactly what triggers that message. If the client has suddenly disappeared the message is something like 'client heartbeat not found' instead. But I'll check. This is Richard's domain more than mine. Are there other reports of the client shutting down seemingly all by itself? --- CPDN Visiting Scientist |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,743,089 RAC: 6,177 |
I seem to have a dim memory of some reports like this appearing in the early days after the initial test release of v8.0.0 Alpha - I think as GitHub issues. I had a quick look after this thread appeared, but couldn't find them. I'll try doing a more systematic search later when I have time, but I don't find the GitHub search tools very helpful unless you have a very specific target phrase - like an error message in code - to look for. And the interface keeps changing... |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,578,380 RAC: 15,009 |
"From stdoutdae.txt in BOINC directory I got a timestamp .." Was there just a timestamp, no text of any kind? I searched the forums, but the threads I found were rather old. One thread did mention problems after a Microsoft update. I wonder if it might be worth deleting and reinstalling the client? --- CPDN Visiting Scientist |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,743,089 RAC: 6,177 |
I've looked through the issues and PRs at GitHub, but couldn't find any reference to generalised client quitting. A few edge cases, like WSL failing to load at startup and leaving a message of 'client has failed to run three times in succession'. Like Glenn, I associate random crashes with thermal issues or bad power supplies - but they usually take down the entire system. I can't think of anything which would take out the BOINC client selectively. Except possibly Windows 11's tendency to restart at random intervals to install updates. The curious thing there is that I have BOINC installed under my user name: I have a password set so that my laptop can't be restarted without manual intervention: and yet BOINC seems to respond to remote monitoring after a restart, but before local login. I must investigate that sometime. Maybe Windows 11 tries to restore previously running apps after an automatic restart? But doesn't quite get it right for BOINC? Just scratching my head. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,925,468 RAC: 12,903 |
There was a message in stdoutdae.txt in the BOINC directory; here's the last line: 01-Aug-2024 00:31:13 [---] Exiting It does look like a controlled shutdown. While I never deleted and reinstalled BOINC, I have upgraded it a couple of times since the problem started. It's both client and manager that are quitting. I have Windows 10, latest updates. Windows rarely restarts on its own for updates; I usually notice that it's pending and control it, or pause updates for a short time and update when I'm ready. I also know when the PC has restarted, as the first thing that comes up is to turn on the undervolt, which I have to do manually. |
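To see how often these controlled shutdowns occur, the "Exiting" lines can be pulled out of the log. This is a minimal sketch, assuming every event-log line has the same `DD-Mon-YYYY HH:MM:SS [project] message` shape as the single line quoted above, and that the file sits in the default Windows location mentioned earlier in the thread. The names `LINE_RE` and `find_exits` are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path

# Shape inferred from the single line quoted above; other lines may differ.
LINE_RE = re.compile(r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) \[(.*?)\] (.*)$")


def find_exits(lines):
    """Return the timestamps of every line whose message is exactly 'Exiting'."""
    stamps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(3).strip() == "Exiting":
            # %b expects English month abbreviations in the default C locale.
            stamps.append(datetime.strptime(match.group(1), "%d-%b-%Y %H:%M:%S"))
    return stamps


if __name__ == "__main__":
    # Default location on Windows; adjust if BOINC's data directory was moved.
    log_path = Path(r"C:\ProgramData\BOINC\stdoutdae.txt")
    with log_path.open(errors="replace") as handle:
        for stamp in find_exits(handle):
            print(stamp.isoformat(sep=" "))
```

The list will also include shutdowns requested on purpose, so it is only useful when compared against the times BOINC was known to be closed deliberately.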
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,578,380 RAC: 15,009 |
"There was a message in stdoutdae.txt in BOINC directory, here's the last line:" I had a look in the boinc 8.0.2 code. That's definitely a controlled shutdown. The string 'Exiting' is found in the main_loop() function in the client code. It's triggered when the state flag 'requested_exit' goes true. It looks as if something on the system told the boinc client to exit. There might be a way to get the client to dump out some more debugging information without recompiling it. Richard might know? I'm not sure what else to suggest. Try a reinstall? --- CPDN Visiting Scientist |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,743,089 RAC: 6,177 |
I take it you found that in L455? I read that as setting requested_exit if any of a small group of signals is received (L157). But I can't see any options around that code for using a debug print option to write extra messages into stderr.txt or stdoutdae.txt. I looked at the list of event log options too, and the only one that is remotely close is 'heartbeat_debug' - but that mainly concerns science apps stopping or failing to stop in unexpected ways. We have had major problems with Microsoft changing the API for closing / minimising / hiding windows and icons through XP / 7 / 10 / 11 (I was fortunate enough to avoid 8) - see #5164. But I couldn't even reproduce the original problem when asked to test #5174. I never found out what was meant by "Close window with middle mouse button" - but it feels related. |
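The mechanism described above is a signal handler that only sets a flag, with the main loop checking that flag and exiting cleanly. It can be sketched as follows. This is an illustrative Python sketch of the general pattern, not the BOINC client's actual code. Only the names `requested_exit` and `main_loop` and the string "Exiting" come from the posts above; the choice of signals, the handler name and everything else are assumptions.

```python
import signal
import time

requested_exit = False  # flag name borrowed from the discussion above


def on_signal(signum, frame):
    """Record that an exit was requested; do no real work inside the handler."""
    global requested_exit
    requested_exit = True


def install_handlers():
    """Route a small group of termination signals to the same handler."""
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, on_signal)


def main_loop(poll_seconds=1.0, log=print):
    """Run until something sets requested_exit, then log and return cleanly."""
    while not requested_exit:
        time.sleep(poll_seconds)  # stand-in for the client's real work
    log("Exiting")


if __name__ == "__main__":
    install_handlers()
    main_loop()
```

In a pattern like this, the "Exiting" line is only written when something outside the loop asked for the shutdown, which fits the reading that the client was told to stop rather than crashing.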
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,925,468 RAC: 12,903 |
What could signal BOINC to quit? It'd seem to me there would be very few possibilities. I do vaguely remember in times past closing programs with middle mouse button when it shouldn't have been possible. The wheel is sometimes also a button and used to be more configurable than now, from looking at current mouse configuration options. Seems like sometimes things didn't work right and the wheel would close programs. |
©2024 cpdn.org