#1020,1,2,3...
Author | Message |
---|---|
Joined: 12 Apr 21 Posts: 317 Credit: 14,891,506 RAC: 18,742 |
"I assume if I suspend them and then reboot, they will not come back. So should I abort them?" No, don't abort. You don't even have to suspend the tasks; just exit BOINC as normal (File -> Exit BOINC) and restart the PC. I've gone through several restarts, including some without exiting BOINC first, and have yet to lose a task from that. All of these new batches use the new app version, 8.32, which doesn't crash when the PC restarts or BOINC shuts down. The chances of you losing those tasks from a restart are de minimis. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,490,541 RAC: 15,784 |
Just to echo the previous message: I've fixed the bugs that were responsible for the model sometimes crashing on a restart. The v8.32 app doesn't suffer from any of those issues now. Just leave the tasks as you would for any project. Suspending tasks before shutting down was an 'urban myth'; it didn't actually make any difference to the chances of the model crashing. The problems were related to the way the model processes talked to each other on startup, nothing to do with BOINC. --- CPDN Visiting Scientist |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"No, don't abort. You don't even have to suspend the tasks, just exit BOINC like normal (File -> Exit BOINC) and restart the PC." I do not think I understand this. I know how to stop the boinc-manager, but I do not know how to start and stop the client. (I do know how to do that with Linux, but not in Windows 11.) Do you mean File->Shutdown connected client in the boinc manager? |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Do you mean File->Shutdown connected client in the boinc manager?" That is how I would do it, unless it no longer works on Windows 11, in the same way that the option has gone from Linux. (It still worked on a self-compiled BOINC the last time I tried on Ubuntu.) |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,490,541 RAC: 15,784 |
I just shut down the PC and let Windows shut down the BOINC client. Works fine with the new WaH app. --- CPDN Visiting Scientist |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Do you mean File->Shutdown connected client in the boinc manager?" OK. That seemed to work. Not only did the two tasks come back up where they left off, I got a new third one! They are all 8.32 tasks. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,719,896 RAC: 7,946 |
Just triggered a manual update of my Windows 11 lappy between jobs, and watched the restart with Task Manager. As soon as I could open it, that is; it took ages. BOINC started automatically as a service, before I logged in. But it took more than eight minutes before things had settled down enough for me to feel confident about starting the replacement CPDN tasks I downloaded last night. All sorts of other things were using up to 100% CPU and hundreds of megabytes of memory, and flickering up and down the usage list. I've suppressed updates for another 5 weeks so they can run in peace... (the button says 1 week, but I have a drop-down list for up to 5 weeks; that may be version-specific) |
Joined: 12 Apr 21 Posts: 317 Credit: 14,891,506 RAC: 18,742 |
"OK. That seemed to work. Not only did the two tasks come back up where they left off, I got a new third one! They are all 8.32 tasks." I'm glad it worked. I did mean File --> Exit BOINC. It's the last menu option, and it shuts down both the client and the manager. It's always been available in BOINC on Windows, although perhaps not on Linux. File --> Shutdown connected client just shuts down the client; you can use that option to shut down individual clients if you're controlling several from the same manager. It sounds like restarting the PC without exiting BOINC works too, though I'm in the habit of closing programs before restarting. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,719,896 RAC: 7,946 |
"I did mean File --> Exit BOINC, It's the last menu option, it shuts down both the client and the manager." Careful with that advice: it depends on another setting, which isn't visible by default. If you go to the menu Options --> Other options, you'll see an option for "Enable Manager exit dialog?". If you check that and follow the Exit BOINC route, you'll see another dialog, with a checkbox for 'Shut down connected client on exit?' or words to that effect. I don't want to go too far down that road just at the moment! Work your way through that route just once, and check that the setting is right for your choice; you can turn off the Exit dialog when you're satisfied. Your choice will be remembered. |
Joined: 12 Apr 21 Posts: 317 Credit: 14,891,506 RAC: 18,742 |
"Just triggered a manual update of my Windows 11 lappy between jobs, and watched the restart with Task Manager. As soon as I could open it, that is - it took ages." Wow, a 1.6 GHz CPU, just above the 1 GHz minimum for Windows 11. Your tasks still finish in about 11 days, which, judging by the stats, seems about average. Are the tasks running boosted? It seems that processor can boost up to 3.4 GHz. My i7-4790 is also very slow when updating, but that's almost certainly because it still has the original HDD. Your tasks run a bit faster than mine, though!? Maybe that's because I run 4 at a time and you seem to run 2. |
Joined: 12 Apr 21 Posts: 317 Credit: 14,891,506 RAC: 18,742 |
"I did mean File --> Exit BOINC, It's the last menu option, it shuts down both the client and the manager." "Careful with that advice - it depends on another setting, which isn't visible by default." Yes, that's true; you can make some customizations. I believe it asks for confirmation by default, and I think mine is on the default setting. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,719,896 RAC: 7,946 |
"Wow, a 1.6GHz CPU, just above the minimum of 1GHz for Windows11. Your tasks still finish in about 11 days, which, looking at stats, seems like about average time. Are the tasks running boosted? It seems like that processor can boost up to 3.4GHz." Yes, it's a nice machine. It's an ultra-portable Dell XPS 13, which got very good reviews on release. This is the 2018 model, but I got it well below list price in the end-of-line factory clearance in early 2019. It makes an excellent travelling companion. I'm processing these batches at two tasks per machine (max_concurrent), but with a server limit of 4. That way, I can download new tasks while the old ones are still running, which avoids the 1-hour delay after the last trickle before I can download a replacement. The lappy has an SSD, but only 8 GB of RAM. I'm keeping the load below maximum to avoid over-stressing the cooling system, which is probably the weak spot for this sort of work. That probably helps with the boost, although I haven't deliberately tried to set that myself. |
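For anyone wanting to reproduce the two-at-a-time setup described above: BOINC's per-application cap is the max_concurrent element in an app_config.xml file saved in the project's own directory (on Windows, typically C:\ProgramData\BOINC\projects\climateprediction.net). This is a minimal sketch, not CPDN's recommended configuration. The app name wah2 is an assumption taken from the "Weather At Home 2 (wah2)" label on the task pages; check the exact name inside the app element of your client_state.xml before relying on it.

```xml
<app_config>
  <!-- app_config.xml: minimal sketch, saved in the climateprediction.net project directory -->
  <app>
    <!-- ASSUMPTION: "wah2" is illustrative; copy the exact app name from client_state.xml -->
    <name>wah2</name>
    <!-- run at most two tasks of this app at the same time -->
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>
```

After saving the file, Options -> Read config files in BOINC Manager should apply it without restarting the client. If you'd rather not look up the app name, a project_max_concurrent element placed directly inside app_config caps all of the project's tasks instead. The server-side limit of 4 mentioned above is set separately and is not part of this file.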
Joined: 29 Oct 17 Posts: 1049 Credit: 16,490,541 RAC: 15,784 |
About 310 workunits are currently being completed per day from the East Asia 25 km batches, numbers 1020 to 1027. Their results are going to the South Korean upload server. --- CPDN Visiting Scientist |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Well, another four tasks failed at the same time. I infer that an unscheduled system update occurred on my Windows box. This was very early Tuesday morning. Here is one of them.
Task 22473689
Name: wah2_eas25_n2yc_201812_24_1022_012316003_0
Workunit: 12316003
Created: 24 Jul 2024, 13:55:41 UTC
Sent: 30 Aug 2024, 3:22:27 UTC
Report deadline: 8 Dec 2024, 3:22:27 UTC
Received: 11 Sep 2024, 3:32:06 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 9 (0x00000009) Unknown error code
Computer ID: 1512658
Run time: 10 days 19 hours 54 min 33 sec
CPU time: 9 days 14 hours 57 min 27 sec
Validate state: Invalid
Credit: 14,931.45
Device peak FLOPS: 3.68 GFLOPS
Application version: Weather At Home 2 (wah2) (region independent) v8.32 windows_intelx86
Peak working set size: 342.49 MB
Peak swap size: 309.91 MB
Peak disk usage: 94.95 MB
Stderr:
<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
The storage control block address is invalid. (0x9) - exit code 9 (0x9)</message>
<stderr_txt>
modelGetExecutables: check control files, strTemp0 & 1 : C:\ProgramData\BOINC/projects/climateprediction.net/wah2_eas25_n2yc_201812_24_1022_012316003/jobs/xadae.namelists C:\ProgramData\BOINC/projects/climateprediction.net/wah2_eas25_n2yc_201812_24_1022_012316003/jobs/xacxf.namelists
modelGetExecutables: unzipping control files : strInput & strTmp wah2_eas25_n2yc_201812_24_1022_012316003.zip wah2_eas25_n2yc_201812_24_1022_012316003/jobs
gstrDump[0] = generic_phase1_spinup_eas25_global_aabaka_f
gstrDump[1] = generic_phase1_spinup_eas25_regional_aabaka_f
global model: command string: "C:\ProgramData\BOINC/projects/climateprediction.net/wah2am3m2_um_8.32_windows_intelx86.exe" wah2_eas25_n2yc_201812_24_1022_012316003 generic_phase1_spinup_eas25_global_aabaka_f ic19610310_14_N96 NATclim_ancil_168months_CMIP6-ACCESS-CM2_SST_2009-01-01_2022-12-30_v2404b NATclim_ancil_168months_CMIP6-ACCESS-CM2_SIC_2009-01-01_2022-12-30_v2404b so2dms_prei_N96_1855_0000P oxi.addfa ozone_preind_N96_1879_0000Pv5
regional model: command string: "C:\ProgramData\BOINC/projects/climateprediction.net/wah2rm3m2t_um_8.32_windows_intelx86.exe" wah2_eas25_n2yc_201812_24_1022_012316003
cpdn_check_running: got RM PID of zero; ignoring this call and waiting for PID via shMem.
cpdn_check_running: got RM PID of zero; ignoring this call and waiting for PID via shMem.
executeModelProcess: MonID=13968, GCM_PID=7496, RCM_PID=3268
Suspended CPDN Monitor - Suspend request from BOINC...
Queuing intermediate upload for CPDN/BOINC: cpdnout1.zip
Queuing intermediate upload for CPDN/BOINC: cpdnout2.zip
Suspended CPDN Monitor - Suspend request from BOINC...
Queuing intermediate upload for CPDN/BOINC: cpdnout3.zip
Queuing intermediate upload for CPDN/BOINC: cpdnout4.zip
Suspended CPDN Monitor - Suspend request from BOINC...
Queuing intermediate upload for CPDN/BOINC: cpdnout5.zip
Queuing intermediate upload for CPDN/BOINC: cpdnout6.zip
Suspended CPDN Monitor - Suspend request from BOINC...
Queuing intermediate upload for CPDN/BOINC: cpdnout7.zip
Queuing intermediate upload for CPDN/BOINC: cpdnout8.zip
Queuing intermediate upload for CPDN/BOINC: cpdnout9.zip
Queuing intermediate upload for CPDN/BOINC: cpdnout10.zip
Queuing intermediate upload for CPDN/BOINC: cpdnout11.zip
Suspended CPDN Monitor - Suspend request from BOINC...
Queuing intermediate upload for CPDN/BOINC: cpdnout_restart.zip
Queuing intermediate upload for CPDN/BOINC: cpdnout12.zip
Queuing intermediate upload for CPDN/BOINC: cpdnout13.zip
Suspended CPDN Monitor - Suspend request from BOINC...
Queuing intermediate upload for CPDN/BOINC: cpdnout14.zip
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Queuing intermediate upload for CPDN/BOINC: cpdnout15.zip
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Queuing intermediate upload for CPDN/BOINC: cpdnout16.zip
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Queuing intermediate upload for CPDN/BOINC: cpdnout17.zip
Queuing intermediate upload for CPDN/BOINC: cpdnout18.zip
Global Worker:: CPDN process is not running, exiting, bRetVal = T, checkPID = 7496, selfPID = 7496, iMonCtr = 1
Regional Worker:: CPDN process is not running, exiting, bRetVal = T, checkPID = 7496, selfPID = 3268, iMonCtr = 1
</stderr_txt>
]]> |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Well, another four tasks failed at the same time. I infer an unscheduled system update occurred on my Windows box. This was very early Tuesday morning." Perhaps not. Mine are running under Wine, so Windows updates have no bearing on them, and I have had several fail on starting today. I tried exiting BOINC and restarting. The running tasks all carried on, but another two new ones failed. Subsequent ones have started fine. I currently have three completed tasks and four failed ones waiting to report on Monday, when Andy sorts out the server issues. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,490,541 RAC: 15,784 |
Yes, this message in your log ("The storage control block address is invalid. (0x9) - exit code 9 (0x9)") is typical of the 'fail caused by Windows Update'. If I understand the online posts correctly, it's related to the installation of new software. The best advice is to shut down the client while Update is running, reboot, then restart the client. --- CPDN Visiting Scientist |
Joined: 22 Feb 06 Posts: 491 Credit: 31,038,916 RAC: 14,611 |
Just picked up some resends from these three batches and also from batches 1024-1027, 19 in total. Some had been aborted, but many had timed out on the original machines. Are the results still needed? |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
I am letting mine run. I think the results are wanted, or the batches would have been cancelled to prevent resends. If you can be bothered, you can do what I did: three of the tasks I was sent were still running on the original hosts but were being resent because they were past the deadline. They finished after I had been sent them, so I then cancelled my copies. I think the rest are unlikely to finish, but I will check again later today, and if any finish on other hosts I will abort them too. Edit: in this task, for example, you can see that I have leapfrogged the original cruncher. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,490,541 RAC: 15,784 |
"Just picked up some resends from these three batches and also batches 1024-1027, 19 in total. Some were aborted but lots were timed out on the original machines. Are the results still needed?" Yes, absolutely. If the results were not needed, you would not have got the resends. I am managing the batches, and any finished ones are closed promptly so that no resends go out. --- CPDN Visiting Scientist |
Joined: 12 Apr 21 Posts: 317 Credit: 14,891,506 RAC: 18,742 |
It seems like there have been a lot of timed-out tasks lately. I'm not sure whether that's normal and I just haven't noticed it before. |
©2024 cpdn.org