Message boards : Number crunching : Several WU's fail after a few seconds - see log below
Author | Message |
---|---|
Joined: 29 Mar 14 Posts: 20 Credit: 1,281,898 RAC: 0 |
9/10/2014 2:07:37 AM | climateprediction.net | task hadam3p_pnw_sbg2_2011_1_009084100_0 suspended by user
9/10/2014 2:07:37 AM | climateprediction.net | task hadam3p_pnw_sbop_2011_1_009084411_0 suspended by user
9/10/2014 2:07:37 AM | climateprediction.net | task hadam3p_pnw_sbg3_2011_1_009084101_0 suspended by user
9/10/2014 2:07:37 AM | climateprediction.net | task hadam3p_pnw_sbox_2011_1_009084419_0 suspended by user
9/10/2014 2:07:37 AM | World Community Grid | task MCM1_0008064_9989_1 suspended by user
9/10/2014 2:07:37 AM | World Community Grid | task MCM1_0008064_9853_0 suspended by user
9/10/2014 2:07:57 AM | climateprediction.net | task hadcm3s_2wrx_2003_2_009071888_1 resumed by user
9/10/2014 2:07:58 AM | climateprediction.net | Starting task hadcm3s_2wrx_2003_2_009071888_1
9/10/2014 2:08:10 AM | climateprediction.net | Task hadcm3s_2wrx_2003_2_009071888_1 exited with zero status but no 'finished' file
9/10/2014 2:08:10 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 2:08:17 AM | climateprediction.net | task hadcm3s_2wrx_2003_2_009071888_1 suspended by user
9/10/2014 2:08:21 AM | climateprediction.net | Task hadcm3s_2wrx_2003_2_009071888_1 exited with zero status but no 'finished' file
9/10/2014 2:08:21 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 2:08:30 AM | climateprediction.net | task hadam3p_pnw_sbox_2011_1_009084419_0 resumed by user
9/10/2014 2:08:31 AM | climateprediction.net | Starting task hadam3p_pnw_sbox_2011_1_009084419_0
9/10/2014 2:08:46 AM | climateprediction.net | Task hadam3p_pnw_sbox_2011_1_009084419_0 exited with zero status but no 'finished' file
9/10/2014 2:08:46 AM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 2:08:57 AM | climateprediction.net | Computation for task hadam3p_pnw_sbox_2011_1_009084419_0 finished
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_1.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_2.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_3.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_4.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_5.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_6.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_7.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_8.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_9.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_10.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_11.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_12.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:08:57 AM | climateprediction.net | Output file hadam3p_pnw_sbox_2011_1_009084419_0_13.zip for task hadam3p_pnw_sbox_2011_1_009084419_0 absent
9/10/2014 2:09:01 AM | climateprediction.net | task hadam3p_pnw_sbox_2011_1_009084419_0 suspended by user |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Mike,
That's not the error log, just a BOINC log. The real error log is called Stderr, and is found on the page for each individual model. Click the + (plus) to expand the list.
What you posted is:
1) A list of the files that BOINC couldn't find when the model crashed. That's because the model didn't run for long enough to reach a zip creation point.
2) An indication that you're still using the default setting for one of the options. That's OK for other projects, but often fatal here. Climate models do NOT like being constantly interrupted, as has been said zillions of times.
Finally, please read the thread here about the problems with the latest batch of PNW models, especially the first post in the thread. |
Joined: 29 Mar 14 Posts: 20 Credit: 1,281,898 RAC: 0 |
Is there more information useful to you in the Error Log you mentioned in your last post? In any event, the following output from the Event Log is clearly not good. I've done 'reset project' many times, and it makes no difference to the WU processing. I've had this type of WU failure for at least the past 2 weeks. What should I do now, Les?
Mike
9/10/2014 12:07:24 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:07:24 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:07:36 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:07:36 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:07:48 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:07:48 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:07:59 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:07:59 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:08:11 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:08:11 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:08:22 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:08:22 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:08:34 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:08:34 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:08:45 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:08:45 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:08:56 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:08:56 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:09:08 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:09:08 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:09:19 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:09:19 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.
9/10/2014 12:09:31 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
9/10/2014 12:09:31 PM | climateprediction.net | If this happens repeatedly you may need to reset the project. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
You could try changing the setting for "Suspend work if CPU usage is above" to zero. Leaving the models alone may also work. That message doesn't indicate a fatal error is imminent, whereas a project reset is ALWAYS fatal for running models, because that's what it's supposed to do. And we can see the stderr list for your models, so there's no need to post them. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Another thing to check: "Leave tasks in memory while suspended?" This is best set to Yes for this project. |
Joined: 29 Mar 14 Posts: 20 Credit: 1,281,898 RAC: 0 |
Where do I find "Suspend work if CPU usage is above"? |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Tools > Computing preferences > Processor usage |
Joined: 29 Mar 14 Posts: 20 Credit: 1,281,898 RAC: 0 |
I must have an old version of BOINC; this option doesn't seem to be in Tools / Computing Preferences. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
It has been there for a long time; I just realised that the format is slightly different. In the "Computing allowed" section, underneath "only after computer idle for ... mins", you will see "While computer usage is less than .... percent (0 means no restriction)". For some people, setting this to 0 reduces the incidence of the messages you posted. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The options are also in the Computing Preferences section on your Account page on the project's server. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
You have to use either the web-based preferences set through this web site, or local preferences set locally via BOINC Manager. You can't mix the two. If you set even one value locally via BOINC Manager, the whole set is 'frozen in' and web-based preferences are ignored from that point on. |
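For anyone who would rather check the two settings suggested in this thread from a file than from BOINC Manager, here is a minimal Python sketch. It assumes that locally set preferences are stored in a file called global_prefs_override.xml in the BOINC data directory, and that the relevant tags are suspend_cpu_usage and leave_apps_in_memory. Both names are assumptions to verify against your own installation, and the sample XML is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Tag names are assumptions about BOINC's global_prefs_override.xml format;
# confirm them against the file on your own machine before relying on this.
RECOMMENDED = {
    "suspend_cpu_usage": 0.0,     # "Suspend work if CPU usage is above": 0 = no restriction
    "leave_apps_in_memory": 1.0,  # "Leave tasks in memory while suspended": yes
}

def check_prefs(xml_text):
    """Return {tag: (found_or_None, recommended)} for settings that differ."""
    root = ET.fromstring(xml_text)
    problems = {}
    for tag, wanted in RECOMMENDED.items():
        node = root.find(tag)  # direct child of the root element
        found = float(node.text) if node is not None and node.text else None
        if found != wanted:
            problems[tag] = (found, wanted)
    return problems

# Made-up sample content, not taken from any real machine.
sample = """<global_preferences>
  <suspend_cpu_usage>25.000000</suspend_cpu_usage>
  <leave_apps_in_memory>0</leave_apps_in_memory>
</global_preferences>"""

for tag, (found, wanted) in check_prefs(sample).items():
    print(f"{tag}: found {found}, thread suggests {wanted}")
```

If the override file does not exist, no local preferences have been set and the web-based preferences apply, so there is nothing for this check to read.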
Joined: 29 Mar 14 Posts: 20 Credit: 1,281,898 RAC: 0 |
You said "That message doesn't indicate a fatal error is imminent, whereas a project reset is ALWAYS fatal for running models, because that's what it's supposed to do."
Les, does this mean that I should not have done a "Project Reset"? (I did it after suspending the failing WU.) Is there something else I should do now that I have done a "Project Reset", e.g. reload the Climate Change programs?
Mike |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Probably not an issue. You said that models were failing anyway. When you get the occasional message of the type
9/10/2014 12:07:24 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file
it isn't a problem, but a long stream of them almost inevitably means the task ends up failing. The main thing to do is check the settings in Computing Preferences and then see if that resolves the issue. Other than making the settings more CPDN-friendly, I don't think there is much else you could have done. |
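To tell an occasional message from a long stream of them, the Event Log can be tallied per task. The following is a rough Python sketch that assumes one log entry per line in the "date | project | message" layout quoted in this thread. The threshold is a hypothetical number chosen for illustration, not a figure from CPDN.

```python
import re
from collections import Counter

# Matches the message text quoted in the Event Log excerpts in this thread.
PATTERN = re.compile(r"Task (\S+) exited with zero status but no 'finished' file")

def count_restarts(lines):
    """Count "no 'finished' file" messages per task name."""
    counts = Counter()
    for line in lines:
        # Event Log lines look like: "<date time> | <project> | <message>"
        parts = [p.strip() for p in line.split("|")]
        if len(parts) < 3:
            continue  # not a log entry in the expected layout
        match = PATTERN.search(parts[2])
        if match:
            counts[match.group(1)] += 1
    return counts

log = [
    "9/10/2014 12:07:24 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file",
    "9/10/2014 12:07:24 PM | climateprediction.net | If this happens repeatedly you may need to reset the project.",
    "9/10/2014 12:07:36 PM | climateprediction.net | Task hadcm3s_32ih_2003_2_009079324_1 exited with zero status but no 'finished' file",
]

THRESHOLD = 2  # hypothetical cut-off for "a long stream"; adjust to taste
for task, n in count_restarts(log).items():
    flag = "check preferences" if n >= THRESHOLD else "probably fine"
    print(f"{task}: {n} message(s), {flag}")
```

A count that keeps climbing every few seconds for the same task matches the pattern described above, where the task usually ends up failing.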