Message boards : Number crunching : hadam3p_pnw task not making progress
Author | Message |
---|---|
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
I have just (on the advice of the project team) aborted a task from the new batch of WAH PNW tasks (hadam3p_pnw_w1wr_2006_1_009087617_0). After 35 minutes it was still stuck at 0% with no checkpoints made, with less than 1 second CPU time for the worker processes. Going by the contents of the task's datain, dataout and jobs directories it had been set up properly. I restarted BOINC and the task still wasn't showing any significant CPU time after 25 minutes (1.765 seconds for the controller process, 0.203 seconds for the global worker and 0.140 seconds for the regional worker). Andy said this task was from batch 82 which will have task names starting with hadam3p_pnw_w0ny_ through to hadam3p_pnw_w20j_. If anyone experiences the same problem with a task from this batch you should abort it and report the problem here. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
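For anyone wanting to check a list of task names against the range quoted above, here is a minimal sketch. It assumes (this is not confirmed by the project) that the four-character run ID after `hadam3p_pnw_` sorts in plain ASCII order, digits before letters, so that a simple string comparison between `w0ny` and `w20j` is meaningful. The function name `in_batch_82` is made up for illustration.

```python
import re

# Assumption: run IDs sort in plain ASCII order (digits before letters),
# so the quoted first/last names bound the batch as strings.
BATCH_82_FIRST = "w0ny"
BATCH_82_LAST = "w20j"

# Captures the four-character run ID that follows the model prefix.
TASK_NAME = re.compile(r"^hadam3p_pnw_([0-9a-z]{4})_")


def in_batch_82(task_name: str) -> bool:
    """Return True if the task's run ID lies within the quoted range."""
    match = TASK_NAME.match(task_name)
    if match is None:
        return False
    run_id = match.group(1)
    return BATCH_82_FIRST <= run_id <= BATCH_82_LAST


if __name__ == "__main__":
    for name in (
        "hadam3p_pnw_w1wr_2006_1_009087617_0",
        "hadam3p_pnw_sbu3_2011_1_009084605_2",
    ):
        print(name, in_batch_82(name))
```

If the real ID ordering differs from ASCII order, the comparison would need replacing; the task name itself is the only input used.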
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,036,409 RAC: 14,604 |
I've just had 10 tasks from PNW series w1?? all crash with computation error within 30 seconds of starting :-
08/10/2014 12:00:30 | climateprediction.net | Computation for task hadam3p_pnw_w1jb_2005_1_009087133_0 finished
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_1.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_2.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_3.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_4.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_5.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_6.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_7.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_8.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_9.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_10.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_11.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_12.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_13.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
or similar for all 10. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
CHAVK these may be the same problem as described in this thread http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7908 where an antivirus program is truncating the file or stopping it downloading. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,036,409 RAC: 14,604 |
Both BOINC folders are excluded from my virus checker (Microsoft Security Essentials) and I haven't experienced problems before. I have HADCM3 short running at the moment and they appear to be OK. I did get a Fortran error flagged as well with the PNW runs. |
Send message Joined: 8 Jul 05 Posts: 33 Credit: 1,274,211 RAC: 0 |
hadam3p_pnw_w1j9_2003_1_009087131_0 Put up a big fortran run time error box. Forgot to screenshot it before I quit it. D'oh! |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
hadam3p_pnw_w1i6_2006_1_009087092_1 has also been aborted after failing to make progress. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Send message Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Put me down for a long list of about 40 failures on PNW. In addition to _w0ny_ through to _w20j_, I'm having other failures e.g. _ukd1_ & _uhe6_. In fact all the latest PNW tasks have failed. Looking back, pretty mixed results on PNW when it comes to errors/completed. If this keeps up I'll have to deselect PNW as well as the HadCM3 Short. |
Send message Joined: 19 Dec 05 Posts: 6 Credit: 2,011,429 RAC: 0 |
My two crashed immediately upon completion of big download. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Probably misconfiguration of ancillary files. Programmers have emailed the scientists responsible and asked them to look into it. |
Send message Joined: 26 Apr 14 Posts: 7 Credit: 78,072 RAC: 0 |
Same problem here. I had the big Fortran warning too. I've been battling problems with CPDN the last few days but most of it has been to do with my antivirus (Bitdefender). Generally I've been unable to download hadam3p_pnw_7.22_windows_intelx86.exe and hadcm3s_7.24_windows_intelx86.exe due to Bitdefender blocking the web access. I thought I got around it finally this arvo by downloading these programs on my Linux machine then thumb driving them over to my Windows machine (whilst authorising some dangerous exceptions for Bitdefender). In the last hour I've finally been able to completely download a model, in this case 3 Pacific North West models, without DOWNLOAD FAILED showing up. But now as soon as I've tried to run these models I get the same probs as the OP:
9/10/2014 4:46:44 PM | climateprediction.net | task hadam3p_pnw_w1gw_2009_1_009087046_1 resumed by user
9/10/2014 4:46:45 PM | climateprediction.net | Starting task hadam3p_pnw_w1gw_2009_1_009087046_1
9/10/2014 4:47:17 PM | climateprediction.net | Computation for task hadam3p_pnw_w1gw_2009_1_009087046_1 finished
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_1.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_2.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_3.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_4.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_5.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_6.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_7.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_8.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_9.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_10.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_11.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_12.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_13.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:49:12 PM | climateprediction.net | Sending scheduler request: To report completed tasks.
9/10/2014 4:49:12 PM | climateprediction.net | Reporting 1 completed tasks
9/10/2014 4:49:12 PM | climateprediction.net | Not requesting tasks: "no new tasks" requested via Manager
9/10/2014 4:49:24 PM | climateprediction.net | Scheduler request completed
9/10/2014 4:52:01 PM | climateprediction.net | task hadam3p_pnw_sbu3_2011_1_009084605_2 resumed by user
9/10/2014 4:52:02 PM | climateprediction.net | Starting task hadam3p_pnw_sbu3_2011_1_009084605_2
9/10/2014 4:52:21 PM | climateprediction.net | Computation for task hadam3p_pnw_sbu3_2011_1_009084605_2 finished
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_1.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_2.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_3.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_4.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_5.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_6.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_7.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_8.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_9.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_10.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_11.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_12.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_13.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
I've added exceptions for boinc.exe and also the Program/Data and Program Files/Boinc to Bitdefender (I've also contacted Bitdefender about this, haven't heard anything yet). I'm not really sure where to go from here. Help please? |
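Long pastes like the one above are easier to compare when boiled down to one row per task. The following is a rough sketch, not a BOINC tool: it assumes event log lines have the `timestamp | project | message` shape seen in this thread, and that the timestamp is day/month/year with a 12-hour clock (other posters' logs use a 24-hour clock, so `stamp_format` is a parameter). The function name `summarise` is hypothetical.

```python
import re
from datetime import datetime

# "timestamp | project | message", as pasted from the BOINC event log.
LINE = re.compile(r"^(?P<stamp>[^|]+?)\s*\|\s*(?P<project>[^|]+?)\s*\|\s*(?P<text>.*)$")
STARTED = re.compile(r"^Starting task (\S+)$")
FINISHED = re.compile(r"^Computation for task (\S+) finished$")
ABSENT = re.compile(r"^Output file \S+ for task (\S+) absent$")


def summarise(log_lines, stamp_format="%d/%m/%Y %I:%M:%S %p"):
    """Return {task: {"run_seconds": float or None, "absent_outputs": int}}."""
    started = {}
    summary = {}
    for raw in log_lines:
        match = LINE.match(raw.strip())
        if match is None:
            continue
        try:
            stamp = datetime.strptime(match.group("stamp"), stamp_format)
        except ValueError:
            continue  # timestamp in some other format; skip the line
        text = match.group("text").strip()

        hit = STARTED.match(text)
        if hit:
            started[hit.group(1)] = stamp
            continue

        hit = FINISHED.match(text) or ABSENT.match(text)
        if not hit:
            continue
        task = hit.group(1)
        entry = summary.setdefault(task, {"run_seconds": None, "absent_outputs": 0})
        if text.endswith("finished"):
            if task in started:
                entry["run_seconds"] = (stamp - started[task]).total_seconds()
        else:
            entry["absent_outputs"] += 1
    return summary


if __name__ == "__main__":
    task = "hadam3p_pnw_w1gw_2009_1_009087046_1"
    sample = [
        f"9/10/2014 4:46:45 PM | climateprediction.net | Starting task {task}",
        f"9/10/2014 4:47:17 PM | climateprediction.net | Computation for task {task} finished",
        f"9/10/2014 4:47:17 PM | climateprediction.net | Output file {task}_1.zip for task {task} absent",
    ]
    print(summarise(sample))
```

A task that finishes within seconds of starting and has every output file absent matches the failure pattern reported in this thread.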
Send message Joined: 26 Apr 14 Posts: 7 Credit: 78,072 RAC: 0 |
I ran the 3rd model and got the Fortran error again. I've copied it but I don't know how to add it here. Sorry! |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,826,970 RAC: 5,066 |
"I ran the 3rd model and got the Fortran error again. I've copied it but I don't know how to add it here. Sorry!" ... when you create a new post there is a link to the left - "Use BBCode tags to format your text". Follow that link and it will tell you how to add pictures and perform other formatting or linking tasks. If you want to display a picture you'll have to load the picture onto some public storage first - the message board doesn't store the images itself. |
Send message Joined: 26 Apr 14 Posts: 7 Credit: 78,072 RAC: 0 |
Ah ok, cheers Iain. I won't add the image in that case, but if anyone wants/needs to view it (for diagnostic purposes) I'll happily email it. |
Send message Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
My model w10p failed in the manner described. The FORTRAN error message displayed as this:
forrtl: severe(17): syntax error in NAMELIST input, unit 5, file C:\ProgramData\BOINC\projects\climateprediction.net\hadam3p_pnw_w10p_2007_1_009086463\jobs\xaakm.namelists, line 631, position 0
Image              PC        Routine  Line     Source
hadam3p_pnw_um_7.  0052881A  Unknown  Unknown  Unknown
hadam3p_pnw_um_7.  004D4B70  Unknown  Unknown  Unknown
hadam3p_pnw_um_7.  004D3D4A  Unknown  Unknown  Unknown
hadam3p_pnw_um_7.  004B343F  Unknown  Unknown  Unknown
hadam3p_pnw_um_7.  00237A29  Unknown  Unknown  Unknown
hadam3p_pnw_um_7.  00340DF4  Unknown  Unknown  Unknown
hadam3p_pnw_um_7.  0022A07B  Unknown  Unknown  Unknown
hadam3p_pnw_um_7.  0023FEC6  Unknown  Unknown  Unknown
hadam3p_pnw_um_7.  0023FEC6  Unknown  Unknown  Unknown
Stack trace terminated abnormally |
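The first line of that message already says which file and line the runtime objected to. As a small sketch (the helper names `parse_forrtl` and `context` are invented here, and the regex is only fitted to the message quoted above), the following pulls out the unit, file, line and position, and shows how one might list the lines around the reported position if the namelist file is still on disk.

```python
import re

# Fitted to the single message quoted in this thread; other forrtl
# messages may be worded differently.
FORRTL = re.compile(
    r"forrtl: severe\s*\((?P<code>\d+)\):\s*(?P<reason>.+?),\s*unit (?P<unit>\d+),\s*"
    r"file (?P<file>.+?),\s*line (?P<line>\d+),\s*position (?P<position>\d+)"
)


def parse_forrtl(message):
    """Return the fields of a forrtl NAMELIST error as a dict, or None."""
    match = FORRTL.search(message)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("code", "unit", "line", "position"):
        info[key] = int(info[key])
    return info


def context(lines, line_no, radius=2):
    """Return (number, text) pairs around a 1-based line number."""
    first = max(1, line_no - radius)
    last = min(len(lines), line_no + radius)
    return [(n, lines[n - 1]) for n in range(first, last + 1)]


if __name__ == "__main__":
    message = (
        r"forrtl: severe(17): syntax error in NAMELIST input, unit 5, "
        r"file C:\ProgramData\BOINC\projects\climateprediction.net"
        r"\hadam3p_pnw_w10p_2007_1_009086463\jobs\xaakm.namelists, "
        r"line 631, position 0"
    )
    info = parse_forrtl(message)
    print(info["file"], info["line"])
    # With the file available, the neighbouring lines could be listed via:
    #   with open(info["file"]) as fh:
    #       for n, text in context(fh.read().splitlines(), info["line"]):
    #           print(n, text)
```

Posting the few lines around line 631 of `xaakm.namelists` would probably be more useful to the project team than a screenshot of the error box.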
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,036,409 RAC: 14,604 |
Just had another four PNWs terminate with a computation error and the same Fortran error as the previous post. |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,826,970 RAC: 5,066 |
Thankfully two PNW downloaded onto a Mac ended rapidly with "Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH". I say thankfully because HADAM3P_PNW 7.22 and HADAM3P Moses II 7.03 always end with error code 9 on my Mac even after uploading a full set of Zips, which means that the model gets pointlessly re-run by someone else. PNW now excluded for that machine. [Edit: If the same happens for the upcoming AFR application I won't be able to run anything ...] |
Send message Joined: 26 Apr 14 Posts: 7 Credit: 78,072 RAC: 0 |
Iain, is it possible to specifically exclude models, such as PNW? Or are you just aborting them if they get downloaded? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Selection of model types is done in the climateprediction.net preferences section of your account on the project. Be sure to also untick "If no work for selected applications is available, accept work from other applications?" or else you're still likely to get some. |
Send message Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Like Iain, I'm also getting the REPLANCA error on PNW and have now deselected those tasks, but notice that run has finished anyway. Looking around, similar PCs are doing OKish with HadCM3 Short so I'll try those again. Other than that there is not much to do. Let's hope for some better programming in the future. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,036,409 RAC: 14,604 |
Similar replanca errors:-
<stderr_txt>
Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH
tmp/xaakm.pipe_dummy 2048
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3568, selfPID=3568, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3568, selfPID=6216, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
19:09:57 (6216): called boinc_finish
</stderr_txt>
<message>
upload failure:
<file_xfer_error>
<file_name>hadam3p_pnw_sc4w_2011_1_009084994_2_1.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_pnw_sc4w_2011_1_009084994_2_2.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
for most of my PNW fails. |
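The pasted result text is not well-formed XML (the `<message>` element is never closed in the excerpt), so an XML parser would likely reject it. A regex-based sketch such as the one below can still list the files that failed to upload and their error codes. The function name `upload_failures` is made up, and the pattern is only fitted to the excerpt above.

```python
import re

# Matches one <file_xfer_error> entry as it appears in the excerpt;
# whitespace between tags may be spaces or newlines.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>[^)]*)\)</error_code>\s*"
    r"</file_xfer_error>"
)


def upload_failures(result_text):
    """Return a list of (file name, error code, reason) tuples."""
    return [
        (m.group("name").strip(), int(m.group("code")), m.group("reason"))
        for m in XFER_ERROR.finditer(result_text)
    ]


if __name__ == "__main__":
    sample = """
    <message> upload failure:
    <file_xfer_error>
      <file_name>hadam3p_pnw_sc4w_2011_1_009084994_2_1.zip</file_name>
      <error_code>-161 (not found)</error_code>
    </file_xfer_error>
    <file_xfer_error>
      <file_name>hadam3p_pnw_sc4w_2011_1_009084994_2_2.zip</file_name>
      <error_code>-161 (not found)</error_code>
    </file_xfer_error>
    """
    for name, code, reason in upload_failures(sample):
        print(name, code, reason)
```

In this thread the `-161 (not found)` entries simply reflect that the model crashed before writing any of its output zips, so they are a symptom rather than a separate upload problem.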