climateprediction.net (CPDN) home page
Thread 'hadam3p_pnw task not making progress'

Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 50407 - Posted: 8 Oct 2014, 14:39:05 UTC

I have just (on the advice of the project team) aborted a task from the new batch of WAH PNW tasks (hadam3p_pnw_w1wr_2006_1_009087617_0).

After 35 minutes it was still stuck at 0% with no checkpoints made, with less than 1 second CPU time for the worker processes. Going by the contents of the task's datain, dataout and jobs directories it had been set up properly.

I restarted BOINC and the task still wasn't showing any significant CPU time after 25 minutes (1.765 seconds for the controller process, 0.203 seconds for the global worker and 0.140 seconds for the regional worker).

Andy said this task was from batch 82 which will have task names starting with hadam3p_pnw_w0ny_ through to hadam3p_pnw_w20j_. If anyone experiences the same problem with a task from this batch you should abort it and report the problem here.
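For anyone with a long task list, the batch 82 range quoted above can be checked mechanically. This is only a minimal Python sketch, not a project tool. It assumes the four-character identifier after `hadam3p_pnw_` is ordered like a base-36 number (digits before letters), which fits the task names quoted in this thread but has not been confirmed by the project team. The function name `in_batch_82` is made up for illustration.

```python
import re

# Assumed name layout: hadam3p_pnw_<4-char id>_<year>_..., taken from names in this thread.
TASK_RE = re.compile(r"^hadam3p_pnw_([0-9a-z]{4})_")

# Range endpoints as quoted in the post (batch 82).
BATCH_82_FIRST = "w0ny"
BATCH_82_LAST = "w20j"


def in_batch_82(task_name):
    """Return True if the task's 4-char id lies in w0ny..w20j (base-36 order, an assumption)."""
    match = TASK_RE.match(task_name)
    if match is None:
        return False
    ident = int(match.group(1), 36)
    return int(BATCH_82_FIRST, 36) <= ident <= int(BATCH_82_LAST, 36)


if __name__ == "__main__":
    for name in ("hadam3p_pnw_w1wr_2006_1_009087617_0",
                 "hadam3p_pnw_sbu3_2011_1_009084605_2"):
        print(name, in_batch_82(name))
```

Tasks outside the range (for example the `sbu3` one reported later in the thread) would need to be reported separately, since they do not belong to this batch under the assumption above.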
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 50407
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,036,409
RAC: 14,604
Message 50408 - Posted: 8 Oct 2014, 14:54:24 UTC - in response to Message 50407.  
Last modified: 8 Oct 2014, 14:54:48 UTC

I've just had 10 tasks from PNW series w1?? all crash with a computation error within 30 seconds of starting:

08/10/2014 12:00:30 | climateprediction.net | Computation for task hadam3p_pnw_w1jb_2005_1_009087133_0 finished
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_1.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_2.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_3.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_4.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_5.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_6.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_7.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_8.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_9.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_10.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_11.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_12.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent
08/10/2014 12:00:30 | climateprediction.net | Output file hadam3p_pnw_w1jb_2005_1_009087133_0_13.zip for task hadam3p_pnw_w1jb_2005_1_009087133_0 absent

or similar for all 10.
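When many tasks fail like this, counting the "absent" lines per task is quicker than reading the log by eye. Below is a rough Python sketch that works on event log lines in the format pasted above. The function name `count_absent_outputs` is hypothetical, and the regular expression is based only on the lines shown in this thread, so other BOINC client versions may word the message differently.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... | climateprediction.net | Output file <file> for task <task> absent
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def count_absent_outputs(log_lines):
    """Count how many output files were reported absent for each task."""
    counts = Counter()
    for line in log_lines:
        match = ABSENT_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "08/10/2014 12:00:30 | climateprediction.net | Computation for task "
        "hadam3p_pnw_w1jb_2005_1_009087133_0 finished",
        "08/10/2014 12:00:30 | climateprediction.net | Output file "
        "hadam3p_pnw_w1jb_2005_1_009087133_0_1.zip for task "
        "hadam3p_pnw_w1jb_2005_1_009087133_0 absent",
    ]
    for task, missing in count_absent_outputs(sample).items():
        print(task, missing)
```

A task whose whole set of zips is reported absent within seconds of starting crashed before producing any output, which is what the logs in this thread show.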
ID: 50408
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 50409 - Posted: 8 Oct 2014, 15:08:09 UTC

CHAVK these may be the same problem as described in this thread http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7908 where an antivirus program is truncating the file or stopping it downloading.
ID: 50409
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,036,409
RAC: 14,604
Message 50410 - Posted: 8 Oct 2014, 15:34:40 UTC - in response to Message 50409.  

Both BOINC folders are excluded from my virus checker (Microsoft Security Essentials) and I haven't experienced problems before. I have HadCM3 Short running at the moment and they appear to be OK. I did get a Fortran error flagged as well with the PNW runs.
ID: 50410
KWSN Sir Clark
Joined: 8 Jul 05
Posts: 33
Credit: 1,274,211
RAC: 0
Message 50411 - Posted: 8 Oct 2014, 15:59:56 UTC

hadam3p_pnw_w1j9_2003_1_009087131_0

Put up a big Fortran run-time error box. Forgot to screenshot it before I quit it. D'oh!
ID: 50411
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 50413 - Posted: 8 Oct 2014, 16:41:30 UTC

hadam3p_pnw_w1i6_2006_1_009087092_1 has also been aborted after failing to make progress.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 50413
MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 50422 - Posted: 9 Oct 2014, 2:33:54 UTC - in response to Message 50407.  

Put me down for a long list of about 40 failures on PNW. In addition to _w0ny_ through to _w20j_, I'm having other failures e.g. _ukd1_ & _uhe6_. In fact all the latest PNW tasks have failed.

Looking back, pretty mixed results on PNW when it comes to errors/completed.

If this keeps up I'll have to deselect PNW as well as the HadCM3 Short.
ID: 50422
Tiggus
Joined: 19 Dec 05
Posts: 6
Credit: 2,011,429
RAC: 0
Message 50435 - Posted: 9 Oct 2014, 8:49:03 UTC - in response to Message 50407.  

My two crashed immediately upon completion of the big download.
ID: 50435
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 50438 - Posted: 9 Oct 2014, 9:01:09 UTC

Probably misconfiguration of ancillary files. Programmers have emailed the scientists responsible and asked them to look into it.
ID: 50438
Jami
Joined: 26 Apr 14
Posts: 7
Credit: 78,072
RAC: 0
Message 50439 - Posted: 9 Oct 2014, 9:04:59 UTC - in response to Message 50435.  

Same problem here. I had the big Fortran warning too.
I've been battling problems with CPDN the last few days but most of it has been to do with my antivirus (Bitdefender).

Generally I've been unable to download hadam3p_pnw_7.22_windows_intelx86.exe and hadcm3s_7.24_windows_intelx86.exe because Bitdefender blocks the web access. I thought I had finally got around it this arvo by downloading these programs on my Linux machine and thumb-driving them over to my Windows machine (whilst authorising some dangerous exceptions for Bitdefender). In the last hour I've finally been able to completely download a model, in this case 3 Pacific North West models, without DOWNLOAD FAILED showing up.

But now as soon as I try to run these models I get the same problems as the OP:

9/10/2014 4:46:44 PM | climateprediction.net | task hadam3p_pnw_w1gw_2009_1_009087046_1 resumed by user
9/10/2014 4:46:45 PM | climateprediction.net | Starting task hadam3p_pnw_w1gw_2009_1_009087046_1
9/10/2014 4:47:17 PM | climateprediction.net | Computation for task hadam3p_pnw_w1gw_2009_1_009087046_1 finished
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_1.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_2.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_3.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_4.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_5.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_6.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_7.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_8.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_9.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_10.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_11.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_12.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:47:17 PM | climateprediction.net | Output file hadam3p_pnw_w1gw_2009_1_009087046_1_13.zip for task hadam3p_pnw_w1gw_2009_1_009087046_1 absent
9/10/2014 4:49:12 PM | climateprediction.net | Sending scheduler request: To report completed tasks.
9/10/2014 4:49:12 PM | climateprediction.net | Reporting 1 completed tasks
9/10/2014 4:49:12 PM | climateprediction.net | Not requesting tasks: "no new tasks" requested via Manager
9/10/2014 4:49:24 PM | climateprediction.net | Scheduler request completed
9/10/2014 4:52:01 PM | climateprediction.net | task hadam3p_pnw_sbu3_2011_1_009084605_2 resumed by user
9/10/2014 4:52:02 PM | climateprediction.net | Starting task hadam3p_pnw_sbu3_2011_1_009084605_2
9/10/2014 4:52:21 PM | climateprediction.net | Computation for task hadam3p_pnw_sbu3_2011_1_009084605_2 finished
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_1.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_2.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_3.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_4.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_5.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_6.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_7.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_8.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_9.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_10.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_11.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_12.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent
9/10/2014 4:52:21 PM | climateprediction.net | Output file hadam3p_pnw_sbu3_2011_1_009084605_2_13.zip for task hadam3p_pnw_sbu3_2011_1_009084605_2 absent

I've added exceptions to Bitdefender for boinc.exe and also for Program/Data and Program Files/Boinc (I've also contacted Bitdefender about this but haven't heard anything yet).

I'm not really sure where to go from here. Help please?
ID: 50439
Jami
Joined: 26 Apr 14
Posts: 7
Credit: 78,072
RAC: 0
Message 50440 - Posted: 9 Oct 2014, 9:09:44 UTC - in response to Message 50439.  
Last modified: 9 Oct 2014, 9:10:14 UTC

I ran the 3rd model and got the Fortran error again. I've copied it but I don't know how to add it here. Sorry!
ID: 50440
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,826,970
RAC: 5,066
Message 50441 - Posted: 9 Oct 2014, 9:32:22 UTC - in response to Message 50440.  

I ran the 3rd model and got the Fortran error again. I've copied it but I don't know how to add it here. Sorry!

... when you create a new post there is a link to the left - "Use BBCode tags to format your text". Follow that link and it will tell you how to add pictures and perform other formatting or linking tasks. If you want to display a picture you'll have to load the picture onto some public storage first - the message board doesn't store the images itself.
ID: 50441
Jami
Joined: 26 Apr 14
Posts: 7
Credit: 78,072
RAC: 0
Message 50442 - Posted: 9 Oct 2014, 9:39:50 UTC - in response to Message 50441.  
Last modified: 9 Oct 2014, 9:40:24 UTC

Ah ok, cheers Iain.
I won't add the image in that case, but if anyone wants/needs to view it (for diagnostic purposes) I'll happily email it.
ID: 50442
Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 50444 - Posted: 9 Oct 2014, 16:13:10 UTC

My model w10p failed in the manner described. The FORTRAN error message displayed as this:

forrtl: severe(17): syntax error in NAMELIST input, unit 5, file
C:\ProgramData\BOINC\projects\climateprediction.net\hadam3p_pnw_w10p_2007_1_009086463\jobs\xaakm.namelists, line 631, position 0
Image PC Routine Line Source
hadam3p_pnw_um_7. 0052881A Unknown Unknown Unknown
hadam3p_pnw_um_7. 004D4B70 Unknown Unknown Unknown
hadam3p_pnw_um_7. 004D3D4A Unknown Unknown Unknown
hadam3p_pnw_um_7. 004B343F Unknown Unknown Unknown
hadam3p_pnw_um_7. 00237A29 Unknown Unknown Unknown
hadam3p_pnw_um_7. 00340DF4 Unknown Unknown Unknown
hadam3p_pnw_um_7. 0022A07B Unknown Unknown Unknown
hadam3p_pnw_um_7. 0023FEC6 Unknown Unknown Unknown
hadam3p_pnw_um_7. 0023FEC6 Unknown Unknown Unknown

Stack trace terminated abnormally
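The useful part of that message is the file name and line number, since it says exactly where the model's NAMELIST input stopped parsing. As a hedged illustration, here is a small Python sketch that pulls those fields out of a pasted `forrtl` message and lists the lines around the reported position. The helper names `parse_forrtl` and `show_context` are made up for this example, and the pattern is derived only from the single message quoted above, so other error texts may not match.

```python
import re

# Pattern based on the one message quoted in this thread; the path may sit on the next line.
FORRTL_RE = re.compile(
    r"forrtl:\s*severe\s*\((\d+)\):\s*(.+?),\s*unit\s+(\d+),\s*file\s+(.+?),"
    r"\s*line\s+(\d+),\s*position\s+(\d+)",
    re.DOTALL,
)


def parse_forrtl(text):
    """Extract error code, message, unit, file path, line and position, or None."""
    match = FORRTL_RE.search(text)
    if match is None:
        return None
    code, message, unit, path, line, position = match.groups()
    return {
        "code": int(code),
        "message": message.strip(),
        "unit": int(unit),
        "path": path.strip(),
        "line": int(line),
        "position": int(position),
    }


def show_context(path, line_no, radius=2):
    """Return the numbered lines within `radius` of line_no (1-based) in a text file."""
    with open(path, errors="replace") as handle:
        lines = handle.read().splitlines()
    start = max(line_no - 1 - radius, 0)
    stop = min(line_no + radius, len(lines))
    return [f"{i + 1}: {lines[i]}" for i in range(start, stop)]
```

On an affected machine, something like `show_context(info["path"], info["line"])` would show what sits at line 631 of the namelists file, provided the task's jobs directory has not yet been cleaned up.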
ID: 50444
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,036,409
RAC: 14,604
Message 50445 - Posted: 9 Oct 2014, 18:14:04 UTC - in response to Message 50444.  

Just had another four PNWs terminate with a computation error and the same Fortran error as the previous post.

ID: 50445
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,826,970
RAC: 5,066
Message 50453 - Posted: 10 Oct 2014, 0:06:39 UTC
Last modified: 10 Oct 2014, 0:09:21 UTC

Thankfully two PNW downloaded onto a Mac ended rapidly with "Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH". I say thankfully because HADAM3P_PNW 7.22 and HADAM3P Moses II 7.03 always end with error code 9 on my Mac even after uploading a full set of Zips, which means that the model gets pointlessly re-run by someone else. PNW now excluded for that machine.

[Edit: If the same happens for the upcoming AFR application I won't be able to run anything ...]
ID: 50453
Jami
Joined: 26 Apr 14
Posts: 7
Credit: 78,072
RAC: 0
Message 50455 - Posted: 10 Oct 2014, 0:40:28 UTC - in response to Message 50453.  

Iain,
Is it possible to specifically exclude models, such as PNW? Or are you just aborting them if they get downloaded?
ID: 50455
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50456 - Posted: 10 Oct 2014, 5:41:28 UTC - in response to Message 50455.  

Selection of model types is done in the climateprediction.net preferences section of your account on the project.
Be sure to also untick "If no work for selected applications is available, accept work from other applications?" or else you're still likely to get some.

ID: 50456
MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 50459 - Posted: 10 Oct 2014, 10:05:37 UTC - in response to Message 50453.  

Like Iain, I'm also getting the REPLANCA error on PNW and have now deselected those tasks, but I notice that the run has finished anyway. Looking around, similar PCs are doing OKish with HadCM3 Short so I'll try those again. Other than that there is not much to do.

Let's hope for some better programming in the future.
ID: 50459
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,036,409
RAC: 14,604
Message 50463 - Posted: 10 Oct 2014, 13:41:10 UTC - in response to Message 50459.  

Similar REPLANCA errors:

<stderr_txt>

Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xaakm.pipe_dummy 2048
Regional Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3568, selfPID=3568, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=3568, selfPID=6216, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
19:09:57 (6216): called boinc_finish

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>hadam3p_pnw_sc4w_2011_1_009084994_2_1.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>hadam3p_pnw_sc4w_2011_1_009084994_2_2.zip</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>

for most of my PNW fails.
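To compare many failed results, the crash reason and the list of missing uploads can be pulled out of the task's stderr text. This is a minimal Python sketch under stated assumptions: it uses regular expressions rather than an XML parser because the pasted output is a fragment, and the function name `summarise_stderr` is invented for this example.

```python
import re

# "Model crashed: <reason>" as it appears in the stderr text quoted above.
CRASH_RE = re.compile(r"^Model crashed:\s*(.+)$", re.MULTILINE)
# Each <file_xfer_error> names a file and gives a numeric error code.
XFER_RE = re.compile(r"<file_name>([^<]+)</file_name>\s*<error_code>(-?\d+)")


def summarise_stderr(stderr_text):
    """Return (crash_reason or None, [(file_name, error_code), ...])."""
    crash = CRASH_RE.search(stderr_text)
    reason = crash.group(1).strip() if crash else None
    missing = [(name, int(code)) for name, code in XFER_RE.findall(stderr_text)]
    return reason, missing


if __name__ == "__main__":
    sample = (
        "Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH "
        "tmp/xaakm.pipe_dummy 2048\n"
        "<file_xfer_error>\n"
        "<file_name>hadam3p_pnw_sc4w_2011_1_009084994_2_1.zip</file_name>\n"
        "<error_code>-161 (not found)</error_code>\n"
        "</file_xfer_error>\n"
    )
    print(summarise_stderr(sample))
```

Grouping results by the returned reason would separate the REPLANCA crashes from the NAMELIST syntax errors reported earlier in the thread.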
ID: 50463