climateprediction.net (CPDN) home page
Thread 'hadam3p_eu crash 45 seconds in.'

Message boards : Number crunching : hadam3p_eu crash 45 seconds in.

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42578 - Posted: 5 Jul 2011, 11:41:44 UTC - in response to Message 42573.  

metalius

The "zip file missing" is not an error message.
It's just BOINC saying that it can't find the zip files to upload them.
That's expected, because the model didn't run long enough to produce the files.
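For anyone who wants to check their own BOINC event log for these lines, here is a minimal Python sketch. It assumes the log has been copied out as plain text in the format shown later in this thread ("Output file ... for task ... absent"); the function name `absent_outputs` is hypothetical and not part of BOINC. It only groups the missing zip files by task; it does not tell you why a model crashed, which you still have to read from Stderr on the task's page.

```python
import re
from collections import defaultdict

# Matches event-log lines like:
#   "... Output file <name>.zip for task <task> absent"
# (assumed format, copied from the BOINC log posted in this thread).
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def absent_outputs(log_lines):
    """Return {task_name: [missing output files]} for every 'absent' line.

    A task with all of its zip files absent most likely crashed before
    writing any output; the reason itself is in Stderr, not in this log.
    """
    missing = defaultdict(list)
    for line in log_lines:
        match = ABSENT_RE.search(line)
        if match:
            filename, task = match.groups()
            missing[task].append(filename)
    return dict(missing)


if __name__ == "__main__":
    sample = [
        "1018 climateprediction.net 09/07/2011 21:17:23 Computation for task "
        "hadcm3n_yjmu_1900_40_007358304_1 finished",
        "1019 climateprediction.net 09/07/2011 21:17:23 Output file "
        "hadcm3n_yjmu_1900_40_007358304_1_1.zip for task "
        "hadcm3n_yjmu_1900_40_007358304_1 absent",
        "1020 climateprediction.net 09/07/2011 21:17:23 Output file "
        "hadcm3n_yjmu_1900_40_007358304_1_2.zip for task "
        "hadcm3n_yjmu_1900_40_007358304_1 absent",
    ]
    # Prints one summary line per task that reported absent outputs.
    for task, files in absent_outputs(sample).items():
        print(f"{task}: {len(files)} output file(s) absent")
```

Lines that don't match the pattern (such as "Computation for task ... finished") are simply skipped, so the whole log can be fed in unfiltered.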


Backups: Here
ID: 42578
Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 42581 - Posted: 5 Jul 2011, 13:44:58 UTC - in response to Message 42566.  


Les Bayliss wrote:
<quote>
Please provide a link to them, so that I can have a look at the batch numbers.
I'll leave a message for the project people for when they're out of bed and at work.
For anyone else who wants to provide details, it's preferable to use links like Byron's and Darmok's,
as it saves having to go to each linked model to see the name.
</quote>

Hi Les,

hadam3p_eu_2j0r_1960_1_007340205
hadam3p_eu_2k1o_1990_1_007340639
hadam3p_eu_2lhm_1980_1_007341230
hadam3p_eu_2k3r_1960_1_007340663
hadam3p_eu_vc6a_1996_1_007339245
hadam3p_eu_vb11_1999_1_007339567
hadam3p_eu_vab2_1976_1_007339526
hadam3p_eu_var9_1982_1_007338985
hadam3p_eu_2qef_1990_1_007341600
hadam3p_eu_va51_1999_1_007339312
hadam3p_eu_v9uh_2002_1_007338996
hadam3p_eu_vbww_1993_1_007337686
hadam3p_eu_vbwh_1978_1_007337685
hadam3p_eu_vbwa_1971_1_007337684
hadam3p_eu_vbvn_1996_1_007337682

Byron
ID: 42581
Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 42584 - Posted: 6 Jul 2011, 0:08:33 UTC - in response to Message 42547.  

Oh, man, I don't envy you.

Data-dependent problems in a code base of N million lines, tested over many years, with the BOINC infrastructure to deal with too? And maybe the compiler?
I totally apologize for my earlier grumpiness, but hey, maybe some procedural checks could catch these kinds of things sooner?

Totally supportive,

Eric


Jonathan wrote:
<quote>
We have been investigating the problem with the Hadam3p work units. It appears that the crash is caused by the combination of two perfectly normal forcing files.

The SST and SI files were altered in the previous suspect run. If either of these files is replaced by its previous version, the model runs perfectly well. The crash only occurs when both files are specified as inputs to the same work unit.

We are running tests on the Met Office Unified Model (UM) to try to find out why this should be the case.

In the meantime, the current release of Hadam3p work units consists of resubmissions of proven work units. These should be fully functional, since we are extending the duration of previous experimental runs.

Jonathan
</quote>


ID: 42584
Jord
Joined: 5 Aug 04
Posts: 250
Credit: 93,274
RAC: 0
Message 42594 - Posted: 9 Jul 2011, 16:24:02 UTC
Last modified: 9 Jul 2011, 16:25:27 UTC

My task http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=13135246 crashed within 11 seconds of starting.

Model crashed: INITDUMP: Wrong no of atmos prognostic fields tmp/xaakm.pipe_dummy 2048
Leaving CPDN_Main::Monitor...
Called boinc_finish

Edit: I see that all the spaces go missing, but believe me, there are plenty of them in the actual error. :P

I'm now waiting out the hour of back-off time before the system tries again. ;-)
Jord.
ID: 42594
[AF>Le_Pommier] Jerome_C2005

Joined: 21 Oct 10
Posts: 53
Credit: 2,101,753
RAC: 3,985
Message 42595 - Posted: 9 Jul 2011, 20:17:38 UTC
Last modified: 9 Jul 2011, 20:19:10 UTC

Hi,

Just wanting to let you know this:

1008 climateprediction.net 09/07/2011 21:17:00 Starting task hadcm3n_yjmu_1900_40_007358304_1 using hadcm3n version 607
1018 climateprediction.net 09/07/2011 21:17:23 Computation for task hadcm3n_yjmu_1900_40_007358304_1 finished
1019 climateprediction.net 09/07/2011 21:17:23 Output file hadcm3n_yjmu_1900_40_007358304_1_1.zip for task hadcm3n_yjmu_1900_40_007358304_1 absent
1020 climateprediction.net 09/07/2011 21:17:23 Output file hadcm3n_yjmu_1900_40_007358304_1_2.zip for task hadcm3n_yjmu_1900_40_007358304_1 absent
1021 climateprediction.net 09/07/2011 21:17:23 Output file hadcm3n_yjmu_1900_40_007358304_1_3.zip for task hadcm3n_yjmu_1900_40_007358304_1 absent
1022 climateprediction.net 09/07/2011 21:17:23 Output file hadcm3n_yjmu_1900_40_007358304_1_4.zip for task hadcm3n_yjmu_1900_40_007358304_1 absent
1024 climateprediction.net 09/07/2011 21:18:52 Sending scheduler request: To fetch work.
1025 climateprediction.net 09/07/2011 21:18:52 Reporting 1 completed tasks, requesting new tasks for CPU
1026 climateprediction.net 09/07/2011 21:18:55 Scheduler request completed: got 1 new tasks
1027 climateprediction.net 09/07/2011 21:18:57 Started download of hadcm3n_yi6h_1900_40_007356419.zip
1028 climateprediction.net 09/07/2011 21:19:00 Finished download of hadcm3n_yi6h_1900_40_007356419.zip
1030 climateprediction.net 09/07/2011 21:20:36 Starting task hadcm3n_yi6h_1900_40_007356419_2 using hadcm3n version 607
1040 climateprediction.net 09/07/2011 21:20:57 Computation for task hadcm3n_yi6h_1900_40_007356419_2 finished
1041 climateprediction.net 09/07/2011 21:20:57 Output file hadcm3n_yi6h_1900_40_007356419_2_1.zip for task hadcm3n_yi6h_1900_40_007356419_2 absent
1042 climateprediction.net 09/07/2011 21:20:57 Output file hadcm3n_yi6h_1900_40_007356419_2_2.zip for task hadcm3n_yi6h_1900_40_007356419_2 absent
1043 climateprediction.net 09/07/2011 21:20:57 Output file hadcm3n_yi6h_1900_40_007356419_2_3.zip for task hadcm3n_yi6h_1900_40_007356419_2 absent
1044 climateprediction.net 09/07/2011 21:20:57 Output file hadcm3n_yi6h_1900_40_007356419_2_4.zip for task hadcm3n_yi6h_1900_40_007356419_2 absent
1101 climateprediction.net 09/07/2011 22:07:55 update requested by user
1102 climateprediction.net 09/07/2011 22:07:58 Sending scheduler request: Requested by user.
1103 climateprediction.net 09/07/2011 22:07:58 Reporting 1 completed tasks, requesting new tasks for CPU
1104 climateprediction.net 09/07/2011 22:08:01 Scheduler request completed: got 0 new tasks
1105 climateprediction.net 09/07/2011 22:08:01 Not sending work - last request too recent: 2946 sec

I hadn't been monitoring, but I now realize they may have been failing for the last 3 days (I can't sort the WUs in the result list by date, and I'm a bit too tired right now to do an in-depth analysis).

They were all "UK Met Office Coupled Model Full Resolution Ocean v6.07" and "UK Met Office HADAM3P European Region v6.09" on Mac OS X 10.6.8.

I don't know if it's the same issue as the one you are discussing above; there's too much text to read with my headache right now...
ID: 42595
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42596 - Posted: 9 Jul 2011, 20:44:46 UTC - in response to Message 42595.  

Jerome

I suspect, from the error message on the model's page, that you have a different problem.

Please read the post at the top of the Macintosh section.


ID: 42596
metalius
Joined: 28 Nov 06
Posts: 89
Credit: 11,985,507
RAC: 2,216
Message 42608 - Posted: 11 Jul 2011, 5:54:24 UTC - in response to Message 42578.  

Les Bayliss wrote:
<quote>
metalius
The "zip file missing" is not an error message.
</quote>

What kind of message is it, then? A mistake, lapse, inaccuracy, solecism, fault, fluff, impropriety...? A bug in the task? Or the task's death?
Les, I don't really speak English, so just ignore all the bugs :-) in my replies.
I promise I will not abuse the forum with my wry (skew, hooked, lopsided, curved, awry...) replies in the future... :-)

ID: 42608
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42609 - Posted: 11 Jul 2011, 8:48:42 UTC - in response to Message 42608.  

metalius

Error messages are stored in Stderr on the page for a model.
Click the + button to expand it for reading.

In this case, the error message is:
Model crashed: INITDUMP: Wrong no of atmos prognostic fields
tmp/xaakm.pipe_dummy 2048


This must have been one of the series that had an incorrect auxiliary data file.
I haven't been paying much attention to the regional models.



ID: 42609
Tom_unoduetre

Joined: 27 Aug 04
Posts: 5
Credit: 40,886
RAC: 0
Message 42610 - Posted: 11 Jul 2011, 12:16:12 UTC

Another errored-out WU: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=13130587

I think it was probably a leftover from the aborted batch.
ID: 42610
[AF>Le_Pommier] Jerome_C2005

Joined: 21 Oct 10
Posts: 53
Credit: 2,101,753
RAC: 3,985
Message 42618 - Posted: 15 Jul 2011, 9:30:14 UTC - in response to Message 42596.  

Les Bayliss wrote:
<quote>
Jerome

I suspect, from the error message on the model's page, that you have a different problem.

Please read the post at the top of the Macintosh section.
</quote>



I hadn't seen that answer (forum notifications don't seem to work very well). I haven't been monitoring CPDN closely, but I suspect that all the other WUs have been erroring out since then...

The other thread suggests detaching and reattaching; I'll do that when I'm home...
ID: 42618
metalius
Joined: 28 Nov 06
Posts: 89
Credit: 11,985,507
RAC: 2,216
Message 42640 - Posted: 18 Jul 2011, 9:57:18 UTC - in response to Message 42609.  

Les Bayliss wrote:
<quote>
metalius
Error messages are stored in Stderr on the page for a model.
</quote>

OK, Les! :-) Thank you.
ID: 42640

©2024 cpdn.org