climateprediction.net (CPDN) home page
Thread 'model 4.12 still crashing in phase 1 :-((('

Questions and Answers : Unix/Linux : model 4.12 still crashing in phase 1 :-(((

old_user49040

Joined: 31 Jan 05
Posts: 6
Credit: 1,735,538
RAC: 0
Message 11592 - Posted: 3 Apr 2005, 18:16:17 UTC

Last week version 4.12 was released, which was supposed to address some of the issues behind the model crashes. After I reset all projects on my Linux systems (I have three), they all started with 4.12, downloaded new projects, etc.

But they are still crashing, again in phase 1 and at around the same timestep (about 118000). It's getting really frustrating to see them crash all the time!

Can't you guys just roll back to 4.10? March has been completely lost for all participants running Linux hosts: first 4.11 was crashing, then the site was down over the Easter weekend, and now 4.12 seems to be crashing as well.

The BOINC (4.19) output itself shows:

1n06_000097254 - PH 1 TS 117922 - 27/09/1817 17:00 - H:M:S=0080:03:37 AVG= 2.44 DLT= 0.00
1n06_000097254 - PH 1 TS 117923 - 27/09/1817 17:30 - H:M:S=0080:03:38 AVG= 2.44 DLT= 1.00
Model crashed...retrying...restart level 0
Preparing for restart...
Rewinding a model-day...
Starting model ID 1n06_000097254 Phase 1
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
1n06_000097254 - PH 1 TS 117793 - 25/09/1817 00:30 - H:M:S=0080:03:39 AVG= 2.45 DLT= 0.00
1n06_000097254 - PH 1 TS 117794 - 25/09/1817 01:00 - H:M:S=0080:03:49 AVG= 2.45 DLT= 9.98

Some time later, this was followed by:

1n06_000097254 - PH 1 TS 117922 - 27/09/1817 17:00 - H:M:S=0080:09:06 AVG= 2.45 DLT= 1.00
Model crashed...retrying...restart level 1
Preparing for restart...
Rewinding a model-month...
Copying restart files for model retry...
Starting model ID 1n06_000097254 Phase 1
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
1n06_000097254 - PH 1 TS 116641 - 01/09/1817 00:30 - H:M:S=0080:09:07 AVG= 2.47 DLT= 0.00

And the latest crash:

1n06_000097254 - PH 1 TS 133626 - 24/08/1818 21:00 - H:M:S=0091:46:10 AVG= 2.47 DLT= 1.00
Model crashed...retrying...restart level 2
Preparing for restart...
Rewinding a model-year...
Copying restart files for model retry...
Starting model ID 1n06_000097254 Phase 1
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
1n06_000097254 - PH 1 TS 120961 - 01/12/1817 00:30 - H:M:S=0091:46:11 AVG= 2.73 DLT= 0.00
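The timestep and date columns in these lines are consistent with a 30-minute timestep and a 360-day calendar (twelve 30-day months): for example, TS 116641 is 01/09/1817 00:30 and TS 120961, exactly 4320 steps (90 days) later, is 01/12/1817 00:30. The sketch below decodes a progress line under those assumptions, which are inferred from the log rather than taken from any CPDN documentation; the starting date of 01/12/1810 is simply counted back from TS 116641. It also tests the guess that AVG is the total H:M:S time in seconds divided by the current timestep.

```python
import re

# Assumptions inferred from the log lines above, not from CPDN documentation:
#   * one timestep (TS) is 30 model minutes, i.e. 48 timesteps per model day;
#   * the model uses a 360-day calendar (12 months of 30 days);
#   * TS 0 is 01/12/1810 00:00, counted back from "TS 116641 - 01/09/1817 00:30".
STEPS_PER_DAY = 48
MINUTES_PER_STEP = 30
DAYS_PER_MONTH = 30
EPOCH_MONTH, EPOCH_YEAR = 12, 1810  # epoch is day 1 of this month

PROGRESS = re.compile(
    r"PH \d+ TS (?P<ts>\d+) - (?P<date>\d\d/\d\d/\d{4} \d\d:\d\d)"
    r" - H:M:S=(?P<hh>\d+):(?P<mm>\d\d):(?P<ss>\d\d) AVG=\s*(?P<avg>[\d.]+)"
)


def ts_to_date(ts: int) -> str:
    """Map a timestep to a 'DD/MM/YYYY HH:MM' date on a 360-day calendar."""
    days, step = divmod(ts, STEPS_PER_DAY)
    months, day0 = divmod(days, DAYS_PER_MONTH)
    month_index = (EPOCH_MONTH - 1) + months  # 0-based, counted from Jan of EPOCH_YEAR
    years, month0 = divmod(month_index, 12)
    hours, minutes = divmod(step * MINUTES_PER_STEP, 60)
    return f"{day0 + 1:02d}/{month0 + 1:02d}/{EPOCH_YEAR + years} {hours:02d}:{minutes:02d}"


def check_line(line: str) -> dict:
    """Parse one progress line and recompute its date and AVG columns."""
    match = PROGRESS.search(line)
    if match is None:
        raise ValueError(f"not a progress line: {line!r}")
    ts = int(match["ts"])
    elapsed_s = int(match["hh"]) * 3600 + int(match["mm"]) * 60 + int(match["ss"])
    return {
        "ts": ts,
        "date_matches": ts_to_date(ts) == match["date"],
        # Guess: AVG = total elapsed seconds / current timestep, rounded to 2 dp.
        "avg_matches": abs(elapsed_s / ts - float(match["avg"])) <= 0.005,
    }


if __name__ == "__main__":
    for line in (
        "1n06_000097254 - PH 1 TS 133626 - 24/08/1818 21:00 - H:M:S=0091:46:10 AVG= 2.47 DLT= 1.00",
        "1n06_000097254 - PH 1 TS 120961 - 01/12/1817 00:30 - H:M:S=0091:46:11 AVG= 2.73 DLT= 0.00",
    ):
        print(check_line(line))
```

If this reading of AVG is right, the jump from 2.47 to 2.73 after the year rewind does not mean the machine slowed down: the elapsed time kept its value (91:46:10 to 91:46:11) while the timestep counter went back from 133626 to 120961.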

The next crash will cause the model to abort and a new one to be downloaded.
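The three crashes show an escalating retry scheme: restart level 0 rewinds a model-day, level 1 a model-month and level 2 a model-year, and one more crash is expected to abort the model. The level was not reset after the model ran past TS 117922 again. A hypothetical sketch of that behaviour, plus a rough estimate of the work discarded by each rewind (timesteps rewound times the AVG seconds per timestep), might look like this; the function names are illustrative and this is not the actual CPDN code.

```python
from typing import Optional

# Hypothetical model of the retry escalation seen in the log; not CPDN code.
# Index = restart level, i.e. the number of crashes so far for this model.
REWIND_BY_LEVEL = ("model-day", "model-month", "model-year")


def next_action(restart_level: int) -> Optional[str]:
    """Return how far to rewind at this restart level, or None to abort."""
    if 0 <= restart_level < len(REWIND_BY_LEVEL):
        return REWIND_BY_LEVEL[restart_level]
    return None  # out of retries: abort and fetch a new model


def lost_cpu_hours(crash_ts: int, restart_ts: int, avg_s_per_step: float) -> float:
    """Rough CPU time discarded by a rewind: steps rewound times seconds per step."""
    return (crash_ts - restart_ts) * avg_s_per_step / 3600


# (crash TS, restart TS, AVG) for the three crashes quoted above.
CRASHES = [(117923, 117793, 2.44), (117922, 116641, 2.45), (133626, 120961, 2.47)]

if __name__ == "__main__":
    for level, (crash, restart, avg) in enumerate(CRASHES):
        print(level, next_action(level), round(lost_cpu_hours(crash, restart, avg), 2))
    print(len(CRASHES), next_action(len(CRASHES)))
```

With the figures from the log, the year rewind alone throws away 12665 timesteps, roughly 8.7 hours of CPU time at 2.47 s per step.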

There is not much other information in the log files. The yabs.out file doesn't contain the 'negative pressure' message, and all stderr log files from the model itself are empty.
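Checking these files by hand on three hosts gets tedious, so it can be scripted. The sketch below takes the directory that contains yabs.out (its location is not given in the thread) and treats any file whose name contains "stderr" as a model stderr log; both the directory argument and that file-name pattern are assumptions. It reports whether the 'negative pressure' message appears and which stderr files are empty.

```python
from pathlib import Path


def inspect_model_dir(model_dir: str) -> dict:
    """Summarise yabs.out and stderr logs in one (hypothetical) model directory."""
    directory = Path(model_dir)
    yabs = directory / "yabs.out"
    text = yabs.read_text(errors="replace") if yabs.is_file() else ""
    # Assumed naming: any regular file with "stderr" in its name is a stderr log.
    stderr_files = sorted(p for p in directory.glob("*stderr*") if p.is_file())
    return {
        "yabs_found": yabs.is_file(),
        "negative_pressure": "negative pressure" in text.lower(),
        "empty_stderr": [p.name for p in stderr_files if p.stat().st_size == 0],
        "nonempty_stderr": [p.name for p in stderr_files if p.stat().st_size > 0],
    }
```

Called as, for example, `inspect_model_dir("/path/to/model/dir")` (placeholder path), a result with `negative_pressure` false and every stderr file in `empty_stderr` would match what is described here.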

Any pointers on how to get this stable?

astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 11594 - Posted: 3 Apr 2005, 18:41:56 UTC

Hi, Koen,

Welcome to the Forum. Pity it isn't under better circumstances.

Your post tells us what we have to look forward to...

This is speculation on my part, but it is probably another batch of bad models (though it could be a fault in the compiled version 4.12). I run the Sulfur alpha on four Linux P4s and one Windows P4. The Windows run uses 4.09 and is nearly halfway through phase 5, with no problems. The Linux boxes run 4.12 (the second release of 4.12, in fact), and they are all crashing, albeit at earlier timesteps than you experienced.

It doesn't look good for 4.12, in either its initial or its updated release, and for either the public version or the alpha.

Wish I could be bearer of happier tidings.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 11600 - Posted: 3 Apr 2005, 22:26:27 UTC

It looks like the two results I started last Wednesday both crashed just over halfway through phase 1, with -251 errors.

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=126514

©2024 cpdn.org