model 4.12 still crashing in phase 1 :-(((
Author | Message |
---|---|
Joined: 31 Jan 05 Posts: 6 Credit: 1,735,538 RAC: 0 |
Last week 4.12 was released, which was supposed to address some of the issues causing the models to crash. After resetting all projects on my Linux systems (I've got three), they all started with 4.12, downloaded new projects, etc. But they are still crashing, and again in phase 1 around the same timestep (around 118000).

It's getting really frustrating to see them crashing all the time! Can't you guys just roll back to 4.10? March has been completely lost for all participants running Linux hosts: first 4.11 crashing, then the site down during the Easter weekend, and now 4.12 seems to be crashing as well.

The BOINC (4.19) output itself shows:

1n06_000097254 - PH 1 TS 117922 - 27/09/1817 17:00 - H:M:S=0080:03:37 AVG= 2.44 DLT= 0.00
1n06_000097254 - PH 1 TS 117923 - 27/09/1817 17:30 - H:M:S=0080:03:38 AVG= 2.44 DLT= 1.00
Model crashed...retrying...restart level 0
Preparing for restart...
Rewinding a model-day...
Starting model ID 1n06_000097254 Phase 1
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
1n06_000097254 - PH 1 TS 117793 - 25/09/1817 00:30 - H:M:S=0080:03:39 AVG= 2.45 DLT= 0.00
1n06_000097254 - PH 1 TS 117794 - 25/09/1817 01:00 - H:M:S=0080:03:49 AVG= 2.45 DLT= 9.98

and some time later, followed by:

1n06_000097254 - PH 1 TS 117922 - 27/09/1817 17:00 - H:M:S=0080:09:06 AVG= 2.45 DLT= 1.00
Model crashed...retrying...restart level 1
Preparing for restart...
Rewinding a model-month...
Copying restart files for model retry...
Starting model ID 1n06_000097254 Phase 1
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
1n06_000097254 - PH 1 TS 116641 - 01/09/1817 00:30 - H:M:S=0080:09:07 AVG= 2.47 DLT= 0.00

and the latest crash:

1n06_000097254 - PH 1 TS 133626 - 24/08/1818 21:00 - H:M:S=0091:46:10 AVG= 2.47 DLT= 1.00
Model crashed...retrying...restart level 2
Preparing for restart...
Rewinding a model-year...
Copying restart files for model retry...
Starting model ID 1n06_000097254 Phase 1
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
1n06_000097254 - PH 1 TS 120961 - 01/12/1817 00:30 - H:M:S=0091:46:11 AVG= 2.73 DLT= 0.00

The next crash will cause the model to abort and download a new one. There is not much other information in the logfiles. The yabs.out file doesn't contain the 'negative pressure' message, and all stderr logfiles from the model itself are empty.

Any pointers to get this stable? |
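For anyone comparing crash points across several hosts, the timestep and restart level of each crash can be pulled out of a saved copy of the client output with a short script. The following is only a minimal sketch: the two regular expressions are inferred from the lines pasted in the post above, not from any documented format, and `find_crashes` and the `sample` text are hypothetical names used for illustration.

```python
import re

# Both patterns are assumptions based on the output quoted in this thread.
STEP_RE = re.compile(r"^(?P<model>\S+) - PH\s+(?P<phase>\d+)\s+TS\s+(?P<ts>\d+) - ")
CRASH_RE = re.compile(r"Model crashed\.\.\.retrying\.\.\.restart level (?P<level>\d+)")


def find_crashes(lines):
    """Return (restart_level, phase, last_timestep) for each crash message.

    last_timestep is the most recent timestep line seen before the crash.
    """
    crashes = []
    last_step = None
    for raw in lines:
        line = raw.strip()
        step = STEP_RE.match(line)
        if step:
            last_step = (int(step.group("phase")), int(step.group("ts")))
            continue
        crash = CRASH_RE.search(line)
        if crash and last_step is not None:
            crashes.append((int(crash.group("level")), last_step[0], last_step[1]))
    return crashes


# A shortened excerpt of the output from the post above.
sample = """\
1n06_000097254 - PH 1 TS 117922 - 27/09/1817 17:00 - H:M:S=0080:03:37 AVG= 2.44 DLT= 0.00
1n06_000097254 - PH 1 TS 117923 - 27/09/1817 17:30 - H:M:S=0080:03:38 AVG= 2.44 DLT= 1.00
Model crashed...retrying...restart level 0
Preparing for restart...
Rewinding a model-day...
1n06_000097254 - PH 1 TS 117793 - 25/09/1817 00:30 - H:M:S=0080:03:39 AVG= 2.45 DLT= 0.00
1n06_000097254 - PH 1 TS 117922 - 27/09/1817 17:00 - H:M:S=0080:09:06 AVG= 2.45 DLT= 1.00
Model crashed...retrying...restart level 1
"""

crashes = find_crashes(sample.splitlines())
for level, phase, ts in crashes:
    print(f"restart level {level}: crashed in phase {phase} after timestep {ts}")
```

Running the same function over the output from each host makes it easier to see whether the crashes cluster around one timestep, as the excerpt above suggests, or are spread out.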
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Hi, Koen. Welcome to the Forum. Pity it isn't under better circumstances. Your post tells us what we have to look forward to... Speculation on my part, but it is probably another batch of bad Models. (Though it could be a fault in the compiled version 4.12.) I run Sulfur Alpha on four Linux P4's and one M$ P4. The M$ run is in 4.09 and nearly halfway through Phase 5. No problem. The Linux boxes run 4.12 (the second release of 4.12, in fact) and they are all crashing, albeit at earlier Time Steps than you experienced. It doesn't look good for 4.12, in either its initial or its updated release, for either the public version or Alpha. Wish I could be the bearer of happier tidings. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 7 Aug 04 Posts: 2183 Credit: 64,822,615 RAC: 5,275 |
Looks like the two results I started last Wednesday both crashed at just over halfway through phase 1, with -251 errors. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=126514 |