climateprediction.net home page
hadsm3 v4.13 loses CPU-time after a model crash

old_user21969
Joined: 29 Sep 04
Posts: 3
Credit: 136,448
RAC: 0
Message 12121 - Posted: 27 Apr 2005, 23:18:21 UTC

Here is an excerpt from the log:
29lj_300126824 - PH 1 TS 098104 - 04/08/1816 20:00 - H:M:S=0238:41:46 AVG= 8.76 DLT= 3.95
29lj_300126824 - PH 1 TS 098105 - 04/08/1816 20:30 - H:M:S=0238:41:49 AVG= 8.76 DLT= 2.96
29lj_300126824 - PH 1 TS 098106 - 04/08/1816 21:00 - H:M:S=0238:41:53 AVG= 8.76 DLT= 3.96
29lj_300126824 - PH 1 TS 098107 - 04/08/1816 21:30 - H:M:S=0238:41:57 AVG= 8.76 DLT= 3.95
29lj_300126824 - PH 1 TS 098108 - 04/08/1816 22:00 - H:M:S=0238:42:31 AVG= 8.76 DLT=33.48
Model crashed...retrying...restart level 0
Preparing for restart...
Rewinding a model-day...
Starting model ID 29lj_300126824 Phase 1
Stack size=48.00 MB
Waiting for model startup, this may take a minute...
29lj_300126824 - PH 1 TS 098108 - 04/08/1816 22:00 - H:M:S=0238:42:31 AVG= 8.76 DLT= 0.00
29lj_300126824 - PH 1 TS 098109 - 04/08/1816 22:30 - H:M:S=0124:36:33 AVG= 4.57 DLT=-410757.89
29lj_300126824 - PH 1 TS 098110 - 04/08/1816 23:00 - H:M:S=0124:36:36 AVG= 4.57 DLT= 2.99
29lj_300126824 - PH 1 TS 098111 - 04/08/1816 23:30 - H:M:S=0124:36:40 AVG= 4.57 DLT= 3.99
29lj_300126824 - PH 1 TS 098112 - 05/08/1816 00:00 - H:M:S=0124:36:43 AVG= 4.57 DLT= 2.98
29lj_300126824 - PH 1 TS 098113 - 05/08/1816 00:30 - H:M:S=0124:36:47 AVG= 4.57 DLT= 3.97

The 124 hours is approximately the CPU time that CPDN had used before the last restart of the BOINC client.
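To spot this in a longer log, one option is to convert each H:M:S value to seconds and flag any step where the running total goes backwards. Below is a minimal Python sketch. The regex is written only against the line format in the excerpt above, and the helper names (`parse`, `find_resets`) are made up for illustration. It assumes, from the numbers rather than from any documentation, that H:M:S is the model's cumulative CPU time, DLT is the CPU seconds spent on the last timestep, and AVG is that total divided by the timestep number. The excerpt fits those assumptions: 238:42:31 is 859,351 s and 124:36:33 is 448,593 s, a drop of 410,758 s, which matches DLT=-410757.89, and 859,351 / 98,108 ≈ 8.76 matches AVG.

```python
import re

# Hypothetical helper names; the regex only targets the progress-line format
# shown in the excerpt, e.g.
# "29lj_300126824 - PH 1 TS 098108 - 04/08/1816 22:00 - H:M:S=0238:42:31 AVG= 8.76 DLT=33.48"
LINE_RE = re.compile(
    r"TS (?P<ts>\d+) .*H:M:S=(?P<h>\d+):(?P<m>\d+):(?P<s>\d+)\s+"
    r"AVG=\s*(?P<avg>-?[\d.]+)\s+DLT=\s*(?P<dlt>-?[\d.]+)"
)


def parse(line):
    """Return (timestep, cpu_seconds, avg, dlt), or None for non-progress lines."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # Assumption: H:M:S is the cumulative CPU time charged to the model.
    cpu = int(m["h"]) * 3600 + int(m["m"]) * 60 + int(m["s"])
    return int(m["ts"]), cpu, float(m["avg"]), float(m["dlt"])


def find_resets(lines):
    """Yield (timestep, cpu_before, cpu_after) wherever cumulative CPU time decreases."""
    prev = None
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue  # e.g. "Model crashed...retrying..." or "Stack size=48.00 MB"
        ts, cpu, _avg, _dlt = rec
        if prev is not None and cpu < prev:
            yield ts, prev, cpu
        prev = cpu


if __name__ == "__main__":
    excerpt = [
        "29lj_300126824 - PH 1 TS 098108 - 04/08/1816 22:00 - H:M:S=0238:42:31 AVG= 8.76 DLT=33.48",
        "Model crashed...retrying...restart level 0",
        "29lj_300126824 - PH 1 TS 098108 - 04/08/1816 22:00 - H:M:S=0238:42:31 AVG= 8.76 DLT= 0.00",
        "29lj_300126824 - PH 1 TS 098109 - 04/08/1816 22:30 - H:M:S=0124:36:33 AVG= 4.57 DLT=-410757.89",
    ]
    for ts, before, after in find_resets(excerpt):
        print(f"TS {ts}: CPU time fell from {before} s to {after} s ({before - after} s lost)")
```

Run on those excerpt lines, it reports a single reset at TS 98109: 859351 s down to 448593 s, with 410758 s lost. The replayed TS 098108 line after the crash is not flagged, because its H:M:S is unchanged.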
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2185
Credit: 64,822,615
RAC: 5,275
Message 12128 - Posted: 28 Apr 2005, 21:24:14 UTC

I saw something similar, but without any crashes. I killed the running models with Ctrl-C in the terminal window, rebooted, and then started it back up. One model went back to its correct 2.54 sec/TS, while the other dropped from 2.77 to 0.98 sec/TS.

The 194 hours that the 0.98 sec/TS model thinks it is at was also the number of hours since the last BOINC restart. I hadn't seen this problem with older versions of hadsm.
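These numbers can give a rough estimate of how much CPU time went missing. The estimate rests on two assumptions: that AVG (sec/TS) is the cumulative CPU time divided by the timestep number (inferred from the log in the first post, not from documentation), and that only the CPU total was reset, not the timestep counter. Under those assumptions the timestep count is shown_hours × 3600 / shown_rate, so the true total is shown_hours × expected_rate / shown_rate. The sketch below uses a hypothetical helper and treats the pre-restart 2.77 sec/TS as the "expected" rate, so the result is only approximate.

```python
def estimate_true_cpu_hours(shown_hours, shown_rate, expected_rate):
    """Rescale the displayed CPU hours by expected/displayed sec/TS.

    Hypothetical helper. Assumes AVG (sec/TS) = cumulative CPU seconds / timestep
    number, and that only the CPU total (not the timestep count) was reset.
    """
    if shown_rate <= 0:
        raise ValueError("shown_rate must be positive")
    return shown_hours * expected_rate / shown_rate


if __name__ == "__main__":
    true_h = estimate_true_cpu_hours(194, 0.98, 2.77)
    # Roughly 548 h in total, of which about 354 h are no longer counted.
    print(f"estimated CPU time: {true_h:.0f} h, missing: {true_h - 194:.0f} h")
```

With 194 h shown at 0.98 sec/TS and an expected 2.77 sec/TS, this gives about 548 h in total, so roughly 354 h would no longer be counted.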

©2024 cpdn.org