CPDN monitor got quit request (resurrected)

Questions and Answers : Unix/Linux : CPDN monitor got quit request (resurrected)

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 11564 - Posted: 1 Apr 2005, 23:41:03 UTC
Last modified: 1 Apr 2005, 23:45:09 UTC

This is an old issue and not generally fatal. Nonetheless, one hopes a fix is on the To Do List.

Note the H:M:S time before and after the restart. How did it finesse three minutes into 5+ hours? (This is an issue I thought WAS fixed.)

[Edit: This is from the second set of 4.12 WUs, on 4.19, P4 2.8 HT (running in parallel with a Sulfur Model), SuSE 9.0.]

1v12_300107757 - PH 1 TS 000067 - 02/12/1810 09:30 - H:M:S=0000:02:59 AVG= 2.68 DLT= 0.97
CPDN Monitor got quit request...
Detaching shared memory...
2005-04-01 14:00:30 [climateprediction.net] Result 1v12_300107757_0 exited with zero status but no 'finished' file
2005-04-01 14:00:30 [climateprediction.net] If this happens repeatedly you may need to reset the project.
Starting model in /home/jim/CPDNboinc/projects/climateapps2.oucs.ox.ac.uk_cpdnboinc...
Created shared memory region key = 24775
Env Used=LD_LIBRARY_PATH=/home/jim/CPDNboinc/projects/climateapps2.oucs.ox.ac.uk_cpdnboinc:/usr/local/lib:/usr/lib:/lib
Copying files for startup...
In pre_initialise_phase (part 1 of 3)
In initialise_phase (part 2 of 3)
In startup_phase (part 3 of 3)
2005-04-01 14:00:30 [climateprediction.net] Restarting result 1v12_300107757_0 using hadsm3 version 4.12
Starting model ID 1v12_300107757 Phase 1
Stack size=4096.00 MB
Waiting for model startup, this may take a minute...
1v12_300107757 - PH 1 TS 000001 - 01/12/1810 00:30 - H:M:S=0005:18:30 AVG=19110.14 DLT= 0.00
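Doing the arithmetic on the two status lines shows what happened. Judging from the numbers in this thread, AVG= looks like the H:M:S elapsed time in seconds divided by the TS count (179 s / 67 TS is about 2.67, reported as 2.68). That reading is inferred from the log, not taken from CPDN documentation. The Python sketch below parses a monitor status line with a regex of our own (`STATUS_RE` and `parse_status` are hypothetical names) and applies it to the lines before and after the restart: the model is back at TS 1 but still carries 19110 s of elapsed time, hence AVG=19110.14.

```python
import re

# Hypothetical pattern for the monitor status lines quoted in this thread, e.g.
# "1v12_300107757 - PH 1 TS 000067 - 02/12/1810 09:30 - H:M:S=0000:02:59 AVG= 2.68 DLT= 0.97"
STATUS_RE = re.compile(
    r"^(?P<model>\S+) - PH (?P<phase>\d+) TS (?P<ts>\d+) - (?P<date>\S+ \S+)"
    r" - H:M:S=(?P<hours>\d+):(?P<mins>\d+):(?P<secs>\d+)"
    r" AVG=\s*(?P<avg>[\d.]+) DLT=\s*(?P<dlt>[\d.]+)"
)

def parse_status(line):
    """Return timestep, elapsed seconds and reported AVG/DLT, or None if no match."""
    match = STATUS_RE.match(line.strip())
    if match is None:
        return None
    elapsed = int(match["hours"]) * 3600 + int(match["mins"]) * 60 + int(match["secs"])
    return {
        "model": match["model"],
        "ts": int(match["ts"]),
        "elapsed_s": elapsed,
        "avg": float(match["avg"]),
        "dlt": float(match["dlt"]),
    }

before = parse_status("1v12_300107757 - PH 1 TS 000067 - 02/12/1810 09:30 - H:M:S=0000:02:59 AVG= 2.68 DLT= 0.97")
after = parse_status("1v12_300107757 - PH 1 TS 000001 - 01/12/1810 00:30 - H:M:S=0005:18:30 AVG=19110.14 DLT= 0.00")

for rec in (before, after):
    # Assumption: AVG= is roughly elapsed seconds per timestep completed so far.
    print(rec["ts"], rec["elapsed_s"], round(rec["elapsed_s"] / rec["ts"], 2), rec["avg"])
```

Each printed line gives the TS, the elapsed seconds, the recomputed seconds per timestep, and the AVG= value the monitor reported; the two averages agree to within the rounding of the H:M:S field.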
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 12206 - Posted: 1 May 2005, 21:37:38 UTC

Still not fatal, but three of the last five 4.13 Models started this weekend upchucked, all on TS 60. All on P4s, on SuSE 9.0 or 9.1 (the good part is that elapsed time is now reset to zero, so AVG= has meaning):

2m69_200143283 - PH 1 TS 000060 - 02/12/1810 06:00 - H:M:S=0000:02:43 AVG= 2.73 DLT= 0.91
CPDN Monitor got quit request...
Detaching shared memory...
2005-05-01 14:55:52 [climateprediction.net] Result 2m69_200143283_0 exited with zero status but no 'finished' file
2005-05-01 14:55:52 [climateprediction.net] If this happens repeatedly you may need to reset the project.
2005-05-01 14:55:52 [climateprediction.net] Restarting result 2m69_200143283_0 using hadsm3 version 4.13
Starting model in /home/jim/CPDNboinc/projects/climateapps2.oucs.ox.ac.uk_cpdnboinc...
Created shared memory region key = 24485
Env Used=LD_LIBRARY_PATH=/home/jim/CPDNboinc/projects/climateapps2.oucs.ox.ac.uk_cpdnboinc:/usr/local/lib:/usr/lib:/lib
Copying files for startup...
In pre_initialise_phase (part 1 of 3)
In initialise_phase (part 2 of 3)
In startup_phase (part 3 of 3)
Starting model ID 2m69_200143283 Phase 1
Stack size=4096.00 MB
Waiting for model startup, this may take a minute...
2m69_200143283 - PH 1 TS 000001 - 01/12/1810 00:30 - H:M:S=0000:00:00 AVG= 0.00 DLT= 0.00
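The client's own hint ("If this happens repeatedly you may need to reset the project") suggests keeping a per-result count of these premature exits. Below is a minimal Python sketch, assuming the client output is available as plain text lines in the format quoted here. It tallies the "no 'finished' file" messages per result name and lists any result at or above a threshold. `REPEAT_THRESHOLD = 2` and the function names are hypothetical; the message itself gives no number for "repeatedly".

```python
import re
from collections import Counter

# Pattern for the client message quoted in this thread (one message per line).
NO_FINISH_RE = re.compile(
    r"\[(?P<project>[^\]]+)\] Result (?P<result>\S+) "
    r"exited with zero status but no 'finished' file"
)

# Hypothetical threshold for "repeatedly".
REPEAT_THRESHOLD = 2

def count_no_finish_exits(log_lines):
    """Count 'exited with zero status but no finished file' messages per result."""
    counts = Counter()
    for line in log_lines:
        match = NO_FINISH_RE.search(line)
        if match:
            counts[match["result"]] += 1
    return counts

def results_to_watch(counts, threshold=REPEAT_THRESHOLD):
    """Results whose premature exits reached the (assumed) threshold, sorted by name."""
    return sorted(name for name, n in counts.items() if n >= threshold)

# Lines copied from the logs in this thread.
log = [
    "2005-05-01 14:55:52 [climateprediction.net] Result 2m69_200143283_0 exited with zero status but no 'finished' file",
    "2005-05-01 14:55:52 [climateprediction.net] If this happens repeatedly you may need to reset the project.",
    "2005-05-01 14:55:52 [climateprediction.net] Restarting result 2m69_200143283_0 using hadsm3 version 4.13",
    "2005-05-02 16:05:32 [climateprediction.net] Result 2ft3_300134952_0 exited with zero status but no 'finished' file",
]

counts = count_no_finish_exits(log)
print(dict(counts))
print(results_to_watch(counts))
```

With these lines each result has exited once, so nothing reaches the assumed threshold. The "Restarting result" line is deliberately not counted, because only the exit message marks a premature stop.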

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 12225 - Posted: 2 May 2005, 22:57:53 UTC

Hmmm, what is it about Model start-ups? This time the existing Run croaked 47 TS into the new Run. It zeroed the existing Run's time, so AVG= is garbage now. (P4 3.0, SuSE 9.0)


2ft3_300134952 - PH 1 TS 030470 - 05/09/1812 19:00 - H:M:S=0025:08:08 AVG= 2.97 DLT=11.00
CPDN Monitor got quit request...
Detaching shared memory...
2005-05-02 16:05:32 [climateprediction.net] Result 2ft3_300134952_0 exited with zero status but no 'finished' file
2005-05-02 16:05:32 [climateprediction.net] If this happens repeatedly you may need to reset the project.
Starting model in /home/jim/CPDNboincSM/projects/climateapps2.oucs.ox.ac.uk_cpdnboinc...
Created shared memory region key = 25150
Env Used=LD_LIBRARY_PATH=/home/jim/CPDNboincSM/projects/climateapps2.oucs.ox.ac.uk_cpdnboinc:/usr/local/lib:/usr/lib:/lib
2005-05-02 16:05:32 [climateprediction.net] Restarting result 2ft3_300134952_0 using hadsm3 version 4.13
Starting model ID 2ft3_300134952 Phase 1
Stack size=4096.00 MB
Waiting for model startup, this may take a minute...
2ft3_300134952 - PH 1 TS 030385 - 04/09/1812 00:30 - H:M:S=0000:00:00 AVG= 0.00 DLT= 0.00
2mhf_200143689 - PH 1 TS 000050 - 02/12/1810 01:00 - H:M:S=0000:02:38 AVG= 3.18 DLT=10.37
2mhf_200143689 - PH 1 TS 000051 - 02/12/1810 01:30 - H:M:S=0000:02:40 AVG= 3.15 DLT= 2.00
2mhf_200143689 - PH 1 TS 000052 - 02/12/1810 02:00 - H:M:S=0000:02:41 AVG= 3.11 DLT= 1.00
2mhf_200143689 - PH 1 TS 000053 - 02/12/1810 02:30 - H:M:S=0000:02:42 AVG= 3.07 DLT= 1.00
2mhf_200143689 - PH 1 TS 000054 - 02/12/1810 03:00 - H:M:S=0000:02:44 AVG= 3.05 DLT= 2.00
2mhf_200143689 - PH 1 TS 000055 - 02/12/1810 03:30 - H:M:S=0000:02:45 AVG= 3.01 DLT= 1.00
2ft3_300134952 - PH 1 TS 030386 - 04/09/1812 01:00 - H:M:S=0000:00:12 AVG= 0.00 DLT=12.01
2ft3_300134952 - PH 1 TS 030387 - 04/09/1812 01:30 - H:M:S=0000:00:13 AVG= 0.00 DLT= 1.85
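Here is why AVG= becomes garbage. If, as the numbers in this thread suggest, AVG= is elapsed seconds divided by the total TS count, then zeroing the clock at TS 30385 leaves a tiny numerator over a huge denominator. The Python sketch below reproduces the 2.97 before the crash and the 0.00 after it, and shows an alternative average that counts only timesteps run since the reset. The AVG formula is inferred rather than documented, and `hms_to_seconds`, `avg_since_start`, `avg_since_restart` and `TS_AT_RESTART` are hypothetical names.

```python
# Values copied from the 2ft3_300134952 status lines in this post.
TS_AT_RESTART = 30385  # first TS shown after the restart, with H:M:S=0000:00:00

def hms_to_seconds(hms):
    """Convert the monitor's 'HHHH:MM:SS' field to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def avg_since_start(elapsed_s, ts):
    """Assumed AVG= behaviour: elapsed seconds over all timesteps so far."""
    return elapsed_s / ts

def avg_since_restart(elapsed_s, ts, ts_at_restart):
    """Hypothetical alternative: elapsed seconds over timesteps run since the clock was zeroed."""
    steps = ts - ts_at_restart
    return elapsed_s / steps if steps > 0 else 0.0

# Before the crash: H:M:S=0025:08:08 at TS 030470 (reported AVG= 2.97).
print(f"{avg_since_start(hms_to_seconds('0025:08:08'), 30470):.2f}")

# After the restart the clock starts again from zero.
for hms, ts in [("0000:00:12", 30386), ("0000:00:13", 30387)]:
    elapsed = hms_to_seconds(hms)
    print(ts, f"{avg_since_start(elapsed, ts):.2f}", f"{avg_since_restart(elapsed, ts, TS_AT_RESTART):.2f}")
```

The per-restart figure is skewed by startup at first, because the DLT=12.01 step includes model startup. It settles as timesteps accumulate. The since-start figure, by contrast, stays below the real seconds per timestep for the rest of the run, because the pre-crash time has been thrown away.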

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

©2024 cpdn.org