Comm outage, CPDN (boinc?) upchucked

astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 4175 - Posted: 16 Sep 2004, 1:10:25 UTC

Local telco had a regional DSL outage this afternoon -- when Bbox attempted to upload a Trickle. -113 errors.

P4 3.0, SuSE Linux 9.0 ('twas top-ranked box until a slower machine running slower Models roared past it today[!]). For whatever reason, CPDN got a shutdown request. Tried several times.

(On re-start, each Model processed ~ a dozen TS and then turned belly-up again.)

When the DSL problem cleared, everything continued normally (except for redundant Trickles) when boinc re-started. (Two times Phase 3 continue processing -- Phew!)

I see a significant problem if the boinc-CPDN interface can't distinguish between comm problems and processing problems (if such be the case). The response SHOULD be, "Can't send a Trickle? No biggee -- keep on processing"; 'twas ever thus in Classic CPDN, eh?

In the trauma/recovery process, the Trickle was uploaded three times. Fortunately, Carl counts only one....
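
To make the "keep on processing" idea concrete, here is a minimal Python sketch. It is not how boinc or CPDN actually handle Trickles; `TrickleQueue`, `run_model`, and the `send` callable are hypothetical names invented for illustration. The assumption is simply that a failed send raises `OSError`, the Trickle stays queued, and the Model carries on. A Trickle is removed from the queue only after a successful send, so each one goes out once.

```python
from collections import deque


class TrickleQueue:
    """Hypothetical holder for Trickles that could not be sent yet."""

    def __init__(self, send):
        # send: callable taking one Trickle; assumed to raise OSError
        # when the link is down (an assumption, not boinc's API).
        self._send = send
        self._pending = deque()

    def submit(self, trickle):
        """Queue a Trickle, then try to flush. Never raises on comm failure."""
        self._pending.append(trickle)
        return self.flush()

    def flush(self):
        """Send queued Trickles oldest first; stop at the first failure.

        Returns the number of Trickles sent in this call.
        """
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                break  # comm problem: leave the backlog for a later retry
            self._pending.popleft()  # drop only after a successful send
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)


def run_model(timesteps, trickle_every, queue):
    """Stand-in for a Model: counts timesteps and never stops for comm errors."""
    done = 0
    for ts in range(1, timesteps + 1):
        done += 1  # placeholder for computing one timestep
        if ts % trickle_every == 0:
            queue.submit({"ts": ts})
    return done
```

With the link down, `run_model` still finishes every timestep and the Trickles pile up in the queue; one call to `flush()` after the outage clears delivers the backlog in order.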


astroWX
Volunteer moderator
Message 4331 - Posted: 18 Sep 2004, 18:50:25 UTC

Three more occurrences yesterday (thanks to four DSL outages), one on each of my three Linux boxes. A design flaw, IMO, in the handling of Trickle-related comm issues: the Models should be able to continue processing and allow Trickles to pile up.

2004-09-17 19:12:30 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
035h_400029070 - PH 2 TS 248937 - 27/04/1840 04:30 - H:M:S=0417:03:18 AVG= 2.95 DLT= 0.95
035h_400029070 - PH 2 TS 248938 - 27/04/1840 05:00 - H:M:S=0417:03:19 AVG= 2.95 DLT= 0.96
035h_400029070 - PH 2 TS 248939 - 27/04/1840 05:30 - H:M:S=0417:03:21 AVG= 2.95 DLT= 1.95
035h_400029070 - PH 2 TS 248940 - 27/04/1840 06:00 - H:M:S=0417:03:22 AVG= 2.95 DLT= 0.95
035h_400029070 - PH 2 TS 248941 - 27/04/1840 06:30 - H:M:S=0417:03:23 AVG= 2.95 DLT= 0.95
035i_400029071 - PH 3 TS 004748 - 09/03/2051 22:00 - H:M:S=0413:45:44 AVG= 2.85 DLT= 9.93
035i_400029071 - PH 3 TS 004749 - 09/03/2051 22:30 - H:M:S=0413:45:45 AVG= 2.85 DLT= 1.00
035i_400029071 - PH 3 TS 004750 - 09/03/2051 23:00 - H:M:S=0413:45:47 AVG= 2.85 DLT= 2.00
035i_400029071 - PH 3 TS 004751 - 09/03/2051 23:30 - H:M:S=0413:45:48 AVG= 2.85 DLT= 1.00
035i_400029071 - PH 3 TS 004752 - 10/03/2051 00:00 - H:M:S=0413:45:49 AVG= 2.85 DLT= 1.00
035i_400029071 - PH 3 TS 004753 - 10/03/2051 00:30 - H:M:S=0413:45:51 AVG= 2.85 DLT= 2.25
035h_400029070 - PH 2 TS 248942 - 27/04/1840 07:00 - H:M:S=0417:03:34 AVG= 2.95 DLT=11.59
035h_400029070 - PH 2 TS 248943 - 27/04/1840 07:30 - H:M:S=0417:03:35 AVG= 2.95 DLT= 0.95
035h_400029070 - PH 2 TS 248944 - 27/04/1840 08:00 - H:M:S=0417:03:36 AVG= 2.95 DLT= 0.96
035h_400029070 - PH 2 TS 248945 - 27/04/1840 08:30 - H:M:S=0417:03:38 AVG= 2.95 DLT= 1.95
035h_400029070 - PH 2 TS 248946 - 27/04/1840 09:00 - H:M:S=0417:03:39 AVG= 2.95 DLT= 0.95
035h_400029070 - PH 2 TS 248947 - 27/04/1840 09:30 - H:M:S=0417:03:41 AVG= 2.95 DLT= 1.90
035i_400029071 - PH 3 TS 004754 - 10/03/2051 01:00 - H:M:S=0413:46:01 AVG= 2.85 DLT= 9.62
035i_400029071 - PH 3 TS 004755 - 10/03/2051 01:30 - H:M:S=0413:46:03 AVG= 2.85 DLT= 2.00
035i_400029071 - PH 3 TS 004756 - 10/03/2051 02:00 - H:M:S=0413:46:04 AVG= 2.85 DLT= 1.00
035i_400029071 - PH 3 TS 004757 - 10/03/2051 02:30 - H:M:S=0413:46:05 AVG= 2.85 DLT= 1.00
CPDN Monitor got quit request...
Detaching shared memory...
035i_400029071 - PH 3 TS 004759 - 10/03/2051 03:30 - H:M:S=0413:46:07 AVG= 2.85 DLT= 1.00
CPDN Monitor got quit request...
Detaching shared memory...
2004-09-17 19:13:10 [---] Can't resolve hostname climateapps2.oucs.ox.ac.uk (host not found or server failure)
2004-09-17 19:13:10 [---] Can't resolve hostname climateapps2.oucs.ox.ac.uk (host not found or server failure)
2004-09-17 19:13:10 [climateprediction.net] scheduler init_op_project to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed, error -113
2004-09-17 19:13:10 [climateprediction.net] scheduler init_op_project to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed, error -113
2004-09-17 19:13:10 [climateprediction.net] Deferring communication with project for 3 hours, 47 minutes, and 55 seconds
2004-09-17 19:13:10 [climateprediction.net] Deferring communication with project for 3 hours, 47 minutes, and 55 seconds
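
For anyone wanting to pin down where each Model stopped in a log like the one above, here is a rough Python sketch. The line format is inferred only from the sample pasted here, so the regular expression is an assumption rather than a documented CPDN format, and `summarize` is a made-up helper name.

```python
import re

# Pattern inferred from the timestep lines in this log (an assumption).
STEP_RE = re.compile(
    r"^(?P<model>\S+) - PH (?P<phase>\d+) TS (?P<ts>\d+) - "
    r"(?P<date>\d{2}/\d{2}/\d{4} \d{2}:\d{2}) - "
    r"H:M:S=(?P<elapsed>\d+:\d{2}:\d{2}) "
    r"AVG=\s*(?P<avg>[\d.]+) DLT=\s*(?P<dlt>[\d.]+)$"
)


def summarize(lines):
    """Return (last timestep seen per Model, number of quit requests)."""
    last_ts = {}
    quits = 0
    for line in lines:
        line = line.strip()
        m = STEP_RE.match(line)
        if m:
            last_ts[m["model"]] = int(m["ts"])
        elif line.startswith("CPDN Monitor got quit request"):
            quits += 1
    return last_ts, quits


if __name__ == "__main__":
    sample = [
        "035h_400029070 - PH 2 TS 248947 - 27/04/1840 09:30 - H:M:S=0417:03:41 AVG= 2.95 DLT= 1.90",
        "035i_400029071 - PH 3 TS 004757 - 10/03/2051 02:30 - H:M:S=0413:46:05 AVG= 2.85 DLT= 1.00",
        "CPDN Monitor got quit request...",
        "Detaching shared memory...",
        "035i_400029071 - PH 3 TS 004759 - 10/03/2051 03:30 - H:M:S=0413:46:07 AVG= 2.85 DLT= 1.00",
        "CPDN Monitor got quit request...",
    ]
    print(summarize(sample))
```

Run against the full log, this gives the last timestep each Model reported before the quit requests, which makes it easier to compare with where processing resumes after a re-start.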


Why a "quit request" for a Trickle's comm failure? Why the huge initial time delay?

old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 4334 - Posted: 18 Sep 2004, 20:43:12 UTC - in response to Message 4331.  

Are these running BOINC version 4.05? I would try 4.09, as I think it fixes a lot of these oddities. It should be linked on the download page here now.

astroWX
Volunteer moderator
Message 4338 - Posted: 19 Sep 2004, 0:47:55 UTC - in response to Message 4334.  

> are these BOINC version 4.05, I would try 4.09 as I think it fixes a lot of
> these oddities. It should be linked on the download page here now.
>

Hi, Carl,

Roger on the 4.05. I'll install 4.09 after uploads of Models, which are now in Phase 3. (Meanwhile, I'm trusting the ISP's new Routers, because all eight Models on my four machines required the file-size change...) One puckery change at a time....

Thanks.

Jim

©2024 cpdn.org