Questions and Answers : Unix/Linux : Comm outage, CPDN (boinc?) upchucked
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Local telco had a regional DSL outage this afternoon -- when Bbox attempted to upload a Trickle. -113 errors. P4 3.0, SuSE Linux 9.0 ('twas top-ranked box until a slower machine running slower Models roared past it today[!]). For whatever reason, CPDN got a shutdown request. Tried several times. (On re-start, each Model processed ~ a dozen TS and then turned belly-up again.) When the DSL problem cleared, everything continued normally (except for redundant Trickles) when boinc re-started. (Two times Phase 3 continue processing -- Phew!) I see a significant problem if the boinc-CPDN interface can't distinguish between comm problems and processing problems (if such be the case). The response SHOULD be, "Can't send a Trickle? No biggee -- keep on processing"; 'twas ever thus in Classic CPDN, eh? In the trauma/recovery process, the Trickle was uploaded three times. Fortunately, Carl counts only one.... |
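For readers wondering how a repeated Trickle can end up counted only once: one plausible approach is to key each upload on the Model name and timestep and ignore repeats. The sketch below is only an illustration under that assumption. The function name and the `(model, timestep)` key are hypothetical and are not taken from the CPDN server code.

```python
def count_trickles(uploads):
    """Count each (model, timestep) pair once, ignoring repeated uploads.

    `uploads` is an iterable of (model_name, timestep) tuples.
    This key choice is an assumption made for illustration only.
    """
    seen = set()
    for model, timestep in uploads:
        seen.add((model, timestep))  # adding a repeat leaves the set unchanged
    return len(seen)
```

With this kind of check, a Trickle resent three times during recovery would add one entry, not three.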
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Three more occurrences yesterday (thanks to four DSL outages), one on each of my three Linux boxes: A design flaw, IMO, for Trickle-related comm issues. The Models should be able to continue processing and allow Trickles to pile up.

2004-09-17 19:12:30 [climateprediction.net] Sending request to scheduler: http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi
035h_400029070 - PH 2 TS 248937 - 27/04/1840 04:30 - H:M:S=0417:03:18 AVG= 2.95 DLT= 0.95
035h_400029070 - PH 2 TS 248938 - 27/04/1840 05:00 - H:M:S=0417:03:19 AVG= 2.95 DLT= 0.96
035h_400029070 - PH 2 TS 248939 - 27/04/1840 05:30 - H:M:S=0417:03:21 AVG= 2.95 DLT= 1.95
035h_400029070 - PH 2 TS 248940 - 27/04/1840 06:00 - H:M:S=0417:03:22 AVG= 2.95 DLT= 0.95
035h_400029070 - PH 2 TS 248941 - 27/04/1840 06:30 - H:M:S=0417:03:23 AVG= 2.95 DLT= 0.95
035i_400029071 - PH 3 TS 004748 - 09/03/2051 22:00 - H:M:S=0413:45:44 AVG= 2.85 DLT= 9.93
035i_400029071 - PH 3 TS 004749 - 09/03/2051 22:30 - H:M:S=0413:45:45 AVG= 2.85 DLT= 1.00
035i_400029071 - PH 3 TS 004750 - 09/03/2051 23:00 - H:M:S=0413:45:47 AVG= 2.85 DLT= 2.00
035i_400029071 - PH 3 TS 004751 - 09/03/2051 23:30 - H:M:S=0413:45:48 AVG= 2.85 DLT= 1.00
035i_400029071 - PH 3 TS 004752 - 10/03/2051 00:00 - H:M:S=0413:45:49 AVG= 2.85 DLT= 1.00
035i_400029071 - PH 3 TS 004753 - 10/03/2051 00:30 - H:M:S=0413:45:51 AVG= 2.85 DLT= 2.25
035h_400029070 - PH 2 TS 248942 - 27/04/1840 07:00 - H:M:S=0417:03:34 AVG= 2.95 DLT=11.59
035h_400029070 - PH 2 TS 248943 - 27/04/1840 07:30 - H:M:S=0417:03:35 AVG= 2.95 DLT= 0.95
035h_400029070 - PH 2 TS 248944 - 27/04/1840 08:00 - H:M:S=0417:03:36 AVG= 2.95 DLT= 0.96
035h_400029070 - PH 2 TS 248945 - 27/04/1840 08:30 - H:M:S=0417:03:38 AVG= 2.95 DLT= 1.95
035h_400029070 - PH 2 TS 248946 - 27/04/1840 09:00 - H:M:S=0417:03:39 AVG= 2.95 DLT= 0.95
035h_400029070 - PH 2 TS 248947 - 27/04/1840 09:30 - H:M:S=0417:03:41 AVG= 2.95 DLT= 1.90
035i_400029071 - PH 3 TS 004754 - 10/03/2051 01:00 - H:M:S=0413:46:01 AVG= 2.85 DLT= 9.62
035i_400029071 - PH 3 TS 004755 - 10/03/2051 01:30 - H:M:S=0413:46:03 AVG= 2.85 DLT= 2.00
035i_400029071 - PH 3 TS 004756 - 10/03/2051 02:00 - H:M:S=0413:46:04 AVG= 2.85 DLT= 1.00
035i_400029071 - PH 3 TS 004757 - 10/03/2051 02:30 - H:M:S=0413:46:05 AVG= 2.85 DLT= 1.00
CPDN Monitor got quit request...
Detaching shared memory...
035i_400029071 - PH 3 TS 004759 - 10/03/2051 03:30 - H:M:S=0413:46:07 AVG= 2.85 DLT= 1.00
CPDN Monitor got quit request...
Detaching shared memory...
2004-09-17 19:13:10 [---] Can't resolve hostname climateapps2.oucs.ox.ac.uk (host not found or server failure)
2004-09-17 19:13:10 [---] Can't resolve hostname climateapps2.oucs.ox.ac.uk (host not found or server failure)
2004-09-17 19:13:10 [climateprediction.net] scheduler init_op_project to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed, error -113
2004-09-17 19:13:10 [climateprediction.net] scheduler init_op_project to http://climateapps2.oucs.ox.ac.uk/cpdnboinc_cgi/cgi failed, error -113
2004-09-17 19:13:10 [climateprediction.net] Deferring communication with project for 3 hours, 47 minutes, and 55 seconds
2004-09-17 19:13:10 [climateprediction.net] Deferring communication with project for 3 hours, 47 minutes, and 55 seconds

Why a "quit request" for a Trickle's comm failure? Why the huge initial time delay? |
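The behaviour asked for above (keep processing and let Trickles pile up when the network is down) could look roughly like the following. This is a minimal sketch under stated assumptions, not how BOINC or the CPDN client is actually written. `TrickleQueue` and the `send` callable are hypothetical names, and a comm failure is assumed to surface as an `OSError`.

```python
from collections import deque


class TrickleQueue:
    """Hypothetical holding area for Trickles that could not be sent yet."""

    def __init__(self, send):
        # `send` is an assumed callable that uploads one trickle and
        # raises OSError when the network or the server is unreachable.
        self._send = send
        self._pending = deque()

    def submit(self, trickle):
        """Queue a new trickle and try to send everything that is waiting."""
        self._pending.append(trickle)
        self.flush()

    def flush(self):
        """Send queued trickles in order; return how many went out.

        A comm failure stops the flush but is never raised to the caller,
        so the model loop can keep running.
        """
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                break  # network down: leave the trickle queued for later
            self._pending.popleft()  # drop it only after a successful send
            sent += 1
        return sent

    def __len__(self):
        return len(self._pending)
```

The model would call `submit()` at each Trickle point and carry on with the next timestep whatever the outcome. A later `flush()`, once the DSL line is back, drains the backlog in order, and each Trickle leaves the queue only after a successful send.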
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
Are these running BOINC version 4.05? I would try 4.09, as I think it fixes a lot of these oddities. It should be linked on the download page here now. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
> are these BOINC version 4.05, I would try 4.09 as I think it fixes a lot of these oddities. It should be linked on the download page here now.

Hi, Carl, Roger on the 4.05. I'll install 4.09 after uploads of Models, which are now in Phase 3. (Meanwhile, I'm trusting the ISP's new Routers--> because all eight Models on my four machines required the file-size change...) One puckery change at a time.... Thanks. Jim |