climateprediction.net (CPDN) home page
Thread 'March is a Lost Month'

Questions and Answers : Unix/Linux : March is a Lost Month
old_user1265

Joined: 26 Aug 04
Posts: 13
Credit: 458,996
RAC: 0
Message 11461 - Posted: 27 Mar 2005, 2:53:07 UTC

Since March 5th, every model run I've downloaded
(7 and counting) has failed before the end of
phase 1, usually around credit 1134.21, with a
Client Error. This is on 2 different Linux hosts,
each running model 4.11.

Before the jump from 4.04 to 4.10 to 4.11,
almost all my model runs finished normally; now I
haven't had one in almost a month.

Is this a misconfig, or are the models just buggy?
ID: 11461
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 11477 - Posted: 29 Mar 2005, 19:39:01 UTC

This is a problem with the Linux CPDN client. See other recent posts to this forum for more complaints about stability of 4.11.
ID: 11477
old_user1265
Joined: 26 Aug 04
Posts: 13
Credit: 458,996
RAC: 0
Message 11487 - Posted: 30 Mar 2005, 4:53:36 UTC - in response to Message 11477.  

> This is a problem with the Linux CPDN client. See other recent posts to this
> forum for more complaints about stability of 4.11.
>
I figured as much. Looking at my machines, I have 2 more model runs
that just started up on 4.11; I'm sure they will die soon enough.

Does anyone know how to move a work unit from one machine to another? I have a
slow machine that has almost a year's worth of 4.04 models yet to do on it,
which I would like to move to faster machines that are starving for work. But
I don't want to tar up the whole dir, because that would replace the identity. I'd
just like to copy the proper client_state.xml and whatever else is
needed so the faster machines will pick up the copied model as if it had been
originally given to them. I've attempted to copy selected files
before, but BOINC doesn't seem to like my changes and blows them off.
ID: 11487
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 11897 - Posted: 18 Apr 2005, 1:15:33 UTC - in response to Message 11477.  

> This is a problem with the Linux CPDN client. See other recent posts to this
> forum for more complaints about stability of 4.11.
>
I am running 4.13 and most of them crash too. I have not gotten to phase 2 in I do not remember how long.
ID: 11897
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 11898 - Posted: 18 Apr 2005, 1:23:11 UTC - in response to Message 11897.  

> > This is a problem with the Linux CPDN client. See other recent posts to this
> > forum for more complaints about stability of 4.11.
> >
> I am running 4.13 and most of them crash too. I have not gotten to phase 2 in
> I do not remember how long.
>
P.S.: It mainly complains of no heartbeat in 31 seconds...
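
For readers unfamiliar with this kind of message: it indicates that the science application stopped because it heard nothing from the BOINC core client within a time limit. The sketch below shows a generic heartbeat watchdog of that sort. It is not BOINC's actual code; the class name `HeartbeatWatchdog`, the injectable clock, and the use of 31 seconds as the threshold (taken from the wording of the error) are all illustrative assumptions.

```python
import time

# Assumption: threshold borrowed from the "31 sec" in the error message.
HEARTBEAT_TIMEOUT_SEC = 31


class HeartbeatWatchdog:
    """Hypothetical helper that remembers when the last heartbeat arrived."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT_SEC, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so it can be simulated
        self.last_beat = clock()

    def beat(self):
        # Called whenever the controlling client signals that it is alive.
        self.last_beat = self.clock()

    def expired(self):
        # True once more than `timeout` seconds have passed without a beat.
        return self.clock() - self.last_beat > self.timeout


if __name__ == "__main__":
    now = [0.0]                     # simulated clock, in seconds
    dog = HeartbeatWatchdog(clock=lambda: now[0])
    now[0] = 20.0
    print(dog.expired())            # 20 s of silence: still within the limit
    now[0] = 32.0
    print(dog.expired())            # 32 s of silence: the worker would exit
```

If a watchdog like this is what triggers the exit, the failure would point at the client not delivering heartbeats in time (for example because it is stalled or starved of CPU) rather than at the model data itself, though that is only a guess from the message text.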
ID: 11898
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 11995 - Posted: 21 Apr 2005, 11:21:20 UTC - in response to Message 11897.  

> > This is a problem with the Linux CPDN client. See other recent posts to this
> > forum for more complaints about stability of 4.11.
> >
> I am running 4.13 and most of them crash too. I have not gotten to phase 2 in
> I do not remember how long.
>
This BOINC is really frustrating.

It has been running 3 instances of climateprediction on my machine for the last several days, yet there have been no trickles at all in about 5 days. Furthermore, this machine has 2 hyperthreaded 3.06GHz Xeon processors, so it should be running four applications most of the time. It seldom does, although once in a while it runs five, which it should not. So the BOINC client is, IMAO, defective for one thing. And I guess the 4.13 application, or its data, is bad too, since I never get out of Phase 1 anymore. I used to.
ID: 11995
Andrew Hingston
Volunteer moderator
Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 12002 - Posted: 21 Apr 2005, 13:07:49 UTC - in response to Message 11995.  


> It has been running 3 instances of climateprediction on my machine for the
> last several days, yet no trickles at all in about 5 days.

Nobody has had any trickles credited since 18 April. This is a server fault, and should not affect uploads or result in data being lost. So unless there are messages indicating a problem communicating with the server, you can ignore this particular problem for now.
ID: 12002
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 12149 - Posted: 29 Apr 2005, 20:51:46 UTC - in response to Message 11898.  

> > > This is a problem with the Linux CPDN client. See other recent posts to
> > > this forum for more complaints about stability of 4.11.
> > >
> > I am running 4.13 and most of them crash too. I have not gotten to phase 2
> > in I do not remember how long.
> >
> P.S.: Mainly complains No Heartbeat in 31 seconds...
>
I am still getting mostly failures; here is a typical one:

Server state: Over
Outcome: Client error
Client state: Computing
Exit status: 251 (0xfb)
Host ID: 45631
Report deadline: 16 Mar 2006 17:41:39 UTC
CPU time: 370877.60
stderr out:

4.19
process exited with code 251 (0xfb)

1
0

No heartbeat from core client for 31 sec - exiting
No heartbeat from core client for 31 sec - exiting



I am running a Dual Hyperthreaded Xeon system with 4 GBytes RAM that is up 24/7. Does everyone experience this, or should I do something? If so, what?
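
To put the numbers in the result above in perspective, here is a small sketch that converts the exit status and CPU time into friendlier units. It assumes the CPU time field is reported in seconds; the function name `summarize_result` is made up for illustration.

```python
def summarize_result(exit_status: int, cpu_seconds: float) -> dict:
    """Convert raw fields from a result record into friendlier units.

    Assumption: cpu_seconds really is in seconds, as the field appears to be.
    """
    return {
        "exit_hex": hex(exit_status),                  # 251 -> '0xfb'
        "cpu_hours": round(cpu_seconds / 3600, 1),
        "cpu_days": round(cpu_seconds / 86400, 2),
    }


if __name__ == "__main__":
    # Values copied from the failed result quoted above.
    print(summarize_result(251, 370877.60))
```

Under that assumption, this one failure represents roughly 103 hours (about 4.3 days) of CPU time.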
ID: 12149