climateprediction.net (CPDN) home page
Thread 'Server State Over, but wu is in progress!'

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 19239 - Posted: 13 Jan 2006, 0:53:37 UTC

6000+ !
Oops! Sorry. Fingers crossed.

ID: 19239
Ananas
Volunteer moderator

Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 19240 - Posted: 13 Jan 2006, 1:24:16 UTC

Well, nearly 500 have a RAC of zero now so it's only 5500+ - I guess we can handle that ;-)

Actually it will not be that extreme: most of them are not active forum members, and of those who are in the forum, many do not have problems, and many problems have been solved by other team members. Many people signed up in the forum because they had problems, and we have had quite a lot of new forum members lately.

The smaller of the two teams (~200 crunchers) has TheBigJens (housekeeper of boinc.de), so there are fewer problems anyway :-)
ID: 19240
old_user5994

Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 19248 - Posted: 13 Jan 2006, 10:17:00 UTC - in response to Message 19225.  
Last modified: 13 Jan 2006, 10:40:09 UTC

As long as Berkeley keeps adding one new feature each day to keep the bugs happy, 5.x isn't really an option for me. Too many headless crunchers to install a bugfixed version with new surprises every week.

YMMV ... :)

I recall many had lots of problems with 4.19 too ... in particular upload/downloading issues ...

If you have multiple computers, well, start with one and let it get comfortable, and you too, before you migrate. I only have 9 systems, but even so I don't like to upgrade constantly. My USUAL recommendation is to find a stable version and stick with it.

At the moment, I am standardized on 5.2.13; I went from 4.19 to 4.25, to 4.35, to 4.45, to 4.72 and lastly to 5.2.13 ... if there were too many issues with a version, I would revert down ...

Anyway, MOST of the problems with 5.2.13 are manageable. Many of the newer features are not immediately compelling, I agree, but better client-server handshaking with fewer download issues is one reason *I* would move up (and did).

==== Edit

And of course, as soon as I say that ... Bruce Allen finds a serious bug in 5.2.13 that suggests updating from it when the next one comes out ... :)

Ah, Murphy at work ...

ID: 19248
Ananas
Volunteer moderator

Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 19530 - Posted: 22 Jan 2006, 15:23:22 UTC

@Les : Let's start with this -161 :

PP_CTL: Error Buffering in Fixed length Header
Empty PP File in Climate Mode?


is the only thing I can see in the out file.

It belongs to this host

It already has quite a collection of errors, the problems started with Sulphur.

It is not one of the DirectX crashes; the owner explained that the errors mostly happened at night, when he isn't using his PC.
ID: 19530
Ananas
Volunteer moderator

Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 19532 - Posted: 22 Jan 2006, 15:36:40 UTC

For the BOINC trouble collection :

A problem that happens all the time with later BOINC versions (> 5.2) is that the scheduler sticks to one project ("overcommitted" message).
___________________________

Some hosts (or project combinations?) don't download work anymore after a while ("not requesting work"), even if they have run dry.

I asked a team mate who had it to reset all his >debt< entries in client_state.xml and that fixed the problem for him. I forgot to ask what they contained before he reset them though :-/

Others reset the projects to get rid of the problem. The reset resets the >debt< entries too, of course, so I guess the debt values are the main reason.
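
For anyone who wants to try the debt reset described above without hand-editing, here is a minimal sketch in Python. It works on an in-memory sample, not on a real file. The tag names `short_term_debt` and `long_term_debt` and the sample layout are assumptions about how BOINC 5.x stores its debts, so check them against your own client_state.xml first. The sketch also keeps the old values, which is exactly the information that was lost in the case above. Editing the real file should presumably be done with the client stopped and a backup copy made beforehand.

```python
import xml.etree.ElementTree as ET

# Assumed tag names for the per-project debt entries (not confirmed in this thread).
DEBT_TAGS = ("short_term_debt", "long_term_debt")

# Hypothetical, heavily trimmed stand-in for a client_state.xml file.
SAMPLE = """<client_state>
  <project>
    <master_url>http://example.org/a/</master_url>
    <short_term_debt>-86400.000000</short_term_debt>
    <long_term_debt>123456.789000</long_term_debt>
  </project>
  <project>
    <master_url>http://example.org/b/</master_url>
    <short_term_debt>86400.000000</short_term_debt>
    <long_term_debt>-123456.789000</long_term_debt>
  </project>
</client_state>"""


def reset_debts(root):
    """Zero every debt element and return the old (tag, value) pairs."""
    old_values = []
    for tag in DEBT_TAGS:
        for elem in root.iter(tag):
            old_values.append((tag, elem.text))
            elem.text = "0.000000"
    return old_values


root = ET.fromstring(SAMPLE)
previous = reset_debts(root)

# Report what the entries contained before the reset.
for tag, value in previous:
    print(tag, "was", value)
```

Printing the previous values before overwriting them makes it possible to report afterwards what the debts looked like when the host stopped requesting work.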
ID: 19532
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 19562 - Posted: 23 Jan 2006, 3:34:59 UTC

Ananas
Re: the 'overcommitted' machines: are they older/slower? If so, they may really BE overcommitted with too many projects for their processing power.

As for:
PP_CTL: Error Buffering in Fixed length Header
Empty PP File in Climate Mode?

Doesn't mean anything to me. Unless the user is trying to use a network drive, which is a no-no.

ID: 19562
Ananas
Volunteer moderator

Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 19571 - Posted: 23 Jan 2006, 8:11:22 UTC - in response to Message 19562.  
Last modified: 23 Jan 2006, 8:13:55 UTC

Ananas
Re: the 'overcommitted' machines: are they older/slower? If so, they may really BE overcommitted with too many projects for their processing power.


It doesn't seem to be restricted to particular computers; even dual-core machines get it now and then. One teammate posted this:

Message from server: (won't finish in time) Computer on -64.4% of time, BOINC on 100.0% of that, this project gets 50.0% of that

so it seems to be quite clearly a BOINC bug, not just a normal result of calculations. I didn't ask about lightspeed travels though ;-)
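
To make the arithmetic in that server message explicit, here is a rough sketch. It assumes that each "of that" means the three percentages are simply multiplied together; that reading and the function names are mine, not something taken from the BOINC source.

```python
def effective_fraction(on_frac, active_frac, project_share):
    """Fraction of wall-clock time one project gets, assuming plain multiplication."""
    return on_frac * active_frac * project_share


def is_plausible(fraction):
    """A share of time can only lie between 0 and 1."""
    return 0.0 <= fraction <= 1.0


# Values quoted in the message: -64.4 %, 100.0 %, 50.0 %.
quoted = effective_fraction(-0.644, 1.0, 0.5)
print(round(quoted, 3))      # -0.322
print(is_plausible(quoted))  # False

# The same host with a sane "computer on" figure of 99.99 %.
repaired = effective_fraction(0.9999, 1.0, 0.5)
print(is_plausible(repaired))  # True
```

With a negative share of time, any estimate of when the work would finish comes out nonsensical, which would be consistent with the "won't finish in time" refusal even on a fast machine.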


As for:
PP_CTL: Error Buffering in Fixed length Header
Empty PP File in Climate Mode?

Doesn't mean anything to me. Unless the user is trying to use a network drive, which is a no-no.


This PP_CTL thing was the only message that looked a little wrong in yabsd.out; it was not a message from BOINC. It is one of those lovely -161 errors with exit status 0, which is why I asked the owner for his yabsd.out.
ID: 19571
Keck_Komputers
Joined: 5 Aug 04
Posts: 426
Credit: 2,426,069
RAC: 0
Message 19603 - Posted: 24 Jan 2006, 5:17:13 UTC

Message from server: (won't finish in time) Computer on -64.4% of time, BOINC on 100.0% of that, this project gets 50.0% of that

This looks like a problem: the computer can't be on for a negative percentage of the time. You may need to edit your time stats in client_state.xml to make this a positive number.
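
A minimal sketch of that repair, for illustration only. The `<time_stats>` block with `on_frac` and `active_frac` children is my assumption about the file layout, and the sample below is hypothetical. The replacement value 0.9999 is the one mentioned elsewhere in this thread. As with any edit to client_state.xml, it is presumably safest to stop the client and keep a backup first.

```python
import xml.etree.ElementTree as ET

# Assumed names of the fraction entries inside <time_stats>.
FRACTION_TAGS = ("on_frac", "active_frac")
REPLACEMENT = "0.999900"

# Hypothetical, trimmed sample with the kind of negative value seen above.
SAMPLE = """<client_state>
  <time_stats>
    <on_frac>-0.644000</on_frac>
    <active_frac>1.000000</active_frac>
  </time_stats>
</client_state>"""


def repair_time_stats(root):
    """Replace any fraction outside 0..1 and return the tags that were changed."""
    repaired = []
    stats = root.find("time_stats")
    if stats is None:
        return repaired
    for tag in FRACTION_TAGS:
        elem = stats.find(tag)
        if elem is None:
            continue
        try:
            value = float(elem.text)
        except (TypeError, ValueError):
            value = float("nan")  # unreadable text is treated as invalid
        if not 0.0 <= value <= 1.0:
            elem.text = REPLACEMENT
            repaired.append(tag)
    return repaired


root = ET.fromstring(SAMPLE)
changed = repair_time_stats(root)
print("repaired:", changed)
```

Only out-of-range values are touched, so a healthy entry keeps whatever the client measured.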
BOINC WIKI

BOINCing since 2002/12/8
ID: 19603
Ananas
Volunteer moderator

Joined: 31 Oct 04
Posts: 336
Credit: 3,316,482
RAC: 0
Message 19607 - Posted: 24 Jan 2006, 19:21:35 UTC
Last modified: 24 Jan 2006, 19:25:58 UTC

It's not an open problem in this case; I already told him to zero all debts and put 0.9999 into all time_stats.

But it's one of those errors that you can find quite often on BOINC boards and team fora. Not so good, especially for headless crunchers.
ID: 19607
old_user733
Joined: 9 Aug 04
Posts: 25
Credit: 4,756,979
RAC: 0
Message 20287 - Posted: 15 Feb 2006, 16:37:14 UTC

On this note, I have a machine which is running a sulphur cycle 4.19 model, this is the WU:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=711985

It is now overdue by 6 days and the BOINC CC is telling me I should abort it, but it is on phase 5. It seems that the deadline was a bit short on this WU: I got it on Sept. 12, 05, and I thought these had about a year to finish.

Anyway, the WU is listed as "over" with "too many total results" as the error. Should I abort? I'm really reluctant to do so since it is on phase 5, unless the result is worthless. It is continuing to run normally otherwise.

The machine is a 1GHz P3; it has no other project running at the moment, and BOINC has been in EDF mode essentially since I got this WU. Now running BOINC CC 5.2.15.
ID: 20287
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 20288 - Posted: 15 Feb 2006, 17:00:25 UTC - in response to Message 20287.  

It is now overdue by 6 days and the BOINC CC is telling me I should abort it, but it is on phase 5. It seems that the deadline was a bit short on this WU: I got it on Sept. 12, 05, and I thought these had about a year to finish.

Anyway, the WU is listed as "over" with "too many total results" as the error. Should I abort? I'm really reluctant to do so since it is on phase 5, unless the result is worthless. It is continuing to run normally otherwise.

The machine is a 1GHz P3; it has no other project running at the moment, and BOINC has been in EDF mode essentially since I got this WU. Now running BOINC CC 5.2.15.

Don't abort! The CC is stupid here: CPDN does not care that the deadline has passed, though with other projects it is a big deal. The original batch of work units was sent out with too short a deadline. Later sulphurs had a deadline of about a year. Congratulations on getting so far with your P3 1 GHz and good luck on finishing!
ID: 20288
old_user733
Joined: 9 Aug 04
Posts: 25
Credit: 4,756,979
RAC: 0
Message 20290 - Posted: 15 Feb 2006, 17:14:11 UTC - in response to Message 20288.  

Don't abort! The CC is stupid here: CPDN does not care that the deadline has passed, though with other projects it is a big deal. The original batch of work units was sent out with too short a deadline. Later sulphurs had a deadline of about a year. Congratulations on getting so far with your P3 1 GHz and good luck on finishing!


OK, OK!! I'll take my finger off the button! ;-) Thanks for the quick reply.

Seriously, though, should I detach this machine from CPDN in the future? It looks like the new models will, if anything, require even more powerful CPUs. This machine is at the low end of my collection, so I will still be running CPDN with other, faster machines in any case. Is there a minimum recommended machine?

Thanks.
-Gene

ID: 20290
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 20294 - Posted: 15 Feb 2006, 18:28:16 UTC

I think 1GHz is the minimum recommended at the moment, but that's running 24/7, and dedicated to CPDN.
The less run time per day, and the more projects, the higher the recommended speed to compensate.
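
As a rough rule of thumb, that compensation can be sketched as below. It assumes the recommended speed scales inversely with the share of the day the machine actually spends on CPDN, which is my simplification and not an official project formula; clock speed is also only a crude proxy for model throughput. The function and parameter names are made up for the example.

```python
BASELINE_GHZ = 1.0  # from the post above: 24/7 and dedicated to CPDN


def recommended_ghz(hours_per_day, cpdn_share):
    """Scale the 1 GHz baseline by how much of the day CPDN really gets.

    hours_per_day: hours the machine crunches each day (0 < h <= 24)
    cpdn_share:    fraction of crunching time given to CPDN (0 < s <= 1)
    """
    if not 0 < hours_per_day <= 24:
        raise ValueError("hours_per_day must be in (0, 24]")
    if not 0 < cpdn_share <= 1:
        raise ValueError("cpdn_share must be in (0, 1]")
    return BASELINE_GHZ / ((hours_per_day / 24.0) * cpdn_share)


print(recommended_ghz(24, 1.0))  # 1.0, the baseline case
print(recommended_ghz(12, 0.5))  # 4.0, half the day and half the share
```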

The main thing is regular trickles to let the server know the model is still running.

ID: 20294
old_user56785
Joined: 23 Feb 05
Posts: 55
Credit: 240,119
RAC: 0
Message 20320 - Posted: 16 Feb 2006, 0:45:52 UTC - in response to Message 20290.  

Seriously, though, should I detach this machine from CPDN in the future? It looks like the new models will, if anything, require even more powerful CPUs. This machine is at the low end of my collection, so I will still be running CPDN with other, faster machines in any case. Is there a minimum recommended machine?

Thanks.
-Gene


You can expect 466 days of 24/7 running for a hadcm3 model on a 1 GHz machine. In energy terms that is 600 to 700 kWh. A multi-core machine will do it faster and with less energy use.
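
A quick back-of-the-envelope check of those figures, using only the numbers quoted above (466 days, 600 to 700 kWh); the function name is just for illustration. It shows what average power draw the estimate implies.

```python
DAYS = 466
HOURS = DAYS * 24  # total run time in hours at 24/7


def average_watts(energy_kwh, hours):
    """Average power in watts implied by an energy total over a run time."""
    return energy_kwh * 1000.0 / hours


low = average_watts(600, HOURS)
high = average_watts(700, HOURS)

print("run time:", HOURS, "hours")
print("implied average draw: %.1f to %.1f W" % (low, high))
```

So the estimate corresponds to a machine drawing somewhere in the mid-50s to low-60s of watts on average for well over a year.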

This is one of the reasons that my single-CPU machines of 1 GHz or less are not used anymore, and some dual 1 GHz machines are toggled on/off on a need-for-room-heating basis, while faster PCs run 24/7.

Of course everyone has to make their own decision; I hope the above helps you with that.




ID: 20320

©2024 cpdn.org