Server State Over, but wu is in progress!
Author | Message |
---|---|
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
6000+ ! Oops! Sorry. Fingers crossed. |
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
Well, nearly 500 have a RAC of zero now so it's only 5500+ - I guess we can handle that ;-) Actually it will not be that extreme: most of them are not active forum members, and of those who are in the forum, many do not have problems and many problems have been solved by other team members. Many people signed up in the forum because they had problems, and we had quite a lot of new forum members lately. The smaller of the two teams (~200 crunchers) has TheBigJens (housekeeper of boinc.de) so there are fewer problems anyway :-) |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
As long as Berkeley keeps adding one new feature each day to keep the bugs happy, 5.x isn't really an option for me. Too many headless crunchers to install a bugfixed version with new surprises every week. YMMV ... :) I recall many had lots of problems with 4.19 too ... in particular upload/downloading issues ... If you have multiple computers, well, start with one and let it get comfortable, and you too, before you migrate. I only have 9 systems, but I also don't like to constantly upgrade even so. My USUAL recommendation is to find a stable version and stick with it. At the moment, I am standardized on 5.2.13; I went from 4.19 to 4.25, to 4.35, to 4.45, to 4.72 and lastly to 5.2.13 ... if there were too many issues with a version, I would revert down ... Anyway, MOST of the problems with 5.2.13 are manageable. Many of the newer features are not immediately compelling, I agree, but better client-server handshaking with fewer download issues is one reason *I* would move up (and did). ==== Edit And of course, as soon as I say that ... Bruce Allen finds a serious bug in 5.2.13 that suggests updating from it when the next one comes out ... :) Ah, Murphy at work ... |
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
@Les : Let's start with this -161 : "PP_CTL: Error Buffering in Fixed length Header" is the only thing I can see in the out file. It belongs to this host. It already has quite a collection of errors; the problems started with Sulphur. It is not one of the DirectX crashes: the owner explained that they mostly happened at night when he isn't using his PC. |
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
For the BOINC trouble collection : A problem that happens all the time with later BOINC versions (> 5.2) is that the scheduler sticks to one project ("overcommitted" message). ___________________________ Some hosts (or project combinations?) don't download work anymore after a while ("not requesting work"), even if they ran dry. I asked a team mate who had it to reset all his >debt< entries in client_state.xml and that fixed the problem for him. I forgot to ask what they contained before he reset them though :-/ Others reset the projects to get rid of the problem. The reset resets the >debt< entries too of course, so I guess the debt values are the main reason. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Ananas Re: the 'overcommitted' machines; Are they older/slower? If so, they may really BE overcommitted with too many projects for their processing power. As for: PP_CTL: Error Buffering in Fixed length Header Empty PP File in Climate Mode? Doesn't mean anything to me. Unless the user is trying to use a network drive, which is a no-no. |
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
Ananas It seems not to be restricted to special computers, even dual core machines get it now and then. One teammate posted this : "Message from server: (won't finish in time) Computer on -64.4% of time, BOINC on 100.0% of that, this project gets 50.0% of that", so it seems to be quite clearly a BOINC bug, not just a normal result of calculations. I didn't ask about lightspeed travels though ;-) As for: This PP_CTL thing was the only message that looked a little wrong in yabsd.out, it was not a message from BOINC. It is one of those lovely -161 errors with exit status 0, that's why I asked the owner for his yabsd.out. |
Joined: 5 Aug 04 Posts: 426 Credit: 2,426,069 RAC: 0 |
Message from server: (won't finish in time) Computer on -64.4% of time, BOINC on 100.0% of that, this project gets 50.0% of that This looks like a problem: the computer can't be on a negative percent of the time. You may need to edit your time stats in client_state.xml to make this a positive number. BOINC WIKI BOINCing since 2002/12/8 |
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
It's not an open problem in this case, I told him already to zero all debts and put 0.9999 into all time_stats. But it's one of those errors that you can find quite often in BOINC boards and team fora. Not so good, especially for headless crunchers. |
Joined: 9 Aug 04 Posts: 25 Credit: 4,756,979 RAC: 0 |
On this note, I have a machine which is running a sulphur cycle 4.19 model, this is the WU: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=711985 It is now overdue by 6 days, the BOINC CC is telling me I should abort it, but it is on phase 5. It seems that the deadline was a bit short on this WU, since I got it on Sept.12, 05, I thought these had about a year to finish. Anyway, the WU is listed as "over" with "too many total results" as the error. Should I abort? I'm really reluctant to do so since it is on phase 5, unless the result is worthless. It is continuing to run normally otherwise. The machine is a 1GHz P3, it has no other project running at the moment, BOINC has been in EDF mode essentially since I got this WU. Now running BOINC CC 5.2.15. |
Joined: 7 Aug 04 Posts: 2186 Credit: 64,822,615 RAC: 5,275 |
It is now overdue by 6 days, the BOINC CC is telling me I should abort it, but it is on phase 5. It seems that the deadline was a bit short on this WU, since I got it on Sept.12, 05, I thought these had about a year to finish. Don't abort! The CC is stupid in that CPDN does not care that the deadline has been passed, but with other projects, it is a big deal. The original bunch of work units were sent out with too short of a deadline. Later sulphurs had a deadline of about a year. Congratulations on getting so far with your P3 1 GHz and good luck on finishing! |
Joined: 9 Aug 04 Posts: 25 Credit: 4,756,979 RAC: 0 |
Don't abort! The CC is stupid in that CPDN does not care that the deadline has been passed, but with other projects, it is a big deal. The original bunch of work units were sent out with too short of a deadline. Later sulphurs had a deadline of about a year. Congratulations on getting so far with your P3 1 GHz and good luck on finishing! OK, OK!! I'll take my finger off the button! ;-) Thanks for the quick reply. Seriously, though, should I detach this machine from CPDN in the future? It looks like the new models will, if anything, require even more powerful CPU's. This machine is at the low end of my collection, so I will still be running CPDN with other, faster machines in any case. Is there a minimum recommended machine? Thanks. -Gene |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I think 1GHz is the minimum recommended at the moment, but that's running 24/7, and dedicated to cpdn. The less run time a day, and the more projects, the higher the recommended speed to compensate. The main thing is regular trickles to let the server know the model is still running. |
Joined: 23 Feb 05 Posts: 55 Credit: 240,119 RAC: 0 |
Seriously, though, should I detach this machine from CPDN in the future? It looks like the new models will, if anything, require even more powerful CPU's. This machine is at the low end of my collection, so I will still be running CPDN with other, faster machines in any case. Is there a minimum recommended machine? You can expect 466 days of 24/7 running for a hadcm3 model on a 1 GHz machine. In energy terms that is 600 to 700 kWh. A multi-core will do it faster and with less energy use. Among other things, this is the reason that my single-CPU machines of 1 GHz or less are not used anymore and some dual 1 GHz machines are toggled on/off depending on the need for room heating, while faster PCs are running 24/7. Of course everyone has to make his own decision, and I hope the above will help you with that. |
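
For anyone who wants to sanity-check the energy figure in the last post, here is a rough sketch of the arithmetic. The 466 days and the 600 to 700 kWh range come from the post; the function names and the 60 W example draw are only illustrative assumptions, not measured values.

```python
# Rough arithmetic behind "466 days 24/7 is 600 to 700 kWh".
# DAYS and the kWh range are taken from the post; everything else
# (function names, the 60 W example) is an illustrative assumption.

DAYS = 466
HOURS = DAYS * 24  # total run time in hours


def implied_watts(kwh, hours=HOURS):
    """Average power draw (W) implied by a total energy figure (kWh)."""
    return kwh * 1000 / hours


def energy_kwh(watts, hours=HOURS):
    """Total energy (kWh) used at a constant average draw (W)."""
    return watts * hours / 1000


if __name__ == "__main__":
    print(f"Run time: {HOURS} hours")
    for kwh in (600, 700):
        print(f"{kwh} kWh implies about {implied_watts(kwh):.1f} W average draw")
    # Hypothetical machine drawing 60 W on average:
    print(f"60 W for {DAYS} days is about {energy_kwh(60):.0f} kWh")
```

So the quoted range corresponds to a machine drawing somewhere in the mid-50s to low-60s of watts on average, which seems plausible for a 1 GHz box of that era. Plug in your own measured wattage to see what a full model run would cost you.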