CPDN you may recycle these, I know I killed them.
Author | Message |
---|---|
Send message Joined: 30 Aug 04 Posts: 9 Credit: 15,780 RAC: 0 |
262958 252242 26 Sep 2004 4:14:10 UTC --- In Progress Unknown New 0.00 0.00; 30569 20718 30 Aug 2004 23:18:54 UTC --- In Progress Unknown New 179644.00 453.68; <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?userid=3566&PHPSESSID=bbdead271526d179c24d7c5d8dd95971">http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?userid=3566&PHPSESSID=bbdead271526d179c24d7c5d8dd95971</a> Don't wanna make ya wait a year. ----------------------- Click to see my tag <a href="http://boinc.mundayweb.com/one/stats.php?userID=1049">My tag</a> SNAFU'ed? Turn the Page! :D |
Send message Joined: 4 Sep 04 Posts: 61 Credit: 80,585 RAC: 0 |
I've got a couple of results that can be recycled, too: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=139624 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=281738 In the first, the machine died a horrible death at the hands of a mutinous power supply. The second machine was detached when I realized there was no way it could possibly finish the work unit within the deadline. trane |
Send message Joined: 26 Aug 04 Posts: 59 Credit: 438,133 RAC: 0 |
I have quite a few units too that may be recycled, if possible: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=25426 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=26392 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=281885 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=426705 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=450785 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=496961 Quite a lot, I know :-( |
Send message Joined: 5 Aug 04 Posts: 390 Credit: 2,475,242 RAC: 0 |
I believe there is an automatic resend mechanism. Each WU can be sent to up to 5 users. If your model crashes, that should be reported automatically to the server database. Still, it would be good to know the reason for the crash, to prevent further loss of computation. |
Send message Joined: 15 Jan 05 Posts: 31 Credit: 1,249,348 RAC: 0 |
I’ve got some WUs that BOINC downloaded after a fatal crash, but I have since restored BOINC to a previous state. I copied the BOINC directory before restoring, so can I copy the new WUs into the working directory …/projects/climate prediction.net ? Do I need to update any *.xml in the main BOINC directory for it to recognise these WUs and start them when the present models are completed? BTW BOINC did not do much work on these new WUs, hence the CP server thinks they are in progress and new, see <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=93819">results</a>. The restored WUs are sending trickles to the CP server with no problems, see <a href="http://climateapps2.oucs.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=93819">computer summary</a>. My only concern is that the CP server believes these restored WUs are over, with an outcome of client error; so when a model completes, will the CP server accept it? |
Send message Joined: 25 Aug 04 Posts: 28 Credit: 6,522,252 RAC: 0 |
> "My only concern is the CP server believes these restored WUs are over with the outcome of client error; so when it completes the model will the CP server accept it ?" My experience has been that the CPDN central BOINC servers are pretty good at sorting out this kind of problem. Once they see the trickles coming in for a given model from the same client, the results seem to get merged. With a model restored from a backup, you start getting credit again once you pass the point of the last valid trickle held by the server. Andrew <a href="http://cpdnforum.info">CPDNforum</a> |
Send message Joined: 22 Oct 04 Posts: 1 Credit: 289,691 RAC: 0 |
There is a need, I believe, for users to be able to manually flag a CPDN WU as dead. This is unique to CPDN. Most BOINC projects have deadlines of a fortnight at most for WUs to be crunched. If a result is not returned, by default the server reissues it. Thus the timescale for analysing results is not massively delayed. CPDN, with its lengthy run times (necessary due to the size of the WUs), can be unaware for months that a result has failed. The WU hangs in limbo on the server as "in progress". It could take a year before it is reissued, only to suffer the same delay. That, I expect, harms CPDN's ability to analyse results. Where there is a complete hardware failure, the client computer will not return an error message. Thus the result stays as "in progress". I have a number of these due to issues at my end on one PC that had an intermittent CPU / motherboard fault, such that the hard disk was reformatted and new CPDN WUs were downloaded without the old ones being properly terminated. Such dead WUs include: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=309504 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=312529 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=317599 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=329727 |
Send message Joined: 27 Aug 04 Posts: 55 Credit: 1,106,201 RAC: 0 |
> "There is a need, I believe, for users to be able to manually flag a CPDN WU as dead. This is unique to CPDN. Most BOINC projects have deadlines of a fortnight at most for WUs to be crunched. If a result is not returned, by default the server reissues it. Thus the timescale for analysing results is not massively delayed. CPDN, with its lengthy run times (necessary due to the size of the WUs), can be unaware for months that a result has failed. The WU hangs in limbo on the server as "in progress". It could take a year before it is reissued, only to suffer the same delay. That, I expect, harms CPDN's ability to analyse results. Where there is a complete hardware failure, the client computer will not return an error message. Thus the result stays as "in progress"." I agree w/ this post, and also feel there is an additional point supporting the argument: since a CP WU generally takes anywhere from 3 wks to several months to complete, there is an even greater probability of a computing or operator error occurring for any given WU. Mains A.C. power failures, accidental reboots or shutdowns, hardware failures, & other unexpected issues DO occur from time to time, and are unavoidable. If this were not the case, the commercial computing & telecom industries would not have been ardently, and unsuccessfully, chasing the elusive "five nines" for the past several decades!!! I've run in several gauntlets for a DC project called Seventeen-or-Bust, which is a subset of GIMPS (Mersenne Primes Search). There is an area on the project board that shows all WUs (factoring exponents, actually) which are assigned to your user_ID. The same page then has a function which allows the user to "release" any of their assigned WUs immediately back to the WU pool. This works excellently in theory & in practice, and came in really handy when I fouled up a service install when I was new to the project. 
JMO Strat |
Send message Joined: 10 Oct 04 Posts: 223 Credit: 4,664 RAC: 0 |
I believe that there are more possible cpdn models (all the possible parameter combinations) than we can realistically ever do, even if we all recruit extra friends, family and machines. So some possible models will probably never be crunched. The important thing seems to be to complete as many models as we can so that the researchers have the largest possible data set, rather than worrying about the fate of particular failed models. They will be automatically reissued anyway. The delays before models are reissued would only matter if we were obliged to complete all possible models before a particular date. So don't worry and just keep crunching, if possible sorting out the problems that caused the crash. __________________________________________________ |
Send message Joined: 30 Aug 04 Posts: 9 Credit: 15,780 RAC: 0 |
yes, but still a waste to have 1000 or more waiting for one more 'completed' before being validated. Edit: and to add, even those projects that have a short due date are lengthened by a considerable percentage while waiting for that last non-error completion. ----------------------- Click to see my tag <a href="http://boinc.mundayweb.com/one/stats.php?userID=1049">My tag</a> SNAFU'ed? Turn the Page! :D |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I read on another thread (somewhere) that if the server hasn't had a trickle from a host for 6 weeks, it labels the model 'dead' and re-issues it, up to the limit of 5 attempts, I guess. But I haven't seen any "official" documentation. Les |
Send message Joined: 30 Aug 04 Posts: 9 Credit: 15,780 RAC: 0 |
Edit: Never mind, 6 weeks is a long time. ----------------------- Click to see my tag <a href="http://boinc.mundayweb.com/one/stats.php?userID=1049">My tag</a> SNAFU'ed? Turn the Page! :D |
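
For anyone who wants the reissue rule from the posts above spelled out, here is a rough sketch in Python. It is not CPDN's or BOINC's actual server code: the 6-week trickle timeout and the limit of 5 attempts are taken from the unofficial recollections in this thread, and every name in it (`Result`, `next_action`, `worst_case_delay`) is made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Both values are assumptions taken from this thread, not official figures.
TRICKLE_TIMEOUT = timedelta(weeks=6)  # silence after which a model counts as dead
MAX_ATTEMPTS = 5                      # how many hosts one WU may be sent to


@dataclass
class Result:
    """One copy of a WU issued to one host (hypothetical structure)."""
    last_trickle: datetime  # time of the last trickle, or of issue if none yet
    attempt: int            # 1 for the first host, 2 for the second, ...


def next_action(result: Result, now: datetime) -> str:
    """Decide what the server would do with this result under the assumed rule."""
    if now - result.last_trickle < TRICKLE_TIMEOUT:
        return "in progress"  # host is still reporting, leave it alone
    if result.attempt < MAX_ATTEMPTS:
        return "reissue"      # host looks dead, send the WU to another host
    return "abandon"          # attempt limit reached, no further copies


def worst_case_delay() -> timedelta:
    """Longest a WU could sit idle if every host dies right after download."""
    return MAX_ATTEMPTS * TRICKLE_TIMEOUT


now = datetime(2005, 3, 1)
print(next_action(Result(now - timedelta(weeks=2), attempt=1), now))
print(next_action(Result(now - timedelta(weeks=7), attempt=1), now))
print(next_action(Result(now - timedelta(weeks=7), attempt=5), now))
print(worst_case_delay().days, "days")
```

If those two numbers are right, a WU whose hosts all vanish immediately would be given up on after about 30 weeks rather than hanging for years, which is the trade-off being argued about above. A manual "release" button would simply move a result to the timed-out branch straight away.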