Message boards : Number crunching : Sulphur units constantly failing
Author | Message |
---|---|
Joined: 15 Apr 05 Posts: 10 Credit: 129,186 RAC: 0 | Ever since getting Sulphur 4.22 downloaded to my STABLE machine, this has happened. Any explanations? |
Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 | "Ever since getting Sulphur 4.22 downloaded to my STABLE machine, this has happened. Any explanations?" You should post the workunits; we can't see your results page. Well, I think Sulphur 4.22 has some problems. I hope the next experiment or the next Sulphur version will be more stable. |
Joined: 15 Apr 05 Posts: 10 Credit: 129,186 RAC: 0 | "You should post the workunits; we can't see your results page." There are quite a few: 1780404, 1775160, 1709086, 1626999, 1619956, 1617821, 1617728 (I reset the project after this WU failed but before it reported, so it still shows as active), and 1618826. I just downloaded and started 1782971. I have run CPDN on this machine since April alongside SETI, Einstein, LHC, and PrimeGrid without any trouble until now. |
Joined: 15 Apr 05 Posts: 10 Credit: 129,186 RAC: 0 | I just noticed this thread, so I'll be watching that one. |
Joined: 8 Jul 05 Posts: 33 Credit: 1,274,211 RAC: 0 | Phew, I thought it was just me... I haven't had much luck at all with CPDN; I haven't finished a WU yet, for various reasons. |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 | I just had another 4.22 die on me at 40% or 60% (I forgot to write it down). It was this one. I know it says "aborted by GUI", but the app was not running, and the directory had only files that were 1K in size (I saved a copy; send me a note if you want the copy of the slots dir). Very strange indeed. The only good news, I guess, is that I still have three 4.19 models and they seem to be running well. The tension is rising... 1 hour something on one of them. :) I hate to be brusque, but are any of the 4.22 models completing? And even if not, is it worth my time to run partials? I know you probably told me somewhere... Anyhow, p.d.buck@comcast.net if you want the slots dir... I'm not sure what good it will do; all the files are small, and zips... a runaway delete-all-files gremlin? No matter, I suppose... :) |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 | If people can finish phase one, there is extra info in it (compared to slab) that is very useful. After that, the team needs all of the rest. Frequent backups are a must! |
Joined: 7 Aug 04 Posts: 2186 Credit: 64,822,615 RAC: 5,275 | |
Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 | "If people can finish phase one, there is extra info in it (compared to slab) that is very useful. After that, the team needs all of the rest." Daily backups with the internet disabled are a must for 4.22. :) I keep 3 days of backups to be 100% sure. :) It's a challenge for me to finish this workunit. :) |
Joined: 8 Jul 05 Posts: 33 Credit: 1,274,211 RAC: 0 | I thought the idea of BOINC was that you can crunch multiple projects just by attaching and then leaving it be. I can't be bothered doing backups etc. Knowing my luck, I'd forget to back up. If the next WU fails, I'm gonna stop crunching CPDN until a new app is released. |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 | "Frequent backups are a must!" Well, if this is true, then it needs to be part of the application. Like Clark said, it is difficult for me to remember to do daily backups of the CPDN project: daily backups times ~80 days times 8 systems is a lot of additional work. Until now I have had models die on occasion... but on systems that had regularly completed 4.19 models without trouble... My only concern is that I am wasting my time and yours even trying to run these models. I cannot be sure, but I do not think I have completed a 4.22 model yet. Many die immediately; others seem to wait a bit... OK, I saw on the other board that some new work has been created, so I will try a couple more. ==== edit: I don't mind the space, and would like to have "rotations" (for me I would pick 2), but there is literally no way that I am going to remember to do this consistently enough to be of value. In my situation it is just too hard for *ME* to do manually. Perhaps this should become a fairly high-priority item for the devs in the next major update, when the new models are released. In all honesty, it probably should be made part of the BOINC application so that any project could avail itself of the utility. The control would be on the project's Preferences page: allow backups, or perhaps simply a number of daily backups allowed, 0, 1, 2, up to probably a max of 4... |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 | It just occurred to me that this would HAVE to be a BOINC client software change, since along with the CPDN slot folder backup, a client state file extract would also have to be made. Requiring participants to back up the entire BOINC folder is obviously impractical for those of us who run multiple projects, as the remaining project contents would be very dynamic. I am not sure how the system as a whole would react to a client error that has already been reported if such a backup system were in place, in that the "rewind" would "resurrect" the dead! :) |
Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 | Guys, I am not sure if this is completely relevant, but I had two idle computers and downloaded work on both. One has 12 hours in on the work; the other died immediately. Looking at the ones that are dying right away, all of them were created recently. The one that started running was generated back in August. I mean, it may die soon too... but are we sure that the work being created now is valid? This is really odd to me: computers that were recently successful at running models now cannot even start one? |
Joined: 5 Aug 04 Posts: 66 Credit: 2,146,056 RAC: 0 | "I hate to be brusque, but are any of the 4.22 models completing?" I have had two machines having problems with 4.22, but two others are not showing any signs of stress. So far, one 4.22 model has completed, and the next most complete is nearing the end of phase 3. I don't take any precautions: no backups, and I never stop before a defrag or before shutdown. Internet always on. |
Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 | "If people can finish phase one, there is extra info in it (compared to slab) that is very useful. After that, the team needs all of the rest." I managed to finish the first phase. :) I don't know who has done more work, the CPU or me with the backups. ;) |
Joined: 10 Jan 06 Posts: 55 Credit: 2,520,659 RAC: 4,227 | I would have to agree with Paul on that. I've only remembered to back up once so far, and that was only one of the two CPDN models I have running. |
Joined: 16 Dec 05 Posts: 27 Credit: 242,905 RAC: 1,153 | Wow... this many problems? Hmm, sounds like both Climateprediction.net and BOINC have a lot of work ahead of them. Is it important that people keep having these errors and sending in results, or would it be better for climateprediction.net to halt the 4.22s and have people work on a more stable work unit? I would like these results to help out as much as possible, as I'm sure everyone else doing this is hoping. So what should we do? CPDN, any comments on this? Thanks. Oh, there is no way that I'm going to make backups on a daily or even a weekly basis, so something's going to have to change. |
Joined: 5 Aug 04 Posts: 85 Credit: 2,924,043 RAC: 0 | "Frequent backups are a must!" It seems that some lucky bastards don't need to do extra work... I do backups only once a month, just before stepping up to the next phase! Until now, no problems at all with Sulphur 4.22 and BOINC 5.3.x under Linux (and yes, connected to the Internet 24/7). |
Joined: 16 Dec 05 Posts: 27 Credit: 242,905 RAC: 1,153 | Actually, now that belgix has put up his post, I see that I could try to back up the model when it gets close to phase 2. What would be a recommended way to back up CPDN? Thanks, everyone! |
Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 | "Frequent backups are a must!" Every time an application uses 100% CPU for a while, Sulphur 4.22 crashes. It can be a game or another kind of application; the result is the same: a crash. I hope that after the BBC project release the devs will correct this problem. |
©2024 cpdn.org