WUs constantly failing
Author | Message |
---|---|
Send message Joined: 25 Aug 04 Posts: 41 Credit: 34,555 RAC: 0 |
I've had a batch of WUs this summer that have all failed one after another without reaching 1%. The computer is a laptop sitting in the middle of an uncluttered table, next to an open window with a breeze coming in most of the time. It is not overheating and has no other problems. It's a P4 @ 3.2 GHz with 512 MB RAM. I thought it was a problem with 4.13, so I was very happy when I got a Sulphur Cycle WU. It has failed as well, at about 1.40% crunched. What's up? I am not the only one on my team having problems. Here are a couple of threads: http://wwseti.net/forum/viewtopic.php?t=3877 http://wwseti.net/forum/viewtopic.php?t=4032 http://wwseti.net/forum/viewtopic.php?t=4097 http://wwseti.net/forum/viewtopic.php?t=4128 Any suggestions? Can anyone explain what's going on here? Here are some of my results: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=117229 Win2KPro, P4 1.8GHz, 512 MB RAM. Running Folding. WinXP Home, P4 3.2GHz HT, 512 MB RAM. Running SETI, CPDN, Predictor, LHC, Einstein, Orbit and Folding. |
Send message Joined: 16 Oct 04 Posts: 692 Credit: 277,679 RAC: 0 |
What antivirus software and settings have you got? Have you read http://www.climateprediction.net/board/viewtopic.php?t=2895&start=0 ? Visit BOINC WIKI (http://boinc-doc.net/boinc-wiki/index.php?title=Climateprediction_FAQ) for help, and join BOINC Synergy (http://www.boincsynergy.com/) for all the news in one place. |
Send message Joined: 25 Aug 04 Posts: 41 Credit: 34,555 RAC: 0 |
Hmmm.... thank you, crandles! I am currently using AVG 7.0 free. I have used avast! and Norton Internet Security over the last few months and had models crash with each of these programmes. I also have Zone Alarm (free version) installed. I have had it for years and it has been working just fine, both with CPDN and other BOINC projects, so no problem there. When BOINC lets me get a CPDN WU (my PC is apparently overcommitted because of an LHC WU) I shall pay closer attention to what happens. |
Send message Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
I guess Norton AV is what is killing my OS-X models. I read the thread, and all it says is that there are problems. Has anyone found a solution? Like how to stop the models from failing with Client errors? I mean, I like CPDN, and would love to run models for you all on the PowerMac, but not if I have to be naked to the world... especially as the AV program has stopped some attacks. BOINC-Wiki: http://boinc-doc.net/boinc-wiki/index.php |
Send message Joined: 31 Aug 04 Posts: 14 Credit: 113,008 RAC: 0 |
I find it hard to believe that AVG would be killing your WU processes. I would look at your onboard RAM. I had a bad stick and it would consistently fail on a WU. I suggest running memtest64 and seeing if it's the RAM. |
Send message Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Um, well, it is a PowerMac so I don't think that will run. But I have run Apple's diagnostics (which have a memory test). Oh, well, not important. I can run the other three projects. |
Send message Joined: 6 Aug 04 Posts: 2 Credit: 156,464 RAC: 0 |
Just curious as to what was figured out here? Someone enlighten me please. |
Send message Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
"Just curious as to what was figured out here? Someone enlighten me please." What I figured out is that trying to run CPDN on my PowerMac does not pay off. For whatever reason, the chances of killing the model from one of two different bugs are high enough that I will never complete one. One bug is networking related: OS-X seems to get distracted, and the "hang" caused by it waiting on the network can cause the model to hang on starts and restarts. In theory this bug has been fixed in the later versions of the BOINC Client, which I am running, but after killing 20 models or so I am not much in the mood to slaughter more. The other problem seems to come when rebooting. I am not sure exactly what happens here, but I have done a reboot and the next thing I know the model is toasted. Since I can do models on the 8 Windows computers I have, I guess I will live with that ... :) |
Send message Joined: 31 Aug 04 Posts: 14 Credit: 113,008 RAC: 0 |
Perhaps killing the process will stop the rebooting problem, but I am not a Mac network wonk, so...? The littlewhitedog your friends on the net and in space. Talk to the LWD at Littleblackdog |
Send message Joined: 27 Aug 05 Posts: 156 Credit: 112,423 RAC: 0 |
You are running 4.45 cc; you need to update. Also try these to check for stability: Prime95 Torture test and Memtest86. BOINC Wiki |
Send message Joined: 31 Aug 04 Posts: 239 Credit: 2,933,299 RAC: 0 |
Um, Memtest86 is for a PC ... not a G5 ... |
Send message Joined: 23 Nov 05 Posts: 18 Credit: 407,491 RAC: 0 |
My new WUs are also failing. I completed a slab model with this machine, but the sulphur model continually crashes with a Client Error. I have rerun prime95 for 17 hrs and memtest for 8 hrs (8 passes) with no errors. E@H, Predictor and LHC run without a hitch. Any suggestions? Thanks DP |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
dp, Error 161 is a red herring. The REAL error number doesn't appear, but you can see it near the end of yabsd.out, which is in the dataout folder of the model. It may give you/us a clue. |
Send message Joined: 23 Nov 05 Posts: 18 Credit: 407,491 RAC: 0 |
Thanks for getting back. I found this at the end of a yabsd.out. Is this what you need? DP LOOKUP TABLE 58368 64-bit words long ANCILLARY_STEPSim(s_im) 5 INITMOS : MOS_OUTPUT_LENGTH = 1129 im,sm,ngroup,new_im,new_sm 1 1 48 T F PPCTL: Opening preattached file on unit 60 PPCTL: Opening preattached file on unit 61 PPCTL: Opening preattached file on unit 62 PP_CTL: Error Buffering in Fixed length Header Empty PP File in Climate Mode? Error code = 0.00 Length requested = 0 Length actually transferred = 256 PPCTL: Opening preattached file on unit 63 PPCTL: Opening preattached file on unit 64 PPCTL: Opening preattached file on unit 65 PPCTL: Opening preattached file on unit 66 PPCTL: Opening preattached file on unit 67 PPCTL: Opening preattached file on unit 68 |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It may indeed help the programmers when they get time. Ananas has reported this same thing from a team member. Edit: I've emailed Carl. At least he will have the info. |
Send message Joined: 5 Aug 04 Posts: 66 Credit: 2,146,056 RAC: 0 |
One of my machines has crashed out three times over the last 10 days or so. In each case it's -161. I won't have access to the machine again until the weekend, but I'll look in the yabsd file then. |
Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
Yikes, looks like it's either looking for a differently named ancil file, or the current one was erased or truncated. |
Send message Joined: 5 Aug 04 Posts: 173 Credit: 1,843,046 RAC: 0 |
This is due to a batch of bad WUs sent out previously. It's been resolved. Any new WUs you get will be OK. |
Send message Joined: 20 Sep 04 Posts: 14 Credit: 30,765 RAC: 0 |
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=1108852 This workunit crashes every day :) Is this a bad workunit, or is there a bug in the application? Usually my computer is turned on in the evening and I turn it off in the morning; before turning it off I make a backup of the BOINC folder. Sometimes my brother plays computer games in the afternoon, and in the evening I find the model crashed. Unfortunately I can't find any yabsd.out file, because the crashed result is already set to be sent back. I'm glad that I don't have an internet connection at home ;) My system is stable: I tested it with prime95 and climateprediction for 18 hours (50/50), and also with memtest86+. Maybe there is a problem when the model starts and doesn't reach the first checkpoint because there is an evil game that eats every CPU cycle ;) P.S. I can see this morning that my English is worse than ever ;) |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
You could try removing BOINC from the startup folder, so that, when your brother turns on the computer to play games, BOINC doesn't start. When you want to run BOINC, start it manually by clicking on the boincmgr icon in the BOINC folder. |
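
The advice earlier in the thread, to look near the end of yabsd.out in the model's dataout folder for the real error, can be sketched in Python. This is only a rough sketch under assumptions: the path is a placeholder for your own model's dataout folder, the helper names `tail_lines` and `error_lines` are made up for illustration, and matching on the word "error" is a guess based on the PP_CTL lines quoted in the thread, not a documented log format.

```python
from collections import deque
from pathlib import Path


def tail_lines(path, count=40):
    """Return the last `count` lines of a text file as a list of strings."""
    # errors="replace" keeps the read from failing on odd bytes in the log.
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def error_lines(lines):
    """Keep only lines that mention an error, case-insensitively."""
    return [line for line in lines if "error" in line.lower()]


if __name__ == "__main__":
    # Hypothetical location: point this at the dataout folder of your model.
    log_path = Path("dataout") / "yabsd.out"
    if log_path.exists():
        for line in error_lines(tail_lines(log_path)):
            print(line)
    else:
        print(f"{log_path} not found; adjust the path for your installation")
```

Printing the whole tail rather than only the filtered lines may be more useful when posting to the board, since the lines around the error (such as which unit was being opened) can carry the clue.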
©2024 cpdn.org