Thread 'WUs constantly failing'

old_user1098
Joined: 25 Aug 04
Posts: 41
Credit: 34,555
RAC: 0
Message 15697 - Posted: 5 Sep 2005, 10:11:04 UTC
Last modified: 5 Sep 2005, 10:28:49 UTC

I've had a batch of WUs this summer that have all failed one after another without reaching 1%. The computer is a laptop sitting in the middle of an uncluttered table, next to an open window with a breeze coming in most of the time; it's not overheating and has no other problems. It's a P4 @ 3.2 GHz with 512 MB RAM. I thought it was a problem with 4.13, so I was very happy when I got a Sulphur Cycle WU. It has failed as well, at about 1.40% crunched.
What's up? I am not the only one on my team having problems. Here are a few threads:

http://wwseti.net/forum/viewtopic.php?t=3877
http://wwseti.net/forum/viewtopic.php?t=4032
http://wwseti.net/forum/viewtopic.php?t=4097
http://wwseti.net/forum/viewtopic.php?t=4128

Any suggestions? Can anyone explain what's going on here?
Here are some of my results:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=117229
***********************************
Win2KPro, P4 1.8GHz, 512Mb RAM. Running Folding
WinXP Home, P4 3.2GHz HT, 512Mb RAM. Running SETI, CPDN, Predictor, LHC, Einstein, Orbit and Folding

crandles
Volunteer moderator
Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 15700 - Posted: 5 Sep 2005, 12:58:19 UTC

What antivirus software/settings have you got? Have you read http://www.climateprediction.net/board/viewtopic.php?t=2895&start=0 ?
_______________________________
Visit BOINC WIKI (http://boinc-doc.net/boinc-wiki/index.php?title=Climateprediction_FAQ) for help

And join BOINC Synergy (http://www.boincsynergy.com/) for all the news in one place.

old_user1098
Joined: 25 Aug 04
Posts: 41
Credit: 34,555
RAC: 0
Message 15701 - Posted: 5 Sep 2005, 13:24:49 UTC

Hmmm.... thank you, crandles! I am currently using AVG 7.0 Free. I have used avast! and Norton Internet Security over the last few months and had models crash with each of these programmes. I also have Zone Alarm (free version) installed; I have had it for years and it has worked just fine with both CPDN and other BOINC projects, no problem there.
When BOINC lets me get a CPDN WU (my PC is apparently overcommitted because of an LHC WU), I shall pay closer attention to what happens.

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 15706 - Posted: 5 Sep 2005, 15:13:58 UTC

I guess Norton AV is what is killing my OS-X models. I read the thread, and all it says is that there are problems. Has anyone found a solution, such as how to stop the models from failing with Client errors?

I mean, I like CPDN, and would love to run models for you all on the PowerMac, but not if I have to be naked to the world... especially as the AV program has stopped some attacks.
BOINC-Wiki: http://boinc-doc.net/boinc-wiki/index.php

old_user5738
Joined: 31 Aug 04
Posts: 14
Credit: 113,008
RAC: 0
Message 15709 - Posted: 5 Sep 2005, 16:45:26 UTC

I find it hard to believe that AVG would be killing your WU processes. I would look at your onboard RAM. I had a bad stick, and WUs would consistently fail because of it. I suggest running memtest64 to see whether it's the RAM.

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 15731 - Posted: 6 Sep 2005, 14:25:32 UTC

Um, well, it is a PowerMac, so I don't think that will run. But I have run Apple's diagnostics (which include a memory test). Oh, well, not important. I can run the other three projects.

old_user346
Joined: 6 Aug 04
Posts: 2
Credit: 156,464
RAC: 0
Message 18999 - Posted: 4 Jan 2006, 17:51:06 UTC

Just curious as to what was figured out here? Someone enlighten me please.

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 19001 - Posted: 4 Jan 2006, 18:05:31 UTC - in response to Message 18999.  

Just curious as to what was figured out here? Someone enlighten me please.

What I figured out is that trying to run CPDN on my PowerMac does not pay off. For whatever reason, the chances of killing the model from one or two different bugs are high enough that I will never complete one.

One bug is networking related: OS-X seems to get distracted, and the "hang" caused by it waiting on the network can cause the model to hang on starts and restarts. In theory this bug has been fixed in the later versions of the BOINC Client, which I am running, but after killing 20 models or so I am not much in the mood to slaughter more.

The other problem seems to come when rebooting. Not sure exactly what happens here, but, I have done a reboot and the next thing I know the model is toasted.

Since I can do models on the 8 windows computers I have I guess I will live with that ... :)

old_user5738
Joined: 31 Aug 04
Posts: 14
Credit: 113,008
RAC: 0
Message 19533 - Posted: 22 Jan 2006, 15:52:44 UTC

Perhaps killing the process will stop the rebooting problem, but I am not a Mac network wonk, so...?

The littlewhitedog your friends on the net and in space.
Talk to the LWD at Littleblackdog

old_user94880
Joined: 27 Aug 05
Posts: 156
Credit: 112,423
RAC: 0
Message 19534 - Posted: 22 Jan 2006, 16:10:23 UTC

You are running cc 4.45 and need to update. Also try the Prime95 torture test and Memtest86 to check for stability.
BOINC Wiki

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 19561 - Posted: 23 Jan 2006, 2:50:23 UTC

Um, Memtest86 is for a PC ... not a G5 ...

old_user113466
Joined: 23 Nov 05
Posts: 18
Credit: 407,491
RAC: 0
Message 19613 - Posted: 25 Jan 2006, 3:02:18 UTC

My new WUs are also failing.
I completed a slab model with this machine, but the sulphur model continually crashes with a Client Error.

I have rerun Prime95 for 17 hours and memtest for 8 hours (8 passes) with no errors. E@H, Predictor and LHC run without a hitch.

Any suggestions?

Thanks
DP

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 19621 - Posted: 25 Jan 2006, 4:49:57 UTC

dp
Error 161 is a red herring. The REAL error number doesn't appear, but you can see it near the end of yabsd.out, which is in the dataout folder of the model.

It may give you/us a clue.
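
For anyone who would rather not open the whole file, here is a minimal Python sketch that prints only the last lines of yabsd.out. The relative path `dataout/yabsd.out`, the helper name `tail_lines` and the default of 40 lines are assumptions for illustration only; they are not part of CPDN or BOINC, so adjust the path to the dataout folder of your own model.

```python
from collections import deque
from pathlib import Path


def tail_lines(path, count=40, encoding="latin-1"):
    """Return the last `count` lines of a text file as a list of strings."""
    with open(path, "r", encoding=encoding) as handle:
        # A deque with maxlen keeps only the most recent `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    # Hypothetical location: point this at the dataout folder of your model.
    log_path = Path("dataout") / "yabsd.out"
    if log_path.exists():
        for line in tail_lines(log_path):
            print(line)
    else:
        print(f"{log_path} not found; set log_path to your model's dataout folder")
```

Pasting the printed lines into a reply is usually enough for someone to spot the real error.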


old_user113466
Joined: 23 Nov 05
Posts: 18
Credit: 407,491
RAC: 0
Message 19632 - Posted: 25 Jan 2006, 13:19:28 UTC - in response to Message 19621.  

Thanks for getting back.

I found this at the end of a yabsd.out.
Is this what you need?
DP


LOOKUP TABLE
58368 64-bit words long
ANCILLARY_STEPSim(s_im) 5
INITMOS : MOS_OUTPUT_LENGTH = 1129
im,sm,ngroup,new_im,new_sm 1 1 48 T F
PPCTL: Opening preattached file on unit 60
PPCTL: Opening preattached file on unit 61
PPCTL: Opening preattached file on unit 62

PP_CTL: Error Buffering in Fixed length Header
Empty PP File in Climate Mode?

Error code = 0.00
Length requested = 0
Length actually transferred = 256
PPCTL: Opening preattached file on unit 63
PPCTL: Opening preattached file on unit 64
PPCTL: Opening preattached file on unit 65
PPCTL: Opening preattached file on unit 66
PPCTL: Opening preattached file on unit 67
PPCTL: Opening preattached file on unit 68
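
In a long yabsd.out, the interesting lines can be hard to find by eye. The sketch below, in Python, picks out every line that mentions an error and shows it with a little context. The pattern (any line containing "error", case-insensitive) is an assumption based only on the excerpt above; other failures may be worded differently, and `find_error_blocks` is a hypothetical helper, not a CPDN tool.

```python
import re

# Assumed marker, taken from the excerpt above; other failures may use different wording.
ERROR_PATTERN = re.compile(r"error", re.IGNORECASE)


def find_error_blocks(lines, context=2):
    """Return (line_number, snippet) pairs for lines that mention an error.

    line_number is 1-based; snippet holds up to `context` lines on either side.
    """
    hits = []
    for index, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append((index + 1, lines[start:end]))
    return hits


if __name__ == "__main__":
    # A few lines copied from the log excerpt in this post.
    sample = [
        "PPCTL: Opening preattached file on unit 62",
        "",
        "PP_CTL: Error Buffering in Fixed length Header",
        "Empty PP File in Climate Mode?",
        "",
        "Error code = 0.00",
        "Length requested = 0",
    ]
    for number, snippet in find_error_blocks(sample, context=1):
        print(f"--- match at line {number} ---")
        print("\n".join(snippet))
```

This only locates the messages; working out what they mean is still a job for the programmers.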




Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 19647 - Posted: 25 Jan 2006, 16:28:43 UTC
Last modified: 25 Jan 2006, 17:05:38 UTC

It may indeed help the programmers when they get time.

Ananas has reported this same thing from a team member.

Edit:
I've emailed Carl. At least he will have the info.


KeeperC
Joined: 5 Aug 04
Posts: 66
Credit: 2,146,056
RAC: 0
Message 19650 - Posted: 25 Jan 2006, 16:43:05 UTC - in response to Message 19647.  


One of my machines has crashed out three times over the last 10 days or so. In each case it's -161. I won't have access to the machine again until the weekend, but I'll look in the yabsd file then.

old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 19655 - Posted: 25 Jan 2006, 19:58:45 UTC

Yikes, looks like it's either looking for a differently named ancil file, or the current one was erased or truncated.

old_user3
Joined: 5 Aug 04
Posts: 173
Credit: 1,843,046
RAC: 0
Message 19673 - Posted: 26 Jan 2006, 16:18:08 UTC - in response to Message 19650.  


One of my machines has crashed out three times over the last 10 days or so. In each case it's -161. I won't have access to the machine again until the weekend, but I'll look in the yabsd file then.

This is due to a batch of bad WUs sent out previously.
It's been resolved. Any new WUs you get will be OK.

old_user19523
Joined: 20 Sep 04
Posts: 14
Credit: 30,765
RAC: 0
Message 19682 - Posted: 27 Jan 2006, 9:12:13 UTC
Last modified: 27 Jan 2006, 9:12:59 UTC


http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=1108852

This workunit crashes every day :)

Is this a bad workunit, or is there a bug in the application?

Usually my computer is turned on in the evening and I turn it off in the morning; before turning it off I make a backup of the BOINC folder. Sometimes my brother plays computer games in the afternoon, and in the evening I find the model crashed. Unfortunately I can't find any yabsd.out file, because the crashed result is already waiting to be sent back.

I'm glad that I don't have an internet connection at home ;)

My system is stable: I tested it with Prime95 and climateprediction for 18 hours (50/50), and also with memtest86+.

Maybe there is a problem when the model starts and doesn't reach the first checkpoint because an evil game is eating every CPU cycle ;)

P.S.
I can see this morning that my English is worse than ever ;)

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 19683 - Posted: 27 Jan 2006, 9:46:34 UTC

You could try removing BOINC from the startup folder, so that, when your brother turns on the computer to play games, BOINC doesn't start.
When you want to run BOINC, start it manually by clicking on the boincmgr icon in the BOINC folder.
