Thread 'Am I a wu-killer?'

B-Roy

Joined: 26 Aug 04
Posts: 11
Credit: 71,235
RAC: 55
Message 10431 - Posted: 5 Mar 2005, 15:31:34 UTC

I run several projects on my machine and I experience no problems except for cpdn, where most of the models exit with client errors.
Is this behavior normal or should I detach?

See here: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?userid=1587



John Hunt
Joined: 5 Mar 05
Posts: 64
Credit: 790,577
RAC: 0
Message 10432 - Posted: 5 Mar 2005, 15:50:39 UTC
Last modified: 5 Mar 2005, 15:53:26 UTC

Regrettably, I've just had the same problem.
I have been with Boinc/Seti for some time; yesterday I added Einstein@home and today I added CPDN.
On the first attempt at running CPDN, my PC didn't want to know - and neither Seti nor Einstein would run.
I've had to detach from the project until I can sort this out.
Seti and Einstein both working OK now that I've re-started PC......

(edit) should have mentioned - Windows XP Home + SP2 running Boinc 4.19.....(end edit)

old_user5994
Joined: 31 Aug 04
Posts: 239
Credit: 2,933,299
RAC: 0
Message 10433 - Posted: 5 Mar 2005, 16:16:43 UTC

It happens. Of my 6 machines I had one that would only get part way through the models before ending. However, I have noticed that there has been a significant version number change from 4.04 to now 4.10 ...

Though I have been keeping the one "bad" machine off this project, I have a new MB on the way and I hope to improve reliability and speed. Check to see if you are having other runtime errors by looking in your event logs.

crandles
Volunteer moderator
Joined: 16 Oct 04
Posts: 692
Credit: 277,679
RAC: 0
Message 10435 - Posted: 5 Mar 2005, 17:04:36 UTC
Last modified: 5 Mar 2005, 17:05:14 UTC

If your computers are hidden it may be worth posting what exit status and/or error messages you have in logs (such as stderr_um.txt).

CPDN does stress computers more than most programs you are likely to run.

See this thread (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=2026) for some suggested tests.

> from 4.04 to 4.10
I think 4.05 to 4.09 were all sulphur alpha tests. Some improvements were relevant to the current slab model so these have been brought in as 4.10 for alpha and slab model.

Visit BOINC WIKI for help

And join BOINC Synergy for all the news in one place.

Andrew Hingston
Volunteer moderator
Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 10436 - Posted: 5 Mar 2005, 17:14:17 UTC

Tolu suggested that the new version should fix one of the problems that people had been experiencing - see this thread (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=2086). I agree that this is unlikely to account for the problems reported here, though, and they probably each have different causes.

B-Roy
Joined: 26 Aug 04
Posts: 11
Credit: 71,235
RAC: 55
Message 10445 - Posted: 5 Mar 2005, 22:11:37 UTC - in response to Message 10435.  
Last modified: 5 Mar 2005, 22:12:02 UTC

> If your computers are hidden it may be worth posting what exit status and/or
> error messages you have in logs (such as stderr_um.txt).

Except for the one WU that I finished successfully, I got these stderr messages:


4.24
- exit code -5 (0xfffffffb)


4.19
- exit code -5 (0xfffffffb)

1
0

4.09
app_version download error: couldn't get input files:
hadsm3se_4.03_windows_intelx86.zip: MD5 check failed

0
0

4.09
app_version download error: couldn't get input files:
hadsm3data_4.03_windows_intelx86.zip: MD5 check failed
hadsm3um_4.03_windows_intelx86.zip: MD5 check failed

0
0

4.05
- exit code -5 (0xfffffffb)

1
0
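
The "MD5 check failed" lines above mean that the checksum of a downloaded file did not match the value the client expected. As a rough illustration only (this is not BOINC's own code; the function names and the idea of supplying the expected checksum by hand are assumptions), a download could be checked like this in Python:

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large zip files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's MD5 with an expected digest (hypothetical helper)."""
    return md5_of_file(path) == expected_hex.strip().lower()


# Example usage (file name taken from the log above, checksum is a placeholder):
# matches_expected("hadsm3se_4.03_windows_intelx86.zip", "<expected md5 here>")
```

A mismatch usually points to a corrupted or incomplete download rather than a problem with the model itself.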

old_user23880
Volunteer tester
Joined: 10 Oct 04
Posts: 223
Credit: 4,664
RAC: 0
Message 10449 - Posted: 6 Mar 2005, 0:21:16 UTC

Code -5 is a calculation error.
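
The two forms shown in the logs, -5 and 0xfffffffb, are the same 32-bit value: the hex form is the two's-complement bit pattern read as an unsigned number. A small Python sketch (the helper names are made up for illustration) shows the conversion:

```python
def to_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the value represents a negative number.
    return value - 0x100000000 if value & 0x80000000 else value


def to_unsigned32_hex(code: int) -> str:
    """Format a (possibly negative) exit code as its 32-bit hex bit pattern."""
    return f"0x{code & 0xFFFFFFFF:08x}"


print(to_signed32(0xFFFFFFFB))   # -5
print(to_unsigned32_hex(-5))     # 0xfffffffb
```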

A few of us with machines that seem to be boinc-unfriendly have reverted to classic cpdn.

But first do the tests suggested by Crandles, and make sure that whenever you turn the machine off, you suspend before exiting and give the model plenty of time to respond to this, because incorrect exiting can cause the model to crash.

Classic cpdn is INSTEAD of boinc, not in addition. It generally runs very stably and tolerates mistakes (and abuses) on the part of the user. The model runs at a similar speed to boinc cpdn, but the credit you gain goes to classic (which is largely a defunct system) and not to boinc stats. You can still post on the boinc forums. So classic is for people who really want to do only cpdn, can't run boinc cpdn models, and aren't worried about stats. There are good visualisation add-ons to view classic models, and you get graphs of the work you've completed.

How to transfer:
Uninstall boinc.
Go to cpdn home page.
Click on Open University course.
To download the model Click here.


John Hunt
Joined: 5 Mar 05
Posts: 64
Credit: 790,577
RAC: 0
Message 10766 - Posted: 12 Mar 2005, 13:47:35 UTC

Hi guys!
I've re-attached to the project now and things are going OK at the moment.
Work unit no. 411250 was downloaded OK and has been running on my machine for the last 20 mins. or so with no problems!

Please, everyone, accept my apologies for the false start on this project a week ago - I'm afraid I have "killed" two work units - 400319 & 400591.
If any admin sees this, is it possible to re-issue them and clear them off my account?

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 10768 - Posted: 12 Mar 2005, 14:04:09 UTC - in response to Message 10766.  

> Please, everyone, accept my apologies for the false start on this project a
> week ago - I'm afraid I have "killed" two work units - 400319 & 400591.
> If any admin see this, is it possible to re-issue them and clear them off my
> account?

Don't worry about it, it happens. But, it's unlikely that anyone will clear them off your account. There are so many failed models that the admin (1) wouldn't have time to clear them off. But the work units you "killed" will likely be given to someone else reasonably soon. I ran one in February that someone failed on in January, and I'm running another one now that someone killed in February.

John Hunt
Joined: 5 Mar 05
Posts: 64
Credit: 790,577
RAC: 0
Message 10769 - Posted: 12 Mar 2005, 14:21:08 UTC

Thanks for those words! I hate the idea of 'killing' WUs.....
and I suppose they will vanish from my account when the due date is up....


old_user23880
Volunteer tester
Joined: 10 Oct 04
Posts: 223
Credit: 4,664
RAC: 0
Message 10779 - Posted: 12 Mar 2005, 22:46:15 UTC

Don't know whether they will stay on your account or not, but sooner or later they will be automatically reissued.