Thread 'Unrecoverable error after 4100 hours :('

Ananas
Volunteer moderator
Message 28547 - Posted: 9 May 2007, 4:50:40 UTC

It's correct that his first models were victims of AVIRA
AntiVir, but we solved that: BOINC is now excluded from
the scans, and result 6513196 is the first one that had
a chance to run without them.


I don't think that he aborted it; I rather think that
it is a side effect of shutting down without stopping
BOINC first.

The combination of these two messages is what makes me
think that it was not a user-triggered abort:

> CPDN Monitor - Quit request from BOINC...
> Suspended CPDN Monitor - Abort request from BOINC...

Too bad that we cannot examine the BOINC client log :-/


I told him to stop BOINC before shutting down for his
next attempt (if there is one; he is currently mad at
CPDN), but I still think it would be better to find a
solution, such as ignoring an abort request that is
triggered during the Windows shutdown phase.
MikeMarsUK
Volunteer moderator
Message 28550 - Posted: 9 May 2007, 11:29:42 UTC


Most likely this will have been a user-initiated abort. But it's possible *he* wasn't the user who aborted it (perhaps someone else using the PC didn't realise the significance of 'abort' versus 'exit').

Shutting down Windows without exiting BOINC first sometimes causes an exit code 1 or 0, but I've never seen it cause an abort.

Is he using BAM or some other account manager? That can have this effect (it can detach from projects unexpectedly).

I notice that CPDN is not the only project with aborted WUs.

http://einstein.phys.uwm.edu/result.php?resultid=83827330


I'm a volunteer and my views are my own.
old_user220740
Message 29392 - Posted: 1 Jul 2007, 8:15:27 UTC

Mine has bombed out with an unrecoverable error at 86%, 2800 hours, sometime in the night with no user activity. Really disappointing.

The box is a lot cooler now. With thousands of BOINC machines running hot with high-wattage CPU activity, I really hope the contribution to CO2 is not so high that the benefit of this work is negated. That's the end of this stuff for me. Thanks and goodbye.
Les Bayliss
Volunteer moderator
Message 29397 - Posted: 1 Jul 2007, 12:41:40 UTC


That model ended with:
Model crashed: umshell1.f: ATM_DYN : NEGATIVE THETA DETECTED.


This means that part of the model developed a negative value, such as in atmospheric pressure.
That's part of what the research is about: finding out just HOW far each set of starting parameters will take it.
So now the researchers have another clue about making the modeling program more reliable a long way from the start, which is what the research is about.
