climateprediction.net (CPDN) home page
Thread 'Can someone explain these Errors ?'

old_user3434
Joined: 30 Aug 04
Posts: 77
Credit: 1,785,934
RAC: 0
Message 5838 - Posted: 1 Nov 2004, 21:16:12 UTC
Last modified: 1 Nov 2004, 21:17:28 UTC

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=367057

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=271864

The two models were running on different systems, and both failed within the last 2 days, each reported as a client error ("Computing Error") on my stats page.

Now, are these models that naturally ran out of bounds because of the parameters they were given (e.g. the "Short Run" I've heard about in various places), or do I need to check whether my systems are still working correctly?
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 5861 - Posted: 2 Nov 2004, 7:55:06 UTC

It's not possible to tell from the result pages what caused the models to error, FalconFly. All they tell you is the exit status each one returned to BOINC (251 and -5 respectively). I'm not sure what 251 means, but -5 is a catch-all for computation errors. The log files in your climateprediction.net/{result id} directory might give a better indication of what caused the errors.
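One possible reading of the two numbers, which nothing in this thread confirms: on POSIX systems such as Linux only the low 8 bits of a process exit status are reported, so a program that exits with -5 shows up as 251. The sketch below only demonstrates that arithmetic; the helper names are made up for illustration, and whether BOINC or the model actually produced 251 this way is an assumption.

```python
def as_unsigned_byte(status: int) -> int:
    """Keep only the low 8 bits, as a POSIX exit status does."""
    return status & 0xFF


def as_signed_byte(status: int) -> int:
    """Interpret the low 8 bits as a signed value in the range -128..127."""
    low = status & 0xFF
    return low - 256 if low >= 128 else low


# Hypothetical illustration: -5 and 251 share the same 8-bit pattern.
print(as_unsigned_byte(-5))
print(as_signed_byte(251))
```

If that reading is right, both results failed with the same generic computation error, and the log files remain the place to look for the actual cause.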
old_user3434
Joined: 30 Aug 04
Posts: 77
Credit: 1,785,934
RAC: 0
Message 5864 - Posted: 2 Nov 2004, 9:21:11 UTC - in response to Message 5861.  

Hm, the one with the -5 was a Win9x Host, from which I've detached CPDN for now...

I'll keep an eye on it on the other Linux machine that is now running another Model.
Jord
Joined: 5 Aug 04
Posts: 250
Credit: 93,274
RAC: 0
Message 5879 - Posted: 3 Nov 2004, 4:57:08 UTC

The crash of my http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=282649 was spectacular, to say the least. I was away from my computer during that time, but it had crashed BOINC 4.13, yet the hadsm3 version 4.04 kept on running in the background (I found this out 26 hours later ;)).

Dr Watson was sitting on my screen when I returned. Even that had managed to crash. Nothing a full reboot couldn't fix, but still.

The crash must have been instantaneous, as the stderr.txt for the result ends abruptly:
CLOSE: WARNING: Unit 60 Not Opened
OPEN: File dataout/337raa.pa15c10 Created on Unit 60
CLOSE: WARNING: Unit 62 Not Opened
OPEN: File dataout/337raa.pc15c10 Created on Unit 62
CLOSE: WARNING: Unit 63 Not Opened
OPEN: File dataout/337raa.pd15c10 Created on Unit 63
CLOSE: WARNING: Unit 64 Not Opened
OPEN: File dataout/337raa.pe15c10 Created on Unit 64
CLOSE: WARNING: Unit 65 Not Opened
OPEN: File dataout/337raa.pf15c10 Created on Unit 65
CLOSE: WARNING: Unit 66 Not Opened
OPEN: File dataout/337raa.pg15c10 Created on Unit 66
CLOSE: WARNING: Unit 67 Not Opened
OPEN: File dataout/337raa.ph15c10 Created on Unit 67
OPEN: File dataout/337raa.da14c40 Created on Unit 22
OPEN: File dataout/337raa.da14c70 Created on Unit 22
OPEN: File dataout/337raa.da14ca0 Created on Unit 22
OPEN: File dataout/337raa.da14cd0 Created on Unit 22
OPEN: File dataout/337raa.da14cg0 Created on Unit 22
OPEN: File datao

I noticed I'm not the only one that unit crashed on, so it may be wise to have someone look into it, especially for the person who is still crunching it. I'd hate for it to crash on him at trickle 52. ;)

--------------------------------
Jord™
Thyme Lawn
Volunteer moderator

Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 5882 - Posted: 3 Nov 2004, 7:38:22 UTC - in response to Message 5879.  

> The crash of my
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=282649 was
> spectacular, to say the least. I was away from my computer during that time,
> but it had crashed BOINC 4.13, yet the hadsm3 version 4.04 kept on running in
> the background (I found this out 26 hours later ;)).

That sounds like what'll happen if BOINC crashes, Jord. The CPDN programs will continue to run and there's no way you can shut them down cleanly. That can only be done via BOINC, but it doesn't reconnect to orphaned projects if you start it back up (you'll probably get a -144 error and an exit status of -185). You've just got to hope that you get lucky in closing down the hadsm3 programs by not hitting the middle of a critical file write :(
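To make "orphaned" concrete, here is a minimal sketch of spotting model processes whose BOINC parent is gone, given a snapshot of the process table. Everything in it is hypothetical: the `Proc` record, the `find_orphaned_models` helper, the process names and the PIDs are invented for illustration, and it does not use any BOINC API. It only finds the leftovers; it cannot shut them down cleanly.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    pid: int    # process id
    ppid: int   # parent process id
    name: str   # executable name (hypothetical values below)


def find_orphaned_models(procs: List[Proc],
                         model_prefix: str = "hadsm3",
                         client_prefix: str = "boinc") -> List[Proc]:
    """Return model processes whose parent is not a live BOINC client."""
    client_pids = {p.pid for p in procs
                   if p.name.lower().startswith(client_prefix)}
    return [p for p in procs
            if p.name.lower().startswith(model_prefix)
            and p.ppid not in client_pids]


# Invented snapshot: one model still managed by BOINC, one left behind.
snapshot = [
    Proc(100, 1, "boinc"),
    Proc(101, 100, "hadsm3_a"),  # parent 100 is the running client
    Proc(202, 1, "hadsm3_b"),    # parent is no longer a BOINC client
]
orphans = find_orphaned_models(snapshot)
print([p.pid for p in orphans])
```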

> The crash must've come instant, as the stderr.txt for the file shows an abrupt
> ending of its text:
> OPEN: File dataout/337raa.da14cg0 Created on Unit 22
> OPEN: File datao

It's normal for stderr.txt to end with an incomplete line on running models. My guess is that the output isn't flushed when it's written, so you only see up to the last flush by the OS.
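A toy model of that guess, assuming the output is block buffered (which the thread does not confirm): bytes reach the file only in whole blocks, so a reader sees a line cut off wherever the last block boundary fell. The `BlockBufferedLog` class and the 64-byte block size are invented for illustration and are not how the model or the OS is known to buffer.

```python
class BlockBufferedLog:
    """Toy block-buffered log: bytes reach 'disk' only in full blocks."""

    def __init__(self, block_size: int = 64) -> None:
        self.block_size = block_size
        self.pending = bytearray()  # written by the program, not yet on disk
        self.on_disk = bytearray()  # what someone reading stderr.txt would see

    def write(self, text: str) -> None:
        self.pending += text.encode("ascii")
        # Move only complete blocks to disk; the tail stays in memory.
        while len(self.pending) >= self.block_size:
            self.on_disk += self.pending[:self.block_size]
            del self.pending[:self.block_size]

    def flush(self) -> None:
        """Push whatever is still pending out to disk."""
        self.on_disk += self.pending
        self.pending.clear()


lines = [
    "OPEN: File dataout/337raa.da14cd0 Created on Unit 22\n",
    "OPEN: File dataout/337raa.da14cg0 Created on Unit 22\n",
]
log = BlockBufferedLog(block_size=64)
for line in lines:
    log.write(line)

visible_before = bytes(log.on_disk).decode("ascii")  # model still running
log.flush()
visible_after = bytes(log.on_disk).decode("ascii")   # after a full flush
print(repr(visible_before))
```

Before the flush the visible text stops partway through the second line; after it, both lines are complete. An incomplete last line therefore says nothing about whether the model crashed at that point.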
