Message boards : Number crunching : Can someone explain these Errors ?
Author | Message |
---|---|
Joined: 30 Aug 04 Posts: 77 Credit: 1,785,934 RAC: 0 |
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=367057 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=271864 Both models were running on different systems, and both quit/failed within the last 2 days, each with a client error according to my stats page ("Computing Error"). Now, are these models that (naturally) ran out of bounds because of the parameters used (e.g. the "Short Run" I've heard about in various places), or do I need to check whether my systems are still working correctly? ___________________________________________ Scientific Network: 36200 MHz «» 8204 MB «» 815.0 GB (http://www.falconfly.de/network.htm) |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
It's not possible to tell from the result pages what caused the models to error, FalconFly. All they tell you is what exit status they returned to BOINC (251 and -5 respectively). I'm not sure what 251 is, but -5 is a catch-all for computation errors. The log files in your climateprediction.net/{result id} directory might give a better indication of what caused the errors. |
Joined: 30 Aug 04 Posts: 77 Credit: 1,785,934 RAC: 0 |
Hm, the one with the -5 was a Win9x host, from which I've detached CPDN for now... I'll keep an eye on the other machine (Linux), which is now running another model. |
Joined: 5 Aug 04 Posts: 250 Credit: 93,274 RAC: 0 |
The crash of my http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=282649 was spectacular, to say the least. I was away from my computer during that time, but it had crashed BOINC 4.13, yet the hadsm3 version 4.04 kept on running in the background (I found this out 26 hours later ;)). Dr Watson was sitting on my screen when I returned. Even that had managed to crash. Nothing a full reboot couldn't fix, but still. The crash must've come in an instant, as the stderr.txt for the file shows an abrupt ending of its text: CLOSE: WARNING: Unit 60 Not Opened OPEN: File dataout/337raa.pa15c10 Created on Unit 60 CLOSE: WARNING: Unit 62 Not Opened OPEN: File dataout/337raa.pc15c10 Created on Unit 62 CLOSE: WARNING: Unit 63 Not Opened OPEN: File dataout/337raa.pd15c10 Created on Unit 63 CLOSE: WARNING: Unit 64 Not Opened OPEN: File dataout/337raa.pe15c10 Created on Unit 64 CLOSE: WARNING: Unit 65 Not Opened OPEN: File dataout/337raa.pf15c10 Created on Unit 65 CLOSE: WARNING: Unit 66 Not Opened OPEN: File dataout/337raa.pg15c10 Created on Unit 66 CLOSE: WARNING: Unit 67 Not Opened OPEN: File dataout/337raa.ph15c10 Created on Unit 67 OPEN: File dataout/337raa.da14c40 Created on Unit 22 OPEN: File dataout/337raa.da14c70 Created on Unit 22 OPEN: File dataout/337raa.da14ca0 Created on Unit 22 OPEN: File dataout/337raa.da14cd0 Created on Unit 22 OPEN: File dataout/337raa.da14cg0 Created on Unit 22 OPEN: File datao I noticed I am not the only one this work unit crashed on, so maybe it's wise to have people look into it, especially for the person who is still crunching it. I'd hate for him to have it crash in trickle 52. ;) -------------------------------- Jord™ |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
> The crash of my http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=282649 was spectacular, to say the least. I was away from my computer during that time, but it had crashed BOINC 4.13, yet the hadsm3 version 4.04 kept on running in the background (I found this out 26 hours later ;)). That sounds like what happens when BOINC crashes, Jord. The CPDN programs continue to run and there's no way to shut them down cleanly. That can only be done via BOINC, but it doesn't reconnect to orphaned projects when you start it back up (you'll probably get a -144 error and an exit status of -185). You've just got to hope that you get lucky and close down the hadsm3 programs without hitting the middle of a critical file write :( > The crash must've come in an instant, as the stderr.txt for the file shows an abrupt ending of its text: > OPEN: File dataout/337raa.da14cg0 Created on Unit 22 > OPEN: File datao It's normal for stderr.txt to end with an incomplete line on running models. My guess is that the output isn't flushed when it's written, so you only see up to the last flush by the OS. |
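
The guess in the last reply (a log that ends mid-line because buffered output has not been flushed yet) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `BlockBufferedWriter` and `demo` are hypothetical names invented for this example, the block size of 32 bytes is arbitrary, and nothing here reflects how the hadsm3 application or BOINC actually writes stderr.txt. The two log lines are copied from the post above.

```python
import io


class BlockBufferedWriter:
    """Toy model of fully buffered output (hypothetical, for illustration).

    Bytes reach `sink` only in whole blocks of `block_size`, or when
    flush() is called explicitly.
    """

    def __init__(self, sink, block_size):
        self.sink = sink
        self.block_size = block_size
        self.pending = bytearray()

    def write(self, text):
        self.pending.extend(text.encode("ascii"))
        # Hand over complete blocks only; the remainder stays in memory.
        while len(self.pending) >= self.block_size:
            self.sink.write(bytes(self.pending[:self.block_size]))
            del self.pending[:self.block_size]

    def flush(self):
        # Push out whatever is still waiting in the buffer.
        self.sink.write(bytes(self.pending))
        self.pending.clear()


def demo(block_size=32):
    sink = io.BytesIO()  # stands in for stderr.txt on disk
    out = BlockBufferedWriter(sink, block_size)
    lines = [
        "OPEN: File dataout/337raa.da14cd0 Created on Unit 22\n",
        "OPEN: File dataout/337raa.da14cg0 Created on Unit 22\n",
    ]
    for line in lines:
        out.write(line)
    before = sink.getvalue().decode("ascii")  # what a reader sees mid-run
    out.flush()
    after = sink.getvalue().decode("ascii")   # what is there after a flush
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(repr(before))
    print(repr(after))
```

In this model the text visible before the flush stops partway through the second line, while the text after the flush is complete. That matches the reply's point that a truncated last line in stderr.txt on a running model does not by itself indicate a crash.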