Message boards : Number crunching : Computational Error
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 19 Credit: 16,547 RAC: 0 |
A little help please. CPDN has been happily crunching over the last few weeks and had clocked up 150 hrs. The other day, for no apparent reason, it failed. Any idea why?
16/10/2005 06:02:11|climateprediction.net|Restarting result 35ci_000168378_0 using hadsm3 version 4.13
16/10/2005 06:02:11|SETI@home|Pausing result 13oc03aa.3322.15313.1003390.138_0 (removed from memory)
16/10/2005 06:02:12||request_reschedule_cpus: process exited
16/10/2005 06:03:05|climateprediction.net|Unrecoverable error for result 35ci_000168378_0 ( - exit code -5 (0xfffffffb))
16/10/2005 06:03:05||request_reschedule_cpus: process exited
16/10/2005 06:03:05|climateprediction.net|Computation for result 35ci_000168378_0 finished
16/10/2005 06:03:05|SETI@home|Restarting result 13oc03aa.3322.15313.1003390.138_0 using setiathome version 4.18
16/10/2005 08:22:26||request_reschedule_cpus: process exited |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There is a bug in BOINC 4.45, described in this thread (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=2855), which is linked to this thread (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=2921). This contains a link to an unofficial fix which appears to work, if you want to try it. The problem is apparently fixed in the 5.x version, due for release Real Soon Now. As for error 5, this is a general purpose label for several problems. It may be that your computer isn't stable enough for cpdn. There are some ideas for a fix here (http://www.climateprediction.net/board/viewtopic.php?t=2126) and here (http://www.climateprediction.net/board/viewtopic.php?t=2124). It's also possible that your power supply is not up to it. |
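The log in the first post reports the failure as both `-5` and `0xfffffffb`. These are the same number: the hex form is what -5 looks like when stored as a 32-bit two's-complement value. A minimal Python sketch of the conversion follows; the helper names `to_signed32` and `to_unsigned32` are made up for illustration and are not part of BOINC.

```python
def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF          # keep only the low 32 bits
    if value & 0x80000000:       # sign bit set, so the value is negative
        return value - 0x100000000
    return value


def to_unsigned32(value: int) -> int:
    """Return the unsigned 32-bit bit pattern for a (possibly negative) integer."""
    return value & 0xFFFFFFFF


print(to_signed32(0xFFFFFFFB))   # -5
print(hex(to_unsigned32(-5)))    # 0xfffffffb
```

So a search for either form of the code should lead to the same error.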
Joined: 2 Sep 04 Posts: 44 Credit: 372,682 RAC: 0 |
Regarding preferences: In your preferences settings, set "Leave applications in memory while preempted?" to yes. This will prevent the model from being unloaded each time BOINC switches between projects, so you have less chance of a -5 error when the model "restarts", because it will resume instead of restarting. Regarding BOINC clients: You have two choices. v5.2.x of BOINC will not unload the model from memory when benchmarking (unless "Leave applications in memory while preempted?" is set to no), so using this client is preferable to avoid the problem described in your initial post. v4.45 of BOINC will unload the model when benchmarking, but only waits 10 secs for the model to terminate. Should the model take more than 10 secs to terminate, BOINC will abort the benchmarks and you're likely to have your system running idle until you notice it. The "unofficial" v4.45b that Les has indicated waits 30 secs (avoiding the idle state), but the model is still unloaded and you still run the risk of it dying on restart. If you're determined to stay with a v4 client, then this is the suggested one for the reasons I have explained above. |
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
Hello to the Climate team. I have resurrected this thread as it has the correct title. I noticed today that one of my CP WUs had disappeared from my computer. Being a bit naive I thought "you beauty, I have completed my first WU". On checking the result I was informed that the WU had failed with a "computational error". My next words uttered were 'bullshit, I don't believe it, what happened there?'. It would appear that after 6,434,209.302419 seconds (1,787.28 hours) the workunit decided to die on its sword for no real reason that I could see. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=5217487 got the error "exit code 1" and the following:
pp2netcdf crashed: Error in getting file type Error in converting file dataout/2jkgfo.pjo2c10 to netcdf format.
pp2netcdf crashed: Error in getting file type Error in converting file dataout/2jkgfo.pio2c10 to netcdf format.
pp2netcdf crashed: Error in getting file type Error in converting file dataout/2jkgfo.pfo2c10 to netcdf format.
pp2netcdf crashed: Error in getting file type Error in converting file dataout/2jkgfa.pho2c10 to netcdf format.
pp2netcdf crashed: Error in getting file type Error in converting file dataout/2jkgfa.pgo2c10 to netcdf format.
pp2netcdf crashed: Error in getting file type Error in converting file dataout/2jkgfa.peo2c10 to netcdf format.
pp2netcdf crashed: Error in getting file type Error in converting file dataout/2jkgfa.pdo2c10 to netcdf format.
</stderr_txt>
Validate state: OK; Claimed credit: 32,549.14; Granted credit: 31,622.40; Application version: 5.08
Has all my time been worth it, or wasted? Is the WU now of any use, or have all the trickles I sent in given the scientists the data they needed? |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Useful information is returned at the end of each Model Year, on 04Dec. A lot more information is returned every 10 Model Years, and a full Restart Dump every 40 Model Years. Your effort isn't wasted. The Run could be restarted from a backup if you have one. Les' comments for Exit Code -1 and -107 are here: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=4710#23372 |
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
Thanks astroWX, I might try from a backup, which is 1 or 2 weeks old, so I have not lost much. The 'exit code 1' that I am getting must be from a different source, as I am running Linux, not Windows, and that thread from Les is about Windows machines. Do I just locate the old file in the backup project folder and copy it back into the current working project folder? Will BOINC detect this? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Not the old FILE. The entire BOINC FOLDER, along with all of the sub-folders. The data needed to restart is scattered over several folders, starting in the main BOINC folder, and extending down to a sub-folder of the models folder. The error code 1 may well be because of shutting down your computer without first exiting from BOINC, whatever operating system you use, or because of an older version of the graphics software. |
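As a rough illustration of backing up the entire BOINC folder rather than a single file, here is a minimal Python sketch that archives the whole directory tree, sub-folders included. The function name `backup_boinc_folder`, the archive naming, and any paths passed to it are assumptions for illustration only, not a BOINC or CPDN tool. It also assumes BOINC has been exited first so that the files are not changing while they are copied.

```python
import tarfile
import time
from pathlib import Path


def backup_boinc_folder(boinc_dir: str, dest_dir: str) -> Path:
    """Archive the whole BOINC folder (all sub-folders) into dest_dir.

    Hypothetical helper: exit BOINC before calling it.
    """
    src = Path(boinc_dir).resolve()
    dest = Path(dest_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {src}")
    # Refuse to write the archive inside the folder being archived.
    if dest == src or src in dest.parents:
        raise ValueError("destination must be outside the BOINC folder")
    dest.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"boinc-backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Adding the top-level directory pulls in every sub-folder recursively.
        tar.add(src, arcname=src.name)
    return archive
```

Calling `backup_boinc_folder("/path/to/BOINC", "/path/to/backups")` with your own paths would produce one timestamped archive per run; restoring means replacing the whole folder from that archive, not just one project sub-folder.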
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
Thanks Les, it looks like I have lost that WU then, as I have only been backing up the climateprediction.net subfolder under the projects subfolder in the BOINC folder. This would explain why removing the climate subfolder and replacing it with my backup did not change anything, and the client kept on doing what it was doing before. I have not been backing up the whole BOINC folder, and as I run 6 other projects on the same computer I believe any restart from a backed-up folder will create errors and problems with the other projects, so I will just forget about that WU. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It is possible to restore a backup made while running multiple projects. There is a section in the BOINC Wiki explaining it, but this site is unreachable at the moment. When it's up, search for: Backup_BOINC |