Message boards : Number crunching : models always for 10 years crash
Author | Message |
---|---|
Send message Joined: 17 Sep 04 Posts: 9 Credit: 19,604,231 RAC: 296 |
Help! I am still receiving these errors, and have been for 10 years. Can anyone tell me what to do?

Task ID | Work unit ID | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit | Application |
---|---|---|---|---|---|---|---|---|---|
16216713 | 8542749 | 13 Jan 2014 22:36:56 UTC | 15 Apr 2014 6:04:07 UTC | In progress | --- | --- | --- | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
16216632 | 8443228 | 13 Jan 2014 22:36:56 UTC | 13 Jan 2014 22:44:19 UTC | Error while computing | 13.63 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16200612 | 8621110 | 2 Jan 2014 19:36:04 UTC | 6 Jan 2014 22:46:59 UTC | Error while computing | 30,946.17 | 30,785.69 | 622.08 | 622.08 | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
16157657 | 8230823 | 23 Dec 2013 20:08:44 UTC | 24 Dec 2013 20:03:09 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16157656 | 8230558 | 23 Dec 2013 20:08:44 UTC | 24 Dec 2013 20:03:09 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16157653 | 8230574 | 23 Dec 2013 20:08:44 UTC | 24 Dec 2013 20:03:09 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16156960 | 8230456 | 23 Dec 2013 18:38:29 UTC | 23 Dec 2013 20:08:44 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16155797 | 8229872 | 23 Dec 2013 16:14:54 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16155796 | 8229252 | 23 Dec 2013 16:14:55 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16155794 | 8229298 | 23 Dec 2013 16:14:56 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16155793 | 8229871 | 23 Dec 2013 16:14:54 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16155719 | 8229843 | 23 Dec 2013 16:14:53 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16155718 | 8229842 | 23 Dec 2013 16:14:53 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16155717 | 8229605 | 23 Dec 2013 16:14:54 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09 |
16146903 | 8559510 | 18 Dec 2013 20:41:30 UTC | 23 Dec 2013 16:23:59 UTC | Error while computing | 21,757.95 | 21,605.19 | 311.04 | 311.04 | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
16145092 | 8566230 | 15 Dec 2013 9:00:14 UTC | 16 Mar 2014 16:27:25 UTC | In progress | --- | --- | --- | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
16074017 | 8502992 | 25 Oct 2013 12:45:34 UTC | 27 Oct 2013 20:02:43 UTC | Error while computing | 13,926.68 | 13,926.68 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
16070362 | 8612229 | 19 Oct 2013 12:29:42 UTC | 20 Oct 2013 21:48:00 UTC | Error while computing | 14,859.37 | 14,766.23 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
16060662 | 8607938 | 7 Oct 2013 21:10:43 UTC | 23 Oct 2013 21:03:42 UTC | Error while computing | 124,037.75 | 121,672.00 | 1,866.24 | 1,866.24 | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
16060078 | 8498903 | 7 Oct 2013 10:37:43 UTC | 8 Oct 2013 22:21:05 UTC | Error while computing | 73.38 | 0.44 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07 |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Steve

It looks as if you have two nice computers. Here they are: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/hosts_user.php?userid=18823

But as you say, they've been crashing quite a few tasks, and #1 in that list has been having more of a struggle. Here are its tasks: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1307835

We can click on any of the task numbers and then see the details of the task by clicking on Stderr+. The messages that then appear and (sometimes) the error code (or exit status) often provide useful clues as to what happened.

* The tasks (models) that say Download error are almost certainly due to some problem inherent in the models, e.g. a missing file. Don't worry about them.
* A small number of the crash messages show 5 or 6 times that there was an INITTIME error. This is something wrong with the model, which tries to restart 5 times and is programmed to crash on the 6th attempt. Don't worry about these.
* A few of your models crashed with code 25. This can be caused by a bluescreen crash. Is this ever a problem on this computer?
* Sometimes two or three models seem to have crashed at the same time. I'm guessing that because they reported to the Oxford server at the same time. Something the models didn't like must have happened. One possible cause is that you turn off the computer without completely exiting from BOINC first. Before shutting down the computer you should:
  - open BOINC Manager
  - in the Activity tab stop computation. I also stop network access.
  - in the File tab click on Exit

  Don't just close BOINC Manager before shutting down the computer; that doesn't work because the models will still be crunching.
* It's safer not to let your antivirus program scan BOINC and the models while they're running. Either exit from BOINC before scans or exclude BOINC from scans.

Other members may well have extra suggestions. |
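The shutdown steps above can also be scripted with `boinccmd`, the command-line tool that ships with the BOINC client. What follows is only a minimal sketch, not an official procedure: it assumes `boinccmd` is on the PATH and is allowed to talk to the local client, and the function names `shutdown_commands` and `stop_boinc` are made up for illustration.

```python
import subprocess


def shutdown_commands(stop_network=True):
    """Return the boinccmd calls that mirror the manual steps, in order."""
    commands = [["boinccmd", "--set_run_mode", "never"]]  # stop computation
    if stop_network:
        # Optional step: also stop network access.
        commands.append(["boinccmd", "--set_network_mode", "never"])
    commands.append(["boinccmd", "--quit"])  # ask the client itself to exit
    return commands


def stop_boinc(stop_network=True, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in shutdown_commands(stop_network):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    stop_boinc()  # dry run: prints the commands without touching BOINC
```

Run it once as a dry run to check the commands, then call `stop_boinc(dry_run=False)` before shutting the machine down. The run and network modes may still be set to "never" the next time the client starts, in which case `boinccmd --set_run_mode auto` and `boinccmd --set_network_mode auto` hand control back to the preferences.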
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
This is a well-known issue. You were hit by the decadal crash problem. The 40-year WUs, when they reach the end of a decade (at 25%, 50%, 75% and 100%), pause the computation to create a zip file. Far too many models crash at this point. The only thing you can do to help prevent this is not to interrupt the model when it is at this point. Don't suspend it or shut it down while it is creating the zip files. Some will crash anyway. |
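The decade boundaries mentioned above translate into a simple check on the progress figure. This is a rough sketch only: `fraction_done` is assumed to be the task's progress as shown in BOINC Manager divided by 100, and the 1% safety margin is a guess, since nothing here says how long the zip step takes. Both function names are hypothetical.

```python
# Progress fractions at which a 40-year model finishes a decade.
DECADE_MARKS = (0.25, 0.50, 0.75, 1.00)


def near_decade_end(fraction_done, margin=0.01):
    """True if progress is within `margin` of any decade boundary."""
    return any(abs(fraction_done - mark) <= margin for mark in DECADE_MARKS)


def safe_to_suspend(fraction_done, margin=0.01):
    """Inverse of near_decade_end, named for how it would be used."""
    return not near_decade_end(fraction_done, margin)
```

For example, `safe_to_suspend(0.60)` is true, while a model showing 24.9% progress would be flagged as too close to the first decade's zip. Widen `margin` on a slow machine.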
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,009,815 RAC: 21,293 |
The only thing I would add is to suspend computation before doing anything very processor-intensive, such as video editing. I know the theory is that BOINC should handle things without problems, but my experience is that it doesn't always. A couple of times, when a large file has been going through the render process and I have forgotten to suspend computation in BOINC, it has crashed models. |
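For anyone who keeps forgetting to suspend before a heavy job, the suspend and resume can be wrapped around the job itself. A minimal sketch, assuming `boinccmd` is on the PATH; `boinc_suspended` and the `run` parameter are invented names, and `run` exists only so the wrapper can be tried without a BOINC client present.

```python
import subprocess
from contextlib import contextmanager


def _boinccmd(*args):
    """Call the real boinccmd tool with the given arguments."""
    subprocess.run(["boinccmd", *args], check=True)


@contextmanager
def boinc_suspended(run=_boinccmd):
    """Suspend BOINC computation for the duration of a heavy job."""
    run("--set_run_mode", "never")
    try:
        yield
    finally:
        # Hand control back to the preferences even if the job failed.
        run("--set_run_mode", "auto")
```

Typical use would be `with boinc_suspended(): render_video()`, where `render_video` stands for whatever the heavy job is. The models stay suspended for the whole render, so none of them is interrupted partway through by competition for the CPU.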
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Steve

You also have another problem, as per the current latest post in News and Announcements. You have 2 models that fit this description. Continuing with them is a waste of electricity.

**********

Another thing that may cause a crash, due to model interruption at a critical moment, is leaving the option "Suspend work if CPU usage is above" set to the default, or to anything else that constantly interrupts BOINC and its tasks whenever the CPU load gets above that percentage. |
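The CPU-usage option above is normally changed in BOINC Manager's computing preferences, which is the simplest route. For those who prefer a file, here is a hedged sketch: as far as I know the client reads local overrides from `global_prefs_override.xml` in its data directory, and the element for this option is `<suspend_cpu_usage>`, with 0 meaning no restriction. Treat both as assumptions to check against the BOINC documentation; `override_xml` is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def override_xml(suspend_cpu_usage=0.0):
    """Build a preferences override fragment that sets one option.

    A value of 0 is assumed to switch the CPU-usage suspension off.
    """
    root = ET.Element("global_preferences")
    ET.SubElement(root, "suspend_cpu_usage").text = f"{suspend_cpu_usage:.6f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(override_xml())
```

The script only prints the fragment; it does not write any file. If you do save it as an override, the client has to be told to re-read it, for instance by restarting BOINC.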
©2024 cpdn.org