climateprediction.net (CPDN) home page
Thread 'models always for 10 years crash'

Steve in Pimlico

Joined: 17 Sep 04
Posts: 9
Credit: 19,604,231
RAC: 296
Message 48002 - Posted: 18 Jan 2014, 15:10:17 UTC

Help

I am still receiving these errors, and have been for 10 years.

Can anyone tell me what to do?


Task ID | Work unit ID | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit | Application
16216713 | 8542749 | 13 Jan 2014 22:36:56 UTC | 15 Apr 2014 6:04:07 UTC | In progress | --- | --- | --- | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07
16216632 | 8443228 | 13 Jan 2014 22:36:56 UTC | 13 Jan 2014 22:44:19 UTC | Error while computing | 13.63 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16200612 | 8621110 | 2 Jan 2014 19:36:04 UTC | 6 Jan 2014 22:46:59 UTC | Error while computing | 30,946.17 | 30,785.69 | 622.08 | 622.08 | UK Met Office Coupled Model Full Resolution Ocean v6.07
16157657 | 8230823 | 23 Dec 2013 20:08:44 UTC | 24 Dec 2013 20:03:09 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16157656 | 8230558 | 23 Dec 2013 20:08:44 UTC | 24 Dec 2013 20:03:09 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16157653 | 8230574 | 23 Dec 2013 20:08:44 UTC | 24 Dec 2013 20:03:09 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16156960 | 8230456 | 23 Dec 2013 18:38:29 UTC | 23 Dec 2013 20:08:44 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16155797 | 8229872 | 23 Dec 2013 16:14:54 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16155796 | 8229252 | 23 Dec 2013 16:14:55 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16155794 | 8229298 | 23 Dec 2013 16:14:56 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16155793 | 8229871 | 23 Dec 2013 16:14:54 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16155719 | 8229843 | 23 Dec 2013 16:14:53 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16155718 | 8229842 | 23 Dec 2013 16:14:53 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16155717 | 8229605 | 23 Dec 2013 16:14:54 UTC | 23 Dec 2013 16:23:59 UTC | Error while downloading | 0.00 | 0.00 | 0.00 | --- | UK Met Office HADAM3P European Region v6.09
16146903 | 8559510 | 18 Dec 2013 20:41:30 UTC | 23 Dec 2013 16:23:59 UTC | Error while computing | 21,757.95 | 21,605.19 | 311.04 | 311.04 | UK Met Office Coupled Model Full Resolution Ocean v6.07
16145092 | 8566230 | 15 Dec 2013 9:00:14 UTC | 16 Mar 2014 16:27:25 UTC | In progress | --- | --- | --- | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07
16074017 | 8502992 | 25 Oct 2013 12:45:34 UTC | 27 Oct 2013 20:02:43 UTC | Error while computing | 13,926.68 | 13,926.68 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07
16070362 | 8612229 | 19 Oct 2013 12:29:42 UTC | 20 Oct 2013 21:48:00 UTC | Error while computing | 14,859.37 | 14,766.23 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07
16060662 | 8607938 | 7 Oct 2013 21:10:43 UTC | 23 Oct 2013 21:03:42 UTC | Error while computing | 124,037.75 | 121,672.00 | 1,866.24 | 1,866.24 | UK Met Office Coupled Model Full Resolution Ocean v6.07
16060078 | 8498903 | 7 Oct 2013 10:37:43 UTC | 8 Oct 2013 22:21:05 UTC | Error while computing | 73.38 | 0.44 | 0.00 | --- | UK Met Office Coupled Model Full Resolution Ocean v6.07
mo.v
Volunteer moderator

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 48003 - Posted: 18 Jan 2014, 17:35:56 UTC
Last modified: 18 Jan 2014, 17:36:24 UTC

Hi Steve

It looks as if you have two nice computers. Here they are:

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/hosts_user.php?userid=18823

But as you say, they've been crashing quite a few tasks and #1 in that list has been having more of a struggle. Here are its tasks:

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1307835

We can click on any of the task numbers and then see the details of the task by clicking on Stderr+. The messages that then appear and (sometimes) the error code (or exit status) often provide useful clues as to what happened.

* The tasks (models) that say "Error while downloading" are almost certainly due to some problem inherent in the models, e.g. a missing file. Don't worry about them.

* A small number of the crash messages show 5 or 6 times that there was an INITTIME error. This is something wrong with the model which tries to restart 5 times and is programmed to crash on the 6th attempt. Don't worry about these.

* A few of your models crashed with code 25. This can be caused by a bluescreen crash. Is this ever a problem on this computer?

* Sometimes two or three models seem to have crashed at the same time. I'm guessing that because they reported to the Oxford server at the same time. Something the models didn't like must have happened. One possible cause is that you turn off the computer without completely exiting from BOINC first. Before shutting down the computer you should:

- open BOINC Manager
- in the Activity tab stop computation. I also stop network access.
- in the File tab click on Exit

Don't just close BOINC Manager before shutting down the computer; that doesn't work because the models will still be crunching.

* It's safer not to let your antivirus program scan BOINC and the models while they're running. Either exit from BOINC before scans or exclude BOINC from scans.
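
The shutdown steps listed above (stop computation, stop network access, then exit) can also be scripted. This is only a minimal sketch, assuming the `boinccmd` command-line tool that ships with BOINC is on the PATH and is allowed to talk to the local client. The function names `shutdown_commands` and `stop_boinc` are made up for illustration and are not part of BOINC.

```python
import subprocess


def shutdown_commands(boinccmd="boinccmd"):
    """Return the boinccmd invocations that mirror the manual steps, in order."""
    return [
        [boinccmd, "--set_run_mode", "never"],      # stop computation
        [boinccmd, "--set_network_mode", "never"],  # stop network access
        [boinccmd, "--quit"],                       # ask the client to exit
    ]


def stop_boinc(boinccmd="boinccmd", dry_run=True):
    """Print the commands (dry_run=True) or run them, stopping on the first failure."""
    for cmd in shutdown_commands(boinccmd):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `stop_boinc()` only prints the three commands; pass `dry_run=False` to actually run them. Because no duration is given, the run and network modes should stay at "never" until they are set back (for example to `auto`) after the next start.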

Other members may well have extra suggestions.
JIM

Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 48004 - Posted: 18 Jan 2014, 17:37:53 UTC

This is a well-known issue. You were hit by the decadal crash problem. The 40-year WUs, when they reach the end of a decade (at 25%, 50%, 75% and 100%), pause the computation to create a zip file. Far too many models crash at this point.

The only thing you can do to help prevent this is not to interrupt the model when it is at this point. Don't suspend it or shut it down when it is creating the zip files. Some will crash anyway.
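
As a rough illustration of that advice, here is a small check that flags when a 40-year task is close to one of the decade boundaries. It is a sketch only: the progress fraction would have to be read by hand from BOINC Manager, the 1% safety margin is an arbitrary guess, and both function names are hypothetical.

```python
# Decade boundaries of a 40-year work unit, as fractions of total progress.
DECADE_BOUNDARIES = (0.25, 0.50, 0.75, 1.00)


def near_decade_boundary(fraction_done, margin=0.01):
    """True if progress is within `margin` of a point where the zip file is created."""
    return any(abs(fraction_done - b) <= margin for b in DECADE_BOUNDARIES)


def safe_to_suspend(fraction_done, margin=0.01):
    """True if the task is not close to a decade boundary."""
    return not near_decade_boundary(fraction_done, margin)
```

For example, `safe_to_suspend(0.60)` is true, while `safe_to_suspend(0.249)` is false because the model is about to finish its first decade. A true result does not guarantee a clean suspend; it only avoids the known risky points.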

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4538
Credit: 19,008,987
RAC: 21,524
Message 48005 - Posted: 18 Jan 2014, 17:54:36 UTC

The only thing I would add is to suspend computation before doing anything very processor-intensive, such as video editing. I know the theory is that BOINC should handle things without problems, but my experience is that it doesn't always. A couple of times, when a large file has been going through the render process and I have forgotten to suspend computation in BOINC, it has crashed models.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 48006 - Posted: 18 Jan 2014, 22:07:32 UTC - in response to Message 48002.  

Steve

You also have another problem, as described in the latest post in News and Announcements.

You have 2 models that fit this description. Continuing with them is a waste of electricity.

**********

Another thing that may cause a crash, due to model interruption at a critical moment, is leaving the option "Suspend work if CPU usage is above" set to the default, or to any other value that constantly interrupts BOINC and its tasks whenever the CPU load gets above that percentage.
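
For anyone who prefers to set this outside BOINC Manager, the sketch below builds the text of a local preferences override. It rests on assumptions that should be checked against your BOINC version: that the client reads a `global_prefs_override.xml` file from its data directory, that the setting is stored under the tag `suspend_cpu_usage`, and that a value of 0 means "never suspend for CPU usage". The function name `cpu_usage_override` is hypothetical.

```python
import xml.etree.ElementTree as ET


def cpu_usage_override(percent=0):
    """Build an override document; 0 is assumed to disable the CPU-usage suspend."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    root = ET.Element("global_preferences")
    # Assumed tag name for "Suspend work if CPU usage is above".
    ET.SubElement(root, "suspend_cpu_usage").text = f"{percent:g}"
    return ET.tostring(root, encoding="unicode")
```

The returned string would be saved as `global_prefs_override.xml` in the BOINC data directory, after which the client has to re-read its local preferences (or be restarted). Changing the same option in BOINC Manager's computing preferences should have the same effect and is the safer route if in doubt.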


