climateprediction.net (CPDN) home page
Thread 'Losing out to Computation Errors. Which apps should I suspend??'


pspinks

Joined: 11 Sep 09
Posts: 2
Credit: 806,780
RAC: 0
Message 43239 - Posted: 17 Oct 2011, 6:00:05 UTC

I have lost the last 3 work units due to Computation Errors, which has wasted 400+ hours of processing :(

To limit the time lost when such errors occur in future, I'd like to modify my Climateprediction Preferences to focus on applications with the shortest run-times. Can anyone advise me which applications they are?

Incidentally, I don't understand why the software cannot save its state when processing resumes, then revert to those files if an error is encountered. I've read advice about manually backing up work units, but I won't be investing time and effort in that.
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,884,997
RAC: 4,577
Message 43240 - Posted: 17 Oct 2011, 8:48:14 UTC

If you follow the 'Your account' link on the menu to the left of this page then your account settings will be displayed. If you then follow the link 'climateprediction.net preferences' then you can select which types of model you want to run.

The shortest models currently available are 'UK Met Office HADAM3P European Region', 'UK Met Office HADAM3P Southern Africa' and 'UK Met Office HADAM3P Pacific North West'.

The model does save its state as it progresses. In most cases a model is reported as a failure only after it has failed a number of times (six). At other times a fatal error occurs that invalidates the restart data, as in one of the crashes on your machine.
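To illustrate the general idea of restarting from saved state with a cap on repeated failures, here is a minimal Python sketch. It is not the CPDN or BOINC code: the names `run_model`, `save_checkpoint`, `load_checkpoint` and `MAX_FAILURES`, the JSON checkpoint file, and the use of `RuntimeError` to stand in for a crash are all assumptions made for the example.

```python
import json
import os
import tempfile

MAX_FAILURES = 6  # assumption: mirrors the "six" failures mentioned above


def save_checkpoint(path, state):
    """Write state atomically so a crash mid-write cannot corrupt the file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as fh:
            return json.load(fh)
    return {"step": 0, "failures": 0}


def run_model(path, total_steps, step_fn):
    """Run step_fn for each step, resuming from the checkpoint after a crash."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        try:
            step_fn(state["step"])
        except RuntimeError:
            # Go back to the last good state, but keep counting failures.
            failures = state["failures"] + 1
            state = load_checkpoint(path)
            state["failures"] = failures
            save_checkpoint(path, state)
            if failures >= MAX_FAILURES:
                return "failed"  # only now is the run reported as an error
            continue
        state["step"] += 1
        save_checkpoint(path, state)
    return "completed"
```

In this sketch a transient fault costs only the work since the last checkpoint. It does not cover the second case described above, where the error damages the restart data itself, because then there is no good state left to reload.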
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 43241 - Posted: 17 Oct 2011, 9:07:31 UTC - in response to Message 43239.  

The 400 hours weren't wasted. Data is returned all the way through a model run, by what is called a "trickle_up" file, as well as by zip files for the longer models.

The point of this project is NOT to have a model run from start to finish.
They are started with certain values for any of the many variables and forcings that make up a model, and then they are left to run for as long as possible.
When they fail, the researchers have learned what the end result is for those values.
This result then becomes part of a huge ensemble that's being built, to better understand the workings of modelling the real weather/climate.

Making backups is not compulsory. It's just for those people who want to recover a model after a hardware failure. Or, in your case, what appears to be a software conflict.


Backups: Here
pspinks

Joined: 11 Sep 09
Posts: 2
Credit: 806,780
RAC: 0
Message 43276 - Posted: 25 Oct 2011, 23:16:23 UTC - in response to Message 43240.  

Iain and Les, thanks for the replies. It's reassuring to know that tasks which ended in computation errors still provided some useful results.

Your replies suggested that a software conflict on my machine had caused at least one of the crashes. Can you give me any more information on that, or tell me where information might be logged on my own system?
mo.v
Volunteer moderator

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 43290 - Posted: 26 Oct 2011, 9:18:24 UTC

A couple of your hadcm3n models have crashed, probably due to something inherent in the model rather than a fault on the computer. So I wouldn't worry. Just keep crunching your current models and let us know if they have problems.
Steve in Pimlico

Joined: 17 Sep 04
Posts: 9
Credit: 19,604,231
RAC: 296
Message 47441 - Posted: 30 Oct 2013, 1:15:59 UTC - in response to Message 43239.  

I agree. I have lost the last 9 years.

