climateprediction.net (CPDN) home page
Thread 'What is with all the models crashing after running for 20-30 seconds?'

Message boards : Number crunching : What is with all the models crashing after running for 20-30 seconds?

old_user552217
Joined: 7 Jan 09
Posts: 8
Credit: 177,252
RAC: 0
Message 43931 - Posted: 13 Mar 2012, 4:02:24 UTC
Last modified: 13 Mar 2012, 4:05:24 UTC

I have a Mac OS X machine, and I have not had a single model run for more than one minute in the last week. Looking at the models, they are crashing not just for me but on all three supported operating systems (Windows, Linux, and OS X). Has anyone found a reason for this?

EDIT: Just for emphasis, none of these WUs have been completed by any computer they were assigned to. This is not an OS X problem. This appears to be a model problem.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 43932 - Posted: 13 Mar 2012, 4:32:05 UTC - in response to Message 43931.  

Actually it IS a Mac problem, and is described in this sticky thread right at the top of the Macintosh section.

There is a different problem with some Linux systems, described in a sticky thread at the top of the Linux section.

Most Windows models run without problems.

old_user552217
Joined: 7 Jan 09
Posts: 8
Credit: 177,252
RAC: 0
Message 43933 - Posted: 13 Mar 2012, 6:47:47 UTC - in response to Message 43932.  
Last modified: 13 Mar 2012, 6:53:14 UTC

> Actually it IS a Mac problem, and is described in this sticky thread right at the top of the Macintosh section.
>
> There is a different problem with some Linux systems, described in a sticky thread at the top of the Linux section.
>
> Most Windows models run without problems.


The funny thing is that I have made no system changes, and my system was doing good work until mid-to-late February; nothing has worked since. I have done the detach/reattach and am watching to see if it fixes the problem. I suspect it won't, but I am hoping it will. I looked at most of my failed work units, and even Windows machines error out on the WUs that died on my machine. My task list:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?userid=552217
Some WUs that both a Windows machine and mine killed.
In these, one host of each OS failed:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=7973162
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=7978576
In these, both an OS X machine and a Windows machine failed:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=7976667
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=7976725
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=7974387
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=7927903

Here are Windows machines generating errors like mine:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/show_host_detail.php?hostid=1205752
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1192900
If you want more, I can easily dig up some more.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 43934 - Posted: 13 Mar 2012, 7:00:51 UTC - in response to Message 43933.  

There's no need for more info.
It's well known that some hadcm3n models are very unstable for some reason, but these models haven't been available for some time now.

As for Mac computers, when BOINC went from 6.10.* to 6.12.*, the extra 'sandboxing' meant that some permissions would no longer work. I think it's because some parts got moved to different folders/subfolders.
However, that was a long time ago in terms of BOINC versions and the discussion on the BOINC/dev boards, and since I don't have a Mac, I wasn't interested in the fine details.

There's a very long thread in the climateprediction.net Science section of this board under Misconfiguration e-mail, which is a discussion thread for people who have had their computers blocked until they fixed the problem.
So we know that the disconnect/re-attach works. If it doesn't for you, then it's a first, so let us know.

©2024 cpdn.org