climateprediction.net (CPDN) home page
Thread 'several crash right after initialization'

Message boards : Number crunching : several crash right after initialization

NewtonianRefractor
Joined: 22 May 08
Posts: 49
Credit: 2,335,997
RAC: 0
Message 41001 - Posted: 10 Nov 2010, 1:02:22 UTC
Last modified: 10 Nov 2010, 1:05:51 UTC

My computer (1109774) just had a batch of HadCM3 models fail right after initialization.
Is that normal?

This is the error code:


<core_client_version>6.10.58</core_client_version>
<![CDATA[
<message>
The drive cannot locate a specific area or track on the disk. (0x19) - exit code 25 (0x19)
</message>
<stderr_txt>
called boinc_finish

</stderr_txt>
]]>
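
When comparing failed tasks, the numeric exit code is probably the most useful part of this output. The sentence in front of it matches the standard Windows description for system error 25, so it may only be a generic lookup of the number rather than evidence of a real disk fault; that reading is an assumption and is not confirmed anywhere in this thread. Below is a minimal Python sketch for pulling the code out of a message like the one above. The function name `parse_exit_code` and the regular expression are illustrative only and are not part of BOINC or CPDN.

```python
import re

# Hypothetical helper: matches the "exit code N (0xHH)" fragment seen in the
# <message> element of the task output quoted above.
EXIT_CODE_RE = re.compile(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)")


def parse_exit_code(message):
    """Return (decimal_code, hex_code_as_int), or None if no code is found."""
    match = EXIT_CODE_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2), 16)


message = (
    "The drive cannot locate a specific area or track on the disk. "
    "(0x19) - exit code 25 (0x19)"
)

result = parse_exit_code(message)
if result is not None:
    decimal_code, hex_code = result
    # The decimal and hexadecimal forms should describe the same number.
    print(decimal_code, hex(hex_code), decimal_code == hex_code)
```

Running the same helper over the stderr of several failed tasks would make it easy to check whether they all stopped with the same exit code.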
ID: 41001
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 41004 - Posted: 10 Nov 2010, 9:03:48 UTC
Last modified: 10 Nov 2010, 9:04:26 UTC

I saw this last night and downloaded three HadCM 6.04 tasks for myself to check. All crashed after one timestep with exit code 25, an error type none of my computers had ever produced before. On Linux the models are also producing exit code 25, but with slightly different error messages in stderr.

I informed Milo last night and he saw what I'd said before he went to bed. He'll probably have to deprecate this whole new batch but is at a meeting today so may not be able to do so immediately.

I recommend we edit our climateprediction preferences in our accounts to exclude HadCM for the time being.
Cpdn news
ID: 41004
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 41005 - Posted: 10 Nov 2010, 9:12:22 UTC

Milo has just removed HadCM from the work queue.
Cpdn news
ID: 41005
old_user596405
Joined: 4 Oct 09
Posts: 73
Credit: 7,242,427
RAC: 0
Message 41007 - Posted: 10 Nov 2010, 9:36:14 UTC - in response to Message 41005.  

Milo has just removed HadCM from the work queue.

Good. Just wasted time watching three going down the pan.
CM3 version 6.05 is also crashing right after starting in Beta.
May be a while before CM3 gets sorted. The scientists will have to wait.
ID: 41007
Nigel Garvey
Joined: 5 May 10
Posts: 69
Credit: 1,169,103
RAC: 2,258
Message 41037 - Posted: 15 Nov 2010, 9:30:33 UTC

It's still happening. I had two this morning which crashed out before I knew I'd got them. (That's my CPDN download quota for today.)

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=6964086
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=6964078

I don't know if it's coincidence, but as I write, it's v6.03 applications which have errored, not v6.04.
ID: 41037
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,827,799
RAC: 5,038
Message 41040 - Posted: 15 Nov 2010, 10:41:58 UTC - in response to Message 41037.  

It's still happening. I had two this morning which crashed out before I knew I'd got them. (That's my CPDN download quota for today.)

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=6964086
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=6964078

I don't know if it's coincidence, but as I write, it's v6.03 applications which have errored, not v6.04.

As I understand it, the fix did not involve an application release, only a change to the names of the files being downloaded. The v6.03/04 difference is Mac versus Windows/Linux (according to the application page), so maybe the Mac configuration hasn't been patched.

I'll pass the message on.
ID: 41040
Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 41046 - Posted: 15 Nov 2010, 15:05:48 UTC - in response to Message 41037.  

Both tasks failed with the message "Insufficient Memory/Stack Space Available!" in the stderr messages. The solution is described in detail here.
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 41046
Nigel Garvey
Joined: 5 May 10
Posts: 69
Credit: 1,169,103
RAC: 2,258
Message 41052 - Posted: 16 Nov 2010, 9:34:03 UTC - in response to Message 41046.  

Both tasks failed with the message "Insufficient Memory/Stack Space Available!" in the stderr messages. The solution is described in detail here.


Thanks for the link. It's an interesting article. But, on balance, I've simply edited my CPDN preferences to exclude these tasks.
ID: 41052
