Thread 'model just quit??'

old_user437970
Joined: 20 Mar 07
Posts: 11
Credit: 135,012
RAC: 0
Message 29522 - Posted: 12 Jul 2007, 22:55:43 UTC

Here is what the message said:
7/12/2007 9:27:15 AM|climateprediction.net|Computation for task hadcm3ohe_1ccq_05711004_1 finished
7/12/2007 9:27:15 AM|climateprediction.net|Output file hadcm3ohe_1ccq_05711004_1_14.zip for task hadcm3ohe_1ccq_05711004_1 absent
7/12/2007 9:27:15 AM|climateprediction.net|Output file hadcm3ohe_1ccq_05711004_1_15.zip for task hadcm3ohe_1ccq_05711004_1 absent
7/12/2007 9:27:15 AM|climateprediction.net|Output file hadcm3ohe_1ccq_05711004_1_16.zip for task hadcm3ohe_1ccq_05711004_1 absent
7/12/2007 9:27:16 AM|climateprediction.net|Reason: Unrecoverable error for result hadcm3ohe_1ccq_05711004_1 (<file_xfer_error> <file_name>hadcm3ohe_1ccq_05711004_1_14.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_1ccq_05711004_1_15.zip</file_name> <error_code>-161</error_code></file_xfer_error><file_xfer_error> <file_name>hadcm3ohe_1ccq_05711004_1_16.zip</file_name> <error_code>-161</error_code></file_xfer_error>)
7/12/2007 9:28:17 AM|climateprediction.net|Requesting 172800 seconds of new work, and reporting 1 completed tasks
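The "Reason" line above packs one <file_xfer_error> element per missing output file. As a rough illustration only (this is not part of BOINC), a short Python sketch can pull the file names and error codes out of such a line. The function name parse_xfer_errors and the regular expression are assumptions made for this example, based solely on the log text shown here.

```python
import re

# One <file_xfer_error> element, as it appears in the log line above.
# The pattern is an assumption derived from that single example.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"</file_xfer_error>"
)


def parse_xfer_errors(log_line):
    """Return a list of (file_name, error_code) pairs found in one log line."""
    return [
        (m.group("name"), int(m.group("code")))
        for m in XFER_ERROR.finditer(log_line)
    ]


# Shortened copy of the "Reason" line from the log (two of the three files).
sample = (
    "Reason: Unrecoverable error for result hadcm3ohe_1ccq_05711004_1 ("
    "<file_xfer_error> <file_name>hadcm3ohe_1ccq_05711004_1_14.zip</file_name> "
    "<error_code>-161</error_code></file_xfer_error>"
    "<file_xfer_error> <file_name>hadcm3ohe_1ccq_05711004_1_15.zip</file_name> "
    "<error_code>-161</error_code></file_xfer_error>)"
)

for name, code in parse_xfer_errors(sample):
    print(name, code)
```

Every file reported this way carries the same error code, which fits the "absent" messages just before it: the files were never written, so there was nothing to upload.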

This all happened while I was at work, so am I to assume that the model finished, or that it aborted due to an error? There were about 500 hours left to run and it was at the 85% level.

Does anyone have any thoughts? BTW, the computer did not shut down unexpectedly.

Thanks



ID: 29522
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 29523 - Posted: 12 Jul 2007, 23:22:50 UTC


Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA


This might have been a momentary problem with the computer, but it usually means that the model has run as far as it can and become unstable.
It's a normal outcome for a lot of models, sooner or later.
Even those that reach the 160 year limit might do the same if they were run a bit longer.

ID: 29523
old_user437970
Joined: 20 Mar 07
Posts: 11
Credit: 135,012
RAC: 0
Message 29524 - Posted: 13 Jul 2007, 0:00:49 UTC - in response to Message 29523.  


Model crashed: umshell1.f: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED. GA


This might have been a momentary problem with the computer, but usually means that the model has run as far as it can and become unstable.
It's a normal outcome of a lot of models, sooner or later.
Even those that reach the 160 year limit may do the same if they were run a bit longer.



So the data generated up to that point can still be used? I'm still crunching several others. Onward through the heat. Thanks.
ID: 29524
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 29525 - Posted: 13 Jul 2007, 0:41:52 UTC


With these new models, data is uploaded for storage and use once per model year, with a larger amount uploaded as a zip file every 10 years, and a slightly larger zip file every 40 years.
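As a rough sketch of that schedule, the Python below classifies what goes up at the end of each model year. It is an illustration only: the names upload_kind and zip_years are invented here, the 160-year run length is taken from the limit mentioned earlier in the thread, and it assumes that at a 40-year mark the larger zip takes the place of the ordinary 10-year one, which the description does not spell out.

```python
def upload_kind(model_year):
    """Classify the upload at the end of a 1-based model year.

    Assumption: at multiples of 40 the larger zip replaces the 10-year zip.
    """
    if model_year % 40 == 0:
        return "yearly data + 40-year zip"
    if model_year % 10 == 0:
        return "yearly data + 10-year zip"
    return "yearly data"


def zip_years(total_years=160):
    """Model years at which some zip file is uploaded."""
    return [y for y in range(1, total_years + 1) if y % 10 == 0]


for year in (7, 10, 40, 130):
    print(year, upload_kind(year))

print(len(zip_years()), "zip uploads over a full run")
```

So a model that stops early has already returned every yearly upload and every zip up to its last completed decade.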

ID: 29525
Mike.Gibson
Joined: 2 May 07
Posts: 20
Credit: 657,542
RAC: 0
Message 30911 - Posted: 8 Oct 2007, 23:37:45 UTC

I have 2 machines running ClimatePrediction & WCG.

One is a dual-core Intel with XP; the other is a dual-core AMD with Vista. The first one runs CPDN without problems and I have 2 units nearing completion. On the second one, models frequently finish early when I shut down the machine, usually after at least 10% has been completed. WCG continues regardless.

Any ideas what is happening, please?

Mike
ID: 30911
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 30912 - Posted: 9 Oct 2007, 0:49:05 UTC
Last modified: 9 Oct 2007, 0:59:47 UTC

Hi Mike

I looked at some of your crashed models, and these all failed with: exit code 1073807364
This is a Windows 'stop' message, and is often associated with a display problem.
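Exit codes like this are easier to search for in hexadecimal: 1073807364 is 0x40010004. The sketch below does the conversion and splits the value into the usual NTSTATUS fields (severity, facility, code). It assumes the exit code is an NTSTATUS-style value, and decode_ntstatus is a name invented for this example.

```python
SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(code):
    """Split a 32-bit NTSTATUS-style value into its bit fields.

    Layout: bits 31-30 severity, bits 27-16 facility, bits 15-0 code.
    """
    code &= 0xFFFFFFFF  # treat the value as unsigned 32-bit
    return {
        "hex": f"0x{code:08X}",
        "severity": SEVERITY_NAMES[(code >> 30) & 0x3],
        "facility": (code >> 16) & 0xFFF,
        "code": code & 0xFFFF,
    }


print(decode_ntstatus(1073807364))
```

The severity field comes out as informational rather than error, which suggests the process was told to stop rather than crashing on its own.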

Have a look at the READMEs linked in my signature - in particular: Crashes and other problems, at the "Solutions to models crashing" section near the top.

Always Exit from BOINC before shutting down; Windows doesn't give the model time to close its many files.
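For anyone who would rather script this than remember it, here is a minimal Python sketch. It assumes a BOINC installation that ships the boinccmd command-line tool on the PATH (older clients named it differently), and the helper names quit_boinc_command and quit_boinc are made up for this example. By default it only prints what it would run.

```python
import shutil
import subprocess


def quit_boinc_command(boinccmd="boinccmd"):
    """Build the command that asks the running BOINC client to exit."""
    return [boinccmd, "--quit"]


def quit_boinc(dry_run=True):
    """Ask BOINC to exit; returns the exit status, or None if nothing ran."""
    cmd = quit_boinc_command()
    if dry_run or shutil.which(cmd[0]) is None:
        # Nothing is executed in a dry run or when the tool is not installed.
        print("would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=False).returncode


quit_boinc(dry_run=True)
```

Call quit_boinc(dry_run=False) before shutting down, and give the models a little time to close their files before Windows exits.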

edit
Changed RAEDME to README.


ID: 30912
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 30913 - Posted: 9 Oct 2007, 0:49:52 UTC
Last modified: 9 Oct 2007, 0:50:57 UTC

Do you stop CPDN, then EXIT BOINC, before shutting down? Failure to do so is asking for trouble.

Edit -- oops, beat me again, Les.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 30913
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 30914 - Posted: 9 Oct 2007, 0:58:16 UTC




But I've just noticed that I can't spell README.

I'm glad we can edit.

ID: 30914
Mike.Gibson
Joined: 2 May 07
Posts: 20
Credit: 657,542
RAC: 0
Message 31369 - Posted: 14 Nov 2007, 15:33:57 UTC

Thanks, Guys.

Spelling doesn't matter if your heart is in the right place!

That seems to have stopped the regular problem. However, I lost another unit overnight last night. I think I must have had an automatic update which re-booted my system. I have an idea that this can be prevented, but can't remember how to change the settings. Any suggestions, please?

Fortunately, the hadsm3 unit, which had got past the 50% mark, did not disappear, but the hadcm3 unit with only 1 or 2% completed did. This surprises me because the hadsm3 unit was running permanently (50% of the dual-core machine is allocated to ClimatePrediction), whereas the hadcm3 unit was suspended by user. Was it because the hadcm3 unit is much bigger, judging from the much longer expected processing time (over 1000 hours)?

Cheers

Mike
ID: 31369
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 31372 - Posted: 14 Nov 2007, 16:39:04 UTC - in response to Message 31369.  
Last modified: 14 Nov 2007, 16:39:42 UTC

... I think I must have had an automatic update which re-booted my system. I have an idea that this can be prevented, but can't remember how to change the settings. Any suggestions, please? ...

In Vista, Control Panel | Windows Update | Change Settings. If you set it to the download option, then you can choose when to install and then handle a restart manually.
ID: 31372
JohnofWem
Joined: 15 Feb 06
Posts: 16
Credit: 7,131,865
RAC: 8,566
Message 31375 - Posted: 14 Nov 2007, 17:17:07 UTC

Those automatic updates are a pain in the ~~~~

Slightly different in XP

Go to Control Panel -> System and find the Automatic Updates tab.
ID: 31375
Mike.Gibson
Joined: 2 May 07
Posts: 20
Credit: 657,542
RAC: 0
Message 31377 - Posted: 14 Nov 2007, 19:33:15 UTC - in response to Message 31372.  

... I think I must have had an automatic update which re-booted my system. I have an idea that this can be prevented, but can't remember how to change the settings. Any suggestions, please? ...

In Vista, Control Panel | Windows Update | Change Settings. If you set it to the download option, then you can choose when to install and then handle a restart manually.



Thanks, Guys

It is Vista and I have changed the setting. And meanwhile yet more updates have arrived!!!!!

Cheers

Mike
ID: 31377
[B@H] Ray
Joined: 19 Aug 05
Posts: 104
Credit: 1,866,495
RAC: 0
Message 31610 - Posted: 5 Dec 2007, 18:14:01 UTC - in response to Message 30912.  

edit
Changed RAEDME to README.


Reversing letters is usually my job, please don't put me out of work.
Keep on crunching Pizza@Home
ID: 31610