climateprediction.net (CPDN) home page
Thread 'Update on HadCM3 'Short' WU crashes with shutdown in Windows'

Pete B

Joined: 26 Aug 04
Posts: 67
Credit: 10,299,683
RAC: 10,424
Message 50666 - Posted: 29 Oct 2014, 9:57:14 UTC

There are a large number of reports across the thread about the HadCM3 'Short' WUs crashing in Windows if BOINC is stopped. I don't know about earlier WUs, but with the current batch I am not experiencing this.

Last week, I (absentmindedly, as I thought immediately afterwards) suspended BOINC and shut my PC down for a reboot after installing a driver for some unrelated software. I fully expected the 3 running 'Short' WUs to crash on restarting BOINC, but they didn't; they carried on running to completion as if nothing had happened.

Yesterday I repeated the exercise, intentionally this time, suspending BOINC and rebooting the PC for some Windows updates. The 2 'Short' WUs, together with an EU AM3, all restarted without a problem.

I'm running Windows 7 with Service Pack 1 and BOINC version 7.2.42. The method I use, and always have, is to first suspend the running project via the Activities dropdown, which suspends all running WUs simultaneously. I then wait about 30 seconds to give any disc writing a chance to complete, then exit from BOINC and shut the PC down.

On restarting, I wait until everything has started, then start the BOINC Manager. I then restart the project via the 'Activities' window. No crash with 'Short' WUs yet.

I haven't tried a drastic stop of the BOINC process by shutting down the PC with BOINC still running; maybe that would crash the WUs?
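
For anyone who would rather script the suspend, wait, exit sequence described above than click through the Manager, here is a minimal Python sketch. It is not the method used in the post: it assumes the `boinccmd` command-line tool is on the PATH and can authenticate with the local client, and it assumes that `--set_run_mode never` has the same effect as suspending from the Manager menu. The function names are made up for illustration; the 30-second wait comes from the post.

```python
import subprocess
import time


def shutdown_commands(boinccmd="boinccmd"):
    """Return the two boinccmd invocations: suspend computation, then quit the client."""
    return [
        [boinccmd, "--set_run_mode", "never"],  # assumption: equivalent to suspending in the Manager
        [boinccmd, "--quit"],                   # ask the client to exit
    ]


def resume_command(boinccmd="boinccmd"):
    """Return the invocation that hands control back to the client's preferences."""
    return [boinccmd, "--set_run_mode", "auto"]


def safe_stop(run=subprocess.run, sleep=time.sleep, wait_seconds=30, boinccmd="boinccmd"):
    """Suspend, wait for pending disc writes, then quit the client.

    `run` and `sleep` are injectable so the sequence can be checked without a BOINC client.
    """
    suspend, quit_client = shutdown_commands(boinccmd)
    run(suspend, check=True)
    sleep(wait_seconds)
    run(quit_client, check=True)


# Typical use before a reboot:      safe_stop()
# After restarting the client:      subprocess.run(resume_command(), check=True)
```

Whether this avoids the crashes any better than the manual steps is untested; it only reproduces the same order of operations.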
ID: 50666
Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 50670 - Posted: 29 Oct 2014, 16:20:39 UTC - in response to Message 50666.  

What I've noticed since you posted a week or two ago:

Your machines seem to fail the hadcm3n - r models with the "theta" error, while almost all other machines fail with some Linux or Windows "stack overflow".

There's a zillion machines out there that fail these "short" WUs, but your machines seem to get further and reach the "THETA" thingy.

Can't see how your boxes get so much further along.

What do you have that's different, that makes your machines fail "as expected" rather than with the "stack overflow" that most of us have seen?

ID: 50670
MartinNZ

Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 50705 - Posted: 2 Nov 2014, 2:20:08 UTC - in response to Message 50670.  

Interesting, Eirik, but not quite what Pete was on about. However, to continue your train of thought: I noticed it is an AMD box, so I went through the top 300 PCs, but no, I only found one other AMD giving similar results. (The CPU run time is a dead giveaway as to the type of error; numbers less than 100 sec normally mean Invalid Theta.) Several others were crashing, and several hadn't run the model.

Then I thought I should have a look at some others, and found a Windows laptop (with more suspends than I've had hot breakfasts - ever!!) that was nevertheless chugging through all the 'r' models and getting Invalid Theta.

Gave up after that, and I guess the researchers will by now have figured out what is going on.
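
The run-time rule of thumb mentioned above (CPU time under 100 seconds normally means Invalid Theta) can be written down as a tiny sketch. The function name and the labels are made up for illustration, and the 100-second threshold is only the heuristic from this post, not a documented project rule.

```python
def likely_failure_type(cpu_seconds, threshold=100.0):
    """Guess the failure type of a failed 'Short' WU from its CPU run time.

    Heuristic only: short run times normally (not always) indicate Invalid Theta.
    """
    if cpu_seconds < threshold:
        return "probably Invalid Theta"
    return "something else (e.g. a crash)"


# Hypothetical run times in seconds, not taken from any real host
for seconds in (42.0, 250.0):
    print(seconds, "->", likely_failure_type(seconds))
```

The task's stderr output is still the only way to confirm which error actually occurred.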
ID: 50705
Bonsai911

Joined: 9 Sep 04
Posts: 228
Credit: 30,756,611
RAC: 3,303
Message 50802 - Posted: 12 Nov 2014, 6:29:09 UTC - in response to Message 50705.  

With BOINC 7.4.27, it works A LOT BETTER.
ID: 50802
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50805 - Posted: 12 Nov 2014, 7:39:28 UTC - in response to Message 50802.  

About time we had some good news. :)

Thanks.

ID: 50805


©2024 cpdn.org