climateprediction.net (CPDN) home page
Thread 'HadCM3n release'

Message boards : Number crunching : HadCM3n release
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 53019 - Posted: 4 Dec 2015, 20:22:18 UTC

One of my machines downloaded four of these tasks. All failed with:
Model crashed: ATM_DYN : INVALID THETA DETECTED.

Model behavior: Run several seconds, restart at zero, repeat until failure.

Staff was notified.

If you have one running normally, please advise.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53020 - Posted: 4 Dec 2015, 20:40:47 UTC - in response to Message 53019.  

8 running, 4 at just over 2 hours, 4 at just on 2.5 hours.
8 days to go.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53021 - Posted: 4 Dec 2015, 22:36:41 UTC

The last time that I ran one of these was January this year.
They were 40 year models, with 20 trickle_ups.

This latest lot appears to be either 12 months or 12 years, with only 4 zips expected.
And they're now showing about 480 hours, or 20 days, to run, which might just be because I've recently run so many of the UK Met Office HadAM3P and HadRM3P model with MOSES II and TRIFFID Europe v7.01 tasks. Time will tell.


astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 53022 - Posted: 4 Dec 2015, 23:42:53 UTC


Thanks, Les.


astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 53023 - Posted: 5 Dec 2015, 2:54:34 UTC


Well . . .

Les,
Five more tried on two i5-Haswell boxes, both in Win10. No joy. (To try on other boxes, including a couple trapped in Vista, I'd have to abort some _eu_ work.)

HadCM3n option trashed for now. No sense in my burning tasks in case this is a "Linux good -- Windows bad" situation.

I'll advise Sarah.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53024 - Posted: 5 Dec 2015, 8:15:56 UTC

Mine are trickling at about 5 hour intervals.
One has failed with the invalid theta message.

BUT ...
The mods have had a message that this experiment is testing low sensitivity differences near the edge of stability in parameter space, so failures should be expected more than for other experiments.
And be "normal".


jrapdx
Joined: 4 Jul 15
Posts: 63
Credit: 3,223,760
RAC: 0
Message 53025 - Posted: 5 Dec 2015, 9:37:33 UTC

Me too. On Linux 3 new HadCM3n are running fine (well, I think they're fine), but one failed on first trickle up, giving the (by now) notorious "invalid theta" message.

Does anyone know what that's about? Odd that one crashes, while the others don't. Of course, having said that, all of them will crash now...
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 53026 - Posted: 5 Dec 2015, 9:43:27 UTC - in response to Message 53025.  

Invalid theta means something produced a result outside the bounds of the experiment parameters. A negative atmospheric pressure is one example of this. This is the type of failure Les was writing about above.
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,841,902
RAC: 5,047
Message 53028 - Posted: 5 Dec 2015, 11:51:00 UTC - in response to Message 53025.  

[jrapdx wrote:] Me too. On Linux 3 new HadCM3n are running fine (well, I think they're fine), but one failed on first trickle up, giving the (by now) notorious "invalid theta" message.

Does anyone know what that's about? Odd that one crashes, while the others don't. Of course, having said that, all of them will crash now...

I believe the quantity that goes out of bounds is equivalent potential temperature.
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 53031 - Posted: 5 Dec 2015, 15:22:17 UTC
Last modified: 5 Dec 2015, 15:28:27 UTC

So far two of my Linux PCs have downloaded a total of 6 of the hadcm3n tasks. None have crashed to this point, but they are only one trickle in. On the other hand, one of my Win PCs downloaded 3 of them and they all crashed almost immediately with invalid theta.

Edit...Invalid theta would be an invalid potential temperature. Not that it matters as I'm sure if theta is not realistic, neither is equivalent potential temperature (theta-e).
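
For anyone unfamiliar with theta: dry-air potential temperature follows from temperature and pressure via Poisson's equation, theta = T * (p0 / p) ** (R_d / c_p), with p0 = 1000 hPa. A minimal Python sketch is below. The function names and the plausibility bounds are assumptions made purely for illustration; the actual check behind the model's "INVALID THETA DETECTED" message is not described in this thread.

```python
import math

P0_HPA = 1000.0            # standard reference pressure (hPa)
KAPPA = 287.05 / 1004.0    # R_d / c_p for dry air, roughly 0.286


def potential_temperature(temp_k, pressure_hpa):
    """Return dry-air potential temperature (K) from T (K) and p (hPa)."""
    if pressure_hpa <= 0.0:
        # A non-positive pressure is unphysical, so theta is undefined.
        raise ValueError("pressure must be positive")
    return temp_k * (P0_HPA / pressure_hpa) ** KAPPA


def theta_is_plausible(theta_k, lower_k=50.0, upper_k=5000.0):
    """Hypothetical sanity check; the bounds are NOT the model's real limits."""
    return math.isfinite(theta_k) and lower_k < theta_k < upper_k


# Air at 250 K and 500 hPa, brought adiabatically to 1000 hPa.
theta = potential_temperature(250.0, 500.0)
print(round(theta, 1), theta_is_plausible(theta))
```

A NaN, an infinity or a negative value produced upstream would fail a check of this kind, which is consistent with the "result outside the bounds" description earlier in the thread.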
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,841,902
RAC: 5,047
Message 53034 - Posted: 6 Dec 2015, 0:05:13 UTC - in response to Message 53031.  

Edit...Invalid theta would be an invalid potential temperature. Not that it matters as I'm sure if theta is not realistic, neither is equivalent potential temperature (theta-e).
-

Thanks for the clarification: sloppy googling on my part ...
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 53035 - Posted: 6 Dec 2015, 8:46:53 UTC

Not sure if this is just down to having run other model types for so long, but the estimated time on my box is still over 1,700 hours when, 8.5 hours in, they are about 1.1% completed. If it is, and we don't get more of them in the medium term, I will probably forget and post the same comment next time they come around!
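
As a rough cross-check on those figures, a straight-line extrapolation from elapsed time and fraction done can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes progress is linear in time; it is not a description of how the BOINC client arrives at its own estimate.

```python
def projected_hours(elapsed_hours, fraction_done):
    """Linearly extrapolate total and remaining run time.

    Assumes the task progresses at a constant rate, which is only
    an approximation for a real model run.
    """
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    total = elapsed_hours / fraction_done
    return total, total - elapsed_hours


# Figures quoted in the post: 8.5 hours elapsed at about 1.1% complete.
total, remaining = projected_hours(8.5, 0.011)
print(f"{total:.0f} h total, {remaining:.0f} h remaining")
# prints "773 h total, 764 h remaining"
```

On that basis the tasks would need well under half of the 1,700 hours shown, which fits the suggestion that the displayed estimate is still being skewed by other model types.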
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53040 - Posted: 6 Dec 2015, 19:59:19 UTC

I'm having problems keeping what is where straight. So, moved from the WaH2 thread:

Just had a failure on the other computer, so only 6 running now.

Because these models are supposedly running close to the edge of stability, I suspect that the reason for them working on one computer and not on another, may be to do with the different processor maths libraries, and the compiler flags used for different OSs. It wouldn't take much to move some calculated values one way or the other.
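
To illustrate how little it takes, here is a small Python sketch. It is a toy, not the climate model: it uses the logistic map as a stand-in for a chaotic system, and all function names are made up for the example. Two algebraically identical update formulas round differently in floating point, and the chaotic dynamics amplify that rounding difference until the two runs bear no resemblance to each other. Different maths libraries or compiler flags can reorder operations in a comparable way.

```python
def logistic_a(x, r=3.9):
    # One way of writing the update: r * x * (1 - x)
    return r * x * (1.0 - x)


def logistic_b(x, r=3.9):
    # Algebraically the same expression, evaluated in a different order
    return r * x - r * x * x


def max_divergence(x0=0.4, steps=200):
    """Largest gap seen between the two formulations over `steps` updates."""
    a = b = x0
    worst = 0.0
    for _ in range(steps):
        a, b = logistic_a(a), logistic_b(b)
        worst = max(worst, abs(a - b))
    return worst


# Floating-point addition is not associative, so evaluation order matters.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # prints False

# After one step the two runs agree to rounding error; after 200 they do not.
print(max_divergence(steps=1), max_divergence(steps=200))
```

A run sitting near the edge of stability could plausibly be tipped either way by differences of this size, though whether that is what is happening with these tasks is only a suspicion.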

And of course, there's always those who overclock, which in this case is NOT going to help.


Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53041 - Posted: 6 Dec 2015, 22:34:41 UTC

The first zips for the remaining 3 on one computer appeared after a bit over a day and a half.
About 52 megs each.

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 53043 - Posted: 7 Dec 2015, 5:04:35 UTC - in response to Message 53035.  

Not sure if this is just down to having run other model types for so long, but the estimated time on my box is still over 1,700 hours when, 8.5 hours in, they are about 1.1% completed. If it is, and we don't get more of them in the medium term, I will probably forget and post the same comment next time they come around!


1700 hours is excessive. On my Windows laptop with an i3 2.2 GHz processor and 8 GB of RAM, the WAH2 tasks seem to be headed for a finish in approx. 250 hours. That's with the machine maxed out, running 4 WAH2 tasks side by side and a Seti task on the GPU.

jrapdx
Joined: 4 Jul 15
Posts: 63
Credit: 3,223,760
RAC: 0
Message 53046 - Posted: 7 Dec 2015, 10:50:14 UTC

Thanks for the info about "theta" errors, lots to learn about these subjects.

As far as projected duration goes, my Linux system has 3 tasks running with a range of ~637 to 641 hours to go (and about 46 already elapsed). That seems pretty realistic, but further adjustments may still happen.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 53051 - Posted: 8 Dec 2015, 11:20:22 UTC
Last modified: 8 Dec 2015, 11:43:38 UTC

Still have two of these running on the desktop but, as noted elsewhere, they don't like being interrupted. Despite suspending computation, waiting five minutes and closing down BOINC via the file exit route, the one on the laptop crashed along with two of the hadam3prm3pm2t_eu tasks. I needed to reboot the lappy to resolve a network issue. Interestingly, the desktop, which I hibernate at night, hasn't crashed its two tasks yet.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53083 - Posted: 13 Dec 2015, 5:41:34 UTC

My 2 that remained on the Haswell have now completed without incident.
194 hours.

3 still running on the Ivy Bridge.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 53089 - Posted: 13 Dec 2015, 22:10:50 UTC

And the 3 left on the Ivy Bridge have now completed and uploaded.
205 hours.


Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 53378 - Posted: 3 Feb 2016, 3:01:39 UTC

I have one grinding along seemingly running OK. Workunit 10222673 Task 19181832
My two "partners" have had bad luck with this one. My machine is Linux 64-bit, but with 32-bit compatibility libraries available as needed.

495 hours run-time so far. 331 hours more predicted.