Message boards : Number crunching : HadCM3n release
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0
One of my machines downloaded four of these tasks. All failed with: Model crashed: ATM_DYN : INVALID THETA DETECTED. Model behavior: Run several seconds, restart at zero, repeat until failure. Staff was notified. If you have one running normally, please advise.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
8 running, 4 at just over 2 hours, 4 at just on 2.5 hours. 8 days to go.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
The last time that I ran one of these was January this year. They were 40-year models, with 20 trickle_ups. This latest lot appears to be either 12 months or 12 years, with only 4 zips expected. And they're now showing about 480 hours, or 20 days, to run. That might just be because I've recently run so many of the UK Met Office HadAM3P and HadRM3P models with MOSES II and TRIFFID Europe v7.01. Time will tell.
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0
Thanks, Les.
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0
Well . . . Les, five more tried on two i5 Haswell boxes, both on Win10. No joy. (To try on other boxes, including a couple trapped in Vista, I'd have to abort some _eu_ work.) HadCM3n option trashed for now. No sense in my burning tasks in case this is a "Linux good -- Windows bad" situation. I'll advise Sarah.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Mine are trickling at about 5-hour intervals. One has failed with the invalid theta message. BUT ... the mods have had a message that this experiment is testing low sensitivity differences near the edge of stability in parameter space, so failures should be expected more than for other experiments, and be regarded as "normal".
Joined: 4 Jul 15 Posts: 63 Credit: 3,223,760 RAC: 0
Me too. On Linux, 3 new HadCM3n tasks are running fine (well, I think they're fine), but one failed on the first trickle up, giving the (by now) notorious "invalid theta" message. Does anyone know what that's about? Odd that one crashes while the others don't. Of course, having said that, all of them will crash now...
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
Invalid theta means something produced a result outside the bounds of the experiment parameters. A negative atmospheric pressure is one example of this. This is the type of failure Les was writing about below.
Joined: 16 Jan 10 Posts: 1084 Credit: 7,841,902 RAC: 5,047
[jrapdx wrote:] Me too. On Linux 3 new HadCM3n are running fine (well, I think they're fine), but one failed on first trickle up, giving the (by now) notorious "invalid theta" message.
I believe the quantity that goes out of bounds is equivalent potential temperature.
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275
So far two of my Linux PCs have downloaded a total of 6 of the hadcm3n tasks. None have crashed to this point, but they are only one trickle in. On the other hand, one of my Win PCs downloaded 3 of them and they all crashed almost immediately with invalid theta. Edit...Invalid theta would be an invalid potential temperature. Not that it matters as I'm sure if theta is not realistic, neither is equivalent potential temperature (theta-e).
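For readers wondering what "theta" is: potential temperature is the temperature an air parcel would reach if brought adiabatically to a reference pressure p0 (usually 1000 hPa), θ = T (p0/p)^(R_d/c_p). The sketch below computes it and applies a crude plausibility check. The function names and the bounds `THETA_MIN`/`THETA_MAX` are hypothetical, chosen only for illustration; the actual ATM_DYN check in HadCM3 and its limits are not described in this thread.

```python
import math

# Approximate dry-air constants (textbook values, not taken from HadCM3).
R_D = 287.05     # specific gas constant for dry air, J/(kg K)
C_P = 1004.0     # specific heat of dry air at constant pressure, J/(kg K)
P0_HPA = 1000.0  # reference pressure, hPa

# Hypothetical plausibility limits in kelvin, for illustration only.
THETA_MIN, THETA_MAX = 150.0, 1000.0


def potential_temperature(t_kelvin: float, p_hpa: float) -> float:
    """Return theta = T * (p0 / p) ** (R_d / c_p), or NaN for non-physical input."""
    if t_kelvin <= 0.0 or p_hpa <= 0.0:
        # e.g. a negative pressure has no meaningful potential temperature
        return math.nan
    return t_kelvin * (P0_HPA / p_hpa) ** (R_D / C_P)


def theta_is_valid(theta: float) -> bool:
    """Hypothetical sanity check: finite and inside the illustrative bounds."""
    return math.isfinite(theta) and THETA_MIN <= theta <= THETA_MAX


if __name__ == "__main__":
    for t, p in [(300.0, 1000.0), (250.0, 500.0), (250.0, -10.0)]:
        theta = potential_temperature(t, p)
        print(f"T={t} K, p={p} hPa -> theta={theta:.1f} K, valid={theta_is_valid(theta)}")
```

At 1000 hPa theta equals the actual temperature; at lower pressures it is larger. Theta normally increases with height in a stably stratified atmosphere, so a non-finite or wildly out-of-range value is one sign that the model's dynamics have gone astray.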
Joined: 16 Jan 10 Posts: 1084 Credit: 7,841,902 RAC: 5,047
"Edit...Invalid theta would be an invalid potential temperature. Not that it matters as I'm sure if theta is not realistic, neither is equivalent potential temperature (theta-e)."
Thanks for the clarification: sloppy googling on my part ...
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
Not sure if this is just down to having run other model types for so long but estimated time on my box is still over 1,700 hours when 8.5 hours in they are about 1.1% completed. If it is and we don't get more of them in the medium term, I will probably forget and post the same comment next time they come around!
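A quick cross-check on that estimate, assuming progress is roughly linear in run time (an assumption; the BOINC client's own estimate uses other inputs besides the fraction done, so it can lag behind): 8.5 hours at 1.1% complete extrapolates to about 8.5 / 0.011 ≈ 773 hours in total. The helper names below are hypothetical.

```python
def extrapolate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linear extrapolation: total time = elapsed time / fraction completed."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


def remaining_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Estimated time still to run under the same linear assumption."""
    return extrapolate_total_hours(elapsed_hours, fraction_done) - elapsed_hours


if __name__ == "__main__":
    elapsed, done = 8.5, 0.011  # figures quoted in the post above
    print(f"total ~ {extrapolate_total_hours(elapsed, done):.0f} h, "
          f"remaining ~ {remaining_hours(elapsed, done):.0f} h")
```

On these figures, the extrapolation comes to well under half of the 1,700 hours displayed, which suggests the displayed estimate was inflated rather than the task itself being that long.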
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
I'm having problems keeping what is where straight. So, moved from the WaH2 thread:
Just had a failure on the other computer, so only 6 running now. Because these models are supposedly running close to the edge of stability, I suspect that the reason for them working on one computer and not on another may be to do with the different processor maths libraries and the compiler flags used for different OSs. It wouldn't take much to move some calculated values one way or the other. And of course, there are always those who overclock, which in this case is NOT going to help.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
The first zips for the remaining 3 on one computer appeared after a bit over a day and a half. About 52 megs each.
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022
"Not sure if this is just down to having run other model types for so long but estimated time on my box is still over 1,700 hours when 8.5 hours in they are about 1.1% completed. If it is and we don't get more of them in the medium term, I will probably forget and post the same comment next time they come around!"
1700 hours is excessive. On my Windows laptop with an i3 2.2 GHz processor and 8 GB of RAM, the WaH2 tasks seem to be headed for a finish in approx. 250 hours. That's with the machine maxed out running 4 WaH2 tasks side by side and a SETI task on the GPU.
Joined: 4 Jul 15 Posts: 63 Credit: 3,223,760 RAC: 0
Thanks for the info about "theta" errors; lots to learn about these subjects. As far as projected duration goes, my Linux system has 3 tasks running with ~637 to 641 hours to go (and about 46 already elapsed). That seems pretty realistic, but further adjustments may still happen.
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
Still have two of these running on the desktop, but as noted elsewhere, they don't like being interrupted. Despite suspending computation, waiting five minutes and closing down BOINC via File > Exit, the one on the laptop crashed along with two of the hadam3prm3pm2t_eu tasks. I needed to reboot the lappy to resolve a network issue. Interestingly, the desktop, which I hibernate at night, hasn't crashed its two tasks yet.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
My 2 that remained on the Haswell have now completed without incident. 194 hours. 3 still running on the Ivy Bridge.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
And the 3 left on the Ivy Bridge have now completed and uploaded. 205 hours.
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154
I have one grinding along, seemingly running OK. Workunit 10222673 Task 19181832. My two "partners" have had bad luck with this one. My machine is Linux 64-bit, but with 32-bit compatibility libraries available as needed. 495 hours run-time so far; 331 hours more predicted.
©2024 cpdn.org