climateprediction.net (CPDN) home page
Thread 'UK Met Office HadAM4 at N216 resolution'

Message boards : Number crunching : UK Met Office HadAM4 at N216 resolution

bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 62048 - Posted: 26 Jan 2020, 18:33:54 UTC

On my i7-4790 I run only 4 N216s. They checkpoint every 38-40 minutes and run at 30-31 sec/TS (12.5 days to complete). Some WUs reached 39 sec/TS, but at that time I was running WCG on 6 or 8 cores alongside them. So for the moment I do not go over the 4 real cores. From what I read in the other thread, even on a Ryzen 3600 (6C/12T), going beyond 4-5 WUs decreases performance a lot, although completion times there seem faster.

On my other machine, an i7-3520M, I run one N216 and one WCG task. The N216 ran at 24 sec/TS and completed in 10 days.
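A quick back-of-the-envelope check on these figures, assuming each task ran continuously with no suspensions: both machines imply a similar total of roughly 35,000-36,000 timesteps per N216 task. The small Python sketch below reproduces that arithmetic and compares total throughput per host. `implied_timesteps` and `throughput_ts_per_hour` are hypothetical helper names, not part of BOINC or CPDN, and the i7-4790 figure uses the midpoint of the quoted 30-31 sec/TS.

```python
SECONDS_PER_DAY = 86_400


def implied_timesteps(days_to_complete: float, sec_per_ts: float) -> float:
    """Back out the approximate number of model timesteps from the reported
    wall-clock completion time and seconds per timestep.
    Assumes the task ran continuously (no suspensions or restarts)."""
    return days_to_complete * SECONDS_PER_DAY / sec_per_ts


def throughput_ts_per_hour(concurrent_tasks: int, sec_per_ts: float) -> float:
    """Total timesteps per hour across all concurrent tasks on one host."""
    return concurrent_tasks * 3600 / sec_per_ts


if __name__ == "__main__":
    # Figures quoted in the post (30.5 = midpoint of 30-31 sec/TS).
    print(f"i7-4790:  ~{implied_timesteps(12.5, 30.5):,.0f} timesteps")  # ~35,410
    print(f"i7-3520M: ~{implied_timesteps(10, 24):,.0f} timesteps")      # ~36,000
    print(f"i7-4790, 4 tasks: {throughput_ts_per_hour(4, 30.5):.0f} TS/hour")  # 472
    print(f"i7-3520M, 1 task: {throughput_ts_per_hour(1, 24):.0f} TS/hour")    # 150
```

On these numbers, 4 concurrent tasks on the i7-4790 still deliver about three times the total timesteps per hour of the single task on the i7-3520M, even though each individual task is slower.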
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 62741 - Posted: 1 Oct 2020, 18:50:12 UTC

Yay, I got one from batch 843. It had timed out after a year with no response. I wonder if it is of any use except for upping my points.
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62742 - Posted: 3 Oct 2020, 1:39:51 UTC - in response to Message 62741.  

There have been quite a few fails, and several hundred are still running (possibly not for the first time), so if you put your foot down and go for it, you're in with a chance. :)
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 62743 - Posted: 3 Oct 2020, 13:25:35 UTC - in response to Message 62742.  

There have been quite a few fails, and several hundred are still running (possibly not for the first time), so if you put your foot down and go for it, you're in with a chance. :)

Good then, I will let it run. Thanks.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,026,382
RAC: 20,431
Message 62744 - Posted: 4 Oct 2020, 11:57:58 UTC

And I have just picked up one from #843 as well (on its fifth and final attempt).
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 62760 - Posted: 7 Oct 2020, 6:58:02 UTC - in response to Message 62744.  

And I have just picked up one from #843 as well (on its fifth and final attempt).

I also got another one, but from #842, on its second attempt after a whole year with no response. I still think deadlines should be shortened.
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,000,748
RAC: 14,638
Message 62764 - Posted: 7 Oct 2020, 22:52:22 UTC - in response to Message 62741.  

I got one from 843 as well after it had been dormant for a year, but it failed almost immediately with a REPLANCA error :((
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,026,382
RAC: 20,431
Message 62765 - Posted: 8 Oct 2020, 10:01:44 UTC - in response to Message 62764.  

I have now got 8 retreads from 843 and 842. I have set no new tasks now, as running more tasks than the number of real cores slows things down too much.
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 62772 - Posted: 9 Oct 2020, 7:03:20 UTC - in response to Message 62765.  

I also got 5 of the new ones. With 6 N216s running, my /var usage climbed to ~16 GB. With 4 WCG ARP tasks in the queue as well, I almost ran out of space on /var (~20 GB) and BOINC Manager crashed. I had to clean up some journals. Luckily, no CPDN models crashed because of the low disk space. Reducing work to the number of real cores and clearing the ARP tasks should get things back to normal.
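A rough disk budget makes it easier to see how close a host is to this kind of limit. The sketch below assumes about 16/6 ≈ 2.7 GB per N216 task, taken from the ~16 GB of /var seen with 6 tasks; that overstates the per-task figure, because /var also holds journals and other data, and real usage varies over a run. `additional_tasks_that_fit`, `PER_N216_GB` and the 2 GB safety reserve are hypothetical choices for illustration; only `shutil.disk_usage` comes from the Python standard library.

```python
import os
import shutil

GB = 1024 ** 3

# Assumption: rough per-task footprint from the post (~16 GB of /var with
# 6 N216 tasks). An overestimate, since /var also holds journals etc.
PER_N216_GB = 16 / 6


def additional_tasks_that_fit(free_gb: float, per_task_gb: float,
                              reserve_gb: float = 2.0) -> int:
    """Number of extra tasks that fit in free_gb while keeping reserve_gb spare."""
    usable = free_gb - reserve_gb
    if usable <= 0:
        return 0
    return int(usable // per_task_gb)


if __name__ == "__main__":
    # Check /var where it exists, otherwise the current directory.
    path = "/var" if os.path.isdir("/var") else "."
    free_gb = shutil.disk_usage(path).free / GB
    n = additional_tasks_that_fit(free_gb, PER_N216_GB)
    print(f"{free_gb:.1f} GB free on {path}: room for about {n} more N216 task(s)")
```

For example, a 20 GB /var with nothing else on it and a 1 GB reserve gives room for about 7 tasks at this per-task estimate, before any WCG work is counted.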

©2024 cpdn.org