climateprediction.net (CPDN) home page
Thread 'Feedback on running OpenIFS large memory (16-25 Gb+) configurations requested'

AndreyOR

Joined: 12 Apr 21
Posts: 318
Credit: 15,011,722
RAC: 7,015
Message 71666 - Posted: 18 Oct 2024, 21:20:19 UTC - in response to Message 71660.  
Last modified: 18 Oct 2024, 21:20:54 UTC

I'd say at the very least make the core count user configurable from the get go, regardless of which default you'll choose.
I'd prefer not to. From previous experience it's easier to debug remote problems with the setup the same. Options can come later once we are sure it all works satisfactorily.

In that case it seems like a default of 4 will get the results back quicker and thus bugs will show up quicker. However, is 4 going to reduce the number of users in any way? If not, then 4 seems to be the better default.
ID: 71666
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 71667 - Posted: 19 Oct 2024, 20:37:45 UTC

Similar performance results for Zen3 when enabling THP.

Ryzen 5 5600X
2X16 GB DDR4-3600
Linux Mint 21.3

4C results:
40 min THP disabled
36 min THP enabled

so ~10% speedup with THP enabled.
ID: 71667
Ingleside

Joined: 5 Aug 04
Posts: 127
Credit: 24,595,258
RAC: 6,633
Message 71668 - Posted: 19 Oct 2024, 22:44:28 UTC - in response to Message 71664.  

8C8T 25.32 25.26 25.21 25.26 4.29

8C8T 25.32 25.26 25.21 25.26 5.26

Hmm, is there really no penalty for running 8 cores on a busy computer, or is this an accidental copy+paste?
ID: 71668
wujj123456

Joined: 14 Sep 08
Posts: 127
Credit: 43,528,552
RAC: 67,420
Message 71669 - Posted: 19 Oct 2024, 23:42:29 UTC - in response to Message 71668.  
Last modified: 19 Oct 2024, 23:44:56 UTC

8C8T 25.32 25.26 25.21 25.26 4.29

8C8T 25.32 25.26 25.21 25.26 5.26

Hmm, is there really no penalty for running 8 cores on a busy computer, or is this an accidental copy+paste?

This host only has 8 cores, so running the OpenIFS test with 8 threads already fully loads the host. I therefore didn't rerun it, but reused the results just to calculate the new scaling factor against the busy 1C1T result. :-)
ID: 71669
AndreyOR

Joined: 12 Apr 21
Posts: 318
Credit: 15,011,722
RAC: 7,015
Message 71670 - Posted: 20 Oct 2024, 6:24:24 UTC - in response to Message 71664.  

I'm not Linux savvy but wanted to test THP on my system after reading some posts about it. After looking things up a bit, it seems like THP is enabled by default, at least it is in WSL2 Ubuntu 22.04. Is that not the default setting in most distributions?
ID: 71670
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 71671 - Posted: 20 Oct 2024, 15:03:06 UTC - in response to Message 71670.  

I'm not Linux savvy but wanted to test THP on my system after reading some posts about it. After looking things up a bit, it seems like THP is enabled by default, at least it is in WSL2 Ubuntu 22.04. Is that not the default setting in most distributions?

In Linux Mint 21.3 and Ubuntu 22.04, for my installations, the default setting is "madvise". Changing it to "enabled" improved the performance on my PCs.
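
For anyone wanting to check their own host, here is a minimal Python sketch. It assumes a Linux system that exposes the usual sysfs file /sys/kernel/mm/transparent_hugepage/enabled, where the kernel lists the available modes on one line and marks the active one with square brackets. Note that the mode meaning "THP enabled for everything" is spelled "always" in that file. The function name is only illustrative.

```python
from pathlib import Path

# Standard sysfs location of the THP mode on Linux kernels built with THP.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode from a line such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


if __name__ == "__main__":
    if THP_PATH.exists():
        print("THP mode:", parse_thp_mode(THP_PATH.read_text()))
    else:
        print("THP sysfs file not found (not Linux, or THP not built in)")
```

The mode can normally be switched until the next reboot by writing one of always, madvise or never to the same file as root.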
ID: 71671
AndreyOR

Joined: 12 Apr 21
Posts: 318
Credit: 15,011,722
RAC: 7,015
Message 71672 - Posted: 21 Oct 2024, 7:07:08 UTC - in response to Message 71671.  

I'm not Linux savvy but wanted to test THP on my system after reading some posts about it. After looking things up a bit, it seems like THP is enabled by default, at least it is in WSL2 Ubuntu 22.04. Is that not the default setting in most distributions?

In Linux Mint 21.3 and Ubuntu 22.04, for my installations, the default setting is "madvise". Changing it to "enabled" improved the performance on my PCs.

Ok, so it seems that it's unique to WSL2 to have THP on by default.

I disabled it to compare, and it's a ~7% improvement with THP on: ~88 min vs. ~94 min, using the 4 core configuration on a busy 5900X.
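
A percentage like this can be read two ways, so here is a small sketch of both: the reduction in run time, and the gain in runs completed per unit time. It uses the approximate THP off/on timings reported in this thread; the function names are only illustrative.

```python
def time_saved_pct(before_min: float, after_min: float) -> float:
    """Percentage reduction in wall-clock time."""
    return 100.0 * (before_min - after_min) / before_min


def throughput_gain_pct(before_min: float, after_min: float) -> float:
    """Percentage increase in runs completed per unit time."""
    return 100.0 * (before_min / after_min - 1.0)


# (THP off, THP on) run times in minutes, as reported in this thread
runs = {"5600X, 4 cores": (40, 36), "busy 5900X, 4 cores": (94, 88)}
for host, (off, on) in runs.items():
    print(f"{host}: {time_saved_pct(off, on):.1f}% less time, "
          f"{throughput_gain_pct(off, on):.1f}% more throughput")
```

Since the reported timings are rounded to the minute, the results are only good to about a percentage point.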
ID: 71672
wateroakley

Joined: 6 Aug 04
Posts: 195
Credit: 28,549,799
RAC: 8,278
Message 71673 - Posted: 21 Oct 2024, 9:35:33 UTC - in response to Message 71595.  

...There are two key issues: memory required and the size of the checkpoint files.
OpenIFS@60km would have a peak memory requirement of roughly 25Gb. The checkpoint (or restart) files which are normally written periodically would be approx 4Gb. This compares to 6Gb RAM & 1Gb checkpoint filesize for the resolution configurations we have run to date.
The question is how do volunteers feel about this?....
Glenn, that's fine by me.
The VirtualBox VM (ubuntu 22.04 LTS) is presently configured with 5 cores (6 physical cores), 42GB of memory (64GB physical memory) and 160GB disc used (from 900GB).
Thank you.
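
As a rough sanity check of how many such tasks a setup like this could run at once, here is a minimal sketch. It assumes the ~25 GB peak memory quoted above and a 4-thread task (the default discussed earlier in the thread, not a confirmed setting), and it ignores memory needed by the OS and other projects. It is not how BOINC itself schedules work; the function name is hypothetical.

```python
def max_concurrent_tasks(mem_gb: float, cores: int,
                         task_mem_gb: float = 25.0,
                         task_threads: int = 4) -> int:
    """Upper bound on simultaneous tasks, limited by memory and by cores."""
    by_memory = int(mem_gb // task_mem_gb)
    by_cores = cores // task_threads
    return min(by_memory, by_cores)


# VM described in this post: 42 GB of memory, 5 cores
print(max_concurrent_tasks(mem_gb=42, cores=5))
```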
ID: 71673
Ingleside

Joined: 5 Aug 04
Posts: 127
Credit: 24,595,258
RAC: 6,633
Message 71674 - Posted: 21 Oct 2024, 17:17:38 UTC
Last modified: 21 Oct 2024, 17:18:50 UTC

My results for running the test, all run through WSL2 + Ubuntu 24.04 on a Ryzen 7700, meaning 8 real cores and 16 HT threads.
The timings are taken from the zero-trickle, since I didn't find any other indication of when the model really started. In practice this means the total run-time should be roughly 1 minute or so longer.

To keep the computer busy, I ran WAH2 in Windows at the same time for most of the tests.
The table is sorted by increasing "Speed up". Because of the last result, "Speed up" is calculated for finishing 2 models.

Cores + WAH2    Run-time (hours)    Speed up
1 + 8 WAH2      2.839               1.000
1 + 7 WAH2      2.666               1.065
4 + 8 WAH2      1.068               2.658
8 + 8 WAH2      0.804               3.532
4 + 4 WAH2      0.747               3.800
16 + 8 WAH2     0.710               3.998
4 + 0 WAH2      0.576               4.925
8 + 0 WAH2      0.411               6.914
16 + 0 WAH2     0.403               7.043
2 x 4 cores     0.778               7.293

A few extra numbers: going from 8 + 8 WAH2 to 16 + 8 WAH2 gave a 13.2% HT benefit, possibly due to the mixture of WAH2 running in native Windows and OpenIFS running in virtualized Linux. Keeping everything under virtualized Linux (8 + 0 to 16 + 0), the benefit was only 1.9%.

Going from 1 OpenIFS + 7 WAH2 to 8 OpenIFS gave a speed-up of 6.493; to me, at least, this isn't a bad speed-up despite going past 4 cores.

Comparing 4 OpenIFS cores on an otherwise idle computer with 8 OpenIFS cores, the speed-up was 1.404, while for 2 x 4 the speed-up was 1.481. That is 5.5% better than 8 cores, but with the huge penalty of roughly doubling memory usage.
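
The speed-up column can be reproduced from the run-times with a short sketch. This assumes that "calculated for finishing 2 models" amounts to comparing models completed per hour against the 1 + 8 WAH2 baseline; that reading is an interpretation, not something stated explicitly. Because the posted hours are rounded to three decimals, the recomputed values can differ from the table in the third decimal place.

```python
BASELINE_H = 2.839  # 1 OpenIFS core + 8 WAH2: hours to finish one model

# label -> (run-time in hours, models finished in that time)
runs = {
    "1 + 7 WAH2": (2.666, 1),
    "4 + 8 WAH2": (1.068, 1),
    "8 + 8 WAH2": (0.804, 1),
    "4 + 4 WAH2": (0.747, 1),
    "16 + 8 WAH2": (0.710, 1),
    "4 + 0 WAH2": (0.576, 1),
    "8 + 0 WAH2": (0.411, 1),
    "16 + 0 WAH2": (0.403, 1),
    "2 x 4 cores": (0.778, 2),
}


def speed_up(hours: float, models: int) -> float:
    """Models per hour relative to the baseline's models per hour."""
    return (models / hours) / (1 / BASELINE_H)


for label, (hours, models) in runs.items():
    print(f"{label:12s} {speed_up(hours, models):6.3f}")
```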
ID: 71674
wujj123456

Joined: 14 Sep 08
Posts: 127
Credit: 43,528,552
RAC: 67,420
Message 71675 - Posted: 21 Oct 2024, 17:29:48 UTC - in response to Message 71636.  

It's also important that the option 'leave non-GPU tasks in memory while suspended' is selected for these tasks otherwise the model will be forced to restart from checkpoint files any time it gets suspended.


IIRC, the previous OpenIFS tasks didn't just restart from the checkpoint. Each restart also dumped an additional ~800MB of data onto the disk. The same happened when the BOINC client restarted. I've seen tasks that had multiple restarts eventually exceed the rsc_disk_bound (?) value and fail.

Would this restart penalty become 4GB too? Would the maximum disk usage per task also be increased to accommodate the much longer run time? While not ideal, I feel it's reasonable for people to turn off their computers about once a day, and that might be a challenge if just a few restarts can fail the task due to disk usage.
ID: 71675
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 71677 - Posted: 24 Oct 2024, 8:52:04 UTC

We reported this issue and David Anderson has now fixed that bug. Andy@CPDN has tested that it works. It will be rolled out with client 8.0.4 and we'll send out a note encouraging people to upgrade.
From some testing that was more to do with the website than OIFS, I can confirm that, in 8.1.0 at least, BOINC now respects the limits defined by the task files. BOINC would only let me run six out of 8 tasks at a time. For this particular configuration there was enough memory to run all 8, as free memory never dropped below 34% of my 64GB. I know Andy had tested this, but it's always good to get confirmation.
ID: 71677
Glenn Carver

Joined: 29 Oct 17
Posts: 1052
Credit: 16,728,373
RAC: 12,646
Message 71678 - Posted: 24 Oct 2024, 13:06:51 UTC - in response to Message 71664.  

Compared to Glenn's previous data, the higher resolution model scales quite a bit better. If all cores are busy anyway, I only lose half a core worth of compute at 4 threads. What's interesting is that in Glenn's data, a busy host scales worse, but mine scales better, even though the actual runtimes are all longer.
We're not running the same test. I only varied the number of single-threaded OpenIFS tasks running at a time; whereas you're running a single multi-threaded task with additional load from a different application. So there's no 'scaling of the model' to speak of in my test.

I demonstrated how running the same number of OpenIFS tasks as available threads is a bad idea. It's best to use the core count as the maximum number of tasks for floating-point-heavy codes like atmospheric models. I don't know what SiDock@Home is and whether it has a lot of floating point & dynamic memory use. If it doesn't, it might not compete with OpenIFS for resources so much.

Your post does nicely demonstrate it's important to run a few tests to figure out the best combination to maximise the task throughput for whatever project combination you want to run.
---
CPDN Visiting Scientist
ID: 71678
Glenn Carver

Joined: 29 Oct 17
Posts: 1052
Credit: 16,728,373
RAC: 12,646
Message 71679 - Posted: 24 Oct 2024, 13:16:47 UTC - in response to Message 71665.  
Last modified: 24 Oct 2024, 13:43:47 UTC

Hi Glenn - if the WU is using virtually all the memory on a machine, why would we worry about the efficiency dropping off? From my PoV, giving the WU all the cores is the best overall performance in this case. The extra cores running at only (e.g.) 20% efficiency, is still more work done per unit time. Or is the synchronisation required really that heavyweight?
You could complete a task faster if you ran on 8 cores instead of 4, but that's wasteful as there will be a lot of idle CPU if the efficiency is as low as 20%. By efficiency I mean 'E = S/N' where S is the speedup on N cores. Synchronisation isn't a problem.

Better to think in terms of throughput: the rate at which a host can complete tasks per day. Projects want the highest return rate of tasks to finish the batch ASAP, and that's the way to maximise RAC. With 8 cores, I'd rather run 2 x 4 core OpenIFS tasks, or 1 x OpenIFS + 4 other projects if there's not enough memory, than run 1 x 8 core OpenIFS. I'd get a better throughput of work that way.
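
To put numbers on the efficiency and throughput argument, here is a minimal sketch. It takes E = S/N as defined above and, purely as an illustration, uses the idle-host timings of the short test runs reported earlier in this thread (0.411 hours for one 8-core run, 0.778 hours for two 4-core runs side by side). Real tasks will take much longer, so only the ratio matters; the function names are hypothetical.

```python
def efficiency(speedup: float, n_cores: int) -> float:
    """Parallel efficiency E = S / N."""
    return speedup / n_cores


def runs_per_day(runtime_hours: float, concurrent: int = 1) -> float:
    """Runs completed per day when `concurrent` runs finish every `runtime_hours`."""
    return concurrent * 24.0 / runtime_hours


# Timings from the idle-host test runs reported earlier in this thread
one_by_eight = runs_per_day(0.411)               # 1 x 8-core run
two_by_four = runs_per_day(0.778, concurrent=2)  # 2 x 4-core runs side by side

print(f"1 x 8 cores: {one_by_eight:.1f} runs/day")
print(f"2 x 4 cores: {two_by_four:.1f} runs/day")
print(f"E for a speed-up of 4 on 8 cores: {efficiency(4.0, 8):.2f}")
```

On those figures the 2 x 4 layout completes slightly more runs per day, at the cost of holding two models in memory.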
---
CPDN Visiting Scientist
ID: 71679
Glenn Carver

Joined: 29 Oct 17
Posts: 1052
Credit: 16,728,373
RAC: 12,646
Message 71681 - Posted: 25 Oct 2024, 10:52:27 UTC - in response to Message 71675.  

IIRC, the previous OpenIFS tasks didn't just restart from the checkpoint. Each restart also dumps additional ~800MB of data onto the disk. Same behavior when boinc client restarts.
Not sure what you mean by this. OpenIFS tasks do restart from their checkpoint restart files. The size of the checkpoint restart depends on the configuration. The OpenIFS tasks you've seen so far write 1.1Gb of checkpoint files. This OpenIFS@60km configuration will write 4.3Gb.

I've seen tasks had multiple restarts and eventually run out of the rsc_disk_bound (?) value and fail.
That was a bug in the early version of OpenIFS, which didn't delete the old checkpoint restarts, so they were left on disk. That was fixed a while ago.

Would this restart penalty become 4GB too? Would the maximal disk usage per task also be increased to accommodate the much longer run time?
I'm not sure what you mean by 'restart penalty'? There will always be 1 set of checkpoint restart files on disk, plus momentarily a 2nd set whilst a new one is being written before the old set is deleted. Yes, we'll increase the max disk usage for this config.
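
As a rough worked example of the disk needed for checkpoint restarts alone, the sketch below assumes that the new set is written in full before the old set is deleted, as described above, and it ignores model output and any other files in the task directory. The function name is only illustrative.

```python
def peak_checkpoint_gb(set_size_gb: float, sets_during_write: int = 2) -> float:
    """Peak disk used by checkpoint restart files while a new set is written."""
    return set_size_gb * sets_during_write


# Checkpoint set sizes quoted in this post
for name, size in {"earlier OpenIFS tasks": 1.1, "OpenIFS@60km": 4.3}.items():
    print(f"{name}: {size} steady, {peak_checkpoint_gb(size):.1f} peak")
```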
---
CPDN Visiting Scientist
ID: 71681
wujj123456

Joined: 14 Sep 08
Posts: 127
Credit: 43,528,552
RAC: 67,420
Message 71684 - Posted: 25 Oct 2024, 19:53:24 UTC - in response to Message 71681.  

That was a bug in the early version of OpenIFS when it didn't delete the old checkpoint restarts and they were left. That's been fixed a while ago.

Ah, this explains everything. My impression of the restart disk usage was most likely due to this bug in early versions. Thanks for the clarification.
ID: 71684
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 71688 - Posted: 27 Oct 2024, 5:45:33 UTC

Something I noticed with Ubuntu 24.10 (may have been the case with 24.04 also) is that the recent kernel, 6.11.0-9 in my case, is much better/quicker at giving back memory when an application closes than older ones were, which is good for these tasks.
ID: 71688
Glenn Carver

Send message
Joined: 29 Oct 17
Posts: 1052
Credit: 16,728,373
RAC: 12,646
Message 71689 - Posted: 27 Oct 2024, 8:36:10 UTC - in response to Message 71688.  
Last modified: 27 Oct 2024, 8:36:31 UTC

Something I noticed with Ubuntu 24.10 (may have been the case with 24.04 also) is that the recent kernel, 6.11.0-9 in my case, is much better/quicker at giving back memory when an application closes than older ones were, which is good for these tasks.
How did you measure it? Am interested in trying.
ID: 71689
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 71690 - Posted: 27 Oct 2024, 9:31:08 UTC - in response to Message 71689.  

How did you measure it? Am interested in trying.

Psensor shows memory being freed immediately on closing down Thunderbird or even closing down a few tabs on Firefox. I used to have to reboot to get all my memory back. I thought to look at it with WCG but their tasks use so little RAM, it is hardly noticeable!
ID: 71690
Glenn Carver

Joined: 29 Oct 17
Posts: 1052
Credit: 16,728,373
RAC: 12,646
Message 71691 - Posted: 27 Oct 2024, 10:48:27 UTC - in response to Message 71690.  
Last modified: 27 Oct 2024, 10:50:35 UTC

How did you measure it? Am interested in trying.
Psensor shows memory being freed immediately on closing down Thunderbird or even closing down a few tabs on Firefox. I used to have to reboot to get all my memory back. I thought to look at it with WCG but their tasks use so little RAM, it is hardly noticeable!
Ok. Thunderbird and Firefox use the same underlying technology I think.
---
CPDN Visiting Scientist
ID: 71691
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 71692 - Posted: 27 Oct 2024, 12:52:55 UTC

Ok. Thunderbird and Firefox use the same underlying technology I think.
I suppose it is possible this is just an improvement in the Mozilla code but my unscientific hunch is memory gets freed up more from other applications as well. The trouble is, I didn't understand the stuff about memory in what I tried reading about the changes in the kernel!
ID: 71692

©2024 cpdn.org