Feedback on running OpenIFS large memory (16-25 Gb+) configurations requested
Author | Message |
---|---|
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,780,446 RAC: 19,423 |
"I'd say at the very least make the core count user configurable from the get go, regardless of which default you'll choose." "I'd prefer not to. From previous experience it's easier to debug remote problems with the setup the same. Options can come later once we are sure it all works satisfactorily." In that case, it seems a default of 4 will get the results back quicker, and thus bugs will show up quicker. However, is 4 going to reduce the number of users in any way? If not, then 4 seems to be the better default. |
Send message Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
Similar performance results for Zen 3 when enabling THP. System: Ryzen 5 5600X, 2 x 16 GB DDR4-3600, Linux Mint 21.3. 4C results: 40 min with THP disabled, 36 min with THP enabled, so ~10% speedup with THP enabled. |
Send message Joined: 5 Aug 04 Posts: 126 Credit: 24,392,887 RAC: 24,267 |
"8C8T 25.32 25.26 25.21 25.26 4.29" "8C8T 25.32 25.26 25.21 25.26 5.26" Hmm, is there really no penalty for running 8 cores on a busy computer, or is this an accidental copy+paste? |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 41,539,979 RAC: 58,513 |
"8C8T 25.32 25.26 25.21 25.26 4.29" This host only has 8 cores, so running the OpenIFS test with 8 threads already fully loads the host. Thus I didn't rerun it, but reused the results just to calculate the new scaling factor against the busy 1C1T result. :-) |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,780,446 RAC: 19,423 |
I'm not Linux savvy but wanted to test THP on my system after reading some posts about it. After looking things up a bit, it seems like THP is enabled by default, at least it is in WSL2 Ubuntu 22.04. Is that not the default setting in most distributions? |
Send message Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
"I'm not Linux savvy but wanted to test THP on my system after reading some posts about it. After looking things up a bit, it seems like THP is enabled by default, at least it is in WSL2 Ubuntu 22.04. Is that not the default setting in most distributions?" In Linux Mint 21.3 and Ubuntu 22.04, for my installations, the default setting is "madvise". Changing it to "always" (THP enabled) improved the performance on my PCs. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,780,446 RAC: 19,423 |
"I'm not Linux savvy but wanted to test THP on my system after reading some posts about it. After looking things up a bit, it seems like THP is enabled by default, at least it is in WSL2 Ubuntu 22.04. Is that not the default setting in most distributions?" OK, so it seems that having THP on by default is unique to WSL2. I disabled it to compare, and there is a ~7% improvement with THP on (~88 min vs. ~94 min) using the 4-core configuration on a busy 5900X. |
Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,312,639 RAC: 10,179 |
"...There are two key issues: memory required and the size of the checkpoint files." Glenn, that's fine by me. The VirtualBox VM (Ubuntu 22.04 LTS) is presently configured with 5 cores (of 6 physical cores), 42GB of memory (of 64GB physical memory) and 160GB of disc used (of 900GB). Thank you. |
Send message Joined: 5 Aug 04 Posts: 126 Credit: 24,392,887 RAC: 24,267 |
My results from running the test, all through WSL2 + Ubuntu 24.04 on a Ryzen 7700, meaning 8 real cores and 16 HT threads. The timings are taken from the zero-trickle, since I didn't find any other indication of when the model really started. In practice this means the total run-time is probably a minute or so longer. To keep the computer busy, for most of the tests I ran WAH2 in Windows at the same time. The table is sorted by increasing "Speed up". Because of the last result, "Speed up" is calculated for finishing 2 models. Cores + WAH2: run-time, speed-up. 1 + 8 WAH2: 2.839 hours, 1.000; 1 + 7 WAH2: 2.666 hours, 1.065; 4 + 8 WAH2: 1.068 hours, 2.658; 8 + 8 WAH2: 0.804 hours, 3.532; 4 + 4 WAH2: 0.747 hours, 3.800; 16 + 8 WAH2: 0.710 hours, 3.998; 4 + 0 WAH2: 0.576 hours, 4.925; 8 + 0 WAH2: 0.411 hours, 6.914; 16 + 0 WAH2: 0.403 hours, 7.043; 2 x 4 cores: 0.778 hours, 7.293. A few extra numbers: going from 8 + 8 WAH2 to 16 + 8 WAH2 gave a 13.2% HT benefit, possibly due to the mixture of WAH2 running in native Windows and OpenIFS running in virtualized Linux. Keeping everything under virtualized Linux (8 + 0 WAH2 to 16 + 0 WAH2), the benefit was only 1.9%. Going from 1 OpenIFS + 7 WAH2 to 8-core OpenIFS gave a speed-up of 6.493; to me at least, this isn't a bad speed-up despite going past 4 cores. Comparing 4-core OpenIFS on an otherwise idle computer with 8-core OpenIFS, the speed-up was 1.404, while for 2 x 4 the speed-up was 1.481. This is 5.5% better than 8 cores, but with the huge penalty of roughly doubling memory usage. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 41,539,979 RAC: 58,513 |
"It's also important that the option 'leave non-GPU tasks in memory while suspended' is selected for these tasks otherwise the model will be forced to restart from checkpoint files any time it gets suspended." IIRC, the previous OpenIFS tasks didn't just restart from the checkpoint. Each restart also dumps additional ~800MB of data onto the disk. Same behavior when boinc client restarts. I've seen tasks had multiple restarts and eventually run out of the rsc_disk_bound (?) value and fail. Would this restart penalty become 4GB too? Would the maximal disk usage per task also be increased to accommodate the much longer run time? While not ideal, I feel it's reasonable for people to turn off their computers, say, once a day, and that might be a challenge if just a few restarts can fail the task due to disk usage. |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,966,742 RAC: 21,869 |
"We reported this issue and David Anderson has now fixed that bug. Andy@CPDN has tested that it works. It will be rolled out with client 8.0.4 and we'll send out a note encouraging people to upgrade." From some testing that was more to do with the website than OIFS, I can confirm that, in 8.1.0 at least, BOINC now respects the limits defined by the task files. BOINC would only let me run six out of 8 tasks at a time. For this particular configuration there was enough memory to run all 8, as I never dropped below 34% of my 64GB free. I know Andy had tested this, but it's always good to get confirmation. |
Send message Joined: 29 Oct 17 Posts: 1048 Credit: 16,404,330 RAC: 16,403 |
"Compared to Glenn's previous data, the higher resolution model scales quite a bit better. If all cores are busy anyway, I only lose half a core worth of compute at 4 threads. What's interesting is that in Glenn's data, a busy host scales worse, but mine scales better, even though the actual runtime are all longer." We're not running the same test. I only varied the number of single-threaded OpenIFS tasks running at a time, whereas you're running a single multi-threaded task with additional load from a different application. So there's no 'scaling of the model' to speak of in my test. I demonstrated how running the same number of OpenIFS tasks as available threads is a bad idea. It's best to use the core count as the maximum number of tasks for floating-point-heavy codes like atmospheric models. I don't know what SiDock@Home is or whether it has a lot of floating-point and dynamic memory use. If it doesn't, it might not compete with OpenIFS for resources so much. Your post does nicely demonstrate that it's important to run a few tests to figure out the best combination to maximise the task throughput for whatever project combination you want to run. --- CPDN Visiting Scientist |
Send message Joined: 29 Oct 17 Posts: 1048 Credit: 16,404,330 RAC: 16,403 |
"Hi Glenn - if the WU is using virtually all the memory on a machine, why would we worry about the efficiency dropping off? From my PoV, giving the WU all the cores is the best overall performance in this case. The extra cores running at only (e.g.) 20% efficiency, is still more work done per unit time. Or is the synchronisation required really that heavyweight?" You could complete a task faster if you ran on 8 cores instead of 4, but that's wasteful, as there will be a lot of idle CPU if the efficiency is as low as 20%. By efficiency I mean 'E = S/N', where S is the speedup on N cores. Synchronisation isn't a problem. It's better to think in terms of throughput: the rate at which a host can complete tasks per day. Projects want the highest return rate of tasks to finish the batch ASAP, and that's also the way to maximise RAC. With 8 cores, I'd rather run 2 x 4-core OpenIFS tasks (or 1 x OpenIFS + 4 other projects if there isn't enough memory) than 1 x 8-core OpenIFS. I'd get a better throughput of work that way. --- CPDN Visiting Scientist |
Send message Joined: 29 Oct 17 Posts: 1048 Credit: 16,404,330 RAC: 16,403 |
"IIRC, the previous OpenIFS tasks didn't just restart from the checkpoint. Each restart also dumps additional ~800MB of data onto the disk. Same behavior when boinc client restarts." Not sure what you mean by this. OpenIFS tasks do restart from their checkpoint restart files. The size of the checkpoint restart files depends on the configuration. The OpenIFS tasks you've seen so far write 1.1GB of checkpoint files. This OpenIFS@60km configuration will write 4.3GB. "I've seen tasks had multiple restarts and eventually run out of the rsc_disk_bound (?) value and fail." That was a bug in the early version of OpenIFS, which didn't delete the old checkpoint restart files, so they were left on disk. That was fixed a while ago. "Would this restart penalty become 4GB too? Would the maximal disk usage per task also be increased to accommodate the much longer run time?" I'm not sure what you mean by 'restart penalty'. There will always be 1 set of checkpoint restart files on disk, plus momentarily a 2nd set whilst a new one is being written before the old set is deleted. Yes, we'll increase the max disk usage for this config. --- CPDN Visiting Scientist |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 41,539,979 RAC: 58,513 |
"That was a bug in the early version of OpenIFS when it didn't delete the old checkpoint restarts and they were left. That's been fixed a while ago." Ah, this explains everything. My impression of the restart disk usage was most likely due to this bug in early versions. Thanks for the clarification. |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,966,742 RAC: 21,869 |
Something I noticed with Ubuntu 24.10 (may have been the case with 24.04 also) is that the recent kernel, 6.11.0-9 in my case, is much better/quicker at giving back memory when an application closes than older ones were, which is good for these tasks. |
Send message Joined: 29 Oct 17 Posts: 1048 Credit: 16,404,330 RAC: 16,403 |
"Something I noticed with Ubuntu 24.10 (may have been the case with 24.04 also) is that the recent kernel, 6.11.0-9 in my case, is much better/quicker at giving back memory when an application closes than older ones were, which is good for these tasks." How did you measure it? Am interested in trying. |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,966,742 RAC: 21,869 |
"How did you measure it? Am interested in trying." Psensor shows memory being freed immediately on closing down Thunderbird or even closing down a few tabs on Firefox. I used to have to reboot to get all my memory back. I thought to look at it with WCG but their tasks use so little RAM, it is hardly noticeable! |
Send message Joined: 29 Oct 17 Posts: 1048 Credit: 16,404,330 RAC: 16,403 |
"How did you measure it? Am interested in trying." "Psensor shows memory being freed immediately on closing down Thunderbird or even closing down a few tabs on Firefox. I used to have to reboot to get all my memory back. I thought to look at it with WCG but their tasks use so little RAM, it is hardly noticeable!" OK. Thunderbird and Firefox use the same underlying technology, I think. --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,966,742 RAC: 21,869 |
"Ok. Thunderbird and Firefox use the same underlying technology I think." I suppose it is possible this is just an improvement in the Mozilla code, but my unscientific hunch is that memory gets freed up more from other applications as well. The trouble is, I didn't understand the material about memory in what I tried reading about the changes in the kernel! |
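
For anyone wanting to redo the scaling arithmetic from the thread, here is a minimal Python sketch. It assumes the run-times quoted in the WSL2 timing post (in hours), uses the busy 1-core run (2.839 hours) as the baseline, and applies the efficiency definition E = S/N given in the thread. The function names are illustrative only, and the tasks-per-day figures apply to the short test run, not to a real work unit.

```python
# Run-times (hours) are copied from the timing post in this thread.
BASELINE_HOURS = 2.839  # 1 OpenIFS core + 8 WAH2 tasks (busy host)


def speedup(hours_per_batch, models_per_batch=1, baseline_hours=BASELINE_HOURS):
    """Speed-up versus running the same number of models one after another at baseline."""
    return models_per_batch * baseline_hours / hours_per_batch


def efficiency(s, n_cores):
    """Parallel efficiency E = S / N, with S the speed-up on N cores."""
    return s / n_cores


def tasks_per_day(hours_per_batch, models_per_batch=1):
    """Throughput: models completed per 24 hours."""
    return models_per_batch * 24.0 / hours_per_batch


if __name__ == "__main__":
    runs = [("4 + 0 WAH2", 0.576, 1), ("8 + 0 WAH2", 0.411, 1), ("2 x 4 cores", 0.778, 2)]
    for label, hours, models in runs:
        print(f"{label:12s} speed-up {speedup(hours, models):.3f}  "
              f"test runs/day {tasks_per_day(hours, models):.1f}")
    # Relative efficiency of doubling from 4 to 8 cores on an otherwise idle host.
    print(f"4 -> 8 cores relative efficiency: {efficiency(0.576 / 0.411, 2):.2f}")
```

The results differ slightly from the posted speed-ups in the third decimal place, presumably because the posted figures were computed from unrounded run-times.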
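
On the THP question, a small sketch for checking which mode a Linux host is currently using. It assumes the standard sysfs file `/sys/kernel/mm/transparent_hugepage/enabled`, whose content looks like `always [madvise] never` with the active mode in brackets. The function names are illustrative.

```python
from pathlib import Path

# Standard sysfs location of the THP setting on Linux.
THP_ENABLED_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> str:
    """Return the active mode from a string such as 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


def current_thp_mode(path: Path = THP_ENABLED_PATH) -> str:
    """Read the sysfs file and return the active THP mode."""
    return parse_thp_mode(path.read_text())


if __name__ == "__main__":
    try:
        print("THP mode:", current_thp_mode())
    except (OSError, ValueError) as exc:
        print("Could not determine THP setting:", exc)
```

Changing the mode requires root, for example `echo always | sudo tee /sys/kernel/mm/transparent_hugepage/enabled`, and the change does not persist across a reboot unless it is also configured at boot.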
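
On measuring how quickly memory is given back: the observation in the thread was made with Psensor. As a rough alternative, not the method used above, the sketch below reads `MemAvailable` from `/proc/meminfo`, so that readings taken before and after closing an application can be compared. The function names are illustrative.

```python
from pathlib import Path


def parse_mem_available_kb(meminfo_text: str) -> int:
    """Extract the MemAvailable value (in kB) from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def mem_available_gib(path: Path = Path("/proc/meminfo")) -> float:
    """Return currently available memory in GiB."""
    return parse_mem_available_kb(path.read_text()) / (1024 * 1024)


if __name__ == "__main__":
    try:
        print(f"Available memory: {mem_available_gib():.2f} GiB")
    except (OSError, ValueError) as exc:
        print("Could not read memory information:", exc)
```

Running it once before and once shortly after closing the application gives a crude measure of how much memory was returned.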
©2024 cpdn.org