Thread 'Feedback on running OpenIFS large memory (16-25 Gb+) configurations requested'

Author	Message
AndreyOR Send message Joined: 12 Apr 21 Posts: 319 Credit: 15,031,602 RAC: 4,207	Message 71666 - Posted: 18 Oct 2024, 21:20:19 UTC - in response to Message 71660. Last modified: 18 Oct 2024, 21:20:54 UTC I'd say at the very least make the core count user configurable from the get go, regardless of which default you'll choose. I'd prefer not to. From previous experience it's easier to debug remote problems with the setup the same. Options can come later once we are sure it all works satisfactorily. In that case it seems like a default of 4 will get the results back quicker and thus bugs will show up quicker. However, is 4 going to reduce the number of users in any way? If not, then 4 seems to be the better default. ID: 71666 · Reply Quote

geophi Volunteer moderator Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275	Message 71667 - Posted: 19 Oct 2024, 20:37:45 UTC Similar performance results for Zen3 when enabling THP. Ryzen 5 5600X 2X16 GB DDR4-3600 Linux Mint 21.03 4C results: 40 min THP disabled 36 min THP enabled so ~10% speedup with THP enabled. ID: 71667 · Reply Quote

Ingleside Send message Joined: 5 Aug 04 Posts: 127 Credit: 24,864,594 RAC: 2,613	Message 71668 - Posted: 19 Oct 2024, 22:44:28 UTC - in response to Message 71664. 8C8T 25.32 25.26 25.21 25.26 4.29 8C8T 25.32 25.26 25.21 25.26 5.26 Hmm, is there really no penalty for running 8 cores on a busy computer, or is this an accidental copy+paste? ID: 71668 · Reply Quote

wujj123456 Send message Joined: 14 Sep 08 Posts: 130 Credit: 44,254,664 RAC: 9,487	Message 71669 - Posted: 19 Oct 2024, 23:42:29 UTC - in response to Message 71668. Last modified: 19 Oct 2024, 23:44:56 UTC 8C8T 25.32 25.26 25.21 25.26 4.29 8C8T 25.32 25.26 25.21 25.26 5.26 Hmm, is there really no penalty for running 8 cores on a busy computer, or is this an accidental copy+paste? This host only has 8 cores, so running the OpenIFS test with 8 threads already fully loads the host. So I didn't rerun it; I reused the results just to calculate the new scaling factor against the busy 1C1T result. :-) ID: 71669 · Reply Quote

AndreyOR Send message Joined: 12 Apr 21 Posts: 319 Credit: 15,031,602 RAC: 4,207	Message 71670 - Posted: 20 Oct 2024, 6:24:24 UTC - in response to Message 71664. I'm not Linux savvy but wanted to test THP on my system after reading some posts about it. After looking things up a bit, it seems like THP is enabled by default, at least it is in WSL2 Ubuntu 22.04. Is that not the default setting in most distributions? ID: 71670 · Reply Quote

geophi Volunteer moderator Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275	Message 71671 - Posted: 20 Oct 2024, 15:03:06 UTC - in response to Message 71670. I'm not Linux savvy but wanted to test THP on my system after reading some posts about it. After looking things up a bit, it seems like THP is enabled by default, at least it is in WSL2 Ubuntu 22.04. Is that not the default setting in most distributions? In Linux Mint 21.3 and Ubuntu 22.04, for my installations, the default setting is "madvise". Changing it to "enabled" improved the performance on my PCs. ID: 71671 · Reply Quote
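
For anyone wanting to check their own THP setting, here is a minimal Python sketch. It assumes the standard Linux sysfs file `/sys/kernel/mm/transparent_hugepage/enabled`, whose content looks like `always [madvise] never` with the active mode in brackets. The helper name `active_thp_mode` is made up for this illustration. The setting described above as "enabled" presumably corresponds to the kernel's `always` value.

```python
from pathlib import Path

# Standard sysfs location of the THP mode on Linux kernels built with THP.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def active_thp_mode(text: str) -> str:
    """Return the bracketed (active) mode from e.g. 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


if __name__ == "__main__":
    if THP_PATH.exists():
        print("THP mode:", active_thp_mode(THP_PATH.read_text()))
    else:
        print("THP setting is not exposed on this system")
```

The script only reads the setting. Changing it needs root, for example by writing `always` to the same file, and such a change does not persist across reboots unless it is also set at boot.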

AndreyOR Send message Joined: 12 Apr 21 Posts: 319 Credit: 15,031,602 RAC: 4,207	Message 71672 - Posted: 21 Oct 2024, 7:07:08 UTC - in response to Message 71671. I'm not Linux savvy but wanted to test THP on my system after reading some posts about it. After looking things up a bit, it seems like THP is enabled by default, at least it is in WSL2 Ubuntu 22.04. Is that not the default setting in most distributions? In Linux Mint 21.3 and Ubuntu 22.04, for my installations, the default setting is "madvise". Changing it to "enabled" improved the performance on my PCs. Ok, so it seems that having THP on by default is unique to WSL2. I disabled it to compare and it's a ~7% improvement with THP on, ~88 min vs. ~94 min, using the 4 core configuration on a busy 5900X. ID: 71672 · Reply Quote

wateroakley Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,836,638 RAC: 3,986	Message 71673 - Posted: 21 Oct 2024, 9:35:33 UTC - in response to Message 71595. ...There are two key issues: memory required and the size of the checkpoint files. OpenIFS@60km would have a peak memory requirement of roughly 25Gb. The checkpoint (or restart) files which are normally written periodically would be approx 4Gb. This compares to 6Gb RAM & 1Gb checkpoint filesize for the resolution configurations we have run to date. The question is how do volunteers feel about this?.... Glenn, that's fine by me. The VirtualBox VM (Ubuntu 22.04 LTS) is presently configured with 5 cores (6 physical cores), 42GB of memory (64GB physical memory) and 160GB disc used (from 900GB). Thank you. ID: 71673 · Reply Quote

Ingleside Send message Joined: 5 Aug 04 Posts: 127 Credit: 24,864,594 RAC: 2,613	Message 71674 - Posted: 21 Oct 2024, 17:17:38 UTC Last modified: 21 Oct 2024, 17:18:50 UTC
My results for running the test, all run through WSL2 + Ubuntu 24.04 on a Ryzen 7700, meaning 8 real cores and 16 HT threads. The timings are from the zero-trickle, since I didn't find any other indication of when the model really started. In practice this means the total run-time should be roughly 1 minute or so longer. To keep the computer busy, for most of the tests I ran WAH2 in Windows at the same time. The table is sorted by increasing "Speed up". Due to the last result, "Speed up" is calculated for finishing 2 models.
Cores + WAH2     Run-time (hours)   Speed up
1 + 8 WAH2       2.839              1.000
1 + 7 WAH2       2.666              1.065
4 + 8 WAH2       1.068              2.658
8 + 8 WAH2       0.804              3.532
4 + 4 WAH2       0.747              3.800
16 + 8 WAH2      0.710              3.998
4 + 0 WAH2       0.576              4.925
8 + 0 WAH2       0.411              6.914
16 + 0 WAH2      0.403              7.043
2 x 4 cores      0.778              7.293
A few extra numbers: going from 8 + 8 WAH2 to 16 + 8 WAH2 gave a 13.2% HT-benefit, possibly due to the mixture of WAH2 running in native Windows and OpenIFS in virtualized Linux. Keeping everything under virtualized Linux, the benefit was only 1.9%. Going from 1 OpenIFS + 7 WAH2 to 8 OpenIFS gave a speed-up of 6.493; at least to me this isn't a bad speed-up despite going past 4 cores. Comparing 4 OpenIFS on an otherwise idle computer to 8 OpenIFS, the speed-up was 1.404, while for 2 x 4 the speed-up was 1.481. This is 5.5% better than 8 cores, but with the huge penalty of roughly doubling memory usage. ID: 71674 · Reply Quote
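
As a rough check of how the "Speed up" column appears to be derived, the Python sketch below recomputes it from the posted run-times. It assumes that speed-up means models finished per hour relative to the 1 + 8 WAH2 baseline, with the 2 x 4 run counted as two models. The function names are illustrative only. Because the posted hours are rounded to three decimals, the recomputed values match the posted ones only to about two decimal places.

```python
# Run-times in hours as posted in the table; "models" is how many OpenIFS
# models finished in that time (2 for the "2 x 4 cores" run, otherwise 1).
BASELINE_HOURS = 2.839  # 1 OpenIFS core + 8 WAH2

RUNS = [
    ("1 + 8 WAH2", 2.839, 1),
    ("1 + 7 WAH2", 2.666, 1),
    ("4 + 8 WAH2", 1.068, 1),
    ("8 + 8 WAH2", 0.804, 1),
    ("4 + 4 WAH2", 0.747, 1),
    ("16 + 8 WAH2", 0.710, 1),
    ("4 + 0 WAH2", 0.576, 1),
    ("8 + 0 WAH2", 0.411, 1),
    ("16 + 0 WAH2", 0.403, 1),
    ("2 x 4 cores", 0.778, 2),
]


def speed_up(run_hours, models=1, baseline_hours=BASELINE_HOURS):
    """Models finished per hour, relative to the assumed baseline run."""
    return models * baseline_hours / run_hours


def percent_gain(slow_hours, fast_hours):
    """Percentage improvement when run-time drops from slow to fast."""
    return (slow_hours / fast_hours - 1.0) * 100.0


for label, hours, models in RUNS:
    print(f"{label:<12} {hours:6.3f} h  speed-up {speed_up(hours, models):.3f}")

print(f"HT benefit, 8 + 8 -> 16 + 8 WAH2: {percent_gain(0.804, 0.710):.1f}%")
```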

wujj123456 Send message Joined: 14 Sep 08 Posts: 130 Credit: 44,254,664 RAC: 9,487	Message 71675 - Posted: 21 Oct 2024, 17:29:48 UTC - in response to Message 71636. It's also important that the option 'leave non-GPU tasks in memory while suspended' is selected for these tasks otherwise the model will be forced to restart from checkpoint files any time it gets suspended. IIRC, the previous OpenIFS tasks didn't just restart from the checkpoint. Each restart also dumped an additional ~800MB of data onto the disk. Same behavior when the BOINC client restarts. I've seen tasks that had multiple restarts eventually run out of the rsc_disk_bound (?) value and fail. Would this restart penalty become 4GB too? Would the maximal disk usage per task also be increased to accommodate the much longer run time? While not ideal, I feel it's reasonable for people to turn off their computers roughly once a day, and that might be a challenge if just a few restarts can fail the task due to disk usage. ID: 71675 · Reply Quote

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4559 Credit: 19,039,635 RAC: 18,944	Message 71677 - Posted: 24 Oct 2024, 8:52:04 UTC We reported this issue and David Anderson has now fixed that bug. Andy@CPDN has tested that it works. It will be rolled out with client 8.0.4 and we'll send out a note encouraging people to upgrade. From some testing that was more to do with the website than OIFS, I can confirm that, in 8.1.0 at least, BOINC now respects the limits defined by the task files. BOINC would only let me run six out of 8 tasks at a time. For this particular configuration there was enough memory to run all 8, as I never dropped below 34% of my 64GB free. I know Andy had tested this but it's always good to get confirmation. ID: 71677 · Reply Quote

Glenn Carver Send message Joined: 29 Oct 17 Posts: 1067 Credit: 17,020,946 RAC: 5,160	Message 71678 - Posted: 24 Oct 2024, 13:06:51 UTC - in response to Message 71664. Compared to Glenn's previous data, the higher resolution model scales quite a bit better. If all cores are busy anyway, I only lose half a core worth of compute at 4 threads. What's interesting is that in Glenn's data, a busy host scales worse, but mine scales better, even though the actual runtimes are all longer. We're not running the same test. I only varied the number of single-threaded OpenIFS tasks running at a time, whereas you're running a single multi-threaded task with additional load from a different application. So there's no 'scaling of the model' to speak of in my test. I demonstrated how running the same number of OpenIFS tasks as available threads is a bad idea. It's best to use the core count as the maximum number of tasks for floating-point-heavy codes like atmospheric models. I don't know what SiDock@Home is and whether it has a lot of floating point & dynamic memory use. If it doesn't, it might not compete with OpenIFS for resources so much. Your post does nicely demonstrate that it's important to run a few tests to figure out the best combination to maximise the task throughput for whatever project combination you want to run. --- CPDN Visiting Scientist ID: 71678 · Reply Quote

Glenn Carver Send message Joined: 29 Oct 17 Posts: 1067 Credit: 17,020,946 RAC: 5,160	Message 71679 - Posted: 24 Oct 2024, 13:16:47 UTC - in response to Message 71665. Last modified: 24 Oct 2024, 13:43:47 UTC Hi Glenn - if the WU is using virtually all the memory on a machine, why would we worry about the efficiency dropping off? From my PoV, giving the WU all the cores is the best overall performance in this case. The extra cores running at only (e.g.) 20% efficiency, is still more work done per unit time. Or is the synchronisation required really that heavyweight? You could complete a task faster if you ran on 8 cores instead of 4, but that's wasteful as there will be a lot of idle CPU if the efficiency is as low as 20%. By efficiency I mean 'E = S/N' where S is the speedup on N cores. Synchronisation isn't a problem. It's better to think in terms of throughput: the rate at which a host can complete tasks per day. Projects want the highest return rate of tasks to finish the batch ASAP, and that's also the way to maximise RAC. With 8 cores, I'd rather run 2 x 4 core OpenIFS tasks, or 1 x OpenIFS + 4 other projects if there isn't enough memory, than run 1 x 8 core OpenIFS. I'd get a better throughput of work that way. --- CPDN Visiting Scientist ID: 71679 · Reply Quote
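
To make the efficiency and throughput argument concrete, here is a small Python sketch. The definition E = S/N comes from the post above. The timings are the idle-host figures Ingleside posted earlier in this thread (0.411 hours for one 8-core run, 0.778 hours for two concurrent 4-core runs), and they are for the short test, not a full task. The 1.6x speed-up is a purely hypothetical value chosen to give the 20% efficiency mentioned in the question. Function names are illustrative only.

```python
def efficiency(speedup, n_cores):
    """Parallel efficiency E = S / N, where S is the speed-up on N cores."""
    return speedup / n_cores


def runs_per_day(hours_per_batch, runs_per_batch=1):
    """Throughput: completed runs per 24 hours for a repeating batch."""
    return 24.0 * runs_per_batch / hours_per_batch


# Hypothetical: a speed-up of only 1.6 on 8 cores is 20% efficiency.
print(f"E = {efficiency(1.6, 8):.0%}")

# Idle-host test timings posted earlier in the thread (hours).
one_by_eight = runs_per_day(0.411)                   # 1 x 8-core run
two_by_four = runs_per_day(0.778, runs_per_batch=2)  # 2 x 4-core runs

print(f"1 x 8 cores: {one_by_eight:.1f} test runs/day")
print(f"2 x 4 cores: {two_by_four:.1f} test runs/day")
```

On those numbers the 2 x 4 layout finishes slightly more runs per day than 1 x 8, which is the throughput point being made, at the cost of holding two models in memory at once.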

Glenn Carver Send message Joined: 29 Oct 17 Posts: 1067 Credit: 17,020,946 RAC: 5,160	Message 71681 - Posted: 25 Oct 2024, 10:52:27 UTC - in response to Message 71675. IIRC, the previous OpenIFS tasks didn't just restart from the checkpoint. Each restart also dumps additional ~800MB of data onto the disk. Same behavior when boinc client restarts. Not sure what you mean by this. OpenIFS tasks do restart from their checkpoint restart files. The size of the checkpoint restart depends on the configuration. The OpenIFS tasks you've seen so far write 1.1Gb of checkpoint files. This OpenIFS@60km configuration will write 4.3Gb. I've seen tasks had multiple restarts and eventually run out of the rsc_disk_bound (?) value and fail. That was a bug in the early version of OpenIFS, when it didn't delete the old checkpoint restarts and they were left behind. That was fixed a while ago. Would this restart penalty become 4GB too? Would the maximal disk usage per task also be increased to accommodate the much longer run time? I'm not sure what you mean by 'restart penalty'? There will always be 1 set of checkpoint restart files on disk, plus momentarily a 2nd set whilst a new one is being written before the old set is deleted. Yes, we'll increase the max disk usage for this config. --- CPDN Visiting Scientist ID: 71681 · Reply Quote
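
The disk arithmetic implied above can be sketched in a few lines of Python. This assumes only what the post states: one set of checkpoint files is always on disk, and a second set exists briefly while a new one is written. It ignores model output and any other task files, so it is a lower bound on checkpoint-related space, not the project's actual disk limit. The function name is illustrative.

```python
def peak_checkpoint_disk_gb(checkpoint_gb, sets_during_write=2):
    """Momentary checkpoint disk use while a new set is being written."""
    return checkpoint_gb * sets_during_write


# Checkpoint sizes quoted in the post above (GB).
for name, size_gb in [("earlier OpenIFS tasks", 1.1), ("OpenIFS@60km", 4.3)]:
    peak = peak_checkpoint_disk_gb(size_gb)
    print(f"{name}: {size_gb} GB steady, about {peak:.1f} GB momentary peak")
```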

wujj123456 Send message Joined: 14 Sep 08 Posts: 130 Credit: 44,254,664 RAC: 9,487	Message 71684 - Posted: 25 Oct 2024, 19:53:24 UTC - in response to Message 71681. That was a bug in the early version of OpenIFS when it didn't delete the old checkpoint restarts and they were left. That's been fixed a while ago. Ah, this explains everything. My whole impression of the restart disk usage is likely due to this bug in the early versions. Thanks for the clarification. ID: 71684 · Reply Quote

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4559 Credit: 19,039,635 RAC: 18,944	Message 71688 - Posted: 27 Oct 2024, 5:45:33 UTC Something I noticed with Ubuntu24.10 (May have been the case with 24.04 also) is that the recent kernel 6.11.0-9 in my case is much better/quicker at giving back memory when an application closes than older ones were which is good for these tasks. ID: 71688 · Reply Quote

Glenn Carver Send message Joined: 29 Oct 17 Posts: 1067 Credit: 17,020,946 RAC: 5,160	Message 71689 - Posted: 27 Oct 2024, 8:36:10 UTC - in response to Message 71688. Last modified: 27 Oct 2024, 8:36:31 UTC Something I noticed with Ubuntu24.10 (May have been the case with 24.04 also) is that the recent kernel 6.11.0-9 in my case is much better/quicker at giving back memory when an application closes than older ones were which is good for these tasks. How did you measure it? Am interested in trying. ID: 71689 · Reply Quote

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4559 Credit: 19,039,635 RAC: 18,944	Message 71690 - Posted: 27 Oct 2024, 9:31:08 UTC - in response to Message 71689. How did you measure it? Am interested in trying. Psensor shows memory being freed immediately on closing down Thunderbird or even closing down a few tabs on Firefox. I used to have to reboot to get all my memory back. I thought to look at it with WCG but their tasks use so little RAM, it is hardly noticeable! ID: 71690 · Reply Quote

Glenn Carver Send message Joined: 29 Oct 17 Posts: 1067 Credit: 17,020,946 RAC: 5,160	Message 71691 - Posted: 27 Oct 2024, 10:48:27 UTC - in response to Message 71690. Last modified: 27 Oct 2024, 10:50:35 UTC How did you measure it? Am interested in trying. Psensor shows memory being freed immediately on closing down Thunderbird or even closing down a few tabs on Firefox. I used to have to reboot to get all my memory back. I thought to look at it with WCG but their tasks use so little RAM, it is hardly noticeable! Ok. Thunderbird and Firefox use the same underlying technology I think. --- CPDN Visiting Scientist ID: 71691 · Reply Quote

Dave Jackson Volunteer moderator Send message Joined: 15 May 09 Posts: 4559 Credit: 19,039,635 RAC: 18,944	Message 71692 - Posted: 27 Oct 2024, 12:52:55 UTC Ok. Thunderbird and Firefox use the same underlying technology I think. I suppose it is possible this is just an improvement in the Mozilla code but my unscientific hunch is memory gets freed up more from other applications as well. The trouble is, I didn't understand the stuff about memory in what I tried reading about the changes in the kernel! ID: 71692 · Reply Quote