Dr Lisa Su says up to 192MB L3 on newer Ryzen -- hope it's true
Author | Message |
---|---|
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
There's been some discussion here about the high demand for L3 cache with many recent climate models. If this leaked unverified "news" turns out true -- No links here, totally unconfirmed. But good news, if/when it happens. e |
Send message Joined: 15 May 09 Posts: 4537 Credit: 19,001,532 RAC: 21,726 |
There's been some discussion here about the high demand for L3 cache with many recent climate models.
Saw this on Tom's Hardware site. However, having just upgraded to a Ryzen 7, I suspect one of these will be beyond my price range when available. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
Well if Dr. Lisa Su says so, I would say that it is confirmed enough. https://www.tomshardware.com/amp/news/amd-shows-new-3d-v-cache-ryzen-chiplets-up-to-192mb-of-l3-cache-per-chip-15-gaming-improvement I was planning on a Ryzen 5900X towards the end of this year anyway. I wonder how much this will add to the cost? There are a number of projects that could use more cache these days. |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
They've demonstrated it. https://www.anandtech.com/show/16725/amd-demonstrates-stacked-vcache-technology-2-tbsec-for-15-gaming And according to their presentation, they intend to put it into production.
Now, what that will be in, or what it will cost, is up in the air. But it certainly looks like something that's coming, and I agree, it's very exciting. I've been messing around with some older Intel eDRAM chips for CPDN, and it seems to help, but I definitely can't run 8 threads of CPDN with very good performance, even with 128MB L4. My turnaround time on the N216s is up towards 2 months wall clock, which I've been told is fine, just... it takes me a while, since I don't run most of the workloads overnight (solar powered off grid office, using the surplus for BOINC). |
Send message Joined: 28 May 17 Posts: 49 Credit: 17,295,644 RAC: 6,078 |
Several dies of near-bleeding-edge SRAM at 6 mm square. It's not going to be cheap.
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I definitely can't run 8 threads of CPDN with very good performance, even with 128MB L4. My turnaround time on the N216s is up towards 2 months wall clock, which I've been told is fine,
My machine has about 64 GBytes of RAM and it runs N216 models, four at a time usually, at about 8 1/2 days each. I normally run 8 threads of BOINC, but I limit CPDN to 4.
Machine Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 16 [8 real, 8 hyperthreaded]
Red Hat Enterprise Linux 8.4 (Ootpa) [4.18.0-305.7.1.el8_4.x86_64|libc 2.28 (GNU libc)]
BOINC version 7.16.11
Memory 62.4 GB
Cache 16896 KB |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
Now, for the real kicker. The 5800X3D has a 96MB L3 cache (32+64) as compared to the 32MB cache found on the standard 5800X. That’s a whole 64MB more, putting it above even the Ryzen 9 5950X in terms of cache.
https://appuals.com/ryzen-7-5800x3d-is-the-first-ryzen-chip-to-use-the-3d-v-cache-tech-and-its-faster-than-the-core-i9-12900k/
Looks good to me. I would go for it if OpenIFS ever comes along, and needs a lot of cache. But the glaciers may have receded by then anyway. |
Send message Joined: 15 May 09 Posts: 4537 Credit: 19,001,532 RAC: 21,726 |
Looks good to me. I would go for it if OpenIFS ever comes along, and needs a lot of cache.
I haven't noticed in testing the OpenIFS needing lots of cache, just lots of RAM though I suppose increasing the cache might reduce how often stuff (technical term) gets swapped out to RAM. Increasing the speed of RAM I guess could also make a significant difference. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
I haven't noticed in testing the OpenIFS needing lots of cache, just lots of RAM though I suppose increasing the cache might reduce how often stuff (technical term) gets swapped out to RAM. Increasing the speed of RAM I guess could also make a significant difference.
How do you measure your cache consumption? Here is what my machine looks like:
Computer 1511241
CPU type GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 16
Operating System Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.12.2.el8_5.x86_64|libc 2.28 (GNU libc)]
BOINC version 7.16.11
Memory 62.4 GB
Cache 16896 KB
At the moment, I am running the following BOINC processes (and not much else):
# ps -fu boinc
UID PID PPID C STIME TTY TIME CMD
boinc 19484 1 0 Jan23 ? 00:02:19 /usr/bin/boinc
boinc 45446 19484 0 Jan23 ? 00:01:29 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_20l6_209402_4
boinc 45448 19484 0 Jan23 ? 00:01:33 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_h1av_200602_4
boinc 45453 19484 0 Jan23 ? 00:01:29 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_h06e_201108_4
boinc 45457 45446 97 Jan23 ? 1-07:57:44 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 175955
boinc 45473 45448 96 Jan23 ? 1-07:49:05 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 181965
boinc 45477 45453 95 Jan23 ? 1-07:21:02 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 178635
boinc 138843 19484 99 Jan24 ? 09:02:38 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_wrf_7.32_i686-pc-linux-gnu
boinc 168519 19484 99 06:20 ? 00:47:13 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu -Sett
boinc 168580 19484 99 06:22 ? 00:45:23 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.21_x86_64-pc-linux-gnu
boinc 170495 19484 99 06:51 ? 00:16:38 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.21_x86_64-pc-linux-gnu
boinc 171173 19484 99 07:01 ? 00:06:28 ../../projects/universeathome.pl_universe/BHspin2_19_x86_64-pc-linux-gnu
And my cache is supplying about half the requested memory references:
# perf stat -aB -e cache-references,cache-misses
^C
Performance counter stats for 'system wide':
33,364,888,278 cache-references
17,805,920,648 cache-misses # 53.367 % of all cache refs
64.185688537 seconds time elapsed
I suppose the instructions are mostly in the cache, and very little of the data are in there.
Increasing the speed of the RAM can help only to the extent that the processor(s) (including the associated chip set) can use the information; there could be some improvement there if you put slower RAM on your machine that it could use. But who does that? Only other way to speed up the RAM is replacement of the whole computer.
Are you confusing the speed between the cache and the RAM with the speed between RAM and the swap space? If you are using a lot of swap space, you certainly do need more RAM for your task load. |
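For anyone who wants to compare runs, the percentage perf prints can be recomputed from the two counters. This is only a sketch in Python: the function name `miss_ratio` is made up for illustration, the sample text is copied from the post above, and on many CPUs these two generic events are mapped to last-level cache events, which is an assumption worth checking for your own processor.

```python
import re

def miss_ratio(perf_text: str) -> float:
    """Return cache-misses as a percentage of cache-references.

    Expects the counter lines printed by
    `perf stat -e cache-references,cache-misses`.
    """
    counts = {}
    pattern = r"([\d,]+)\s+(cache-references|cache-misses)\b"
    for value, event in re.findall(pattern, perf_text):
        # perf prints thousands separators; drop them before converting
        counts[event] = int(value.replace(",", ""))
    return 100.0 * counts["cache-misses"] / counts["cache-references"]

# Counter lines copied from the post above
sample = """
33,364,888,278 cache-references
17,805,920,648 cache-misses # 53.367 % of all cache refs
"""

print(f"{miss_ratio(sample):.3f} % of all cache refs missed")
```

Running it over several samples taken with different numbers of CPDN tasks would show whether the miss percentage climbs as more models compete for the same L3.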
Send message Joined: 15 May 09 Posts: 4537 Credit: 19,001,532 RAC: 21,726 |
How do you measure your cache consumption? Normally just use the free command. I have to look it up any time I want to look at what applications are using cache because I don't use it often enough to remember it. I think I had to actually install something to be able to see that. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
How do you measure your cache consumption?
The free command tells you nothing about your processor cache; it gives you the use of the RAM in the first line and the amount of swap space used in the second line. So below I have 55.9 GBytes of RAM available although most of it is currently used as an input data cache (input from hard drives). As far as swap space is concerned, I seem to be using 47 megabytes of disk for that: negligible.
$ free
total used free shared buff/cache available
Mem: 65435804 8673136 1682200 99192 55080468 55932264
Swap: 16375804 47104 16328700
If you want to see your usage of the processor cache, you need the perf command as shown in my previous post. |
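To make the reading of the free output explicit, here is a small Python sketch that splits it into named columns. The function name `parse_free` is hypothetical, the sample is the output quoted above, and it assumes the default layout of free, where the Swap row has only the first three columns.

```python
def parse_free(text: str) -> dict:
    """Map each row label ('Mem', 'Swap') to {column name: value}."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    columns = lines[0].split()          # total, used, free, ...
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # zip stops at the shorter sequence, so the three Swap values
        # pair up with total, used and free
        table[label.rstrip(":")] = dict(zip(columns, map(int, values)))
    return table

# Output copied from the post above
sample = """
total used free shared buff/cache available
Mem: 65435804 8673136 1682200 99192 55080468 55932264
Swap: 16375804 47104 16328700
"""

info = parse_free(sample)
print("RAM available to new processes:", info["Mem"]["available"])
print("RAM used as buffers/page cache:", info["Mem"]["buff/cache"])
print("Swap in use:", info["Swap"]["used"])
```

Nothing in these numbers refers to the L1, L2 or L3 caches on the processor; the buff/cache column is ordinary RAM holding disk data.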
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
And my cache is supplying about half the requested memory references:
Thanks, I have been trying to measure my cache in Ubuntu. I was not able to get that command to fly on my Ryzen 3600 with Ubuntu 20.04.3, but that is not my concern. I probably could with some work.
I have been using the "cachestat" command, but am not quite sure how to interpret the results. When running the HadAM4 (N216) on 8 cores, plus two Rosetta pythons on 2 cores (85% of the cores), I see:
$ sudo ./cachestat
Counting cache functions... Output every 1 seconds.
HITS MISSES DIRTIES RATIO BUFFERS_MB CACHE_MB
18658 0 45 100.0% 139 36237
57334 0 43 100.0% 139 36237
30930 0 26 100.0% 139 36237
21124 0 31 100.0% 139 36237
92343 0 108 100.0% 139 36237
26557 0 75 100.0% 139 36237
25485 0 26 100.0% 139 36237
97719 2 75 100.0% 139 36237
21042 0 25 100.0% 139 36237
38118 0 60 100.0% 139 36237
46525 0 29 100.0% 139 36237
25127 0 44 100.0% 139 36237
98529 0 64 100.0% 139 36237
25745 1 15 100.0% 139 36237
23106 0 66 100.0% 139 36237
92583 0 72 100.0% 139 36237
8580 0 50 100.0% 139 36237
38967 0 55 100.0% 139 36237
64163 0 43 100.0% 139 36237
25698 0 29 100.0% 139 36237
86728 0 61 100.0% 139 36237
24077 0 44 100.0% 139 36237
21742 0 17 100.0% 139 36237
77441 0 63 100.0% 139 36237
26411 0 38 100.0% 139 36237
18575 0 24 100.0% 139 36237
85779 0 60 100.0% 139 36237
29630 0 34 100.0% 139 36237
41840 0 45 100.0% 139 36237
30779 0 81 100.0% 139 36238
35510 0 44 100.0% 139 36238
98186 0 109 100.0% 139 36238
20706 0 17 100.0% 139 36238
16524 0 9 100.0% 139 36238
72171 0 54 100.0% 139 36238
1469 0 13 100.0% 139 36238
43854 0 66 100.0% 140 36238
52263 0 39 100.0% 140 36238
My guess is that my cache hits are not really 100%, but probably more in line with what you see.
But if you want to try it, you can install it as follows:
To install perf-tools, open terminal and run:
sudo apt-get install linux-tools-common linux-tools-generic
Then, to install cachestat, run:
wget https://raw.githubusercontent.com/brendangregg/perf-tools/master/fs/cachestat
To make it executable, run:
chmod +x cachestat
Finally run it:
sudo ./cachestat
It probably is not measuring the CPU cache. I have a large write-cache (12.5 GB) in main memory (DDR4), and that may be what it is seeing. |
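The 100.0% figures are what the HITS and MISSES columns would give, assuming RATIO is simply hits divided by hits plus misses (an assumption about the script, not something stated in this thread). A short Python sketch with three rows taken from the cachestat output in this post; `hit_ratio` is a name invented here:

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Percentage of lookups that were hits; 0.0 when there were none."""
    total = hits + misses
    return 100.0 * hits / total if total else 0.0

# (HITS, MISSES) pairs copied from the cachestat output in this post
rows = [(18658, 0), (97719, 2), (25745, 1)]

for hits, misses in rows:
    print(f"{hits:>6} hits {misses:>2} misses -> {hit_ratio(hits, misses):.1f}%")
```

With only one or two misses against tens of thousands of hits, the ratio rounds to 100.0% at one decimal place. The CACHE_MB column of around 36 GB is far larger than any processor cache, which fits the guess that the tool is counting something held in main memory.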
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
It probably is not measuring the CPU cache. I have a large write-cache (12.5 GB) in main memory (DDR4), and that may be what it is seeing. I do not know what you call a CPU cache. I infer you refer to the part of your RAM that is currently devoted to that purpose. In a normally running modern Linux, (almost) all RAM not used for something else is given over to the disk input cache. Anytime the kernel wants more RAM for a process, it can grab it from the disk input cache. If that is not enough, it can get it from the output buffer, but it would have to write it out first. And I suppose cachestat can tell you about that, but it is deprecated and not available for my distro. It seems to me that by the time you need a tool like that, you have long since passed the point where you seriously should increase the size of your RAM. So it seems that you still need to find a version of perf that will run on your system. # perf stat -aB -e cache-references,cache-misses |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
I do not know what you call a CPU cache. I infer you refer to the part of your RAM that is currently devoted to that purpose.
No, it is the cache on the CPU itself. A Ryzen 3600 has
Total L1 Cache: 384KB
Total L2 Cache: 3MB
Total L3 Cache: 32MB
It is the L3 cache that distinguishes one CPU from another, and largely determines how many work units you should run at a time so that they fit mainly in the cache. I usually run six of the N216 for that purpose, though running eight may give slightly more output. But beyond a certain point, the total output actually decreases.
In a normally running modern Linux, (almost) all RAM not used for something else is given over to the disk input cache. Anytime the kernel wants more RAM for a process, it can grab it from the disk input cache. If that is not enough, it can get it from the output buffer, but it would have to write it out first. And I suppose cachestat can tell you about that, but it is deprecated and not available for my distro. It seems to me that by the time you need a tool like that, you have long since passed the point where you seriously should increase the size of your RAM.
I have 64 GB on the Ryzen 3600, so however Linux handles it, that is more than enough. It is the on-chip CPU cache that I need to monitor. Maybe perf can do it. I will look some more. |
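The on-chip cache sizes themselves can be read on Linux without perf, from the kernel's sysfs tree. A minimal Python sketch follows; the function names `parse_size` and `cpu_caches` are made up for illustration, and it assumes the usual `/sys/devices/system/cpu/cpu0/cache/index*/` layout with `level`, `type` and `size` files. It shows how big each cache is, not how well it is being used.

```python
from pathlib import Path

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(text: str) -> int:
    """Convert a sysfs size string such as '32768K' to bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)

def cpu_caches(cpu_dir: str = "/sys/devices/system/cpu/cpu0/cache") -> list:
    """Return (level, type, size in bytes) for each cache cpu0 can see."""
    caches = []
    for index in sorted(Path(cpu_dir).glob("index[0-9]*")):
        try:
            level = int((index / "level").read_text())
            kind = (index / "type").read_text().strip()
            size = parse_size((index / "size").read_text())
        except (OSError, ValueError):
            continue  # skip entries that are missing or unreadable
        caches.append((level, kind, size))
    return caches

for level, kind, size in cpu_caches():
    print(f"L{level} {kind}: {size // 1024} KiB")
```

Each entry describes one cache instance as seen from cpu0, so on a processor with several L3 blocks the size printed may be smaller than the total quoted on a spec sheet.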
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
No, it is the cache on the CPU itself. A Ryzen 3600 has
I do not doubt your processor is as you say. My (by comparison) little Intel Xeon W-2245 is like this:
Level 1 cache size 8 x 32 KB 8-way set associative instruction caches
8 x 32 KB 8-way set associative data caches
Level 2 cache size 8 x 1 MB 16-way set associative caches
Level 3 cache size 16.5 MB
I suppose those L1 caches are one per (real) core and the L2 caches are one per core (real or hyperthreaded). So ideally, I would like the working set of instructions (the "inner loop") to fit into the L1 instruction cache or, lacking that, into the L2 cache.
I wonder about my L3 cache size. Why is it not 16.384 MB? Why is it 16.384+0.512 MB?
If this web page correctly describes cachestat, it is concerned with paging disk pages into RAM, not paging regular RAM into the L1, L2, or L3 caches.
https://www.brendangregg.com/blog/2021-08-30/high-rate-of-paging.html |
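On the 16896 KB question, one possible reading is that the L3 is built from equal slices rather than being a power of two. The arithmetic below is a sketch under that assumption: the 1.375 MB slice size is the figure commonly quoted for Intel's Skylake-derived server and workstation parts, and it is not something stated in this thread.

```python
l3_kib = 16896                    # "Cache 16896 KB" as reported by BOINC

# Size in MiB: matches the 16.5 MB shown in the spec listing
print(l3_kib / 1024)

# Assumption: each L3 slice is 1.375 MiB (1408 KiB)
slice_kib = 1.375 * 1024
print(l3_kib / slice_kib)         # number of such slices that would add up to 16896 KiB
```

If that assumption holds, 16896 KB is a whole number of slices, which would explain why the total is not 16384 KB.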
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
In my case, simply installing linux-tools-common and linux-tools-generic, which should link to the latest kernel tools, did not work. Running perf pointed to possibly missing tool libraries, so after looking at my current kernel number and the available packages I added linux-tools-generic-hwe-20.04, which points to the latest kernel.
Then ran perf as superuser and it showed this for my Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Performance counter stats for 'system wide':
12,511,363,439 cache-references
6,135,922,943 cache-misses # 49.043 % of all cache refs
73.725181985 seconds time elapsed
I run 1/2 of the cores = 4 CPDN WUs; RAM is 16 GB. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Then ran perf as superuser and it showed this for my Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Very much like what I get with four times the amount of RAM. I, too, use half my cores for BOINC. Right now, it is set up to run at most 4 CPDN work units, at most 5 WCG work units, and a few rosetta and universe work units.
$ ps -fu boinc
UID PID PPID C STIME TIME CMD
boinc 19484 1 0 Jan23 00:04:09 /usr/bin/boinc
boinc 45446 19484 0 Jan23 00:01:59 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_20l6_209402_4
boinc 45448 19484 0 Jan23 00:02:56 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_h1av_200602_4
boinc 45453 19484 0 Jan23 00:02:01 ../../projects/climateprediction.net/hadam4_8.52_i686-pc-linux-gnu hadam4h_h06e_201108_4
boinc 45457 45446 96 Jan23 2-11:00:37 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 175955
boinc 45473 45448 96 Jan23 2-11:25:40 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 181965
boinc 45477 45453 95 Jan23 2-10:18:10 /var/lib/boinc/projects/climateprediction.net/hadam4_um_8.52_i686-pc-linux-gnu 178635
boinc 235585 19484 98 02:40 08:53:06 ../../projects/www.worldcommunitygrid.org/wcgrid_arp1_wrf_7.32_i686-pc-linux-gnu
boinc 260823 19484 98 09:54 01:43:48 ../../projects/www.worldcommunitygrid.org/wcgrid_mcm1_map_7.61_x86_64-pc-linux-gnu -Sett
boinc 263368 19484 98 10:33 01:04:29 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.21_x86_64-pc-linux-gnu
boinc 264875 19484 98 10:55 00:43:37 ../../projects/www.worldcommunitygrid.org/wcgrid_opn1_autodock_7.21_x86_64-pc-linux-gnu
boinc 267032 19484 98 11:29 00:09:32 ../../projects/universeathome.pl_universe/BHspin2_19_x86_64-pc-linux-gnu
CPU type GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 16
Operating System Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.12.2.el8_5.x86_64|libc 2.28 (GNU libc)]
BOINC version 7.16.11
Memory 62.4 GB
Cache 16896 KB
# perf stat -aB -e cache-references,cache-misses
Performance counter stats for 'system wide':
33,368,527,491 cache-references
18,222,615,823 cache-misses # 54.610 % of all cache refs
59.656775576 seconds time elapsed |
©2024 cpdn.org