Message boards : Number crunching : Excessive checkpointing on new Linux hadcm3s tasks?
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
You're right about that. The first report covers the time since the machine was booted up; the succeeding ones are at the spacing specified. Here is mine for a few hours. sdd is the partition with /home/boinc on it. I have two hadcm3s tasks running on it, plus one Rosetta Mini 3.78 and one WCG OpenZika 7.20. Is two GBytes/hour too many? My interval is set to 600 seconds, I believe. Report 1 (since boot): avg-cpu: %user %nice %system %iowait %steal %idle: 5.29 93.54 1.03 0.05 0.00 0.09. Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn; sdd 9.12 0.01 0.38 5052 155967; sdb 9.64 0.32 0.05 133119 19256; sda 0.00 0.00 0.00 2 0; sdc 0.02 0.00 0.00 538 79; sde 3.07 0.00 0.31 205 126202. Report 2: avg-cpu: %user %nice %system %iowait %steal %idle: 5.76 92.91 1.20 0.05 0.00 0.07. Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn; sdd 15.03 0.13 0.65 455 2341; sdb 1.95 0.01 0.04 44 132; sda 0.00 0.00 0.00 0 0; sdc 0.19 0.02 0.00 80 0; sde 0.00 0.00 0.00 0 0. Report 3: avg-cpu: %user %nice %system %iowait %steal %idle: 8.53 89.76 1.58 0.05 0.00 0.09. Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn; sdd 12.76 0.00 0.53 0 1911; sdb 3.05 0.00 0.08 2 278; sda 0.00 0.00 0.00 0 0; sdc 0.00 0.00 0.00 0 0; sde 0.00 0.00 0.00 0 0. |
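As a rough cross-check of the figures in this post, a sustained MB_wrtn/s rate from iostat can be converted into hourly and daily volumes. The following is only a minimal sketch: `write_volume` is a hypothetical helper name (not part of iostat), GB here means 1000 MB, and the 0.53 MB/s input is the sdd rate from the last report above.

```shell
#!/bin/sh
# write_volume: hypothetical helper, not part of iostat or sysstat.
# Argument: a sustained write rate in MB/s (the MB_wrtn/s column).
write_volume() {
    LC_ALL=C awk -v rate="$1" 'BEGIN {
        per_hour = rate * 3600      # MB written in one hour
        per_day  = per_hour * 24    # MB written in one day
        printf "%.0f MB/hour, %.1f GB/day\n", per_hour, per_day / 1000
    }'
}

write_volume 0.53    # prints: 1908 MB/hour, 45.8 GB/day
```

The hourly figure is close to the 1911 MB that iostat lists for sdd in that report, which suggests the reports were taken roughly an hour apart.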
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
"Is two GBytes/hour too many? My interval is set to 600 seconds, I believe." That is fine. Your SSD will last longer than anything else on the PC at that rate. |
Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
OK, in my case 1 h = 10929 MB, so over 24 h I'm at about 256 GB per day. Should I worry about the HDD, and should I do something? |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
"OK, in my case 1 h = 10929 MB, so over 24 h I'm at about 256 GB per day. Should I worry about the HDD, and should I do something?" I would. If you are on Linux, you can use the built-in cache by adjusting the values as I suggested. You could probably use less cache (2 GB) and a shorter time (10 minutes) and still get a lot of protection. Between checking the writes with "iostat" and checking the amount of memory you have left with "free", you can get an idea of how much cache to use. Remember, the longer the write delay, the more protection you get, since more of the writes go only to main memory without reaching the SSD at all. However, that of course requires more cache memory. Windows is another story, and I have my own favorite cache utilities for that. However, the present work is Linux only, so I will save that story for later. |
Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
"I would. If you are on Linux, you can use the built-in cache by adjusting the values as I suggested. You could probably use less cache (2 GB) and a shorter time (10 minutes) and still get a lot of protection. Between checking the writes with 'iostat' and checking the amount of memory you have left with 'free', you can get an idea of how much cache to use. Remember, the longer the write delay, the more protection you get, since more of the writes go only to main memory without reaching the SSD at all. However, that of course requires more cache memory." Thanks. On my 16 GB machine with 4x2 cores I'll try these: sudo sysctl vm.dirty_background_ratio=25 (from 10); sudo sysctl vm.dirty_ratio=70 (from 20); sudo sysctl vm.dirty_expire_centisecs=90000 |
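To make the three values in this post easier to reason about, they can be turned into approximate sizes and times. This is a minimal sketch under stated assumptions: `describe_dirty_settings` is a hypothetical helper, the 16 GB figure is taken from the post, and the kernel applies the ratios to available (free plus reclaimable) memory rather than total RAM, so the sizes printed are upper bounds.

```shell
#!/bin/sh
# describe_dirty_settings: hypothetical helper, not a standard tool.
# Arguments: total RAM in GB, vm.dirty_background_ratio (%),
#            vm.dirty_ratio (%), vm.dirty_expire_centisecs.
describe_dirty_settings() {
    LC_ALL=C awk -v ram="$1" -v bg="$2" -v hard="$3" -v cs="$4" 'BEGIN {
        # Ratios are percentages of available memory, which is less than
        # total RAM, so these sizes are upper bounds (assumption noted above).
        printf "background flush starts at up to %.1f GB dirty\n", ram * bg / 100
        printf "writers are throttled at up to %.1f GB dirty\n", ram * hard / 100
        # dirty_expire_centisecs is expressed in hundredths of a second.
        printf "dirty data may stay in memory for about %.0f minutes\n", cs / 100 / 60
    }'
}

describe_dirty_settings 16 25 70 90000
```

Values set with `sudo sysctl name=value` take effect immediately but are lost at the next reboot unless they are also written to /etc/sysctl.conf or a file under /etc/sysctl.d/.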
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
"Thanks. On my 16 GB machine with 4x2 cores I'll try these." That looks good. |
Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
"Thanks. On my 16 GB machine with 4x2 cores I'll try these." After two hours iostat still shows around 10 GB per hour. Do I need to restart the computer (or do something else) for sysctl to take effect, or do I need to use another program to monitor write activity? |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
"After two hours iostat still shows around 10 GB per hour. Do I need to restart the computer (or do something else) for sysctl to take effect, or do I need to use another program to monitor write activity?" I always reboot my computer after setting the cache, though I don't know that it is necessary. But iostat just shows the writes that the operating system is making to the "disk", whatever that is. So it doesn't know that they are going to the write cache instead of the SSD. Therefore, iostat will show the same value with or without the cache. I don't at present have a good way of monitoring the writes to the disk (SSD) itself in Linux. When I am on Windows, the PrimoCache utility that I use shows the difference easily enough. There probably is a way; someone once showed me how to monitor the "dirty" writes in Linux. Those are the writes to cache that have not yet been transferred to the disk. I am not sure that is quite what I need, however. EDIT: My settings for Linux are in part based on what I have learned in Windows under comparable situations. I wish I had better tools for Ubuntu. |
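The "dirty" writes mentioned in this post are exposed by the kernel in /proc/meminfo as the `Dirty:` and `Writeback:` lines, both in kB. The following is a minimal sketch of reading them; `dirty_mb` is a hypothetical helper name, and whether this figure is what the poster needs remains an open question in the thread.

```shell
#!/bin/sh
# dirty_mb: hypothetical helper. Prints the Dirty and Writeback amounts
# from a meminfo-style file (default: /proc/meminfo), converted to MB.
dirty_mb() {
    LC_ALL=C awk '
        $1 == "Dirty:"     { dirty = $2 }   # kB waiting to be written out
        $1 == "Writeback:" { wb = $2 }      # kB currently being written out
        END { printf "Dirty %.1f MB, Writeback %.1f MB\n", dirty / 1024, wb / 1024 }
    ' "${1:-/proc/meminfo}"
}

# Only read the live file where it exists (Linux).
if [ -r /proc/meminfo ]; then
    dirty_mb
fi
```

Calling it in a loop, for example `while sleep 60; do dirty_mb; done`, shows how much data is being held back in memory at any moment. It does not show how much eventually reaches the device.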
Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
Thanks Jim, if I find something useful I will post it here. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Is two GBytes/hour too many? My interval is set to 600 seconds, I believe." So 2.7 GB/hour on my slow machine without any cache should be OK too, I guess. |
Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
"So 2.7 GB/hour on my slow machine without any cache should be OK too, I guess." Yes, 65 GB/day is OK for any SSD that I know of. They really don't publish lifetime ratings, but that should work. |
Joined: 31 Aug 04 Posts: 37 Credit: 9,581,380 RAC: 3,853 |
Some follow-up information... I looked at changing some of the cache control values as per Jim1348's notes; however, I don't think it makes any difference because I'm using ext4 filesystems and (as I understand it) they effectively force regular synchronization (5 seconds by default!). That would explain why iostat and cat /proc/diskstats didn't report any difference in amounts written when I tried (and might explain why some others aren't seeing changes...) (Apparently, iostat and friends are supposed to report actual device activity, not user write requests...) I'm actually measuring output to a spinning disk, not an SSD, so I can't use smartctl to confirm how much data is actually being written. Perhaps someone who is using an SSD and has adjusted those sysctl parameters could have a look at that? If it turns out not to be possible to alter the checkpoint interval, I certainly won't be letting BOINC use an SSD if I intend to continue doing CPDN work! Each HadCM3s task writes about 383GB of checkpoint data during a 20-year model run, so 3 jobs -> 1 Terabyte! By the way, the current HadAM4 jobs seem to checkpoint about once every 20 minutes on my machine (as against the once a minute of HadCM3s) but the checkpoint file is nearly 4 times the size of the pair of files written by HadCM3s tasks. It comes out at about 71GB of checkpoint data per 12-month model task. Cheers - Al. |
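For anyone repeating the /proc/diskstats check described in this post, the cumulative amount written to a device can be read from the tenth field of its line (sectors written, counted in 512-byte units). This is a minimal sketch; `mb_written` is a hypothetical helper name, and the device name `sdd` in the usage note is only the example from earlier in the thread.

```shell
#!/bin/sh
# mb_written: hypothetical helper. Prints the total MB written to a
# device since boot, read from a diskstats-style file.
# Arguments: device name, optional file (default: /proc/diskstats).
mb_written() {
    LC_ALL=C awk -v dev="$1" '
        # Field 3 is the device name; field 10 is sectors written,
        # always in 512-byte units regardless of the disk sector size.
        $3 == dev { printf "%.1f\n", $10 * 512 / 1000000 }
    ' "${2:-/proc/diskstats}"
}
```

Running `mb_written sdd`, waiting an hour, running it again and subtracting the two results gives the volume issued to the device in that hour. These counters are kept at the block layer, below the page cache, which fits the observation above that iostat reports device activity rather than user write requests.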
©2024 cpdn.org