Message boards : Number crunching : New work Discussion
Joined: 28 Jul 19 · Posts: 150 · Credit: 12,830,559 · RAC: 228 |
From an old memory, I think the climate models checkpoint at the end of each model year. Whoa, that would be even worse: going by the trickles, that would be only once every 2 days for me. |
Joined: 5 Sep 04 · Posts: 7629 · Credit: 24,240,330 · RAC: 0 |
It is an old memory, perhaps from the original Slab Ocean models at the start of the project. I just let mine get on with it. They can checkpoint when they want to. |
Joined: 7 Aug 04 · Posts: 2187 · Credit: 64,822,615 · RAC: 5,275 |
Back in the old days, the slab models checkpointed every 3 model days. The WAH2, HADCM3S, and HADAM4 models checkpoint every model day. The HADAM4H (N216) models checkpoint every 6 model hours. Most of the models upload and trickle once per month, on the first model day following the end of the month. |
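For anyone who would rather check than rely on memory, the BOINC command-line tool can show when a running task last checkpointed. This is only a sketch: boinccmd ships with the client, but the exact field labels vary between client versions, so grep loosely and read your own output.

    # List tasks with their checkpoint and progress fields (labels may differ by client version).
    boinccmd --get_tasks | grep -iE "name:|checkpoint|fraction done"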
Joined: 5 Sep 04 · Posts: 7629 · Credit: 24,240,330 · RAC: 0 |
Getting back to batches 920/921: my batch 921 finished and uploaded OK in the early hours of this morning, so I didn't get to see the file sizes. Now on another 921. This one's on its second-to-last life, so fingers crossed. |
Joined: 5 Aug 04 · Posts: 1120 · Credit: 17,202,915 · RAC: 2,154 |
Getting back to batches 920/921: these completed successfully over the last day or two on my Computer 1511241, Red Hat Enterprise Linux release 8.4 (Ootpa), 4.18.0-305.25.1.el8_4.x86_64:

Name hadam4h_h12y_200902_4_920_012116620_1 (Workunit 12116620)
Name hadam4h_10x3_209602_4_921_012118509_0 (Workunit 12118509)
Name hadam4h_11cx_209902_4_921_012119079_0 (Workunit 12119079) |
Joined: 1 Jan 07 · Posts: 1061 · Credit: 36,708,278 · RAC: 9,361 |
One machine has completed and reported its last batch 920 tasks. Combination of machine speed and line speed ensured that all uploads were completed well before the danger point. Got some batch 921 resends in return. All downloads complete, and the upload size limit has been set to 200,000,000 bytes - that should be plenty, and signal the end of that particular problem. |
Joined: 5 Aug 04 · Posts: 1120 · Credit: 17,202,915 · RAC: 2,154 |
I seem to be getting those too. Why is nbytes zero in all these?

    <file>
        <name>hadam4h_h013_200602_4_920_012115257_2_r1137320672_4.zip</name>
        <nbytes>0.000000</nbytes>
        <max_nbytes>200000000.000000</max_nbytes>
        <status>0</status>
        <upload_url>http://upload11.cpdn.org/cgi-bin/file_upload_handler</upload_url>
    </file>
    <file>
        <name>hadam4h_h013_200602_4_920_012115257_2_r1137320672_restart.zip</name>
        <nbytes>0.000000</nbytes>
        <max_nbytes>200000000.000000</max_nbytes>
        <status>0</status>
        <upload_url>http://upload11.cpdn.org/cgi-bin/file_upload_handler</upload_url>
    </file>
    <file>
        <name>hadam4h_h013_200602_4_920_012115257_2_r1137320672_out.zip</name>
        <nbytes>0.000000</nbytes>
        <max_nbytes>200000000.000000</max_nbytes>
        <status>0</status>
        <upload_url>http://upload11.cpdn.org/cgi-bin/file_upload_handler</upload_url>
    </file> |
Joined: 5 Sep 04 · Posts: 7629 · Credit: 24,240,330 · RAC: 0 |
That's where the actual size will be written once it's known, after the zip has been created. |
Joined: 5 Sep 04 · Posts: 7629 · Credit: 24,240,330 · RAC: 0 |
The much larger <max_nbytes> is because Sarah has sent out new tasks with this increased value and is waiting to see what happens before doing anything about the original tasks. As all appears to be well, we can relax and crunch. :) |
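If you want to see the real sizes before the client fills in <nbytes>, you can look at the zips sitting in the project directory. A minimal sketch, assuming the default Linux data directory seen in the top output later in this thread; adjust the path if your install differs:

    # Show the on-disk sizes of result zips waiting to upload
    # (nothing listed simply means no finished zips are waiting).
    ls -lh /var/lib/boinc/projects/climateprediction.net/*.zip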
Joined: 15 May 09 · Posts: 4540 · Credit: 19,023,069 · RAC: 20,515 |
Time to order some more RAM. Peak usage for latest OpenIFS tasks in testing is about 12GB! |
Joined: 5 Aug 04 · Posts: 1120 · Credit: 17,202,915 · RAC: 2,154 |
Time to order some more RAM.

Is that total virtual memory size, or working-set size? How much is shared if more than one task of the same code (but different data, of course) is running? Ready when you are -- I think. I run only eight BOINC tasks at a time, of which only four are CPDN.

Computer 1511241
CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors: 16
Operating System: Linux Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.el8.x86_64 | libc 2.28 (GNU libc)]
BOINC version: 7.16.11
Memory: 62.4 GB
Cache: 16896 KB
Swap space: 15.62 GB
Total disk space: 117.21 GB
Free disk space: 91.53 GB
Measured floating point speed: 6.58 billion ops/sec
Measured integer speed: 31.49 billion ops/sec

And my RAM is now:

    $ free -h
                  total        used        free      shared  buff/cache   available
    Mem:           62Gi        10Gi       2.7Gi       108Mi        49Gi        51Gi
    Swap:          15Gi       105Mi        15Gi |
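One hedged way to answer the virtual-versus-working-set question without digging through top: the BOINC client reports both figures per task. A sketch, assuming the usual boinccmd field labels ("swap size" is roughly the total virtual size, "working set size" is roughly the resident memory); labels can differ between client versions, so check your own output:

    # Per-task memory as the BOINC client sees it.
    boinccmd --get_tasks | grep -iE "name:|swap size|working set"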
Joined: 15 Jan 06 · Posts: 637 · Credit: 26,751,529 · RAC: 653 |
Peak usage for latest OpenIFS tasks in testing is about 12GB! I have already retired one machine (64 GB) waiting for this project. I am now up to 96 GB, and can do 128 GB if needed. They are just trying to support the memory companies. |
Joined: 15 May 09 · Posts: 4540 · Credit: 19,023,069 · RAC: 20,515 |
Is that total virtual memory size, or working-set size? How much is shared if more than one task of the same code (but different data, of course) is running?

Not shared; that is per task. Some have been as low as 4 GB/task in the past, so this small testing batch of three tasks is no guarantee that they will be as heavy on RAM when they finally make it to the main site; or it may be like the testing ones, with some as bad and others lower. But I am ordering some more RAM, as with 8 real cores it is pretty clear that 32 GB does not cut the mustard any more. |
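For rough capacity planning, a quick back-of-the-envelope check like the one below can help. This is only a sketch: the 12 GB/task figure comes from the testing report above, and the 4 GB of headroom left for the OS and file cache is my own assumption.

    # Estimate how many ~12 GB OpenIFS tasks would fit in this machine's RAM.
    # MemTotal is reported in kB; 1048576 kB = 1 GiB.
    awk '/MemTotal/ { printf "roughly %d tasks would fit\n", ($2/1048576 - 4) / 12 }' /proc/meminfo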
Joined: 5 Aug 04 · Posts: 1120 · Credit: 17,202,915 · RAC: 2,154 |
Is that total virtual memory size, or working-set size? How much is shared if more than one task of the same code (but different data, of course) is running?

What I meant was that Linux will let processes share RAM if the RAM being shared is identical; it does not need to be explicitly coded into the program being run. For example, any libraries used in common would share the binary code in question. So if I am running four instances of UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu, there would be only one copy of that code in physical RAM, shared by all four running instances. It seems to me that most of the RAM used by these tasks is data (which is not shared) rather than instructions.

    top - 12:53:54 up 7 days, 22:57, 1 user, load average: 8.45, 8.63, 8.67
    Tasks: 462 total, 9 running, 453 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.5 us, 0.2 sy, 49.6 ni, 47.3 id, 2.2 wa, 0.2 hi, 0.1 si, 0.0 st
    MiB Mem : 63902.2 total, 3604.0 free, 10906.3 used, 49391.8 buff/cache
    MiB Swap: 15992.0 total, 15874.7 free, 117.2 used. 52157.1 avail Mem

       PID   PPID  USER   PR  NI  S  RES    SHR    %MEM  %CPU  P   TIME+      COMMAND
    529767  529746 boinc  39  19  R  1.3g   19940  2.2   99.4  5   3595:28    /var/lib/boinc/projects/climateprediction.net+
    209079  209064 boinc  39  19  R  1.3g   19864  2.1   99.4  4   7314:56    /var/lib/boinc/projects/climateprediction.net+
    767343  767321 boinc  39  19  R  1.3g   19944  2.1   99.3  13  303:14.35  /var/lib/boinc/projects/climateprediction.net+
    721167  721157 boinc  39  19  R  1.3g   19920  2.1   99.2  1   939:53.41  /var/lib/boinc/projects/climateprediction.net+
    ...
     13809       1 boinc  30  10  S  36956  17404  0.1   0.1   4   77470:47   /usr/bin/boinc [Boinc Client]
    209064   13809 boinc  39  19  S  19088  17340  0.0   0.0   12  5:46.09    ../../projects/climateprediction.net/hadam4_8+
    767321   13809 boinc  39  19  S  17808  17148  0.0   0.1   10  0:23.02    ../../projects/climateprediction.net/hadam4_8+
    529746   13809 boinc  39  19  S  17720  17288  0.0   0.0   10  2:26.96    ../../projects/climateprediction.net/hadam4_8+
    721157   13809 boinc  39  19  S  17348  17216  0.0   0.1   12  0:37.06    ../../projects/climateprediction.net/hadam4_8+ |
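To put numbers on how much of a task is actually shared versus private, /proc can break it down per process. A minimal sketch, assuming a reasonably recent kernel (smaps_rollup appeared in 4.14, so the 4.18 kernel above has it); the PID is just the first one from the top output, so substitute your own, and run it as root or as the boinc user:

    # Resident, shared, and private memory for one running task.
    grep -E "^(Rss|Shared_Clean|Shared_Dirty|Private_Clean|Private_Dirty):" /proc/529767/smaps_rollup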
Joined: 5 Sep 04 · Posts: 7629 · Credit: 24,240,330 · RAC: 0 |
We'll all find out when / if they get released. But these models do not appear to be for wimpy, under-resourced computers. |
Joined: 5 Aug 04 · Posts: 1120 · Credit: 17,202,915 · RAC: 2,154 |
We'll all find out when / if they get released.

I have two machines. My wimpy machine runs Windows 10, so I suppose it will not be getting any of these big work units when they come out.

Computer 1512658
CPU type: GenuineIntel 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz [Family 6 Model 140 Stepping 1]
Number of processors: 8
Operating System: Microsoft Windows 10
Memory: 15.64 GB
Cache: 256 KB
Swap space: 19.39 GB
Total disk space: 460.73 GB
Free disk space: 359.43 GB
Measured floating point speed: 4.24 billion ops/sec
Measured integer speed: 12.61 billion ops/sec

I think my main machine, which runs Linux, is not too wimpy.

Computer 1511241
CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors: 16
Operating System: Linux Red Hat Enterprise Linux 8.5 (Ootpa) [4.18.0-348.el8.x86_64 | libc 2.28 (GNU libc)]
Memory: 62.4 GB
Cache: 16896 KB
Swap space: 15.62 GB
Total disk space: 117.21 GB
Free disk space: 92.64 GB
Measured floating point speed: 6.58 billion ops/sec
Measured integer speed: 31.49 billion ops/sec |
Joined: 15 Jan 06 · Posts: 637 · Credit: 26,751,529 · RAC: 653 |
Some have been as low as 4 GB/task in the past, so this small testing batch of three tasks is no guarantee that they will be as heavy on RAM when they finally make it to the main site; or it may be like the testing ones, with some as bad and others lower. But I am ordering some more RAM, as with 8 real cores it is pretty clear that 32 GB does not cut the mustard any more.

Can you determine anything about cache requirements yet? That often determines how many work units we can run (efficiently), rather than the RAM requirements. I am all in favor of using lots of RAM, but there is no point in buying it if it can't be used. |
Joined: 15 May 09 · Posts: 4540 · Credit: 19,023,069 · RAC: 20,515 |
I have two machines. My wimpy machine runs Windows 10, so I suppose it will not be getting any of these big work units when they come out.

OpenIFS is only for Linux and Mac, with, as far as I am aware, no plans to develop a Windows version.

Can you determine anything about cache requirements yet? That often determines how many work units we can run (efficiently), rather than the RAM requirements.

I have forgotten how to look at cache usage. My only experience running several of these was on a wimpy, underpowered machine. I found that running more than one task, up to the machine's maximum of four, increased throughput: adding a second task came close to doubling it, but the third and fourth tasks gave only marginal gains. (Those ones peaked around 5 GB/task and the machine had its maximum of 8 GB installed.) |
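One way to look at cache behaviour on Linux is perf, if it is installed. A sketch only: the package is usually called "perf" on Red Hat systems, root (or a relaxed perf_event_paranoid setting) is normally needed, the PID is the example one from the earlier top listing, and the hardware event names vary by CPU, so check the output of perf list on your own machine:

    # Sample cache references/misses of one running task for 30 seconds.
    perf stat -e cache-references,cache-misses,LLC-loads,LLC-load-misses -p 529767 -- sleep 30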
Joined: 15 Jan 06 · Posts: 637 · Credit: 26,751,529 · RAC: 653 |
... adding a second task came close to doubling it, but the third and fourth tasks gave only marginal gains.

Thanks. I think that is a good first indication. It is not surprising that they use a lot of cache. |
Joined: 5 Aug 04 · Posts: 1120 · Credit: 17,202,915 · RAC: 2,154 |
I have forgotten how to look at cache usage.

Does this help? (I have not tried it yet.) https://www.geeksforgeeks.org/see-cache-statistics-linux/ |
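For completeness, two commands that show the cache sizes of the machine (sizes only, not live usage); this is a sketch, but both lscpu and getconf are standard on the Linux systems mentioned above:

    # Cache sizes per level, as reported by the CPU and by glibc.
    lscpu | grep -i cache
    getconf -a | grep -i CACHE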