Best Swap file size for CPDN?

ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,400,150
RAC: 3,821
Message 67136 - Posted: 30 Dec 2022, 11:05:33 UTC
Last modified: 30 Dec 2022, 11:13:24 UTC

Server: 128c/256t

Currently running 512 GB of memory with the default OS swap file of 2.1 GB (which is 75-100% in use).

What size should I set it to for the best results with CPDN?

Bryn Mawr
Joined: 28 Jul 19
Posts: 149
Credit: 12,830,559
RAC: 228
Message 67137 - Posted: 30 Dec 2022, 11:15:27 UTC - in response to Message 67136.  

Server: 128c/256t

Currently running 512 GB of memory with the default OS swap file of 2.1 GB (which is 75-100% in use).

What size should I set it to for the best results with CPDN?


Zero.

Adjust the number of tasks you run so that you don't swap out; that is more efficient and less likely to crash the tasks.

ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,400,150
RAC: 3,821
Message 67138 - Posted: 30 Dec 2022, 11:21:29 UTC - in response to Message 67137.  
Last modified: 30 Dec 2022, 12:05:30 UTC

In terms of running fewer tasks, I actually built this server in order to try and do the opposite, but I hear what you are saying.

What is the maximum amount of memory you would reserve for each task? From the usage I have seen, I would say each probably needs around 3-5 GB.

ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,400,150
RAC: 3,821
Message 67139 - Posted: 30 Dec 2022, 12:05:42 UTC
Last modified: 30 Dec 2022, 12:06:27 UTC

If I am right in thinking that it only uses the swap file when there is not enough space in memory, e.g.:

L1 cache -> L2 cache -> L3 cache -> memory -> swap file

then why would it put less than 2 GB into the swap file when there is 300 GB+ of free memory (as reported by Ubuntu)?

Glenn Carver
Joined: 29 Oct 17
Posts: 1048
Credit: 16,404,330
RAC: 16,403
Message 67140 - Posted: 30 Dec 2022, 12:50:03 UTC - in response to Message 67138.  

These OIFS PS tasks have a high-water memory mark of ~5.5 GB, which they hit during every timestep. However, that's just the model itself; the controlling wrapper will also take a small amount of RAM for zipping the upload files. So take a look at the rsc_memory_bound in the client_state.xml for the task and use that figure. Do not allocate less.

Do not let the model start swapping; it won't kill the task, but it will slow it considerably. I tried this myself as part of testing. Bear in mind that modern Linux systems also use spare RAM as a disk cache, and that's a good thing for maintaining good throughput: the model log & status files are read by the OIFS controlling wrapper, so it's ideal to have these in RAM. RAM is your friend for CPDN, swap is your enemy.

I hope these current tasks, and the next batches of the BL OIFS app, will be the last of the lowest-resolution version of OpenIFS. The next batches I aim to test in 2023 will be higher resolutions, for which the high-water memory will be 8 GB, 14 GB & 22 GB respectively. However, in order to get decent throughput I'm working on multi-core, though that's proving tricky to get stable across different platforms, so I may be forced to use a VM.
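
As a rough illustration of the point about spare RAM being used as disk cache, here is a minimal Python sketch that reads the standard Linux /proc/meminfo fields and reports available memory, page cache and swap in use. The function names parse_meminfo and summarize are made up for this example and are not part of BOINC or CPDN. A small amount of swap in use alongside a large "available" figure usually just means the kernel has moved some idle pages out, which is not the same thing as the model actively swapping.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB (really KiB); a few, such as the
    HugePages counters, are plain counts and are kept as-is.
    """
    values: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def summarize(info: dict[str, int]) -> dict[str, float]:
    """Return a few headline figures converted from KiB to GiB."""
    kib_per_gib = 1024 ** 2
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        "available_gib": info.get("MemAvailable", 0) / kib_per_gib,
        "page_cache_gib": info.get("Cached", 0) / kib_per_gib,
        "swap_used_gib": swap_used / kib_per_gib,
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")  # only present on Linux
    if meminfo.exists():
        for key, gib in summarize(parse_meminfo(meminfo.read_text())).items():
            print(f"{key}: {gib:.2f}")
```

If available_gib stays comfortably above the per-task memory bound while tasks are running, the small swap figure is probably nothing to worry about.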


ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,400,150
RAC: 3,821
Message 67144 - Posted: 30 Dec 2022, 14:21:08 UTC
Last modified: 30 Dec 2022, 14:21:34 UTC

Thank you, Glenn; that's really well explained and very helpful.

Looking at:

/var/lib/boinc-client/projects/climateprediction.net/client_state.xml

<rsc_memory_bound>6010000000.000000</rsc_memory_bound>

So that's just over 6 GB. If I reserved 6 GB per task, would that be enough?
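
For anyone wanting to turn that figure into a task count, here is a minimal Python sketch of the arithmetic. The rsc_memory_bound value is the one quoted above; treating "512GB" as 512 GiB, and keeping 10% of RAM back for the OS and disk cache, are both assumptions made for this example (reserve_fraction is a made-up parameter, not a BOINC setting).

```python
GIB = 2 ** 30

# Value of <rsc_memory_bound> quoted in this thread, in bytes.
RSC_MEMORY_BOUND = 6_010_000_000

# Assumption: the server's "512GB" of RAM means 512 GiB.
TOTAL_RAM = 512 * GIB


def max_concurrent_tasks(total_ram_bytes: int,
                         per_task_bytes: int,
                         reserve_fraction: float = 0.10) -> int:
    """Number of tasks that fit in RAM without swapping.

    reserve_fraction is a hypothetical safety margin left for the OS,
    the page cache and the wrapper; 10% is a guess, not a CPDN figure.
    """
    usable = int(total_ram_bytes * (1.0 - reserve_fraction))
    return usable // per_task_bytes


if __name__ == "__main__":
    print(f"per task: {RSC_MEMORY_BOUND / GIB:.2f} GiB")
    print("no margin:", max_concurrent_tasks(TOTAL_RAM, RSC_MEMORY_BOUND, 0.0))
    print("10% margin:", max_concurrent_tasks(TOTAL_RAM, RSC_MEMORY_BOUND))
```

Under these assumptions the memory bound works out to roughly 5.6 GiB per task, so the machine is limited by RAM well before it runs out of threads.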


Thank you also for your heads-up on future developments; that's very interesting and helps when thinking about future memory purchases to run CPDN.

Hopefully future apps won't use VirtualBox (if that's what you mean by VM), as for many of us it's not worth installing/using due to all the issues it causes. This is why we don't run LHC or Rosetta (Python) even though we would love to.

ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,400,150
RAC: 3,821
Message 67150 - Posted: 30 Dec 2022, 17:10:58 UTC
Last modified: 30 Dec 2022, 17:42:11 UTC

These OIFS PS tasks have a high-water memory mark of ~5.5 GB, which they hit during every timestep. However, that's just the model itself; the controlling wrapper will also take a small amount of RAM for zipping the upload files. So take a look at the rsc_memory_bound in the client_state.xml for the task and use that figure. Do not allocate less.

Do not let the model start swapping; it won't kill the task, but it will slow it considerably. I tried this myself as part of testing. Bear in mind that modern Linux systems also use spare RAM as a disk cache, and that's a good thing for maintaining good throughput: the model log & status files are read by the OIFS controlling wrapper, so it's ideal to have these in RAM. RAM is your friend for CPDN, swap is your enemy.

I hope these current tasks, and the next batches of the BL OIFS app, will be the last of the lowest-resolution version of OpenIFS. The next batches I aim to test in 2023 will be higher resolutions, for which the high-water memory will be 8 GB, 14 GB & 22 GB respectively. However, in order to get decent throughput I'm working on multi-core, though that's proving tricky to get stable across different platforms, so I may be forced to use a VM.

Hopefully you are okay if I share this advice? I think many people will find it interesting and/or essential; I'll assume you are.

I'll also add the e-Research Group CPDN donation link when sharing this.

Glenn Carver
Joined: 29 Oct 17
Posts: 1048
Credit: 16,404,330
RAC: 16,403
Message 67151 - Posted: 30 Dec 2022, 18:19:33 UTC - in response to Message 67150.  
Last modified: 30 Dec 2022, 18:20:57 UTC

Hopefully you are okay if I share this advice? I think many people will find it interesting and/or essential; I'll assume you are.
Not sure I understand - my reply is on a public forum, so just point people to the message URL? I've already said similar on these forums many times, it's nothing new.

I forgot to add: if you want to check a running task, go into its slot directory and inspect the file boinc_task_state.xml, which gives the BOINC client's measured values for peak working set size, peak swap size, peak disk usage etc. for the whole 'task' (not just the model):
$ cd slots/0
$ cat boinc_task_state.xml
<active_task>
    <project_master_url>https://climateprediction.net/</project_master_url>
    <result_name>oifs_43r3_ps_0928_2007050100_123_976_12193572_0</result_name>
    <checkpoint_cpu_time>5150.980000</checkpoint_cpu_time>
    <checkpoint_elapsed_time>5241.193137</checkpoint_elapsed_time>
    <fraction_done>0.109151</fraction_done>
    <peak_working_set_size>5009731584</peak_working_set_size>
    <peak_swap_size>5215879168</peak_swap_size>
    <peak_disk_usage>1446309687</peak_disk_usage>
</active_task>
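
The byte counts above are easier to compare with rsc_memory_bound once converted. Here is a minimal Python sketch that does this, assuming the file is well-formed XML like the sample shown; peak_usage_gib is a made-up helper name, not a BOINC tool. Note also that, as far as I understand it, BOINC's "swap size" figure reflects the task's virtual memory size rather than pages actually written to the swap file, so treat that interpretation as an assumption.

```python
import sys
import xml.etree.ElementTree as ET

GIB = 2 ** 30

# The three peak_* fields from the boinc_task_state.xml shown above.
SAMPLE = """<active_task>
    <peak_working_set_size>5009731584</peak_working_set_size>
    <peak_swap_size>5215879168</peak_swap_size>
    <peak_disk_usage>1446309687</peak_disk_usage>
</active_task>"""

PEAK_TAGS = ("peak_working_set_size", "peak_swap_size", "peak_disk_usage")


def peak_usage_gib(xml_text: str) -> dict[str, float]:
    """Return the peak_* values from a task state file, in GiB."""
    root = ET.fromstring(xml_text)
    usage: dict[str, float] = {}
    for tag in PEAK_TAGS:
        node = root.find(tag)
        if node is not None and node.text:
            usage[tag] = float(node.text) / GIB
    return usage


if __name__ == "__main__":
    # Pass a path such as slots/0/boinc_task_state.xml, or run with no
    # arguments to use the sample from this thread.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            text = handle.read()
    else:
        text = SAMPLE
    for tag, gib in peak_usage_gib(text).items():
        print(f"{tag}: {gib:.2f} GiB")
```

For the sample task this gives a peak working set of about 4.67 GiB, which sits below the ~5.6 GiB memory bound discussed earlier in the thread.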

Glenn Carver
Joined: 29 Oct 17
Posts: 1048
Credit: 16,404,330
RAC: 16,403
Message 67152 - Posted: 30 Dec 2022, 18:28:30 UTC - in response to Message 67144.  

Looking at:
/var/lib/boinc-client/projects/climateprediction.net/client_state.xml
<rsc_memory_bound>6010000000.000000</rsc_memory_bound>
So that's just over 6GB. So if I reserved 6GB per task would that be enough?
Yes for these batches. But the same app can be used in batches with different memory requirements - bear that in mind.

Hopefully future apps won't use VirtualBox (if that's what you mean by VM), as for many of us it's not worth installing/using due to all the issues it causes. This is why we don't run LHC or Rosetta (Python) even though we would love to.
May not have a choice. It's the only route to a Windows capable app and may well turn out to be the only route to a multicore app.

'Fraid I don't understand what you mean by 'all the issues VBox causes'. I have it running on my machines without a problem. The only issue I did have was that BOINC had stuffed up some filesystem permissions in the systemd config for the client, which prevented it from working, but that was the fault of BOINC, not VBox. Your choice, of course, if you don't want to run it.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67153 - Posted: 30 Dec 2022, 18:43:16 UTC - in response to Message 67144.  

Hopefully future apps won't use VirtualBox (if that's what you mean by VM), as for many of us it's not worth installing/using due to all the issues it causes. This is why we don't run LHC or Rosetta (Python) even though we would love to.


I hope so too. I have 64 gigabytes of RAM (and could double it if necessary) on my "16-core" machine (though I am allowing BOINC to use only 12 cores at the moment). And I did run some multi-process Milkyway tasks. But I will not be running VirtualBox. I had no trouble running 5 Oifs_ps tasks at the same time as 7 other BOINC tasks for other projects. I think at one point I was running 5 Oifs_ps and 1 Oifs_bl at once, plus 6 other BOINC tasks for other projects, also with no trouble.

ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,400,150
RAC: 3,821
Message 67154 - Posted: 30 Dec 2022, 18:58:25 UTC
Last modified: 30 Dec 2022, 18:59:37 UTC

Okay, I have removed the postings. Something to keep in mind, though: participation in BOINC is dropping, so if the only project engagement is within the forums, you are not really going to increase awareness of and participation in BOINC and/or CPDN.

If you are versed in Linux and BOINC, then you can probably solve most of the issues when using VirtualBox. But to those from a Windows background (like myself and many others), Linux can be a headache when it's working, and when you have issues it's a nightmare.

Thanks for all your help.

ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,400,150
RAC: 3,821
Message 67155 - Posted: 30 Dec 2022, 20:21:09 UTC

If you are versed in Linux and BOINC, then you can probably solve most of the issues when using VirtualBox. But to those from a Windows background (like myself and many others), Linux can be a headache when it's working, and when you have issues it's a nightmare.

What I meant to say is if you are versed in BOINC then you can probably work out all the issues with VirtualBox, but most people are not.

In terms of the other stuff, if you are versed in Linux then yes, perhaps it is all common stuff and easy, but to those from a Windows background it's not.

Anyway just thought I would clarify.

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67156 - Posted: 30 Dec 2022, 20:42:10 UTC - in response to Message 67154.  

Something to keep in mind, though: participation in BOINC is dropping, so if the only project engagement is within the forums, you are not really going to increase awareness of and participation in BOINC and/or CPDN.


I have been running BOINC since 1999-06-10 or thereabouts, and ran seti@home for a little while before that. In those days there was always work available. But seti@home is no more.

I started running ClimatePrediction whenever I discovered it and it joined up with BOINC, and there was always work available for it too. Sometimes a task took over a month to run. Machines were slower then and the tasks covered very long time intervals. Now it is difficult to get work units, and in the last week or two it has become impossible to return results.

Similarly for Rosetta. It goes for long intervals with no work at all, and lately, it became impossible to download such work as they tried to send. Now they are not even trying.

And World Community Grid! They were down for about eight months! Hardly encouraging for possible new participants. Even now that they are supposedly up, they support only four or five different projects, and really only three. And I often run out of work from them.

My view is that while some former computer users may have dropped out of distributed computing and just play with Facebook on their cell phones, what affects me most is that those who could benefit from it are just not supplying work.

AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,780,446
RAC: 19,423
Message 67158 - Posted: 30 Dec 2022, 22:02:08 UTC - in response to Message 67152.  

May not have a choice. It's the only route to a Windows capable app and may well turn out to be the only route to a multicore app.

I don't think you're saying that the multicore app might only be a Windows app, correct?

I'd vote for a Linux native app and a VBox app, like what LHC has with ATLAS and Theory. What is it about the different platforms that's making it difficult to get multicore stable: the different hardware configurations or the OSs?

Vato
Joined: 4 Oct 19
Posts: 15
Credit: 9,174,915
RAC: 3,722
Message 67159 - Posted: 30 Dec 2022, 22:29:22 UTC - in response to Message 67152.  

If VBox is the way to a Windows app, so be it.
But given we have a functioning native Linux app, I would love to see a multi-threaded native Linux app.
I would run many of those.
Whereas Linux VBox is a total non-starter for me (many attempts, zero successes), and my few Windows boxes won't have enough memory.

Fingers crossed.

Glenn Carver
Joined: 29 Oct 17
Posts: 1048
Credit: 16,404,330
RAC: 16,403
Message 67169 - Posted: 31 Dec 2022, 16:05:23 UTC - in response to Message 67154.  

If you are versed in Linux and BOINC, then you can probably solve most of the issues when using VirtualBox. But to those from a Windows background (like myself and many others), Linux can be a headache when it's working, and when you have issues it's a nightmare.

I'm afraid I don't understand what you mean by 'issues with VBox'. Maybe you can PM me directly and explain exactly what you mean. I would be interested to know what we might have to deal with & provide support for. I've used virtualization for many years. Getting VBox working on Linux took nothing more than an apt install, a reboot & making sure BIOS virtualization support was enabled, and off BOINC went. I can't tell from your messages whether you hit showstoppers or it's just the added technical load that's the issue (I appreciate these concerns). Maybe it's worth a new thread for discussion on this?

I agree Linux is an issue for Windows users, which is why CPDN would love to have a Windows app for OpenIFS. But the only way we can do that is with VBox, I'm afraid. We had a 3-month project to build OpenIFS on Windows, but we were not able to get the key library it needs working well enough, let alone the model. Unfortunately ECMWF does not support Windows for its in-house software libraries, so there's really no other option (unless someone volunteers to do the work).

I would love to have a native multicore Linux app. I have already done some testing with one, but it started failing on some test machines in the pthread library for reasons I don't yet understand. I need to do more work on this, but a VBox app looks straightforward, so I'm tempted to go for the low-hanging fruit first, which would open up more Windows users & multicore, and then get back to a native multicore Linux app. We still have the single-core app of course, but with higher resolution will come much longer runtimes.

But for the immediate future I will be working on getting the model more stable for the current setup. These latest batches have exposed a few more issues that I would like to resolve first before I move on.

ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,400,150
RAC: 3,821
Message 67175 - Posted: 31 Dec 2022, 16:55:10 UTC - in response to Message 67169.  
Last modified: 31 Dec 2022, 16:57:24 UTC

I agree, let's split the VirtualBox issues into their own thread.

https://www.cpdn.org/forum_thread.php?id=9171#67174

In terms of Windows/Linux etc., I'll reply in the new year.

ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,400,150
RAC: 3,821
Message 67200 - Posted: 2 Jan 2023, 7:42:45 UTC
Last modified: 2 Jan 2023, 7:43:29 UTC

The 256t server is now being told by CPDN that: "You have reached your quota of 1 task per day".

Yesterday I spent £200 on a new U.2 drive in order to try and continue running CPDN as the old drive is currently full of completed tasks.

Can anything be done to remove this quota?

The server is now running at 30%, i.e. 6+ GB per task, so there should be no reason for any further errors to happen (from my end).

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,966,742
RAC: 21,869
Message 67203 - Posted: 2 Jan 2023, 9:17:37 UTC

The 256t server is now being told by CPDN that: "You have reached your quota of 1 task per day".
I am also getting that message in the log and am only able to download and run one task per day. For me, it isn't a massive issue as my slow bored band means it will take a while to clear the backlog of completed tasks. I have been able to get more tasks by running BOINC in VB as well as in my host OS but even so, it will take 2-3 days of saturating my upload link to clear it all. I have only been running one task at a time in the VM but could change the VM settings to allow more.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,690,861
RAC: 10,559
Message 67205 - Posted: 2 Jan 2023, 9:33:16 UTC - in response to Message 67200.  
Last modified: 2 Jan 2023, 9:40:38 UTC

The 256t server is now being told by CPDN that: "You have reached your quota of 1 task per day".
'Quota' is a concept bound up with successful completion of tasks. Looking at the task list for your 256t machine, the vast majority are ending with "Error while computing".

You're wasting time and energy. You need to sort that out before your quota will start to heal. It's nothing to do with the network problems.

Edit - you've made me check my own traffic jam, just to check that nothing on the server is causing it to mark delayed tasks as errors. All are still shown as "in progress", which is as it should be for delayed uploads. Phew.