climateprediction.net (CPDN) home page
Thread 'Hardware requirements for upcoming models'

Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 65934 - Posted: 22 Aug 2022, 5:04:08 UTC

An example: my 24-thread Ryzen. With Nbody, which only uses up to 16 threads per task, the defaults would run one task using up to 16 threads, which is a waste. So I can set it to 12 using avg_ncpus. But then what? Two tasks using up to 12 threads each don't max it out, because they're not often both at 12. I use MSI Afterburner to watch GPU and CPU usage. So I settled on two at 16: I tell BOINC they use 12, and I tell the app to use 16.
ID: 65934
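The split described above (BOINC budgets 12 CPUs per task while the application is told to use 16 threads) could be written to app_config.xml roughly as sketched below. The app name milkyway_nbody and the plan class mt are assumptions here and should be checked against the names in your own client_state.xml; the element names follow the app_version section of BOINC's app_config.xml format. The helper build_app_config is hypothetical, and ET.indent needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, avg_ncpus: int, nthreads: int) -> str:
    """Return app_config.xml text that budgets avg_ncpus CPUs per task in
    the BOINC client while asking the application for nthreads threads."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # What the BOINC client assumes each task occupies when scheduling.
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    # What is passed to the application on its command line.
    ET.SubElement(version, "cmdline").text = f"--nthreads {nthreads}"
    ET.indent(root)  # pretty-print in place (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


# App name and plan class are assumptions, not taken from the thread.
config_text = build_app_config("milkyway_nbody", "mt", avg_ncpus=12, nthreads=16)
print(config_text)
```

The output would be saved as app_config.xml in the project's directory and picked up when the client re-reads its config files.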
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,718,239
RAC: 8,054
Message 65935 - Posted: 22 Aug 2022, 7:30:56 UTC - in response to Message 65933.  

> * both --nthreads x and --nthreads=x work.

Thanks. I'd like to nail that one exactly, if we can. I copied my original reference directly from the BOINC User Manual, where they document it without the equals sign - I can change that if it's wrong. I'll try and look it up in the OpenMP documentation if I can.

I'm surprised by the reports that avg_ncpus also controls --nthreads - I can't see any point in the pathway where that could be implemented. But it's possible that MilkyWay have found a way of implementing it, which might be for their project only. I have a couple of six-core Linux machines, so I can test with one of those.
ID: 65935
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,718,239
RAC: 8,054
Message 65936 - Posted: 22 Aug 2022, 7:58:02 UTC

@Glenn Carver:

In the specific case of planning IFS for CPDN, it might be easiest to have the BOINC requirements in mind at an early stage in the process. For example,

https://boinc.berkeley.edu/trac/wiki/AppMultiThread
https://boinc.berkeley.edu/trac/wiki/AppPlan (where the --nthreads [space] N format is again mentioned).
ID: 65936
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,888,193
RAC: 18,910
Message 65937 - Posted: 22 Aug 2022, 10:04:53 UTC - in response to Message 65935.  
Last modified: 22 Aug 2022, 10:17:31 UTC

Richard,

You might be right about the need for --nthreads in N-Body. It seems like maybe in some situations, like when you only run N-Body and calculate things for yourself (6x4=24 cores), and maybe use project_max_concurrent, it's easy to mask the lack of need for it. But if you have multiple projects and tasks going, I think I may have seen evidence that avg_ncpus doesn't control how many threads the app uses. I'll have to run some tests.

As for syntax of --nthreads, LHC ATLAS native only accepts it with the space and no equals sign. It'll be interesting to see what you discover.
ID: 65937
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,718,239
RAC: 8,054
Message 65941 - Posted: 22 Aug 2022, 16:37:45 UTC - in response to Message 65937.  

OK, I'm running the tests now. Computer is 945095: it has 6 cores, but I normally run it at 85% to keep things loose. Milkyway has picked it up as a 5-core, and all the comms have come through for 5: ncpus is set to five, and there's no sign of nthreads.

First task was 424163425: Using OpenMP 5 max threads on a system with 6 processors

Then I set an app_config to avg_ncpus 3 (BOINC Manager still said 5 - that doesn't change until you fetch new work).
Task ID 424163428: Using OpenMP 3 max threads on a system with 6 processors

Third test: I released the overall CPU count to 100%, and allowed two tasks to run at once.
Tasks 424163431, 424163427, both together Using OpenMP 3 max threads on a system with 6 processors

Fourth test - back to avg_ncpus 5 in app config, still at 100% CPU.
Task 424163430: Using OpenMP 5 max threads on a system with 6 processors

Finally, avg_ncpus 6 and 100% CPU.
Tasks 424163455, 424163432: Using OpenMP 6 max threads on a system with 6 processors

So it seems you're right: in the specific case of MilkyWay nbody, thread usage is entirely controlled by avg_ncpus, and doesn't need --nthreads. I'm impressed - I don't know how they've done that. If Glenn can find out how they've pulled off that neat trick, he might want to use it; but for the rest of us, RTFM is safest.
ID: 65941
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,718,239
RAC: 8,054
Message 65942 - Posted: 22 Aug 2022, 16:38:36 UTC

And now I've got the machine idle, time for the security updates!
ID: 65942
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 65943 - Posted: 22 Aug 2022, 19:04:40 UTC - in response to Message 65884.  

> Processor speed is not that relevant here, what is though is available core count. IFS (and OpenIFS) is a highly parallel model. If I enable threading, there is a 2x speedup with 2 cores, 3.5x with 4 cores etc.

My machine has 64 GBytes RAM and claims 16 cores. 8 of these are real and 8 are hyperthreaded, so I tell the BOINC client to use only 8 cores. The only multi-threading I have been doing is the Milkyway nbody ones, and I let each of those take 4 cores. Four of those claim to be using about 350% of a CPU; I would have hoped it would be closer to 400%.

> Since network upload is the bottleneck (and maybe upload server capacity at the remote end), available local disk space is not really a big deal.

My Verizon FIOS should handle it, right? They claim 75 Megabits/sec up and 75 Megabits/sec down. They usually deliver a little bit more.

Speakeasy Speed Test
Ping 7 ms
Jitter 1 ms
Download 81.1 Mbps
Upload 89.1 Mbps

ID: 65943
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 65944 - Posted: 22 Aug 2022, 19:15:43 UTC - in response to Message 65943.  

> My Verizon FIOS should handle it, right? They claim 75 Megabits/sec up and 75 Megabits/sec down. They usually deliver a little bit more.
>
> Speakeasy Speed Test
> Ping 7 ms
> Jitter 1 ms
> Download 81.1 Mbps
> Upload 89.1 Mbps

I want that! I've been promised Gigabit, but in FOUR YEARS!!!! I have to put up with 32Mbit down, 7Mbit up. I used to get 54 down and they reduced it because of over usage by the town!
ID: 65944
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 65945 - Posted: 22 Aug 2022, 19:38:28 UTC - in response to Message 65944.  

> I want that! I've been promised Gigabit, but in FOUR YEARS!!!! I have to put up with 32Mbit down, 7Mbit up. I used to get 54 down and they reduced it because of over usage by the town!

As I get less than a sixth of that and plan on running these tasks, I don't think anything higher will have a problem. I will just let the uploads take two hours or whatever it is when not busy.
ID: 65945
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,478,945
RAC: 15,019
Message 65946 - Posted: 22 Aug 2022, 19:49:28 UTC - in response to Message 65916.  


> Currently, if running the higher resolution N216 tasks, performance starts dropping after using 5 out of 8 real cores because they hammer the chip's cache so much. If running N144 tasks I go up to 8 real cores. Assuming I double my RAM to 64GB, I will probably go to 8 real cores with the OpenIFS tasks. Of course, I will hopefully have the luxury of being able to see what works best in testing. How much the memory (RAM not cache) gets hit will depend a lot on whether all 8 processes peak at the same time or whether they end up all at different times. If the latter, I could probably get away with just 32GB of RAM rather than upping to 64. With 32, I will probably start by seeing what happens with 4 cores.

Hi Dave, just to clarify, this is a single executable using multiple threads so peak memory will be the same as peak memory with only 1 thread (i.e. single core).
ID: 65946
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,478,945
RAC: 15,019
Message 65947 - Posted: 22 Aug 2022, 19:59:07 UTC - in response to Message 65917.  

> @Glenn Carver,
> If a multi-threaded application is deployed via a BOINC server, the automatic (default) behaviour is for the server to configure each task to use every one of the cores reported by the requesting client - 16, in Jean-David's example. The assumption is that the application will understand the --nthreads directive, and configure itself accordingly. If the proposed IFS application uses a different MT calling convention, it will require a bespoke modification of the CPDN server code.

I've looked at the 'mt' plan on the BOINC pages. It seems to assume that the --nthreads argument is handled by the wrapper around the app (which ours doesn't do, currently), and it also seems to assume linear speedup with increasing cores, which, again, is not true in our case. I only had a quick read but it looked to me that we'd need to configure our own 'mt' plan. The CPDN folk are the experts on the server side, so I could be wrong, but I'll be talking with them soon about this.

Seems like there's an appetite and resources for multithreaded higher resolution OpenIFS experiments which is great to hear. I'll write something on the forums when we are ready to send out to everyone.

Cheers, Glenn
ID: 65947
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,888,193
RAC: 18,910
Message 65948 - Posted: 22 Aug 2022, 20:03:45 UTC - in response to Message 65943.  

According to https://boinc.berkeley.edu/trac/wiki/AppPlan#Predefinedplanclasses, the speedup for multithreaded apps is expected to be 0.95N, where N is the number of threads. So if speedup correlates with utilization, then your utilization should be ~380% on average.

I'm a bit confused by the comment about network upload being the bottleneck; I think it may have been influenced by the upload issues we had with the latest batch a few days ago. Upload issues here tend not to be user-related and affect many users at the same time. 75 Mbps is plenty. I don't look often, but my uploads usually average hundreds of Kbps, only sometimes go over 1 Mbps, and are done within a couple of minutes or so. Any issues we encounter are almost always server-side, and I'd expect it to be the same with OpenIFS. I can't remember the numbers now, but I believe LHC ATLAS uploads and downloads are much bigger than CPDN's, and they go through with no problems for me.
ID: 65948
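As a rough check on the figures quoted in this thread, parallel efficiency is just speedup divided by thread count. The sketch below puts the mt plan class's 0.95N next to the 2x and 3.5x speedups quoted for the model and the ~350% CPU reading for four cores. Treating that CPU reading as if it were a 3.5x speedup is an assumption (utilization and speedup need not match), and the function name parallel_efficiency is only illustrative.

```python
def parallel_efficiency(speedup: float, threads: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / threads


# (speedup, threads) pairs taken from figures quoted in the thread.
figures = {
    "mt plan class, 4 threads (0.95N)": (0.95 * 4, 4),
    "model speedup quoted for 2 cores": (2.0, 2),
    "model speedup quoted for 4 cores": (3.5, 4),
    # Assumption: ~350% CPU utilization is read as a 3.5x speedup.
    "N-Body reading, 4 cores (~350% CPU)": (3.5, 4),
}

for label, (speedup, threads) in figures.items():
    print(f"{label}: {parallel_efficiency(speedup, threads):.1%}")
```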
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,478,945
RAC: 15,019
Message 65949 - Posted: 22 Aug 2022, 20:06:43 UTC - in response to Message 65924.  

> 'OpenMP' is the tool BOINC is designed around and - as you say - in your case the thread count is limited to 4. My understanding is that under OpenMP, that will have been set by an '--nthreads=4' directive on the command line. If you don't have it in your app_config.xml, it's possible that they have - at long last - configured their server in the way I suggested CPDN may have to do.

The number of OpenMP threads is determined at runtime by an environment variable e.g.:
export OMP_NUM_THREADS=4

and a number of other env vars control what OpenMP does. We don't use --nthreads but if that's the 'preferred' way in BOINC we could implement it, but under the hood all it's doing is a system call to set an environment variable.
ID: 65949
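A wrapper that accepts BOINC's --nthreads argument and turns it into OMP_NUM_THREADS might look like the sketch below. This is not CPDN's or MilkyWay's code: the function name openmp_env and the executable path in the final comment are hypothetical, and the default of one thread is an assumption. Python's argparse accepts both the space and the equals form, so both spellings discussed in this thread are covered.

```python
import argparse
import os


def openmp_env(argv, base_env=None):
    """Parse a BOINC-style --nthreads option and return a copy of the
    environment with OMP_NUM_THREADS set to match."""
    parser = argparse.ArgumentParser(add_help=False)
    # Assumption: fall back to a single thread when the option is absent.
    parser.add_argument("--nthreads", type=int, default=1)
    # Ignore any other arguments meant for the model itself.
    args, _unknown = parser.parse_known_args(argv)
    if args.nthreads < 1:
        raise ValueError("--nthreads must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(args.nthreads)
    return env


for argv in (["--nthreads", "4"], ["--nthreads=4"]):
    print(argv, "->", openmp_env(argv, base_env={})["OMP_NUM_THREADS"])

# A wrapper would then start the model with that environment, e.g.
# subprocess.run(["./model_executable"], env=env, check=True)  # hypothetical path
```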
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,478,945
RAC: 15,019
Message 65950 - Posted: 22 Aug 2022, 20:18:02 UTC - in response to Message 65936.  

> @Glenn Carver:
> In the specific case of planning IFS for CPDN, it might be easiest to have the BOINC requirements in mind at an early stage in the process. For example,
> https://boinc.berkeley.edu/trac/wiki/AppPlan (where the --nthreads [space] N format is again mentioned).

Richard, thanks. As mentioned above I've already read about the mt class. However, it says this:

> mt   :   An application that can use anywhere from 1 to 64 threads, and whose speedup with N CPUs is .95N. It is passed a command-line argument --nthreads N.

but the 0.95 scaling is not appropriate for OpenIFS; its speedup drops off faster at higher thread counts. We'll define our own but document how to control the app when multithreaded. I also find avg_ncpus and nthreads confusing, particularly as they are both in the app XML file.

Thanks for all the input. I may get back to you as we get more into this once I've spoken to the CPDN people. I'm sure we'll learn a lot from testing, not least how to assign credit (personally I only care about wall-clock time :)
ID: 65950
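One common way to describe a speedup that falls away at higher thread counts is Amdahl's law, speedup = 1 / (s + (1 - s) / N), where s is the serial fraction of the run. The sketch below compares it with the mt plan class's 0.95N. The serial fraction of 0.05 is purely hypothetical, picked so that four threads land near the 3.5x quoted earlier in the thread; it is not a measured OpenIFS value, and nothing here says OpenIFS actually follows this curve.

```python
def mt_plan_speedup(n: int) -> float:
    """Speedup assumed by BOINC's predefined mt plan class (0.95N)."""
    return 0.95 * n


def amdahl_speedup(n: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup on n threads given a serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


# Hypothetical value for illustration only, not a measurement.
SERIAL_FRACTION = 0.05

for n in (1, 2, 4, 8, 16):
    print(
        f"{n:>2} threads: mt plan {mt_plan_speedup(n):5.2f}x, "
        f"Amdahl {amdahl_speedup(n, SERIAL_FRACTION):5.2f}x"
    )
```

Under that assumption the two models stay fairly close up to four threads and then separate quickly, which is the kind of gap a project-specific plan class would have to account for.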
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,478,945
RAC: 15,019
Message 65951 - Posted: 22 Aug 2022, 20:31:34 UTC - in response to Message 65948.  
Last modified: 22 Aug 2022, 20:35:23 UTC

> I'm a bit confused by the comment about network upload being the bottleneck; I think it may have been influenced by the upload issues we had with the latest batch a few days ago. Upload issues here tend not to be user-related and affect many users at the same time. 75 Mbps is plenty. I don't look often, but my uploads usually average hundreds of Kbps, only sometimes go over 1 Mbps, and are done within a couple of minutes or so. Any issues we encounter are almost always server-side, and I'd expect it to be the same with OpenIFS. I can't remember the numbers now, but I believe LHC ATLAS uploads and downloads are much bigger than CPDN's, and they go through with no problems for me.

Yes, sorry, that wasn't very clear. What I meant is that the model is capable of producing data faster than it could be uploaded over an average broadband connection in a reasonable time. Scientists using the model have to think carefully about what they want to output and we run a check to make sure it doesn't exceed reasonable limits as defined by the CPDN team. That's what was in the back of my mind.
ID: 65951
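To get a feel for why output size has to be limited, upload time is roughly size in megabits divided by uplink speed. The sketch below uses the uplink speeds mentioned in this thread; the 1000 MB output size is a hypothetical placeholder, not an OpenIFS figure, and the calculation ignores protocol overhead and any server-side limits.

```python
def upload_seconds(size_megabytes: float, uplink_mbit_per_s: float) -> float:
    """Idealized transfer time: megabytes -> megabits, divided by link speed."""
    return size_megabytes * 8.0 / uplink_mbit_per_s


# Hypothetical total output per task, for illustration only.
OUTPUT_MB = 1000.0

uplinks = (
    ("7 Mbit/s uplink", 7.0),
    ("75 Mbit/s advertised uplink", 75.0),
    ("89.1 Mbit/s measured uplink", 89.1),
)

for label, mbit_per_s in uplinks:
    minutes = upload_seconds(OUTPUT_MB, mbit_per_s) / 60.0
    print(f"{label}: about {minutes:.1f} minutes for {OUTPUT_MB:.0f} MB")
```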
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 65952 - Posted: 22 Aug 2022, 21:55:10 UTC - in response to Message 65945.  

> As I get less than a sixth of that and plan on running these tasks, I don't think anything higher will have a problem. I will just let the uploads take two hours or whatever it is when not busy.

Where are you in the UK that you can't get semi-fibre? They've been rolling out full fibre to the home for years, I thought semi-fibre was long since finished.
ID: 65952
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,718,239
RAC: 8,054
Message 65955 - Posted: 23 Aug 2022, 6:24:23 UTC - in response to Message 65952.  

> Where are you in the UK that you can't get semi-fibre? They've been rolling out full fibre to the home for years, I thought semi-fibre was long since finished.

Learn to do your own research: https://labs.thinkbroadband.com/local/

Dave lives in England. 97.7% is not 100%.
ID: 65955
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 65956 - Posted: 23 Aug 2022, 6:42:29 UTC - in response to Message 65955.  

> > Where are you in the UK that you can't get semi-fibre? They've been rolling out full fibre to the home for years, I thought semi-fibre was long since finished.
> Learn to do your own research: https://labs.thinkbroadband.com/local/
>
> Dave lives in England. 97.7% is not 100%.

I'm not a stalker, so I don't know where in England he lives. Since the Orkney Isles have fast broadband, he must live in a mud hut somewhere.
ID: 65956
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 65957 - Posted: 23 Aug 2022, 7:12:28 UTC - in response to Message 65952.  

> Where are you in the UK that you can't get semi-fibre? They've been rolling out full fibre to the home for years, I thought semi-fibre was long since finished.

I could get fibre to the box by swapping to Virgin but I happen to not like Branson's business practices so will not change from the BT infrastructure. On Faceache they keep advertising that they are upgrading in Cambridge but whenever I check, it suggests I check again in three months' time.
ID: 65957
AndreyOR

Joined: 12 Apr 21
Posts: 317
Credit: 14,888,193
RAC: 18,910
Message 65958 - Posted: 23 Aug 2022, 7:13:53 UTC - in response to Message 65951.  

> the model is capable of producing data faster than it could be uploaded over an average broadband connection in a reasonable time.

Glenn, could you please explain that a bit more? Is this going to be something like LHC ATLAS and Theory subprojects where you need constant internet access, because the data is going to be going back and forth? It kind of sounds like you're going to need results almost real-time? I'm having a hard time visualizing how it's going to work.
ID: 65958

©2024 cpdn.org