Message boards : Number crunching : AVX and AVX2; Is it used at CPDN?
Author | Message |
---|---|
Send message Joined: 16 Aug 16 Posts: 73 Credit: 53,408,433 RAC: 2,038 |
Hi, we have some Xeon V1 (V0) and V3 workstations that we use for crunching, which have AVX and AVX2 respectively, and I was wondering whether these are used by CPDN. If so, would you expect a big difference between AVX and AVX2, with the latter performing better? I searched the message boards for AVX and AVX2 but nothing was forthcoming. Thanks. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The requirements are: that the processor(s) use CISC; that they have SSE2; that they run Windows, Linux, or a Mac OS; and that computers using 64-bit Linux have 32-bit libraries installed. Also, 2 GB of RAM per processor is recommended. AVX and AVX2 are of no importance. According to BOINCstats, there are Xeons of some type running/used to run tasks here. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
I doubt AVX/AVX2 are used when compiling the CPDN models. There's a wide range of processor generations running CPDN tasks. They are likely trying to keep the optimizations as consistent as possible across those processors. |
Send message Joined: 16 Aug 16 Posts: 73 Credit: 53,408,433 RAC: 2,038 |
Thanks for the information. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"According to BOINCstats, there are Xeons of some type running/used to run tasks here." I have a 4-core 64-bit Xeon processor in my current machine. Unfortunately, it runs at only 1.8 GHz. It turns out that it completes work faster than my former machine, which had two hyperthreaded 3.06 GHz Xeons. I run Red Hat Enterprise Linux on my machine. I started with RHEL 3, then RHEL 5, and now RHEL 6.9. RHEL 7 has been out for some time, but I have not upgraded. Red Hat support their releases for 10 years. CentOS distributes an OS that is essentially the same as the RHEL releases, but for free. I ran CentOS 4 on an old machine for a long time; that one had two Intel Pentium 3 processors. |
Send message Joined: 7 May 17 Posts: 16 Credit: 3,480,030 RAC: 2,845 |
I profiled the instruction mix of the wah2rm3m2t_um_8.25_i686-pc-linux-gnu model on a platform with SSE* and AVX. As far as I can tell, it uses a mix of x87 and SSE instructions only. Substantial time (~5%) is spent in libm's powf(), which uses legacy x87 instructions. Is there some other way the model could do exponentiation? Ditto for log10. (Both are at FP32 precision afaict). Modern CPUs would prefer (in energy per FLOP) different instructions. |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,884,997 RAC: 4,577 |
I have no special knowledge of the scope of the project's software development. However, the usual description is that the core code is the Met Office's FORTRAN source, which is then tailored by the project to the BOINC distributed computing platform and CPDN. Over the years the project has been less concerned with model performance than might perhaps be expected, which is partly attributable to the ensemble method of modelling. Presumably the project would prefer an ensemble (i.e. 1000s of models) to yield useful information within the timescale of project funding or a PhD. That ensemble runs on a very unreliable massively parallel virtual machine (i.e. our computers). Having a fast model will bring forward the point at which the ensemble becomes useful, but so will improving reliability, or splitting the model runs into shorter runs that volunteers are prepared to download. It would be fascinating to see a report of how the project team responds to the progress of an actual ensemble, from conception to publication. |
Send message Joined: 9 Apr 14 Posts: 14 Credit: 1,962,018 RAC: 0 |
"Substantial time (~5%) is spent in libm's powf(), which uses legacy x87 instructions. Is there some other way the model could do exponentiation? Ditto for log10. (Both are at FP32 precision afaict)." There are well-known speed issues with the standard Linux math library, especially the single-precision math functions. The stance of the library developers has been clearly stated: they only care about accuracy, not speed. It looks like they made no effort to optimize the single-precision math functions for speed; these are often substantially slower than their double-precision counterparts. It looks like they only cared about ticking off the box "added support for single-precision math functions"... That being said, there has been some improvement in more recent math library versions, in part due to complaints from users and to third parties submitting their own (better optimized) versions. My hunch is that CPDN is using a very old math library version... If so, switching to a newer version may already help. Avoiding single-precision math functions will likely also help... |
Send message Joined: 3 Sep 04 Posts: 126 Credit: 26,610,380 RAC: 3,377 |
Mathematical functions can easily be calculated to any desired precision using a polynomial or rational function; see J.F. Hart et al., Computer Approximations. |
Send message Joined: 7 May 17 Posts: 16 Credit: 3,480,030 RAC: 2,845 |
It looked like a single-precision powf() was used; SSE2 at least can match precision trivially. I suspect that no one has optimized 32-bit libm for recent processors. 64-bit libm uses MULSS (SSE) on my system instead of x87. |
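
As a rough illustration of the polynomial approach mentioned in the thread, here is a minimal Python sketch of how pow and log10 can be built from range reduction plus short polynomials. The function names (`ln_poly`, `exp_poly`, `pow_poly`, `log10_poly`) are hypothetical, and the coefficients come from truncated series, not from the minimax tables in Hart et al. and not from anything CPDN or libm is known to use. Python evaluates this in double precision, so it only shows the method and its truncation error, not which x87 or SSE instructions a compiled version would emit.

```python
import math

LN2 = math.log(2.0)
LN10 = math.log(10.0)


def ln_poly(x):
    """Natural log of x > 0: range reduction plus an odd polynomial in s."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    m, e = math.frexp(x)        # x = m * 2**e with 0.5 <= m < 1
    if m < math.sqrt(0.5):      # recentre so m lies in [sqrt(1/2), sqrt(2))
        m *= 2.0
        e -= 1
    s = (m - 1.0) / (m + 1.0)   # |s| <= 0.172 on the reduced interval
    s2 = s * s
    # ln(m) = 2*atanh(s) = 2*s*(1 + s^2/3 + s^4/5 + ...), kept up to s^10/11
    poly = 1.0 + s2 * (1 / 3 + s2 * (1 / 5 + s2 * (1 / 7 + s2 * (1 / 9 + s2 / 11))))
    return e * LN2 + 2.0 * s * poly


def exp_poly(t):
    """exp(t): split t = k*ln(2) + r, then a degree-9 Taylor polynomial in r."""
    k = round(t / LN2)          # nearest integer, so |r| <= ln(2)/2
    r = t - k * LN2
    p = 1.0
    for n in range(9, 0, -1):   # Horner form of sum(r**j / j!) for j = 0..9
        p = 1.0 + r * p / n
    return math.ldexp(p, k)     # scale by 2**k


def log10_poly(x):
    """log10(x) for x > 0, from the polynomial natural log."""
    return ln_poly(x) / LN10


def pow_poly(x, y):
    """x**y for x > 0, computed as exp(y * ln(x))."""
    return exp_poly(y * ln_poly(x))
```

For moderate exponents the truncation error of these polynomials stays well below single-precision rounding (about 6e-8 relative), which is the sense in which the degree can be chosen to suit the desired precision. A production routine would more likely use minimax coefficients and handle zero, negative, infinite, and NaN inputs, all of which this sketch omits.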
©2024 cpdn.org