Thread 'AVX and AVX2; Is it used at CPDN?'

Message boards : Number crunching : AVX and AVX2; Is it used at CPDN?

ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,408,433
RAC: 2,038
Message 56001 - Posted: 2 Apr 2017, 22:54:05 UTC
Last modified: 2 Apr 2017, 22:55:47 UTC

Hi,

We have some Xeon V1 (V0) and V3 workstations that we use for crunching, which support AVX and AVX2 respectively, and I was wondering whether CPDN uses these instruction sets.

If so, would you expect a big difference between AVX and AVX2, with the latter performing better?

I searched the message boards for AVX and AVX2, but nothing was forthcoming.

Thanks.
ID: 56001
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 56002 - Posted: 2 Apr 2017, 23:52:11 UTC - in response to Message 56001.  
Last modified: 2 Apr 2017, 23:53:25 UTC

The requirements are:
That the processor(s) use CISC
That they have SSE2
That they run Windows, Linux, or Mac OS
And that computers using 64-bit Linux have the 32-bit libraries installed.
Also, 2 GB of RAM per processor is recommended.

AVX and AVX2 are of no importance.

According to BOINCstats, there are Xeons of some type running/used to run tasks here.
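
On Linux, the SSE2 requirement listed above (and whether AVX or AVX2 are present at all) can be checked from the feature flags in /proc/cpuinfo. The following is a minimal sketch, assuming an x86 Linux system where that file contains a "flags" line; the function names cpu_flags and report are made up for illustration and are not part of BOINC or CPDN.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line (for example, a non-x86 kernel)


def report(flags, wanted=("sse2", "avx", "avx2")):
    """Map each feature name of interest to True or False."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # file not available on this platform
    for name, present in report(cpu_flags(text)).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

A "yes" for avx or avx2 only says the processor supports those instructions; it says nothing about whether a given application was compiled to use them.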
ID: 56002
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 56003 - Posted: 3 Apr 2017, 1:40:16 UTC

I doubt that AVX/AVX2 are used when the CPDN models are compiled. A wide range of processor generations runs CPDN tasks, so the project is likely trying to keep the optimizations as consistent as possible across those processors.
ID: 56003
ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,408,433
RAC: 2,038
Message 56005 - Posted: 4 Apr 2017, 9:44:38 UTC

Thanks for the information.
ID: 56005
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 56023 - Posted: 7 Apr 2017, 20:07:33 UTC - in response to Message 56002.  

"According to BOINCstats, there are Xeons of some type running/used to run tasks here."

I have a 4-core 64-bit Xeon processor in my current machine. Unfortunately, it runs at only 1.8 GHz. It turns out that it gets work done faster than my former machine, which had two hyperthreaded 3.06 GHz Xeons. I run Red Hat Enterprise Linux on my machine. I started with RHEL 3, then RHEL 5, and now RHEL 6.9. RHEL 7 has been out for some time, but I have not upgraded. Red Hat supports its releases for 10 years. CentOS distributes an OS that is essentially the same as the RHEL releases, but for free. I ran CentOS 4 on an old machine for a long time; that one had two Intel Pentium 3 processors.
ID: 56023
Venkatesh Srinivas
Joined: 7 May 17
Posts: 16
Credit: 3,480,030
RAC: 2,845
Message 56320 - Posted: 1 Jun 2017, 14:50:24 UTC

I profiled the instruction mix of the wah2rm3m2t_um_8.25_i686-pc-linux-gnu model on a platform with SSE* and AVX. As far as I can tell, it uses a mix of x87 and SSE instructions only.

Substantial time (~5%) is spent in libm's powf(), which uses legacy x87 instructions. Is there some other way the model could do exponentiation? Ditto for log10. (Both are at FP32 precision afaict).

Modern CPUs would prefer (in energy per FLOP) different instructions.
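
For anyone who wants a rough look at the instruction mix of a binary, here is a minimal sketch that buckets the instructions in a disassembly into x87, SSE, AVX and other. It assumes the text comes from `objdump -d`, whose instruction lines are tab-separated (address, bytes, instruction). The classification rules are a crude heuristic chosen for illustration, and this is a static count of instructions, not a profile of where time is spent, so it is not necessarily how the figures above were obtained.

```python
from collections import Counter


def classify(mnemonic, operands):
    """Heuristically bucket one instruction as 'avx', 'sse', 'x87' or 'other'."""
    uses_vector_reg = any(reg in operands for reg in ("xmm", "ymm", "zmm"))
    if mnemonic.startswith("v") and uses_vector_reg:
        return "avx"   # VEX-style mnemonics such as vmulps, vaddsd
    if "xmm" in operands:
        return "sse"   # legacy SSE encodings such as mulss, addpd
    if mnemonic.startswith("f"):
        return "x87"   # fld, fmul, fstp, ... (assumption: 'f' prefix means x87)
    return "other"


def instruction_mix(disassembly):
    """Count instruction classes in `objdump -d` style text."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # symbol labels, blank lines, continuation byte rows
        parts = fields[2].split(None, 1)
        if not parts:
            continue
        mnemonic = parts[0]
        operands = parts[1] if len(parts) > 1 else ""
        counts[classify(mnemonic, operands)] += 1
    return counts
```

Feeding it the saved output of `objdump -d` for the model executable gives a Counter keyed by class. Prefixed instructions (for example `lock` or `rep`) land in "other" because only the first token is inspected.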
ID: 56320
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,884,997
RAC: 4,577
Message 56321 - Posted: 1 Jun 2017, 17:54:50 UTC
Last modified: 1 Jun 2017, 17:58:18 UTC

I have no special knowledge of the scope of the project's software development. However, the usual description is that the core code is the Met Office's FORTRAN source, which is then tailored by the project to the BOINC distributed computing platform and CPDN.

Over the years the project has been less concerned with model performance than might perhaps be expected, which is partly explained by the ensemble method of modelling. Presumably the project would prefer an ensemble (i.e. thousands of models) to yield useful information within the timescale of project funding or a PhD. That ensemble runs on a very unreliable, massively parallel virtual machine (i.e. our computers). Having a fast model will bring forward the point at which the ensemble becomes useful, but so will improving reliability, or splitting the model runs into shorter runs that volunteers are prepared to download.

It would be fascinating to see a report of how the project team responds to the progress of an actual ensemble, from conception to publication.
ID: 56321
pvh
Joined: 9 Apr 14
Posts: 14
Credit: 1,962,018
RAC: 0
Message 56362 - Posted: 11 Jun 2017, 8:49:41 UTC - in response to Message 56320.  

"Substantial time (~5%) is spent in libm's powf(), which uses legacy x87 instructions. Is there some other way the model could do exponentiation? Ditto for log10. (Both are at FP32 precision afaict)."

There are well-known speed issues with the standard Linux math library, especially the single-precision math functions. The stance of the library developers has been clearly stated: they only care about accuracy, not speed. It looks like they made no effort to optimize the single-precision math functions for speed; these are often substantially slower than their double-precision counterparts. It looks like they only cared about ticking off the box "added support for single-precision math functions". That being said, there has been some improvement in more recent math library versions, in part due to complaints from users and to third parties submitting their own (better optimized) versions. My hunch is that CPDN is using a very old math library version. If so, switching to a newer version may already help. Avoiding single-precision math functions will likely help as well.
ID: 56362
Alex Plantema
Joined: 3 Sep 04
Posts: 126
Credit: 26,610,380
RAC: 3,377
Message 56365 - Posted: 11 Jun 2017, 18:47:49 UTC - in response to Message 56362.  

Mathematical functions can easily be calculated to any desired precision using a polynomial or rational function; see J.F. Hart et al., Computer Approximations.
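
As an illustration of the polynomial approach, here is a minimal sketch that approximates exp(x) by reducing the argument to a small interval and evaluating a degree-7 polynomial with Horner's rule. The coefficients are plain Taylor coefficients, not the optimized tables from Hart et al., and the name exp_approx is made up for this example. With the reduced argument bounded by ln(2)/2, the truncation error should be near 1e-8 in relative terms.

```python
import math

LN2 = math.log(2.0)

# Taylor coefficients 1/n! for n = 7 down to 0, highest degree first,
# so that Horner's rule can consume them in order.
COEFFS = [1.0 / math.factorial(n) for n in range(7, -1, -1)]


def exp_approx(x):
    """Approximate exp(x) as 2**k * P(r), where x = k*ln(2) + r."""
    k = round(x / LN2)        # nearest integer multiple of ln(2)
    r = x - k * LN2           # reduced argument, |r| <= ln(2)/2
    p = 0.0
    for c in COEFFS:          # Horner's rule: ((c7*r + c6)*r + ...) + c0
        p = p * r + c
    return math.ldexp(p, k)   # scale by 2**k without calling pow()


if __name__ == "__main__":
    for x in (-5.0, 0.5, 1.0, 10.0):
        print(x, exp_approx(x), math.exp(x))
```

A library-quality routine would use minimax coefficients and a more careful argument reduction; the point here is only that the precision is set by the polynomial degree and the size of the reduced interval.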
ID: 56365
Venkatesh Srinivas
Joined: 7 May 17
Posts: 16
Credit: 3,480,030
RAC: 2,845
Message 56381 - Posted: 14 Jun 2017, 11:42:55 UTC

It looked like a single-precision powf() was used; SSE2 at least can match precision trivially.

I suspect that no one has optimized 32-bit libm for recent processors. 64-bit libm uses MULSS (SSE) on my system instead of x87.
ID: 56381

©2024 cpdn.org