climateprediction.net (CPDN) home page
Thread 'Relative performance question.'

Message boards : Number crunching : Relative performance question.

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 68322 - Posted: 14 Feb 2023, 21:03:20 UTC

I have run some Oifs_ps tasks and some _bl tasks. I know the executable that does almost all the work is the same, yet the two seem to behave differently. In particular, the Average processing rate (GFLOPS) is very different: 28.23 for _ps against 6.97 for _bl. Why is this?

OpenIFS 43r3 Perturbed Surface 1.05 x86_64-pc-linux-gnu
Number of tasks completed 	223
Max tasks per day 	227
Number of tasks today 	0
Consecutive valid tasks 	223
Average processing rate 	28.23 GFLOPS
Average turnaround time 	3.32 days

OpenIFS 43r3 Baroclinic Lifecycle 1.11 x86_64-pc-linux-gnu
Number of tasks completed 	19
Max tasks per day 	13
Number of tasks today 	21
Consecutive valid tasks 	9
Average processing rate 	6.97 GFLOPS
Average turnaround time 	0.72 days

ID: 68322
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 70031 - Posted: 9 Nov 2023, 13:16:06 UTC - in response to Message 68322.  

The configuration of the two models is different even though it's the same executable.
---
CPDN Visiting Scientist
ID: 70031
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 70032 - Posted: 9 Nov 2023, 14:33:10 UTC

I see it as analogous to the regional models, which share the same executables but can differ in area covered, complexity of the areas, and resolution.
ID: 70032
Bryn Mawr

Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 70033 - Posted: 9 Nov 2023, 15:30:26 UTC

Presumably the _bl is waiting for memory fetch or disk io a lot more than the _ps which is happily sitting in loops computing and racking up the flops.
ID: 70033
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 70034 - Posted: 9 Nov 2023, 16:54:09 UTC - in response to Message 70033.  

> Presumably the _bl is waiting for memory fetch or disk io a lot more than the _ps which is happily sitting in loops computing and racking up the flops.
Nothing to do with memory nor IO. As I said previously they are two very different model configurations. The 'BL' app is running an idealised planet with no land, so all the land surface process code in the model does not run. The PS app is a normal model forecast but with perturbed parameters which potentially gives a different execution time for each individual forecast.
---
CPDN Visiting Scientist
ID: 70034
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,716,561
RAC: 8,355
Message 70035 - Posted: 9 Nov 2023, 18:11:26 UTC

It's still an odd observation, though. The 'number of floating point operations completed per second' should be, to a first approximation, pretty much a constant for any given CPU. Other CPUs, with different architectures, speeds, heat dissipation etc. will differ.

Jean-David's figures look as if they've been copied from the 'Application details' page on this website for his host: that figure is maintained by the server, and I think it's usually the average of the last 100 tasks. What I can't remember offhand is where and how the original 'figures to be averaged' are derived. BOINC in general doesn't make any attempt to count the number of FPOPs performed: it will be calculated from some combination of task size, CPU time, CPU benchmark speed, and (in some circumstances) the credit granted by the project.
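To make that concrete, here is a minimal sketch of how such a derived rate could differ between two apps on the same CPU. It assumes, purely for illustration, that the rate is a task's estimated operation count divided by its elapsed time, averaged over recent tasks. The function names and sample numbers are hypothetical; they are not taken from BOINC's source or from any real host.

```python
def apparent_gflops(estimated_fpops, elapsed_seconds):
    """Rate implied by an *estimated* operation count, not a measured one."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return estimated_fpops / elapsed_seconds / 1e9


def average_rate(tasks):
    """Mean apparent GFLOPS over (estimated_fpops, elapsed_seconds) pairs."""
    rates = [apparent_gflops(fpops, secs) for fpops, secs in tasks]
    return sum(rates) / len(rates)


# Hypothetical numbers: the same CPU runs both apps, but the size estimate
# attached to app B's tasks is small relative to the time they take.
app_a = [(4.0e14, 20000.0), (4.0e14, 20000.0)]
app_b = [(1.0e13, 2000.0), (1.0e13, 2000.0)]

print(average_rate(app_a))  # 20.0
print(average_rate(app_b))  # 5.0
```

Under this assumption, the reported figure says as much about the accuracy of the task-size estimate as it does about the hardware, which would be one way two apps on one machine end up with very different rates.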

The figure for any given host/app/task isn't really significant, but the figure for the BOINC platform as a whole, across all hosts and projects, does matter. It's used to demonstrate the power of the BOINC system to scientific researchers seeking to harness the resources made available through the platform, and also to potential funding providers.
ID: 70035
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 70037 - Posted: 9 Nov 2023, 20:48:01 UTC - in response to Message 70035.  

Richard, I noted some time ago that the flops figure given for my machines is nonsense. The values across machines are not in proportion to their true CPU performance. I have long suspected that this code is broken in BOINC.
ID: 70037
Bryn Mawr

Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 70038 - Posted: 10 Nov 2023, 5:44:25 UTC - in response to Message 70034.  

>> Presumably the _bl is waiting for memory fetch or disk io a lot more than the _ps which is happily sitting in loops computing and racking up the flops.
> Nothing to do with memory nor IO. As I said previously they are two very different model configurations. The 'BL' app is running an idealised planet with no land, so all the land surface process code in the model does not run. The PS app is a normal model forecast but with perturbed parameters which potentially gives a different execution time for each individual forecast.


It does not matter what a given CPU is crunching on; it will crunch at the same rate unless it is doing something other than crunching. I was trying to work out what that might be.
ID: 70038
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,476,460
RAC: 15,681
Message 70039 - Posted: 10 Nov 2023, 10:23:16 UTC - in response to Message 70038.  
Last modified: 10 Nov 2023, 10:36:35 UTC

> It does not matter what a given CPU is crunching on; it will crunch at the same rate unless it is doing something other than crunching. I was trying to work out what that might be.
That's not quite accurate. What the code is executing does matter for compute performance. Some parts of the model code execute large loops of triads (e.g. w=x+y*z), which make good use of the vector instructions and floating point units, giving high flops per instruction. Other parts of the code, particularly in the physical parameterizations, have to execute lots of conditional branches (e.g. is there a cloud? is there sunlight? is the land desert/grass/tree? etc.). Yes, the chip will be using lookahead, but overall the 'crunch rate' in those parts will be much lower.

So if we are running an aquaplanet simulation (which is what the BL app configuration uses), there will be no 'land processes' to simulate and the compute performance will be different from a normal forecast.
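The two code shapes can be sketched as follows. This is a toy illustration only: `toy_surface_step` and its constants are hypothetical and are not OpenIFS code, and Python will not reproduce the hardware effect itself. It only shows that a triad does a fixed amount of arithmetic per element, while a branchy routine does an amount that depends on the data, so an all-ocean grid skips the land work entirely.

```python
def triad(x, y, z):
    """w = x + y*z: exactly 2 floating-point ops per element, no branches."""
    return [xi + yi * zi for xi, yi, zi in zip(x, y, z)]


def toy_surface_step(temp, is_land, has_cloud):
    """Hypothetical stand-in for a physical parameterization.

    Returns the updated values and the number of floating-point ops
    actually performed, which depends on the input data.
    """
    out, flops = [], 0
    for t, land, cloud in zip(temp, is_land, has_cloud):
        if land:                 # land-surface branch
            t = t * 0.5 + 1.5    # 2 ops
            flops += 2
            if cloud:
                t -= 0.5         # 1 op
                flops += 1
        elif cloud:              # cloudy ocean point
            t -= 0.25            # 1 op
            flops += 1
        out.append(t)            # clear-sky ocean point: no arithmetic
    return out, flops


temp = [10.0, 10.0, 10.0, 10.0]
cloud = [True, False, True, False]

# Mixed land/ocean grid versus an all-ocean "aquaplanet" grid.
_, mixed_flops = toy_surface_step(temp, [True, True, False, False], cloud)
_, aqua_flops = toy_surface_step(temp, [False] * 4, cloud)
print(mixed_flops, aqua_flops)  # 6 2
```

The same four grid points cost 6 operations on the mixed grid and 2 on the aquaplanet grid, so the mix of work per time step, and with it the achieved rate, differs between the two configurations.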

Hope that helps.
---
CPDN Visiting Scientist
ID: 70039

©2024 cpdn.org