Thread 'Relative performance question.'

Author	Message
Jean-David Beyer Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154	Message 68322 - Posted: 14 Feb 2023, 21:03:20 UTC
I have run some Oifs_ps tasks and some _bl tasks. I know the compute program that does almost all the work is the same, yet they seem to behave differently. In particular, the average processing rate (GFLOPS) is very different. Why is this?

OpenIFS 43r3 Perturbed Surface 1.05 x86_64-pc-linux-gnu
Number of tasks completed: 223
Max tasks per day: 227
Number of tasks today: 0
Consecutive valid tasks: 223
Average processing rate: 28.23 GFLOPS
Average turnaround time: 3.32 days

OpenIFS 43r3 Baroclinic Lifecycle 1.11 x86_64-pc-linux-gnu
Number of tasks completed: 19
Max tasks per day: 13
Number of tasks today: 21
Consecutive valid tasks: 9
Average processing rate: 6.97 GFLOPS
Average turnaround time: 0.72 days

Glenn Carver Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331	Message 70031 - Posted: 9 Nov 2023, 13:16:06 UTC - in response to Message 68322.
The configuration of the two models is different even though it's the same executable.
--- CPDN Visiting Scientist

Dave Jackson Volunteer moderator Joined: 15 May 09 Posts: 4538 Credit: 19,004,017 RAC: 21,574	Message 70032 - Posted: 9 Nov 2023, 14:33:10 UTC
I see it as analogous to the regional models, which share the same executables but can differ in the area covered, the complexity of that area, and the resolution.

Bryn Mawr Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228	Message 70033 - Posted: 9 Nov 2023, 15:30:26 UTC
Presumably the _bl is waiting on memory fetches or disk I/O a lot more than the _ps, which is happily sitting in loops computing and racking up the flops.

Glenn Carver Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331	Message 70034 - Posted: 9 Nov 2023, 16:54:09 UTC - in response to Message 70033.
Quote (Message 70033): "Presumably the _bl is waiting for memory fetch or disk io a lot more than the _ps which is happily sitting in loops computing and racking up the flops."
It has nothing to do with memory or I/O. As I said previously, they are two very different model configurations. The 'BL' app runs an idealised planet with no land, so none of the land surface process code in the model runs. The PS app is a normal model forecast but with perturbed parameters, which can give a different execution time for each individual forecast.
--- CPDN Visiting Scientist

Richard Haselgrove Joined: 1 Jan 07 Posts: 1061 Credit: 36,700,823 RAC: 9,977	Message 70035 - Posted: 9 Nov 2023, 18:11:26 UTC
It's still an odd observation, though. The 'number of floating point operations completed per second' should be, to a first approximation, pretty much a constant for any given CPU. Other CPUs, with different architectures, speeds, heat dissipation etc. will differ.

Jean-David's figures look as if they've been copied from the 'Application details' page on this website for his host: that figure is maintained by the server, and I think it's usually the average of the last 100 tasks. What I can't remember offhand is where and how the original 'figures to be averaged' are derived. BOINC in general doesn't make any attempt to count the number of FPOPs performed: the figure will be calculated from some combination of task size, CPU time, CPU benchmark speed, and (in some circumstances) the credit granted by the project.

The figure for any given host/app/task isn't really significant, but the figure for the BOINC platform as a whole, across all hosts and projects, does matter. It's used to demonstrate the power of the BOINC system to scientific researchers seeking to harness the resources made available through the platform, and also to potential funding providers.
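To illustrate the point above that the rate is derived rather than counted, here is a minimal Python sketch. It assumes, purely for illustration, that the per-task figure is an estimated operation count divided by the elapsed time, and that the server averages those figures. The function names, the estimate values and the run times are all hypothetical and are not taken from the BOINC source or from Jean-David's host.

```python
from statistics import mean

GIGA = 1e9

def apparent_gflops(fpops_est, elapsed_seconds):
    """Rate implied by an ESTIMATED operation count, not a measured one."""
    return fpops_est / elapsed_seconds / GIGA

def average_apparent_gflops(tasks):
    """Average the per-task apparent rates (assumed averaging scheme)."""
    return mean([apparent_gflops(f, t) for f, t in tasks])

# Hypothetical tasks on ONE host: (estimated FPOPs, elapsed seconds).
# The CPU is the same in both cases; only the estimate and run time differ.
app_a_tasks = [(2.8e15, 100_000.0)] * 3
app_b_tasks = [(7.0e13, 10_000.0)] * 3

if __name__ == "__main__":
    print(f"app A: {average_apparent_gflops(app_a_tasks):.2f} GFLOPS")
    print(f"app B: {average_apparent_gflops(app_b_tasks):.2f} GFLOPS")
```

Under these assumptions, two apps on the same CPU can show very different rates simply because the task size estimate is not in the same proportion to the real work for each app.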

Glenn Carver Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331	Message 70037 - Posted: 9 Nov 2023, 20:48:01 UTC - in response to Message 70035.
Richard, I noted some time ago that the flops/sec figure given for my machines is nonsense. The values are not in proportion to the true CPU performance. I have long suspected that this code is broken in BOINC.

Bryn Mawr Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228	Message 70038 - Posted: 10 Nov 2023, 5:44:25 UTC - in response to Message 70034.
Quote (Message 70033): "Presumably the _bl is waiting for memory fetch or disk io a lot more than the _ps which is happily sitting in loops computing and racking up the flops."
Quote (Message 70034): "Nothing to do with memory nor IO. As I said previously they are two very different model configurations. The 'BL' app is running an idealised planet with no land, so all the land surface process code in the model does not run. The PS app is a normal model forecast but with perturbed parameters which potentially gives a different execution time for each individual forecast."
It does not matter what a given CPU is crunching on: it will crunch at the same rate unless it is doing something other than crunching. I was trying to work out what that something might be.

Glenn Carver Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331	Message 70039 - Posted: 10 Nov 2023, 10:23:16 UTC - in response to Message 70038. Last modified: 10 Nov 2023, 10:36:35 UTC
Quote (Message 70038): "It does not matter what a given CPU is crunching on, it will crunch at the same rate unless it is doing something other than crunch, I was trying to work out what that might be."
That's not quite accurate. What the code is executing does matter for compute performance. Some parts of the model code execute large loops of triads (e.g. w = x + y*z), which make good use of the vector instructions and floating point units, giving high flops per instruction. However, other parts of the code, particularly in the physical parameterizations, have to execute lots of conditional branches (e.g. is there a cloud? is there sunlight? is the land desert/grass/tree? etc.). Yes, the chip will be using lookahead, but overall the 'crunch rate' will be much lower. So if we are running an aquaplanet simulation (which is what the BL app configuration uses), there will be no 'land processes' to simulate and the compute performance will be different from a normal forecast. Hope that helps.
--- CPDN Visiting Scientist
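The argument above can be put into rough arithmetic. The Python sketch below assumes a run is made up of phases, each with its own amount of floating point work and its own sustained rate, and computes the overall rate as total work divided by total time. The two rates and both workload mixes are hypothetical round numbers chosen for illustration; they are not measurements of OpenIFS, and neither mix is meant to represent the PS or BL app specifically.

```python
def effective_gflops(phases):
    """Overall rate for a run made of (gflop_of_work, gflops_rate) phases.

    Total work divided by total time, so slow phases weigh more heavily
    than a simple average of the rates would suggest.
    """
    total_work = sum(work for work, _ in phases)
    total_time = sum(work / rate for work, rate in phases)
    return total_work / total_time

# Hypothetical sustained rates on one CPU (illustrative only).
VECTOR_RATE = 40.0   # GFLOPS in long vectorised triad loops
BRANCHY_RATE = 5.0   # GFLOPS in branch-heavy code

# Two hypothetical workload mixes with the same total work (100 GFLOP).
mostly_vector = [(90.0, VECTOR_RATE), (10.0, BRANCHY_RATE)]
more_branchy = [(50.0, VECTOR_RATE), (50.0, BRANCHY_RATE)]

if __name__ == "__main__":
    print(f"mostly vector: {effective_gflops(mostly_vector):.2f} GFLOPS")
    print(f"more branchy:  {effective_gflops(more_branchy):.2f} GFLOPS")
```

With these made-up numbers the same CPU delivers a noticeably different overall rate for the two mixes, which is the sense in which the 'crunch rate' depends on what the code is doing.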