climateprediction.net (CPDN) home page
Thread 'Credit_per_cpu_second efficiency measure'


Message boards : Number crunching : Credit_per_cpu_second efficiency measure

old_user677346

Joined: 16 Apr 12
Posts: 6
Credit: 19,102
RAC: 0
Message 45297 - Posted: 2 Dec 2012, 10:48:55 UTC

I'd like to calculate a rough but broad-based estimate of CPU performance and efficiency using CPDN data on the tens of thousands of active and recently-active hosts.

This was my plan:
1) download the stats/host.xml file
2) extract total_credit, p_vendor, p_model, os_name, os_version, n_cpus, credit_per_cpu_second, m_nbytes
3) add in processor-specific data like cache size, bus speed, TDP, etc. by matching p_model/p_vendor to some database (like Wikipedia?)
4) calculate mean credit_per_cpu_second by processor model and speed (multiplied by n_cpus), weighted by total_credit
5) possibly control for cache size, bus speed, OS, RAM
6) use TDP numbers to calculate crude measure of calculation/wattage efficiency
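
Steps 1, 2 and 4 above could be sketched roughly as follows in Python. This assumes host.xml is a flat list of <host> elements whose child tags use the field names listed in step 2; that layout and the sample values are assumptions for illustration, not taken from the real CPDN export. Hosts with a zero rate or zero credit are skipped.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample data; the real stats/host.xml layout and tag names may differ.
SAMPLE_XML = """<hosts>
  <host><p_vendor>GenuineIntel</p_vendor><p_model>CPU A</p_model><n_cpus>4</n_cpus>
    <total_credit>1000</total_credit><credit_per_cpu_second>0.002</credit_per_cpu_second></host>
  <host><p_vendor>GenuineIntel</p_vendor><p_model>CPU A</p_model><n_cpus>4</n_cpus>
    <total_credit>3000</total_credit><credit_per_cpu_second>0.004</credit_per_cpu_second></host>
  <host><p_vendor>GenuineIntel</p_vendor><p_model>CPU B</p_model><n_cpus>2</n_cpus>
    <total_credit>500</total_credit><credit_per_cpu_second>0.001</credit_per_cpu_second></host>
  <host><p_vendor>GenuineIntel</p_vendor><p_model>CPU B</p_model><n_cpus>2</n_cpus>
    <total_credit>200</total_credit><credit_per_cpu_second>0.0</credit_per_cpu_second></host>
</hosts>"""


def weighted_rate_by_model(xml_text):
    """Return {(vendor, model): mean of credit_per_cpu_second * n_cpus, weighted by total_credit}."""
    weighted_sum = defaultdict(float)
    weight = defaultdict(float)
    for host in ET.fromstring(xml_text).iter("host"):
        rate = float(host.findtext("credit_per_cpu_second", "0"))
        credit = float(host.findtext("total_credit", "0"))
        if rate <= 0 or credit <= 0:
            continue  # no usable CPU-time measure for this host
        key = (host.findtext("p_vendor", "").strip(),
               host.findtext("p_model", "").strip())
        n_cpus = int(host.findtext("n_cpus", "1"))
        weighted_sum[key] += credit * rate * n_cpus
        weight[key] += credit
    return {key: weighted_sum[key] / weight[key] for key in weighted_sum}


if __name__ == "__main__":
    for key, rate in sorted(weighted_rate_by_model(SAMPLE_XML).items()):
        print(key, rate)
```

For the full file, ET.iterparse would probably be a better fit than loading everything at once, since the export covers tens of thousands of hosts.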


A couple of hangups, though:
a) a lot of hosts in the host.xml file list credit_per_cpu_second as 0.000000000 even though the host does have credit registered to it. I guess I'll have to throw these out - I traced a couple and found that the tasks for the hosts have disappeared, so there is no measure of cpu seconds. Is there any reason for this disappearance that might affect the statistical validity of these calculations?
b) Is credit comparable across models? If not, I'll need a way to discern which model a host's credit is attributable to, and then I'll have to split the calculations by model.
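
One rough way to check whether throwing out the zero-rate hosts in (a) could skew the results is to see how unevenly they are spread across processor models. A minimal sketch, assuming each host has already been read into a dict keyed by the field names above (the record format is hypothetical):

```python
from collections import Counter


def zero_rate_share(hosts):
    """Return {p_model: fraction of credited hosts whose credit_per_cpu_second is 0}."""
    credited = Counter()
    zero_rate = Counter()
    for host in hosts:
        if host["total_credit"] <= 0:
            continue  # hosts without credit are not part of the question
        credited[host["p_model"]] += 1
        if host["credit_per_cpu_second"] == 0:
            zero_rate[host["p_model"]] += 1
    return {model: zero_rate[model] / credited[model] for model in credited}
```

If the share differs a lot between models, dropping those hosts removes more data for some processors than for others, which would be worth noting alongside the final numbers.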

Any thoughts are greatly appreciated

Philip
ID: 45297
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,011,472
RAC: 21,368
Message 45298 - Posted: 2 Dec 2012, 16:49:45 UTC - in response to Message 45297.  

Another variable is the operating system. I seem to remember reading that tasks run more efficiently on Windows machines. I don't know how significant this is, or whether you would accept it and assume that the distribution of OS types is the same across all processor types.
ID: 45298
Bonsai911

Joined: 9 Sep 04
Posts: 228
Credit: 30,750,791
RAC: 3,898
Message 45299 - Posted: 2 Dec 2012, 17:48:22 UTC

I guess the OS-dependent speed of calculation also varies by model type. Some model types run better on Linux, others on Windows.

greetings from hamburg

bonsai911
ID: 45299
astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 45300 - Posted: 2 Dec 2012, 18:10:50 UTC

... and there is the compiler issue: Intel CPUs fare better with the Intel compiler.

Carl, former lead developer for CPDN, ran such a CPU comparison several years ago. Does anyone have a copy -- or recall any conclusions? (My copy was in Linux -- and I've since sent Linux to the bit-bucket.)
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 45300
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 45301 - Posted: 2 Dec 2012, 21:15:20 UTC
Last modified: 2 Dec 2012, 21:16:51 UTC

... found that the tasks for the hosts have disappeared ...

This project is different to most others, in that credit is based on the return of "trickle" data, and not given as a lump sum on completion of a model.
However, soon after the conversion from a pre-BOINC system to BOINC, it was found that BOINC had difficulty with this, and would occasionally also allocate credit to some work on completion.

So crediting was changed to a script that ran through all work at short intervals, recalculating credit as it went. But as the returned results rapidly increased, this started to take up too much time. So the script was only run twice a day, and then once per day.

But even this was taking up hours of server time, so it was decided to make a cut-off point, calculate the credit up to then, store these values, and archive the results elsewhere. Then only the remaining results would be rescanned each day, and the stored credit values added to the credits from the daily scans to produce a total credit.
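
As a rough illustration of that bookkeeping (not the actual server script; the data shapes and names here are assumptions), the daily total is the stored archived credit plus whatever the rescan of the remaining results adds up to:

```python
def total_credit_by_host(archived, active_results):
    """Combine stored archived credit with credit from the results still being rescanned.

    archived: {host_id: credit stored at the cut-off}  (hypothetical shape)
    active_results: iterable of (host_id, [credit per returned trickle])  (hypothetical shape)
    """
    totals = dict(archived)  # start from the stored values; these results are never rescanned
    for host_id, trickle_credits in active_results:
        totals[host_id] = totals.get(host_id, 0.0) + sum(trickle_credits)
    return totals
```

A host whose results were all archived still shows credit here, but has no remaining results from which CPU seconds could be read.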

This is the reason for the missing results that you mention, and if you look at your Account page, you'll see 2 lines not far from the top that say Archived.


Is credit comparable across models?

Yes and no.
Credit per trickle is different for each type of model, and depends on the amount of time taken by that model type to complete a given interval.
The Coupled Ocean models, for instance, are more FPU-intensive than the Regional models.
But an attempt is made to make the "credits per amount of work" comparable across all models.
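
To make the per-trickle idea concrete, here is a small sketch that splits credit by model type. The per-trickle values are made up for illustration; the real CPDN figures are not given in this thread.

```python
# Hypothetical per-trickle credit values, for illustration only.
CREDIT_PER_TRICKLE = {"coupled_ocean": 800.0, "regional": 300.0}


def task_credit(model_type, trickles_returned):
    """Credit for one task: a fixed amount per trickle returned, set per model type."""
    return CREDIT_PER_TRICKLE[model_type] * trickles_returned


def credit_by_model_type(tasks):
    """Sum credit per model type from (model_type, trickles_returned) pairs."""
    totals = {}
    for model_type, trickles in tasks:
        totals[model_type] = totals.get(model_type, 0.0) + task_credit(model_type, trickles)
    return totals
```

Splitting a host's credit this way would need per-task data, since host.xml only carries the combined total.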
Backups: Here
ID: 45301
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 45302 - Posted: 2 Dec 2012, 21:33:10 UTC

And another thing ...

The work in this project isn't intended to run to completion at all times.
Each model will only run for as long as the many variables produce a stable climate system. If the starting values are such that a model becomes unstable, the work will be terminated.
That is the reason for trickles: small amounts of data get returned via them, and the researchers can tell roughly where the model crashed by where the data stops being sent back.

So the listed credit may correspond to a different number of trickles, i.e. a different distance through the model, from one task to the next.


Backups: Here
ID: 45302
