climateprediction.net home page
Possibly Optimized Linux model to download for Beta Testers

old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1433 - Posted: 22 Aug 2004, 0:27:21 UTC
Last modified: 22 Aug 2004, 17:47:36 UTC

[original message edited out by CC -- "alternative" UM isn't so great]
ID: 1433
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 1439 - Posted: 22 Aug 2004, 1:32:59 UTC
Last modified: 22 Aug 2004, 3:27:44 UTC

Hi, Carl,

Second attempt on Bbox went okay. No clue as to why ..._um errored first time. Another D/L & install did the trick.

First attempt on Abox went okay.

Jim

________________________________________________
Video meliora, proboque; Deteriora sequor
I see the better way, and approve it; I follow the worse
-- Ovid (43BC-17AD)
ID: 1439
Desti
Joined: 6 Aug 04
Posts: 124
Credit: 9,195,838
RAC: 0
Message 1440 - Posted: 22 Aug 2004, 1:35:12 UTC

I ran it for a few minutes on my machines: Athlon XP @ 1200 MHz, phase 1, old avg 4.36, new 4.36.
Athlon 64 3000+, phase 2, old 2.40, new 2.40.

Was that too short to see any differences?
_____
Linux Users Everywhere @ climateprediction.net (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=43)
ID: 1440
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2185
Credit: 64,822,615
RAC: 5,275
Message 1442 - Posted: 22 Aug 2004, 1:49:34 UTC - in response to Message 1440.  
Last modified: 22 Aug 2004, 1:50:04 UTC

> I ran it for a few minutes on my machines: Athlon XP @ 1200 MHz, phase 1, old
> avg 4.36, new 4.36.
> Athlon 64 3000+, phase 2, old 2.40, new 2.40.
>
> Was that too short to see any differences?

This brings up the question of how the sec/ts is calculated.

It seems to be a long term average as opposed to something over the last several minutes. Is this the case?

I just ask because on the "classic" client, the sec/ts would invariably change over the course of a model year (1.98 to 2.06 for example), whereas in the BOINC version, it might change from 2.11 to 2.12 over the course of several years.
ID: 1442
old_user194
Joined: 5 Aug 04
Posts: 63
Credit: 21,399,117
RAC: 0
Message 1447 - Posted: 22 Aug 2004, 7:27:26 UTC
Last modified: 22 Aug 2004, 7:33:54 UTC

First results.
First trickle just uploaded.
It is not *obviously* faster than the previous version (lucky I kept a copy): AVG is still 4.58 (Athlon 2200+ running at 1.8 GHz, 256 MB RAM, SuSE 9.0, KDE). But as the TS is currently 237000-odd, the law of large numbers means that it would have to be dramatically faster, or slightly slower, to have changed the AVG in the couple of hours I've been running it. Looking at the DLT entry suggests that it might even be very slightly slower; the 'radiation period' step now more often shows 18.xx sec mumble mumble why does it keep trimming my post?
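To put rough numbers on the law-of-large-numbers point, here is a minimal sketch. It assumes the displayed AVG is a plain cumulative mean over every timestep since the run began, which the thread does not confirm. The 1,600 new timesteps and the 3.6 sec/TS rate are made-up illustration values, not measurements.

```python
def blended_average(old_steps, old_avg, new_steps, new_rate):
    """Cumulative mean sec/TS after new_steps more timesteps at new_rate."""
    total_seconds = old_steps * old_avg + new_steps * new_rate
    return total_seconds / (old_steps + new_steps)

# 237,000 timesteps at 4.58 sec/TS, then a hypothetical 1,600 timesteps
# (a bit under two hours of running) at a hypothetical 3.6 sec/TS.
avg = blended_average(237_000, 4.58, 1_600, 3.6)
print(f"{avg:.4f}")
```

Under those assumptions, even a build that needs over 20% less time per timestep shifts the cumulative figure by less than 0.01 in that time.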
ID: 1447
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1451 - Posted: 22 Aug 2004, 8:52:07 UTC - in response to Message 1447.  

it would be too short to use the "avg sec / ts" if you're far into a run, but if you wait until two trickles have gone through you can calculate it by hand from the trickle info (i.e. second trickle with the new UM minus first trickle with the new UM)
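A minimal sketch of that by-hand calculation, assuming each trickle reports a cumulative timestep count and a cumulative CPU time in seconds. The field names `timestep` and `cpu_seconds` and the sample numbers are hypothetical; use whatever your trickle info actually shows.

```python
from dataclasses import dataclass

@dataclass
class Trickle:
    timestep: int        # cumulative timesteps completed (hypothetical field)
    cpu_seconds: float   # cumulative CPU time in seconds (hypothetical field)

def sec_per_ts(first: Trickle, second: Trickle) -> float:
    """Average seconds per timestep between two trickles."""
    steps = second.timestep - first.timestep
    if steps <= 0:
        raise ValueError("second trickle must be later than the first")
    return (second.cpu_seconds - first.cpu_seconds) / steps

# Made-up sample values, both taken after switching to the new UM.
first = Trickle(timestep=240_000, cpu_seconds=1_000_000.0)
second = Trickle(timestep=250_000, cpu_seconds=1_030_000.0)
rate = sec_per_ts(first, second)
print(f"{rate:.2f} sec/ts")
```

Because both trickles come from the new UM, the result is not diluted by the timesteps run on the old build.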

ID: 1451
old_user147
Joined: 5 Aug 04
Posts: 30
Credit: 422,225
RAC: 0
Message 1457 - Posted: 22 Aug 2004, 9:49:07 UTC

Incredible, Carl! Good work!

On my P4 2 GHz the sec/ts went down from 3.4 to 2.67. This is 27% faster!
I hope it calculates correctly.
What have you done with the compiler?
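For anyone checking the arithmetic, a small sketch of the two usual ways to state this change. The function names are made up for illustration; the inputs are the 3.4 and 2.67 sec/ts figures above.

```python
def throughput_gain(old_sec_per_ts, new_sec_per_ts):
    """Fractional increase in timesteps completed per second."""
    return old_sec_per_ts / new_sec_per_ts - 1.0

def time_saved(old_sec_per_ts, new_sec_per_ts):
    """Fractional reduction in seconds spent per timestep."""
    return 1.0 - new_sec_per_ts / old_sec_per_ts

print(f"{throughput_gain(3.4, 2.67):.0%} more timesteps per second")
print(f"{time_saved(3.4, 2.67):.0%} less time per timestep")
```

The 27% figure is the throughput gain; the same change is roughly a 21% cut in time per timestep.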
ID: 1457
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1462 - Posted: 22 Aug 2004, 10:54:23 UTC - in response to Message 1457.  

> Incredible, Carl! Good work!
> On my P4 2 GHz the sec/ts went down from 3.4 to 2.67. This is 27% faster!
> I hope it calculates correctly.
> What have you done with the compiler?

hopefully someone with the new model will finish a phase to verify that it keeps the calculations correct (I believe it will, but you never know with this "sensitive" application). I'm using these settings now:

FFLAGS = -noreentrancy -nothreads -Vaxlib -static -static-libcxa -cm -w90 -w95 -tpp7 -tune -axW -unroll -lowercase -vms -nofor_main

It should still run on anything, but it vectorizes/parallelizes some code and optimizes loops. I imagine it's probably only really noticeable on P4s, since Intel's compilers seem to do a good job of not letting AMDs take advantage even when they are wholly compatible ops!

ID: 1462
old_user300
Joined: 6 Aug 04
Posts: 7
Credit: 147,277
RAC: 0
Message 1464 - Posted: 22 Aug 2004, 11:25:34 UTC

What version of the compiler are you using? It looks from the replies so far as though the speed-up only occurs with Intel kit.

IFC 7.1.040 introduced the 'Genuine Intel' bug/feature, which deliberately unset the K and W flags for non-Intel kit. This has apparently been fixed in the 8.0 versions, and there is a patch to libirc.a to circumvent it.

See
http://softwareforums.intel.com/ids/board/message?board.id=11&message.id=1574&view=by_date_ascending&page=1


ID: 1464
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1465 - Posted: 22 Aug 2004, 11:35:03 UTC - in response to Message 1464.  
Last modified: 22 Aug 2004, 12:01:49 UTC

hmm yeah, well that refers to a runtime bug that 8.0 fixed, but it's given me an idea: try the -xK flag so that P3 optimizations are always forced on. That would mean you need a P3 to run CPDN, which probably isn't a bad cutoff since it's going to be pathetic on a P1 or P2 (i.e. take 9 months to run a model).

see the original post: the zip now contains a model that forces P3 opts (so it should speed up on AMDs as well as Pentiums)


ID: 1465
tullio
Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 1467 - Posted: 22 Aug 2004, 12:04:20 UTC - in response to Message 1465.  

> hmm yeah, well that refers to a runtime bug that 8.0 fixed, but it's given me
> an idea: try the -xK flag so that P3 optimizations are always forced on. That
> would mean you need a P3 to run CPDN, which probably isn't a bad cutoff since
> it's going to be pathetic on a P1 or P2 (i.e. take 9 months to run a model).
The new model runs on my SuSE 9.1 with no problem. I am a little discouraged since I have only a Pentium II CPU. My average s/TS is 17.18 s. How many years will it have to run before completion?
ID: 1467
old_user147
Joined: 5 Aug 04
Posts: 30
Credit: 422,225
RAC: 0
Message 1469 - Posted: 22 Aug 2004, 12:16:53 UTC - in response to Message 1467.  

> The new model runs on my SuSE 9.1 with no problem. I am a little discouraged
> since I have only a Pentium II CPU. My average s/TS is 17.18 s. How many years
> will it have to run before completion?

After 155 CPU-days, a little more than 5 months, it should be complete.
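The estimate can be reproduced with a short sketch. The thread never states how many timesteps a full model has, so the count below is back-solved from the 155 CPU-day figure and should be treated as an assumption. The function names are likewise made up for illustration.

```python
SECONDS_PER_DAY = 86_400

def implied_timesteps(sec_per_ts, cpu_days):
    """Timestep count implied by a run length and a sec/ts rate."""
    return cpu_days * SECONDS_PER_DAY / sec_per_ts

def cpu_days_needed(sec_per_ts, total_timesteps):
    """CPU-days needed to finish total_timesteps at a given sec/ts rate."""
    return sec_per_ts * total_timesteps / SECONDS_PER_DAY

# Back-solve the model length from the 17.18 sec/ts and 155 CPU-day figures.
total = implied_timesteps(17.18, 155)
print(f"implied timesteps: {total:,.0f}")

# Reuse that assumed length for another rate quoted in this thread.
print(f"at 3.4 sec/ts: {cpu_days_needed(3.4, total):.1f} CPU-days")
```

These are CPU-days, so a machine that is not crunching around the clock will take longer in calendar time.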
ID: 1469
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1470 - Posted: 22 Aug 2004, 12:18:12 UTC - in response to Message 1467.  
Last modified: 22 Aug 2004, 12:18:35 UTC

Hi, did you just download the new model? According to Intel it shouldn't even run on a PII (since I force P3 procs with the -xK flag). But 17 seconds per timestep makes sense for a PII, which is why I was planning to "force" P3s, since the model takes months even on a P3. At 17 sec/ts that's over 5 months, so it may be best for P2 users to run BOINC with SETI & predictor.
ID: 1470
tullio
Joined: 6 Aug 04
Posts: 264
Credit: 965,476
RAC: 0
Message 1473 - Posted: 22 Aug 2004, 12:58:53 UTC - in response to Message 1470.  

> Hi, did you just download the new model? According to Intel it shouldn't even
> run on a PII (since I force P3 procs with the -xK flag). But 17 seconds per
> timestep makes sense for a PII, which is why I was planning to "force" P3s,
> since the model takes months even on a P3. At 17 sec/ts that's over 5 months,
> so it may be best for P2 users to run BOINC with SETI & predictor.

Yes, I downloaded the new model. I was running seti@home, but that program is full of problems and is down more often than it is running, so I shifted to climate. I may go back when it starts running again.
ID: 1473
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1478 - Posted: 22 Aug 2004, 13:11:19 UTC - in response to Message 1473.  
Last modified: 22 Aug 2004, 13:13:39 UTC

OK, well I'm sure we'll have our own share of problems; that's the nice thing about BOINC: when one project is down you can get work from another. According to the Intel manual it shouldn't run on a P2 with the settings I used! But if it does, and it optimizes Pentiums & AMDs, so much the better!

I see pretty dramatic performance increases with this build (3 sec to 2.4 sec on a Pentium 4 on Linux; 2.5 seconds to 2.2 seconds on my AMD64 in Windows); so hopefully it doesn't mess up the model calcs.

If anyone is near a "phase change" (i.e. near 33.33%, 66.66%, or completion) and is trying out this optimized UM please let me know, as I would like to get the *.gmts.* and *.rmts.* files in your dataout dir to see if the calcs are sensible.
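For testers gathering those files, a minimal sketch. It relies only on the two glob patterns above; the `dataout` path is a placeholder for wherever your client keeps that directory, and the function name is made up.

```python
from pathlib import Path

def find_phase_files(dataout_dir):
    """Return sorted paths matching *.gmts.* or *.rmts.* in dataout_dir."""
    root = Path(dataout_dir)
    matches = set(root.glob("*.gmts.*")) | set(root.glob("*.rmts.*"))
    return sorted(matches)

if __name__ == "__main__":
    # Placeholder path: point this at your own dataout directory.
    for path in find_phase_files("dataout"):
        print(path)
```

The search is not recursive, so files in subdirectories of `dataout` are not listed.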

ID: 1478
old_user300
Joined: 6 Aug 04
Posts: 7
Credit: 147,277
RAC: 0
Message 1483 - Posted: 22 Aug 2004, 13:52:34 UTC
Last modified: 22 Aug 2004, 14:09:00 UTC

> Note: for "advanced" users only -- you may crash your current run!

Well, I can't say I wasn't warned!

I shut down the client, which was running two phase 3 WUs, then copied and chmoded the new executable.

I restarted the client and it got as far as Resuming CPDN for the two WUs, and then the two runs zombied!

I am sad!

I have now restored the project from backup and restarted with the old executable. So far so good. But it looks like the new one doesn't like Opterons.
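For anyone repeating this kind of swap, a minimal sketch of a back-up-first routine. It is not an official CPDN procedure: the function names and every path are hypothetical, and the client should be shut down before any files are touched.

```python
import os
import shutil
import stat

def backup_project(project_dir, backup_dir):
    """Copy the whole project directory; backup_dir must not already exist."""
    shutil.copytree(project_dir, backup_dir)

def install_executable(new_exe, installed_exe):
    """Copy new_exe over installed_exe and set the execute bits (chmod +x)."""
    shutil.copy2(new_exe, installed_exe)
    mode = os.stat(installed_exe).st_mode
    os.chmod(installed_exe, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

def restore_project(backup_dir, project_dir):
    """Throw away project_dir and put the backup copy in its place."""
    shutil.rmtree(project_dir)
    shutil.copytree(backup_dir, project_dir)
```

Copying the whole directory rather than just the executable means a run that zombies after the swap can be rolled back to its last good state.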
ID: 1483
Desti
Joined: 6 Aug 04
Posts: 124
Credit: 9,195,838
RAC: 0
Message 1484 - Posted: 22 Aug 2004, 14:02:09 UTC - in response to Message 1478.  


> If anyone is near a "phase change" (i.e. near 33.33%, 66.66%, or completion)
> and is trying out this optimized UM please let me know, as I would like to get
> the *.gmts.* and *.rmts.* files in your dataout dir to see if the calcs are
> sensible.

I have switched my Athlon XP to the new model at timestep ~170000.

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=2064

_____
Linux Users Everywhere @ climateprediction.net (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=43)
ID: 1484
old_user147
Joined: 5 Aug 04
Posts: 30
Credit: 422,225
RAC: 0
Message 1485 - Posted: 22 Aug 2004, 14:44:23 UTC - in response to Message 1478.  

Carl: Which option should make the build not run on P2s? If you mean -tpp7, it only optimizes for P4, P-M, ... but you can run it on older machines, too. The same goes for -axW.

My next "phase change" will happen next weekend.

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=277
ID: 1485
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 1486 - Posted: 22 Aug 2004, 14:49:22 UTC - in response to Message 1478.  
Last modified: 22 Aug 2004, 15:40:01 UTC

> OK, well I'm sure we'll have our own share of problems; that's the nice thing
> about BOINC: when one project is down you can get work from another.
> According to the Intel manual it shouldn't run on a P2 with the settings I
> used! But if it does, and it optimizes Pentiums & AMDs, so much the better!
>
> I see pretty dramatic performance increases with this build (3 sec to 2.4 sec
> on a Pentium 4 on Linux; 2.5 seconds to 2.2 seconds on my AMD64 in Windows); so
> hopefully it doesn't mess up the model calcs.
>
> If anyone is near a "phase change" (i.e. near 33.33%, 66.66%, or completion)
> and is trying out this optimized UM please let me know, as I would like to get
> the *.gmts.* and *.rmts.* files in your dataout dir to see if the calcs are
> sensible.

Hi, Carl,

One of the runs on Abox is at Phase 2 TS 217100 and should have done its end-of-Phase thing by the time I return Tuesday (evening, GMT) from a bit of fun: a test surely conceived by the arch-fiend of the lower dungeon at the Marquis de Sade School of Medicine.

If you still require results then, this run should be available.

Jim

*edit*
Abox, P4 2.8, SuSE 9.0, was 3.55 sec/TS, now 2.94 & 3.01 sec/TS
Bbox, P4 3.0, SuSE 9.0, was 3.52 & 3.68, now 2.92 sec/TS


________________________________________________
Video meliora, proboque; Deteriora sequor
I see the better way, and approve it; I follow the worse
-- Ovid (43BC-17AD)
ID: 1486
old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 1489 - Posted: 22 Aug 2004, 15:35:11 UTC - in response to Message 1485.  

> Carl: Which option should make the build not run on P2s? If you mean -tpp7, it
> only optimizes for P4, P-M, ... but you can run it on older machines, too. The
> same goes for -axW.

the option I'm using on the build in the zips now is: -xK
using -ax* seems to allow Intel Fortran programs to "choose", giving the unoptimized "generic" IA32 code to AMD procs, but -xK means everybody running will need PIII-compatibility at least. Perhaps it's a little too "strict" for a 10-20% performance gain? We never really had anyone run with less than a P3 on the old CPDN anyway.

Anyway, -xK is supposed to be "Pentium III compatible only", although 'eeyore' reported a crash on an AMD Opteron, so perhaps that's too "strict"? Any other Opteron users? It's chugging along fine on my AMD64 (after only a few hours, doing about 2.24 sec per ts versus 2.46 on the regular beta UM).
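One way to see whether a Linux box reports the SSE instructions that Pentium III-class code relies on is to look at the `flags` line of `/proc/cpuinfo`. A minimal sketch follows; the function name is made up, and a reported `sse` flag says nothing about any vendor check the compiler's runtime may perform.

```python
def has_cpu_flag(cpuinfo_text, flag="sse"):
    """True if any 'flags' line in /proc/cpuinfo-style text lists the flag."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Compare whole tokens so that "sse2" is not mistaken for "sse".
        if sep and key.strip() == "flags" and flag in value.split():
            return True
    return False

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("sse reported:", has_cpu_flag(handle.read()))
    except OSError:
        print("/proc/cpuinfo not available on this system")
```

Reports from testers that pair this result with whether the -xK build actually ran would help separate a missing-instruction problem from something else.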

ID: 1489

©2024 cpdn.org