Possibly Optimized Linux model to download for Beta Testers
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
[original message edited out by CC -- "alternative" UM isn't so great] |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Hi, Carl, Second attempt on Bbox went okay. No clue as to why ..._um errored first time. Another D/L & install did the trick. First attempt on Abox went okay. Jim ________________________________________________ Video meliora, proboque; Deteriora sequor I see the better way, and approve it; I follow the worse -- Ovid (43BC-17AD) |
Joined: 6 Aug 04 Posts: 124 Credit: 9,195,838 RAC: 0 |
I have run it for some minutes on my machines. Athlon XP @ 1200 MHz, phase 1: old avg 4.36, new 4.36. Athlon 64 3000+, phase 2: old 2.40, new 2.40. Was that too short to see any differences? _____ Linux Users Everywhere @ climateprediction.net (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=43) |
Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
> "I have run it for some minutes on my machines. Athlon XP @ 1200 MHz, phase 1: old avg 4.36, new 4.36. Athlon 64 3000+, phase 2: old 2.40, new 2.40. Was that too short to see any differences?" This brings up the question of how the sec/ts is calculated. It seems to be a long-term average as opposed to something over the last several minutes. Is this the case? I ask because on the "classic" client the sec/ts would invariably change over the course of a model year (1.98 to 2.06, for example), whereas in the BOINC version it might change from 2.11 to 2.12 over the course of several years. |
Joined: 5 Aug 04 Posts: 63 Credit: 21,399,117 RAC: 0 |
First results. First trickle just uploaded. It is not *obviously* faster than the previous version (lucky I kept a copy): AVG still = 4.58 (Athlon 2200+ running at 1.8GHz, 256MB RAM, SuSE 9.0, KDE). But as the TS is currently 237000-odd, the law of large numbers means that it would have to be dramatically faster, or slightly slower, to have changed the AVG in the couple of hours I've been running it. Looking at the DLT entry suggests that it might even be very slightly slower; the 'radiation period' step now more often shows 18.xx sec mumble mumble why does it keep trimming my post? |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
It would be too short to use the "avg sec / ts" if you're far into a run, but if you wait until two trickles have gone through you can calculate it by hand from the trickle info (i.e. second trickle with the new UM minus first trickle with the new UM). |
Joined: 5 Aug 04 Posts: 30 Credit: 422,225 RAC: 0 |
Incredible, Carl! Good work! On my P4 2 GHz the sec/ts went down from 3.4 to 2.67. This is 27% faster! I hope it calculates correctly. What have you done with the compiler? |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
> "Incredible, Carl! Good work! On my P4 2 GHz the sec/ts went down from 3.4 to 2.67. This is 27% faster! I hope it calculates correctly. What have you done with the compiler?" Hopefully someone with the new model will finish a phase to verify that it keeps the calculations correct (I believe it will, but you never know with this "sensitive" application). The build should still run on anything, but it vectorizes/parallelizes some code and optimizes loops. I imagine it's probably only really noticeable on P4s, since Intel's compilers seem to do a good job of not letting AMDs take advantage even when they are wholly compatible ops! I'm using these settings now: FFLAGS = -noreentrancy -nothreads -Vaxlib -static -static-libcxa -cm -w90 -w95 -tpp7 -tune -axW -unroll -lowercase -vms -nofor_main |
Joined: 6 Aug 04 Posts: 7 Credit: 147,277 RAC: 0 |
What version of the compiler are you using? It looks from the replies so far that the speed-up only occurs with Intel kit. IFC 7.1.040 introduced the 'Genuine Intel' bug/feature, which deliberately unset the K and W flags for non-Intel kit. This apparently has been fixed in 8.0 versions, and there is a patch to libirc.a to circumvent it. See http://softwareforums.intel.com/ids/board/message?board.id=11&message.id=1574&view=by_date_ascending&page=1 |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
Hmm yeah, well that refers to a runtime bug that 8.0 fixed, but it's given me an idea: try the -xK flag, so that P3 optimizations are always forced on. That would mean you need a P3 to run CPDN, which probably isn't a bad cutoff since it's going to be pathetic on a P1 or P2 (i.e. take 9 months to run a model). See the original post; the zip now contains a model that forces P3 opts (so it should speed up on AMDs as well as Pentiums). |
Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0 |
> "Hmm yeah, well that refers to a runtime bug that 8.0 fixed, but it's given me an idea: try the -xK flag, so that P3 optimizations are always forced on. That would mean you need a P3 to run CPDN, which probably isn't a bad cutoff since it's going to be pathetic on a P1 or P2 (i.e. take 9 months to run a model)." The new model runs on my SuSE 9.1 with no problem. I am a little discouraged since I have only a Pentium II CPU. My average s/TS is 17.18 s. How many years will it have to run before completion? |
Joined: 5 Aug 04 Posts: 30 Credit: 422,225 RAC: 0 |
> "The new model runs on my SuSE 9.1 with no problem. I am a little discouraged since I have only a Pentium II CPU. My average s/TS is 17.18 s. How many years will it have to run before completion?" After 155 cpu-days, a little more than 5 months, it should be complete. |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
Hi, did you just download the new model? According to Intel it shouldn't even run on a PII (since I force P3 procs with the -xK flag). But 17 seconds per timestep makes sense for a PII, which is why I was planning to "force" P3s, since the model takes months even on a P3. At 17 sec/ts that's over 5 months, so it may be best for P2 users to run BOINC with SETI & predictor. |
Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0 |
> "Hi, did you just download the new model? According to Intel it shouldn't even run on a PII (since I force P3 procs with the -xK flag). But 17 seconds per timestep makes sense for a PII, which is why I was planning to 'force' P3s, since the model takes months even on a P3. At 17 sec/ts that's over 5 months, so it may be best for P2 users to run BOINC with SETI & predictor." Yes, I downloaded the new model. I was running seti@home, but that program is full of problems and is down more than it is running, so I shifted to climate. I may go back when it starts running again. |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
OK, well I'm sure we'll have our own share of problems; that's the nice thing about BOINC, when one project is down you can get work from another. According to the Intel manual it shouldn't run on a P2 with the settings I used! But if it does, and it optimizes Pentiums & AMD's so much the better! I see pretty dramatic performance increases with this build (3 sec to 2.4 sec on a Pentium4 on Linux; 2.5 seconds to 2.2 seconds on my AMD64 in Windows); so hopefully it doesn't mess up the model calcs. If anyone is near a "phase change" (i.e. near 33.33%, 66.66%, or completion) and is trying out this optimized UM please let me know, as I would like to get the *.gmts.* and *.rmts.* files in your dataout dir to see if the calcs are sensible. |
Joined: 6 Aug 04 Posts: 7 Credit: 147,277 RAC: 0 |
> "Note: for 'advanced' users only -- you may crash your current run!" Well, I can't say I wasn't warned! I shut down the client running 2 phase 3 WUs, then copied and chmoded the new executable. I restarted the client and it got as far as resuming CPDN for the two WUs, and then the two runs zombied! I am sad! I have now restored the project from backup and restarted with the old executable. So far so good. But it looks like the new one doesn't like Opterons. |
Joined: 6 Aug 04 Posts: 124 Credit: 9,195,838 RAC: 0 |
> "If anyone is near a 'phase change' (i.e. near 33.33%, 66.66%, or completion) and is trying out this optimized UM please let me know, as I would like to get the *.gmts.* and *.rmts.* files in your dataout dir to see if the calcs are sensible." I have switched my Athlon XP to the new model at timestep ~170000. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=2064 _____ Linux Users Everywhere @ climateprediction.net (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=43) |
Joined: 5 Aug 04 Posts: 30 Credit: 422,225 RAC: 0 |
Carl: Which option should make the build not run on P2s? If you mean -tpp7, it only optimizes for P4, P-M, ... but you can run it on older machines, too. The same goes for -axW. My next "phase change" will happen next weekend. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=277 |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
> "OK, well I'm sure we'll have our own share of problems; that's the nice thing about BOINC, when one project is down you can get work from another. According to the Intel manual it shouldn't run on a P2 with the settings I used! But if it does, and it optimizes Pentiums & AMD's so much the better! I see pretty dramatic performance increases with this build (3 sec to 2.4 sec on a Pentium4 on Linux; 2.5 seconds to 2.2 seconds on my AMD64 in Windows); so hopefully it doesn't mess up the model calcs. If anyone is near a 'phase change' (i.e. near 33.33%, 66.66%, or completion) and is trying out this optimized UM please let me know, as I would like to get the *.gmts.* and *.rmts.* files in your dataout dir to see if the calcs are sensible." Hi, Carl, One of the runs on Abox is at Phase 2 TS 217100 and should have done its end-of-Phase thing by the time I return Tuesday (evening, GMT) from a bit of fun: a test surely conceived by the arch-fiend of the lower dungeon at the Marquis de Sade School of Medicine. If you still require results then, this run should be available. Jim *edit* Abox, P4 2.8, SuSE 9.0: was 3.55 sec/TS, now 2.94 & 3.01 sec/TS. Bbox, P4 3.0, SuSE 9.0: was 3.52 & 3.68, now 2.92 sec/TS ________________________________________________ Video meliora, proboque; Deteriora sequor I see the better way, and approve it; I follow the worse -- Ovid (43BC-17AD) |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
> "Carl: Which option should make the build not run on P2s? If you mean -tpp7, it only optimizes for P4, P-M, ... but you can run it on older machines, too. The same goes for -axW." The option I'm using on the build in the zips now is -xK. Using -ax* seems to allow Intel Fortran programs to "choose", giving the unoptimized "generic" IA32 code to AMD procs, but -xK means everybody running will need PIII compatibility at least. Perhaps it's a little too "strict" for a 10-20% performance gain? We never really had anyone run with less than a P3 on the old CPDN anyway. The -xK is supposed to be "Pentium III compatible only", although 'eeyore' reported a crash on an AMD Opteron, so perhaps that's too "strict"? Any other Opteron users? It's chugging along fine on my AMD64 (after only a few hours, doing about 2.24 sec per ts versus 2.46 on the regular beta UM). |
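
The timing figures traded in this thread all come down to a few pieces of arithmetic: the by-hand sec/ts between two trickles, how little a long-running average moves after a couple of hours, the "percent faster" figure, and the cpu-days to completion. The following is only a rough sketch for checking such numbers. The function names are hypothetical, it assumes a trickle reports a cumulative CPU time in seconds together with a timestep count (the thread does not spell out the trickle contents), and the total timestep count is inferred from the 155 cpu-days at 17.18 s/TS quoted above rather than taken from any CPDN documentation.

```python
SECONDS_PER_DAY = 86_400


def sec_per_ts_from_trickles(cpu_s_first, ts_first, cpu_s_second, ts_second):
    """Average sec/ts between two trickles.

    Assumption: each trickle carries cumulative CPU seconds and a timestep count.
    """
    steps = ts_second - ts_first
    if steps <= 0:
        raise ValueError("second trickle must be at a later timestep")
    return (cpu_s_second - cpu_s_first) / steps


def cumulative_average_after(old_avg, old_ts, new_rate, run_seconds):
    """Whole-run average sec/ts after running run_seconds at new_rate sec/ts."""
    new_steps = run_seconds / new_rate
    total_seconds = old_avg * old_ts + run_seconds
    return total_seconds / (old_ts + new_steps)


def percent_faster(old_rate, new_rate):
    """Throughput gain in percent when sec/ts drops from old_rate to new_rate."""
    return (old_rate / new_rate - 1.0) * 100.0


def cpu_days(sec_per_ts, total_ts):
    """CPU days needed to compute total_ts timesteps at sec_per_ts."""
    return sec_per_ts * total_ts / SECONDS_PER_DAY


if __name__ == "__main__":
    # Figures quoted in the thread: P4 went from 3.4 to 2.67 sec/ts.
    print(f"P4 gain: {percent_faster(3.4, 2.67):.1f}%")

    # Hypothetical: two hours at 3.9 sec/ts after 237000 timesteps averaging 4.58.
    print(f"AVG after 2 h: {cumulative_average_after(4.58, 237_000, 3.9, 7_200):.4f}")

    # Total timesteps implied by "155 cpu-days at 17.18 s/TS" (inferred, not official).
    implied_total_ts = 155 * SECONDS_PER_DAY / 17.18
    print(f"Implied total timesteps: {implied_total_ts:,.0f}")
    print(f"Days at 2.24 sec/ts: {cpu_days(2.24, implied_total_ts):.1f}")
```

The second function shows why the displayed AVG is a poor short-term indicator: with hundreds of thousands of timesteps already in the average, a couple of hours at a different rate shifts it only in the third decimal place, so the trickle-to-trickle difference is the more useful measurement.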
©2024 cpdn.org