Thread 'Possibly Optimized Linux model to download for Beta Testers'

Author	Message
old_user1 Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0	Message 1433 - Posted: 22 Aug 2004, 0:27:21 UTC Last modified: 22 Aug 2004, 17:47:36 UTC [original message edited out by CC -- "alternative" UM isn't so great] ID: 1433 · Reply Quote

astroWX Volunteer moderator Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0	Message 1439 - Posted: 22 Aug 2004, 1:32:59 UTC Last modified: 22 Aug 2004, 3:27:44 UTC Hi, Carl, Second attempt on Bbox went okay. No clue as to why ..._um errored first time. Another D/L & install did the trick. First attempt on Abox went okay. Jim ________________________________________________ Video meliora, proboque; Deteriora sequor I see the better way, and approve it; I follow the worse -- Ovid (43BC-17AD) ID: 1439 · Reply Quote

Desti Send message Joined: 6 Aug 04 Posts: 124 Credit: 9,195,838 RAC: 0	Message 1440 - Posted: 22 Aug 2004, 1:35:12 UTC I have run it for a few minutes on my machines: Athlon XP @ 1200 MHz, phase 1, old avg 4.36, new 4.36. Athlon 64 3000+, phase 2, old 2.40, new 2.40. Was that too short to see any difference? _____ Linux Users Everywhere @ climateprediction.net (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=43) ID: 1440 · Reply Quote

geophi Volunteer moderator Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275	Message 1442 - Posted: 22 Aug 2004, 1:49:34 UTC - in response to Message 1440. Last modified: 22 Aug 2004, 1:50:04 UTC > I have run it some minutes on my machines, Athlon XP @ 1200 mhz, phase 1, old > avg 4,36, new 4,36. > Athlon 64 3000+, phase 2, old 2.40, new 2,40 > > Was that too short to see any differences? This brings up the question of how the sec/ts is calculated. It seems to be a long term average as opposed to something over the last several minutes. Is this the case? I just ask because on the "classic" client, the sec/ts would invariably change over the course of a model year (1.98 to 2.06 for example), whereas in the BOINC version, it might change from 2.11 to 2.12 over the course of several years. ID: 1442 · Reply Quote

old_user194 Send message Joined: 5 Aug 04 Posts: 63 Credit: 21,399,117 RAC: 0	Message 1447 - Posted: 22 Aug 2004, 7:27:26 UTC Last modified: 22 Aug 2004, 7:33:54 UTC First results. First trickle just uploaded. It is not obviously faster than the previous version (lucky I kept a copy): AVG still = 4.58 (Athlon 2200+ running at 1.8GHz, 256MB RAM, SuSE 9.0, KDE), but as the TS is currently 237000-odd the law of large numbers means that it would have to be dramatically faster, or slightly slower, to have changed the AVG in the couple of hours I've been running it. Looking at the DLT entry suggests that it might even be very slightly slower, the 'radiation period' step now shows more often 18.xx sec mumble mumble why does it keep trimming my post? ID: 1447 · Reply Quote
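The "law of large numbers" point above can be made concrete with a rough sketch. It assumes the displayed AVG is a cumulative average (total CPU seconds divided by total timesteps), which is what the thread suspects but does not confirm; the function name and the trial rates are hypothetical.

```python
def updated_average(old_avg, old_steps, new_rate, run_seconds):
    """Cumulative sec/ts after running run_seconds at new_rate sec/ts.

    Assumes the client reports total seconds / total timesteps.
    """
    new_steps = run_seconds / new_rate
    total_seconds = old_avg * old_steps + run_seconds
    return total_seconds / (old_steps + new_steps)


# Figures from the post: AVG 4.58 at roughly timestep 237000, about two hours run.
# The trial rates (unchanged, 10% and 20% less time per step) are hypothetical.
for rate in (4.58, 4.12, 3.66):
    print(rate, round(updated_average(4.58, 237000, rate, 2 * 3600), 4))
```

Under that assumption, even a 20% drop in time per step moves the cumulative average by less than 0.01 sec/ts after two hours, so an unchanged AVG says little either way.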

old_user1 Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0	Message 1451 - Posted: 22 Aug 2004, 8:52:07 UTC - in response to Message 1447. It would be too short a run to rely on the "avg sec / ts" if you're far into a model, but if you wait until two trickles have gone through you can calculate the rate by hand from the trickle info (i.e. second trickle with the new UM minus first trickle with the new UM). ID: 1451 · Reply Quote
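The by-hand calculation from two trickles might look like the sketch below. It assumes each trickle reports a cumulative CPU time in seconds and a timestep count; those field meanings and the sample numbers are assumptions for illustration, not taken from the thread.

```python
def sec_per_ts(first, second):
    """Rate between two trickles, each given as (cpu_seconds, timestep)."""
    cpu1, ts1 = first
    cpu2, ts2 = second
    if ts2 <= ts1:
        raise ValueError("second trickle must be at a later timestep")
    return (cpu2 - cpu1) / (ts2 - ts1)


# Hypothetical trickle values, both taken after switching to the new UM.
first_trickle = (100_000.0, 240_000)
second_trickle = (126_700.0, 250_000)
print(sec_per_ts(first_trickle, second_trickle))
```

Because only the difference between the two trickles is used, time spent on the old executable before the first trickle does not dilute the result.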

old_user147 Send message Joined: 5 Aug 04 Posts: 30 Credit: 422,225 RAC: 0	Message 1457 - Posted: 22 Aug 2004, 9:49:07 UTC Incredible, Carl! Good work! On my P4 2GHz the sec/ts went down from 3.4 to 2.67. This is 27% faster! I hope it calculates correctly. What have you done with the compiler? ID: 1457 · Reply Quote
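The 27% figure depends on which ratio is taken. A small sketch of both readings, using the 3.4 and 2.67 sec/ts values from the post (the function name is hypothetical):

```python
def speedup(old_sec_per_ts, new_sec_per_ts):
    """Return (fraction of time saved per step, fractional throughput gain)."""
    time_saved = (old_sec_per_ts - new_sec_per_ts) / old_sec_per_ts
    throughput_gain = old_sec_per_ts / new_sec_per_ts - 1
    return time_saved, throughput_gain


saved, gain = speedup(3.4, 2.67)
print(f"time per step down {saved:.1%}, timesteps per second up {gain:.1%}")
```

Each timestep takes about 21% less time, which is the same thing as about 27% more timesteps per unit of time; the post uses the second reading.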

old_user1 Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0	Message 1462 - Posted: 22 Aug 2004, 10:54:23 UTC - in response to Message 1457. > Incredible, Carl! Good work! > On my P4 2Ghz the sec/ts went down from 3.4 to 2.67. This is 27 % faster! > I hope it calculates correct. > What have you done with the compiler? Hopefully someone with the new model will finish a phase to verify it keeps the calculations correct (I believe it will, but you never know with this "sensitive" application). I'm using these settings now: FFLAGS = -noreentrancy -nothreads -Vaxlib -static -static-libcxa -cm -w90 -w95 -tpp7 -tune -axW -unroll -lowercase -vms -nofor_main It should still run on anything, but it vectorizes/parallelizes some code and optimizes loops. I imagine it's probably only really noticeable on P4s, since Intel's compilers seem to do a good job of not letting AMD chips take advantage even when the ops are wholly compatible! ID: 1462 · Reply Quote

old_user300 Send message Joined: 6 Aug 04 Posts: 7 Credit: 147,277 RAC: 0	Message 1464 - Posted: 22 Aug 2004, 11:25:34 UTC What version of the compiler are you using? It looks from the replies so far that the speed-up only occurs with Intel kit. IFC 7.1.040 introduced the 'Genuine Intel' bug/feature, which deliberately unset the K and W flags for non-Intel kit. This apparently has been fixed in 8.0 versions, and there is a patch to libirc.a to circumvent it. See http://softwareforums.intel.com/ids/board/message?board.id=11&message.id=1574&view=by_date_ascending&page=1 ID: 1464 · Reply Quote

old_user1 Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0	Message 1465 - Posted: 22 Aug 2004, 11:35:03 UTC - in response to Message 1464. Last modified: 22 Aug 2004, 12:01:49 UTC hmm yeah, well that refers to a runtime bug that 8.0 fixed, but it's given me an idea to try the -xK flag so that P3 optimizations are always forced on. That would mean you need a P3 to run CPDN, which probably isn't a bad cutoff since it's going to be pathetic on a P1 or P2 (i.e. take 9 months to run a model). See the original post; the zip now contains a model that forces P3 opts (so it should speed up on AMDs as well as Pentiums). ID: 1465 · Reply Quote

tullio Send message Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0	Message 1467 - Posted: 22 Aug 2004, 12:04:20 UTC - in response to Message 1465. > hmm yeah, well that refers to a runtime bug that 8.0 fixed, but it's given me > an idea to try the -xK flag for P3 optimizes forced on always. That would > mean you need a P3 to run CPDN, which probably isn't a bad cutoff since it's > going to be pathetic on a P1 or P2 (i.e. take a 9 months to run a model). > > > The new model runs on my SuSE 9.1 with no problem. I am a little discouraged since I have only a Pentium II CPU. My average s/TS is 17.18 s. How many years will it have to run before completion? ID: 1467 · Reply Quote

old_user147 Send message Joined: 5 Aug 04 Posts: 30 Credit: 422,225 RAC: 0	Message 1469 - Posted: 22 Aug 2004, 12:16:53 UTC - in response to Message 1467. > The new model runs on my SuSE 9.1 with no problem. I am a little discouraged > since I have only a Pentium II CPU. My average s/TS is 17.18 s. How many years > will it have to run before completion? > > After 155 cpu-days, little more than 5 months, it should be complete. ID: 1469 · Reply Quote
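The 155 CPU-day estimate can be reproduced with a short sketch. The total of 777,600 timesteps is an assumption: it is the value implied by the figures in this thread (155 days at 17.18 s/TS), not something the posts state directly.

```python
SECONDS_PER_DAY = 86_400
TOTAL_TIMESTEPS = 777_600  # assumption, back-calculated from the thread's figures


def cpu_days(sec_per_ts, total_timesteps=TOTAL_TIMESTEPS):
    """CPU-days needed to run every timestep at a constant sec/ts."""
    return sec_per_ts * total_timesteps / SECONDS_PER_DAY


# 17.18 is the Pentium II rate quoted above; the other rates appear elsewhere in the thread.
for rate in (17.18, 4.58, 2.67):
    print(f"{rate:5.2f} s/TS -> {cpu_days(rate):6.1f} CPU-days")
```

These are CPU-days, so a machine that is not crunching around the clock will take longer in calendar time.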

old_user1 Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0	Message 1470 - Posted: 22 Aug 2004, 12:18:12 UTC - in response to Message 1467. Last modified: 22 Aug 2004, 12:18:35 UTC Hi, did you just download the new model, because according to Intel it shouldn't even run on a PII (since I force P3 procs with -xK flag). But 17 seconds per timestep makes sense for a PII, which is why I was planning to "force" P3's, since the model takes months even on a P3. at 17 sec/ts that's over 5 months, so it may be best for P2 users to run BOINC with SETI & predictor. ID: 1470 · Reply Quote

tullio Send message Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0	Message 1473 - Posted: 22 Aug 2004, 12:58:53 UTC - in response to Message 1470. > Hi, did you just download the new model, because according to Intel it > shouldn't even run on a PII (since I force P3 procs with -xK flag). But 17 > seconds per timestep makes sense for a PII, which is why I was planning to > "force" P3's, since the model takes months even on a P3. at 17 sec/ts that's > over 5 months, so it may be best for P2 users to run BOINC with SETI & > predictor. > > Yes, I downloaded the new model. I was running seti@home, but that program is full of problems and is down more often than it is running, so I shifted to climate. I may go back when it starts running again. ID: 1473 · Reply Quote

old_user1 Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0	Message 1478 - Posted: 22 Aug 2004, 13:11:19 UTC - in response to Message 1473. Last modified: 22 Aug 2004, 13:13:39 UTC OK, well I'm sure we'll have our own share of problems; that's the nice thing about BOINC, when one project is down you can get work from another. According to the Intel manual it shouldn't run on a P2 with the settings I used! But if it does, and it optimizes Pentiums & AMD's so much the better! I see pretty dramatic performance increases with this build (3 sec to 2.4 sec on a Pentium4 on Linux; 2.5 seconds to 2.2 seconds on my AMD64 in Windows); so hopefully it doesn't mess up the model calcs. If anyone is near a "phase change" (i.e. near 33.33%, 66.66%, or completion) and is trying out this optimized UM please let me know, as I would like to get the .gmts. and .rmts. files in your dataout dir to see if the calcs are sensible. ID: 1478 · Reply Quote

old_user300 Send message Joined: 6 Aug 04 Posts: 7 Credit: 147,277 RAC: 0	Message 1483 - Posted: 22 Aug 2004, 13:52:34 UTC Last modified: 22 Aug 2004, 14:09:00 UTC > Note: for "advanced" users only -- you may crash your current run! Well I can't say I wasn't warned! I shut down the client running two phase 3 WUs, then copied and chmod'ed the new executable. I restarted the client and it got as far as "Resuming CPDN" for the two WUs, and then the two runs zombied! I am sad! I have now restored the project from backup and restarted with the old executable. So far so good. But it looks like the new one doesn't like Opterons. ID: 1483 · Reply Quote

Desti Send message Joined: 6 Aug 04 Posts: 124 Credit: 9,195,838 RAC: 0	Message 1484 - Posted: 22 Aug 2004, 14:02:09 UTC - in response to Message 1478. > > If anyone is near a "phase change" (i.e. near 33.33%, 66.66%, or completion) > and is trying out this optimized UM please let me know, as I would like to get > the .gmts. and .rmts. files in your dataout dir to see if the calcs are > sensible. > > > I have switched my Athlon XP to the new model at timestep ~170000. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=2064 _____ Linux Users Everywhere @ climateprediction.net (http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=43) ID: 1484 · Reply Quote

old_user147 Send message Joined: 5 Aug 04 Posts: 30 Credit: 422,225 RAC: 0	Message 1485 - Posted: 22 Aug 2004, 14:44:23 UTC - in response to Message 1478. Carl: Which option should make the build not run on P2s? If you mean -tpp7, it only optimizes for P4,P-M,... but you can run it on older machines, too. The same with -axW. My next "phase change" will happen next weekend. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=277 ID: 1485 · Reply Quote

astroWX Volunteer moderator Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0	Message 1486 - Posted: 22 Aug 2004, 14:49:22 UTC - in response to Message 1478. Last modified: 22 Aug 2004, 15:40:01 UTC > OK, well I'm sure we'll have our own share of problems; that's the nice thing > about BOINC, when one project is down you can get work from another. > According to the Intel manual it shouldn't run on a P2 with the settings I > used! But if it does, and it optimizes Pentiums & AMD's so much the > better! > > I see pretty dramatic performance increases with this build (3 sec to 2.4 sec > on a Pentium4 on Linux; 2.5 seconds to 2.2 seconds on my AMD64 in Windows); so > hopefully it doesn't mess up the model calcs. > > If anyone is near a "phase change" (i.e. near 33.33%, 66.66%, or completion) > and is trying out this optimized UM please let me know, as I would like to get > the .gmts. and .rmts. files in your dataout dir to see if the calcs are > sensible. > Hi, Carl, One of the runs on Abox is at Phase 2 TS 217100 and should have done its end-of-Phase thing by the time I return Tuesday (evening, GMT) from a bit of fun from a test surely conceived by the arch-fiend of the lower dungeon at the Marquis d'Sade School of Medicine. If you still require results then, this run should be available. Jim edit Abox, P4 2.8, SuSE 9.0, was 3.55 sec/TS, now 2.94 & 3.01 sec/TS Bbox, P4 3.0, SuSE 9.0, was 3.52 & 3.68, now 2.92 sec/TS ________________________________________________ Video meliora, proboque; Deteriora sequor I see the better way, and approve it; I follow the worse -- Ovid (43BC-17AD) ID: 1486 · Reply Quote

old_user1 Send message Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0	Message 1489 - Posted: 22 Aug 2004, 15:35:11 UTC - in response to Message 1485. > Carl: Which option should make the build not run on P2s? If you mean -tpp7, it > only optimizes for P4,P-M,... but you can run it on older machines, too. The > same with -axW. The option I'm using on the build in the zip now is: -xK. Using -ax* seems to allow Intel Fortran programs to "choose", giving the unoptimized "generic" IA32 code to AMD procs, but -xK means everybody running will need PIII compatibility at least. Perhaps it's a little too "strict" for a 10-20% performance gain? We never really had anyone run with less than a P3 on the old CPDN anyway. The -xK is supposed to be "Pentium III compatible only", although 'eeyore' reported a crash on an AMD Opteron, so perhaps that's too "strict"? Any other Opteron users? It's chugging along fine on my AMD64 (after only a few hours, doing about 2.24 sec per ts versus 2.46 on the regular beta UM). ID: 1489 · Reply Quote
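For anyone unsure whether their box meets the "PIII compatible" cutoff, a rough check on Linux is to look for the SSE flag in /proc/cpuinfo. This sketch assumes that -xK code amounts to requiring SSE, which the thread does not spell out; the helper name and the sample text are hypothetical, and the check says nothing about the 'Genuine Intel' behaviour discussed above.

```python
def has_sse(cpuinfo_text):
    """Return True if the first 'flags' line of /proc/cpuinfo text lists sse."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return "sse" in value.split()
    return False


# Hypothetical sample resembling an older CPU that has MMX but no SSE.
sample = "model name\t: example CPU\nflags\t\t: fpu vme de pse tsc msr cx8 cmov mmx fxsr\n"
print(has_sse(sample))

# On a real Linux machine one would read the file instead:
#   with open("/proc/cpuinfo") as f:
#       print(has_sse(f.read()))
```

The flags are compared as whole words, so entries such as sse2 or ssse3 are not mistaken for plain sse.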