Questions and Answers :
Unix/Linux :
Benchmarks and other problems
Joined: 5 Aug 04 | Posts: 1120 | Credit: 17,202,915 | RAC: 2,154
[quote]It will be nice to see some work done by the Linux boxes that are currently trashing everything they get because of missing 32bit libs. :)[/quote]
Are they sending work units to Linux boxes that are trashing everything? I have a Linux box that has the 32-bit compatibility libraries, but I have received almost no work units in about a year, except a couple of retreads. Then, the day before yesterday, I got the current four work units, which are crunching away. One has generated two trickles and the other three have generated one trickle each.

It seems to me that if they were sending out 32-bit work units to boxen missing the necessary libraries, I would have been getting some of them too. But I have not.

Wed 19 Jun 2019 03:27:26 PM EDT | Finished upload of hadam4_a027_200610_12_825_011882434_0_r411654165_1.zip
Wed 19 Jun 2019 04:11:36 PM EDT | Sending scheduler request: To send trickle-up message.
Joined: 15 May 09 | Posts: 4535 | Credit: 18,976,682 | RAC: 21,948
[quote]It seems to me that if they were sending out 32-bit work units to boxen missing the necessary libraries, I would have been getting some of them too. But I have not.[/quote]
Batch 825 (HADAM4), 550 tasks for Linux, has the statistics shown below. Among the failures, each hard fail, i.e. a work unit that failed on all three attempts, has either two or three of its failures caused by missing 32-bit libs.

Success: 0 (0%)
Fails: 208 (38%)
Hard Fail: 37 (7%)
Running: 513 (93%)
Unsent: 0 (0%)
Joined: 5 Aug 04 | Posts: 1120 | Credit: 17,202,915 | RAC: 2,154
I failed one of these on work unit 21490395. Not because of missing libraries, but because my machine crashed, and it was the version that could not tolerate machine restarts.

UK Met Office HadAM4 at N144 resolution v8.08 i686-pc-linux-gnu
stderr out:

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy
Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy
Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy
Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy
Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy
Model crashed: READDUMP: BAD BUFFIN OF DATA tmp/xnnuj.pipe_dummy
Sorry, too many model crashes! :-(
13:45:02 (3083): called boinc_finish(22)
</stderr_txt>
]]>

This problem seems to have been fixed with the UK Met Office HadAM4 at N144 resolution v8.09 version of the software. I have four of those running, and they are uploading and trickling OK. They seem to be running about twice as fast as the expected completion time: it was expected that they would take about 1050 hours to complete, but one is at 18.455% complete after running 72.5 hours.
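If that pace holds (an assumption, since the per-timestep cost can vary over a run), a linear extrapolation from those figures gives:

$$\frac{72.5\ \text{h}}{0.18455} \approx 393\ \text{h total}, \qquad \frac{1050\ \text{h}}{393\ \text{h}} \approx 2.7$$

so the task is actually on track to finish closer to 2.7 times faster than the initial estimate.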
Joined: 15 May 09 | Posts: 4535 | Credit: 18,976,682 | RAC: 21,948
[quote]I failed one of these on work unit 21490395.[/quote]
And for that one, which failed 3 times, one of its three failures was due to a lack of 32-bit libs.
Joined: 7 Aug 04 | Posts: 2185 | Credit: 64,822,615 | RAC: 5,275
[quote]They seem to be running about twice as fast as the expected completion time. It was expected that they would take about 1050 hours to complete, but one is at 18.455% complete after running 72.5 hours.[/quote]
The initial estimate of the time to completion is partially based on the BOINC floating-point benchmark. For some reason, the 7.2.33 version that comes with Red Hat 6 type installations has unrealistically low benchmarks. Same thing for my Phenom II 945 on CentOS 6: 7.2.33 gives an FP benchmark of about 1600, whereas later versions of BOINC on Ubuntu give about 3000 for the same CPU.
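If the estimate scales roughly inversely with the measured benchmark (a simplification; BOINC also folds in other factors, such as the task's estimated FLOP count), a benchmark reading of 1600 instead of 3000 would inflate the predicted time by about:

$$\frac{3000}{1600} \approx 1.9$$

which lines up with the "about twice as fast as expected" observation above.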
Joined: 5 Aug 04 | Posts: 1120 | Credit: 17,202,915 | RAC: 2,154
[quote]The initial estimate of the time to completion is partially based on the BOINC floating-point benchmark. For some reason, the 7.2.33 version that comes with Red Hat 6 type installations has unrealistically low benchmarks.[/quote]
If that benchmark is still the widely used Whetstone benchmark, it is possibly the worst benchmark there could be for measuring floating-point performance. It is based on statistics obtained by running tests on "typical" programs written in Algol 60 in an interpreter (not a compiler). The interpreter was used because it made it easy to insert the necessary timing code automatically.

There are about a dozen loops in the program, each executed 10,000 times if I remember correctly. Each one does either some simple calculations or calls subroutines. The one whose subroutine does floating-point operations is in the benchmark to evaluate the cost of function calls, not loop overhead and not floating-point operations; it just happens to do some floating-point work.

I was involved in writing an optimizer for the C compiler at Bell Labs in the early 1980s. One of the optimizations we used was to expand called functions inline when it made sense, which defeated the measurement of the call and return operations that was the original purpose of that module. Then my loop-invariant code motion optimization moved all those operations out of the loop, since they did not change from one iteration to another. Then a live-dead analysis eliminated all the remaining floating-point operations, because their results were never used. These optimizations resulted in an enormous speed-up of our execution of that benchmark.

Since these optimizations were common by 1990, they are probably in almost all compilers by now. So whatever that benchmark may have measured in 1965, it does not measure the same things today.
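To make that concrete, here is a minimal sketch in C, not the actual Whetstone source (the routine mimics Whetstone's P3 module), showing the pattern described above and why an optimizer can gut it:

```c
#include <stdio.h>

/* Small floating-point routine, originally there to measure call/return
 * cost. A modern compiler will simply inline it. */
static double p3(double x, double y, double t, double t2)
{
    x = t * (x + y);
    y = t * (x + y);
    return (x + y) / t2;
}

int main(void)
{
    const double t = 0.499975, t2 = 2.0;
    double x = 1.0, y = 1.0, z = 0.0;

    /* The arguments never change, so after inlining the entire loop body
     * is loop-invariant: code motion can hoist it out and run it once. */
    for (int i = 0; i < 10000; i++)
        z = p3(x, y, t, t2);

    /* If z were never used, live-dead analysis could delete the remaining
     * floating-point work entirely; printing it keeps one evaluation alive. */
    printf("z = %f\n", z);
    return 0;
}
```

Compiled with optimization enabled (e.g. gcc -O2), the loop contributes essentially nothing to the run time, so the timing no longer reflects call/return cost or floating-point throughput.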
Joined: 5 Sep 04 | Posts: 7629 | Credit: 24,240,330 | RAC: 0
I've moved all of the preceding posts out of the thread intended to discuss the new OpenIFS models. |
Joined: 5 Sep 04 | Posts: 7629 | Credit: 24,240,330 | RAC: 0
[quote]Are they sending work units to Linux boxes that are trashing everything?[/quote]
Projects don't send work; it is requested by the computers attached to them.

[quote]It seems to me that if they were sending out 32-bit work units to boxen missing the necessary libraries, I would have been getting some of them too. But I have not.[/quote]
Your computer got 4 of the latest version of that model on 17 Jun 2019, 12:21:33 UTC. This was 3 days before you posted.

One of the reasons for your low benchmarks is that your computer is horribly slow compared to the latest computers. And if you ever expect to get anywhere with the new type of model when it's finally released (will we EVER get to that point?), then people should be looking at processor speeds in the 3 GHz area, with a lot more memory than you have. All of which will be posted about when the time comes.
Joined: 5 Aug 04 | Posts: 1120 | Credit: 17,202,915 | RAC: 2,154
A very good idea. |
Joined: 15 May 09 | Posts: 4535 | Credit: 18,976,682 | RAC: 21,948
[quote]I've moved all of the preceding posts out of the thread intended to discuss the new OpenIFS models.[/quote]
Thanks Les, I was thinking about moving some of it myself.
Joined: 5 Aug 04 | Posts: 1120 | Credit: 17,202,915 | RAC: 2,154
[quote]And if you ever expect to get anywhere with the new type of model when it's finally released (will we EVER get to that point?), then people should be looking at processor speeds in the 3 GHz area.[/quote]
When I got the machine, 1.8 GHz was not all that slow, but 3 GHz is less than double the speed of mine. It is a 64-bit Intel Xeon. Is 16 GBytes of RAM all that small? (I just doubled it from the 8 GBytes that came with the machine.)

To add more RAM with this motherboard, I would need to add a second processor. I could easily put one in, but it would run at the same speed; I could then handle 8 processes at once instead of only four, and I could double the RAM at the same time. I do not think any one process could get over 16 GBytes of RAM in my setup. At some point (when the money tree blooms), getting a new machine is probably the way to go.

Time Sent (UTC): 20 Jun 2019 08:17:32
Host ID: 1256552
Result ID: 21718833
Result Name: hadam4_a0c5_201310_12_825_011882792_0
Phase: 1
Timestep: 8,741
CPU Time (sec): 224,334
Average (sec/TS): 25.6646
Joined: 1 Sep 04 | Posts: 161 | Credit: 81,522,141 | RAC: 1,164
Jean-David - I think you are doing OK with your setup. I am currently running the N144 tasks on four machines:

1) AMD Phenom II X4 945, 3.6 GHz: 23.5 sec/TS
2) AMD Phenom II X4 945, 3.6 GHz: 24.5 sec/TS
3) AMD FX 8370 eight-core, 4.0 GHz: 18.5-20.5 sec/TS
4) AMD FX 8370 eight-core, 4.0 GHz: 18.5-20.5 sec/TS

I don't believe you have a "tortoise" of a machine, since you are reporting 25 sec/TS.

As far as memory goes, these tasks seem to be using about 650 MB each. Depending on what else you are doing on your machine, you may or may not have enough memory. Look at your memory usage (% of total); if it is over 85% when running these tasks, I would strongly consider more memory if you are using the computer for anything else. There are many other factors that determine throughput (e.g. memory speed).
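For checking that percentage on Linux, here is a minimal sketch in C (Linux-specific; the standard free tool reports the same numbers, so this is just illustrative). Note that MemAvailable needs a reasonably recent kernel (3.14+), so the sketch falls back to MemFree on older systems such as CentOS 6:

```c
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) { perror("/proc/meminfo"); return 1; }

    char line[256];
    long total_kb = 0, avail_kb = 0, free_kb = 0;
    while (fgets(line, sizeof line, f)) {
        /* sscanf leaves the variable untouched when the line doesn't match */
        sscanf(line, "MemTotal: %ld kB", &total_kb);
        sscanf(line, "MemAvailable: %ld kB", &avail_kb);
        sscanf(line, "MemFree: %ld kB", &free_kb);
    }
    fclose(f);

    /* MemAvailable is absent on kernels older than 3.14; fall back to MemFree */
    long usable_kb = avail_kb ? avail_kb : free_kb;
    if (total_kb > 0)
        printf("Memory in use: %.1f%% of total\n",
               100.0 * (total_kb - usable_kb) / total_kb);
    return 0;
}
```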
Joined: 3 Sep 04 | Posts: 126 | Credit: 26,610,380 | RAC: 3,377
Xeons are faster than other processors with the same clock frequency, and 16 GB should be more than enough. The project is supposed to run on home computers. |