climateprediction.net (CPDN) home page
Thread 'Request for 64 bit Models'

Message boards : Number crunching : Request for 64 bit Models

old_user27607
Joined: 28 Oct 04
Posts: 64
Credit: 34,444,555
RAC: 0
Message 17423 - Posted: 25 Nov 2005, 16:46:38 UTC

I am running Suse 10 on an AMD64 3200 in 64 bit mode. Under that, a 10802 timestep sulphur run takes 14 hours (almost exactly). The same processor in another box under W2K 32 bit runs sulphur in 12 hours. This is a 2 hour penalty for a 64 bit OS.

If the prediction of a 30% speedup for a 64 bit executable is close to right, then making one for Linux where there is already good support for 64 bit OS would be very beneficial.

In the above example, sulphur runs could drop to around 9 hours, a *big* improvement over the 12 and 14 hour times I currently get.
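A rough check of the arithmetic in this post, as a minimal sketch. It converts the quoted run times to seconds per timestep (s/TS) and works out what a 30% speedup could mean for the 12 hour run. The post does not say which reading of "30% speedup" is meant, so both readings below are assumptions, and the function and variable names are illustrative only.

```python
TIMESTEPS = 10802  # timesteps in the sulphur run, as quoted in the post


def seconds_per_timestep(hours: float, timesteps: int = TIMESTEPS) -> float:
    """Convert a total run time in hours to seconds per timestep."""
    return hours * 3600.0 / timesteps


linux64_s_per_ts = seconds_per_timestep(14.0)  # Suse 10, 64 bit OS
win32_s_per_ts = seconds_per_timestep(12.0)    # W2K, 32 bit OS

# Reading 1 (assumption): the run takes 30% less time.
hours_less_time = 12.0 * (1.0 - 0.30)
# Reading 2 (assumption): 30% more timesteps are computed per hour.
hours_more_throughput = 12.0 / 1.30

print(f"64 bit OS: {linux64_s_per_ts:.2f} s/TS")
print(f"32 bit OS: {win32_s_per_ts:.2f} s/TS")
print(f"30% less time:       {hours_less_time:.1f} h")
print(f"30% more throughput: {hours_more_throughput:.1f} h")
```

Either reading lands in the 8 to 9.5 hour range, which is consistent with the "around 9 hours" estimate.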
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 17425 - Posted: 25 Nov 2005, 17:25:24 UTC

There's something wrong with that performance. With a 3200+ (socket 754) in 64bit Fedora Core 3, mine ran at about 3.45 to 3.5 s/TS, which was only very slightly slower than WinXP 32bit on the same PC. Something else is going on there. Is the screensaver set to blank? Is Cool'n'Quiet off? Are you running the graphics much?
old_user27607
Joined: 28 Oct 04
Posts: 64
Credit: 34,444,555
RAC: 0
Message 17430 - Posted: 25 Nov 2005, 19:42:13 UTC - in response to Message 17425.  
Last modified: 25 Nov 2005, 19:45:30 UTC

> There's something wrong with that performance. With a 3200+ (socket 754) in 64bit Fedora Core 3, mine ran at about 3.45 to 3.5 s/TS, which was only very slightly slower than WinXP 32bit on the same PC. Something else is going on there. Is the screensaver set to blank? Is Cool'n'Quiet off? Are you running the graphics much?


Well, it looks like you were dead on. Apparently I failed to set the screen saver up. Suse 10 has the screen saver in a different place, and it took me about 10 minutes to find it via right click on an empty window. The SS was set to random, which could account for the extra time.

Thanks for your insight. I still think a 64 bit executable would speed things up significantly. Sixteen registers plus SSE3 ought to make a real difference.
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 17432 - Posted: 25 Nov 2005, 20:15:55 UTC - in response to Message 17430.  

> I still think a 64 bit executable would speed things up significantly. Sixteen registers plus SSE3 ought to make a real difference.

I agree, and it does speed things up considerably (based on testing on sulphur alpha 6-7 months ago), but there is now some concern that the results may be different enough in 64bit compared to 32bit that it might screw up model comparisons. Then there may also be automated distribution problems where BOINC has to be smart enough to download the right hadsm executable based on whether a PC is running a 64bit OS or not. But I don't know if BOINC can do that.
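The selection step described above can be sketched roughly as follows. This is not BOINC's mechanism, only an illustration of choosing a build from what the host reports. The executable file names and both helper functions are hypothetical; `platform.machine()` is the standard-library call that returns the machine type reported by the OS.

```python
import platform
from typing import Optional

# Hypothetical file names, invented for this example only.
EXECUTABLES = {True: "hadsm_64bit", False: "hadsm_32bit"}


def is_64bit(machine: str) -> bool:
    """Return True for the common x86-64 machine strings."""
    return machine.strip().lower() in {"x86_64", "amd64"}


def pick_executable(machine: Optional[str] = None) -> str:
    """Choose a build from a machine string; defaults to the local host."""
    if machine is None:
        machine = platform.machine()
    return EXECUTABLES[is_64bit(machine)]


print(pick_executable())          # depends on the machine running this
print(pick_executable("i686"))    # a 32 bit x86 host
print(pick_executable("x86_64"))  # a 64 bit x86 host
```

A 64 bit CPU running a 32 bit OS typically reports a 32 bit machine string, so a check like this follows the OS, which is the distinction raised in the post.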
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 17433 - Posted: 25 Nov 2005, 20:22:36 UTC

Another possible slowdown in your system...

If this is a socket 754 3200+, and you have a VIA chipset motherboard, it may be that the memory is running at 333 MHz instead of 400 MHz. Some VIA chipset motherboards would automatically decrease memory speed to 333 MHz if 1 GB or more of memory was installed. CPDN is highly dependent on memory bandwidth and latency, and anything that slows those down also slows CPDN down.
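To put a number on the 333 versus 400 MHz difference, here is a minimal sketch of theoretical peak bandwidth. It assumes the quoted figures are effective DDR transfer rates (DDR333 and DDR400) on a single 64 bit memory channel, as on socket 754. Real sustained bandwidth is lower, and latency is not modelled at all.

```python
BUS_WIDTH_BYTES = 8  # one 64 bit DDR memory channel moves 8 bytes per transfer


def peak_bandwidth_gb_s(mega_transfers_per_s: float, channels: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes per second)."""
    return mega_transfers_per_s * 1e6 * BUS_WIDTH_BYTES * channels / 1e9


ddr400 = peak_bandwidth_gb_s(400)
ddr333 = peak_bandwidth_gb_s(333)
loss_fraction = 1.0 - ddr333 / ddr400

print(f"DDR400: {ddr400:.2f} GB/s")
print(f"DDR333: {ddr333:.2f} GB/s")
print(f"Peak bandwidth lost: {loss_fraction:.1%}")
```

Under these assumptions the downclock costs roughly a sixth of the peak bandwidth, which is large for a memory-bound model.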
old_user27607
Joined: 28 Oct 04
Posts: 64
Credit: 34,444,555
RAC: 0
Message 17449 - Posted: 26 Nov 2005, 15:08:18 UTC - in response to Message 17433.  

> Another possible slowdown in your system...
>
> If this is a socket 754 3200+, and you have a VIA chipset motherboard, it may be that the memory is running at 333 MHz instead of 400 MHz. Some VIA chipset motherboards would automatically decrease memory speed to 333 MHz if 1 GB or more of memory was installed. CPDN is highly dependent on memory bandwidth and latency, and anything that slows those down also slows CPDN down.


Actually, it is a 939 socket 3200 with 2x512 MB 400MHz memory.
In any case, the screen saver was the culprit, and time on the A64 in 64 bit mode has dropped to 11:32 from 14 hours. This is very close to the W2K speed, maybe a bit faster, with the same processor and memory.

One question: If CPDN is sensitive to memory, is it also sensitive to L2 cache size?

Thanks to the authors of both comments.
BillN

old_user27607
Joined: 28 Oct 04
Posts: 64
Credit: 34,444,555
RAC: 0
Message 17450 - Posted: 26 Nov 2005, 15:27:25 UTC - in response to Message 17432.  
Last modified: 26 Nov 2005, 15:28:12 UTC

> I still think a 64 bit executable would speed things up significantly. Sixteen registers plus SSE3 ought to make a real difference.

> I agree, and it does speed things up considerably (based on testing on sulphur alpha 6-7 months ago), but there is now some concern that the results may be different enough in 64bit compared to 32bit that it might screw up model comparisons. Then there may also be automated distribution problems where BOINC has to be smart enough to download the right hadsm executable based on whether a PC is running a 64bit OS or not. But I don't know if BOINC can do that.

> "Results may be different enough in 64bit compared to 32bit..."
This could be bad news. I've done some Fortran number crunching, though nothing on the scale of CPDN's 500K line monster. Clearly there is a difference in 32 and 64 bit accuracy, rounding and truncation. Tracking down the reason for the difference could be a career, not a job. :-{

The first step would be to run identical tests in AMD, Intel and your supercomputer to see where the differences lie in each section. Then compare the hardware rounding and truncation, plus over/underflow handling to establish which approach gives the most accurate calculation. Other possibilities include differences in compiler optimizations between 32 and 64 bit modes and undiscovered hardware bugs in the new 64 bit chips. This could be *really* tough to find.
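One mechanism behind such differences is that floating-point addition is not associative, so a compiler that reorders operations between builds can change the low-order bits of a result. The sketch below shows this in ordinary double precision. It does not reproduce any particular 32 or 64 bit code path and is only an illustration of the general effect.

```python
a, b, c = 0.1, 0.2, 0.3

# The same three numbers, added in two different orders.
left_first = (a + b) + c
right_first = a + (b + c)

print(repr(left_first))
print(repr(right_first))
print("identical:", left_first == right_first)
print("difference:", abs(left_first - right_first))
```

The difference is about one unit in the last place here, but in a long model run such discrepancies feed back into every later timestep.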

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 17453 - Posted: 26 Nov 2005, 17:09:09 UTC - in response to Message 17449.  

> One question: If CPDN is sensitive to memory, is it also sensitive to L2 cache size?

Yes, somewhat. A socket 754 Athlon64 3400+ (2.4 GHz, 512K L2) ran sulphur in about 3.05 s/TS while a socket 754 3700+ (2.4 GHz, 1MB L2) ran it in about 2.95 s/TS. This was on the same type of motherboard with the same memory and memory timings.
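For scale, the two s/TS figures above can be turned into whole-run times. This is a minimal sketch that assumes the 10802 timestep sulphur run mentioned earlier in the thread; the function name is illustrative.

```python
TIMESTEPS = 10802  # sulphur run length quoted earlier in the thread


def run_hours(s_per_ts: float, timesteps: int = TIMESTEPS) -> float:
    """Total run time in hours for a given seconds-per-timestep rate."""
    return s_per_ts * timesteps / 3600.0


hours_512k = run_hours(3.05)  # 3400+, 512K L2
hours_1mb = run_hours(2.95)   # 3700+, 1MB L2

minutes_saved = (hours_512k - hours_1mb) * 60.0
percent_faster = (3.05 - 2.95) / 3.05 * 100.0

print(f"512K L2: {hours_512k:.2f} h")
print(f"1MB L2:  {hours_1mb:.2f} h")
print(f"Saved:   {minutes_saved:.0f} min ({percent_faster:.1f}%)")
```

On these numbers, doubling the L2 cache is worth about a quarter of an hour per run.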

A big boost to memory performance, and CPDN speed, is being able to run the memory at 1T command rate. Yours is probably doing that already, but if not, I'd give it a try.
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 17454 - Posted: 26 Nov 2005, 17:16:45 UTC - in response to Message 17450.  

> This could be bad news. I've done some Fortran number crunching, though nothing on the scale of CPDN's 500K line monster. Clearly there is a difference in 32 and 64 bit accuracy, rounding and truncation. Tracking down the reason for the difference could be a career, not a job. :-{
>
> The first step would be to run identical tests in AMD, Intel and your supercomputer to see where the differences lie in each section. Then compare the hardware rounding and truncation, plus over/underflow handling to establish which approach gives the most accurate calculation. Other possibilities include differences in compiler optimizations between 32 and 64 bit modes and undiscovered hardware bugs in the new 64 bit chips. This could be *really* tough to find.

We've done informal comparisons previously. You might find that in the climateprediction.net science forum here, and/or in the phpBB forum. This was done back in the January-February timeframe, when PeteB and I ran identical models on different PCs. Pete ran it on two different P4s, and I ran it on an Athlon64. All the temperature time series were different from each other, albeit not by very much. The long term means were very close to each other, but the seasonal details differed a little. Pete reran it on one of his P4s, and it was slightly different again. The investigators don't seem bothered by those small differences. I'm not sure if the 64 bit would make a larger difference or not.
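The pattern described here, where individual time series differ but long term means agree, is typical of chaotic systems. The sketch below uses the logistic map as a toy stand-in, which is an assumption made purely for illustration and has nothing to do with the actual climate model. Two runs start a tiny distance apart, standing in for a rounding difference between machines.

```python
def logistic_series(x0: float, r: float = 3.9, steps: int = 20000) -> list:
    """Iterate the logistic map x -> r * x * (1 - x) and return every value."""
    values = []
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


run_a = logistic_series(0.3)
run_b = logistic_series(0.3 + 1e-10)  # tiny perturbation, like a rounding difference

# Largest point-by-point gap between the two runs.
max_gap = max(abs(a - b) for a, b in zip(run_a, run_b))
mean_a = sum(run_a) / len(run_a)
mean_b = sum(run_b) / len(run_b)

print(f"largest gap between runs: {max_gap:.3f}")
print(f"mean of run A: {mean_a:.4f}")
print(f"mean of run B: {mean_b:.4f}")
```

The two runs soon stop tracking each other step by step, while their averages stay close, which mirrors the informal comparison in the post.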