Message boards : Number crunching : Time taken anomaly.
Author | Message |
---|---|
Joined: 15 May 09 Posts: 4540 Credit: 19,008,987 RAC: 21,524 |
Weather At Home 2 (wah2): unsent 0, in progress 23500, runtime of last 100 tasks 319.68 h average (min 2.46 h, max 835.45 h), users in last 24 hours 97

Where can I get a computer that will crunch these tasks in under two and a half hours? ;) |
Joined: 31 May 18 Posts: 53 Credit: 4,725,987 RAC: 9,174 |
DARPA perhaps? Area 51? The Trisolarans? |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Buy the fastest single-core CPU you can, overclock it, run a single task only and make the machine as quiet as you can :)

> Weather At Home 2 (wah2) 0 23500 319.68 (2.46 - 835.45) 97
>
> Where can I get a computer that will crunch these tasks in under two and a half hours? ;)

Or cheat: compile the code yourself and enable multicore... :D No idea where that bogus entry in the database came from!

--- CPDN Visiting Scientist |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Ah, I recall now: George had an issue with one task that failed but, for some odd reason, was flagged as 'complete' on the CPDN server. We couldn't understand why, but that's probably what happened here. |
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
> Where can I get a computer that will crunch these tasks in under two and a half hours? ;)

Head down the hall for "Things that we all could have had except they didn't want them," turn right at the 100 mpg carburetors, and it should be on the shelf past the "forever light bulbs." ;)

Given the memory bandwidth requirements of the code (it seems heavy on memory use), I wonder what a port to ARM would do on the Apple M2/M3 chips. Lots of really fast, low-latency memory might be worth the effort! |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
> Given the memory bandwidth requirements of the code (they seem heavy in memory use), I wonder what a port to ARM would do on the Apple M2/M3 chips. Lots of really fast, low latency memory might be worth the effort!

Except that it's shared with the GPU, so the gains are not obvious to my mind. Dedicated fast DDR5 might do better. I don't fancy the development work though, given the low number of Macs attached to CPDN. |
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
> Except that it's shared with the GPU, so the gains are not obvious to my mind. Dedicated fast DDR5 might do better. I don't fancy the development work though, given the low number of Macs attached to CPDN.

It's shared with the GPU, but there's enough bandwidth to service both, and the memory bandwidth is just insane (on the order of 100 GB/s to the cores, and multicore code can make use of most of the couple hundred GB/s of DRAM bandwidth). I don't think discrete DDR5 can touch the performance of the LPDDR5 Apple is using: it sits far closer to the CPU (basically soldered onto the package), with higher-bandwidth, lower-latency links. And the M-series chips have huge caches as well: 192 KB L1I and 128 KB L1D per performance core, 12 or 16 MB of shared L2, and a massive last-level cache on top.

As far as Macs and CPDN go... I'm not surprised there aren't many connected. The only OS X tasks in the past few years have been 32-bit tasks, which aren't supported on any of the last half dozen or so macOS releases. I spun up some VMs of old macOS versions to do the math on those, along with a bunch of other people, but most Apple users are on the latest OS and so simply can't run the tasks that have been available.

I don't know if it would be possible to do a 64-bit ARM build of [something weather-simulationy] on OS X and compare performance on the M-series chips as a spitball test before validating correctness, but I would expect them to be both very fast and quite power-efficient for the work done (especially the efficiency cores). |
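For anyone wanting to try the memory-bandwidth part of that spitball test, here is a minimal C sketch of a STREAM-style "triad" kernel. It is not the official STREAM benchmark, and the function name `triad_bandwidth_gbs` is hypothetical. Assumptions: a POSIX system with `clock_gettime` (Linux and macOS both provide it), arrays much larger than the last-level cache, and a single thread, so it estimates roughly what one core can pull rather than the whole chip's DRAM bandwidth.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime in strict C modes */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Seconds from a monotonic clock. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* STREAM-style triad a[i] = b[i] + s * c[i]: two loads and one store of a
   double per element, counted as 24 bytes of traffic per element (write-allocate
   traffic is ignored, as STREAM also does). Returns the best of `reps` runs in
   GB/s, or -1.0 on allocation failure or a wrong result. */
double triad_bandwidth_gbs(size_t n, int reps) {
    assert(n > 0 && reps > 0);
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    double *c = malloc(n * sizeof *c);
    if (!a || !b || !c) { free(a); free(b); free(c); return -1.0; }

    /* Initialising also touches every page, so page faults are not timed. */
    for (size_t i = 0; i < n; i++) { a[i] = 0.0; b[i] = 1.0; c[i] = 2.0; }

    const double s = 3.0;
    double best = 0.0;
    for (int r = 0; r < reps; r++) {
        double t0 = now_seconds();
        for (size_t i = 0; i < n; i++)
            a[i] = b[i] + s * c[i];
        double t = now_seconds() - t0;
        double gbs = t > 0.0 ? 3.0 * sizeof(double) * (double)n / t / 1e9 : 0.0;
        if (gbs > best) best = gbs;
    }

    int ok = (a[n - 1] == 7.0);  /* 1 + 3 * 2; also keeps the result live */
    free(a); free(b); free(c);
    return ok ? best : -1.0;
}
```

For example, `triad_bandwidth_gbs((size_t)1 << 24, 5)` uses three 128 MiB arrays; compile with optimisation (e.g. `-O2`). Getting closer to the multi-core DRAM figures mentioned above would need one copy of the loop per core (for instance with OpenMP), so treat the single-threaded number as a per-core estimate only.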
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
I'd read a bit about the new Macs but not in any great detail. I did have a Mac Intel build working for OpenIFS a while ago. It was fairly straightforward, as macOS is largely *unix underneath.

Problem is CPDN do not have dedicated developers any more (I'm retired and working for free). Reducing their technical debt, and investing in new models that are likely to attract scientists to the platform, is really where they should be going for the time being. Plus I don't have an Apple silicon machine to develop on :)

--- CPDN Visiting Scientist |
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
> I'd read a bit about the new Macs but not in any great detail.

Apple has done a really, really good job with their M-series CPUs. Regardless of what people may think of the rest of Apple's devices, business practices, etc., their "Apple Silicon" CPUs are genuinely astounding chips. They're ARMv8, with a bunch of custom extensions, and they're the sort of thing ARM refused to build for a long while: just raw, uncorked, 64-bit ARMv8 chips. Apple has bolted huge amounts of cache onto them (192 KB L1I and 128 KB L1D per core, vs. typically 32 KB L1D and 32 KB L1I on x86 chips, which, worse, is shared between hyperthreads), similarly large amounts of L2 and a system-level cache, and then they've got closely coupled LPDDR5 "on package" with the chip, for just gobs of bandwidth. Normally "shared GPU/CPU memory" hurts one or the other, but Apple's design has enough bandwidth to keep both of them quite happy in a range of workloads.

Then they've split the chip into performance and efficiency cores. The performance cores are quite fast, but the efficiency cores are still insanely respectable on basically no power. The total system power consumption is quite low for the performance they turn in. If you're interested in CPU architectures, it's interesting to see what they've done.

But I agree, it's probably not worth a huge amount of development effort right now. I don't know how tied to x86 the models currently in use are: if they hand-code SSE intrinsics or assembly, it's going to be far more work to port them than if they rely on portable vector code (compiler auto-vectorization or generic vector extensions) or are largely non-vectorized. Not having seen the source, I've no idea what it looks like.

> I did have a Mac intel build working for OpenIFS a while ago. It was fairly straightforward as macOS is largely *unix underneath.
Apple also has Rosetta, which is a rather slick way to translate x86 binaries to ARM. The M1s, for a while, were faster at running x86 code than all but the top-end x86 machines, even though they had to translate it to ARM first (yes, I know, certain people: you could get far more x86 cores in a chip at the time, but for single-threaded performance, the competition was strong). If you still have a modern Mac, it might be worth loading the code up and seeing if it'll build for Apple Silicon; there should be a way to build for AS even on an x86 host. Apple has a ton of experience with "fat binaries" that contain code for multiple architectures, and I think the default is to just build for everything supported these days.
Yeah... I don't know how to fix that. My guess is that a lot of the compute that was formerly CPDN-based has gone to various cloud providers with "crack filler" sort of compute pricing, if the big models can handle interruption. I've no idea what the state of the art in simulation is these days. The work to make the Windows tasks more reliable is definitely appreciated, and I'm looking forward to a new pile of Linux tasks this summer to keep my cores busy...

> Plus I don't have an Apple silicon machine to develop on :)

If the performance is there on them, it might be worth getting you a Mac Mini or something to mess around with! My M1 Mini was the best computer I'd ever used, but then Apple decided to spend a year or so in some weeds that drove me away from their ecosystem, and I sold it off at a rather substantial loss. :/ Now I run gutless wonders with QubesOS. But as much as it would be exciting to see, I'm not sure it's the right thing to focus on with current project priorities. Just a dream... |
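On the question of how tied to x86 the models are, a minimal C sketch of the difference between an x86-only hand-coded SSE2 loop and code that stays portable. The `axpy` kernel and `simd_path` helper are hypothetical illustrations, not anything from the CPDN models. Each vector path sits behind the compiler's predefined architecture macros (`__SSE2__`, or `__aarch64__` together with `__ARM_NEON`), and a plain C loop handles the remainder and every other target. That is also the shape that suits a macOS universal ("fat") binary, since each architecture slice is compiled separately and takes its own branch.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* x86 SSE2 intrinsics */
#elif defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>    /* AArch64 NEON intrinsics, e.g. Apple Silicon */
#endif

/* Which vector path this translation unit was compiled with. */
const char *simd_path(void) {
#if defined(__SSE2__)
    return "SSE2";
#elif defined(__aarch64__) && defined(__ARM_NEON)
    return "NEON";
#else
    return "scalar";
#endif
}

/* y[i] = y[i] + a * x[i], two doubles at a time where a vector path exists.
   Both vector paths multiply and then add (no fused multiply-add), intended
   to round identically to each other. */
void axpy(size_t n, double a, const double *x, double *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    size_t i = 0;
#if defined(__SSE2__)
    __m128d va = _mm_set1_pd(a);
    for (; i + 2 <= n; i += 2) {
        __m128d vy = _mm_add_pd(_mm_loadu_pd(y + i),
                                _mm_mul_pd(va, _mm_loadu_pd(x + i)));
        _mm_storeu_pd(y + i, vy);
    }
#elif defined(__aarch64__) && defined(__ARM_NEON)
    float64x2_t va = vdupq_n_f64(a);
    for (; i + 2 <= n; i += 2) {
        float64x2_t vy = vaddq_f64(vld1q_f64(y + i),
                                   vmulq_f64(va, vld1q_f64(x + i)));
        vst1q_f64(y + i, vy);
    }
#endif
    /* Scalar remainder, or the whole loop on targets without a vector path. */
    for (; i < n; i++)
        y[i] += a * x[i];
}
```

On macOS, passing both `-arch x86_64 -arch arm64` to clang compiles the file once per architecture into a single universal binary, and `lipo -info` on the result should list both slices; SSE2 is part of the x86-64 baseline, so that slice always takes the SSE2 branch. One caveat for bit-for-bit comparisons: compilers may contract the scalar `y[i] += a * x[i]` into a fused multiply-add on targets that have one (Apple Silicon does), which rounds once instead of twice and can change the last bit.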
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Poking about in the WAH2 code, I've noticed sections related to earlier macOS builds, plus some notes on steps to build. Might be worth a go. I presume it's no problem to install macOS on a VM these days? Testing on real hardware would be the problem though.

--- CPDN Visiting Scientist |
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
> Poking about in the WAH2 code, I've noticed sections related to earlier macOS builds, plus some notes on steps to build. Might be worth a go. I presume it's no problem to install macOS on a VM these days? Testing on real hardware would be the problem though.

I don't know about recent macOS versions and VMs. There's probably a way, at least on x86. I don't know of any way to do an Apple Silicon VM, though. You might ask in the Mac section and see if anyone would run some of your test binaries for you. I just don't have anything AS left, or I'd happily let you beat on it remotely. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
I'll chat to the Oxford folk and see what they say. Mind you, WaH is still 32-bit, and I don't know if that would complicate it on macOS. |
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
It would, yeah. macOS hasn't supported 32-bit binaries in a LONG while. Though depending on how it's written, you may be able to build it 64-bit and get the same results out; floating-point behaviors haven't changed. I just don't know enough about its internal architecture or what it relies on to be able to guess. |
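One caveat on floating point when moving from 32-bit to 64-bit, offered tentatively because it depends on compiler, platform and flags: on x86, 32-bit builds (e.g. GCC on 32-bit Linux) have traditionally done arithmetic on the x87 unit with 80-bit intermediates, whereas 64-bit builds use SSE2 and round every operation to double, so a straight rebuild is not guaranteed to reproduce results bit for bit. C reports the model in force through `FLT_EVAL_METHOD` in `<float.h>` (0: evaluate in the operands' own type; 2: evaluate in `long double`). The `excess_precision_probe` and `check_probe` functions below are hypothetical illustrations, not part of WaH2.

```c
#include <assert.h>
#include <float.h>   /* FLT_EVAL_METHOD */

/* Computes (a + b) - a with a = 1e16 and b = 1. Doubles near 1e16 are
   spaced 2 apart, so in plain double arithmetic 1e16 + 1 rounds back to
   1e16 and the result is 0.0. If the intermediate sum is kept in x87
   80-bit precision, 1e16 + 1 is exact and the result is 1.0. */
double excess_precision_probe(void) {
    volatile double a = 1e16;   /* volatile prevents compile-time folding */
    volatile double b = 1.0;
    return (a + b) - a;
}

/* Self-check: where the compiler reports plain double evaluation,
   the probe must return 0.0. */
void check_probe(void) {
    if (FLT_EVAL_METHOD == 0)
        assert(excess_precision_probe() == 0.0);
}
```

With `FLT_EVAL_METHOD == 0`, the usual case on x86-64 and arm64, the probe returns 0.0; a 32-bit GCC build using x87 arithmetic (`FLT_EVAL_METHOD == 2` with extended precision) would be expected to return 1.0 instead. On 32-bit GCC builds, `-msse2 -mfpmath=sse` switches to SSE arithmetic and should make the two builds agree on this probe.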
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
It's not that easy, unfortunately. The Fortran models call C++, and shared memory is used for the 3 processes to talk to each other. Would need work to check sizes. OK, looks like it's best to move to 64-bit first. The other option would be to distribute on macOS as a VM. More urgent things to do first. Interesting discussion though. |
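On "would need work to check sizes": a minimal C sketch of one common way to pin down a shared-memory layout so it is identical in 32-bit and 64-bit builds. The struct and field names are hypothetical, not WaH2's. The approach: avoid `long` and raw pointers (their sizes differ between ILP32 and LP64), use fixed-width types, order fields so no implicit padding is needed, and let C11 `static_assert` fail the build if the layout drifts. On the Fortran side, `iso_c_binding` kinds such as `c_int32_t`, `c_int64_t` and `c_double` can describe the same layout in a `bind(c)` derived type.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* fixed-width integer types */

/* Fragile: 'long' and pointers are 4 bytes in 32-bit (ILP32) builds but 8 bytes
   on 64-bit Linux/macOS (LP64), so this struct is 12 bytes in one and 24 in the
   other, and the field offsets move. A pointer stored in shared memory is also
   meaningless to the other processes. */
struct shm_header_fragile {
    int   step;
    long  n_points;
    void *data;
};

/* Safer: fixed-width types, a byte offset instead of a pointer, and every 8-byte
   field at a multiple of 8 so no implicit padding is needed. This matters because
   32-bit x86 Linux lays out int64_t and double fields on 4-byte boundaries inside
   structs, while 64-bit targets use 8-byte boundaries. */
struct shm_header {
    int32_t  step;         /* offset 0 */
    int32_t  status;       /* offset 4 */
    int64_t  n_points;     /* offset 8 */
    uint64_t data_offset;  /* offset 16: byte offset of the data within the segment */
    double   model_time;   /* offset 24 */
};

/* Compile-time checks: any build (32- or 64-bit) that sees a different layout fails. */
static_assert(sizeof(struct shm_header) == 32, "shm_header size changed");
static_assert(offsetof(struct shm_header, n_points) == 8, "n_points moved");
static_assert(offsetof(struct shm_header, data_offset) == 16, "data_offset moved");
static_assert(offsetof(struct shm_header, model_time) == 24, "model_time moved");
```

Compiling the same checks into each C/C++ component means a mismatched build fails to compile rather than silently misreading the shared segment.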
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
> The Fortran models call C++, and shared memory is used for the 3 processes to talk to each other. Would need work to check sizes.

Ew. :( Yeah, that's not going to be trivial, then.
As you're porting to 64-bit, it's worth keeping macOS-isms in mind; may as well prepare for it while you're in there. I don't think distributing macOS as a VM is viable from a licensing perspective, and most hypervisors won't run it out of the box either: there's some hacking around and odd configuration needed to set the environment up right. As much as "old macOS VMs" would be slick, it's probably not really viable. Easier to just get everything into the modern 64-bit world and then look at Apple Silicon support from there. If it's Fortran and C++, and not vectorized, it shouldn't be too hard to get it working over there. But projects for a later date! Or when someone else has the time to throw at it. |
©2024 cpdn.org