Questions and Answers : Unix/Linux : MPICH version?
Author | Message |
---|---|
Send message Joined: 28 Jan 05 Posts: 7 Credit: 14,244 RAC: 0 |
G'day all Has anyone out there in LinuxLand got BOINC on a Beowulf with MPICH (or similar)? I've got a little cluster I could use for this gig when I'm not running Nbody sims :) Cheers Steve |
Send message Joined: 15 Nov 04 Posts: 19 Credit: 35,499 RAC: 0 |
I do not have a Beowulf but I was looking into running this beast on an openMosix.org cluster. I have started a bit of discussion on the NG comp.distributed and the boinc ML@berkeley. Quite understandably the devs said they have other priorities first and unfortunately I am not enough of a programmer to do this myself. But I am certainly interested to hear from you on how you come along. PS: openMosix has the advantage that it should run Boinc OOTB and you can add and remove clients at will. Have a look and let us know what you think. |
Send message Joined: 28 Jan 05 Posts: 7 Credit: 14,244 RAC: 0 |
G'day Leggewie Yeah. I've got the option of switching to OpenMOSIX (good to have your own cluster ;) It's been on the cards as something to look at. I think you're right about Boinc OOTB. I'm not sure if it'd be worth the complication though. Might just be as easy to let each node run its own model :/ Understandable on the dev's behalf too. I was curious as a distributed wx model could be a nice app for a Beowulf. ...All those cells to be processed and all ;) I'll post back on this forum if I take it further :) Cheers Steve |
Send message Joined: 15 Nov 04 Posts: 19 Credit: 35,499 RAC: 0 |
> right about Boinc OOTB. I'm not sure if it'd be worth the complication though. > Might just be as easy to let each node run its own model :/ What I do like about the idea, even without being able to run just a single model on the cluster, is that it becomes possible to have something like a "boinc server" where processes are started. Then you can add and remove nodes from the cluster (for example computers that only run during the day) and the "server" distributes the load across the cluster. As far as complications go, that does not seem to be such a biggy. AFAIK it does not involve much more than patching and recompiling the kernel. > I'll post back on this forum if I take it further :) Please do! |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
quote .... does not seem to be such a biggy. Over 1 million lines of fortran developed and evolved over 20-30 years by a long line of scientists / programmers, and running on a 64 bit supercomputer. And then ported to desktops. If you're comfortable working with that sort of thing, Oxford Uni had a vacancy advertised (http://www.climateprediction.net/newsb.php?id=4) in mid-November. Are you going to be providing your own computer / compiler? And a lot of users are desperate to get hold of a 64 bit version for AMD and Intel processors if you have the time. Les |
Send message Joined: 28 Jan 05 Posts: 7 Credit: 14,244 RAC: 0 |
The complications in this case relate to having more than one processor act on the same model. You need to divvy up the work amongst the processors to try and keep them all busy at the same time. As it explains in the excellent intro on this site, your typical wx model divides the atmosphere into "cells". You then apply the required physics to those cells and for fun - also add some neighbourly cell interaction. That's the "magic" of a parallel wx model. Multiple processors each working on their own cells and sharing the results with the cells' neighbours. As I understand it, OpenMOSIX environments work best when the app and the data can be readily separated. Something I suspect might not be the case in the "vanilla" CPnet model. I think that maybe we're looking for a smart scheduler. The workflow is not dissimilar to how they render 3DCG movies - multiple compute nodes talking to a central data store. If a node is free it asks the scheduler for a model to work on and then mounts the relevant model's data. That node then "restarts" the model and adds to the data. No biggie if a node bombs... it's the equiv of an unexpected hup. I believe the model app can cope with such problems without too much drama. ...shouldn't require any mods to the core wx model... But then again, I'm sure this idea isn't anything new. Most likely it's either been done or dismissed in this or other BOINC projects. Stevo |
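The "cells plus neighbourly interaction" idea in the post above can be sketched in a few lines. This is a toy illustration only, not the CPDN model: the 1-D ring of cells, the smoothing rule, and the function names `step_serial`, `split` and `step_decomposed` are all assumptions made up for the example. It shows that if each worker owns a chunk of cells and borrows one boundary ("halo") cell from each neighbouring worker, the result matches a single-processor run.

```python
# Toy 1-D "atmosphere": each cell relaxes toward the mean of its neighbours.
# Hypothetical illustration only; not the CPDN model's physics or data layout.

def step_serial(cells):
    """One time step over the whole domain (periodic boundaries)."""
    n = len(cells)
    return [
        0.5 * cells[i] + 0.25 * (cells[i - 1] + cells[(i + 1) % n])
        for i in range(n)
    ]

def split(cells, workers):
    """Divide the domain into equal contiguous chunks, one per worker.

    Assumes len(cells) is divisible by workers.
    """
    size = len(cells) // workers
    return [cells[w * size:(w + 1) * size] for w in range(workers)]

def step_decomposed(chunks):
    """One time step where each worker sees only its chunk plus two halo cells."""
    k = len(chunks)
    new_chunks = []
    for w, chunk in enumerate(chunks):
        # "Halo exchange": fetch one boundary cell from each neighbouring worker.
        left_halo = chunks[(w - 1) % k][-1]
        right_halo = chunks[(w + 1) % k][0]
        padded = [left_halo] + chunk + [right_halo]
        new_chunks.append([
            0.5 * padded[i] + 0.25 * (padded[i - 1] + padded[i + 1])
            for i in range(1, len(padded) - 1)
        ])
    return new_chunks

if __name__ == "__main__":
    cells = [float((i * i) % 7) for i in range(12)]
    chunks = split(cells, 3)
    for _ in range(5):
        cells = step_serial(cells)
        chunks = step_decomposed(chunks)
    flat = [c for chunk in chunks for c in chunk]
    print("decomposed run matches serial run:", flat == cells)
```

In a real MPI code the two halo lookups would be messages between processes on every time step, which is why the workers have to stay in lock step and why the data cannot simply be separated from the app.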
Send message Joined: 28 Jan 05 Posts: 7 Credit: 14,244 RAC: 0 |
Oh and in case you really want to do your head in... Check out PUMA - Portable University Model of the Atmosphere I spent many an enjoyable rainy day watching my martian model shake itself apart =) http://puma.dkrz.de/puma/ Cheers Stevo |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Look at Top Teams, then the team at top of the list, then the person at the top of THAT list, then at his computers. 222 computers. No special programming, no clusters, just LOTS of machines. Les |
Send message Joined: 28 Jan 05 Posts: 7 Credit: 14,244 RAC: 0 |
Oh yes indeedy. This relates to the thread how? I was looking for technical input on a parallel version of the model. S |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
quote: This relates to the thread how? I was looking for technical input on a parallel version of the model. NO parallel version! At all! Only single computers, a lot of them with HT. Closest you'll get to fast processing is the person with the 3 computers at the top of the Top Hosts list. Servers with 8 HT processors; so, 16 models at a time. Les |
Send message Joined: 15 Nov 04 Posts: 19 Credit: 35,499 RAC: 0 |
> quote > .... does not seem to be such a biggy. > > Over 1 million lines of fortran developed and evolved over 20-30 years by a > long line of > scientists / programmers, and running on a 64 bit supercomputer. And then > ported to desktops. > > If you're comfortable working with that sort of thing, Oxford Uni had a vacancy advertised > (http://www.climateprediction.net/newsb.php?id=4) in mid-November. Les Bayliss, if you quote, please try to understand what I write and please do not quote out of context! Thank you. What I quoted and what I was referring to is the following from steve_vmwx: > right about Boinc OOTB. I'm not sure if it'd be worth the complication though. > Might just be as easy to let each node run its own model :/ Steve said that it might not be worth the complication for him to run Boinc on openMosix OOTB, i.e. as it is today. I *specifically* said that even *without* changes to Boinc and/or CPDN to make it run just one model per cluster instead of one per node, there are some benefits to be had even today and that installing an openMosix cluster does not involve much more than recompilation of the kernel. This *clearly* relates to benefits vs. costs of switching with what is available software-wise ATM. In any case, getting a program to run on openMosix is still much easier than with one of the other cluster types. CPDN can be run on it ATM and OOTB, it just does not yet benefit too much from it. |
Send message Joined: 15 Nov 04 Posts: 19 Credit: 35,499 RAC: 0 |
> As I understand it, OpenMOSIX environments works best when the app and the > data can be readily separated. Where do you infer that from? I am not sure that is the case. Take a look at http://howto.x-tend.be/openMosixWiki/index.php/FAQ, especially "Generally, how do I write an openMosix-aware program?". |
Send message Joined: 28 Jan 05 Posts: 7 Credit: 14,244 RAC: 0 |
Sorry Leggewie, it's been a while since I looked into OpenMOSIX. You're probably right :) I just vaguely recall that there are some apps that use a memory or IO pattern that won't "migrate". If push comes to shove I'll have to do more homework ;) With the source available anybody with the skills can have a go at it. I'm keeping an email "eye" on this thread so we'll see if anyone takes the bait. BTW, *I* got your context re "not a biggie" and agree that giving it a burl isn't a hard ask. Cheers Steve |
Send message Joined: 9 Dec 04 Posts: 3 Credit: 10,123 RAC: 0 |
Having used openmosix in the past, I remember that anything using shared memory doesn't migrate. That includes things like 'X' (the core server) as well as mozilla. Not sure whether boinc would migrate. You'd still be limited to running N separate boinc processes on your 'central' machine and hoping they'd migrate to N other machines. No speedup over logging on to each machine manually and starting boinc on each one. OpenMosix only migrates work to other machines at the process level (not e.g. thread level) as I recall. So since boinc is one process, that still hogs 100% of 1 machine. The only advantage I see (if you can get it to work) is ease of administration - log into a single machine and control many. But then, couldn't you achieve all that with a few 'ssh remotemachine /usr/local/boinc/boinc_xxxx' commands. Will Smith Banbury, UK |
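The "few ssh commands" alternative from the post above could be scripted roughly as follows. A minimal sketch, with assumptions: the host names `node01` to `node03` are hypothetical, the remote path is copied from the post with `boinc_xxxx` left elided exactly as written, and the script only builds and prints the commands rather than running them.

```python
import shlex

# Path quoted from the post above; "boinc_xxxx" is left elided as written.
BOINC_CMD = "/usr/local/boinc/boinc_xxxx"

def ssh_commands(hosts, remote_cmd=BOINC_CMD):
    """Return one argv list per host, in a form subprocess.run() would accept."""
    return [["ssh", host, remote_cmd] for host in hosts]

if __name__ == "__main__":
    # Hypothetical node names; substitute the cluster's real hosts.
    for argv in ssh_commands(["node01", "node02", "node03"]):
        print(shlex.join(argv))
```

Each node still runs its own independent model this way, so it gives more throughput, not a faster single model, which matches the point made in the post.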
Send message Joined: 28 Jan 05 Posts: 7 Credit: 14,244 RAC: 0 |
Thanks Will. Always good to have confirmation from someone that actually used the thang! (OpenMOSIX). I'll wait for the MPI port ;) Cheers Stevo |
Send message Joined: 15 Nov 04 Posts: 19 Credit: 35,499 RAC: 0 |
> You'd still be limited to running N separate boinc processes on your 'central' > machine and hoping they'd migrate to N other machines. No speedup over > logging on to each machine manually and starting boinc on each one. > > OpenMosix only migrates work to other machines at the process level (not e.g. > thread level) as I recall. So since boinc is one process, that still hogs > 100% of 1 machine. > > The only advantage I see (if you can get it to work) is ease of administration > - log into a single machine and control many. But then, couldn't you achieve > all that with a few 'ssh remotemachine /usr/local/boinc/boinc_xxxx' commands. You made explicit the points I was referring to when saying "CPDN can be run on it ATM and OOTB, it just does not yet benefit too much from it." Your analysis is absolutely correct. There are only a few benefits to be had today, but it would be great if BOINC/CPDN were actually written such that it spawned several processes. How much work would that involve? I have no idea, but I do know that the change could be gradual. And if it does not happen, then it just does not happen. |
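For readers unsure what "spawned several processes" would look like, here is a minimal sketch of one job split across separate operating-system processes, the unit that openMosix is described above as migrating. It is an assumption-laden toy: the workload (a sum of squares), the `CHILD` snippet and the `spawn_workers` function are invented for the example and have nothing to do with how BOINC or CPDN are actually structured.

```python
import subprocess
import sys

# Each child is a separate OS process computing one slice of the work.
# Hypothetical workload: sum of squares over a half-open integer range.
CHILD = (
    "import sys; lo, hi = int(sys.argv[1]), int(sys.argv[2]); "
    "print(sum(i * i for i in range(lo, hi)))"
)

def spawn_workers(total, workers):
    """Split range(total) into slices, run one child process per slice, add up results."""
    size = total // workers
    procs = []
    for w in range(workers):
        lo = w * size
        hi = total if w == workers - 1 else lo + size  # last slice takes the remainder
        procs.append(subprocess.Popen(
            [sys.executable, "-c", CHILD, str(lo), str(hi)],
            stdout=subprocess.PIPE,
            text=True,
        ))
    # All children are already running; now collect each partial result.
    return sum(int(p.communicate()[0]) for p in procs)

if __name__ == "__main__":
    print(spawn_workers(1000, 4) == sum(i * i for i in range(1000)))
```

The slices here are independent, which is the easy case. A weather model's cells depend on their neighbours at every step, so a multi-process version would also need the processes to exchange data as they go.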
©2024 cpdn.org