Questions and Answers : Getting started : Cluster startup: no more work units
Author | Message |
---|---|
Joined: 28 Sep 07 Posts: 1 Credit: 215,245 RAC: 0 |
Hi all: I'm bringing a new computational cluster online, and I wanted to "stress-test" it by running BOINC/climateprediction.net on it for a while. We have 128 processors on this cluster, so I set it up to run climateprediction.net on all of them. It got up to about 37 before all of the remaining processors started returning:

Fri 28 Sep 2007 11:26:22 AM PDT|climateprediction.net|Requesting 60480 seconds of new work
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Scheduler RPC succeeded [server version 509]
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Message from server: No work sent
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Deferring communication for 44 min 36 sec
Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Reason: no work from project

My tests require that I fully load the cluster... what can I do? I have 4 nodes (8 cores/node) fully loaded, and a few more with only a single thread running. Is there a setting I can change to allow more work units? Did I really empty the work unit queue for the project?

Thanks!
--Jim

Linux cluster running ROCKS, 24 nodes with 2 quad-core 2.33 GHz CPUs each (This is for the Lab for Atmospheric Research when it goes fully online, so I thought it appropriate to test with climateprediction.net) |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Hi,

The Server Status page says that there is still more work to give out. You might be able to get more models on the following day, and so forth (I think there is a limit to the number of models you can pick up per 24-hour period on a given host).

You can also pick up more work units by connecting to the SAP project (http://attribution.cpdn.org). SAP is a shorter-duration but higher-resolution model. It takes around 450 MB of RAM per model, and should take around 11 days to run on the Quad Xeon. They have a limited number of work units remaining, so please only attach to it if you plan to run the models to completion.

I'm not sure if APS has work to give out, but if they do, connect to http://apsathome.org/ - APS is hosted at the University of Manchester and they run atmospheric physics jobs (typically lasting 30 minutes to 5 hours each).

Looks like an interesting machine... how long are you stress testing it? Is it long enough to complete the models, or will they be discarded? (If the latter, running shorter work units would be better.)

I prefer to use Prime95's torture test for short-term stress testing, because the climate projects try to silently recover from errors, whereas the GIMPS project has a mode specifically designed to tell the user if a floating point error occurred. My Linux host is running mprime right now to check whether the PC is stable enough to run the climate project. Note that you'll need to use the -An flag to assign each job to a specific core (n = 0-7), then select 'N' for 'join gimps', '16' for 'set CPU options' (800 for 800MB per job), and finally '18' for 'torture test'.

I'm a volunteer and my views are my own. |
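For anyone checking many nodes at once, the client messages quoted in the question can be summarised with a short script. This is only a rough sketch: it assumes each message has the `timestamp|project|message` shape shown in the quoted log, the function names (`parse_line`, `summarize`) are made up for illustration, and the optional "hr" part of the deferral text is a guess rather than something seen in this thread.

```python
import re
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: str
    project: str
    message: str


# Lines copied from the question above.
SAMPLE = [
    "Fri 28 Sep 2007 11:26:22 AM PDT|climateprediction.net|Requesting 60480 seconds of new work",
    "Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Scheduler RPC succeeded [server version 509]",
    "Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Message from server: No work sent",
    "Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Deferring communication for 44 min 36 sec",
    "Fri 28 Sep 2007 11:26:27 AM PDT|climateprediction.net|Reason: no work from project",
]

REQUEST_RE = re.compile(r"Requesting (\d+) seconds of new work")
# The "hr" group is an assumption; the thread only shows "min" and "sec".
DEFER_RE = re.compile(
    r"Deferring communication for (?:(\d+) hr\s*)?(?:(\d+) min\s*)?(?:(\d+) sec)?"
)


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one 'timestamp|project|message' line; return None if malformed."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    return LogEntry(*(p.strip() for p in parts))


def summarize(lines):
    """Collect the work request size, the no-work flag and the deferral time."""
    summary = {"requested_seconds": None, "no_work": False, "defer_seconds": None}
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip anything that is not a client message
        m = REQUEST_RE.search(entry.message)
        if m:
            summary["requested_seconds"] = int(m.group(1))
        if "No work sent" in entry.message:
            summary["no_work"] = True
        m = DEFER_RE.search(entry.message)
        if m:
            hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
            summary["defer_seconds"] = hours * 3600 + minutes * 60 + seconds
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running this over the messages from each node would show which hosts were refused work and when they will next contact the scheduler. It does not change how much work the project hands out.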
©2025 cpdn.org