New CPDN Software Version (5.15) On Site
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
I have put up a new version that hopefully fixes some bugs in the crash recovery; it is on both the "original" and BBC CPDN sites. You will only get it if you are a new user, or if your model crashes or is aborted and you then get it. If you have been running the model fine, then please continue and don't bother abandoning your run to get this new version. This version will also be able to run shorter workunits (whenever we make some!): the 160-year workunits are a bit much, so we will probably go to 80-year runs (1920-2000 and 2000-2080) and even 40-year runs (starting 1960, 2000, 2040) in the future! |
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
Hm, I got two 160-year models. When will the shorter ones come out? |
Joined: 1 Sep 04 Posts: 24 Credit: 10,865,773 RAC: 0 |
"This version will also be able to run shorter workunits (whenever we make some!): the 160-year workunits are a bit much, so we will probably go to 80-year runs (1920-2000 and 2000-2080) and even 40-year runs (starting 1960, 2000, 2040) in the future!" Will it be possible to opt for a particular length of model run? Preferably on a per-machine basis... I've got a couple of slightly lower-speed machines (2GHz+ P4) which take ages to complete a full 160-year model run. I'd be more than happy to run CPDN on those, but I'd prefer to have WUs that take less than 3 months to complete. I'd run the SAP experiment on those, but unfortunately they tend to have less than 1 GB of RAM :( On the other hand, I'm willing to run the long model runs on some other (faster) machines. Metod ... |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Running the existing 160-year model is effectively just the same as running four sequential 40-year models: at 1960, 2000, and 2040 the model uploads a 'restart dump' to the server, so you can abort the model after the upload if you need to. Similarly, if you run the long model but it crashes after 50 model years, the first 40 years will have been picked up and can form the basis for a 1960-2080 run at some point in the future. As well as the restart dump, a climate summary is uploaded at the end of each model year and model decade. I'm a volunteer and my views are my own. News and Announcements and FAQ |
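The segment arithmetic described in the post above can be sketched in a few lines of Python. This is only an illustration: the years (1920-2080, restart dumps at 1960, 2000 and 2040) come from the thread, while the function names and the assumption that a dump lands exactly every 40 model years are hypothetical.

```python
# Hypothetical illustration of the restart-dump arithmetic from the thread.
# The years are taken from the posts; the function names are made up.
RUN_START = 1920      # first model year of the 160-year run
RUN_END = 2080        # last model year of the 160-year run
SEGMENT_YEARS = 40    # assumed spacing between restart dumps


def restart_years(start=RUN_START, end=RUN_END, step=SEGMENT_YEARS):
    """Model years at which a restart dump is uploaded."""
    return list(range(start + step, end, step))


def salvaged_years(model_years_completed, step=SEGMENT_YEARS):
    """Model years covered by the last restart dump before a crash."""
    return (model_years_completed // step) * step


def resume_year(model_years_completed, start=RUN_START, step=SEGMENT_YEARS):
    """Model year from which a follow-up run could start."""
    return start + salvaged_years(model_years_completed, step)


print(restart_years())   # dump years for the full run
print(resume_year(50))   # crash after 50 model years
```

For the example in the post, a crash after 50 model years leaves 40 salvaged years, so a later run could start from 1960.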
Joined: 4 Sep 04 Posts: 61 Credit: 80,585 RAC: 0 |
Running Slackware 10.2 here, I awoke to find that a couple of workunits had downloaded and subsequently crashed/aborted: `2006-08-17 08:02:54 [climateprediction.net] Unrecoverable error for result hadcm3lbm_9hxy_25212425_1 (<file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_1.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_2.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_3.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_4.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_5.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_6.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_7.zip</file_name> <error_code>-161</error_code> </file_xfer_error> <file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_8.zip</file_name> <error` The application was hadcm3lb 5.15. I got it as an older user who needed a new workunit. The odd thing is that while BOINC's stdout indicated ~49-50 min. of run time, the CPU usage is shown as 0. |
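A long error line like the one above is easier to read once the file names and error codes are pulled out. The sketch below is one possible way to do that with a regular expression; the function name is hypothetical, the sample is a shortened copy of the log from the post, and nothing here interprets what error code -161 means.

```python
import re
from collections import Counter

# Matches one <file_name>...</file_name> <error_code>...</error_code> pair.
FILE_ERROR = re.compile(
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>"
)


def parse_transfer_errors(log_text):
    """Return (file_name, error_code) tuples found in a BOINC message line."""
    return [(m.group("name"), int(m.group("code")))
            for m in FILE_ERROR.finditer(log_text)]


# Shortened sample from the post; the last entry is cut off, as in the original.
sample = (
    "<file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_1.zip</file_name> "
    "<error_code>-161</error_code> </file_xfer_error> "
    "<file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_2.zip</file_name> "
    "<error_code>-161</error_code> </file_xfer_error> "
    "<file_xfer_error> <file_name>hadcm3lbm_9hxy_25212425_1_8.zip</file_name> <error"
)

errors = parse_transfer_errors(sample)
counts = Counter(code for _, code in errors)
for name, code in errors:
    print(name, code)
```

Entries that are cut off before their error code, like the last one in the sample, are simply skipped rather than guessed at.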
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Slackware seems to be a bit of a problem for the program, and I don't think that anyone's been able to work out why. Keep an eye on this thread though; someone may be able to help. |
Joined: 4 Sep 04 Posts: 61 Credit: 80,585 RAC: 0 |
Okay. Thanks, Les. Are there other models being distributed besides this one? If not, I might as well detach from the project as it doesn't seem to make any sense to blow up a whack of workunits in the hope of getting something I can run. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
No, the Coupled Ocean models are IT! They are the end result towards which the project has been slowly moving all these years. You could try SAP, the Seasonal Attribution Project, but if your computer can't handle even the start of a climate model, it probably won't do any good there either. But you can at least have a read about it here. It's where I and a lot of others went after the special 'spinup' project finished. It is one year long, high resolution, and needs in excess of 800 MB of RAM to do any good. There are rumors of other 'spinoff' projects like it in the future: a hi-res model for some part of the world, with normal lo-res everywhere else. |
Joined: 4 Sep 04 Posts: 61 Credit: 80,585 RAC: 0 |
Ah, well. I don't know what to do then. I've happily crunched other workunits on this box. I'll take a look at SAP. Cheers, gang. trane |
Joined: 4 Sep 04 Posts: 61 Credit: 80,585 RAC: 0 |
SAP's running fine, at least 6 min. into the workunit. I'm a happy camper again. :) |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Fingers crossed then. :) One oddity (mentioned in a couple of threads): when the model gets to 65%-70%, you get a trickle_up every time the model checkpoints, instead of every 5th checkpoint, although they don't show on the server. It's just one of those things, so ignore it. |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
One other thing to remember with SAP: the checkpoint interval is quite long (40-50 minutes on my PCs), so it's important to have 'keep in memory' set to yes in the BOINC general preferences. I'm a volunteer and my views are my own. News and Announcements and FAQ |
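To see why a long checkpoint interval makes 'keep in memory' matter, here is a rough back-of-the-envelope sketch. It rests on assumptions that are not stated in the thread: that a model removed from memory has to restart from its last checkpoint, and that suspensions happen at random points within a checkpoint interval. The function names and the example figure of four suspensions are hypothetical.

```python
def worst_case_lost_minutes(checkpoint_interval_min, suspensions):
    """Upper bound: every suspension lands just before the next checkpoint."""
    return checkpoint_interval_min * suspensions


def average_lost_minutes(checkpoint_interval_min, suspensions):
    """Average under the assumption that suspensions fall uniformly at random
    within a checkpoint interval, so about half an interval is lost each time."""
    return checkpoint_interval_min * suspensions / 2


# Hypothetical example: a 45-minute checkpoint interval (the middle of the
# 40-50 minute range quoted above) and four suspensions without 'keep in memory'.
print(worst_case_lost_minutes(45, 4))
print(average_lost_minutes(45, 4))
```

With the setting enabled, a suspended model stays in memory and, under the same assumptions, none of this work would need to be redone.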
Joined: 25 Nov 05 Posts: 11 Credit: 870,090 RAC: 0 |
"Slackware seems to be a bit of a problem for the program, and I don't think that anyone's been able to work out why." It doesn't seem to like Damn Small Linux either. I will do more investigating and see if I can find something useful to report. |
Joined: 25 Nov 05 Posts: 11 Credit: 870,090 RAC: 0 |
The models are timing out at 720.0 seconds and rewinding. |
Joined: 4 Sep 04 Posts: 61 Credit: 80,585 RAC: 0 |
Yep, that's exactly what I was seeing in my logs, too. I gave up and moved to SAP here: http://attribution.cpdn.org/ Works like a champ on Slackware, so whatever problem plagues this new CPDN model hasn't (yet) hit the SAP variation. |
Joined: 25 Nov 05 Posts: 11 Credit: 870,090 RAC: 0 |
Unfortunately that's not an option for me; my boxes don't have 1 GB of RAM :( I've got it to run on a box with a 2.6.x kernel. Were your boxes running 2.4.x kernels? |
Joined: 4 Sep 04 Posts: 61 Credit: 80,585 RAC: 0 |
My box only has 768 megs. It seems fine with SAP. And, yes, I'm running the 2.4.31 kernel here. That might be an issue. |
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
It runs fine here with Zenwalk Linux, which is based on Slackware. I don't know the differences though; it has a 2.6.x kernel anyway. I had SAP running fine with 512 MB of memory and Fedora Core 2, with some services turned off and no desktop. Memory usage topped out at around 430 MB, if I remember right. |
Joined: 25 Nov 05 Posts: 11 Credit: 870,090 RAC: 0 |
"It runs fine here with Zenwalk Linux, which is based on Slackware. I don't know the differences though; it has a 2.6.x kernel anyway." Does anybody have this running successfully on a 2.4.x kernel? |
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
Yep, it runs fine with Red Hat 8 with a 2.4.x kernel. Edit: the Linux client is built on Red Hat 7.3. |
©2024 cpdn.org