climateprediction.net (CPDN) home page
Thread 'New CPDN Software Version (5.15) On Site'

old_user1
Joined: 5 Aug 04
Posts: 907
Credit: 299,864
RAC: 0
Message 23816 - Posted: 1 Aug 2006, 11:30:32 UTC

I have put up a new version that hopefully fixes some bugs in the crash recovery; it is on both the "original" and BBC CPDN sites. You will only get it if you are a new user, or if your current model crashes or is aborted and you are sent a new workunit. If you have been running the model fine, then please continue and don't bother abandoning your run to get this new version.

This version will also be able to run shorter workunits (whenever we make some!) -- i.e. the 160-year workunits are a bit much so we will probably go to 80-year (1920-2000 and 2000-2080) and even 40-year runs (starting 1960, 2000, 2040) in the future!
ID: 23816

Helmer Bryd
Joined: 16 Aug 04
Posts: 156
Credit: 9,035,872
RAC: 2,928
Message 23820 - Posted: 1 Aug 2006, 22:45:52 UTC
Last modified: 1 Aug 2006, 22:49:38 UTC

Hm, got two 160-year models. When will the shorter ones come out?
ID: 23820

old_user8065
Joined: 1 Sep 04
Posts: 24
Credit: 10,865,773
RAC: 0
Message 23901 - Posted: 11 Aug 2006, 9:14:02 UTC - in response to Message 23816.  
Last modified: 11 Aug 2006, 9:14:54 UTC

This version will also be able to run shorter workunits (whenever we make some!) -- i.e. the 160-year workunits are a bit much so we will probably go to 80-year (1920-2000 and 2000-2080) and even 40-year runs (starting 1960, 2000, 2040) in the future!


Will it be possible to opt for a particular length of model run? Preferably on a per-machine basis...

I've got a couple of slightly lower-speed machines (2GHz+ P4) which take ages to complete a full 160-year model run. I'd be more than happy to run CPDN on those, but I'd prefer to have WUs that take less than 3 months to complete. I'd run the SAP experiment on those, but unfortunately they tend to have less than 1 GB of RAM :(

On the other hand, I'm willing to run the long model runs on some other (faster) machines.

Metod ...
ID: 23901

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 23936 - Posted: 14 Aug 2006, 22:24:17 UTC

Running the existing 160-year model is effectively the same as running four sequential 40-year models: at 1960, 2000, and 2040 the model uploads a 'restart dump' to the server, so you can abort the model after the upload if you need to.

Similarly, if you run the long model but it crashes after 50 model years, the first 40 years will have been picked up and can form the basis for a 1960-2080 run at some point in the future. As well as the restart dump, a climate summary is uploaded at the end of each model year and model decade.
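
As a rough illustration of the segmentation described above, here is a small Python sketch. It assumes the 160-year run spans 1920-2080 (inferred from the 80-year ranges mentioned earlier in the thread) with restart dumps at 1960, 2000 and 2040. The function names are made up for this example and are not part of the CPDN software.

```python
# Illustrative only: the start year and dump years come from this thread;
# the names below are hypothetical and do not exist in the CPDN code.
RUN_START = 1920
RUN_END = 2080
SEGMENT_YEARS = 40


def restart_dump_years(start=RUN_START, end=RUN_END, step=SEGMENT_YEARS):
    """Model years at which a restart dump is uploaded (segment boundaries before the end)."""
    return list(range(start + step, end, step))


def last_uploaded_dump(model_years_completed):
    """Most recent restart-dump year reached, or None if the run stopped before the first dump."""
    reached = [
        year
        for year in restart_dump_years()
        if year - RUN_START <= model_years_completed
    ]
    return reached[-1] if reached else None


if __name__ == "__main__":
    print(restart_dump_years())    # [1960, 2000, 2040]
    print(last_uploaded_dump(50))  # 1960
```

With a crash after 50 model years, the sketch reports 1960 as the last dump, which matches the 1960-2080 continuation mentioned above.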
I'm a volunteer and my views are my own.
ID: 23936

old_user11965
Joined: 4 Sep 04
Posts: 61
Credit: 80,585
RAC: 0
Message 23967 - Posted: 17 Aug 2006, 1:16:30 UTC

Running Slackware 10.2 here, I awoke to find that a couple of work units had downloaded and subsequently crashed/aborted.

2006-08-17 08:02:54 [climateprediction.net] Unrecoverable error for result hadcm3lbm_9hxy_25212425_1 (<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_1.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_2.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_3.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_4.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_5.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_6.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_7.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_8.zip</file_name>
  <error


Application was hadcm3lb 5.15. Got it as an older user who needed a new workunit.

The odd thing is that while BOINC's stdout indicated ~49-50 min. of run time, the CPU usage is shown as 0.
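
For anyone wanting to summarise a log excerpt like the one above before posting it, here is a minimal Python sketch. It uses a regular expression rather than an XML parser because pasted logs are often cut off mid-entry, as this one is. The function name and the shortened sample are assumptions for illustration, not BOINC tooling.

```python
import re
from collections import Counter

# Matches the file name and error code of one <file_xfer_error> entry.
# Entries that are cut off before the error code are simply skipped.
ENTRY = re.compile(
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>"
)


def summarize_xfer_errors(log_text):
    """Return (list of (file_name, error_code) pairs, Counter of error codes)."""
    pairs = [(m.group("name"), int(m.group("code"))) for m in ENTRY.finditer(log_text)]
    return pairs, Counter(code for _, code in pairs)


# Shortened sample in the same shape as the log above, including a truncated entry.
sample = """
<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_1.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_2.zip</file_name>
  <error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>hadcm3lbm_9hxy_25212425_1_8.zip</file_name>
  <error
"""

pairs, counts = summarize_xfer_errors(sample)

if __name__ == "__main__":
    for name, code in pairs:
        print(name, code)
    print(dict(counts))
```

The truncated last entry has no error code, so it is left out of the summary rather than guessed at.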
ID: 23967

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 23968 - Posted: 17 Aug 2006, 2:02:25 UTC
Last modified: 17 Aug 2006, 2:04:59 UTC

Slackware seems to be a bit of a problem for the program, and I don't think that anyone's been able to work out why.
Keep an eye on this thread though; someone may be able to help.


ID: 23968

old_user11965
Joined: 4 Sep 04
Posts: 61
Credit: 80,585
RAC: 0
Message 23969 - Posted: 17 Aug 2006, 2:31:00 UTC

Okay. Thanks, Les.

Are there other models being distributed besides this one? If not, I might as well detach from the project as it doesn't seem to make any sense to blow up a whack of workunits in the hope of getting something I can run.
ID: 23969

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 23970 - Posted: 17 Aug 2006, 2:55:21 UTC

No, the Coupled Ocean models are IT! The end result towards which the project has been slowly moving all these years.

You could try SAP, the Seasonal Attribution Project, but if your computer can't handle even the start of a climate model, it probably won't do any good there either.

But you can at least have a read about it here.
It's where I and a lot of others went after the special 'spinup' project finished.
One year long, high resolution, and needing in excess of 800 MB of RAM to do any good.

There are rumors of other 'spinoff' projects like it in the future; a hi-res model for some part of the world, with normal, lo-res, everywhere else.

ID: 23970

old_user11965
Joined: 4 Sep 04
Posts: 61
Credit: 80,585
RAC: 0
Message 23971 - Posted: 17 Aug 2006, 3:05:52 UTC

Ah, well. I don't know what to do then. I've happily crunched other workunits on this box. I'll take a look at SAP.

Cheers, gang.

trane

ID: 23971

old_user11965
Joined: 4 Sep 04
Posts: 61
Credit: 80,585
RAC: 0
Message 23972 - Posted: 17 Aug 2006, 3:20:28 UTC

SAP's running fine, at least 6 min. into the workunit. I'm a happy camper again. :)
ID: 23972

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 23974 - Posted: 17 Aug 2006, 5:05:15 UTC

Fingers crossed then. :)

One oddity (mentioned in a couple of threads):
When the model gets to 65%-70%, you get a trickle_up every time the model checkpoints, instead of every 5th checkpoint, although they don't show on the server.
It's just one of those things, so ignore it.

ID: 23974

MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 23980 - Posted: 17 Aug 2006, 8:02:18 UTC

One other thing to remember with SAP: the checkpoint interval is quite long (40-50 minutes on my PCs), so it's important to have 'keep in memory' set to yes in the BOINC general preferences.
I'm a volunteer and my views are my own.
ID: 23980

old_user116389
Joined: 25 Nov 05
Posts: 11
Credit: 870,090
RAC: 0
Message 24300 - Posted: 16 Sep 2006, 1:06:16 UTC - in response to Message 23968.  

Slackware seems to be a bit of a problem for the program, and I don't think that anyone's been able to work out why.
Keep an eye on this thread though; someone may be able to help.

It doesn't seem to like Damn Small Linux either. Will do more investigating and see if I can find something useful to report.
ID: 24300

old_user116389
Joined: 25 Nov 05
Posts: 11
Credit: 870,090
RAC: 0
Message 24303 - Posted: 16 Sep 2006, 3:17:51 UTC - in response to Message 24300.  



It doesn't seem to like Damn Small Linux either. Will do more investigating and see if I can find something useful to report.


The models are timing out at 720.0 seconds and rewinding.
ID: 24303

old_user11965
Joined: 4 Sep 04
Posts: 61
Credit: 80,585
RAC: 0
Message 24306 - Posted: 16 Sep 2006, 7:21:29 UTC - in response to Message 24303.  



It doesn't seem to like Damn Small Linux either. Will do more investigating and see if I can find something useful to report.


The models are timing out at 720.0 seconds and rewinding.


Yep, that's exactly what I was seeing in my logs, too. I gave up and moved to SAP here: http://attribution.cpdn.org/

Works like a champ on Slackware, so whatever problem plagues this new CPDN model hasn't (yet) hit the SAP variation.
ID: 24306

old_user116389
Joined: 25 Nov 05
Posts: 11
Credit: 870,090
RAC: 0
Message 24327 - Posted: 17 Sep 2006, 10:25:37 UTC - in response to Message 24306.  



Yep, that's exactly what I was seeing in my logs, too. I gave up and moved to SAP here: http://attribution.cpdn.org/

Works like a champ on Slackware, so whatever problem plagues this new CPDN model hasn't (yet) hit the SAP variation.


Unfortunately that's not an option for me; my boxes don't have 1 GB of RAM :(

I've got it to run on a box with a 2.6.x kernel - were your boxes running 2.4.x kernels?
ID: 24327

old_user11965
Joined: 4 Sep 04
Posts: 61
Credit: 80,585
RAC: 0
Message 24329 - Posted: 17 Sep 2006, 16:14:08 UTC

My box only has 768 megs. It seems fine with SAP. And, yes, I'm running the 2.4.31 kernel here. That might be an issue.
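
When comparing notes across machines like this, it helps to report the kernel series in a consistent form. The Python sketch below is only a convenience for that, assuming a release string of the usual "major.minor..." shape. The thread does not establish that the kernel series is the cause, and `kernel_series` is a name invented for this example.

```python
import platform
import re


def kernel_series(release):
    """Extract (major, minor) from a release string such as '2.4.31', or None if it does not parse."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # platform.release() returns the running kernel's release string on Linux.
    release = platform.release()
    series = kernel_series(release)
    print("system:", platform.system())
    print("release:", release)
    if series is None:
        print("kernel series: could not be parsed")
    else:
        print("kernel series: %d.%d.x" % series)
```

Posting the printed lines alongside the distribution name makes it easier to line up reports such as "2.4.31 on Slackware" against "2.6.x on Zenwalk".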
ID: 24329

Helmer Bryd
Joined: 16 Aug 04
Posts: 156
Credit: 9,035,872
RAC: 2,928
Message 24334 - Posted: 18 Sep 2006, 1:16:03 UTC

It runs fine here with Zenwalk Linux, which is based on Slackware; I don't know the differences though. 2.6.x kernel anyway.

I had SAP running fine with 512 MB of memory and Fedora C2, with some services turned off and no desktop. Memory usage topped out at around 430 MB, if I remember right.
ID: 24334

old_user116389
Joined: 25 Nov 05
Posts: 11
Credit: 870,090
RAC: 0
Message 24341 - Posted: 18 Sep 2006, 14:42:22 UTC - in response to Message 24334.  

It runs fine here with Zenwalk Linux, which is based on Slackware; I don't know the differences though. 2.6.x kernel anyway.

I had SAP running fine with 512 MB of memory and Fedora C2, with some services turned off and no desktop. Memory usage topped out at around 430 MB, if I remember right.


Does anybody have this running successfully on a 2.4.x kernel?
ID: 24341

Helmer Bryd
Joined: 16 Aug 04
Posts: 156
Credit: 9,035,872
RAC: 2,928
Message 24342 - Posted: 18 Sep 2006, 17:48:10 UTC - in response to Message 24341.  
Last modified: 18 Sep 2006, 18:00:01 UTC


Does anybody have this running successfully on a 2.4.x kernel?

Yep, it runs fine with Red Hat 8 with a 2.4.x kernel

Edit: the Linux client is built on Red Hat 7.3
ID: 24342