climateprediction.net (CPDN) home page
Thread 'Workunit uses lots of disk space+restart/recover the same task'

bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 51473 - Posted: 26 Feb 2015, 16:55:31 UTC

Dear all,

Recently I resumed contributing my computers to CPDN, so I'm still a bit rusty and need some refreshing. It would be great if someone could help me.

One of the problems I'm encountering is that my current model, UK Met Office HadAM3P (global only) with MOSES II land surface scheme (workunit link: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=9546200), at 6% progress occupies 3.1 GB of hard disk space in hadam3pm2_k243_1959_10_009463966/dataout
Shouldn't it be much less?
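For anyone who wants to check where the space goes, a minimal Python sketch along these lines sums the file sizes under a directory. The helper names `dir_size_bytes` and `format_gib` are made up for illustration, and the path in the usage comment assumes the script is run from the directory that holds the model folder, so adjust it for your own setup.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def format_gib(num_bytes):
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024**3:.2f} GiB"


# Example usage (the path is an assumption; point it at your own model folder):
# print(format_gib(dir_size_bytes("hadam3pm2_k243_1959_10_009463966/dataout")))
```

This adds up apparent file sizes, so the figure can differ a little from what `du` reports, since `du` counts allocated disk blocks by default.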

Update: While writing this I accidentally deleted the model directory, and BOINC exited with an error when it tried to write at the CPU checkpoint. I recovered the folder, but it seems I cannot restart the same task. Can I? Any suggestions? Should I simply delete all the files and start a new task?

Cheers

geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 51474 - Posted: 26 Feb 2015, 18:37:42 UTC - in response to Message 51473.  

I'm not sure of the typical size of these folders as I am away from my computers now. However, the model folders do get big.

Unfortunately, you cannot recover your model despite restoring that folder. Your client_state.xml file no longer contains info about it, since BOINC thought that the task had errored out. Go ahead and delete that folder and start a new task.
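To see whether the client still knows about a task, one option is to look for its name in client_state.xml. The sketch below assumes each task is stored as a `<result>` element with a `<name>` child and that the file parses as well-formed XML; both are assumptions about the file layout and may not hold for every client version. The helper names `result_names` and `has_task` and the path in the usage comment are illustrative only. Read the file without editing it, ideally from a copy or while BOINC is stopped.

```python
import xml.etree.ElementTree as ET


def result_names(state_xml_text):
    """Return names of all <result> entries found in client_state.xml text.

    Assumes tasks appear as <result><name>...</name>...</result> somewhere
    under the root element (an assumption about the file layout).
    """
    root = ET.fromstring(state_xml_text)
    names = []
    for result in root.iter("result"):
        name = result.findtext("name")
        if name:
            names.append(name.strip())
    return names


def has_task(state_xml_text, task_prefix):
    """True if any result name starts with task_prefix."""
    return any(n.startswith(task_prefix) for n in result_names(state_xml_text))


# Example usage (the path is an assumption; it depends on your install):
# with open("/var/lib/boinc-client/client_state.xml") as f:
#     print(has_task(f.read(), "hadam3pm2_k243_1959_10_009463966"))
```

If the name is missing, restoring the folder will not bring the task back, which matches what happened here.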
bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 51475 - Posted: 26 Feb 2015, 18:46:45 UTC - in response to Message 51474.  
Last modified: 26 Feb 2015, 18:48:20 UTC

Thanks. Let's see how it goes. My other Linux machine errored out earlier on another hadam3p model.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 51476 - Posted: 26 Feb 2015, 19:46:23 UTC - in response to Message 51475.  

If you don't like the idea of errors, I'd run the hadcm3s models rather than the MOSES ones. The MOSES ones hate to be interrupted for any reason, which kind of goes against the idea of distributed computing: running science apps when your computer is otherwise idle.

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 51477 - Posted: 26 Feb 2015, 20:16:29 UTC

And that model that you linked to had lots of Suspends.

See my post here for comments about that.

bernard_ivo

Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 51479 - Posted: 26 Feb 2015, 21:54:57 UTC - in response to Message 51477.  

Thanks, but I still use these machines and need to shut them down almost every day. So I wait for a CPU checkpoint, then suspend, exit and shut down. I mean, I can't leave a machine at 100% CPU for >450 h to finish hadam3p models uninterrupted?! On a laptop, CPUs running at 100% all the time get way too hot, and I try to give CPDN some CPU time while I'm working, as there is not much idle time on these machines. I read some time ago that there is no way to make some of these models smaller and less error-prone. But if I manage to complete less than 20% of tasks, then 80% of the computing time is just lost.

Nevertheless I will set some of the preferences as suggested.

Thanks mates
