climateprediction.net (CPDN) home page
Thread 'What happens if I run out of disk space?'

Message boards : Number crunching : What happens if I run out of disk space?

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 43782 - Posted: 12 Feb 2012, 13:03:04 UTC

I run the BOINC client in a partition all its own: 16 Gigabytes, approximately. The BOINC client typically allows three ClimatePrediction applications to run at a time. They tend to run at high priority because the completion time, from the very start, is often far longer than the calendar time. I am not worried about this because I know the results are accepted even if they are late.

But my BOINC manager tells me all kinds of things. The applications I am getting on my main machine use a LOT of disk space. If I let the ClimatePrediction applications complete, the available disk space is about 90%, but an individual application can take up to 10 GBytes or so. This is amazing. If all three applications did that at the same time, there would be no more space. Would one or more applications crash, or does the BOINC client arrange not to schedule these until space is available? Or what?

I am running Red Hat Enterprise Linux 5 on a dual hyperthreaded Xeon (4 logical processors).
ID: 43782
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 43783 - Posted: 12 Feb 2012, 14:43:48 UTC - in response to Message 43782.  

If you look at your list of models, the regional models are failing.
There are 2 sticky posts at the top of the Linux section which may apply:
Here &
Here

As for the amount of disk space used, it'll be because crashed models don't clear up after themselves. You need to look at the names of the models currently in the Tasks tab, and compare these with the folder names that are under ...\projects\climateprediction.net.
Then manually delete everything that's NOT current.
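
As a rough illustration of that comparison, here is a minimal Python sketch that lists folders with no matching task name. It only reports candidates and deletes nothing. The prefix match between task names and folder names, the `shared` folder names to skip (gfx and txf, which appear in directory listings later in this thread), and the function name are all assumptions made for illustration, not anything the project provides.

```python
import tempfile
from pathlib import Path


def stale_model_dirs(project_dir, current_tasks, shared=("gfx", "txf")):
    """Return folder names under project_dir that match no current task.

    Assumption: a folder counts as current if some task name starts with
    the folder name. Folders named in `shared` are never reported.
    """
    stale = []
    for entry in sorted(Path(project_dir).iterdir()):
        if not entry.is_dir() or entry.name in shared:
            continue
        if not any(task.startswith(entry.name) for task in current_tasks):
            stale.append(entry.name)
    return stale


if __name__ == "__main__":
    # Demo on a throwaway directory with made-up folder names.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("model_a_001", "model_b_002", "gfx"):
            (Path(tmp) / name).mkdir()
        # Only model_a_001 is listed in the (hypothetical) Tasks tab.
        print(stale_model_dirs(tmp, ["model_a_001"]))
```

Anything the sketch lists should still be checked by hand against the Tasks tab before it is removed.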

Used space should be about a gig per model, plus some space for the programs.

As to what happens when the disk fills up: Everything will crash from then on.


Backups: Here
ID: 43783
DJStarfox

Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 43784 - Posted: 12 Feb 2012, 16:09:11 UTC

You should check the stderr_um.txt file for any error messages.

Also, this thread may be relevant:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=6901
ID: 43784
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 43927 - Posted: 6 Mar 2012, 3:17:32 UTC - in response to Message 43783.  

Both my machines are 32-bit as far as computing is concerned. The processors on the big one are PAE, so I can use my 8 GBytes RAM, but no process can see over 32-bits worth of addresses.

I thought I had turned off the regional models a long time ago, so I should not be getting any. Looks like they are off.

As far as disk space usage, it seems to be like this:

trillian:boinc[~/BOINC/projects/climateprediction.net]$ du . | sort -nr
10853696 .
4587540 ./hadcm3n_y8a0_1980_40_007618581
4336408 ./hadcm3n_y8a0_1980_40_007618581/dataout
3962520 ./hadcm3n_ye8q_1940_40_007615112
3711388 ./hadcm3n_ye8q_1940_40_007615112/dataout
1957320 ./hadcm3n_u028_1980_40_007693499
1706176 ./hadcm3n_u028_1980_40_007693499/dataout
250664 ./hadcm3n_ye8q_1940_40_007615112/datain
250664 ./hadcm3n_y8a0_1980_40_007618581/datain
250664 ./hadcm3n_u028_1980_40_007693499/datain
146164 ./hadcm3n_ye8q_1940_40_007615112/datain/masks
146164 ./hadcm3n_y8a0_1980_40_007618581/datain/masks
146164 ./hadcm3n_u028_1980_40_007693499/datain/masks
71228 ./hadcm3n_ye8q_1940_40_007615112/datain/dumps
71228 ./hadcm3n_y8a0_1980_40_007618581/datain/dumps
71228 ./hadcm3n_u028_1980_40_007693499/datain/dumps
33224 ./hadcm3n_ye8q_1940_40_007615112/datain/ancil
33224 ./hadcm3n_y8a0_1980_40_007618581/datain/ancil
33224 ./hadcm3n_u028_1980_40_007693499/datain/ancil
2124 ./txf
2096 ./gfx
620 ./hadcm3n_ye8q_1940_40_007615112/datain/ancil/ctldata
620 ./hadcm3n_y8a0_1980_40_007618581/datain/ancil/ctldata
620 ./hadcm3n_u028_1980_40_007693499/datain/ancil/ctldata
532 ./hadcm3n_ye8q_1940_40_007615112/datain/ancil/ctldata/STASHmaster
532 ./hadcm3n_y8a0_1980_40_007618581/datain/ancil/ctldata/STASHmaster
532 ./hadcm3n_u028_1980_40_007693499/datain/ancil/ctldata/STASHmaster
348 ./hadcm3n_u028_1980_40_007693499/jobs
340 ./hadcm3n_ye8q_1940_40_007615112/jobs
340 ./hadcm3n_y8a0_1980_40_007618581/jobs
84 ./hadcm3n_ye8q_1940_40_007615112/datain/ancil/ctldata/stasets
84 ./hadcm3n_y8a0_1980_40_007618581/datain/ancil/ctldata/stasets
84 ./hadcm3n_u028_1980_40_007693499/datain/ancil/ctldata/stasets
48 ./hadcm3n_u028_1980_40_007693499/tmp
44 ./hadcm3n_ye8q_1940_40_007615112/tmp
44 ./hadcm3n_y8a0_1980_40_007618581/tmp
16 ./txf/CVS

And these three are the programs currently executing. Looks like the dataout files are the problem.
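
To put a number on that, the `du` figures can be turned into a per-model share. The Python sketch below is a hedged illustration, not part of the original post: it parses `du` output (sizes assumed to be in 1 KiB units, the GNU `du` default) and reports what fraction of each model folder sits in `dataout`. The function name is made up for this example.

```python
def dataout_share(du_text):
    """Map each model folder to (total_kib, dataout_kib, fraction)."""
    sizes = {}
    for line in du_text.splitlines():
        parts = line.split(None, 1)
        # Skip anything that is not "<number> <path>".
        if len(parts) == 2 and parts[0].isdigit():
            sizes[parts[1].strip()] = int(parts[0])
    report = {}
    for path, kib in sizes.items():
        if path.endswith("/dataout"):
            model = path[: -len("/dataout")]
            total = sizes.get(model)
            if total:
                report[model] = (total, kib, kib / total)
    return report


# Six lines copied from the listing above.
SAMPLE = """\
4587540 ./hadcm3n_y8a0_1980_40_007618581
4336408 ./hadcm3n_y8a0_1980_40_007618581/dataout
3962520 ./hadcm3n_ye8q_1940_40_007615112
3711388 ./hadcm3n_ye8q_1940_40_007615112/dataout
1957320 ./hadcm3n_u028_1980_40_007693499
1706176 ./hadcm3n_u028_1980_40_007693499/dataout
"""

if __name__ == "__main__":
    for model, (total, out, frac) in dataout_share(SAMPLE).items():
        print(f"{model}: {out / 1024**2:.2f} of {total / 1024**2:.2f} GiB ({frac:.0%})")
```

For these three tasks the `dataout` share works out to roughly 87% to 95% of each model folder.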


ID: 43927
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 43990 - Posted: 10 Apr 2012, 1:41:48 UTC - in response to Message 43783.  

As for the amount of disk space used, it'll be because crashed models don't clear up after themselves. You need to look at the names of the models currently in the Tasks tab, and compare these with the folder names that are under ...\projects\climateprediction.net.
Then manually delete everything that's NOT current.

Used space should be about a gig per model, plus some space for the programs.

As to what happens when the disk fills up: Everything will crash from then on.


That did not seem to be the case. I terminated all three models that were running, deleted everything in projects/climateprediction.net, and started over. I did that because it was just about to overflow. It downloaded three new models that are now running, and they seem to be taking up 2 gigabytes each in directories such as ~/BOINC/projects/climateprediction.net/hadcm3n_ydi1_1980_40_007832936/dataout, even though they are only about 20% complete. The dataout directories are full of large files.

You should check the stderr_um.txt file for any error messages.


The stderr_um.txt files are all 0 bytes.

ID: 43990
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 43992 - Posted: 10 Apr 2012, 3:17:36 UTC - in response to Message 43990.  

hadcm3 models can build up files to a bit over 1 Gig each. The data gets zipped and uploaded to the project servers every 25% of the way through.

Don't touch the files in the data out directory! That's the result of all of the crunching so far!


Backups: Here
ID: 43992
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 44007 - Posted: 12 Apr 2012, 22:25:27 UTC - in response to Message 43992.  

hadcm3 models can build up files to a bit over 1 Gig each. The data gets zipped and uploaded to the project servers every 25% of the way through.

Don't touch the files in the data out directory! That's the result of all of the crunching so far!


Well, I already deleted it. The BOINC system was just about to run out of disk space. Saying that the files build up to a bit over a gig each is a big understatement. On my machine, I have a partition of about 16 gigabytes exclusively for BOINC projects, and most tasks take small amounts of space. World Community Grid is the second biggest user, and all its tasks together take less than one gig. Climate Prediction is the largest. Typically there are three c.p. tasks running (I have 4 processors); each one is at about 25% completion, and together they already take about 5.17 GBytes. They were taking about 4 Gigabytes each when the system almost ran out of space and I cancelled them.

This is how it is at the moment.
5794316 ./projects
5396096 ./projects/climateprediction.net
1924300 ./projects/climateprediction.net/hadcm3n_ydi1_1980_40_007832936
1863836 ./projects/climateprediction.net/hadcm3n_o34p_1980_40_007833299
1673168 ./projects/climateprediction.net/hadcm3n_ydi1_1980_40_007832936/dataout
1612704 ./projects/climateprediction.net/hadcm3n_o34p_1980_40_007833299/dataout
1444612 ./projects/climateprediction.net/hadcm3n_yiry_1980_40_007833065
1193480 ./projects/climateprediction.net/hadcm3n_yiry_1980_40_007833065/dataout

Is it typical that users run Climate Prediction in even larger partitions? How big a partition is really required? Your estimate of about one gig per task seems far too small.

ID: 44007
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44008 - Posted: 12 Apr 2012, 22:59:06 UTC - in response to Message 44007.  

My BOINC partitions are 10 GIGs.
This includes both this main site, and our beta test site, and I have many versions of programs on that part that have been tested over the years.

Currently:

machine 1:
Total in use: 2.64 Gigs

the model's folders:
836 Megs for a hadcm3n model
499 Megs for a hadam3p model

and 633 Megs for the beta folders
The rest is common files

machine 2:

Total in use: 3.0 Gigs
793 Megs for one hadcm3n model
782 Megs for a 2nd hadcm3n model

and 693 Megs for the beta folders
The rest is common files

----------------

I uploaded the first lot of zip files a few days ago, which is why each model's folder size is under a gig, but before that one of them was at about 1.3 Gigs.

Unless the Linux version is a lot different, your set up is STRANGE.

25% is the point at which the files are zipped up and sent back to the project. After which the folder size should drop.



Backups: Here
ID: 44008
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 44009 - Posted: 12 Apr 2012, 23:27:32 UTC - in response to Message 44008.  

I uploaded the first lot of zip files a few days ago, which is why each model's folder size is under a gig, but before that one of them was at about 1.3 Gigs.

Unless the Linux version is a lot different, your set up is STRANGE.

25% is the point at which the files are zipped up and sent back to the project. After which the folder size should drop.


When I was watching the sizes of the various boinc processes, I did notice that c.p. did drop from time-to-time and ranged, IIRC, from about 4 GBytes to 12 GBytes for c.p. alone. But the last three tasks were all growing steadily together and when things were getting close to totally full in that partition, I started it over. So it seems to be running "normally" other than taking what seems to be an unreasonable amount of disk space.

What do you suppose is STRANGE about my setup? I have a 16 GByte partition for boinc. I do not suppose that is strange. What is in there is what the c.p. put in there. The entire directory structure for Climate Prediction I emptied out about a month ago because someone here said there was probably a lot of leftover stuff from jobs that terminated strangely. I found none, but deleted everything to be sure, and started over. The error files for c.p. are 0 bytes, so c.p. is not aware of anything unusual.
ID: 44009
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44010 - Posted: 13 Apr 2012, 4:42:27 UTC - in response to Message 44009.  

I'll have a talk with the other moderators about your large file sizes, and see if anyone knows anything.


Backups: Here
ID: 44010
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 44011 - Posted: 13 Apr 2012, 6:56:16 UTC - in response to Message 44010.  

On my Linux system, the current total space used by the project directory is 2 GB. That is with 3 HADAM3P tasks: 1 running, 1 waiting to start and 1 suspended to allow a HADAM3CN task to run. I have seen usage go as high as 4.6 GB when running only HADAM3CN tasks, but I think that may have been exacerbated by one of the servers being down at the time. I haven't seen it go that high for a while.

Dave
ID: 44011
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 44012 - Posted: 13 Apr 2012, 8:22:24 UTC - in response to Message 44011.  

Or it may have been when I had some crashed task files around.
ID: 44012
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 44015 - Posted: 13 Apr 2012, 11:19:46 UTC - in response to Message 44010.  

I'll have a talk with the other moderators about your large file sizes, and see if anyone knows anything.


Thank you. I would hate to have to quit c.p. because of this issue. It is the most important BOINC project I run. Maybe if I could trick it into downloading only one task at a time, or perhaps two, I would not run out of space.
ID: 44015
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 44017 - Posted: 14 Apr 2012, 17:15:11 UTC - in response to Message 44011.  

Well, in about two days, mine is now this:

6077656 ./projects
5674288 ./projects/climateprediction.net
2046624 ./projects/climateprediction.net/hadcm3n_ydi1_1980_40_007832936
1967412 ./projects/climateprediction.net/hadcm3n_o34p_1980_40_007833299
1795492 ./projects/climateprediction.net/hadcm3n_ydi1_1980_40_007832936/dataout
1716280 ./projects/climateprediction.net/hadcm3n_o34p_1980_40_007833299/dataout
1496904 ./projects/climateprediction.net/hadcm3n_yiry_1980_40_007833065
1245772 ./projects/climateprediction.net/hadcm3n_yiry_1980_40_007833065/dataout

These are all hadcm3n tasks and all are running. Two are "high priority". Actually, they all should be because they are close to not completing on time. They are due June 18 and have over 1300 hours each to complete.

I have no crashed files around. And it looks like no errors.

ls -l hadcm3n_o34p_1980_40_007833299/stderr_um.txt hadcm3n_ydi1_1980_40_007832936/stderr_um.txt hadcm3n_yiry_1980_40_007833065/stderr_um.txt hadcm3n_yiry_1980_40_007833065/dataout/stderr_um.txt
-rw-r--r-- 1 boinc boinc 0 Mar 19 00:35 hadcm3n_o34p_1980_40_007833299/stderr_um.txt
-rw-r--r-- 1 boinc boinc 0 Mar 18 23:35 hadcm3n_ydi1_1980_40_007832936/stderr_um.txt
-rw-r--r-- 1 boinc boinc 0 Mar 19 05:03 hadcm3n_yiry_1980_40_007833065/dataout/stderr_um.txt
-rw-r--r-- 1 boinc boinc 0 Mar 19 05:03 hadcm3n_yiry_1980_40_007833065/stderr_um.txt

ID: 44017
Belfry

Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 44018 - Posted: 14 Apr 2012, 21:58:01 UTC
Last modified: 14 Apr 2012, 22:02:39 UTC

Jean-David, I'm not sure if this is impacting you, but have you seen this sticky about issues running RHEL 5 and its derivatives?

Edit: I see Les already linked to it... never mind.
ID: 44018
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 44019 - Posted: 15 Apr 2012, 0:34:47 UTC - in response to Message 44018.  
Last modified: 15 Apr 2012, 0:36:27 UTC

Yes, I saw that. The problem described, "It appears that the hadam3p regional models (EU, SAF, PNW) don't play well with RHEL/CentOS/Scientific Linux 5 and crash immediately." is not occurring because I told the server not to send me those. The actual problem with those is that one of the system libraries is too old for the builds, so they crash as soon as they call one of those libraries. Red Hat supports each release for 7 years (recently increased to 10 years) so people need not upgrade unless they want new features. They backport all bug fixes and security fixes. But many BOINC applications assume you are running the latest and greatest (or nearly so) and do not allow for long-lived stable distributions.

But as I said, I do not accept jobs of the hadam3p variety.

Run only the selected applications
UK Met Office HadSM3 Slab Model: yes
UK Met Office HadCM3L Coupled Model: yes
UK Met Office HadAM3: yes
UK Met Office HadSM3 Mid-Holocene: yes
UK Met Office HadAM3P: no
UK Met Office FAMOUS: yes
UK Met Office HadAM3P European Region: no
UK Met Office HadAM3P Southern Africa: no
UK Met Office HadAM3P Pacific North West: no
UK Met Office HadCM3 Coupled Model Full Resolution Ocean: yes
ID: 44019
Belfry

Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 44020 - Posted: 15 Apr 2012, 3:05:16 UTC
Last modified: 15 Apr 2012, 3:33:07 UTC

When the CPDN developers set up the hadam3p's, they used an older distribution, but in trying to target the largest set of users they chose the most popular distribution, Ubuntu, which still had a more recent kernel and libraries than the oldest supported Red Hat. Hopefully in the future they'll use an older Scientific Linux (a free distribution which releases synchronously with Red Hat Enterprise) as the development system.

Actually there were some reported problems with the RHEL 5 libstdc++6 and compression with hadcm3n. Could the source of your disk room troubles be related to bad compression? If you have administrative privileges on your machine you can get an RPM from Red Hat which will install a later version of libstdc++6. If not you can try the link mentioned in the sticky which points to an ingenious workaround from an unusually dedicated member.
ID: 44020
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 44021 - Posted: 15 Apr 2012, 3:50:56 UTC - in response to Message 44020.  

The question under consideration in this thread is: why are the hadcm3n models on this Linux system taking up approximately 3 times the space they take on my Windows system?

There's been no reply to my question on the other board, and the only 2 things that I can think of are that the zips aren't being created, or that they're being created and then not uploaded.

Jean
Are there message lines, either in Messages, or in the stdoutdae.txt file, which say something like: Started upload of hadcm3_a009_1859_10_000258667_0_1.zip?
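
One way to check for those lines is sketched below in Python, as an illustration only. It looks for the phrase "Started upload of" followed by a .zip name, which is the wording quoted above; nothing is assumed about the rest of each log line, and the function name is hypothetical.

```python
import re

# Matches the wording quoted in the post; the rest of the line is ignored.
UPLOAD_RE = re.compile(r"Started upload of (\S+\.zip)")


def started_uploads(lines):
    """Return the zip file names mentioned in 'Started upload of' lines."""
    found = []
    for line in lines:
        match = UPLOAD_RE.search(line)
        if match:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    sample = [
        "Started upload of hadcm3_a009_1859_10_000258667_0_1.zip",
        "some unrelated line",
    ]
    print(started_uploads(sample))
    # For a real log, pass an open file instead, for example:
    # with open("stdoutdae.txt", errors="replace") as log:
    #     print(started_uploads(log))
```

An empty result for a long-running model would be consistent with either of the two possibilities above, so it narrows the search rather than settling it.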

And while I'm posting:
1) There's NO deadline in this project. The one that people keep quoting is just an artificial one that's a requirement of the BOINC system.

2) It's been posted many times that the hadcm3 models are very competitive for the FPU, and slow each other down A LOT. Only having half the number of these as there are processor cores is a good rule of thumb. Otherwise, they go into high priority running a lot of the time.

3) The reason that the project moved on to newer versions of library files is that the University department for which our programmers work has moved on to newer compilers, which no longer support the older libraries.


Backups: Here
ID: 44021
Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 44022 - Posted: 15 Apr 2012, 12:01:49 UTC - in response to Message 44021.  

This is a real puzzler. With various recent Ubuntu versions my hadcm3n subdirs in the BOINC/projects/climateprediction.net folder are about 1.2-1.7 Gig, depending on how far the models have progressed and on the uploads. Using an ext3 or ext4 filesystem.
Is it possibly something to do with the particular filesystem or filesystem parameters?
Is this an ext{2,3,4} filesystem or something else, like xfs or zfs?
It's remotely possible that if the filesystem was created with very large blocks or chunks, there is wasted space with the smaller files.
This is just a guess at a remote possibility.
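
That guess can be checked by comparing apparent file sizes with the space actually allocated. The Python sketch below is an illustration under stated assumptions: it relies on `st_blocks` from `os.lstat`, which is available on Unix-like systems and counts 512-byte units, and both function names are hypothetical.

```python
import os
import tempfile


def rounded_to_block(size, block_size):
    """Space a file of `size` bytes occupies when stored in whole blocks."""
    return -(-size // block_size) * block_size  # ceiling division


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for all files under path."""
    apparent = allocated = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            apparent += st.st_size
            allocated += st.st_blocks * 512  # st_blocks is in 512-byte units
    return apparent, allocated


if __name__ == "__main__":
    # Pure arithmetic: a 1-byte file stored in 4096-byte blocks.
    print(rounded_to_block(1, 4096))
    # Measurement on a throwaway directory holding one 10-byte file.
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "small.bin"), "wb") as fh:
            fh.write(b"x" * 10)
        print(apparent_and_allocated(tmp))
```

If the allocated total for a model folder were far above the apparent total, block size would be a plausible culprit; if the two were close, the files themselves would simply be large.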
ID: 44022
Belfry

Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 44023 - Posted: 15 Apr 2012, 14:28:12 UTC
Last modified: 15 Apr 2012, 14:30:24 UTC

(Eirik, how's the weather in Northfield?)

I have three hadcm3n's now and their directory sizes are:

1391436 projects/climateprediction.net/hadcm3n_o1t6_1980_40_007833447 (96.8% complete)
1193884 projects/climateprediction.net/hadcm3n_2056_1940_40_007858548 (39.7%)
1059632 projects/climateprediction.net/hadcm3n_o3f2_2020_40_007857560 (17.4%)

Ext3 partition running 64-bit Ubuntu 10.04 here. On this four-core machine that I've been running CPDN on for nearly three years, I've never seen my total CPDN disk usage go above 6 Gigs, even when I dedicate all the cores to CPDN.
ID: 44023

©2024 cpdn.org