climateprediction.net (CPDN)
Thread 'CAM25 - "Disk usage limit exceeded"'

Message boards : Number crunching : CAM25 - "Disk usage limit exceeded"

Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 57540 - Posted: 31 Dec 2017, 14:32:46 UTC

A remote machine that was offline for the entire run of a CAM25 model accumulated almost 2 GB of upload files. The model ran to completion, including generating the restart file and the "out" file. However, as soon as the machine was brought back online, the task crashed with a "Disk usage limit exceeded" error.

The client_state.xml file has some headroom for the individual uploads (a 150 MB limit vs. ~100 MB files); the restart file is 90 MB. There is plenty of free disk space (>100 GB) and the BOINC Manager settings are at their defaults, so there should be no limits from that (BOINC Manager 7.6.33).
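A minimal Python sketch of the per-file check described above, assuming client_state.xml lists each file as a `<file>` element with `<name>`, `<nbytes>` and `<max_nbytes>` children. The file names and sizes in the sample are hypothetical, chosen to match the figures quoted (150 MB limit, ~100 MB uploads, 90 MB restart file); the point is only that each file individually fits under its own limit.

```python
import xml.etree.ElementTree as ET

# Hypothetical client_state.xml fragment. Element names follow the BOINC
# <file> layout as assumed above; names and sizes are invented for illustration.
SAMPLE = """
<client_state>
  <file>
    <name>cam25_upload_1.zip</name>
    <nbytes>100000000.000000</nbytes>
    <max_nbytes>150000000.000000</max_nbytes>
  </file>
  <file>
    <name>cam25_restart.zip</name>
    <nbytes>90000000.000000</nbytes>
    <max_nbytes>150000000.000000</max_nbytes>
  </file>
</client_state>
"""


def per_file_headroom(xml_text):
    """Return {file name: max_nbytes - nbytes} for every <file> with a limit."""
    root = ET.fromstring(xml_text.strip())
    headroom = {}
    for f in root.iter("file"):
        limit = float(f.findtext("max_nbytes", default="0"))
        if limit <= 0:
            # Assumption: a zero or missing max_nbytes means "no per-file limit".
            continue
        size = float(f.findtext("nbytes", default="0"))
        headroom[f.findtext("name")] = limit - size
    return headroom


for name, spare in per_file_headroom(SAMPLE).items():
    print(f"{name}: {spare / 1e6:.0f} MB spare")
```

Positive headroom for every file means the per-file limits are not what failed, which is why the question turns to an aggregate setting.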

Googling throws up some reports of errors on other projects related to virus checkers. But if that were the cause, why would it appear only at the end, and only for this model and not for others that ran online? Or is there some other aggregate upload setting that needs to be raised for this very large model?
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,013,957
RAC: 21,195
Message 57541 - Posted: 31 Dec 2017, 14:52:17 UTC - in response to Message 57540.  

Interesting,
I tend to only run models offline when I want to look at the file sizes etc. before they upload. I wonder if it is some obscure limit imposed by BOINC, such as the limit on client_state.xml that caused problems before 7.6.33?
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 57542 - Posted: 31 Dec 2017, 18:59:22 UTC

This one's out in the country with dodgy broadband. I was indeed running it offline to check the file sizes, but when contact with that machine is lost, neither of the remote access clients I use can get back in until a human being intervenes (in this case me, on Boxing Day), so it sat there crunching until the end. Usually a crash causes the files to be deleted, but here the files were still there: the crash occurred, and the files were deleted, only when network access was enabled, almost a day after the model had actually finished.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 57543 - Posted: 1 Jan 2018, 6:37:11 UTC - in response to Message 57542.  

This is a result of a setting in client_state.xml. On the dev site for d801, Ian said the following when I got a disk limit error:

@george16 @sihanli1 The disk limit causing that is the one set in the workunit's <rsc_disk_bound> setting. For all of my WAH2 tasks on the beta and main projects it's set to

<rsc_disk_bound>2000000000.000000</rsc_disk_bound>


This size limit is exceeded if too many upload files get queued up for the cam25 model. Obviously they didn't up this bound when the problem was reported.
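A minimal Python sketch of why the per-task bound trips even when every individual file fits, assuming (as described above) that the client compares the combined size of a task's queued output files against `<rsc_disk_bound>`. The real client may count other things too, such as the slot directory; the workunit name, file count and sizes below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical <workunit> fragment carrying the 2 GB bound quoted above.
WORKUNIT = """
<workunit>
  <name>cam25_example_wu</name>
  <rsc_disk_bound>2000000000.000000</rsc_disk_bound>
</workunit>
"""


def disk_bound(xml_text):
    """Read <rsc_disk_bound> (in bytes) from a <workunit> element."""
    return float(ET.fromstring(xml_text.strip()).findtext("rsc_disk_bound"))


def exceeds_bound(upload_sizes, restart_size, bound):
    """Assumption: the queued files are summed and compared to the bound as a whole."""
    total = sum(upload_sizes) + restart_size
    return total > bound, total


bound = disk_bound(WORKUNIT)
# Invented figures: 20 queued ~100 MB uploads plus a 90 MB restart file.
over, total = exceeds_bound([100_000_000] * 20, 90_000_000, bound)
print(f"total={total / 1e9:.2f} GB bound={bound / 1e9:.2f} GB exceeded={over}")
# prints: total=2.09 GB bound=2.00 GB exceeded=True
```

With these invented figures the total goes over the 2 GB bound even though no single file comes close to a 150 MB per-file limit, which matches the failure reported in this thread; raising `rsc_disk_bound` on the project side removes it.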
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 57545 - Posted: 2 Jan 2018, 11:03:49 UTC

Thanks, George. E-mail reminder duly sent, though hopefully that setting should not affect too many people.
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 57550 - Posted: 3 Jan 2018, 10:13:59 UTC

The rsc_disk_bound has now been increased, so this shouldn't happen again for that model.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 57558 - Posted: 3 Jan 2018, 19:42:17 UTC - in response to Message 57550.  

The rsc_disk_bound has now been increased, so this shouldn't happen again for that model.

But just to be clear, any currently running cam25 models that have a large queue of uploads could still have the problem. I would think that would be very few instances though. Any future cam25 batches should not have this issue.

©2024 cpdn.org