Thread 'Upload server is out of disk space'

Jesse Viviano
Joined: 20 Dec 14
Posts: 23
Credit: 2,450,095
RAC: 296
Message 58416 - Posted: 18 Jul 2018, 6:31:30 UTC

I am getting the following errors when I try to upload some result files. Sample error messages are listed below:

7/18/2018 2:19:10 AM | climateprediction.net | Started upload of wah2_nam50_pdhh_200912_13_735_011566091_0_r348673607_1.zip
7/18/2018 2:20:05 AM | climateprediction.net | [error] Error reported by file upload server: can't open file wah2_nam50_pdhh_200912_13_735_011566091_0_r348673607_1.zip: No space left on device
7/18/2018 2:20:05 AM | climateprediction.net | Temporarily failed upload of wah2_nam50_pdhh_200912_13_735_011566091_0_r348673607_1.zip: transient upload error
7/18/2018 2:20:05 AM | climateprediction.net | Backing off 00:10:56 on upload of wah2_nam50_pdhh_200912_13_735_011566091_0_r348673607_1.zip
7/18/2018 2:26:53 AM | climateprediction.net | Started upload of wah2_nam50_pdbv_200812_13_735_011565889_0_r1787333323_1.zip
7/18/2018 2:27:49 AM | climateprediction.net | [error] Error reported by file upload server: can't open file wah2_nam50_pdbv_200812_13_735_011565889_0_r1787333323_1.zip: No space left on device
7/18/2018 2:27:49 AM | climateprediction.net | Temporarily failed upload of wah2_nam50_pdbv_200812_13_735_011565889_0_r1787333323_1.zip: transient upload error
7/18/2018 2:27:49 AM | climateprediction.net | Backing off 04:03:43 on upload of wah2_nam50_pdbv_200812_13_735_011565889_0_r1787333323_1.zip
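For anyone wanting to check their own Event Log for the same failure, here is a minimal sketch in Python. It assumes log lines in the `timestamp | project | message` layout shown above and the exact message wording from those samples; the function name `summarize_upload_errors` is made up for illustration and is not part of BOINC.

```python
import re

# Patterns based only on the sample log lines above; other BOINC
# versions or locales may word these messages differently.
NO_SPACE = re.compile(r"can't open file (\S+): No space left on device")
BACKOFF = re.compile(r"Backing off (\d+):(\d+):(\d+) on upload of (\S+)")


def summarize_upload_errors(log_text):
    """Return {filename: backoff_seconds or None} for files the upload
    server rejected with 'No space left on device'."""
    failed = {}
    for line in log_text.splitlines():
        # The message is the last of the three ' | '-separated fields.
        parts = line.split(" | ", 2)
        if len(parts) != 3:
            continue
        message = parts[2]
        match = NO_SPACE.search(message)
        if match:
            failed.setdefault(match.group(1), None)
            continue
        match = BACKOFF.search(message)
        if match and match.group(4) in failed:
            hours, minutes, seconds = (int(g) for g in match.groups()[:3])
            # Keep the most recent back-off interval seen for this file.
            failed[match.group(4)] = hours * 3600 + minutes * 60 + seconds
    return failed
```

Pasting the Event Log text into `summarize_upload_errors(...)` should give one entry per rejected zip, with the latest back-off in seconds where the log shows one.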
ID: 58416
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58417 - Posted: 18 Jul 2018, 7:02:16 UTC

OK, I'll let them know, although they may have alarms on that.
ID: 58417
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58419 - Posted: 18 Jul 2018, 11:20:58 UTC

This should now be fixed.
ID: 58419
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58421 - Posted: 18 Jul 2018, 12:26:37 UTC

And mine are uploading.
Phew.
ID: 58421
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58422 - Posted: 18 Jul 2018, 13:06:26 UTC

That didn't last long.
Now there's another problem.

I've just sent another email.
ID: 58422
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58423 - Posted: 18 Jul 2018, 13:59:33 UTC

Fixed and good to go.
ID: 58423
gchrist
Joined: 17 Jul 05
Posts: 7
Credit: 6,509,173
RAC: 854
Message 59003 - Posted: 13 Nov 2018, 23:48:31 UTC

I have new upload failures, this time for the global tasks (766-770). They started to refuse uploading at about 9 p.m. CET, again with the error message 'server is out of disk space'. Just to let you know...
ID: 59003
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 59005 - Posted: 14 Nov 2018, 0:54:28 UTC - in response to Message 59003.  

I have new upload failures, this time for the global tasks (766-770). They started to refuse uploading at about 9 p.m. CET, again with the error message 'server is out of disk space'. Just to let you know...

This has been relayed to the project staff.
ID: 59005
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59018 - Posted: 17 Nov 2018, 20:37:16 UTC

And it's out of space again.
Email sent.
ID: 59018
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,033,903
RAC: 14,766
Message 59027 - Posted: 19 Nov 2018, 23:08:44 UTC - in response to Message 59018.  

Do we still have a problem?

19/11/2018 23:04:05 | climateprediction.net | Started upload of wah2_global_e0pq_208812_145_768_011660298_0_r329021304_73.zip
19/11/2018 23:04:05 | climateprediction.net | [file_xfer] URL: http://upload3.cpdn.org/cgi-bin/file_upload_handler
19/11/2018 23:04:05 | climateprediction.net | [http] [ID#120] Info: Trying 130.246.191.84...
19/11/2018 23:04:26 | climateprediction.net | [http] [ID#120] Info: connect to 130.246.191.84 port 80 failed: Timed out
19/11/2018 23:04:26 | climateprediction.net | [http] [ID#120] Info: Failed to connect to upload3.cpdn.org port 80: Timed out
19/11/2018 23:04:26 | climateprediction.net | [http] [ID#120] Info: Closing connection 257
19/11/2018 23:04:26 | climateprediction.net | [http] HTTP error: Couldn't connect to server
19/11/2018 23:04:26 | climateprediction.net | [file_xfer] http op done; retval -107 (connect() failed)
19/11/2018 23:04:26 | climateprediction.net | [file_xfer] file transfer status -107 (connect() failed)
19/11/2018 23:04:26 | climateprediction.net | Temporarily failed upload of wah2_global_e0pq_208812_145_768_011660298_0_r329021304_73.zip: connect() failed
ID: 59027
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 59028 - Posted: 20 Nov 2018, 1:04:31 UTC - in response to Message 59027.  

Mine are going up, but some are getting those timeouts intermittently. The server must be very busy with the backlog of uploads that were stalled.
ID: 59028
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 59029 - Posted: 20 Nov 2018, 2:35:24 UTC - in response to Message 59028.  

As luck would have it, I had ten work units that ended during the outage, and it has been slow going all day with all the zips. But the last four just finished uploading at 260 kbps, so I think the logjam is over.
ID: 59029
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 59031 - Posted: 20 Nov 2018, 4:10:42 UTC - in response to Message 59029.  

Oops, getting "No space left on server" error again now on a global batch 766 task.

Also, now getting
11/19/2018 20:32:10 | climateprediction.net | Aborting task wah2_global_e1n4_208812_145_769_011664000_1: exceeded disk limit: 1920.80MB > 1907.35MB
11/19/2018 21:32:11 | climateprediction.net | Aborting task wah2_global_e0nv_200412_145_766_011655231_1: exceeded disk limit: 1921.15MB > 1907.35MB
11/19/2018 21:44:12 | climateprediction.net | Aborting task wah2_global_e0os_200412_145_766_011655264_0: exceeded disk limit: 1920.43MB > 1907.35MB


This looks like some local disk usage limit coded into the task. Maybe the local limit is being exceeded because uploads aren't happening?
I'll suspend computing on the machine that's getting these errors for now.
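As a rough check on the numbers in those abort messages, here is a small Python sketch. The 1907.35MB figure is what a bound of 2,000,000,000 bytes looks like when divided by 1,048,576; that reading is an inference from the number itself, not something the project has confirmed, and the helper names below are made up for illustration.

```python
MIB = 1024 * 1024  # bytes per "MB" as the log appears to count them (assumption)
ASSUMED_BOUND_BYTES = 2_000_000_000  # inferred from 1907.35MB, not confirmed


def bound_in_log_units(bound_bytes=ASSUMED_BOUND_BYTES):
    """Express a byte bound in the units the abort message seems to use."""
    return round(bound_bytes / MIB, 2)


def overage_mb(used_mb, limit_mb=1907.35):
    """How far over the limit a task was, in the log's MB units."""
    return round(used_mb - limit_mb, 2)


if __name__ == "__main__":
    print("assumed bound:", bound_in_log_units(), "MB")
    # Usage figures copied from the three abort messages above.
    for task, used in [("e1n4", 1920.80), ("e0nv", 1921.15), ("e0os", 1920.43)]:
        print(task, "over by", overage_mb(used), "MB")
```

Under that assumption, each of the three tasks was only about 13 to 14 MB past the limit when it was aborted.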
ID: 59031
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59032 - Posted: 20 Nov 2018, 4:46:40 UTC

I've just reported the out-of-disk-space problem.
I'll report the "exceeded disk limit" error too.

I've just lost a model between zips 143 and 144. No error listed.

:(
ID: 59032
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59033 - Posted: 20 Nov 2018, 4:57:16 UTC

OK, sent.

How far did those 3 get?
Did you get Restart or out files?

(My failure got the Restart, but not the out file.)
e1wm
ID: 59033
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 59034 - Posted: 20 Nov 2018, 6:39:03 UTC - in response to Message 59033.  
Last modified: 20 Nov 2018, 6:55:18 UTC

Lost another: e0n7, batch 769, zips 60-145 absent
e0os, batch 766, zips 90-145 absent
e0nv, batch 766, zips 133-145 absent
e1n4, batch 769, zips 124-145 absent
No restart zip files for any of these.

It puzzles me. These are all running in a Windows 10 (1803) VirtualBox VM under Ubuntu 18.04 on an Intel i7-3770.
I had suspended network activity during the latest upload problems.
I'll check if any of my other machines have reported similar problems. I've suspended computation on those that had their uploads still queued.
<edit>
I don't see any "exceeded disk limit" errors on my other machines. But the one that got those errors has by far the most uploads queued: over 100 zip files of 50MB each.
I'll sleep on it; it doesn't seem to be directly related to the main upload problem.
ID: 59034
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59035 - Posted: 20 Nov 2018, 8:01:33 UTC

I suspect that this is due to a vaguely remembered problem with BOINC if too many zips/too much data is queued.

Because of this, I've been manually controlling the models.
I let them run for a day, and then suspend them. Then I let all of the uploads get back to Oxford.
Next step is to suspend the models on the 2nd computer, and get rid of them, while letting the 1st computer create some more.
A set of models every hour and 20 minutes creates a lot of data.

Now that I'm only a couple of zips short of finishing, the plan was to only let the models run for about an hour and a half or so. Just long enough to get one set of zips, then upload them.

But the "out of space" problem caught up with me, so I've suspended everything.
ID: 59035
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59038 - Posted: 20 Nov 2018, 10:29:42 UTC

OK, the disk space problem is being worked on.

In the meantime, it's time to keep an eye on all the models.
If there's a lot of data piling up in the Transfers tab, then Suspend the models and wait it out. That way you'll have a chance of getting the data through.
I don't know how many is too many, but I had about 35 zips waiting at one point, and that was OK.

If BOINC suddenly kills off the tasks, that's OK with the project. That problem of too much data is also being talked about.

I'll put a copy of this in the News thread.
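The advice above about watching the Transfers tab could be roughly automated along these lines. This is only a sketch: it assumes `boinccmd --get_file_transfers` lists each transfer with a `direction: upload` or `direction: download` line (check the output on your own client first), the threshold of 35 is just the count mentioned above as having been OK rather than a known limit, and the function names are made up for illustration.

```python
import subprocess

MAX_PENDING = 35  # the count reported as OK above; not an official limit


def count_pending_uploads(transfers_text):
    """Count entries whose direction is 'upload' in text shaped like
    the output of `boinccmd --get_file_transfers` (format assumed)."""
    return sum(
        1
        for line in transfers_text.splitlines()
        if line.strip().lower() == "direction: upload"
    )


def should_suspend(transfers_text, max_pending=MAX_PENDING):
    """True when more uploads are queued than the chosen threshold."""
    return count_pending_uploads(transfers_text) > max_pending


def read_transfers():
    """Ask the local BOINC client for its transfer list."""
    result = subprocess.run(
        ["boinccmd", "--get_file_transfers"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

If `should_suspend(read_transfers())` comes back true, that would be the point to suspend the models in the Manager and wait for the uploads to clear.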
ID: 59038
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 59039 - Posted: 20 Nov 2018, 11:29:38 UTC - in response to Message 59038.  

One of the times when only having slow machines is an advantage :)
ID: 59039
Bonsai911
Joined: 9 Sep 04
Posts: 228
Credit: 30,756,611
RAC: 3,303
Message 59040 - Posted: 20 Nov 2018, 12:12:12 UTC

My computer has 165 zips waiting to upload. I'll stop computation on my machine. Too bad.
ID: 59040