Message boards : Number crunching : Upload server is out of disk space
Author | Message |
---|---|
Send message Joined: 20 Dec 14 Posts: 23 Credit: 2,450,095 RAC: 296 |
I am getting the following errors when I try to upload some result files. Sample error messages are listed below: 7/18/2018 2:19:10 AM | climateprediction.net | Started upload of wah2_nam50_pdhh_200912_13_735_011566091_0_r348673607_1.zip |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
OK, I'll let them know, although they may have alarms on that. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
This should now be fixed. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
And mine are uploading. Phew. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
That didn't last long. Now there's another problem. I've just sent another email. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Fixed and good to go. |
Send message Joined: 17 Jul 05 Posts: 7 Credit: 6,509,173 RAC: 854 |
I have new upload failures, this time for the global tasks (766-770). They started to refuse uploading at about 9 p.m. CET, again with the error message 'server is out of disk space'. Just to let you know... |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Quote: "I have new upload failures, this time for the global tasks (766-770). They started to refuse uploading at about 9 p.m. CET, again with the error message 'server is out of disk space'. Just to let you know..."
This has been relayed to the project staff. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
And it's out of space again. Email sent. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,033,074 RAC: 14,759 |
Do we still have a problem?
19/11/2018 23:04:05 | climateprediction.net | Started upload of wah2_global_e0pq_208812_145_768_011660298_0_r329021304_73.zip
19/11/2018 23:04:05 | climateprediction.net | [file_xfer] URL: http://upload3.cpdn.org/cgi-bin/file_upload_handler
19/11/2018 23:04:05 | climateprediction.net | [http] [ID#120] Info: Trying 130.246.191.84...
19/11/2018 23:04:26 | climateprediction.net | [http] [ID#120] Info: connect to 130.246.191.84 port 80 failed: Timed out
19/11/2018 23:04:26 | climateprediction.net | [http] [ID#120] Info: Failed to connect to upload3.cpdn.org port 80: Timed out
19/11/2018 23:04:26 | climateprediction.net | [http] [ID#120] Info: Closing connection 257
19/11/2018 23:04:26 | climateprediction.net | [http] HTTP error: Couldn't connect to server
19/11/2018 23:04:26 | climateprediction.net | [file_xfer] http op done; retval -107 (connect() failed)
19/11/2018 23:04:26 | climateprediction.net | [file_xfer] file transfer status -107 (connect() failed)
19/11/2018 23:04:26 | climateprediction.net | Temporarily failed upload of wah2_global_e0pq_208812_145_768_011660298_0_r329021304_73.zip: connect() failed |
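For anyone who wants to pull the failed transfers out of a long event log like the one above, here is a minimal Python sketch. It assumes the log has been saved as plain text with the same `timestamp | project | message` layout shown in this post; the function name `failed_uploads` and the `SAMPLE` text are made up for illustration and are not part of BOINC.

```python
import re

# Matches the "Temporarily failed upload of <file>: <reason>" message
# exactly as it appears in the event log quoted in this thread.
FAILED = re.compile(r"Temporarily failed upload of (?P<file>\S+): (?P<reason>.+)$")


def failed_uploads(log_text):
    """Return a list of (file_name, reason) pairs for failed uploads.

    Assumes each log line looks like 'timestamp | project | message'.
    Lines that do not have three '|'-separated fields are skipped.
    """
    results = []
    for line in log_text.splitlines():
        # Split into at most three fields so any '|' inside the message survives.
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue
        match = FAILED.search(parts[2])
        if match:
            results.append((match.group("file"), match.group("reason").strip()))
    return results


# Two lines copied from the log in this post, used as sample input.
SAMPLE = """\
19/11/2018 23:04:26 | climateprediction.net | [http] HTTP error: Couldn't connect to server
19/11/2018 23:04:26 | climateprediction.net | Temporarily failed upload of wah2_global_e0pq_208812_145_768_011660298_0_r329021304_73.zip: connect() failed
"""

if __name__ == "__main__":
    for name, reason in failed_uploads(SAMPLE):
        print(f"{name}: {reason}")
```

A reason of `connect() failed` points to a connection timeout like the one logged here, which is a different failure from the server reporting that it is out of disk space.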
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Mine are going up, but some are getting those timeouts intermittently. The server must be very busy with the backlog of uploads that were stalled. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
As luck would have it, I had ten work units that ended during the outage, and it has been slow going all day with all the zips. But the last four just finished uploading at 260 kbps, so I think the logjam is over. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Oops, getting "No space left on server" error again now on a global batch 766 task. Also, now getting:
11/19/2018 20:32:10 | climateprediction.net | Aborting task wah2_global_e1n4_208812_145_769_011664000_1: exceeded disk limit: 1920.80MB > 1907.35MB
11/19/2018 21:32:11 | climateprediction.net | Aborting task wah2_global_e0nv_200412_145_766_011655231_1: exceeded disk limit: 1921.15MB > 1907.35MB
11/19/2018 21:44:12 | climateprediction.net | Aborting task wah2_global_e0os_200412_145_766_011655264_0: exceeded disk limit: 1920.43MB > 1907.35MB
This looks like some local disk usage limit coded into the task. Maybe the local limit was exceeded because uploads aren't happening? I'll suspend computing on the machine that's getting these errors, for now. |
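A note on the 1907.35MB figure in those messages: it is exactly what 2,000,000,000 bytes comes to when divided by 1024 x 1024. BOINC workunits carry a per-task disk bound in bytes (`rsc_disk_bound`), so one plausible reading is that the bound for these tasks was 2e9 bytes. That value is an inference from the arithmetic, not something confirmed by the project. A small Python check, with all constant names invented for illustration:

```python
BYTES_PER_MB = 1024 * 1024  # the log's "MB" is assumed to be 2**20 bytes

# ASSUMPTION: a 2e9-byte disk bound, inferred only from the number in the log.
ASSUMED_LIMIT_BYTES = 2_000_000_000

# Values copied from the three "exceeded disk limit" messages in this post.
REPORTED_LIMIT_MB = 1907.35
REPORTED_USAGE_MB = [1920.80, 1921.15, 1920.43]


def to_mb(n_bytes):
    """Convert a byte count to megabytes of 2**20 bytes."""
    return n_bytes / BYTES_PER_MB


limit_mb = round(to_mb(ASSUMED_LIMIT_BYTES), 2)

# How far over the reported limit each aborted task was, in MB.
overshoot_mb = [round(used - REPORTED_LIMIT_MB, 2) for used in REPORTED_USAGE_MB]

if __name__ == "__main__":
    print(f"assumed limit: {limit_mb} MB, reported limit: {REPORTED_LIMIT_MB} MB")
    for used, over in zip(REPORTED_USAGE_MB, overshoot_mb):
        print(f"usage {used} MB is over the limit by {over} MB")
```

Each aborted task was only about 13 to 14 MB over the limit when it was killed.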
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I've just reported the out of disk space. I'll report the error. I've just lost a model between zips 143 and 144. No error listed. :( |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
OK, sent. How far did those 3 get? Did you get Restart or out files? (My failure got the Restart, but not the out file.) e1wm |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Lost another.
e0n7 batch 769 zips 60-145 absent
e0os batch 766 zips 90-145 absent
e0nv batch 766 zips 133-145 absent
e1n4 batch 769 zips 124-145 absent
No restart zip files for any of these. It puzzles me. These are all running on a Windows 10 (1803) VirtualBox VM under Ubuntu 18.04 on an Intel i7-3770. I had suspended network activity during the latest upload problems. I'll check if any of my other machines have reported similar problems. I've suspended computation on those that had their uploads still queued.
<edit> Don't see any other "exceeded disk limit" errors on other machines. But the one that got the "exceeded disk limit" errors has by far the most uploads queued: over 100 zip files of 50MB each. I'll sleep on it; it doesn't seem to be directly related to the main upload problem. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I suspect that this is due to a vaguely remembered problem with BOINC if too many zips/too much data is queued. Because of this, I've been manually controlling the models. I let them run for a day, and then suspend them. Then I let all of the uploads get back to Oxford. Next step is to suspend the models on the 2nd computer, and get rid of them, while letting the 1st computer create some more. A set of models every hour and 20 minutes creates a lot of data. Now that I'm only a couple of zips short of finishing, the plan was to only let the models run for about an hour and a half or so. Just long enough to get one set of zips, then upload them. But the "out of space" problem caught up with me, so I've suspended everything. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
OK, the disk space problem is being worked on. In the meantime, it's time to keep an eye on all the models. If there's a lot of data piling up in the Transfers tab, then Suspend the models and wait it out. That way you'll have a chance of getting the data through. I don't know how many is too many, but I had about 35 zips waiting at one point, and that was OK. If BOINC suddenly kills off the tasks, that's OK with the project. That problem of too much data is also being talked about. I'll put a copy of this in the News thread. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
One of the times when only having slow machines is an advantage :) |
Send message Joined: 9 Sep 04 Posts: 228 Credit: 30,756,611 RAC: 3,303 |
My computer has 165 zips waiting to upload. I'll stop computation on my machine. Too bad. |