climateprediction.net (CPDN) home page
Thread 'Upload server is out of disk space'

bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 69448 - Posted: 7 Aug 2023, 8:36:23 UTC
Last modified: 7 Aug 2023, 8:48:46 UTC

Hi folks,
I still have some WAH tasks from batch 994 crunching, but I can't upload zips to UPLOAD7.cpdn.org: the uploads fail with a transient HTTP error. This has been happening since at least 4 Aug.

EDIT: I noticed there is another thread: The uploads are stuck - so perhaps a moderator could move it there. Apologies for the inconvenience.
ID: 69448
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 69452 - Posted: 7 Aug 2023, 11:16:03 UTC - in response to Message 69448.  

EDIT: I noticed there is another thread: The uploads are stuck - so perhaps a moderator could move it there. Apologies for the inconvenience.


Andy is aware of this. I know the OS has recently been upgraded on the server, and Andy had this on his list of things to look at, waiting on his desk for when he came back from leave on Wednesday. What I don't know is whether this is now a problem he can fix or a problem at the site in Korea where the files get sent.
ID: 69452
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 69454 - Posted: 7 Aug 2023, 17:16:07 UTC - in response to Message 69452.  
Last modified: 7 Aug 2023, 17:17:08 UTC

Andy is aware of this. I know the OS has recently been upgraded on the server, and Andy had this on his list of things to look at, waiting on his desk for when he came back from leave on Wednesday. What I don't know is whether this is now a problem he can fix or a problem at the site in Korea where the files get sent.


Here is a paste bin from the log
ID: 69454
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 69466 - Posted: 11 Aug 2023, 10:29:48 UTC - in response to Message 69454.  

It was working for some time and I managed to upload most of the queue. However it is down again and I can't upload. Connection timed out, transient HTTP error.
ID: 69466
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 69467 - Posted: 11 Aug 2023, 14:56:22 UTC - in response to Message 69466.  

It was working for some time and I managed to upload most of the queue. However it is down again and I can't upload. Connection timed out, transient HTTP error.


I have emailed Andy. Sadly, it being Friday afternoon, this may not get looked at till Monday.
ID: 69467
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 69468 - Posted: 14 Aug 2023, 9:33:55 UTC - in response to Message 69467.  

It was working for some time and I managed to upload most of the queue. However it is down again and I can't upload. Connection timed out, transient HTTP error.


I have emailed Andy. Sadly, it being Friday afternoon, this may not get looked at till Monday.
This is now back up again.
---
CPDN Visiting Scientist
ID: 69468
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 69469 - Posted: 14 Aug 2023, 9:55:11 UTC - in response to Message 69468.  
Last modified: 14 Aug 2023, 9:56:08 UTC

It was working for some time and I managed to upload most of the queue. However it is down again and I can't upload. Connection timed out, transient HTTP error.


I have emailed Andy. Sadly, it being Friday afternoon, this may not get looked at till Monday.
This is now back up again.


I managed to upload a few zips and then:

14/08/2023 12:52:38 | climateprediction.net | [http] [ID#5921] Info:  Connection #635 to host upload7.cpdn.org left intact
14/08/2023 12:52:38 |  | Internet access OK - project servers may be temporarily down.
14/08/2023 12:52:38 | climateprediction.net | [error] Error reported by file upload server: [wah2_eas25_a2eu_200511_25_994_012218564_0_r1079552417_18.zip] locked by file_upload_handler PID=29787
14/08/2023 12:52:38 | climateprediction.net | Temporarily failed upload of wah2_eas25_a2eu_200511_25_994_012218564_0_r1079552417_18.zip: transient upload error
14/08/2023 12:52:38 | climateprediction.net | Backing off 00:06:09 on upload of wah2_eas25_a2eu_200511_25_994_012218564_0_r1079552417_18.zip
14/08/2023 12:52:38 | climateprediction.net | [http] [ID#5920] Info:  Connected to upload7.cpdn.org (141.223.16.156) port 80 (#634)


Some % was uploaded and then the connection was lost.
ID: 69469
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,706,621
RAC: 9,524
Message 69470 - Posted: 14 Aug 2023, 10:39:25 UTC - in response to Message 69469.  

The error message "locked by file_upload_handler" is normal if an upload is interrupted for any reason. It will reset automatically after some time has passed (possibly an hour, maybe longer), and you will be able to try again. If you get any other error messages later, please post them here.
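
As a rough illustration of how to tell which uploads are currently locked and how long the client intends to wait before retrying, here is a minimal Python sketch that scans event log lines like the ones quoted earlier in this thread. The regular expressions are assumptions based only on the wording of those quoted lines; other BOINC client versions may phrase the messages differently, and the function name is hypothetical.

```python
import re
from datetime import timedelta

# Patterns inferred from the log lines quoted in this thread (an assumption,
# not an official BOINC log format specification).
LOCKED = re.compile(
    r"\[(?P<file>[^\]]+)\] locked by file_upload_handler PID=(?P<pid>\d+)"
)
BACKOFF = re.compile(
    r"Backing off (?P<h>\d+):(?P<m>\d+):(?P<s>\d+) on upload of (?P<file>\S+)"
)


def summarise_upload_problems(lines):
    """Map each filename to the PID holding its lock and its backoff delay."""
    summary = {}
    for line in lines:
        locked = LOCKED.search(line)
        if locked:
            entry = summary.setdefault(
                locked["file"], {"locked_by_pid": None, "backoff": None}
            )
            entry["locked_by_pid"] = int(locked["pid"])
        backoff = BACKOFF.search(line)
        if backoff:
            entry = summary.setdefault(
                backoff["file"], {"locked_by_pid": None, "backoff": None}
            )
            entry["backoff"] = timedelta(
                hours=int(backoff["h"]),
                minutes=int(backoff["m"]),
                seconds=int(backoff["s"]),
            )
    return summary


# Two lines copied from the log posted above.
SAMPLE_LOG = [
    "14/08/2023 12:52:38 | climateprediction.net | [error] Error reported by file upload server: [wah2_eas25_a2eu_200511_25_994_012218564_0_r1079552417_18.zip] locked by file_upload_handler PID=29787",
    "14/08/2023 12:52:38 | climateprediction.net | Backing off 00:06:09 on upload of wah2_eas25_a2eu_200511_25_994_012218564_0_r1079552417_18.zip",
]

if __name__ == "__main__":
    for name, info in summarise_upload_problems(SAMPLE_LOG).items():
        print(name, info["locked_by_pid"], info["backoff"])
```

Note that the backoff printed here is only the client's own retry delay; it says nothing about when the server will release the lock.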
ID: 69470
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 69471 - Posted: 14 Aug 2023, 17:20:12 UTC

I have emailed Andy. Sadly, it being Friday afternoon, this may not get looked at till Monday.
Actually I didn't - I sent it from the wrong email address, so it bounced. I will wait till tomorrow and see what is happening before deciding whether I need to send another.
ID: 69471
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 69473 - Posted: 14 Aug 2023, 20:56:26 UTC - in response to Message 69471.  

I have emailed Andy. Sadly, it being Friday afternoon, this may not get looked at till Monday.
Actually I didn't - I sent it from the wrong email address, so it bounced. I will wait till tomorrow and see what is happening before deciding whether I need to send another.
I spoke with Andy this morning. He said he'd looked at the server and cleared some space and it was working. That doesn't mean it hasn't gone down again since this morning though.
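
For anyone wanting to keep an eye on free space on their own machine (we have no view of what is actually run on the upload server), a minimal Python sketch of a free-space check might look like this. The path and the 10% threshold are purely illustrative assumptions, and the function name is hypothetical.

```python
import shutil


def free_space_report(path=".", min_free_fraction=0.10):
    """Report disk usage for the filesystem holding `path`.

    `min_free_fraction` is an arbitrary illustrative threshold, not a value
    taken from CPDN or BOINC.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,
    }


if __name__ == "__main__":
    report = free_space_report(".")
    print(f"{report['free_fraction']:.1%} free, low={report['low']}")
```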
ID: 69473
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 69474 - Posted: 15 Aug 2023, 6:05:08 UTC - in response to Message 69473.  

I switched off network activity yesterday and switched it back on today. I still can't get the rest of the zips through. Here is the log
ID: 69474
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,706,621
RAC: 9,524
Message 69475 - Posted: 16 Aug 2023, 10:15:09 UTC - in response to Message 69474.  
Last modified: 16 Aug 2023, 10:59:09 UTC

Sorry - I was busy yesterday. Those logs are horrible, aren't they! Especially because there are several things going on at once, all intertwined. But I've separated out the two important ones, and taken a look.

They are:
ID#5953: Started upload of wah2_eas25_a11n_199411_25_994_012216793_0_r879970839_18.zip (task 22320060)
ID#5954: Started upload of wah2_eas25_a2ix_200611_25_994_012218711_0_r1894094109_19.zip (task 22321986)
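
For anyone who wants to do the same untangling on their own log, here is a minimal Python sketch that groups [http] lines by their ID# tag. It assumes only the "[http] [ID#nnnn]" shape visible in the logs posted in this thread; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches the "[http] [ID#5954]" tag seen in the logs posted in this thread.
HTTP_ID = re.compile(r"\[http\] \[ID#(\d+)\]")


def group_by_transfer(lines):
    """Return {transfer_id: [lines...]}, keeping each transfer's lines in order.

    Lines without an [http] [ID#...] tag are ignored.
    """
    groups = defaultdict(list)
    for line in lines:
        match = HTTP_ID.search(line)
        if match:
            groups[int(match.group(1))].append(line.rstrip("\n"))
    return dict(groups)


# Three lines copied from a log posted earlier in this thread.
SAMPLE_LOG = [
    "14/08/2023 12:52:38 | climateprediction.net | [http] [ID#5921] Info:  Connection #635 to host upload7.cpdn.org left intact",
    "14/08/2023 12:52:38 |  | Internet access OK - project servers may be temporarily down.",
    "14/08/2023 12:52:38 | climateprediction.net | [http] [ID#5920] Info:  Connected to upload7.cpdn.org (141.223.16.156) port 80 (#634)",
]

if __name__ == "__main__":
    for transfer_id, transfer_lines in sorted(group_by_transfer(SAMPLE_LOG).items()):
        print(f"--- ID#{transfer_id} ({len(transfer_lines)} lines)")
        for entry in transfer_lines:
            print(entry)
```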

Both failed at exactly the same point, in the same way. That usually implies a common cause. Here's the second one.
15/08/2023 08:59:05 | climateprediction.net | [http] [ID#5954] Info: Found bundle for host: 0x2ffe950 [serially]
15/08/2023 08:59:05 | climateprediction.net | [http] [ID#5954] Info: Hostname 'upload7.cpdn.org' was found in DNS cache
15/08/2023 08:59:05 | climateprediction.net | [http] [ID#5954] Info: Trying 141.223.16.156:80...
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Info: Connected to upload7.cpdn.org (141.223.16.156) port 80 (#691)
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Sent header to server: Host: upload7.cpdn.org
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.20.2)
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Sent header to server: Accept: */*
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Sent header to server: Accept-Encoding: deflate, gzip
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Sent header to server: Accept-Language: en_GB
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Sent header to server: Content-Length: 313
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Sent header to server: Content-Type: application/x-www-form-urlencoded
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Sent header to server:
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Info: We are completely uploaded and fine
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Info: Mark bundle as not supporting multiuse
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server: HTTP/1.1 200 OK
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server: Date: Tue, 15 Aug 2023 05:58:40 GMT
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server: Server: Apache/2.4.37 (centos)
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server: Transfer-Encoding: chunked
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server: Content-Type: text/plain; charset=UTF-8
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server:
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server: 5d
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server: <data_server_reply>
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server: <status>0</status>
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server: <file_size>0</file_size>
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server: </data_server_reply>
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Received header from server:
15/08/2023 08:59:06 | climateprediction.net | [http] [ID#5954] Info: Connection #691 to host upload7.cpdn.org left intact
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Info: Found bundle for host: 0x2ffe950 [serially]
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Info: Re-using existing connection #691 with host upload7.cpdn.org
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Info: Connected to upload7.cpdn.org (141.223.16.156) port 80 (#691)
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Sent header to server: Host: upload7.cpdn.org
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.20.2)
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Sent header to server: Accept: */*
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Sent header to server: Accept-Encoding: deflate, gzip
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Sent header to server: Accept-Language: en_GB
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Sent header to server: Content-Length: 124259687
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Sent header to server: Content-Type: application/x-www-form-urlencoded
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Sent header to server: Expect: 100-continue
15/08/2023 08:59:07 | climateprediction.net | [http] [ID#5954] Sent header to server:
15/08/2023 08:59:08 | climateprediction.net | [http] [ID#5954] Info: Mark bundle as not supporting multiuse
15/08/2023 08:59:08 | climateprediction.net | [http] [ID#5954] Received header from server: HTTP/1.1 100 Continue
15/08/2023 08:59:57 | climateprediction.net | [http] [ID#5954] Info: Recv failure: Connection was reset
15/08/2023 08:59:57 | climateprediction.net | [http] [ID#5954] Info: Closing connection 691
There are two lines of interest.

"Received header from server: <file_size>0</file_size>" (about half way down). I think this is BOINC's way of saying 'It's been tried before, but no actual data got through' - which fits the circumstances.
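
To make that reply easier to inspect, here is a minimal Python sketch that parses a reply of the shape seen in the log above. Treating file_size as "bytes the server already holds for this file" is my reading, not something confirmed in this thread, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_data_server_reply(text):
    """Return (status, file_size) from a <data_server_reply> body.

    file_size is None when the element is absent. Interpreting it as the
    number of bytes already received by the server is an assumption.
    """
    root = ET.fromstring(text.strip())
    if root.tag != "data_server_reply":
        raise ValueError(f"unexpected root element: {root.tag}")
    status = int(root.findtext("status"))
    size_text = root.findtext("file_size")
    file_size = int(size_text) if size_text is not None else None
    return status, file_size


# Body reassembled from the "Received header from server" lines in the log.
REPLY = """
<data_server_reply>
    <status>0</status>
    <file_size>0</file_size>
</data_server_reply>
"""

if __name__ == "__main__":
    print(parse_data_server_reply(REPLY))
```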

"Received header from server: HTTP/1.1 100 Continue" (right at the bottom, just before the connection is lost).
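
It may also help to see how long the connection sat idle between that line and the reset. Here is a minimal Python sketch that computes the gap from the two timestamps in the log above; it assumes the day/month/year timestamp format shown in this particular log, which probably depends on the client's locale.

```python
from datetime import datetime

# Timestamp layout as it appears in the log posted above (likely locale dependent).
STAMP_FORMAT = "%d/%m/%Y %H:%M:%S"


def elapsed_seconds(first_line, second_line):
    """Seconds between the leading timestamps of two event log lines."""
    start = datetime.strptime(first_line[:19], STAMP_FORMAT)
    end = datetime.strptime(second_line[:19], STAMP_FORMAT)
    return (end - start).total_seconds()


# The last two significant lines of the ID#5954 transfer, copied from the log.
CONTINUE_LINE = "15/08/2023 08:59:08 | climateprediction.net | [http] [ID#5954] Received header from server: HTTP/1.1 100 Continue"
RESET_LINE = "15/08/2023 08:59:57 | climateprediction.net | [http] [ID#5954] Info: Recv failure: Connection was reset"

if __name__ == "__main__":
    print(elapsed_seconds(CONTINUE_LINE, RESET_LINE))  # 49.0
```

So roughly 49 seconds passed with nothing logged before the connection was reset.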

This seems to be a common cause of failure, in a number of cases of this nature. There has been a suggestion earlier in this conversation - I'll try to find it later - that the HTTP file transfer protocol requires a blank line after '100 Continue', before it will start the full binary transfer. I've speculated before that this might lead to a deadlock, with the server waiting for data, and the client waiting for the blank line. This needs investigating - urgently and seriously - as a potential bug in the BOINC server software, affecting all retries, but only retries. I'll start thinking about who, and how, to contact.

Edit: Found the previous discussion - look at message 69074. I think the author of that message is an experienced volunteer with CERN, where the BOINC server software is managed. But the wording suggests that the problem may originate in the configuration of the operating system on upload 7, rather than the BOINC code.
ID: 69475
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 69476 - Posted: 16 Aug 2023, 14:27:31 UTC - in response to Message 69475.  

Thanks Richard,

I will suspend network connection for now as I will be off for a week, then will resume and provide more logs if necessary.
ID: 69476
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,706,621
RAC: 9,524
Message 69477 - Posted: 16 Aug 2023, 16:23:05 UTC

I've been poking around and googling around various http and Apache pages, and the requirement for a blank line after an HTTP 100 Continue header does seem to be proper. For example,

In any case, all HTTP 1.1 clients must handle the 100 response correctly (perhaps by just ignoring it). The "100 Continue" response is structured like any HTTP response, i.e. consists of a status line, optional headers, and a blank line. Unlike other responses, it is always followed by another complete, final response.
Source
I've searched the BOINC GitHub repository (which includes the BOINC server files), and there are no relevant matches. I think this level of comms detail must be handled by Apache (for the server) and Curl (for the client).

I don't have access to a Linux server, and wouldn't know where to start with the documentation, but I do clearly see blank lines in the log I posted earlier - after "Content-Type: application/x-www-form-urlencoded" and "Expect: 100-continue" for upward-bound messages from client to server, and after "Content-Type: text/plain; charset=UTF-8" and "</data_server_reply>" for downward-bound messages from server to client.

But not after "HTTP/1.1 100 Continue".
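
To pin down what "a blank line after the header" means on the wire, here is a minimal Python sketch that checks whether a chunk of raw bytes starts with a complete 100 Continue response, i.e. a status line (plus optional headers) terminated by an empty line (CRLF CRLF). It works on raw bytes such as those from a packet capture, which is an assumption on my part: the BOINC event log prints headers one per line, so a missing blank line in the log does not by itself prove that the bytes were missing on the wire. The function name is hypothetical.

```python
def is_complete_100_continue(raw: bytes) -> bool:
    """True if `raw` begins with a 100 response whose header block is terminated.

    A header block is terminated by an empty line, i.e. the byte sequence
    CRLF CRLF. Anything after the terminator is ignored here.
    """
    head, terminator, _rest = raw.partition(b"\r\n\r\n")
    if not terminator:
        # The blank line has not arrived (yet): the response is incomplete.
        return False
    status_line = head.split(b"\r\n", 1)[0]
    fields = status_line.split(None, 2)  # version, status code, reason phrase
    return (
        len(fields) >= 2
        and fields[0].startswith(b"HTTP/1.")
        and fields[1] == b"100"
    )


if __name__ == "__main__":
    print(is_complete_100_continue(b"HTTP/1.1 100 Continue\r\n\r\n"))  # True
    print(is_complete_100_continue(b"HTTP/1.1 100 Continue\r\n"))      # False
```

A client that waits for the terminator would sit idle in the second case until something times out or resets the connection, which would at least be consistent with the log.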

This problem seems to be consistently present, but only

  • on CPDN upload7 server
  • when re-processing interrupted upload files

I'll try to see if I can get matching logs from any other BOINC project for comparison, but after that, I'm at my limit. It will have to be left to the CPDN team and the hosting body for upload7.

ID: 69477
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 69478 - Posted: 16 Aug 2023, 18:06:33 UTC - in response to Message 69477.  

Thank you Richard. You are going above and beyond the call of duty.
ID: 69478
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 69479 - Posted: 16 Aug 2023, 20:17:44 UTC - in response to Message 69477.  

I'm playing catch-up, but I'm still unclear whether this is a server configuration issue that CPDN can easily fix or, as I suspect, an issue with the server code itself. I (or Dave/Richard) can bring this to CPDN's attention if it's the former.
---
CPDN Visiting Scientist
ID: 69479
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,706,621
RAC: 9,524
Message 69480 - Posted: 16 Aug 2023, 21:33:12 UTC - in response to Message 69479.  

I'm veering towards "server configuration", but at this stage all possibilities are open. The key point is that Andy's statement that "the server is running" needs to be accompanied by a recognition like "but it isn't achieving everything it's supposed to." That can only be determined by actual investigation.
ID: 69480
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 69481 - Posted: 17 Aug 2023, 10:30:08 UTC - in response to Message 69480.  

I'm veering towards "server configuration", but at this stage all possibilities are open. The key point is that Andy's statement that "the server is running" needs to be accompanied by a recognition like "but it isn't achieving everything it's supposed to." That can only be determined by actual investigation.
Richard, Andy is back from holidays so I suggest contacting him directly. I have no access, so only Andy can investigate.
ID: 69481
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,706,621
RAC: 9,524
Message 69482 - Posted: 17 Aug 2023, 10:50:25 UTC - in response to Message 69481.  

Richard, Andy is back from holidays so I suggest contacting him directly. I have no access, so only Andy can investigate.
OK, I'll pick up on that - but it won't be quite immediate. I'm caught up in a number of non-BOINC issues at the moment, and they're taking up a lot of thinking space, pushing the computer stuff to one side. I'll do what I can. Will be slightly easier after the Bank Holiday.
ID: 69482
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 69486 - Posted: 17 Aug 2023, 15:12:40 UTC - in response to Message 69482.  

Richard, Andy is back from holidays so I suggest contacting him directly. I have no access, so only Andy can investigate.
OK, I'll pick up on that - but it won't be quite immediate. I'm caught up in a number of non-BOINC issues at the moment, and they're taking up a lot of thinking space, pushing the computer stuff to one side. I'll do what I can. Will be slightly easier after the Bank Holiday.
Maybe there's a quick check Andy can do on the Korean server? There was an earlier message from computermzle(?) about looking for a blank string in the config. If someone can let me know exactly which file to look in, I can discuss it with Andy. It might be a quick fix; if not, it will at least rule that out.
---
CPDN Visiting Scientist
ID: 69486
©2024 cpdn.org