Message boards : Number crunching : Upload server is out of disk space
Author | Message |
---|---|
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
Hi folks, I still have some WAH batch 994 tasks crunching, but I can't upload zips to UPLOAD7.cpdn.org: the uploads fail with a transient HTTP error. This has been happening since at least 4 Aug. EDIT: I noticed there is another thread, "The uploads are stuck", so perhaps a moderator could move this one there. Apologies for the inconvenience. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,028,039 RAC: 20,189 |
"EDIT: I noticed there is another thread: The uploads are stuck - so perhaps a moderator could move it there. Apology for the inconvenience." Andy is aware of this. I know the OS has recently been upgraded on the server, and Andy had this on his list of things to look at, waiting on his desk when he came back from leave on Wednesday. What I don't know is whether this is now a problem he can fix, or a problem at the site in Korea that the files get sent to. |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
"Andy is aware of this. I know the OS has recently been upgraded on the server and Andy had this on his list of things to look at waiting on his desk when he came back from leave on Wednesday. What I don't know is if this is now a problem he can fix or a problem where the files get sent to in Korea." Here is a pastebin from the log |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
It was working for some time and I managed to upload most of the queue. However it is down again and I can't upload. Connection timed out, transient HTTP error. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,028,039 RAC: 20,189 |
"It was working for some time and I managed to upload most of the queue. However it is down again and I can't upload. Connection timed out, transient HTTP error." I have emailed Andy. Sadly, it being Friday afternoon, this may not get looked at till Monday. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,433,344 RAC: 13,607 |
This is now back up again. "It was working for some time and I managed to upload most of the queue. However it is down again and I can't upload. Connection timed out, transient HTTP error." --- CPDN Visiting Scientist |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
"This is now back up again." "It was working for some time and I managed to upload most of the queue. However it is down again and I can't upload. Connection timed out, transient HTTP error." I managed to upload a few zips and then: 14/08/2023 12:52:38 | climateprediction.net | [http] [ID#5921] Info: Connection #635 to host upload7.cpdn.org left intact 14/08/2023 12:52:38 | | Internet access OK - project servers may be temporarily down. 14/08/2023 12:52:38 | climateprediction.net | [error] Error reported by file upload server: [wah2_eas25_a2eu_200511_25_994_012218564_0_r1079552417_18.zip] locked by file_upload_handler PID=29787 14/08/2023 12:52:38 | climateprediction.net | Temporarily failed upload of wah2_eas25_a2eu_200511_25_994_012218564_0_r1079552417_18.zip: transient upload error 14/08/2023 12:52:38 | climateprediction.net | Backing off 00:06:09 on upload of wah2_eas25_a2eu_200511_25_994_012218564_0_r1079552417_18.zip 14/08/2023 12:52:38 | climateprediction.net | [http] [ID#5920] Info: Connected to upload7.cpdn.org (141.223.16.156) port 80 (#634) Some % was uploaded and then the connection was lost. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,709,934 RAC: 9,107 |
The error message "locked by file_upload_handler" is normal if an upload is interrupted for any reason. It will reset automatically after some time has passed (possibly an hour, maybe longer), and you will be able to try again. If you get any other error messages later, please post them here. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,028,039 RAC: 20,189 |
"I have emailed Andy. Sadly, it being Friday afternoon, this may not get looked at till Monday." Actually I didn't - I sent it from the wrong email address, so it bounced. I will wait till tomorrow and see what is happening before deciding whether I need to send another. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,433,344 RAC: 13,607 |
I spoke with Andy this morning. He said he'd looked at the server and cleared some space, and it was working. That doesn't mean it hasn't gone down again since this morning, though. "I have emailed Andy. Sadly, it being Friday afternoon, this may not get looked at till Monday." "Actually I didn't. - I sent it from the wrong email address so it bounced but I will wait till tomorrow and see what is happening to see if I need to send another." |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
I switched off network activity yesterday and switched it back on today. I still can't get the rest of the zips through. Here is the log |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,709,934 RAC: 9,107 |
Sorry - I was busy yesterday. Those logs are horrible, aren't they! Especially because there are several things going on at once, all intertwined. But I've separated out the two important ones, and taken a look. They are: ID#5953: Started upload of wah2_eas25_a11n_199411_25_994_012216793_0_r879970839_18.zip (task 22320060); ID#5954: Started upload of wah2_eas25_a2ix_200611_25_994_012218711_0_r1894094109_19.zip (task 22321986). Both failed at exactly the same point, in the same way. That usually implies a common cause. Here's the second one. 15/08/2023 08:59:05 | climateprediction.net | [http] [ID#5954] Info: Found bundle for host: 0x2ffe950 [serially] There are two lines of interest. (1) "Received header from server: <file_size>0</file_size>" (about half way down). I think this is BOINC's way of saying 'It's been tried before, but no actual data got through' - which fits the circumstances. (2) "Received header from server: HTTP/1.1 100 Continue" (right at the bottom, just before the connection is lost). This seems to be a common point of failure in a number of cases of this nature. There has been a suggestion earlier in this conversation - I'll try to find it later - that the HTTP file transfer protocol requires a blank line after '100 Continue', before it will start the full binary transfer. I've speculated before that this might lead to a deadlock, with the server waiting for data, and the client waiting for the blank line. This needs investigating - urgently and seriously - as a potential bug in the BOINC server software, affecting all retries, but only retries. I'll start thinking about who, and how, to contact. Edit: Found the previous discussion - look at message 69074. I think the author of that message is an experienced volunteer with CERN, where the BOINC server software is managed. But the wording suggests that the problem may originate in the configuration of the operating system on upload7, rather than the BOINC code. |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
Thanks Richard, I will suspend network connection for now as I will be off for a week, then will resume and provide more logs if necessary. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,709,934 RAC: 9,107 |
I've been poking around and googling various HTTP and Apache pages, and the requirement for a blank line after an HTTP 100 Continue header does seem to be proper. For example: "In any case, all HTTP 1.1 clients must handle the 100 response correctly (perhaps by just ignoring it). The '100 Continue' response is structured like any HTTP response, i.e. consists of a status line, optional headers, and a blank line. Unlike other responses, it is always followed by another complete, final response." I've searched the BOINC GitHub repository (which includes the BOINC server files), and there are no relevant matches. I think this level of comms detail must be handled by Apache (for the server) and Curl (for the client). I don't have access to a Linux server, and wouldn't know where to start with the documentation, but I do clearly see blank lines in the log I posted earlier - after "Content-Type: application/x-www-form-urlencoded" and "Expect: 100-continue" for upward-bound messages from client to server, and after "Content-Type: text/plain; charset=UTF-8" and "</data_server_reply>" for downward-bound messages from server to client. But not after "HTTP/1.1 100 Continue". This problem seems to be consistently present, but only
I'll try to see if I can get matching logs from any other BOINC project for comparison, but after that, I'm at my limit. It will have to be left to the CPDN team and the hosting body for upload7. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,028,039 RAC: 20,189 |
Thank you Richard. You are going above and beyond the call of duty. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,433,344 RAC: 13,607 |
I'm playing catch-up, but I'm still unclear whether this is a server configuration issue that CPDN can easily fix or, as I suspect, an issue with the server code itself. I (or Dave/Richard) can bring this to CPDN's attention if it's the former. --- CPDN Visiting Scientist |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,709,934 RAC: 9,107 |
I'm veering towards "server configuration", but at this stage all possibilities are open. The key point is that Andy's statement that "the server is running" needs to be accompanied by a recognition like "but it isn't achieving everything it's supposed to." That can only be determined by actual investigation. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,433,344 RAC: 13,607 |
"I'm veering towards 'server configuration', but at this stage all possibilities are open. The key point is that Andy's statement that 'the server is running' needs to be accompanied by a recognition like 'but it isn't achieving everything it's supposed to.' That can only be determined by actual investigation." Richard, Andy is back from holiday, so I suggest contacting him directly. I have no access, so only Andy can investigate. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,709,934 RAC: 9,107 |
"Richard, Andy is back from holidays so I suggest contacting him directly. I have no access so it's only Andy can investigate." OK, I'll pick up on that - but it won't be quite immediate. I'm caught up in a number of non-BOINC issues at the moment, and they're taking up a lot of thinking space, pushing the computer stuff to one side. I'll do what I can. It will be slightly easier after the Bank Holiday. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,433,344 RAC: 13,607 |
Maybe there's a quick check Andy can do on the Korean server? There was an earlier message from computermzle(?) about looking for a blank string in the config. If someone can let me know exactly which file to look in, I can discuss it with Andy. It might be a quick fix; if not, it will at least rule that out. "Richard, Andy is back from holidays so I suggest contacting him directly. I have no access so it's only Andy can investigate." "OK, I'll pick upon that - but it won't be quite immediate. I'm caught up in a number of non-boinc issues at the moment, and they're taking up a lot of thinking space, pushing the computer stuff to one side. I'll do what I can. Will be slightly easier after the Bank Holiday." --- CPDN Visiting Scientist |
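
For anyone who wants to repeat the log analysis described in the thread, here is a rough Python sketch of the two checks: pulling apart the intertwined log lines by their [ID#n] tag and flagging the two "lines of interest", and testing whether a captured "100 Continue" response is closed by a blank line. The function names are made up for illustration, the log layout is assumed from the excerpts quoted above, and the raw-bytes check assumes you have some way of capturing the server's response bytes, which the thread does not describe. It is a diagnostic aid under those assumptions, not a statement of how BOINC, Apache or Curl actually behave.

```python
import re
from collections import defaultdict

# Assumption: every http debug line carries a tag like "[ID#5954]",
# as in the log excerpts quoted in the thread.
ID_TAG = re.compile(r"\[ID#(\d+)\]")


def group_by_transfer(log_text):
    """Group log lines by their [ID#n] tag; untagged lines are skipped."""
    groups = defaultdict(list)
    for line in log_text.splitlines():
        match = ID_TAG.search(line)
        if match:
            groups[match.group(1)].append(line)
    return dict(groups)


def lines_of_interest(lines):
    """Return lines reporting a zero file size or a 100 Continue header."""
    markers = ("<file_size>0</file_size>", "100 Continue")
    return [line for line in lines if any(m in line for m in markers)]


def interim_response_is_terminated(raw):
    """Check that a leading 100 response is closed by a blank line.

    `raw` is the byte stream received from the server (hypothetical
    capture). Returns False if it does not start with a 100 status
    line, if no blank line is present, or if another status line
    appears before the first blank line.
    """
    head, blank, _rest = raw.partition(b"\r\n\r\n")
    if not blank:
        return False
    lines = head.split(b"\r\n")
    status = lines[0].split(b" ", 2)
    if len(status) < 2 or not status[0].startswith(b"HTTP/") or status[1] != b"100":
        return False
    # Anything between the status line and the blank line must look
    # like a "Name: value" header, not the start of the final response.
    return all(b":" in h and not h.startswith(b"HTTP/") for h in lines[1:])
```

Typical use would be to paste a saved event log into `group_by_transfer`, pick the ID of a failed upload, and pass its lines to `lines_of_interest`. If the same two lines show up for every failed retry, that supports the "common cause" reading above.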