Message boards : Number crunching : New work discussion - 2
Author | Message |
---|---|
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,696,681 RAC: 10,226 |
Thanks, both, for the logs. Euch - what a mess! I'll make some general comments first, then pick out details from each.

First, these are both EAS tasks - the server which Dave/Andy say is running now. Let's take that on trust.

Second - they both seem to get muddled by trying a test connection to Google. I suspect that may be caused by a timing problem - BOINC doesn't wait long enough. For testing and recovery purposes, I'd suggest making a change in cc_config.xml (but see * at the end of this post). Set these values:

<dont_contact_ref_site>1</dont_contact_ref_site>
<max_file_xfers_per_project>1</max_file_xfers_per_project>

to keep things quiet and clean while we're working.

From the format, I guess that both of you have collected the log from BOINC Manager. I think, again, that BOINC can sometimes miss entries if it's trying to update the log while processing a big job communicating with a server. The stdoutdae.txt file version of the log can sometimes capture these missing lines.

pututu: Initial contact is fine

6/30/2023 11:15:43 AM | climateprediction.net | [http] [ID#5725] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
6/30/2023 11:15:43 AM | climateprediction.net | [http] [ID#5725] Received header from server: HTTP/1.1 200 OK

How big is your file?

6/30/2023 11:15:43 AM | climateprediction.net | [http] [ID#5725] Received header from server: <file_size>41943040</file_size>

That is enough to start work on the meat of the upload, but it goes wrong here:

6/30/2023 11:15:44 AM | climateprediction.net | [http] [ID#5725] Sent header to server: Content-Length: 82090749
6/30/2023 11:15:44 AM | climateprediction.net | [http] [ID#5725] Received header from server: HTTP/1.1 100 Continue
6/30/2023 11:16:04 AM | climateprediction.net | [http] [ID#5725] Info: Recv failure: Connection was reset
6/30/2023 11:16:04 AM | climateprediction.net | [http] [ID#5725] Info: Closing connection 810
6/30/2023 11:16:04 AM | climateprediction.net | [http] HTTP error: Failure when receiving data from the peer
6/30/2023 11:16:05 AM | | Project communication failed: attempting access to reference site

I think a 20 second delay at this point is a Windows default - can you confirm that you're running under Windows? The BOINC default can be changed, but initially is set at 300 seconds - I don't think you can override Windows. So dead end - the rest is just the connection check with Google.

geophi: The setup is again fine, but we have

6/30/2023 1:45:19 PM | climateprediction.net | [http] [ID#21] Received header from server: <file_size>8495740</file_size>

Is the different size usual? Then we get

6/30/2023 1:45:20 PM | climateprediction.net | [http] [ID#21] Sent header to server: Content-Length: 126698478
6/30/2023 1:45:20 PM | climateprediction.net | [http] [ID#21] Received header from server: HTTP/1.1 100 Continue
6/30/2023 1:50:27 PM | climateprediction.net | [http] [ID#21] Info: Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
6/30/2023 1:50:27 PM | climateprediction.net | [http] [ID#21] Info: Closing connection 39
6/30/2023 1:50:27 PM | climateprediction.net | [http] HTTP error: Timeout was reached

Again a delay, but for 5 minutes - and BOINC has terminated it. I think that's under Linux?

Another attempt at Google, but then we get

6/30/2023 1:50:28 PM | | [http] [ID#0] Sent header to server: roject_name>
6/30/2023 1:50:28 PM | | [http] [ID#0] Sent header to server: <name>wah2_eas25_a1hb_199711_25_994_012217357_2_r1373083460_restart.zip</name>
6/30/2023 1:50:28 PM | | [http] [ID#0] Sent header to server: <last_bytes_xferred>8561276.000000</last_bytes_xferred>

That's mad. It's sent the 'continue' information to Google! BOINC client bug! But overall, it's probably not the main cause of your problems.

Look at the varying file sizes: I think BOINC is trying to resend the remaining fraction of a big file which it has succeeded in sending part of already. I suspect that these partial retries are the main problem - the BOINC client/server combination is having difficulty coping with the separate sections which the server needs to stitch together. That's probably above all our pay grades - we would have to get BOINC Central involved in this, but they're not proving very responsive to bug reports these days. The server team here might have more success than a user report.

* This setting also blocks regular updates of 'all_projects_list.xml'. If you're in the habit of checking for new projects using BOINC Manager, reset this setting when we've done here. |
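For anyone wanting to compare the "varying file sizes" across their own logs, here is a minimal Python sketch, not anything from BOINC itself. It pairs the `<file_size>` the server reports with the `Content-Length` of the follow-up POST for each transfer ID, using only the four log lines quoted in the post above. The function name `summarise` and the reading that the sum approximates the full upload size are assumptions made for illustration.

```python
import re

# Excerpts copied from the logs quoted above (pututu: ID#5725, geophi: ID#21).
LOG = """\
6/30/2023 11:15:43 AM | climateprediction.net | [http] [ID#5725] Received header from server: <file_size>41943040</file_size>
6/30/2023 11:15:44 AM | climateprediction.net | [http] [ID#5725] Sent header to server: Content-Length: 82090749
6/30/2023 1:45:19 PM | climateprediction.net | [http] [ID#21] Received header from server: <file_size>8495740</file_size>
6/30/2023 1:45:20 PM | climateprediction.net | [http] [ID#21] Sent header to server: Content-Length: 126698478
"""

FILE_SIZE = re.compile(r"\[ID#(\d+)\].*<file_size>(\d+)</file_size>")
CONTENT_LENGTH = re.compile(r"\[ID#(\d+)\].*Content-Length: (\d+)")


def summarise(log_text):
    """Return {transfer_id: (bytes_server_reports, content_length_of_next_post)}.

    Hypothetical helper: a Content-Length is only recorded once a
    <file_size> reply has been seen for the same transfer ID, so the
    small initial query POST is ignored.
    """
    server_has = {}
    result = {}
    for line in log_text.splitlines():
        match = FILE_SIZE.search(line)
        if match:
            server_has[match.group(1)] = int(match.group(2))
            continue
        match = CONTENT_LENGTH.search(line)
        if match and match.group(1) in server_has:
            result[match.group(1)] = (server_has[match.group(1)], int(match.group(2)))
    return result


for transfer_id, (have, sending) in summarise(LOG).items():
    total = have + sending
    print(f"ID#{transfer_id}: server reports {have}, client sends {sending}, "
          f"sum {total} ({have / total:.1%} already on server)")
```

Run against a full stdoutdae.txt, the same pairing should show at a glance whether every stuck upload is a resumed partial transfer, which is the pattern suspected above.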
Send message Joined: 9 Mar 22 Posts: 30 Credit: 1,065,239 RAC: 556 |
May I ask whether the clients are connected via a Squid proxy? If so, this may explain the following HTTP header:
... [http] [ID#5725] Received header from server: HTTP/1.1 100 Continue
On the client side this can be solved by adding these lines to squid.conf:
# may be a workaround for POST issues
client_request_buffer_max_size 512 MB
Then reload the configuration, e.g. with:
sudo squid -k reconfigure
On Windows, open the Squid console as Administrator and run:
squid -k reconfigure
If the client is not configured to use Squid, the server's POST handling may need to be checked, especially if a size limit is set. |
Send message Joined: 5 Aug 04 Posts: 126 Credit: 24,413,595 RAC: 23,925 |
I think that's under Linux?
Since WAH2, according to the apps page, is Windows-exclusive, I doubt a Linux computer would try to return such work. Still, on the off-chance it's some kind of beta WU, a quick look at geophi's log shows
6/30/2023 1:45:18 PM | climateprediction.net | [http] [ID#21] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2)
Similarly, pututu's log shows
6/30/2023 11:15:43 AM | climateprediction.net | [http] [ID#5725] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.16.20) |
Send message Joined: 7 Aug 04 Posts: 2186 Credit: 64,822,615 RAC: 5,275 |
@Richard The received header from server: 8495740 size is likely the number of bytes the server thinks has been uploaded so far, which is where it is stuck now. In BOINC Manager it is stuck at 8.16 MB of the 128.98 MB upload file (6.33%). The “last bytes transferred” number of 8561276 converts to 8.16 MB, so the client thinks it has transferred more than the server has recorded? Yes, the boinc executable is running in wine on an Ubuntu Linux host. We’ve seen this before, where a single upload or a few uploads get stuck, and rebooting the server or manually killing the server-side process associated with that file will allow them to upload. But I thought we had very different messages in the logs when that occurred in the past. |
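The conversion above can be checked with a few lines of Python. This is only a worked calculation on the three numbers quoted in the post, assuming BOINC Manager's "MB" means 1024 × 1024 bytes (which is what makes 8561276 come out as 8.16). The gap between the client's and the server's counts works out to 65,536 bytes, i.e. exactly 64 KiB.

```python
MIB = 1024 * 1024  # assumption: BOINC Manager's "MB" is 2**20 bytes

server_recorded = 8_495_740   # <file_size> returned by the server
client_recorded = 8_561_276   # <last_bytes_xferred> sent by the client
upload_size_mib = 128.98      # total size shown in BOINC Manager

gap = client_recorded - server_recorded

print(f"server count: {server_recorded / MIB:.2f} MiB")
print(f"client count: {client_recorded / MIB:.2f} MiB")
print(f"gap: {gap} bytes = {gap / 1024:.0f} KiB")
print(f"progress: {client_recorded / MIB / upload_size_mib:.2%}")
```

A gap of exactly one 64 KiB block might hint at a buffer that was sent but never written on the server side, but that is a guess, not something the logs confirm.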
Send message Joined: 7 Aug 04 Posts: 2186 Credit: 64,822,615 RAC: 5,275 |
I made the 2 changes to cc_config and tried the upload again.

7/1/2023 3:20:29 PM | climateprediction.net | Started upload of wah2_eas25_a1hb_199711_25_994_012217357_2_r1373083460_restart.zip
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info: Connection 1 seems to be dead
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info: Closing connection 1
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info: schannel: shutting down SSL/TLS connection with dev.cpdn.org port 443
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info: schannel: ApplyControlToken failure: SEC_E_UNSUPPORTED_FUNCTION (0x80090302)
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info: Trying 141.223.16.156:80...
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info: Connected to upload7.cpdn.org (141.223.16.156) port 80 (#3)
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Host: upload7.cpdn.org
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2)
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept: */*
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept-Encoding: deflate, gzip
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept-Language: en_US
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Content-Length: 318
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Content-Type: application/x-www-form-urlencoded
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server:
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info: We are completely uploaded and fine
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: HTTP/1.1 200 OK
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: Date: Sat, 01 Jul 2023 20:33:53 GMT
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: Server: Apache/2.2.3 (CentOS)
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: Transfer-Encoding: chunked
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: Content-Type: text/plain; charset=UTF-8
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server:
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: 63
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: <data_server_reply>
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: <status>0</status>
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: <file_size>8495740</file_size>
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: </data_server_reply>
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server:
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: 0
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server:
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info: Connection #3 to host upload7.cpdn.org left intact
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Info: Found bundle for host: 0x8cd3a0 [serially]
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Info: Re-using existing connection #3 with host upload7.cpdn.org
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Host: upload7.cpdn.org
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2)
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept: */*
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept-Encoding: deflate, gzip
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept-Language: en_US
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Content-Length: 126698478
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Content-Type: application/x-www-form-urlencoded
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Expect: 100-continue
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server:
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Received header from server: HTTP/1.1 100 Continue
7/1/2023 3:25:37 PM | climateprediction.net | [http] [ID#5] Info: Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
7/1/2023 3:25:37 PM | climateprediction.net | [http] [ID#5] Info: Closing connection 3
7/1/2023 3:25:37 PM | climateprediction.net | [http] HTTP error: Timeout was reached
7/1/2023 3:25:37 PM | climateprediction.net | Temporarily failed upload of wah2_eas25_a1hb_199711_25_994_012217357_2_r1373083460_restart.zip: transient HTTP error
7/1/2023 3:25:37 PM | climateprediction.net | Backing off 04:56:14 on upload of wah2_eas25_a1hb_199711_25_994_012217357_2_r1373083460_restart.zip |
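To put a number on the stall in the log above, this short Python sketch measures the time between the "100 Continue" line and the "Operation too slow" line, using the two timestamps exactly as they appear. The helper name `elapsed_seconds` is made up for this example; the timestamp format string is an assumption based on how this particular log is written.

```python
from datetime import datetime

# Assumed layout of the timestamps in this log, e.g. "7/1/2023 3:20:31 PM".
FMT = "%m/%d/%Y %I:%M:%S %p"


def elapsed_seconds(start, end):
    """Seconds between two timestamps written in the log's own format."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


continue_received = "7/1/2023 3:20:31 PM"   # Received header: HTTP/1.1 100 Continue
timeout_reported = "7/1/2023 3:25:37 PM"    # Info: Operation too slow ...

stall = elapsed_seconds(continue_received, timeout_reported)
print(f"stalled for {stall:.0f} seconds after 100 Continue")
```

The result is a little over the 300-second window named in the "Operation too slow" message, which fits the idea that nothing at all moved after the 100 Continue arrived.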
Send message Joined: 22 Feb 11 Posts: 32 Credit: 226,546 RAC: 4,080 |
Did you reload the config file with "read local prefs file" in the "Options" drop-down? |
Send message Joined: 7 Aug 04 Posts: 2186 Credit: 64,822,615 RAC: 5,275 |
Did you reload the config file with "read local prefs file" in the "Options" drop-down?
Yep. |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,989,107 RAC: 21,788 |
Just for information, I noted the other day that the restart.zip comes after the 12th zip, not at the end of the task as is more often the case. |
Send message Joined: 9 Mar 22 Posts: 30 Credit: 1,065,239 RAC: 556 |
The problem is this: 7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Received header from server: HTTP/1.1 100 Continue You may check if the server is configured to add '\r\n\r\n' (a blank line) at the end of that header. If not, the client waits for it until the timeout is over. geophi wrote: ... the boinc executable is running in wine ... Don't know if this modifies the network packets, e.g. removes the expected blank line. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,696,681 RAC: 10,226 |
If not, the client waits for it until the timeout is over.
Why? That sounds like a bug - is it a client bug or a server bug? |
Send message Joined: 9 Mar 22 Posts: 30 Credit: 1,065,239 RAC: 556 |
If not, the client waits for it until the timeout is over.
Basically (in short) because a blank line indicates a "transfer complete" in HTTP. In addition, "100-continue" was added to HTTP 1.1 after the initial spec. Some more information can be found here, including a link to the relevant RFC: https://daniel.haxx.se/blog/2020/02/27/expect-tweaks-in-curl/
That sounds like a bug - is it a client bug or a server bug?
I would start at the server to ensure it sends a blank line. You may notice blank lines in other parts of the logs (from Google but even from the CPDN server). |
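The blank-line rule can be illustrated with a small Python sketch. This is not BOINC or libcurl code, just a toy parser showing the HTTP/1.1 framing rule that a header block, including the one carried by an interim "100 Continue" response, ends with an empty line (CRLF CRLF). The function name `parse_interim_response` is hypothetical.

```python
HEADER_END = b"\r\n\r\n"  # an empty line terminates an HTTP/1.1 header block


def parse_interim_response(buffer):
    """Return (status_code, leftover_bytes) once a full header block is present.

    Returns None while the terminating blank line has not arrived, which is
    the situation in which a reader would simply keep waiting.
    """
    end = buffer.find(HEADER_END)
    if end == -1:
        return None
    head = buffer[:end].decode("iso-8859-1")
    status_line = head.split("\r\n", 1)[0]
    _version, code, *_reason = status_line.split(" ", 2)
    return int(code), buffer[end + len(HEADER_END):]


complete = b"HTTP/1.1 100 Continue\r\n\r\n"
truncated = b"HTTP/1.1 100 Continue\r\n"

print(parse_interim_response(complete))
print(parse_interim_response(truncated))
```

With the terminator present the parser reports status 100 and the sender may proceed with the body; without it the parser has nothing to report. Whether that is what actually happens between the CPDN server and the BOINC client is still unconfirmed.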
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,696,681 RAC: 10,226 |
Thanks. You learn something new every day. Edit - this might be related to https://github.com/BOINC/boinc/issues/4572. As the reporter notes, the 'resolution' cited doesn't fix that issue - it relates to a real, but different, issue. The common factor appears to be the attempted restart of a large upload, which has been interrupted by a network glitch. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
Drat, just lost a 24% done WAH. Rebooting computers seems to upset them, another bug to be fixed, but I guess less important just now. https://www.cpdn.org/result.php?resultid=22326503 Seems someone referring to themselves as Mr Anonymous is trying it now. |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,989,107 RAC: 21,788 |
Rebooting computers seems to upset them,
I find that suspending computation, waiting two minutes before closing down BOINC, and again waiting 2 minutes before rebooting reduces the percentage of failures from this cause on restarting. Also, the Windows tasks seem to be less prone to it than the Met Office Linux tasks. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
I find suspending computation, waiting two minutes before closing down BOINC and again waiting 2 minutes before rebooting reduces the percentage of failures from this cause on restarting. Also the Windows tasks seem to be less prone to it than the met office Linux tasks.
Yes, I've only lost one of about 30 in a week. They used to be a lot more fussy. When you say suspending computation, what about the option to leave them in memory? Would I have to turn it off so tasks have a chance to shut down before I close BOINC? I like to leave the option on, since when BOINC switches between apps, it isn't stopping the CPDN tasks completely, so they don't mind. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
These 3 WUs started at the same time and they're all at 36-38% progress. They're all running on the same Win7 i5-4690K quad-core CPU with nothing else running. I've restarted BOINC twice, and the first was to upgrade to 7.22.2. Set to only allow a single file transfer. Might there be a clue in one working properly and two not, or is it just random?

wah2_eas25_a21o_200211_25_994_012218090_0 https://www.cpdn.org/result.php?resultid=22321358
wah2_eas25_a21o_200211_25_994_012218090_0_r98403190_1.zip 1.136 121210.47 K 00:18:38 - 197:37:54 85.08 KBps Uploading
wah2_eas25_a21o_200211_25_994_012218090_0_r98403190_5.zip 90.622 121400.95 K 00:31:46 - 105:43:03 0.00 KBps Upload pending (Retry in: 03:05:03), retried: 62
slot 4: Transferred _9.zip today with _1.zip & _5.zip still hung after a BOINC restart.

wah2_eas25_a23h_200211_25_994_012218155_0 https://www.cpdn.org/result.php?resultid=22321423
slot 6: This WU has transferred 9 zips as of this morning with none hanging.

wah2_eas25_a342_201111_25_994_012219472_0 https://www.cpdn.org/result.php?resultid=22322764
wah2_eas25_a342_201111_25_994_012219472_0_r523039799_4.zip 0.000 120899.03 K 00:24:22 - 152:14:17 0.00 KBps Upload pending (Retry in: 02:55:04), retried: 55
slot 5: Transferred _9.zip yesterday with _4.zip still hung after a BOINC restart.

02-Jul-2023 08:54:14 [climateprediction.net] Started upload of wah2_eas25_a342_201111_25_994_012219472_0_r523039799_4.zip
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: Trying 141.223.16.156:80...
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: Connected to upload7.cpdn.org (141.223.16.156) port 80 (#31)
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Host: upload7.cpdn.org
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2)
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept: */*
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept-Encoding: deflate, gzip
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept-Language: en_US
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Content-Length: 311
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Content-Type: application/x-www-form-urlencoded
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server:
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: We are completely uploaded and fine
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: HTTP/1.1 200 OK
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: Date: Sun, 02 Jul 2023 16:04:16 GMT
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: Server: Apache/2.2.3 (CentOS)
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: Transfer-Encoding: chunked
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: Content-Type: text/plain; charset=UTF-8
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server:
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: 64
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: <data_server_reply>
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: <status>0</status>
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: <file_size>87031808</file_size>
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: </data_server_reply>
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server:
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: Connection #31 to host upload7.cpdn.org left intact
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: Found bundle for host: 0x32d7a70 [serially]
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: Re-using existing connection #31 with host upload7.cpdn.org
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Host: upload7.cpdn.org
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2)
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept: */*
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept-Encoding: deflate, gzip
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept-Language: en_US
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Content-Length: 36769291
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Content-Type: application/x-www-form-urlencoded
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Expect: 100-continue
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server:
02-Jul-2023 08:54:16 [climateprediction.net] [http] [ID#122] Received header from server: HTTP/1.1 100 Continue
02-Jul-2023 08:54:36 [climateprediction.net] [http] [ID#122] Info: Recv failure: Connection was reset
02-Jul-2023 08:54:36 [climateprediction.net] [http] [ID#122] Info: Closing connection 31
02-Jul-2023 08:54:36 [climateprediction.net] [http] HTTP error: Failure when receiving data from the peer
02-Jul-2023 08:54:36 [climateprediction.net] Temporarily failed upload of wah2_eas25_a342_201111_25_994_012219472_0_r523039799_4.zip: transient HTTP error
02-Jul-2023 08:54:36 [climateprediction.net] Backing off 05:49:49 on upload of wah2_eas25_a342_201111_25_994_012219472_0_r523039799_4.zip
02-Jul-2023 08:54:36 [climateprediction.net] Started upload of wah2_eas25_a21o_200211_25_994_012218090_0_r98403190_5.zip
02-Jul-2023 08:54:37 [climateprediction.net] [http] [ID#124] Info: Hostname upload7.cpdn.org was found in DNS cache
02-Jul-2023 08:54:37 [climateprediction.net] [http] [ID#124] Info: Trying 141.223.16.156:80...
02-Jul-2023 08:54:58 [climateprediction.net] [http] [ID#124] Info: connect to 141.223.16.156 port 80 failed: Timed out
02-Jul-2023 08:54:58 [climateprediction.net] [http] [ID#124] Info: Failed to connect to upload7.cpdn.org port 80 after 21303 ms: Couldn't connect to server
02-Jul-2023 08:54:58 [climateprediction.net] [http] [ID#124] Info: Closing connection 32
02-Jul-2023 08:54:58 [climateprediction.net] [http] HTTP error: Timeout was reached
02-Jul-2023 08:54:58 [climateprediction.net] Temporarily failed upload of wah2_eas25_a21o_200211_25_994_012218090_0_r98403190_5.zip: transient HTTP error
02-Jul-2023 08:54:58 [climateprediction.net] Backing off 05:18:52 on upload of wah2_eas25_a21o_200211_25_994_012218090_0_r98403190_5.zip |
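As a sanity check on the numbers for the stuck _4.zip, here is a short Python calculation using only figures from the transfer list and log above. It assumes the "K" in the transfer list means 1024 bytes; the variable names are made up for the example.

```python
server_has = 87_031_808       # <file_size> in the server's reply
content_length = 36_769_291   # Content-Length of the follow-up POST
listed_size_k = 120_899.03    # size of _4.zip in the transfer list, in K

file_bytes = listed_size_k * 1024          # assumption: 1 K = 1024 bytes
total_claimed = server_has + content_length

print(f"file size from list:         {file_bytes:,.0f} bytes")
print(f"server_has + Content-Length: {total_claimed:,} bytes")
print(f"difference:                  {total_claimed - file_bytes:,.0f} bytes")
print(f"already on server:           {server_has / file_bytes:.1%}")
print(f"server count in MiB:         {server_has / (1024 * 1024)}")
```

The two totals agree to within a few hundred bytes, which would be consistent with a small request header travelling ahead of the file data, though that reading is an assumption. The server's count also comes out as exactly 83 MiB, so the server appears to hold about 70% of the file even though the transfer list shows this upload at 0.000 progress.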
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,989,107 RAC: 21,788 |
I think it is just random. Something to do with a large upload being interrupted by a glitch anywhere (server, client, network or a reboot). |
Send message Joined: 15 May 09 Posts: 4535 Credit: 18,989,107 RAC: 21,788 |
When you say suspending computation, what about the option to leave them in memory?
- I have never turned it off so can't comment on whether doing that before shutdown has any effect. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,696,681 RAC: 10,226 |
That would be possible, but it feels like there is more of a pattern to the logs we've seen so far than pure randomness. They've all failed at exactly the same point: after receiving the "HTTP/1.1 100 Continue" header from the server - the one that computezrmle thinks may be missing a following blank line. Aurum's is slightly different: he has a second upload waiting, and it's retried immediately after the client times out the first transfer. For the second file, the client fails even to establish the initial connection - the only time we've been shown that. Speculation: after the "HTTP/1.1 100 Continue", the client and server enter a state of deadlock, with each waiting for the other to speak first. The client is waiting for the next line of the header: the server thinks it's sent that last line, and is waiting for the data flow to start. The client blinks first, and abandons the first transfer. But (speculatively), the server has a longer timeout, and is holding the connection open for the elusive data - it doesn't expect a new connection, so the attempt is treated as the data it's still waiting for. That sort of problem should be detectable in the server logs, if anyone is prepared to investigate them? |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
Ok, I won't then. It's just that over at LHC it was recommended, so the task saves its state. But that's using VirtualBox, which adds many more complications.
When you say suspending computation, what about the option to leave them in memory?
- I have never turned it off so can't comment on whether doing that before shutdown has any effect. |
©2024 cpdn.org