climateprediction.net (CPDN) home page
Thread 'New work discussion - 2'

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,696,681
RAC: 10,226
Message 69066 - Posted: 1 Jul 2023, 11:40:24 UTC
Last modified: 1 Jul 2023, 12:24:57 UTC

Thanks, both, for the logs. Euch - what a mess! I'll make some general comments first, then pick out details from each.

First, these are both EAS tasks - the server which Dave/Andy say is running now. Let's take that on trust.

Second - they both seem to get muddled by trying a test connection to Google. I suspect that may be caused by a timing problem - BOINC doesn't wait long enough. For testing and recovery purposes, I'd suggest making a change in cc_config.xml (but see * at the end of this post).

Set these values:
<dont_contact_ref_site>1</dont_contact_ref_site>
<max_file_xfers_per_project>1</max_file_xfers_per_project>
to keep things quiet and clean while we're working.
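For anyone unsure where these two options belong, here is a minimal Python sketch that places them inside the `<options>` section of cc_config.xml. The function name and the sample input are made up for illustration; the only things taken from this post are the two option names and their values.

```python
import xml.etree.ElementTree as ET

# The two options suggested above, with the values from this post.
SETTINGS = {
    "dont_contact_ref_site": "1",
    "max_file_xfers_per_project": "1",
}


def apply_settings(xml_text: str, settings: dict = SETTINGS) -> str:
    """Return cc_config.xml text with each setting placed inside <options>."""
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError("expected <cc_config> as the root element")
    options = root.find("options")
    if options is None:
        # No <options> section yet: create one and leave the rest alone.
        options = ET.SubElement(root, "options")
    for name, value in settings.items():
        node = options.find(name)
        if node is None:
            node = ET.SubElement(options, name)
        node.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical starting file; a real one may also contain <log_flags>.
    print(apply_settings("<cc_config><options></options></cc_config>"))
```

Existing sections such as `<log_flags>` are left untouched. After saving the result, the client has to re-read its config files (or be restarted) before the change takes effect.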

From the format, I guess that both of you have collected the log from BOINC Manager. I think, again, that BOINC can sometimes miss entries if it's trying to update the log while processing a big job communicating with a server. The stdoutdae.txt file version of the log can sometimes capture these missing lines.

pututu:
Initial contact is fine
6/30/2023 11:15:43 AM | climateprediction.net | [http] [ID#5725] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
6/30/2023 11:15:43 AM | climateprediction.net | [http] [ID#5725] Received header from server: HTTP/1.1 200 OK
How big is your file?
6/30/2023 11:15:43 AM | climateprediction.net | [http] [ID#5725] Received header from server:     <file_size>41943040</file_size>
That is enough to start work on the meat of the upload, but it goes wrong here:
6/30/2023 11:15:44 AM | climateprediction.net | [http] [ID#5725] Sent header to server: Content-Length: 82090749
6/30/2023 11:15:44 AM | climateprediction.net | [http] [ID#5725] Received header from server: HTTP/1.1 100 Continue
6/30/2023 11:16:04 AM | climateprediction.net | [http] [ID#5725] Info:  Recv failure: Connection was reset
6/30/2023 11:16:04 AM | climateprediction.net | [http] [ID#5725] Info:  Closing connection 810
6/30/2023 11:16:04 AM | climateprediction.net | [http] HTTP error: Failure when receiving data from the peer
6/30/2023 11:16:05 AM |  | Project communication failed: attempting access to reference site
I think a 20 second delay at this point is a Windows default - can you confirm that you're running under Windows? The BOINC default can be changed, but initially is set at 300 seconds - I don't think you can override the Windows one. So dead end - the rest is just the connection check with Google.

geophi:
The setup is again fine, but we have
6/30/2023 1:45:19 PM | climateprediction.net | [http] [ID#21] Received header from server:     <file_size>8495740</file_size>
Is the different size usual?
Then we get
6/30/2023 1:45:20 PM | climateprediction.net | [http] [ID#21] Sent header to server: Content-Length: 126698478
6/30/2023 1:45:20 PM | climateprediction.net | [http] [ID#21] Received header from server: HTTP/1.1 100 Continue
6/30/2023 1:50:27 PM | climateprediction.net | [http] [ID#21] Info:  Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
6/30/2023 1:50:27 PM | climateprediction.net | [http] [ID#21] Info:  Closing connection 39
6/30/2023 1:50:27 PM | climateprediction.net | [http] HTTP error: Timeout was reached
Again a delay, but for 5 minutes - and BOINC has terminated it. I think that's under Linux?
Another attempt at Google, but then we get
6/30/2023 1:50:28 PM |  | [http] [ID#0] Sent header to server: roject_name>
6/30/2023 1:50:28 PM |  | [http] [ID#0] Sent header to server:     <name>wah2_eas25_a1hb_199711_25_994_012217357_2_r1373083460_restart.zip</name>
6/30/2023 1:50:28 PM |  | [http] [ID#0] Sent header to server:         <last_bytes_xferred>8561276.000000</last_bytes_xferred>
That's mad.

It's sent the 'continue' information to Google! BOINC client bug!

But overall, it's probably not the main cause of your problems. Look at the varying file sizes: I think BOINC is trying to resend the remaining fraction of a big file which it has succeeded in sending part of already. I suspect that these partial retries are the main problem - the BOINC client/server combination is having difficulty coping with the separate sections which the server needs to stitch together. That's probably above all our pay grades - we would have to get BOINC Central involved in this, but they're not proving very responsive to bug reports these days. The server team here might have more success than a user report.

* This setting also blocks regular updates of 'all_projects_list.xml'. If you're in the habit of checking for new projects using BOINC Manager, reset this setting when we've done here.
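To compare the file sizes discussed above without reading every log line by eye, a small Python sketch along these lines could pull the two interesting numbers out of an http_debug log, grouped by the `[ID#n]` tag. The regular expressions are based only on the log lines quoted in this thread, and `summarise` is a name invented for this example.

```python
import re
from collections import defaultdict

ID_RE = re.compile(r"\[ID#(\d+)\]")
SIZE_RE = re.compile(r"<file_size>(\d+)</file_size>")
LENGTH_RE = re.compile(r"Sent header to server: Content-Length: (\d+)")


def summarise(log_lines):
    """Per [ID#n]: the size the server reports, and the Content-Length values sent."""
    summary = defaultdict(lambda: {"server_has": None, "content_lengths": []})
    for line in log_lines:
        id_match = ID_RE.search(line)
        if not id_match:
            continue  # lines without an [ID#n] tag are ignored
        entry = summary[int(id_match.group(1))]
        size = SIZE_RE.search(line)
        if size:
            entry["server_has"] = int(size.group(1))
        length = LENGTH_RE.search(line)
        if length:
            entry["content_lengths"].append(int(length.group(1)))
    return dict(summary)


# Lines copied from the two logs quoted above.
SAMPLE = [
    "6/30/2023 11:15:43 AM | climateprediction.net | [http] [ID#5725] Received header from server:     <file_size>41943040</file_size>",
    "6/30/2023 11:15:44 AM | climateprediction.net | [http] [ID#5725] Sent header to server: Content-Length: 82090749",
    "6/30/2023 1:45:19 PM | climateprediction.net | [http] [ID#21] Received header from server:     <file_size>8495740</file_size>",
    "6/30/2023 1:45:20 PM | climateprediction.net | [http] [ID#21] Sent header to server: Content-Length: 126698478",
]

if __name__ == "__main__":
    for conn_id, info in summarise(SAMPLE).items():
        print(conn_id, info)
```

One detail this makes easy to spot: 41943040 bytes is exactly 40 x 1024 x 1024. Whether that round number means anything is not something these logs can settle.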
ID: 69066
computezrmle
Joined: 9 Mar 22
Posts: 30
Credit: 1,065,239
RAC: 556
Message 69067 - Posted: 1 Jul 2023, 12:32:14 UTC - in response to Message 69066.  

May I ask whether the clients are connected via a Squid Proxy?
If so this may explain the following HTTP header:
... [http] [ID#5725] Received header from server: HTTP/1.1 100 Continue


On the client side this can be solved by adding these lines to squid.conf:
# may be a workaround for POST issues
client_request_buffer_max_size 512 MB

Then reload the configuration, e.g. with:
sudo squid -k reconfigure

On Windows open the Squid console as Administrator and run:
squid -k reconfigure


If the client is not configured to use a Squid proxy, the server's POST handling may need to be checked, especially if a size limit is set.
ID: 69067
Ingleside
Joined: 5 Aug 04
Posts: 126
Credit: 24,413,595
RAC: 23,925
Message 69068 - Posted: 1 Jul 2023, 13:37:02 UTC - in response to Message 69066.  

I think that's under Linux?
Since WAH2, according to the apps page, is Windows-exclusive, I doubt a Linux computer would try to return such work.
Still, on the off-chance it's some kind of beta WU, a quick look at geophi's log shows
6/30/2023 1:45:18 PM | climateprediction.net | [http] [ID#21] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2)

Similarly, pututu's log shows
6/30/2023 11:15:43 AM | climateprediction.net | [http] [ID#5725] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.16.20)
ID: 69068
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2186
Credit: 64,822,615
RAC: 5,275
Message 69069 - Posted: 1 Jul 2023, 20:13:53 UTC - in response to Message 69066.  

@Richard

The received header from server: 8495740 size is likely the number of bytes the server thinks is uploaded so far, where it is stuck at now.
In BOINC Manager it is stuck at 8.16 MB of the 128.98 MB upload file (6.33%)

The “last bytes transferred” number of 8561276 converts to 8.16 MB so the client thinks it has transferred more than the server has recorded??
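The arithmetic behind that comparison can be checked in a few lines of Python. This assumes BOINC Manager's "MB" means 1024 x 1024 bytes, which is at least consistent with the 8.16 MB figure quoted above.

```python
MIB = 1024 * 1024

server_has = 8495740   # <file_size> reported by the server
client_sent = 8561276  # <last_bytes_xferred> reported by the client

print(f"server: {server_has / MIB:.2f} MB")        # server: 8.10 MB
print(f"client: {client_sent / MIB:.2f} MB")       # client: 8.16 MB
print(f"gap:    {client_sent - server_has} bytes")  # gap:    65536 bytes
```

The gap is exactly 64 KiB. That might correspond to one buffer-sized chunk the client sent and the server never recorded, but that is a guess, not something the logs show.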

Yes, the boinc executable is running in wine on an Ubuntu linux host.

We’ve seen this before where a single or a few uploads get stuck and rebooting the server, or manually killing the server side process associated with that file will allow them to upload. But I thought we had very different messages in the logs when that occurred in the past.
ID: 69069
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2186
Credit: 64,822,615
RAC: 5,275
Message 69070 - Posted: 1 Jul 2023, 22:36:59 UTC - in response to Message 69066.  

I made the 2 changes to cc_config and tried the upload again.

7/1/2023 3:20:29 PM | climateprediction.net | Started upload of wah2_eas25_a1hb_199711_25_994_012217357_2_r1373083460_restart.zip
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info:  Connection 1 seems to be dead
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info:  Closing connection 1
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info:  schannel: shutting down SSL/TLS connection with dev.cpdn.org port 443
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info:  schannel: ApplyControlToken failure: SEC_E_UNSUPPORTED_FUNCTION (0x80090302)
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info:    Trying 141.223.16.156:80...
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info:  Connected to upload7.cpdn.org (141.223.16.156) port 80 (#3)
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Host: upload7.cpdn.org
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2)
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept: */*
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept-Encoding: deflate, gzip
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept-Language: en_US
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Content-Length: 318
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server: Content-Type: application/x-www-form-urlencoded
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Sent header to server:
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info:  We are completely uploaded and fine
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: HTTP/1.1 200 OK
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: Date: Sat, 01 Jul 2023 20:33:53 GMT
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: Server: Apache/2.2.3 (CentOS)
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: Transfer-Encoding: chunked
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: Content-Type: text/plain; charset=UTF-8
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server:
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: 63
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: <data_server_reply>
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server:     <status>0</status>
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server:     <file_size>8495740</file_size>
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: </data_server_reply>
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server:
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server: 0
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Received header from server:
7/1/2023 3:20:30 PM | climateprediction.net | [http] [ID#5] Info:  Connection #3 to host upload7.cpdn.org left intact
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Info:  Found bundle for host: 0x8cd3a0 [serially]
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Info:  Re-using existing connection #3 with host upload7.cpdn.org
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Host: upload7.cpdn.org
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2)
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept: */*
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept-Encoding: deflate, gzip
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Accept-Language: en_US
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Content-Length: 126698478
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Content-Type: application/x-www-form-urlencoded
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server: Expect: 100-continue
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Sent header to server:
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Received header from server: HTTP/1.1 100 Continue
7/1/2023 3:25:37 PM | climateprediction.net | [http] [ID#5] Info:  Operation too slow. Less than 10 bytes/sec transferred the last 300 seconds
7/1/2023 3:25:37 PM | climateprediction.net | [http] [ID#5] Info:  Closing connection 3
7/1/2023 3:25:37 PM | climateprediction.net | [http] HTTP error: Timeout was reached
7/1/2023 3:25:37 PM | climateprediction.net | Temporarily failed upload of wah2_eas25_a1hb_199711_25_994_012217357_2_r1373083460_restart.zip: transient HTTP error
7/1/2023 3:25:37 PM | climateprediction.net | Backing off 04:56:14 on upload of wah2_eas25_a1hb_199711_25_994_012217357_2_r1373083460_restart.zip
ID: 69070
kotenok2000
Joined: 22 Feb 11
Posts: 32
Credit: 226,546
RAC: 4,080
Message 69071 - Posted: 1 Jul 2023, 23:41:55 UTC - in response to Message 69070.  

Did you reload the config file with "read local prefs file" in the "Options" drop-down?
ID: 69071
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2186
Credit: 64,822,615
RAC: 5,275
Message 69072 - Posted: 2 Jul 2023, 0:19:46 UTC - in response to Message 69071.  

Did you reload the config file with "read local prefs file" in the "Options" drop-down?

Yep.
ID: 69072
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,989,107
RAC: 21,788
Message 69073 - Posted: 2 Jul 2023, 6:57:25 UTC

Just for information, I noted the other day that the restart.zip comes after the 12th zip, not at the end of the task as is more often the case.
ID: 69073
computezrmle
Joined: 9 Mar 22
Posts: 30
Credit: 1,065,239
RAC: 556
Message 69074 - Posted: 2 Jul 2023, 7:57:04 UTC

The problem is this:
7/1/2023 3:20:31 PM | climateprediction.net | [http] [ID#5] Received header from server: HTTP/1.1 100 Continue

You may check if the server is configured to add '\r\n\r\n' (a blank line) at the end of that header.
If not, the client waits for it until the timeout is over.
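As an illustration of the blank-line rule, here is a small Python sketch of how a reader of the raw byte stream could decide whether the interim response is complete. It is not the code BOINC or its HTTP library uses, and whether the client really waits in this way is not established in this thread; `split_interim_response` is a hypothetical helper.

```python
def split_interim_response(raw: bytes):
    """Split a complete interim (1xx) response off the front of raw bytes.

    Returns (status_line, remainder) once the terminating blank line has
    arrived, or None while the header block is still incomplete.
    """
    head, sep, remainder = raw.partition(b"\r\n\r\n")
    if not sep:
        return None  # no blank line yet: a strict reader would keep waiting
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    return status_line, remainder


if __name__ == "__main__":
    # Status line only, no blank line: treated as incomplete.
    print(split_interim_response(b"HTTP/1.1 100 Continue\r\n"))
    # Status line followed by the blank line: complete.
    print(split_interim_response(b"HTTP/1.1 100 Continue\r\n\r\n"))
```

The first call returns None and the second returns the status line with an empty remainder, which is the difference being suggested here as something to check on the server side.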



geophi wrote:
... the boinc executable is running in wine ...

Don't know if this modifies the network packets, e.g. removes the expected blank line.
ID: 69074
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,696,681
RAC: 10,226
Message 69075 - Posted: 2 Jul 2023, 8:53:35 UTC - in response to Message 69074.  

If not, the client waits for it until the timeout is over.
Why?

That sounds like a bug - is it a client bug or a server bug?
ID: 69075
computezrmle
Joined: 9 Mar 22
Posts: 30
Credit: 1,065,239
RAC: 556
Message 69076 - Posted: 2 Jul 2023, 9:45:14 UTC - in response to Message 69075.  

if not, the client waits for it until the timeout is over.


Why?

Basically (in short) because a blank line indicates a "transfer complete" in HTTP.


In addition, "100-continue" was added to HTTP 1.1 after the initial spec.

Some more information can be found here including a link to the relevant RFC:
https://daniel.haxx.se/blog/2020/02/27/expect-tweaks-in-curl/

That sounds like a bug - is it a client bug or a server bug?

I would start at the server to ensure it sends a blank line.
You may notice blank lines in other parts of the logs (from google but even from the CPDN server).
ID: 69076
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,696,681
RAC: 10,226
Message 69077 - Posted: 2 Jul 2023, 10:02:39 UTC - in response to Message 69076.  
Last modified: 2 Jul 2023, 10:39:30 UTC

Thanks. You learn something new every day.

Edit - this might be related to https://github.com/BOINC/boinc/issues/4572. As the reporter notes, the 'resolution' cited doesn't fix that issue - it relates to a real, but different, issue.

The common factor appears to be the attempted restart of a large upload, which has been interrupted by a network glitch.
ID: 69077
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69078 - Posted: 2 Jul 2023, 13:16:24 UTC
Last modified: 2 Jul 2023, 13:17:09 UTC

Drat, just lost a 24% done WAH. Rebooting computers seems to upset them, another bug to be fixed, but I guess less important just now.

https://www.cpdn.org/result.php?resultid=22326503

Seems someone referring to themselves as Mr Anonymous is trying it now.
ID: 69078
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,989,107
RAC: 21,788
Message 69079 - Posted: 2 Jul 2023, 14:15:26 UTC - in response to Message 69078.  

Rebooting computers seems to upset them,
I find that suspending computation, waiting two minutes before closing down BOINC and again waiting 2 minutes before rebooting reduces the percentage of failures from this cause on restarting. Also the Windows tasks seem to be less prone to it than the met office Linux tasks.
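For anyone who would rather script that sequence than click through it, a rough Python sketch using the `boinccmd` command-line tool might look like this. It assumes `boinccmd` is on the PATH and can authenticate to the local client (for example by being run from the BOINC data directory); the function names are invented for the example, and it runs as a dry run unless told otherwise.

```python
import subprocess
import time

WAIT_SECONDS = 120  # the two-minute pauses described above


def shutdown_plan(boinccmd: str = "boinccmd"):
    """Ordered steps: suspend computation, wait, quit the client, wait."""
    return [
        ("run", [boinccmd, "--set_run_mode", "never"]),
        ("sleep", WAIT_SECONDS),
        ("run", [boinccmd, "--quit"]),
        ("sleep", WAIT_SECONDS),
    ]


def execute(plan, dry_run: bool = True):
    """Carry out the plan, or only print it when dry_run is True."""
    for kind, arg in plan:
        if kind == "run":
            print("would run:" if dry_run else "running:", " ".join(arg))
            if not dry_run:
                subprocess.run(arg, check=True)
        else:
            print(f"wait {arg} s")
            if not dry_run:
                time.sleep(arg)


if __name__ == "__main__":
    execute(shutdown_plan())  # dry run; reboot manually afterwards
```

Note that `--set_run_mode never` without a duration stays in force, so after the reboot computation remains suspended until it is switched back, for example with `boinccmd --set_run_mode auto`.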
ID: 69079
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69080 - Posted: 2 Jul 2023, 14:55:14 UTC - in response to Message 69079.  
Last modified: 2 Jul 2023, 14:56:08 UTC

I find suspending computation, waiting two minutes before closing down BOINC and again waiting 2 minutes before rebooting reduces the percentage of failures from this cause on restarting. Also the Windows tasks seem to be less prone to it than the met office Linux tasks.
Yes I've only lost one of about 30 in a week. They used to be a lot more fussy.

When you say suspending computation, what about the option to leave them in memory? Would I have to turn it off so tasks have a chance to shut down before I close Boinc? I like to leave the option on, since when Boinc switches between apps, it isn't stopping the CPDN tasks completely, so they don't mind.
ID: 69080
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 69081 - Posted: 2 Jul 2023, 16:01:38 UTC

These 3 WUs started at the same time and they're all at 36-38% progress. They're all running on the same Win7 i5-4690K quad-core CPU with nothing else running. I've restarted BOINC twice and the first was to upgrade to 7.22.2. Set to only allow a single file transfer. Might there be a clue in one working properly and two not, or is it just random?

wah2_eas25_a21o_200211_25_994_012218090_0
https://www.cpdn.org/result.php?resultid=22321358
wah2_eas25_a21o_200211_25_994_012218090_0_r98403190_1.zip 1.136 121210.47 K 00:18:38 - 197:37:54 85.08 KBps Uploading
wah2_eas25_a21o_200211_25_994_012218090_0_r98403190_5.zip 90.622 121400.95 K 00:31:46 - 105:43:03 0.00 KBps Upload pending (Retry in: 03:05:03), retried: 62
slot 4: Transferred _9.zip today with _1.zip & _5.zip still hung after a BOINC restart.

wah2_eas25_a23h_200211_25_994_012218155_0
https://www.cpdn.org/result.php?resultid=22321423
slot 6: This WU has transferred 9 zips as of this morning with none hanging.

wah2_eas25_a342_201111_25_994_012219472_0
https://www.cpdn.org/result.php?resultid=22322764
wah2_eas25_a342_201111_25_994_012219472_0_r523039799_4.zip 0.000 120899.03 K 00:24:22 - 152:14:17 0.00 KBps Upload pending (Retry in: 02:55:04), retried: 55
slot 5: Transferred _9.zip yesterday with _4.zip still hung after a BOINC restart.

02-Jul-2023 08:54:14 [climateprediction.net] Started upload of wah2_eas25_a342_201111_25_994_012219472_0_r523039799_4.zip
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: Trying 141.223.16.156:80...
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: Connected to upload7.cpdn.org (141.223.16.156) port 80 (#31)
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Host: upload7.cpdn.org
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2)
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept: */*
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept-Encoding: deflate, gzip
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept-Language: en_US
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Content-Length: 311
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Content-Type: application/x-www-form-urlencoded
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server:
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: We are completely uploaded and fine
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: HTTP/1.1 200 OK
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: Date: Sun, 02 Jul 2023 16:04:16 GMT
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: Server: Apache/2.2.3 (CentOS)
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: Transfer-Encoding: chunked
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: Content-Type: text/plain; charset=UTF-8
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server:
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: 64
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: <data_server_reply>
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: <status>0</status>
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: <file_size>87031808</file_size>
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server: </data_server_reply>
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Received header from server:
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: Connection #31 to host upload7.cpdn.org left intact
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: Found bundle for host: 0x32d7a70 [serially]
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Info: Re-using existing connection #31 with host upload7.cpdn.org
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Host: upload7.cpdn.org
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.22.2)
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept: */*
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept-Encoding: deflate, gzip
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Accept-Language: en_US
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Content-Length: 36769291
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Content-Type: application/x-www-form-urlencoded
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server: Expect: 100-continue
02-Jul-2023 08:54:15 [climateprediction.net] [http] [ID#122] Sent header to server:
02-Jul-2023 08:54:16 [climateprediction.net] [http] [ID#122] Received header from server: HTTP/1.1 100 Continue
02-Jul-2023 08:54:36 [climateprediction.net] [http] [ID#122] Info: Recv failure: Connection was reset
02-Jul-2023 08:54:36 [climateprediction.net] [http] [ID#122] Info: Closing connection 31
02-Jul-2023 08:54:36 [climateprediction.net] [http] HTTP error: Failure when receiving data from the peer
02-Jul-2023 08:54:36 [climateprediction.net] Temporarily failed upload of wah2_eas25_a342_201111_25_994_012219472_0_r523039799_4.zip: transient HTTP error
02-Jul-2023 08:54:36 [climateprediction.net] Backing off 05:49:49 on upload of wah2_eas25_a342_201111_25_994_012219472_0_r523039799_4.zip
02-Jul-2023 08:54:36 [climateprediction.net] Started upload of wah2_eas25_a21o_200211_25_994_012218090_0_r98403190_5.zip
02-Jul-2023 08:54:37 [climateprediction.net] [http] [ID#124] Info: Hostname upload7.cpdn.org was found in DNS cache
02-Jul-2023 08:54:37 [climateprediction.net] [http] [ID#124] Info: Trying 141.223.16.156:80...
02-Jul-2023 08:54:58 [climateprediction.net] [http] [ID#124] Info: connect to 141.223.16.156 port 80 failed: Timed out
02-Jul-2023 08:54:58 [climateprediction.net] [http] [ID#124] Info: Failed to connect to upload7.cpdn.org port 80 after 21303 ms: Couldn't connect to server
02-Jul-2023 08:54:58 [climateprediction.net] [http] [ID#124] Info: Closing connection 32
02-Jul-2023 08:54:58 [climateprediction.net] [http] HTTP error: Timeout was reached
02-Jul-2023 08:54:58 [climateprediction.net] Temporarily failed upload of wah2_eas25_a21o_200211_25_994_012218090_0_r98403190_5.zip: transient HTTP error
02-Jul-2023 08:54:58 [climateprediction.net] Backing off 05:18:52 on upload of wah2_eas25_a21o_200211_25_994_012218090_0_r98403190_5.zip
ID: 69081
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,989,107
RAC: 21,788
Message 69082 - Posted: 2 Jul 2023, 16:38:16 UTC - in response to Message 69081.  

I think it is just random: something to do with a large upload being interrupted by a glitch anywhere (server, client, network or a reboot).
ID: 69082
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4535
Credit: 18,989,107
RAC: 21,788
Message 69083 - Posted: 2 Jul 2023, 18:03:28 UTC - in response to Message 69080.  

When you say suspending computation, what about the option to leave them in memory?
- I have never turned it off so can't comment on whether doing that before shutdown has any effect.
ID: 69083
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,696,681
RAC: 10,226
Message 69084 - Posted: 2 Jul 2023, 18:03:28 UTC

That would be possible, but it feels like there is more of a pattern to the logs we've seen so far than pure randomness.

They've all failed at exactly the same point: after receiving the "HTTP/1.1 100 Continue" header from the server - the one that computezrmle thinks may be missing a following blank line.

Aurum's is slightly different: he has a second upload waiting, and it's retried immediately after the client times out the first transfer. For the second file, the client fails even to establish the initial connection - the only time we've been shown that.

Speculation: after the "HTTP/1.1 100 Continue", the client and server enter a state of deadlock, with each waiting for the other to speak first. The client is waiting for the next line of the header: the server thinks it's sent that last line, and is waiting for the data flow to start.

The client blinks first, and abandons the first transfer. But (speculatively), the server has a longer timeout, and is holding the connection open for the elusive data - it doesn't expect a new connection, so the attempt is treated as the data it's still waiting for.

That sort of problem should be detectable in the server logs, if anyone is prepared to investigate them?
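The first half of that speculation (a client waiting for a line the server believes it has already sent) can be imitated with a toy model in Python. This uses a local socket pair, not the real client or server, and a fraction of a second in place of the 300-second timeout seen in the logs. The second half, where a new connection is mistaken for the awaited data, is not modelled. All names here are invented for the sketch.

```python
import socket


def wait_for_header_end(sock: socket.socket, timeout: float) -> str:
    """Read until the blank line ending a header block, or give up on silence."""
    sock.settimeout(timeout)
    received = b""
    try:
        while b"\r\n\r\n" not in received:
            chunk = sock.recv(4096)
            if not chunk:
                return "peer closed"
            received += chunk
    except socket.timeout:
        return "client timed out"
    return "header complete"


def demo(server_reply: bytes, timeout: float = 0.2) -> str:
    """The 'server' sends its reply and then stays silent, waiting for data."""
    client, server = socket.socketpair()
    try:
        server.sendall(server_reply)
        return wait_for_header_end(client, timeout)
    finally:
        client.close()
        server.close()


if __name__ == "__main__":
    print(demo(b"HTTP/1.1 100 Continue\r\n"))      # client timed out
    print(demo(b"HTTP/1.1 100 Continue\r\n\r\n"))  # header complete
```

In the first case neither side ever speaks again and the client is the one to give up, which matches the shape of the failures in the posted logs without proving that this is their cause.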
ID: 69084
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69085 - Posted: 2 Jul 2023, 18:21:14 UTC - in response to Message 69083.  

When you say suspending computation, what about the option to leave them in memory?
- I have never turned it off so can't comment on whether doing that before shutdown has any effect.
OK, I won't then. It's just that over at LHC it was recommended, so the task saves its state. But that's using VirtualBox, which adds many more complications.
ID: 69085