Thread 'Persistent upload problems'

Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47364 - Posted: 20 Oct 2013, 22:10:18 UTC
Last modified: 22 Nov 2013, 21:56:33 UTC

This thread is for users MarkJ & Coku.

Only solutions for these two people should be posted here.
Everyone else should post about their problems in another existing thread, or create their own new thread.


Edit: For reference, the earlier part of this problem is in the thread "Uploading issues - RAPIT tasks".
ID: 47364 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47365 - Posted: 20 Oct 2013, 22:26:36 UTC

I've started a new thread, as the conversation was getting cluttered in others.

So, a more careful reading of the listings, plus some vague memories, turns up something: proxies and Squid.

First, I'd like to ask for some more info from both of you:

1) Are your affected computers at home, or in an "office" environment?
2) Does the connection between the computer and "the phone line" involve a router?
3) Do you normally use a proxy server?
4) Both of you are in Australia, which is suspicious, so: What is the name of your ISP?

No doubt more questions to come over an extended period. I don't think very fast these days.

For the record, my ISP is BigPond. The 2 models on one of my computers finished yesterday and everything uploaded OK.
The 4 on my other computer finish in about 12 hours.


ID: 47365 · Report as offensive     Reply Quote
alvin

Send message
Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47367 - Posted: 21 Oct 2013, 3:39:25 UTC - in response to Message 47365.  
Last modified: 21 Oct 2013, 3:42:50 UTC

I've started a new thread, as the conversation was getting cluttered in others.

So, a more careful reading of the listings, plus some vague memories, turns up something: proxies and Squid.

First, I'd like to ask for some more info from both of you:

1) Are your affected computers at home, or in an "office" environment?
2) Does the connection between the computer and "the phone line" involve a router?
3) Do you normally use a proxy server?
4) Both of you are in Australia, which is suspicious, so: What is the name of your ISP?

No doubt more questions to come over an extended period. I don't think very fast these days.

For the record, my ISP is BigPond. The 2 models on one of my computers finished yesterday and everything uploaded OK.
The 4 on my other computer finish in about 12 hours.



1. Office. I'll try from a different location when possible.
2. Yes, routers, but no major changes for a while. Also, 3-5 other projects are fine via the same proxy and routers.
3. Yes, a proxy, but no major changes for a while.
4. iiNET. It was Netspace until iiNET recently purchased it.

Also, the upload stalls at 100%: all the data is sent and then it hangs, showing something like 51.2M/51.2M.
Some of the computers recently reported strange figures at 100%, like 51.3/51.2 or 51.1/51.2, which I considered suspicious, but hitting the update button restored the 51.2/51.2 balance.
ID: 47367 · Report as offensive     Reply Quote
Les Bayliss
Volunteer moderator

Send message
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47368 - Posted: 21 Oct 2013, 4:57:11 UTC - in response to Message 47367.  

Ah, I think we have a clue:

Office - router - proxy

And in one of Mark's posts:
Received header from server: HTTP/1.0 502 Bad Gateway

and

Received header from server: Via: 1.0 ****:3128 (squid/2.7.STABLE8)


These 2 are typical of caching at an ISP, leading to problems when the transfer gets interrupted. (And the cache doesn't get flushed?)

As for other project transfers working, there's this on another site unrelated to BOINC stuff:
... viewing websites or receiving email works, but sending larger emails, submitting data to larger web forms, including via SSL, and uploading files, will timeout.


But it could also be the router. Has it been powered down and then turned back on?

If you want to read about the above quote and suggested cure, it's here: unable to upload files using POST
(The last 5 posts.)


ID: 47368 · Report as offensive     Reply Quote
MarkJ
Avatar

Send message
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47369 - Posted: 21 Oct 2013, 8:31:16 UTC
Last modified: 21 Oct 2013, 8:39:29 UTC

The squid message was mine, which was why I asterisked out the IP address. I've used it for some years now. No updates available for windows so it hasn't changed for quite some time.

I did try bypassing the proxy but they still fail part way through the upload. I also tried using a uk-based proxy instead of mine.

My ISP is TPG.

Computers are all connected to a home router which is using ADSL2+. The router hasn't changed in the last 6 months.

I will check the router settings when I get home. I haven't looked at them in a while. I will also power cycle it while at it.
BOINC blog
ID: 47369 · Report as offensive     Reply Quote
MarkJ
Avatar

Send message
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47370 - Posted: 21 Oct 2013, 11:07:03 UTC

I've checked the router. It's using PPPoE with an MTU size of 1492.
BOINC blog
ID: 47370 · Report as offensive     Reply Quote
alvin

Send message
Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47371 - Posted: 21 Oct 2013, 11:52:22 UTC - in response to Message 47370.  

I've checked the router. It's using PPPoE with an MTU size of 1492.

No mate, MTU is just the maximum size of an IP packet on the link.
I recall I may have a limit on uploads, and my misunderstanding was based on previous successful uploads. They may simply have been smaller than my upload limit.
Thanks for the clue again.
ID: 47371 · Report as offensive     Reply Quote
alvin

Send message
Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47372 - Posted: 21 Oct 2013, 11:54:44 UTC - in response to Message 47369.  
Last modified: 21 Oct 2013, 11:55:02 UTC

My ISP is TPG.

Computers are all connected to a home router which is using ADSL2+. The router hasn't changed in the last 6 months.


Why on earth do you use a proxy for a home connection, man?
ID: 47372 · Report as offensive     Reply Quote
ProfileThyme Lawn
Volunteer moderator

Send message
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 47373 - Posted: 21 Oct 2013, 14:02:33 UTC
Last modified: 21 Oct 2013, 16:33:58 UTC

@MarkJ:

When BOINC uploads a file the data is read into a cache which is refreshed as it starts to run down, with a default timeout of 5 minutes on the refresh.

From the messages you posted in this thread it seems likely that your uploads are hitting the transfer inactivity limit.

Early last year I ran some tests which proved that if 2 uploads were running simultaneously for 5 minutes at least one of them would be timed out (i.e. BOINC was detecting the transfer as inactive even though it was still in progress). After an upload timed out it always restarted from 0 on the next retry. By default BOINC allows 8 simultaneous uploads with no more than 2 per project.

BOINC 6.12.27 added an option to increase the inactivity timeout. The change is documented as follows in BOINC Client Configuration:
<http_transfer_timeout>seconds</http_transfer_timeout>: abort HTTP transfers if idle for this many seconds; default 300. New in 6.12.27.

If you're running an earlier version you are stuck with a 5 minute timeout and your only option would be to reduce the number of simultaneous uploads allowed.

I run with the following in my cc_config.xml file (no need to explain where it is and how it's used as you're obviously well aware of that already):

<cc_config>
    <options>
        <http_transfer_timeout>1800</http_transfer_timeout>
        <max_file_xfers>3</max_file_xfers>
        <max_file_xfers_per_project>1</max_file_xfers_per_project>
    </options>
</cc_config>

This increases the inactivity timeout to 30 minutes and allows no more than 3 projects to perform a single upload at the same time. The same restriction also applies to downloads, but the transfer directions are counted separately (i.e. there can be 3 of each running simultaneously).
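
For anyone who would rather generate the file than type it, here is a minimal Python sketch that builds the same options block. The function name build_cc_config is made up for illustration, it is not part of BOINC, and ET.indent needs Python 3.9 or later. It only prints the text; saving it as cc_config.xml in the BOINC data directory is left to you.

```python
import xml.etree.ElementTree as ET


def build_cc_config(options):
    """Return cc_config.xml text with the given <options> entries.

    build_cc_config is a hypothetical helper, not a BOINC tool.
    """
    root = ET.Element("cc_config")
    opts = ET.SubElement(root, "options")
    for name, value in options.items():
        ET.SubElement(opts, name).text = str(value)
    ET.indent(root, space="    ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


# The three settings quoted in the post above.
config_text = build_cc_config({
    "http_transfer_timeout": 1800,       # 30 minutes of inactivity allowed
    "max_file_xfers": 3,                 # total simultaneous transfers
    "max_file_xfers_per_project": 1,     # one transfer per project
})
print(config_text)
```

The output is plain text, so it avoids the formatted-text problem that word processors cause.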


@Coku:

Could you post an HTTP debug trace like the one MarkJ posted here?

To do that you'll need a cc_config.xml in your BOINC data directory containing the following:

<cc_config>
    <log_flags>
        <http_debug>1</http_debug>
        <http_xfer_debug>1</http_xfer_debug>
    </log_flags>
</cc_config>

Edit: I forgot to mention that the file must be created using a plain text editor (e.g. notepad). The BOINC client won't be able to read the file if you use a formatted text program (e.g. wordpad).

Having created the file you can start using it either by restarting BOINC or by clicking Advanced - Read config file on BOINC Manager's advanced view.

When a transfer fails post the debug messages from BOINC's event log. To disable HTTP debug you'll have to change the debug flag values in cc_config.xml to 0 and re-read the file.
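
If you are not sure whether the file was saved as plain text, a rough check is sketched below in Python. The helper names check_cc_config and enabled_log_flags are my own, not BOINC functions, and the RTF test assumes the usual "{\rtf" prefix that word processors such as wordpad write at the start of a file.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<cc_config>
    <log_flags>
        <http_debug>1</http_debug>
        <http_xfer_debug>1</http_xfer_debug>
    </log_flags>
</cc_config>"""


def check_cc_config(text):
    """Return a list of problems found in cc_config.xml text (empty if none)."""
    if text.lstrip().startswith("{\\rtf"):
        return ["file is RTF (saved by a word processor), not plain text"]
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    if root.tag != "cc_config":
        return [f"root element is <{root.tag}>, expected <cc_config>"]
    return []


def enabled_log_flags(text):
    """Return the names of log flags whose value is 1."""
    root = ET.fromstring(text)
    return [flag.tag for flag in root.findall("log_flags/*")
            if (flag.text or "").strip() == "1"]


print(check_cc_config(SAMPLE))
print(enabled_log_flags(SAMPLE))
```

To check a real file, read it with open(path, encoding="utf-8").read() and pass the text to the same two functions.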
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer
ID: 47373 · Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 1 Jan 07
Posts: 1061
Credit: 36,743,089
RAC: 6,177
Message 47374 - Posted: 21 Oct 2013, 16:17:00 UTC

It may be a side-thought, but mention of transfer timeouts brought to mind a discussion we had at SETI about six months ago.

http://www.ietf.org/rfc/rfc1323.txt - "TCP Extensions for High Performance" (about 20 years old). Look, in particular, at the sections on recovery from packet loss over a congested LFN (long, fat network).

It turned out that most *nix (hence Mac OS X and Linux) installations had the RFC1323 extensions enabled - they, and proxy servers running Linux, had very few problems with stalled file transfers (mainly downloads, in SETI's case): but Windows machines were using TCP/IP very inefficiently over that class of link. That's acknowledged in the Microsoft Technet article on Tcp1323Opts

I don't know if the inbound data line to rapid-watch.badc.rl.ac.uk counts as a 'congested LFN', but it seems plausible. It's relatively easy to test: try setting

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters]
"Tcp1323Opts"=dword:00000003

and rebooting before retrying the upload. That simply enables timestamps and window scaling, as described in the Technet article.
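
To confirm what a machine is currently set to, something like the following Python sketch could be used. It assumes the documented meaning of the value (bit 0 enables window scaling, bit 1 enables timestamps); the function names are mine, and the registry read only runs on Windows, using the standard winreg module.

```python
import sys


def decode_tcp1323opts(value):
    """Decode a Tcp1323Opts DWORD: bit 0 = window scaling, bit 1 = timestamps."""
    return {
        "window_scaling": bool(value & 1),
        "timestamps": bool(value & 2),
    }


def read_tcp1323opts():
    """Read Tcp1323Opts from the registry (Windows only); None if it is not set."""
    import winreg  # only available on Windows
    path = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            value, _value_type = winreg.QueryValueEx(key, "Tcp1323Opts")
            return value
    except FileNotFoundError:
        return None


# The value suggested above, 3, turns both options on.
print(decode_tcp1323opts(3))

if sys.platform == "win32":
    current = read_tcp1323opts()
    print("not set" if current is None else decode_tcp1323opts(current))
```

This only reads the value; changing it still needs the registry edit and a reboot as described.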

The full SETI discussion (and people's experiences with this suggestion) can be found at Windows TCP Settings - Follow up - Help with server communication.
ID: 47374 · Report as offensive     Reply Quote
ProfileMikeMarsUK
Volunteer moderator
Avatar

Send message
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 47375 - Posted: 21 Oct 2013, 17:38:09 UTC - in response to Message 47374.  

... but Windows machines were using TCP/IP very inefficiently over that class of link. That's acknowledged in the Microsoft Technet article on Tcp1323Opts
...


Hmmm very interesting. I wonder if that explains why my 24Mb/s link at home is so painfully slow at peak times. I will have to experiment this evening to see if it helps with local congestion.


I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 47375 · Report as offensive     Reply Quote
MarkJ
Avatar

Send message
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47378 - Posted: 21 Oct 2013, 21:19:27 UTC
Last modified: 21 Oct 2013, 21:22:54 UTC

My proxy server has the registry key (done some months back for SETI). The local crunchers don't.

I power cycled the router after checking its config.

Tried uploading again on one machine. The first two WUs failed straight away; the third one got going and went up to its usual 91% before failing. No proxy being used. It was the only transfer in progress, so no competition on my end at least.

I can try the registry hack tonight Sydney time.
BOINC blog
ID: 47378 · Report as offensive     Reply Quote
alvin

Send message
Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47382 - Posted: 22 Oct 2013, 9:53:29 UTC - in response to Message 47373.  

@MarkJ:


@Coku:


Apart from your great proposals, my issue was pretty simple: the upload limit was 50 MB. It seems the previous uploads were smaller and the newer ones are all 51-52 MB, so it was easily fixed by increasing the limit.
Thanks everyone
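
The arithmetic is trivial, but as a sketch in Python: this assumes the limit and the sizes are both in binary megabytes (MiB), which the post does not actually state, and exceeds_limit is a made-up helper name rather than a setting of any proxy product.

```python
MIB = 1024 * 1024  # assumption: sizes and the limit are counted in MiB


def exceeds_limit(size_bytes, limit_mib):
    """True if an upload of size_bytes is larger than a limit_mib cap."""
    return size_bytes > limit_mib * MIB


# A 51.2 MiB upload, like the ones that stalled, against a 50 MiB cap.
print(exceeds_limit(int(51.2 * MIB), 50))
# A smaller upload of 48 MiB against the same cap.
print(exceeds_limit(48 * MIB, 50))
```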
ID: 47382 · Report as offensive     Reply Quote
MarkJ
Avatar

Send message
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47383 - Posted: 22 Oct 2013, 10:36:01 UTC - in response to Message 47382.  

@MarkJ:


@Coku:


Apart from your great proposals, my issue was pretty simple: the upload limit was 50 MB. It seems the previous uploads were smaller and the newer ones are all 51-52 MB, so it was easily fixed by increasing the limit.
Thanks everyone


How did you increase the limit?
BOINC blog
ID: 47383 · Report as offensive     Reply Quote
MarkJ
Avatar

Send message
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47384 - Posted: 22 Oct 2013, 10:40:59 UTC - in response to Message 47372.  
Last modified: 22 Oct 2013, 10:42:54 UTC

Why on earth do you use a proxy for a home connection, man?

With 10 machines it saves on downloads. Some of the Einstein WU have "data packs" that are shared and these cache quite nicely. And then there are windows updates etc. Also it improves security.
BOINC blog
ID: 47384 · Report as offensive     Reply Quote
alvin

Send message
Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47385 - Posted: 22 Oct 2013, 10:51:56 UTC - in response to Message 47383.  

@MarkJ:


@Coku:


Apart from your great proposals, my issue was pretty simple: the upload limit was 50 MB. It seems the previous uploads were smaller and the newer ones are all 51-52 MB, so it was easily fixed by increasing the limit.
Thanks everyone


How did you increase the limit?

Depends on the product you use.
My proxies are both Microsoft and Linux, and the Linux one had a low limit.
ID: 47385 · Report as offensive     Reply Quote
MarkJ
Avatar

Send message
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47386 - Posted: 22 Oct 2013, 11:19:02 UTC - in response to Message 47378.  
Last modified: 22 Oct 2013, 11:31:39 UTC

I can try the registry hack tonight Sydney time.

Registry entry added on one machine. Rebooted. Not using proxy. The first two WU uploads failed (at 31% and 11%); the third one is still going. Let's see if it can pass 91%...

Nope stopped at exactly 91%. Details from client_state:

<file>
<name>hadcm3n_84n8_1980_40_008463976_0_2.zip</name>
<nbytes>54649410.000000</nbytes>
<max_nbytes>188743680.000000</max_nbytes>
<md5_cksum>9aa8b1f7badaf60f6401cecc4ad9474b</md5_cksum>
<status>1</status>
<upload_url>http://rapid-watch.badc.rl.ac.uk/cpdn_cgi/file_upload_handler</upload_url>
<persistent_file_xfer>
<num_retries>63</num_retries>
<first_request_time>1380426453.439072</first_request_time>
<next_request_time>1382453423.702038</next_request_time>
<time_so_far>32440.251300</time_so_far>
<last_bytes_xferred>17252352.000000</last_bytes_xferred>
<is_upload>1</is_upload>
</persistent_file_xfer>
</file>
<file>
<name>hadcm3n_84n8_1980_40_008463976_0_3.zip</name>
<nbytes>54634690.000000</nbytes>
<max_nbytes>188743680.000000</max_nbytes>
<md5_cksum>678d14166cb6300e2a26b21403d7890a</md5_cksum>
<status>1</status>
<upload_url>http://rapid-watch.badc.rl.ac.uk/cpdn_cgi/file_upload_handler</upload_url>
<persistent_file_xfer>
<num_retries>51</num_retries>
<first_request_time>1380734698.957724</first_request_time>
<next_request_time>1382459776.808936</next_request_time>
<time_so_far>15647.101232</time_so_far>
<last_bytes_xferred>6471680.000000</last_bytes_xferred>
<is_upload>1</is_upload>
</persistent_file_xfer>
</file>
<file>
<name>hadcm3n_84ne_1980_40_008463982_0_2.zip</name>
<nbytes>54565344.000000</nbytes>
<max_nbytes>188743680.000000</max_nbytes>
<md5_cksum>bb9c245a642c60be1c53544d2117d32d</md5_cksum>
<status>1</status>
<upload_url>http://rapid-watch.badc.rl.ac.uk/cpdn_cgi/file_upload_handler</upload_url>
<persistent_file_xfer>
<num_retries>18</num_retries>
<first_request_time>1381560801.987560</first_request_time>
<next_request_time>1382455180.958515</next_request_time>
<time_so_far>20557.432532</time_so_far>
<last_bytes_xferred>49659904.000000</last_bytes_xferred>
<is_upload>1</is_upload>
</persistent_file_xfer>
</file>
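
The percentages can be recovered from these entries. The Python sketch below uses a trimmed copy of the three <file> blocks (only the fields it needs) and assumes that last_bytes_xferred divided by nbytes is the progress of the last attempt, which is my reading of the fields rather than documented behaviour. The function name upload_progress is invented for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the client_state entries above; values are unchanged.
SNIPPET = """
<file>
<name>hadcm3n_84n8_1980_40_008463976_0_2.zip</name>
<nbytes>54649410.000000</nbytes>
<persistent_file_xfer>
<num_retries>63</num_retries>
<last_bytes_xferred>17252352.000000</last_bytes_xferred>
</persistent_file_xfer>
</file>
<file>
<name>hadcm3n_84n8_1980_40_008463976_0_3.zip</name>
<nbytes>54634690.000000</nbytes>
<persistent_file_xfer>
<num_retries>51</num_retries>
<last_bytes_xferred>6471680.000000</last_bytes_xferred>
</persistent_file_xfer>
</file>
<file>
<name>hadcm3n_84ne_1980_40_008463982_0_2.zip</name>
<nbytes>54565344.000000</nbytes>
<persistent_file_xfer>
<num_retries>18</num_retries>
<last_bytes_xferred>49659904.000000</last_bytes_xferred>
</persistent_file_xfer>
</file>
"""


def upload_progress(fragment):
    """Return (name, retries, percent sent) for each <file> in the fragment."""
    # The fragment has several top-level elements, so wrap it in one root.
    root = ET.fromstring(f"<files>{fragment}</files>")
    rows = []
    for entry in root.iter("file"):
        total = float(entry.findtext("nbytes"))
        sent = float(entry.findtext("persistent_file_xfer/last_bytes_xferred"))
        retries = int(entry.findtext("persistent_file_xfer/num_retries"))
        rows.append((entry.findtext("name"), retries, 100.0 * sent / total))
    return rows


rows = upload_progress(SNIPPET)
for name, retries, percent in rows:
    print(f"{name}: {percent:.1f}% sent, {retries} retries")
```

Under that assumption the three files come out at roughly 31%, 11% and 91%, matching the failure points reported above.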
BOINC blog
ID: 47386 · Report as offensive     Reply Quote
alvin

Send message
Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47387 - Posted: 22 Oct 2013, 11:26:16 UTC - in response to Message 47386.  

I can try the registry hack tonight Sydney time.

Registry entry added on one machine. Rebooted. Not using proxy. The first two WU uploads failed (at 31% and 11%); the third one is still going. Let's see if it can pass 91%...

Nope stopped at exactly 91%


You could reset TCP/IP via the netsh command.
Also, Microsoft had a special IP-protocol Fix it utility for cases of buffer overload etc.

If the issue sits on the router side, please switch it off for a long time, at least 10 minutes.
ID: 47387 · Report as offensive     Reply Quote
alvin

Send message
Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47389 - Posted: 22 Oct 2013, 11:58:40 UTC

Could you post an HTTP debug trace like the one MarkJ posted here?

To do that you'll need a cc_config.xml in your BOINC data directory containing the following:


<cc_config>
<log_flags>
<http_debug>1</http_debug>
<http_xfer_debug>1</http_xfer_debug>
</log_flags>
</cc_config>
ID: 47389 · Report as offensive     Reply Quote
MarkJ
Avatar

Send message
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47391 - Posted: 22 Oct 2013, 12:46:39 UTC - in response to Message 47387.  

You could reset TCP/IP via the netsh command.
Also, Microsoft had a special IP-protocol Fix it utility for cases of buffer overload etc.

If the issue sits on the router side, please switch it off for a long time, at least 10 minutes.

Reset via netsh didn't help.

Running the fixit doesn't seem to have made any difference either.
BOINC blog
ID: 47391 · Report as offensive     Reply Quote