Persistent upload problems
Author | Message |
---|---|
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
This thread is for users MarkJ & Coku. Only solutions for these two people should be posted here. Everyone else should post about their problems in another existing thread, or create their own new thread. Edit: For reference, the earlier part of this problem is in the thread "Uploading issues - RAPIT tasks". |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I've started a new thread, as the conversation was getting cluttered in others. So, more careful reading of the listings, plus some vague memories, has turned up something: proxies and Squid. First, I'd like to ask for some more info from both of you: 1) Are your affected computers at home, or in an "office" environment? 2) Does the connection between the computer and "the phone line" involve a router? 3) Do you normally use a proxy server? 4) Both of you are in Australia, which is suspicious, so: what is the name of your ISP? No doubt more questions to come over an extended period. I don't think very fast these days. For the record, my ISP is BigPond; the 2 models on one of my computers finished yesterday and everything uploaded OK. The 4 on my other computer finish in about 12 hours. |
Send message Joined: 12 Mar 12 Posts: 29 Credit: 666,199 RAC: 0 |
I've started a new thread, as the conversation was getting cluttered in others. 1. Office. I'll try from a different location when possible. 2. Yes, routers, but no major changes for a while. Also, 3-5 other projects are fine via the same proxy and routers. 3. Yes, a proxy, but no major changes for a while. 4. iiNET; my ISP was Netspace until iiNET recently bought it. Also, the upload stalls at 100%: it sends all the data and then sits at 100%, showing something like 51.2M/51.2M. Some of the computers recently reported strange totals, such as 100% with 51.3/51.2 or 51.1/51.2, which I thought was suspicious, but hitting the Update button restored the balance to 51.2/51.2. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Ah, I think we have a clue: office - router - proxy. And in one of Mark's posts: "Received header from server: HTTP/1.0 502 Bad Gateway". These 2 are typical of caching at an ISP, leading to problems when the transfer gets interrupted. (And the cache doesn't get flushed?) As for other project transfers working, there's this on another site unrelated to BOINC stuff: "... viewing websites or receiving email works, but sending larger emails, submitting data to larger web forms, including via SSL, and uploading files, will timeout." But it could also be the router. Has it been powered down and then turned back on? If you want to read about the above quote and suggested cure, it's here: unable to upload files using POST (the last 5 posts). |
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
The Squid message was mine, which was why I asterisked out the IP address. I've used it for some years now. No updates are available for Windows, so it hasn't changed for quite some time. I did try bypassing the proxy but they still fail part way through the upload. I also tried using a UK-based proxy instead of mine. My ISP is TPG. Computers are all connected to a home router which is using ADSL2+. The router hasn't changed in the last 6 months. I will check the router settings when I get home. I haven't looked at them in a while. I will also power cycle it while at it. BOINC blog |
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
I've checked the router. It's using PPPoE with an MTU size of 1492. BOINC blog |
Send message Joined: 12 Mar 12 Posts: 29 Credit: 666,199 RAC: 0 |
I've checked the router. It's using PPPoE with an MTU size of 1492. No mate, the MTU is just the maximum size of an IP packet. I've remembered that I may have a limit on uploads; my misunderstanding came from earlier uploads succeeding. They may simply have been smaller than my upload limit. Thanks again for the clue. |
Send message Joined: 12 Mar 12 Posts: 29 Credit: 666,199 RAC: 0 |
My ISP is TPG. Why on earth do you use a proxy for a home connection, man? |
Send message Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
@MarkJ: When BOINC uploads a file the data is read into a cache which is refreshed as it starts to run down, with a default timeout of 5 minutes on the refresh. From the messages you posted in this thread it seems likely that your uploads are hitting the transfer inactivity limit. Early last year I ran some tests which proved that if 2 uploads were running simultaneously for 5 minutes at least one of them would be timed out (i.e. BOINC was detecting the transfer as inactive even though it was still in progress). After an upload timed out it always restarted from 0 on the next retry. By default BOINC allows 8 simultaneous uploads with no more than 2 per project. BOINC 6.12.27 added an option to increase the inactivity timeout. The change is documented as follows in BOINC Client Configuration: <http_transfer_timeout>seconds</http_transfer_timeout> abort HTTP transfers if idle for this many seconds; default 300 + New in 6.12.27 If you're running an earlier version you are stuck with a 5 minute timeout and your only option would be to reduce the number of simultaneous uploads allowed. I run with the following in my cc_config.xml file (no need to explain where it is and how it's used as you're obviously well aware of that already): <cc_config> <options> <http_transfer_timeout>1800</http_transfer_timeout> <max_file_xfers>3</max_file_xfers> <max_file_xfers_per_project>1</max_file_xfers_per_project> </options> </cc_config> This increases the inactivity timeout to 30 minutes and allows no more than 3 projects to perform a single upload at the same time. The same restriction also applies to downloads, but the transfer directions are counted separately (i.e. there can be 3 of each running simultaneously). @Coku: Could you post an HTTP debug trace like the one MarkJ posted here? 
To do that you'll need a cc_config.xml in your BOINC data directory containing the following: <cc_config> <log_flags> <http_debug>1</http_debug> <http_xfer_debug>1</http_xfer_debug> </log_flags> </cc_config> Edit: I forgot to mention that the file must be created using a plain text editor (e.g. notepad). The BOINC client won't be able to read the file if you use a formatted text program (e.g. wordpad). Having created the file you can start using it either by restarting BOINC or by clicking Advanced - Read config file on BOINC Manager's advanced view. When a transfer fails post the debug messages from BOINC's event log. To disable HTTP debug you'll have to change the debug flag values in cc_config.xml to 0 and re-read the file. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
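For anyone who would rather generate the file than type it, here is a minimal Python sketch that builds a cc_config.xml combining the transfer options and the HTTP debug flags quoted above. The function name `build_cc_config` and its parameters are made up for illustration; only the element names come from the posts. It just returns the XML text; saving it as plain text in the BOINC data directory is left to the reader, and `http_transfer_timeout` is only honoured by BOINC 6.12.27 or later, as noted above.

```python
import xml.etree.ElementTree as ET


def build_cc_config(timeout_s=1800, max_xfers=3, max_xfers_per_project=1,
                    http_debug=True):
    """Return cc_config.xml text (hypothetical helper, not part of BOINC)."""
    root = ET.Element("cc_config")

    # Debug flags: "1" switches HTTP tracing on, "0" switches it off again.
    flag = "1" if http_debug else "0"
    log_flags = ET.SubElement(root, "log_flags")
    ET.SubElement(log_flags, "http_debug").text = flag
    ET.SubElement(log_flags, "http_xfer_debug").text = flag

    # Transfer options, using the element names quoted in the post.
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "http_transfer_timeout").text = str(timeout_s)
    ET.SubElement(options, "max_file_xfers").text = str(max_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(
        max_xfers_per_project)

    ET.indent(root)  # pretty-print; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config())
```

Calling `build_cc_config(http_debug=False)` produces the same file with both debug flags set to 0, which matches the "change the values to 0 and re-read the file" step.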
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,703,308 RAC: 9,860 |
It may be a side-thought, but mention of transfer timeouts brought to mind a discussion we had at SETI about six months ago. http://www.ietf.org/rfc/rfc1323.txt - "TCP Extensions for High Performance" (about 20 years old). Look, in particular, at the sections on recovery from packet loss over a congested LFN ("long, fat network"). It turned out that most *nix (hence Mac OS X and Linux) installations had the RFC1323 extensions enabled - they, and proxy servers running Linux, had very few problems with stalled file transfers (mainly downloads, in SETI's case): but Windows machines were using TCP/IP very inefficiently over that class of link. That's acknowledged in the Microsoft Technet article on Tcp1323Opts. I don't know if the inbound data line to rapid-watch.badc.rl.ac.uk counts as a 'congested LFN', but it seems plausible. It's relatively easy to test: try setting [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters] "Tcp1323Opts"=dword:00000003 and rebooting before retrying the upload. That simply enables timestamps and window scaling, as described in the Technet article. The full SETI discussion (and people's experiences with this suggestion) can be found at Windows TCP Settings - Follow up - Help with server communication. |
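To put rough numbers on why window scaling matters over a long link, here is a small Python sketch of the usual rule of thumb: a sender can have at most one receive window of data in flight per round trip, so throughput is capped at window size divided by round-trip time. The helper names are made up for illustration, and the 0.3 second round-trip time is purely an assumed figure for an Australia to UK path, not something measured in this thread.

```python
UNSCALED_MAX_WINDOW = 65535  # largest value of the 16-bit TCP window field
MAX_SHIFT = 14               # largest window scale shift allowed by RFC 1323


def max_window_bytes(shift=0):
    """Largest advertisable window for a given window scale shift."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("window scale shift must be between 0 and 14")
    return UNSCALED_MAX_WINDOW << shift


def throughput_ceiling(window_bytes, rtt_seconds):
    """Upper bound in bytes/second: one full window per round trip."""
    return window_bytes / rtt_seconds


if __name__ == "__main__":
    rtt = 0.3  # ASSUMED round-trip time in seconds, for illustration only
    for shift in (0, 3):
        window = max_window_bytes(shift)
        ceiling = throughput_ceiling(window, rtt)
        print(f"shift {shift}: window {window} bytes, "
              f"ceiling {ceiling / 1024:.0f} KiB/s")
```

This is only a best-case ceiling; packet loss on a congested link lowers the real figure further, which is where the timestamp option is meant to help.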
Send message Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
... but Windows machines were using TCP/IP very inefficiently over that class of link. That's acknowledged in the Microsoft Technet article on Tcp1323Opts Hmmm very interesting. I wonder if that explains why my 24Mb/s link at home is so painfully slow at peak times. I will have to experiment this evening to see if it helps with local congestion. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
My proxy server has the registry key (done some months back for SETI). The local crunchers don't. I power cycled the router after checking its config. Tried uploading again on one machine. The first two WUs failed straight away; the third one got going and went up to its usual 91% before failing. No proxy being used. It was the only transfer in progress, so no competition on my end at least. I can try the registry hack tonight Sydney time. BOINC blog |
Send message Joined: 12 Mar 12 Posts: 29 Credit: 666,199 RAC: 0 |
@MarkJ: apart from the great suggestions in your post, my issue was pretty simple: the upload limit was 50 MB, and it seems the previous uploads were smaller while the newer ones are all 51-52 MB, so it's easily fixed by increasing the limit. Thanks everyone |
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
@MarkJ: How did you increase the limit? BOINC blog |
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
why on Earth you use proxy for home connection man? With 10 machines it saves on downloads. Some of the Einstein WU have "data packs" that are shared and these cache quite nicely. And then there are Windows updates etc. Also it improves security. BOINC blog |
Send message Joined: 12 Mar 12 Posts: 29 Credit: 666,199 RAC: 0 |
@MarkJ: it depends on the product you use. My proxies are both Microsoft and Linux, and the Linux one had the low limit. |
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
I can try the registry hack tonight Sydney time. Registry entry added on one machine. Rebooted. Not using proxy. First two WU uploads failed (at 31% and 11%); third one still going. Let's see if it can pass 91%... Nope, stopped at exactly 91%. Details from client_state: <file> <name>hadcm3n_84n8_1980_40_008463976_0_2.zip</name> <nbytes>54649410.000000</nbytes> <max_nbytes>188743680.000000</max_nbytes> <md5_cksum>9aa8b1f7badaf60f6401cecc4ad9474b</md5_cksum> <status>1</status> <upload_url>http://rapid-watch.badc.rl.ac.uk/cpdn_cgi/file_upload_handler</upload_url> <persistent_file_xfer> <num_retries>63</num_retries> <first_request_time>1380426453.439072</first_request_time> <next_request_time>1382453423.702038</next_request_time> <time_so_far>32440.251300</time_so_far> <last_bytes_xferred>17252352.000000</last_bytes_xferred> <is_upload>1</is_upload> </persistent_file_xfer> </file> <file> <name>hadcm3n_84n8_1980_40_008463976_0_3.zip</name> <nbytes>54634690.000000</nbytes> <max_nbytes>188743680.000000</max_nbytes> <md5_cksum>678d14166cb6300e2a26b21403d7890a</md5_cksum> <status>1</status> <upload_url>http://rapid-watch.badc.rl.ac.uk/cpdn_cgi/file_upload_handler</upload_url> <persistent_file_xfer> <num_retries>51</num_retries> <first_request_time>1380734698.957724</first_request_time> <next_request_time>1382459776.808936</next_request_time> <time_so_far>15647.101232</time_so_far> <last_bytes_xferred>6471680.000000</last_bytes_xferred> <is_upload>1</is_upload> </persistent_file_xfer> </file> <file> <name>hadcm3n_84ne_1980_40_008463982_0_2.zip</name> <nbytes>54565344.000000</nbytes> <max_nbytes>188743680.000000</max_nbytes> <md5_cksum>bb9c245a642c60be1c53544d2117d32d</md5_cksum> <status>1</status> <upload_url>http://rapid-watch.badc.rl.ac.uk/cpdn_cgi/file_upload_handler</upload_url> <persistent_file_xfer> <num_retries>18</num_retries> <first_request_time>1381560801.987560</first_request_time> <next_request_time>1382455180.958515</next_request_time>
<time_so_far>20557.432532</time_so_far> <last_bytes_xferred>49659904.000000</last_bytes_xferred> <is_upload>1</is_upload> </persistent_file_xfer> </file> BOINC blog |
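The 31%, 11% and 91% figures can be cross-checked against the client_state entries by dividing `last_bytes_xferred` by `nbytes`. The Python sketch below does that for the three files above. Two things are assumptions made for the example: the `<files>` wrapper element (added only so the fragment parses as one document) and the helper name `upload_progress`. Only the fields needed for the calculation are copied from the post.

```python
import xml.etree.ElementTree as ET

# Values copied from the client_state excerpt above; the <files> wrapper is
# an assumption added so the fragment is well-formed XML.
SAMPLE = """<files>
<file>
  <name>hadcm3n_84n8_1980_40_008463976_0_2.zip</name>
  <nbytes>54649410.000000</nbytes>
  <persistent_file_xfer>
    <num_retries>63</num_retries>
    <last_bytes_xferred>17252352.000000</last_bytes_xferred>
  </persistent_file_xfer>
</file>
<file>
  <name>hadcm3n_84n8_1980_40_008463976_0_3.zip</name>
  <nbytes>54634690.000000</nbytes>
  <persistent_file_xfer>
    <num_retries>51</num_retries>
    <last_bytes_xferred>6471680.000000</last_bytes_xferred>
  </persistent_file_xfer>
</file>
<file>
  <name>hadcm3n_84ne_1980_40_008463982_0_2.zip</name>
  <nbytes>54565344.000000</nbytes>
  <persistent_file_xfer>
    <num_retries>18</num_retries>
    <last_bytes_xferred>49659904.000000</last_bytes_xferred>
  </persistent_file_xfer>
</file>
</files>"""


def upload_progress(xml_text):
    """Return (name, retries, percent sent on last attempt) for each file."""
    rows = []
    for entry in ET.fromstring(xml_text).iter("file"):
        total = float(entry.findtext("nbytes"))
        xfer = entry.find("persistent_file_xfer")
        sent = float(xfer.findtext("last_bytes_xferred"))
        retries = int(xfer.findtext("num_retries"))
        rows.append((entry.findtext("name"), retries, 100.0 * sent / total))
    return rows


if __name__ == "__main__":
    for name, retries, percent in upload_progress(SAMPLE):
        print(f"{name}: {percent:.1f}% sent, {retries} retries")
```

The percentages come out consistent with the 31%, 11% and 91% quoted in the post, so the counters in client_state do reflect where each attempt stopped.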
Send message Joined: 12 Mar 12 Posts: 29 Credit: 666,199 RAC: 0 |
I can try the registry hack tonight Sydney time. You can also reset TCP/IP with the netsh command. Microsoft also had a special IP-protocol fix-it utility for cases such as buffer overload. If the issue is on the router side, please switch it off for a good while, at least 10 minutes. |
Send message Joined: 12 Mar 12 Posts: 29 Credit: 666,199 RAC: 0 |
Could you post an HTTP debug trace like the one MarkJ posted here? To do that you'll need a cc_config.xml in your BOINC data directory containing the following: <cc_config> <log_flags> <http_debug>1</http_debug> <http_xfer_debug>1</http_xfer_debug> </log_flags> </cc_config> |
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
you may reset tcp/ip via netsh command Reset via netsh didn't help. Running the fixit doesn't seem to have made any difference either. BOINC blog |
©2024 cpdn.org