climateprediction.net (CPDN) home page
Thread 'Persistent upload problems'

alvin

Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47392 - Posted: 22 Oct 2013, 12:53:22 UTC

The simplest approach in your case is to:
- take a clean PC with proven connectivity
- install the same BOINC version on it
- copy the full contents of any failing client to that PC
- experiment with the copied contents
ID: 47392
alvin

Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47393 - Posted: 22 Oct 2013, 12:53:23 UTC
Last modified: 22 Oct 2013, 12:54:27 UTC

Or remove the NIC from its PCI slot if you have one, or plug in a new NIC and disable the onboard NIC.
ID: 47393
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47415 - Posted: 27 Oct 2013, 20:54:04 UTC

Mark

Have you solved the problem yet?

ID: 47415
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47416 - Posted: 28 Oct 2013, 8:38:31 UTC - in response to Message 47415.  

Mark

Have you solved the problem yet?


No
ID: 47416
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47417 - Posted: 28 Oct 2013, 8:43:39 UTC - in response to Message 47393.  

Or remove the NIC from its PCI slot if you have one, or plug in a new NIC and disable the onboard NIC.

Not likely. One machine maybe. Two machines improbable. Six (if I count the proxy server) is next to impossible. Besides that they are all happily talking to other projects.
ID: 47417
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47422 - Posted: 28 Oct 2013, 20:37:20 UTC
Last modified: 28 Oct 2013, 20:38:03 UTC

Using 56k dialup which is getting about 1.45k/sec transfer speeds. Managed to get a successful report, but look at how long it took to complete.
28/10/2013 9:22:49 PM | climateprediction.net | Sending scheduler request: To send trickle-up message.
28/10/2013 9:22:49 PM | climateprediction.net | Reporting 5 completed tasks
28/10/2013 9:22:49 PM | climateprediction.net | Not requesting tasks: "no new tasks" requested via Manager
29/10/2013 1:37:46 AM | climateprediction.net | Started upload of hadcm3n_84n8_1980_40_008463976_0_2.zip
29/10/2013 1:54:43 AM | climateprediction.net | Scheduler request completed


Also, the upload that was at 91% had reached 99% and was still going when I last checked. It's been going all night.
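
As a rough cross-check of the log above, here is a minimal sketch that measures how long that scheduler request took and how much data could have moved in that time. It assumes the timestamps are day/month/year and that "1.45k/sec" means 1.45 kilobytes (1000 bytes) per second; the function names are illustrative only.

```python
from datetime import datetime

# Assumption: the event log uses day/month/year with a 12-hour clock.
LOG_FORMAT = "%d/%m/%Y %I:%M:%S %p"


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two timestamps written in the log style above."""
    begin = datetime.strptime(start, LOG_FORMAT)
    finish = datetime.strptime(end, LOG_FORMAT)
    return (finish - begin).total_seconds()


def implied_kilobytes(seconds: float, rate_kb_per_s: float) -> float:
    """Data moved at a steady rate; an upper bound if the line was shared."""
    return seconds * rate_kb_per_s


# "Sending scheduler request" to "Scheduler request completed"
secs = elapsed_seconds("28/10/2013 9:22:49 PM", "29/10/2013 1:54:43 AM")
print(f"scheduler request took {secs / 3600:.2f} hours")
print(f"at 1.45 kB/s that is at most about {implied_kilobytes(secs, 1.45) / 1000:.1f} MB")
```

That works out to roughly four and a half hours, or at most about 24 MB at the quoted rate, so a single scheduler request was carrying tens of megabytes.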
ID: 47422
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47423 - Posted: 28 Oct 2013, 22:12:54 UTC - in response to Message 47422.  

I guess that's fine as long as it clears the backlog, but it doesn't solve the problem.

One more thing to try: set your prefs for 1 processor, so that you only get one model next time, and see how that goes all along the way.

ID: 47423
alvin

Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47426 - Posted: 29 Oct 2013, 2:02:14 UTC - in response to Message 47423.  

Could the physical file be broken because of a disk write problem?
To fix that you need to refresh either the particular files for the project, the whole project, or the whole BOINC installation. You may delete the project (having first copied it onto a USB or other drive), then reconnect and replace the files.
1. Run chkdsk c: /r /x
When asked to schedule the check after a restart, confirm.
2. Restart when possible and wait until the full scan has finished.
3. Wait until the machine has restarted.
4. Copy the BOINC folder in full to a USB or other drive.
5. Do the steps to refresh the files, the project, or the whole installation.
6. Enjoy your uploads!
ID: 47426
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47435 - Posted: 29 Oct 2013, 9:30:15 UTC - in response to Message 47426.  

Could the physical file be broken because of a disk write problem?
To fix that you need to refresh either the particular files for the project, the whole project, or the whole BOINC installation. You may delete the project (having first copied it onto a USB or other drive), then reconnect and replace the files.
1. Run chkdsk c: /r /x
When asked to schedule the check after a restart, confirm.
2. Restart when possible and wait until the full scan has finished.
3. Wait until the machine has restarted.
4. Copy the BOINC folder in full to a USB or other drive.
5. Do the steps to refresh the files, the project, or the whole installation.
6. Enjoy your uploads!

I don't think it's the disks in 5 different machines. Also switching to a dial up connection has improved things. I suspect something with the ISP which is why I went for a cheap dial up using a different provider.
ID: 47435
alvin

Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47437 - Posted: 29 Oct 2013, 20:21:59 UTC

So it might be a general setting that you've possibly applied to all machines?
It's hard to test for now because no work is available, but another solution might be:
1. a brand new clean install of Windows (it may be virtualised too)
2. get one project only
3. wait until a task has finished
ID: 47437
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47438 - Posted: 29 Oct 2013, 20:35:13 UTC - in response to Message 47437.  

It's far more likely that the problem is at Mark's ISP.
A router or switch change for instance with the replacement having different (older?) firmware.

My post near the start of this thread regarding a listing made by Mark tends to support this.

ID: 47438
alvin

Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47439 - Posted: 29 Oct 2013, 21:32:35 UTC - in response to Message 47438.  

if all other projects are fine?
ID: 47439
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47440 - Posted: 29 Oct 2013, 22:25:28 UTC - in response to Message 47439.  

Other projects don't have such large files to transfer.

ID: 47440
alvin

Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47443 - Posted: 30 Oct 2013, 1:49:18 UTC - in response to Message 47440.  

Mark's ISP is TPG, which I assume is one of the top 5 in AU.
Mark, by any chance did you contact TPG's tech support? Maybe I missed that before?
ID: 47443
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47444 - Posted: 30 Oct 2013, 9:18:58 UTC - in response to Message 47443.  

Mark's ISP is TPG, which I assume is one of the top 5 in AU.
Mark, by any chance did you contact TPG's tech support? Maybe I missed that before?

No. From previous experience they can usually only handle simple things. Now that the dial up has cleared one machine I will be contacting them but not holding my breath.

I did try a firmware downgrade on my router but that made no difference.

As for other projects, they usually have fairly small result files, so maybe the ISP has some size limit. I will include that in my email to their support people.

Still got 4 machines to clear. At 10 hours per zip file it's going to take a while. The wife used the phone a fair bit yesterday so that meant uploads failed until I got the dial up reconnected.
ID: 47444
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 47447 - Posted: 30 Oct 2013, 20:20:31 UTC - in response to Message 47444.  

10 hours is not good. :(
My sympathies.

ID: 47447
alvin

Joined: 12 Mar 12
Posts: 29
Credit: 666,199
RAC: 0
Message 47448 - Posted: 30 Oct 2013, 22:29:18 UTC - in response to Message 47447.  

10 hours is not good. :(
My sympathies.


oh good luck with that)
ID: 47448
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 47449 - Posted: 1 Nov 2013, 11:00:04 UTC

Two out of 5 machines cleared. I'm working on the 3rd one.

These long scheduler requests seem to have 20Mb sched_request files. When using the ADSL line they fail with "HTTP internal server error", but when I use the dial up they can upload (after a few hours) and successfully report. Subsequent requests are much smaller.

I will also point the BOINC developers to this message thread seeing as they are planning on releasing the new client on Monday. I have copied the current one that's trying to go through as I type this.
ID: 47449
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,703,308
RAC: 9,860
Message 47450 - Posted: 1 Nov 2013, 13:03:23 UTC - in response to Message 47449.  

Remember that the scheduler request file is the mechanism by which trickle data is transferred to the server. (Upload files are separate, and different).

If your scheduler contacts have been failing too, then the trickles won't have been acknowledged, and (I suspect) will be piling up to be sent again and again - that would account for the huge size. I haven't looked inside a sched_request file yet (will do, but the machine is downstairs), but I'm told that the presence, and contents, of what would be a trickle is pretty clear.
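
A minimal sketch of how one might check that from the file itself. It assumes, without confirmation here, that each pending trickle appears as a `<msg_from_host>` element in the sched_request XML; the tag name and the helper `summarise_request` are illustrative assumptions, so adjust the tag to whatever the file actually contains.

```python
import re


def summarise_request(xml_text: str, tag: str = "msg_from_host") -> dict:
    """Count <tag>...</tag> blocks and how many characters they occupy.

    The default tag name is an assumption about the request format.
    """
    name = re.escape(tag)
    pattern = re.compile(rf"<{name}\b.*?</{name}>", re.DOTALL)
    blocks = pattern.findall(xml_text)
    return {
        "total_chars": len(xml_text),
        "blocks": len(blocks),
        "chars_in_blocks": sum(len(block) for block in blocks),
    }


# Tiny made-up sample, not a real CPDN scheduler request.
sample = (
    "<scheduler_request>"
    "<msg_from_host>a</msg_from_host>"
    "<msg_from_host>bb</msg_from_host>"
    "</scheduler_request>"
)
print(summarise_request(sample))
```

If most of a 20MB request turns out to sit inside repeated blocks like these, that would support the idea that unacknowledged trickles are being resent each time.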

IIRC, you can 'suspend networking' to keep the uploads off the line and out of the way. Then, a manual 'project update' should send the sched_request (by itself), which would be quicker. Might even get through via broadband.

If you can get the trickles through and ack'd (what does the server record of your tasks in progress say about trickles received - are they up to date?), then it makes sense that the 'trickle pending' on the machine is purged and the file returns to a sensible size the next time.

Once that rubbish is off the line, you can go back to thinking (separately) about the upload file problem. Coku's discovery of a 50MB transfer limit inside a proxy somewhere - either yours, or at your ISP - sounds like a very plausible smoking gun.
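
To see which pending files would trip such a limit, here is a small sketch. It assumes the limit is 50 decimal megabytes and that the result zips sit together in one folder; the folder you pass in and the helper name `files_over_limit` are hypothetical.

```python
from pathlib import Path

# Assumption: "50MB" means 50 * 1000 * 1000 bytes; a proxy might count differently.
ASSUMED_LIMIT_BYTES = 50 * 1000 * 1000


def files_over_limit(folder: str, limit: int = ASSUMED_LIMIT_BYTES, pattern: str = "*.zip"):
    """Return (name, size) for matching files larger than limit, largest first."""
    sizes = [
        (path.name, path.stat().st_size)
        for path in Path(folder).glob(pattern)
        if path.is_file()
    ]
    return sorted(
        (entry for entry in sizes if entry[1] > limit),
        key=lambda entry: entry[1],
        reverse=True,
    )
```

Calling it on the project folder of an affected machine and comparing the result with the list of stuck uploads would show whether only the oversized files are failing.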

I hadn't remembered you mentioning an "HTTP internal server error" before (I need to check back through the full thread), but that seems unlikely to be a bug - CPDN servers have been accepting trickles for years. It might be a timeout (even a bad Apache timing configuration on the new server, perhaps), or caused by the line delays from the failing uploads competing for time on the same link.

I think it's highly unlikely that any of this is caused by a bug in the new client - we'll have to have some much clearer evidence of 'necessary and sufficient causality' (as I was posting last night) before pointing the bone in that direction.
ID: 47450
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,703,308
RAC: 9,860
Message 47451 - Posted: 1 Nov 2013, 13:25:14 UTC

Afterthought: You are remembering that the 'A' in 'ADSL2+' stands for 'asymmetric', aren't you? Uploads, especially on a home-grade connection, are far slower than downloads: ISPs optimise home lines on the assumption that you're sitting there watching films - downloading - not sending anything back.

Ten computers at home - most of them big powerful i7-class, many with GPUs - will be creating a lot of uploads, all competing for bandwidth. I've forgotten whether you also run GPUGrid, but their uploads can be pretty huge, too.

Can you get to the internal connection diagnostics for your router? To find out what speed your connection is really running at, rather than what the ISP's salesmen are charging you for?

Mine says:
Connection Information
Downstream: 5.461 Mbps
Upstream: 1005 Kbps

and other useful stuff like

Noise margin (Down/Up): 2.8 dB / 6.3 dB
Line attenuation (Down/Up): 52.2 dB / 32.4 dB
Output power (Down/Up): 19.5 dBm / 12.6 dBm
FEC Events (Down/Up): 6534626 / 5576
CRC Events (Down/Up): 2195 / 575

- though you have to dig pretty deep to get down to that sort of technical level.
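
For a feel of what those rates mean, a back-of-envelope sketch. It assumes the quoted figures are raw line rates in bits per second and ignores protocol overhead and competing traffic; the 50MB file size is borrowed from the proxy limit mentioned earlier in the thread, not a measured CPDN size.

```python
def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Ideal transfer time: 8 bits per byte, no overhead, no contention."""
    return size_bytes * 8 / rate_bits_per_s


UPSTREAM_BPS = 1005 * 1000         # "Upstream: 1005 Kbps"
DOWNSTREAM_BPS = 5.461 * 1000000   # "Downstream: 5.461 Mbps"
FILE_BYTES = 50 * 1000 * 1000      # illustrative 50MB file

up = transfer_seconds(FILE_BYTES, UPSTREAM_BPS)
down = transfer_seconds(FILE_BYTES, DOWNSTREAM_BPS)
print(f"upload:   {up / 60:.1f} minutes")
print(f"download: {down / 60:.1f} minutes")
```

With these figures a 50MB file needs roughly six and a half minutes upstream against a little over a minute downstream, and that is before several machines start sharing the same upstream.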
ID: 47451