Message boards : Number crunching : incredibly annoyingly slow uploads - like dial-up speed.
Author | Message |
---|---|
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
For the last few days, upload rates have been about 6-8 KB/sec (near dial-up speed, nowhere near broadband). The problem doesn't look like it's on this user's end. One machine here has been trying for 3 days and has uploaded only three 60 MB files in that time. Meanwhile, the models have created so many more that there are 25-30 upload files waiting here (30-90 MB each, and growing). Saw some mention here about a problem, not sure what? Another underfunded infrastructure failure? Or maybe just me? |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Or maybe just me? Don't know if it is just you, but this morning I had four uploads go through at over 100 KB/s, which is about as fast as I get to anywhere. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Just a thought, Erik: my results will be completely irrelevant if they are a different model type and going to a different server. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Now it appears the problem is definitely mostly on my end. Don't know yet what triggered it, but at least one of my machines got into a situation where uploads were timing out partway through, then kept retrying and rarely completing. This slowed the uploads from my other 6 hosts to the point where most of them had uploads waiting, and the original problem machine got to having 26 files trying to upload, two at a time. Ugh. Unusually, all this attempted traffic (maybe 5-7 uploads trying at once) didn't slow other traffic noticeably. I did mess with some QoS settings on the router a couple of weeks ago. Anyhow, I'll limit the uploads until the queues here clear, and then report back. Most of the tasks uploading zip files were wah2_eu25<xxx> going to <upload_url>http://upload3.cpdn.org/cpdn_cgi/file_upload_handler</upload_url> Apologies for the false alarm. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Seems to confirm it - mine were eu25's also. Apologies for false alarm. No need for apologies - if it had been at the CPDN end, it would have meant a quick check-out and hopefully a quicker fix. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
When I have slow uploads, I shut down the net on all but one machine and let that one slowly clear. Then repeat with the others. It's fiddly, but more reliable. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
When I have slow uploads, I shut down the net on all but one machine and let that one slowly clear. Then repeat with the others. Gotcha - doing that. I'll figure out what went wrong later, "one upload at a time". Thx <edit> Looking more closely, some 90 MB files took 4 hours or more and uploaded OK, at 9 KB/sec. On another host, all uploads failed and retried, again and again. Need to manage my tiny upload pipe, it seems. </edit> |
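Found one knob for that, if memory serves: a cc_config.xml in the BOINC data directory can cap how many transfers run at once. Something along these lines should do it (the numbers are just my guess for a small pipe; check the BOINC client configuration docs for your version):

<cc_config>
  <options>
    <max_file_xfers>2</max_file_xfers>
    <max_file_xfers_per_project>1</max_file_xfers_per_project>
  </options>
</cc_config>

Re-reading the config files from the Manager (or restarting the client) should pick it up.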
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Fortunately, a HadCM3n task's #3 .zip went up at ~1/3 of my usual DSL speed, not the roll-a-peanut-uphill-on-hands-and-knees-with-one's-nose 'speed' I too often see. This sucker was 155.43 MB! (We knew upload size would balloon when tasks were chopped into pieces, but ... ) EDIT: for typo. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Well, they did it!
Be assured, I intend to throw some tacks on staff's chairs. [EDIT] Times are Pacific Standard time (Z-8). --- I hope no one has slow uploads with beasts of this size. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
That would be the "stash file", i.e. the collection of different items that the researcher wants; its size is probably number-of-items x individual-file-size. I foretold this here. In this case, it looks like someone forgot to increase the size limit. Probably doesn't even know about this aspect of it. But it should have been picked up somewhere. Or perhaps not, thinking a bit further while typing: during in-house testing there may not be a lot of data to fill the file, so it couldn't be foreseen. And it depends on where the data (zips) from the alpha models are "aimed". (Perhaps to "null"?) It seems that this "cutting edge" modelling has developed its own built-in "cutting edge" programming problems. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Hi Jim, Looking at client_state.xml for 4 currently running hadcm3n tasks on a Linux PC, the max_nbytes for all 4 decadal uploads for all 4 tasks is 150,000,000 bytes. So far, the uploaded 1st and 2nd decadal zips exceeded the max_nbytes (something over 160 MB) and didn't list any error in the message log; each transfer was a "success" according to BOINC. This was with BOINC 7.2.42 in Linux. I'm not sure why the final decade upload exceeding max_nbytes would give an error if the others did not. I hadn't been paying attention to upload file sizes since they aren't normally listed in the BOINC Manager message log. Wow, those are big! |
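For reference, each of those upload entries in client_state.xml looks roughly like this (the task name here is made up, the tag layout can differ a little between client versions, and the upload URL is simply the one quoted earlier in the thread; the <max_nbytes> line is the one that matters):

<file_info>
    <name>hadcm3n_example_task_2.zip</name>
    <nbytes>161061273.000000</nbytes>
    <max_nbytes>150000000.000000</max_nbytes>
    <status>0</status>
    <upload_url>http://upload3.cpdn.org/cpdn_cgi/file_upload_handler</upload_url>
</file_info>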
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Hi, George, My experience was the same as yours. No error message on the first three oversize .zips, unlike the one seen with #4. All four were uploaded for each completed task, all apparently truncated at 150 MB. (Tasks were, after the #4 upload, marked as "Error.") If memory serves, when we last experienced a too-small max_nbytes value, the current value was chosen because it was so ridiculously large that it would never be exceeded. If I might borrow from Robert Burns (and be forgiven a US paraphrase): The best laid plans of mice and men often go awry ... I hope the responsible scientists weigh in and tell us whether the .zip files actually were truncated on upload. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
I'm presently running 2 of the tasks and am wondering: if the zips are being truncated, are they still usable by the scientists, and is it worth finishing them? Each is going to take about 21 days of computing time to finish. |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
We don't know the answer to that yet, JIM. Stay tuned... "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
If I recall correctly, we can manually edit client_state.xml and change the max_nbytes for those file uploads to something larger, and the rest of the files should then upload correctly. Of course this has to be done when BOINC has been shut down. Perhaps Richard or Ian can chime in on that, as they are more BOINC-knowledgeable and have better memories than I do. Problem is... I have no confidence that hadcm3 models that are stopped and restarted won't error out, just because they are so sensitive. I've lost too many in the distant past when cleanly shutting down BOINC and restarting. |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Hi, again, George, I'm aware of the capability of editing that parameter and that we did so 'back when.' I considered it this time and chose not to do so -- because I don't like what I perceive as a "slap-dash" approach to the project, from above, in recent years. Some on staff are trying to do the right thing and get things organized. They are, unfortunately, not holding the reins guiding this project ... Re. HadCM3 models -- my machines don't suffer the "shut down and die on restart" we too often see reported... I remember shutting those beauties down every two or three days, for months, to make backups (in the days of much slower machines) -- when they ran for 160 years (or 200 years in the case of the "Spinup" project). I have not the slightest clue as to the difference between then and now, or the difference between your experience and mine, George -- wish I did. I'm aware my comments here are the sort of thing better placed on the mailing list. On the other hand, it is said that sunlight is the best disinfectant. I fear I've steered this thread on an oblique course from Eirik's topic ... Apologies, Eirik. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
If I recall correctly, we can manually edit client_state.xml and change the max_nbytes for those file uploads to something larger, and the rest of the files should upload correctly. Of course this has to be done when boinc has been shut down. Perhaps Richard or Ian can chime in on that as they are more boinc knowledgeable and have better memories than I do. You're right, I remember doing that several years ago. I think they were beta models, so the instructions are probably lost with the boards from the now-defunct beta site. It worked well, so if someone could come up with those instructions I am game to try it again. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
I'm aware of the capability of editing that parameter and that we did so 'back when.' I considered it this time and chose not to do so -- because I don't like what I perceive as a "slap-dash" approach to the project, from above, in recent years. Yeah, my advice only helps those who read this forum, which is very few of the people running the models. Re. HadCM3 Models -- my machines don't suffer the "shutdown and die on restart" we too-often see reported... I remember shutting those beauties down every two or three days, for months, to make backups (in the days of much slower machines) -- when they ran for 160 years (or 200 years in the case of "Spinup" project). I have not the slightest clue as to the difference then and now, or the difference in your experience and mine, George -- wish I did. The early hadcm3 versions didn't have a problem; it came for me with some later version. And maybe it was most prevalent with Linux, since that was where I ran most of these. JIM, after cleanly shutting BOINC down, open client_state.xml in Notepad and search for all the instances of max_nbytes for cpdn tasks. Change the value of all those instances from 150000000.000000 to 250000000.000000. Save and then exit Notepad. After restarting, BOINC shouldn't complain about the decade uploads being too big. |
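For anyone with several hosts or a lot of tasks, a small script can make the same change; this is only a rough sketch (the state-file path shown is the usual Linux one and will differ on Windows; keep a backup, and only run it while BOINC is fully stopped):

import re
import shutil

# Typical Linux path; on Windows the data directory is usually C:\ProgramData\BOINC
STATE_FILE = "/var/lib/boinc-client/client_state.xml"
NEW_LIMIT = "250000000.000000"

# Keep a backup copy in case anything goes wrong.
shutil.copy(STATE_FILE, STATE_FILE + ".bak")

with open(STATE_FILE) as f:
    text = f.read()

# Only touch entries that still carry the old 150,000,000-byte limit.
text = re.sub(r"<max_nbytes>150000000\.000000</max_nbytes>",
              "<max_nbytes>" + NEW_LIMIT + "</max_nbytes>", text)

with open(STATE_FILE, "w") as f:
    f.write(text)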
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
JIM, after cleanly shutting boinc down, open client_state.xml in notepad and search for all the instances of max_nbytes for cpdn tasks. Change the value of all those instances from Thanks, I'll give it a try and see if I can still do it. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
The edits to the client_state file have been made. The problem is that one of the models was at 37% and had already sent the first (presumably) truncated decadal zip file. Any word from the scientists on whether it is still usable and worth the time (about 15 days) needed to finish it? |