Message boards : Number crunching : Batch 996 Weather@Home2 East Asia25
Author | Message |
---|---|
Joined: 29 Oct 17 Posts: 1049 Credit: 16,483,915 RAC: 15,324 |
I'm waiting on confirmation that they've increased the maximum allowed httpd connections. I've also sent them details from a couple of users who have very large uploads waiting, so they can investigate whether any DDoS protection is blocking them. Otherwise, the server is working fine. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,483,915 RAC: 15,324 |
The max number of connections to the Korean upload server has been increased from 256 to 1000. At the time Andy@CPDN made the change there were 116 active connections. IT in Korea are investigating further and I'll report back if they find anything. |
Joined: 5 Jun 09 Posts: 97 Credit: 3,736,855 RAC: 4,073 |
Thanks Glenn. Sadly, either this change is going to take some time to produce an improvement or it wasn't the complete solution. I've still got three zip files failing at every retry:
wah2_eas25_a0uz_199012_24_996_012224663_2_r735015961_1.zip - 79.36% after 14:26 transfer time
wah2_eas25_a4ml_20142_24_996_012229545_2_r1812486379_8.zip - 47.67% after 15:20 transfer time
wah2_eas25_a4ml_20142_24_996_012229545_2_r1812486379_3.zip - 1.40% after 35:02 transfer time
Both tasks are still running: wah2_eas25_a0uz_199012_24_996 is due to finish in about 4.5 days, and wah2_eas25_a4ml_20142_24_996 in just under 2 days. Both my other tasks are uploading their zips in a timely manner, but even these can take a couple of retries (or, should that be a couple of retries?).
{edit to add} The situation has gone backwards: all new uploads are descending rapidly into the retry cycle, so things are certainly no better than they were before. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,483,915 RAC: 15,324 |
I don't know if it's the limit; it could also just be network congestion causing the dropouts, but I'm not in close touch with the Korean side and don't know what kind of bandwidth there is to the server. I've got the same issue myself: uploads currently stalled around 90% on their 60th retry. But equally, I come back to my PC in the morning and previously stalled transfers with high retry counts have gone. Maybe the congestion eases during the night?? I know the Korean IT guys are keen to investigate, and I've passed on details of IP addresses to look at. Hopefully they will find out more. |
Joined: 2 Oct 06 Posts: 54 Credit: 27,309,613 RAC: 28,128 |
"The max number of connections to the Korean upload server has been increased from 256 to 1000. At the time Andy@CPDN made the change there were 116 active connections. IT in Korea are investigating further and I'll report back if they find anything." FWIW, no change from my end. My pending uploads are now up to 36. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,483,915 RAC: 15,324 |
Talking with Andy, the feeling is that it's a bandwidth issue to S. Korea. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
Some time ago, I said I'd try to analyse my logs to see if upload times varied with time of day. Not in any recognisable way, seems to be the answer (except for that one outlier, of course):
10-Oct-2023 12:16:32 [climateprediction.net] Started upload of wah2_eas25_a02o_198512_24_996_012223644_0_r1306186109_15.zip
10-Oct-2023 13:27:39 [climateprediction.net] Finished upload of wah2_eas25_a02o_198512_24_996_012223644_0_r1306186109_15.zip (99616522 bytes)
Over an hour, but no sign of why. |
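For anyone wanting to repeat this kind of analysis, here is a minimal Python sketch that pairs "Started upload" and "Finished upload" lines from the BOINC event log (stdoutdae.txt in the BOINC data directory) and averages the duration by the hour the upload started. It assumes the line layout shown above (timestamp, [project], message) and English month abbreviations; the function names are hypothetical, not part of BOINC. Only the final, successful attempt for each file is timed.

```python
import re
from collections import defaultdict
from datetime import datetime

# Timestamp format seen in the log lines above, e.g. "10-Oct-2023 12:16:32".
TS_FORMAT = "%d-%b-%Y %H:%M:%S"

# Assumed line layout: "<timestamp> [<project>] Started|Finished upload of <file> ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}) \[[^\]]+\] "
    r"(?P<event>Started|Finished) upload of (?P<name>\S+)"
)


def upload_durations(lines):
    """Return (start_hour, seconds, filename) for every completed upload.

    A later 'Started' line for the same file overwrites an earlier one, so
    only the final (successful) attempt is timed.
    """
    started = {}
    results = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # ignore backoff, error and unrelated lines
        ts = datetime.strptime(m["ts"], TS_FORMAT)
        name = m["name"]
        if m["event"] == "Started":
            started[name] = ts
        elif name in started:
            begin = started.pop(name)
            results.append((begin.hour, (ts - begin).total_seconds(), name))
    return results


def mean_by_hour(results):
    """Average upload duration in seconds, keyed by the hour the upload started."""
    buckets = defaultdict(list)
    for hour, seconds, _ in results:
        buckets[hour].append(seconds)
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


SAMPLE_LOG = [
    "10-Oct-2023 12:16:32 [climateprediction.net] Started upload of "
    "wah2_eas25_a02o_198512_24_996_012223644_0_r1306186109_15.zip",
    "10-Oct-2023 13:27:39 [climateprediction.net] Finished upload of "
    "wah2_eas25_a02o_198512_24_996_012223644_0_r1306186109_15.zip (99616522 bytes)",
]

if __name__ == "__main__":
    print(mean_by_hour(upload_durations(SAMPLE_LOG)))  # {12: 4267.0}
```

With a full log, a flat profile across the 24 hourly buckets would back up the "no recognisable pattern" conclusion; an hour with consistently longer averages would point at congestion.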
Joined: 24 Dec 19 Posts: 32 Credit: 41,231,271 RAC: 73,109 |
Things seem to be improving ever so slightly. I now don't have to scroll down the page to see all my tasks waiting to upload. Can someone tell me what's going on with the file at the top of the list? Progress shows 100%. It's actually more like 180%. How is that even possible? |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,483,915 RAC: 15,324 |
"Can someone tell me what's going on with the file at the top of the list? Progress shows 100%. It's actually more like 180%. How is that even possible?" I'm guessing it's packet loss and resends. Is the client counting the total packets sent? |
Joined: 5 Jun 09 Posts: 97 Credit: 3,736,855 RAC: 4,073 |
I had one like that a short time ago. After digging through the log file, it was fairly obvious that the BOINC client does get "somewhat confused" periodically and counts packets sent (but not acknowledged) as having arrived safely, so they are added to the total transmitted. In my case a subsequent retry reset the figure to zero, then to a more accurate value. |
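The explanation above (resent or unacknowledged data being counted as delivered) can be illustrated with a toy model. This is a hypothetical sketch, not BOINC's actual transfer accounting: it assumes the displayed progress is "bytes handed to the network divided by file size", so every chunk that has to be resent is counted twice.

```python
def simulate_progress(file_size, chunk_size, lost_chunks):
    """Toy model of a progress counter that includes resends.

    file_size, chunk_size: in bytes (any consistent unit works).
    lost_chunks: indices of chunks whose first transmission is lost and
    has to be sent once more.
    Returns (naive_progress, acked_progress) as fractions of file_size.
    """
    bytes_sent = 0   # everything handed to the network, resends included
    bytes_acked = 0  # what the server actually confirmed receiving
    n_chunks = -(-file_size // chunk_size)  # ceiling division
    for i in range(n_chunks):
        size = min(chunk_size, file_size - i * chunk_size)
        if i in lost_chunks:
            bytes_sent += size  # first attempt, never acknowledged
        bytes_sent += size      # the attempt that gets through
        bytes_acked += size
    return bytes_sent / file_size, bytes_acked / file_size


if __name__ == "__main__":
    # 10 chunks, the first 8 of which each need one resend.
    print(simulate_progress(100, 10, set(range(8))))  # (1.8, 1.0)
```

In this model a counter showing 180% corresponds to a file that is exactly 100% delivered, with 80% of it sent twice. A counter that reset and re-synchronised with the server, as described above, would fall back to the acknowledged figure.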
Joined: 29 May 15 Posts: 17 Credit: 717,192 RAC: 12,206 |
"While increasing the number of connections should help by reducing the number of 'first time stalls', there appears to be an issue that's leading to tasks with high retry counts. The symptom is that once a task reaches a certain number of retries it becomes increasingly probable that it will fail on its next attempt. So, thinking out loud here, is the timeout before declaring a failure too short for the 'find this zip' time?" Yeah, I've noticed this too. I have 5 files that will not upload no matter what; the percentage never changes on those. Three of them are extremely old: trickle numbers 2, 3, and 9 out of over 20. The rest have all uploaded successfully. I cannot tell if this is coincidence, like how limiting my upload speeds seemed to finally get uploads going. BTW, how does one view the retry count? |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,483,915 RAC: 15,324 |
Upload news: We've had confirmation that the security policy on the http port at the S. Korea site is blocking some connections to the upload server due to the high number of attempts. Unsurprisingly, the site does not want to open up the port, so CPDN is going to switch the upload address to the UK JASMIN site (the upload URL is just an alias and can be pointed to other machines). This should happen later today, and then it'll take a day or so for the change to propagate through the nameservers. In case anyone is wondering, the JASMIN upload server sits outside the main firewall at the site and doesn't have the same problem. This problem with S. Korea wasn't seen with earlier batches because the total volume of uploads was much smaller. So please don't abort any outstanding transfers. --- CPDN Visiting Scientist |
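How the S. Korea site's security policy is actually implemented isn't stated. Purely as an illustration, here is a per-IP sliding-window limiter of the kind such a policy might use; the class name, the attempt threshold and the window length are all assumptions. One design choice matters here: if refused attempts still count towards the limit, a client that keeps retrying can keep itself blocked, which is one way a high number of attempts could feed further blocking.

```python
from collections import defaultdict, deque


class AttemptLimiter:
    """Hypothetical per-IP limiter: refuse a connection once an address has made
    more than `max_attempts` attempts within the last `window` seconds."""

    def __init__(self, max_attempts, window):
        self.max_attempts = max_attempts
        self.window = window
        self.history = defaultdict(deque)  # ip -> timestamps of recent attempts

    def allow(self, ip, now):
        attempts = self.history[ip]
        # Forget attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()
        attempts.append(now)  # refused attempts are recorded too
        return len(attempts) <= self.max_attempts


if __name__ == "__main__":
    limiter = AttemptLimiter(max_attempts=3, window=60)
    # The fourth attempt within a minute is refused; a quiet spell clears it.
    print([limiter.allow("198.51.100.7", t) for t in (0, 10, 20, 30, 95)])
```

The addresses are from the documentation-only ranges and the numbers are arbitrary; a real firewall or DDoS filter would keep similar state in the network stack rather than in application code.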
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Great to have an explanation. Thanks Glenn. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
One of the problems is that neither the user nor the project has any control over the retry interval for a file upload. One of my hosts caught that for this current eas run:
11-Oct-2023 00:29:44 [climateprediction.net] Started upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 00:32:49 [climateprediction.net] Temporarily failed upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip: transient HTTP error
11-Oct-2023 00:32:49 [climateprediction.net] Backing off 00:02:11 on upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 00:35:01 [climateprediction.net] Started upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 00:35:02 [climateprediction.net] Error reported by file upload server: [wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip] locked by file_upload_handler PID=3911898
11-Oct-2023 00:35:02 [climateprediction.net] Temporarily failed upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip: transient upload error
11-Oct-2023 00:35:02 [climateprediction.net] Backing off 00:04:15 on upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 00:39:18 [climateprediction.net] Started upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 00:39:19 [climateprediction.net] Error reported by file upload server: [wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip] locked by file_upload_handler PID=3911898
11-Oct-2023 00:39:19 [climateprediction.net] Temporarily failed upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip: transient upload error
11-Oct-2023 00:39:19 [climateprediction.net] Backing off 00:12:24 on upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 00:52:33 [climateprediction.net] Started upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 00:52:35 [climateprediction.net] Error reported by file upload server: [wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip] locked by file_upload_handler PID=3911898
11-Oct-2023 00:52:35 [climateprediction.net] Temporarily failed upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip: transient upload error
11-Oct-2023 00:52:35 [climateprediction.net] Backing off 00:25:59 on upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 01:24:01 [climateprediction.net] Started upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 01:24:03 [climateprediction.net] Error reported by file upload server: [wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip] locked by file_upload_handler PID=3911898
11-Oct-2023 01:24:03 [climateprediction.net] Temporarily failed upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip: transient upload error
11-Oct-2023 01:24:03 [climateprediction.net] Backing off 00:51:04 on upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 02:22:09 [climateprediction.net] Started upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip
11-Oct-2023 02:23:23 [climateprediction.net] Finished upload of wah2_eas25_a02v_198512_24_996_012223651_0_r1226154871_14.zip (99189131 bytes)
The delay starts at somewhere around 2 minutes (clearly too short for Korea) and roughly doubles with each attempt; the exact values are randomised so that different files don't end up retrying in lockstep. It might be better if projects could set a 'minimum delay' figure for uploads, as they already can for scheduler contacts. But that would be a difficult change, and I can't see BOINC picking up on it in its current state. Rollout would also be slow. Incidentally, this log section gives a sort-of answer to the PID lockout question: 1 hour wasn't enough; try 2 hours. |
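The retry behaviour described above (a delay that roughly doubles each time, with random jitter) can be sketched as follows. This is not BOINC's source: the 120 s starting value, the 50% to 150% jitter range and the 4-hour cap are placeholder assumptions, not fitted to the log, and `min_delay` stands in for the hypothetical project-set minimum suggested in the post.

```python
import random


def backoff_delay(attempt, base=120.0, cap=4 * 3600.0, min_delay=0.0, rng=random):
    """Seconds to wait before retry number `attempt` (1 = first retry).

    Sketch only: base, cap and the jitter range are assumed values, and
    min_delay models a hypothetical per-project floor on the delay.
    """
    exponent = min(attempt - 1, 30)          # clamp so the power can never overflow a float
    nominal = base * 2 ** exponent           # 120 s, 240 s, 480 s, ...: doubles per attempt
    delay = nominal * (0.5 + rng.random())   # jitter: 50% to 150% of nominal, so files
                                             # that failed together drift apart
    delay = min(delay, cap)
    return max(delay, min_delay)


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the schedule is repeatable
    for n in range(1, 6):
        print(f"retry {n}: {backoff_delay(n, rng=rng) / 60:.1f} min")
    # With a hypothetical 15-minute floor, the early, too-short retries disappear.
    print([round(backoff_delay(n, min_delay=900.0, rng=rng)) for n in range(1, 6)])
```

The floor only changes the first few retries; once the doubling passes `min_delay`, the schedule is the same as before, which is why such a setting would mainly cut the burst of early attempts against a busy server.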
Joined: 24 Dec 19 Posts: 32 Credit: 41,231,271 RAC: 73,109 |
Thanks for fixing the problem or finding a workaround! All my tasks have uploaded! :) |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
The big jump in the number of users reporting tasks is, I think, evidence that switching to JASMIN has worked. CM3 short 0 1093 --- 0 |
Joined: 5 Jun 09 Posts: 97 Credit: 3,736,855 RAC: 4,073 |
Should our computers "automagically" connect to Jasmin now, or will that take some time? The reason I ask is that mine is still looking at what I believe to be the Korean server, "upload7.cpnd.org", on IP address 141.223.16.156, port 80. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
You are right, looks like it's not been changed over yet. But something has changed judging by the numbers. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"Should our computers 'automagically' connect to Jasmin now, or will that take some time?" Wouldn't this quote from Glenn explain why some may see the changeover faster than others? (italics mine) "We've had confirmation that the security policy on the http port at the S.Korea site is blocking some connections to the upload server due to the high number of attempts. Not unsurprisingly the site does not want to open up the port, so CPDN is going to switch the upload address to the UK JASMIN site (the upload URL is just an alias and can be pointed to other machines). This should happen later today and then it'll take a day or so for the change to propagate through the nameservers." |
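To check which address your own machine currently gets for the upload host, here is a small Python sketch using the standard `socket.getaddrinfo` call. It reports whatever the local resolver returns right now; while a change propagates, resolvers that cached the old record keep returning it until its TTL expires, which is why different users see the switch at different times. The old address is the one quoted earlier in the thread; the helper names are hypothetical, and you should pass the upload hostname shown in your own client's log.

```python
import socket
import sys

# Address of the S. Korea upload server, as reported earlier in this thread.
OLD_KOREAN_IP = "141.223.16.156"


def resolve_ipv4(hostname, port=80):
    """Sorted IPv4 addresses the local resolver currently returns for hostname."""
    infos = socket.getaddrinfo(hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def still_on_old_server(hostname):
    """True if this machine still resolves hostname to the old Korean address."""
    return OLD_KOREAN_IP in resolve_ipv4(hostname)


if __name__ == "__main__":
    # Usage: python check_upload_host.py <upload hostname from your BOINC log>
    # Defaults to the loopback address just so the script runs without arguments.
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    addrs = resolve_ipv4(host)
    print(host, addrs, "old server" if OLD_KOREAN_IP in addrs else "not the old server")
```

If this shows the new address but transfers still go to the old one, the BOINC client itself may be holding on to an earlier lookup; restarting it would be a reasonable next step to try.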
Joined: 29 May 15 Posts: 17 Credit: 717,192 RAC: 12,206 |
Uploads complete! Let's hope the issue has been fixed. |
©2024 cpdn.org