Upload issue.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1058
Credit: 36,583,114
RAC: 15,886
Message 69760 - Posted: 11 Oct 2023, 11:33:44 UTC

A few more thoughts on that log. Two more lines -

10/10/2023 8:09:59 AM | climateprediction.net | [http] [ID#90] Received header from server: <file_size>1354684</file_size>
10/10/2023 8:10:01 AM | climateprediction.net | [http] [ID#90] Sent header to server: Content-Length: 98722343
confirm - as if we didn't know already - that this file had been tried before: it got through, and some data was transferred (1.3 MB), but that's a tiny fraction of the full file.
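The "tiny fraction" can be put into numbers straight from those two lines. A minimal sketch, assuming the `<file_size>` value is what the server already holds and that `Content-Length` is roughly the number of bytes still to be sent (it probably also includes a small request header, so the result is approximate); the function name is made up for illustration:

```python
import re

LOG_LINES = [
    "10/10/2023 8:09:59 AM | climateprediction.net | [http] [ID#90] Received header from server: <file_size>1354684</file_size>",
    "10/10/2023 8:10:01 AM | climateprediction.net | [http] [ID#90] Sent header to server: Content-Length: 98722343",
]

def upload_progress(lines):
    """Return (bytes_on_server, bytes_remaining, fraction_done) from [http] debug lines."""
    on_server = remaining = None
    for line in lines:
        m = re.search(r"<file_size>(\d+)</file_size>", line)
        if m:
            on_server = int(m.group(1))
        m = re.search(r"Content-Length:\s*(\d+)", line)
        if m:
            remaining = int(m.group(1))
    if on_server is None or remaining is None:
        raise ValueError("need both a <file_size> line and a Content-Length line")
    # Assumption: total size is what the server has plus what is still to be sent.
    return on_server, remaining, on_server / (on_server + remaining)

on_server, remaining, fraction = upload_progress(LOG_LINES)
print(f"{on_server / 1e6:.2f} MB on server, {remaining / 1e6:.2f} MB to send, {fraction:.1%} done")
```

On these figures the server holds a little over 1% of the file, which fits the "never past 1.35 MB" reports.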

From the segment I quoted yesterday, the connection stayed open for 32 seconds from '100 Continue' until it was reset. That's not long enough to do much at all: I mentioned elsewhere that I'm logging my transfer times, on a fast fibre connection. The quickest so far has been 63 seconds, but I had one yesterday which took an hour and 11 minutes! It started at 11:16 UTC on Tuesday, and finished at 12:27 UTC. I wonder what the Koreans are doing on a Tuesday, while we have lunch?
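To get a feel for the spread between those two transfers, here is a rough sketch of the arithmetic. The file size is an assumption: it is borrowed from the Content-Length line quoted above (about 98.7 MB), not measured for these particular uploads, so the rates are only indicative.

```python
from datetime import datetime

# Assumption: size taken from the Content-Length line in the quoted log,
# not from the transfers being timed here.
ASSUMED_FILE_BYTES = 98_722_343

def elapsed_seconds(start_hhmm, end_hhmm):
    """Seconds between two same-day HH:MM times."""
    fmt = "%H:%M"
    delta = datetime.strptime(end_hhmm, fmt) - datetime.strptime(start_hhmm, fmt)
    return delta.total_seconds()

fast = 63.0                               # quickest transfer reported
slow = elapsed_seconds("11:16", "12:27")  # the hour-and-11-minutes transfer

for label, secs in (("fastest", fast), ("slowest", slow)):
    rate_kb = ASSUMED_FILE_BYTES / secs / 1000
    print(f"{label}: {secs:.0f} s, roughly {rate_kb:.0f} kB/s")
```

Under that size assumption the slow transfer ran at a few tens of kB/s, well below what a fibre line can carry, which points at the far end or the route rather than the sender.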
ID: 69760
Glenn Carver
Joined: 29 Oct 17
Posts: 1044
Credit: 16,196,312
RAC: 12,647
Message 69764 - Posted: 11 Oct 2023, 13:25:10 UTC - in response to Message 69738.  
Last modified: 11 Oct 2023, 13:48:18 UTC

That's the same sequence as we saw from this server, before it was stripped down and rebuilt.
To clarify, it was the machine's OS which was updated to the latest version, and then the latest BOINC software was installed.

Given the traffic is going to Korea and it's a University site, I suspect they might be a little overwhelmed at times with data coming in, especially during their working day. My transfers do eventually get through (the highest retry count I've had so far is 12).

For 'stuck' files, what is the retry number people are getting?
ID: 69764
Grackle
Joined: 21 Oct 18
Posts: 6
Credit: 7,415,884
RAC: 15,672
Message 69784 - Posted: 12 Oct 2023, 16:24:20 UTC - in response to Message 69764.  

Currently, for the 10/12/2023 11:23:01 AM | | [http] [ID#0] Sent header to server: <name>wah2_eas25_a0v2_199012_24_996_012224666_0_r957404909_3.zip</name>
upload, I'm at 22 retries without ever getting past 1.35MB.
ID: 69784
zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 69823 - Posted: 13 Oct 2023, 15:18:11 UTC - in response to Message 69764.  

For 'stuck' files, what is the retry number people are getting?

I have 15 stuck uploads. The retries range from 7 to 42 at this time.
ID: 69823
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 69828 - Posted: 13 Oct 2023, 15:37:18 UTC - in response to Message 69823.  

If you have a lot of stuck uploads on the same task, it is worth pausing the task till some have cleared. See this message in the batch thread and the few following messages. A user has reported a task crashing because the rsc_disk_bound figure was exceeded.
ID: 69828
Tomcat
Joined: 29 May 15
Posts: 17
Credit: 717,192
RAC: 12,206
Message 69854 - Posted: 14 Oct 2023, 12:23:44 UTC
Last modified: 14 Oct 2023, 12:34:19 UTC

I was able to fix my upload issue by limiting the upload speed to under 150 kbps and waiting for the retry timer to tick down. Manually clicking the retry timer doesn't seem to help; it just gets the uploads stuck again. Currently I have 16 stuck uploads, a marked improvement over 72.
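For anyone who would rather set the same cap in a file than in BOINC Manager, here is a minimal sketch that builds a `global_prefs_override.xml` body using the `<max_bytes_sec_up>` preference. Two assumptions are baked in: that "150 kbps" means the KB/second figure shown in the Manager, and that one KB is 1024 bytes. The function name is hypothetical, and the output should be checked against your own client before relying on it.

```python
def upload_limit_override(kb_per_sec):
    """Build a global_prefs_override.xml body that caps the upload rate.

    Assumption: the limit is given in KB/second with 1 KB = 1024 bytes;
    the client preference itself is expressed in bytes per second.
    """
    bytes_per_sec = int(kb_per_sec * 1024)
    return (
        "<global_preferences>\n"
        f"   <max_bytes_sec_up>{bytes_per_sec}</max_bytes_sec_up>\n"
        "</global_preferences>\n"
    )

print(upload_limit_override(150))
```

The printed text would go into `global_prefs_override.xml` in the BOINC data directory, and the client should pick it up on restart. A file written this way replaces any existing override file, so merge by hand if other local settings are already in there.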

Before limiting my speed, NONE of my uploads worked at all; close to a hundred uploads were stuck. That caused me to lose a task to error 196 (0x000000C4) EXIT_DISK_LIMIT_EXCEEDED. Yes, that was me.

Like Dave said, suspend the tasks if you cannot get the uploads working properly. Mine failed at above 90%.

On a side note, I think this is what happens if you hit retry too many times...

2023-10-14 8:32:40 AM | climateprediction.net | Started upload of wah2_eas25_a2c6_200012_24_996_012226578_1_r43387403_18.zip
2023-10-14 8:32:42 AM | climateprediction.net | [error] Error reported by file upload server: [wah2_eas25_a2c6_200012_24_996_012226578_1_r43387403_18.zip] locked by file_upload_handler PID=52970
2023-10-14 8:32:42 AM | climateprediction.net | Temporarily failed upload of wah2_eas25_a2c6_200012_24_996_012226578_1_r43387403_18.zip: transient upload error
ID: 69854
Klimax
Joined: 16 Mar 20
Posts: 2
Credit: 346,439
RAC: 14,268
Message 69894 - Posted: 16 Oct 2023, 10:21:55 UTC

Looks like I am lucky. Only two files are so far failing to upload:
wah2_eas25_a0mw_198912_24_996_012224372_2_r774578916_1.zip
wah2_eas25_a07r_198612_24_996_012223827_0_r1847752070_1.zip
ID: 69894
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 69895 - Posted: 16 Oct 2023, 11:16:43 UTC

The scientists in Korea have asked for the maximum number of simultaneous connections allowed on the server to be increased. Hopefully that will sort out the problems.
ID: 69895
Klimax
Joined: 16 Mar 20
Posts: 2
Credit: 346,439
RAC: 14,268
Message 69896 - Posted: 16 Oct 2023, 15:21:50 UTC - in response to Message 69895.  

Just tried and no change. At 21,12MB/94,24MB and 44,44MB/94,32MB it still fails. (On some tries it might get a bit further, on others not.)
ID: 69896
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 69897 - Posted: 16 Oct 2023, 15:33:39 UTC - in response to Message 69896.  
Last modified: 16 Oct 2023, 15:36:02 UTC

Just tried and no change. At 21,12MB/94,24MB and 44,44MB/94,32MB it still fails. (On some tries it might get a bit further, on others not.)
I have no idea whether they have made any changes yet or not, just that it has been requested. It is now coming up to midnight in Seoul, which may well mean any changes, if agreed, won't happen till tomorrow at the earliest.
ID: 69897
Glenn Carver
Joined: 29 Oct 17
Posts: 1044
Credit: 16,196,312
RAC: 12,647
Message 69914 - Posted: 17 Oct 2023, 11:44:58 UTC - in response to Message 69897.  

Just tried and no change. At 21,12MB/94,24MB and 44,44MB/94,32MB it still fails. (On some tries it might get a bit further, on others not.)
I have no idea whether they have made any changes yet or not, just that it has been requested. It is now coming up to midnight in Seoul, which may well mean any changes, if agreed, won't happen till tomorrow at the earliest.
I reported in the main batch thread that the limit on the max uploads has been increased from 256 to 1000. However, when Andy made the change the number of active uploads was 116, so less than half the previous max.
ID: 69914
Grackle
Joined: 21 Oct 18
Posts: 6
Credit: 7,415,884
RAC: 15,672
Message 69919 - Posted: 17 Oct 2023, 15:10:21 UTC

Still having the same issues, FWIW, on

wah2_eas25_a1dp_199312_24_996_012225337_0_r687377745_7.zip
&
wah2_eas25_a0v2_199012_24_996_012224666_0_r957404909_3.zip
ID: 69919
Karen
Joined: 12 Jul 19
Posts: 9
Credit: 363,587
RAC: 536
Message 69926 - Posted: 17 Oct 2023, 19:19:07 UTC

I'm also having difficulties with one particular "restart" file, while two other restarts and three 12.zip and 13.zip files have gone through - are the restarts being sent to a different server?

17/10/2023 13:33:42 | climateprediction.net | Started upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_restart.zip
17/10/2023 13:34:11 | | Project communication failed: attempting access to reference site
17/10/2023 13:34:11 | climateprediction.net | Temporarily failed upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_restart.zip: transient HTTP error
17/10/2023 13:34:11 | climateprediction.net | Backing off 05:47:31 on upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_restart.zip
17/10/2023 13:34:14 | | Internet access OK - project servers may be temporarily down.
17/10/2023 17:06:47 | climateprediction.net | Sending scheduler request: To send trickle-up message.
17/10/2023 17:06:47 | climateprediction.net | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: )
17/10/2023 17:06:48 | climateprediction.net | Scheduler request completed
17/10/2023 17:06:48 | climateprediction.net | Project requested delay of 3636 seconds
17/10/2023 17:06:54 | climateprediction.net | Started upload of wah2_eas25_a38u_200612_24_996_012227754_0_r1614428011_13.zip
17/10/2023 17:09:08 | climateprediction.net | Finished upload of wah2_eas25_a38u_200612_24_996_012227754_0_r1614428011_13.zip
17/10/2023 18:07:17 | climateprediction.net | Started upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_13.zip
17/10/2023 18:07:29 | climateprediction.net | Sending scheduler request: To send trickle-up message.
17/10/2023 18:07:29 | climateprediction.net | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: )
17/10/2023 18:07:30 | climateprediction.net | Scheduler request completed
17/10/2023 18:07:30 | climateprediction.net | Project requested delay of 3636 seconds
17/10/2023 18:09:57 | climateprediction.net | Finished upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_13.zip
17/10/2023 18:36:19 | climateprediction.net | Started upload of wah2_eas25_a4ae_201212_24_996_012229106_0_r1829487277_13.zip
17/10/2023 18:38:33 | climateprediction.net | Finished upload of wah2_eas25_a4ae_201212_24_996_012229106_0_r1829487277_13.zip
17/10/2023 19:08:11 | climateprediction.net | Sending scheduler request: To send trickle-up message.
17/10/2023 19:08:11 | climateprediction.net | Not requesting tasks: don't need (CPU: job cache full; AMD/ATI GPU: )
17/10/2023 19:08:12 | climateprediction.net | Scheduler request completed
17/10/2023 19:08:12 | climateprediction.net | Project requested delay of 3636 seconds
17/10/2023 19:08:43 | climateprediction.net | Started upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_restart.zip
17/10/2023 19:09:10 | | Project communication failed: attempting access to reference site
17/10/2023 19:09:10 | climateprediction.net | Temporarily failed upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_restart.zip: transient HTTP error
17/10/2023 19:09:10 | climateprediction.net | Backing off 05:15:58 on upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_restart.zip
17/10/2023 19:09:13 | | Internet access OK - project servers may be temporarily down.
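A log like this is easier to read once each "Started upload" is paired with its outcome. A minimal sketch, assuming the event log format shown above (day/month/year timestamps, fields separated by " | "); the helper name `upload_attempts` is made up for illustration, and only the upload lines from the log are included:

```python
import re
from datetime import datetime

LOG = """\
17/10/2023 13:33:42 | climateprediction.net | Started upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_restart.zip
17/10/2023 13:34:11 | climateprediction.net | Temporarily failed upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_restart.zip: transient HTTP error
17/10/2023 17:06:54 | climateprediction.net | Started upload of wah2_eas25_a38u_200612_24_996_012227754_0_r1614428011_13.zip
17/10/2023 17:09:08 | climateprediction.net | Finished upload of wah2_eas25_a38u_200612_24_996_012227754_0_r1614428011_13.zip
17/10/2023 18:07:17 | climateprediction.net | Started upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_13.zip
17/10/2023 18:09:57 | climateprediction.net | Finished upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_13.zip
17/10/2023 18:36:19 | climateprediction.net | Started upload of wah2_eas25_a4ae_201212_24_996_012229106_0_r1829487277_13.zip
17/10/2023 18:38:33 | climateprediction.net | Finished upload of wah2_eas25_a4ae_201212_24_996_012229106_0_r1829487277_13.zip
17/10/2023 19:08:43 | climateprediction.net | Started upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_restart.zip
17/10/2023 19:09:10 | climateprediction.net | Temporarily failed upload of wah2_eas25_a3bh_200612_24_996_012227849_0_r628545001_restart.zip: transient HTTP error
"""

PATTERN = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| [^|]+ \| "
    r"(Started|Finished|Temporarily failed) upload of (\S+?\.zip)"
)

def upload_attempts(log_text):
    """Pair each 'Started upload' with its outcome: (file, status, seconds)."""
    started = {}
    attempts = []
    for line in log_text.splitlines():
        m = PATTERN.match(line)
        if not m:
            continue  # scheduler requests, back-off notices, etc.
        when = datetime.strptime(m.group(1), "%d/%m/%Y %H:%M:%S")
        event, name = m.group(2), m.group(3)
        if event == "Started":
            started[name] = when
        elif name in started:
            secs = (when - started.pop(name)).total_seconds()
            attempts.append((name, "ok" if event == "Finished" else "failed", secs))
    return attempts

for name, status, secs in upload_attempts(LOG):
    print(f"{status:6} {secs:5.0f} s  {name}")
```

On this log the three successful uploads take a little over two minutes each, while both failed attempts on the restart file end after just under half a minute. That is suggestive of a connection being cut early rather than of a slow transfer, though two samples prove nothing.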
ID: 69926
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 69927 - Posted: 17 Oct 2023, 19:23:07 UTC

I'm also having difficulties with one particular "restart" file, while two other restarts and three 12.zip and 13.zip files have gone through - are the restarts being sent to a different server?
Restart.zips are going to upload7 in Korea, the same as the other zips. I am still struggling to understand why some of us have no issues at all with zips of any sort. (Well, four or five needed more than one attempt, but all have gone through for the seven tasks I have completed, and the two that are just over 3/4 completed have had everything go through.)
ID: 69927
Glenn Carver
Joined: 29 Oct 17
Posts: 1044
Credit: 16,196,312
RAC: 12,647
Message 69932 - Posted: 18 Oct 2023, 11:33:22 UTC

Please see this post for update on uploads: https://www.cpdn.org/forum_thread.php?id=9222&postid=69931
ID: 69932
adrianxw
Joined: 31 Aug 04
Posts: 145
Credit: 2,080,724
RAC: 753
Message 70591 - Posted: 3 Mar 2024, 22:20:12 UTC

I am leaving the project, I see 9 - 10 days of CPU time spent on tasks that are being scrapped as "not needed". The approach here is wrong, those 9 - 10 days of CPU time, (4GHz i7), would have been USED by other projects for genuine work. I say wrong, quite frankly it stinks.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
ID: 70591
Glenn Carver
Joined: 29 Oct 17
Posts: 1044
Credit: 16,196,312
RAC: 12,647
Message 70592 - Posted: 3 Mar 2024, 22:41:59 UTC - in response to Message 70591.  

Those tasks were aborted as soon as we realised there was a problem and the results would have been unusable.
Would you rather we let you carry on computing those tasks until they finish, wasting time that could be put to better use?
It was the right approach. And CPDN very rarely abort tasks.
---
CPDN Visiting Scientist
ID: 70592