climateprediction.net (CPDN) home page
Thread 'Upload failures'

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 61186 - Posted: 5 Oct 2019, 4:55:02 UTC

That error about the file being locked by the upload handler is a server-side problem: a file upload was interrupted in some way, but the server process that was handling it did not terminate. When upload retries for that file occur, the server thinks the file's initial upload is still in progress, and the next try gets that error. The solution in the past was rebooting the server or restarting some processes. It seems ANZ models used to have this problem a lot with the server in Tasmania, and now the CAM models are the ones prone to it.
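
The failure mode described above can be illustrated with a toy model. This is only a sketch: it is not the project's or BOINC's actual upload handler, and every name in it (ToyUploadServer, try_upload, the file names) is hypothetical. It just shows how a handler that never exits leaves a lock behind, so retries of that one file fail while other files go through, until the server processes are restarted.

```python
# Toy model of a stale upload lock. NOT the real upload handler;
# all class, function and file names here are hypothetical.


class FileLockedError(Exception):
    """Raised when a file is still claimed by an earlier handler."""


class ToyUploadServer:
    def __init__(self):
        # filename -> id of the handler that currently holds the lock
        self.locks = {}
        self.next_handler_id = 1

    def start_upload(self, filename):
        """Begin an upload; fail if an earlier handler still holds the lock."""
        if filename in self.locks:
            raise FileLockedError(
                f"{filename} is locked by handler {self.locks[filename]}"
            )
        handler_id = self.next_handler_id
        self.next_handler_id += 1
        self.locks[filename] = handler_id
        return handler_id

    def finish_upload(self, filename, handler_id):
        """Normal completion: the handler exits and releases its lock."""
        if self.locks.get(filename) == handler_id:
            del self.locks[filename]

    def restart(self):
        """Restarting the server processes clears every stale lock."""
        self.locks.clear()


def try_upload(server, filename, interrupted=False):
    """Return 'ok', 'interrupted' or 'locked' for one upload attempt."""
    try:
        handler_id = server.start_upload(filename)
    except FileLockedError:
        return "locked"
    if interrupted:
        # The connection dropped but the handler never terminated,
        # so finish_upload() is never called and the lock stays behind.
        return "interrupted"
    server.finish_upload(filename, handler_id)
    return "ok"


server = ToyUploadServer()
results = [
    try_upload(server, "model_restart.zip", interrupted=True),
    try_upload(server, "model_restart.zip"),  # retry hits the stale lock
    try_upload(server, "model_10.zip"),       # other files are unaffected
]
server.restart()
results.append(try_upload(server, "model_restart.zip"))
print(results)
```

In this model nothing the client does can clear the lock, which matches the point above that the fix has to happen on the server.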
ID: 61186
Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 61187 - Posted: 5 Oct 2019, 7:09:21 UTC - in response to Message 61185.  

I'm away from the machine until Tuesday. Will reboot and check then.
Nothing much happening "upstairs".
Is there any improvement with the zips?
ID: 61187
Albert H.
Joined: 18 Feb 06
Posts: 73
Credit: 61,797,770
RAC: 46,121
Message 61188 - Posted: 5 Oct 2019, 8:16:48 UTC

Sorry, mine are still stuck
Albert
ID: 61188
Speedy
Joined: 20 Jul 05
Posts: 25
Credit: 414,873
RAC: 406
Message 61189 - Posted: 5 Oct 2019, 21:46:18 UTC
Last modified: 5 Oct 2019, 21:51:44 UTC

I am sorry to hear that people are getting stuck uploads. I thought the upload situation was going to be fixed before the release of new work?
Unfortunately I am not able to comment on stuck uploads, as I have not been able to get any work. I have read in another thread that there has been Windows work, but I was not one of the lucky recipients.
ID: 61189
Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 61199 - Posted: 8 Oct 2019, 6:55:16 UTC - in response to Message 61185.  

My upload fail is still the same:
08/10/2019 04:30:46 | climateprediction.net | Started upload of wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip
08/10/2019 04:31:10 | climateprediction.net | Temporarily failed upload of wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip: transient HTTP error
08/10/2019 04:31:10 | climateprediction.net | Backing off 05:13:08 on upload of wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip
08/10/2019 04:31:16 | | Project communication failed: attempting access to reference site
08/10/2019 04:31:18 | | Internet access OK - project servers may be temporarily down.

Interestingly, I am running 4 cam25 models and only this model is currently presenting the stuck upload for the _restart.zip. Other zips for this and other models are going through.
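
For anyone wanting to see at a glance which files are stuck, a pasted event log like the one above can be summarized with a few lines of Python. This is a rough sketch that assumes the three-column "time | project | message" layout and the exact "Temporarily failed upload of" and "Backing off ... on upload of" wording shown in this thread; the function name summarize_uploads is made up for the example.

```python
import re
from datetime import timedelta

# Message patterns copied from the log lines quoted in this thread.
FAILED_RE = re.compile(
    r"^Temporarily failed upload of (?P<file>\S+): (?P<reason>.+)$"
)
BACKOFF_RE = re.compile(
    r"^Backing off (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}) on upload of (?P<file>\S+)$"
)


def summarize_uploads(log_text):
    """Map each failing file name to its last failure reason and backoff."""
    summary = {}
    for line in log_text.splitlines():
        # Columns are: timestamp | project | message
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3:
            continue
        message = parts[2]
        failed = FAILED_RE.match(message)
        if failed:
            summary.setdefault(failed["file"], {})["reason"] = failed["reason"]
            continue
        backoff = BACKOFF_RE.match(message)
        if backoff:
            delay = timedelta(
                hours=int(backoff["h"]),
                minutes=int(backoff["m"]),
                seconds=int(backoff["s"]),
            )
            summary.setdefault(backoff["file"], {})["backoff"] = delay
    return summary


SAMPLE_LOG = """\
08/10/2019 04:30:46 | climateprediction.net | Started upload of wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip
08/10/2019 04:31:10 | climateprediction.net | Temporarily failed upload of wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip: transient HTTP error
08/10/2019 04:31:10 | climateprediction.net | Backing off 05:13:08 on upload of wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip
08/10/2019 04:31:16 | | Project communication failed: attempting access to reference site
08/10/2019 04:31:18 | | Internet access OK - project servers may be temporarily down.
"""

stuck = summarize_uploads(SAMPLE_LOG)
for name, info in stuck.items():
    print(name, info["reason"], info["backoff"])
```

Files that upload normally never appear in the summary, so a single entry here fits the observation that only the one _restart.zip is affected.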
ID: 61199
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 61200 - Posted: 8 Oct 2019, 7:02:55 UTC

Interestingly, I am running 4 cam25 models and only this model is currently presenting the stuck upload for the _restart.zip. Other zips for this and other models are going through.


I suspect the one stuck while others go through is the file locked by upload handler issue which requires either a script restart or a reboot of the upload handler.
ID: 61200
Albert H.
Joined: 18 Feb 06
Posts: 73
Credit: 61,797,770
RAC: 46,121
Message 61207 - Posted: 10 Oct 2019, 14:27:09 UTC

No news about failing uploads?

10/10/2019 14:19:30 | climateprediction.net | Sending scheduler request: To send trickle-up message.
10/10/2019 14:19:30 | climateprediction.net | Requesting new tasks for CPU
10/10/2019 14:19:32 | climateprediction.net | Scheduler request completed: got 0 new tasks
10/10/2019 14:19:32 | climateprediction.net | Project has no tasks available
10/10/2019 15:47:20 | climateprediction.net | Started upload of wah2_anz50_n1oq_201612_20_794_011764572_2_r1272938784_18.zip
10/10/2019 15:47:22 | | Project communication failed: attempting access to reference site
10/10/2019 15:47:22 | climateprediction.net | Temporarily failed upload of wah2_anz50_n1oq_201612_20_794_011764572_2_r1272938784_18.zip: transient HTTP error
10/10/2019 15:47:22 | climateprediction.net | Backing off 04:54:13 on upload of wah2_anz50_n1oq_201612_20_794_011764572_2_r1272938784_18.zip
10/10/2019 15:47:24 | | Internet access OK - project servers may be temporarily down.

This one PLUS 9 others have been waiting for weeks.

Thanks
ID: 61207
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 61209 - Posted: 10 Oct 2019, 19:56:48 UTC - in response to Message 61207.  

I've notified the project staff about the server possibly being down. No response yet.
ID: 61209
Kiska
Joined: 28 Mar 12
Posts: 7
Credit: 692,648
RAC: 0
Message 61231 - Posted: 15 Oct 2019, 0:45:45 UTC

Still no news?

10/15/2019 11:38:39 AM | climateprediction.net | Started upload of wah2_cam25_a0kr_200405_18_691_011370296_1_r1902152139_10.zip
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Info: Hostname upload6.cpdn.org was found in DNS cache
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Info: Trying 158.97.9.11...
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Info: Connected to upload6.cpdn.org (158.97.9.11) port 80 (#127)
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Host: upload6.cpdn.org
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.8.2)
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept: */*
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept-Encoding: deflate, gzip
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Content-Type: application/x-www-form-urlencoded
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept-Language: en_US
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Content-Length: 312
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server:
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Info: We are completely uploaded and fine
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: HTTP/1.1 200 OK
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: Date: Tue, 15 Oct 2019 00:38:40 GMT
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: Server: Apache
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: Transfer-Encoding: chunked
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: Content-Type: text/plain; charset=UTF-8
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server:
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: 64
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: <data_server_reply>
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: <status>0</status>
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: <file_size>98304000</file_size>
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: </data_server_reply>
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server:
10/15/2019 11:38:40 AM | | [http_xfer] [ID#100] HTTP: wrote 100 bytes
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Info: Connection #127 to host upload6.cpdn.org left intact
10/15/2019 11:38:41 AM | climateprediction.net | [http] HTTP_OP::libcurl_exec(): ca-bundle set
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Info: Found bundle for host upload6.cpdn.org: 0x3cb70f0 [can pipeline]
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Info: Re-using existing connection! (#127) with host upload6.cpdn.org
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Info: Connected to upload6.cpdn.org (158.97.9.11) port 80 (#127)
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Host: upload6.cpdn.org
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.8.2)
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept: */*
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept-Encoding: deflate, gzip
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Content-Type: application/x-www-form-urlencoded
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept-Language: en_US
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Content-Length: 11750518
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Expect: 100-continue
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server:
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Received header from server: HTTP/1.1 100 Continue
10/15/2019 11:39:01 AM | climateprediction.net | [http] [ID#100] Info: Recv failure: Connection was reset
10/15/2019 11:39:01 AM | climateprediction.net | [http] [ID#100] Info: Closing connection 127
10/15/2019 11:39:01 AM | climateprediction.net | [http] HTTP error: Failure when receiving data from the peer
10/15/2019 11:39:02 AM | climateprediction.net | Temporarily failed upload of wah2_cam25_a0kr_200405_18_691_011370296_1_r1902152139_10.zip: transient HTTP error
10/15/2019 11:39:02 AM | climateprediction.net | Backing off 03:59:20 on upload of wah2_cam25_a0kr_200405_18_691_011370296_1_r1902152139_10.zip
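
One reading of the debug log above, offered as an assumption rather than a confirmed description of the protocol: the first small POST (Content-Length 312) looks like a query, the server's <data_server_reply> says it already holds 98304000 bytes of the file with status 0, and the second POST (Content-Length 11750518) is the client trying to send more data before the connection is reset. The sketch below only parses the reply body exactly as it appears in the log; the function name parse_reply is made up.

```python
import xml.etree.ElementTree as ET

# Reply body as it appears in the log above.
REPLY = """<data_server_reply>
<status>0</status>
<file_size>98304000</file_size>
</data_server_reply>"""


def parse_reply(xml_text):
    """Return (status, file_size) from a <data_server_reply> body."""
    root = ET.fromstring(xml_text)
    status = int(root.findtext("status"))
    size_text = root.findtext("file_size")
    # file_size may be missing in other replies, so allow None.
    file_size = int(size_text) if size_text is not None else None
    return status, file_size


status, bytes_on_server = parse_reply(REPLY)
print(f"status={status}, server reports {bytes_on_server} bytes")
```

If that reading is right, the server answered the first request normally and the failure happened only once the larger data transfer started, about 20 seconds later according to the timestamps.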
ID: 61231
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61232 - Posted: 15 Oct 2019, 5:41:16 UTC

:(
I've just sent another email.
Keep breathing; it may take a while. :)
ID: 61232
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 0
Message 61237 - Posted: 16 Oct 2019, 15:56:39 UTC

These have been stuck for weeks. Should I delete/cancel them?

10/16/2019 11:52:51 AM | climateprediction.net | Started upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_16.zip
10/16/2019 11:52:51 AM | climateprediction.net | Started upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_17.zip
10/16/2019 11:52:53 AM | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_rwah0/file_upload_handler.log' (errno: 9)
10/16/2019 11:52:53 AM | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_rwah0/file_upload_handler.log' (errno: 9)
10/16/2019 11:52:53 AM | climateprediction.net | Temporarily failed upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_16.zip: transient upload error
10/16/2019 11:52:53 AM | climateprediction.net | Backing off 05:39:51 on upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_16.zip
10/16/2019 11:52:53 AM | climateprediction.net | Temporarily failed upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_17.zip: transient upload error
10/16/2019 11:52:53 AM | climateprediction.net | Backing off 05:08:40 on upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_17.zip
10/16/2019 11:52:54 AM | climateprediction.net | Started upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_18.zip
10/16/2019 11:52:54 AM | climateprediction.net | Started upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_19.zip
10/16/2019 11:52:56 AM | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_rwah0/file_upload_handler.log' (errno: 9)
10/16/2019 11:52:56 AM | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_rwah0/file_upload_handler.log' (errno: 9)
10/16/2019 11:52:56 AM | climateprediction.net | Temporarily failed upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_18.zip: transient upload error
10/16/2019 11:52:56 AM | climateprediction.net | Backing off 04:48:58 on upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_18.zip
10/16/2019 11:52:56 AM | climateprediction.net | Temporarily failed upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_19.zip: transient upload error
10/16/2019 11:52:56 AM | climateprediction.net | Backing off 04:25:27 on upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_19.zip
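
The "(errno: 9)" in the server's message can be looked up with Python's standard errno module. This assumes the upload server uses the usual POSIX error numbering, which the thread does not confirm.

```python
import errno
import os

code = 9  # value from "(errno: 9)" in the server's error message
name = errno.errorcode.get(code, "unknown")
print(name, "-", os.strerror(code))
```

Under that assumption the code is EBADF (bad file descriptor), which suggests the handler could not use its own log file on the server. That would be a server-side fault, not something a client setting can fix.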
ID: 61237
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61238 - Posted: 16 Oct 2019, 20:32:44 UTC - in response to Message 61237.  
Last modified: 16 Oct 2019, 20:34:51 UTC

I guess you may as well abort the ANZ zips.
The Oxford people are busy elsewhere.
ID: 61238
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61239 - Posted: 16 Oct 2019, 20:34:06 UTC - in response to Message 61231.  

You may as well abort the CAM25 zips.
ID: 61239
archeye
Joined: 3 Jul 19
Posts: 3
Credit: 50,058
RAC: 0
Message 61272 - Posted: 18 Oct 2019, 20:24:22 UTC - in response to Message 61237.  

I have only recently started supporting this project again after a long gap.

I noticed that most of the trickles did upload OK, but I had 2 zip files that would not, for some reason to do with the project server.

I have now 5 wah2 tasks.

I will let these run but have set the project to "no new work" in boinc manager.

It does not seem worthwhile to run tasks when the upload does not seem to work reliably.

From what I read in other places of the forum the communication (internet) is not so reliable for the project and this is the main reason uploads do not complete.

The boinc community activity for this project in running the tasks should be supported with a reliable means of uploading the results.

Maybe one solution is to set up one or more virtual project servers in a good internet coverage location to manage all the distribution and uploads of tasks.

Then the project team could just communicate with this remote location whenever they needed to update the tasks to be sent out and download the results.

Just trying to propose a possible way forward :)
ID: 61272
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 61274 - Posted: 18 Oct 2019, 21:23:53 UTC - in response to Message 61272.  

The hung upload files have mainly been for the WAH2 ANZ and CAM25 regions. It used to be the servers for these regions were remote and remotely administered in the countries where the research projects were being proposed and worked on. I'm not sure anymore about that, but it would make sense as to why those two regions have the problems and the others don't.
ID: 61274
Speedy
Joined: 20 Jul 05
Posts: 25
Credit: 414,873
RAC: 406
Message 61280 - Posted: 19 Oct 2019, 20:38:17 UTC
Last modified: 19 Oct 2019, 20:41:00 UTC

I noticed this morning after turning my computer on that there is a Weather At Home 2 task that failed, saying the following files were absent. I gather these are from batch #845:
    20/10/2019 8:38:16 AM | climateprediction.net | Computation for task wah2_global_c0ey_198812_13_845_011911553_0 finished
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_2.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_3.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_4.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_5.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_6.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_7.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_8.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_9.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_10.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_11.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_12.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_13.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_restart.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
    20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_out.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent


Another task has been created and sent out, so it will be interesting to see whether or not this one fails too.
ID: 61280
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 61281 - Posted: 19 Oct 2019, 21:38:57 UTC - in response to Message 61280.  

@Speedy The error message in stderr on the task page says "The system cannot find the drive specified.". This is an error that crops up occasionally. No one knows the cause. It's not typically reproduced in the other tasks in the work unit. It may be some kind of timing issue when the model tries to write to, or read from the disk.

The errors listed in your post are just because the model crashed before those monthly upload files were created. It was expecting to upload them and they were never generated. The listing is unfortunately not useful for finding the cause of the crash.
ID: 61281
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61285 - Posted: 20 Oct 2019, 0:30:14 UTC

Climate models have lots of files open, which all need saving at checkpoints.
With your computer having so many processors, it will need a VERY fast HD to keep up with all that saving when it occurs at the same time.
ID: 61285
Speedy
Joined: 20 Jul 05
Posts: 25
Credit: 414,873
RAC: 406
Message 61286 - Posted: 20 Oct 2019, 0:53:50 UTC - in response to Message 61285.  

Climate models have lots of files open, which all need saving at checkpoints.
With your computer having so many processors, it will need a VERY fast HD to keep up with all that saving when it occurs at the same time.

Thank you for pointing this out. I have cut it down to working on three tasks at a time. I'm not sure, but maybe when I turned my machine off last night it was trying to upload a trickle message.
@Speedy The error message in stderr on the task page says "The system cannot find the drive specified.". This is an error that crops up occasionally. No one knows the cause. It's not typically reproduced in the other tasks in the work unit. It may be some kind of timing issue when the model tries to write to, or read from the disk.

The errors listed in your post are just because the model crashed before those monthly upload files were created. It was expecting to upload them and they were never generated. The listing is unfortunately not useful for finding the cause of the crash.

Thank you for explaining the error message; it makes complete sense. Hopefully I will be able to complete other tasks without them crashing.
ID: 61286
Ivorget
Joined: 23 Feb 05
Posts: 7
Credit: 1,423,261
RAC: 213
Message 61530 - Posted: 14 Nov 2019, 5:19:58 UTC - in response to Message 61239.  

I've had one cam25 zip transfer failing with 'transient HTTP error' since about Oct 26:
https://www.cpdn.org/result.php?resultid=21743279

Can it be fixed or is the advice still to abort?
ID: 61530