Message boards : Number crunching : Upload failures
Author | Message |
---|---|
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
That "file locked by upload handler" error is a server-side problem: a file upload was interrupted in some way, but the server process handling it did not terminate. When the upload of that file is retried, the server thinks the initial upload is still in progress, and the retry gets that error. The solution in the past was rebooting the server or restarting some processes. It seems the ANZ models used to have this problem a lot with the server in Tasmania, and now the CAM models are the ones most prone to it. |
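The failure mode described in the post above can be illustrated with a small simulation. This is a sketch only and not the upload handler's actual code: the `UploadServer` class, its `locks` dictionary, and the handler IDs are hypothetical names invented for illustration.

```python
class UploadServer:
    """Toy model of a server that locks a file while one handler writes it."""

    def __init__(self):
        self.locks = {}  # file name -> id of the handler holding the lock

    def start_upload(self, filename, handler_id):
        holder = self.locks.get(filename)
        if holder is not None:
            # A previous handler still owns the file, so the retry is refused.
            return f"error: {filename} locked by handler {holder}"
        self.locks[filename] = handler_id
        return "ok"

    def finish_upload(self, filename, handler_id):
        # A handler that exits normally releases its lock.
        if self.locks.get(filename) == handler_id:
            del self.locks[filename]

    def restart(self):
        # Restarting the server processes clears every stale lock.
        self.locks.clear()


server = UploadServer()
# Handler 1 starts the upload, then the connection drops and it never finishes.
first = server.start_upload("restart.zip", handler_id=1)
# The client's retry is refused because the lock was never released.
retry = server.start_upload("restart.zip", handler_id=2)
server.restart()
after_restart = server.start_upload("restart.zip", handler_id=3)
print(first, "|", retry, "|", after_restart)
```

In this toy model nothing the client does can clear the lock, which matches the observation that only a server-side restart resolves the stuck file.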
Send message Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
I'm away from the machine until Tuesday. Will reboot and check then. Nothing much happening "upstairs". |
Send message Joined: 18 Feb 06 Posts: 73 Credit: 61,633,546 RAC: 47,710 |
Sorry, mine are still stuck. Albert |
Send message Joined: 20 Jul 05 Posts: 25 Credit: 414,873 RAC: 406 |
I am sorry to hear that people are getting stuck uploads. I thought the upload situation was going to be fixed before the release of new work? Unfortunately I am not able to comment on the state of uploads, as I have not been able to get any work. I have read in another thread that there has been Windows work, but I was not one of the lucky recipients. |
Send message Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
My upload fail is still the same:
08/10/2019 04:30:46 | climateprediction.net | Started upload of wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip
08/10/2019 04:31:10 | climateprediction.net | Temporarily failed upload of wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip: transient HTTP error
08/10/2019 04:31:10 | climateprediction.net | Backing off 05:13:08 on upload of wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip
08/10/2019 04:31:16 | | Project communication failed: attempting access to reference site
08/10/2019 04:31:18 | | Internet access OK - project servers may be temporarily down.
Interestingly, I am running 4 cam25 models and only this model is currently presenting the stuck upload for the _restart.zip. Other zips for this and other models are going through. |
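For anyone tracking many stuck files, the retry delays can be pulled out of the event log with a few lines of Python. This is a minimal sketch that assumes the "Backing off HH:MM:SS on upload of <file>" wording seen in the log above; the `parse_backoffs` function is a hypothetical helper, not part of BOINC.

```python
import re
from datetime import timedelta

# Matches e.g. "Backing off 05:13:08 on upload of some_file.zip"
BACKOFF = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")


def parse_backoffs(log_text):
    """Return {file name: timedelta} for every 'Backing off' entry found."""
    result = {}
    for match in BACKOFF.finditer(log_text):
        hours, minutes, seconds, filename = match.groups()
        result[filename] = timedelta(
            hours=int(hours), minutes=int(minutes), seconds=int(seconds)
        )
    return result


sample = (
    "08/10/2019 04:31:10 | climateprediction.net | Temporarily failed upload of "
    "wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip: transient HTTP error\n"
    "08/10/2019 04:31:10 | climateprediction.net | Backing off 05:13:08 on upload of "
    "wah2_cam25_a0js_200405_18_832_011891079_0_r168521463_restart.zip\n"
)
backoffs = parse_backoffs(sample)
for name, delay in backoffs.items():
    print(name, "retries in", delay)
```

If a file appears more than once in the log, the last backoff seen wins, which is usually the one still in effect.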
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
Interestingly, I am running 4 cam25 models and only this model is currently presenting the stuck upload for the _restart.zip. Other zips for this and other models are going through.
I suspect the one stuck while others go through is the file locked by upload handler issue which requires either a script restart or a reboot of the upload handler. |
Send message Joined: 18 Feb 06 Posts: 73 Credit: 61,633,546 RAC: 47,710 |
No news about failing uploads?
10/10/2019 14:19:30 | climateprediction.net | Sending scheduler request: To send trickle-up message.
10/10/2019 14:19:30 | climateprediction.net | Requesting new tasks for CPU
10/10/2019 14:19:32 | climateprediction.net | Scheduler request completed: got 0 new tasks
10/10/2019 14:19:32 | climateprediction.net | Project has no tasks available
10/10/2019 15:47:20 | climateprediction.net | Started upload of wah2_anz50_n1oq_201612_20_794_011764572_2_r1272938784_18.zip
10/10/2019 15:47:22 | | Project communication failed: attempting access to reference site
10/10/2019 15:47:22 | climateprediction.net | Temporarily failed upload of wah2_anz50_n1oq_201612_20_794_011764572_2_r1272938784_18.zip: transient HTTP error
10/10/2019 15:47:22 | climateprediction.net | Backing off 04:54:13 on upload of wah2_anz50_n1oq_201612_20_794_011764572_2_r1272938784_18.zip
10/10/2019 15:47:24 | | Internet access OK - project servers may be temporarily down.
This one plus 9 others have been waiting for weeks. Thanks |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
I've notified the project staff about the server possibly being down. No response yet. |
Send message Joined: 28 Mar 12 Posts: 7 Credit: 692,648 RAC: 0 |
Still no news?
10/15/2019 11:38:39 AM | climateprediction.net | Started upload of wah2_cam25_a0kr_200405_18_691_011370296_1_r1902152139_10.zip
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Info: Hostname upload6.cpdn.org was found in DNS cache
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Info: Trying 158.97.9.11...
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Info: Connected to upload6.cpdn.org (158.97.9.11) port 80 (#127)
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Host: upload6.cpdn.org
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.8.2)
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept: */*
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept-Encoding: deflate, gzip
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Content-Type: application/x-www-form-urlencoded
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept-Language: en_US
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server: Content-Length: 312
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Sent header to server:
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Info: We are completely uploaded and fine
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: HTTP/1.1 200 OK
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: Date: Tue, 15 Oct 2019 00:38:40 GMT
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: Server: Apache
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: Transfer-Encoding: chunked
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: Content-Type: text/plain; charset=UTF-8
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server:
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: 64
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: <data_server_reply>
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: <status>0</status>
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: <file_size>98304000</file_size>
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server: </data_server_reply>
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Received header from server:
10/15/2019 11:38:40 AM | | [http_xfer] [ID#100] HTTP: wrote 100 bytes
10/15/2019 11:38:40 AM | climateprediction.net | [http] [ID#100] Info: Connection #127 to host upload6.cpdn.org left intact
10/15/2019 11:38:41 AM | climateprediction.net | [http] HTTP_OP::libcurl_exec(): ca-bundle set
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Info: Found bundle for host upload6.cpdn.org: 0x3cb70f0 [can pipeline]
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Info: Re-using existing connection! (#127) with host upload6.cpdn.org
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Info: Connected to upload6.cpdn.org (158.97.9.11) port 80 (#127)
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: POST /cgi-bin/file_upload_handler HTTP/1.1
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Host: upload6.cpdn.org
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.8.2)
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept: */*
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept-Encoding: deflate, gzip
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Content-Type: application/x-www-form-urlencoded
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Accept-Language: en_US
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Content-Length: 11750518
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server: Expect: 100-continue
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Sent header to server:
10/15/2019 11:38:41 AM | climateprediction.net | [http] [ID#100] Received header from server: HTTP/1.1 100 Continue
10/15/2019 11:39:01 AM | climateprediction.net | [http] [ID#100] Info: Recv failure: Connection was reset
10/15/2019 11:39:01 AM | climateprediction.net | [http] [ID#100] Info: Closing connection 127
10/15/2019 11:39:01 AM | climateprediction.net | [http] HTTP error: Failure when receiving data from the peer
10/15/2019 11:39:02 AM | climateprediction.net | Temporarily failed upload of wah2_cam25_a0kr_200405_18_691_011370296_1_r1902152139_10.zip: transient HTTP error
10/15/2019 11:39:02 AM | climateprediction.net | Backing off 03:59:20 on upload of wah2_cam25_a0kr_200405_18_691_011370296_1_r1902152139_10.zip |
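The trace above shows two POST requests to /cgi-bin/file_upload_handler on the same connection. The first (Content-Length: 312) returns HTTP 200 with a data_server_reply containing status 0 and a file_size of 98304000; the second (Content-Length: 11750518) receives 100 Continue and then fails about 20 seconds later with a connection reset. A plausible reading, which is an assumption and not something the thread confirms, is that the first request asks how much of the file the server already holds and the second sends the remainder. The sketch below only shows how the reply body from the trace could be parsed; `parse_reply` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Reply body copied from the trace above.
reply = """<data_server_reply>
<status>0</status>
<file_size>98304000</file_size>
</data_server_reply>"""


def parse_reply(text):
    """Extract the status code and, if present, the reported file size."""
    root = ET.fromstring(text)
    status = int(root.findtext("status"))
    size_text = root.findtext("file_size")
    file_size = int(size_text) if size_text is not None else None
    return status, file_size


status, file_size = parse_reply(reply)
print(f"status={status}, bytes reported by server={file_size}")
```

Either way, the first exchange completes normally, so the reset happens during the data transfer itself and not during name resolution or connection setup.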
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
:( I've just sent another email. Keep breathing; it may take a while. :) |
Send message Joined: 16 Oct 11 Posts: 254 Credit: 15,954,577 RAC: 0 |
These have been stuck for weeks. Should I delete/cancel them?
10/16/2019 11:52:51 AM | climateprediction.net | Started upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_16.zip
10/16/2019 11:52:51 AM | climateprediction.net | Started upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_17.zip
10/16/2019 11:52:53 AM | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_rwah0/file_upload_handler.log' (errno: 9)
10/16/2019 11:52:53 AM | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_rwah0/file_upload_handler.log' (errno: 9)
10/16/2019 11:52:53 AM | climateprediction.net | Temporarily failed upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_16.zip: transient upload error
10/16/2019 11:52:53 AM | climateprediction.net | Backing off 05:39:51 on upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_16.zip
10/16/2019 11:52:53 AM | climateprediction.net | Temporarily failed upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_17.zip: transient upload error
10/16/2019 11:52:53 AM | climateprediction.net | Backing off 05:08:40 on upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_17.zip
10/16/2019 11:52:54 AM | climateprediction.net | Started upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_18.zip
10/16/2019 11:52:54 AM | climateprediction.net | Started upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_19.zip
10/16/2019 11:52:56 AM | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_rwah0/file_upload_handler.log' (errno: 9)
10/16/2019 11:52:56 AM | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_rwah0/file_upload_handler.log' (errno: 9)
10/16/2019 11:52:56 AM | climateprediction.net | Temporarily failed upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_18.zip: transient upload error
10/16/2019 11:52:56 AM | climateprediction.net | Backing off 04:48:58 on upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_18.zip
10/16/2019 11:52:56 AM | climateprediction.net | Temporarily failed upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_19.zip: transient upload error
10/16/2019 11:52:56 AM | climateprediction.net | Backing off 04:25:27 on upload of wah2_anz50_n5ns_201612_20_794_011769722_0_r1631347844_19.zip |
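The "(errno: 9)" in the server's message can be looked up with Python's standard library. This is a small sketch that assumes the upload server uses the usual POSIX errno numbering, where 9 is EBADF (conventionally "bad file descriptor"); the thread does not say what operating system the server runs.

```python
import errno
import os

code = 9  # the errno value quoted in the server's error message
name = errno.errorcode[code]  # symbolic name for this errno on the local platform
print(name, "-", os.strerror(code))
```

The failing path is the handler's own log file on the server, so under that assumption this error points at a server-side problem and not at anything wrong with the zip files on the volunteer's machine.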
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I guess you may as well abort the ANZ zips. The Oxford people are busy elsewhere. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
You may as well abort the CAM25 zips. |
Send message Joined: 3 Jul 19 Posts: 3 Credit: 50,058 RAC: 0 |
I have only recently started supporting this project again after a long gap. I noticed that most of the trickles did upload ok but I had 2 zip files that would not for some reason to do with the project server. I have now 5 wah2 tasks. I will let these run but have set the project to "no new work" in boinc manager. It does not seem worthwhile to run tasks when the upload does not seem to work reliably. From what I read in other places of the forum the communication (internet) is not so reliable for the project and this is the main reason uploads do not complete. The boinc community activity for this project in running the tasks should be supported with a reliable means of uploading the results. Maybe one solution is to set up one or more virtual project servers in a good internet coverage location to manage all the distribution and uploads of tasks. Then the project team could just communicate with this remote location whenever they needed to update the tasks to be sent out and download the results. Just trying to propose a possible way forward :) |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
The hung upload files have mainly been for the WAH2 ANZ and CAM25 regions. It used to be that the servers for these regions were remote, administered in the countries where the research projects were being proposed and worked on. I'm not sure whether that is still the case, but it would explain why those two regions have the problems and the others don't. |
Send message Joined: 20 Jul 05 Posts: 25 Credit: 414,873 RAC: 406 |
I noticed this morning, after turning my computer on, that there is a Weather At Home 2 task that failed, saying the following files were absent. I gather these are from batch #845.
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_2.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_3.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_4.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_5.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_6.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_7.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_8.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_9.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_10.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_11.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_12.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_13.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_restart.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
20/10/2019 8:38:16 AM | climateprediction.net | Output file wah2_global_c0ey_198812_13_845_011911553_0_r559669980_out.zip for task wah2_global_c0ey_198812_13_845_011911553_0 absent
|
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
@Speedy The error message in stderr on the task page says "The system cannot find the drive specified.". This is an error that crops up occasionally. No one knows the cause. It's not typically reproduced in the other tasks in the work unit. It may be some kind of timing issue when the model tries to write to, or read from, the disk. The "absent" lines you pasted into your post appear only because the model crashed before those monthly upload files were created. It was expecting to upload them and they were never generated. Unfortunately, that listing is not useful for finding the cause of the crash. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Climate models have lots of files open, which all need saving at checkpoints. With your computer having so many processors, it will need a VERY fast HD to keep up with all that saving when it occurs at the same time. |
Send message Joined: 20 Jul 05 Posts: 25 Credit: 414,873 RAC: 406 |
Climate models have lots of files open, which all need saving at checkpoints.
Thank you for pointing this out. I have cut it down to working on three tasks at a time. I'm not sure, but maybe when I turned my machine off last night it was trying to upload a trickle message.
@Speedy The error message in stderr on the task page says "The system cannot find the drive specified.". This is an error that crops up occasionally. No one knows the cause. It's not typically reproduced in the other tasks in the work unit. It may be some kind of timing issue when the model tries to write to, or read from the disk.
Thank you for explaining the error message; it makes complete sense. Hopefully I will be able to complete other tasks without them crashing. |
Send message Joined: 23 Feb 05 Posts: 7 Credit: 1,423,261 RAC: 213 |
I've had one cam25 zip transfer failing with 'transient HTTP error' since about Oct 26: https://www.cpdn.org/result.php?resultid=21743279 Can it be fixed or is the advice still to abort? |