Message boards : Number crunching : Download errors on UK Met Office HadAM4 at N216 resolution v8.52 tasks
Author | Message |
---|---|
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,705,793 RAC: 9,655 |
Two successive tasks have failed to download cleanly. Task 21931151 reports:
<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>a10l_867_atmos.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>ic_N216_2002_12_000004.nc.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
</message>
but local client reports:
Tue 28 Apr 2020 11:34:04 BST | climateprediction.net | [unparsed_xml] SCHEDULER_REPLY::parse(): unrecognized ?xml
Tue 28 Apr 2020 11:34:04 BST | climateprediction.net | [unparsed_xml] SCHEDULER_REPLY::parse(): unrecognized upload_template
Tue 28 Apr 2020 11:34:06 BST | climateprediction.net | Started download of a10l_867_atmos.gz
Tue 28 Apr 2020 11:34:06 BST | climateprediction.net | Started download of ic_N216_2002_12_000004.nc.gz
Tue 28 Apr 2020 11:34:06 BST | climateprediction.net | Started download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz
Tue 28 Apr 2020 11:34:08 BST | climateprediction.net | Temporarily failed download of a10l_867_atmos.gz: connect() failed
Tue 28 Apr 2020 11:34:08 BST | climateprediction.net | Temporarily failed download of ic_N216_2002_12_000004.nc.gz: connect() failed
Tue 28 Apr 2020 11:34:08 BST | climateprediction.net | Temporarily failed download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz: connect() failed
These two sets of messages don't seem to tie up. Machine is just completing task 21922323, successfully downloaded 22 April. Anyone know of any changes since then?
Edit - second failure is task 21922854 - still in reporting delay, but similar symptoms locally. |
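For anyone comparing the server-side stderr with the local event log, the `<message>` block can be pulled apart mechanically. The following is only a sketch: the function name `parse_xfer_errors` is made up for illustration, and it assumes the block has been copied out as well-formed XML exactly as quoted in the post above.

```python
import re
import xml.etree.ElementTree as ET


def parse_xfer_errors(message_xml):
    """Return (file_name, error_code, reason) tuples from a <message> block.

    Hypothetical helper: assumes the text is well-formed XML laid out like
    the stderr quoted in this thread.
    """
    root = ET.fromstring(message_xml)
    results = []
    for err in root.iter("file_xfer_error"):
        name = (err.findtext("file_name") or "").strip()
        raw = (err.findtext("error_code") or "").strip()
        # error_code looks like "-119 (md5 checksum failed for file)"
        m = re.match(r"(-?\d+)\s*\((.*)\)$", raw)
        if m:
            results.append((name, int(m.group(1)), m.group(2)))
        else:
            results.append((name, None, raw))
    return results


sample = (
    "<message> WU download error: couldn't get input files: "
    "<file_xfer_error> <file_name>a10l_867_atmos.gz</file_name> "
    "<error_code>-119 (md5 checksum failed for file)</error_code> "
    "</file_xfer_error> "
    "<file_xfer_error> <file_name>ic_N216_2002_12_000004.nc.gz</file_name> "
    "<error_code>-119 (md5 checksum failed for file)</error_code> "
    "</file_xfer_error> </message>"
)

errors = parse_xfer_errors(sample)
for name, code, reason in errors:
    print(name, code, reason)
```

Listing the file names this way makes it easier to check them one by one against the "Temporarily failed download" lines in the local log.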
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
I currently have one that has been stuck in download for two hours. It has failed on three other machines. I don't know if there is any connection or not. https://www.cpdn.org/workunit.php?wuid=12016400 |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
"It has failed on three other machines." The three previous errors are down to missing 32-bit libraries. Will contact the project, but I might not manage it till tomorrow morning, so someone else might beat me to it. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,705,793 RAC: 9,655 |
Thanks Dave. Other tasks are running to completion, so it's not the libs here. A third has just failed - I'd better set NNT overnight. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Problem reported. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Had a reply to say that it should be fixed now. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
Yes, the download finished and all is OK. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
That's good. I'm half way through, so a few more days yet. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,705,793 RAC: 9,655 |
A second (near identical) machine has downloaded a new task which is currently ready to run. I'll re-enable the machine with yesterday's problem. Has anyone heard what the problem was? I couldn't make any sense of the mixed messages. Edit - problem machine has downloaded new work and is running again. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
"Thanks Dave. Other tasks are running to completion, so it's not the libs here. A third has just failed - I'd better set NNT overnight." I would have been very surprised if you were missing the libs, Richard! |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
I have a stuck download of batch 867 WU
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] HTTP_OP::init_get(): http://download.cpdn.org/download//batch_867/workunits/hadam4h_a17c_209511_4_867_012013459.zip
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | Started download of hadam4h_a17c_209511_4_867_012013459.zip
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] HTTP_OP::init_get(): http://download.cpdn.org/download//batch_867/ancils/a17c_867_atmos.gz
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | Started download of a17c_867_atmos.gz
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71828] Info: Connection 3859 seems to be dead!
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71828] Info: Closing connection 3859
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71828] Info: TLSv1.2 (OUT), TLS alert, Client hello (1):
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71829] Info: Found bundle for host download.cpdn.org: 0x559f10a67390 [serially]
Fri 15 May 2020 12:38:33 PM EEST | climateprediction.net | [http] [ID#71828] Info: Trying 129.67.193.131...
..........................
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Temporarily failed download of ic_N216_2002_12_000004.nc.gz: transient HTTP error
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:42 on download of ic_N216_2002_12_000004.nc.gz
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Temporarily failed download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz: transient HTTP error
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:19 on download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz |
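With `http_debug` output switched on, the interesting lines get buried quickly. A rough way to boil an event log down to "which file failed, why, and how long is the backoff" is sketched below. The helper name `summarise` and the assumption that each line ends with the message text after the last ` | ` separator are mine, based only on the lines quoted in this thread.

```python
import re
from collections import defaultdict

FAILED = re.compile(r"Temporarily failed download of (\S+): (.+)$")
BACKOFF = re.compile(r"Backing off (\d+):(\d+):(\d+) on download of (\S+)$")


def summarise(lines):
    """Collect failure reasons and the latest backoff (in seconds) per file."""
    summary = defaultdict(lambda: {"failures": [], "backoff_s": None})
    for line in lines:
        # The message text is whatever follows the last " | " separator.
        text = line.rsplit(" | ", 1)[-1].strip()
        m = FAILED.match(text)
        if m:
            summary[m.group(1)]["failures"].append(m.group(2))
            continue
        m = BACKOFF.match(text)
        if m:
            hours, minutes, seconds, name = m.groups()
            summary[name]["backoff_s"] = (
                int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            )
    return dict(summary)


log = [
    "Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Temporarily failed download of ic_N216_2002_12_000004.nc.gz: transient HTTP error",
    "Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:42 on download of ic_N216_2002_12_000004.nc.gz",
    "Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Temporarily failed download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz: transient HTTP error",
    "Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:19 on download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz",
]

report = summarise(log)
for name, info in report.items():
    print(name, info["failures"], info["backoff_s"])
```

The same patterns also pick up the `connect() failed` reason seen in the earlier posts, since only the text after the colon differs.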
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,705,793 RAC: 9,655 |
It's me again - and maybe with a repeat of the same problem. Two new task downloads failed this morning:
hadam4h_20c6_209305_5_903_012079897_0, WU created 15 Apr 2021, 10:43:51 UTC
hadam4h_1191_209805_5_902_012078980_0, WU created 9 Apr 2021, 14:25:29 UTC
stderr online reads
<core_client_version>7.16.16</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>1191_902_atmos.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>ic_N216_2003_03_000052_f.nc.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>HAPPI_1.5K_sst_N216_2098-04-01_2098-10-30.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>so2dms_rcp26-2095_N216_208912-210112.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
</message>
]]>
and similar for three files from the other task. Local event log contains multiple lines like
30/04/2021 09:26:30 | climateprediction.net | Temporarily failed download of 1191_902_atmos.gz: connect() failed
but no other clue. Machine had just completed a UK Met Office HadAM4 at N216 resolution task (successful) and was downloading replacements on report. Another task is running fine, and tasks from other projects are downloading normally. |
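The -119 text says the MD5 of whatever ended up on disk did not match the checksum the client expected. To check a downloaded file by hand, the comparison can be sketched as below. This is not BOINC's own code; the helper names, the temporary file and the example payload are placeholders for illustration, and the expected value would have to come from wherever the project publishes it.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_md5):
    """True if the file's MD5 equals the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()


# Demo on a throwaway file (a stand-in for a real downloaded .gz).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example payload")
    demo_path = tmp.name

expected = hashlib.md5(b"example payload").hexdigest()
ok_good = checksum_matches(demo_path, expected)
ok_bad = checksum_matches(demo_path, "0" * 32)
os.remove(demo_path)

print(ok_good, ok_bad)
```

An empty or truncated file left behind by a failed connection would fail this comparison, which is one possible way the two sets of messages could both be true.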
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
"It's me again - and maybe with a repeat of the same problem. Two new task downloads failed this morning:" I have sent Andy a message. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
HI Dave, |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
And more from Andy, No stuck files visible to the naked eye, but a restart has cleared the problem. But I have to wait another hour before testing again... |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Mine seem to download OK. This one appears in the task list, but it is not running yet because four others are ahead of it.
Thu 29 Apr 2021 11:59:58 PM EDT | climateprediction.net | Sending scheduler request: To fetch work.
Thu 29 Apr 2021 11:59:58 PM EDT | climateprediction.net | Requesting new tasks for CPU
Fri 30 Apr 2021 12:00:00 AM EDT | climateprediction.net | Scheduler request completed: got 1 new tasks
Fri 30 Apr 2021 12:00:00 AM EDT | climateprediction.net | Project requested delay of 3636 seconds
Fri 30 Apr 2021 12:00:03 AM EDT | climateprediction.net | Started download of hadam4h_a0v0_200911_4_852_011938001.zip
Fri 30 Apr 2021 12:00:03 AM EDT | climateprediction.net | Started download of a0v0_852_atmos.gz
Fri 30 Apr 2021 12:00:05 AM EDT | climateprediction.net | Finished download of hadam4h_a0v0_200911_4_852_011938001.zip
Fri 30 Apr 2021 12:00:05 AM EDT | climateprediction.net | Started download of ic_N216_2003_11_000042.nc.gz
Fri 30 Apr 2021 12:00:20 AM EDT | climateprediction.net | Finished download of ic_N216_2003_11_000042.nc.gz
Fri 30 Apr 2021 12:00:20 AM EDT | climateprediction.net | Started download of ALLclim_ancil_7mon_OSTIA_sst_N216_2009-10-01_2010-04-30.gz
Fri 30 Apr 2021 12:00:24 AM EDT | climateprediction.net | Finished download of ALLclim_ancil_7mon_OSTIA_sst_N216_2009-10-01_2010-04-30.gz
Fri 30 Apr 2021 12:00:24 AM EDT | climateprediction.net | Started download of ALLclim_ancil_7mon_OSTIA_ice_v2_N216_2009-10-01_2010-04-30.gz
Fri 30 Apr 2021 12:00:30 AM EDT | climateprediction.net | Finished download of ALLclim_ancil_7mon_OSTIA_ice_v2_N216_2009-10-01_2010-04-30.gz
Fri 30 Apr 2021 12:00:30 AM EDT | climateprediction.net | Started download of so2dms_rcp45_N216_2009_2020.gz
Fri 30 Apr 2021 12:00:55 AM EDT | climateprediction.net | Finished download of a0v0_852_atmos.gz
Fri 30 Apr 2021 12:00:55 AM EDT | climateprediction.net | Started download of ozone_rcp45_N216L38_2009_2020v2.gz
Fri 30 Apr 2021 12:00:57 AM EDT | climateprediction.net | Finished download of ozone_rcp45_N216L38_2009_2020v2.gz
Fri 30 Apr 2021 12:01:02 AM EDT | climateprediction.net | Finished download of so2dms_rcp45_N216_2009_2020.gz |
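For comparison with a failing machine, it can help to see how long each file takes on one that works. A sketch that pairs "Started download" with "Finished download" lines follows. The function name `download_durations` is invented here, and the timestamp pattern is an assumption that only fits the 12-hour English format shown in this post; other machines in the thread log in different formats. The time zone token is skipped rather than parsed.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Fri 30 Apr 2021 12:00:03 AM EDT | climateprediction.net | Started download of X"
LINE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} [AP]M) \w+ \| "
    r"[^|]+ \| (?P<event>Started|Finished) download of (?P<name>\S+)$"
)


def download_durations(lines):
    """Return {file_name: seconds} for every started/finished pair found."""
    started, durations = {}, {}
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        # English day/month names are assumed (Python's default C locale).
        when = datetime.strptime(m["ts"], "%a %d %b %Y %I:%M:%S %p")
        if m["event"] == "Started":
            started[m["name"]] = when
        elif m["name"] in started:
            begun = started.pop(m["name"])
            durations[m["name"]] = (when - begun).total_seconds()
    return durations


log = [
    "Fri 30 Apr 2021 12:00:03 AM EDT | climateprediction.net | Started download of a0v0_852_atmos.gz",
    "Fri 30 Apr 2021 12:00:30 AM EDT | climateprediction.net | Started download of so2dms_rcp45_N216_2009_2020.gz",
    "Fri 30 Apr 2021 12:00:55 AM EDT | climateprediction.net | Finished download of a0v0_852_atmos.gz",
    "Fri 30 Apr 2021 12:01:02 AM EDT | climateprediction.net | Finished download of so2dms_rcp45_N216_2009_2020.gz",
]

durations = download_durations(log)
for name, seconds in durations.items():
    print(name, seconds)
```

A file that shows a "Started" line with no matching "Finished" line simply does not appear in the result, which is itself a quick way to spot stuck transfers.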
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,705,793 RAC: 9,655 |
"And more from Andy," Actually, that was my reply... The hour is up, and I can confirm I've got my replacement task(s), and all files have downloaded properly. If anyone to the west of me wakes up to find that they suffered the same error as me overnight - you may find that your computer is reluctant to request new work. A simple BOINC restart is all that's needed to clear the blockage. For the record, Andy replied: "This has now been solved." |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,705,793 RAC: 9,655 |
|
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,014,785 RAC: 20,946 |
"They're failing again:" And as Richard noted in the BOINC forums, the project has been taken offline. (That is, the servers to do with work; the forums are clearly still working.) |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
The systems that accept results must be working.
Wed 08 Sep 2021 10:56:52 AM EDT | climateprediction.net | Sending scheduler request: To send trickle-up message.
Wed 08 Sep 2021 10:56:52 AM EDT | climateprediction.net | Reporting 1 completed tasks
Wed 08 Sep 2021 10:56:52 AM EDT | climateprediction.net | Not requesting tasks: don't need ()
Wed 08 Sep 2021 10:56:54 AM EDT | climateprediction.net | Scheduler request completed
Wed 08 Sep 2021 10:56:54 AM EDT | climateprediction.net | Project is temporarily shut down for maintenance
Wed 08 Sep 2021 10:56:54 AM EDT | climateprediction.net | Project requested delay of 3600 seconds |