Thread 'Download errors on UK Met Office HadAM4 at N216 resolution v8.52 tasks'

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,705,793
RAC: 9,655
Message 62346 - Posted: 28 Apr 2020, 15:51:56 UTC
Last modified: 28 Apr 2020, 16:02:52 UTC

Two successive tasks have failed to download cleanly. Task 21931151 reports:

<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>a10l_867_atmos.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>ic_N216_2002_12_000004.nc.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
</message>
but the local client reports:

Tue 28 Apr 2020 11:34:04 BST | climateprediction.net | [unparsed_xml] SCHEDULER_REPLY::parse(): unrecognized ?xml
Tue 28 Apr 2020 11:34:04 BST | climateprediction.net | [unparsed_xml] SCHEDULER_REPLY::parse(): unrecognized upload_template
Tue 28 Apr 2020 11:34:06 BST | climateprediction.net | Started download of a10l_867_atmos.gz
Tue 28 Apr 2020 11:34:06 BST | climateprediction.net | Started download of ic_N216_2002_12_000004.nc.gz
Tue 28 Apr 2020 11:34:06 BST | climateprediction.net | Started download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz
Tue 28 Apr 2020 11:34:08 BST | climateprediction.net | Temporarily failed download of a10l_867_atmos.gz: connect() failed
Tue 28 Apr 2020 11:34:08 BST | climateprediction.net | Temporarily failed download of ic_N216_2002_12_000004.nc.gz: connect() failed
Tue 28 Apr 2020 11:34:08 BST | climateprediction.net | Temporarily failed download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz: connect() failed
These two sets of messages don't seem to tie up.

Machine is just completing task 21922323, successfully downloaded 22 April. Anyone know of any changes since then?

Edit - second failure is task 21922854 - still in reporting delay, but similar symptoms locally.
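
For anyone who wants to compare the server-side stderr with the local event log file by file, here is a minimal sketch that pulls the file names and error codes out of a `<file_xfer_error>` report like the one above. The function name `parse_xfer_errors` is hypothetical, and the regular expression assumes the layout is exactly as pasted in this post; it is not taken from the BOINC source.

```python
import re

# One <file_xfer_error> block: file name, numeric code, text in parentheses.
# The layout is assumed from the stderr pasted in this thread.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>[^)]*)\)</error_code>\s*"
    r"</file_xfer_error>"
)


def parse_xfer_errors(stderr_text):
    """Return a list of (file_name, error_code, reason) tuples."""
    return [
        (m.group("name"), int(m.group("code")), m.group("reason"))
        for m in XFER_ERROR.finditer(stderr_text)
    ]


# Shortened copy of the report quoted above.
SAMPLE = """<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>a10l_867_atmos.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>ic_N216_2002_12_000004.nc.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
</message>"""

if __name__ == "__main__":
    for name, code, reason in parse_xfer_errors(SAMPLE):
        print(f"{name}: {code} ({reason})")
```

The resulting file names can then be searched for in the local event log to see what the client recorded for each one.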

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 62347 - Posted: 28 Apr 2020, 17:51:22 UTC - in response to Message 62346.  

I currently have one that has been stuck in download for two hours.
It has failed on three other machines.
I don't know if there is any connection or not.
https://www.cpdn.org/workunit.php?wuid=12016400

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,014,785
RAC: 20,946
Message 62348 - Posted: 28 Apr 2020, 18:29:00 UTC - in response to Message 62347.  

Jim1348 wrote: "It has failed on three other machines.
I don't know if there is any connection or not."


The three previous errors are due to missing 32-bit libraries.

Will contact project but might not manage till tomorrow morning so someone else might beat me to it.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,705,793
RAC: 9,655
Message 62349 - Posted: 28 Apr 2020, 19:03:31 UTC - in response to Message 62348.  

Thanks Dave. Other tasks are running to completion, so it's not the libs here. A third has just failed - I'd better set NNT overnight.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62350 - Posted: 28 Apr 2020, 21:01:37 UTC

Problem reported.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62353 - Posted: 29 Apr 2020, 0:11:44 UTC

Had a reply to say that it should be fixed now.

Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 62354 - Posted: 29 Apr 2020, 0:54:34 UTC - in response to Message 62353.  

Yes, the download finished and all is OK.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62355 - Posted: 29 Apr 2020, 2:43:05 UTC - in response to Message 62354.  

That's good.
I'm half way through, so a few more days yet.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,705,793
RAC: 9,655
Message 62357 - Posted: 29 Apr 2020, 7:44:45 UTC
Last modified: 29 Apr 2020, 8:18:29 UTC

A second (near identical) machine has downloaded a new task which is currently ready to run. I'll re-enable the machine with yesterday's problem.

Has anyone heard what the problem was? I couldn't make any sense of the mixed messages.

Edit - problem machine has downloaded new work and is running again.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,014,785
RAC: 20,946
Message 62360 - Posted: 29 Apr 2020, 8:43:37 UTC - in response to Message 62349.  

Richard Haselgrove wrote: "Thanks Dave. Other tasks are running to completion, so it's not the libs here. A third has just failed - I'd better set NNT overnight."


I would have been very surprised if you were missing the libs, Richard!

bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 62428 - Posted: 15 May 2020, 9:46:37 UTC

I have a stuck download of a batch 867 WU:

Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] HTTP_OP::init_get(): http://download.cpdn.org/download//batch_867/workunits/hadam4h_a17c_209511_4_867_012013459.zip
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | Started download of hadam4h_a17c_209511_4_867_012013459.zip
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] HTTP_OP::init_get(): http://download.cpdn.org/download//batch_867/ancils/a17c_867_atmos.gz
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | Started download of a17c_867_atmos.gz
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71828] Info: Connection 3859 seems to be dead!
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71828] Info: Closing connection 3859
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71828] Info: TLSv1.2 (OUT), TLS alert, Client hello (1):
Fri 15 May 2020 12:38:32 PM EEST | climateprediction.net | [http] [ID#71829] Info: Found bundle for host download.cpdn.org: 0x559f10a67390 [serially]
Fri 15 May 2020 12:38:33 PM EEST | climateprediction.net | [http] [ID#71828] Info: Trying 129.67.193.131...
..........................
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Temporarily failed download of ic_N216_2002_12_000004.nc.gz: transient HTTP error
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:42 on download of ic_N216_2002_12_000004.nc.gz
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Temporarily failed download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz: transient HTTP error
Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:19 on download of HAPPI_1.5K_sst_N216_2095-10-01_2096-04-30.gz
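
When the event log gets this noisy, a short script can show which files keep failing and how long the client is backing off. The sketch below assumes the `timestamp | project | message` layout and the exact message wording seen in the log above; `summarise` is a hypothetical helper, not part of BOINC.

```python
import re
from collections import Counter

# Message wording assumed from the event log lines quoted in this thread.
FAILED = re.compile(r"Temporarily failed download of (\S+): (.+)")
BACKOFF = re.compile(r"Backing off (\d+):(\d+):(\d+) on download of (\S+)")


def summarise(log_lines):
    """Count temporary failures per (file, reason) and record the latest
    back-off, in seconds, for each file."""
    failures = Counter()
    backoff_seconds = {}
    for line in log_lines:
        parts = line.split(" | ", 2)
        if len(parts) != 3:
            continue  # not a "timestamp | project | message" line
        message = parts[2].strip()
        match = FAILED.match(message)
        if match:
            failures[(match.group(1), match.group(2))] += 1
            continue
        match = BACKOFF.match(message)
        if match:
            hours, minutes, seconds = (int(x) for x in match.group(1, 2, 3))
            backoff_seconds[match.group(4)] = hours * 3600 + minutes * 60 + seconds
    return failures, backoff_seconds


LOG = [
    "Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Temporarily failed download of ic_N216_2002_12_000004.nc.gz: transient HTTP error",
    "Fri 15 May 2020 12:42:34 PM EEST | climateprediction.net | Backing off 00:04:42 on download of ic_N216_2002_12_000004.nc.gz",
    "..........................",
]

if __name__ == "__main__":
    failures, backoffs = summarise(LOG)
    for (name, reason), count in failures.items():
        print(f"{name}: {count} x {reason}")
    for name, seconds in backoffs.items():
        print(f"{name}: backing off {seconds} s")
```

Lines that do not match either pattern, such as the `[http]` debug output, are simply ignored.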

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,705,793
RAC: 9,655
Message 63925 - Posted: 30 Apr 2021, 9:29:58 UTC

It's me again - and maybe with a repeat of the same problem. Two new task downloads failed this morning:

hadam4h_20c6_209305_5_903_012079897_0, WU created 15 Apr 2021, 10:43:51 UTC
hadam4h_1191_209805_5_902_012078980_0, WU created 9 Apr 2021, 14:25:29 UTC

stderr online reads

<core_client_version>7.16.16</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
  <file_name>1191_902_atmos.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>ic_N216_2003_03_000052_f.nc.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>HAPPI_1.5K_sst_N216_2098-04-01_2098-10-30.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
<file_xfer_error>
  <file_name>so2dms_rcp26-2095_N216_208912-210112.gz</file_name>
  <error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>
</message>
]]>
and similar for three files from the other task.

Local event log contains multiple lines like
30/04/2021 09:26:30 | climateprediction.net | Temporarily failed download of 1191_902_atmos.gz: connect() failed
but no other clue.
Machine had just completed a UK Met Office HadAM4 at N216 resolution task (successful) and was downloading replacements on report. Another task is running fine, and tasks from other projects are downloading normally.
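
Since the server-side report talks about md5 checksums, here is a minimal sketch of what such a check looks like in general, for anyone who wants to test a file that did arrive. It does not explain how the client ends up reporting -119 when the local log only shows connect() failures. The helper names `md5_of_file` and `checksum_ok` are hypothetical, and the expected value would have to come from the project; none is given in this thread.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks so that
    large input files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_ok(path, expected_md5):
    """Compare the file's digest with an expected hex string
    (case and surrounding whitespace are ignored)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

An empty or truncated file will give a digest that differs from the expected one, so a mismatch alone does not say whether the transfer failed or the file on the server is wrong.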

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,014,785
RAC: 20,946
Message 63926 - Posted: 30 Apr 2021, 10:07:28 UTC

Richard Haselgrove wrote: "It's me again - and maybe with a repeat of the same problem. Two new task downloads failed this morning:"


I have sent Andy a message.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,014,785
RAC: 20,946
Message 63927 - Posted: 30 Apr 2021, 10:28:33 UTC

Hi Dave,

Thanks. Yes, this is a problem I am aware of; it's a problem with downloads at the moment. I am going to be re-diverting them.

Best wishes,

Andy

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,014,785
RAC: 20,946
Message 63928 - Posted: 30 Apr 2021, 12:05:49 UTC
Last modified: 30 Apr 2021, 13:09:37 UTC

And more from Andy,

No stuck files visible to the naked eye, but a restart has cleared the problem. But I have to wait another hour before testing again...

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 63929 - Posted: 30 Apr 2021, 12:10:56 UTC

Mine seem to download OK.

This one appears in the task list, but it is not running yet because four others are ahead of it.

Thu 29 Apr 2021 11:59:58 PM EDT | climateprediction.net | Sending scheduler request: To fetch work.
Thu 29 Apr 2021 11:59:58 PM EDT | climateprediction.net | Requesting new tasks for CPU
Fri 30 Apr 2021 12:00:00 AM EDT | climateprediction.net | Scheduler request completed: got 1 new tasks
Fri 30 Apr 2021 12:00:00 AM EDT | climateprediction.net | Project requested delay of 3636 seconds
Fri 30 Apr 2021 12:00:03 AM EDT | climateprediction.net | Started download of hadam4h_a0v0_200911_4_852_011938001.zip
Fri 30 Apr 2021 12:00:03 AM EDT | climateprediction.net | Started download of a0v0_852_atmos.gz
Fri 30 Apr 2021 12:00:05 AM EDT | climateprediction.net | Finished download of hadam4h_a0v0_200911_4_852_011938001.zip
Fri 30 Apr 2021 12:00:05 AM EDT | climateprediction.net | Started download of ic_N216_2003_11_000042.nc.gz
Fri 30 Apr 2021 12:00:20 AM EDT | climateprediction.net | Finished download of ic_N216_2003_11_000042.nc.gz
Fri 30 Apr 2021 12:00:20 AM EDT | climateprediction.net | Started download of ALLclim_ancil_7mon_OSTIA_sst_N216_2009-10-01_2010-04-30.gz
Fri 30 Apr 2021 12:00:24 AM EDT | climateprediction.net | Finished download of ALLclim_ancil_7mon_OSTIA_sst_N216_2009-10-01_2010-04-30.gz
Fri 30 Apr 2021 12:00:24 AM EDT | climateprediction.net | Started download of ALLclim_ancil_7mon_OSTIA_ice_v2_N216_2009-10-01_2010-04-30.gz
Fri 30 Apr 2021 12:00:30 AM EDT | climateprediction.net | Finished download of ALLclim_ancil_7mon_OSTIA_ice_v2_N216_2009-10-01_2010-04-30.gz
Fri 30 Apr 2021 12:00:30 AM EDT | climateprediction.net | Started download of so2dms_rcp45_N216_2009_2020.gz
Fri 30 Apr 2021 12:00:55 AM EDT | climateprediction.net | Finished download of a0v0_852_atmos.gz
Fri 30 Apr 2021 12:00:55 AM EDT | climateprediction.net | Started download of ozone_rcp45_N216L38_2009_2020v2.gz
Fri 30 Apr 2021 12:00:57 AM EDT | climateprediction.net | Finished download of ozone_rcp45_N216L38_2009_2020v2.gz
Fri 30 Apr 2021 12:01:02 AM EDT | climateprediction.net | Finished download of so2dms_rcp45_N216_2009_2020.gz
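
To put numbers on a healthy download for comparison, the Started/Finished pairs in a log like the one above can be matched up. This is only a sketch: it assumes the 12-hour timestamp style shown here (other posts in this thread use different styles), an English locale, and the hypothetical helper names `parse_line` and `download_durations`.

```python
from datetime import datetime

# Timestamp style assumed from the log above, e.g. "Fri 30 Apr 2021 12:00:03 AM"
STAMP_FORMAT = "%a %d %b %Y %I:%M:%S %p"
STARTED = "Started download of "
FINISHED = "Finished download of "


def parse_line(line):
    """Split a log line into (datetime, message); return None if it does not
    look like 'timestamp | project | message'."""
    parts = line.split(" | ", 2)
    if len(parts) != 3:
        return None
    # Drop the trailing time zone abbreviation (EDT, EEST, ...).
    stamp_text = parts[0].rsplit(" ", 1)[0]
    return datetime.strptime(stamp_text, STAMP_FORMAT), parts[2].strip()


def download_durations(log_lines):
    """Return {file_name: seconds between its Started and Finished lines}."""
    started = {}
    durations = {}
    for line in log_lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, message = parsed
        if message.startswith(STARTED):
            started[message[len(STARTED):]] = stamp
        elif message.startswith(FINISHED):
            name = message[len(FINISHED):]
            if name in started:
                durations[name] = (stamp - started.pop(name)).total_seconds()
    return durations


LOG = [
    "Fri 30 Apr 2021 12:00:03 AM EDT | climateprediction.net | Started download of a0v0_852_atmos.gz",
    "Fri 30 Apr 2021 12:00:05 AM EDT | climateprediction.net | Started download of ic_N216_2003_11_000042.nc.gz",
    "Fri 30 Apr 2021 12:00:20 AM EDT | climateprediction.net | Finished download of ic_N216_2003_11_000042.nc.gz",
    "Fri 30 Apr 2021 12:00:55 AM EDT | climateprediction.net | Finished download of a0v0_852_atmos.gz",
]

if __name__ == "__main__":
    for name, seconds in download_durations(LOG).items():
        print(f"{name}: {seconds:.0f} s")
```

A file that has a Started line but never a Finished line stays out of the result, which is one quick way to spot a stuck transfer.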

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,705,793
RAC: 9,655
Message 63930 - Posted: 30 Apr 2021, 12:48:51 UTC - in response to Message 63928.  

Dave Jackson wrote: "And more from Andy,

No stuck files visible to the naked eye, but a restart has cleared the problem. But I have to wait another hour before testing again..."
Actually, that was my reply...

The hour is up, and I can confirm I've got my replacement task(s), and all files have downloaded properly. If anyone to the west of me wakes up to find that they suffered the same error as me overnight - you may find that your computer is reluctant to request new work. A simple BOINC restart is all that's needed to clear the blockage.

For the record, Andy replied:

This has now been solved.

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,705,793
RAC: 9,655
Message 64432 - Posted: 8 Sep 2021, 8:35:24 UTC


Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,014,785
RAC: 20,946
Message 64433 - Posted: 8 Sep 2021, 12:00:37 UTC - in response to Message 64432.  

Richard Haselgrove wrote: "They're failing again:

hadam4h_2173_209805_5_903_012081010_0
hadam4h_d09g_206711_5_897_012067561_2"

And as Richard noted in the BOINC forums, the project has been taken off line. (That is, the servers to do with work; the forums are clearly still working.)

Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64434 - Posted: 8 Sep 2021, 15:09:54 UTC - in response to Message 64433.  

The systems that accept results must be working.
Wed 08 Sep 2021 10:56:52 AM EDT | climateprediction.net | Sending scheduler request: To send trickle-up message.
Wed 08 Sep 2021 10:56:52 AM EDT | climateprediction.net | Reporting 1 completed tasks
Wed 08 Sep 2021 10:56:52 AM EDT | climateprediction.net | Not requesting tasks: don't need ()
Wed 08 Sep 2021 10:56:54 AM EDT | climateprediction.net | Scheduler request completed
Wed 08 Sep 2021 10:56:54 AM EDT | climateprediction.net | Project is temporarily shut down for maintenance
Wed 08 Sep 2021 10:56:54 AM EDT | climateprediction.net | Project requested delay of 3600 seconds
