Lots of units completed but they don't transfer (update pending)

JimT
Joined: 14 May 05
Posts: 4
Credit: 3,042,684
RAC: 0
Message 61877 - Posted: 2 Jan 2020, 12:25:56 UTC

There is at least a page of transfers waiting to happen. This is wah2

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 61882 - Posted: 2 Jan 2020, 14:57:03 UTC - in response to Message 61877.  

There is at least a page of transfers waiting to happen. This is wah2


Can you tell us a bit more, please? Which batch number is it? Different regional models go to different servers. If you enable file transfer debug in the event log options, you can see which server the uploads are trying to reach. We can pass that on to the project, who will then notify whoever runs the server if it isn't one of their own. (Nothing will happen until Monday anyway, as the project people are away until then.)

JimT
Joined: 14 May 05
Posts: 4
Credit: 3,042,684
RAC: 0
Message 61883 - Posted: 2 Jan 2020, 16:34:07 UTC - in response to Message 61882.  

I also do rosetta, seti, and LHC and these are all OK.
Does this give the information you need?

02/01/2020 14:27:33 | climateprediction.net | Started upload of wah2_global_a039_208812_145_727_011556616_0_r778036711_120.zip
02/01/2020 14:27:33 | climateprediction.net | Started upload of wah2_global_a039_208812_145_727_011556616_0_r778036711_121.zip
02/01/2020 14:27:36 | climateprediction.net | Backing off 04:44:34 on upload of wah2_global_a039_208812_145_727_011556616_0_r778036711_120.zip
02/01/2020 14:27:36 | climateprediction.net | Backing off 04:03:07 on upload of wah2_global_a039_208812_145_727_011556616_0_r778036711_121.zip
02/01/2020 16:18:42 | climateprediction.net | Sending scheduler request: To fetch work.
02/01/2020 16:18:42 | climateprediction.net | Requesting new tasks for CPU
02/01/2020 16:18:45 | climateprediction.net | Scheduler request completed: got 0 new tasks
02/01/2020 16:18:45 | climateprediction.net | No tasks sent
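
For anyone wanting to see at a glance how long each stuck file will wait, here is a minimal sketch that pulls the backoff times out of event-log lines like the ones above. The regular expression is written against the wording shown in this post only; `parse_backoffs` and `SAMPLE_LOG` are hypothetical names for illustration, not part of BOINC.

```python
import re
from datetime import timedelta

# Matches the "Backing off HH:MM:SS on upload of <file>" wording seen in this thread.
BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")


def parse_backoffs(log_text):
    """Return {filename: timedelta} for every upload backoff line found."""
    backoffs = {}
    for line in log_text.splitlines():
        match = BACKOFF_RE.search(line)
        if match:
            hours, minutes, seconds, filename = match.groups()
            backoffs[filename] = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=int(seconds)
            )
    return backoffs


# Two lines copied from the log in this post.
SAMPLE_LOG = """\
02/01/2020 14:27:36 | climateprediction.net | Backing off 04:44:34 on upload of wah2_global_a039_208812_145_727_011556616_0_r778036711_120.zip
02/01/2020 14:27:36 | climateprediction.net | Backing off 04:03:07 on upload of wah2_global_a039_208812_145_727_011556616_0_r778036711_121.zip
"""

if __name__ == "__main__":
    for name, delay in parse_backoffs(SAMPLE_LOG).items():
        print(f"{name}: retry in {delay}")
```

Backoffs of several hours on every file, while other projects upload normally, point at the destination server rather than the local connection.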

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61885 - Posted: 2 Jan 2020, 19:49:26 UTC - in response to Message 61883.  
Last modified: 2 Jan 2020, 20:01:00 UTC

The reason you're having problems with cpdn is that there's something wrong with one or more of the servers in climate research centers around the world.
And, with your computers hidden, we need you to tell us which one(s).

That short list tells us about one of them, by way of the batch number, which we can now look up.

They're going to upload server 2.
But that batch was issued at the end of March 2018, so you'll need to get a move on if you want to complete them.
The project people have been closing old, unneeded batches recently.

Update
I've just sent an email to Oxford, so now it's wait-and-see.
Hopefully someone is monitoring the emails on their phone.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 61890 - Posted: 3 Jan 2020, 15:47:47 UTC

I've just sent an email to Oxford, so now it's wait-and-see.
Hopefully someone is monitoring the emails on their phone.


Andy has informed us the link is now working again. This may not resolve them all if some are going to different servers around the globe so do post again if some are still stuck.

JimT
Joined: 14 May 05
Posts: 4
Credit: 3,042,684
RAC: 0
Message 61893 - Posted: 3 Jan 2020, 18:34:56 UTC - in response to Message 61890.  

I've just sent an email to Oxford, so now it's wait-and-see.
Hopefully someone is monitoring the emails on their phone.


Andy has informed us the link is now working again. This may not resolve them all if some are going to different servers around the globe so do post again if some are still stuck.


The transfer page is now empty - many thanks.
I was expecting it would sort itself out but it never did.
In future I will come here sooner if something similar happens.

ganderson
Joined: 18 Feb 14
Posts: 1
Credit: 1,448,699
RAC: 0
Message 61939 - Posted: 9 Jan 2020, 0:02:34 UTC

I have the same issue: many batches have completed, and their transfers get to 100% and then say they are waiting for retry.
climatepredictio.. | wah2_sam25_s11y_199112_13_719_011508850_2_r2007255732_9.zip | 100.00% | 93.42/93.42... | 00:53:26 | 0.00 KBps | Upload: retry in 03:37:48 (project backoff: 00:18:04)
climatepredictio.. | wah2_sam25_s11y_199112_13_719_011508850_2_r2007255732_10.zip | 100.00% | 93.75/93.75... | 00:43:12 | 0.00 KBps | Upload: retry in 02:50:07 (project backoff: 00:18:04)
climatepredictio.. | wah2_sam25_s11y_199112_13_719_011508850_2_r2007255732_11.zip | 100.00% | 94.08/94.08... | 00:33:58 | 0.00 KBps | Upload: retry in 03:41:54 (project backoff: 00:18:04)
climatepredictio.. | wah2_sam25_s11y_199112_13_719_011508850_2_r2007255732_12.zip | 100.00% | 97.37/97.37... | 00:31:52 | 0.00 KBps | Upload: retry in 04:57:58 (project backoff: 00:18:04)
climatepredictio.. | wah2_sam25_s11y_199112_13_719_011508850_2_r2007255732_13.zip | 100.00% | 93.77/93.77... | 00:25:44 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_sam25_s11y_199112_13_719_011508850_2_r2007255732_restart.zip | 100.00% | 111.58/111.... | 00:28:35 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_1.zip | 100.00% | 64.57/64.57... | 00:53:39 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_3.zip | 100.00% | 64.65/64.65... | 00:37:02 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_4.zip | 100.00% | 64.61/64.61... | 00:32:33 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_5.zip | 100.00% | 64.56/64.56... | 00:16:33 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_6.zip | 100.00% | 64.36/64.36... | 00:14:55 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_7.zip | 100.00% | 64.28/64.28... | 00:12:42 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_8.zip | 100.00% | 64.24/64.24... | 00:07:05 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_9.zip | 100.00% | 64.17/64.17... | 00:04:47 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_10.zip | 100.00% | 64.37/64.37... | 00:01:08 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_11.zip | 100.00% | 64.43/64.43... | 00:01:02 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_12.zip | 100.00% | 64.50/64.50... | 00:01:10 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_13.zip | 100.00% | 64.60/64.60... | 00:01:08 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_global_c3ow_200612_13_845_011915799_1_r64257149_restart.zip | 100.00% | 37.78/37.78... | 00:00:47 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_eas50_a2fj_201312_24_849_011929536_0_r1609441922_6.zip | 100.00% | 70.58/70.58... | 00:06:17 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_eas50_a2fj_201312_24_849_011929536_0_r1609441922_7.zip | 100.00% | 69.53/69.53... | 00:06:03 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_eas50_a2fj_201312_24_849_011929536_0_r1609441922_16.zip | 100.00% | 70.76/70.76... | 00:06:04 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_eas50_a0cb_200512_24_849_011926828_0_r1059788620_6.zip | 100.00% | 70.58/70.58... | 00:03:01 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_eas50_a0cb_200512_24_849_011926828_0_r1059788620_17.zip | 100.00% | 69.74/69.74... | 00:05:00 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_eas50_a1wf_201112_24_849_011928848_0_r272312829_3.zip | 100.00% | 69.57/69.57... | 00:03:02 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_eas50_a1wf_201112_24_849_011928848_0_r272312829_6.zip | 100.00% | 70.61/70.61... | 00:04:46 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_eas50_a1wf_201112_24_849_011928848_0_r272312829_7.zip | 100.00% | 68.99/68.99... | 00:06:18 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)
climatepredictio.. | wah2_eas50_a1wf_201112_24_849_011928848_0_r272312829_12.zip | 100.00% | 69.64/69.64... | 00:04:43 | 0.00 KBps | Upload: pending (project backoff: 00:18:04)

OK, it separated the columns, but you get the idea. I have many more; this is just the first page. Sorry I am not more computer savvy. I hate that all this computing time appears to have been wasted...

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61940 - Posted: 9 Jan 2020, 3:11:22 UTC - in response to Message 61939.  

Of the ones that are listed:

Batch 719 has been closed. Those models were issued in July 2018, so they're quite old.

Batch 845 has also been closed. They may have gotten enough data back from the others.

Batch 849 is still open, but its files are going to upload7, which is often under heavy load.
I'll send an email and see if they can restart it.
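
The batch numbers above can be read straight from the file names. A minimal sketch, assuming the batch is the sixth underscore-separated field: that position is inferred only from the file names and batch numbers quoted in this thread, not from any project documentation, and `batch_number` is a hypothetical helper.

```python
def batch_number(filename):
    """Return the batch number embedded in a wah2 result file name.

    Assumption (inferred from this thread, not from project docs):
    the batch is the sixth underscore-separated field of the name.
    """
    fields = filename.split("_")
    if len(fields) < 6 or fields[0] != "wah2":
        raise ValueError(f"unrecognised file name: {filename!r}")
    return int(fields[5])


# File names quoted earlier in this thread.
EXAMPLES = [
    "wah2_sam25_s11y_199112_13_719_011508850_2_r2007255732_9.zip",
    "wah2_global_c3ow_200612_13_845_011915799_1_r64257149_3.zip",
    "wah2_eas50_a2fj_201312_24_849_011929536_0_r1609441922_6.zip",
]

if __name__ == "__main__":
    for name in EXAMPLES:
        print(batch_number(name), name)
```

Grouping stuck files by batch this way makes it easier to tell a moderator which batches are affected.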

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 61941 - Posted: 9 Jan 2020, 9:17:48 UTC

The people in Oxford are going to "look" at the server. Apparently upload7 is in Korea. They are also going to talk to the people there and hopefully between them they will get it going again.

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 61943 - Posted: 9 Jan 2020, 12:10:30 UTC - in response to Message 61941.  

Andy has had a look at the server and can't see anything wrong. He says it is possible it was down but the problem is now fixed. Are the uploads still failing?

Sir Antony Magnus
Joined: 8 Oct 05
Posts: 6
Credit: 593,818
RAC: 0
Message 62103 - Posted: 11 Feb 2020, 14:55:29 UTC

Same issue here with WAH2 units: constant project backoff. Oddly, it is only happening to one of my units.

2/11/2020 9:51:26 AM | climateprediction.net | Temporarily failed upload of wah2_eas50_30ub_209212_24_866_012006790_0_r566124486_5.zip: transient HTTP error
2/11/2020 9:51:45 AM | climateprediction.net | Temporarily failed upload of wah2_eas50_30ub_209212_24_866_012006790_0_r566124486_6.zip: transient HTTP error

That said, the problem only affects the unit above. I am currently running another unit as well and, AFAIK, it is uploading fine, so I can only assume a server malfunction. Please advise!

Regards,

Antony

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62104 - Posted: 11 Feb 2020, 16:04:35 UTC

Notification about the issue was in this thread: Server down for batches 849, 850. 851 and 866 this weekend.

Discussion about it in this thread: Not possible to upload results since 3 days

Olof Olsson
Joined: 31 Aug 04
Posts: 1
Credit: 2,336,127
RAC: 0
Message 62105 - Posted: 11 Feb 2020, 23:06:50 UTC

Hello!

I have noticed that a lot of uploads are hanging because of transient HTTP errors.

Here are some rows from my logs:
wah2_eas50_320h_209612_24_866_012008308_0_r1454405223_restart.zip: transient HTTP error
wah2_eas50_30b4_209012_24_866_012006099_1_r2090020072_restart.zip
wah2_eas50_30yj_209212_24_866_012006942_1_r262554203_20.zip: transient HTTP error
2020-02-11 13:42:17 | | Project communication failed: attempting access to reference site
2020-02-11 13:42:19 | | Internet access OK - project servers may be temporarily down.

In total there are 32 results to send, but I do not seem to be able to reach the servers.
If there is scheduled maintenance, it would be good if someone could post a notice at
https://www.cpdn.org/server_status.php

Best wishes
Olof Olsson

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 62109 - Posted: 12 Feb 2020, 7:31:33 UTC - in response to Message 62105.  

The status page gives the status of the servers at Oxford. As the server for these tasks is in South Korea it doesn't show anything. As noted in the links in the post before yours, the project has been asked to contact South Korea for an update on the server maintenance last weekend which was announced. Either something went wrong or it is taking longer than the weekend to sort out whatever needed sorting.

haschi
Joined: 3 Oct 19
Posts: 1
Credit: 261,368
RAC: 0
Message 62110 - Posted: 12 Feb 2020, 7:46:03 UTC - in response to Message 62109.  

I have got the same problem, but thanks for the update :)

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2183
Credit: 64,822,615
RAC: 5,275
Message 62111 - Posted: 12 Feb 2020, 14:37:28 UTC

They think they have solved the problem. Let us know if you still have problems uploading the EAS files.

Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,728,292
RAC: 3,041
Message 62115 - Posted: 12 Feb 2020, 19:18:40 UTC

The EAS version of this problem should now be over - files are uploading and models reporting.

Anubischick
Joined: 11 Dec 05
Posts: 5
Credit: 1,653,433
RAC: 0
Message 62169 - Posted: 29 Feb 2020, 0:11:43 UTC

Hi,

It's been a long time since I posted here but I'm having a similar problem. Normally I'd just wait a couple of days and then the uploads would tootle off but I just noticed that it's only these 3 that are 'stuck' since I just had another job transfer back for cpdn. So is there something up with these?

29/02/2020 11:05:17 AM | climateprediction.net | [fxd] starting upload, upload_offset -1
29/02/2020 11:05:17 AM | climateprediction.net | Started upload of wah2_sam50_n40x_201612_25_811_011829449_0_r1577236060_25.zip
29/02/2020 11:05:17 AM | climateprediction.net | [file_xfer] URL: http://upload3.cpdn.org/cgi-bin/file_upload_handler
29/02/2020 11:05:19 AM | climateprediction.net | [fxd] starting upload, upload_offset -1
29/02/2020 11:05:19 AM | climateprediction.net | Started upload of wah2_sam50_n40x_201612_25_811_011829449_0_r1577236060_out.zip
29/02/2020 11:05:19 AM | climateprediction.net | [file_xfer] URL: http://upload3.cpdn.org/cgi-bin/file_upload_handler
29/02/2020 11:05:22 AM | climateprediction.net | [file_xfer] http op done; retval -107 (connect() failed)
29/02/2020 11:05:22 AM | climateprediction.net | [file_xfer] file transfer status -107 (connect() failed)
29/02/2020 11:05:22 AM | climateprediction.net | Temporarily failed upload of wah2_sam50_n40x_201612_25_811_011829449_0_r1577236060_25.zip: connect() failed
29/02/2020 11:05:22 AM | climateprediction.net | Backing off 03:16:40 on upload of wah2_sam50_n40x_201612_25_811_011829449_0_r1577236060_25.zip
29/02/2020 11:05:22 AM | climateprediction.net | [fxd] starting upload, upload_offset -1
29/02/2020 11:05:22 AM | climateprediction.net | Started upload of wah2_sam50_n0fo_201412_24_808_011816086_2_r995725629_20.zip
29/02/2020 11:05:22 AM | climateprediction.net | [file_xfer] URL: http://upload3.cpdn.org/cgi-bin/file_upload_handler
29/02/2020 11:05:23 AM | | Project communication failed: attempting access to reference site
29/02/2020 11:05:24 AM | climateprediction.net | [file_xfer] http op done; retval -107 (connect() failed)
29/02/2020 11:05:24 AM | climateprediction.net | [file_xfer] file transfer status -107 (connect() failed)
29/02/2020 11:05:24 AM | climateprediction.net | Temporarily failed upload of wah2_sam50_n40x_201612_25_811_011829449_0_r1577236060_out.zip: connect() failed
29/02/2020 11:05:24 AM | climateprediction.net | Backing off 04:50:05 on upload of wah2_sam50_n40x_201612_25_811_011829449_0_r1577236060_out.zip
29/02/2020 11:05:25 AM | | Internet access OK - project servers may be temporarily down.
29/02/2020 11:05:27 AM | climateprediction.net | [file_xfer] http op done; retval -107 (connect() failed)
29/02/2020 11:05:27 AM | climateprediction.net | [file_xfer] file transfer status -107 (connect() failed)
29/02/2020 11:05:27 AM | climateprediction.net | Temporarily failed upload of wah2_sam50_n0fo_201412_24_808_011816086_2_r995725629_20.zip: connect() failed
29/02/2020 11:05:27 AM | climateprediction.net | [file_xfer] project-wide xfer delay for 851.290628 sec
29/02/2020 11:05:27 AM | climateprediction.net | Backing off 05:28:24 on upload of wah2_sam50_n0fo_201412_24_808_011816086_2_r995725629_20.zip
29/02/2020 11:05:28 AM | | Project communication failed: attempting access to reference site
29/02/2020 11:05:29 AM | | Internet access OK - project servers may be temporarily down.

Please let me know if I should be prodding something or just sitting back with a cuppa.

Cheers
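
With file transfer debug enabled, as in the log in this post, the upload server appears on the `[file_xfer] URL:` lines. A rough sketch that tallies which hosts are being contacted and why uploads fail; the patterns are written against the wording in this post only, and `summarise` and `SAMPLE_LOG` are hypothetical names for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Patterns based on the log wording shown in this post.
URL_RE = re.compile(r"\[file_xfer\] URL: (\S+)")
FAIL_RE = re.compile(r"Temporarily failed upload of (\S+): (.+)$")


def summarise(log_text):
    """Return (hosts, failures): Counters of upload hosts and failure reasons."""
    hosts = Counter()
    failures = Counter()
    for line in log_text.splitlines():
        url = URL_RE.search(line)
        if url:
            hosts[urlparse(url.group(1)).hostname] += 1
            continue
        failed = FAIL_RE.search(line)
        if failed:
            failures[failed.group(2).strip()] += 1
    return hosts, failures


# Three lines copied from the log in this post.
SAMPLE_LOG = """\
29/02/2020 11:05:17 AM | climateprediction.net | Started upload of wah2_sam50_n40x_201612_25_811_011829449_0_r1577236060_25.zip
29/02/2020 11:05:17 AM | climateprediction.net | [file_xfer] URL: http://upload3.cpdn.org/cgi-bin/file_upload_handler
29/02/2020 11:05:22 AM | climateprediction.net | Temporarily failed upload of wah2_sam50_n40x_201612_25_811_011829449_0_r1577236060_25.zip: connect() failed
"""

if __name__ == "__main__":
    hosts, failures = summarise(SAMPLE_LOG)
    print("hosts:", dict(hosts))
    print("failures:", dict(failures))
```

If every failure maps to a single host, that host name is the most useful thing to include when reporting the problem.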

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2183
Credit: 64,822,615
RAC: 5,275
Message 62170 - Posted: 29 Feb 2020, 2:22:07 UTC - in response to Message 62169.  

It may be that the upload3 server is down, or not able to be accessed for some other reason. Perhaps your other tasks were sent to another upload server and that's why they went up okay. I'll e-mail the IT person who oversees server issues and see if he can resolve this. But it is the weekend so it may not be solved immediately.
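
To check from your own machine whether an upload server is reachable at all, a plain TCP connection test is enough. This is only a sketch using Python's standard `socket` module; `can_connect` is a hypothetical helper, and port 80 is assumed because the upload URL in the log above uses `http://`.

```python
import socket


def can_connect(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

Calling `can_connect("upload3.cpdn.org")` and getting `False` while other sites connect normally would be consistent with the `connect() failed` lines in the log, but it does not say why the server is unreachable.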

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4529
Credit: 18,661,594
RAC: 14,529
Message 62173 - Posted: 29 Feb 2020, 7:53:36 UTC

Just had confirmation that the physical machine that hosts upload3 is down. Most batches will be going to different servers and won't be affected and as George said, this may not be resolved till after the weekend.