Completed task fails to upload several times over last few days

candido

Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 64069 - Posted: 20 Jun 2021, 12:52:46 UTC
Last modified: 20 Jun 2021, 12:57:33 UTC

This task is not uploading (https://www.cpdn.org/workunit.php?wuid=12089722).
Does anyone have any idea what is happening?

Sun 20 Jun 2021 01:36:57 PM WEST | climateprediction.net | Started upload of hadsm4_a0ed_201310_6_911_012089722_2_r478262704_4.zip
Sun 20 Jun 2021 01:37:00 PM WEST | | Project communication failed: attempting access to reference site
Sun 20 Jun 2021 01:37:00 PM WEST | climateprediction.net | Temporarily failed upload of hadsm4_a0ed_201310_6_911_012089722_2_r478262704_4.zip: transient HTTP error
Sun 20 Jun 2021 01:37:00 PM WEST | climateprediction.net | Backing off 05:41:16 on upload of hadsm4_a0ed_201310_6_911_012089722_2_r478262704_4.zip
Sun 20 Jun 2021 01:37:02 PM WEST | | Internet access OK - project servers may be temporarily down.
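
For anyone who wants to keep track of how long the client is waiting before the next retry, here is a minimal Python sketch. It assumes the event log lines keep the `timestamp | project | message` layout shown above and that back-off messages read "Backing off HH:MM:SS on upload of <file>". The name `parse_backoffs` is hypothetical and not part of BOINC.

```python
import re
from datetime import timedelta

# Message text as it appears in the log excerpt above (assumed stable).
BACKOFF_RE = re.compile(r"^Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)$")

SAMPLE_LOG = """\
Sun 20 Jun 2021 01:36:57 PM WEST | climateprediction.net | Started upload of hadsm4_a0ed_201310_6_911_012089722_2_r478262704_4.zip
Sun 20 Jun 2021 01:37:00 PM WEST | | Project communication failed: attempting access to reference site
Sun 20 Jun 2021 01:37:00 PM WEST | climateprediction.net | Backing off 05:41:16 on upload of hadsm4_a0ed_201310_6_911_012089722_2_r478262704_4.zip
"""


def parse_backoffs(log_text):
    """Return a list of (filename, timedelta) pairs, one per back-off line."""
    backoffs = []
    for line in log_text.splitlines():
        # Split into timestamp, project and message; the project may be empty.
        parts = [part.strip() for part in line.split("|", 2)]
        if len(parts) != 3:
            continue  # not an event-log line in the assumed layout
        message = parts[2]
        match = BACKOFF_RE.match(message)
        if match:
            hours, minutes, seconds, filename = match.groups()
            delay = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=int(seconds)
            )
            backoffs.append((filename, delay))
    return backoffs


if __name__ == "__main__":
    for filename, delay in parse_backoffs(SAMPLE_LOG):
        print(f"{filename}: next retry in {delay}")
```

Feeding it a saved copy of the event log lists each stuck file with its current back-off, which makes it easier to see whether the delays keep growing.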

Many thanks
candido
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64072 - Posted: 20 Jun 2021, 14:28:58 UTC - in response to Message 64069.  

email sent.
candido

Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 64073 - Posted: 20 Jun 2021, 18:44:20 UTC - in response to Message 64072.  

Thanks Les Bayliss
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4535
Credit: 18,962,600
RAC: 21,639
Message 64074 - Posted: 22 Jun 2021, 18:58:45 UTC

Andy says there are no problems showing on the server. Are you still having problems? I see all 6 trickles have uploaded, which means you will get your credit. Did zips 5 and 6 upload, or do they have the same problem as 4?
candido

Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 64075 - Posted: 22 Jun 2021, 20:16:04 UTC - in response to Message 64074.  
Last modified: 22 Jun 2021, 20:53:39 UTC

Thanks for your reply.
I just found out that a second WU that finished today was also not uploading.
I decided to suspend all tasks and restart the machine, hoping it wouldn't break any of the running WUs.
Fortunately it didn't, and it solved both upload problems.
Thanks again
Candido
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64076 - Posted: 22 Jun 2021, 21:01:11 UTC - in response to Message 64075.  

Good old Reboot - it fixes lots of things.
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64077 - Posted: 23 Jun 2021, 5:45:30 UTC - in response to Message 64076.  

Good old Reboot - it fixes lots of things.

_____
Yes, but these WUs hate re-boots.
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64078 - Posted: 23 Jun 2021, 8:03:03 UTC - in response to Message 64077.  

Yes, but these WUs hate re-boots.


I wonder if this is still true. I just had to replace the UPS on my machine, and that required new software to interface to it. While configuring that, I accidentally powered down the machine by turning off the power to it. I did not even press the stop button on the machine, much less do the normal shutdown. Four CPDN N215 models were running at the time. When the machine came back up, those models resumed without complaint.
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4535
Credit: 18,962,600
RAC: 21,639
Message 64079 - Posted: 23 Jun 2021, 9:17:31 UTC - in response to Message 64078.  

I wonder if this is still true.


I certainly don't lose as many as I used to. I sometimes get away with no failures on a reboot with 8 tasks running, but it is still an issue. My anecdotal perception is that reboots involving a kernel upgrade are more likely to produce a failure, but I haven't recorded this, so it may not make any difference at all. I do get the odd failure on reboots with CPDN and don't remember any with other projects. However, the longest tasks I run from other projects last only a couple of days, and I mostly run them only when there is no work from CPDN, so the chances of one being in progress during a reboot are much lower.
KAMasud

Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64080 - Posted: 24 Jun 2021, 4:39:10 UTC - in response to Message 64079.  

Neither do I lose as many as I used to, but I still do now and then. From the current lot, three: 1) The computer did an auto-reboot after the update (Peter. I had set it to update after one month). 2) This one I lost due to a power failure. 3) The WU was feeling tetchy. Is it possible that we are getting used to the shenanigans of these WUs? Three WUs out of thirty-six is not bad.
candido

Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 64134 - Posted: 6 Jul 2021, 20:03:26 UTC

I have another WU not uploading.
I tried the "old reboot" fix a few times and it didn't work.
This is the WU:
Tue 06 Jul 2021 08:58:04 PM WEST | climateprediction.net | Started upload of hadsm4_a1cu_201310_6_910_012088963_0_r2057498656_4.zip
Tue 06 Jul 2021 08:58:08 PM WEST | | Project communication failed: attempting access to reference site
Tue 06 Jul 2021 08:58:08 PM WEST | climateprediction.net | Temporarily failed upload of hadsm4_a1cu_201310_6_910_012088963_0_r2057498656_4.zip: transient HTTP error
Tue 06 Jul 2021 08:58:08 PM WEST | climateprediction.net | Backing off 03:10:45 on upload of hadsm4_a1cu_201310_6_910_012088963_0_r2057498656_4.zip
Tue 06 Jul 2021 08:58:09 PM WEST | | Internet access OK - project servers may be temporarily down.

Any ideas?
Thanks
candido
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64135 - Posted: 6 Jul 2021, 22:10:33 UTC - in response to Message 64134.  

I'll send an email.
David Wallom
Volunteer moderator
Project administrator

Joined: 26 Oct 11
Posts: 15
Credit: 3,275,889
RAC: 0
Message 64136 - Posted: 6 Jul 2021, 22:53:41 UTC - in response to Message 64135.  

Hi,

Indeed very odd, as all of the other uploads for that WU are sitting waiting in the in_progress folder.

Can you forward that zip to me directly by email please? david.wallom at oerc.ox.ac.uk

regards

David
candido

Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 64143 - Posted: 8 Jul 2021, 15:44:04 UTC - in response to Message 64136.  

David, I have just sent the file by email to that address.
Regards
Candido
candido

Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 64144 - Posted: 8 Jul 2021, 15:55:17 UTC - in response to Message 64136.  

What should I do now? Abort the WU?
It's still trying to upload...
Thanks
candido
