Thread 'Upload failures'

Message boards : Number crunching : Upload failures

[P3D] Crashtest

Send message
Joined: 2 Apr 05
Posts: 16
Credit: 19,179,312
RAC: 13,386
Message 60533 - Posted: 1 Jul 2019, 16:00:24 UTC - in response to Message 60532.  

So CPDN is strange:

01.07.2019 17:56:08 | climateprediction.net | Temporarily failed upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_21.zip: transient HTTP error
01.07.2019 17:56:08 | climateprediction.net | Backing off 04:31:03 on upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_21.zip
01.07.2019 17:56:08 | climateprediction.net | Started upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_23.zip
01.07.2019 17:56:09 | | Internet access OK - project servers may be temporarily down.
01.07.2019 17:56:28 | climateprediction.net | Temporarily failed upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_23.zip: transient HTTP error
01.07.2019 17:56:28 | climateprediction.net | Backing off 04:44:09 on upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_23.zip
01.07.2019 17:56:28 | climateprediction.net | Started upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_24.zip
01.07.2019 17:56:29 | | Project communication failed: attempting access to reference site
01.07.2019 17:56:30 | | Internet access OK - project servers may be temporarily down.
01.07.2019 17:56:46 | climateprediction.net | Finished upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_22.zip
01.07.2019 17:56:46 | climateprediction.net | Started upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_restart.zip
01.07.2019 17:56:50 | | Project communication failed: attempting access to reference site
01.07.2019 17:56:50 | climateprediction.net | Temporarily failed upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_24.zip: connect() failed
01.07.2019 17:56:50 | climateprediction.net | Backing off 03:47:56 on upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_24.zip
01.07.2019 17:56:50 | climateprediction.net | Started upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_out.zip
01.07.2019 17:56:51 | | Internet access OK - project servers may be temporarily down.
01.07.2019 17:57:36 | climateprediction.net | Finished upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_restart.zip
01.07.2019 17:57:36 | climateprediction.net | Started upload of wah2_sam50_n61i_201612_25_822_011883535_0_r516235200_2.zip
01.07.2019 17:58:33 | climateprediction.net | Finished upload of wah2_sam50_n61i_201612_25_822_011883535_0_r516235200_2.zip
01.07.2019 17:58:33 | climateprediction.net | Started upload of wah2_sam50_n61i_201612_25_822_011883535_0_r516235200_3.zip
01.07.2019 17:58:41 | | Project communication failed: attempting access to reference site
01.07.2019 17:58:41 | climateprediction.net | Temporarily failed upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_out.zip: transient HTTP error
01.07.2019 17:58:41 | climateprediction.net | Backing off 04:51:54 on upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_out.zip
01.07.2019 17:58:41 | climateprediction.net | Started upload of wah2_sam50_n61i_201612_25_822_011883535_0_r516235200_4.zip
01.07.2019 17:58:42 | | Internet access OK - project servers may be temporarily down.


Some uploads at 1400 KBps and some transient HTTP errors !?! WTF
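For anyone wanting to see the success/failure mix in a log like the one above, here is a minimal Python sketch. It assumes the two message shapes seen in this thread ("Finished upload of ..." and "Temporarily failed upload of ...: reason"). The function name `tally_uploads` and the regular expression are made up for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# Four lines copied from the log above.
LOG = """\
01.07.2019 17:56:08 | climateprediction.net | Temporarily failed upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_21.zip: transient HTTP error
01.07.2019 17:56:46 | climateprediction.net | Finished upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_22.zip
01.07.2019 17:56:50 | climateprediction.net | Temporarily failed upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_24.zip: connect() failed
01.07.2019 17:57:36 | climateprediction.net | Finished upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_restart.zip
"""

# Matches only the two outcome messages; every other log line is ignored.
OUTCOME = re.compile(
    r"\| (?P<status>Finished|Temporarily failed) upload of "
    r"(?P<file>\S+?\.zip)(?:: (?P<reason>.+))?$"
)

def tally_uploads(log_text):
    """Count finished uploads and group failures by their stated reason."""
    counts = Counter()
    for line in log_text.splitlines():
        m = OUTCOME.search(line)
        if not m:
            continue
        if m.group("status") == "Finished":
            counts["finished"] += 1
        else:
            counts["failed: " + (m.group("reason") or "unknown")] += 1
    return counts

for outcome, n in tally_uploads(LOG).items():
    print(outcome, n)
```

Pasting a whole event log into `LOG` should give a rough picture of how often uploads get through versus how often they hit a transient error.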
ID: 60533 · Report as offensive     Reply Quote
ProfileDave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4540
Credit: 19,019,755
RAC: 20,934
Message 60534 - Posted: 1 Jul 2019, 16:56:03 UTC

Some uploads at 1400 KBps and some transient HTTP errors !?! WTF


With thousands of computers hammering the servers this is to be expected. The server will be handling the maximum number of simultaneous transfers it can and can only accept a new one when one is finished. I am only enabling internet access when I have at least two transfers not affected by this server and for an hour in the middle of the night. I expect by tomorrow evening, things may have calmed down. (Assuming the server hasn't filled up again!)
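A toy Python sketch of the point above, under invented numbers: a hypothetical server with a fixed number of transfer slots accepts only that many uploads at once, and every other attempt is turned away, which the client would report as a transient error. The figures of 1000 clients and 50 slots are assumptions for illustration, not CPDN's real values.

```python
import random

def simulate_round(clients, slots, seed=0):
    """One round in which every client tries to start an upload at once.

    The hypothetical server accepts at most `slots` transfers; the rest
    are rejected and would have to back off and retry later.
    """
    rng = random.Random(seed)
    order = list(range(clients))
    rng.shuffle(order)            # arrival order is arbitrary
    accepted = order[:slots]      # these get a slot and run at full speed
    rejected = order[slots:]      # these see a failed upload
    return accepted, rejected

accepted, rejected = simulate_round(clients=1000, slots=50)
print(len(accepted), "accepted;", len(rejected), "turned away")
```

In this picture the lucky uploads run fast while the rest fail outright, which would match seeing 1400 KBps transfers and transient HTTP errors side by side.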
ID: 60534 · Report as offensive     Reply Quote
[P3D] Crashtest

Send message
Joined: 2 Apr 05
Posts: 16
Credit: 19,179,312
RAC: 13,386
Message 60535 - Posted: 1 Jul 2019, 17:03:13 UTC - in response to Message 60534.  

This is nothing special! World Community Grid gets hammered with several TB every day, with more computers than CPDN... even more during Pentathlon
ID: 60535 · Report as offensive     Reply Quote
ProfileDave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4540
Credit: 19,019,755
RAC: 20,934
Message 60536 - Posted: 1 Jul 2019, 18:42:27 UTC - in response to Message 60535.  
Last modified: 1 Jul 2019, 18:52:12 UTC

This is nothing special! World Community Grid gets hammered with several TB every day, with more computers than CPDN... even more during Pentathlon


But much smaller uploads. Most for WCG are less than 1MB. Most files in this backlog are in the region of 90MB, and I wouldn't be surprised if the backlog is over 100 TB. I am sure that once everything is sorted to move the data on in a timely manner, the data centre being used will cope, but it will be a while till the residue of the problem is cleared.

I can say that my very low number of uploads have cleared.

Edit: I see that the WCG climate models in beta have 128MB uploads. They may be fine. I don't know what their budget for infrastructure is compared with CPDN.
ID: 60536 · Report as offensive     Reply Quote
Speedy

Send message
Joined: 20 Jul 05
Posts: 25
Credit: 414,873
RAC: 406
Message 60538 - Posted: 2 Jul 2019, 1:49:34 UTC

With the backlog being 100 TB for argument's sake, it will take 16.6667 days to clear at 6 TB a day, so it will be done by 20th July. I just did the maths from the numbers in the previous post.
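The arithmetic can be checked with a few lines of Python. Both inputs are the assumptions used in this post (a 100 TB backlog and a 6 TB per day drain rate), not measured values, and the start time is simply when this message was posted.

```python
from datetime import datetime, timedelta

def days_to_clear(backlog_tb, rate_tb_per_day):
    """Days needed to drain a backlog at a constant daily rate."""
    return backlog_tb / rate_tb_per_day

days = days_to_clear(100.0, 6.0)           # assumed backlog and rate
posted = datetime(2019, 7, 2, 1, 49, 34)   # time of this post (UTC)
finish = posted + timedelta(days=days)

print(f"{days:.4f} days, finishing around {finish:%d %B %Y}")
```

With these inputs the finish lands on 18 July 2019, so "by 20th July" leaves a little slack. A constant drain rate is of course the big assumption here.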
ID: 60538 · Report as offensive     Reply Quote
Eirik Redd

Send message
Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 60539 - Posted: 2 Jul 2019, 2:34:51 UTC
Last modified: 2 Jul 2019, 2:38:41 UTC

Figuring "how long to clear uploads"
Right now one of my 3 fast boxes (Ryzen 2 2700X) is the only one I'm letting upload. It has about 40 safr50 uploads of 92 MB each and about 160 sam50 uploads of 76 MB each queued. It has been running all through this recent incident, but was disconnected from the internet for part of that.
It uploaded about 80 of various sizes in the last 3 hours. So at least 12 hours to clear its upload queue.
Two more fast boxes will take another 30 hours. The old slow ones, not much worry

So I get a significantly sooner time to catch up than Speedy's 16.7 days. Nearer 4 days at a guess. But we'll all see how it goes.
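A naive version of this per-machine estimate, as a Python sketch. It uses only the counts quoted above (about 40 + 160 files queued, about 80 uploaded in 3 hours) and assumes the observed rate holds steady with no failed attempts or back-off delays. The function name is made up for illustration.

```python
def hours_to_clear(queued_files, files_done, hours_observed):
    """Hours to drain a queue if the observed upload rate stays constant."""
    return queued_files * hours_observed / files_done

queued = 40 + 160   # safr50 + sam50 uploads waiting on this box
print(hours_to_clear(queued, files_done=80, hours_observed=3))
```

The file-count estimate comes out at 7.5 hours, so the "at least 12 hours" figure is the more cautious one, leaving room for retries and the larger files.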

Remember one of Murphy's mottoes: "Constants aren't, variables won't"
ID: 60539 · Report as offensive     Reply Quote
[P3D] Crashtest

Send message
Joined: 2 Apr 05
Posts: 16
Credit: 19,179,312
RAC: 13,386
Message 60540 - Posted: 2 Jul 2019, 5:49:16 UTC

At the moment my computers have more than 3000 files (290GB) waiting for upload, 91 done !?!
ID: 60540 · Report as offensive     Reply Quote
Mephist0

Send message
Joined: 21 Feb 08
Posts: 47
Credit: 7,929,915
RAC: 0
Message 60542 - Posted: 2 Jul 2019, 7:22:15 UTC
Last modified: 2 Jul 2019, 7:24:21 UTC

I have 71 files to upload. I will try to upload a few just to see if my transfer is now working, since I had issues with transient errors even before the space issue began.

I have configured it to upload just one file at a time, with a maximum upload speed of 1 MiB/s. It seems my upload sat at 0 for 3.5 minutes and then started to upload.

1st upload: 74MB file, Transient HTTPS error :(
2nd upload: 43MB file (sam50) Transient HTTP error...
3rd upload: 43MB file (anz50) Transient HTTP error...

So it seems I will not be able to get Climate Prediction to work with my proxy setup. Other projects work fine, so I don't get why Climate Prediction is not working.

I also get the following error for some reason...
2019-07-02 09:23:39 | | Project communication failed: attempting access to reference site
ID: 60542 · Report as offensive     Reply Quote
ProfileDave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4540
Credit: 19,019,755
RAC: 20,934
Message 60543 - Posted: 2 Jul 2019, 7:51:04 UTC - in response to Message 60542.  

Given that some have so much data to transfer that it will take days, these errors can be expected to continue on and off for a while longer.
ID: 60543 · Report as offensive     Reply Quote
Mephist0

Send message
Joined: 21 Feb 08
Posts: 47
Credit: 7,929,915
RAC: 0
Message 60544 - Posted: 2 Jul 2019, 9:34:27 UTC

OK. I will try tomorrow and the next day. Then I'm on vacation for 3 weeks, then another try ;)
ID: 60544 · Report as offensive     Reply Quote
blyons123

Send message
Joined: 21 Sep 15
Posts: 8
Credit: 4,854,775
RAC: 0
Message 60545 - Posted: 2 Jul 2019, 9:47:05 UTC

Hello,
I haven't been able to upload since I started the project again 2 weeks ago. The log shows the lines below each time. World Community Grid uploads have no problem.

7/2/2019 5:31:24 PM | climateprediction.net | Temporarily failed upload of wah2_safr50_n1ej_201512_13_819_011864601_0_r452315379_5.zip: transient HTTP error
7/2/2019 5:31:24 PM | climateprediction.net | Backing off 03:05:52 on upload of wah2_safr50_n1ej_201512_13_819_011864601_0_r452315379_5.zip
7/2/2019 5:31:26 PM | | Internet access OK - project servers may be temporarily down.
ID: 60545 · Report as offensive     Reply Quote
ProfileDave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4540
Credit: 19,019,755
RAC: 20,934
Message 60546 - Posted: 2 Jul 2019, 9:57:15 UTC - in response to Message 60545.  

I haven't been able to upload since I started the project again 2 weeks ago. The log shows the lines below each time. World Community Grid uploads have no problem.


One of the servers couldn't offload data as fast as it was coming in and filled up. It is now taking zips again, but with (at a guess) over 100TB of zips trying to get through, it is going to be a while till the pressure eases off. I wouldn't be surprised if it takes another couple of days or more before the errors stop completely.
ID: 60546 · Report as offensive     Reply Quote
Dave Roberts

Send message
Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 60547 - Posted: 2 Jul 2019, 10:08:54 UTC - in response to Message 60535.  
Last modified: 2 Jul 2019, 10:11:32 UTC

This is nothing special! World Community Grid gets hammered with several TB every day, with more computers than CPDN... even more during Pentathlon


I'm not sure that this is comparable.
WCG isn't a single project and uses BOINC to link to projects just as one can do via BOINC directly. So the total upload to WCG is spread over all those projects that are currently in operation. I would assume uploads to a WCG project go directly to the computers run by the project, just as in the case of CPDN, not to some mega computing site.
The 128 MB files that Dave mentions appear to have been for a one-off completed project, not at all similar to the continuous (more or less) streams of CPDN research projects.
ID: 60547 · Report as offensive     Reply Quote
Jim1348

Send message
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 60548 - Posted: 2 Jul 2019, 10:40:45 UTC - in response to Message 60547.  

WCG is a huge operation. IBM folded their dedicated WCG servers into the Cloud (whatever that is) a couple of years ago, and I think has server capacity all over the place. The projects are not remotely similar.
ID: 60548 · Report as offensive     Reply Quote
Dave Roberts

Send message
Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 60549 - Posted: 2 Jul 2019, 10:58:37 UTC - in response to Message 60548.  

Hi Jim, what sort of advantage does WCG offer volunteer computing over connecting via BOINC directly, since one gets connected to BOINC anyway when joining WCG?
ID: 60549 · Report as offensive     Reply Quote
Jim1348

Send message
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 60554 - Posted: 2 Jul 2019, 13:12:14 UTC - in response to Message 60549.  

Hi Jim, what sort of advantage does WCG offer volunteer computing over connecting via BOINC directly, since one gets connected to BOINC anyway when joining WCG?

WCG selects the projects by their own team of scientific experts, so you get some level of quality control. And they thoroughly test out the scientific applications before releasing them, and work with the scientists to package up their work into usable chunks, relieving the scientists of that burden.

Most importantly, they run the data center (which has large upload/download bandwidth). They are world experts at that, and it is very (very) reliable and fault tolerant. A number of projects are not, through no fault of their own. But they are educational institutions, not commercial cloud operations.
ID: 60554 · Report as offensive     Reply Quote
Dave Roberts

Send message
Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 60560 - Posted: 2 Jul 2019, 18:02:11 UTC - in response to Message 60554.  

Thanks Jim. So WCG is an IBM CLOUD based operation using BOINC in the same way as CPDN on behalf of a group of research projects which come & go. As you say, a different level of resources.
ID: 60560 · Report as offensive     Reply Quote
Jim1348

Send message
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 60561 - Posted: 2 Jul 2019, 19:20:18 UTC - in response to Message 60560.  
Last modified: 2 Jul 2019, 19:20:37 UTC

Yes, IBM donates the servers, IP links and their IT experts as a public service. It is quite admirable. But you do lose the smaller projects that may be scientifically interesting, since WCG insists on a minimum amount of work in order to make it worth their effort. They have to thoroughly vet the applications to make sure there are no security problems, for example.

In fact, they release their own version of BOINC, but it is usually several releases behind, since they have to test it out for security flaws too. I always use the latest standard version of BOINC.
ID: 60561 · Report as offensive     Reply Quote
Mephist0

Send message
Joined: 21 Feb 08
Posts: 47
Credit: 7,929,915
RAC: 0
Message 60564 - Posted: 3 Jul 2019, 8:15:41 UTC

Has anyone been able to upload anything to Jasmin recently?

I have some anz50 tasks that are not using the Jasmin server, I guess. Any idea how I could get them to upload instead? It seems it tries the same files over and over again.
ID: 60564 · Report as offensive     Reply Quote
Wilgard

Send message
Joined: 30 Mar 10
Posts: 12
Credit: 2,609,109
RAC: 87
Message 60565 - Posted: 3 Jul 2019, 8:40:05 UTC - in response to Message 60564.  

Hi,
On Monday 1st July I had 10 work units pending upload and now I have only 4.
After checking BOINC's logs, 6 of them succeeded yesterday at 1 am.
So it seems that "sometimes" it works, even if most of the time it does not.
So being patient is unfortunately the only solution.
ID: 60565 · Report as offensive     Reply Quote