Message boards : Number crunching : Upload failures
Author | Message |
---|---|
Send message Joined: 2 Apr 05 Posts: 16 Credit: 19,179,312 RAC: 13,386 |
So CPDN is strange:
01.07.2019 17:56:08 | climateprediction.net | Temporarily failed upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_21.zip: transient HTTP error
01.07.2019 17:56:08 | climateprediction.net | Backing off 04:31:03 on upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_21.zip
01.07.2019 17:56:08 | climateprediction.net | Started upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_23.zip
01.07.2019 17:56:09 | | Internet access OK - project servers may be temporarily down.
01.07.2019 17:56:28 | climateprediction.net | Temporarily failed upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_23.zip: transient HTTP error
01.07.2019 17:56:28 | climateprediction.net | Backing off 04:44:09 on upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_23.zip
01.07.2019 17:56:28 | climateprediction.net | Started upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_24.zip
01.07.2019 17:56:29 | | Project communication failed: attempting access to reference site
01.07.2019 17:56:30 | | Internet access OK - project servers may be temporarily down.
01.07.2019 17:56:46 | climateprediction.net | Finished upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_22.zip
01.07.2019 17:56:46 | climateprediction.net | Started upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_restart.zip
01.07.2019 17:56:50 | | Project communication failed: attempting access to reference site
01.07.2019 17:56:50 | climateprediction.net | Temporarily failed upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_24.zip: connect() failed
01.07.2019 17:56:50 | climateprediction.net | Backing off 03:47:56 on upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_24.zip
01.07.2019 17:56:50 | climateprediction.net | Started upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_out.zip
01.07.2019 17:56:51 | | Internet access OK - project servers may be temporarily down.
01.07.2019 17:57:36 | climateprediction.net | Finished upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_restart.zip
01.07.2019 17:57:36 | climateprediction.net | Started upload of wah2_sam50_n61i_201612_25_822_011883535_0_r516235200_2.zip
01.07.2019 17:58:33 | climateprediction.net | Finished upload of wah2_sam50_n61i_201612_25_822_011883535_0_r516235200_2.zip
01.07.2019 17:58:33 | climateprediction.net | Started upload of wah2_sam50_n61i_201612_25_822_011883535_0_r516235200_3.zip
01.07.2019 17:58:41 | | Project communication failed: attempting access to reference site
01.07.2019 17:58:41 | climateprediction.net | Temporarily failed upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_out.zip: transient HTTP error
01.07.2019 17:58:41 | climateprediction.net | Backing off 04:51:54 on upload of wah2_sam50_n1zc_199012_24_822_011878923_0_r472347994_out.zip
01.07.2019 17:58:41 | climateprediction.net | Started upload of wah2_sam50_n61i_201612_25_822_011883535_0_r516235200_4.zip
01.07.2019 17:58:42 | | Internet access OK - project servers may be temporarily down.
Some uploads at 1400 KBps and some transient HTTP errors !?! WTF |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,019,755 RAC: 20,934 |
> Some uploads at 1400 KBps and some transient HTTP errors !?! WTF

With thousands of computers hammering the servers this is to be expected. The server will be handling the maximum number of simultaneous transfers it can and can only accept a new one when one is finished. I am only enabling internet access when I have at least two transfers not affected by this server and for an hour in the middle of the night. I expect by tomorrow evening, things may have calmed down. (Assuming the server hasn't filled up again!) |
Send message Joined: 2 Apr 05 Posts: 16 Credit: 19,179,312 RAC: 13,386 |
This is nothing special! World Community Grid get hammered with several TB every day with more computers than CPDN... even more during Pentathlon |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,019,755 RAC: 20,934 |
> This is nothing special! World Community Grid get hammered with several TB every day with more computers than CPDN... even more during Pentathlon

But much smaller uploads. Most for WCG are less than 1MB. Most of the files in this backlog are in the region of 90MB. I wouldn't be surprised if the backlog is over 100 TB. I am sure that once everything is sorted to move the data on in a timely manner the data centre being used will cope, but it will be a while till the residue of the problem is cleared. I can say that my very low number of uploads has cleared.
Edit: I see that the WCG climate models in beta have 128MB uploads. They may be fine. I don't know what their budget for infrastructure is compared with CPDN. |
Send message Joined: 20 Jul 05 Posts: 25 Credit: 414,873 RAC: 406 |
Taking the backlog to be 100 TB for argument's sake, it will take 16.6667 days to clear at 6 TB a day, so it should be done by 20th July. I just did the maths from the numbers in the previous post |
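The estimate above is a single division. A minimal sketch in Python, assuming the 100 TB backlog and 6 TB per day figures quoted in the post and a constant drain rate (the function name `days_to_clear` is made up for illustration):

```python
def days_to_clear(backlog_tb: float, rate_tb_per_day: float) -> float:
    """Days needed to drain a backlog at a constant daily rate."""
    if rate_tb_per_day <= 0:
        raise ValueError("rate must be positive")
    return backlog_tb / rate_tb_per_day


# Figures quoted in the post; both are guesses, not measurements.
estimate = days_to_clear(100, 6)
print(f"{estimate:.4f} days")  # 16.6667 days
```

The result only holds if no new uploads join the backlog and the server keeps accepting data at the same pace throughout.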
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Figuring "how long to clear uploads":
One of my 3 fast boxes (Ryzen 2 2700X) is the only one I'm letting upload at the moment. It has about 40 92 MB safr50 uploads and about 160 76 MB sam50 uploads queued. It has been running all through this recent incident, but was disconnected from the internet for part of that. It uploaded about 80 files of various sizes in the last 3 hours. So at least 12 hours to clear its upload queue. Two more fast boxes will take another 30 hours. The old slow ones, not much worry. So I get a significantly sooner time to catch up than Speedy's 15.7 days. Nearer 4 days at a guess. But we'll all see how it goes. Remember one of Murphy's mottoes: "Constants aren't, variables won't." |
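A per-machine estimate like this one can be sketched from the observed throughput. The Python below is only an illustration, assuming the rate of about 80 files per 3 hours stays constant and using the queue sizes given in the post; the function name `hours_to_drain` is hypothetical.

```python
def hours_to_drain(queued_files: int, files_done: int, window_hours: float) -> float:
    """Hours to upload the queue if the observed files-per-hour rate holds."""
    if files_done <= 0:
        raise ValueError("need at least one completed upload to estimate a rate")
    return queued_files * window_hours / files_done


queued = 40 + 160                 # safr50 + sam50 files waiting
queued_mb = 40 * 92 + 160 * 76    # approximate queue size in MB
print(queued_mb)                          # 15840
print(hours_to_drain(queued, 80, 3.0))    # 7.5
```

This constant-rate figure is only a lower bound; failed attempts and back-off intervals push the real time higher.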
Send message Joined: 2 Apr 05 Posts: 16 Credit: 19,179,312 RAC: 13,386 |
At the moment my computers have more than 3000 files (290GB) waiting for upload, 91 done !?! |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
I have 71 files to upload. I will try to upload a few just to see if my transfer is now working, since I had issues with transient errors even before the space issue began. I have configured it to upload just one file at a time, with a maximum upload speed of 1 MiB/s. My upload sat at 0 for 3.5 minutes, then it started to upload.
1st upload: 74MB file, Transient HTTPS error :(
2nd upload: 43MB file (sam50) Transient HTTP error...
3rd upload: 43MB file (anz50) Transient HTTP error...
So it seems I will not be able to get Climate Prediction to work with my proxy setup. Other projects work fine, so I don't get why Climate Prediction is not working. I also get the following error for some reason:
2019-07-02 09:23:39 | | Project communication failed: attempting access to reference site |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,019,755 RAC: 20,934 |
Given that some have so much data to transfer that it will take days, these errors can be expected to continue on and off for a while longer. |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
OK. I will try tomorrow and the next day. Then I'm on vacation for 3 weeks, then another try ;) |
Send message Joined: 21 Sep 15 Posts: 8 Credit: 4,854,775 RAC: 0 |
Hello, I haven't been able to upload since I started the project again 2 weeks ago. The log shows the lines below each time. World Community uploads have no problem.
7/2/2019 5:31:24 PM | climateprediction.net | Temporarily failed upload of wah2_safr50_n1ej_201512_13_819_011864601_0_r452315379_5.zip: transient HTTP error
7/2/2019 5:31:24 PM | climateprediction.net | Backing off 03:05:52 on upload of wah2_safr50_n1ej_201512_13_819_011864601_0_r452315379_5.zip
7/2/2019 5:31:26 PM | | Internet access OK - project servers may be temporarily down. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,019,755 RAC: 20,934 |
> I haven't been able to upload since I started project again 2 weeks ago. Log shows below each time. World Community uploads have no problem.

One of the servers couldn't offload data as fast as it was coming in and filled up. It is now taking zips again, but with, I think, over 100TB of zips (at a guess) trying to get through, it is going to be a while till the pressure eases off. I wouldn't be surprised if it takes another couple of days or more before the errors stop completely. |
Send message Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
> This is nothing special! World Community Grid get hammered with several TB every day with more computers than CPDN... even more during Pentathlon

I'm not sure that this is comparable. WCG isn't a single project and uses BOINC to link to projects just as one can do via BOINC directly. So the total upload to WCG is spread over all those projects that are currently in operation. I would assume uploads to a WCG project go directly to the computers run by the project, just as in the case of CPDN, not to some mega computing site. The 128 MB files that Dave mentions appear to have been for a one-off completed project, not at all similar to the continuous (more or less) streams of CPDN research projects. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
WCG is a huge operation. IBM folded their dedicated WCG servers into the Cloud (whatever that is) a couple of years ago, and I think has server capacity all over the place. The projects are not remotely similar. |
Send message Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
Hi Jim, what sort of advantage does WCG offer volunteer computing over connecting via BOINC directly, since one gets connected to BOINC anyway when joining WCG? |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
> Hi Jim, What sort of advantage does WCG offer volunteer computing over connecting via BOINC directly, since one gets connected to BOINC anyway when joining WCG.

WCG selects the projects by their own team of scientific experts, so you get some level of quality control. And they thoroughly test out the scientific applications before releasing them, and work with the scientists to package up their work into usable chunks, relieving the scientists of that burden. Most importantly, they run the data center (which has large upload/download bandwidth). They are world experts at that, and it is very (very) reliable and fault tolerant. A number of projects are not, through no fault of their own. But they are educational institutions, not commercial cloud operations. |
Send message Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
Thanks Jim. So WCG is an IBM CLOUD based operation using BOINC in the same way as CPDN on behalf of a group of research projects which come & go. As you say, a different level of resources. |
Send message Joined: 15 Jan 06 Posts: 637 Credit: 26,751,529 RAC: 653 |
Yes, IBM donates the servers, IP links and their IT experts as a public service. It is quite admirable. But you do lose the smaller projects that may be scientifically interesting, since WCG insists on a minimum amount of work in order to make it worth their effort. They have to thoroughly vet the applications to make sure there are no security problems, for example. In fact, they release their own version of BOINC, but it is usually several releases behind, since they have to test it out for security flaws too. I always use the latest standard version of BOINC. |
Send message Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0 |
Has anyone been able to upload anything to Jasmin recently? I have some anz50 tasks that are not using the Jasmin server, I guess. Any idea how I could get them to upload instead? It seems it tries the same files over and over again. |
Send message Joined: 30 Mar 10 Posts: 12 Credit: 2,609,109 RAC: 87 |
Hi, on Monday 1st July I had 10 work units pending upload and now I have only 4. After checking BOINC's logs, 6 of them succeeded yesterday at 1 am. So it seems that "sometimes" it works even if most of the time it does not. So being patient is unfortunately the only solution |
©2024 cpdn.org