Message boards : Number crunching : The uploads are stuck
Author | Message |
---|---|
Send message Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
Uploads still stuck for me. Hopefully, server can be fixed today or tomorrow. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,480,601 RAC: 15,120 |
"Uploads still stuck for me. Hopefully, server can be fixed today or tomorrow." The absolute earliest will be tomorrow, when the cloud provider's support staff go back to work. However, they will probably have a queue of requests to get through, so it might be a day or two after. |
Send message Joined: 4 Dec 15 Posts: 52 Credit: 2,489,447 RAC: 2,080 |
"When you say 'cores' do you mean real cpu cores or threads? (Just want to double check)." BOINC doesn't know about Hyper-Threading. Every thread is counted as a core. In every setting that refers to 'cpu' this actually means 'thread'. - - - - - - - - - - Greetings, Jens |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Messages flying about but no identification of exactly what the problem is yet. Andy thinks JASMIN have changed something in their infrastructure but is not sure what. What is clear is that those involved are working on it and not sitting on their hands. (With the possible exception of those managing JASMIN, which I assume is the successor to Joint Academic Network. I never did work out what the rest of the new acronym is.) |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
"Messages flying about but no identification of exactly what the problem is yet. Andy thinks Jasmin have changed something in their infrastructure but not sure what. What is clear is that those involved are working on it and not sitting on their hands. (With the possible exception of those managing jasmin which I assume is the successor to Joint Academic Network. I never did work out what the rest of the new acronym is.)" Thanks Dave. I've been bouncing around all morning like a kitten on heat, wanting to (but knowing I mustn't) ask 'are we nearly there yet?'. |
Send message Joined: 29 Nov 17 Posts: 82 Credit: 14,764,750 RAC: 86,533 |
"Messages flying about but no identification of exactly what the problem is yet. Andy thinks Jasmin have changed something in their infrastructure but not sure what. What is clear is that those involved are working on it and not sitting on their hands. (With the possible exception of those managing jasmin which I assume is the successor to Joint Academic Network. I never did work out what the rest of the new acronym is.)" The Joint Analysis System Meeting Infrastructure Needs. |
Send message Joined: 2 Oct 19 Posts: 21 Credit: 47,674,094 RAC: 24,265 |
traceroute eventually makes its way to the proper destination. traceroute to upload11.cpdn.org (192.171.169.187), 64 hops max 1 192.168.1.1 (Fios_Quantum_Gateway.fios-router.home) 0.342ms 0.285ms 0.264ms 2 100.0.197.1 (lo0-100.BSTNMA-VFTTP-308.verizon-gni.net) 7.776ms 9.234ms 12.042ms 3 100.41.214.178 (B3308.BSTNMA-LCR-21.verizon-gni.net) 11.290ms 7.794ms 14.045ms 4 * * * 5 140.222.236.255 (0.ae2.BR1.BOS30.ALTER.NET) 4.700ms 9.752ms 9.719ms 6 62.115.170.72 (bost-b2-link.telia.net) 10.007ms * * 7 62.115.122.202 (nyk-bb1-link.ip.twelve99.net) 15.936ms 9.395ms 9.848ms 8 62.115.112.245 (ldn-bb4-link.ip.twelve99.net) 79.272ms 78.822ms 78.587ms 9 62.115.120.239 (ldn-b2-link.ip.twelve99.net) 79.447ms 79.003ms 78.299ms 10 62.115.175.131 (jisc-ic345131-ldn-b2.ip.twelve99-cust.net) 76.696ms 78.259ms 78.755ms 11 146.97.35.197 (ae24.londhx-sbr1.ja.net) 79.159ms 78.027ms 78.719ms 12 146.97.33.2 (ae29.londpg-sbr2.ja.net) 79.844ms 77.670ms 81.317ms 13 146.97.33.22 (ae31.erdiss-sbr2.ja.net) 90.225ms 87.908ms 88.663ms 14 * * * 15 146.97.41.34 (ral-r26.ja.net) 88.691ms 88.240ms 88.613ms 16 * * * 17 * * * 18 * * * 19 * * * 20 192.171.169.187 (192.171.169.187) 85.539ms !* 85.660ms !* 88.446ms !* |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
And "transient HTTP error" after two minutes (timeout) has changed to "connect() failed" after 1 second (software not running yet). |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"The Joint Analysis System Meeting Infrastructure Needs" In case anyone cares, this describes their systems and services... https://www.bnlawrence.net/assets/talks/2014-07-02-lawrence_eresearchNZ14_jasmin.pdf |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"And 'transient HTTP error' after two minutes (timeout) has changed to 'connect() failed' after 1 second (software not running yet)." True: I guess it is better to fail fast. At least they are doing something. Traceroute is the same as last week (so I will not bother to post the results here). Tue 03 Jan 2023 08:30:18 AM EST | climateprediction.net | Started upload of oifs_43r3_ps_0144_2001050100_123_970_12186788_0_r1420963935_113.zip Tue 03 Jan 2023 08:30:18 AM EST | climateprediction.net | Started upload of oifs_43r3_ps_0144_2001050100_123_970_12186788_0_r1420963935_114.zip Tue 03 Jan 2023 08:30:21 AM EST | | Project communication failed: attempting access to reference site Tue 03 Jan 2023 08:30:21 AM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0144_2001050100_123_970_12186788_0_r1420963935_113.zip: connect() failed Tue 03 Jan 2023 08:30:21 AM EST | climateprediction.net | Backing off 00:53:19 on upload of oifs_43r3_ps_0144_2001050100_123_970_12186788_0_r1420963935_113.zip Tue 03 Jan 2023 08:30:21 AM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0144_2001050100_123_970_12186788_0_r1420963935_114.zip: connect() failed Tue 03 Jan 2023 08:30:21 AM EST | climateprediction.net | Backing off 00:18:48 on upload of oifs_43r3_ps_0144_2001050100_123_970_12186788_0_r1420963935_114.zip Tue 03 Jan 2023 08:30:23 AM EST | | Internet access OK - project servers may be temporarily down. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
David can now log in. More messages flying about. |
Send message Joined: 27 Mar 21 Posts: 79 Credit: 78,306,920 RAC: 297 |
AndreyOR wrote: "I can't get any more work because of too many uploads in progress, according to event log." Richard Haselgrove wrote: "The 'too many uploads in progress' limit? It doesn't count the files, just the number of tasks that can't report because they have at least one file still to upload." Richard Haselgrove wrote: "It's what BOINC reads as 'number of CPUs' in the system - so that's probably what the OS reads from the BIOS. If you have a physical CPU that supports hyperthreading, you could double the number by turning hyperthreading on." Actually it's twice the number of "usable CPUs". "Usable CPUs" is AFAIU the number of logical CPUs of the host system, possibly overridden by cc_config::options::ncpus, _and_ modified by computing preferences' percentage of CPUs allowed to be used by BOINC. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
But back to the matter in hand - I was misled (over-optimistic) about the changed error message. In full, it reads: 03/01/2023 14:31:57 | climateprediction.net | [http] [ID#4304] Info: Trying 192.171.169.187:80... |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,480,601 RAC: 15,120 |
"Messages flying about but no identification of exactly what the problem is yet. Andy thinks Jasmin have changed something in their infrastructure but not sure what. What is clear is that those involved are working on it and not sitting on their hands. (With the possible exception of those managing jasmin which I assume is the successor to Joint Academic Network. I never did work out what the rest of the new acronym is.)" JASMIN is a cloud service managed by people at the Rutherford Appleton Labs for the benefit of UK academia. JANet is the UK academic network provider. A traceroute will show a connection from your broadband to a JANet hub before a hop or two to JASMIN. I would not bother trying to force uploads through; the server is up and will be at capacity for some time. The connect() message just means the httpd servers were busy and there were none spare. Should improve after a couple of hrs. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Just tell the volunteers to sit tight and be patient. It's not all quite working yet." (I think we guessed that.) |
Send message Joined: 7 Jun 17 Posts: 23 Credit: 44,434,789 RAC: 2,600,991 |
I'm Uploading. |
Send message Joined: 4 Oct 15 Posts: 34 Credit: 9,075,151 RAC: 374 |
Me Too :) And to let everybody else have a part of it, I limit my upload slots to one for now. Doesn't hurt me, since the server can use the full bandwidth of my upload with one slot. Greets Felix |
Send message Joined: 2 Oct 19 Posts: 21 Credit: 47,674,094 RAC: 24,265 |
I've had a few files upload so only 39,113 to go. That's from 340 completed tasks. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"I've had a few files upload so only 39,113 to go. That's from 340 completed tasks." A bunch of files uploaded, then it started getting HTTP errors at about 15:30. At 15:50 it started working again, then more errors from 17:10 onwards. May just be the server being overloaded though. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
I saw it working quite a bit during the late afternoon, but with occasional breaks. I don't know whether that was just a pause to cool down, or a full stop requiring manual intervention. But it seems to have stopped for good (or at least, for the night) now. |
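
For anyone trying to estimate where the "too many uploads in progress" limit kicks in on their own host, the rule described earlier in the thread (twice the number of "usable CPUs", counted in tasks rather than files) can be sketched in a few lines of Python. This is only an illustration of that description, not BOINC's actual code: the function names are made up, and the rounding down and the floor of one usable CPU are assumptions.

```python
def usable_cpus(logical_cpus, ncpus_override=None, max_cpu_pct=100.0):
    """Estimate BOINC's 'usable CPUs' as described in the thread.

    logical_cpus   -- threads reported by the OS (hyperthreads count as CPUs)
    ncpus_override -- hypothetical stand-in for the ncpus option in cc_config.xml
    max_cpu_pct    -- the 'use at most N% of the CPUs' computing preference
    """
    base = ncpus_override if ncpus_override and ncpus_override > 0 else logical_cpus
    # Assumption: the percentage is applied with rounding down, never below 1.
    return max(1, int(base * max_cpu_pct / 100.0))


def stalled_upload_task_limit(logical_cpus, ncpus_override=None, max_cpu_pct=100.0):
    """Number of tasks with pending uploads at which new work is refused.

    Per the thread: twice the usable CPU count. It counts tasks that cannot
    report because at least one file is still to upload, not individual files.
    """
    return 2 * usable_cpus(logical_cpus, ncpus_override, max_cpu_pct)


if __name__ == "__main__":
    # An 8-thread host allowed to use 50% of its CPUs.
    print(stalled_upload_task_limit(8, max_cpu_pct=50))
```

If the thread's description is right, this would also explain why raising the reported CPU count (turning hyperthreading on, or overriding ncpus) raises the limit.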
©2024 cpdn.org