climateprediction.net (CPDN) home page
Thread 'Project Outage'

Message boards : Number crunching : Project Outage

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64311 - Posted: 10 Aug 2021, 6:46:22 UTC

I have everything suspended, and am in the Wait-and-See Zone.
ID: 64311
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64312 - Posted: 10 Aug 2021, 7:04:37 UTC - in response to Message 64310.  

> Do we abort these WUs in the pipeline, or do we wait and see? Will the server start again from halfway, or have these WUs entered the black holes of the Internet? Over the years I have accumulated a lot of WUs that are best described as lost in some black hole, and that is the end of the matter: no record on my machines, no record on the server.


I have 7 tasks still running (all N216s, which means they will go for a while yet) and 8 whose downloads are stuck. In the past when this has happened, once the server problems were resolved the tasks downloaded and ran normally. I certainly wouldn't abort while waiting for the servers to be sorted, but I have known the occasional task to get lost in the system somehow.
ID: 64312
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64316 - Posted: 10 Aug 2021, 10:45:36 UTC

Three of the eight tasks that were stuck downloading have now finished downloading, so that seems to be fixed. In another five minutes I will know if the trickle server is running. (The server status page says it isn't, but as it only updates about every two hours it may not be accurate.)
ID: 64316
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64317 - Posted: 10 Aug 2021, 11:27:36 UTC - in response to Message 64316.  

> Three of the eight tasks that were stuck downloading have now finished downloading so that seems to be fixed.


Four (not a typo) of the three tasks that were stuck downloading have now finished downloading, so that seems to be fixed. The fourth showed up while the other three were downloading. They downloaded very quickly. One (an N144 one) is already running. The boinc-client is running other tasks: WCG, Rosetta.
ID: 64317
KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64318 - Posted: 10 Aug 2021, 11:47:59 UTC
Last modified: 10 Aug 2021, 11:51:46 UTC

Life is back to normal. Downloading is in progress. Thanks be to Jupiter, King of Gods and Climate.
ID: 64318
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64347 - Posted: 12 Aug 2021, 16:43:30 UTC

From Andy,

Hi All,

All services on the climateprediction.net infrastructure have now been restored. The Department of Engineering IT Support decided to roll back the changes they made to the networking. This has allowed us to restore all the CPDN services.

Best wishes,

Andy


I won't post my opinion of the IT people who let the work go on so long after it was scheduled to finish and still didn't get it right.
ID: 64347
Bryn Mawr
Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 64350 - Posted: 13 Aug 2021, 7:42:20 UTC - in response to Message 64347.  

Download appears to have failed again.
ID: 64350
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64352 - Posted: 13 Aug 2021, 13:30:19 UTC

> Download appears to have failed again.


Just seen this. Friday afternoon is not the best time to get things sorted in a hurry. The server appears to be running. I have a full cache at the moment and don't really want to raise my cache to 10+10 days just to get more work. If anyone else is getting this problem, I or another mod will let Andy know.
ID: 64352
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64353 - Posted: 13 Aug 2021, 13:52:39 UTC - in response to Message 64352.  

> If anyone else is getting this problem I or another mod will let Andy know.


Fri 13 Aug 2021 07:40:57 AM EDT | climateprediction.net | Started download of ic_N144_200311_000057.nc.gz
Fri 13 Aug 2021 07:40:57 AM EDT | climateprediction.net | Started download of batch_911_a07e_sst.gz
Fri 13 Aug 2021 07:42:13 AM EDT |                       | Project communication failed: attempting access to reference site
Fri 13 Aug 2021 07:42:13 AM EDT | climateprediction.net | Temporarily failed download of ic_N144_200311_000057.nc.gz: transient HTTP error
Fri 13 Aug 2021 07:42:13 AM EDT | climateprediction.net | Backing off 00:11:32 on download of ic_N144_200311_000057.nc.gz
Fri 13 Aug 2021 07:42:16 AM EDT |                       | Internet access OK - project servers may be temporarily down.
Fri 13 Aug 2021 07:42:16 AM EDT | climateprediction.net | Temporarily failed download of batch_911_a07e_sst.gz: transient HTTP error
Fri 13 Aug 2021 07:42:16 AM EDT | climateprediction.net | Backing off 00:11:51 on download of batch_911_a07e_sst.gz
Fri 13 Aug 2021 09:13:53 AM EDT | climateprediction.net | Started download of hadam4_a1xx_201310_6_915_012102977.zip
Fri 13 Aug 2021 09:13:53 AM EDT | climateprediction.net | Started download of a1xx_915_atmos.gz
Fri 13 Aug 2021 09:15:12 AM EDT |                       | Project communication failed: attempting access to reference site
Fri 13 Aug 2021 09:15:12 AM EDT | climateprediction.net | Temporarily failed download of hadam4_a1xx_201310_6_915_012102977.zip: transient HTTP error
Fri 13 Aug 2021 09:15:12 AM EDT | climateprediction.net | Backing off 01:03:09 on download of hadam4_a1xx_201310_6_915_012102977.zip
Fri 13 Aug 2021 09:15:12 AM EDT | climateprediction.net | Temporarily failed download of a1xx_915_atmos.gz: transient HTTP error
Fri 13 Aug 2021 09:15:12 AM EDT | climateprediction.net | Backing off 00:56:30 on download of a1xx_915_atmos.gz
Fri 13 Aug 2021 09:15:14 AM EDT |                       | Internet access OK - project servers may be temporarily down.

ID: 64353
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64354 - Posted: 13 Aug 2021, 15:46:25 UTC

Message sent.
ID: 64354
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64356 - Posted: 14 Aug 2021, 9:07:02 UTC - in response to Message 64352.  

> Friday afternoon not the best time to get things sorted in a hurry.

How do they say clock-puncher in British?
ID: 64356
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64357 - Posted: 14 Aug 2021, 12:40:33 UTC - in response to Message 64356.  

> > Friday afternoon not the best time to get things sorted in a hurry.
> How do they say clock-puncher in British?


They say: Patience.
ID: 64357
[SG]Felix
Joined: 4 Oct 15
Posts: 34
Credit: 9,075,151
RAC: 374
Message 64359 - Posted: 15 Aug 2021, 9:18:07 UTC

It seems to me that they've stopped everything except the web servers.
ID: 64359
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64361 - Posted: 15 Aug 2021, 11:27:10 UTC

Yes, there's been a hardware failure at Oxford Uni, so Andy has shut down the project to prevent data loss.
ID: 64361
KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64362 - Posted: 15 Aug 2021, 17:49:55 UTC

Then, where are our trickles and uploads going? They are going somewhere.
ID: 64362
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64363 - Posted: 15 Aug 2021, 19:02:08 UTC - in response to Message 64362.  

Uploads don't go to Oxford.
They go to wherever on the planet those researchers are located.

The trickles?
Andy may have arranged "somewhere" for them to go to for the time being.
ID: 64363
KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64364 - Posted: 16 Aug 2021, 5:11:54 UTC

Today I am seeing "trickle pending" messages, but no trace yet of pending zip file uploads.
ID: 64364
Eirik Redd
Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 64365 - Posted: 16 Aug 2021, 8:20:36 UTC
Last modified: 16 Aug 2021, 8:26:11 UTC

I find that my trickles from Sunday 15August and now (here at UTC-5) Monday are all still on my boxen, but with the suffix '.sent'
Sent, but not acknowledged, and somehow hidden: I had to go root to see them.
In other words, trickles are being bounced, and possibly diverted. No worries from here.
Hope this helps.
ID: 64365
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64366 - Posted: 16 Aug 2021, 9:26:37 UTC

Andy has cast some powerful magic spell on all of the user computers. :)

It's 10.23am Monday there, so hopefully IT has managed to drag some workers in to fix whatever it is.
ID: 64366
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64368 - Posted: 16 Aug 2021, 16:38:03 UTC

5.30pm and it's still down. :(
It's going to be a long week/month/year.

My slow-running N216 has finally reached its first zip/trickle, and the trickle_up file has been moved to the mail room ready to send. :)
ID: 64368

©2024 cpdn.org