Thread 'Project Outage'

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64260 - Posted: 6 Aug 2021, 16:27:39 UTC
Last modified: 6 Aug 2021, 17:24:03 UTC

The website is back up, but on a project update I am getting:

Fri 06 Aug 2021 17:21:54 BST | climateprediction.net | Project is temporarily shut down for maintenance

Unless someone is doing overtime, it will be Monday before everything stands a chance of returning to normal. I will post again when more bits return to normal. Once things do start working again, then and only then, let us know if anything is behaving oddly.

Thanks.

Edit: 1 hour later, 9 completed tasks have uploaded and 8 new ones are now downloading.
ID: 64260
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 64262 - Posted: 6 Aug 2021, 20:16:06 UTC - in response to Message 64260.  

I had two that had completed a day or two ago but had not been reported yet.
A manual update fixed it, and I am now all up to date.
https://www.cpdn.org/results.php?hostid=1520871
ID: 64262
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64264 - Posted: 6 Aug 2021, 20:26:58 UTC

I have had transient HTTP errors on all 8 downloads, and all 4 of the new tasks allocated to Richard have failed to download. Clearly they need to pay more overtime.
ID: 64264
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,029,761
RAC: 14,491
Message 64265 - Posted: 6 Aug 2021, 22:39:38 UTC
Last modified: 6 Aug 2021, 22:41:49 UTC

I had one that was ready to report but stuck - no contact with site since this morning. Solved by suspending then resuming network activity manually (no transfers waiting). Now reported and another task has failed to download.
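For anyone who prefers the command line to the BOINC Manager menus, the same suspend-then-resume step can be sketched with boinccmd. This is only a minimal Python sketch that builds and prints the commands. It assumes boinccmd is installed and on the PATH, and that "resuming" means returning network activity to preference-based (auto) mode. The project URL below is a placeholder, not the real one; use the URL your own client lists for the project.

```python
import shlex

# Placeholder only: replace with the project URL shown by your own client.
PROJECT_URL = "https://project.example/"


def network_cycle_commands(project_url):
    """Return boinccmd invocations for: network off, network back to
    preference-based mode, then a manual project update."""
    return [
        ["boinccmd", "--set_network_mode", "never"],
        ["boinccmd", "--set_network_mode", "auto"],
        ["boinccmd", "--project", project_url, "update"],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for argv in network_cycle_commands(PROJECT_URL):
        print(shlex.join(argv))
```

Printing rather than executing keeps the sketch harmless; the lines can be pasted into a shell once the placeholder URL has been replaced.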
ID: 64265
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64266 - Posted: 7 Aug 2021, 7:52:27 UTC

New tasks are still not downloading. I have informed Andy but don't expect anything to change till Monday. I don't know enough about the output from the flags I enabled to work out whether it is a script that needs restarting, internal addresses that have changed, or something more obscure.
ID: 64266
KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64268 - Posted: 7 Aug 2021, 8:08:13 UTC

They have all gone fishing, finned or? Anyway, I am not able to download any WU, and I suppose I will have to wait till Monday. Let us all pray to the gods of CPDN.
ID: 64268
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64269 - Posted: 7 Aug 2021, 9:07:38 UTC - in response to Message 64268.  

From Andy:

A number of key machines still have no networking access following the switch work on Tuesday.

------------------------

They're going to be spoken to.
ID: 64269
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64270 - Posted: 7 Aug 2021, 10:12:06 UTC - in response to Message 64266.  

One task finished a few days ago and it uploaded yesterday. A second task finished today and it uploaded OK soon after.
My client tried to download a new task yesterday, but is not getting the files. After the second task finished, my client tried to download a second task, but that is stuck too.
So it seems most of CPDN is working, but not downloads yet.
ID: 64270
KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64271 - Posted: 7 Aug 2021, 11:32:48 UTC - in response to Message 64269.  
Last modified: 7 Aug 2021, 11:34:30 UTC

From Andy:

A number of key machines still have no networking access following the switch work on Tuesday.

------------------------

They're going to be spoken to.

________________________
Good. At least someone can speak to machines.
My computers have run out of WU's. Which reminds me, trickles are not uploading.
ID: 64271
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64277 - Posted: 7 Aug 2021, 13:49:47 UTC - in response to Message 64271.  

Good. At least someone can speak to machines.
My computers have run out of WU's. Which reminds me, trickles are not uploading.


It seems to me that my trickle files are uploading...

Sat 07 Aug 2021 03:37:41 AM EDT | climateprediction.net | Started upload of hadam4h_b0sf_201211_5_882_012036031_0_r1205737584_5.zip
Sat 07 Aug 2021 03:38:05 AM EDT | climateprediction.net | Finished upload of hadam4h_b0sf_201211_5_882_012036031_0_r1205737584_5.zip
Sat 07 Aug 2021 03:58:22 AM EDT | climateprediction.net | Computation for task hadam4h_b0sf_201211_5_882_012036031_0 finished
Sat 07 Aug 2021 03:58:24 AM EDT | climateprediction.net | Started upload of hadam4h_b0sf_201211_5_882_012036031_0_r1205737584_out.zip
Sat 07 Aug 2021 03:58:28 AM EDT | climateprediction.net | Finished upload of hadam4h_b0sf_201211_5_882_012036031_0_r1205737584_out.zip
Sat 07 Aug 2021 04:58:33 AM EDT | climateprediction.net | Sending scheduler request: To report completed tasks.
Sat 07 Aug 2021 04:58:33 AM EDT | climateprediction.net | Reporting 1 completed tasks
Sat 07 Aug 2021 04:58:33 AM EDT | climateprediction.net | Not requesting tasks: some download is stalled
Sat 07 Aug 2021 04:58:35 AM EDT | climateprediction.net | Scheduler request completed

ID: 64277
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64280 - Posted: 7 Aug 2021, 15:41:46 UTC

The trickles server is not running, as per the Project Status page.
ID: 64280
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64282 - Posted: 7 Aug 2021, 18:14:59 UTC

It seems to me that my trickle files are uploading...


That is the zip files uploading which are produced at the same time as the trickles.
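To make the distinction concrete: the event-log lines quoted earlier report file uploads (the .zip files), whereas a trickle goes out as a scheduler request that mentions a trickle-up message. Here is a rough Python sketch for telling the two apart. It assumes the "timestamp | project | message" layout seen in the quoted log and matches only on wording that appears in this thread; the function names are made up for illustration.

```python
from collections import Counter

# Three lines copied from the event log quoted earlier in this thread.
SAMPLE = [
    "Sat 07 Aug 2021 03:37:41 AM EDT | climateprediction.net | Started upload of hadam4h_b0sf_201211_5_882_012036031_0_r1205737584_5.zip",
    "Sat 07 Aug 2021 03:38:05 AM EDT | climateprediction.net | Finished upload of hadam4h_b0sf_201211_5_882_012036031_0_r1205737584_5.zip",
    "Sat 07 Aug 2021 04:58:33 AM EDT | climateprediction.net | Sending scheduler request: To report completed tasks.",
]


def split_event_line(line):
    """Split 'timestamp | project | message' into its three fields."""
    timestamp, project, message = (part.strip() for part in line.split("|", 2))
    return timestamp, project, message


def classify(message):
    """Label a message text by the kind of traffic it reports."""
    if "trickle-up message" in message:
        return "trickle"
    if message.startswith(("Started upload of", "Finished upload of")):
        return "file upload"
    return "other"


if __name__ == "__main__":
    counts = Counter(classify(split_event_line(line)[2]) for line in SAMPLE)
    for kind, count in counts.items():
        print(kind, count)
```

On the sample lines this finds file uploads and one other scheduler request, but no trickle, which matches the point that a finished .zip upload says nothing about whether the trickle itself went out.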
ID: 64282
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64291 - Posted: 8 Aug 2021, 21:33:42 UTC - in response to Message 64282.  

That is the zip files uploading which are produced at the same time as the trickles.


Does that mean my trickles will never be uploaded? Since then, the tasks that produced them have exited, and my machine has been rebooted after some updates that needed it.
ID: 64291
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64292 - Posted: 8 Aug 2021, 21:51:16 UTC - in response to Message 64291.  

The best place to look to see if trickle_up files have been sent, is in the Event log.

Failing that, as in your case, you should be able to see them on your computer if they're still there. (They're very small.)

Else, if they did upload, you just have to wait until the work at Oxford is finished and all of the servers are working, then check the task page to see if several trickle_ups are listed with the same date/time stamp.

Waiting is where I'm at right now, both for the trickle_ups to show up, and the files for the next task to download.
And BOINC is Suspended, so as not to waste time on futile attempts to communicate.
ID: 64292
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64293 - Posted: 9 Aug 2021, 3:02:27 UTC - in response to Message 64292.  

The best place to look to see if trickle_up files have been sent, is in the Event log.

Failing that, as in your case, you should be able to see them on your computer if they're still there. (They're very small.)
...
Waiting is where I'm at right now, both for the trickle_ups to show up, and the files for the next task to download.
And BOINC is Suspended, so as not to waste time on futile attempts to communicate.


My Event Log does not go back far enough. I do not know where to look for the trickle up messages. slots? projects/climate...?
However it is still in /var/log/messages....
# grep trickle messages-20210808
Aug  4 01:14:32 localhost boinc[2021]: 04-Aug-2021 01:14:32 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  4 03:55:41 localhost boinc[2021]: 04-Aug-2021 03:55:41 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  4 04:02:59 localhost boinc[2021]: 04-Aug-2021 04:02:59 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  4 04:12:17 localhost boinc[2021]: 04-Aug-2021 04:12:17 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  4 04:24:33 localhost boinc[2021]: 04-Aug-2021 04:24:33 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  4 04:44:50 localhost boinc[2021]: 04-Aug-2021 04:44:50 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  4 05:38:06 localhost boinc[2021]: 04-Aug-2021 05:38:06 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  4 07:17:35 localhost boinc[2021]: 04-Aug-2021 07:17:35 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  4 10:36:45 localhost boinc[2021]: 04-Aug-2021 10:36:45 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  4 12:56:50 localhost boinc[2021]: 04-Aug-2021 12:56:50 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  4 16:54:14 localhost boinc[2021]: 04-Aug-2021 16:54:14 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  6 17:01:52 localhost boinc[2021]: 06-Aug-2021 17:01:52 [climateprediction.net] Sending scheduler request: To send trickle-up message.
Aug  7 03:37:24 localhost boinc[2021]: 07-Aug-2021 03:37:24 [climateprediction.net] Sending scheduler request: To send trickle-up message.

I infer things were running OK through most of August 4, and the trickles started going out again late on August 6. So maybe all the ones I tried to send actually went up and I will see them once the web site catches up. I now have three work units stuck trying to download. I imagine patience will fix this.
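The gap can also be checked mechanically rather than by eye. This is a minimal Python sketch, assuming the syslog line layout shown in the grep output above, with the BOINC timestamp in DD-Mon-YYYY HH:MM:SS form; the helper names are illustrative only. Each line records a request being sent, not an acknowledgement from the server.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches the BOINC timestamp inside each syslog line, e.g. 04-Aug-2021 16:54:14
STAMP = re.compile(r"(\d{2})-([A-Z][a-z]{2})-(\d{4}) (\d{2}):(\d{2}):(\d{2})")

# The last four lines of the grep output above.
SAMPLE = [
    "Aug  4 12:56:50 localhost boinc[2021]: 04-Aug-2021 12:56:50 [climateprediction.net] Sending scheduler request: To send trickle-up message.",
    "Aug  4 16:54:14 localhost boinc[2021]: 04-Aug-2021 16:54:14 [climateprediction.net] Sending scheduler request: To send trickle-up message.",
    "Aug  6 17:01:52 localhost boinc[2021]: 06-Aug-2021 17:01:52 [climateprediction.net] Sending scheduler request: To send trickle-up message.",
    "Aug  7 03:37:24 localhost boinc[2021]: 07-Aug-2021 03:37:24 [climateprediction.net] Sending scheduler request: To send trickle-up message.",
]


def trickle_times(lines):
    """Return the BOINC timestamps of lines that mention a trickle-up request."""
    times = []
    for line in lines:
        match = STAMP.search(line) if "trickle-up message" in line else None
        if match:
            day, mon, year, hh, mm, ss = match.groups()
            times.append(datetime(int(year), MONTHS[mon], int(day),
                                  int(hh), int(mm), int(ss)))
    return times


def longest_gap(times):
    """Return (gap, start, end) for the longest interval between consecutive
    timestamps, or None if there are fewer than two."""
    ordered = sorted(times)
    if len(ordered) < 2:
        return None
    return max(((later - earlier, earlier, later)
                for earlier, later in zip(ordered, ordered[1:])),
               key=lambda item: item[0])


if __name__ == "__main__":
    print(longest_gap(trickle_times(SAMPLE)))
```

On these four lines the longest gap runs from 16:54:14 on 4 August to 17:01:52 on 6 August, which agrees with the reading above.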

I did not suspend Boinc-client, or even just climateprediction, because I still have one work unit working, and it might as well upload trickle-up messages and those files that go up at the same time. I suppose I should have set climateprediction to no new tasks, but I did not think of it until you suggested it, and now it is so close to working that I might as well just keep running. The client only tries at most once an hour.
ID: 64293
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64294 - Posted: 9 Aug 2021, 4:24:29 UTC - in response to Message 64293.  

I think they appear under:
/var/lib/boinc-client/projects/climateprediction.net

This is what is in one of mine that I saved ages ago:

<variety>year</variety>
<wu>hadam4_a01y_200611_12_785_011729848</wu>
<result>hadam4_a01y_200611_12_785_011729848_1_r1940024311</result>
<ph>1</ph>
<ts>51941</ts>
<cp>728765</cp>
<vr>8.08</vr>
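If you want to read such a saved file programmatically, note that the fragment above is a run of sibling elements with no single root, so a strict XML parser will reject it as it stands. This is a minimal Python sketch, assuming the file really does contain just these sibling tags; the wrapper element name is arbitrary, and nothing is assumed about what the individual fields mean.

```python
import xml.etree.ElementTree as ET

# The fragment exactly as posted above.
SAMPLE = """<variety>year</variety>
<wu>hadam4_a01y_200611_12_785_011729848</wu>
<result>hadam4_a01y_200611_12_785_011729848_1_r1940024311</result>
<ph>1</ph>
<ts>51941</ts>
<cp>728765</cp>
<vr>8.08</vr>
"""


def parse_trickle(text):
    """Wrap the sibling elements in a synthetic root and return tag -> text."""
    root = ET.fromstring("<wrapper>" + text + "</wrapper>")
    return {child.tag: (child.text or "").strip() for child in root}


if __name__ == "__main__":
    for tag, value in parse_trickle(SAMPLE).items():
        print(tag, "=", value)
```

Values are kept as strings on purpose, since the thread does not say which fields are counts, timestamps, or version numbers.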
ID: 64294
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64301 - Posted: 9 Aug 2021, 12:43:29 UTC - in response to Message 64268.  

The complaints have changed in the last day or two. I imagine this is actually improvement. From my event log it now shows like this:

Mon 09 Aug 2021 07:06:51 AM EDT | climateprediction.net | Temporarily failed download of a019_915_atmos.gz: transient HTTP error
Mon 09 Aug 2021 07:06:51 AM EDT | climateprediction.net | Backing off 05:28:01 on download of a019_915_atmos.gz
Mon 09 Aug 2021 07:06:53 AM EDT |                       | Internet access OK - project servers may be temporarily down.

I infer that means they are actually trying to send me stuff, but it is not getting through.
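To see when the client is likely to try that file again, the back-off interval can be added to the time of the log line. This is a minimal Python sketch, assuming the event-log layout quoted above and assuming the interval is a delay counted from that line's timestamp. The time zone label is ignored, so the result is in the same local time as the log, and the function name is illustrative.

```python
import re
from datetime import datetime, timedelta

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Timestamp (12-hour clock plus a zone label), then the back-off message.
LINE = re.compile(
    r"(\d{2}) ([A-Z][a-z]{2}) (\d{4}) (\d{2}):(\d{2}):(\d{2}) (AM|PM) \S+ \|"
    r".*\| Backing off (\d+):(\d{2}):(\d{2}) on download of (\S+)"
)

SAMPLE = ("Mon 09 Aug 2021 07:06:51 AM EDT | climateprediction.net | "
          "Backing off 05:28:01 on download of a019_915_atmos.gz")


def next_retry(line):
    """Return (file name, delay, earliest retry time), or None if the line
    is not a back-off message in the expected layout."""
    match = LINE.search(line)
    if match is None:
        return None
    day, mon, year, hh, mm, ss, ampm, bh, bm, bs, name = match.groups()
    hour = int(hh) % 12 + (12 if ampm == "PM" else 0)  # 12-hour to 24-hour
    logged = datetime(int(year), MONTHS[mon], int(day), hour, int(mm), int(ss))
    delay = timedelta(hours=int(bh), minutes=int(bm), seconds=int(bs))
    return name, delay, logged + delay


if __name__ == "__main__":
    print(next_retry(SAMPLE))
```

For the line quoted above this gives 12:34:52 local time on 9 August as the earliest automatic retry for a019_915_atmos.gz.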
ID: 64301
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64302 - Posted: 9 Aug 2021, 12:55:09 UTC - in response to Message 64301.  

The complaints have changed in the last day or two. I imagine this is actually improvement. From my event log it now shows like this:

Mon 09 Aug 2021 07:06:51 AM EDT | climateprediction.net | Temporarily failed download of a019_915_atmos.gz: transient HTTP error
Mon 09 Aug 2021 07:06:51 AM EDT | climateprediction.net | Backing off 05:28:01 on download of a019_915_atmos.gz
Mon 09 Aug 2021 07:06:53 AM EDT |                       | Internet access OK - project servers may be temporarily down.

I infer that means they are actually trying to send me stuff, but it is not getting through.

That is what I was getting on Saturday. I do hope the IT support people can sort it out soon though. No point in my sending another email as I know Andy is aware of it and chasing them.
ID: 64302
Albert H.
Joined: 18 Feb 06
Posts: 73
Credit: 61,736,409
RAC: 46,501
Message 64308 - Posted: 9 Aug 2021, 20:33:50 UTC

Yes,
upload is OK
download NOK, at this time.
ID: 64308
KAMasud
Joined: 6 Oct 06
Posts: 204
Credit: 7,608,986
RAC: 0
Message 64310 - Posted: 10 Aug 2021, 5:51:28 UTC

Do we abort these WU's in the pipeline, or do we wait and see? Will the server start again from halfway, or have these WU's entered the Black Holes on the Internet? Over the years I have accumulated a lot of WU's that can best be described as being in some Black Hole, and be done with the matter. No record on my machines, no record on the server.
ID: 64310