climateprediction.net (CPDN) home page
Thread 'Server Problem Fixed'

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 56120 - Posted: 2 May 2017, 14:33:40 UTC

Yes! Trickles are now uploading again and finished tasks can report and clear. Good work.
ID: 56120
Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 56121 - Posted: 2 May 2017, 16:29:54 UTC

Downloads are failing.
I followed the advice given of: do not detach and reattach.
I'm getting the following message:

2017-05-02 6:50:02 AM | cpdnboinc | Requesting new tasks for CPU
2017-05-02 6:50:03 AM | | Project communication failed: attempting access to reference site
2017-05-02 6:50:03 AM | cpdnboinc | Scheduler request failed: Couldn't resolve host name
2017-05-02 6:50:05 AM | | Internet access OK - project servers may be temporarily down.
2017-05-02 6:59:17 AM | cpdnboinc | Sending scheduler request: To fetch work.
2017-05-02 6:59:17 AM | cpdnboinc | Requesting new tasks for CPU
2017-05-02 6:59:19 AM | cpdnboinc | Scheduler request failed: Couldn't resolve host name
2017-05-02 6:59:20 AM | | Project communication failed: attempting access to reference site
2017-05-02 6:59:22 AM | | Internet access OK - project servers may be temporarily down.
2017-05-02 7:28:22 AM | cpdnboinc | Sending scheduler request: To fetch work.
2017-05-02 7:28:22 AM | cpdnboinc | Requesting new tasks for CPU
2017-05-02 7:28:24 AM | climateprediction.net | Scheduler request completed: got 0 new tasks
2017-05-02 7:28:24 AM | climateprediction.net | Project has no tasks available
2017-05-02 8:29:03 AM | climateprediction.net | Sending scheduler request: To fetch work.
2017-05-02 8:29:03 AM | climateprediction.net | Requesting new tasks for CPU
2017-05-02 8:29:07 AM | climateprediction.net | Scheduler request completed: got 2 new tasks
2017-05-02 8:29:09 AM | climateprediction.net | Started download of hadcm3s_8142_201412_120_564_011003749.zip
2017-05-02 8:29:09 AM | climateprediction.net | Started download of 71rs_2014.ostart.gz
2017-05-02 8:29:31 AM | climateprediction.net | Temporarily failed download of hadcm3s_8142_201412_120_564_011003749.zip: connect() failed
2017-05-02 8:29:31 AM | climateprediction.net | Backing off 00:03:14 on download of hadcm3s_8142_201412_120_564_011003749.zip
2017-05-02 8:29:31 AM | climateprediction.net | Temporarily failed download of 71rs_2014.ostart.gz: connect() failed
2017-05-02 8:29:31 AM | climateprediction.net | Backing off 00:02:43 on download of 71rs_2014.ostart.gz
2017-05-02 8:29:31 AM | climateprediction.net | Started download of 71rs_2014.astart.gz
2017-05-02 8:29:31 AM | climateprediction.net | Started download of spec3a_sw_3_asol2c_hadcm3.gz
2017-05-02 8:29:32 AM | | Project communication failed: attempting access to reference site
2017-05-02 8:29:34 AM | | Internet access OK - project servers may be temporarily down.
2017-05-02 8:29:53 AM | climateprediction.net | Temporarily failed download of 71rs_2014.astart.gz: connect() failed
2017-05-02 8:29:53 AM | climateprediction.net | Backing off 00:02:16 on download of 71rs_2014.astart.gz
2017-05-02 8:29:53 AM | climateprediction.net | Temporarily failed download of spec3a_sw_3_asol2c_hadcm3.gz: connect() failed
2017-05-02 8:29:53 AM | climateprediction.net | Backing off 00:02:41 on download of spec3a_sw_3_asol2c_hadcm3.gz
2017-05-02 8:29:53 AM | climateprediction.net | Started download of spec3a_lw_3_asol2c_hadcm3.gz
2017-05-02 8:29:53 AM | climateprediction.net | Started download of waterfix.ancil.be.32.gz
2017-05-02 8:29:54 AM | | Project communication failed: attempting access to reference site
2017-05-02 8:29:55 AM | | Internet access OK - project servers may be temporarily down.
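
The failed downloads and back-off intervals in a log like the one above can be tallied with a short script. This is only a sketch: the line layout is assumed from the lines quoted in this post, and `summarize_downloads` is a hypothetical helper, not part of BOINC.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the quoted log lines:
#   "<date> <time> <AM|PM> | <project or empty> | <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\S+ \S+ [AP]M) \| (?P<project>[^|]*)\| (?P<message>.*)$"
)
FAILED_RE = re.compile(
    r"^Temporarily failed download of (?P<file>\S+): (?P<reason>.+)$"
)
BACKOFF_RE = re.compile(
    r"^Backing off (?P<h>\d+):(?P<m>\d+):(?P<s>\d+) on download of (?P<file>\S+)$"
)


def summarize_downloads(lines):
    """Count failed downloads per file and record the last back-off in seconds."""
    summary = defaultdict(lambda: {"failures": 0, "last_backoff_s": None})
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if parsed is None:
            continue  # not a log line in the assumed layout
        message = parsed.group("message").strip()

        failed = FAILED_RE.match(message)
        if failed:
            summary[failed.group("file")]["failures"] += 1
            continue

        backoff = BACKOFF_RE.match(message)
        if backoff:
            seconds = (
                int(backoff.group("h")) * 3600
                + int(backoff.group("m")) * 60
                + int(backoff.group("s"))
            )
            summary[backoff.group("file")]["last_backoff_s"] = seconds
    return dict(summary)


if __name__ == "__main__":
    sample = [
        "2017-05-02 8:29:31 AM | climateprediction.net | Temporarily failed download of 71rs_2014.ostart.gz: connect() failed",
        "2017-05-02 8:29:31 AM | climateprediction.net | Backing off 00:02:43 on download of 71rs_2014.ostart.gz",
        "2017-05-02 8:29:32 AM | | Project communication failed: attempting access to reference site",
    ]
    for name, info in summarize_downloads(sample).items():
        print(name, info)
```

Lines that do not match the assumed layout are skipped rather than treated as errors, so the same function can be pointed at a longer saved log.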
ID: 56121
Jeff Bakle
Joined: 24 Nov 05
Posts: 1
Credit: 3,182,140
RAC: 2,169
Message 56122 - Posted: 3 May 2017, 0:27:56 UTC

I was able to add the project back to my system. No work is currently available for my system at this time, but it is good to be back in the collective.
ID: 56122
Randi
Joined: 28 Jun 07
Posts: 31
Credit: 4,348,423
RAC: 356
Message 56123 - Posted: 3 May 2017, 3:00:33 UTC

The "do not detach and reattach" advice came too late for me.

Just now I reset and then removed CPDN and then I added it back.
It appears to be working correctly.
Zooniverse Old Weather transcriber
and
Old Weather BOINC team member.
ID: 56123
Kevin
Joined: 5 Jul 09
Posts: 63
Credit: 6,091,274
RAC: 0
Message 56124 - Posted: 3 May 2017, 4:20:02 UTC

My last task that reported after the server came back on line is showing as completed.

https://www.cpdn.org/cpdnboinc/result.php?resultid=20340872

3 tasks that finished while the backup server was running and were showing as completed have lost their trickles and are now showing as in progress.

https://www.cpdn.org/cpdnboinc/result.php?resultid=20350265
https://www.cpdn.org/cpdnboinc/result.php?resultid=20350827
https://www.cpdn.org/cpdnboinc/result.php?resultid=20345787

The trickles reported in the last task were reported before the server went down.
Kevin
ID: 56124
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 56125 - Posted: 3 May 2017, 5:53:53 UTC - in response to Message 56124.  

Two of my tasks have sent trickles since things went back to normal but trickles sent before the alternative upload server went/was taken off line don't appear on the task pages. Won't know how this affects credit until the credit script is run.

Not overly worried about this as the information has always been retrieved and sorted eventually in the past. I know this is frustrating for those who keep a close tally on credits however.
ID: 56125
Kevin
Joined: 5 Jul 09
Posts: 63
Credit: 6,091,274
RAC: 0
Message 56126 - Posted: 3 May 2017, 7:25:20 UTC - in response to Message 56125.  


Not overly worried about this as the information has always been retrieved and sorted eventually in the past. I know this is frustrating for those who keep a close tally on credits however.


Not worried about credit, it should turn up eventually. It was just a gentle hint that something may need a quick look :-)

Apart from that, 3 of them are from batch 561, which some people were having problems with.
Kevin
ID: 56126
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,884,997
RAC: 4,577
Message 56134 - Posted: 4 May 2017, 13:24:33 UTC

There's a new batch of 186 WAH2 PNW25/21 but none of my machines can download any because:

04/05/2017 14:06:51 | climateprediction.net | Not requesting tasks: some download is stalled

I'm tempted to abort the stalled downloads if there is no prospect of the stalled models being unstalled.

PS The WAH2 batch number 565 is duplicated with a small HADCM3S test batch on the backup site, but that's only a cosmetic problem.
ID: 56134
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 56135 - Posted: 4 May 2017, 14:07:22 UTC - in response to Message 56134.  

04/05/2017 14:06:51 | climateprediction.net | Not requesting tasks: some download is stalled


Was also wondering about aborting the stalled download task I have, though this machine doesn't have any stalled downloads and is now telling me no work is available so perhaps I should give it a bit more of a chance.

I had wondered if the reason mine wasn't downloading was the same reason it had been abandoned by the previous cruncher, but on checking
https://www.cpdn.org/cpdnboinc//workunit.php?wuid=10996540


I see it got as far as producing three trickles. So I still don't know how global an issue the stuck downloads are.
ID: 56135
Kevin
Joined: 5 Jul 09
Posts: 63
Credit: 6,091,274
RAC: 0
Message 56136 - Posted: 4 May 2017, 14:39:21 UTC

I've had one stuck downloading for a couple of days, and it's a _1

A couple of the servers have gone from the server status page so maybe they are still sorting things out.
Kevin
ID: 56136
Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 56137 - Posted: 4 May 2017, 15:12:18 UTC

I'm getting the same: I've had 2 downloads stalled for a couple of days now, and they're both _2

2017-05-04 7:21:28 AM | climateprediction.net | update requested by user
2017-05-04 7:21:32 AM | climateprediction.net | Sending scheduler request: Requested by user.
2017-05-04 7:21:32 AM | climateprediction.net | Not requesting tasks: some download is stalled
2017-05-04 7:21:34 AM | climateprediction.net | Scheduler request completed

hadcm3s_831b_201412_120_564_011006242_2
hadcm3s_8142_201412_120_564_011003749_2
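
Task names like the two above can be split into a batch number and an attempt suffix, which makes it easier to collate which batches are affected. This is a rough sketch resting on an assumption drawn only from this thread: that the last three underscore-separated fields are batch, workunit number and attempt (the "_1" / "_2" posters mention). `parse_task_name` and `affected_batches` are hypothetical helpers.

```python
import re

# Assumption: "<model and other fields>_<batch>_<workunit>_<attempt>".
# This field order is inferred from the names quoted in this thread,
# not from any project documentation.
TASK_RE = re.compile(
    r"^(?P<prefix>.+)_(?P<batch>\d+)_(?P<workunit>\d+)_(?P<attempt>\d+)$"
)


def parse_task_name(name):
    """Return batch, workunit and attempt for a task name, or None if it does not fit."""
    match = TASK_RE.match(name.strip())
    if match is None:
        return None
    return {
        "prefix": match.group("prefix"),
        "batch": int(match.group("batch")),
        "workunit": match.group("workunit"),  # kept as text to preserve leading zeros
        "attempt": int(match.group("attempt")),
    }


def affected_batches(names):
    """Collect the distinct batch numbers found in a list of task names."""
    batches = set()
    for name in names:
        parsed = parse_task_name(name)
        if parsed is not None:
            batches.add(parsed["batch"])
    return sorted(batches)


if __name__ == "__main__":
    stalled = [
        "hadcm3s_831b_201412_120_564_011006242_2",
        "hadcm3s_8142_201412_120_564_011003749_2",
    ]
    print(affected_batches(stalled))
```

Names that do not end in three numeric fields are ignored, so a mixed list copied from the transfers tab should not break the tally.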

I was wondering if I should abort the stalled download tasks.
I think I will just wait, the weekend is not far away.
ID: 56137
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 56138 - Posted: 4 May 2017, 16:35:37 UTC

Have the same problem. I have 4 wah2_pnw25 downloads stalled in my transfer tab since last night.
ID: 56138
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 56139 - Posted: 4 May 2017, 16:48:23 UTC
Last modified: 4 May 2017, 16:52:48 UTC

Collating information from previous posts, this is affecting at least batches 563, 564 and 565. Will let project know.
ID: 56139
keputnam
Joined: 31 Aug 04
Posts: 29
Credit: 3,972,828
RAC: 132
Message 56141 - Posted: 4 May 2017, 20:39:12 UTC - in response to Message 56139.  

Add batch 486

I've had a download stalled for almost three days now
ID: 56141
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 56142 - Posted: 4 May 2017, 20:53:50 UTC - in response to Message 56141.  

Am becoming increasingly certain that all work for download is stalling. That means I won't be aborting any tasks, especially as most tasks are retreads at the moment, meaning they may well be on their last chance.
ID: 56142
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,373,077
RAC: 15,530
Message 56143 - Posted: 4 May 2017, 22:15:31 UTC - in response to Message 56139.  

Add batch 406 as well.
ID: 56143
Dave Roberts
Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 56144 - Posted: 4 May 2017, 23:21:33 UTC

I had this problem three weeks ago. I posted under "New Work".

I eventually aborted all the tasks on three machines, given that the maintenance problems had become acute. They'd been hanging there for days.
ID: 56144
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 56147 - Posted: 5 May 2017, 5:45:02 UTC

For me the question is whether this is a configuration problem with individual batches, where the wrong location is being pointed to for the files to be downloaded (as has happened in the past), or a global issue with the servers. As some of the tasks in question have at least got as far as downloading on to other computers previously, my money is on the latter, so I am not aborting anything unless I hear from the project people, either direct or via moderators, that this should be done.
ID: 56147
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,884,997
RAC: 4,577
Message 56148 - Posted: 5 May 2017, 10:12:07 UTC - in response to Message 56147.  

I agree, Dave: I've got 406, 499, 506, 561. It looks to me like an infrastructure problem somewhere. I would quite like to run some of the models, even though they're reissues, as they would help fill in some gaps in my cross-machine performance array. However, if they're never going to download then clearly they have to be aborted.
ID: 56148
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 56149 - Posted: 5 May 2017, 12:10:29 UTC

Wednesday 9.30am the project is being taken offline for IT to upgrade the GPFS (General Parallel File System, and not something to do with the Green Party as I first thought, that having dominated my other half's life over past weeks!). Uploads will be diverted to another server but the subsetting server will be off line. It is anticipated this will take a day. Not sure if that means 24 hrs or a working day.
ID: 56149

©2024 cpdn.org