Message boards : Number crunching : Server Problem Fixed
Author | Message |
---|---|
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Yes! Trickles are now uploading again and finished tasks can report and clear. Good work. |
Send message Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
Downloads are failing. I followed the advice given of: do not detach and reattach. I'm getting the following message:
2017-05-02 6:50:02 AM | cpdnboinc | Requesting new tasks for CPU
2017-05-02 6:50:03 AM | | Project communication failed: attempting access to reference site
2017-05-02 6:50:03 AM | cpdnboinc | Scheduler request failed: Couldn't resolve host name
2017-05-02 6:50:05 AM | | Internet access OK - project servers may be temporarily down.
2017-05-02 6:59:17 AM | cpdnboinc | Sending scheduler request: To fetch work.
2017-05-02 6:59:17 AM | cpdnboinc | Requesting new tasks for CPU
2017-05-02 6:59:19 AM | cpdnboinc | Scheduler request failed: Couldn't resolve host name
2017-05-02 6:59:20 AM | | Project communication failed: attempting access to reference site
2017-05-02 6:59:22 AM | | Internet access OK - project servers may be temporarily down.
2017-05-02 7:28:22 AM | cpdnboinc | Sending scheduler request: To fetch work.
2017-05-02 7:28:22 AM | cpdnboinc | Requesting new tasks for CPU
2017-05-02 7:28:24 AM | climateprediction.net | Scheduler request completed: got 0 new tasks
2017-05-02 7:28:24 AM | climateprediction.net | Project has no tasks available
2017-05-02 8:29:03 AM | climateprediction.net | Sending scheduler request: To fetch work.
2017-05-02 8:29:03 AM | climateprediction.net | Requesting new tasks for CPU
2017-05-02 8:29:07 AM | climateprediction.net | Scheduler request completed: got 2 new tasks
2017-05-02 8:29:09 AM | climateprediction.net | Started download of hadcm3s_8142_201412_120_564_011003749.zip
2017-05-02 8:29:09 AM | climateprediction.net | Started download of 71rs_2014.ostart.gz
2017-05-02 8:29:31 AM | climateprediction.net | Temporarily failed download of hadcm3s_8142_201412_120_564_011003749.zip: connect() failed
2017-05-02 8:29:31 AM | climateprediction.net | Backing off 00:03:14 on download of hadcm3s_8142_201412_120_564_011003749.zip
2017-05-02 8:29:31 AM | climateprediction.net | Temporarily failed download of 71rs_2014.ostart.gz: connect() failed
2017-05-02 8:29:31 AM | climateprediction.net | Backing off 00:02:43 on download of 71rs_2014.ostart.gz
2017-05-02 8:29:31 AM | climateprediction.net | Started download of 71rs_2014.astart.gz
2017-05-02 8:29:31 AM | climateprediction.net | Started download of spec3a_sw_3_asol2c_hadcm3.gz
2017-05-02 8:29:32 AM | | Project communication failed: attempting access to reference site
2017-05-02 8:29:34 AM | | Internet access OK - project servers may be temporarily down.
2017-05-02 8:29:53 AM | climateprediction.net | Temporarily failed download of 71rs_2014.astart.gz: connect() failed
2017-05-02 8:29:53 AM | climateprediction.net | Backing off 00:02:16 on download of 71rs_2014.astart.gz
2017-05-02 8:29:53 AM | climateprediction.net | Temporarily failed download of spec3a_sw_3_asol2c_hadcm3.gz: connect() failed
2017-05-02 8:29:53 AM | climateprediction.net | Backing off 00:02:41 on download of spec3a_sw_3_asol2c_hadcm3.gz
2017-05-02 8:29:53 AM | climateprediction.net | Started download of spec3a_lw_3_asol2c_hadcm3.gz
2017-05-02 8:29:53 AM | climateprediction.net | Started download of waterfix.ancil.be.32.gz
2017-05-02 8:29:54 AM | | Project communication failed: attempting access to reference site
2017-05-02 8:29:55 AM | | Internet access OK - project servers may be temporarily down. |
Send message Joined: 24 Nov 05 Posts: 1 Credit: 3,178,826 RAC: 1,972 |
I was able to add the project back to my system. No work is currently available for my system at this time, but it is good to be back in the collective. |
Send message Joined: 28 Jun 07 Posts: 31 Credit: 4,348,423 RAC: 356 |
The "do not detach and reattach" advice came too late for me. Just now I reset and then removed CPDN and then I added it back. It appears to be working correctly. Zooniverse Old Weather transcriber and Old Weather BOINC team member. |
Send message Joined: 5 Jul 09 Posts: 63 Credit: 6,091,274 RAC: 0 |
My last task that reported after the server came back online is showing as completed. https://www.cpdn.org/cpdnboinc/result.php?resultid=20340872 Three tasks that finished while the backup server was running and were showing as completed have lost their trickles and are now showing as in progress. https://www.cpdn.org/cpdnboinc/result.php?resultid=20350265 https://www.cpdn.org/cpdnboinc/result.php?resultid=20350827 https://www.cpdn.org/cpdnboinc/result.php?resultid=20345787 The trickles reported in the last task were reported before the server went down. Kevin |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Two of my tasks have sent trickles since things went back to normal but trickles sent before the alternative upload server went/was taken off line don't appear on the task pages. Won't know how this affects credit until the credit script is run. Not overly worried about this as the information has always been retrieved and sorted eventually in the past. I know this is frustrating for those who keep a close tally on credits however. |
Send message Joined: 5 Jul 09 Posts: 63 Credit: 6,091,274 RAC: 0 |
Not worried about credit, it should turn up eventually, it was just a gentle hint that something may need a quick look at:-) Apart from that 3 of them are batch 561 which some were having problems with. Kevin |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,884,997 RAC: 4,577 |
There's a new batch of 186 WAH2 PNW25/21 but none of my machines can download any because: 04/05/2017 14:06:51 | climateprediction.net | Not requesting tasks: some download is stalled I'm tempted to abort the stalled downloads if there is no prospect of the stalled models being unstalled. PS The WAH2 batch number 565 is duplicated with a small HADCM3S test batch on the backup site, but that's only a cosmetic problem. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
04/05/2017 14:06:51 | climateprediction.net | Not requesting tasks: some download is stalled Was also wondering about aborting the stalled download task I have, though this machine doesn't have any stalled downloads and is now telling me no work is available so perhaps I should give it a bit more of a chance. I had wondered if the reason mine wasn't downloading was why it had been abandoned by previous cruncher but on checking https://www.cpdn.org/cpdnboinc//workunit.php?wuid=10996540 I see it got as far as producing three trickles. So still don't know how global an issue the stuck downloads is. |
Send message Joined: 5 Jul 09 Posts: 63 Credit: 6,091,274 RAC: 0 |
I've had one stuck downloading for a couple of days, and it's a _1. A couple of the servers have gone from the server status page, so maybe they are still sorting things out. Kevin |
Send message Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
I'm getting the same: I've had 2 downloads stalled for a couple of days now, and they're both _2.
2017-05-04 7:21:28 AM | climateprediction.net | update requested by user
2017-05-04 7:21:32 AM | climateprediction.net | Sending scheduler request: Requested by user.
2017-05-04 7:21:32 AM | climateprediction.net | Not requesting tasks: some download is stalled
2017-05-04 7:21:34 AM | climateprediction.net | Scheduler request completed
hadcm3s_831b_201412_120_564_011006242_2
hadcm3s_8142_201412_120_564_011003749_2
I was wondering whether I should abort the stalled download tasks. I think I will just wait; the weekend is not far away. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Have the same problem. I have 4 wah2_pnw25 downloads stalled in my transfer tab since last night. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Collating information from previous posts, this is affecting at least batches 563, 564 and 565. Will let project know. |
Send message Joined: 31 Aug 04 Posts: 29 Credit: 3,972,828 RAC: 132 |
Add batch 486. I've had a download stalled for almost three days now. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Am becoming increasingly certain that all work for download is stalling. That means I won't be aborting any tasks, especially as most tasks are retreads at the moment, meaning they may well be on their last chance. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,365,622 RAC: 15,545 |
Add batch 406 as well. |
Send message Joined: 15 Jan 11 Posts: 175 Credit: 6,242,691 RAC: 699 |
I had this problem three weeks ago. I posted under "New Work". I eventually aborted all the tasks on three machines, given that the maintenance problems had become acute. They'd been hanging there for days. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
For me the question is whether this is a configuration problem with individual batches, where the wrong location is being pointed to for the files to be downloaded (as has happened in the past), or a global issue with the servers. As some of the tasks in question have at least got as far as downloading on to other computers previously, my money is on the latter, so I am not aborting anything unless I hear from the project people, either direct or via moderators, that this should be done. |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,884,997 RAC: 4,577 |
I agree, Dave: I've got 406, 499, 506, 561. It looks to me like an infrastructure problem somewhere. I would quite like to run some of the models, even though they're reissues, as they would help fill in some gaps in my cross-machine performance array. However, if they're never going to download then clearly they have to be aborted. |
Send message Joined: 15 May 09 Posts: 4541 Credit: 19,039,635 RAC: 18,944 |
Wednesday 9.30am: the project is being taken offline for IT to upgrade the GPFS (General Parallel File System, and not something to do with the Green Party as I first thought, that having dominated my other half's life over the past weeks!). Uploads will be diverted to another server, but the subsetting server will be offline. It is anticipated this will take a day, though I am not sure whether that means 24 hours or a working day. |
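
For anyone wanting to check their own machine against the reports above, the event-log excerpts quoted in this thread appear to follow a `timestamp | project | message` layout. Here is a minimal Python sketch, assuming that layout and the exact message wording seen in the posts, that counts temporary download failures per file and notes whether the client has reported a stalled download. The function name `summarize_log` is hypothetical and is not part of BOINC.

```python
import re
from collections import Counter

# Assumption: each event-log line looks like
#   "<timestamp> | <project> | <message>"
# as in the excerpts quoted in this thread.
FAILED = re.compile(r"Temporarily failed download of (\S+): (.+)")
STALLED = "some download is stalled"


def summarize_log(log_text):
    """Return (failures per file name, whether a stalled download was reported)."""
    failures = Counter()
    stalled = False
    for line in log_text.splitlines():
        # Split into at most three fields so the message text stays intact.
        parts = line.split("|", 2)
        if len(parts) < 3:
            continue  # not a log line in the assumed layout
        message = parts[2].strip()
        match = FAILED.search(message)
        if match:
            failures[match.group(1)] += 1
        if STALLED in message:
            stalled = True
    return failures, stalled


if __name__ == "__main__":
    sample = (
        "2017-05-02 8:29:31 AM | climateprediction.net | "
        "Temporarily failed download of 71rs_2014.ostart.gz: connect() failed\n"
        "2017-05-04 7:21:32 AM | climateprediction.net | "
        "Not requesting tasks: some download is stalled\n"
    )
    failures, stalled = summarize_log(sample)
    for name, count in failures.items():
        print(name, count)
    print("stalled download reported:", stalled)
```

Pasting a copy of the event log into `summarize_log` gives a quick list of which files keep failing, which may help when reporting affected batches to the moderators.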