climateprediction.net (CPDN) home page
Thread 'What Happened ???'

Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 0
Message 58081 - Posted: 18 Apr 2018, 15:31:11 UTC

Can someone explain the long outage ?

Saenger
Joined: 1 Nov 04
Posts: 185
Credit: 4,166,063
RAC: 857
Message 58082 - Posted: 18 Apr 2018, 16:18:21 UTC

And could someone explain, why the 3 WUs currently running on my computer are not listed in my task list? The one I reported some time today is missing as well.
Greetings from Sänger

Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 30,985,838
RAC: 14,284
Message 58083 - Posted: 18 Apr 2018, 17:41:15 UTC
Last modified: 18 Apr 2018, 17:43:33 UTC

All of my tasks on one computer are now missing although work done looks OK (up from zeroes all round). Presumably because of the shift to the backup system. Should we just be patient?

flashawk
Joined: 29 Jun 12
Posts: 31
Credit: 1,438,478
RAC: 0
Message 58084 - Posted: 18 Apr 2018, 17:56:11 UTC

I can see that I have 84 "Validation Pending" WUs, and I also had 2 computation errors out of 3 WUs started. I need to clear out all my backup work to start back on CPDN with a full head of steam. I sure hope we don't have an outage that long again.

Alex Plantema
Joined: 3 Sep 04
Posts: 126
Credit: 26,610,380
RAC: 3,377
Message 58086 - Posted: 18 Apr 2018, 19:10:01 UTC

Most tasks received after March 21 are missing from the list, and what is shown isn't ordered chronologically.

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 58088 - Posted: 18 Apr 2018, 20:58:29 UTC - in response to Message 58084.  

I can see that I have 84 "Validation Pending" WUs, and I also had 2 computation errors out of 3 WUs started. I need to clear out all my backup work to start back on CPDN with a full head of steam. I sure hope we don't have an outage that long again.


Forget about the “validation pending”. The WUs have been reported and you got credit for them. The scientists are using the data. CP doesn’t use that validation system. It is used by other BOINC projects, just not CP.

I’m not quite sure what you mean by clearing out all the backup work. Are the failed WUs still showing in your BOINC Manager? Has BOINC Manager had a chance to contact the server since they failed?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58089 - Posted: 18 Apr 2018, 22:09:16 UTC - in response to Message 58081.  

Can someone explain the long outage ?


When this site shuts down, people switch to the BOINC site, in particular the Projects section, top post, which is: News on Project Outages, where a message, usually from Andy, is posted.

In this case, a new thread was also created: CPDN project going offline this afternoon, which is full of posts from people, including one that I put there near the end of the thread, which I felt explains things well enough.

flashawk
Joined: 29 Jun 12
Posts: 31
Credit: 1,438,478
RAC: 0
Message 58091 - Posted: 19 Apr 2018, 0:38:28 UTC - in response to Message 58088.  

I can see that I have 84 "Validation Pending" WUs, and I also had 2 computation errors out of 3 WUs started. I need to clear out all my backup work to start back on CPDN with a full head of steam. I sure hope we don't have an outage that long again.


Forget about the “validation pending”. The WUs have been reported and you got credit for them. The scientists are using the data. CP doesn’t use that validation system. It is used by other BOINC projects, just not CP.

I’m not quite sure what you mean by clearing out all the backup work. Are the failed WUs still showing in your BOINC Manager? Has BOINC Manager had a chance to contact the server since they failed?


By backup work I mean the other projects that I work on when this one goes down. I'm not sure why you would assume I got credit for anything, because I didn't; my RAC and my total credit haven't changed since before the outage.

They haven't changed for almost 3 weeks. I watched my score after my zips went through, and that following Saturday night to Sunday morning nothing changed.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58093 - Posted: 19 Apr 2018, 3:54:52 UTC

Most recent message from Andy:

Hi All,

The project is now back online. There was a major issue with the server on which the master database resides. We are now running the project from the slave server. Over the coming weeks there will have to be brief periods of scheduled downtime to switch back to using the master database machine.

Best regards,

Andy


So, the "back up again" is a work in progress.

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 58094 - Posted: 19 Apr 2018, 7:04:45 UTC - in response to Message 58091.  

Sorry I misunderstood you.

The reason that you haven’t received new credits in the last 3 weeks is that the credit script hasn’t run since the Saturday before the outage started (it only runs once a week). If I remember correctly, the outage started on a Friday about 3.5 weeks ago. If the credit script runs this week (a big if), the totals and averages should become current. Credits awarded for backup projects can be viewed in the “Statistics” tab under that project’s name.
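
As a rough illustration of why the averages lag behind: BOINC's recent average credit (RAC) is an exponentially weighted average, and the sketch below assumes the standard one-week half-life. The function name `update_average` and the sample figures are illustrative only, and whether CPDN's weekly credit script applies exactly this update is an assumption.

```python
import math

SECONDS_PER_DAY = 86_400
# Assumption: the usual BOINC half-life of one week applies to this project.
CREDIT_HALF_LIFE_S = 7 * SECONDS_PER_DAY


def update_average(avg, avg_time, now, work, half_life=CREDIT_HALF_LIFE_S):
    """Return the new average after `work` credits are granted at time `now`.

    avg       current average, in credits per day
    avg_time  time (seconds) at which `avg` was last updated
    now       time (seconds) of this update
    work      credits granted since the last update
    """
    diff = max(now - avg_time, 0.0)
    diff_days = diff / SECONDS_PER_DAY
    # The old average is worth less the longer ago it was computed.
    weight = math.exp(-diff * math.log(2) / half_life)
    avg *= weight
    if 1.0 - weight > 1e-6:
        # Blend in the daily rate of the newly granted work.
        avg += (1.0 - weight) * (work / diff_days)
    else:
        # Nearly zero elapsed time: use the limit to avoid dividing by ~0.
        avg += math.log(2) * work * SECONDS_PER_DAY / half_life
    return avg


if __name__ == "__main__":
    week = 7 * SECONDS_PER_DAY
    # Hypothetical host: average of 1000 credits/day, then one week with no credit.
    print(round(update_average(1000.0, 0.0, week, 0.0)))
    # Same host, but 7000 credits arrive when the script finally runs.
    print(round(update_average(1000.0, 0.0, week, 7000.0)))
```

Under these assumptions, a week with no granted credit halves the average, while a week's worth of credit arriving in one batch holds it steady.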

[AF>France>Aquitaine>Cote-Adou...
Joined: 25 Sep 09
Posts: 12
Credit: 370,712
RAC: 0
Message 58095 - Posted: 19 Apr 2018, 12:11:45 UTC

The 2 wu's running on my PC have been reset to 0%, when one of them was around 90% completed. Is that normal ?

Saenger
Joined: 1 Nov 04
Posts: 185
Credit: 4,166,063
RAC: 857
Message 58096 - Posted: 19 Apr 2018, 14:31:41 UTC - in response to Message 58095.  

The 2 wu's running on my PC have been reset to 0%, when one of them was around 90% completed. Is that normal ?

Mine are crunching fine, and even after a forced update they stayed at their percentage. But...
And could someone explain, why the 3 WUs currently running on my computer are not listed in my task list? The one I reported some time today is missing as well.

This question is still open. I'm crunching three WUs, but none of them is in my computer's task list. Should I stop them?
Greetings from Sänger

Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 58097 - Posted: 19 Apr 2018, 14:42:22 UTC
Last modified: 19 Apr 2018, 15:11:40 UTC

None of my "Ready to report" tasks have been reported since the servers came back online a day ago.
All my zip files have uploaded.

[Edit]
Windows 10 (x64) Pro - (10.00.16299.00)
BOINC 7.8.3 (x64)
on drop down tab under Activity
I have selected:
1) Run Always.
2) Suspend GPU.
3) Network activity Always.
[/Edit]


Or are the servers just overloaded and timing out?

2018-04-19 07:03:57 | climateprediction.net | Scheduler request failed: Timeout was reached

I have only one task crunching:

2018-04-19 04:12:01 | climateprediction.net | Started upload of wah2_afr50_s003_198512_120_542_010978888_1_r179970855_70.zip
2018-04-19 04:12:36 | climateprediction.net | Finished upload of wah2_afr50_s003_198512_120_542_010978888_1_r179970855_70.zip


This wah2_afr50 task is on its 20th day of crunching, with 70 zips uploaded so far.

wah2_afr50_s003_198512_120_542_010978888_1_r179970855_70.zip

2018-04-19 06:48:56 | climateprediction.net | Scheduler request failed: Timeout was reached
2018-04-19 06:48:57 | | Project communication failed: attempting access to reference site
2018-04-19 06:48:59 | | Internet access OK - project servers may be temporarily down.
2018-04-19 06:54:48 | climateprediction.net | General prefs: from climateprediction.net (last modified 24-Aug-2017 11:14:23)
2018-04-19 06:54:48 | climateprediction.net | Computer location: school
2018-04-19 06:54:48 | | General prefs: using separate prefs for school
2018-04-19 06:54:48 | | Preferences:
2018-04-19 06:54:48 | | max memory usage when active: 262064.41 MB
2018-04-19 06:54:48 | | max memory usage when idle: 262064.41 MB
2018-04-19 06:54:48 | | max disk usage: 3542.11 GB
2018-04-19 06:54:48 | | (to change preferences, visit a project web site or select Preferences in the Manager)
2018-04-19 07:01:40 | climateprediction.net | update requested by user
2018-04-19 07:01:42 | climateprediction.net | Sending scheduler request: Requested by user.
2018-04-19 07:01:42 | climateprediction.net | Sending trickle-up message
2018-04-19 07:01:42 | climateprediction.net | Reporting 172 completed tasks
2018-04-19 07:01:42 | climateprediction.net | Requesting new tasks for CPU
2018-04-19 07:03:57 | climateprediction.net | Scheduler request failed: Timeout was reached
2018-04-19 07:03:58 | | Project communication failed: attempting access to reference site
2018-04-19 07:04:00 | | Internet access OK - project servers may be temporarily down.

thanks for hints, tips or help.
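
For anyone who wants to check how long these requests hang before failing, here is a minimal sketch that reads event-log lines in the `timestamp | project | message` layout pasted above and measures the time between each "Sending scheduler request" and the failure that follows it. The helper names `parse_line` and `failed_request_durations` are made up for this example, and the layout is assumed from the excerpt rather than from any BOINC specification.

```python
import re
from datetime import datetime

# Layout assumed from the pasted log: "YYYY-MM-DD HH:MM:SS | project | message".
# The project field is empty for client-wide messages.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \|\s*(.*?)\s*\|\s*(.*)$"
)


def parse_line(line):
    """Split one event-log line into (datetime, project, message), or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, project, message = match.groups()
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), project, message


def failed_request_durations(lines):
    """Seconds between each scheduler request and the failure that ends it."""
    durations = []
    sent_at = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a log line
        when, _project, message = parsed
        if message.startswith("Sending scheduler request"):
            sent_at = when
        elif message.startswith("Scheduler request failed") and sent_at is not None:
            durations.append((when - sent_at).total_seconds())
            sent_at = None
    return durations


# Lines copied from the log in this post.
SAMPLE = [
    "2018-04-19 07:01:42 | climateprediction.net | Sending scheduler request: Requested by user.",
    "2018-04-19 07:01:42 | climateprediction.net | Reporting 172 completed tasks",
    "2018-04-19 07:03:57 | climateprediction.net | Scheduler request failed: Timeout was reached",
    "2018-04-19 07:03:58 | | Project communication failed: attempting access to reference site",
]

if __name__ == "__main__":
    print(failed_request_durations(SAMPLE))
```

On the sample lines this reports a single failed request that ran for 135 seconds (07:01:42 to 07:03:57) before the timeout.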

KWSN Sir Clark
Joined: 8 Jul 05
Posts: 33
Credit: 1,274,211
RAC: 0
Message 58098 - Posted: 19 Apr 2018, 15:53:09 UTC
Last modified: 19 Apr 2018, 15:53:53 UTC

Hmmmmmmmm.

I have 7 listed as in progress on here but actually only have two still crunching.
The rest should have been reported.

Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 0
Message 58099 - Posted: 19 Apr 2018, 16:35:18 UTC - in response to Message 58089.  

Can someone explain the long outage ?


When this site shuts down, people switch to the BOINC site, in particular the Projects section, top post, which is: News on Project Outages, where a message, usually from Andy, is posted.

In this case, a new thread was also created: CPDN project going offline this afternoon, which is full of posts from people, including one that I put there near the end of the thread, which I felt explains things well enough.


Sorry Les...I didn't think to look at the Projects section on the BOINC site. Would be great if you could do this consistently when outages exceed 12 hours. Will the future (expected) outages to move back to the primary servers be scheduled with notices to us??

Art Masson

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58103 - Posted: 19 Apr 2018, 19:25:32 UTC - in response to Message 58096.  

The 2 wu's running on my PC have been reset to 0%, when one of them was around 90% completed. Is that normal ?

Mine are crunching fine, and even after a forced update they stayed at their percentage. But...
And could someone explain, why the 3 WUs currently running on my computer are not listed in my task list? The one I reported some time today is missing as well.

This question is still open. I'm crunching three WUs, but none of them is in my computer's task list. Should I stop them?



This was alluded to in my previous post.

But to be specific: There's still more work to be done, and the matter of tasks on computers not being in the Tasks list on the server is being discussed.
And the problem seems to be worse than what is affecting you.
(e.g. I've got 4 tasks listed that were finished and reported last December.)

I see no reason to stop processing tasks or aborting tasks.

Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 58105 - Posted: 19 Apr 2018, 22:03:58 UTC

Hello everyone,
can someone tell me why none of my "Ready to report" have reported in?
new message in Event log:

2018-04-19 14:24:43 | climateprediction.net | Requesting new tasks for CPU
2018-04-19 14:26:40 | climateprediction.net | Scheduler request failed: Server returned nothing (no headers, no data)
2018-04-19 14:26:41 | | Project communication failed: attempting access to reference site
2018-04-19 14:26:43 | | Internet access OK - project servers may be temporarily down.

thanks for hints, tips or help.
Byron

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58106 - Posted: 19 Apr 2018, 22:51:01 UTC - in response to Message 58105.  

If you turn off the request for new work, you may get a more realistic answer.
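
The same setting can also be changed from the command line with `boinccmd`, whose `--project URL nomorework` and `--project URL update` operations correspond to "No new tasks" and "Update" in the Manager. The sketch below only builds the argument lists; the project URL shown is an assumption (use whatever URL your client lists for the project), and `boinccmd_args` and `run_boinccmd` are hypothetical helper names.

```python
import subprocess

# Assumption: the URL under which the project is attached in your client.
# Check the Projects tab in BOINC Manager for the exact value.
PROJECT_URL = "http://climateprediction.net/"

# A subset of the operations that `boinccmd --project URL <operation>` accepts.
ALLOWED = ("nomorework", "allowmorework", "update")


def boinccmd_args(operation, project_url=PROJECT_URL):
    """Build the argument list for one per-project boinccmd operation."""
    if operation not in ALLOWED:
        raise ValueError(f"unsupported operation: {operation!r}")
    return ["boinccmd", "--project", project_url, operation]


def run_boinccmd(operation, project_url=PROJECT_URL):
    """Run the command. Needs boinccmd on PATH and a running BOINC client."""
    subprocess.run(boinccmd_args(operation, project_url), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe to try.
    for op in ("nomorework", "update"):
        print(" ".join(boinccmd_args(op)))
```

Depending on the setup, `boinccmd` may need to be run from the BOINC data directory or be given the GUI RPC password before the client accepts the request.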

Byron Leigh Hatch @ team Carl ...
Joined: 17 Aug 04
Posts: 289
Credit: 44,103,664
RAC: 0
Message 58107 - Posted: 19 Apr 2018, 23:12:05 UTC - in response to Message 58106.  
Last modified: 19 Apr 2018, 23:48:37 UTC

thanks Les will do.

2018-04-19 16:13:21 | climateprediction.net | Not requesting tasks: "no new tasks" requested via Manager
2018-04-19 16:15:36 | climateprediction.net | Scheduler request failed: Timeout was reached
2018-04-19 16:15:37 | | Project communication failed: attempting access to reference site
2018-04-19 16:17:12 | climateprediction.net | Fetching scheduler list
2018-04-19 16:17:18 | climateprediction.net | Master file download succeeded
2018-04-19 16:25:43 | climateprediction.net | Sending scheduler request: To send trickle-up message.
2018-04-19 16:25:43 | climateprediction.net | Reporting 172 completed tasks
2018-04-19 16:25:43 | climateprediction.net | Not requesting tasks: "no new tasks" requested via Manager
2018-04-19 16:27:59 | climateprediction.net | Scheduler request failed: Timeout was reached
2018-04-19 16:28:00 | | Project communication failed: attempting access to reference site
2018-04-19 16:28:02 | | Internet access OK - project servers may be temporarily down.

MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 58108 - Posted: 19 Apr 2018, 23:15:09 UTC - in response to Message 58106.  

If you turn off the request for new work, you may get a more realistic answer.


There are obviously still a few things going on, that presumably will sort themselves out. They always do.
Initially with the outage I turned off network access, as there is no sense hammering away at something that is down. Then when Byron reported that zips were uploading, I turned it back on, but also changed to 'no new tasks'.
The zips cleared and then, as things got restored, my 'ready to reports' uploaded. After the report that things were up and running I then turned on reception for new tasks. I now have a full complement plus some waiting to run.

Still the odd error message, which is to be expected, like the following from an hour ago:
20/04/2018 09:56:18 | climateprediction.net | Not requesting tasks: don't need (job cache full)
20/04/2018 09:56:37 | climateprediction.net | [error] handle_trickle_down failed: unexpected null pointer

Good to see it running again, but as always, more and better communication from the CPDN team with the crunchers is needed. It should not be left up to the forum mods to do this. As it was still up and running, a few tweets on the climateprediction.net home page would have helped.