climateprediction.net (CPDN) home page
Thread 'Server Status page questions'

Message boards : Number crunching : Server Status page questions

zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 68600 - Posted: 18 Mar 2023, 3:00:13 UTC

How accurate is the Server Status page? Specifically the "In progress", "Runtime of last 100 tasks in hours: average, min, max", and "Users in last 24 hours" columns? I ask because something doesn't seem right. Examples:

Weather At Home 2 (wah2): 17k tasks in progress, yet the "Users in last 24 hours" column is just 1. I am assuming that column means users that have returned a task in the last 24 hours. Maybe I misunderstand that column. But if it means what I think it does, with 17k tasks in progress, only 1 returned in the last 24 hours? That can't be right.

Six other apps have no values in the runtime or users columns, yet they each have hundreds of tasks in progress. Surely that cannot be right. And even if the users column is accurate, the runtime of the last 100 should still be populated.

So what am I missing? Maybe the issue is that all those thousands of tasks in progress are really ghosts, and not actually in progress? Or...?
ID: 68600
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,013,957
RAC: 21,195
Message 68601 - Posted: 18 Mar 2023, 9:32:04 UTC - in response to Message 68600.  

So what am I missing? Maybe the issue is that all those thousands of tasks in progress are really ghosts, and not actually in progress? Or...?
Not sure, but with over 660K computers with credit, that could include a lot of machines that either no longer crunch or no longer even exist. It would be interesting to compare the data from another project when it has no tasks available to download.
ID: 68601
Ingleside
Joined: 5 Aug 04
Posts: 127
Credit: 24,447,557
RAC: 24,147
Message 68602 - Posted: 18 Mar 2023, 13:36:01 UTC - in response to Message 68600.  

Looking at both CPDN's and Rosetta@home's status pages, it seems only applications with "active" users show any information about the last 100 results. That means that until someone returns, for example, a HADAM4 model, no information will be displayed for HADAM4.
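A minimal sketch of the behaviour described above, assuming (a guess, not the actual BOINC server code) that the page only summarises results returned within a recent window, taken here as 24 hours to match the "Users in last 24 hours" column. The `Result` class, the `runtime_summary()` function and the "avg (min - max)" output format are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Result:
    app: str              # application name, e.g. "hadam4"
    runtime_hours: float  # run time of the returned task
    received: datetime    # when the server received the result


def runtime_summary(results, app, now, window=timedelta(hours=24), limit=100):
    """Return 'avg (min - max)' over the app's most recent returned results,
    or '---' when nothing was returned inside the (assumed) window."""
    recent = [r for r in results if r.app == app and now - r.received <= window]
    recent.sort(key=lambda r: r.received, reverse=True)  # newest first
    recent = recent[:limit]                              # "last 100 tasks"
    if not recent:
        return "---"
    hours = [r.runtime_hours for r in recent]
    return f"{mean(hours):.2f} ({min(hours):.2f} - {max(hours):.2f})"


if __name__ == "__main__":
    now = datetime(2023, 3, 18, 12, 0)
    results = [
        Result("oifs_43r3_bl", 10.0, now - timedelta(hours=2)),
        Result("oifs_43r3_bl", 14.0, now - timedelta(hours=5)),
        Result("hadam4", 100.0, now - timedelta(days=30)),
    ]
    print(runtime_summary(results, "oifs_43r3_bl", now))  # 12.00 (10.00 - 14.00)
    print(runtime_summary(results, "hadam4", now))        # ---
```

Under this assumption an application with no recent returns shows "---" no matter how many of its tasks are still listed as in progress.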

As for "in progress", at least with Rosetta@home's 3-day deadline this very quickly drops to zero when available work dries up for any of the applications. With CPDN's multi-month deadlines, things go much more slowly.

As for "ghosts", many projects re-issue any "lost" work, for example when the server sends work but the connection craps out before the BOINC client gets it. At least back in the day, CPDN did not use this server option. So unless CPDN has finally started using it, it wouldn't be surprising if nearly all of this work is "ghosts".
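To illustrate what "ghosts" means here, a rough sketch assuming the client reports the tasks it actually holds on each scheduler request and the server compares that with its own in-progress list for the host. As far as I know, the BOINC server option being referred to is `resend_lost_results`; the `reconcile()` function and its `resend_lost` flag below are hypothetical stand-ins, not BOINC code.

```python
def reconcile(server_in_progress, client_reported, resend_lost=False):
    """Compare the tasks the server has marked "in progress" for one host
    with the tasks that host's client actually reports holding."""
    # Tasks the server thinks were sent, but the client never received.
    lost = sorted(set(server_in_progress) - set(client_reported))
    if resend_lost:
        # Lost tasks are sent again, so none are left behind as ghosts.
        return {"resend": lost, "ghosts": []}
    # Without the option, lost tasks sit "in progress" until the deadline.
    return {"resend": [], "ghosts": lost}


if __name__ == "__main__":
    server = {"task_a", "task_b", "task_c"}
    client = {"task_a"}  # connection dropped before task_b and task_c arrived
    print(reconcile(server, client))
    print(reconcile(server, client, resend_lost=True))
```

With multi-month deadlines, anything in the "ghosts" list can inflate the "in progress" count for a very long time.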

BTW, since CPDN stopped showing new trickles and stopped giving credit for non-OpenIFS work back in November or December 2022, it's also possible a larger-than-normal number of users has just quit running CPDN.
ID: 68602
WB8ILI
Joined: 1 Sep 04
Posts: 161
Credit: 81,522,141
RAC: 1,164
Message 68603 - Posted: 18 Mar 2023, 16:16:09 UTC

I think the Status Page is OK.

As mentioned, few if any tasks are being sent out or returned right now. So a user count of 1 in the last 24 hours makes sense to me.

An average run time of 100 hours over the last 100 tasks reported also makes sense to me.

Last year on this date there were about 31K tasks in progress. Today there are about 26K. There are always a lot of tasks in progress. Many, if not most, will never be returned, for various reasons.

Dave - It would be hard to compare in-progress counts on other projects to CPDN. Other projects have due dates in days rather than a year. It is not clear if CPDN "deletes" tasks when the due date (1 year typically) is reached. Sometimes I think they do, but other times I don't think so. Other projects would go to zero in-progress tasks in a few days because all overdue tasks would be cancelled.
ID: 68603
Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 68604 - Posted: 19 Mar 2023, 6:10:29 UTC
Last modified: 19 Mar 2023, 6:10:53 UTC

I have also wondered about the server page.

UK Met Office Coupled Model Full Resolution Ocean has had 927 tasks "in progress" for many months but I have seen no indication that any have been returned and the number never changes.

Weather At Home 2 (wah2) (region independent) has had 4,731 tasks in progress, again for many months, and again I have not seen any activity with these either (maybe 1 came back 4 months ago, but I can't be sure).

What is happening with these work units?

Conan
ID: 68604
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,013,957
RAC: 21,195
Message 68605 - Posted: 19 Mar 2023, 8:26:35 UTC

What is happening with these work units?
In the case of the region independent tasks, I doubt anything is happening. The research that used these is long finished. However I do see the very occasional user returning one on the server status page. CPDN has in the past granted credit for work done after the deadline. At a time when tasks even on a reasonably fast machine of the day could take over six months I don't think this was unreasonable. I hope this is not happening on more recent tasks but I have no idea whether it is or not.
ID: 68605
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 68606 - Posted: 19 Mar 2023, 10:02:48 UTC - in response to Message 68605.  
Last modified: 19 Mar 2023, 10:13:02 UTC

What is happening with these work units?
In the case of the region independent tasks, I doubt anything is happening. The research that used these is long finished. However I do see the very occasional user returning one on the server status page. CPDN has in the past granted credit for work done after the deadline. At a time when tasks even on a reasonably fast machine of the day could take over six months I don't think this was unreasonable. I hope this is not happening on more recent tasks but I have no idea whether it is or not.

Which applications exactly are the "region independent tasks?"
I keep getting hadam4 WUs and I sure do NOT want to waste my electric bill on useless garbage.
If there are obsolete WUs circulating, then the project should issue "server aborts" for all of them and clear the decks of the flotsam and jetsam.
ID: 68606
Bryn Mawr
Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 68607 - Posted: 19 Mar 2023, 10:31:37 UTC - in response to Message 68606.  


Which applications exactly are the "region independent tasks?"
I keep getting hadam4 WUs and I sure do NOT want to waste my electric bill on useless garbage.
If there are obsolete WUs circulating, then the project should issue "server aborts" for all of them and clear the decks of the flotsam and jetsam.



Fourth line down :-

Weather At Home 2 (wah2) (region independent) | Unsent: 0 | In progress: 4731 | Runtime of last 100 tasks: --- | Users in last 24 hours: 0

They’re Windows tasks, not related to your hadam4 WUs.
ID: 68607
zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 68608 - Posted: 19 Mar 2023, 11:46:37 UTC

4731 tasks in progress that are not needed? That is a huge amount of electricity and time being wasted. IMO, they should be aborted by the server. Folks crunching them have already been awarded credit for work done via trickles, so there's no need to worry about loss of credit. Assuming they are real tasks and not ghosts, of course.

This applies to any of the sub-projects that are no longer needed.
ID: 68608
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,013,957
RAC: 21,195
Message 68609 - Posted: 19 Mar 2023, 13:34:19 UTC - in response to Message 68608.  

Assuming they are real tasks and not ghosts, of course.
I think they are probably a mixture of ghosts and tasks belonging to computers that are either dead or no longer running BOINC. Moderators have raised this with the project in the past, and the slab models have since been removed from the server status page. The next to go, I would guess, will be the coupled model full resolution ocean ones. There will be a lot of the regional wah2 models, as well as the region independent ones, that also fall into the category of being ghosts or belonging to machines that will never be heard from again.
ID: 68609
zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 68610 - Posted: 20 Mar 2023, 5:54:44 UTC - in response to Message 68609.  

Assuming they are real tasks and not ghosts, of course.
I think they are probably a mixture of ghosts and tasks belonging to computers that are either dead or no longer running BOINC. Moderators have raised this with the project in the past, and the slab models have since been removed from the server status page. The next to go, I would guess, will be the coupled model full resolution ocean ones. There will be a lot of the regional wah2 models, as well as the region independent ones, that also fall into the category of being ghosts or belonging to machines that will never be heard from again.



It sounds like you are saying that obsolete tasks should be removed. But is that actually happening now, or is it just something that you think may possibly happen? And if so, when?

Clarity is a good thing, which then leads to fewer questions. But it feels to me like clear answers here are rare. Why is it this way with this project? Does it have to be this way? I am not pointing fingers at anyone in particular, because I have no real understanding of the organizational structure, and the various roles of the few people we even know about. Why can't the people doing the actual science and/or the BOINC server work take 15 minutes per week and talk about what is going on? For example, WRT adding the ability to choose sub-projects in the BOINC project preferences for users. Who is working on that, if anyone? And can we talk to that person?
ID: 68610
BellyNitpicker
Joined: 13 Jun 20
Posts: 6
Credit: 5,301,352
RAC: 176,529
Message 68611 - Posted: 20 Mar 2023, 7:33:10 UTC

I think the answer to the original question is, "the server page is not accurate".

For example, the number of tasks in progress under Computing Status is 45689, whereas the sum of the tasks in progress by application is 25755.

The figures do change periodically, but I think it is pointless to speculate about what they really mean, because we don't know what happens behind the scenes.

Suffice to say that it would be useful for admin to have a tidy up and publish a statement about what's happening both with individual applications and the project as a whole.
Nick
ID: 68611
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,013,957
RAC: 21,195
Message 68612 - Posted: 20 Mar 2023, 8:05:35 UTC
Last modified: 20 Mar 2023, 9:44:11 UTC

Clarity is a good thing, which then leads to fewer questions. But it feels to me like clear answers here are rare. Why is it this way with this project?
Sometimes the moderators know things for certain because we have been told by someone at the project. For example, we usually know when there are tasks running from the testing branch. Other times, such as with this subject, we are guessing just as you are. The only difference is that the moderators, and those who have been with the project since the days when tasks would last nine months on my machines (which even then were not towards the faster end of the spectrum), have a little more information to base our guesses on.

I have, over the years, learned to live with uncertainty with this project, and the lack of accuracy of the server status page is known to the project people. My main concern is the accuracy of the tasks available to send, which does tend to be a good guide to what is going on. However, because it is updated only about every two hours, someone who relies on it to know when to switch to CPDN can miss the boat with small batches.
ID: 68612
zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 68649 - Posted: 15 Apr 2023, 13:51:47 UTC

The numbers for OpenIFS 43r3 Baroclinic Lifecycle just changed dramatically. It was something like 150 in progress, and in a single day it dropped to ~50. Weird. The other two OpenIFS sub-projects haven't changed in days.
ID: 68649
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,013,957
RAC: 21,195
Message 68650 - Posted: 15 Apr 2023, 15:42:39 UTC - in response to Message 68649.  

The numbers for OpenIFS 43r3 Baroclinic Lifecycle just changed dramatically. It was something like 150 in progress, and in a single day it dropped to ~50. Weird. The other two OpenIFS sub-projects haven't changed in days.
It is showing 11 users have returned results in the past 24 hours. I am assuming that "users" actually means users and not computers, in which case I think it is just a bit of a statistical anomaly. It is, of course, possible that the machines all belong in reality to one person, despite the fact that it shows as 11 users returning results in 24 hours.
ID: 68650
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 68652 - Posted: 15 Apr 2023, 17:51:44 UTC - in response to Message 68600.  

Weather At Home 2 (wah2): 17k tasks in progress, yet the "Users in last 24 hours" column is just 1. I am assuming that column means users that have returned a task in the last 24 hours. Maybe I misunderstand that column. But if it means what I think it does, with 17k tasks in progress, only 1 returned in the last 24 hours? That can't be right.


Well, I completed one today for oifs_43r3_bl. And I am real. This was on a Linux machine. I have another machine that is signed up for CPDN, but it runs Windows 10 and has received no CPDN work since 21 Aug 2022. It does receive work for other projects.

Task             22317868
Name             oifs_43r3_bl_a4ck_2016092300_15_991_12212423_2
Workunit         12212423
Created          15 Apr 2023, 5:23:15 UTC
Sent             15 Apr 2023, 5:24:02 UTC
Report deadline  14 Jun 2023, 5:24:02 UTC
Received         15 Apr 2023, 12:23:18 UTC
Server state     Over
Outcome          Success
Client state     Done
Exit status      0 (0x00000000)
Computer ID      1511241

ID: 68652
JagDoc
Joined: 21 Dec 22
Posts: 5
Credit: 7,823,377
RAC: 5,613
Message 68657 - Posted: 16 Apr 2023, 6:43:17 UTC - in response to Message 68649.  

The numbers for OpenIFS 43r3 Baroclinic Lifecycle just changed dramatically. It was something like 150 in progress, and in a single day it dropped to ~50. Weird. The other two OpenIFS sub projects haven't changed in days.

There were resends of WUs from Feb 13 whose deadlines had expired, and these were then finished by other users.
I got one and finished it; it had expired on this host:
https://www.cpdn.org/results.php?hostid=1533008
ID: 68657
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,013,957
RAC: 21,195
Message 68658 - Posted: 16 Apr 2023, 8:02:58 UTC

JagDoc, yes, you have the correct explanation. I have five OIFS tasks currently uploading that are all resends. This is evidence that the shorter deadlines introduced with OIFS are working, and perhaps they could be shortened still further? Just so long as things don't get mixed up, because the East Asia regional tasks I have crunching under WINE are going to take 52 days altogether. That means around a hundred days on slower machines. While my opinion will get passed on to the project, I have no say in these decisions, but I would suggest 150 days as an appropriate deadline. This would mean the tasks are not suitable for those who do not run them 24/7 most of the time.
ID: 68658
JagDoc
Joined: 21 Dec 22
Posts: 5
Credit: 7,823,377
RAC: 5,613
Message 68659 - Posted: 16 Apr 2023, 8:56:38 UTC - in response to Message 68658.  

JagDoc, yes, you have the correct explanation. I have five OIFS tasks currently uploading that are all resends. This is evidence that the shorter deadlines introduced with OIFS are working, and perhaps they could be shortened still further? Just so long as things don't get mixed up, because the East Asia regional tasks I have crunching under WINE are going to take 52 days altogether. That means around a hundred days on slower machines. While my opinion will get passed on to the project, I have no say in these decisions, but I would suggest 150 days as an appropriate deadline. This would mean the tasks are not suitable for those who do not run them 24/7 most of the time.

At RNA, the long-running tasks have a deadline of 14 days, and the server extends it if the host is still running the WU.
From the RNA forum:
"If the specified "deadline" is reached, the client reports to the server that it still has work to do on the task, and the server then extends the "deadline" automatically. Unfortunately, the server cannot report the newly scheduled deadline to the client. This all works as long as the machine in question is online at least once every 2 weeks after the initial submission deadline (displayed in the client) (otherwise no "trickle up messages" can be sent)."

I think it would be good to do this for the long-running models.
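A simplified model of the RNA-style scheme quoted above, assuming each check-in while the task is still running pushes the deadline to 14 days after that check-in, and that a gap longer than 14 days lets the deadline pass. `expiry_day()` is a hypothetical helper for illustration; the real server logic may well differ.

```python
def expiry_day(contact_days, finish_day, extension_days=14):
    """Simplified trickle-extended deadline.

    contact_days: days (counted from when the task was sent) on which the
        host contacted the server while still running the task.
    finish_day: day the task completes and is reported.
    Returns None if the task is reported in time, else the day it expired.
    """
    deadline = extension_days  # initial deadline
    for day in sorted(contact_days):
        if day >= finish_day:
            break  # task already finished; later contacts don't matter
        if day > deadline:
            break  # deadline passed before this check-in, so the task expired
        deadline = max(deadline, day + extension_days)  # extend from this check-in
    return None if finish_day <= deadline else deadline


if __name__ == "__main__":
    weekly = range(7, 52, 7)  # host online once a week
    print(expiry_day(weekly, finish_day=52))    # None: kept alive by check-ins
    print(expiry_day([10, 40], finish_day=52))  # 24: the 30-day gap was too long
```

The point of the design is that the deadline only needs to be as long as the maximum allowed gap between contacts, not as long as the whole model run.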
ID: 68659
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,013,957
RAC: 21,195
Message 68660 - Posted: 16 Apr 2023, 9:42:24 UTC - in response to Message 68659.  

At RNA, the long-running tasks have a deadline of 14 days, and the server extends it if the host is still running the WU.
From the RNA forum:
"If the specified "deadline" is reached, the client reports to the server that it still has work to do on the task, and the server then extends the "deadline" automatically. Unfortunately, the server cannot report the newly scheduled deadline to the client. This all works as long as the machine in question is online at least once every 2 weeks after the initial submission deadline (displayed in the client) (otherwise no "trickle up messages" can be sent)."

I think it would be good to do this for the long-running models.


Not a bad idea. Details about how often a machine has to contact the server might need to change. A task that takes 52 days running 24/7 is going to take a very long time if only running one day a week.
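Back-of-envelope arithmetic for the point above, as a sketch assuming run time scales linearly with the fraction of the week the machine spends crunching. The `slowdown` factor (2 for the "hundred days on slower machines" estimate earlier in the thread) and the `wall_clock_days()` helper are illustrative only.

```python
def wall_clock_days(days_at_24_7, hours_per_day=24.0, days_per_week=7, slowdown=1.0):
    """Calendar days needed if a task needing `days_at_24_7` of non-stop
    crunching only runs part of the week (assumes linear scaling)."""
    # Full duty is 24 * 7 hours per week; divide by the hours actually crunched.
    return days_at_24_7 * slowdown * 24 * 7 / (hours_per_day * days_per_week)


if __name__ == "__main__":
    deadline = 150  # days, the deadline suggested earlier in the thread
    for label, kwargs in [
        ("24/7", {}),
        ("slower machine, 24/7", {"slowdown": 2.0}),
        ("8 hours a day", {"hours_per_day": 8.0}),
        ("one day a week", {"days_per_week": 1}),
    ]:
        days = wall_clock_days(52, **kwargs)
        print(f"{label}: {days:.0f} days, within deadline: {days <= deadline}")
```

On these assumptions a 52-day task becomes 364 days at one day a week, and 156 days at 8 hours a day, which would already miss a 150-day deadline.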
ID: 68660

©2024 cpdn.org