Server Status page questions
Joined: 2 Oct 06 Posts: 54 Credit: 27,309,613 RAC: 28,128
How accurate is the Server Status page? Specifically, the "In progress", "Runtime of last 100 tasks in hours: average, min, max", and "Users in last 24 hours" columns? I ask because something doesn't seem right. Example: Weather At Home 2 (wah2) has 17k tasks in progress, yet the "Users in last 24 hours" column shows just 1. I am assuming that column means users that have returned a task in the last 24 hours. Maybe I misunderstand that column. But if it means what I think it does, then with 17k tasks in progress, only 1 user returned one in the last 24 hours? That can't be right. Six other apps have no values in the runtime or users columns, yet they each have hundreds of tasks in progress. Surely that cannot be right. And even if the users column is accurate, the runtime of the last 100 should still be populated. So what am I missing? Maybe the issue is that all those thousands of tasks in progress are really ghosts, and not actually in progress? Or...?
Joined: 15 May 09 Posts: 4535 Credit: 18,961,772 RAC: 21,888
> So what am I missing? Maybe the issue is that all those thousands of tasks in progress are really ghosts, and not actually in progress? Or...?

Not sure, but with over 660K computers with credit, that could account for a lot of machines that either no longer crunch or no longer even exist. It would be interesting to compare the data from another project when it has no tasks available to download.
Joined: 5 Aug 04 Posts: 126 Credit: 24,386,260 RAC: 24,059
Looking at both CPDN's and Rosetta@home's status pages, it seems only applications with "active" users show any information about the last 100 results. That means that until someone returns, for example, a HADAM4 model, no information will be displayed for HADAM4. As for "in progress", at least with Rosetta@home's 3-day deadline this very quickly drops to zero once available work dries up for any of the applications. With CPDN's multi-month deadlines, things go much more slowly. As for "ghosts", many projects re-issue "lost" work, for example if the server sends work but the connection craps out before the BOINC client receives it, but at least back in the day CPDN did not use this server option. So unless CPDN has finally started using this server option, it wouldn't be surprising if nearly all the work is "ghosts". BTW, since CPDN stopped showing new trickles and stopped giving credit for non-OpenIFS work back in November or December 2022, it's also possible that a larger-than-normal number of users have just quit running CPDN.
Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164
I think the Status Page is OK. As mentioned, few if any tasks are being sent out or returned right now, so a user count of 1 in the last 24 hours makes sense to me. An average run time of 100 hours over the last 100 tasks reported also makes sense to me. Last year on this date there were about 31K tasks in progress; today there are about 26K. There are always a lot of tasks in progress, many, if not most, never to be returned for various reasons. Dave, it would be hard to compare in-progress counts on other projects to CPDN. Other projects have due dates measured in days rather than a year. It is not clear whether CPDN "deletes" tasks when the due date (typically 1 year) is reached. Sometimes I think they do, but other times I don't think so. Other projects would go to zero in-progress tasks within a few days because all overdue tasks would be cancelled.
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420
I have also wondered about the server page. UK Met Office Coupled Model Full Resolution Ocean has had 927 tasks "in progress" for many months, but I have seen no indication that any have been returned, and the number never changes. Weather At Home 2 (wah2) (region independent) has had 4,731 tasks in progress, again for many months, and again I have not seen any activity with it either (maybe 1 came back 4 months ago, but I can't be sure). What is happening with these work units?
Conan
Joined: 15 May 09 Posts: 4535 Credit: 18,961,772 RAC: 21,888
> What is happening with these work units?

In the case of the region independent tasks, I doubt anything is happening. The research that used these is long finished. However, I do see the very occasional user returning one on the server status page. CPDN has in the past granted credit for work done after the deadline. At a time when tasks, even on a reasonably fast machine of the day, could take over six months, I don't think this was unreasonable. I hope this is not happening with more recent tasks, but I have no idea whether it is or not.
Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318
> > What is happening with these work units?
>
> In the case of the region independent tasks, I doubt anything is happening. The research that used these is long finished. However, I do see the very occasional user returning one on the server status page. CPDN has in the past granted credit for work done after the deadline. At a time when tasks, even on a reasonably fast machine of the day, could take over six months, I don't think this was unreasonable. I hope this is not happening with more recent tasks, but I have no idea whether it is or not.

Which applications exactly are the "region independent tasks"? I keep getting hadam4 WUs and I sure do NOT want to waste my electric bill on useless garbage. If there are obsolete WUs circulating, then the project should issue "server aborts" for all of them and clear the decks of the flotsam and jetsam.
Joined: 28 Jul 19 Posts: 149 Credit: 12,830,559 RAC: 228
Fourth line down: "Weather At Home 2 (wah2) (region independent) 0 4731 --- 0". They're Windows tasks, not related to your hadam4 WUs.
Joined: 2 Oct 06 Posts: 54 Credit: 27,309,613 RAC: 28,128
4731 tasks in progress that are not needed? That is a huge amount of electricity and time being wasted. IMO, they should be aborted by the server. Folks crunching them have already been awarded credit for work done via trickles, so there is no need to worry about loss of credit. Assuming they are real tasks and not ghosts, of course. This applies to any of the sub-projects that are no longer needed.
Joined: 15 May 09 Posts: 4535 Credit: 18,961,772 RAC: 21,888
> Assuming they are real tasks and not ghosts, of course.

I think they are probably a mixture of ghosts and tasks belonging to computers that are either dead or no longer running BOINC. Moderators have raised this with the project in the past, and the slab models have since been removed from the server status page. The next to go, I would guess, will be the coupled model full resolution ocean ones. There will also be a lot of the regional wah2 models, as well as the region independent ones, that fall into the category of being ghosts or belonging to machines that will never be heard from again.
Joined: 2 Oct 06 Posts: 54 Credit: 27,309,613 RAC: 28,128
> > Assuming they are real tasks and not ghosts, of course.
>
> I think they are probably a mixture of ghosts and tasks belonging to computers that are either dead or no longer running BOINC. Moderators have raised this with the project in the past, and the slab models have since been removed from the server status page. The next to go, I would guess, will be the coupled model full resolution ocean ones. There will also be a lot of the regional wah2 models, as well as the region independent ones, that fall into the category of being ghosts or belonging to machines that will never be heard from again.

It sounds like you are saying that obsolete tasks should be removed. But is that actually happening now? Or is it just something that you think may possibly happen? And if so, when? Clarity is a good thing, which then leads to fewer questions. But it feels to me like clear answers here are rare. Why is it this way with this project? Does it have to be this way? I am not pointing fingers at anyone in particular, because I have no real understanding of the organizational structure or the various roles of the few people we even know about. Why can't the people doing the actual science and/or the BOINC server work take 15 minutes per week and talk about what is going on? For example, WRT adding the ability for users to choose sub-projects in the BOINC project preferences: who is working on that, if anyone? And can we talk to that person?
Joined: 13 Jun 20 Posts: 6 Credit: 5,301,352 RAC: 176,529
I think the answer to the original question is, "the server page is not accurate". For example, the number of tasks in progress under Computing Status is 45689, whereas the sum of the tasks in progress by application is 25755. The figures do change periodically, but I think it is pointless to speculate about what they really mean, because we don't know what happens behind the scenes. Suffice to say that it would be useful for the admins to have a tidy-up and publish a statement about what's happening, both with individual applications and with the project as a whole.
Nick
Joined: 15 May 09 Posts: 4535 Credit: 18,961,772 RAC: 21,888
> Clarity is a good thing, which then leads to fewer questions. But it feels to me like clear answers here are rare. Why is it this way with this project?

Sometimes the moderators know things for certain because we have been told by someone at the project. For example, we usually know when there are tasks running from the testing branch. Other times, such as with this subject, we are guessing just as you are. The only difference is that the moderators, and those who have been with the project since the days when tasks would last nine months on my machines (which even then were not towards the faster end of the spectrum), have a little more information on which to base our guesses. I have over the years learned to live with uncertainty with this project, and the lack of accuracy of the server status page is known about by the project people. My main concern is the accuracy of the "tasks available to send" figure, which does tend to be a good guide to what is going on, although it is updated only about every two hours, so someone who relies on it to know when to switch to CPDN can miss the boat with small batches.
Joined: 2 Oct 06 Posts: 54 Credit: 27,309,613 RAC: 28,128
The numbers for OpenIFS 43r3 Baroclinic Lifecycle just changed dramatically. It was something like 150 in progress, and in a single day it dropped to ~50. Weird. The other two OpenIFS sub-projects haven't changed in days.
Joined: 15 May 09 Posts: 4535 Credit: 18,961,772 RAC: 21,888
> The numbers for OpenIFS 43r3 Baroclinic Lifecycle just changed dramatically. It was something like 150 in progress, and in a single day it dropped to ~50. Weird. The other two OpenIFS sub-projects haven't changed in days.

It is showing that 11 users have returned results in the past 24 hours. I am assuming that "users" actually means users and not computers, in which case I think it is just a bit of a statistical anomaly. It is of course possible that in reality the machines all belong to one person, despite it showing 11 users returning results in 24 hours.
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154
> Weather At Home 2 (wah2) has 17k tasks in progress, yet the "Users in last 24 hours" column shows just 1. I am assuming that column means users that have returned a task in the last 24 hours. Maybe I misunderstand that column. But if it means what I think it does, then with 17k tasks in progress, only 1 user returned one in the last 24 hours? That can't be right.

Well, I completed one today for oifs_43r3_bl. And I am real. This was on a Linux machine. I have another machine that is signed up for CPDN, but it runs Windows 10 and has received no CPDN work since 21 Aug 2022. It does receive work for other projects.

Task: 22317868
Name: oifs_43r3_bl_a4ck_2016092300_15_991_12212423_2
Workunit: 12212423
Created: 15 Apr 2023, 5:23:15 UTC
Sent: 15 Apr 2023, 5:24:02 UTC
Report deadline: 14 Jun 2023, 5:24:02 UTC
Received: 15 Apr 2023, 12:23:18 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x00000000)
Computer ID: 1511241
Joined: 21 Dec 22 Posts: 5 Credit: 7,815,073 RAC: 6,272
> The numbers for OpenIFS 43r3 Baroclinic Lifecycle just changed dramatically. It was something like 150 in progress, and in a single day it dropped to ~50. Weird. The other two OpenIFS sub-projects haven't changed in days.

There were resends of WUs from Feb 13 whose deadline had expired, and they were finished by other users. I got one and finished it; it had expired at this host: https://www.cpdn.org/results.php?hostid=1533008
Joined: 15 May 09 Posts: 4535 Credit: 18,961,772 RAC: 21,888
JagDoc, yes, you have the correct explanation. I have five OIFS tasks currently uploading that are all resends. This is evidence that the shorter deadlines introduced with OIFS work, and perhaps they could be shortened still further? Just so long as things don't get mixed up, because the East Asia regional tasks I have crunching under WINE are going to take 52 days altogether. That means a hundred days on slower machines. While my opinion will get passed on to the project, I have no say in these decisions, but I would suggest 150 days as an appropriate deadline for them. This would mean the tasks are not appropriate for those who do not run them 24/7 most of the time.
Joined: 21 Dec 22 Posts: 5 Credit: 7,815,073 RAC: 6,272
> JagDoc, yes, you have the correct explanation. I have five OIFS tasks currently uploading that are all resends. This is evidence that the shorter deadlines introduced with OIFS work, and perhaps they could be shortened still further? Just so long as things don't get mixed up, because the East Asia regional tasks I have crunching under WINE are going to take 52 days altogether. That means a hundred days on slower machines. While my opinion will get passed on to the project, I have no say in these decisions, but I would suggest 150 days as an appropriate deadline for them. This would mean the tasks are not appropriate for those who do not run them 24/7 most of the time.

RNA gives its long-running tasks a deadline of 14 days, which the server extends if the host is still running the WU. From the RNA forum:
"If the specified "deadline" is reached, the client reports to the server that it still has work to do on the task, and the server then extends the "deadline" automatically. Unfortunately, the server cannot report the newly scheduled deadline to the client. This all works as long as the machine in question is online at least once every 2 weeks after the initial submission deadline (displayed in the client) (otherwise no "trickle up messages" can be sent)."
I think it would be good to do this for the long-running models.
Joined: 15 May 09 Posts: 4535 Credit: 18,961,772 RAC: 21,888
> RNA gives its long-running tasks a deadline of 14 days, which the server extends if the host is still running the WU.

Not a bad idea. The details of how often a machine has to contact the server might need to change. A task that takes 52 days running 24/7 is going to take a very long time if the machine only runs one day a week.
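To put rough numbers on that, the Python sketch below converts compute days into calendar days for a machine that crunches only some days each week, and finds the fewest crunching days per week that still meets a given deadline. It assumes a task progresses only while the machine is crunching and ignores suspensions and checkpoint losses. The 52-day and 150-day figures come from the posts above; the function names are hypothetical.

```python
import math


def calendar_days(compute_days: float, days_on_per_week: float) -> float:
    """Calendar days to finish if the machine crunches only part of each week."""
    return compute_days * 7 / days_on_per_week


def min_days_per_week(compute_days: float, deadline_days: float) -> int:
    """Fewest whole crunching days per week that still meet the deadline."""
    return math.ceil(compute_days * 7 / deadline_days)


print(calendar_days(52, 7))  # 52.0 (running 24/7)
print(calendar_days(52, 1))  # 364.0 (one day a week)
print(min_days_per_week(52, 150))  # 3
```

On these assumptions, a 52-day task run one day a week needs about a year, well past a 150-day deadline; a machine would have to crunch about three days a week on average to fit.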
©2024 cpdn.org