climateprediction.net (CPDN) home page
Thread 'BOINC scheduler reducing the number of CPDNs?'

Message boards : Number crunching : BOINC scheduler reducing the number of CPDNs?

Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 61678 - Posted: 13 Dec 2019, 19:06:39 UTC

I have all my Ubuntu 18.04.3 machines that run CPDN (four of them at the moment) set to run four CPDN work units, which I accomplish via the Resource Share vis-a-vis the other projects.

For several weeks, that has worked fine; four have been running regularly. But for the past week, things have been going downhill.
Only one of my machines is now running four CPDNs; two more are running only three, and one is running only two.

It is the latter that I think is the strangest. It is an i7-9700 with eight full cores. Both Rosetta and CPDN are set for 100% resource share. Not much to go wrong, is there?
But after running four CPDN for a few weeks, it went down to three about a week ago, and is now down to two. I may be out of the CPDN business before long. I could use app_configs to set it, but that raises some other problems.

I am now on BOINC 7.16.3, which may have something to do with it. It has a few other oddities too.
ID: 61678
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61679 - Posted: 13 Dec 2019, 19:59:20 UTC - in response to Message 61678.  

It's more to do with "Use at most nn % of the CPUs" than resource share.
On the other hand, I had both of mine set for 4 (50%), and last time one of them got 5.

Computers. :(
ID: 61679
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61680 - Posted: 13 Dec 2019, 20:49:38 UTC

And BOINC may have decided, over a recent period of time, that it needs to devote more resources to your other projects, so it's starting to cut back on climate models.
And the latest climate models are getting resource hungry.
ID: 61680
WB8ILI

Joined: 1 Sep 04
Posts: 161
Credit: 81,522,141
RAC: 1,164
Message 61681 - Posted: 13 Dec 2019, 21:06:45 UTC

Jim1348 -

You wrote that you have your Resource Share for CPDN set to 100%. That is possible ONLY if CPDN is the only project on your computer.

For example, if you set the Resource Share to 100 (not a percent) for CPDN and 100 for SETI, each project gets 50% of your resources.

When you only have one project, it doesn't matter what the Resource Share is. One or 100 is the same percent.

Now, what does Resource Share mean?

I am fairly certain that I figured out a few years ago that it has to do with your credits.

Using the 100/100 (50%/50%) example above, if your credits for SETI are "falling behind" because you ran a bunch of CPDN tasks, the BOINC scheduler is going to cut back on CPDN tasks and run more SETI tasks until you get back to the 50%/50% balance you specified.

If anyone thinks I am off base on this, I would be interested in knowing.
ID: 61681
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61682 - Posted: 13 Dec 2019, 21:26:04 UTC - in response to Message 61681.  

Sorry, but credits are just a decorative add on. It's all about the science.
ID: 61682
WB8ILI

Joined: 1 Sep 04
Posts: 161
Credit: 81,522,141
RAC: 1,164
Message 61683 - Posted: 14 Dec 2019, 2:16:29 UTC

Les - I stand corrected about the credit.

The following is from the BOINC website.

Resource share

The amount of computing resources (CPU time, disk space) allocated to a project is proportional to this number. The default is 100.
Note: At World Community Grid this option is titled "Project Weight".
Note: this is not a percentage. If a computer has 2 projects added, each with resource share 100, each project will get half the resources.
If a project is given a resource share of 0 it will not receive any resources unless other projects are unable to provide tasks. Using the value 0 is known as 'setting a backup project': you are advised always to leave at least one project with a non-zero resource share, otherwise the backup project system cannot function normally.
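
As a rough illustration of the "not a percentage" note above, the sketch below turns resource share numbers into fractions of the total. The function name `share_fractions` and the project labels are made up for this example; it shows only the proportionality arithmetic, not how the BOINC client actually schedules work.

```python
def share_fractions(resource_shares):
    """Return each project's fraction of the summed resource shares.

    resource_shares maps a project label to its resource share number.
    This is plain proportional arithmetic, not BOINC's scheduling code.
    """
    total = sum(resource_shares.values())
    if total == 0:
        # The quoted text advises keeping at least one non-zero share.
        raise ValueError("at least one project needs a non-zero resource share")
    return {name: share / total for name, share in resource_shares.items()}


# Two projects at 100 each split the resources evenly.
print(share_fractions({"CPDN": 100, "SETI": 100}))
# A single project gets everything, whatever number it is given.
print(share_fractions({"CPDN": 1}))
```

Only the ratio matters, so 100/100, 50/50 and 1/1 all describe the same even split.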


But I think I was sort of on the right track. In my 50%/50% example, doesn't the above explanation mean that if you run a bunch of CPDN tasks that used a lot of CPU time (and disk space), eventually the BOINC scheduler will allocate more SETI tasks to "even up the score" back to 50%/50%? There is no explanation of the time period used to allocate tasks to make the CPU/disk space proportional to the specified resources.
ID: 61683
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61685 - Posted: 14 Dec 2019, 8:33:42 UTC - in response to Message 61683.  

Finally got my net connection back.

WB8ILI

Your last paragraph is about right.
And what I was trying to say earlier. Not well, I guess.

With multiple projects, BOINC "dithers" about how much work to get from each. It usually can't get it exactly right, because individual tasks may take a very different amount of time from the estimate it has worked out for that project from previous work.
And if the supplied completion time value is way off, that really throws a spanner into the works.

Your last sentence is right - way back, (apparently), BOINC took weeks to get a completion estimate.
I've read that more recent versions of the code work differently, and don't take as long.
Although whether this short guess is better or worse is another matter. :)
ID: 61685
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 61687 - Posted: 14 Dec 2019, 12:47:40 UTC - in response to Message 61685.  

Your last sentence is right - way back, (apparently), BOINC took weeks to get a completion estimate.
I've read that more recent versions of the code work differently, and don't take as long.
Although whether this short guess is better or worse is another matter. :)

I always include "<rec_half_life_days>1.000000</rec_half_life_days>" in the <options> section of my cc_config.xml in order to shorten this time.
The projects had been running equally for about a month anyway before the problem started.
(And 100% resource share to each is the same as 50% each, or any other percentages. It is the ratio of each to the sum of all that counts.)
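
To give a feel for what shortening `rec_half_life_days` does, here is a minimal sketch of half-life weighting. It assumes the client discounts past work exponentially with the configured half-life and that the default is 10 days; the function `decay_weight` is a hypothetical helper for this example, not part of BOINC.

```python
def decay_weight(age_days, half_life_days):
    """Weight given to work done age_days ago under exponential decay.

    Work exactly one half-life old counts half as much as work done now.
    """
    return 0.5 ** (age_days / half_life_days)


# Compare how much week-old work still counts under a 10-day half-life
# (assumed default) and under the 1-day setting quoted in the post.
for half_life in (10.0, 1.0):
    print(half_life, decay_weight(7.0, half_life))
```

With a 1-day half-life, work from a week ago carries less than 1% of its original weight, so under these assumptions the client's view of each project's recent share is dominated by the last day or two.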

By the way, one machine has gone back up to four CPDNs, but another has gone down to two. So I am treading water here.
ID: 61687
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 61688 - Posted: 14 Dec 2019, 16:25:38 UTC - in response to Message 61687.  
Last modified: 14 Dec 2019, 17:02:07 UTC

And my Ryzen 3700X is now down to one CPDN (N144). I thought for certain that when the last two non-CPDN work units (QuChemPedIA) ended, another CPDN would start up. But no, just two more QuChemPedIA work units started.
I don't think the projects themselves can set priorities, but who knows. Maybe the CPDNs just like standing in a queue?

EDIT: I have often seen the case where the BOINC scheduler will prefer one project over another for a while, rather than running both projects at the same time in the proper proportion. But it corrects over time.

Here, I have an app_config limit on CPDN to not run more than four work units at a time, but when it runs fewer, it doesn't automatically try to get back up to four. I think the new BOINC 7.16.3 is getting confused over the long CPDN run times. Someone may need to un-confuse it, preferably before it is released in final form.

EDIT2: It helps to keep a longer buffer. I normally use the default of 0.10 + 0.50 days. By increasing that to 1.0 + 1.50 days, I am back to four CPDNs on my i7-9700 at least. Whether that works over a long period of time is another matter.
ID: 61688
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 61689 - Posted: 14 Dec 2019, 17:52:44 UTC - in response to Message 61688.  
Last modified: 14 Dec 2019, 18:10:18 UTC

Finally, I will just note that the problem seems to have started when I upgraded from BOINC 7.14.2 to 7.16.3.
That is, BOINC 7.14.2 was willing to let me download the long CPDN work units even though they would go past the 0.1 + 0.5 day buffer size.

It appears that BOINC 7.16.3 is a bit more restrictive, though 1.0 + 1.5 days is still less than the N144 estimates of over 3 days on my machines. We will have to adjust our buffer sizes accordingly. It is always a learning experience with BOINC.
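
The buffer figures in this thread can be laid side by side with a few lines of arithmetic. The sketch below only adds the two buffer settings and compares the total with a task estimate; `buffer_days` is a hypothetical helper, the 3.0-day figure stands in for "over 3 days", and nothing here models how BOINC 7.16.3 actually decides whether to fetch work.

```python
def buffer_days(min_days, additional_days):
    """Total work buffer: the minimum plus the additional days of work."""
    return min_days + additional_days


# "Over 3 days" in the post; 3.0 is used here as an assumed lower bound.
estimate_days = 3.0

settings = {"default": (0.1, 0.5), "enlarged": (1.0, 1.5)}
for label, (min_days, additional_days) in settings.items():
    total = buffer_days(min_days, additional_days)
    # Report whether one N144 estimate would fit inside the buffer.
    print(label, total, total >= estimate_days)
```

Both totals come out below the task estimate, which matches the observation that even the enlarged buffer is shorter than a single N144 run.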
ID: 61689
