Thread 'Scheduler request too recent'

Harri Liljeroos
Joined: 9 Dec 05
Posts: 116
Credit: 12,547,934
RAC: 2,738
Message 61884 - Posted: 2 Jan 2020, 19:05:52 UTC - in response to Message 61880.  

Yep, many projects want to have their cake and eat it too.
ID: 61884
wolfman1360
Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 61886 - Posted: 2 Jan 2020, 23:55:14 UTC - in response to Message 61884.  

Yep, many projects want to have their cake and eat it too.

Many projects want to act on results promptly; the decisions they make may hinge heavily on this science being completed. People hoarding these workunits and then turning their computers off for days, if not weeks, at a time, aborting workunits because they take too long, or failing to notice that the correct libraries aren't installed and erroring out workunits certainly isn't helping matters.

Though I am not versed in the specific history, I feel like this project's deadlines were likely set back when numbered Pentium processors (Pentium 3, Pentium 4) were in use and a single workunit took months for a single machine to complete, even running 24/7.
It's really quite amazing how far we've come since then.
ID: 61886
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 61887 - Posted: 3 Jan 2020, 2:59:14 UTC - in response to Message 61886.  

Though I am not versed in the specific history, I feel like this project's deadlines were likely set back when numbered Pentium processors (Pentium 3, Pentium 4) were in use and a single workunit took months for a single machine to complete, even running 24/7.
It's really quite amazing how far we've come since then.


You're absolutely right. I was running this project back then on a single-core laptop with a 1.2 GHz processor and 256 MB of RAM. That's right, ¼ of a GB of RAM. The swap files got a lot of use. That was about 12 or 13 years ago. It took 8 or 9 months to finish a single 160-year WU.
ID: 61887
Harri Liljeroos
Joined: 9 Dec 05
Posts: 116
Credit: 12,547,934
RAC: 2,738
Message 61888 - Posted: 3 Jan 2020, 6:39:45 UTC

I recall that the project does not actually use/need the deadlines. They are there just because the BOINC environment requires one. The project wants to have the results returned ASAP, but not all results are required to make meaningful science. The results will probably go through some statistical analysis before being used. When workunits are sent out again because the first recipient did not return them by the deadline, they often fail with download errors because the original files were already deleted from the server. This suggests to me that not all results are vital (otherwise they would not have been deleted), but of course it is better if more results are returned in a timely fashion.
ID: 61888
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,020,584
RAC: 20,684
Message 61889 - Posted: 3 Jan 2020, 10:34:45 UTC - in response to Message 61888.  

I recall that the project does not actually use/need the deadlines. They are there just because the BOINC environment requires one.


While it's true that the project does not currently use deadlines, if they were to start using them, it would enable tasks to be re-issued in time to be of use to the researchers. At the time things were set up with long deadlines, all the work was generated and processed at Oxford. Now universities from all over the world generate the work and often need it much more quickly. These days, while they might still get credit, tasks returned a year after being issued are of no use to the scientists.
ID: 61889
Hal Bregg
Joined: 20 Nov 18
Posts: 20
Credit: 816,342
RAC: 1,139
Message 61892 - Posted: 3 Jan 2020, 18:27:33 UTC - in response to Message 61889.  

I recall that the project does not actually use/need the deadlines. They are there just because the BOINC environment requires one.


While it's true that the project does not currently use deadlines, if they were to start using them, it would enable tasks to be re-issued in time to be of use to the researchers. At the time things were set up with long deadlines, all the work was generated and processed at Oxford. Now universities from all over the world generate the work and often need it much more quickly. These days, while they might still get credit, tasks returned a year after being issued are of no use to the scientists.


So what is a desirable return time for crunched data to be useful?
ID: 61892
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61894 - Posted: 3 Jan 2020, 20:21:37 UTC - in response to Message 61892.  

ASAP
But within a month or so for most model types.
My turnaround for the N144's is about 3 days 17 hours.

The expectation of at least one of the researchers is that people will only download the same number of tasks as they have processor cores, and run them immediately, without stopping to run work from other projects.
Then get another bunch and do the same.

Which doesn't take into account that most people joining over the past couple of years seem to have no particular interest in the climate models, and load up with lots of work from other projects as well.

And they can always close the batch and go with what they've got. Which, I think, will kill the ones still on people's computers.
ID: 61894
Hal Bregg
Joined: 20 Nov 18
Posts: 20
Credit: 816,342
RAC: 1,139
Message 61897 - Posted: 3 Jan 2020, 21:56:08 UTC - in response to Message 61894.  

ASAP

The expectation of at least one of the researchers is that people will only download the same number of tasks as they have processor cores, and run them immediately, without stopping to run work from other projects.
Then get another bunch and do the same.



It makes perfect sense to download an amount of work that can be done within a relatively short time. I have to admit that I used to download a few days' worth of jobs for a few different projects at the same time. Now that my hardware stock has decreased and it's fairly old, I'm trying to crunch one or two projects at a time and have set up BOINC Manager to download new work once WUs are completed.

You mentioned returning work ASAP. This might be tricky for people with old hardware used for distributed computing.
ID: 61897
lazlo_vii
Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 61898 - Posted: 3 Jan 2020, 22:19:42 UTC - in response to Message 61897.  
Last modified: 3 Jan 2020, 22:30:05 UTC

...You mentioned returning work ASAP. This might be tricky for people with old hardware used for distributed computing.


This is just my two cents.

ASAP should be the goal, but not a number. My systems are set to keep 1 extra day of work each, and that makes my turnaround time about 5 days on very modern hardware. For some reason BOINC thinks that 1 day of extra work means keeping at least one extra job per CPU. If I set it to zero extra work I could cut that in half. Older systems might take a few weeks to crunch a model, so then a few weeks is ASAP and you might want to think twice about downloading extra work. If you have "legacy" hardware that can't do the work in less than a month or so, you can still do the work. Even if it comes in later than desired, it might still be of use to a future research team re-using the data.


EDIT:
Les said
ASAP
But within a month or so for most model types.


and I do not think that is unreasonable even if it excludes some legacy CPU boxes. Humanity needs answers to questions we are still learning how to ask and we must use the right tools for the job if we are going to learn enough to make a real difference in our mitigation efforts against climate change.
ID: 61898
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 61899 - Posted: 3 Jan 2020, 23:38:04 UTC

Hal said:

You mentioned returning work ASAP. This might be tricky for people with old hardware used for distributed computing.

For old gear, ASAP might mean 2-3 times as long.
As long as it isn't: run it for a day, then pause for a week to run other projects.
That's where the problems start.

And multi-project running is what makes putting-a-number-to-it so hard.

************

Lazlo said:
For some reason BOINC thinks that 1 day of extra work means keeping at least one extra job per CPU

I think that enforcing a queue may trigger some setting in BOINC that ignores the length of climate models compared to those from other projects, hence the double work amount.

So let's include another qualifier:
Don't use the "Store at least --- days of work"
or
"Store up to an additional --- days of work"

settings. Leave them at the default.

Which will most likely exclude another large number of users, who now have to figure it out themselves.
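
For anyone comfortable editing files directly, those two settings map onto global_prefs_override.xml in the BOINC data directory. A minimal sketch, using the standard BOINC global preference element names; the values here are only an illustration of a small queue, not a recommendation:

    <global_preferences>
        <!-- "Store at least --- days of work" -->
        <work_buf_min_days>0.1</work_buf_min_days>
        <!-- "Store up to an additional --- days of work" -->
        <work_buf_additional_days>0.0</work_buf_additional_days>
    </global_preferences>

The client picks it up after "Options → Read local prefs file" in the BOINC Manager, or on restart.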

********************

My mixed bunch is now just under 3 days done, and just over 1 day to go.
ASAP!
:)
ID: 61899
lazlo_vii
Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 61900 - Posted: 3 Jan 2020, 23:51:57 UTC - in response to Message 61899.  

If someone starts "Team Turnaround" where all the members follow these guidelines I will join.
ID: 61900
wolfman1360
Joined: 18 Feb 17
Posts: 81
Credit: 14,024,464
RAC: 5,225
Message 61901 - Posted: 4 Jan 2020, 0:59:59 UTC - in response to Message 61900.  

If someone starts "Team Turnaround" where all the members follow these guidelines I will join.

I'd be up for this, certainly.
I just started running WCG alongside CPDN. I'm back in my happy place - two of my favorite projects.
But now BOINC has decided that because I've been crunching CPDN for so long, WCG can use all the available resources. I am hoping it will eventually let CPDN run alongside, and they can be good together. My idea is to cap the maximum concurrent CPDN jobs in an app_config, as sketched below.
I could micromanage, but I don't have the inclination to babysit like I had to do with other projects (mostly VirtualBox-based). I also don't run more than 2 projects on a single host - and if I do, the third is at 0% resources in case the main two run out of work for whatever reason. The work will certainly be done in less than a month - even on hardware from 2010, a Bloomfield Xeon 3530, it seems to be plodding steadily along.
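
Something like this sketch of an app_config.xml, placed in the projects/climateprediction.net directory, is what I have in mind; project_max_concurrent is a standard app_config.xml option on reasonably recent clients, and the limit of 4 is only an example:

    <app_config>
        <!-- run at most 4 CPDN tasks at once, leaving the other cores for WCG -->
        <project_max_concurrent>4</project_max_concurrent>
    </app_config>

"Options → Read config files" in the BOINC Manager applies it without a restart.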

I think something else that hurts these projects is that many people run them like Folding@home - set it and forget it - and don't look at the message boards for recommended settings, optimizations, memory requirements, etc.
ID: 61901
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 61905 - Posted: 4 Jan 2020, 10:42:38 UTC - in response to Message 61899.  

We've been discussing deadlines numerous times; I wonder, can't we finally get WUs with a 2-3 month deadline, tops? That would accommodate older hardware and clean up the queue significantly.
ID: 61905
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 61906 - Posted: 4 Jan 2020, 12:18:43 UTC - in response to Message 61905.  

We've been discussing deadlines numerous times; I wonder, can't we finally get WUs with a 2-3 month deadline, tops? That would accommodate older hardware and clean up the queue significantly.

... it used to be the case that the argument was: "if CPDN reduces deadlines then CPDN grabs all CPUs at the expense of other projects, which is not being a good BOINC citizen".
ID: 61906
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 61907 - Posted: 4 Jan 2020, 12:24:22 UTC - in response to Message 61906.  

... it used to be the case that the argument was: "if CPDN reduces deadlines then CPDN grabs all CPUs at the expense of other projects, which is not being a good BOINC citizen".


Is it still the case? Resource share is a viable option to overcome this.
ID: 61907
lazlo_vii
Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 61909 - Posted: 5 Jan 2020, 2:24:20 UTC - in response to Message 61907.  
Last modified: 5 Jan 2020, 2:41:10 UTC

Is it still the case? Resource share is a viable option to overcome this.


Based on what Les said a few posts ago, it seems that the Resource Share option may be one of the root causes of the problem. If BOINC sees one project with jobs that have a one-week deadline and another project with jobs that have a one-year deadline, and it also sees that the client has a full queue for both projects, it will try to get as many jobs done as it can before the deadlines in either project. In this case that means CPDN jobs will sit idle so long that they become useless to the researchers before they finish crunching, unless the user does some very careful micromanagement of the Resource Share settings. Micromanagement of a GUI setting is not just counterintuitive; it usually points to a larger underlying problem.

Instead of using Resource Share on modern systems with multiple cores, a far better option is to run multiple installations of BOINC on each computer, one per project, simultaneously. On Windows machines this is complicated, but on Linux, *BSD, and macOS systems it is far easier because you can use containers and jails. In either case users will have to overcome a learning curve to make it happen.
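
As a rough sketch of the Linux case (the data directory, RPC port, and account key below are placeholders, and packaging differs between distros):

    # start a second client instance with its own data directory and RPC port
    mkdir -p ~/boinc-cpdn
    boinc --dir ~/boinc-cpdn --gui_rpc_port 31418 &
    # attach only CPDN to the second instance through that port
    boinccmd --host localhost:31418 --project_attach http://climateprediction.net/ YOUR_ACCOUNT_KEY

Each instance then schedules its one project in isolation, so CPDN work never has to compete with short-deadline jobs inside the same client.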

EDIT: I guess the real lesson here is that BOINC was conceived and designed at a time when PCs had only one CPU, and it really does expect you to run one and only one project per PC. The Resource Share option seems to me like an afterthought, a bit like "Dial-up Networking" in Windows 95: it wasn't designed in from the ground up, but it is commonly used because consumer demand evolved in a way the developers never imagined.
ID: 61909
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,020,584
RAC: 20,684
Message 61911 - Posted: 5 Jan 2020, 7:12:40 UTC - in response to Message 61909.  

Instead of using Resource Share on modern systems with multiple cores, a far better option is to run multiple installations of BOINC on each computer, one per project, simultaneously. On Windows machines this is complicated, but on Linux, *BSD, and macOS systems it is far easier because you can use containers and jails. In either case users will have to overcome a learning curve to make it happen.


I suspect that most of the computers where this issue is the biggest problem belong to those who largely set and forget and who will rarely if ever read the forums to know there is a problem, never mind do something about it and go through that learning curve.
ID: 61911
bernard_ivo
Joined: 18 Jul 13
Posts: 438
Credit: 25,620,508
RAC: 4,981
Message 61912 - Posted: 5 Jan 2020, 16:38:23 UTC - in response to Message 61909.  



Based on what Les said a few posts ago, it seems that the Resource Share option may be one of the root causes of the problem. If BOINC sees one project with jobs that have a one-week deadline and another project with jobs that have a one-year deadline, and it also sees that the client has a full queue for both projects, it will try to get as many jobs done as it can before the deadlines in either project. In this case that means CPDN jobs will sit idle so long that they become useless to the researchers before they finish crunching, unless the user does some very careful micromanagement of the Resource Share settings. Micromanagement of a GUI setting is not just counterintuitive; it usually points to a larger underlying problem.


I run WCG and CPDN with a resource share of 12.5% to 75% (and 12.5% for WUprop) and I rarely have idle CPDN WUs. To be sure, I often set "no new tasks" for WCG to ensure a full CPDN load while the hopper is full. I also sometimes suspend and resume WCG when a CPDN task is left idle at 98%, perhaps because of the long deadline. Yes, this is micromanagement, but with shorter CPDN deadlines I might need to do less of it.

I doubt I would invest time to learn to launch multiple BOINC instances on my current 6 machines and micromanage them. One per machine should suffice.
ID: 61912
Mephist0
Joined: 21 Feb 08
Posts: 47
Credit: 7,929,915
RAC: 0
Message 61944 - Posted: 9 Jan 2020, 22:22:36 UTC - in response to Message 61912.  

This might be a noob question: how do I change the resource share between projects? I want something like 95% for CPDN and 5% for a backup project... I wish there were a setting to finish the backup project's work, then only do CPDN until all WUs are processed, and only then request new work from the backup project.....
ID: 61944
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 30,992,465
RAC: 14,585
Message 61945 - Posted: 9 Jan 2020, 23:01:09 UTC - in response to Message 61944.  

You need to set the resource share in the computing preferences for each project, on your account page at that project's website. (Setting a project's resource share to 0 makes it a true backup project: BOINC will only ask it for work when no other project has work available.)
ID: 61945