Message boards : Number crunching : Scheduler request too recent
Joined: 9 Dec 05 Posts: 116 Credit: 12,547,934 RAC: 2,738
Yep, many projects want to have their cake and eat it too.
Joined: 18 Feb 17 Posts: 81 Credit: 14,024,464 RAC: 5,225
Yep, many projects want to have their cake and eat it too.

Many projects want to act on results quickly; decisions may hinge heavily on this science being completed. People hoarding these workunits and then turning their computers off for days, if not weeks, at a time, aborting workunits because they take too long, or erroring out workunits because the correct libraries were never installed certainly isn't helping matters.

Though I am not versed in the specific history, I suspect this project's deadlines were set back when numbered Pentium processors (Pentium 3, Pentium 4) were in use and a single workunit took months for one machine to complete, even running 24/7. It's really quite amazing how far we've come since then.
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022
Though I am not versed in the specific history, I suspect this project's deadlines were set back when numbered Pentium processors (Pentium 3, Pentium 4) were in use and a single workunit took months for one machine to complete, even running 24/7.

You're absolutely right. I was running this project back then on a single-core laptop with a 1.2 GHz processor and 256 MB of RAM. That's right, ¼ of a GB of RAM. The swap files got a lot of use. That was about 12 or 13 years ago. It took 8 or 9 months to finish a single 160-year WU.
Joined: 9 Dec 05 Posts: 116 Credit: 12,547,934 RAC: 2,738
I recall that the project does not actually use/need the deadlines. They are there just because the BOINC environment requires one. The project wants results returned ASAP, but not all results are required to make meaningful science. The results will probably go through some statistical analysis before being used.

Workunits that are sent out again because the first recipient did not return them by the deadline often fail with download errors because the original files were already deleted from the server. This suggests to me that not all results are vital (otherwise the files would not have been deleted), but of course it is better if more results are returned in a timely fashion.
Joined: 15 May 09 Posts: 4540 Credit: 19,029,695 RAC: 19,917
I recall that the project does not actually use/need the deadlines. They are there just because the BOINC environment requires one.

While it is true that the project does not currently use deadlines, starting to use them would let tasks be re-issued in time to be of use to the researchers. When things were set up with long deadlines, all the work was generated and processed at Oxford. Now universities from all over the world generate the work and often need it much more quickly. These days, while they might still get credit, tasks returned a year after being issued are of no use to the scientists.
Joined: 20 Nov 18 Posts: 20 Credit: 816,342 RAC: 1,139
I recall that the project does not actually use/need the deadlines. They are there just because the BOINC environment requires one.

So what is a desirable return time for crunched data to be useful?
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
ASAP. But within a month or so for most model types. My turnaround for the N144s is about 3 days 17 hours.

The expectation of at least one of the researchers is that people will only download the same number of tasks as they have processor cores, run them immediately without stopping to run work from other projects, then get another bunch and do the same. Which doesn't take into account that most people joining over the past couple of years seem to have no particular interest in the climate models, and load up with lots of work from other projects as well.

And the researchers can always close the batch and go with what they've got. Which, I think, will kill the ones still on people's computers.
Joined: 20 Nov 18 Posts: 20 Credit: 816,342 RAC: 1,139
ASAP

It makes perfect sense to download an amount of work that can be done within a relatively short time. I have to admit that I used to download a few days' worth of jobs for a few different projects at the same time. Now that my hardware stock has decreased, and it's fairly old, I try to crunch one or two projects at a time and set up the BOINC manager to download new work once WUs are completed.

You mentioned returning work ASAP. This might be tricky for people with old hardware used for distributed computing.
Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0
...You mentioned returning work ASAP. This might be tricky for people with old hardware used for distributed computing.

This is just my two cents. ASAP should be the goal, but not a number. My systems are set to keep 1 extra day of work each, and that makes my turnaround time about 5 days on very modern hardware. For some reason BOINC thinks that 1 day of extra work means keeping at least one extra job per CPU. If I set it to zero extra work I could cut that in half.

Older systems might take a few weeks to crunch a model, so then a few weeks is ASAP and you might want to think twice about downloading extra work. If you have "legacy" hardware that can't do the work in less than a month or so, you can still contribute. Even if a result comes in later than desired, it might still be of use to a future research team re-using the data.

EDIT: Les said ASAP and I do not think that is unreasonable, even if it excludes some legacy CPU boxes. Humanity needs answers to questions we are still learning how to ask, and we must use the right tools for the job if we are going to learn enough to make a real difference in our mitigation efforts against climate change.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Hal said:

For old gear, ASAP might mean 2-3 times as long. As long as it isn't: run it for a day, then pause for a week to run other projects. That's where the problems start. And multi-project running is what makes putting a number to it so hard.

************

Lazlo said:

For some reason BOINC thinks that 1 day of extra work means keeping at least one extra job per CPU

I think that enforcing a queue may trigger some setting in BOINC that ignores the length of climate models compared to those from other projects, hence the double work amount. So let's include another qualifier: don't use the "Store at least --- days of work" or "Store up to an additional --- days of work" settings. Leave them at the default. Which will most likely exclude another large number of users, who now have to figure it out themselves.

********************

My mixed bunch is now just under 3 days done, and just over 1 day to go. ASAP! :)
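For reference, the two cache settings mentioned above correspond to fields in BOINC's global_prefs_override.xml, which lives in the BOINC data directory and overrides the website preferences on one host. A minimal sketch (the near-zero values are just the illustration being discussed, not official project guidance):

```xml
<!-- global_prefs_override.xml: host-local override of web preferences.
     work_buf_min_days        = "Store at least N days of work"
     work_buf_additional_days = "Store up to an additional N days of work" -->
<global_preferences>
  <work_buf_min_days>0.1</work_buf_min_days>
  <work_buf_additional_days>0.0</work_buf_additional_days>
</global_preferences>
```

After editing, reload it from the BOINC Manager's Options menu or restart the client.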
Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0
If someone starts "Team Turnaround" where all the members follow these guidelines I will join.
Joined: 18 Feb 17 Posts: 81 Credit: 14,024,464 RAC: 5,225
If someone starts "Team Turnaround" where all the members follow these guidelines I will join.

I'd be up for this, certainly. I just started running WCG alongside CPDN. I'm back in my happy place: two of my favorite projects. But now BOINC has decided that because I've been crunching CPDN for so long, WCG can use all the available resources. I am hoping it will eventually let CPDN run alongside and they can be good together.

My idea is to run a max-concurrent-jobs limit in an app_config with CPDN. I could micromanage, but I don't have the inclination to babysit like I had to with other projects (mostly VirtualBox-based). I also don't run more than two projects on a single host, and if I do, the third is at 0% resource share in case the main two run out of work for whatever reason. The work will certainly be done in less than a month; even on hardware from 2010, a Bloomfield Xeon 3530, it seems to be plodding steadily along.

I think something else that hurts these projects is that many people run them like Folding@home, set it and forget it, and never look at the message boards for recommended settings, optimizations, memory requirements, etc.
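For anyone wanting to try the app_config route mentioned above, here is a minimal sketch. The file goes in the CPDN project folder under the BOINC data directory; the numbers are illustrative, and the `<name>` value is an assumption — it must match the short application name shown in your client_state.xml, which varies by model type:

```xml
<!-- app_config.xml in the climateprediction.net project folder.
     Caps how many CPDN tasks run at once, so a second project
     always keeps some cores. Limits here are illustrative. -->
<app_config>
  <!-- cap across all CPDN applications on this host -->
  <project_max_concurrent>4</project_max_concurrent>
  <app>
    <!-- short app name from client_state.xml (assumed example) -->
    <name>wah2</name>
    <max_concurrent>2</max_concurrent>
  </app>
</app_config>
```

The client reads it on startup, or via "Read config files" in the Manager's Options menu.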
Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981
We've been discussing deadlines numerous times. I wonder, can't we finally get WUs with a 2-3 month deadline, tops? That would accommodate older hardware and clean up the queue significantly.
Joined: 16 Jan 10 Posts: 1084 Credit: 7,817,837 RAC: 5,198
We've been discussing deadlines numerous times. I wonder, can't we finally get WUs with a 2-3 month deadline, tops? That would accommodate older hardware and clean up the queue significantly.

... it used to be the case that the argument was: "if CPDN reduces deadlines then CPDN grabs all CPUs at the expense of other projects, which is not being a good BOINC citizen".
Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981
... it used to be the case that the argument was: "if CPDN reduces deadlines then CPDN grabs all CPUs at the expense of other projects, which is not being a good BOINC citizen".

Is that still the case? Resource share is a viable option to overcome this.
Joined: 11 Dec 19 Posts: 108 Credit: 3,012,142 RAC: 0
Is that still the case? Resource share is a viable option to overcome this.

Based on what Les said a few posts ago, it seems that the Resource Share option may be one of the root causes of the problem. If BOINC sees one project with jobs that have a one-week deadline and another project with jobs that have a one-year deadline, and it also sees that the client has a full queue for both projects, it will try to get as many jobs done as it can before the deadlines in either project. In this case it means that CPDN jobs will sit idle so long that they become useless to the researchers before they finish crunching, unless the user does some very careful micromanagement of the Resource Share settings. Micromanagement of a GUI setting is not just counterintuitive; it usually points to a larger underlying problem.

Instead of using Resource Share on modern systems with multiple cores, a far better option is to run multiple installations of BOINC on each computer, one per project. On Windows machines this is complicated, but on Linux, *BSD, and macOS systems it is far easier because you can use containers and jails. In either case the user will have to climb a learning curve to make it happen.

EDIT: I guess the real lesson here is that BOINC was conceived and designed at a point in time when PCs had only one CPU, and it really does expect you to run one and only one project per PC. The Resource Share option seems to me like an afterthought. A bit like "Dial-up Networking" in Windows 95: it wasn't designed in from the ground up, but it is commonly used because consumer demand evolved in a way the developers never imagined.
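A minimal sketch of the multiple-instance setup on Linux. The data directory and RPC port are illustrative assumptions; the client flags used (`--dir`, `--gui_rpc_port`, `--allow_multiple_clients`, `--daemon`) exist in recent BOINC clients, but check the version your distribution ships:

```shell
# Sketch: launch a second BOINC client with its own data directory,
# so each instance can be attached to a single project.
# Directory and port below are illustrative assumptions.
BOINC_DIR_2="$HOME/boinc2"   # data directory for the second instance
RPC_PORT_2=31418             # first instance uses the default 31416

mkdir -p "$BOINC_DIR_2"

if command -v boinc >/dev/null 2>&1; then
    # Run the second client as a daemon alongside the first one.
    boinc --dir "$BOINC_DIR_2" --gui_rpc_port "$RPC_PORT_2" \
          --allow_multiple_clients --daemon
    # Then attach this instance to one project only, e.g.:
    #   boinccmd --host "localhost:$RPC_PORT_2" --project_attach <URL> <account_key>
else
    echo "boinc client not found; commands shown for reference"
fi
```

Each instance then schedules its own project in isolation, so a long-deadline CPDN task can never be starved by a short-deadline project sharing the same queue.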
Joined: 15 May 09 Posts: 4540 Credit: 19,029,695 RAC: 19,917
Instead of using Resource Share on modern systems with multiple cores, a far better option is to run multiple installations of BOINC on each computer, one per project. On Windows machines this is complicated, but on Linux, *BSD, and macOS systems it is far easier because you can use containers and jails. In either case the user will have to climb a learning curve to make it happen.

I suspect that most of the computers where this issue is the biggest problem belong to those who largely set and forget, and who will rarely if ever read the forums to know there is a problem, never mind do something about it and go through that learning curve.
Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981
I run WCG and CPDN with resource shares of 12.5% and 75% (and 12.5% for WUProp), and I rarely have idle CPDN WUs. To be sure, I often set No New Tasks for WCG to ensure a full CPDN load when the hopper is full. I also sometimes suspend and resume WCG when a CPDN task is left idle at 98%, perhaps because of the long deadline. Yes, this is micromanagement, but with shorter CPDN deadlines I might need to do less of it. I doubt I would invest the time to learn to launch multiple BOINC instances on my current 6 machines and micromanage them. One per machine should suffice.
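The suspend / no-new-tasks micromanagement described above can also be scripted with boinccmd, which ships with the client; `nomorework`, `suspend`, `resume`, and `allowmorework` are real boinccmd project operations, while the project URL here is just an example:

```shell
# Sketch: pause work fetch (and optionally execution) for one project
# so CPDN tasks are never left idle. The URL is an example.
WCG_URL="https://www.worldcommunitygrid.org/"

if command -v boinccmd >/dev/null 2>&1; then
    boinccmd --project "$WCG_URL" nomorework     # stop fetching new WCG tasks
    boinccmd --project "$WCG_URL" suspend        # pause WCG so CPDN gets all cores
    # ... later, when the CPDN hopper is empty again:
    boinccmd --project "$WCG_URL" resume
    boinccmd --project "$WCG_URL" allowmorework
else
    echo "boinccmd not found; commands shown for reference"
fi
```

Putting the first pair in a small script makes the routine a one-liner instead of a trip through the Manager's Projects tab.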
Joined: 21 Feb 08 Posts: 47 Credit: 7,929,915 RAC: 0
Might be a noob question, but how do I change the resource share between projects? I want to have something like 95% for CPDN and 5% for a backup project... I wish there were a setting to finish the backup project's work, then do only CPDN until all its WUs are processed, and only then request new work from the backup project.
Joined: 22 Feb 06 Posts: 491 Credit: 31,004,889 RAC: 14,391
You need to set the resource share in the computing preferences for each project on your account page for that project.
©2024 cpdn.org