Questions and Answers : Unix/Linux : Multiple CP task management
Author | Message |
---|---|
Send message Joined: 28 Jun 06 Posts: 20 Credit: 1,349,578 RAC: 0 |
Sorry if this has already been asked; nothing jumped out when I looked through the topics. Not too long ago, I started seeing some tasks with estimated running times on the order of 1000 hours. I had mostly been seeing tasks of about 1 day. My machine has 2 cores, so when BOINC tasks are running, I only have 2 tasks running. Right now one CP task and one other task are running, and the CP task is this 1000 hour one. That means the other task (about half done) just sits there. I have manually suspended the big task to get this other task close to completion, and then I will release the hold in the hope that the other one will manage to get some time "accidentally" (luck of the round robin draw, so to speak). But is there something else I should do to keep this small task from being stalled by the long task? |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
You can go to the preferences under your account page for CP and for your other project and edit the resource share. If you set CP at 200 and the other project at 100, CP will get twice as much CPU time as the other one. There is more about this in the preferences section of the forum. |
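As a rough illustration of the resource share arithmetic above, here is a minimal Python sketch. It only shows the long-run proportions and is not BOINC's actual scheduler code; the function name `cpu_fractions` and the project labels are made up for the example.

```python
def cpu_fractions(shares):
    """Return each project's long-run fraction of CPU time.

    `shares` maps a project label to its resource share value.
    This is only the proportional arithmetic, not BOINC's scheduler.
    """
    total = sum(shares.values())
    return {name: share / total for name, share in shares.items()}


# Values from the post: CP at 200, the other project at 100.
fractions = cpu_fractions({"CP": 200, "other": 100})
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%} of CPU time")
```

With these values CP ends up with two thirds of the CPU time and the other project with one third, which is the "twice as much" ratio described above.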
Send message Joined: 28 Jun 06 Posts: 20 Credit: 1,349,578 RAC: 0 |
I had juggled those settings in the past. I'll just go with the manual suspend to see how that works for now. Thanks. |
Send message Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
Are you sure that your preferences are set to use 100% of CPU and both of the available cores? If it is not on 100% then BOINC sees the 1000 hour work unit as more important and throws most resources at that problem. Conan |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Also, does your computer run all day, or only for a few hours a day? |
Send message Joined: 28 Jun 06 Posts: 20 Credit: 1,349,578 RAC: 0 |
Computer runs all the time, and BOINC can use all the CPU if there are no tasks running in the foreground, so to speak. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It's best to just let BOINC get on with working out how long each type of model takes on YOUR computer. If your computer is thinking 1000 hours, then it's way out, and will slowly decrease that estimate as it runs the model. On my Ivy Bridge, the Moses II models took about 31 hours each, the long hadam3p eu models are taking about 45 hours, and the short hadam3p eu models are taking about 11 hours. The MOSES II models were mostly in beta testing which had many versions, and I'm not sure which ones made it to the main site. I'll get another one, and see how it goes. |
Send message Joined: 31 Aug 04 Posts: 37 Credit: 9,581,380 RAC: 3,853 |
Following up from what Les said... On my main machine, an i7-4770s (3.1GHz clock) running Ubuntu 14.04 and using hyperthreading, the longest jobs I have seen ran for under 300 hours - these were hadcm3n. hadcm3s jobs typically took just under a day, the longer hadam3p_eu jobs about 67 hours, the short hadam3p_eu jobs about 17 hours. All quite quick... The ones that really mess with BOINC Manager's Tasks display are the "original" Moses II jobs - hadam3pm2 - there seems to be a problem in the way they communicate status, and when they report 8.3% progress they are, in fact, nearly finished!!! (I think there's a "factor of 12" problem in there somewhere!) Note, however, that Moses+Triffid tasks (hadam3prm3pm2t_eu) seem to show an accurate progress rate. In practice, on my main machine a Moses II (hadam3pm2) job will run for about 175 hours, whilst a Moses+Triffid job will run for about 180 hours. As I don't know what hardware you have, I can't guess how long a job might take on your machine; a slower machine of mine (an i3-2100 at the same clock rate) typically takes about 50% longer to run. I've never run CP jobs on my laptop, so I've no idea how they'd go on a 2GHz clock machine... Hope the above might reassure you somewhat. As Les says, you might as well just leave it to it! |
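To put a number on the suspected "factor of 12" problem, here is a small Python sketch. The factor is only the guess made in the post above, not a confirmed property of the hadam3pm2 application, and the function `corrected_progress` is hypothetical.

```python
def corrected_progress(reported_percent, factor=12.0):
    """Scale a reported progress percentage by a suspected factor.

    The default factor of 12 is the guess from the post, not a
    documented value. The result is capped at 100 percent.
    """
    return min(reported_percent * factor, 100.0)


# A task reporting 8.3% would, under this guess, be almost done.
corrected = corrected_progress(8.3)
print(f"Reported 8.3% -> roughly {corrected:.1f}% under the factor-of-12 guess")
```

Under that assumption a reported 8.3% works out to roughly 99.6%, which matches the observation that such tasks are nearly finished at that point.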
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Back again. I got 2 of the UK Met Office HadAM3P (global only) with MOSES II landsurface scheme v7.03, and the estimate to completion is 241 hours. They're going to have to wait a few hours until some shorter models finish, but they may end up at about 127 hours, which is what something similar took last year on beta. |
Send message Joined: 28 Jun 06 Posts: 20 Credit: 1,349,578 RAC: 0 |
This long job is a hadam3p. It is 192 hours in, with 710 hours to go. So, inaccurate time estimation isn't the problem. AMD 64 X2 4800+ dual core. Not a new CPU, but it beats the heck out of the VAX 11/785 I did my M.Eng. on in the mid-1980s. I am going to put in a new machine that should be about 30% faster, but at 900 to 1000 hours on my old machine, that is still a long job. |
Send message Joined: 31 Mar 13 Posts: 44 Credit: 6,950,896 RAC: 0 |
You have done 106,628 timesteps and a completed task runs to 348,548 timesteps, so you have done just over thirty percent. http://climateapps2.oerc.ox.ac.uk/cpdnboinc/trickle.php?resultid=17770651 http://climateapps2.oerc.ox.ac.uk/cpdnboinc/trickle.php?resultid=17488554 |
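The percentage above can be checked with a couple of lines of Python. This is just the arithmetic on the two timestep counts quoted in the post; the variable names are made up for the example.

```python
# Timestep counts quoted in the post above.
timesteps_done = 106_628
timesteps_total = 348_548

# Fraction of the model completed, based on timesteps rather than
# on the progress bar shown in BOINC Manager.
fraction_done = timesteps_done / timesteps_total
print(f"Completed: {fraction_done:.1%}")
```

Counting timesteps from the trickle pages gives about 30.6%, independent of whatever percentage the Tasks display reports.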
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
fortran, there was a model type on the loose with a wrong time-to-completion estimate. It was beta tested as a 1 year type, and released without change, but as a 10 year type. So the estimate is one tenth of the correct value. It'll be another few hours before I start the 2 that I have, and then I'll be able to see if that's the problem, and email the project if it is. However, I've found a main site one on my Haswell here from last November. If you look at the trickle list at the bottom, you'll see that it was running at about 1.65 seconds per timestep, whereas yours, here, is running at about 5.25 seconds per timestep. So yours is about 3 times slower, and will take about 3 times longer to finish. According to my logs, mine took 113 hours to complete. If I remember correctly, I noted that the percentage estimate was about 8 or 9 percent completed when it finished. And there's a thread here in Number crunching about this problem, from back in November. 3 hours and 49 minutes before I finish the others, and then the 2 MOSES II models can have the computer all to themselves. |
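The "about 3 times longer" reasoning above can be sketched in Python. It assumes both tasks are the same model type with the same number of timesteps, which the post implies but does not state outright, so treat the result as a ballpark figure only. The variable names are made up for the example.

```python
# Figures quoted in the post above.
fast_seconds_per_timestep = 1.65   # the Haswell machine
slow_seconds_per_timestep = 5.25   # the slower machine
fast_total_hours = 113.0           # logged run time on the Haswell

# Assumption: same model type and length, so run time scales
# linearly with seconds per timestep.
slowdown = slow_seconds_per_timestep / fast_seconds_per_timestep
estimated_hours = fast_total_hours * slowdown

print(f"Slowdown factor: about {slowdown:.1f}x")
print(f"Estimated run time on the slower machine: about {estimated_hours:.0f} hours")
```

Under that assumption the slowdown factor is a little over 3, which puts the slower machine at somewhere around 360 hours for a comparable task.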
Send message Joined: 28 Jun 06 Posts: 20 Credit: 1,349,578 RAC: 0 |
I am not complaining about how long it takes. It takes what it takes. I have been doing numerical methods for a long time, I don't have a problem with long run times. I just noticed that this 1000 hour job was taking cycles in preference to a "normal" job which is around 1 or 2 days. That normal job finished, and climate prediction hasn't downloaded any more jobs. So I still have this one long job running (along with jobs from 2 other projects). Which is fine. I just thought that if this sort of situation is common, it might be better if the short job finished earlier. But from the early responses in this thread, that ability is not present in BOINC, or ClimatePrediction's use of BOINC. Which is fine, I can put a manual suspend on the long job to get the short job through. I suppose some people run these models to see the pictures. I don't even know if pictures are available. I just run the models. Occasionally I do some Monte Carlo stuff, and can chew up a few hours of CPU time with that, and BOINC waits, which is what it should do. But BOINC is the biggest consistent load my computer sees. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
BOINC has a correction factor to allow it to learn how much processing time each project takes for its work. But there's only one per computer per project, so for cpdn, which has both very long and very short tasks, it has problems juggling this value. A long time ago, version 5 I think, a rough rule of thumb was that it took BOINC about 10 completed tasks from a project to "learn" about that project. For cpdn, this meant/means a LONG time. And I don't know what applies now, in version 7. The "correction factor" is the Task duration correction factor, which can be found near the bottom of each computer's page in your account page. The best way to deal with a mix of long and short models is to let BOINC get on with things without manual adjustments. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,025,554 RAC: 20,468 |
This task http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17814513 name hadam3pm2_k2us_1959_10_009464927_2 has so far produced 8 zips, yet is only showing as 6.79% complete. Not too worried: it is on a laptop with a dodgy screen and a battery which needs replacing as it only lasts about 15 minutes. The machine will be retired (freecycled) when the task finishes, but it illustrates the problem. I will be unable to check the machine for about a week but am hoping it might be close to finishing when I do. |
©2024 cpdn.org