Message boards : Number crunching : Tasks by application = hoarding
Joined: 15 Jul 17 · Posts: 99 · Credit: 18,701,746 · RAC: 318
The Tasks by Application table at the bottom of the Server Status page https://www.cpdn.org/server_status.php shows that most of the WUs are being hoarded, not run. They're just sitting there going to waste. For example, UK Met Office HadSM4 at N144 resolution shows no Unsent WUs but 324 In Progress, with 2 users in the last 24 hours. I have 7 of those WUs and they're all running. Credit is only granted about once a week, so how do they know who is running what in the last 24 hours? The applications page says this project averages 221 GigaFLOPS of computing (averaged over what period, I don't know). My computers running these WUs manage about 5 GFLOPS each, roughly 35 GFLOPS total. Are more than 38 other computers really running the other 317 WUs? If every one of those WUs were actually being computed at 5 GFLOPS, that would be on the order of 1,585 GFLOPS, far above the reported average, so it implies that most of those WUs are sitting idle: hoarded while someone waiting for work could be running them now. If you can't actually complete them in the next 2 weeks, you should abort them and let others run them.
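To spell out that back-of-envelope arithmetic (a rough sketch; the 5 GFLOPS per running WU is just my estimate, and the 221 GFLOPS figure covers the whole project, not only HadSM4):

```python
# Rough check of the hoarding arithmetic above.
# Assumes each actively running WU delivers ~5 GFLOPS (my estimate, not a
# measured value) and compares against the project-wide average throughput.

in_progress = 324        # HadSM4 N144 WUs shown as "In Progress"
mine = 7                 # WUs I am running myself
gflops_per_wu = 5.0      # assumed throughput per actively running WU
reported_avg = 221.0     # project-wide average GFLOPS from the applications page

others = in_progress - mine
implied = others * gflops_per_wu  # throughput if every other WU were running
print(f"{others} WUs all running would imply ~{implied:,.0f} GFLOPS")
print(f"Reported project-wide average: {reported_avg:,.0f} GFLOPS")
# ~1,585 GFLOPS implied vs 221 GFLOPS reported suggests most of those
# "In Progress" WUs are not actually being computed.
```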
Joined: 15 May 09 · Posts: 4538 · Credit: 19,004,017 · RAC: 21,574
The "Users in the past 24 hours" figure is actually the number of users who have completed tasks in that time I believe as opposed to the number of computers that have returned the trickle up files that are concurrent with the zip files. (I base this on times when on the testing site, I have returned trickle up files and the testing site still shows no users in past 24 hours but once I complete a testing task, it then shows one user at next server update. The problem is with the very long deadlines that CPDN uses. If tasks were sent out with a deadline of say three months instead of typically around one year, the problem would be greatly reduced. (This has been suggested to the project by moderators on more than one occasion." |
Joined: 5 Aug 04 · Posts: 1120 · Credit: 17,202,915 · RAC: 2,154
The N216 tasks take me about 8 1/2 days each; the N144 ones take much less. These are the HadAM4 tasks. Currently I have 6 of these tasks and five of them are running.

Task      Work unit  Sent                       Deadline                   Status       Credit     Application
22028284  12067898   4 Sep 2021, 22:34:50 UTC   18 Aug 2022, 3:54:50 UTC   In progress  ---        UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
22142652  12043851   3 Sep 2021, 17:40:30 UTC   16 Aug 2022, 23:00:30 UTC  In progress  ---        UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
22141616  12100818   1 Sep 2021, 3:55:45 UTC    14 Aug 2022, 9:15:45 UTC   In progress  3,230.53   UK Met Office HadAM4 at N144 resolution v8.09 i686-pc-linux-gnu
22140418  12068902   29 Aug 2021, 19:14:03 UTC  12 Aug 2022, 0:34:03 UTC   In progress  13,636.74  UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
21996027  12050242   28 Aug 2021, 3:06:19 UTC   10 Aug 2022, 8:26:19 UTC   In progress  20,375.94  UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
22044908  12067218   27 Aug 2021, 3:20:04 UTC   9 Aug 2022, 8:40:04 UTC    In progress  20,375.94  UK Met Office HadAM4 at N216 resolution v8.52 i686-pc-linux-gnu
Joined: 7 Sep 16 · Posts: 262 · Credit: 34,915,412 · RAC: 16,463
"If you can't actually complete them in the next 2 weeks, you should abort them and let others run them."

Aw. :( Do you apply that to computers actually running work units too? I've got a total of 22 CPDN work units running, making perfectly sane progress, but most of them won't be done in the next 2 weeks, simply because I run them when I have surplus solar. As we go into winter that surplus shrinks, so I make less progress each day on the WUs (I typically get 8-10 hours of compute per calendar day; one box runs 24 h, but even that is going to start getting put to sleep at night as the days get shorter). The stuff estimated at around 14 days takes me closer to 2 months to compute, but it does get done, and within the deadlines provided. If the deadline were far shorter, I simply wouldn't be able to contribute to CPDN.

However, you recognize that the BOINC reporting framework isn't really "right" for the very long-running sort of tasks CPDN runs, but then turn around and use it as evidence of hoarding? One of the two can be true, but not both.
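The duty-cycle arithmetic behind that, for anyone curious (a rough sketch; the hours-per-day figures are my own numbers):

```python
# Why a task estimated at ~14 days of full-time compute takes ~2 months here.
# The hours-per-day figures are my own rough solar-duty-cycle numbers.

estimate_days_fulltime = 14                 # BOINC estimate at 24 h/day
hours_needed = estimate_days_fulltime * 24  # ~336 hours of compute

for hours_per_day in (10, 8, 6):            # summer -> winter surplus solar
    calendar_days = hours_needed / hours_per_day
    print(f"{hours_per_day} h/day -> ~{calendar_days:.0f} calendar days")
# At 8 h/day that's ~42 days; with suspensions and shared cores, ~2 months
# is entirely plausible -- yet still well inside a one-year deadline.
```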
Joined: 6 Aug 04 · Posts: 195 · Credit: 28,332,519 · RAC: 10,361
"If you can't actually complete them in the next 2 weeks, you should abort them and let others run them."

Hmm. This i7-based PC is running four tasks in an Ubuntu VM. The timescales range from about 7 days for a HadAM4 to 31 days for a HadAM4h. In the early days of CPDN, tasks took around 3 months on an Intel P4, so I'm quite used to nursing the long deadlines, even though M$ up-chucked today and crashed all bar one of the VM tasks. I can't afford the luxury of a 10th-generation AMD or Intel CPU machine. I'm not sure how many of us would keep going if you want us to abort the long HadAM4h tasks or set the deadline at 2 weeks. I'd rather not waste my time looking for ET.
Joined: 15 May 09 · Posts: 4538 · Credit: 19,004,017 · RAC: 21,574
"I can't afford the luxury of a 10th-generation AMD or Intel CPU machine. I'm not sure how many of us would keep going if you want us to abort the long HadAM4h tasks or set the deadline at 2 weeks."

I wouldn't worry about the deadline being set to two weeks, because it ain't going to happen. My personal preference would be somewhere between three and six months, though I am sure others have their own ideas of what would be ideal. My currently dead laptop would take about a month to complete the N216 tasks, which my Ryzen 7 gets through in about nine days when running 5 at once; I think I can get that down to just over 7 days if I restrict how many run at once even further. The lower-resolution N144 tasks complete in about 3 days. However, the real issue isn't slow computers but computers that are rarely switched on: some tasks that come back past the deadline come from fast machines.
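The trade-off between per-task turnaround and total throughput works out roughly like this (a sketch; the 9-day figure is mine, the restricted-concurrency row is an assumed pairing rather than a measurement):

```python
# Running fewer CPDN tasks at once returns each task sooner (which is what
# the project wants), at some cost in aggregate throughput.
# (5 at once, ~9 days each) is my figure; the 3-at-once row is an assumed
# illustration of "just over 7 days" with fewer running.

configs = [
    (5, 9.0),   # 5 N216 tasks at once, ~9 days each (quoted above)
    (3, 7.2),   # fewer at once, "just over 7" days each (assumed pairing)
]
for concurrent, days_each in configs:
    print(f"{concurrent} at once: {days_each:.1f} days/task, "
          f"{concurrent / days_each:.2f} tasks/day aggregate")
# -> 0.56 vs 0.42 tasks/day: lower concurrency trades throughput for
#    faster individual returns.
```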
Joined: 5 Aug 04 · Posts: 1120 · Credit: 17,202,915 · RAC: 2,154
My machine has an Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz with 16 threads, but the BOINC client is currently allowed to use only 9 of them; in hot weather, that is set to 8. I normally allow only 4 cores for ClimatePrediction, though I tried 5 for a couple of weeks. With four cores running CPDN, the N216 tasks take about 8 days to complete, and with five cores they take 8 1/2 to almost 9 days, so I have cut back to 4 at a time again. Also, the BOINC client sometimes does not run 4 of those tasks at once, since then the WCG tasks would not get enough time. I do not have any N144 tasks at the moment, but my impression is that they take a little over three days each. My machine normally runs 24/7, but I reboot it every week or two to put in OS updates. I run Red Hat Enterprise Linux 8.4.
Joined: 6 Oct 06 · Posts: 204 · Credit: 7,608,986 · RAC: 0
I live in arid, hot South Asia. At night, when I switch on the air conditioning (to sleep), all my threads are running. In the morning I switch the air conditioning off, and as the day heats up and my machines heat up with it, I start suspending tasks; the clock speed goes up on the rest. If that is hoarding, then so be it. Anyway, the project seems to be undergoing maintenance frequently and spends most of its time in dry dock. Hoarding is a good idea. Now, any good ideas as to how to hoard WUs?
Joined: 15 May 09 · Posts: 4538 · Credit: 19,004,017 · RAC: 21,574
"I live in arid, hot South Asia. At night, when I switch on the air conditioning (to sleep), all my threads are running. In the morning I switch the air conditioning off, and as the day heats up and my machines heat up with it, I start suspending tasks; the clock speed goes up on the rest."

While it is possible to hoard tasks by, say, temporarily setting the number of cores available far above what you actually use (counting virtual ones I have 16 cores, but much of the time, depending on task type, I run only 5), hoarding increases the time work units take to get back to the project, which really doesn't help the science. However, having enough work in the buffer to last ten or even twenty days at the rate you get through tasks isn't what I count as hoarding. The major problem is computers that are either switched off or doing other work to the extent that tasks take six months or more, which often renders the results useless for those doing PhD research.
Joined: 6 Oct 06 · Posts: 204 · Credit: 7,608,986 · RAC: 0
"I live in arid, hot South Asia. At night, when I switch on the air conditioning (to sleep), all my threads are running. In the morning I switch the air conditioning off, and as the day heats up and my machines heat up with it, I start suspending tasks; the clock speed goes up on the rest."

I was feeling a bit feline. Anyway, without hoarding I can download and keep at least forty-eight tasks minimum, but as I said, the ambient temperature forces me to decide how many tasks to run at any given time. So hoarding is not possible or feasible for me. I know people hoard, and it is irritating in the extreme, but it's human nature and has been going on ever since this project started. That is why I keep bringing up these year-long completion dates. They should be slashed to four months, but that is a different story, and apart from banging your head against a granite wall, useless to even mention. Ask Les, he is the resident expert.
Joined: 4 Oct 15 · Posts: 34 · Credit: 9,075,151 · RAC: 374
I think we have to differentiate between Linux tasks and Windows tasks here. As I am running mostly Linux tasks in 2 VMs, I have downloaded the number of tasks I can run at a time, plus 1 extra task for each VM. Why one extra task? If one task finishes, I could be inside the one-hour window between requests to the server; with one extra task per VM, the next task can start before the finished one is reported (see the sketch below). When more Windows tasks were available, I usually downloaded double or triple the amount I could run at a time; that way they mostly lasted until the next batch became available. The big difference for me is that I have never seen the server run out of Linux tasks. They are available at all times, except when the server is down for maintenance or something like that.
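Written out, the buffer rule is just this (the function name and defaults are my own, for illustration; they are not any BOINC setting):

```python
# Buffer sizing rule from the post above: hold what you can run at once,
# plus one spare per VM, because a finished task may wait up to ~1 hour
# for the next scheduler contact before a replacement arrives.

def tasks_to_hold(concurrent_per_vm: int, vms: int, spares_per_vm: int = 1) -> int:
    """Tasks to keep downloaded so no VM idles between server requests."""
    return (concurrent_per_vm + spares_per_vm) * vms

# e.g. two VMs, each running 4 tasks at once:
print(tasks_to_hold(concurrent_per_vm=4, vms=2))  # -> 10 tasks on hand
```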
Joined: 15 May 09 · Posts: 4538 · Credit: 19,004,017 · RAC: 21,574
"The big difference for me is that I have never seen the server run out of Linux tasks. They are available at all times, except when the server is down for maintenance or something like that."

Though for many years it was the other way around, in the period between the current "lots for Linux" era and the days when tasks would run on Linux, Windows or Mac. It is possible there will be periods when it goes back to lots of Windows work, but that depends almost entirely on the universities away from Oxford where the research is being done.
Joined: 6 Oct 06 · Posts: 204 · Credit: 7,608,986 · RAC: 0
I was also thinking the matter over. I am running Linux tasks, so what is the argument about, Windows tasks? With the amount of memory these Linux tasks use, a person not in his right mind might possibly hoard a few, but that is about it. As for Windows tasks, when and if they are available I might shift back from World Community Grid. They have a Climate Africa model which runs on GPUs.
Joined: 5 Sep 04 · Posts: 7629 · Credit: 24,240,330 · RAC: 0
None of this may matter. It's the researchers who decide what happens to the generated data. If some part of their data set is not returned when they need it, they can simply re-issue it under a different label and ignore the original request if and when it eventually comes back, months after they have produced their results. If you want your efforts to be useful, get the work crunched and returned as fast as possible.
Joined: 7 Aug 04 · Posts: 2187 · Credit: 64,822,615 · RAC: 5,275
Not to my knowledge, unless it started today, and I didn't get notified of a new beta. Their OPNG COVID tasks run on GPUs, but that's it, I believe. Their climate model tasks run on Linux/Windows/Mac in 32-bit and 64-bit.
Joined: 28 Jul 19 · Posts: 150 · Credit: 12,830,559 · RAC: 228
Is the impression of hoarding created by the very high number of tasks shown as in progress for applications that have not issued tasks for some time and where active users shows zero? Surely this is a historical issue of failed tasks that have never been crossed off the list of tasks outstanding. Would it be possible to synchronise the number shown as outstanding with the number of tasks that are actually still being processed?
Joined: 18 Jul 13 · Posts: 438 · Credit: 25,620,508 · RAC: 4,981
"Is the impression of hoarding created by the very high number of tasks shown as in progress for applications that have not issued tasks for some time and where active users shows zero?"

Yes, there are ghost tasks. Of the 8 WUs shown as in progress for me, only two are real. One of the ghosts was issued in 2014 and its deadline is 2023, so apparently I've been running it for 7 years. There have been several requests to clean up the ghosts, without much result. Detaching and reattaching to the project sometimes works, but not always. And yes, a shorter deadline of around 4-6 months is completely reasonable, even accommodating older machines that run other projects as well. Re-issuing tasks might be useful for researchers, but I've crunched plenty of batches that were no longer of interest to anyone: my machines salvaged the 3rd or 5th attempt at a WU after it spent a few years idling on someone else's computer. Old batches are not always pulled, and sometimes I've had to manually abort WUs so as not to waste resources on work nobody wants. A shorter deadline could fix that as well, but apparently that's too much to ask every time this comes up.
Joined: 6 Oct 06 · Posts: 204 · Credit: 7,608,986 · RAC: 0
Most of my COVID tasks run on ARM machines, which have no GPU. As for the laptops, I have allowed all task types to run; they come, they go. Maybe I'll catch which WU is making use of my GPU. Anyway, hoarding is a zero-sum game. I can get hold of a lot of Windows tasks in cache mode; my settings are 24 threads and ten plus ten days, but I fetch no further WUs after 36. It is useless and selfish to grab more. As for Linux tasks, it is useless to hoard those too. However, I have a lot of ghost WUs on my account, of which either I have no knowledge or the server has no knowledge. Maybe they were lost in transmission, which reminds me of a secret internet black hole: WUs enter it and then vanish. I wish someone would clean up our account pages. Mine shows twenty in progress, but in fact I only have eight WUs; as for the rest, I have no idea, except for the black-hole theory.

P.S. I checked, and COVID-19 is quietly using my GPU on the laptops.
Joined: 1 Sep 04 · Posts: 161 · Credit: 81,522,141 · RAC: 1,164
KAMasud - As pointed out elsewhere, on WCG the Africa Rainfall Project does NOT have GPU tasks; only the OpenPandemics project does. As for your "ghost" tasks, I am not sure what you mean by your "accounts page". According to the CPDN server, your Computer #1 (the one with the GPU) has 1 task in progress, Computer #2 (i7-10750H) has 7 in progress, and Computer #3 (i7-8750H) has 6 in progress, all downloaded in the last month or two. If the Tasks tab in the BOINC Manager on your computer shows more than these, you have "lost" or "ghost" tasks. To clear this up, wait until you have NO tasks on your computer according to the CPDN server, go to the Projects tab and Remove the project, wait 10 minutes, then Add the project back. This has always worked for me.
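For anyone who wants to script that remove/re-add dance, a rough sketch using boinccmd (the account key is a placeholder; find your own weak account key on the CPDN site, and only run this when the server shows NO tasks in progress for the host, since detaching abandons anything still on it):

```python
# Sketch of detach / wait / re-attach via boinccmd, assuming boinccmd is on
# PATH and the BOINC client is running locally.
# WARNING: detaching abandons any tasks still on this host.

import subprocess
import time

PROJECT_URL = "https://www.cpdn.org/"
ACCOUNT_KEY = "YOUR_WEAK_ACCOUNT_KEY"  # placeholder -- from your CPDN account page

subprocess.run(["boinccmd", "--project", PROJECT_URL, "detach"], check=True)
time.sleep(600)  # wait ~10 minutes, as suggested above
subprocess.run(["boinccmd", "--project_attach", PROJECT_URL, ACCOUNT_KEY], check=True)
```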
Joined: 5 Aug 04 · Posts: 1120 · Credit: 17,202,915 · RAC: 2,154
"...my machines salvaged the 3rd or 5th attempt at a WU after it spent a few years idling on someone else's computer. Old batches are not always pulled..."

I notice this quite frequently, when I bother to look. I am always amazed to see four failures and yet my machine has no trouble processing the work unit. Most recently: Workunit 12043851 and Workunit 12100818.