Message boards : Number crunching : Trickles stop new work arriving
Author | Message |
---|---|
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
If you're running several WAH tasks, the trickles are resetting the 1 hour timer for getting new work, so it's very difficult to get more work. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,705,793 RAC: 9,655 |
Suspend network activity until the timer runs down, and everything can be done in a single burst when you allow it again. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,705,793 RAC: 9,655 |
Actually, forget that. I think your basic premise is wrong. If a trickle becomes due while scheduler contact is backed off (perhaps because of a trickle report from another task), it will be held in a queue until the backoff time has passed. Then, the scheduler will be contacted and all pending operations will be completed in a batch - a work fetch request (if deemed necessary), and all pending trickles reported. |
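The queueing behaviour described in this post can be sketched as a toy model. This is only an illustration of the claim as stated, not the BOINC client's actual code: the `simulate` function, the event tuples and the fixed 3600-second backoff are all assumptions made for the example.

```python
# Toy model (hypothetical, not BOINC source): trickles and work requests that
# become due during a backoff are queued, then sent together when it expires.

def simulate(events, backoff=3600):
    """events: list of (time_seconds, kind, label), kind is 'trickle' or 'need_work'."""
    contacts = []       # one entry per scheduler contact
    pending = []        # trickles waiting for the backoff to pass
    want_work = False   # a work fetch is waiting for the backoff to pass
    allowed_at = 0      # earliest time the next contact may happen

    def flush(t):
        nonlocal pending, want_work, allowed_at
        contacts.append({"time": t, "trickles": pending, "work_fetch": want_work})
        pending = []
        want_work = False
        allowed_at = t + backoff  # each contact starts a new backoff

    for t, kind, label in sorted(events):
        # Anything queued is sent as soon as the backoff has run out.
        if (pending or want_work) and allowed_at <= t:
            flush(allowed_at)
        if kind == "trickle":
            pending.append(label)
        else:
            want_work = True
        if t >= allowed_at:
            flush(t)

    if pending or want_work:
        flush(allowed_at)
    return contacts


events = [
    (0, "trickle", "A"),      # contact happens at once, backoff starts
    (600, "trickle", "B"),    # queued
    (1200, "need_work", ""),  # queued
    (1800, "trickle", "C"),   # queued
]
contacts = simulate(events)
for c in contacts:
    print(c)
```

In this model the trickles at 600 s and 1800 s do not push the timer back; they wait, and go out in one batch with the work fetch at 3600 s.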
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
The problem arises if, say, you run another project at lower priority, with maybe three quarters of your threads running CPDN and a quarter on the other project. At some point there will be less work than the buffer you've set, and that point could come while the CPDN server is backed off due to a recent trickle-up. Boinc will therefore ask the other project instead, ad infinitum. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,705,793 RAC: 9,655 |
Then suspend a few unstarted tasks for the lower priority project, and let it work off the cache for a while. If you've suspended enough for a work fetch to be needed, it will be done alongside any new trickle reports at the end of the backoff hour. CPDN doesn't need new work often enough to make that an onerous chore. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
I prefer Boinc to automate as much as possible. I spend enough time repairing the hardware! I just thought perhaps it might be an easy setting to change on the server. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 41,744,071 RAC: 63,130 |
The Boinc client's scheduling leaves a lot to be desired, honestly. The trick I use in this situation is to set the low-priority project's share to 0 whenever work shows up for high-priority projects. That way, the Boinc client will only fetch the minimal number of tasks needed to fill all the cores, not the full buffer. The next time CPDN updates, it will request new work. It's not perfect, but at least I only need to manage the project shares occasionally, given how sporadic CPDN work is. |
Send message Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
I have several projects set to 10,000 priority - all the rare ones. Currently though, Denis and CPDN (two of these rare ones) are fighting over my cores. Other projects I just want to run normally, I have set to 100. I use 0 only where a computer can run out due to project outages. With this setup, Denis (or even one of the 100 priority projects) will grab an excrementload of work when CPDN says "last contact too recent". Why Boinc gets so much I don't know - for example: 24 threads, with 20 occupied by CPDN (I set CPDN to 2 threads per task as it seems to work just as well overall and gets each one done faster). Boinc runs out of other project work to do so needs my 2 day buffer for 4 threads. So why does it ask Denis for 2 days of work for all 24 threads?!? Since Denis has a 3 day deadline, and CPDN has a 3 month deadline (do they really not want them back sooner?), Boinc panics and shoves Denis on first. And it stays that way until the odd time it can get another CPDN task. |
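The size of the mismatch in this example is easy to put in numbers. The sketch below is plain arithmetic on the figures quoted in the post (24 threads, 20 on CPDN, a 2 day buffer); it does not model how Boinc actually computes a work request, and the variable names are made up for the illustration.

```python
# Figures taken from the post above; names are hypothetical.
THREADS_TOTAL = 24
THREADS_CPDN = 20
BUFFER_DAYS = 2
HOURS_PER_DAY = 24

threads_free = THREADS_TOTAL - THREADS_CPDN

# Work that would fill the buffer for the idle threads only.
needed_thread_hours = threads_free * BUFFER_DAYS * HOURS_PER_DAY

# Work the poster reports being fetched: a full buffer for every thread.
fetched_thread_hours = THREADS_TOTAL * BUFFER_DAYS * HOURS_PER_DAY

ratio = fetched_thread_hours / needed_thread_hours

print(needed_thread_hours, fetched_thread_hours, ratio)
```

On those figures the fetch is six times what the four free threads need, which is consistent with the short-deadline project then crowding out CPDN.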
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,705,793 RAC: 9,655 |
A sidebar on all this. When a task finishes, it first reports a final trickle, and (a few seconds later) starts to upload the final data file. Here's the timing on one of my machines:
04/02/2024 09:27:28 | climateprediction.net | Sending scheduler request: To send trickle-up message.
04/02/2024 09:27:29 | climateprediction.net | Project requested delay of 3636 seconds
04/02/2024 09:29:52 | climateprediction.net | Computation for task wah2_nz25_n2fo_200705_25_1005_012257314_2 finished
04/02/2024 09:29:58 | climateprediction.net | Finished upload of wah2_nz25_n2fo_200705_25_1005_012257314_2_r1186546406_out.zip (1265821 bytes)
If you're using the project setting for the maximum number of tasks in progress, and you have that number running, you can't have a spare task ready to start immediately - that machine will have to wait for about 58 minutes before reporting/fetching. I'm thinking of trying a project setting of n+1 tasks, and using other tools like 'no new tasks' to control the actual resource use. |
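The "about 58 minutes" figure can be checked from the log timestamps. The sketch below assumes the dates are in day/month/year order (the order makes no difference here, since every entry falls on the same day) and that the requested delay counts from the moment it was logged.

```python
from datetime import datetime, timedelta

FMT = "%d/%m/%Y %H:%M:%S"  # assumed day/month/year log format

# Timestamps and delay copied from the log lines in the post above.
delay_requested_at = datetime.strptime("04/02/2024 09:27:29", FMT)
upload_finished_at = datetime.strptime("04/02/2024 09:29:58", FMT)
requested_delay = timedelta(seconds=3636)

# Earliest moment the client may contact the scheduler again.
next_contact = delay_requested_at + requested_delay

# How long the finished task sits before it can be reported.
wait = next_contact - upload_finished_at

print(next_contact.strftime("%H:%M:%S"))  # 10:28:05
print(wait)                               # 0:58:07
```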
©2024 cpdn.org