Trickles stop new work arriving


Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70298 - Posted: 3 Feb 2024, 4:16:07 UTC
Last modified: 3 Feb 2024, 4:16:20 UTC

If you're running several WAH tasks, each trickle resets the 1-hour timer for getting new work, so it's very difficult to get more work.
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,690,861
RAC: 10,559
Message 70299 - Posted: 3 Feb 2024, 9:30:33 UTC - in response to Message 70298.  

Suspend network activity until the timer runs down, and everything can be done in a single burst when you allow it again.
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,690,861
RAC: 10,559
Message 70300 - Posted: 3 Feb 2024, 10:28:07 UTC

Actually, forget that. I think your basic premise is wrong.

If a trickle becomes due while scheduler contact is backed off (perhaps because of a trickle report from another task), it will be held in a queue until the backoff time has passed.

Then, the scheduler will be contacted and all pending operations will be completed in a batch - a work fetch request (if deemed necessary), and all pending trickles reported.
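
The queueing behaviour described above can be sketched as a toy model. This is only an illustration under stated assumptions, not BOINC client code: the `ToyScheduler` class, its fields, and the task names are all hypothetical, and the 3636-second delay is borrowed from the log excerpt later in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Toy model of one project's scheduler backoff (hypothetical, not BOINC code)."""
    backoff_until: float = 0.0                    # no contact allowed before this time
    pending_trickles: list = field(default_factory=list)
    work_needed: bool = False

    def trickle_due(self, task: str) -> None:
        # A trickle that becomes due is queued, whether or not we are backed off.
        self.pending_trickles.append(task)

    def try_contact(self, now: float, delay: float = 3636.0):
        """Send everything pending in one batch, or return None if nothing can be sent."""
        if now < self.backoff_until:
            return None                           # still backed off: keep queueing
        if not self.pending_trickles and not self.work_needed:
            return None                           # nothing to report or request
        sent = {"trickles": list(self.pending_trickles),
                "work_fetch": self.work_needed}
        self.pending_trickles.clear()
        self.work_needed = False
        self.backoff_until = now + delay          # the project requests a new delay
        return sent


s = ToyScheduler()
s.trickle_due("task_a")
first = s.try_contact(now=0)          # reports task_a's trickle, starts the backoff
s.trickle_due("task_b")               # becomes due during the backoff
s.work_needed = True                  # the cache has also run low meanwhile
blocked = s.try_contact(now=1800)     # half an hour in: nothing is sent
batch = s.try_contact(now=3636)       # backoff over: trickle and work fetch together
```

In this model the second trickle does not restart the timer; it simply waits and goes out in the same contact as the work fetch.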
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70301 - Posted: 3 Feb 2024, 10:59:57 UTC - in response to Message 70300.  

The problem arises if, say, you run another project at lower priority, with maybe 3/4 of your threads running CPDN and 1/4 on the other project. At some point there will be less work than the buffer you've set, and that point could come while the CPDN server is backed off due to a recent trickle-up. Boinc will therefore ask the other project instead, ad infinitum.
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,690,861
RAC: 10,559
Message 70302 - Posted: 3 Feb 2024, 11:09:37 UTC - in response to Message 70301.  

Then suspend a few unstarted tasks for the lower priority project, and let it work off the cache for a while. If you've suspended enough for a work fetch to be needed, it will be done alongside any new trickle reports at the end of the backoff hour.

CPDN doesn't need new work often enough to make that an onerous chore.
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70303 - Posted: 3 Feb 2024, 11:34:17 UTC - in response to Message 70302.  

I prefer Boinc to automate as much as possible. I spend enough time repairing the hardware!

I just thought perhaps it might be an easy setting to change on the server.
wujj123456

Joined: 14 Sep 08
Posts: 127
Credit: 41,541,636
RAC: 58,436
Message 70308 - Posted: 3 Feb 2024, 20:41:35 UTC - in response to Message 70301.  
Last modified: 3 Feb 2024, 20:43:44 UTC

The BOINC client's scheduling leaves a lot to be desired, honestly. The trick I use in this situation is to set the low-priority project's share to 0 whenever work shows up for high-priority projects. That way, the BOINC client will only fetch the minimal number of tasks needed to fill all the cores, not the full buffer. The next time CPDN updates, it will request new work. It's not perfect, but at least I only need to manage the project shares occasionally, given how sporadic CPDN work is.
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70309 - Posted: 3 Feb 2024, 23:08:04 UTC - in response to Message 70308.  

I have several projects set to 10,000 priority - all the rare ones. Currently though, Denis and CPDN (two of these rare ones) are fighting over my cores.
Other projects I just want to run normally, I have set to 100.
I use 0 only where a computer can run out due to project outages.

With this setup, Denis (or even one of the 100 priority projects) will grab an excrementload of work when CPDN says "last contact too recent".

Why Boinc gets so much I don't know - for example:
24 threads, with 20 occupied by CPDN (I set CPDN to 2 threads per task as it seems to work just as well overall and gets each one done faster).
Boinc runs out of work from the other projects, so it needs to fill my 2-day buffer for 4 threads.
So why does it ask Denis for 2 days of work for all 24 threads?!?
Since Denis has a 3 day deadline, and CPDN has a 3 month deadline (do they really not want them back sooner?), Boinc panics and shoves Denis on first. And it stays that way until the odd time it can get another CPDN task.
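
The mismatch in the example above can be put in numbers. This sketch only restates the poster's arithmetic (24 threads, 20 on CPDN, a 2-day buffer); it makes no claim about how the BOINC client actually sizes a work request, and the helper name `buffer_seconds` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def buffer_seconds(threads: int, buffer_days: float) -> float:
    """CPU-seconds of work needed to keep `threads` busy for `buffer_days`."""
    return threads * buffer_days * SECONDS_PER_DAY


total_threads = 24
cpdn_threads = 20      # occupied by CPDN tasks
buffer_days = 2

free_threads = total_threads - cpdn_threads
expected = buffer_seconds(free_threads, buffer_days)   # buffer for the idle threads only
observed = buffer_seconds(total_threads, buffer_days)  # buffer for every thread
ratio = observed / expected

print(f"expected {expected:.0f} s, observed scale {observed:.0f} s, ratio {ratio:.0f}x")
```

On these figures, a request sized for all 24 threads is six times what the 4 free threads need.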
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,690,861
RAC: 10,559
Message 70310 - Posted: 4 Feb 2024, 9:48:53 UTC

A sidebar on all this. When a task finishes, it first reports a final trickle, and (a few seconds later) starts to upload the final data file. Here's the timing on one of my machines:

04/02/2024 09:27:28 | climateprediction.net | Sending scheduler request: To send trickle-up message.
04/02/2024 09:27:29 | climateprediction.net | Project requested delay of 3636 seconds
04/02/2024 09:29:52 | climateprediction.net | Computation for task wah2_nz25_n2fo_200705_25_1005_012257314_2 finished
04/02/2024 09:29:58 | climateprediction.net | Finished upload of wah2_nz25_n2fo_200705_25_1005_012257314_2_r1186546406_out.zip (1265821 bytes)

If you're using the project setting for the maximum number of tasks in progress, and you have that number running, you can't have a spare task ready to start immediately - that machine will have to wait for about 58 minutes before reporting/fetching.
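
The 58-minute figure follows from the log timestamps. A quick check, assuming the log dates are day/month/year and that the requested delay counts from the line where it was logged:

```python
from datetime import datetime, timedelta

FMT = "%d/%m/%Y %H:%M:%S"   # assumed day/month/year, matching the 4 Feb 2024 post date

delay_requested_at = datetime.strptime("04/02/2024 09:27:29", FMT)
upload_finished_at = datetime.strptime("04/02/2024 09:29:58", FMT)
requested_delay = timedelta(seconds=3636)

# Earliest moment the client may contact the scheduler again.
next_contact_at = delay_requested_at + requested_delay

# Time the machine sits with a finished task before it can report or fetch.
remaining = next_contact_at - upload_finished_at
remaining_minutes = remaining.total_seconds() / 60

print(next_contact_at.strftime(FMT), f"{remaining_minutes:.1f} min")
```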

I'm thinking of trying a project setting of n+1 tasks, and using other tools like 'no new tasks' to control the actual resource use.
