climateprediction.net (CPDN) home page
Thread 'OpenIFS Discussion'

Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71254 - Posted: 15 Aug 2024, 9:55:31 UTC - in response to Message 71244.  

If the project sets an accurate rsc_memory_bound, then we get the old problem of 8/16GB hosts on old client version running too many tasks. If the project continue to set the inflated rsc_memory_bound
Hang on. CPDN 'continue to set inflated rsc_memory_bound'? We don't set an inflated memory bound -- what makes you say that? I know this because I set them, with values worked out from running tests plus allowing overhead for different configurations. The value set has always been accurate, and that was the problem.

We have yet to test this new fix. Surprisingly, it was released before any testing had been done, so we don't yet know whether it will cure the problem.
---
CPDN Visiting Scientist
ID: 71254
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,706,621
RAC: 9,524
Message 71256 - Posted: 15 Aug 2024, 10:25:20 UTC - in response to Message 71254.  

When the fix was first written, it would have worked OK for CPDN without any changes to the servers. But it was the users of other projects that complained, because there are projects around that do use an inflated rsc_memory_bound, so a test had to be added to the server-side code to work round that. But we do want to know that the original fix works properly, before CPDN go through the hassle of updating the servers.

Memo to self: that probably invalidates my comment that Gianfranco's v8.0.4 will be useful for testing. But I still have a pre-release client build based on the original fix, without the server-side additions, so that can be used for testing with no server-side alterations.
ID: 71256
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71257 - Posted: 15 Aug 2024, 10:44:00 UTC - in response to Message 71256.  

Good point. It's other projects that aren't accurate with their memory bounds; it would be nice if they fixed theirs. And it's great that David A did actually (finally) listen to CPDN and add in a fix. I am hopeful this removes the obstacle to CPDN getting the higher resolution, multicore OpenIFS apps out for testing.
ID: 71257
wujj123456

Joined: 14 Sep 08
Posts: 127
Credit: 41,749,041
RAC: 63,360
Message 71264 - Posted: 15 Aug 2024, 15:48:13 UTC - in response to Message 71254.  
Last modified: 15 Aug 2024, 15:53:03 UTC

CPDN 'continue to set inflated rsc_memory_bound'? We don't set an inflated memory bound

Sorry, I should have said OpenIFS, not CPDN in general. IIRC, the rsc_memory_bound was set at 8 GB, though in reality its RSS never exceeds 6 GB. I happily crunched a whole lot of WUs based on a 5.5 GB per WU allocation. I remember that was done on purpose to prevent 8 GB hosts from getting the task, or 16 GB hosts from running more than two (or three?). I've seen the 8 GB value in client_state.xml during one of the batches. Perhaps that's no longer the case, but I don't have any WUs to verify now.
ID: 71264
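To make the arithmetic behind those numbers concrete, here is a rough sketch of how a per-task memory figure limits the number of tasks on a host. It is a simplified model only, not the BOINC client's actual scheduling logic: the function name tasks_that_fit and the 90% usable-memory fraction are assumptions chosen for illustration.

```python
def tasks_that_fit(host_ram_gb: float, per_task_gb: float,
                   usable_fraction: float = 0.9) -> int:
    """Return how many tasks fit in the RAM the client may use.

    usable_fraction is a hypothetical stand-in for the user's
    "use at most X% of memory" preference; 0.9 is an assumed value.
    """
    usable_gb = host_ram_gb * usable_fraction
    # Floor division: a task only counts if its full allocation fits.
    return int(usable_gb // per_task_gb)


if __name__ == "__main__":
    for ram in (8, 16):
        for per_task in (8.0, 5.5):
            n = tasks_that_fit(ram, per_task)
            print(f"{ram} GB host, {per_task} GB per task: {n} task(s)")
```

Under these assumptions an 8 GB bound keeps the task off an 8 GB host entirely, while a 5.5 GB figure would let one run there.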
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71266 - Posted: 15 Aug 2024, 18:16:52 UTC - in response to Message 71264.  

Ah yes, you're right, I'd forgotten. We did nudge the memory_bound up to stop machines with 8 GB from downloading the tasks, as they would regularly crash the task.

The model's high-water memory depends on the output requested by the scientist. One of the memory peaks occurs when the model has to gather all the data together to form the output. Rather than have a config for each output configuration, I've just set the bound to the highest peak plus 10%.
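As a minimal sketch of that "highest + 10%" rule: take the largest measured peak across output configurations and add the headroom. The configuration names and peak values below are made up for illustration, and the function name memory_bound_bytes is hypothetical; the result is converted to bytes on the assumption that this is the unit the workunit's rsc_memory_bound expects.

```python
GIB = 1024 ** 3  # bytes per GiB


def memory_bound_bytes(peak_gb_by_config: dict[str, float],
                       headroom: float = 0.10) -> int:
    """Return one memory bound covering every output configuration.

    peak_gb_by_config maps a configuration label to its measured
    high-water memory in GiB (illustrative values, not real data).
    """
    if not peak_gb_by_config:
        raise ValueError("need at least one measured peak")
    highest = max(peak_gb_by_config.values())
    # Single bound for all configs: worst-case peak plus headroom.
    return int(highest * (1.0 + headroom) * GIB)


if __name__ == "__main__":
    # Hypothetical measurements for two output configurations.
    peaks = {"light_output": 5.0, "heavy_output": 6.0}
    bound = memory_bound_bytes(peaks)
    print(f"bound = {bound} bytes ({bound / GIB:.2f} GiB)")
```

The trade-off is that configurations with lighter output get a bound higher than they strictly need, in exchange for not maintaining a separate value per configuration.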
---
CPDN Visiting Scientist
ID: 71266

©2024 cpdn.org