climateprediction.net (CPDN) home page
Thread 'OpenIFS tasks : make sure boinc client option 'Leave non-GPU tasks in memory' is selected!'

Glenn Carver

Joined: 29 Oct 17
Posts: 1051
Credit: 16,649,638
RAC: 12,396
Message 67146 - Posted: 30 Dec 2022, 14:32:24 UTC

Higher fail rate => Leave non-GPU tasks in memory while suspended must be on!

If your tasks are failing with 'disk exceeded' or 'filesystem full' errors, please check boincmgr -> Computing Preferences -> Disk & memory.

We're seeing a much higher failure rate with the latest oifs43r3_ps batches, caused by 'disk exceeded'-type errors. Previously the disk bound was set far too high, and it was corrected for the latest 39,000 task batches. I suspect these failures are happening because 'Leave non-GPU tasks in memory while suspended' is NOT checked/ticked. This must be checked for the checkpointing/restarts to work properly.

When the model does a restart (say after a power-off/on), it leaves its latest checkpoint/restart files behind as a backup. The problem is that if the model is frequently swapped in and out of memory, it restarts frequently, and these relatively large files slowly build up until they break the revised, lower disk bound.

I will alter the model's behaviour so all old restart files are deleted once used but I can't do it for these current batches.

Would be nice if there was a way of the task itself setting this boinc option.. (wishful thinking)
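
As a rough illustration of why frequent restarts matter, here is a minimal sketch of the arithmetic. It assumes only what the post above says, that each restart leaves one more set of restart files on disk. The function name and every number are made-up placeholders, not the real file sizes or disk bound of the oifs43r3_ps tasks.

```python
def restarts_until_bound_exceeded(base_gb, restart_set_gb, disk_bound_gb):
    """Count the restarts after which disk use first exceeds the disk bound.

    base_gb:        disk used by a task that has never restarted (placeholder)
    restart_set_gb: size of one leftover set of restart files (placeholder)
    disk_bound_gb:  the task's disk bound (placeholder)
    """
    if restart_set_gb <= 0:
        raise ValueError("restart_set_gb must be positive")
    restarts = 0
    used = base_gb
    # Each restart leaves one more set of restart files behind.
    while used <= disk_bound_gb:
        restarts += 1
        used += restart_set_gb
    return restarts


# Placeholder figures: 3 GB baseline, 1 GB per leftover set, 7 GB bound.
print(restarts_until_bound_exceeded(3.0, 1.0, 7.0))
```

The point is only that the margin under the lowered bound is used up after a fixed number of restarts, so a task that is swapped out often gets there much sooner than one that stays in memory.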

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 67148 - Posted: 30 Dec 2022, 16:10:02 UTC

This advice has been given out repeatedly over the years with respect to the Hadley models. I would expect most of the users who frequent the forums regularly to be doing this already but it is always worth repeating for those who might have missed it before.

AndreyOR

Joined: 12 Apr 21
Posts: 318
Credit: 14,986,850
RAC: 9,927
Message 67157 - Posted: 30 Dec 2022, 21:45:19 UTC - in response to Message 67146.  

Glenn Carver wrote: "Would be nice if there was a way of the task itself setting this boinc option.. (wishful thinking)"

That option is in the global_prefs_override.xml file in the /etc/boinc-client/ folder, as the entry <leave_apps_in_memory>1</leave_apps_in_memory>, with 0 or 1 being the possible values. Would it be possible to add code that changes that file to make sure the line is present and the value is 1?

https://boinc.berkeley.edu/wiki/PreferencesXml
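
A minimal sketch of what such a change could look like if a volunteer ran it themselves, in Python. It assumes the override file uses a <global_preferences> root element, that the file path is passed in by the caller (it varies by installation), and that the script has permission to write there. The function name is hypothetical; this is not something a CPDN task does.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def ensure_leave_apps_in_memory(path):
    """Make sure <leave_apps_in_memory>1</leave_apps_in_memory> is in the file.

    Returns True if the file was created or changed, False if it was
    already correct. Assumes the root element is <global_preferences>.
    """
    path = Path(path)
    if path.exists() and path.read_text().strip():
        root = ET.fromstring(path.read_text())
    else:
        root = ET.Element("global_preferences")  # start a fresh override file

    elem = root.find("leave_apps_in_memory")
    if elem is None:
        elem = ET.SubElement(root, "leave_apps_in_memory")
    if (elem.text or "").strip() == "1":
        return False  # already set; leave the file untouched

    elem.text = "1"
    path.write_text(ET.tostring(root, encoding="unicode"))
    return True
```

Other entries already in the file are kept as they are, although comments and the original whitespace are not preserved by this approach.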

Harri Liljeroos

Joined: 9 Dec 05
Posts: 116
Credit: 12,547,934
RAC: 2,738
Message 67160 - Posted: 30 Dec 2022, 22:38:30 UTC - in response to Message 67157.  

Modifying the file might be possible but you would need to restart Boinc for it to take effect.

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 67162 - Posted: 31 Dec 2022, 6:17:50 UTC - in response to Message 67160.  

Harri Liljeroos wrote: "Modifying the file might be possible but you would need to restart Boinc for it to take effect."
Read local prefs file?

AndreyOR

Joined: 12 Apr 21
Posts: 318
Credit: 14,986,850
RAC: 9,927
Message 67163 - Posted: 31 Dec 2022, 6:59:26 UTC - in response to Message 67162.  

Harri Liljeroos wrote: "Modifying the file might be possible but you would need to restart Boinc for it to take effect."

Dave Jackson wrote: "Read local prefs file?"

Yes, the command would be boinccmd --read_global_prefs_override for a standard installation.
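
For anyone scripting this, a minimal Python sketch that wraps the command above. It assumes boinccmd is on the PATH and is allowed to talk to the running client; the function name is hypothetical, and the sketch simply reports back if the binary cannot be found.

```python
import shutil
import subprocess


def reload_prefs_override(boinccmd="boinccmd"):
    """Ask a running BOINC client to re-read global_prefs_override.xml.

    Returns (command, ran): the command line used, and whether it was run.
    """
    command = [boinccmd, "--read_global_prefs_override"]
    if shutil.which(boinccmd) is None:
        return command, False  # boinccmd not found on PATH; nothing was run
    # check=True raises CalledProcessError if boinccmd reports a failure.
    subprocess.run(command, check=True)
    return command, True
```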

wujj123456

Joined: 14 Sep 08
Posts: 127
Credit: 43,158,730
RAC: 75,157
Message 67166 - Posted: 31 Dec 2022, 10:16:55 UTC
Last modified: 31 Dec 2022, 10:18:29 UTC

leave_apps_in_memory controls what happens on suspension. IMO, we should also try to avoid suspension as much as possible in the first place. Sometimes keeping the application in memory is just not an option if you actually need the memory. Otherwise, either you have a giant swap and spend seconds to minutes swapping, or your swap is too small and the system runs out of memory. To avoid suspension as much as possible, I also set the following.

Obvious ones:
<run_on_batteries>1</run_on_batteries>
<run_if_user_active>1</run_if_user_active>


Not so obvious ones:
Set ram_max_used_busy_pct and ram_max_used_idle_pct to the same value, so BOINC won't suspend tasks just because someone logs in or input is generated.
Use a long cpu_scheduling_period_minutes if you run multiple projects. On multi-core systems, as long as this is large enough that a few WUs finish within the period, I find the BOINC client just opportunistically switches to other projects for new tasks instead of suspending running tasks.
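
Putting the settings above together, here is a minimal Python sketch that builds the text of a global_prefs_override.xml. The element names are the ones mentioned in this thread; the <global_preferences> root element, the function name, and the values 75 and 120 are assumptions for illustration, so pick numbers that suit your own machine.

```python
import xml.etree.ElementTree as ET


def build_override(ram_pct=75.0, scheduling_minutes=120.0):
    """Return override XML with the suspension-avoiding settings above.

    ram_pct and scheduling_minutes are placeholder values, not recommendations.
    """
    settings = {
        "leave_apps_in_memory": "1",
        "run_on_batteries": "1",
        "run_if_user_active": "1",
        # Same limit whether the machine is in use or idle.
        "ram_max_used_busy_pct": f"{ram_pct:g}",
        "ram_max_used_idle_pct": f"{ram_pct:g}",
        "cpu_scheduling_period_minutes": f"{scheduling_minutes:g}",
    }
    root = ET.Element("global_preferences")
    for name, value in settings.items():
        ET.SubElement(root, name).text = value
    ET.indent(root)  # pretty-printing; needs Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


print(build_override())
```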

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 67168 - Posted: 31 Dec 2022, 11:54:31 UTC - in response to Message 67166.  

I have put the 128GB SSD system drive from my now-dead laptop into my desktop and added it as swap, giving me
Swap:      159122424           0   159122424
when I run top. As you can see, none of it is currently being used. This might change once the higher-resolution models come on stream, or if I ever get a faster connection and start running more tasks at once. I really should go up to 64GB to give me a bit more headroom.

Glenn Carver

Joined: 29 Oct 17
Posts: 1051
Credit: 16,649,638
RAC: 12,396
Message 67183 - Posted: 1 Jan 2023, 12:21:36 UTC - in response to Message 67157.  

Glenn Carver wrote: "Would be nice if there was a way of the task itself setting this boinc option.. (wishful thinking)"
AndreyOR wrote: "That option is in the global_prefs_override.xml file in the /etc/boinc-client/ folder, as the entry <leave_apps_in_memory>1</leave_apps_in_memory>, with 0 or 1 being the possible values. Would it be possible to add code that changes that file to make sure the line is present and the value is 1?"

That's not quite what I meant. The task should be able to set this for itself, or at least be able to request it from the client. The task does not have access to the client config files. The task knows how it needs to run on volunteer machines; the volunteer knows best how they want their computer to be used. Unfortunately the client does not report to the server whether this option is on or off, as far as I know. If it did, I would disable sending OpenIFS tasks to those machines, and we'd see a lot fewer failures.

This is one of the things that bugs me about BOINC: the user is supposed to find out for themselves which projects need which settings. As a volunteer from the SETI days, I really don't want to waste time on forums/websites trying to figure out how to get tasks to run. As a BOINC app developer, I want to be able to tell the client, "here, these settings are what you need to make it run well, please enable them or alert the user they need to turn them on".

Actually, maybe CPDN can send a Notice to BOINC clients saying this option needs to be on. I'll bring it up next time I talk to CPDN.

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4541
Credit: 19,039,635
RAC: 18,944
Message 67184 - Posted: 1 Jan 2023, 12:28:18 UTC

Glenn Carver wrote: "Actually, maybe CPDN can send a Notice to BOINC clients saying this option needs to be on. I'll bring it up next time I talk to CPDN."
To me, that seems a good way to go. I think the BOINC devs would say that dictating such options to the user goes against their philosophy, however much you or I might think it a good idea.

Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,748,059
RAC: 5,647
Message 67186 - Posted: 1 Jan 2023, 13:28:35 UTC - in response to Message 67183.  

Unfortunately, I don't think it's as easy as that. The BOINC developers tried to make it as general as possible, considering both the huge range of potential scientific projects out there, and the even huger range of individual volunteers, with wildly varying motivations and hardware. Inevitably, some combinations are bordering on pointless - but trying to rule them out in all cases would be impossible.

For instance, I remember a small startup company asking for help. It sounded like they had a small Windows workgroup, possibly with a server, and their company business required powerful machines with GPUs. And they wanted to put them to beneficial use when available. But it appeared to me that they had BOINC set up not to run when the machines were in use (because they needed the hardware they'd bought), and during the night the machines - although powered up - were logged off (for security). Both conditions precluded the use of GPUs. So they were getting a tiny amount of BOINC work done, and missing deadlines galore. How would we tweak BOINC to avoid that?

Remember, most of the user preferences are set via project websites: local over-rides will predominantly be used by the tiny minority of inquisitive volunteers who frequent message boards like this, or their equivalents like native language team websites. So the most productive single change for CPDN might be to flip the default setting for 'leave applications in memory' to ON, and run a script to change all current settings in the database similarly. It won't be a simple query, because these things are stored in XML blobs, but it could be done.

It would be courteous to send a notice to let people know what you'd done - some people might have good reasons for choosing the other path.

Who knows - you might get some practical use out of Science United at last!
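
To give an idea of what the per-user step of such a script might involve, here is a minimal Python sketch that flips the setting inside one stored preferences blob. It assumes each blob is well-formed XML with a single root element and that the setting is stored as <leave_apps_in_memory>0</leave_apps_in_memory> or 1. The actual server schema, the database access, and the function name are not taken from this thread and are left out or hypothetical.

```python
import xml.etree.ElementTree as ET


def force_leave_apps_in_memory(prefs_blob):
    """Return (new_blob, changed) with leave_apps_in_memory set to 1.

    Every occurrence in the blob is updated, including nested ones.
    An empty blob is returned unchanged, on the assumption that the
    project's default settings apply to that user.
    """
    if not prefs_blob.strip():
        return prefs_blob, False

    root = ET.fromstring(prefs_blob)
    elems = list(root.iter("leave_apps_in_memory"))
    if not elems:
        # Setting absent: add it at the top level of the blob.
        elems = [ET.SubElement(root, "leave_apps_in_memory")]

    changed = False
    for elem in elems:
        if (elem.text or "").strip() != "1":
            elem.text = "1"
            changed = True

    if not changed:
        return prefs_blob, False
    return ET.tostring(root, encoding="unicode"), True
```

Returning a changed flag makes it easy to count how many users were affected, which would be useful for the courtesy notice mentioned above.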

gemini8

Joined: 4 Dec 15
Posts: 52
Credit: 2,502,765
RAC: 1,434
Message 67189 - Posted: 1 Jan 2023, 18:15:02 UTC

To leave apps in memory or not might be a question of how much RAM the machine in question has and, naturally, what projects people run.
Looking at my own setup, the setting isn't ideal, as I run lots of projects at the same time and let the BOINC clients on my machines choose what they get.
This makes for lots of tasks hanging in limbo while others are given higher priority by the manager, and those tasks in limbo all need memory when the setting in question is on.
Especially when VirtualBox tasks are added to the mix, huge amounts of memory might get blocked and can't be used for other projects or by the system itself.
I usually don't leave apps in memory because of that, and I really prefer tasks that are able to resume without problems, but I'll just try for now to keep the tasks in memory.
- - - - - - - - - -
Greetings, Jens

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67191 - Posted: 1 Jan 2023, 18:54:22 UTC - in response to Message 67186.  

Richard Haselgrove wrote: "So the most productive single change for CPDN might be to flip the default setting for 'leave applications in memory' to ON,"


The trouble is that in my boincmgr, I can set this option to ON (and I have done so), but it applies to all boinc projects, and most do not need it. It would be nice if this could be done on a per-project basis.

I have never figured out how they can make that option actually work.

Imagine I have CPDN running at a very low priority (I actually run it at my maximum priority, but let us assume the contrary) and I am running a lot of CPDN tasks. Then a bunch of high-priority tasks come along and want to run. The CPDN tasks would go to sleep and could normally get paged out. Now they could not, and I might not have enough RAM to run the new tasks or, worse, to run some non-BOINC tasks. They might not have enough swap space on the SSD partition that I use for swap. This has never been a problem for me, so it is not one I need solved. But I sure would like to know how this is supposed to work.

Glenn Carver

Joined: 29 Oct 17
Posts: 1051
Credit: 16,649,638
RAC: 12,396
Message 67192 - Posted: 1 Jan 2023, 19:04:47 UTC - in response to Message 67186.  
Last modified: 1 Jan 2023, 19:06:18 UTC

Richard Haselgrove wrote: "Remember, most of the user preferences are set via project websites: local over-rides will predominantly be used by the tiny minority of inquisitive volunteers who frequent message boards like this, or their equivalents like native language team websites. So the most productive single change for CPDN might be to flip the default setting for 'leave applications in memory' to ON, and run a script to change all current settings in the database similarly. It won't be a simple query, because these things are stored in XML blobs, but it could be done."
Yes, good point. I will bring this up with them (could be they have already done it).

Actually this is another niggle of mine. boincmgr always seems to show the settings from the last project website I changed the settings on, irrespective of what projects I have connected to. Instead it would be nice if the settings were per project. Then CPDN could have 'leave in memory..' and other projects could do whatever? I may have this wrong - am still greenish on how boinc likes to work.

Quick comment on your other point regarding the Windows server group - that should come down to monitoring at the server side and blacklisting iffy volunteers.

gemini8

Joined: 4 Dec 15
Posts: 52
Credit: 2,502,765
RAC: 1,434
Message 67193 - Posted: 1 Jan 2023, 19:50:44 UTC - in response to Message 67192.  

Glenn Carver wrote: "boincmgr always seems to show the settings from the last project website I changed the settings on, irrespective of what projects I have connected to. Instead it would be nice if the settings were per project."

It would be nice, but this setting is global and gets propagated across every server and, if you're using one, your project manager.
When you change your settings anywhere they get updated everywhere this way, so you can't do inconsistent things that might break something.
- - - - - - - - - -
Greetings, Jens

Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,748,059
RAC: 5,647
Message 67194 - Posted: 1 Jan 2023, 22:05:01 UTC - in response to Message 67193.  

gemini8 wrote: "It would be nice, but this setting is global and gets propagated across every server and, if you're using one, your project manager. When you change your settings anywhere they get updated everywhere this way, so you can't do inconsistent things that might break something."
That's deliberate, and by design. It's one reason why it's stored in an XML blob, rather than proper database fields - the servers can play 'pass the parcel', without needing to understand the contents.

We could ask them to add an over-ride in app_config.xml?

Glenn Carver

Joined: 29 Oct 17
Posts: 1051
Credit: 16,649,638
RAC: 12,396
Message 67195 - Posted: 2 Jan 2023, 3:55:44 UTC - in response to Message 67193.  

Glenn Carver wrote: "boincmgr always seems to show the settings from the last project website I changed the settings on, irrespective of what projects I have connected to. Instead it would be nice if the settings were per project."
gemini8 wrote: "It would be nice, but this setting is global and gets propagated across every server and, if you're using one, your project manager. When you change your settings anywhere they get updated everywhere this way, so you can't do inconsistent things that might break something."
That's exactly the problem, by changing one project's settings it might break the settings that CPDN needs for tasks to be successful. It seems to assume what's right for one project is right for all, which is a reasonable starting point but not generally the case. There must be, or needs to be an override mechanism then on a per project basis for these project specific client settings? (app_config.xml springs to mind but that's rather finer control, we only need a project wide setting for the client not at the app level). I would also rather this was something sent by the server as a hint to the client (i.e. 'do it if possible'), rather than make the user create yet another XML file.

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67196 - Posted: 2 Jan 2023, 5:05:24 UTC - in response to Message 67195.  

Glenn Carver wrote: "There must be, or needs to be an override mechanism then on a per project basis for these project specific client settings? (app_config.xml springs to mind but that's rather finer control, we only need a project wide setting for the client not at the app level)."


Notice that, while the file is called app_config.xml, when it is in a project directory, it can apply to either the entire project, or just individual application types.

This one says a maximum of six climate prediction tasks may be run at a time, but if they are one of the three oifs types listed, only one of those at a time may run. (These are not the normal values I have been using, but I am fooling around until I can upload some of the already completed ones. It turns out I cannot download any more because too many are waiting to upload.)

[/var/lib/boinc/projects/climateprediction.net]$ cat app_config.xml 
<app_config>
    <project_max_concurrent>6</project_max_concurrent>
    <app>
        <name>oifs_43r3_bl</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>oifs_43r3_ps</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>oifs_43r3</name>
        <max_concurrent>1</max_concurrent>
    </app>
</app_config>
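
To double-check what limits a file like this sets, here is a minimal Python sketch that reads the project-wide and per-app values out of the XML. The XML is the file shown above; the function name is hypothetical, and the sketch reads only these two kinds of entry.

```python
import xml.etree.ElementTree as ET

APP_CONFIG = """<app_config>
    <project_max_concurrent>6</project_max_concurrent>
    <app>
        <name>oifs_43r3_bl</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>oifs_43r3_ps</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <app>
        <name>oifs_43r3</name>
        <max_concurrent>1</max_concurrent>
    </app>
</app_config>"""


def concurrency_limits(app_config_xml):
    """Return (project_limit, {app_name: limit}) from app_config.xml text."""
    root = ET.fromstring(app_config_xml)
    project_limit = root.findtext("project_max_concurrent")
    per_app = {
        app.findtext("name"): int(app.findtext("max_concurrent"))
        for app in root.findall("app")
        if app.findtext("max_concurrent") is not None
    }
    # project_limit is None when the file sets no project-wide limit.
    return (int(project_limit) if project_limit is not None else None), per_app


print(concurrency_limits(APP_CONFIG))
```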


gemini8

Joined: 4 Dec 15
Posts: 52
Credit: 2,502,765
RAC: 1,434
Message 67197 - Posted: 2 Jan 2023, 6:49:28 UTC - in response to Message 67195.  

Glenn Carver wrote: "That's exactly the problem, by changing one project's settings it might break the settings that CPDN needs for tasks to be successful. It seems to assume what's right for one project is right for all, which is a reasonable starting point but not generally the case."

This is not about the projects.
This is about the use of one's machines and things like availability of RAM.
Work might get done faster if you leave things in memory, and it certainly helps if apps don't handle checkpoints well or don't have any.
But the projects should work on making their apps run stably even when not kept in memory, because they want to get their work done.
The crunchers just rent out their resources in whatever way they like, and if projects don't run stable they might just move on to other projects.
- - - - - - - - - -
Greetings, Jens

gemini8

Joined: 4 Dec 15
Posts: 52
Credit: 2,502,765
RAC: 1,434
Message 67198 - Posted: 2 Jan 2023, 6:52:08 UTC - in response to Message 67194.  

Richard Haselgrove wrote: "We could ask them to add an over-ride in app_config.xml?"

That sounds like a good idea!
People playing around with app_config tend to know what they're doing, so this shouldn't impact casual crunchers who just install Boinc and add some projects because they sound interesting.
- - - - - - - - - -
Greetings, Jens

©2024 cpdn.org