OpenIFS tasks : make sure boinc client option 'Leave non-GPU tasks in memory' is selected!
Author | Message |
---|---|
Joined: 29 Oct 17 Posts: 1049 Credit: 16,578,380 RAC: 15,009 |
Higher fail rate: 'Leave non-GPU tasks in memory while suspended' must be ON! If your tasks are failing with disk-limit-exceeded or filesystem-full errors, please check boincmgr -> Computing Preferences -> Disk & memory. We're seeing a much higher fail rate with the latest oifs43r3_ps batches, caused by disk-limit-exceeded errors. The disk bound was previously set far too high and was corrected for the latest 39,000-task batches. I suspect these failures are happening because 'Leave non-GPU tasks in memory while suspended' is NOT ticked. It must be ticked for checkpointing/restarts to work properly. When the model restarts (say after a power off/on), it leaves its latest checkpoint/restart files behind as a backup. The problem is that if the model is frequently swapped in and out of memory, it restarts frequently, and these relatively large files slowly build up until they exceed the revised, lower disk bound. I will change the model so that all old restart files are deleted once used, but I can't do that for the current batches. It would be nice if the task itself could set this BOINC option... (wishful thinking) |
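The planned fix (deleting old restart files once a restart has used them) might look something like the minimal Python sketch below. It is not the actual OpenIFS code: the `restart_*` file pattern, the single working directory, and ranking files by modification time are all assumptions made for illustration.

```python
from pathlib import Path


def prune_restart_files(workdir, pattern="restart_*", keep=1):
    """Delete all but the `keep` newest files in `workdir` matching `pattern`.

    HYPOTHETICAL: the file pattern and directory layout are placeholders,
    not the real OpenIFS naming scheme. "Newest" means most recently
    modified. Returns the names of the files that were deleted.
    """
    candidates = [p for p in Path(workdir).glob(pattern) if p.is_file()]
    # Newest first, so the slice below skips the files we want to keep.
    candidates.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    removed = []
    for old in candidates[keep:]:
        old.unlink()
        removed.append(old.name)
    return removed
```

Calling this right after a successful restart would leave only the newest restart file, so repeated suspend/resume cycles would no longer pile up files against the disk bound.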
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
This advice has been given out repeatedly over the years with respect to the Hadley models. I would expect most users who frequent the forums to be doing this already, but it is always worth repeating for those who might have missed it. |
Joined: 12 Apr 21 Posts: 317 Credit: 14,925,468 RAC: 12,903 |
"Would be nice if there was a way of the task itself setting this boinc option.. (wishful thinking)" That option is in the global_prefs_override.xml file in the /etc/boinc-client/ folder, entry <leave_apps_in_memory>1</leave_apps_in_memory>, with 0 or 1 as the value. Would it be possible to add code that edits that file to make sure the line is present and the value is 1? https://boinc.berkeley.edu/wiki/PreferencesXml |
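For a volunteer who wants to make this change themselves, a minimal Python sketch of such code is below. It assumes write access to the override file (on Debian/Ubuntu packages usually /etc/boinc-client/global_prefs_override.xml, but check your installation) and keeps any other preferences already in the file. The function name is hypothetical, and the running client still has to be told to re-read the file afterwards.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def ensure_leave_in_memory(path):
    """Ensure <leave_apps_in_memory>1</leave_apps_in_memory> is in the override file.

    Other preferences already in the file are kept. If the file does not
    exist, a minimal <global_preferences> document is created.
    Returns True if the file was written, False if it was already correct.
    """
    path = Path(path)
    if path.exists():
        tree = ET.parse(path)
        root = tree.getroot()
    else:
        root = ET.Element("global_preferences")
        tree = ET.ElementTree(root)

    elem = root.find("leave_apps_in_memory")
    if elem is None:
        elem = ET.SubElement(root, "leave_apps_in_memory")
    elif (elem.text or "").strip() == "1":
        return False  # already set; leave the file untouched

    elem.text = "1"
    tree.write(path, encoding="utf-8")
    return True
```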
Joined: 9 Dec 05 Posts: 116 Credit: 12,547,934 RAC: 2,738 |
Modifying the file might be possible, but you would need to restart BOINC for it to take effect. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Modifying the file might be possible but you would need to restart Boinc for it to take effect." What about 'Read local prefs file' in BOINC Manager's Options menu? |
Joined: 12 Apr 21 Posts: 317 Credit: 14,925,468 RAC: 12,903 |
"Modifying the file might be possible but you would need to restart Boinc for it to take effect." No restart is needed: the client can re-read the file, and for a standard installation the command is boinccmd --read_global_prefs_override. |
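Wrapped in Python, that call might look like the sketch below. It assumes `boinccmd` is on the PATH; in practice it usually has to run from the BOINC data directory (where gui_rpc_auth.cfg lives) or be given the GUI RPC password via `--passwd`. Treating a non-zero exit status as failure is a simplifying assumption of this sketch, and the function name is hypothetical.

```python
import shutil
import subprocess


def reload_global_prefs(boinccmd="boinccmd", cwd=None, passwd=None):
    """Ask a running BOINC client to re-read global_prefs_override.xml.

    Returns False if boinccmd cannot be found or exits with a non-zero status.
    """
    exe = shutil.which(boinccmd)
    if exe is None:
        return False
    cmd = [exe]
    if passwd is not None:
        # GUI RPC password, needed if gui_rpc_auth.cfg is not in `cwd`.
        cmd += ["--passwd", passwd]
    cmd.append("--read_global_prefs_override")
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode == 0
```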
Joined: 14 Sep 08 Posts: 127 Credit: 42,540,021 RAC: 76,179 |
leave_apps_in_memory controls the suspension behavior. IMO, we should also try to avoid suspension as much as possible in the first place. Sometimes keeping the application in memory is just not an option if you actually need the memory: either you have a giant swap and start swapping for seconds to minutes, or your swap is too small and the system runs out of memory. To avoid suspension as much as possible, I also do the following. Obvious ones: <run_on_batteries>1</run_on_batteries> and <run_if_user_active>1</run_if_user_active>. Not so obvious ones: set ram_max_used_busy_pct and ram_max_used_idle_pct to the same value, so BOINC won't suspend tasks just because someone logs in or input events are generated; and use a long cpu_scheduling_period_minutes if you run multiple projects. On multi-core systems, as long as this period is long enough for a few WUs to finish within it, I find the BOINC client just opportunistically switches to other projects for new tasks instead of suspending running tasks. |
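As a minimal sketch, the settings listed above could be written into a global_prefs_override.xml like this. The 90% RAM limit and the 240-minute scheduling period are placeholder values chosen for illustration, not recommendations from this thread; the element names are the ones quoted in the post.

```python
import xml.etree.ElementTree as ET


def build_override(ram_pct=90, sched_minutes=240):
    """Return global_prefs_override.xml text with the anti-suspension settings.

    ram_pct and sched_minutes are illustrative placeholders only.
    """
    prefs = {
        "run_on_batteries": "1",
        "run_if_user_active": "1",
        # Same value for busy and idle, so a user becoming active does not
        # shrink the RAM budget and trigger suspensions.
        "ram_max_used_busy_pct": str(ram_pct),
        "ram_max_used_idle_pct": str(ram_pct),
        "cpu_scheduling_period_minutes": str(sched_minutes),
    }
    root = ET.Element("global_preferences")
    for name, value in prefs.items():
        ET.SubElement(root, name).text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

Preferences left out of the override file should keep their website values, so this only pins down the settings discussed above.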
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
I have put the 128GB SSD system drive from my now-dead laptop into my desktop and added it as swap, giving me "Swap: 159122424 0 159122424" when I run top. As you can see, none of it is currently being used. This might change once the higher-resolution models come on stream, or if I ever get a faster connection and start running more tasks at once. I really should go up to 64GB to give me a bit more headroom. |
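For scale, assuming those figures are in KiB (the default unit when `free` prints its total/used/free columns, which is the layout quoted), the swap total works out to roughly 152 GiB:

```python
def kib_to_gib(kib):
    """Convert a size in KiB (1024 bytes) to GiB (1024**3 bytes)."""
    return kib / 1024**2


swap_total_kib = 159122424  # "total" column of the Swap line quoted above
print(f"Swap total: {kib_to_gib(swap_total_kib):.1f} GiB")  # prints "Swap total: 151.8 GiB"
```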
Joined: 29 Oct 17 Posts: 1049 Credit: 16,578,380 RAC: 15,009 |
"Would be nice if there was a way of the task itself setting this boinc option.. (wishful thinking)" "That option is in global_prefs_override.xml file in /etc/boinc-client/ folder, entry <leave_apps_in_memory>1</leave_apps_in_memory>, 0 or 1 being the values. Would it be possible to add code to change that file to make sure that line is present and the value is 1?" That's not quite what I meant. The task should be able to set this for itself, or at least be able to request it from the client. The task does not have access to the client config files. The task knows how it needs to run on volunteer machines; the volunteer knows best how they want their computer to be used. Unfortunately, as far as I know, the client does not report to the server whether this option is on or off. If it did, I would disable sending OpenIFS tasks to those machines, and we'd see a lot fewer failures. This is one of the things that bugs me about BOINC: the user is supposed to find out for themselves which projects need which settings. As a volunteer from the SETI days, I really don't want to waste time on forums/websites trying to figure out how to get tasks to run. As a BOINC app developer, I want to be able to tell the client, "here, these settings are what you need to make it run well, please enable them or alert the user they need to turn them on". Actually, maybe CPDN can send a Notice to BOINC clients that this option needs to be on. I'll bring it up next time I talk to CPDN. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Actually, maybe CPDN can send a Notice to boinc clients that this option needs to be on. I'll bring it up next time I talk to CPDN." To me, that seems a good way to go. I think the BOINC devs would say that dictating such options to the user goes against their philosophy, however much you or I might think it a good idea. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,743,089 RAC: 6,177 |
Unfortunately, I don't think it's as easy as that. The BOINC developers tried to make it as general as possible, considering both the huge range of potential scientific projects out there, and the even huger range of individual volunteers, with wildly varying motivations and hardware. Inevitably, some combinations are bordering on pointless - but trying to rule them out in all cases would be impossible. For instance, I remember a small startup company asking for help. It sounded like they had a small Windows workgroup, possibly with a server, and their company business required powerful machines with GPUs. And they wanted to put them to beneficial use when available. But it appeared to me that they had BOINC set up not to run when the machines were in use (because they needed the hardware they'd bought), and during the night the machines - although powered up - were logged off (for security). Both conditions precluded the use of GPUs. So they were getting a tiny amount of BOINC work done, and missing deadlines galore. How would we tweak BOINC to avoid that? Remember, most of the user preferences are set via project websites: local over-rides will predominantly be used by the tiny minority of inquisitive volunteers who frequent message boards like this, or their equivalents like native language team websites. So the most productive single change for CPDN might be to flip the default setting for 'leave applications in memory' to ON, and run a script to change all current settings in the database similarly. It won't be a simple query, because these things are stored in XML blobs, but it could be done. It would be courteous to send a notice to let people know what you'd done - some people might have good reasons for choosing the other path. Who knows - you might get some practical use out of Science United at last! |
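A one-off migration of the kind described above might look roughly like this Python sketch. An in-memory SQLite table stands in for the project's real database; the `user` table and `global_prefs` column are modelled on BOINC's schema but should be checked before use, and handling `<venue>` sub-blocks is an assumption about what the stored XML can contain. Rows with no stored preferences are left alone, since those users presumably get the project defaults.

```python
import sqlite3
import xml.etree.ElementTree as ET


def flip_leave_in_memory(blob):
    """Return the prefs XML with leave_apps_in_memory=1 at top level and in
    every <venue> block. Empty/None blobs and already-correct blobs are
    returned unchanged."""
    if not blob:
        return blob
    root = ET.fromstring(blob)
    modified = False
    for scope in [root, *root.findall("venue")]:
        elem = scope.find("leave_apps_in_memory")
        if elem is None:
            elem = ET.SubElement(scope, "leave_apps_in_memory")
        if (elem.text or "").strip() != "1":
            elem.text = "1"
            modified = True
    return ET.tostring(root, encoding="unicode") if modified else blob


def update_all(conn):
    """Rewrite every user's stored global prefs; returns the number of rows changed."""
    rows = conn.execute("SELECT id, global_prefs FROM user").fetchall()
    changed = 0
    for user_id, blob in rows:
        new = flip_leave_in_memory(blob)
        if new != blob:
            conn.execute("UPDATE user SET global_prefs = ? WHERE id = ?", (new, user_id))
            changed += 1
    conn.commit()
    return changed
```

Only rows whose value actually changes are rewritten, so the script is safe to re-run; a real migration should still be tried against a backup first.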
Joined: 4 Dec 15 Posts: 52 Credit: 2,494,417 RAC: 1,689 |
Whether to leave apps in memory might depend on how much RAM the machine in question has and, naturally, what projects people run. In my own case the setting isn't very helpful, as I run lots of projects at the same time and let the BOINC clients on my machines choose what they get. This leaves lots of tasks hanging in limbo while others get higher priority from the client's scheduler, and with this setting on, all those tasks in limbo keep occupying memory. Especially when VirtualBox tasks are added to the mix, huge amounts of memory might get blocked and become unavailable to other projects or to the system itself. I usually don't leave apps in memory for that reason, and I much prefer tasks that can resume without problems, but for now I'll try keeping the tasks in memory. - - - - - - - - - - Greetings, Jens |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"So the most productive single change for CPDN might be to flip the default setting for 'leave applications in memory' to ON," The trouble is that in my boincmgr I can set this option to ON (and I have done so), but it applies to all BOINC projects, and most do not need it. It would be nice if this could be done on a per-project basis. I have never figured out how that option can actually work. Imagine I have CPDN running at a very low priority (I run it at my maximum priority, but let us assume the contrary) and I am running a lot of CPDN tasks. Then a bunch of high-priority tasks come along and want to run. The CPDN tasks would go to sleep and could normally get paged out. With the option on they could not, and I might not have enough RAM to run the new tasks, or worse, to run some non-BOINC tasks. There might not even be enough swap space on the SSD partition I use for swap. This has never been a problem for me, so it is not one I need solved, but I would sure like to know how this is supposed to work. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,578,380 RAC: 15,009 |
"Remember, most of the user preferences are set via project websites: local over-rides will predominantly be used by the tiny minority of inquisitive volunteers who frequent message boards like this, or their equivalents like native language team websites. So the most productive single change for CPDN might be to flip the default setting for 'leave applications in memory' to ON, and run a script to change all current settings in the database similarly. It won't be a simple query, because these things are stored in XML blobs, but it could be done." Yes, good point. I will bring this up with them (it could be they have already done it). Actually, this is another niggle of mine: boincmgr always seems to show the settings from the last project website I changed the settings on, irrespective of which projects I am connected to. It would be nice if the settings were per project instead. Then CPDN could have 'leave in memory..' and other projects could do whatever they need? I may have this wrong; I am still greenish on how BOINC likes to work. A quick comment on your other point regarding the Windows server group: that should come down to monitoring on the server side and blacklisting iffy volunteers. |
Joined: 4 Dec 15 Posts: 52 Credit: 2,494,417 RAC: 1,689 |
"boincmgr always seems to show the settings from the last project website I changed the settings on, irrespective of what projects I have connected to. Instead it would be nice if the settings were per project." It would be nice, but this setting is global and gets propagated to every server and, if you're using one, to your account manager. This way, when you change your settings anywhere, they get updated everywhere, so you can't end up with inconsistent settings that might break something. - - - - - - - - - - Greetings, Jens |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,743,089 RAC: 6,177 |
"It would be nice, but this setting is global and gets propagated across every server and, if you're using one, your project manager." That's deliberate, and by design. It's one reason why it's stored in an XML blob rather than in proper database fields: the servers can play 'pass the parcel' without needing to understand the contents. We could ask them to add an override in app_config.xml? |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,578,380 RAC: 15,009 |
"boincmgr always seems to show the settings from the last project website I changed the settings on, irrespective of what projects I have connected to. Instead it would be nice if the settings were per project." "It would be nice, but this setting is global and gets propagated across every server and, if you're using one, your project manager." That's exactly the problem: changing one project's settings might break the settings that CPDN needs for its tasks to succeed. It seems to assume that what's right for one project is right for all, which is a reasonable starting point but not generally the case. There must be, or needs to be, a per-project override mechanism for these project-specific client settings. (app_config.xml springs to mind, but that's finer control than we need; we only need a project-wide client setting, not one at the app level.) I would also rather this were sent by the server as a hint to the client (i.e. 'do it if possible') than make the user create yet another XML file. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"There must be, or needs to be an override mechanism then on a per project basis for these project specific client settings? (app_config.xml springs to mind but that's rather finer control, we only need a project wide setting for the client not at the app level)." Notice that, while the file is called app_config.xml, when it is in a project directory it can apply either to the entire project or just to individual application types. This one says a maximum of six climateprediction.net tasks may run at a time, but only one task of each of the three oifs types listed may run at a time. (These are not the values I normally use, but I am fooling around until I can upload some of the already completed tasks. It turns out I cannot download any more because too many are waiting to upload.)

```
[/var/lib/boinc/projects/climateprediction.net]$ cat app_config.xml
<app_config>
  <project_max_concurrent>6</project_max_concurrent>
  <app>
    <name>oifs_43r3_bl</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>oifs_43r3_ps</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app>
    <name>oifs_43r3</name>
    <max_concurrent>1</max_concurrent>
  </app>
</app_config>
```
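To see how the two kinds of limit in that file combine, here is a small Python sketch that parses an app_config.xml and computes the most tasks of a given set of apps that the file itself allows to run at once. It only models `project_max_concurrent` and per-app `max_concurrent`; the function names are hypothetical, and the client may of course run fewer tasks for other reasons (CPU count, memory, deadlines).

```python
import xml.etree.ElementTree as ET


def concurrency_limits(xml_text):
    """Return (project_cap, {app_name: app_cap}) from app_config.xml text.

    project_cap is None when <project_max_concurrent> is absent.
    """
    root = ET.fromstring(xml_text)
    cap = root.findtext("project_max_concurrent")
    project_cap = int(cap) if cap is not None else None
    app_caps = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        limit = app.findtext("max_concurrent")
        if name and limit is not None:
            app_caps[name.strip()] = int(limit)
    return project_cap, app_caps


def max_running(project_cap, app_caps, apps):
    """Upper bound app_config.xml puts on concurrent tasks from `apps`.

    Returns None if nothing in the file limits them.
    """
    limits = [app_caps.get(a) for a in apps]
    if all(limit is not None for limit in limits):
        bound = sum(limits)  # each capped app contributes at most its own cap
        return bound if project_cap is None else min(bound, project_cap)
    return project_cap  # an uncapped app is limited only by the project cap
```

For the file above this gives at most three OpenIFS tasks at once (one per app type), while the project-wide cap of six still applies to all CPDN tasks together.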
Joined: 4 Dec 15 Posts: 52 Credit: 2,494,417 RAC: 1,689 |
"That's exactly the problem, by changing one project's settings it might break the settings that CPDN needs for tasks to be successful. It seems to assume what's right for one project is right for all, which is a reasonable starting point but not generally the case." This is not about the projects. It is about the use of one's machines and things like the availability of RAM. Work might get done faster if you leave things in memory, and it certainly helps if apps don't handle checkpoints well or don't have any. But projects should work on their apps so they run stably even when not kept in memory, because they want to get their work done. Crunchers just rent out their resources however they like, and if projects don't run stably, they might simply move on to other projects. - - - - - - - - - - Greetings, Jens |
Joined: 4 Dec 15 Posts: 52 Credit: 2,494,417 RAC: 1,689 |
"We could ask then to add an over-ride in app_config.xml?" That sounds like a good idea! People playing around with app_config tend to know what they're doing, so this shouldn't affect casual crunchers who just install BOINC and add some projects because they sound interesting. - - - - - - - - - - Greetings, Jens |
©2024 cpdn.org