How to Prevent OpenIFS Download

Message boards : Number crunching : How to Prevent OpenIFS Download

Brummig

Send message
Joined: 3 Nov 05
Posts: 26
Credit: 687,388
RAC: 529
Message 67987 - Posted: 23 Jan 2023, 14:56:22 UTC

Is it possible to stop my host from downloading OpenIFS tasks, other than by setting No New Tasks for CPDN? The virtual memory use (disk thrashing) brings my host almost to a standstill, and even if I let it run the trickles don't upload.
ID: 67987 · Report as offensive     Reply Quote
Glenn Carver

Send message
Joined: 29 Oct 17
Posts: 1048
Credit: 16,404,330
RAC: 16,403
Message 67990 - Posted: 23 Jan 2023, 15:33:03 UTC - in response to Message 67987.  
Last modified: 23 Jan 2023, 15:37:50 UTC

If you are seeing disk thrashing because the machine is swapping, the machine is running too many OpenIFS tasks. Unfortunately, there's an issue with the boinc client: it will start up as many tasks as there are cores available to boinc. It does not respect the memory limit of the task, leaving it to volunteers like yourself to fix it. The problem with the client was unexpected and we're looking into workarounds we can put in place on the server to deal with it. You can either adjust the percentage of CPUs in boincmgr to reduce the available CPUs, or use an app_config.xml file in the project directory (example below). We've since found that LHC also hit this problem, so we'll probably follow their approach to limiting the tasks downloaded to machines.
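
As a sketch only, in case you'd rather make the CPU-percentage change in a file than through the boincmgr computing-preferences dialog: the equivalent setting lives in global_prefs_override.xml in the BOINC data directory (the 50% below is just an example value, not a recommendation), and the client needs to re-read its preferences, or be restarted, to pick it up.

<global_preferences>
   <!-- use at most half of the CPUs; adjust the percentage to suit your machine -->
   <max_ncpus_pct>50</max_ncpus_pct>
</global_preferences>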

An app_config file is a nice way of controlling exactly how many tasks the client is allowed to run at any one time (irrespective of how many tasks are downloaded). The file is specific to a project and should be placed in the /var/lib/boinc/projects/climateprediction.net directory (or wherever your boinc software is installed). Mine looks like this: I set a maximum of 6 tasks in total across all CPDN apps and, for each OpenIFS app variant, no more than 6 tasks at a time. Each task takes ~5 GB of memory, so make sure you have enough free RAM and adjust the values below to fit.

<app_config>
   <project_max_concurrent>6</project_max_concurrent>
   <report_results_immediately/>
   <app>
      <name>oifs_43r3</name>
      <max_concurrent>6</max_concurrent>
   </app>
   <app>
      <name>oifs_43r3_ps</name>
      <max_concurrent>6</max_concurrent>
   </app>
   <app>
      <name>oifs_43r3_bl</name>
      <max_concurrent>6</max_concurrent>
   </app>
</app_config>


Hope that helps. There's plenty of info about app_config.xml files online if you want to find out more, or ask on the forums as plenty of people know about them.

Edit: sorry, I forgot to answer your question about stopping downloads. You can suspend the project and that will stop them. However, CPDN have paused the batch server, so no one will be getting any more tasks for the time being as there's a backlog of data to be dealt with.

ID: 67990 · Report as offensive     Reply Quote
Profile Dave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4535
Credit: 18,966,742
RAC: 21,869
Message 67992 - Posted: 23 Jan 2023, 15:39:10 UTC - in response to Message 67987.  

Short answer: no. CPDN, like some other projects, used to allow you to choose on the website which model types run on a particular host, but it no longer does. You are right at the minimum RAM for these tasks. If you are running more than two at once you will be using swap a lot. I would suggest lowering the number of cores in use to one if you have anything above minimal non-BOINC usage. The "Uploads are stuck" thread contains details of the saga of lack of disk space, slow transfer speed to backup storage, and tape drive failures. Also, OIFS is likely to be the majority of tasks for the short and medium term, with others only appearing very occasionally.
ID: 67992 · Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 1 Jan 07
Posts: 1061
Credit: 36,690,033
RAC: 10,812
Message 67993 - Posted: 23 Jan 2023, 15:39:40 UTC - in response to Message 67990.  

Hope that helps. There's plenty of info about app_config.xml files online if you want to find out more, or ask on the forums as plenty of people know about them.
The official manual is on the BOINC website, at https://boinc.berkeley.edu/wiki/Client_configuration#Project-level_configuration
ID: 67993 · Report as offensive     Reply Quote
computezrmle

Send message
Joined: 9 Mar 22
Posts: 30
Credit: 1,065,239
RAC: 556
Message 67994 - Posted: 23 Jan 2023, 15:56:01 UTC - in response to Message 67987.  

You may also need to upgrade your BOINC client to at least 7.20.x since older versions suffer from a bug related to the 'max_concurrent' options.
See:
https://github.com/BOINC/boinc/pull/4592
ID: 67994 · Report as offensive     Reply Quote
Brummig

Send message
Joined: 3 Nov 05
Posts: 26
Credit: 687,388
RAC: 529
Message 67996 - Posted: 23 Jan 2023, 16:34:52 UTC

Thank you for your quick replies. Dave Jackson spotted the problem. At one point, whilst I was using the computer, BOINC quietly downloaded and ran four OpenIFS tasks simultaneously, plus the two hadam4 tasks it was already running, which was hilarious. At the moment the host is still crunching on one hadam4 (along with other, much less memory-intensive non-CPDN tasks), so even one OpenIFS is too many. Is it valid to set <max_concurrent> for each of the OpenIFS apps to 0 (I can set it to 1 and try again once the hadam4 has completed)? The official manual doesn't say.
ID: 67996 · Report as offensive     Reply Quote
SolarSyonyk

Send message
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 67997 - Posted: 23 Jan 2023, 16:35:52 UTC

Huh. I was wondering why that didn't seem to be working right.

Looks like 7.16 is what's in the Ubuntu 20.04 repos. I suppose I should upgrade my boxes; may as well, it's not like they're doing much work...
ID: 67997 · Report as offensive     Reply Quote
Profile PDW

Send message
Joined: 29 Nov 17
Posts: 82
Credit: 14,306,974
RAC: 92,121
Message 67999 - Posted: 23 Jan 2023, 16:44:38 UTC - in response to Message 67996.  

Is it valid to set <max_concurrent> for each of the OpenIFS apps to 0 (I can set it to 1 and try again once the hadam4 has completed)? The official manual doesn't say.
0 is used to indicate no limit, so it will try to run as many as <project_max_concurrent> allows, or as many as the client thinks it can run if that isn't set (or is also set to 0).
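
So the practical minimum is a cap of 1 per app rather than 0. As a sketch only, reusing the app names from Glenn's example above, that would look like:

<app_config>
   <!-- cap each OpenIFS variant at one running task; other CPDN apps are unaffected -->
   <app>
      <name>oifs_43r3</name>
      <max_concurrent>1</max_concurrent>
   </app>
   <app>
      <name>oifs_43r3_ps</name>
      <max_concurrent>1</max_concurrent>
   </app>
   <app>
      <name>oifs_43r3_bl</name>
      <max_concurrent>1</max_concurrent>
   </app>
</app_config>
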
ID: 67999 · Report as offensive     Reply Quote
Profile Dave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4535
Credit: 18,966,742
RAC: 21,869
Message 68000 - Posted: 23 Jan 2023, 16:46:41 UTC
Last modified: 23 Jan 2023, 16:50:55 UTC

Is it valid to set <max_concurrent> for each of the OpenIFS apps to 0 (I can set it to 1 and try again once the hadam4 has completed)? The official manual doesn't say.
Richard probably knows. There are a number of things where BOINC treats "0" as meaning no restriction, which is why setting it to 1 would be my choice in your situation.

Edit: The server uses -1 to indicate a blacklisted computer that will not get any tasks. (CPDN used to use this to stop machines without the 32-bit libraries, which crashed everything, from getting work, but hasn't done so recently.) So that may work to indicate not running any tasks of a particular type.
ID: 68000 · Report as offensive     Reply Quote
Brummig

Send message
Joined: 3 Nov 05
Posts: 26
Credit: 687,388
RAC: 529
Message 68002 - Posted: 23 Jan 2023, 16:55:01 UTC - in response to Message 68000.  
Last modified: 23 Jan 2023, 16:56:36 UTC

OK, looks like I've no choice but to opt for No New Tasks (Edit: Or try -1 :)).

And yes, it looks like the repo for Ubuntu 20.04 LTS needs updating.
ID: 68002 · Report as offensive     Reply Quote
Profile Dave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4535
Credit: 18,966,742
RAC: 21,869
Message 68005 - Posted: 23 Jan 2023, 18:36:54 UTC - in response to Message 68002.  

And yes, it looks like the repo for Ubuntu 20.04 LTS needs updating.


Richard has recently posted instructions for using Gianfranco's repository, which, while not official, is in general pretty reliable. It is a much simpler option than compiling your own, which is what I do.
ID: 68005 · Report as offensive     Reply Quote
Richard Haselgrove

Send message
Joined: 1 Jan 07
Posts: 1061
Credit: 36,690,033
RAC: 10,812
Message 68007 - Posted: 23 Jan 2023, 18:55:10 UTC - in response to Message 68005.  

Richard has recently posted instructions...
That was message 67761, and the person I was advising seemed happy with the instructions on the page I suggested.
ID: 68007 · Report as offensive     Reply Quote
SolarSyonyk

Send message
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 68008 - Posted: 23 Jan 2023, 19:09:43 UTC - in response to Message 68007.  

Richard has recently posted instructions...
That was message 67761, and the person I was advising seemed happy with the instructions on the page I suggested.


Oh, great! Yeah, that's easy enough to toss in. I should probably update them to 22.04 anyway, though. Now's as good a time as any; they're just chewing on WCG tasks when they get any.
ID: 68008 · Report as offensive     Reply Quote
xii5ku

Send message
Joined: 27 Mar 21
Posts: 79
Credit: 78,302,757
RAC: 1,077
Message 68009 - Posted: 23 Jan 2023, 20:54:09 UTC - in response to Message 67990.  
Last modified: 23 Jan 2023, 21:07:21 UTC

Glenn Carver wrote:
[...] there's an issue with the boinc client: it will start up as many tasks as there are cores available to boinc. It does not respect the memory limit of the task, leaving it to volunteers like yourself to fix it. The problem with the client was unexpected and we're looking into workarounds we can put in place on the server to deal with it. [...] We've since found that LHC also hit this problem, so we'll probably follow their approach to limiting the tasks downloaded to machines.
I haven't been at lhc@home for a while, so I don't know what their approach looks like. But a limit on tasks in progress is not a good replacement for the desired limit on tasks which are executing.

Stages of a "task in progress":
(ready to send)
– assigned to host
– downloading
– ready to run
– executing
– uploading
– ready to report
(reported)

Each of the stages can take unpredictably long for a variety of reasons. Hence it's clear that the number in progress cannot control the number executing very well, to put it mildly.

Also, oifs_43r3_ps concurrency is only part of the equation. The other part is what else is going on on the host. It makes a big difference whether the host is running a desktop environment or is a dedicated cruncher.
ID: 68009 · Report as offensive     Reply Quote
Glenn Carver

Send message
Joined: 29 Oct 17
Posts: 1048
Credit: 16,404,330
RAC: 16,403
Message 68046 - Posted: 25 Jan 2023, 14:52:27 UTC - in response to Message 68009.  

Unfortunately we have to work with what we have. It's a workaround to have these controls in place, but at least there is something we can do. It's also clear that if we 'do nothing' we end up with chaos on volunteer machines whose owners do not (and why should they?) have app_config files in place. Even then I see people getting it wrong and trying to over-subscribe memory. The deficiency is in the boinc client. No criticism of the client code; it was probably never designed for the kinds of tasks we need to run. Even if it gets addressed, it would take time to roll a new client out.

OpenIFS, like most computational fluid dynamics codes, is limited by memory bandwidth rather than by single-core speed. Starting as many tasks as there are available cores is not the way to maximise production of credit with OIFS tasks.

ID: 68046 · Report as offensive     Reply Quote
Jean-David Beyer

Send message
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 68047 - Posted: 25 Jan 2023, 17:10:14 UTC - in response to Message 67990.  

Why do you have the
<report_results_immediately/>
line in there?
ID: 68047 · Report as offensive     Reply Quote
SolarSyonyk

Send message
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 68053 - Posted: 25 Jan 2023, 23:05:53 UTC - in response to Message 68046.  

Even then I see people getting it wrong and trying to over-subscribe memory.


Yeah... I gave a couple OOM reapers some exercise early on.

My guideline is simple: If I have 5GB per task, it's fine. 4GB per task is not sufficient, even with a lot of swap.
ID: 68053 · Report as offensive     Reply Quote
AndreyOR

Send message
Joined: 12 Apr 21
Posts: 317
Credit: 14,780,446
RAC: 19,423
Message 68054 - Posted: 26 Jan 2023, 7:09:46 UTC - in response to Message 68053.  
Last modified: 26 Jan 2023, 7:12:21 UTC

Even then I see people getting it wrong and trying to over-subscribe memory.



Yeah... I gave a couple OOM reapers some exercise early on.

My guideline is simple: If I have 5GB per task, it's fine. 4GB per task is not sufficient, even with a lot of swap.

I agree with the 5 GB guideline. I'd add that ~10 GB of RAM should be left for overhead. I'd argue that the following is an excellent starting point, and I suspect that most users won't be able to do more without going over the desired failure rate of less than 5%. Assuming the PC has enough cores/threads, isn't used heavily for other things (especially its RAM), and BOINC is allowed to use all of the system RAM, the following maximum number of concurrent tasks per amount of RAM should be run (this applies only to the current OIFS tasks):

8 GB RAM - 0 tasks
16 GB RAM - 1 task
32 GB RAM - 4 tasks
64 GB RAM - 10 tasks

*128 GB RAM - 23 tasks
*256 GB RAM - 49 tasks
*512 GB RAM - 100 tasks

* I have no experience with really high RAM systems but would try the same principle and adjust to stay under the 5% failure rate.
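
The table above works out to roughly (RAM - 10 GB) / 5 GB, rounded down. As a sketch only, reusing the app names from Glenn's example earlier in the thread, the 32 GB / 4-task row could be set up like this:

<app_config>
   <!-- 32 GB machine: (32 - 10) / 5, rounded down, is 4 concurrent OIFS tasks -->
   <project_max_concurrent>4</project_max_concurrent>
   <app>
      <name>oifs_43r3</name>
      <max_concurrent>4</max_concurrent>
   </app>
   <app>
      <name>oifs_43r3_ps</name>
      <max_concurrent>4</max_concurrent>
   </app>
   <app>
      <name>oifs_43r3_bl</name>
      <max_concurrent>4</max_concurrent>
   </app>
</app_config>
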
ID: 68054 · Report as offensive     Reply Quote
Vato

Send message
Joined: 4 Oct 19
Posts: 15
Credit: 9,174,915
RAC: 3,722
Message 68068 - Posted: 27 Jan 2023, 0:18:28 UTC - in response to Message 68054.  

I have an 8GB machine that runs a single task at a time with a 100% success rate.
I have a 16GB machine that runs 2 tasks concurrently with a 100% success rate.
I believe any issue is not about the absolute memory available, but about how those machines are run.
ID: 68068 · Report as offensive     Reply Quote
Jean-David Beyer

Send message
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 68069 - Posted: 27 Jan 2023, 1:36:01 UTC - in response to Message 68068.  

I have an 8GB machine that runs a single task at a time with a 100% success rate.
I have a 16GB machine that runs 2 tasks concurrently with a 100% success rate.
I believe any issue is not about the absolute memory available, but about how those machines are run.


I have a 64 GB machine with a 16-core Intel processor.

I currently have 12 cores allocated to BOINC.

I allow CPDN to run a maximum of 6 processes, but I limit
oifs_43r3_bl tasks to only one at a time
oifs_43r3_ps tasks to only five at a time
oifs_43r3 tasks to only five at a time

I run five other projects: WCG (4), Einstein (1), Rosetta (3), MilkyWay (2), and Universe (2). The numbers in parentheses are the maximum number of each I allow to run at a time (if they are all supplying work).

These almost always run with a 100% success rate. The OIFS tasks have never failed me. Once in a while the legacy CPDN tasks fail, but usually with problems like negative theta and such. I notice my machine often completes tasks successfully that had several failures before they were assigned to me.

For those who care, my machine is ID: 1511241
ID: 68069 · Report as offensive     Reply Quote