Questions and Answers : Unix/Linux : HOWTO: Use Ubuntu and LXD to help manage and isolate BOINC workloads.

lazlo_vii

Send message
Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 63038 - Posted: 27 Nov 2020, 14:26:07 UTC
Last modified: 27 Nov 2020, 14:41:08 UTC

No matter what I want to tell myself, I am not a Computer Genius.

In the vast majority of cases, letting the applications and operating systems I run pick their own default behaviors leads to more productivity and less heartache. In my humble opinion, that is not the case for BOINC. The way I see it, BOINC's time-sharing scheme is deeply flawed if you want to contribute to more than one project at a time. Also, BOINC tasks are not portable between systems, even if the systems all run the exact same CPU architecture.

In this guide I will show you how to use Ubuntu and the Linux Container Daemon (LXD) to overcome these limitations. With LXD I can run multiple containers, each crunching a different BOINC project with no time sharing, and isolate them from one another on specific CPU cores, all with just a few commands. Since all of my systems run Ryzen 3000 CPUs, I can even shut down a container and move it between systems on my network if need be.

To follow this guide effectively you will need to know your way around the Linux command line and have a good understanding of your hardware and your network layout.
It is expected that you will read this entire guide, at least follow the important links and bookmark them for later reading, and above all back up your config files before you make any changes to your system(s). This guide assumes that all of your systems are on the same 192.168.0.0/24 subnet. I will not cover boinccmd here and will instead focus on simple management using the BOINC Manager program.

If you run Ubuntu in a VM you will not get the full benefits of CPU isolation, because VMs use virtual CPUs and, to my knowledge, there is no way to know which vCPU is mapped to which physical CPU or when that mapping will change. That is up to the hypervisor, and you should consult the docs for whichever one you are using.

Step 1: Configuring your host

By default LXD will place your containers on a private network that you cannot reach from outside the containers. For easy management we will create a bridge that lets your containers get IP addresses from your home router, and thereby lets the BOINC Manager access them as needed. Starting with Ubuntu 18.04, network management is done with a tool called Netplan. You can find the docs here:

https://netplan.io/reference/

Also look over the Examples they posted on that site if you get stuck.

First, back up your existing configuration file:
sudo mv /etc/netplan/00*.yaml /etc/netplan/00-config.original


If you ever want your old network configuration back, just remove our custom config file and rename your backup to 00-config.yaml.

To create the new config we will need to know the name of your network interface card. You can find it by running ip addr and looking at the device names in the output (see man ip for more info). Once you know what your NIC is called (for this guide I will use the name enp3s0), create the new config with:
sudo nano /etc/netplan/00-custom-config.yaml


It should look like this if you are using DHCP (if you have a static IP address, adjust according to your needs):
network:
  version: 2
  ethernets:
    enp3s0:
      dhcp4: false
      dhcp6: false
  bridges:
    br0:
      interfaces: [enp3s0]
      dhcp4:  true
      parameters:
        stp: true
        forward-delay: 0


To test the new config you can run sudo netplan generate, and if it doesn't return any errors, apply the new config with sudo netplan apply.
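
Putting those commands together, a quick sanity check looks something like this (the last line is only there to confirm that br0 picked up an address from your router):

sudo netplan generate        # validates the config; no output means no errors
sudo netplan apply           # activates the new config
ip addr show br0             # br0 should now hold the address your router handed out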

Step 2: Install and configure LXD

The people who write LXD have put together what may be one of the finest examples of easy-to-use documentation in the whole FLOSS ecosystem, and I will not do them the injustice of butchering it for this guide. You can find the Getting Started guide and other important links here:

https://linuxcontainers.org/lxd/getting-started-cli/

The important thing to note when you run lxd init is to tell it that you do not want to create a new bridge and that you do want to use an existing bridge. If you followed the example above, it is called br0. Also, if you want to move containers between systems on your network, be sure to say yes when asked whether you want LXD to be available on the network.
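
The exact prompts vary a little between LXD versions, but from memory the answers that matter for this guide look roughly like this (everything else can stay at its default):

Would you like to create a new local network bridge? (yes/no) [default=yes]: no
Would you like to configure LXD to use an existing bridge or host interface? (yes/no) [default=no]: yes
Name of the existing bridge or host interface: br0
Would you like the LXD server to be available over the network? (yes/no) [default=no]: yes
Trust password for new clients: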

Next up will be creating containers and getting them ready to crunch work units.
ID: 63038
lazlo_vii

Send message
Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 63040 - Posted: 27 Nov 2020, 15:54:22 UTC - in response to Message 63039.  
Last modified: 27 Nov 2020, 16:01:25 UTC

OK, I messed up. I didn't know there was a 60 minute timer on editing a post, so if a mod would be so kind as to remove my post above I would be grateful.

Here I go again:

I am sure that as you worked through the Getting Started guide for LXD you noticed that it has a lot of options. It can be used at any computing scale, from a single host to an entire data center. For now, just focus on a simple deployment; as you learn and test it out, you can scale up if you want to.

Let's launch a container, set limits on which CPU cores it can use, restart it, log into it, and install updates and the BOINC client!

you@lxd.host:~$ lxc launch ubuntu:focal crunch-con1
you@lxd.host:~$ lxc config set crunch-con1 limits.cpu 0-3
you@lxd.host:~$ lxc restart crunch-con1
you@lxd.host:~$ lxc exec crunch-con1 bash
root@crunch-con1:~# apt update
root@crunch-con1:~# apt upgrade
root@crunch-con1:~# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          16
On-line CPU(s) list:             0-3    <<-- NOTICE: only these CPU cores are in use
Off-line CPU(s) list:            4-15   <<-- NOTICE: these CPU cores are not used
Thread(s) per core:              0
Core(s) per socket:              8
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
...
...
root@crunch-con1:~# apt install boinc-client lib32ncurses6 lib32z1 lib32stdc++-7-dev
root@crunch-con1:~# nano /etc/boinc-client/gui_rpc_auth.cfg    <<-- Only needed if you want to set a password for remote management
root@crunch-con1:~# nano /etc/boinc-client/remote_hosts.cfg    <<-- Add the IP address(es) of the system(s) running the BOINC Manager to this file
root@crunch-con1:~# systemctl restart boinc-client
root@crunch-con1:~# exit
you@lxd.host:~$
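
A quick note on limits.cpu while we are here: LXD accepts either a plain core count (and picks the cores for you) or an explicit set of core ranges, which is what gives you the pinning used above. Both forms, as a sketch using the container from this example:

lxc config set crunch-con1 limits.cpu 4          # any four cores, chosen by LXD
lxc config set crunch-con1 limits.cpu 0-3,8-11   # pin to these exact cores/threads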


Once you have created the container, set its CPU config the way you want it, and installed and configured any software you need, you can launch the BOINC Manager and connect to it. After you configure it, you are ready to crunch!
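
For reference, remote_hosts.cfg is just a plain list of the machines that are allowed to connect, one per line, and gui_rpc_auth.cfg holds the RPC password on a single line. A minimal sketch of remote_hosts.cfg (the address is a placeholder for whatever machine runs your BOINC Manager):

# /etc/boinc-client/remote_hosts.cfg
# hostnames or IP addresses allowed to control this client, one per line
192.168.0.10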

You can find the IP address for all containers on your host like this:
laz@bsquad-host-1:~$ lxc list
+----------------+---------+---------------------+------+-----------+-----------+
|      NAME      |  STATE  |        IPV4         | IPV6 |   TYPE    | SNAPSHOTS |
+----------------+---------+---------------------+------+-----------+-----------+
| brute-squad-01 | RUNNING | 192.168.2.13 (eth0) |      | CONTAINER | 0         |
+----------------+---------+---------------------+------+-----------+-----------+
| brute-squad-02 | RUNNING | 192.168.2.14 (eth0) |      | CONTAINER | 0         |
+----------------+---------+---------------------+------+-----------+-----------+
| brute-squad-03 | RUNNING | 192.168.2.15 (eth0) |      | CONTAINER | 0         |
+----------------+---------+---------------------+------+-----------+-----------+
| brute-squad-04 | RUNNING | 192.168.2.16 (eth0) |      | CONTAINER | 0         |
+----------------+---------+---------------------+------+-----------+-----------+
laz@bsquad-host-1:~$


If you configured LXD to be available across the network, you can add the remote system by doing:
lxc remote add remote-hostname-or-IP-address
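
For example, adding one of my hosts looks like this (the remote name and address are placeholders for your own):

lxc remote add bsquad-host-1 192.168.0.50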

After you enter the password you set during lxd init, you can access a remote host by doing:
lxc list remote-hostname-or-IP-address:


Like this:
laz@desktop:~$ lxc list bsquad-host-1:
+----------------+---------+---------------------+------+-----------+-----------+
|      NAME      |  STATE  |        IPV4         | IPV6 |   TYPE    | SNAPSHOTS |
+----------------+---------+---------------------+------+-----------+-----------+
| brute-squad-01 | RUNNING | 192.168.2.13 (eth0) |      | CONTAINER | 0         |
+----------------+---------+---------------------+------+-----------+-----------+
| brute-squad-02 | RUNNING | 192.168.2.14 (eth0) |      | CONTAINER | 0         |
+----------------+---------+---------------------+------+-----------+-----------+
| brute-squad-03 | RUNNING | 192.168.2.15 (eth0) |      | CONTAINER | 0         |
+----------------+---------+---------------------+------+-----------+-----------+
| brute-squad-04 | RUNNING | 192.168.2.16 (eth0) |      | CONTAINER | 0         |
+----------------+---------+---------------------+------+-----------+-----------+
laz@desktop:~$


If I want to move a container, first I need to stop it, then move it, then reset the CPU limits, and then start it:
laz@desktop:~$ lxc stop bsquad-host-1:brute-squad-04
laz@desktop:~$ lxc move bsquad-host-1:brute-squad-04 brute-squad-04
laz@desktop:~$ lxc list
+----------------+---------+------+------+-----------+-----------+
|      NAME      |  STATE  | IPV4 | IPV6 |   TYPE    | SNAPSHOTS |
+----------------+---------+------+------+-----------+-----------+
| brute-squad-04 | STOPPED |      |      | CONTAINER | 0         |
+----------------+---------+------+------+-----------+-----------+
laz@desktop:~$ lxc config set brute-squad-04 limits.cpu 4-7
laz@desktop:~$ lxc start brute-squad-04


So, as you can imagine, LXD is a wonderful tool if you want to manage multiple BOINC projects, because you can put each one in its own container, assign it its own set of CPUs, and let it crunch away. If you need to upgrade the PC or something, you can move your work from one system to another. Be careful about running in-progress tasks on CPUs of a different generation or manufacturer; they may fail, so it is best to let them finish before moving the container. If you also contribute to projects like Einstein@Home or F@H, you can use LXD to assign your GPUs to containers as well.
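
For example, handing the host's GPU(s) to a container is a single device entry (crunch-con1 and the device name gpu0 are just placeholders here; see the LXD docs on gpu devices for per-card options):

lxc config device add crunch-con1 gpu0 gpu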

I hope you guys find this as handy as I do,

Laz
ID: 63040
Dave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4538
Credit: 19,008,987
RAC: 21,524
Message 63042 - Posted: 27 Nov 2020, 16:44:26 UTC

Thanks for this. I can see how this may be useful for some. I only run other projects when no work is available for CPDN. What isn't clear to me is whether having the 32-bit libraries installed on the host OS means you don't need them in the downloaded cloud image.
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
ID: 63042
lazlo_vii

Send message
Joined: 11 Dec 19
Posts: 108
Credit: 3,012,142
RAC: 0
Message 63043 - Posted: 27 Nov 2020, 16:54:57 UTC - in response to Message 63042.  
Last modified: 27 Nov 2020, 17:00:58 UTC

Thanks for this. I can see how this may be useful for some. I only run other projects when no work is available for CPDN. What isn't clear to me is whether having the 32-bit libraries installed on the host OS means you don't need them in the downloaded cloud image.


Since a container has its own root filesystem, the 32-bit libraries still need to be installed inside the guest; having them on the host alone won't make them visible to the container. Either way, I like to keep my hosts as clean as I can. What helps one workload might be bad for another.

EDIT: And you are welcome!
ID: 63043
