Unprivileged LXC

2024, Dec 11    

LXC is cool, and useful as a VM substitute for many things.

I prefer using LXC in the unprivileged way. In this post, we will use LXC directly, and not touch upon LXD/Incus.

Basic setup

We will mostly be following the official documentation here

The package is in the Debian bookworm main repository

apt-get install lxc

Basic sanity check

root@ansible-test:~# lxc-checkconfig
LXC version 5.0.2
Kernel configuration not found at /proc/config.gz; searching...
Kernel configuration found at /boot/config-5.16.0-5-amd64

--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
...

This should be enough to have privileged containers working

Testing privileged containers

Going through a cycle

root@ansible-test:~# lxc-create test-privileged -t download -- --dist debian --arch amd64 --release bookworm
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs

---
You just created a Debian bookworm amd64 (20241209_05:24) container.

To enable SSH, run: apt install openssh-server
No default root or user password are set by LXC.
root@ansible-test:~# lxc-start test-privileged
root@ansible-test:~# lxc-attach test-privileged
root@test-privileged:~# ping -c 1 google.com
PING google.com (142.250.181.206) 56(84) bytes of data.
64 bytes from ham02s21-in-f14.1e100.net (142.250.181.206): icmp_seq=1 ttl=54 time=13.2 ms

--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 13.187/13.187/13.187/0.000 ms
root@test-privileged:~# exit
exit
root@ansible-test:~# lxc-stop test-privileged
root@ansible-test:~# lxc-destroy test-privileged

Everything works, and the network is available too.

Unprivileged containers

For unprivileged containers to work, we must map user and group ids and set up which user(s) are allowed to run the containers.

Mapping ids

root@ansible-test:~# cat /etc/subuid
sysuser:100000:65536
root@ansible-test:~# cat /etc/subgid
sysuser:100000:65536

This means that the user sysuser is allowed to map 65536 user ids (and, via /etc/subgid, group ids) starting from 100000 on the host to ids inside a container. Container uid 0 (root), for example, becomes uid 100000 on the host.
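The arithmetic behind the mapping is simple enough to sketch in shell. The function name host_id is made up for this post; the numbers are the ones from /etc/subuid above, and the container side of the range is assumed to start at 0.

```shell
#!/bin/sh
# host_id BASE COUNT CONTAINER_ID  (hypothetical helper, not part of LXC)
# Prints the host id that a container id is mapped to, assuming the
# container range starts at 0. Fails if the id is outside the range.
host_id() {
    base=$1 count=$2 cid=$3
    if [ "$cid" -lt 0 ] || [ "$cid" -ge "$count" ]; then
        echo "container id $cid is outside the mapped range 0..$((count - 1))" >&2
        return 1
    fi
    echo $((base + cid))
}

host_id 100000 65536 0      # container root            -> 100000
host_id 100000 65536 1000   # uid 1000 in the container -> 101000
```

So files owned by root inside the container show up as owned by uid 100000 when viewed from the host.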

Networking

root@ansible-test:~# cat /etc/lxc/lxc-usernet 
sysuser veth lxcbr0 10

User sysuser is allowed to create up to 10 veth interfaces attached to the lxcbr0 bridge.
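If you script this, it helps to add the entry only when it is missing. A minimal sketch, where allow_veth is a hypothetical helper name and the file is passed in so it can be tried on a scratch copy first:

```shell
#!/bin/sh
# allow_veth FILE USER BRIDGE COUNT  (hypothetical helper, not part of LXC)
# Appends "USER veth BRIDGE COUNT" to FILE unless that exact line exists.
allow_veth() {
    file=$1 user=$2 bridge=$3 count=$4
    entry="$user veth $bridge $count"
    # -x matches the whole line, -F treats the entry as a fixed string.
    if [ -f "$file" ] && grep -qxF "$entry" "$file"; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$file"
}
```

As root, that would be called as: allow_veth /etc/lxc/lxc-usernet sysuser lxcbr0 10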

Global default container config

root@ansible-test:~# cat /etc/lxc/default.conf
lxc.net.0.type = veth
lxc.net.0.link = lxcbr0
lxc.net.0.flags = up

lxc.apparmor.profile = unconfined

The AppArmor profile is set to “unconfined”, since the recommended “generated” profile yielded an error.

Use default config

sysuser@ansible-test:~$ cat ~/.config/lxc/default.conf
lxc.include = /etc/lxc/default.conf

lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536

The mapping must match the values from /etc/subuid and /etc/subgid
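To avoid typing the numbers by hand, the two idmap lines can be derived from the subordinate id files. This is a sketch only: idmap_lines is a made-up name, it takes the first entry it finds for the user, and it assumes the container range should start at 0.

```shell
#!/bin/sh
# idmap_lines USER [SUBUID_FILE] [SUBGID_FILE]  (hypothetical helper)
# Prints lxc.idmap lines built from the user's first entry in each file.
# Fails if the user has no entry in either file.
idmap_lines() {
    user=$1
    subuid=${2:-/etc/subuid}
    subgid=${3:-/etc/subgid}
    # Each line has the form name:first_id:count
    awk -F: -v u="$user" '
        $1 == u { printf "lxc.idmap = u 0 %s %s\n", $2, $3; found = 1; exit }
        END { exit !found }' "$subuid" &&
    awk -F: -v u="$user" '
        $1 == u { printf "lxc.idmap = g 0 %s %s\n", $2, $3; found = 1; exit }
        END { exit !found }' "$subgid"
}
```

Running idmap_lines sysuser on the system above should print the same two lxc.idmap lines as in the config.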

The default user data directory is $HOME/.local/share/lxc. It must be created and given proper permissions.

sysuser@ansible-test:~$ mkdir -p $HOME/.local/share/lxc
sysuser@ansible-test:~$ setfacl -m 100000:x $HOME/.local/share/lxc
sysuser@ansible-test:~$ getfacl $HOME/.local/share/lxc -c
getfacl: Removing leading '/' from absolute path names
user::rwx
user:100000:--x
group::r-x
mask::r-x
other::r-x
sysuser@ansible-test:~$ setfacl -m 100000:x $HOME/.local/share

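The user has some uid on the host, but due to the mapping, the container's root is a different uid on disk: 100000, the start of the range in /etc/subuid. That uid needs to be able to traverse the directories leading to the container, so we use POSIX ACLs (extended file permissions) to go beyond the usual user/group/other and read/write/execute model and grant it just the execute bit.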
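For deeper directory trees it can be handy to generate the setfacl calls instead of typing them. The sketch below only prints the commands and does not run them; traverse_acl_cmds is a hypothetical name, and which directories really need the entry depends on their existing permissions.

```shell
#!/bin/sh
# traverse_acl_cmds HOST_UID BASE_DIR TARGET_DIR  (hypothetical helper)
# Prints one setfacl command per directory from BASE_DIR down to TARGET_DIR.
# Nothing is executed; review the output and run it yourself.
traverse_acl_cmds() {
    uid=$1 base=${2%/} target=${3%/}
    case "$target" in
        "$base"|"$base"/*) ;;
        *) echo "$target is not inside $base" >&2; return 1 ;;
    esac
    dir=$base
    rest=${target#"$base"}
    echo "setfacl -m u:$uid:x $dir"
    # Walk the remaining path components one at a time.
    while [ -n "$rest" ]; do
        rest=${rest#/}
        dir=$dir/${rest%%/*}
        case "$rest" in
            */*) rest=/${rest#*/} ;;
            *) rest= ;;
        esac
        echo "setfacl -m u:$uid:x $dir"
    done
}
```

Called as traverse_acl_cmds 100000 "$HOME/.local/share" "$HOME/.local/share/lxc", it prints commands for the same two directories that were handled manually above.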

Testing unprivileged containers

Going through the cycle

sysuser@ansible-test:~$ lxc-create test-unprivileged -t download -- --dist debian --arch amd64 --release bookworm
Downloading the image index
Downloading the rootfs
Downloading the metadata
The image cache is now ready
Unpacking the rootfs

---
You just created a Debian bookworm amd64 (20241209_05:24) container.

To enable SSH, run: apt install openssh-server
No default root or user password are set by LXC.
sysuser@ansible-test:~$ lxc-unpriv-start test-unprivileged
Running scope as unit: run-rf3efb023379d49ae94a100bcebf9bda1.scope
sysuser@ansible-test:~$ lxc-unpriv-attach test-unprivileged
Running scope as unit: run-r53739eb6da1f4763b01c0a4fabd41961.scope
root@test-unprivileged:/# ping google.com -c 1
PING google.com (142.250.181.206) 56(84) bytes of data.
64 bytes from ham02s21-in-f14.1e100.net (142.250.181.206): icmp_seq=1 ttl=54 time=13.0 ms

--- google.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 13.015/13.015/13.015/0.000 ms
root@test-unprivileged:/# exit
exit
sysuser@ansible-test:~$ lxc-stop test-unprivileged
sysuser@ansible-test:~$ lxc-destroy test-unprivileged

Notice the use of “lxc-unpriv-start” and “lxc-unpriv-attach”. These are helper scripts that handle the systemd issues described in the guide.
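Judging by the "Running scope as unit" output, the helpers run the command in a transient systemd scope. The upstream guide suggests doing much the same by hand with systemd-run and cgroup delegation. The wrapper below is my own sketch of that idea, not the contents of the Debian helper scripts; run_in_scope and DRY_RUN are made-up names.

```shell
#!/bin/sh
# run_in_scope COMMAND [ARGS...]  (hypothetical wrapper)
# Runs COMMAND in a transient systemd user scope with cgroup delegation.
# With DRY_RUN=1 the command line is printed instead of executed.
run_in_scope() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "systemd-run --user --scope -p Delegate=yes -- $*"
    else
        systemd-run --user --scope -p Delegate=yes -- "$@"
    fi
}
```

For example, run_in_scope lxc-start test-unprivileged would be the manual counterpart of lxc-unpriv-start test-unprivileged.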

Alternative network setup

For most of my setups, I want to have the LXC container directly on some subnet, and not go through the default lxcbr0 bridge.

To set up bridging, I usually use Open vSwitch, since I find it easy to handle VLANs with it and to interface it with libvirt. In this specific case, bridge-utils is already installed and used for lxcbr0, so we go with that. For more details, see the Debian wiki on the topic

Updating /etc/network/interfaces

sysuser@ansible-test:~$ cat /etc/network/interfaces
...
# The primary network interface
# now used by bridge
# allow-hotplug enp1s0
# iface enp1s0 inet dhcp

# bridge
auto lan_br0
iface lan_br0 inet dhcp
    bridge_ports enp1s0

Adjust as applicable for your system.

Updating networking permission

root@ansible-test:~# cat /etc/lxc/lxc-usernet 
sysuser veth lxcbr0 10
sysuser veth lan_br0 10

This will allow sysuser to select either lxcbr0 or lan_br0 (or both) for a container.

And update the default bridge to use when creating a container

sysuser@ansible-test:~$ cat /etc/lxc/default.conf 
lxc.net.0.type = veth
lxc.net.0.link = lan_br0
lxc.net.0.flags = up

lxc.apparmor.profile = unconfined

Use ifup/ifdown, or simply reboot, to apply the network changes.

Note that already existing containers will use a copy of the default config, so either destroy and recreate them or edit the corresponding configuration file (located in /home/sysuser/.local/share/lxc/<container name>/config)
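Editing an existing container's config comes down to changing one line. A small sketch, assuming the container uses lxc.net.0 as in the default config above; set_bridge is a hypothetical helper, and the container should be stopped before its config is changed.

```shell
#!/bin/sh
# set_bridge CONFIG BRIDGE  (hypothetical helper, not part of LXC)
# Rewrites the lxc.net.0.link line in CONFIG to point at BRIDGE.
# All other lines are left untouched.
set_bridge() {
    config=$1 bridge=$2
    tmp=$(mktemp) || return 1
    # Write back with cat so the file keeps its owner and permissions.
    sed "s|^lxc\.net\.0\.link[[:space:]]*=.*|lxc.net.0.link = $bridge|" "$config" > "$tmp" &&
        cat "$tmp" > "$config"
    status=$?
    rm -f "$tmp"
    return $status
}
```

Usage would look like: set_bridge "$HOME/.local/share/lxc/<container name>/config" lan_br0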

End notes/hooks for the future

  • Create an Ansible playbook that handles the above
  • Look into nested LXC, i.e. lxc.apparmor.allow_nesting = 1
  • Read up on how AppArmor and LXC play nice together
  • Make Ansible play nice with unprivileged LXC