LXC
Containers are a lightweight virtualization technology. They are more akin to an enhanced chroot than to full virtualization like Qemu or VMware, both because they do not emulate hardware and because containers share the same operating system as the host. Therefore containers are better compared to Solaris zones or BSD jails. Linux-vserver and OpenVZ are two pre-existing, independently developed implementations of container-like functionality for Linux. In fact, containers came about as a result of the work to upstream the vserver and OpenVZ functionality. Some vserver and OpenVZ functionality is still missing in containers; however, containers can boot many Linux distributions and have the advantage that they can be used with an unmodified upstream kernel.
There are two user-space implementations of containers, each exploiting the same kernel features. Libvirt allows the use of containers through the LXC driver by connecting to 'lxc:///'. This can be very convenient as it supports the same usage as its other drivers. The other implementation, called simply 'LXC', is not compatible with libvirt, but is more flexible with more userspace tools. It is possible to switch between the two, though there are peculiarities which can cause confusion.
In this document we will mainly describe the lxc package. Toward the end, we will describe how to use the libvirt LXC driver.
In this document, a container name will be shown as CN, C1, or C2.
Installation
The lxc package can be installed using
sudo apt-get install lxc
This will pull in the required and recommended dependencies, including cgroup-lite, lvm2, and debootstrap. To use libvirt-lxc, install libvirt-bin. LXC and libvirt-lxc can be installed and used at the same time.
Host Setup
Basic layout of LXC files
Following is a description of the files and directories which are installed and used by LXC.
-
There are two upstart jobs:
-
/etc/init/lxc-net.conf: is an optional job which only runs if /etc/default/lxc specifies USE_LXC_BRIDGE (true by default). It sets up a NATed bridge for containers to use.
-
/etc/init/lxc.conf: runs if LXC_AUTO (true by default) is set to true in /etc/default/lxc. It looks for entries under /etc/lxc/auto/ which are symbolic links to configuration files for the containers which should be started at boot.
-
/etc/lxc/lxc.conf: There is a default container creation configuration file, /etc/lxc/lxc.conf, which directs containers to use the LXC bridge created by the lxc-net upstart job. If no configuration file is specified when creating a container, then this one will be used.
-
Examples of other container creation configuration files are found under /usr/share/doc/lxc/examples. These show how to create containers without a private network, or using macvlan, vlan, or other network layouts.
-
The various container administration tools are found under /usr/bin.
-
/usr/lib/lxc/lxc-init is a very minimal and lightweight init binary which is used by lxc-execute. Rather than `booting' a full container, it manually mounts a few filesystems, especially /proc, and executes its arguments. You are not likely to need to manually refer to this file.
-
/usr/lib/lxc/templates/ contains the `templates' which can be used to create new containers of various distributions and flavors. Not all templates are currently supported.
-
/etc/apparmor.d/lxc/lxc-default contains the default Apparmor MAC policy which works to protect the host from containers. Please see Apparmor for more information.
-
/etc/apparmor.d/usr.bin.lxc-start contains a profile to protect the host from lxc-start while it is setting up the container.
-
/etc/apparmor.d/lxc-containers causes all the profiles defined under /etc/apparmor.d/lxc to be loaded at boot.
-
There are various man pages for the LXC administration tools as well as the lxc.conf container configuration file.
-
/var/lib/lxc is where containers and their configuration information are stored.
-
/var/cache/lxc is where caches of distribution data are stored to speed up multiple container creations.
lxcbr0
When USE_LXC_BRIDGE is set to true in /etc/default/lxc (as it is by default), a bridge called lxcbr0 is created at startup. This bridge is given the private address 10.0.3.1, and containers using this bridge will have a 10.0.3.0/24 address. A dnsmasq instance is run listening on that bridge, so if another dnsmasq has bound all interfaces before the lxc-net upstart job runs, lxc-net will fail to start and lxcbr0 will not exist.
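If containers fail to get addresses from this bridge, it is worth verifying that lxcbr0 exists and that the lxc-net dnsmasq instance is running. A quick check using standard tools might look like:
ip addr show lxcbr0
ps -C dnsmasq -o args=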
If you have another bridge - libvirt's default virbr0, or a br0 bridge for your default NIC - you can use that bridge in place of lxcbr0 for your containers.
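For example, to attach containers to an existing br0 bridge instead of lxcbr0, the network section of the container creation configuration file (or of an individual container's /var/lib/lxc/CN/config) might be sketched as follows, assuming br0 already exists on the host:
lxc.network.type = veth
lxc.network.link = br0
lxc.network.flags = up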
Using a separate filesystem for the container store
LXC stores container information and (with the default backing store) root filesystems under /var/lib/lxc. Container creation templates also tend to store cached distribution information under /var/cache/lxc.
If you wish to use a filesystem other than /var, you can mount a filesystem which has more space into those locations. If you have a disk dedicated for this, you can simply mount it at /var/lib/lxc. If you'd like to use another location, like /srv, you can bind mount it or use a symbolic link. For instance, if /srv is a large mounted filesystem, create and symlink two directories:
sudo mkdir /srv/lxclib /srv/lxccache
sudo rm -rf /var/lib/lxc /var/cache/lxc
sudo ln -s /srv/lxclib /var/lib/lxc
sudo ln -s /srv/lxccache /var/cache/lxc
or, using bind mounts:
sudo mkdir /srv/lxclib /srv/lxccache
sudo sed -i '$a \
/srv/lxclib /var/lib/lxc none defaults,bind 0 0 \
/srv/lxccache /var/cache/lxc none defaults,bind 0 0' /etc/fstab
sudo mount -a
Containers backed by lvm
It is possible to use LVM partitions as the backing stores for containers. Advantages of this include flexibility in storage management and fast container cloning. The tools default to using a VG (volume group) named lxc, but another VG can be used through command line options. When an LV is used as a container backing store, the container's configuration file is still /var/lib/lxc/CN/config, but the root fs entry in that file (lxc.rootfs) will point to the LV block device name, i.e. /dev/lxc/CN.
Containers with directory tree and LVM backing stores can co-exist.
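If no volume group named lxc exists yet, one can be created from spare storage before creating lvm-backed containers. A minimal sketch, assuming a spare partition /dev/sdb1 (a hypothetical device name), would be:
sudo pvcreate /dev/sdb1
sudo vgcreate lxc /dev/sdb1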
Btrfs
If your host has a btrfs /var, the LXC administration tools will detect this and automatically exploit it by cloning containers using btrfs snapshots.
Apparmor
LXC ships with an Apparmor profile intended to protect the host from accidental misuses of privilege inside the container. For instance, the container will not be able to write to /proc/sysrq-trigger or to most /sys files.
The usr.bin.lxc-start profile is entered by running lxc-start. This profile mainly prevents lxc-start from mounting new filesystems outside of the container's root filesystem. Before executing the container's init, LXC requests a switch to the container's profile. By default, this profile is the lxc-container-default policy which is defined in /etc/apparmor.d/lxc/lxc-default. This profile prevents the container from accessing many dangerous paths, and from mounting most filesystems.
If you find that lxc-start is failing due to a legitimate access which is being denied by its Apparmor policy, you can disable the lxc-start profile by doing:
sudo apparmor_parser -R /etc/apparmor.d/usr.bin.lxc-start
sudo ln -s /etc/apparmor.d/usr.bin.lxc-start /etc/apparmor.d/disabled/
This will make lxc-start run unconfined, but continue to confine the container itself. If you also wish to disable confinement of the container, then in addition to disabling the usr.bin.lxc-start profile, you must add:
lxc.aa_profile = unconfined
to the container's configuration file. If you wish to run a container in a custom profile, you can create a new profile under /etc/apparmor.d/lxc/. Its name must start with lxc- in order for lxc-start to be allowed to transition to that profile. The lxc-default profile includes the re-usable abstractions file /etc/apparmor.d/abstractions/lxc/container-base. An easy way to start a new profile therefore is to do the same, then add extra permissions at the bottom of your policy.
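As a rough sketch, a custom profile in /etc/apparmor.d/lxc/lxc-CN-profile modeled on lxc-default might look like the following; the exact header and flags used by lxc-default can differ between releases, so treat this as an illustration rather than a verbatim copy:
#include <tunables/global>
profile lxc-CN-profile flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/lxc/container-base>
  # extra permissions for this container go at the bottom
}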
After creating the policy, load it using:
sudo apparmor_parser -r /etc/apparmor.d/lxc-containers
The profile will automatically be loaded after a reboot, because it is sourced by the file /etc/apparmor.d/lxc-containers. Finally, to make container CN use this new lxc-CN-profile, add the following line to its configuration file:
lxc.aa_profile = lxc-CN-profile
lxc-execute does not enter an Apparmor profile, but the container it spawns will be confined.
Control Groups
Control groups (cgroups) are a kernel feature providing hierarchical task grouping and per-cgroup resource accounting and limits. They are used in containers to limit block and character device access and to freeze (suspend) containers. They can be further used to limit memory use and block i/o, guarantee minimum cpu shares, and to lock containers to specific cpus. By default, LXC depends on the cgroup-lite package to be installed, which provides the proper cgroup initialization at boot. The cgroup-lite package mounts each cgroup subsystem separately under /sys/fs/cgroup/SS, where SS is the subsystem name. For instance the freezer subsystem is mounted under /sys/fs/cgroup/freezer. LXC cgroups are kept under /sys/fs/cgroup/SS/INIT/lxc, where INIT is the init task's cgroup. This is / by default, so in the end the freezer cgroup for container CN would be /sys/fs/cgroup/freezer/lxc/CN.
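As an illustration of that layout, the freezer state and memory usage of a running container CN can be read directly from the cgroup filesystem:
cat /sys/fs/cgroup/freezer/lxc/CN/freezer.state
cat /sys/fs/cgroup/memory/lxc/CN/memory.usage_in_bytes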
Privilege
The container administration tools must be run with root user privilege. A utility called lxc-setup was written with the intention of providing the tools with the needed file capabilities to allow non-root users to run the tools with sufficient privilege. However, as root in a container cannot yet be reliably contained, this is not worthwhile. It is therefore recommended to not use lxc-setup, and to provide the LXC administrators the needed sudo privilege.
The user namespace, which is expected to be available in the next Long Term Support (LTS) release, will allow containment of the container root user, as well as reduce the amount of privilege required for creating and administering containers.
LXC Upstart Jobs
As listed above, the lxc package includes two upstart jobs. The first, lxc-net, is always started when the other, lxc, is about to begin, and stops when it stops. If the USE_LXC_BRIDGE variable is set to false in /etc/default/lxc, then it will immediately exit. If it is true, and an error occurs bringing up the LXC bridge, then the lxc job will not start. lxc-net will bring down the LXC bridge when stopped, unless a container is running which is using that bridge.
The lxc job starts on runlevels 2-5. If the LXC_AUTO variable is set to true, then it will look under /etc/lxc/auto for containers which should be started automatically. When the lxc job is stopped, either manually or by entering runlevel 0, 1, or 6, it will stop those containers.
To register a container to start automatically, create a symbolic link /etc/lxc/auto/name.conf pointing to the container's config file. For instance, the configuration file for a container CN is /var/lib/lxc/CN/config. To make that container auto-start, use the command:
sudo ln -s /var/lib/lxc/CN/config /etc/lxc/auto/CN.conf
Container Administration
Creating Containers
The easiest way to create containers is using lxc-create. This script uses distribution-specific templates under /usr/lib/lxc/templates/ to set up container-friendly chroots under /var/lib/lxc/CN/rootfs, and initialize the configuration in /var/lib/lxc/CN/fstab and /var/lib/lxc/CN/config, where CN is the container name.
The simplest container creation command would look like:
sudo lxc-create -t ubuntu -n CN
This tells lxc-create to use the ubuntu template (-t ubuntu) and to call the container CN (-n CN). Since no configuration file was specified (which would have been done with `-f file'), it will use the default configuration file under /etc/lxc/lxc.conf. This gives the container a single veth network interface attached to the lxcbr0 bridge.
The container creation templates can also accept arguments. These can be listed after --. For instance
sudo lxc-create -t ubuntu -n oneiric1 -- -r oneiric
passes the arguments '-r oneiric' to the ubuntu template.
Help
Help on the lxc-create command can be seen by using lxc-create -h. However, the templates also take their own options. If you do
sudo lxc-create -t ubuntu -h
then the general lxc-create help will be followed by help output specific to the ubuntu template. If no template is specified, then only help for lxc-create itself will be shown.
Ubuntu template
The ubuntu template can be used to create Ubuntu system containers with any release at least as new as 10.04 LTS. It uses debootstrap to create a cached container filesystem which gets copied into place each time a container is created. The cached image is saved and only re-generated when you create a container using the -F (flush) option to the template, i.e.:
sudo lxc-create -t ubuntu -n CN -- -F
The Ubuntu release installed by the template will be the same as that on the host, unless otherwise specified with the -r option, i.e.
sudo lxc-create -t ubuntu -n CN -- -r lucid
If you want to create a 32-bit container on a 64-bit host, pass -a i386 to the template. If you have the qemu-user-static package installed, then you can create a container using any architecture supported by qemu-user-static.
The container will have a user named ubuntu whose password is ubuntu and who is a member of the sudo group. If you wish to inject a public ssh key for the ubuntu user, you can do so with -S sshkey.pub.
You can also bind user jdoe from the host into the container using the -b jdoe option. This will copy jdoe's password and shadow entries into the container, make sure his default group and shell are available, add him to the sudo group, and bind-mount his home directory into the container when the container is started.
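For instance, a creation command combining these options might look like the following, where the public key path is only an example:
sudo lxc-create -t ubuntu -n CN -- -b jdoe -S /home/jdoe/.ssh/id_rsa.pub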
When a container is created, the release-updates archive is added to the container's sources.list, and its package archive will be updated. If the container release is older than 12.04 LTS, then the lxcguest package will be automatically installed. Alternatively, if the --trim option is specified, then the lxcguest package will not be installed, and many services will be removed from the container. This will result in a faster-booting, but less upgradeable, container.
Ubuntu-cloud template
The ubuntu-cloud template creates Ubuntu containers by downloading and extracting the published Ubuntu cloud images. It accepts some of the same options as the ubuntu template, namely -r release, -S sshkey.pub, -a arch, and -F to flush the cached image. It also accepts a few extra options. The -C option will create a cloud container, configured for use with a metadata service. The -u option accepts a cloud-init user-data file to configure the container on start. If -L is passed, then no locales will be installed. The -T option can be used to choose a tarball location to extract in place of the published cloud image tarball. Finally the -i option sets a host id for cloud-init, which by default is set to a random string.
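For example, creating a precise container from the published cloud image with an ssh key injected might look like this (the key path is illustrative):
sudo lxc-create -t ubuntu-cloud -n CN -- -r precise -S ~/.ssh/id_rsa.pub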
Other templates
The ubuntu and ubuntu-cloud templates are well supported. Other templates are available however. The debian template creates a Debian based container, using debootstrap much as the ubuntu template does. By default it installs a debian squeeze image. An alternate release can be chosen by setting the SUITE environment variable, i.e.:
sudo SUITE=sid lxc-create -t debian -n d1
To purge the container image cache, call the template directly and pass it the --clean option.
sudo SUITE=sid /usr/lib/lxc/templates/lxc-debian --clean
A fedora template exists, which creates containers based on fedora releases <= 14. Fedora release 15 and higher are based on systemd, which the template is not yet able to convert into a container-bootable setup. Before the fedora template is able to run, you'll need to make sure that yum and curl are installed. A fedora 12 container can be created with
sudo lxc-create -t fedora -n fedora12 -- -R 12
An OpenSuSE template exists, but it requires the zypper program, which is not yet packaged. The OpenSuSE template is therefore not supported.
Two more templates exist mainly for experimental purposes. The busybox template creates a very small system container based entirely on busybox. The sshd template creates an application container running sshd in a private network namespace. The host's library and binary directories are bind-mounted into the container, though not its /home or /root. To create, start, and ssh into an ssh container, you might:
sudo lxc-create -t sshd -n ssh1
ssh-keygen -f id
sudo mkdir /var/lib/lxc/ssh1/rootfs/root/.ssh
sudo cp id.pub /var/lib/lxc/ssh1/rootfs/root/.ssh/authorized_keys
sudo lxc-start -n ssh1 -d
ssh -i id root@ssh1
Backing Stores
By default, lxc-create places the container's root filesystem as a directory tree at /var/lib/lxc/CN/rootfs. Another option is to use LVM logical volumes. If a volume group named lxc exists, you can create an lvm-backed container called CN using:
sudo lxc-create -t ubuntu -n CN -B lvm
If you want to use a volume group named schroots, with a 5G xfs filesystem, then you would use
sudo lxc-create -t ubuntu -n CN -B lvm --vgname schroots --fssize 5G --fstype xfs
Cloning
For rapid provisioning, you may wish to customize a canonical container according to your needs and then make multiple copies of it. This can be done with the lxc-clone program. Given an existing container called C1, a new container called C2 can be created using
sudo lxc-clone -o C1 -n C2
If /var/lib/lxc is a btrfs filesystem, then lxc-clone will create C2's filesystem as a snapshot of C1's. If the container's root filesystem is lvm backed, then you can specify the -s option to create the new rootfs as a lvm snapshot of the original as follows:
sudo lxc-clone -s -o C1 -n C2
Both lvm and btrfs snapshots will provide fast cloning with very small initial disk usage.
Starting and stopping
To start a container, use lxc-start -n CN. By default lxc-start will execute /sbin/init in the container. You can provide a different program to execute, plus arguments, as further arguments to lxc-start:
sudo lxc-start -n container /sbin/init loglevel=debug
If you do not specify the -d (daemon) option, then you will see a console (on the container's /dev/console, see Consoles for more information) on the terminal. If you specify the -d option, you will not see that console, and lxc-start will exit immediately and report success - even if a later part of container startup has failed. You can use lxc-wait or lxc-monitor (see Monitoring container status) to check on the success or failure of the container startup.
To obtain LXC debugging information, use -o filename -l debuglevel, for instance:
sudo lxc-start -o lxc.debug -l DEBUG -n container
Finally, you can specify configuration parameters inline using -s. However, it is generally recommended to place them in the container's configuration file instead. Likewise, an entirely alternate config file can be specified with the -f option, but this is not generally recommended.
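For example, starting a container with a memory limit specified inline rather than in its configuration file (see Control group configuration for the key used) might be sketched as:
sudo lxc-start -n CN -s lxc.cgroup.memory.limit_in_bytes=320000000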
While lxc-start runs the container's /sbin/init, lxc-execute uses a minimal init program called lxc-init, which attempts to mount /proc, /dev/mqueue, and /dev/shm, executes the programs specified on the command line, and waits for those to finish executing. lxc-start is intended to be used for system containers, while lxc-execute is intended for application containers (see this article for more).
You can stop a container several ways. You can use shutdown, poweroff and reboot while logged into the container. To cleanly shut down a container externally (i.e. from the host), you can issue the sudo lxc-shutdown -n CN command. This takes an optional timeout value. If not specified, the command issues a SIGPWR signal to the container and immediately returns. If the option is used, as in sudo lxc-shutdown -n CN -t 10, then the command will wait the specified number of seconds for the container to cleanly shut down. Then, if the container is still running, it will kill it (and any running applications). You can also immediately kill the container (without any chance for applications to cleanly shut down) using sudo lxc-stop -n CN. Finally, lxc-kill can be used more generally to send any signal number to the container's init.
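The lxc-shutdown and lxc-stop commands are given above; for lxc-kill, sending a specific signal number to the container's init might look like this, where signal 9 is only an example:
sudo lxc-kill -n CN 9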
While the container is shutting down, you can expect to see some (harmless) error messages, as follows:
$ sudo poweroff
[sudo] password for ubuntu:
$
Broadcast message from ubuntu@cn1
        (/dev/lxc/console) at 18:17 ...

The system is going down for power off NOW!
 * Asking all remaining processes to terminate...
   ...done.
 * All processes ended within 1 seconds....
   ...done.
 * Deconfiguring network interfaces...
   ...done.
 * Deactivating swap...
   ...fail!
umount: /run/lock: not mounted
umount: /dev/shm: not mounted
mount: / is busy
 * Will now halt
A container can be frozen with sudo lxc-freeze -n CN. This will block all its processes until the container is later unfrozen using sudo lxc-unfreeze -n CN.
Lifecycle management hooks
Beginning with Ubuntu 12.10, it is possible to define hooks to be executed at specific points in a container's lifetime:
-
Pre-start hooks are run in the host's namespace before the container ttys, consoles, or mounts are up. If any mounts are done in this hook, they should be cleaned up in the post-stop hook.
-
Pre-mount hooks are run in the container's namespaces, but before the root filesystem has been mounted. Mounts done in this hook will be automatically cleaned up when the container shuts down.
-
Mount hooks are run after the container filesystems have been mounted, but before the container has called pivot_root to change its root filesystem.
-
Start hooks are run immediately before executing the container's init. Since these are executed after pivoting into the container's filesystem, the command to be executed must be copied into the container's filesystem.
-
Post-stop hooks are executed after the container has been shut down.
If any hook returns an error, the container's run will be aborted. Any post-stop hook will still be executed. Any output generated by the script will be logged at the debug priority.
See Other configuration options for the configuration file format with which to specify hooks. Some sample hooks are shipped with the lxc package to serve as an example of how to write and use such hooks.
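As a sketch, wiring a pre-start and a post-stop hook into a container's configuration file could look like the following, where the script paths are hypothetical:
lxc.hook.pre-start = /var/lib/lxc/CN/hooks/pre-start.sh
lxc.hook.post-stop = /var/lib/lxc/CN/hooks/post-stop.sh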
Monitoring container status
Two commands are available to monitor container state changes. lxc-monitor monitors one or more containers for any state changes. It takes a container name as usual with the -n option, but in this case the container name can be a posix regular expression to allow monitoring desirable sets of containers. lxc-monitor continues running as it prints container changes. lxc-wait waits for a specific state change and then exits. For instance,
sudo lxc-monitor -n cont[0-5]*
would print all state changes to any containers matching the listed regular expression, whereas
sudo lxc-wait -n cont1 -s 'STOPPED|FROZEN'
will wait until container cont1 enters state STOPPED or state FROZEN and then exit.
Consoles
Containers have a configurable number of consoles. One always exists on the container's /dev/console. This is shown on the terminal from which you ran lxc-start, unless the -d option is specified. The output on /dev/console can be redirected to a file using the -c console-file option to lxc-start. The number of extra consoles is specified by the lxc.tty variable, and is usually set to 4. Those consoles are shown on /dev/ttyN (for 1 <= N <= 4). To log into console 3 from the host, use
sudo lxc-console -n container -t 3
or if the -t N option is not specified, an unused console will be automatically chosen. To exit the console, use the escape sequence Ctrl-a q. Note that the escape sequence does not work in the console resulting from lxc-start without the -d option.
Each container console is actually a Unix98 pty in the host's (not the guest's) pty mount, bind-mounted over the guest's /dev/ttyN and /dev/console. Therefore, if the guest unmounts those or otherwise tries to access the actual character device 4:N, it will not be serving getty to the LXC consoles. (With the default settings, the container will not be able to access that character device and getty will therefore fail.) This can easily happen when a boot script blindly mounts a new /dev.
Container Inspection
Several commands are available to gather information on existing containers. lxc-ls will report all existing containers in its first line of output, and all running containers in the second line. lxc-list provides the same information in a more verbose format, listing running containers first and stopped containers next. lxc-ps will provide lists of processes in containers. To provide ps arguments to lxc-ps, prepend them with --. For instance, for listing of all processes in container plain,
sudo lxc-ps -n plain -- -ef
lxc-info provides the state of a container and the pid of its init process. lxc-cgroup can be used to query or set the values of a container's control group limits and information. This can be more convenient than interacting with the cgroup filesystem. For instance, to query the list of devices which a running container is allowed to access, you could use
sudo lxc-cgroup -n CN devices.list
or to add mknod, read, and write access to /dev/sda,
sudo lxc-cgroup -n CN devices.allow "b 8:* rwm"
and, to limit it to 300M of RAM,
sudo lxc-cgroup -n CN memory.limit_in_bytes 300000000
lxc-netstat executes netstat in the running container, giving you a glimpse of its network state.
lxc-backup will create backups of the root filesystems of all existing containers (except lvm-based ones), using rsync to back the contents up under /var/lib/lxc/CN/rootfs.backup.1. These backups can be restored using lxc-restore. However, lxc-backup and lxc-restore are fragile with respect to customizations and therefore their use is not recommended.
Destroying containers
Use lxc-destroy to destroy an existing container.
sudo lxc-destroy -n CN
If the container is running, lxc-destroy will exit with a message informing you that you can force stopping and destroying the container with
sudo lxc-destroy -n CN -f
Advanced namespace usage
One of the Linux kernel features used by LXC to create containers is private namespaces. Namespaces allow a set of tasks to have private mappings of names to resources for things like pathnames and process IDs. (See Resources for a link to more information). Unlike control groups and other mount features which are also used to create containers, namespaces cannot be manipulated using a filesystem interface. Therefore, LXC ships with the lxc-unshare program, which is mainly for testing. It provides the ability to create new tasks in private namespaces. For instance,
sudo lxc-unshare -s 'MOUNT|PID' /bin/bash
creates a bash shell with private pid and mount namespaces. In this shell, you can do
root@ubuntu:~# mount -t proc proc /proc
root@ubuntu:~# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  6 10:20 pts/9    00:00:00 /bin/bash
root       110     1  0 10:20 pts/9    00:00:00 ps -ef
so that ps shows only the tasks in your new namespace.
Ephemeral containers
Ephemeral containers are one-time containers. Given an existing container CN, you can run a command in an ephemeral container created based on CN, with the host's jdoe user bound into the container, using:
lxc-start-ephemeral -b jdoe -o CN -- /home/jdoe/run_my_job
When the job is finished, the container will be discarded.
Container Commands
Following is a table of all container commands:
Container commands
Command | Synopsis
---|---
lxc-attach | (NOT SUPPORTED) Run a command in a running container
lxc-backup | Back up the root filesystems for all lvm-backed containers
lxc-cgroup | View and set container control group settings
lxc-checkconfig | Verify host support for containers
lxc-checkpoint | (NOT SUPPORTED) Checkpoint a running container
lxc-clone | Clone a new container from an existing one
lxc-console | Open a console in a running container
lxc-create | Create a new container
lxc-destroy | Destroy an existing container
lxc-execute | Run a command in a (not running) application container
lxc-freeze | Freeze a running container
lxc-info | Print information on the state of a container
lxc-kill | Send a signal to a container's init
lxc-list | List all containers
lxc-ls | List all containers with shorter output than lxc-list
lxc-monitor | Monitor state changes of one or more containers
lxc-netstat | Execute netstat in a running container
lxc-ps | View process info in a running container
lxc-restart | (NOT SUPPORTED) Restart a checkpointed container
lxc-restore | Restore containers from backups made by lxc-backup
lxc-setcap | (NOT RECOMMENDED) Set file capabilities on LXC tools
lxc-setuid | (NOT RECOMMENDED) Set or remove setuid bits on LXC tools
lxc-shutdown | Safely shut down a container
lxc-start | Start a stopped container
lxc-start-ephemeral | Start an ephemeral (one-time) container
lxc-stop | Immediately stop a running container
lxc-unfreeze | Unfreeze a frozen container
lxc-unshare | Testing tool to manually unshare namespaces
lxc-version | Print the version of the LXC tools
lxc-wait | Wait for a container to reach a particular state
Configuration File
LXC containers are very flexible. The Ubuntu lxc package sets defaults to make creation of Ubuntu system containers as simple as possible. If you need more flexibility, this chapter will show how to fine-tune your containers as you need.
Detailed information is available in the lxc.conf(5) man page. Note that the default configurations created by the ubuntu templates are reasonable for a system container and usually do not need customization.
Choosing configuration files and options
The container setup is controlled by the LXC configuration options. Options can be specified at several points:
-
During container creation, a configuration file can be specified. However, creation templates often insert their own configuration options, so we usually specify only network configuration options at this point. For other configuration, it is usually better to edit the configuration file after container creation.
-
The file /var/lib/lxc/CN/config is used at container startup by default.
-
lxc-start accepts an alternate configuration file with the -f filename option.
-
Specific configuration variables can be overridden at lxc-start using -s key=value. It is generally better to edit the container configuration file.
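To give a concrete picture, the configuration file generated by the ubuntu template is roughly of the following form; the exact contents vary by release and by the options used at creation time, so this is only a sketch:
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.utsname = CN
lxc.arch = amd64
lxc.rootfs = /var/lib/lxc/CN/rootfs
lxc.mount = /var/lib/lxc/CN/fstab
lxc.tty = 4
lxc.pts = 1024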
Network Configuration
Container networking in LXC is very flexible. It is triggered by the lxc.network.type configuration file entries. If no such entries exist, then the container will share the host's networking stack. Services and connections started in the container will be using the host's IP address. If at least one lxc.network.type entry is present, then the container will have a private (layer 2) network stack. It will have its own network interfaces and firewall rules. There are several options for lxc.network.type:
-
lxc.network.type=empty: The container will have no network interfaces other than loopback.
-
lxc.network.type=veth: This is the default when using the ubuntu or ubuntu-cloud templates, and creates a veth network tunnel. One end of this tunnel becomes the network interface inside the container. The other end is attached to a bridge on the host. Any number of such tunnels can be created by adding more lxc.network.type=veth entries in the container configuration file. The bridge to which the host end of the tunnel will be attached is specified with lxc.network.link = lxcbr0.
-
lxc.network.type=phys: A physical network interface (i.e. eth2) is passed into the container.
Two other options are to use vlan or macvlan, however their use is more complicated and is not described here. A few other networking options exist:
-
lxc.network.flags can only be set to up and ensures that the network interface is up.
-
lxc.network.hwaddr specifies a mac address to assign to the nic inside the container.
-
lxc.network.ipv4 and lxc.network.ipv6 set the respective IP addresses, if those should be static.
-
lxc.network.name specifies a name to assign inside the container. If this is not specified, a good default (i.e. eth0 for the first nic) is chosen.
-
lxc.network.script.up specifies a script to be called after the host side of the networking has been set up. See the lxc.conf(5) manual page for details.
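Putting several of these options together, a static-IP veth configuration for a container might be sketched as follows; the MAC and IP addresses are only examples:
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up
lxc.network.name = eth0
lxc.network.hwaddr = 00:16:3e:12:34:56
lxc.network.ipv4 = 10.0.3.100/24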
Control group configuration
Cgroup options can be specified using lxc.cgroup entries. lxc.cgroup.subsystem.item = value instructs LXC to set cgroup subsystem's item to value. It is perhaps simpler to realize that this will simply write value to the file item for the container's control group for subsystem subsystem. For instance, to set the memory limit to 320M, you could add
lxc.cgroup.memory.limit_in_bytes = 320000000
which will cause 320000000 to be written to the file /sys/fs/cgroup/memory/lxc/CN/memory.limit_in_bytes.
Rootfs, mounts and fstab
An important part of container setup is the mounting of various filesystems into place. The following is an example configuration file excerpt demonstrating the commonly used configuration options:
lxc.rootfs = /var/lib/lxc/CN/rootfs
lxc.mount.entry=proc /var/lib/lxc/CN/rootfs/proc proc nodev,noexec,nosuid 0 0
lxc.mount = /var/lib/lxc/CN/fstab
The first line says that the container's root filesystem is already mounted at /var/lib/lxc/CN/rootfs. If the filesystem is a block device (such as an LVM logical volume), then the path to the block device must be given instead.
Each lxc.mount.entry line should contain an item to mount in valid fstab format. The target directory should be prefixed by /var/lib/lxc/CN/rootfs, even if lxc.rootfs points to a block device.
Finally, lxc.mount points to a file, in fstab format, containing further items to mount. Note that all of these entries will be mounted by the host before the container init is started. In this way it is possible to bind mount various directories from the host into the container.
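For instance, the fstab file referenced by lxc.mount could contain entries like the following, where the bind mount of /home/jdoe from the host is purely illustrative:
proc /var/lib/lxc/CN/rootfs/proc proc nodev,noexec,nosuid 0 0
sysfs /var/lib/lxc/CN/rootfs/sys sysfs defaults 0 0
/home/jdoe /var/lib/lxc/CN/rootfs/home/jdoe none bind 0 0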
Other configuration options
-
lxc.cap.drop can be used to prevent the container from having or ever obtaining the listed capabilities. For instance, including
lxc.cap.drop = sys_admin
will prevent the container from mounting filesystems, as well as all other actions which require cap_sys_admin. See the capabilities(7) manual page for a list of capabilities and their meanings.
-
lxc.aa_profile = lxc-CN-profile specifies a custom Apparmor profile in which to start the container. See Apparmor for more information.
-
lxc.console=/path/to/consolefile will cause console messages to be written to the specified file.
-
lxc.arch specifies the architecture for the container, for instance x86, or x86_64.
-
lxc.tty=5 specifies that 5 consoles (in addition to /dev/console) should be created. That is, consoles will be available on /dev/tty1 through /dev/tty5. The ubuntu templates set this value to 4.
-
lxc.pts=1024 specifies that the container should have a private (Unix98) devpts filesystem mount. If this is not specified, then the container will share /dev/pts with the host, which is rarely desired. The number 1024 means that 1024 ptys should be allowed in the container, however this number is currently ignored. Before starting the container init, LXC will do (essentially) a
sudo mount -t devpts -o newinstance devpts /dev/pts
inside the container. It is important to realize that the container should not mount devpts filesystems of its own. It may safely do bind or move mounts of its mounted /dev/pts. But if it does
sudo mount -t devpts devpts /dev/pts
it will remount the host's devpts instance. If it adds the newinstance mount option, then it will mount a new private (empty) instance. In neither case will it remount the instance which was set up by LXC. For this reason, and to prevent the container from using the host's ptys, the default Apparmor policy will not allow containers to mount devpts filesystems after the container's init has been started.
-
lxc.devttydir specifies a directory under /dev in which LXC will create its console devices. If this option is not specified, then the ptys will be bind-mounted over /dev/console and /dev/ttyN. However, rare package updates may try to blindly rm -f and then mknod those devices. They will fail (because the file has been bind-mounted), causing the package update to fail. When lxc.devttydir is set to LXC, for instance, then LXC will bind-mount the console ptys onto /dev/lxc/console and /dev/lxc/ttyN, and subsequently symbolically link them to /dev/console and /dev/ttyN. This allows the package updates to succeed, at the risk of making future gettys on those consoles fail until the next reboot. This problem will be ideally solved with device namespaces.
-
The lxc.hook. options specify programs to run at various points in a container's life cycle. See Lifecycle management hooks for more information on these hooks. To have multiple hooks called at any point, list them in multiple entries. The possible values, whose precise meanings are described in Lifecycle management hooks, are
-
lxc.hook.pre-start
-
lxc.hook.pre-mount
-
lxc.hook.mount
-
lxc.hook.start
-
lxc.hook.post-stop
-
The lxc.include option specifies another configuration file to be loaded. This allows common configuration sections to be defined once and included by several containers, simplifying updates of the common section.
-
The lxc.seccomp option (introduced with Ubuntu 12.10) specifies a file containing a seccomp policy to load. See Security for more information on seccomp in lxc.
Updates in Ubuntu containers
Because of some limitations which are placed on containers, package upgrades at times can fail. For instance, a package install or upgrade might fail if it is not allowed to create or open a block device. This often blocks all future upgrades until the issue is resolved. In some cases, you can work around this by chrooting into the container, to avoid the container restrictions, and completing the upgrade in the chroot.
Some of the specific things known to occasionally impede package upgrades include:
-
The container modifications performed when creating containers with the --trim option.
-
Actions performed by lxcguest. For instance, because /lib/init/fstab is bind-mounted from another file, mountall upgrades which insist on replacing that file can fail.
-
The over-mounting of console devices with ptys from the host can cause trouble with udev upgrades.
-
Apparmor policy and devices cgroup restrictions can prevent package upgrades from performing certain actions.
-
Capabilities dropped by use of lxc.cap.drop can likewise stop package upgrades from performing certain actions.
Libvirt LXC
Libvirt is a powerful hypervisor management solution with which you can administer Qemu, Xen and LXC virtual machines, both locally and remotely. The libvirt LXC driver is a separate implementation from what we normally call LXC. A few differences include:
-
Configuration is stored in xml format
-
There are no tools to facilitate container creation
-
By default there is no console on /dev/console
-
There is no support (yet) for container reboot or full shutdown
Converting a LXC container to libvirt-lxc
Creating Containers showed how to create LXC containers. If you've created a valid LXC container in this way, you can manage it with libvirt. Fetch a sample xml file from
wget http://people.canonical.com/~serge/o1.xml
Edit this file to replace the container name and root filesystem locations. Then you can define the container with:
virsh -c lxc:/// define o1.xml
Creating a container from cloud image
If you prefer to create a pristine new container just for LXC, you can download an Ubuntu cloud image, extract it, and point a libvirt LXC xml file to it. For instance, find the URL for a root tarball for the latest daily Ubuntu 12.04 LTS cloud image using
url1=`ubuntu-cloudimg-query precise daily $arch --format "%{url}\n"`
url=`echo $url1 | sed -e 's/.tar.gz/-root\0/'`
wget $url
filename=`basename $url`
Extract the downloaded tarball, for instance
mkdir $HOME/c1
cd $HOME/c1
sudo tar zxf $filename
Download the xml template
wget http://people.canonical.com/~serge/o1.xml
In the xml template, replace the name o1 with c1 and the source directory /var/lib/lxc/o1/rootfs with $HOME/c1. Then define the container using
virsh define o1.xml
Interacting with libvirt containers
As we've seen, you can create a libvirt-lxc container using
virsh -c lxc:/// define container.xml
To start a container called container, use
virsh -c lxc:/// start container
To stop a running container, use
virsh -c lxc:/// destroy container
Note that whereas the lxc-destroy command deletes the container, the virsh destroy command stops a running container. To delete the container definition, use
virsh -c lxc:/// undefine container
To get a console to a running container, use
virsh -c lxc:/// console container
Exit the console by simultaneously pressing control and ].
The lxcguest package
In the 11.04 (Natty) and 11.10 (Oneiric) releases of Ubuntu, a package was introduced called lxcguest. An unmodified root image could not be safely booted inside a container, but an image with the lxcguest package installed could be booted as a container, on bare hardware, or in a Xen, kvm, or VMware virtual machine.
As of the 12.04 LTS release, the work previously done by the lxcguest package was pushed into the core packages, and the lxcguest package was removed. As a result, an unmodified 12.04 LTS image can be booted as a container, on bare hardware, or in a Xen, kvm, or VMware virtual machine. To use an older release, the lxcguest package should still be used.
Python api
As of 12.10 (Quantal) a python3-lxc package is available which provides a python module, called lxc, for managing lxc containers. An example python session to create and start an Ubuntu container called C1, then wait until it has been shut down, would look like:
# sudo python3
Python 3.2.3 (default, Aug 28 2012, 08:26:03)
[GCC 4.7.1 20120814 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import lxc
__main__:1: Warning: The python-lxc API isn't yet stable and may change at any point in the future.
>>> c=lxc.Container("C1")
>>> c.create("ubuntu")
True
>>> c.start()
True
>>> c.wait("STOPPED")
True
Debug information for containers started with the python API will be placed in /var/log/lxccontainer.log.
Security
A namespace maps ids to resources. By not providing a container any id with which to reference a resource, the resource can be protected. This is the basis of some of the security afforded to container users. For instance, IPC namespaces are completely isolated. Other namespaces, however, have various leaks which allow privilege to be inappropriately exerted from a container into another container or to the host.
By default, LXC containers are started under an Apparmor policy to restrict some actions. However, while stronger security is a goal for future releases, in 12.04 LTS the goal of the Apparmor policy is not to stop malicious actions but rather to stop accidental harm of the host by the guest. The details of Apparmor integration with lxc are in section Apparmor.
Exploitable system calls
It is a core container feature that containers share a kernel with the host. Therefore, if the kernel contains any exploitable system calls, the container can exploit them as well. Once the container controls the kernel it can fully control any resource known to the host.
Since Ubuntu 12.10 (Quantal) a container can also be constrained by a seccomp filter. Seccomp is a new kernel feature which filters the system calls which may be used by a task and its children. While improved and simplified policy management is expected in the near future, the current policy consists of a simple whitelist of system call numbers. The policy file begins with a version number (which must be 1) on the first line and a policy type (which must be 'whitelist') on the second line. It is followed by a list of numbers, one per line.
In general, to run a full distribution container, a large number of system calls will be needed. However, for application containers it may be possible to reduce the number of available system calls to only a few. Even for system containers running a full distribution, security gains may be had, for instance by removing the 32-bit compatibility system calls in a 64-bit container. See Other configuration options for details of how to configure a container to use seccomp. By default, no seccomp policy is loaded.
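As a sketch, such a whitelist policy file starts with the version and type, followed by allowed system call numbers, one per line; the numbers below are illustrative only, since they vary by architecture:
1
whitelist
0
1
2
3
The container is then pointed at the policy with a configuration line such as lxc.seccomp = /var/lib/lxc/CN/seccomp.policy, where the path is an example.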
Resources
-
The DeveloperWorks article LXC: Linux container tools was an early introduction to the use of containers.
-
The Secure Containers Cookbook demonstrated the use of security modules to make containers more secure.
-
Manual pages referenced above can be found at:
-
The upstream LXC project is hosted at Sourceforge.
-
LXC security issues are listed and discussed at the LXC Security wiki page
-
For more on namespaces in Linux, see: S. Bhattiprolu, E. W. Biederman, S. E. Hallyn, and D. Lezcano. Virtual Servers and Checkpoint/Restart in Mainstream Linux. SIGOPS Operating Systems Review, 42(5), 2008.