24.4. Setting KVM processor affinities
This section covers setting processor and processing core affinities with libvirt and KVM guests.
By default, libvirt provisions guests using the hypervisor's default policy. For most hypervisors, the policy is to run guests on any available processing core or CPU. There are times when an explicit policy may be better, in particular for systems with a NUMA (Non-Uniform Memory Access) architecture. A guest on a NUMA system should be pinned to the cores of a single NUMA node so that its memory allocations are always local to the node it is running on. This avoids cross-node memory transfers, which have less bandwidth and can significantly degrade performance.
On non-NUMA systems some form of explicit placement across the host's sockets, cores and hyperthreads may be more efficient.
The first step in deciding what policy to apply is to determine the host's memory and CPU topology. The virsh nodeinfo command provides information about how many sockets, cores and hyperthreads are attached to the host.
# virsh nodeinfo
CPU model: x86_64
CPU(s): 8
CPU frequency: 1000 MHz
CPU socket(s): 2
Core(s) per socket: 4
Thread(s) per core: 1
NUMA cell(s): 2
Memory size: 8179176 kB
This system has eight CPUs in two sockets; each socket has four cores.
The output also shows that the system has a NUMA architecture. NUMA is more complex and requires more data to interpret accurately. Use the virsh capabilities command to get additional data on the CPU configuration.
# virsh capabilities
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
    </cpu>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>tcp</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='2'>
        <cell id='0'>
          <cpus num='4'>
            <cpu id='0'/>
            <cpu id='1'/>
            <cpu id='2'/>
            <cpu id='3'/>
          </cpus>
        </cell>
        <cell id='1'>
          <cpus num='4'>
            <cpu id='4'/>
            <cpu id='5'/>
            <cpu id='6'/>
            <cpu id='7'/>
          </cpus>
        </cell>
      </cells>
    </topology>
    <secmodel>
      <model>selinux</model>
      <doi>0</doi>
    </secmodel>
  </host>
[ Additional XML removed ]
</capabilities>
The output shows two NUMA nodes (also known as NUMA cells), each containing four logical CPUs (four processing cores). This system has two sockets, so it can be inferred that each socket is a separate NUMA node. For a guest with four virtual CPUs, it would be optimal to lock the guest to physical CPUs 0 to 3, or 4 to 7, to avoid accessing non-local memory, which is significantly slower than accessing local memory.
If a guest requires eight virtual CPUs, then, as each NUMA node only has four physical CPUs, better utilization may be obtained by running a pair of four-vCPU guests and splitting the work between them, rather than using a single eight-vCPU guest. Running across multiple NUMA nodes significantly degrades performance for physical and virtualized tasks.
Locking a guest to a particular NUMA node offers no benefit if that node does not have sufficient free memory for that guest. libvirt stores information on the free memory available on each node. Use the virsh freecell command to display the free memory on all NUMA nodes.
# virsh freecell
0: 2203620 kB
1: 3354784 kB
If a guest requires 3 GB of RAM, the guest should be run on NUMA node (cell) 1. Node 0 only has 2.2 GB free, which is probably not sufficient for certain guests.
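To check a single node, virsh freecell also accepts an optional cell number. The node ID below is the one chosen in this example, and the exact output format may vary between libvirt versions.
# virsh freecell 1
1: 3354784 kB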
Once you have determined which node to run the guest on, refer to the capabilities data (the output of the virsh capabilities command) for the NUMA topology. The following is an extract from the virsh capabilities output.
<topology>
  <cells num='2'>
    <cell id='0'>
      <cpus num='4'>
        <cpu id='0'/>
        <cpu id='1'/>
        <cpu id='2'/>
        <cpu id='3'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='4'>
        <cpu id='4'/>
        <cpu id='5'/>
        <cpu id='6'/>
        <cpu id='7'/>
      </cpus>
    </cell>
  </cells>
</topology>
Observe that node 1, <cell id='1'>, has physical CPUs 4 to 7.
The guest can be locked to a set of CPUs by adding a cpuset attribute to the guest's configuration file.
While the guest is offline, open the configuration file with virsh edit.
Locate where the guest's virtual CPU count is specified. Find the vcpu element.
<vcpu>4</vcpu>
The guest in this example has four virtual CPUs.
Add a cpuset attribute with the CPU numbers for the relevant NUMA cell.
<vcpu cpuset='4-7'>4</vcpu>
Save the configuration file and restart the guest.
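A minimal sketch of one way to do this, assuming the guest is named guest1; configuration changes made with virsh edit only take effect after a full shutdown and start of the guest.
# virsh shutdown guest1
# virsh start guest1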
The guest has been locked to CPUs 4 to 7.
The virt-install provisioning tool provides a simple way to automatically apply a 'best fit' NUMA policy when guests are created.
The cpuset option for virt-install can take a list of processors or the parameter auto. The auto parameter automatically determines the optimal CPU locking using the available NUMA data.
For a NUMA system, use the --cpuset=auto option with the virt-install command when creating new guests.
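A minimal sketch of such an invocation; the guest name, memory size, disk path and installation media are placeholders, and the remaining options should be adjusted to match your environment.
# virt-install --name guest1 --ram 2048 --vcpus 4 --cpuset=auto \
      --disk path=/var/lib/libvirt/images/guest1.img,size=8 \
      --cdrom /path/to/install.iso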
There may be times when modifying CPU affinities on running guests is preferable to rebooting the guest. The virsh vcpuinfo and virsh vcpupin commands can perform CPU affinity changes on running guests.
The virsh vcpuinfo command gives up-to-date information about where each virtual CPU is running.
In this example, guest1 is a guest with four virtual CPUs running on a KVM host.
# virsh vcpuinfo guest1
VCPU: 0
CPU: 3
State: running
CPU time: 0.5s
CPU Affinity: yyyyyyyy
VCPU: 1
CPU: 1
State: running
CPU Affinity: yyyyyyyy
VCPU: 2
CPU: 1
State: running
CPU Affinity: yyyyyyyy
VCPU: 3
CPU: 2
State: running
CPU Affinity: yyyyyyyy
The virsh vcpuinfo output (the yyyyyyyy value of CPU Affinity) shows that the guest can presently run on any CPU.
To lock the virtual CPUs to the second NUMA node (CPUs 4 to 7), run the following commands.
# virsh vcpupin guest1 0 4
# virsh vcpupin guest1 1 5
# virsh vcpupin guest1 2 6
# virsh vcpupin guest1 3 7
The virsh vcpuinfo command confirms the change in affinity.
# virsh vcpuinfo guest1
VCPU: 0
CPU: 4
State: running
CPU time: 32.2s
CPU Affinity: ----y---
VCPU: 1
CPU: 5
State: running
CPU time: 16.9s
CPU Affinity: -----y--
VCPU: 2
CPU: 6
State: running
CPU time: 11.9s
CPU Affinity: ------y-
VCPU: 3
CPU: 7
State: running
CPU time: 14.6s
CPU Affinity: -------y
Information from the KVM processes can also confirm that the guest is now running on the second NUMA node.
# grep pid /var/run/libvirt/qemu/guest1.xml
<domstatus state='running' pid='4907'>
# grep Cpus_allowed_list /proc/4907/task/*/status
/proc/4907/task/4916/status:Cpus_allowed_list: 4
/proc/4907/task/4917/status:Cpus_allowed_list: 5
/proc/4907/task/4918/status:Cpus_allowed_list: 6
/proc/4907/task/4919/status:Cpus_allowed_list: 7