# oc label node perf-node.example.com cpumanager=true
CPU Manager is a Technology Preview feature. |
CPU Manager manages groups of CPUs and constrains workloads to specific CPUs.
CPU Manager is useful for workloads that have some of these attributes:
Require as much CPU time as possible.
Are sensitive to processor cache misses.
Are low-latency network applications.
Coordinate with other processes and benefit from sharing a single processor cache.
To set up CPU Manager:
Optionally, label a node:
# oc label node perf-node.example.com cpumanager=true
Enable CPU manager support on the target node:
# cat /etc/origin/node/node-config.yaml ... kubeletArguments: ... feature-gates: - CPUManager=true cpu-manager-policy: - static cpu-manager-reconcile-period: - 5s kube-reserved: (1) - cpu=500m # systemctl restart atomic-openshift-node
1 | kube-reserved is a required setting. The value may need to be adjusted
depending on your environment. |
Create a pod that requests a core or multiple cores. Both limits and requests must have their CPU value set to a whole integer. That is the number of cores that will be dedicated to this pod:
# cat cpumanager.yaml apiVersion: v1 kind: Pod metadata: generateName: cpumanager- spec: containers: - name: cpumanager image: gcr.io/google_containers/pause-amd64:3.0 resources: requests: cpu: 1 memory: "1G" limits: cpu: 1 memory: "1G" nodeSelector: cpumanager: "true"
Create the pod:
# oc create -f cpumanager.yaml
Verify that the pod is scheduled to the node that you labeled:
# oc describe pod cpumanager Name: cpumanager-4gdtn Namespace: test Node: perf-node.example.com/172.31.62.105 ... Limits: cpu: 1 memory: 1G Requests: cpu: 1 memory: 1G ... QoS Class: Guaranteed Node-Selectors: cpumanager=true region=primary
Verify that the cgroups
are set up correctly. Get the PID of the pause process:
# systemd-cgls -l ├─1 /usr/lib/systemd/systemd --system --deserialize 20 ├─kubepods.slice │ ├─kubepods-pod0ec1ab8b_e1c4_11e7_bb22_027b30990a24.slice │ │ ├─docker-b24e29bc4021064057f941dc5f3538595c317d294f2c8e448b5e61a29c026d1c.scope │ │ │ └─44216 /pause
Pods of QoS tier Guaranteed
are placed within the kubepods.slice
. Pods of other
QoS tiers end up in child cgroups
of kubepods
.
# cd /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-pod0ec1ab8b_e1c4_11e7_bb22_027b30990a24.slice/docker-b24e29bc4021064057f941dc5f3538595c317d294f2c8e448b5e61a29c026d1c.scope # for i in `ls cpuset.cpus tasks` ; do echo -n "$i "; cat $i ; done cpuset.cpus 2 tasks 44216
Check the allowed CPU list for the task:
# grep ^Cpus_allowed_list /proc/44216/status Cpus_allowed_list: 2
Verify that another pod (in this case, the pod in the burstable
QoS tier) on
the system can not run on the core allocated for the Guaranteed
pod:
# cat /sys/fs/cgroup/cpuset/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podbe76ff22_dead_11e7_b99e_027b30990a24.slice/docker-da621bea7569704fc39f84385a179923309ab9d832f6360cccbff102e73f9557.scope/cpuset.cpus 0-1,3
# oc describe node perf-node.example.com ... Capacity: cpu: 4 memory: 16266720Ki pods: 40 Allocatable: cpu: 3500m memory: 16164320Ki pods: 40 --- Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits --------- ---- ------------ ---------- --------------- ------------- test cpumanager-4gdtn 1 (28%) 1 (28%) 1G (6%) 1G (6%) test cpumanager-hczts 1 (28%) 1 (28%) 1G (6%) 1G (6%) test cpumanager-r9wrq 1 (28%) 1 (28%) 1G (6%) 1G (6%) ... Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) CPU Requests CPU Limits Memory Requests Memory Limits ------------ ---------- --------------- ------------- 3 (85%) 3 (85%) 5437500k (32%) 9250M (55%)
This VM has four CPU cores. You set kube-reserved
to 500 millicores, meaning
half of one core is subtracted from the total capacity of the node to arrive at
the Node Allocatable
amount.
You can see that Allocatable CPU
is 3500 millicores. This means we can run three of
our CPU manager pods since each will take one whole core. A whole core is
equivalent to 1000 millicores.
If you try to schedule a fourth pod, the system will accept the pod, but it will never be scheduled:
# oc get pods --all-namespaces |grep test test cpumanager-4gdtn 1/1 Running 0 8m test cpumanager-hczts 1/1 Running 0 8m test cpumanager-nb9d5 0/1 Pending 0 8m test cpumanager-r9wrq 1/1 Running 0 8m