Chapter 3. Subsystems and Tunable Parameters

3.1. blkio
3.2. cpu
3.3. cpuacct
3.4. cpuset
3.5. devices
3.6. freezer
3.7. memory
3.8. net_cls
3.9. ns
3.10. Additional Resources
Subsystems are kernel modules that are aware of control groups. Typically, they are resource controllers that allocate varying levels of system resources to different control groups. However, subsystems could be programmed for any other interaction with the kernel where the need exists to treat different groups of processes differently. The application programming interface (API) to develop new subsystems is documented in cgroups.txt in the kernel documentation, installed on your system at /usr/share/doc/kernel-doc-kernel-version/Documentation/cgroups/. The latest version of the cgroups documentation is also available online at http://www.kernel.org/doc/Documentation/cgroups/cgroups.txt. Note, however, that the features in the latest documentation might not match those available in the kernel installed on your system.
State objects that contain the subsystem parameters for a control group are represented as pseudofiles within the control group's virtual file system. These pseudofiles can be manipulated by shell commands or their equivalent system calls. For example, cpuset.cpus is a pseudofile that specifies which CPUs a control group is permitted to access. Suppose that /cgroup/cpuset/webserver is a control group for the web server that runs on a system, and that we run the following command:
~]# echo 0,2 > /cgroup/cpuset/webserver/cpuset.cpus
The value 0,2 is written to the cpuset.cpus pseudofile and therefore limits any tasks whose PIDs are listed in /cgroup/cpuset/webserver/tasks to use only CPU 0 and CPU 2 on the system.
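The write-then-verify pattern above can be wrapped in a small helper. The following is a minimal sketch, not part of the cgroup tooling: the function name write_param is hypothetical, and it assumes that the control group directory already exists and that you have permission to write to it.

```shell
# write_param DIR NAME VALUE   (hypothetical helper, for illustration only)
# Writes VALUE to the pseudofile NAME in the control group directory DIR,
# then prints the value read back from that pseudofile.
write_param() {
    dir=$1 name=$2 value=$3
    if [ ! -d "$dir" ]; then
        echo "no such control group: $dir" >&2
        return 1
    fi
    printf '%s\n' "$value" > "$dir/$name" || return 1
    cat "$dir/$name"
}

# Example, run as root with the cpuset hierarchy mounted at /cgroup/cpuset:
#   write_param /cgroup/cpuset/webserver cpuset.cpus 0,2
```

Reading the value back is a cheap way to confirm that the kernel accepted the write. The kernel may print the value in a different but equivalent form from the one that was written.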

3.1. blkio

The blkio subsystem controls and monitors access to I/O on block devices by tasks in control groups. Writing values to some of the subsystem's pseudofiles limits access or bandwidth, and reading values from others provides information on I/O operations.
blkio.weight
specifies the relative proportion (weight) of block I/O access available by default to a control group, in the range 100 to 1000. This value is overridden for specific devices by the blkio.weight_device parameter. For example, to assign a default weight of 500 to a control group for access to block devices, run:
echo 500 > blkio.weight
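Because the weight must fall in the range 100 to 1000, a script might validate the value before writing it. The following is a minimal sketch under that assumption; the function name set_blkio_weight is hypothetical, and the control group directory is passed in as an argument.

```shell
# set_blkio_weight DIR WEIGHT   (hypothetical helper, for illustration only)
# Checks that WEIGHT is an integer from 100 to 1000 before writing it to
# the blkio.weight pseudofile in the control group directory DIR.
set_blkio_weight() {
    dir=$1 weight=$2
    case $weight in
        ''|*[!0-9]*)
            echo "weight must be an integer: $weight" >&2
            return 1 ;;
    esac
    if [ "$weight" -lt 100 ] || [ "$weight" -gt 1000 ]; then
        echo "weight out of range (100-1000): $weight" >&2
        return 1
    fi
    printf '%s\n' "$weight" > "$dir/blkio.weight"
}

# Example, run from inside the control group directory:
#   set_blkio_weight . 500
```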
blkio.weight_device
specifies the relative proportion (weight) of I/O access on specific devices available to a control group, in the range 100 to 1000. The value of this parameter overrides the value of blkio.weight for the devices specified. Values take the format major:minor weight, where major and minor are device types and node numbers specified in Linux Allocated Devices, otherwise known as the Linux Devices List and available from http://www.kernel.org/doc/Documentation/devices.txt. For example, to assign a weight of 500 to a control group for access to /dev/sda, run:
echo 8:0 500 > blkio.weight_device
In the Linux Allocated Devices notation, 8:0 represents /dev/sda.
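Rather than looking up the numbers in the Linux Allocated Devices list, you can usually query them from the device node itself. The following sketch assumes the stat command from GNU coreutils, which prints the major and minor numbers in hexadecimal; the function name dev_numbers is hypothetical.

```shell
# dev_numbers DEVICE   (hypothetical helper, for illustration only)
# Prints the major:minor pair of a device node in decimal.
dev_numbers() {
    # With GNU stat, %t is the major and %T the minor number, both in hex.
    hex=$(stat -c '%t %T' "$1") || return 1
    # Split on the space and convert each hexadecimal half to decimal.
    printf '%d:%d\n' "0x${hex% *}" "0x${hex#* }"
}

# Example, run as root from inside the control group directory:
#   echo "$(dev_numbers /dev/sda) 500" > blkio.weight_device
```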
blkio.time
reports the time that a control group had I/O access to specific devices. Entries have three fields: major, minor, and time. Major and minor are device types and node numbers specified in Linux Allocated Devices, and time is the length of time in milliseconds (ms).
blkio.sectors
reports the number of sectors transferred to or from specific devices by a control group. Entries have three fields: major, minor, and sectors. Major and minor are device types and node numbers specified in Linux Allocated Devices, and sectors is the number of disk sectors.
blkio.io_service_bytes
reports the number of bytes transferred to or from specific devices by a control group. Entries have four fields: major, minor, operation, and bytes. Major and minor are device types and node numbers specified in Linux Allocated Devices, operation represents the type of operation (read, write, sync, or async) and bytes is the number of bytes transferred.
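As an illustration of how such a report might be processed, the following sketch totals the bytes for each operation type across all devices. It assumes only that each entry ends with the operation followed by the value, so it does not depend on how the device numbers are printed at the start of the line; the function name sum_by_operation is hypothetical.

```shell
# sum_by_operation   (hypothetical helper, for illustration only)
# Reads report entries on standard input and prints one line per
# operation type with the sum of the values for that operation.
sum_by_operation() {
    awk 'NF >= 2 { total[$(NF-1)] += $NF }
         END { for (op in total) printf "%s %.0f\n", op, total[op] }' | sort
}

# Example, run from inside the control group directory:
#   sum_by_operation < blkio.io_service_bytes
```

The same helper works for any report whose entries end with an operation and a number, such as blkio.io_serviced.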
blkio.io_serviced
reports the number of I/O operations performed on specific devices by a control group. Entries have four fields: major, minor, operation, and number. Major and minor are device types and node numbers specified in Linux Allocated Devices, operation represents the type of operation (read, write, sync, or async) and number represents the number of operations.
blkio.io_service_time
reports the total time between request dispatch and request completion for I/O operations on specific devices by a control group. Entries have four fields: major, minor, operation, and time. Major and minor are device types and node numbers specified in Linux Allocated Devices, operation represents the type of operation (read, write, sync, or async) and time is the length of time in nanoseconds (ns). The time is reported in nanoseconds rather than a larger unit so that this report is meaningful even for solid-state devices.
blkio.io_wait_time
reports the total time I/O operations on specific devices by a control group spent waiting for service in the scheduler queues. When you interpret this report, note:
  • the time reported can be greater than the total time elapsed, because the time reported is the cumulative total of all I/O operations for the control group rather than the time that the control group itself spent waiting for I/O operations. To find the time that the group as a whole has spent waiting, use blkio.group_wait_time.
  • if the device has a queue_depth > 1, the time reported only includes the time until the request is dispatched to the device, not any time spent waiting for service while the device re-orders requests.
Entries have four fields: major, minor, operation, and time. Major and minor are device types and node numbers specified in Linux Allocated Devices, operation represents the type of operation (read, write, sync, or async) and time is the length of time in nanoseconds (ns). The time is reported in nanoseconds rather than a larger unit so that this report is meaningful even for solid-state devices.
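Because these times are cumulative totals in nanoseconds, they are often easier to read as an average per operation in milliseconds. The following sketch divides a total time (taken, for example, from blkio.io_service_time or blkio.io_wait_time) by an operation count (taken from blkio.io_serviced); the function name avg_ms is hypothetical.

```shell
# avg_ms TOTAL_NS COUNT   (hypothetical helper, for illustration only)
# Prints the average time per operation in milliseconds, or "n/a" when
# no operations have been counted.
avg_ms() {
    awk -v ns="$1" -v n="$2" 'BEGIN {
        if (n <= 0) { print "n/a"; exit }
        # 1 ms = 1000000 ns
        printf "%.3f\n", ns / n / 1000000
    }'
}

# Example: 2500000 ns spread over 5 operations
#   avg_ms 2500000 5
```

In this example, 2500000 ns divided by 5 operations is 500000 ns per operation, which the function prints as 0.500.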
blkio.io_merged
reports the number of bio (block I/O) requests merged into requests for I/O operations by a control group. Entries have two fields: number and operation. Number is the number of requests, and operation represents the type of operation (read, write, sync, or async).
blkio.io_queued
reports the number of requests queued for I/O operations by a control group. Entries have two fields: number and operation. Number is the number of requests, and operation represents the type of operation (read, write, sync, or async).
blkio.avg_queue_size
reports the average queue size for I/O operations by a control group, over the entire length of time of the group's existence. The queue size is sampled every time a queue for this control group gets a timeslice. Note that this report is available only if CONFIG_DEBUG_BLK_CGROUP=y is set on the system.
blkio.group_wait_time
reports the total time (in nanoseconds — ns) a control group spent waiting for a timeslice for one of its queues. The report is updated every time a queue for this control group gets a timeslice, so if you read this pseudofile while the control group is waiting for a timeslice, the report will not contain time spent waiting for the operation currently queued. Note that this report is available only if CONFIG_DEBUG_BLK_CGROUP=y is set on the system.
blkio.empty_time
reports the total time (in nanoseconds — ns) a control group spent without any pending requests. The report is updated every time a queue for this control group has a pending request, so if you read this pseudofile while the control group has no pending requests, the report will not contain time spent in the current empty state. Note that this report is available only if CONFIG_DEBUG_BLK_CGROUP=y is set on the system.
blkio.idle_time
reports the total time (in nanoseconds — ns) the scheduler spent idling for a control group in anticipation of a better request than those requests already in other queues or from other groups. The report is updated every time the group is no longer idling, so if you read this pseudofile while the control group is idling, the report will not contain time spent in the current idling state. Note that this report is available only if CONFIG_DEBUG_BLK_CGROUP=y is set on the system.
blkio.dequeue
reports the number of times requests for I/O operations by a control group were dequeued by specific devices. Entries have three fields: major, minor, and number. Major and minor are device types and node numbers specified in Linux Allocated Devices, and number is the number of times the group was dequeued. Note that this report is available only if CONFIG_DEBUG_BLK_CGROUP=y is set on the system.
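Several of the reports above are available only if CONFIG_DEBUG_BLK_CGROUP=y is set on the system. One way to check is to search the configuration file of the running kernel. The following sketch assumes that the file is installed as /boot/config-kernel-version, which is common but not guaranteed; the function name has_blk_cgroup_debug is hypothetical, and a different file can be passed as an argument.

```shell
# has_blk_cgroup_debug [CONFIG_FILE]   (hypothetical helper, for illustration only)
# Succeeds if the kernel configuration file sets CONFIG_DEBUG_BLK_CGROUP=y.
# The default path is an assumption about where the distribution installs it.
has_blk_cgroup_debug() {
    config=${1:-/boot/config-$(uname -r)}
    grep -q '^CONFIG_DEBUG_BLK_CGROUP=y$' "$config" 2>/dev/null
}

# Example:
#   if has_blk_cgroup_debug; then cat blkio.avg_queue_size; fi
```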
blkio.reset_stats
resets the statistics recorded in the other pseudofiles. Write an integer to this file to reset the statistics for this cgroup.