Optimizing storage minimizes storage use across all resources. By optimizing storage, administrators help ensure that existing storage resources work efficiently.
The following table lists the available persistent storage technologies for OpenShift Origin.
Storage type | Description | Examples |
---|---|---|
Block | | CNS/CRS GlusterFS [1], iSCSI, Fibre Channel, Ceph RBD, OpenStack Cinder, AWS EBS [1], Dell/EMC Scale.IO, VMware vSphere Volume, GCE Persistent Disk [1], Azure Disk |
File | | CNS/CRS GlusterFS [1], RHEL NFS, NetApp NFS [2], Azure File, Vendor NFS, Vendor GlusterFS [3], AWS EFS |
Object | | CNS/CRS GlusterFS [1], Ceph Object Storage (RADOS Gateway), OpenStack Swift, Aliyun OSS, AWS S3, Google Cloud Storage, Azure Blob Storage, Vendor S3 [3], Vendor Swift [3] |
[1] As of OpenShift Origin 3.6.1, Container-Native Storage (CNS) GlusterFS (a hyperconverged or cluster-hosted storage solution) and Container-Ready Storage (CRS) GlusterFS (an externally hosted storage solution) provide interfaces for block, file, and object storage for the OpenShift Origin registry, logging, and metrics.
The following table summarizes the recommended and configurable storage technologies for the given OpenShift Origin cluster application.
Storage type | ROX [4] | RWX [5] | Registry | Scaled registry | Metrics | Logging | Apps |
---|---|---|---|---|---|---|---|
Block | Yes [6] | No | Configurable | Not configurable | Recommended | Recommended | Recommended |
File | Yes [6] | Yes | Configurable | Configurable | Configurable | Configurable | Recommended |
Object | Yes | Yes | Recommended | Recommended | Not configurable | Not configurable | Not configurable [7] |
A scaled registry is an OpenShift Origin registry where three or more pod replicas are running.
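The table above can also be read as a lookup from workload to storage type. The following is a minimal sketch of that lookup, with the support levels transcribed from the table and the footnote markers omitted. The names `SUPPORT` and `recommended_storage` are hypothetical and not part of any OpenShift Origin tooling.

```python
# Support levels transcribed from the table above (footnote markers omitted).
# SUPPORT and recommended_storage are illustrative names, not an OpenShift API.
SUPPORT = {
    "Block": {
        "Registry": "Configurable",
        "Scaled registry": "Not configurable",
        "Metrics": "Recommended",
        "Logging": "Recommended",
        "Apps": "Recommended",
    },
    "File": {
        "Registry": "Configurable",
        "Scaled registry": "Configurable",
        "Metrics": "Configurable",
        "Logging": "Configurable",
        "Apps": "Recommended",
    },
    "Object": {
        "Registry": "Recommended",
        "Scaled registry": "Recommended",
        "Metrics": "Not configurable",
        "Logging": "Not configurable",
        "Apps": "Not configurable",
    },
}


def recommended_storage(workload):
    """Return the storage types marked Recommended for the given workload."""
    return [
        storage_type
        for storage_type, levels in SUPPORT.items()
        if levels[workload] == "Recommended"
    ]


if __name__ == "__main__":
    for workload in ("Registry", "Scaled registry", "Metrics", "Logging", "Apps"):
        print(workload, "->", ", ".join(recommended_storage(workload)))
```

The lookup only reports the Recommended entries. The Configurable entries remain valid choices, subject to the caveats described for each workload.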
In a non-scaled/high-availability (HA) OpenShift Origin registry cluster deployment:
The preferred storage technology is object storage followed by block storage. The storage technology does not need to support RWX access mode.
The storage technology must ensure read-after-write consistency. NAS storage (excluding CNS/CRS GlusterFS, which uses an object storage interface) is not recommended for OpenShift Origin registry cluster deployments with production workloads.
While hostPath volumes are configurable for a non-scaled/HA OpenShift Origin registry, they are not recommended for cluster deployment.
Corruption may occur when using NFS to back an OpenShift Origin registry with production workloads.
In a scaled/HA OpenShift Origin registry cluster deployment:
The preferred storage technology is object storage. The storage technology must support RWX access mode and must ensure read-after-write consistency.
File storage and block storage are not recommended for a scaled/HA OpenShift Origin registry cluster deployment with production workloads.
NAS storage (excluding CNS/CRS GlusterFS, which uses an object storage interface) is not recommended for OpenShift Origin registry cluster deployments with production workloads.
Corruption may occur when using NFS to back an OpenShift Origin scaled/HA registry with production workloads.
In an OpenShift Origin hosted metrics cluster deployment:
The preferred storage technology is block storage.
It is not recommended to use NAS storage (excluding CNS/CRS GlusterFS as it uses a block storage interface from iSCSI) for a hosted metrics cluster deployment with production workloads.
Corruption may occur when using NFS to back a hosted metrics cluster deployment with production workloads.
In an OpenShift Origin hosted logging cluster deployment:
The preferred storage technology is block storage.
It is not recommended to use NAS storage (excluding CNS/CRS GlusterFS as it uses a block storage interface from iSCSI) for a hosted logging cluster deployment with production workloads.
Corruption may occur when using NFS to back hosted logging with production workloads.
Application use cases vary from application to application, as described in the following examples:
Storage technologies that support dynamic PV provisioning have low mount time latencies, and are not tied to nodes to support a healthy cluster.
NFS does not guarantee read-after-write consistency and is not recommended for applications which require it.
Applications that depend on writing to the same, shared NFS export may experience issues with production workloads.
OpenShift Origin Internal etcd: For the best etcd reliability, the lowest consistent latency storage technology is preferable.
OpenStack Cinder: OpenStack Cinder tends to be adept in ROX access mode use cases.
Databases: Databases (RDBMSs, NoSQL DBs, etc.) tend to perform best with dedicated block storage.
Container runtimes store images and containers in a graph driver (a pluggable storage technology), such as DeviceMapper and Overlay. Each has advantages and disadvantages.
For more information about Overlay, including supportability and usage caveats, see the Red Hat Enterprise Linux (RHEL) 7 Release Notes.
Name | Description | Benefits | Limitations |
---|---|---|---|
Device Mapper loop-lvm | Uses the Device Mapper thin provisioning module (dm-thin-pool) to implement copy-on-write (CoW) snapshots. For each device mapper graph location, a thin pool is created based on two block devices, one for data and one for metadata. By default, these block devices are created automatically by using loopback mounts of automatically created sparse files. | It works out of the box, so it is useful for prototyping and development purposes. | |
Device Mapper Thin Provisioning | Also uses LVM, Device Mapper, and the dm-thinp kernel module. It differs by removing the loopback device, talking straight to a raw partition (no filesystem). | | |
OverlayFS | Combines a lower (parent) and upper (child) filesystem and a working directory (on the same filesystem as the child). The lower filesystem is the base image, and when you create new containers, a new upper filesystem is created containing the deltas. | | Not POSIX compliant. |
In production environments, using a Logical Volume Management (LVM) thin pool on top of regular block devices (not loop devices) for container images and container root file system storage is recommended.
Using a loop device can degrade performance. While you can still continue to use it, the following warning message is logged:
devmapper: Usage of loopback devices is strongly discouraged for production use. Please use `--storage-opt dm.thinpooldev` or use `man docker` to refer to dm.thinpooldev section.
To ease storage configuration, use the docker-storage-setup utility, which automates many of the configuration details:
If you have a separate disk drive dedicated to Docker storage (for example, /dev/xvdb), add the following to the /etc/sysconfig/docker-storage-setup file:
DEVS=/dev/xvdb
VG=docker_vg
Restart the docker-storage-setup service:
# systemctl restart docker-storage-setup
After the restart, docker-storage-setup sets up a volume group named docker_vg and creates a thin-pool logical volume. Documentation for thin provisioning on RHEL is available in the LVM Administrator Guide. View the newly created volumes with the lsblk command:
# lsblk /dev/xvdb
NAME                                 MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvdb                                 202:16   0   20G  0 disk
└─xvdb1                              202:17   0   10G  0 part
  ├─docker_vg-docker--pool_tmeta     253:0    0   12M  0 lvm
  │ └─docker_vg-docker--pool         253:2    0  6.9G  0 lvm
  └─docker_vg-docker--pool_tdata     253:1    0  6.9G  0 lvm
    └─docker_vg-docker--pool         253:2    0  6.9G  0 lvm
Thin-provisioned volumes are not mounted and have no file system (individual
containers do have an XFS file system), thus they do not show up in |
To verify that Docker is using an LVM thin pool, and to monitor disk space utilization, use the docker info command. The Pool Name corresponds with the VG you specified in /etc/sysconfig/docker-storage-setup:
# docker info | egrep -i 'storage|pool|space|filesystem'
Storage Driver: devicemapper
Pool Name: docker_vg-docker--pool
Pool Blocksize: 524.3 kB
Backing Filesystem: xfs
Data Space Used: 62.39 MB
Data Space Total: 6.434 GB
Data Space Available: 6.372 GB
Metadata Space Used: 40.96 kB
Metadata Space Total: 16.78 MB
Metadata Space Available: 16.74 MB
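For ongoing monitoring, the Data Space fields reported by docker info can be reduced to a single utilization ratio. The following is a minimal sketch and not part of Docker or docker-storage-setup. The function names `parse_size` and `data_space_utilization` are hypothetical. The sketch assumes that sizes are reported in decimal (base-1000) units and that the field names match the sample output above.

```python
# Assumption: docker info reports sizes in decimal units (1 kB = 1000 B).
UNITS = {"B": 1, "kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def parse_size(text):
    """Convert a string such as '62.39 MB' into a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def data_space_utilization(docker_info_output):
    """Return Data Space Used divided by Data Space Total (0.0 to 1.0)."""
    fields = {}
    for line in docker_info_output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    used = parse_size(fields["Data Space Used"])
    total = parse_size(fields["Data Space Total"])
    return used / total


if __name__ == "__main__":
    sample = (
        "Storage Driver: devicemapper\n"
        "Pool Name: docker_vg-docker--pool\n"
        "Data Space Used: 62.39 MB\n"
        "Data Space Total: 6.434 GB\n"
    )
    print("Data space utilization: {:.2%}".format(data_space_utilization(sample)))
```

In practice the input would be the captured output of the docker info command. The sample string here only mirrors the values shown above.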
By default, a thin pool is configured to use 40% of the underlying block device. As you use the storage, LVM automatically extends the thin pool up to 100%. This is why the Data Space Total value does not match the full size of the underlying LVM device. This auto-extend technique was used to unify the storage approach taken in both Red Hat Enterprise Linux and Red Hat Atomic Host, which only uses a single partition.
In development, Docker in Red Hat distributions defaults to a loopback mounted sparse file. To see if your system is using the loopback mode:
# docker info|grep loop0
Data file: /dev/loop0
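The same check can be scripted against captured docker info output. The following is a minimal sketch in which the function name `uses_loopback` is hypothetical. It generalizes the grep above to any /dev/loopN device on the Data file line, which is an assumption that goes beyond the single loop0 case shown.

```python
import re

# Matches a "Data file:" line that points at a loop device such as /dev/loop0.
# Assumption: the field is labeled "Data file" as in the sample output above.
LOOP_DATA_FILE = re.compile(r"^\s*Data file:\s*/dev/loop\d+\s*$", re.MULTILINE)


def uses_loopback(docker_info_output):
    """Return True if the Data file reported by docker info is a loop device."""
    return LOOP_DATA_FILE.search(docker_info_output) is not None


if __name__ == "__main__":
    loopback_sample = "Storage Driver: devicemapper\nData file: /dev/loop0\n"
    thin_pool_sample = "Storage Driver: devicemapper\nPool Name: docker_vg-docker--pool\n"
    print("loopback sample:", uses_loopback(loopback_sample))
    print("thin pool sample:", uses_loopback(thin_pool_sample))
```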
Red Hat strongly recommends using the |
Overlay is also supported for container runtime use cases as of Red Hat Enterprise Linux 7.2, and provides faster start up time and page cache sharing, which can potentially improve density by reducing overall memory utilization.
The default Docker storage configuration on Red Hat Enterprise Linux continues to be DeviceMapper. While the use of Overlay as the container’s storage technology is under evaluation, moving Red Hat Enterprise Linux to Overlay as the default in future releases is under consideration. As of Red Hat Enterprise Linux 7.2, Overlay became a supported graph driver. As of Red Hat Enterprise Linux 7.4, SELinux and the Overlay2 graph driver became a supported combination.
The main advantage of the Overlay file system is Linux page cache sharing among containers sharing an image on the same node. This attribute of Overlay leads to reduced input/output (I/O) during container startup (and, thus, faster container startup time by several hundred milliseconds), as well as reduced memory usage when similar images are running on a node. Both of these results are beneficial in many environments, especially those that optimize for density and have a high container churn rate (such as a build farm), or those that have significant overlap in image content.
Page cache sharing is not possible with DeviceMapper because thin-provisioned devices are allocated on a per-container basis.