Overview

Managing storage is a distinct problem from managing compute resources. OpenShift Origin leverages the Kubernetes persistent volume (PV) framework to allow administrators to provision persistent storage for a cluster. Using persistent volume claims (PVCs), developers can request PV resources without having specific knowledge of the underlying storage infrastructure.

PVCs are specific to a project and are created and used by developers as a means to use a PV. PV resources on their own are not scoped to any single project; they can be shared across the entire OpenShift Origin cluster and claimed from any project. After a PV has been bound to a PVC, however, that PV cannot then be bound to additional PVCs. This has the effect of scoping a bound PV to a single namespace (that of the binding project).

PVs are defined by a PersistentVolume API object, which represents a piece of existing, networked storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plug-ins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. PV objects capture the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.

High availability of storage in the infrastructure is left to the underlying storage provider.

PVCs are defined by a PersistentVolumeClaim API object, which represents a request for storage by a developer. It is similar to a pod in that pods consume node resources and PVCs consume PV resources. For example, pods can request specific levels of resources (e.g., CPU and memory), while PVCs can request specific storage capacity and access modes (e.g, they can be mounted once read/write or many times read-only).

Lifecycle of a Volume and Claim

PVs are resources in the cluster. PVCs are requests for those resources and also act as claim checks to the resource. The interaction between PVs and PVCs have the following lifecycle.

Provisioning

In response to requests from a developer defined in a PVC, a cluster administrator configures one or more dynamic provisioners that provision storage and a matching PV.

Alternatively, a cluster administrator can create a number of PVs in advance, which carry the details of the real storage that is available for use by cluster users. PVs exist in the API and are available for consumption.

Binding

A user creates a PersistentVolumeClaim with a specific amount of storage requested and with certain access modes and optionally a StorageClass. A control loop in the master watches for new PVCs. It either finds a matching PV or waits for a provisioner for the StorageClass to create one, then binds them together.

The user will always get at least what they asked for, but the volume might be in excess of what was requested. This is especially true with manually provisioned PVs. To minimize the excess, OpenShift Origin binds to the smallest PV that matches all other criteria.

Claims remain unbound indefinitely if a matching volume does not exist or cannot be created with any available provisioner servicing a StorageClass. Claims are bound as matching volumes become available. For example, a cluster with many manually provisioned 50Gi volumes would not match a PVC requesting 100Gi. The PVC can be bound when a 100Gi PV is added to the cluster.

Using

Pods use claims as volumes. The cluster inspects the claim to find the bound volume and mounts that volume for a pod. For those volumes that support multiple access modes, the user specifies which mode is desired when using their claim as a volume in a pod.

Once a user has a claim and that claim is bound, the bound PV belongs to the user for as long as they need it. Users schedule pods and access their claimed PVs by including a persistentVolumeClaim in their pod’s volumes block. See below for syntax details.

Persistent Volume Claim Protection

PVC protection is an alpha feature and may change in a future release of OpenShift Origin.

The purpose of PVC protection is to ensure that PVCs in active use by a pod are not removed from the system, as this may result in data loss.

A PVC is in active use by a pod when the the pod status is Pending, and the pod is assigned to a node or the pod status is Running.

When the PVC protection feature is enabled, if a user deletes a PVC in active use by a pod, the PVC is not immediately removed. PVC removal is postponed until the PVC is no longer actively used by any pods.

You can see that a PVC is protected when the PVC’s status is Terminating and the Finalizers list includes kubernetes.io/pvc-protection:

kubectl describe pvc hostpath
Name:          hostpath
Namespace:     default
StorageClass:  example-hostpath
Status:        Terminating
Volume:
Labels:        <none>
Annotations:   volume.beta.kubernetes.io/storage-class=example-hostpath
               volume.beta.kubernetes.io/storage-provisioner=example.com/hostpath
Finalizers:    [kubernetes.io/pvc-protection]
...

To enable PVC protection, see Configuring Persistent Volume Claim Protection.

Releasing

When a user is done with a volume, they can delete the PVC object from the API which allows reclamation of the resource. The volume is considered "released" when the claim is deleted, but it is not yet available for another claim. The previous claimant’s data remains on the volume which must be handled according to policy.

Reclaiming

The reclaim policy of a PersistentVolume tells the cluster what to do with the volume after it is released. Volumes reclaim policy can either be Retained, Recycled, or Deleted.

Retained reclaim policy allows manual reclamation of the resource for those volume plug-ins that support it. Deleted reclaim policy deletes both the PersistentVolume object from OpenShift Origin and the associated storage asset in external infrastructure, such as AWS EBS, GCE PD, or Cinder volume.

Volumes that were dynamically provisioned are always deleted.

Recycling

If supported by appropriate volume plug-in, recycling performs a basic scrub (rm -rf /thevolume/*) on the volume and makes it available again for a new claim.

The recycle reclaim policy is deprecated in favor of dynamic provisioning and is removed starting in OpenShift Origin 3.6.

You can configure a custom recycler pod template using the controller manager command line arguments as described in the ControllerArguments section. The custom recycler pod template must contain a volumes specification, as shown in the example below:

apiVersion: v1
kind: Pod
metadata:
  name: pv-recycler-
  namespace: default
spec:
  restartPolicy: Never
  volumes:
  - name: nfsvol
    nfs:
      server: any-server-it-will-be-replaced (1)
      path: any-path-it-will-be-replaced (1)
  containers:
  - name: pv-recycler
    image: "gcr.io/google_containers/busybox"
    command: ["/bin/sh", "-c", "test -e /scrub && rm -rf /scrub/..?* /scrub/.[!.]* /scrub/*  && test -z \"$(ls -A /scrub)\" || exit 1"]
    volumeMounts:
    - name: nfsvol
      mountPath: /scrub
1 However, the particular server and path values specified in the custom recycler pod template in the volumes part is replaced with the particular corresponding values from the PV that is being recycled.

Persistent Volumes

Each PV contains a spec and status, which is the specification and status of the volume.

Persistent Volume Object Definition
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: pv0003
  spec:
    capacity:
      storage: 5Gi
    accessModes:
      - ReadWriteOnce
    persistentVolumeReclaimPolicy: Recycle
    nfs:
      path: /tmp
      server: 172.17.0.2

Types of Persistent Volumes

OpenShift Origin supports the following PersistentVolume plug-ins:

Capacity

Generally, a PV has a specific storage capacity. This is set using the PV’s capacity attribute.

Currently, storage capacity is the only resource that can be set or requested. Future attributes may include IOPS, throughput, etc.

Access Modes

A PersistentVolume can be mounted on a host in any way supported by the resource provider. Providers have different capabilities and each PV’s access modes are set to the specific modes supported by that particular volume. For example, NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV’s capabilities.

Claims are matched to volumes with similar access modes. The only two matching criteria are access modes and size. A claim’s access modes represent a request. Therefore, the user may be granted more, but never less. For example, if a claim requests RWO, but the only volume available was an NFS PV (RWO+ROX+RWX), the claim would match NFS because it supports RWO.

Direct matches are always attempted first. The volume’s modes must match or contain more modes than you requested. The size must be greater than or equal to what is expected. If two types of volumes (NFS and iSCSI, for example) both have the same set of access modes, then either of them will match a claim with those modes. There is no ordering between types of volumes and no way to choose one type over another.

All volumes with the same modes are grouped, then sorted by size (smallest to largest). The binder gets the group with matching modes and iterates over each (in size order) until one size matches.

The access modes are:

Access Mode CLI Abbreviation Description

ReadWriteOnce

RWO

The volume can be mounted as read-write by a single node.

ReadOnlyMany

ROX

The volume can be mounted read-only by many nodes.

ReadWriteMany

RWX

The volume can be mounted as read-write by many nodes.

A volume’s AccessModes are descriptors of the volume’s capabilities. They are not enforced constraints. The storage provider is responsible for runtime errors resulting from invalid use of the resource.

For example, a GCE Persistent Disk has AccessModes ReadWriteOnce and ReadOnlyMany. The user must mark their claims as read-only if they want to take advantage of the volume’s ability for ROX. Errors in the provider show up at runtime as mount errors.

iSCSI and Fibre Channel volumes do not have any fencing mechanisms yet. You must ensure the volumes are only used by one node at a time. In certain situations, such as draining a node, the volumes may be used simultaneously by two nodes. Before draining the node, first ensure the pods that use these volumes are deleted.

The table below lists the access modes supported by different PVs:

Table 1. Supported Access Modes for Persistent Volumes
Volume Plug-in ReadWriteOnce ReadOnlyMany ReadWriteMany

AWS EBS

-
-

Azure Disk

-
-

Ceph RBD

-

Fibre Channel

-

GCE Persistent Disk

-
-

GlusterFS

HostPath

-
-

iSCSI

-

NFS

Openstack Cinder

-
-

VMWare vSphere

-
-

Local

-
-

Reclaim Policy

The current reclaim policies are:

Reclaim Policy Description

Retain

Manual reclamation

Recycle

Basic scrub (e.g, rm -rf /<volume>/*)

Currently, only NFS and HostPath support the 'Recycle' reclaim policy.

The recycle reclaim policy is deprecated in favor of dynamic provisioning and is removed starting in OpenShift Origin 3.6.

Phase

A volumes can be found in one of the following phases:

Phase Description

Available

A free resource that is not yet bound to a claim.

Bound

The volume is bound to a claim.

Released

The claim has been deleted, but the resource is not yet reclaimed by the cluster.

Failed

The volume has failed its automatic reclamation.

The CLI shows the name of the PVC bound to the PV.

Mount Options

Mount Options is a Technology Preview feature and it is only available for manually provisioned PVs.

You can specify mount options while mounting a PV by using the annotation volume.beta.kubernetes.io/mount-options.

For example:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
  annotations:
    volume.beta.kubernetes.io/mount-options: rw,nfsvers=4,noexec (1)
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  nfs:
    path: /tmp
    server: 172.17.0.2
  persistentVolumeReclaimPolicy: Recycle
  claimRef:
    name: claim1
    namespace: default
1 Specified mount options are then used while mounting the PV to the disk.

The following PV types support mount options:

  • NFS

  • GlusterFS

  • Ceph RBD

  • OpenStack Cinder

  • AWS Elastic Block Store (EBS)

  • GCE Persistent Disk

  • iSCSI

  • Azure Disk

  • Azure File

  • VMWare vSphere

Fibre Channel and HostPath PVs do not support mount options.

Persistent Volume Claims

Each PVC contains a spec and status, which is the specification and status of the claim.

Persistent Volume Claim Object Definition
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: gold

Storage Class

Claims can optionally request a specific StorageClass by specifying its name in the storageClassName attribute. Only PVs of the requested class, ones with the same storageClassName as the PVC, can be bound to the PVC. The cluster administrator can configure dynamic provisioners to service one or more storage classes. They create a PV on demand that matches the specifications in the PVC, if they are able.

The cluster administrator can also set a default StorageClass for all PVCs. When a default storage class is configured, the PVC must explicitly ask for StorageClass or storageClassName annotations set to "" to get bound to a PV with a no storage class.

Access Modes

Claims use the same conventions as volumes when requesting storage with specific access modes.

Resources

Claims, like pods, can request specific quantities of a resource. In this case, the request is for storage. The same resource model applies to both volumes and claims.

Claims As Volumes

Pods access storage by using the claim as a volume. Claims must exist in the same namespace as the pod using the claim. The cluster finds the claim in the pod’s namespace and uses it to get the PersistentVolume backing the claim. The volume is then mounted to the host and into the pod:

kind: Pod
apiVersion: v1
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: dockerfile/nginx
      volumeMounts:
      - mountPath: "/var/www/html"
        name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim

Block Volume Support

Block Volume Support is a Technology Preview feature and it is only available for manually provisioned PVs.

You can statically provision raw block volumes by including some new API fields in your PV and PVC specifications.

Example Persistent Volume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: block-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  volumeMode: Block (1)
  persistentVolumeReclaimPolicy: Retain
  fc:
    targetWWNs: ["50060e801049cfd1"]
    lun: 0
    readOnly: false
1 volumeMode field indicating that this PV is a raw block volume.
Example Persistent Volume Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: block-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block (1)
  resources:
    requests:
      storage: 10Gi
1 volumeMode field indicating that a raw block PV is requested.
Example Pod Specification
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-block-volume
spec:
  containers:
    - name: fc-container
      image: fedora:26
      command: ["/bin/sh", "-c"]
      args: [ "tail -f /dev/null" ]
      volumeDevices:  (1)
        - name: data
          devicePath: /dev/xvda (2)
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: block-pvc (3)
1 volumeDevices (similar to volumeMounts) is used for block devices and can only be used with PersistentVolumeClaim sources.
2 devicePath (similar to mountPath) represents the path to the physical device.
3 The volume source must be of type persistentVolumeClaim and should match the name of the PVC as expected.
Table 2. Accepted Values for VolumeMode
Value Default

Filesystem

Yes

Block

No

Table 3. Binding Scenarios for Block Volumes
PV VolumeMode PVC VolumeMode Binding Result

Filesystem

Filesystem

Bind

Unspecified

Unspecified

Bind

Filesystem

Unspecified

Bind

Unspecified

Filesystem

Bind

Block

Block

Bind

Unspecified

Block

No Bind

Block

Unspecified

No Bind

Filesystem

Block

No Bind

Block

Filesystem

No Bind

Unspecified values result in the default value of Filesystem.

Table 4. Status of Upstream Plug-ins That Support or Will Support Block Volumes
Plug-in Support Block Volume

Fibre Channel

Merged Kube 1.9

Ceph RBD

Merged Kube 1.10

iSCSI

InProgress Kube 1.10

AWS EBS

InProgress Kube 1.10

GCE PD

InProgress Kube 1.10

GlusterFS

InProgress Kube 1.10