Overview

While etcd was updated from etcd v2 to v3 in a previous release, OpenShift Origin continued using an etcd v2 data model and API for both new and upgraded clusters. Starting with OpenShift Origin 3.6, new installations began using the v3 data model as well, providing improved performance and scalability.

For existing clusters that upgraded to OpenShift Origin 3.6, however, the etcd data must be migrated from v2 to v3 as a post-upgrade step. This must be performed using openshift-ansible version 3.6.173.0.21 or later.

Until OpenShift Origin 3.6, it was possible to deploy a cluster with an embedded etcd. As of OpenShift Origin 3.7, this is no longer possible. See Migrating Embedded etcd to External etcd.

The etcd v2 to v3 data migration is performed offline, which means that all etcd members and master services are stopped for its duration. Large clusters with up to 600 MiB of etcd data can expect a 10 to 15 minute outage of the API, web console, and controllers.

This migration process performs the following steps:

  1. Stop the master API and controller services.

  2. Perform an etcd backup on all etcd members.

  3. Perform a migration on the first etcd host.

  4. Remove etcd data from any remaining etcd hosts.

  5. Perform an etcd scaleup operation adding additional etcd hosts one by one.

  6. Re-introduce TTL information on specific keys.

  7. Reconfigure the masters for etcd v3 storage.

  8. Start the master API and controller services.

Before You Begin

You can only begin the etcd data migration process after upgrading to OpenShift Origin 3.6, as previous versions are not compatible with etcd v3 storage. Additionally, the upgrade to OpenShift Origin 3.6 reconfigures cluster DNS services to run on every node, rather than on the masters, which ensures that, even when master services are taken down, existing pods continue to function as expected.

Older deployments that run embedded etcd with etcd API version v2 must first migrate to external etcd, and only then migrate the data. See Migrating Embedded etcd to External etcd.

Running the Automated Migration Playbook

A migration playbook is provided to automate all aspects of the process; this is the preferred method for performing the migration. You must have access to your existing inventory file with both masters and etcd hosts defined in their separate groups.

  1. Pull the latest subscription data from Red Hat Subscription Manager (RHSM):

    # subscription-manager refresh
  2. To get the latest playbooks, manually disable the OpenShift Origin 3.6 channel and enable the 3.7 channel on the host you are running the migration from:

    # subscription-manager repos --disable="rhel-7-server-ose-3.6-rpms" \
        --enable="rhel-7-server-ose-3.7-rpms" \
        --enable="rhel-7-server-extras-rpms" \
        --enable="rhel-7-fast-datapath-rpms"
    # yum clean all
  3. Run the migrate.yml playbook using your inventory file:

    # ansible-playbook [-i /path/to/inventory] \
        ~/openshift-ansible/playbooks/openshift-etcd/migrate.yml
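
Because the playbook needs both the masters and the etcd hosts in separate inventory groups, it can help to check the inventory before running it. The following is a minimal sketch, not part of openshift-ansible: the function names has_group and check_inventory are hypothetical, and it assumes an INI-style inventory whose groups are named masters and etcd and appear as plain [group] headers.

```shell
#!/bin/sh
# Hypothetical pre-flight check; assumes an INI-style Ansible inventory.

# has_group FILE GROUP: succeed if FILE contains a plain section header [GROUP].
has_group() {
    grep -Eq "^[[:space:]]*\[$2\][[:space:]]*$" "$1"
}

# check_inventory FILE: succeed only if both assumed groups are present.
check_inventory() {
    inv=$1
    missing=0
    for group in masters etcd; do
        if ! has_group "$inv" "$group"; then
            echo "missing group: [$group]" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (not executed here):
#   check_inventory /path/to/inventory && ansible-playbook -i /path/to/inventory ...
```

The check only looks for the section headers; it does not verify that the groups contain hosts.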

Running the Migration Manually

The following procedure describes the steps required to successfully migrate the cluster (implemented as part of the Ansible etcd migration playbook).

  1. Create an etcd backup. See Backup and Restore for steps.

  2. Stop masters and wait for etcd convergence:

    1. Stop all master services:

      # systemctl stop atomic-openshift-master-api atomic-openshift-master-controllers
    2. Before the migration can proceed, the etcd cluster must be healthy and raft indices of all etcd members must differ by one unit at most. At the same time, all etcd members and master daemons must be stopped.

      To check that the etcd cluster is healthy, run:

      # etcdctl <certificate_details> <endpoint> cluster-health (1)
      member 2a3d833935d9d076 is healthy: got healthy result from https://etcd-test-1:2379
      member a83a3258059fee18 is healthy: got healthy result from https://etcd-test-2:2379
      member 22a9f2ddf18fee5f is healthy: got healthy result from https://etcd-test-3:2379
      cluster is healthy
      (1) For <certificate_details>, see Backup and Restore for an example of how to set certificate flags.

      To check the difference between raft indices, run:

      # ETCDCTL_API=3 etcdctl <certificate_details> <endpoint> -w table endpoint status
      +------------------+------------------+---------+---------+-----------+-----------+------------+
      |     ENDPOINT     |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
      +------------------+------------------+---------+---------+-----------+-----------+------------+
      | etcd-test-1:2379 | 2a3d833935d9d076 | 3.1.9   | 25 kB   | false     |       415 |        995 |
      | etcd-test-2:2379 | a83a3258059fee18 | 3.1.9   | 25 kB   | true      |       415 |        995 |
      | etcd-test-3:2379 | 22a9f2ddf18fee5f | 3.1.9   | 25 kB   | false     |       415 |        995 |
      +------------------+------------------+---------+---------+-----------+-----------+------------+

      If the minimum and maximum raft indices across all etcd members differ by more than one unit, wait a minute and run the command again.
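
The raft index comparison in step 2 can be scripted instead of read off the table by eye. The sketch below is one possible approach and assumes the table layout shown in the sample output, with RAFT INDEX as the last column; the function names raft_index_spread and indices_converged are hypothetical.

```shell
#!/bin/sh
# raft_index_spread: read `endpoint status -w table` output on stdin and
# print the difference between the largest and smallest RAFT INDEX.
raft_index_spread() {
    awk -F'|' '
        # Only rows that start with "|" carry cells. The header row is
        # skipped because its last cell is not numeric.
        /^[ \t]*\|/ {
            idx = $(NF - 1)
            gsub(/[ \t]/, "", idx)
            if (idx ~ /^[0-9]+$/) {
                idx += 0
                if (n == 0 || idx < min) min = idx
                if (n == 0 || idx > max) max = idx
                n++
            }
        }
        END {
            if (n == 0) exit 1   # no data rows found
            print max - min
        }
    '
}

# indices_converged: succeed if the spread read from stdin is at most one.
indices_converged() {
    spread=$(raft_index_spread) || return 1
    [ "$spread" -le 1 ]
}

# Example (not executed here):
#   ETCDCTL_API=3 etcdctl <certificate_details> <endpoint> -w table endpoint status \
#       | indices_converged || echo "raft indices have not converged yet"
```

For the sample output above, where every member reports index 995, the spread is 0 and the check succeeds.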

  3. Migrate and scale up etcd:

    The migration should not be run repeatedly, as new v2 data can overwrite v3 data that has already been migrated.

    1. Stop etcd on all etcd hosts:

      # systemctl stop etcd
    2. Run the following command (with the etcd daemon stopped) on your first etcd host to perform the migration:

      # ETCDCTL_API=3 etcdctl migrate --data-dir=/var/lib/etcd

      The --data-dir target can be in a different location depending on the deployment. For example, embedded etcd operates over the /var/lib/origin/openshift.local.etcd directory, and etcd run as a system container operates over the /var/lib/etcd/etcd.etcd directory.

      If the migration succeeds, the command responds with the following message when complete:

      finished transforming keys

      If there is no v2 data, it responds with:

      no v2 keys to migrate
    3. On each remaining etcd host, move the existing member directory to a backup location:

      $ mv /var/lib/etcd/member /var/lib/etcd/member.old
    4. Create a new cluster on the first host:

      # echo "ETCD_FORCE_NEW_CLUSTER=true" >> /etc/etcd/etcd.conf
      # systemctl start etcd
      # sed -i '/ETCD_FORCE_NEW_CLUSTER=true/d' /etc/etcd/etcd.conf
      # systemctl restart etcd
    5. Scale up additional etcd hosts by following the Adding Additional etcd Members documentation.

    6. When the etcdctl migrate command is run without the --no-ttl option, TTL keys are migrated as well. Given that the TTL keys in v2 data are replaced with leases in v3 data, you must attach leases to all migrated TTL keys (with the etcd daemon running).

      After your etcd cluster is back online with all members, re-introduce the TTL information by running the following on the first master:

      $ oadm migrate etcd-ttl --etcd-address=https://<ip_address>:2379 \
          --cacert=/etc/origin/master/master.etcd-ca.crt \
          --cert=/etc/origin/master/master.etcd-client.crt \
          --key=/etc/origin/master/master.etcd-client.key \
          --ttl-keys-prefix '/kubernetes.io/events' \
          --lease-duration 1h
      $ oadm migrate etcd-ttl --etcd-address=https://<ip_address>:2379 \
          --cacert=/etc/origin/master/master.etcd-ca.crt \
          --cert=/etc/origin/master/master.etcd-client.crt \
          --key=/etc/origin/master/master.etcd-client.key \
          --ttl-keys-prefix '/kubernetes.io/masterleases' \
          --lease-duration 10s
      $ oadm migrate etcd-ttl --etcd-address=https://<ip_address>:2379 \
          --cacert=/etc/origin/master/master.etcd-ca.crt \
          --cert=/etc/origin/master/master.etcd-client.crt \
          --key=/etc/origin/master/master.etcd-client.key \
          --ttl-keys-prefix '/openshift.io/oauth/accesstokens' \
          --lease-duration 86400s
      $ oadm migrate etcd-ttl --etcd-address=https://<ip_address>:2379 \
          --cacert=/etc/origin/master/master.etcd-ca.crt \
          --cert=/etc/origin/master/master.etcd-client.crt \
          --key=/etc/origin/master/master.etcd-client.key \
          --ttl-keys-prefix '/openshift.io/oauth/authorizetokens' \
          --lease-duration 500s
      $ oadm migrate etcd-ttl --etcd-address=https://<ip_address>:2379 \
          --cacert=/etc/origin/master/master.etcd-ca.crt \
          --cert=/etc/origin/master/master.etcd-client.crt \
          --key=/etc/origin/master/master.etcd-client.key \
          --ttl-keys-prefix '/openshift.io/leases/controllers' \
          --lease-duration 10s
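
The five oadm migrate etcd-ttl invocations in step 3 differ only in the key prefix and the lease duration, so they can be driven from a single loop. This is a sketch under assumptions: the function name migrate_ttls and the OADM override variable are hypothetical, while the flags, certificate paths, prefixes, and durations are copied from the commands above.

```shell
#!/bin/sh
# migrate_ttls ETCD_ADDRESS: run one `oadm migrate etcd-ttl` per prefix.
# Set OADM=echo (hypothetical override) to print the commands instead of
# running them.
migrate_ttls() {
    etcd_address=$1
    while read -r prefix duration; do
        # </dev/null keeps the command from consuming the list below.
        ${OADM:-oadm} migrate etcd-ttl --etcd-address="$etcd_address" \
            --cacert=/etc/origin/master/master.etcd-ca.crt \
            --cert=/etc/origin/master/master.etcd-client.crt \
            --key=/etc/origin/master/master.etcd-client.key \
            --ttl-keys-prefix "$prefix" \
            --lease-duration "$duration" </dev/null || return 1
    done <<'EOF'
/kubernetes.io/events 1h
/kubernetes.io/masterleases 10s
/openshift.io/oauth/accesstokens 86400s
/openshift.io/oauth/authorizetokens 500s
/openshift.io/leases/controllers 10s
EOF
}

# Example dry run (not executed here):
#   OADM=echo migrate_ttls https://<ip_address>:2379
```

The loop stops at the first failing invocation, so a failure leaves the remaining prefixes untouched and the function can be rerun after the cause is fixed.
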
  4. Reconfigure the master:

    1. After the migration is complete, the master configuration file (the /etc/origin/master/master-config.yaml file by default) must be updated so the master daemons can use the new storage back end:

      kubernetesMasterConfig:
        apiServerArguments:
          storage-backend:
          - etcd3
          storage-media-type:
          - application/vnd.kubernetes.protobuf
    2. Restart the master services:

      # systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers
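
Before restarting, the storage back end setting from step 4 can be checked on each master. The sketch below is a line-oriented check rather than a YAML parser; it assumes the file is laid out as in the example above, with "- etcd3" on the line directly after "storage-backend:". The function name uses_etcd3 is hypothetical.

```shell
#!/bin/sh
# uses_etcd3 FILE: succeed if the line following "storage-backend:" in FILE
# is the list item "- etcd3".
uses_etcd3() {
    awk '
        /^[ \t]*storage-backend:[ \t]*$/ { want = 1; next }
        want {
            found = ($0 ~ /^[ \t]*-[ \t]*etcd3[ \t]*$/)
            exit
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example (not executed here):
#   uses_etcd3 /etc/origin/master/master-config.yaml || echo "masters still use the v2 back end"
```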

Recovering from Migration Issues

If you discover problems after the migration has completed, you may wish to restore from a backup:

  1. Stop the master services:

    # systemctl stop atomic-openshift-master-api atomic-openshift-master-controllers
  2. Remove the storage-backend and storage-media-type keys from the kubernetesMasterConfig.apiServerArguments section in the master configuration file on each master:

    kubernetesMasterConfig:
      apiServerArguments:
       ...
  3. Restore from backups that were taken prior to the migration, located in a timestamped directory under /var/lib/etcd, such as:

    /var/lib/etcd/openshift-backup-pre-migration20170825135732
  4. Restart the master services:

    # systemctl restart atomic-openshift-master-api atomic-openshift-master-controllers
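
When several pre-migration backups exist, the most recent one can be located by name. This is a minimal sketch that assumes every backup directory follows the openshift-backup-pre-migration<timestamp> pattern shown in step 3, with a fixed-width timestamp so that sorting by name also sorts by time; the function name latest_pre_migration_backup is hypothetical.

```shell
#!/bin/sh
# latest_pre_migration_backup [BASE]: print the newest backup directory
# under BASE (default /var/lib/etcd), or fail if none exists.
latest_pre_migration_backup() {
    base=${1:-/var/lib/etcd}
    latest=
    for dir in "$base"/openshift-backup-pre-migration*; do
        [ -d "$dir" ] || continue
        latest=$dir   # glob results are sorted, so the last match is the newest
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example (not executed here):
#   backup=$(latest_pre_migration_backup) && echo "restore from: $backup"
```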