Sometimes, things just don't go right. An incident is never planned, by definition.
In this section, we will review how to manage your cloud after a disaster, and how to easily back up the persistent storage volumes, which is another approach when you face a disaster. Even apart from the disaster scenario, backups ARE mandatory.
For reference, you can find a DRP definition here: http://en.wikipedia.org/wiki/Disaster_Recovery_Plan.
A disaster could happen to several components of your architecture: a disk crash, a network outage, a power cut, and so on. In this example, we assume the following setup:
A cloud controller (nova-api, nova-objectstore, nova-network)
A compute node (nova-compute)
A Storage Area Network (SAN) used by cinder-volumes
The example disaster will be the worst one: a power loss. That power loss applies to all three components. Let's see what runs and how it runs before the crash:
From the SAN to the cloud controller, we have an active iSCSI session (used for the "cinder-volumes" LVM volume group).
From the cloud controller to the compute node, we also have active iSCSI sessions (managed by cinder-volume).
For every volume, an iSCSI session is made (so 14 EBS volumes equal 14 sessions).
From the cloud controller to the compute node, we also have iptables/ebtables rules, which allow access from the cloud controller to the running instance.
And lastly, saved in the database, we have the current state of the instances (in this case "running") and their volume attachments (mountpoint, volume id, volume status, and so on). A quick way to check these elements is shown just after this list.
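As a reference, here is a minimal sketch of how to check these elements from the cloud controller, assuming the open-iscsi tools and the nova client are installed and configured:
# iscsiadm -m session
$ nova list
The first command lists the active iSCSI sessions, and the second shows the state of every instance as nova knows it.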
Now, after the power loss occurs and all hardware components restart, the situation is as follows:
From the SAN to the cloud controller, the iSCSI session no longer exists.
From the cloud controller to the compute node, the iSCSI sessions no longer exist.
From the cloud controller to the compute node, the iptables and ebtables rules are recreated, since, at boot, nova-network reapplies the configurations.
From the cloud controller, instances turn into a shutdown state (because they are no longer running).
In the database, the data was not updated at all, since nova could not have anticipated the crash (see the query sketch just after this list).
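To see that stale state for yourself, a minimal check (assuming direct MySQL access to the cinder database, as used for the cleanup queries further down) would be:
mysql> use cinder;
mysql> select id, status, attach_status, mountpoint, instance_id from volumes;
The volumes are still reported as attached, even though the iSCSI sessions behind them are gone.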
Before going further, and in order to prevent the administrator from making fatal mistakes, note that the instances won't be lost: since no "destroy" or "terminate" command was invoked, the files for the instances remain on the compute node.
The plan is to perform the following tasks, in that exact order. Any extra step would be dangerous at this stage:
We need to get the current relation from a volume to its instance, since we will recreate the attachment.
We need to update the database in order to clean the stale state. (After that, we won't be able to perform the first step.)
We need to restart the instances (so that they go from a "shutdown" to a "running" state).
After the restart, we can reattach the volumes to their respective instances.
That step, which is not mandatory, consists of SSHing into the instances in order to reboot them.
Instance to Volume relation
We need to get the current relation from a volume to its instance, since we will recreate the attachment:
This relation can be found by running nova volume-list (note that the nova client includes the ability to get volume information from cinder).
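Since the attachment script shown further down reads these relations from a file, here is a hypothetical way to save them, assuming direct MySQL access to the cinder database (the file path is an arbitrary choice; it is the file referred to as $volumes_tmp_file in that script). Note that this must be done before the database cleanup of the next step:
$ mysql --skip-column-names -e "select id, instance_id, mountpoint from volumes;" cinder | tr '\t' ' ' > /tmp/volumes_tmp_file
Each line of the resulting file contains a volume id, the instance it was attached to, and the mountpoint, separated by spaces.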
Database Update
Second, we need to update the database in order to clean the stale state. Now that we have saved, for every volume, the attachment we need to restore, the database can be cleaned with the following queries:
mysql> use cinder;
mysql> update volumes set mountpoint=NULL;
mysql> update volumes set status="available" where status <> "error_deleting";
mysql> update volumes set attach_status="detached";
mysql> update volumes set instance_id=0;
Now, when running nova volume-list, all volumes should be available.
Instances Restart
We need to restart the instances. This can be done via a simple:
$ nova reboot $instance
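If you have many instances, a small loop can issue the reboot for each of them. This is only a sketch, assuming a configured nova client; it simply parses the ID column of the nova list output:
#!/bin/bash
# Reboot every instance known to nova (IDs taken from the second column of the table output).
for instance in $(nova list | grep -v '^+' | grep -vw 'ID' | awk '{print $2}'); do
    echo "REBOOTING INSTANCE - $instance"
    nova reboot $instance
    sleep 2
done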
At that stage, depending on your image, some instances will completely reboot and become reachable, while others will stop on the "plymouth" stage.
DO NOT reboot a second time the ones which are stopped at that stage (see the fourth step below). In fact, it depends on whether or not you added an /etc/fstab entry for that volume. Images built with the cloud-init package will remain in a pending state, while others will skip the missing volume and start. (More information is available on help.ubuntu.com.) But remember that the idea of that stage is only to ask nova to reboot every instance, so that the stored state is preserved.
Volume Attachment
After the restart, we can reattach the volumes to their respective instances. Now that nova has restored the right status, it is time to perform the attachments via a nova volume-attach.
Here is a simple snippet that uses the file we created:
#!/bin/bash
# Reads "volume instance mountpoint" triplets from $volumes_tmp_file and re-attaches each volume.
while read line; do
    volume=`echo $line | cut -f 1 -d " "`
    instance=`echo $line | cut -f 2 -d " "`
    mount_point=`echo $line | cut -f 3 -d " "`
    echo "ATTACHING VOLUME FOR INSTANCE - $instance"
    nova volume-attach $instance $volume $mount_point
    sleep 2
done < $volumes_tmp_file
At that stage, instances which were pending in the boot sequence (plymouth) will automatically continue booting and restart normally, while the ones which had already booted will see the volume.
SSH into instances
If some services depend on the volume, or if a volume has an entry in fstab, it could be good to simply restart the instance. This restart needs to be made from the instance itself, not via nova. So, we SSH into the instance and perform a reboot:
$ shutdown -r now
Voila! You have successfully recovered your cloud.
Here are some suggestions:
Use the errors=remount-ro parameter in the fstab file, which will prevent data corruption: the system remounts the disk read-only as soon as it detects an I/O error, locking any further write. This configuration option should be added on the cinder-volume server (the one which performs the iSCSI connection to the SAN), but also in the instances' fstab files. (A hypothetical fstab line illustrating this is shown after the commands below.)
Do not add the entry for the SAN's disks to the cinder-volume server's fstab file. Some systems hang on that step, which means you could lose access to your cloud controller. In order to re-run the session manually, run the following commands before performing the mount:
# iscsiadm -m discovery -t st -p $SAN_IP
# iscsiadm -m node --targetname $IQN -p $SAN_IP -l
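As announced above, here is a hypothetical fstab line for a volume inside an instance using that option (the device name /dev/vdb and the mount point are assumptions; adapt them to your setup):
/dev/vdb /mnt/volume ext4 defaults,errors=remount-ro 0 2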
For your instances, if you have the whole /home/ directory on the disk, instead of emptying the /home directory and mapping the disk on it, leave a user's directory with the user's bash files and the authorized_keys file. This will allow you to connect to the instance even without the volume attached, provided you allow only connections via public keys. (A hypothetical layout is shown below.)
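For example (the user name ubuntu is an assumption), the root disk could keep a skeleton of the user's directory:
/home/ubuntu/.bashrc
/home/ubuntu/.profile
/home/ubuntu/.ssh/authorized_keys
When the volume is not attached, these files are visible and key-based SSH logins keep working; once the volume is mounted over the directory, they are simply hidden by the volume's own content.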
You can download from here a bash script which performs these five steps:
The "test mode" allows you to perform that whole sequence for only one instance.
To reproduce the power loss, connect to the compute node which runs that same instance and close the iSCSI session. Do not detach the volume via nova volume-detach, but instead manually close the iSCSI session.
In the following example, the iSCSI session is number 15 for that instance:
$ iscsiadm -m session -u -r 15
Do not forget the -r flag; otherwise, you will close ALL sessions.
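To find the session number to pass to -r in the first place, the active sessions can be listed (assuming the open-iscsi tools) with:
# iscsiadm -m session
The session number appears between square brackets near the beginning of each output line.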