To recover a KVM/libvirt compute node, see the previous section. Use the following procedure for all other hypervisors.
Procedure 4.7. Review host information
Identify the VMs on the affected hosts, using tools such as a combination of nova list and nova show, or euca-describe-instances. For example, the following output displays information about instance i-000015b9 that is running on node np-rcc54:
$ euca-describe-instances
i-000015b9 at3-ui02 running nectarkey (376, np-rcc54) 0 m1.xxlarge 2012-06-19T00:48:11.000Z 115.146.93.60
Review the status of the host by querying the Compute database. Some of the important fields appear in the output below. The following example converts an EC2 API instance ID into an OpenStack ID; if you used the nova commands, you can substitute the ID directly. You can find the credentials for your database in /etc/nova.conf.
mysql> SELECT * FROM instances WHERE id = CONV('15b9', 16, 10) \G
*************************** 1. row ***************************
created_at: 2012-06-19 00:48:11
updated_at: 2012-07-03 00:35:11
deleted_at: NULL
...
id: 5561
...
power_state: 5
vm_state: shutoff
...
hostname: at3-ui02
host: np-rcc54
...
uuid: 3f57699a-e773-4650-a443-b4b37eed5a06
...
task_state: NULL
...
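The CONV() call in the query is a hexadecimal-to-decimal conversion of the digits that follow the i- prefix. As a minimal sketch, assuming a POSIX-compatible shell, the same conversion can be done before opening the database prompt. The function name ec2_to_db_id is hypothetical and is not part of nova or euca2ools.

```shell
#!/bin/sh
# Hypothetical helper (not part of nova or euca2ools): convert an EC2-style
# instance ID such as i-000015b9 into the integer id stored in the
# instances table.
ec2_to_db_id() {
    hex=${1#i-}                      # strip the leading "i-" prefix
    case $hex in
        ''|*[!0-9a-fA-F]*)           # reject empty or non-hexadecimal input
            echo "not an EC2 instance ID: $1" >&2
            return 1
            ;;
    esac
    printf '%d\n' "0x$hex"           # interpret the remaining digits as hex
}

ec2_to_db_id i-000015b9              # prints 5561, the id in the example row
```

The result can then be used directly in the WHERE clause in place of the CONV() expression.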
Procedure 4.8. Recover the VM
After you have determined the status of the VM on the failed host, decide to which compute host the affected VM should be moved. For example, run the following database command to move the VM to np-rcc46:
mysql> UPDATE instances SET host = 'np-rcc46' WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';
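Because an UPDATE with a mistyped WHERE clause can touch the wrong rows, it may help to generate the statement from checked inputs rather than typing it by hand. The following is a sketch only: the function name move_instance_sql is hypothetical, and the restriction of host names to letters, digits, dots and dashes is an assumption made for this example, not a rule taken from Compute.

```shell
#!/bin/sh
# Hypothetical helper: print the UPDATE statement that moves one instance,
# refusing input that could break the quoting or match unintended rows.
move_instance_sql() {
    uuid=$1
    new_host=$2
    # A UUID is 36 characters made up of hex digits and dashes.
    case $uuid in
        ''|*[!0-9a-fA-F-]*) echo "invalid UUID: $uuid" >&2; return 1 ;;
    esac
    [ "${#uuid}" -eq 36 ] || { echo "invalid UUID: $uuid" >&2; return 1; }
    # Assumption for this sketch: host names use only these characters.
    case $new_host in
        ''|*[!0-9A-Za-z.-]*) echo "invalid host: $new_host" >&2; return 1 ;;
    esac
    printf "UPDATE instances SET host = '%s' WHERE uuid = '%s';\n" \
        "$new_host" "$uuid"
}

# Prints the same statement used in the example.
move_instance_sql 3f57699a-e773-4650-a443-b4b37eed5a06 np-rcc46
```

The printed statement can be reviewed and then pasted at the mysql prompt.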
If using a hypervisor that relies on libvirt (such as KVM), it is a good idea to update the libvirt.xml file (found in /var/lib/nova/instances/[instance ID]). The important changes to make are:
- Change the DHCPSERVER value to the host IP address of the compute host that is now the VM's new home.
- Update the VNC IP, if it isn't already updated, to 0.0.0.0.
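The two libvirt.xml changes can be sketched as a small sed wrapper. This is an illustration under stated assumptions, not a tested tool: the function name update_libvirt_xml is hypothetical, and the patterns assume double-quoted attributes laid out as name="DHCPSERVER" value="..." and a graphics element of type="vnc" with a listen="..." attribute on the same line. Check the file that your deployment actually generated before relying on these patterns.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the DHCPSERVER value and the VNC listen
# address in a libvirt.xml file, keeping a .bak copy of the original.
# Assumed (unverified) layout of the relevant lines:
#   <parameter name="DHCPSERVER" value="..."/>
#   <graphics type="vnc" ... listen="..."/>
update_libvirt_xml() {
    xml_file=$1
    new_dhcp_ip=$2
    # Accept only digits and dots so the value is safe inside the sed script
    # (IPv4 only; an assumption of this sketch).
    case $new_dhcp_ip in
        ''|*[!0-9.]*) echo "invalid IPv4 address: $new_dhcp_ip" >&2; return 1 ;;
    esac
    cp "$xml_file" "$xml_file.bak" || return 1
    sed -e "s|name=\"DHCPSERVER\" value=\"[^\"]*\"|name=\"DHCPSERVER\" value=\"$new_dhcp_ip\"|" \
        -e "/<graphics type=\"vnc\"/s|listen=\"[^\"]*\"|listen=\"0.0.0.0\"|" \
        "$xml_file.bak" > "$xml_file"
}

# Example call (path and address are placeholders):
#   update_libvirt_xml "/var/lib/nova/instances/[instance ID]/libvirt.xml" 10.0.0.2
```

Comparing the result against the backup with diff before rebooting the VM shows whether both substitutions matched.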
Reboot the VM:
$ nova reboot --hard 3f57699a-e773-4650-a443-b4b37eed5a06
In theory, the above database update and nova reboot command are all that is required to recover a VM from a failed host. However, if further problems occur, consider looking at recreating the network filter configuration using virsh, restarting the Compute services or updating the vm_state and power_state in the Compute database.