For KVM/libvirt compute node recovery, refer to the section above; the guide below may be applicable to other hypervisors.
The first step is to identify the VMs on the affected hosts, using tools such as a combination of nova list and nova show, or euca-describe-instances. The following example uses the EC2 API to show instance i-000015b9, which is running on node np-rcc54:
i-000015b9 at3-ui02 running nectarkey (376, np-rcc54) 0 m1.xxlarge 2012-06-19T00:48:11.000Z 115.146.93.60
First, you can review the status of the host using the nova database; some of the important information is highlighted below. This example converts an EC2 API instance ID into an OpenStack ID; if you used the nova commands, you can substitute the ID directly. You can find the credentials for your database in /etc/nova/nova.conf.
SELECT * FROM instances WHERE id = CONV('15b9', 16, 10) \G;
*************************** 1. row ***************************
 created_at: 2012-06-19 00:48:11
 updated_at: 2012-07-03 00:35:11
 deleted_at: NULL
...
         id: 5561
...
power_state: 5
   vm_state: shutoff
...
   hostname: at3-ui02
       host: np-rcc54
...
       uuid: 3f57699a-e773-4650-a443-b4b37eed5a06
...
 task_state: NULL
...
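The CONV('15b9', 16, 10) expression in the query converts the hexadecimal suffix of the EC2 instance ID (i-000015b9) into the decimal primary key that nova stores. The same conversion can be checked from the shell:

```shell
# Convert the hex suffix of EC2 instance ID i-000015b9 to decimal.
# This should match the id column (5561) in the nova database.
printf '%d\n' 0x15b9
```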
Armed with the information about which VMs were on the failed host, determine which compute host the affected VMs should be moved to. In this case, the VM will be moved to np-rcc46, which is achieved with this database command:
UPDATE instances SET host = 'np-rcc46' WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';
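To confirm the move took effect before going further, you can read the row back; for example:

```
SELECT host FROM instances WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';
```

The host column should now show np-rcc46.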
Next, if you are using a hypervisor that relies on libvirt (such as KVM), it is a good idea to update the libvirt.xml file (found in /var/lib/nova/instances/[instance ID]). The important changes to make are to change the DHCPSERVER value to the IP address of the compute host that is the VM's new home, and to update the VNC IP if it isn't already 0.0.0.0.
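As a rough illustration, the relevant parts of libvirt.xml look something like the fragment below. The exact element layout and filter name vary between nova and libvirt releases, and the IP address shown is hypothetical:

```xml
<!-- fragment of /var/lib/nova/instances/[instance ID]/libvirt.xml -->
<!-- network filter: point DHCPSERVER at the new compute host's IP -->
<filterref filter="nova-instance-instance-000015b9">
  <parameter name="DHCPSERVER" value="10.0.0.46"/>  <!-- hypothetical new host IP -->
</filterref>
<!-- VNC: listen on all interfaces so the console is reachable after the move -->
<graphics type="vnc" port="-1" autoport="yes" listen="0.0.0.0"/>
```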
Next, reboot the VM:
$ nova reboot --hard 3f57699a-e773-4650-a443-b4b37eed5a06
In theory, the above database update and nova reboot command are all that is required to recover a VM from a failed host.
However, if further problems occur, consider looking at recreating the network filter configuration using virsh, restarting the nova services, or updating the vm_state and power_state in the nova database.
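If the instance remains stuck, the state columns can be reset directly. The following is a hedged sketch, assuming the same instance as above and that power state 1 corresponds to RUNNING in your nova release's power-state mapping; verify both against your environment before running it:

```
UPDATE instances SET vm_state = 'active', power_state = 1, task_state = NULL
WHERE uuid = '3f57699a-e773-4650-a443-b4b37eed5a06';
```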