Subsections


Disaster Recovery Using Bacula

General

When disaster strikes, you must have a plan, and you must have prepared in advance otherwise the work of recovering your system and your files will be considerably greater. For example, if you have not previously saved the partitioning information for your hard disk, how can you properly rebuild it if the disk must be replaced?

Unfortunately, many of the steps one must take before and immediately after a disaster are very operating system dependent. As a consequence, this chapter will discuss in detail disaster recovery (also called Bare Metal Recovery) for Linux and Solaris. For Solaris, the procedures are still quite manual. For FreeBSD the same procedures may be used but they are not yet developed. For Win32, a number of Bacula users have reported success using BartPE.

Important Considerations

Here are a few important considerations concerning disaster recovery that you should take into account before a disaster strikes.

Steps to Take Before Disaster Strikes

Bare Metal Recovery on Linux with a Rescue CD

As an alternative to creating a Rescue CD, please see the section below entitled Bare Metal Recovery using a LiveCD.

Bacula previously had a Rescue CD. Unfortunately, this CD did not work on every Linux Distro, and in addition, Linux is evolving with different boot methods, more and more complex hardware configurations (LVM, RAID, WiFi, USB, ...). As a consequence, the Bacula Rescue CD as it was originally envisioned no longer exists.

However there are many other good rescue disks available. A so called "Bare Metal" recovery is one where you start with an empty hard disk and you restore your machine. There are also cases where you may lose a file or a directory and want it restored. Please see the previous chapter for more details for those cases.

Bare Metal Recovery assumes that you have the following items for your system:

Requirements

Restoring a Client System

Now, let's assume that your hard disk has just died and that you have replaced it with an new identical drive. In addition, we assume that you have:

  1. A recent Bacula backup (Full plus Incrementals)
  2. A Rescue CDROM.
  3. Your Bacula Director, Catalog, and Storage daemon running on another machine on your local network.

This is a relatively simple case, and later in this chapter, as time permits, we will discuss how you might recover from a situation where the machine that crashes is your main Bacula server (i.e. has the Director, the Catalog, and the Storage daemon).

You will take the following steps to get your system back up and running:

  1. Boot with your Rescue CDROM.
  2. Start the Network (local network)
  3. Re-partition your hard disk(s) as it was before
  4. Re-format your partitions
  5. Restore the Bacula File daemon (static version)
  6. Perform a Bacula restore of all your files
  7. Re-install your boot loader
  8. Reboot

Now for the details ...

Boot with your Rescue CDROM

Each rescue disk boots somewhat differently. Please see the instructions that go with your CDROM.

Start the Network:

You can test it by pinging another machine, or pinging your broken machine machine from another machine. Do not proceed until your network is up.

Partition Your Hard Disk(s):

Format Your Hard Disk(s):

Mount the Newly Formatted Disks:

Somehow get the static File daemon loaded on your system

Put the static file daemon and its conf file in /tmp.

Restore and Start the File Daemon:

chroot /mnt/disk /tmp/bacula-fd -c /tmp/bacula-fd.conf

The above command starts the Bacula File daemon with the proper root disk location (i.e. /mnt/disk/tmp. If Bacula does not start, correct the problem and start it. You can check if it is running by entering:

ps fax

You can kill Bacula by entering:

kill -TERM <pid>

where pid is the first number printed in front of the first occurrence of bacula-fd in the ps fax command.

Now, you should be able to use another computer with Bacula installed to check the status by entering:

status client=xxxx

into the Console program, where xxxx is the name of the client you are restoring.

One common problem is that your bacula-dir.conf may contain machine addresses that are not properly resolved on the stripped down system to be restored because it is not running DNS. This is particularly true for the address in the Storage resource of the Director, which may be very well resolved on the Director's machine, but not on the machine being restored and running the File daemon. In that case, be prepared to edit bacula-dir.conf to replace the name of the Storage daemon's domain name with its IP address.

Restore Your Files:

On the computer that is running the Director, you now run a restore command and select the files to be restored (normally everything), but before starting the restore, there is one final change you must make using the mod option. You must change the Where directory to be the root by using the mod option just before running the job and selecting Where. Set it to:

/

then run the restore.

You might be tempted to avoid using chroot and running Bacula directly and then using a Where to specify a destination of /mnt/disk. This is possible, however, the current version of Bacula always restores files to the new location, and thus any soft links that have been specified with absolute paths will end up with /mnt/disk prefixed to them. In general this is not fatal to getting your system running, but be aware that you will have to fix these links if you do not use chroot.

Final Step:

/sbin/grub-install --root-directory=/mnt/disk /dev/hda

Note, in this case, you omit the chroot command, and you must replace /dev/hda with your boot device. If you don't know what your boot device is, run the ./run_grub script once and it will tell you.

Finally, I've even run into a case where grub-install was unable to rewrite the boot block. In my case, it produced the following error message:

/dev/hdx does not have any corresponding BIOS drive.

The solution is to insure that all your disks are properly mounted on /mnt/disk, then do the following:

chroot /mnt/disk
mount /dev/pts

Then edit the file /boot/grub/grub.conf and uncomment the line that reads:

#boot=/dev/hda

So that it reads:

boot=/dev/hda

Note, the /dev/hda may be /dev/sda or possibly some other drive depending on your configuration, but in any case, it is the same as the one that you previously tried with grub-install.

Then, enter the following commands:

grub --batch --device-map=/boot/grub/device.map \
  --config-file=/boot/grub/grub.conf --no-floppy
root (hd0,0)
setup (hd0)
quit

If the grub call worked, you will get a prompt of grub> before the root, setup, and quit commands, and after entering the setup command, it should indicate that it successfully wrote the MBR (master boot record).

Reboot:

First unmount all your hard disks, otherwise they will not be cleanly shutdown, then reboot your machine by entering exit until you get to the main prompt then enter Ctrl-d. Once back to the main CDROM prompt, you will need to turn the power off, then back on to your machine to get it to reboot.

If everything went well, you should now be back up and running. If not, re-insert the emergency boot CDROM, boot, and figure out what is wrong.

Restoring a Server

Above, we considered how to recover a client machine where a valid Bacula server was running on another machine. However, what happens if your server goes down and you no longer have a running Director, Catalog, or Storage daemon? There are several solutions:

  1. Bring up static versions of your Director, Catalog, and Storage daemon on the damaged machine.

  2. Move your server to another machine.

  3. Use a Hot Spare Server on another Machine.

The first option, is very difficult because it requires you to have created a static version of the Director and the Storage daemon as well as the Catalog. If the Catalog uses MySQL or PostgreSQL, this may or may not be possible. In addition, to loading all these programs on a bare system (quite possible), you will need to make sure you have a valid driver for your tape drive.

The second suggestion is probably a much simpler solution, and one I have done myself. To do so, you might want to consider the following steps:

For additional details of restoring your database, please see the Restoring When Things Go Wrong section of the Console Restore Command chapter of this manual.

Linux Problems or Bugs

Since every flavor and every release of Linux is different, there are likely to be some small difficulties with the scripts, so please be prepared to edit them in a minimal environment. A rudimentary knowledge of vi is very useful. Also, these scripts do not do everything. You will need to reformat Windows partitions by hand, for example.

Getting the boot loader back can be a problem if you are using grub because it is so complicated. If all else fails, reboot your system from your floppy but using the restored disk image, then proceed to a reinstallation of grub (looking at the run-grub script can help). By contrast, lilo is a piece of cake.

Bare Metal Recovery using a LiveCD

As an alternative to the old now defunct Bacula Rescue CDROM, you can use any system rescue or LiveCD to recover your system. The big problem with most rescue or LiveCDs is that they are not designed to capture the current state of your system, so when you boot them on a damaged system, you might be somewhat lost -- e.g. how many of you remember your exact hard disk partitioning.

This lack can be easily corrected by running the part of the Bacula Rescue code that creates a directory containing a static-bacula-fd, a snapshot of your current system disk configuration, and scripts that help restoring it.

Before a disaster strikes:

  1. Run only the make bacula part of the Bacula Rescue procedure to create the static Bacula File daemon, and system disk snapshot.
  2. Save the directory generated (more details below) preferrably on a CDROM or alternatively to some other system.
  3. Possibly run make bacula every night as part of your backup process to ensure that you have a current snapshot of your system.

Then when disaster strikes, do the following:

  1. Boot with your system rescue disk or LiveCD (e.g. Knoppix).
  2. Start the Network (local network).
  3. Copy the Bacula recovery directory to the damaged system using ftp, scp, wget or if your boot disk permits it reading it directly from a CDROM.
  4. Continue as documented above.
  5. Re-partition your hard disk(s) as it was before, if necessary.
  6. Re-format your partitions, if necessary.
  7. Restore the Bacula File daemon (static version).
  8. Perform a Bacula restore of all your files.
  9. Re-install your boot loader.
  10. Reboot.

In order to create the Bacula recovery directory, you need a copy of the Bacula Rescue code as described above, and you must first configure that directory.

Once the configuration is done, you can do the following to create the Bacula recovery directory:

cd <bacula-rescue-source>/linux/cdrom
su (become root)
make bacula

The directory you want to save will be created in the current directory with the name bacula. You need only save that directory either as a directory or possibly as a compressed tar file. If you run this procedure on multiple machines, you will probably want to rename this directory to something like bacula-hostname.

FreeBSD Bare Metal Recovery

The same basic techniques described above also apply to FreeBSD. Although we don't yet have a fully automated procedure, Alex Torres Molina has provided us with the following instructions with a few additions from Jesse Guardiani and Dan Langille:

  1. Boot with the FreeBSD installation disk
  2. Go to Custom, Partition and create your slices and go to Label and create the partitions that you want. Apply changes.
  3. Go to Fixit to start an emergency console.
  4. Create devs ad0 .. .. if they don't exist under /mnt2/dev (in my situation) with MAKEDEV. The device or devices you create depend on what hard drives you have. ad0 is your first ATA drive. da0 would by your first SCSI drive. Under OS version 5 and greater, your device files are most likely automatically created for you.
  5. mkdir /mnt/disk this is the root of the new disk
  6. mount /mnt2/dev/ad0s1a /mnt/disk mount /mnt2/dev/ad0s1c /mnt/disk/var mount /mnt2/dev/ad0s1d /mnt/disk/usr ..... The same hard drive issues as above apply here too. Note, under OS version 5 or higher, your disk devices may be in /dev not /mnt2/dev.
  7. Network configuration (ifconfig xl0 ip/mask + route add default ip-gateway)
  8. mkdir /mnt/disk/tmp
  9. cd /mnt/disk/tmp
  10. Copy bacula-fd and bacula-fd.conf to this path
  11. If you need to, use sftp to copy files, after which you must do this: ln -s /mnt2/usr/bin /usr/bin
  12. chmod u+x bacula-fd
  13. Modify bacula-fd.conf to fit this machine
  14. Copy /bin/sh to /mnt/disk, necessary for chroot
  15. Don't forget to put your bacula-dir's IP address and domain name in /mnt/disk/etc/hosts if it's not on a public net. Otherwise the FD on the machine you are restoring to won't be able to contact the SD and DIR on the remote machine.
  16. mkdir -p /mnt/disk/var/db/bacula
  17. chroot /mnt/disk /tmp/bacula-fd -c /tmp/bacula-fd.conf to start bacula-fd
  18. Now you can go to bacula-dir and restore the job with the entire contents of the broken server.
  19. You must create /proc

Solaris Bare Metal Recovery

The same basic techniques described above apply to Solaris:

However, during the recovery phase, the boot and disk preparation procedures are different:

Once the disk is partitioned, formatted and mounted, you can continue with bringing up the network and reloading Bacula.

Preparing Solaris Before a Disaster

As mentioned above, before a disaster strikes, you should prepare the information needed in the case of problems. To do so, in the rescue/solaris subdirectory enter:

su
./getdiskinfo
./make_rescue_disk

The getdiskinfo script will, as in the case of Linux described above, create a subdirectory diskinfo containing the output from several system utilities. In addition, it will contain the output from the SysAudit program as described in Curtis Preston's book. This file diskinfo/sysaudit.bsi will contain the disk partitioning information that will allow you to manually follow the procedures in the "Unix Backup & Recovery" book to repartition and format your hard disk. In addition, the getdiskinfo script will create a start_network script.

Once you have your disks repartitioned and formatted, do the following:

Bugs and Other Considerations

Directory Modification and Access Times are Modified on pre-1.30 Baculas :

When a pre-1.30 version of Bacula restores a directory, it first must create the directory, then it populates the directory with its files and subdirectories. The act of creating the files and subdirectories updates both the modification and access times associated with the directory itself. As a consequence, all modification and access times of all directories will be updated to the time of the restore.

This has been corrected in Bacula version 1.30 and later. The directory modification and access times are reset to the value saved in the backup after all the files and subdirectories have been restored. This has been tested and verified on normal restore operations, but not verified during a bare metal recovery.

Strange Bootstrap Files:

If any of you look closely at the bootstrap file that is produced and used for the restore (I sure do), you will probably notice that the FileIndex item does not include all the files saved to the tape. This is because in some instances there are duplicates (especially in the case of an Incremental save), and in such circumstances, Bacula restores only the last of multiple copies of a file or directory.

Disaster Recovery of Win32 Systems

Due to open system files, and registry problems, Bacula cannot save and restore a complete Win2K/XP/NT environment.

A suggestion by Damian Coutts using Microsoft's NTBackup utility in conjunction with Bacula should permit a Full bare metal restore of Win2K/XP (and possibly NT systems). His suggestion is to do an NTBackup of the critical system state prior to running a Bacula backup with the following command:

ntbackup backup systemstate /F c:\systemstate.bkf

The backup is the command, the systemstate says to backup only the system state and not all the user files, and the /F c:\systemstate.bkf specifies where to write the state file. this file must then be saved and restored by Bacula. This command can be put in a Client Run Before Job directive so that it is automatically run during each backup, and thus saved to a Bacula Volume.

To restore the system state, you first reload a base operating system, then you would use Bacula to restore all the users files and to recover the c:\systemstate.bkf file, and finally, run NTBackup and catalogue the system statefile, and then select it for restore. The documentation says you can't run a command line restore of the systemstate.

This procedure has been confirmed to work by Ludovic Strappazon -- many thanks!

A new tool is provided in the form of a bacula plugin for the BartPE rescue CD. BartPE is a self-contained WindowsXP boot CD which you can make using the PeBuilder tools available at http://www.nu2.nu/pebuilder/ and a valid Windows XP SP1 CDROM. The plugin is provided as a zip archive. Unzip the file and copy the bacula directory into the plugin directory of your BartPE installation. Edit the configuration files to suit your installation and build your CD according to the instructions at Bart's site. This will permit you to boot from the cd, configure and start networking, start the bacula file client and access your director with the console program. The programs menu on the booted CD contains entries to install the file client service, start the file client service, and start the WX-Console. You can also open a command line window and CD Programs\Bacula and run the command line console bconsole.

Ownership and Permissions on Win32 Systems

Bacula versions after 1.31 should properly restore ownership and permissions on all WinNT/XP/2K systems. If you do experience problems, generally in restores to alternate directories because higher level directories were not backed up by Bacula, you can correct any problems with the SetACL available under the GPL license at: http://sourceforge.net/projects/setacl/.

Alternate Disaster Recovery Suggestion for Win32 Systems

Ludovic Strappazon has suggested an interesting way to backup and restore complete Win32 partitions. Simply boot your Win32 system with a Linux Rescue disk as described above for Linux, install a statically linked Bacula, and backup any of the raw partitions you want. Then to restore the system, you simply restore the raw partition or partitions. Here is the email that Ludovic recently sent on that subject:

I've just finished testing my brand new cd LFS/Bacula
with a raw Bacula backup and restore of my portable.
I can't resist sending you the results: look at the rates !!!
hunt-dir: Start Backup JobId 100, Job=HuntBackup.2003-04-17_12.58.26
hunt-dir: Bacula 1.30 (14Apr03): 17-Apr-2003 13:14
JobId:                  100
Job:                    HuntBackup.2003-04-17_12.58.26
FileSet:                RawPartition
Backup Level:           Full
Client:                 sauvegarde-fd
Start time:             17-Apr-2003 12:58
End time:               17-Apr-2003 13:14
Files Written:          1
Bytes Written:          10,058,586,272
Rate:                   10734.9 KB/s
Software Compression:   None
Volume names(s):        000103
Volume Session Id:      2
Volume Session Time:    1050576790
Last Volume Bytes:      10,080,883,520
FD termination status:  OK
SD termination status:  OK
Termination:            Backup OK
hunt-dir: Begin pruning Jobs.
hunt-dir: No Jobs found to prune.
hunt-dir: Begin pruning Files.
hunt-dir: No Files found to prune.
hunt-dir: End auto prune.
hunt-dir: Start Restore Job RestoreFilesHunt.2003-04-17_13.21.44
hunt-sd: Forward spacing to file 1.
hunt-dir: Bacula 1.30 (14Apr03): 17-Apr-2003 13:54
JobId:                  101
Job:                    RestoreFilesHunt.2003-04-17_13.21.44
Client:                 sauvegarde-fd
Start time:             17-Apr-2003 13:21
End time:               17-Apr-2003 13:54
Files Restored:         1
Bytes Restored:         10,056,130,560
Rate:                   5073.7 KB/s
FD termination status:  OK
Termination:            Restore OK
hunt-dir: Begin pruning Jobs.
hunt-dir: No Jobs found to prune.
hunt-dir: Begin pruning Files.
hunt-dir: No Files found to prune.
hunt-dir: End auto prune.

Restoring to a Running System

If for some reason you want to do a Full restore to a system that has a working kernel (not recommended), you will need to take care not to overwrite the following files:

/etc/grub.conf
/etc/X11/Conf
/etc/fstab
/etc/mtab
/lib/modules
/usr/modules
/usr/X11R6
/etc/modules.conf

Additional Resources

Many thanks to Charles Curley who wrote Linux Complete Backup and Recovery HOWTO for the The Linux Documentation Project. This is an excellent document on how to do Bare Metal Recovery on Linux systems, and it was this document that made me realize that Bacula could do the same thing.

You can find quite a few additional resources, both commercial and free at Storage Mountain, formerly known as Backup Central.

And finally, the O'Reilly book, "Unix Backup & Recovery" by W. Curtis Preston covers virtually every backup and recovery topic including bare metal recovery for a large range of Unix systems.

Kern Sibbald 2009-08-09