19.9 Using Vinum for the Root Filesystem

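For a machine that has fully-mirrored filesystems using Vinum, it is desirable to also mirror the root filesystem. Setting up such a configuration is less trivial than mirroring an arbitrary filesystem, for two reasons: Vinum must be started early enough in the boot process for the root filesystem to be mounted from it, and the bootstrap, which knows nothing about Vinum, must still be able to read the root volume. The subsections below address each point in turn.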

In the following sections, the term “root volume” is generally used to describe the Vinum volume that contains the root filesystem. It is probably a good idea to use the name "root" for this volume, but this is not technically required in any way. All command examples in the following sections assume this name though.

19.9.1 Starting up Vinum Early Enough for the Root Filesystem

There are several measures to take for this to happen:

19.9.2 Making a Vinum-based Root Volume Accessible to the Bootstrap

Since the current FreeBSD bootstrap is only 7.5 KB of code, and already has the burden of reading files (like /boot/loader) from the UFS filesystem, it is simply impossible to also teach it about internal Vinum structures so that it could parse the Vinum configuration data and work out the elements of a boot volume itself. Thus, some tricks are necessary to provide the bootstrap code with the illusion of a standard "a" partition that contains the root filesystem.

For this to be possible at all, the following requirements must be met for the root volume:

Note that it is desirable and possible that there are multiple plexes, each containing one replica of the root filesystem. The bootstrap process will, however, only use one of these replicas for finding the bootstrap and all the files, until the kernel eventually mounts the root filesystem itself. Each single subdisk within these plexes will then need its own "a" partition illusion, for the respective device to become bootable. It is not strictly necessary that each of these faked "a" partitions is located at the same offset within its device as on the other devices containing plexes of the root volume. However, it is probably a good idea to create the Vinum volumes that way so the resulting mirrored devices are symmetric, to avoid confusion.

In order to set up these "a" partitions, for each device containing part of the root volume, the following needs to be done:

  1. The location (offset from the beginning of the device) and size of this device's subdisk that is part of the root volume need to be examined, using the command:

    # gvinum l -rv root
    

    Note that Vinum offsets and sizes are measured in bytes. They must be divided by 512 in order to obtain the block numbers that are to be used in the bsdlabel command.

  2. Run the command:

    # bsdlabel -e devname
    

    for each device that participates in the root volume. devname must be either the name of the disk (like da0) for disks without a slice (aka. fdisk) table, or the name of the slice (like ad0s1).

    If there is already an "a" partition on the device (presumably, containing a pre-Vinum root filesystem), it should be renamed to something else, so it remains accessible (just in case), but will no longer be used by default to bootstrap the system. Note that active partitions (like a root filesystem currently mounted) cannot be renamed, so this must be executed either when being booted from a “Fixit” medium, or in a two-step process, where (in a mirrored situation) the disk that has not been currently booted is being manipulated first.

    Then, the offset of the Vinum partition on this device (if any) must be added to the offset of the respective root volume subdisk on this device. The resulting value will become the "offset" value for the new "a" partition. The "size" value for this partition can be taken verbatim from the calculation above. The "fstype" should be 4.2BSD. The "fsize", "bsize", and "cpg" values should best be chosen to match the actual filesystem, though they are fairly unimportant within this context.

    That way, a new "a" partition will be established that overlaps the Vinum partition on this device. Note that the bsdlabel will only allow for this overlap if the Vinum partition has properly been marked using the "vinum" fstype.

  3. That's all! A faked "a" partition now exists on each device that has one replica of the root volume. It is highly recommended to verify the result again, using a command like:

    # fsck -n /dev/devnamea
    

It should be remembered that all files containing control information must be relative to the root filesystem in the Vinum volume which, when setting up a new Vinum root volume, might not match the root filesystem that is currently active. So in particular, the files /etc/fstab and /boot/loader.conf need to be taken care of.
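As an illustration, the entries involved might look like the following fragment. This is only a sketch: it assumes the root volume is named "root" as suggested above, and that the Vinum kernel module is loaded through the usual geom_vinum_load variable. The fstab fields other than the device name are ordinary defaults for a root filesystem, not values taken from this section, and any existing entry for / in the new filesystem's /etc/fstab would have to be replaced rather than kept alongside.

```
# /boot/loader.conf on the new Vinum-based root filesystem:
# load the Vinum kernel module before the root device is looked up.
geom_vinum_load="YES"

# /etc/fstab on the new Vinum-based root filesystem:
# Device             Mountpoint  FStype  Options  Dump  Pass#
/dev/gvinum/root     /           ufs     rw       1     1
```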

At next reboot, the bootstrap should figure out the appropriate control information from the new Vinum-based root filesystem, and act accordingly. At the end of the kernel initialization process, after all devices have been announced, the prominent notice that shows the success of this setup is a message like:

Mounting root from ufs:/dev/gvinum/root

19.9.3 Example of a Vinum-based Root Setup

After the Vinum root volume has been set up, the output of gvinum l -rv root could look like:

...
Subdisk root.p0.s0:
        Size:        125829120 bytes (120 MB)
        State: up
        Plex root.p0 at offset 0 (0  B)
        Drive disk0 (/dev/da0h) at offset 135680 (132 kB)

Subdisk root.p1.s0:
        Size:        125829120 bytes (120 MB)
        State: up
        Plex root.p1 at offset 0 (0  B)
        Drive disk1 (/dev/da1h) at offset 135680 (132 kB)
   

The values to note are 135680 for the offset (relative to partition /dev/da0h). This translates to 265 512-byte disk blocks in bsdlabel's terms. Likewise, the size of this root volume is 245760 512-byte blocks. /dev/da1h, containing the second replica of this root volume, has a symmetric setup.
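The conversion from bytes to 512-byte blocks can be double-checked with plain shell arithmetic. The following is only a sketch: the variable names are made up for illustration, and the numbers are copied by hand from the gvinum output above rather than parsed from it.

```shell
#!/bin/sh
# Values copied from the example gvinum output above, in bytes.
sd_offset_bytes=135680
sd_size_bytes=125829120
sector=512

# Both values should be exact multiples of the sector size.
[ $((sd_offset_bytes % sector)) -eq 0 ] || echo "offset is not sector-aligned" >&2
[ $((sd_size_bytes % sector)) -eq 0 ] || echo "size is not sector-aligned" >&2

# Convert to the block units that bsdlabel expects.
sd_offset_blocks=$((sd_offset_bytes / sector))
sd_size_blocks=$((sd_size_bytes / sector))

echo "subdisk offset: $sd_offset_blocks blocks"
echo "subdisk size:   $sd_size_blocks blocks"
```

The two results are the offset and size, in blocks, that the text above quotes for this subdisk.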

The bsdlabel for these devices might look like:

...
8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
  a:   245760      281    4.2BSD     2048 16384     0   # (Cyl.    0*- 15*)
  c: 71771688        0    unused        0     0         # (Cyl.    0 - 4467*)
  h: 71771672       16     vinum                        # (Cyl.    0*- 4467*)
   

It can be observed that the "size" parameter for the faked "a" partition matches the value outlined above, while the "offset" parameter is the sum of the offset within the Vinum partition "h", and the offset of this partition within the device (or slice). This is a typical setup that is necessary to avoid the problem described in Section 19.9.4.3. It can also be seen that the entire "a" partition is completely within the "h" partition containing all the Vinum data for this device.
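The arithmetic behind the "a" entry, and the check that "a" lies inside "h", can be sketched in a few lines of shell. The variable names are hypothetical, and the numbers are copied by hand from the example label and gvinum output above.

```shell
#!/bin/sh
# All values are in 512-byte blocks, taken from the example above.
h_offset=16        # start of the Vinum partition "h" within the device
h_size=71771672    # size of partition "h"
sd_offset=265      # subdisk offset inside "h" (135680 bytes / 512)
sd_size=245760     # subdisk size (125829120 bytes / 512)

# The faked "a" partition starts where the subdisk starts on the device.
a_offset=$((h_offset + sd_offset))
a_size=$sd_size
a_end=$((a_offset + a_size))
h_end=$((h_offset + h_size))

echo "a: size=$a_size offset=$a_offset"

# "a" must lie completely within "h".
if [ "$a_offset" -ge "$h_offset" ] && [ "$a_end" -le "$h_end" ]; then
    inside=yes
else
    inside=no
fi
echo "a lies within h: $inside"
```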

Note that in the above example, the entire device is dedicated to Vinum, and there is no leftover pre-Vinum root partition, since this has been a newly set-up disk that was only meant to be part of a Vinum configuration, ever.

19.9.4 Troubleshooting

If something goes wrong, a way is needed to recover from the situation. The following list contains a few known pitfalls and solutions.

19.9.4.1 System Bootstrap Loads, but System Does Not Boot

If for any reason the system does not continue to boot, the bootstrap can be interrupted by pressing the space key at the 10-second warning. The loader variables (like vinum.autostart) can be examined using the show command, and manipulated using the set or unset commands.

If the only problem was that the Vinum kernel module was not yet in the list of modules to load automatically, a simple load geom_vinum will help.

When ready, the boot process can be continued with a boot -as. The options -as will request the kernel to ask for the root filesystem to mount (-a), and make the boot process stop in single-user mode (-s), where the root filesystem is mounted read-only. That way, even if only one plex of a multi-plex volume has been mounted, no data inconsistency between plexes is being risked.

At the prompt asking for a root filesystem to mount, any device that contains a valid root filesystem can be entered. If /etc/fstab has been set up correctly, the default should be something like ufs:/dev/gvinum/root. A typical alternate choice would be something like ufs:da0d, which could be a hypothetical partition that contains the pre-Vinum root filesystem. Care should be taken if one of the alias "a" partitions is entered here, since these actually refer to the subdisks of the Vinum root device: in a mirrored setup, this would mount only one piece of the mirrored root device. If this filesystem is to be mounted read-write later on, it is necessary to remove the other plex(es) of the Vinum root volume, since these plexes would otherwise carry inconsistent data.

19.9.4.2 Only Primary Bootstrap Loads

If /boot/loader fails to load, but the primary bootstrap still loads (visible by a single dash in the left column of the screen right after the boot process starts), an attempt can be made to interrupt the primary bootstrap at this point, using the space key. This will make the bootstrap stop in stage two (see Section 12.3.2). An attempt can be made here to boot off an alternate partition, like the partition containing the previous root filesystem that has been moved away from "a" above.

19.9.4.3 Nothing Boots, the Bootstrap Panics

This situation will happen if the bootstrap has been destroyed by the Vinum installation. Unfortunately, Vinum currently leaves only 4 KB free at the beginning of its partition before it starts to write its Vinum header information. However, the stage one and two bootstraps, plus the bsdlabel embedded between them, currently require 8 KB. So if a Vinum partition starts at offset 0 within a slice or disk that was meant to be bootable, the Vinum setup will trash the bootstrap.
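The collision can be checked for a given partition layout with a few lines of shell. This sketch rests on one reading of the paragraph above, namely that the Vinum header begins 4 KB into the Vinum partition and that the bootstrap area occupies the first 8 KB of the slice or disk. The function name and variables are made up for illustration.

```shell
#!/bin/sh
sector=512
vinum_reserved=4096   # bytes Vinum leaves free at the start of its partition
boot_area=8192        # bytes needed by the bootstraps plus the bsdlabel

check_offset() {
    # $1: offset of the Vinum partition within the slice or disk, in blocks
    header_start=$(( $1 * sector + vinum_reserved ))
    if [ "$header_start" -ge "$boot_area" ]; then
        echo safe
    else
        echo collides
    fi
}

check_offset 0    # Vinum partition at the very start of the slice
check_offset 16   # offset of partition "h" in the example label
```

Under these assumptions, an offset of 16 blocks (8 KB), as used for partition "h" in Section 19.9.3, keeps the Vinum header clear of the bootstrap area.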

Similarly, if the above situation has been recovered, for example by booting from a “Fixit” medium, and the bootstrap has been re-installed using bsdlabel -B as described in Section 12.3.2, the bootstrap will trash the Vinum header, and Vinum will no longer find its disk(s). No actual Vinum configuration data or data in Vinum volumes is trashed by this, and it would be possible to recover all the data by entering exactly the same Vinum configuration data again, but the situation is hard to fix. It would be necessary to move the entire Vinum partition by at least 4 KB, so that the Vinum header and the system bootstrap no longer collide.

19.9.5 Differences for FreeBSD 4.X

Under FreeBSD 4.X, some internal functions required to make Vinum automatically scan all disks are missing, and the code that figures out the internal ID of the root device is not smart enough to handle a name like /dev/vinum/root automatically. Therefore, things are a little different here.

Vinum must explicitly be told which disks to scan, using a line like the following one in /boot/loader.conf:

vinum.drives="/dev/da0 /dev/da1"

It is important that all drives are mentioned that could possibly contain Vinum data. It does not harm if more drives are listed, nor is it necessary to add each slice and/or partition explicitly, since Vinum will scan all slices and partitions of the named drives for valid Vinum headers.

Since the routines used to parse the name of the root filesystem, and derive the device ID (major/minor number) are only prepared to handle “classical” device names like /dev/ad0s1a, they cannot make any sense out of a root volume name like /dev/vinum/root. For that reason, Vinum itself needs to pre-setup the internal kernel parameter that holds the ID of the root device during its own initialization. This is requested by passing the name of the root volume in the loader variable vinum.root. The entry in /boot/loader.conf to accomplish this looks like:

vinum.root="root"

Now, when the kernel initialization tries to find out the root device to mount, it sees whether some kernel module has already pre-initialized the kernel parameter for it. If that is the case, and the device claiming the root device matches the major number of the driver as figured out from the name of the root device string being passed (that is, "vinum" in our case), it will use the pre-allocated device ID, instead of trying to figure out one itself. That way, during the usual automatic startup, it can continue to mount the Vinum root volume for the root filesystem.

However, when boot -a has been used to request that the name of the root device be entered manually, it must be noted that this routine still cannot parse a name that refers to a Vinum volume. If a device name is entered that does not refer to a Vinum device, the mismatch between the major number of the pre-allocated root parameter and that of the driver derived from the given name makes this routine enter its normal parser, so entering a string like ufs:da0d will work as expected. Note that if this fails, it is no longer possible to re-enter a string like ufs:vinum/root, since it cannot be parsed. The only way out is to reboot and start over. (At the “askroot” prompt, the initial /dev/ can always be omitted.)
