In order to address these problems, Vinum implements a four-level hierarchy of objects:
The most visible object is the virtual disk, called a volume. Volumes have essentially the same properties as a UNIX® disk drive, though there are some minor differences. They have no size limitations.
Volumes are composed of plexes, each of which represents the total address space of a volume. This level in the hierarchy thus provides redundancy. Think of plexes as individual disks in a mirrored array, each containing the same data.
Since Vinum exists within the UNIX disk storage framework, it would be possible to use UNIX partitions as the building block for multi-disk plexes, but in fact this turns out to be too inflexible: UNIX disks can have only a limited number of partitions. Instead, Vinum subdivides a single UNIX partition (the drive) into contiguous areas called subdisks, which it uses as building blocks for plexes.
Subdisks reside on Vinum drives, currently UNIX partitions. Vinum drives can contain any number of subdisks. With the exception of a small area at the beginning of the drive, which is used for storing configuration and state information, the entire drive is available for data storage.
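The four levels can be pictured as nested data structures. The following is only an illustrative sketch in Python, not Vinum's actual kernel structures: the class names, field names, and the `HEADER_BYTES` value are assumptions made for the example (the text says only that the configuration area is small), and the plex shown is a simple concatenation.

```python
from dataclasses import dataclass, field
from typing import List

GB = 1024 ** 3
HEADER_BYTES = 128 * 1024  # assumed size of the configuration area; not Vinum's real value


@dataclass
class Drive:
    """A Vinum drive: currently a UNIX partition."""
    name: str
    size: int  # total size in bytes

    def usable(self) -> int:
        # Everything after the configuration area is available for data.
        return self.size - HEADER_BYTES


@dataclass
class Subdisk:
    """A contiguous area of a drive."""
    drive: Drive
    offset: int  # start offset within the drive, in bytes
    length: int  # length in bytes


@dataclass
class Plex:
    """One complete representation of a volume's address space."""
    subdisks: List[Subdisk] = field(default_factory=list)

    def size(self) -> int:
        # For a concatenated plex, the size is the sum of its subdisks.
        return sum(sd.length for sd in self.subdisks)


@dataclass
class Volume:
    """The virtual disk seen by the rest of the system."""
    name: str
    plexes: List[Plex] = field(default_factory=list)


# Two 4 GB drives, each contributing one subdisk to a single plex.
drive_a = Drive("a", 4 * GB)
drive_b = Drive("b", 4 * GB)
plex = Plex([
    Subdisk(drive_a, HEADER_BYTES, drive_a.usable()),
    Subdisk(drive_b, HEADER_BYTES, drive_b.usable()),
])
volume = Volume("myvol", [plex])
```

In this sketch the plex is larger than either drive, because its subdisks are spread over both.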
The following sections describe the way these objects provide the functionality required of Vinum.
Plexes can include multiple subdisks spread over all drives in the Vinum configuration. As a result, the size of an individual drive does not limit the size of a plex, and thus of a volume.
Vinum implements mirroring by attaching multiple plexes to a volume. Each plex is a representation of the data in a volume. A volume may contain between one and eight plexes.
Although a plex represents the complete data of a volume, it is possible for parts of the representation to be physically missing, either by design (by not defining a subdisk for parts of the plex) or by accident (as a result of the failure of a drive). As long as at least one plex can provide the data for the complete address range of the volume, the volume is fully functional.
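The availability rule above can be expressed as a small check. This is a hedged sketch rather than Vinum's own logic: it models each plex as a list of `(start, length)` extents for the subdisks that are currently present, and it tests only the sufficient condition stated in the text, namely that some single plex covers the whole address range. The function names are hypothetical.

```python
from typing import List, Tuple

# (start, length) of an available subdisk, in plex address space
Extent = Tuple[int, int]


def covers(extents: List[Extent], size: int) -> bool:
    """Return True if the extents cover the range [0, size) without gaps."""
    position = 0
    for start, length in sorted(extents):
        if start > position:
            return False  # a hole: missing by design or because a drive failed
        position = max(position, start + length)
    return position >= size


def some_plex_is_complete(plexes: List[List[Extent]], size: int) -> bool:
    """Return True if at least one plex can supply the whole address range."""
    return any(covers(plex, size) for plex in plexes)


# A 200-unit volume with two plexes; the second plex has lost its second subdisk.
healthy_plex = [(0, 100), (100, 100)]
degraded_plex = [(0, 100)]
```

Here `some_plex_is_complete([healthy_plex, degraded_plex], 200)` holds because the first plex still covers the full range, even though the second does not.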
Vinum implements both concatenation and striping at the plex level:
A concatenated plex uses the address space of each subdisk in turn.
A striped plex stripes the data across each subdisk. The subdisks must all have the same size, and there must be at least two subdisks in order to distinguish it from a concatenated plex.
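The difference between the two organizations shows up in how a plex offset is translated into a subdisk and an offset within that subdisk. The sketch below is illustrative only: the function names are hypothetical, and the round-robin stripe layout is the conventional RAID-0 arrangement, assumed here because the text does not spell out Vinum's exact layout.

```python
from typing import List, Tuple


def concat_map(lengths: List[int], offset: int) -> Tuple[int, int]:
    """Map a plex offset to (subdisk index, offset in subdisk) for a concatenated plex.

    The subdisks are used in turn, so they may have different lengths.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for index, length in enumerate(lengths):
        if offset < length:
            return index, offset
        offset -= length
    raise ValueError("offset beyond end of plex")


def stripe_map(n_subdisks: int, stripe_size: int, offset: int) -> Tuple[int, int]:
    """Map a plex offset to (subdisk index, offset in subdisk) for a striped plex.

    Assumes equal-sized subdisks and stripes assigned round-robin.
    """
    if n_subdisks < 2:
        raise ValueError("a striped plex needs at least two subdisks")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    stripe, within = divmod(offset, stripe_size)  # which stripe, and where in it
    row, index = divmod(stripe, n_subdisks)       # which pass over the subdisks, which subdisk
    return index, row * stripe_size + within
```

With subdisks of 100, 50, and 200 units, `concat_map([100, 50, 200], 120)` lands 20 units into the second subdisk. With two subdisks and a stripe size of 256 units, consecutive stripes alternate between the subdisks, so `stripe_map(2, 256, 256)` is the start of the second subdisk and `stripe_map(2, 256, 522)` is 266 units into the first.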
The version of Vinum supplied with FreeBSD 5.1 implements two kinds of plex:
Concatenated plexes are the most flexible: they can contain any number of subdisks, and the subdisks may be of different lengths. The plex may be extended by adding additional subdisks. They require less CPU time than striped plexes, though the difference in CPU overhead is not measurable. On the other hand, they are most susceptible to hot spots, where one disk is very active and others are idle.
The greatest advantage of striped (RAID-0) plexes is that they reduce hot spots: by choosing an optimally sized stripe (about 256 kB), you can even out the load on the component drives. The disadvantages of this approach are (fractionally) more complex code and restrictions on subdisks: they must all be the same size, and extending a plex by adding new subdisks is so complicated that Vinum currently does not implement it. Vinum imposes an additional, trivial restriction: a striped plex must have at least two subdisks, since otherwise it is indistinguishable from a concatenated plex.
Table 13-1 summarizes the advantages and disadvantages of each plex organization.