This is a proposal for a mechanism that lets Amanda support arbitrary backup programs. It relies on a generic backup driver and on scripts or programs that interface with backup programs such as dump, tar, smbclient, and others. It can also be used to introduce pre- and post-backup commands.
The interface is simple, but supports everything that is currently supported by Amanda, and it can be consistently extended to support new abstractions that may be introduced in the backup driver in the future.
This proposal does not imply any modification in the Amanda protocol or in Amanda servers; only Amanda clients have to be modified. By Amanda clients, we refer to hosts whose disks are to be backed up; an Amanda server is a host connected to a tape unit.
Currently (as of release 2.4.1 of Amanda), Amanda clients support three operations: selfcheck, estimate and backup.
Selfcheck is used by the server program amcheck, to check whether a client is responding or if there are configuration or permission problems in the client that might prevent the backup from taking place.
Estimates are requested by the Amanda planner, which runs on the server and collects information about the expected sizes of backups of each disk at several levels. Given this information and the amount of available tape space, the planner can select which disks and which levels it should tell dumper to run.
dumper is yet another server-side program; it requests clients to perform dumps, as determined by planner, and stores these dumps in holding disks or sends them directly to the taper program. The interaction between dumper and taper is beyond the scope of this text.
We are going to focus on the interaction between the Amanda client program and wrappers of dump programs. These wrappers must implement the DUMPER API. The dumptype option `program' should name the wrapper that will be used to back up filesystems of that dumptype. One wrapper may call another, so as to extend its functionality.
Different backup programs present distinct requirements; some must be run as super-user, whereas others can be run under other user-ids. Some require a directory name, the root of the tree to be backed up; others prefer a raw device name; some don't even refer to local disks (SAMBA). Some wrappers may need to know a filesystem type in order to decide which particular backup program to use (dump, vdump, vxdump, xfsdump, backup).
Some provide special options for estimates, whereas others must be started as if a complete dump were to be performed, and must be killed as soon as they print an estimate.
Furthermore, the output formats of these backup programs vary wildly. Some print estimates and total sizes in bytes, others in 512-byte tape blocks, in Kbytes, Mbytes, Gbytes, and possibly Tbytes in the near future. Some will print a timestamp for the backup; some won't.
There are also restrictions related to possible scheduling policies. For example, some backup programs only support full backups or incrementals based on the last full backup (0-1). Some support full backups or incrementals based on the last backup, be it a full or an incremental backup (0-inf++). Some support incrementals based on a timestamp (incr/date), whereas others are based on a limited number of incremental levels, but incrementals of the same level can be repeated, such as dump (0-9).
Amanda was originally built upon DUMP incremental levels, so this is the only model it currently supports. Backup programs that use other incremental management mechanisms had to be adapted to this policy. Wrapper scripts are responsible for this adaptation.
Another important issue has to do with index generation. Some backup programs can generate indexes, but each one lists files in its own particular format, whereas indexes must be stored in a common format so that the Amanda server can manipulate them.
The DUMPER API must accommodate all these variations.
We are going to define a standard format of argument lists that the backup driver will provide to wrapper programs, and the expected result of the execution of these wrappers.
The first argument to a wrapper should always be a command name. If no arguments are given, or an unsupported command is requested, an error message should be printed to stderr, and the program should terminate with exit status 1.
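As an illustration only, the dispatch rule above could look like this in a wrapper written in Python. The names `run_wrapper`, `COMMANDS` and `cmd_support` are hypothetical and not part of the proposal, and the error message wording is our own.

```python
import sys


def cmd_support(args):
    # Placeholder handler (hypothetical): reports every feature as unsupported.
    return 1


# Hypothetical command table; a real wrapper would add selfcheck, estimate, etc.
COMMANDS = {"support": cmd_support}


def run_wrapper(argv, err=sys.stderr):
    """Return the exit status for a wrapper invoked as argv (argv[0] is its name)."""
    if len(argv) < 2:
        # No command given: complain on stderr, exit status 1.
        print("usage: %s command [args...]" % argv[0], file=err)
        return 1
    handler = COMMANDS.get(argv[1])
    if handler is None:
        # Unsupported command: same treatment.
        print("%s: unsupported command '%s'" % (argv[0], argv[1]), file=err)
        return 1
    return handler(argv[2:])
```

A real script would finish with `sys.exit(run_wrapper(sys.argv))`.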
As a general mechanism for Amanda to probe for features provided by a backup program, a wrapper script must support at least the `support' command. Some features must be supported, and Amanda won't ever ask about them. Others will be considered as extensions, and Amanda will ask the wrapper whether they are supported before issuing the corresponding commands.
For example, before requesting an incremental backup of a given level, Amanda should ask the wrapper whether the backup program supports level-based incrementals. We don't currently support backup programs that don't, but we may in the future, so it would be nice if wrappers already implemented the command `support level-incrementals', by returning a 0 exit status and printing, say, the maximum incremental level it supports, e.g., 9. A sample session would be:
% /usr/local/amanda/libexec/wrappers/dump support level-incrementals hda0
9
Note that the result of this support command may depend on filesystem information, so the disklist filesystem entry should be specified as a command line argument. In the next examples, we are not going to use full pathnames to wrapper scripts any more.
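A minimal sketch of how a dump-like wrapper might answer this query, assuming a fixed maximum level of 9. The function name `support` and the constant `MAX_LEVEL` are ours, for illustration.

```python
MAX_LEVEL = 9  # assumption: a dump-like program with levels 0-9


def support(subcommand, disk, out):
    """Handle `support <subcommand> <disk>`; return the exit status."""
    if subcommand == "level-incrementals":
        # `disk` is accepted because the answer may depend on the filesystem;
        # this sketch gives the same answer for every disk.
        print(MAX_LEVEL, file=out)
        return 0
    # A subcommand the wrapper does not know about: feature not supported.
    return 1
```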
We could have defined a `support' command for full backups, but I can't think of a backup program that does not support full backups...
The ability to produce index files is also subject to an invocation of the `support' command. When the support sub-command is `index', as in the invocation below, the wrapper must print a list of valid indexing mechanisms, one per line, most preferred first. If indexing is not supported, nothing should be printed, and the exit status should be 1.
DUMP support index hda0
The currently known indexing mechanisms are:
output: implies that the command `index-from-output' generates an index file from the output produced by the backup program (for example, from tar -cv).
image: implies that the command `index-from-image' generates an index file from a backup image (for example, tar -t).
direct: implies that the `backup' command can produce an index file as it generates the backup image.
parse: implies that the `backup-parse' command can produce an index file as it generates the formatted backup output.
The indexing mechanisms will be explicitly requested with the additional option `index-<mode>' in the `backup' and `backup-parse' command invocations.
`index-from-image' should be supported, if possible, even if other index commands are not, since it can be used in the future to create index files from previously backed up filesystems.
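The `support index' reply could be produced along these lines. This is a sketch: the function name `support_index` is hypothetical, and exit status 0 on success is our assumption (the text only fixes status 1 for the unsupported case).

```python
KNOWN_MECHANISMS = ("output", "image", "direct", "parse")


def support_index(preferred, out):
    """Print the wrapper's indexing mechanisms, most preferred first.

    `preferred` is the wrapper's own ordered list; names outside the four
    mechanisms listed above are dropped.
    """
    listed = [m for m in preferred if m in KNOWN_MECHANISMS]
    for mechanism in listed:
        print(mechanism, file=out)
    # Nothing printed and status 1 when indexing is not supported at all.
    return 0 if listed else 1
```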
The `parse-estimate' support subcommand prints a list of valid mechanisms for parsing the estimate output and writing the estimated size to its output. The two mechanisms are:
direct: implies that the `estimate' command can produce the estimate output.
parse: implies that the `estimate-parse' command can produce the estimate output when fed with the `estimate' output.
The estimate parsing mechanisms will be explicitly requested with the additional option `estimate-<mode>' in the `estimate' and `estimate-parse' command invocations.
The `parse-backup' support subcommand prints a list of valid mechanisms for parsing the backup stderr. The two mechanisms are:
direct: implies that the `backup' command can produce the formatted-output-backup.
parse: implies that the `backup-parse' command can produce the formatted-output-backup when fed with the `backup' stderr.
The backup parsing mechanisms will be explicitly requested with the additional option `backup-<mode>' in the `backup' and `backup-parse' command invocation.
Some other standard `support' sub-commands are `exclude' and `exclude-list'.
One may think (and several people did :-) ) that there should be only one support command, that would print information about all supported commands. The main arguments against this proposal have to do with extensibility:
The availability of commands might vary from filesystem to filesystem. No, I don't have an example, I just want to keep it as open as possible :-)
One support subcommand may require command line arguments that others don't, and we can't know in advance what these command line arguments are going to be.
The output format and exit status conventions of a support command may vary from command to command; the only pre-defined convention is that, if a wrapper does not know about a support subcommand, it should return exit status 1, implying that the inquired feature is not supported.
We should support commands to perform self-checks and to run estimates, backups and restores (the latter for future extensions of the Amanda protocol that support restores).
A selfcheck request would go like this:
DUMP selfcheck hda0 option option=value ...
The options specified as command-line arguments are dumptype options enabled for that disk, such as `index', `norecord', etc. Unknown options should be ignored. For each successful check, a message such as the following should be printed:
OK [/dev/hda0 is readable]
OK [/usr/sbin/dump is executable]
Errors should be printed as:
ERROR [/etc/dumpdates is not writable]
A wrapper script will certainly have to figure out either the disk device name or its mount point, given a filesystem name such as `hda0', as specified in the disklist. In order to help these scripts, Amanda provides a helper program that can guess device names, mount points and filesystem types, when given disklist entries.
The filesystem type can be useful on some operating systems, in which more than one dump program is available; this information can help select the appropriate dump program automatically.
The exit statuses of selfcheck and of this helper program are probably going to be disregarded. Anyway, for consistency, selfcheck should return exit status 0 for complete success, 1 if any failures have occurred.
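A selfcheck routine following these conventions might be sketched as below. The function name and the shape of the `checks` argument are hypothetical; a real wrapper would derive the paths from the disklist entry, which is not shown here.

```python
import os


def selfcheck(checks, out):
    """Run access checks and report them in the OK/ERROR format.

    `checks` is a list of (path, os.access mode, adjective) tuples,
    e.g. ("/usr/sbin/dump", os.X_OK, "executable").
    """
    failed = False
    for path, mode, adjective in checks:
        if os.access(path, mode):
            print("OK [%s is %s]" % (path, adjective), file=out)
        else:
            print("ERROR [%s is not %s]" % (path, adjective), file=out)
            failed = True
    # 0 for complete success, 1 if any check failed.
    return 1 if failed else 0
```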
Estimate requests can take several different forms. An estimate of a full backup may be requested, or estimates for level- or timestamp-based incrementals:
DUMP estimate full hda0 option ...
DUMP estimate level 1 hda0 option ...
DUMP estimate diff 1998:09:24:01:02:03 hda0 option ...
If the requested estimate type is not supported, exit status 3 should be returned.
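The three request forms can be told apart by the word that follows `estimate'. The sketch below is one possible way for a wrapper to split its arguments; the function name, the returned dictionary layout, and the use of status 1 for malformed requests are our assumptions.

```python
def parse_estimate_request(args, supported=("full", "level", "diff")):
    """Split the arguments that follow `estimate`.

    Returns (status, request): status 3 for an unsupported estimate type,
    1 for a malformed request, 0 otherwise.
    """
    if not args:
        return 1, None
    kind = args[0]
    if kind not in ("full", "level", "diff") or kind not in supported:
        return 3, None
    try:
        if kind == "full":
            detail, disk, options = None, args[1], args[2:]
        elif kind == "level":
            detail, disk, options = int(args[1]), args[2], args[3:]
        else:
            # diff: keep the timestamp string as given, e.g. 1998:09:24:01:02:03
            detail, disk, options = args[1], args[2], args[3:]
    except (IndexError, ValueError):
        return 1, None
    return 0, {"type": kind, "detail": detail, "disk": disk, "options": options}
```

A wrapper for a program without timestamp-based incrementals would pass `supported=("full", "level")`, so that a `diff' request yields status 3.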
If the option `estimate-direct' is set, then the `estimate' command should write to stdout either the estimated size in bytes, or a pair of numbers that, multiplied by one another, yield the estimated size in bytes.
If the option `estimate-parse' is set, then the `estimate' command should write to stdout the information needed by the `estimate-parse' command, which should extract the estimated size from its input.
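On the driver side, reading such a size could be sketched as follows. We assume here that the two numbers of a pair are separated by whitespace, which the proposal does not specify; `parse_size` is a hypothetical name.

```python
def parse_size(text):
    """Return a size in bytes from 'N' or from 'A B' (meaning A * B)."""
    fields = text.split()
    if len(fields) == 1:
        return int(fields[0])
    if len(fields) == 2:
        # e.g. "2048 512": 2048 blocks of 512 bytes each
        return int(fields[0]) * int(fields[1])
    raise ValueError("expected one or two numbers, got %r" % text)
```

The pair form lets a wrapper pass along a block count and a block size without doing the arithmetic itself.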
The syntax of `estimate-parse' is identical to that of `estimate'.
Both `estimate' and `estimate-parse' can output the word `KILL', after printing the estimate. In this case, Amanda will send a SIGTERM signal to the process group of the `estimate' process. If it does not die within a few seconds, a SIGKILL will be issued.
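The termination sequence might be implemented in the driver roughly as below. This is a sketch for POSIX systems only: the function name and the 5-second grace period are our assumptions, and the child is assumed to have been started in its own process group (for example with `start_new_session=True`), since otherwise the signal would also reach the driver itself.

```python
import os
import signal
import subprocess


def stop_process_group(proc, grace=5.0):
    """Send SIGTERM to proc's process group; escalate to SIGKILL after `grace` seconds.

    `proc` is a subprocess.Popen object leading its own process group.
    Returns the child's return code.
    """
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # Still alive after the grace period: force it.
        os.killpg(pgid, signal.SIGKILL)
        proc.wait()
    return proc.returncode
```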
If `estimate' or `estimate-parse' succeeds, it should exit 0; otherwise it should exit 1, except for the already listed cases of exit status 3.
The syntax of `backup' is the same as that of `estimate'. The backup image should be written to standard output, whereas stderr should be used for the user-oriented output of the backup program and other messages.
If the option `backup-direct' is set, then the `backup' command should write to stderr a formatted-output-backup.
If the option `backup-parse' is set, then the `backup' command should write to stderr the information needed by the `backup-parse' command, which should edit its input so that it prints a formatted-output-backup to standard output.
If the option `no-record' is set, then the `backup' command should not modify its state file (e.g., dump should not modify /etc/dumpdates).
The syntax of `backup-parse' is identical to that of `backup'.
The syntax of the formatted-output-backup is as follows: all lines should start with either `| ' for normal output, `? ' for strange output or `& ' for error output. If the wrapper can determine the total backup size from the output of the backup program, it should print a line starting with `# ', followed by the total backup size in bytes or by a pair of numbers that, multiplied, yield the total backup size; this number will be used for a consistency check.
The option `index-direct' should cause the `backup' command to output the index directly to file descriptor 3. The option `index-parse' should cause the `backup-parse' command to output the index directly to file descriptor 3. The syntax of the index file is described in the next section.
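A reader for this format could be sketched as follows. The names `classify` and `total_size` are hypothetical, and we assume that a pair of numbers on the `# ' line is separated by whitespace.

```python
PREFIXES = {"| ": "normal", "? ": "strange", "& ": "error", "# ": "size"}


def classify(line):
    """Return (kind, text) for one formatted-output-backup line."""
    kind = PREFIXES.get(line[:2])
    if kind is None:
        raise ValueError("line has no known prefix: %r" % line)
    return kind, line[2:].rstrip("\n")


def total_size(lines):
    """Return the size in bytes from the first `# ' line, or None if there is none."""
    for line in lines:
        kind, text = classify(line)
        if kind == "size":
            factors = [int(field) for field in text.split()]
            if len(factors) == 1:
                return factors[0]
            if len(factors) == 2:
                return factors[0] * factors[1]
            raise ValueError("malformed size line: %r" % line)
    return None
```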
The syntax of the `index-from-output' and `index-from-image' commands is identical to that of `backup'. They are fed the backup output or image, and they must write a list of files and directories, one per line, to standard output. Directories must be identified by a trailing `/'.
After the file name and a blank space, any additional information about the file or directory, such as permission data, size, etc, can be added. For this reason, blanks and backslashes within filenames should be quoted with backslashes. Linefeeds should be represented as `\n', although it is not always possible to distinguish linefeeds in the middle of filenames from ones that separate one file from another, in the output of, say `restore -t'. It is not clear whether we should also support quoting mechanisms such as `\xHH', `\OOO' or `\uXXXX'.
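The quoting rule can be sketched as below. Only the space character is treated as a blank here, which is our reading of the text, and both function names are hypothetical. None of the undecided `\xHH', `\OOO' or `\uXXXX' forms are produced.

```python
def quote_name(name):
    """Quote blanks and backslashes with a backslash; write linefeeds as \\n."""
    quoted = []
    for ch in name:
        if ch == "\n":
            quoted.append("\\n")
        elif ch in (" ", "\\"):
            quoted.append("\\" + ch)
        else:
            quoted.append(ch)
    return "".join(quoted)


def index_line(name, is_dir=False, extra=""):
    """Build one index line: quoted name, `/' for directories, optional extra data."""
    line = quote_name(name)
    if is_dir and not line.endswith("/"):
        line += "/"
    if extra:
        # The unquoted blank separates the name from the additional information.
        line += " " + extra
    return line
```

Because a literal backslash is itself quoted, a backslash followed by `n' in a file name comes out as `\\n' and cannot be confused with an encoded linefeed.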
This command must be followed by a valid backup or restore command, and it should print a shell-command that would produce an equivalent result, i.e., that would perform the backup to standard output, or that would restore the whole filesystem reading from standard input. This command is to be included in the header of backup images, to ease crash-recovery.
Well, that's all. Drop us a note at the amanda-hackers mailing list if you have suggestions to improve this document and/or the API. Some help on its implementation would be welcome, too.