Repository Creation and Configuration

Creating a Subversion repository is an incredibly simple task. The svnadmin utility, provided with Subversion, has a subcommand for doing just that. To create a new repository, just run:

$ svnadmin create /path/to/repos

This creates a new repository in the directory /path/to/repos. This new repository begins life at revision 0, which is defined to consist of nothing but the top-level root (/) filesystem directory. Initially, revision 0 also has a single revision property, svn:date, set to the time at which the repository was created.

In Subversion 1.2, a repository is created with an FSFS back-end by default (see the section called “Repository Data Stores”). The back-end can be explicitly chosen with the --fs-type argument:

$ svnadmin create --fs-type fsfs /path/to/repos
$ svnadmin create --fs-type bdb /path/to/other/repos

Warning

Do not create a Berkeley DB repository on a network share—it cannot exist on a remote filesystem such as NFS, AFS, or Windows SMB. Berkeley DB requires that the underlying filesystem implement strict POSIX locking semantics, and more importantly, the ability to map files directly into process memory. Almost no network filesystems provide these features. If you attempt to use Berkeley DB on a network share, the results are unpredictable—you may see mysterious errors right away, or it may be months before you discover that your repository database is subtly corrupted.

If you need multiple computers to access the repository, create an FSFS repository on the network share, not a Berkeley DB repository. Or better yet, set up a real server process (such as Apache or svnserve), store the repository on a local filesystem which the server can access, and make the repository available over a network. Chapter 6, Server Configuration covers this process in detail.

You may have noticed that the path argument to svnadmin was just a regular filesystem path and not a URL like the svn client program uses when referring to repositories. Both svnadmin and svnlook are considered server-side utilities: they are used on the machine where the repository resides to examine or modify aspects of the repository, and are in fact unable to perform tasks across a network. A common mistake made by Subversion newcomers is trying to pass URLs (even “local” file: ones) to these two programs.

So, after you've run the svnadmin create command, you have a shiny new Subversion repository in its own directory. Let's take a peek at what is actually created inside that subdirectory.

$ ls repos
conf/  dav/  db/  format  hooks/  locks/  README.txt

With the exception of the README.txt and format files, the repository directory is a collection of subdirectories. As in other areas of the Subversion design, modularity is given high regard, and hierarchical organization is preferred to cluttered chaos. Here is a brief description of all of the items you see in your new repository directory:

conf

A directory containing repository configuration files.

dav

A directory provided to Apache and mod_dav_svn for their private housekeeping data.

db

Where all of your versioned data resides. This directory is either a Berkeley DB environment (full of DB tables and other things), or is an FSFS environment containing revision files.

format

A file whose contents are a single integer value that dictates the version number of the repository layout.

hooks

A directory full of hook script templates (and hook scripts themselves, once you've installed some).

locks

A directory for Subversion's repository locking data, used for tracking accessors to the repository.

README.txt

A file which merely informs its readers that they are looking at a Subversion repository.
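As a read-only illustration of the layout just described, here is a minimal sketch that reports a repository's layout version and back-end type. The describe_repos name is hypothetical, and the sketch assumes that the back-end name is recorded in a db/fs-type file, which this section does not describe; treat that detail as an assumption and check it against your own repository.

```shell
#!/bin/sh
# describe_repos REPOS: print the layout version and back-end of a repository.
# Read-only: it never modifies anything inside the repository directory.
describe_repos() {
    repos=$1
    # A repository has a top-level 'format' file and a 'db' directory.
    if [ ! -f "$repos/format" ] || [ ! -d "$repos/db" ]; then
        echo "$repos does not look like a Subversion repository" >&2
        return 1
    fi
    echo "layout version: $(cat "$repos/format")"
    # Assumption: db/fs-type names the back-end ("fsfs" or "bdb").
    if [ -f "$repos/db/fs-type" ]; then
        echo "back-end: $(cat "$repos/db/fs-type")"
    else
        echo "back-end: unknown"
    fi
}
```

You would call it as describe_repos /path/to/repos. Because it only reads files, it stays within the advice not to tamper with the repository by hand.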

In general, you shouldn't tamper with your repository “by hand”. The svnadmin tool should be sufficient for any changes necessary to your repository, or you can look to third-party tools (such as Berkeley DB's tool suite) for tweaking relevant subsections of the repository. Some exceptions exist, though, and we'll cover those here.

Hook Scripts

A hook is a program triggered by some repository event, such as the creation of a new revision or the modification of an unversioned property. Each hook is handed enough information to tell what that event is, what target(s) it's operating on, and the username of the person who triggered the event. Depending on the hook's output or return status, the hook program may continue the action, stop it, or suspend it in some way.

The hooks subdirectory is, by default, filled with templates for various repository hooks.

$ ls repos/hooks/
post-commit.tmpl          post-unlock.tmpl          pre-revprop-change.tmpl
post-lock.tmpl            pre-commit.tmpl           pre-unlock.tmpl
post-revprop-change.tmpl  pre-lock.tmpl             start-commit.tmpl

There is one template for each hook that the Subversion repository implements, and by examining the contents of those template scripts, you can see what triggers each such script to run and what data is passed to that script. Also present in many of these templates are examples of how one might use that script, in conjunction with other Subversion-supplied programs, to perform common useful tasks. To actually install a working hook, you need only place an executable program or script in the repos/hooks directory under the name of the hook (such as start-commit or post-commit).

On Unix platforms, this means supplying a script or program (which could be a shell script, a Python program, a compiled C binary, or any number of other things) named exactly like the name of the hook. Of course, the template files are present for more than just informational purposes—the easiest way to install a hook on Unix platforms is to simply copy the appropriate template file to a new file that lacks the .tmpl extension, customize the hook's contents, and ensure that the script is executable. Windows, however, uses file extensions to determine whether or not a program is executable, so you would need to supply a program whose basename is the name of the hook, and whose extension is one of the special extensions recognized by Windows for executable programs, such as .exe or .com for programs, and .bat for batch files.
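On Unix platforms, the copy-and-make-executable steps can be sketched as a small helper. The install_hook name is hypothetical, and the refusal to overwrite an existing hook is a design choice of this sketch, not something Subversion requires.

```shell
#!/bin/sh
# install_hook REPOS HOOK: copy hooks/HOOK.tmpl to hooks/HOOK and make it
# executable. Refuses to overwrite a hook that is already installed.
install_hook() {
    repos=$1
    hook=$2
    template="$repos/hooks/$hook.tmpl"
    target="$repos/hooks/$hook"
    if [ ! -f "$template" ]; then
        echo "no template for hook '$hook'" >&2
        return 1
    fi
    if [ -e "$target" ]; then
        echo "hook '$hook' is already installed" >&2
        return 1
    fi
    cp "$template" "$target" && chmod +x "$target"
}
```

After running something like install_hook /path/to/repos pre-commit, you would still need to edit the new file to customize the hook's contents.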

Tip

For security reasons, the Subversion repository executes hook scripts with an empty environment—that is, no environment variables are set at all, not even $PATH or %PATH%. Because of this, a lot of administrators are baffled when their hook script runs fine by hand, but doesn't work when run by Subversion. Be sure to explicitly set environment variables in your hook and/or use absolute paths to programs.
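One way to reproduce this situation by hand is to run the hook yourself with a cleared environment. The sketch below uses the standard env -i command for that; run_hook_bare is a hypothetical helper name, and the result only approximates what Subversion does when it launches a hook.

```shell
#!/bin/sh
# run_hook_bare HOOK [ARGS...]: run a hook program with no environment
# variables set, to reproduce failures that only appear under Subversion.
# Give the hook's full path, since no PATH is available to search.
run_hook_bare() {
    env -i "$@"
}
```

For example, run_hook_bare /path/to/repos/hooks/post-commit /path/to/repos 42 runs the post-commit hook as if revision 42 had just been committed. If the hook fails this way but works when run normally, a missing environment variable or a relative program path is a likely cause.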

There are nine hooks implemented by the Subversion repository:

start-commit

This is run before the commit transaction is even created. It is typically used to decide if the user has commit privileges at all. The repository passes two arguments to this program: the path to the repository, and the username that is attempting the commit. If the program returns a non-zero exit value, the commit is stopped before the transaction is even created. If the hook program writes data to stderr, it will be marshalled back to the client.
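A minimal sketch of a start-commit hook follows, assuming a plain text allow-list with one username per line. The committers.txt file and its location are hypothetical, as is the user_may_commit helper; the argument guard simply keeps the hook body from running unless the hook arguments are actually supplied.

```shell
#!/bin/sh
# Hypothetical allow-list: one username per line.
ALLOWED_USERS=/path/to/repos/conf/committers.txt

# user_may_commit USER FILE: succeed if USER is non-empty and appears
# as a whole line in FILE.
user_may_commit() {
    [ -n "$1" ] && grep -qxF -e "$1" "$2" 2>/dev/null
}

# Subversion calls start-commit with the repository path and the username.
if [ "$#" -ge 2 ]; then
    REPOS=$1
    user=$2
    if ! user_may_commit "$user" "$ALLOWED_USERS"; then
        # Anything written to stderr is sent back to the client.
        echo "User '$user' is not allowed to commit to $REPOS." >&2
        exit 1
    fi
    exit 0
fi
```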

pre-commit

This is run when the transaction is complete, but before it is committed. Typically, this hook is used to protect against commits that are disallowed due to content or location (for example, your site might require that all commits to a certain branch include a ticket number from the bug tracker, or that the incoming log message is non-empty). The repository passes two arguments to this program: the path to the repository, and the name of the transaction being committed. If the program returns a non-zero exit value, the commit is aborted and the transaction is removed. If the hook program writes data to stderr, it will be marshalled back to the client.

The Subversion distribution includes some access control scripts (located in the tools/hook-scripts directory of the Subversion source tree) that can be called from pre-commit to implement fine-grained write-access control. Another option is to use the mod_authz_svn Apache httpd module, which provides both read and write access control on individual directories (see the section called “Per-Directory Access Control”). In a future version of Subversion, we plan to implement access control lists (ACLs) directly in the filesystem.
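To illustrate the non-empty log message example, here is a minimal pre-commit sketch. It uses svnlook log -t to read the log message of the pending transaction; the /usr/bin/svnlook location and the has_log_text helper name are assumptions, and the rule applied (at least one letter or digit) is only one possible policy.

```shell
#!/bin/sh
# Absolute path, because hooks run with an empty environment.
# The location is an assumption; adjust it for your installation.
SVNLOOK=/usr/bin/svnlook

# has_log_text: succeed if standard input contains at least one
# letter or digit, i.e. the log message is not effectively empty.
has_log_text() {
    grep -q '[[:alnum:]]'
}

# Subversion calls pre-commit with the repository path and transaction name.
if [ "$#" -ge 2 ]; then
    REPOS=$1
    TXN=$2
    if ! "$SVNLOOK" log -t "$TXN" "$REPOS" | has_log_text; then
        echo "Commit rejected: please supply a log message." >&2
        exit 1
    fi
    exit 0
fi
```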

post-commit

This is run after the transaction is committed, and a new revision is created. Most people use this hook to send out descriptive emails about the commit or to make a backup of the repository. The repository passes two arguments to this program: the path to the repository, and the new revision number that was created. The exit code of the program is ignored.

The Subversion distribution includes mailer.py and commit-email.pl scripts (located in the tools/hook-scripts/ directory of the Subversion source tree) that can be used to send email with (and/or append to a log file) a description of a given commit. This mail contains a list of the paths that were changed, the log message attached to the commit, the author and date of the commit, as well as a GNU diff-style display of the changes made to the various versioned files as part of the commit.

Another useful tool provided by Subversion is the hot-backup.py script (located in the tools/backup/ directory of the Subversion source tree). This script performs hot backups of your Subversion repository (a feature supported by the Berkeley DB database back-end), and can be used to make a per-commit snapshot of your repository for archival or emergency recovery purposes.

pre-revprop-change

Because Subversion's revision properties are not versioned, making modifications to such a property (for example, the svn:log commit message property) will overwrite the previous value of that property forever. Since data can be potentially lost here, Subversion supplies this hook (and its counterpart, post-revprop-change) so that repository administrators can keep records of changes to these items using some external means if they so desire. As a precaution against losing unversioned property data, Subversion clients will not be allowed to remotely modify revision properties at all unless this hook is implemented for your repository.

This hook runs just before such a modification is made to the repository. The repository passes four arguments to this hook: the path to the repository, the revision on which the to-be-modified property exists, the authenticated username of the person making the change, and the name of the property itself.
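A common policy is to permit changes to the log message only. The sketch below assumes that, as with the other pre- hooks, a non-zero exit status rejects the change; revprop_change_allowed is a hypothetical helper name.

```shell
#!/bin/sh
# revprop_change_allowed PROPNAME: succeed only for the log message property.
revprop_change_allowed() {
    [ "$1" = "svn:log" ]
}

# Arguments: repository path, revision, username, property name.
if [ "$#" -ge 4 ]; then
    REPOS=$1
    REV=$2
    user=$3
    PROPNAME=$4
    if ! revprop_change_allowed "$PROPNAME"; then
        echo "Changing $PROPNAME on r$REV is not permitted." >&2
        exit 1
    fi
    exit 0
fi
```

Since the old value is overwritten forever, you may also want such a hook to record the previous value somewhere before allowing the change.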

post-revprop-change

As mentioned earlier, this hook is the counterpart of the pre-revprop-change hook. In fact, for the sake of paranoia this script will not run unless the pre-revprop-change hook exists. When both of these hooks are present, the post-revprop-change hook runs just after a revision property has been changed, and is typically used to send an email containing the new value of the changed property. The repository passes four arguments to this hook: the path to the repository, the revision on which the property exists, the authenticated username of the person making the change, and the name of the property itself.

The Subversion distribution includes a propchange-email.pl script (located in the tools/hook-scripts/ directory of the Subversion source tree) that can be used to send email with (and/or append to a log file) the details of a revision property change. This mail contains the revision and name of the changed property, the user who made the change, and the new property value.

pre-lock

This hook runs whenever someone attempts to lock a file. It can be used to prevent locks altogether, or to create a more complex policy specifying exactly which users are allowed to lock particular paths. If the hook notices a pre-existing lock, then it can also decide whether a user is allowed to “steal” the existing lock. The repository passes three arguments to the hook: the path to the repository, the path being locked, and the user attempting to perform the lock. If the program returns a non-zero exit value, the lock action is aborted and anything printed to stderr is marshalled back to the client.

post-lock

This hook runs after a path is locked. The locked path is passed to the hook's stdin, and the hook also receives two arguments: the path to the repository, and the user who performed the lock. The hook is then free to send email notification or record the event in any way it chooses. Because the lock already happened, the output of the hook is ignored.
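As a sketch of recording the event, the following post-lock hook appends one line per locked path to a log file. The log file location and the record_locks helper are hypothetical, and the sketch assumes the paths arrive on stdin one per line.

```shell
#!/bin/sh
# Hypothetical log file; the user running the hook must be able to write it.
LOCK_LOG=/var/log/svn-locks.log

# record_locks USER LOGFILE: append one line per path read from stdin.
# The '|| [ -n ... ]' part keeps a final line that lacks a trailing newline.
record_locks() {
    while IFS= read -r locked_path || [ -n "$locked_path" ]; do
        printf '%s locked %s\n' "$1" "$locked_path" >> "$2"
    done
}

# Arguments: repository path and the user who performed the lock.
if [ "$#" -ge 2 ]; then
    REPOS=$1
    user=$2
    record_locks "$user" "$LOCK_LOG"
fi
```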

pre-unlock

This hook runs whenever someone attempts to remove a lock on a file. It can be used to create policies that specify which users are allowed to unlock particular paths. It's particularly important for determining policies about lock breakage. If user A locks a file, is user B allowed to break the lock? What if the lock is more than a week old? These sorts of things can be decided and enforced by the hook. The repository passes three arguments to the hook: the path to the repository, the path being unlocked, and the user attempting to remove the lock. If the program returns a non-zero exit value, the unlock action is aborted and anything printed to stderr is marshalled back to the client.
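One possible lock-breakage policy is that only the lock's owner may remove it. The sketch below assumes that svnlook lock prints an "Owner: " line for a locked path, and that svnlook lives at /usr/bin/svnlook; both are assumptions to verify on your system, and lock_owner is a hypothetical helper.

```shell
#!/bin/sh
# Absolute path, because hooks run with an empty environment (assumed location).
SVNLOOK=/usr/bin/svnlook

# lock_owner: print the owner named on the first "Owner: " line of
# 'svnlook lock' output read from stdin (later matches are ignored,
# since the lock comment could contain similar text).
lock_owner() {
    awk '!found && /^Owner: / { print substr($0, 8); found = 1 }'
}

# Arguments: repository path, path being unlocked, requesting user.
if [ "$#" -ge 3 ]; then
    REPOS=$1
    LOCKED_PATH=$2
    user=$3
    owner=$("$SVNLOOK" lock "$REPOS" "$LOCKED_PATH" | lock_owner)
    # Reject only when a lock owner exists and differs from the requester.
    if [ -n "$owner" ] && [ "$owner" != "$user" ]; then
        echo "$LOCKED_PATH is locked by $owner; only $owner may unlock it." >&2
        exit 1
    fi
    exit 0
fi
```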

post-unlock

This hook runs after a path is unlocked. The unlocked path is passed to the hook's stdin, and the hook also receives two arguments: the path to the repository, and the user who removed the lock. The hook is then free to send email notification or record the event in any way it chooses. Because the lock removal already happened, the output of the hook is ignored.

Warning

Do not attempt to modify the transaction using hook scripts. A common example of this would be to automatically set properties such as svn:eol-style or svn:mime-type during the commit. While this might seem like a good idea, it causes problems. The main problem is that the client does not know about the change made by the hook script, and there is no way to inform the client that it is out-of-date. This inconsistency can lead to surprising and unexpected behavior.

Instead of attempting to modify the transaction, it is much better to check the transaction in the pre-commit hook and reject the commit if it does not meet the desired requirements.

Subversion will attempt to execute hooks as the same user who owns the process which is accessing the Subversion repository. In most cases, the repository is being accessed via the Apache HTTP server and mod_dav_svn, so this user is the same user that Apache runs as. The hooks themselves will need to be configured with OS-level permissions that allow that user to execute them. Also, this means that any files or programs (including the Subversion repository itself) accessed directly or indirectly by the hook will be accessed as the same user. In other words, be alert to potential permission-related problems that could prevent the hook from performing the tasks you've written it to perform.

Berkeley DB Configuration

A Berkeley DB environment is an encapsulation of one or more databases, log files, region files and configuration files. The Berkeley DB environment has its own set of default configuration values for things like the number of database locks allowed to be taken out at any given time, or the maximum size of the journaling log files, etc. Subversion's filesystem code additionally chooses default values for some of the Berkeley DB configuration options. However, sometimes your particular repository, with its unique collection of data and access patterns, might require a different set of configuration option values.

The folks at Sleepycat (the producers of Berkeley DB) understand that different databases have different requirements, and so they have provided a mechanism for overriding at runtime many of the configuration values for the Berkeley DB environment. Berkeley DB checks for the presence of a file named DB_CONFIG in each environment directory, and parses the options found in that file for use with that particular Berkeley DB environment.

The Berkeley DB configuration file for your repository is located in the db environment directory, at repos/db/DB_CONFIG. Subversion itself creates this file when it creates the rest of the repository. The file initially contains some default options, as well as pointers to the Berkeley DB online documentation so you can read about what those options do. Of course, you are free to add any of the supported Berkeley DB options to your DB_CONFIG file. Just be aware that while Subversion never attempts to read or interpret the contents of the file, and makes no use of the option settings in it, you'll want to avoid any configuration changes that may cause Berkeley DB to behave in a fashion that is unexpected by the rest of the Subversion code. Also, changes made to DB_CONFIG won't take effect until you recover the database environment (using svnadmin recover).
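Before changing an option, it can help to see what is currently set. The following sketch prints the value given for one option in a DB_CONFIG file; db_config_value is a hypothetical helper, and the option names used in the usage note (set_lk_max_locks, set_lg_max) are ordinary Berkeley DB settings chosen only as examples, not necessarily the ones present in your file.

```shell
#!/bin/sh
# db_config_value FILE OPTION: print the value given for OPTION in a
# DB_CONFIG file. Lines whose first word is not OPTION (including
# comment lines starting with '#') are ignored.
db_config_value() {
    awk -v opt="$2" '$1 == opt { $1 = ""; sub(/^[ \t]+/, ""); print }' "$1"
}
```

For example, db_config_value /path/to/repos/db/DB_CONFIG set_lk_max_locks prints the configured lock limit, if any. This only reads the file; after editing DB_CONFIG yourself, remember to run svnadmin recover /path/to/repos so that the change takes effect.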