6. Slony-I Maintenance

Slony-I actually does a lot of its necessary maintenance itself, in a "cleanup" thread that periodically trims old event and log data and vacuums the tables it uses.

6.1. Watchdogs: Keeping Slons Running

There are a couple of "watchdog" scripts available that monitor things, and restart the slon processes should they happen to die for some reason, such as a network "glitch" that causes loss of connectivity.

You might want to run them...
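
In essence, such a watchdog just checks periodically whether the slon for a node is still alive and restarts it if not. The following is only a minimal sketch of that idea, not one of the scripts shipped in tools; the cluster name, connection string, and log path are illustrative.

    #!/bin/sh
    # Minimal watchdog sketch: restart the slon for one node if it is not running.
    # (Cluster name, connection info, and log path are examples only.)
    while true; do
        if ! pgrep -f "slon mycluster" >/dev/null; then
            slon mycluster "dbname=mydb host=node1.example.com user=slony" \
                >> /var/log/slony1/node1.log 2>&1 &
        fi
        sleep 60
    done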

The "best new way" of managing slon processes is via the combination of Section 19.2, which creates a configuration file for each node in a cluster, and Section 19.3, which uses those configuration files.

This approach is preferable to older "watchdog" systems in that you can very precisely "nail down," in each config file, the exact desired configuration for each node, and not need to be concerned with what options the watchdog script may or may not give you. This is particularly important if you are using log shipping, where forgetting the -a option could ruin your log-shipped node, and thereby your whole day.
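
For illustration, a per-node configuration file of the sort described there might look something like the following; the cluster name, connection info, and paths are examples only, while cluster_name, conn_info, pid_file, and archive_dir are standard slon run-time configuration parameters.

    # /etc/slony1/node1.conf -- illustrative per-node slon configuration
    cluster_name='mycluster'
    conn_info='dbname=mydb host=node1.example.com user=slony'
    pid_file='/var/run/slony1/node1.pid'
    # On a node feeding log shipping, archive_dir plays the role of the -a option:
    # archive_dir='/var/lib/slony1/archive/node1'

The slon for that node is then started against this file with slon -f /etc/slony1/node1.conf.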

6.2. Parallel to Watchdog: generate_syncs.sh

A new script for Slony-I 1.1 is generate_syncs.sh, which addresses the following kind of situation.

Suppose you have some possibly-flaky server where the slon daemon might not run all the time; you might return from a weekend away only to discover the following situation.

On Friday night, something went "bump" and, while the database came back up, none of the slon daemons survived. Your online application then saw nearly three days' worth of reasonably heavy transaction load.

When you restart slon on Monday, it hasn't done a SYNC on the master since Friday, so that the next "SYNC set" comprises all of the updates between Friday and Monday. Yuck.

If you run generate_syncs.sh as a cron job every 20 minutes, it will force in a periodic SYNC on the origin, which means that between Friday and Monday, the numerous updates are split into more than 100 syncs, which can be applied incrementally, making the cleanup a lot less unpleasant.
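
For example, a crontab entry along the following lines would accomplish this; the installation and log paths are illustrative, and you should check the script itself for any arguments or environment it expects.

    # Run generate_syncs.sh every 20 minutes (paths are examples only)
    */20 * * * * /usr/local/slony1/tools/generate_syncs.sh >> /var/log/slony1/generate_syncs.log 2>&1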

Note that if SYNCs are running regularly, this script won't bother doing anything.

6.3. Testing Slony-I State

In the tools directory, you may find scripts called test_slony_state.pl and test_slony_state-dbi.pl. One uses the Perl/DBI interface; the other uses the Pg interface.

Both do essentially the same thing: they connect to a Slony-I node (you can pick any one) and, from that, determine all the nodes in the cluster. They then run a series of queries (read-only, so this should be quite safe to run) against the various Slony-I tables, looking for a variety of conditions suggestive of problems.

Running this once an hour or once a day can help you detect symptoms of problems early, before they lead to performance degradation.
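
An invocation might look roughly like the following, run by hand or from an hourly cron job; the path and the option names shown are assumptions for illustration, so run the script with no arguments or read its header to see the options it actually accepts.

    # Illustrative invocation; the flags shown are assumptions, not a
    # documented interface -- consult the script itself.
    /usr/local/slony1/tools/test_slony_state-dbi.pl --host=node1.example.com \
        --database=mydb --cluster=mycluster --user=slony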

6.4. Replication Test Scripts

In the tools directory you will find four scripts that can be used to monitor Slony-I instances.

6.5. Other Replication Tests

The methodology of the previous section is designed with a view to minimizing the cost of submitting replication test queries; on a busy cluster supporting hundreds of users, the cost of running a few queries is likely to be pretty irrelevant, while the setup cost of configuring the tables and data injectors is fairly high.

Three other methods for analyzing the state of replication have stood out.
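
One commonly used technique of this kind is a "heartbeat" table included in a replication set: a single row is stamped with the current time on the origin, and the staleness of that row on a subscriber gives a rough measure of replication lag. The sketch below assumes a pre-existing, replicated single-row table; the table, database, and host names are illustrative.

    # On the origin, e.g. from cron every minute, stamp the heartbeat row:
    psql -h origin.example.com -d mydb -c \
        "UPDATE replication_heartbeat SET updated_at = now();"

    # On a subscriber, the age of that row approximates replication lag:
    psql -h subscriber.example.com -d mydb -c \
        "SELECT now() - updated_at AS lag FROM replication_heartbeat;"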

6.6. Log Files

slon daemons generate some more-or-less verbose log files, depending on what debugging level is turned on, and you will want some scheme for capturing and managing them.
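
For example, one simple scheme (paths and configuration are illustrative) is to pipe slon's output through a log rotator such as Apache's rotatelogs, so that each node gets dated, bounded log files:

    # Start the slon for one node, rotating its log daily (86400 seconds);
    # configuration and log paths are examples only.
    slon -f /etc/slony1/node1.conf 2>&1 | \
        rotatelogs /var/log/slony1/node1_%Y%m%d.log 86400 &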