You may discover that you have missed replicating things that you wish you were replicating.
This can generally be fairly easily remedied. This section attempts to provide a "task-oriented" view of how to use Slony-I; in effect, to answer the question "How do I do X with Slony-I?", for various values of X.
You cannot directly use slonik SET ADD TABLE or SET ADD SEQUENCE in order to add tables and sequences to a replication set that is presently replicating; you must instead create a new replication set. Once it is identically subscribed (e.g. - the set of providers and subscribers is entirely identical to that for the set it is to merge with), the sets may be merged together using MERGE SET.
Up to and including 1.0.2, there was a potential problem where if MERGE SET is issued while other subscription-related events are pending, it is possible for things to get pretty confused on the nodes where other things were pending. This problem was resolved in 1.0.5. Up until 1.2.1, there was still a problem where MERGE SET could be requested before all the subscriptions were complete, which would "muss things up" on nodes where subscription activity was still under way.
Note that if you add nodes, you will need to add both STORE PATH statements to indicate how nodes communicate with one another, and STORE LISTEN statements to configure the "communications network" that results from that. See Section 9 for more details on the latter.
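As a minimal sketch (the cluster name, node numbers, and DSNs here are all hypothetical), the slonik statements for connecting a new node 3 to node 1 might look like:

```
cluster name = testcluster;
node 1 admin conninfo = 'dbname=mydb host=host1 user=slony';
node 3 admin conninfo = 'dbname=mydb host=host3 user=slony';

# Paths in both directions, so each slon can reach the other node
store path (server = 1, client = 3, conninfo = 'dbname=mydb host=host1 user=slony');
store path (server = 3, client = 1, conninfo = 'dbname=mydb host=host3 user=slony');

# On versions before 1.1, listen entries must be added by hand;
# later versions generate them automatically
store listen (origin = 1, receiver = 3, provider = 1);
store listen (origin = 3, receiver = 1, provider = 3);
```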
It is suggested that you be very deliberate when adding such things. For instance, submitting multiple subscription requests for a particular set in one slonik script often turns out quite badly. If it is truly necessary to automate this, you'll probably want to submit WAIT FOR EVENT requests in between subscription requests in order that the slonik script wait for one subscription to complete processing before requesting the next one.
But in general, it is likely to be easier to cope with complex node reconfigurations by making sure that one change has been successfully processed before going on to the next. It's way easier to fix one thing that has broken than to piece things together after the interaction of five things that have all broken.
Here are a set of "recipes" for how to do various sorts of modifications to replication configuration:
Slony-I does not allow you to add a table to a replication set that is already being replicated. In principle, it would certainly be possible; what would happen is that the SET_ADD_TABLE event would lead to the relevant code from the SUBSCRIBE_SET event being invoked to initialize the table. That would, regrettably, significantly complicate the logic of all of these components, so this is not permitted.
Instead, what you must do is this:
Add the new table on each node.
In principle, EXECUTE SCRIPT could be used for this, but the fact that this leads to Locking Issues and requires altering all tables in some existing replication set, on all nodes, makes EXECUTE SCRIPT an unattractive approach on a busy system. This breaks the Slony-I feature that you "don't have to interrupt normal activity to introduce replication."
Instead, you may add the table via psql on each node.
Create a new replication set CREATE SET
Add the table to the new set SET ADD TABLE
Request subscription SUBSCRIBE SET for this new set. If there are several nodes, you will need to SUBSCRIBE SET once for each node that should subscribe.
If you wish to know, deterministically, that each subscription has completed, you'll need to submit the following sort of slonik script for each subscription:
SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 2);
WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 1);
SYNC (ID = 1);
WAIT FOR EVENT (ORIGIN = 1, CONFIRMED = 2);
Once the subscriptions have all been set up so that the new set has an identical set of subscriptions to the old set, you can merge the new set in alongside the old one via MERGE SET
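Pulling the steps above together, a sketch of the whole recipe in slonik (the set IDs, table name, and DSNs are hypothetical, and the new table is assumed to already exist on both nodes) might be:

```
cluster name = testcluster;
node 1 admin conninfo = 'dbname=mydb host=host1 user=slony';
node 2 admin conninfo = 'dbname=mydb host=host2 user=slony';

create set (id = 2, origin = 1, comment = 'temporary set for new table');
set add table (set id = 2, origin = 1, id = 10,
               fully qualified name = 'public.my_new_table');
subscribe set (id = 2, provider = 1, receiver = 2);
wait for event (origin = 2, confirmed = 1);
sync (id = 1);
wait for event (origin = 1, confirmed = 2);

# Once set 2's subscriptions exactly match set 1's, fold it in
merge set (id = 1, add id = 2, origin = 1);
```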
This also answers the question "How do I rename columns on a replicated table?", and, more generally, other questions to the effect of "How do I modify the definitions of replicated tables?"
If you change the "shape" of a replicated table, this needs to take place at exactly the same point in all of the "transaction streams" on all nodes that are subscribed to the set containing the table.
Thus, the way to do this is to construct an SQL script consisting of the DDL changes, and then submit that script to all of the nodes via the Slonik command EXECUTE SCRIPT.
There are a number of "sharp edges" to note...
You absolutely must not include transaction control commands, particularly BEGIN and COMMIT, inside these DDL scripts. Slony-I wraps DDL scripts with a BEGIN/COMMIT pair; adding extra transaction control will mean that parts of the DDL will commit outside the control of Slony-I.
Before version 1.2, it was necessary to be exceedingly restrictive about what you tried to process using EXECUTE SCRIPT.
You could not have anything 'quoted' in the script, as this would not be stored and forwarded properly. As of 1.2, quoting is now handled properly.
If you submitted a series of DDL statements, the later ones could not make reference to objects created in the earlier ones, because the entire set of statements was submitted as a single query, whose query plan was based on the state of the database at the beginning, before any modifications had been made. As of 1.2, if there are 12 SQL statements, each is submitted individually, so that alter table x add column c1 integer; may now be followed by alter table x alter column c1 set not null;
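For instance, a hedged sketch of submitting a DDL script (the cluster name, file name, and set ID are hypothetical; the file itself must not contain BEGIN or COMMIT):

```
cluster name = testcluster;
node 1 admin conninfo = 'dbname=mydb host=host1 user=slony';

# /tmp/ddl_changes.sql holds the DDL to apply at the same point
# in the transaction stream on every subscribed node
execute script (set id = 1,
                filename = '/tmp/ddl_changes.sql',
                event node = 1);
```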
You will want to remove the various Slony-I components connected to the database(s).
We will just consider, for now, doing this to one node. If you have multiple nodes, you will have to repeat this as many times as necessary.
Components to be Removed:
Log Triggers / Update Denial Triggers
The "cluster" schema containing Slony-I tables indicating the state of the node as well as various stored functions
slon process that manages the node
Optionally, the SQL and pl/pgsql scripts and Slony-I binaries that are part of the PostgreSQL build. (Of course, this would make it challenging to restart replication; it is unlikely that you truly need to do this...)
How To Conveniently Handle Removal
You may use the Slonik DROP NODE command to remove the node from the cluster. This will lead to the triggers and everything in the cluster schema being dropped from the node. The slon process will automatically die off.
In the case of a failed node (where you used FAILOVER to switch to another node), you may need to use UNINSTALL NODE to drop out the triggers and schema and functions.
If the node failed due to some dramatic hardware failure (e.g. disk drives caught fire), there may not be a database left on the failed node; it would only be expected to survive if the failure was one involving a network failure where the database was fine, but you were forced to drop it from replication due to (say) some persistent network outage.
If the above things work out particularly badly, you could submit the SQL command DROP SCHEMA "_ClusterName" CASCADE;, which will drop out Slony-I functions, tables, and triggers alike. That is generally less suitable than UNINSTALL NODE, because UNINSTALL NODE not only drops the schema and its contents, but also removes any columns previously added using TABLE ADD KEY.
Things are not fundamentally different whether you are adding a brand new, fresh node, or if you had previously dropped a node and are recreating it. In either case, you are adding a node to replication.
The needful steps are thus...
Determine the node number and any relevant DSNs for the new node. Use the PostgreSQL command createdb to create the database; add the table definitions for the tables that are to be replicated, as Slony-I does not automatically propagate that information.
If you do not have a perfectly clean SQL script to add in the tables, then run the tool slony1_extract_schema.sh from the tools directory to get the user schema from the origin node with all Slony-I "cruft" removed.
If the node had been a failed node, you may need to issue the slonik command DROP NODE in order to get rid of its vestiges in the cluster, and to drop out the schema that Slony-I creates.
Issue the slonik command STORE NODE to establish the new node.
At this point, you may start a slon daemon against the new node. It may not know much about the other nodes yet, so the logs for this node may be pretty quiet.
Issue the slonik command STORE PATH to indicate how slon processes are to communicate with the new node. In Slony-I version 1.1 and later, this will then automatically generate listen path entries; in earlier versions, you will need to use STORE LISTEN to generate them manually.
At this point, it is an excellent idea to run the tools script test_slony_state-dbi.pl, which rummages through the state of the entire cluster, pointing out any anomalies that it finds. This includes a variety of sorts of communications problems.
Issue the slonik command SUBSCRIBE SET to subscribe the node to some replication set.
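The slonik portion of these steps might be sketched as follows (node numbers, DSNs, and the forwarding choice are all hypothetical; depending on version, the event node parameter may be implicit):

```
cluster name = testcluster;
node 1 admin conninfo = 'dbname=mydb host=host1 user=slony';
node 3 admin conninfo = 'dbname=mydb host=host3 user=slony';

store node (id = 3, comment = 'new subscriber', event node = 1);
store path (server = 1, client = 3, conninfo = 'dbname=mydb host=host1 user=slony');
store path (server = 3, client = 1, conninfo = 'dbname=mydb host=host3 user=slony');

# A slon daemon for node 3 should be running by this point;
# then subscribe the node to the set
subscribe set (id = 1, provider = 1, receiver = 3, forward = no);
```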
For instance, I want subscriber node 3 to draw data from node 1, when it is presently drawing data from node 2.
This isn't a case for MOVE SET; we're not shifting the origin, just reshaping the subscribers.
For this purpose, you can simply submit SUBSCRIBE SET requests to revise the subscriptions. Subscriptions will not be started from scratch; they will merely be reconfigured.
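A sketch of such a request for the example above (cluster name and DSNs hypothetical; the forward setting should match the existing subscription):

```
cluster name = testcluster;
node 1 admin conninfo = 'dbname=mydb host=host1 user=slony';
node 3 admin conninfo = 'dbname=mydb host=host3 user=slony';

# Node 3 currently draws set 1 from node 2; re-point it at node 1.
# The subscription is reconfigured, not rebuilt from scratch.
subscribe set (id = 1, provider = 1, receiver = 3, forward = no);
```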
Discussed in the Log Shipping section...
The ultimate proof is in looking at whether data added at the origin makes it to the subscribers. That's a "simple matter of querying".
There are several ways of examining replication status, however:
Look in the slon logs.
They won't say too much, even at very high debugging levels, on an origin node; at debugging level 2, you should see, on subscribers, that SYNCs are being processed. As of version 1.2, the information reported for SYNC processing includes counts of the numbers of tables processed, as well as numbers of tuples inserted, deleted, and updated.
Look in the view sl_status , on the origin node.
This view will tell how far behind the various subscribing nodes are in processing events from the node where you run the query. It will only be very informative on a node that originates a replication set.
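For example, such a query might look like this (the cluster name, and hence the schema name, is hypothetical):

```
-- Run on the origin node; shows how far each subscriber lags,
-- both in events and in time
SELECT st_origin, st_received, st_lag_num_events, st_lag_time
  FROM "_testcluster".sl_status;
```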
Run the tools script test_slony_state-dbi.pl, which rummages through the state of the entire cluster, pointing out any anomalies that it notices, as well as some information on the status of each node.
Some of this is described under Section 8 but more of a procedure should be written...
You must first pick a node that is connected to the former origin (otherwise it is not straightforward to reverse connections in the move to keep everything connected).
Second, you must run a slonik script with the command LOCK SET to lock the set on the origin node. Note that at this point you have an application outage under way, as what this does is to put triggers on the origin that reject updates.
Now, submit the slonik MOVE SET request. It's perfectly reasonable to submit both requests in the same slonik script. Now, the origin gets switched over to the new origin node. If the new node is a few events behind, it may take a little while for this to take place.
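Combining the two commands in one script, as suggested, might look like this sketch (cluster name, set ID, and node numbers are hypothetical):

```
cluster name = testcluster;
node 1 admin conninfo = 'dbname=mydb host=host1 user=slony';
node 2 admin conninfo = 'dbname=mydb host=host2 user=slony';

# Updates on the origin are rejected from this point on...
lock set (id = 1, origin = 1);
# ...until the set's origin has moved to node 2
move set (id = 1, old origin = 1, new origin = 2);
```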
The Slony-I notion of a SYNC is actually always an incremental thing; a SYNC represents the set of updates that were committed during the scope of a particular SYNC event on the origin node. If a set of updates that altered the entire contents of a table were committed in a single SYNC, that would affect the entire contents of the table. But as far as Slony-I is concerned, this change is "incremental" even though the increment happened to be "the whole table."
The only time that Slony-I "synchronizes" the contents of a table is at the time the subscription is set up, at which time it uses COPY to draw in the entire contents from the provider node.
Since subscriber tables are protected against modification by anything other than Slony-I, there should be no way (aside from horrible bugs) for tables to fall out of synchronization. If they do, there is some rather serious problem with Slony-I.
If some such severe corruption takes place, the answer is to drop the table from replication, then create a new replication set and add it back.