Replication Scheme Design Issues

You must consider the following while designing your replication scheme:

Collision Design

In a multiple database environment, there is always the possibility of two users updating the same record simultaneously on two different databases. When creating a replicated system, design it to avoid, or at least minimize, collisions.

To reduce collisions significantly or eliminate them altogether, design transactions so that newly created information is presented as inserts and deletes instead of updates. This is especially important for updates to key columns.
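
For example, a running balance that every site maintains with updates invites collisions, because all sites update the same row. The following sketch, using hypothetical account and acct_txn tables and a site-unique key, shows how the same information can be captured with inserts instead:

    /* Collision-prone: every site updates the same account row */
    UPDATE account SET balance = balance - 50 WHERE acct_no = 42;

    /* Collision-resistant: each site inserts a new row whose key
       (a site code plus a local sequence number) cannot clash with
       rows created at other sites */
    INSERT INTO acct_txn (site_code, txn_seq, acct_no, amount)
        VALUES ('LON', 1001, 42, -50);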

For information on how to resolve collisions, see Collision Resolution.

How Collisions Are Caused

A collision is an event that occurs when simultaneous changes are made in two different databases to rows with identical keys. The Replicator Servers detect this collision condition when the data is transmitted between the two databases.

In a collision condition, Ingres Replicator cannot synchronize the databases without destroying information.

There are five possible situations that cause a collision between a source and a target database. Collisions can occur during an insert, update, or delete transaction and are detected in these instances:

Note: If you use referential integrity constraints in a distributed environment, you can create collision-like conflicts that cannot be resolved. For example, assume a customer record is deleted in one database while, at the same time, a new order for that customer is entered in a different database. If a referential constraint requires orders to reference existing customers, an exception occurs when the order is replicated back to the first database.
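
The following sketch illustrates the conflict described in the note, using hypothetical customer and orders tables:

    CREATE TABLE customer (
        cust_no INTEGER NOT NULL PRIMARY KEY,
        name    VARCHAR(40)
    );

    CREATE TABLE orders (
        order_no INTEGER NOT NULL PRIMARY KEY,
        cust_no  INTEGER REFERENCES customer (cust_no)
    );

    /* At the same moment:
         site A: DELETE FROM customer WHERE cust_no = 42;
         site B: INSERT INTO orders VALUES (1001, 42);
       When the order row is replicated to site A, the REFERENCES
       constraint rejects it, and no collision mode can resolve this. */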

The best defense against a collision is prevention. Design your replication system to reduce the probability of collisions. Even in well-designed databases, collisions can occur during a system failure when it becomes necessary to switch between replicated databases. For this reason alone, you need to plan how to handle collisions in your replication system.

Collision Handling

Collision handling happens at the CDDS level; when defining your CDDS, you must specify which collision mode to use. There are two ways to handle collisions: automatically or manually. Each method has advantages and disadvantages. Automatic resolution takes less time but can produce unexpected results. Manual resolution gives you more control and, in some cases, is the only way to resolve a conflict.

All collisions are counted as errors; they cause the Replicator Server to log the error, increase the error count, and send e-mail to the mail notification list. If the collision is resolved, the Replicator Server continues processing. If the collision is not resolved, the Replicator Server behaves in accordance with the error mode setting.

Automatic Resolution of Collisions

With automatic resolution, when two records collide, one record prevails over the other. If the operation was an insert or an update, a record survives a collision by overwriting the record in the target database (the target row is deleted and the prevailing source row is inserted in its place). If the transaction was a delete, the record in the target database is deleted. If a record does not survive a collision, its replication operation (insert, update, or delete) for that target is removed from the distribution queue.
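
In effect, when a source row prevails in an insert or update collision, the outcome at the target is equivalent to the following sketch (the emp table and its columns are hypothetical):

    /* The losing target row is removed ... */
    DELETE FROM emp WHERE emp_no = 7;

    /* ... and the prevailing source row is inserted in its place,
       replacing every column, not only the ones that differed */
    INSERT INTO emp (emp_no, name, salary)
        VALUES (7, 'Jones', 52000);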

Note: Automatic resolution overwrites the entire record and can overwrite columns that have correct information with columns that do not have correct information. You must not use automatic resolution if the information contained in the losing record is important. For example, if your database contains documents that are continually updated with new information, you can lose information with automatic resolution.

Manual Resolution of Collisions

You can resolve collisions manually by editing the base table, the shadow table, and the distribution queue. For instructions, see How You Resolve Collisions Manually. You can also resolve collisions manually using Visual DBA; for step-by-step instructions, see the Visual DBA online help.
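
As a rough sketch of the manual procedure, assume the distribution queue table is named dd_distrib_queue and that transaction_id identifies the colliding transaction; these names, like the emp base table, are assumptions here, so follow How You Resolve Collisions Manually for the authoritative steps:

    /* Inspect the queued change that collided (names are assumed) */
    SELECT * FROM dd_distrib_queue;

    /* Correct the base row so both databases hold the intended values */
    UPDATE emp SET salary = 52000 WHERE emp_no = 7;

    /* Remove the losing entry so that it is not propagated again */
    DELETE FROM dd_distrib_queue WHERE transaction_id = 123456;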

Collision Modes

How a Replicator Server acts when it detects a collision depends on the collision mode of the CDDS to which the colliding records belong. The Passive Detection mode involves no action on the part of the Replicator Server. Modes other than Passive Detection require the Replicator Server to search the target database to see if the row it is propagating already exists. If a Replicator Server detects a collision, its subsequent actions are determined by the collision mode and error mode assigned to the CDDS to which the colliding row belongs.

The available collision modes are:

How Errors Are Handled

Propagation error handling is specified at the CDDS level; when defining your CDDS you must specify which propagation error mode you want to use. The method the servers use to handle an error detected while transmitting information depends on the error mode.

For all propagation error modes, when a server detects an error it does the following:

Note: E-mail error notification is not available on Windows. Using the -NML server flag also turns off error notification. For more information, see the chapter "Using the Replicator Server."

How Error Modes Affect Server Behavior

Server behavior differs for each propagation error mode.

Note: In the following descriptions, the "current replication transaction" is defined in the context of the Replicator Server; it can differ from the original user transaction if that transaction updated more than one CDDS.

Error modes and how they affect server behavior are described here:

Replicator Server Assignment

You must assign Replicator Servers on full peer databases to transmit replications to targets. Protected and unprotected read-only databases do not need Replicator Servers because they do not transmit information. They do, however, need server numbers assigned to them so that they can receive information.

Assign a Replicator Server number for each database in your replication scheme according to the following guidelines:

Replicator Server Scenarios

The following are Replicator Server assignment scenarios that you can use:

Storage Structures

In Ingres Replicator, deadlock can occur when tables involved in a query are replicated. When you update the base table, the DBMS must insert a record in the shadow table, possibly the archive table, and the input queue table within the same transaction. The additional inserts can cause deadlock.

The following example shows how deadlock can occur with just two sessions:

  1. User A inserts a record into page 10 of the base table; to maintain the support tables, this locks page 10 of the base table, page 5 of the shadow table, and page 8 of the input queue table. The insert completes, and the user holds the transaction open, and with it the locks.
  2. User B inserts a record into page 11 of the base table; this locks page 11 of the base table and also requires a lock on page 5 of the shadow table to insert the shadow record. The lock on page 5 of the shadow table cannot be obtained, so the session waits for it.
  3. Meanwhile, user A tries to insert a record into page 11 of the base table. This insert waits for the lock on page 11, which user B holds. Hence, deadlock: user A needs the lock on page 11 of the base table that user B holds, and user B needs the lock on page 5 of the shadow table that user A holds. The same interleaving is sketched as SQL below.
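
The same interleaving, rendered as SQL against a hypothetical emp table (the page numbers in the comments follow the example above):

    /* Session A */
    INSERT INTO emp VALUES (1001, 'Jones');   /* locks base page 10,
                                                 shadow page 5, queue page 8 */
    /* A's transaction stays open, so all three locks are still held */

    /* Session B */
    INSERT INTO emp VALUES (1102, 'Smith');   /* locks base page 11, then
                                                 waits for shadow page 5 */

    /* Session A */
    INSERT INTO emp VALUES (1103, 'Brown');   /* waits for base page 11,
                                                 held by B: deadlock */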

The first part of the primary key on both the shadow table and the input queue table is the transaction_id column. This column contains the low part of the Ingres internal 8-byte transaction ID. This transaction ID is incremented by 1 each time a new transaction—which can be user or system internal—is started in the Ingres installation. This means that a nearly monotonically increasing primary key value is generated for each replicated update.

The default structure for the shadow and archive tables is B-tree. Because the primary part of the key is always increasing, every insert to the support tables causes a ladder lock down the right side of the B-tree index, resulting in a lock on the far right leaf page to insert the new index value. This structure therefore makes it likely that all users want to lock the same pages at the same time.

The column ordering on which the Ingres Replicator support tables are indexed cannot be changed, because access using these keys is built into the internals of the DBMS (much like the Ingres standard system catalogs, of which the Ingres Replicator support tables are effectively a visible extension). Altering or re-ordering the key columns is therefore impossible.

To avoid deadlock, make sure the sessions do not try to lock the same pages in the same tables at the same time. To achieve this, you need some understanding of the keying and table structures of the Ingres Replicator support tables. For detailed information and examples of how to use locking parameters to avoid deadlock, see Strategies for Avoiding Deadlock. The other means of avoiding deadlock is to change the storage structures of the support tables. For more information, see Using Hash Structures to Avoid Deadlock.

Using Hash Structures to Avoid Deadlock

One option to avoid deadlock is to modify the table structure from B-tree to hash. Hash is the best structure to use to avoid the problem of users requiring locks on the same pages. If the modify creates enough hash buckets (minpages), each new insert is hashed to a different bucket, and users never require locks on the same pages.

The actual value of minpages used in the modify statement depends on the number of concurrent users and the volume and frequency of updates to the replicated tables. The value must be as large as possible, even though more disk space is needed when the table is modified. The table does not grow in size until more records hash to a single page than fit in that page; this event becomes less likely with more hash buckets, which offsets the additional disk space requirement.
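
A minimal sketch of such a modify, assuming the input queue table is named dd_input_queue (it is keyed on transaction_id, as described earlier) and that 1024 buckets suit the expected volume; both the name and the bucket count are assumptions to adapt:

    /* Rebuild the input queue table as hash with many spare buckets;
       the table name and MINPAGES value are illustrative assumptions */
    MODIFY dd_input_queue TO HASH ON transaction_id
        WITH MINPAGES = 1024;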

The shadow and archive tables present a different situation. Records are continually inserted into these tables for each insert (shadow only), update, and delete (both shadow and archive) on the base replicated table. None of the inserted records are automatically removed, which means that the tables are continually growing in size (until the arcclean utility is used to remove rows that are no longer needed). This causes a problem with hash tables because no matter how many hash buckets are provided, they are eventually filled, and the tables contain unwanted overflow pages. Because each insert must lock the entire overflow chain for the relevant bucket, the longer the chain becomes, the more likely it is that the insert causes lock contention (and therefore possible deadlock).

The key to the efficient use of hash tables is planning. You need to calculate the likely volume of replicated updates for each table in a day and ensure that there are enough spare hash buckets to handle any new additions. The tables must be remodified each night to remove any overflow and increase the number of hash buckets for the next day's use. The shadow and archive tables must also be cleaned out periodically when a logical consistency point (for replicated databases) can be found and while replication is disabled. If there are situations where this type of maintenance is impossible, the row-locking alternative described in the chapter "Using Advanced Features" must be investigated.
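
A nightly maintenance pass might look like the following sketch; the shadow and archive table names, key columns, and bucket counts are assumptions to adapt to your own tables and volumes:

    /* Remodify nightly to remove overflow pages and leave spare
       buckets for the next day (names, keys, and sizes assumed) */
    MODIFY emp_shadow TO HASH ON transaction_id WITH MINPAGES = 2048;
    MODIFY emp_archive TO HASH ON transaction_id WITH MINPAGES = 2048;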

Keep in mind that a hash table is more efficient for updates than a B-tree table (provided there is no unwanted overflow); for performance reasons, use hash tables wherever you can. Remember that records are automatically deleted from the input queue table, so there is normally no reason to use row locking on this table. The only exception is when the distribution threads are allowed to lag behind the user updates (because of insufficient threads or because rep_txq_size is set too low; neither is desirable).

