Chapter 49. Replication tool

Table of Contents

49.1. Functional Objective
49.2. Use cases
49.3. User guide
49.4. How does it work?

49.1. Functional Objective

The system replication feature aims to clone the entire collection of data held in a Nuxeo system. The clone must then be importable into another system, resulting in a complete replica of the original. Such a feature is valuable because it allows:

  • complete backup of system

  • complete data migration

  • replication of complex systems

Three projects are designed to accomplish the objectives:

  • common modules

  • export modules

  • import modules

The export module has also been ported to 5.1, allowing migration from older storage to a currently supported Nuxeo deployment.

49.2. Use cases

  • UC1 System backup

As administrator I want to make a backup of the system

  1. Administrator is managing a complex Nuxeo system (including: users, document base, relations, etc.)

  2. Administrator ensures that no actions are performed on the server while the replication runs (no user connects, changes documents, etc.)

  3. Administrator starts the system export. No other actions are performed.

  4. Administrator acknowledges the end of the replication and the results. The UI and the server log provide the right information.

  5. The archive is stored safely.

  • UC2 Data migration 5.1 to 5.2

As Administrator I want to migrate a 5.1 server to a 5.2 server

  1. Administrator is managing a complex Nuxeo 5.1 system (including: users, document base, relations, etc.) and has a freshly installed 5.2 system

  2. Administrator ensures that no actions are performed on either server while the replication runs (no user connects, changes documents, etc.)

  3. Administrator starts the 5.1 system export. No other actions are performed.

  4. Administrator acknowledges the end of the replication and the results. The UI and the server log provide the right information.

  5. Administrator copies the clone to the freshly installed Nuxeo 5.2 machine.

  6. Administrator starts the import of the clone. No other actions are performed.

  7. Administrator acknowledges the end of the import and the results. The UI and the server log provide the right information.

  8. As a result, the new Nuxeo system is a perfect replica of the initial one.

  • UC3 Backup import

As Administrator I want to import an older backup.

  1. Administrator is managing a Nuxeo 5.2 system (including: users, document base, relations, etc.) and has an older backup archive

  2. Administrator ensures that no actions are performed on the server while the import runs (no user connects, changes documents, etc.)

  3. Administrator optionally cleans up the DB before import.

  4. Administrator starts the import of the clone. No other actions are performed.

  5. Administrator acknowledges the end of the import and the results. The UI and the server log provide the right information.

  6. As a result, the new Nuxeo system is a perfect replica of the initial one.

49.3. User guide

A complete user guide can be found at http://doc.nuxeo.org/xwiki/bin/view/FAQ/ReplicateNuxeoRepository

49.4. How does it work?

The Nuxeo system contains a heterogeneous collection of data.

At the moment, the following types of objects are considered:

  • documentary base

    • usual documents (workspaces and templates content, comments, tags, etc.)

    • versions (the checked in documents)

    • proxies (published documents)

  • relation graphs

  • vocabularies (excluding groups and users)

  • groups and users

  • different other tables (audit, tagging, etc.)

The approach needs to be different for each type.

Special attention must be paid to the Seam and core sessions. Long-running operations can be interrupted by the Seam context, so for repositories with 200,000 documents or more the Seam session is not suitable.

Data such as relations or vocabularies is easier to manage through the already existing services, which means a container context is needed.

The super user context is always used.

The export and import need to be designed together.

Transaction management

In a JEE environment, we must take care of the transactions in order to:

  • avoid timeouts

  • avoid filling up the transaction cache (e.g. prepared statements in the case of PGSQL)

  • avoid letting the DataSource in a dirty state

As a general TX guideline, we must handle transactions by hand during import/export (not let the container do it) by controlling the core session. We must work in batches (i.e. commit every X documents if the DB is in a clean state). During the export, since we don't write into Nuxeo, TX management is not that important. During import, TX management is very important. It is important to maintain the right order when importing resources. Also, the import must be done one by one or in small chunks. The resume log MUST be in sync with the TX management: only batches that are successful must be logged.
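
The batching rule above can be sketched as follows. This is only an illustration under stated assumptions: `BatchImporter` and `TxSession` are hypothetical names standing in for the import code and for a manually controlled session, not Nuxeo APIs. The sketch shows items imported in order, one commit per batch, and a resume log that only receives batches whose commit succeeded.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of batch-wise import with a resume log kept in sync with commits. */
public class BatchImporter {

    /** Hypothetical stand-in for a session whose transactions are handled by hand. */
    public interface TxSession {
        void importItem(String item) throws Exception;
        void commit() throws Exception;
        void rollback();
    }

    /**
     * Imports the items in the given order, committing once per batch.
     * Returns the resume log: it lists only items from committed batches.
     */
    public static List<String> importAll(List<String> items, int batchSize, TxSession session) {
        List<String> resumeLog = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            List<String> batch = items.subList(start, end);
            try {
                for (String item : batch) {
                    session.importItem(item);
                }
                session.commit();
                // Log the batch only after its commit succeeded.
                resumeLog.addAll(batch);
            } catch (Exception e) {
                // Discard the partial batch and stop; a later run can resume
                // from the first item that is missing from the resume log.
                session.rollback();
                break;
            }
        }
        return resumeLog;
    }
}
```

Stopping at the first failed batch is a design choice of this sketch: it preserves the import order, so a resumed run never imports a resource before the ones it depends on.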

Replication directory structure

/Replication Root
    /Documentary Base
        /Usual Documents
            /Workspaces
                /workspace1
                    /folder1
                    .........................
            /Templates
                .........................
        /Versions
            /version1 ID
            .........................
    /Relations
        /graph1
        .........................
    /Vocabularies
        /directory1
        .........................
    /Groups
        /group1
        .........................
    /Users
        /user1
        .........................
    /Tables
        /Audit
        /Tagging

Storing the documents in a file system tree is a good idea in any case:

  • avoid FS problems (too many children in a single directory)

  • allow easy multi-threading import

The system replication is made inside a single directory named “Replication Root”. Under it, “Documentary Base” contains the documents.

Under “Documentary Base”, the “Usual Documents” folder contains the repository, exported much like the export utility does. The file names are the names of the documents. The path of a Nuxeo document is unique, so it can be used without worrying about duplicates. The usual documents are exported with blobs offline and with a new ACL-encoded export for each document (inside the usual document.xml file; the lifecycle state is already saved in the document export). Each document also gets a new file named “import.export”, containing the contextual metadata required for core import.

Under “Documentary Base”, the “Versions” folder contains the checked-in versions. The versions are exported like the usual documents, with the proper metadata for core import. All versions are exported flat, in directories named after their ID.

The proxy documents are exported among the usual documents. They should contain only the core export-import metadata, which is enough to reconstruct the proxy.

Under “Relations”, every graph is exported as RDF to “rdf.xml” in a folder named after the graph.

Under “Vocabularies”, every directory is exported as custom XML to “vocabulary.xml” in a folder named after the directory.

Under “Groups” and “Users”, the existing entities are saved in folders named after the entity. Inside every folder, a custom XML file (“user.xml” or “group.xml”) holds the Nuxeo-specific data on the entity (see NuxeoPrincipal and NuxeoGroup).

Under “Tables”, the existing named tables are exported as CSV.
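
To make the layout described above concrete, here is a minimal sketch that computes where each kind of exported item would live under the replication root. The class `ReplicationLayout` and its method names are hypothetical, and the exact mapping from a repository path to a directory is an assumption (each path segment becomes one folder); only the folder and file names come from the description above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Sketch: resolves export locations under the replication root. */
public class ReplicationLayout {

    private final Path root;

    public ReplicationLayout(Path root) {
        this.root = root;
    }

    /** Folder of a usual document; assumes one folder per repository path segment. */
    public Path usualDocument(String repositoryPath) {
        Path dir = root.resolve("Documentary Base").resolve("Usual Documents");
        for (String segment : repositoryPath.split("/")) {
            if (!segment.isEmpty()) {
                dir = dir.resolve(segment);
            }
        }
        return dir;
    }

    /** Versions are stored flat, in folders named after their ID. */
    public Path version(String versionId) {
        return root.resolve("Documentary Base").resolve("Versions").resolve(versionId);
    }

    public Path relationGraph(String graphName) {
        return root.resolve("Relations").resolve(graphName).resolve("rdf.xml");
    }

    public Path vocabulary(String directoryName) {
        return root.resolve("Vocabularies").resolve(directoryName).resolve("vocabulary.xml");
    }

    public Path user(String userName) {
        return root.resolve("Users").resolve(userName).resolve("user.xml");
    }

    public Path group(String groupName) {
        return root.resolve("Groups").resolve(groupName).resolve("group.xml");
    }

    public static void main(String[] args) {
        ReplicationLayout layout = new ReplicationLayout(Paths.get("Replication Root"));
        System.out.println(layout.usualDocument("/Workspaces/workspace1/folder1"));
        System.out.println(layout.relationGraph("graph1"));
    }
}
```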

The user has to provide the name of the repository to be exported. The service itself creates a new thread to run the export or the import, and returns immediately to the calling bean.
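
The "start in the background and return immediately" behaviour can be sketched like this. It is an assumption-laden illustration, not the service's code: `BackgroundReplication` is a hypothetical name, and a single-thread `ExecutorService` is used here in place of a hand-created thread so that the caller gets a `Future` to poll for completion.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: runs a long export or import off the caller's thread. */
public class BackgroundReplication {

    // One worker thread: only one export or import runs at a time.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /**
     * Submits the task and returns immediately.
     * The returned Future can be polled with isDone() to report progress in the UI.
     */
    public Future<?> start(Runnable exportOrImport) {
        return executor.submit(exportOrImport);
    }

    /** Lets the worker finish its current task, then releases the thread. */
    public void shutdown() {
        executor.shutdown();
    }
}
```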

The import actually occurs in 2 stages: first the document is core imported using the contextual metadata (in order to ensure ID preservation), and then the actual import occurs (including the ACL).

Data migration may include additional steps. Because there were some schema changes between Nuxeo versions, data exported from a 5.1.X version may need to be modified before being imported into 5.2. For this, we have a 3-step pipe: export from source; apply transformations; import transformed data. Before the actual import, a transformer can be contributed as a Java extension. The transformer receives the XML representation of the exported document and can modify it in any way. The resulting XML is then used in the import instead of the original.
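
The transformation step of the pipe can be sketched as below. The names `MigrationPipe` and `DocumentTransformer` are hypothetical and do not reflect the real extension point contract; the sketch only assumes what the text states, namely that a transformer takes the exported XML and returns the XML to import. The element names in the usage example are made up for illustration.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Sketch of the "apply transformations" step between export and import. */
public class MigrationPipe {

    /** Hypothetical contract: exported document XML in, XML to import out. */
    public interface DocumentTransformer extends UnaryOperator<String> {
    }

    /** Applies the contributed transformers in order and returns the XML to import. */
    public static String transform(String exportedXml, List<DocumentTransformer> transformers) {
        String xml = exportedXml;
        for (DocumentTransformer transformer : transformers) {
            xml = transformer.apply(xml);
        }
        return xml;
    }
}
```

For example, a transformer that renames a made-up `oldField` element to `newField` could be written as `xml -> xml.replace("oldField", "newField")` and passed in a one-element list; with no transformers contributed, the exported XML is imported unchanged.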