Chapter 29. Nuxeo Core Import / Export API

Table of Contents

29.1. Export Format
29.1.1. document.xml format
29.1.2. Inlining Blobs
29.2. Document Pipe
29.3. Document Reader
29.4. Document Writer
29.5. Document Transformer
29.6. API Examples
29.6.1. Exporting data from a Nuxeo repository to a Zip archive
29.6.2. Importing data from a Zip archive to a Nuxeo repository
29.6.3. Export a single document as an XML with blobs inlined.

The import / export service provides an API to export a set of documents from the repository in an XML format and then re-import them.

The service can also be used to create document trees in batch from valid import archives, or as a simple way of creating and retrieving repository data. This could be used, for example, to expose repository data through REST or raw HTTP requests.

The export and import mechanism is extensible, so you can easily create your own custom format for exported data. The default format provided by Nuxeo EP is described below.

The import / export module is part of the nuxeo-core-api bundle and is located in the org.nuxeo.ecm.core.api.io package.

29.1. Export Format

A document is exported as a directory named after the document node and containing a document.xml file, which holds the document metadata and properties as defined by the document schemas. Document blobs, if any, are by default exported as separate files inside the document directory. There is also an option to export blobs inlined as Base64-encoded data inside document.xml.

When exporting trees, document children are placed as subdirectories inside the parent document's directory.

Optionally, each Nuxeo service that stores persistent data related to documents, such as the workflow, relation or annotation services, may also export its own data inside the document folder as XML files.

A document tree is exported as a directory tree. Here is an example of an export tree containing relations information for a workspace named workspace1:

+ workspace1
    + document.xml
    + relations.xml

    + doc1
      + document.xml
      + relations.xml

    + doc2
      + document.xml
      + relations.xml
      + file1.blob

    + doc3
      + document.xml
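
As an illustration of the layout above, the following sketch lists every directory of an export tree that contains a document.xml file, using only the JDK. The class and method names (ExportTreeWalker, documentDirectories, createSampleTree) are hypothetical and are not part of the Nuxeo API; the sample tree is built in a temporary directory purely for demonstration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper, not a Nuxeo class: inspects an export directory tree. */
final class ExportTreeWalker {

    /** Returns, relative to root, every directory that contains a document.xml file. */
    static List<String> documentDirectories(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            List<String> result = new ArrayList<>();
            paths.filter(Files::isDirectory)
                 .filter(dir -> Files.isRegularFile(dir.resolve("document.xml")))
                 .forEach(dir -> result.add(root.relativize(dir).toString().replace('\\', '/')));
            Collections.sort(result);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small sample tree in a temporary directory, mirroring the layout above. */
    static Path createSampleTree() {
        try {
            Path root = Files.createTempDirectory("export");
            for (String dir : new String[] {"workspace1", "workspace1/doc1", "workspace1/doc2"}) {
                Path created = Files.createDirectories(root.resolve(dir));
                Files.write(created.resolve("document.xml"),
                            "<document/>".getBytes(StandardCharsets.UTF_8));
            }
            // doc2 also carries a blob stored as a separate file
            Files.write(root.resolve("workspace1/doc2/file1.blob"), new byte[] {1, 2, 3});
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = createSampleTree();
        for (String dir : documentDirectories(root)) {
            System.out.println(dir);
        }
    }
}
```

Each directory reported this way corresponds to one exported document; the directory nesting gives the parent / child relationship.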
    

29.1.1. document.xml format

Here is the XML corresponding to a document containing a blob. The blob is exported as a separate file:

<?xml version="1.0" encoding="UTF-8"?>

<document repository="default" id="633cf240-0c03-4326-8b3b-0960cf1a4d80">
  <system>
    <type>File</type>
    <path>/default-domain/workspaces/ws/test</path>
    <lifecycle-state>project</lifecycle-state>
    <lifecycle-policy>default</lifecycle-policy>
    <access-control>
      <acl name="inherited">
        <entry principal="administrators" permission="Everything" grant="true"/>
        <entry principal="members" permission="Read" grant="true"/>
        <entry principal="members" permission="Version" grant="true"/>
        <entry principal="Administrator" permission="Everything" grant="true"/>
      </acl>
    </access-control>
  </system>
  <schema xmlns="http://www.nuxeo.org/ecm/schemas/files/" name="files">
    <files/>
  </schema>
  <schema xmlns:dc="http://www.nuxeo.org/ecm/schemas/dublincore/" name="dublincore">
    <dc:valid/>
    <dc:issued/>
    <dc:coverage></dc:coverage>
    <dc:title>test</dc:title>
    <dc:modified>Fri Sep 21 20:49:26 CEST 2007</dc:modified>
    <dc:creator>Administrator</dc:creator>
    <dc:subjects/>
    <dc:expired/>
    <dc:language></dc:language>
    <dc:rights>test</dc:rights>
    <dc:contributors>
      <item>Administrator</item>
    </dc:contributors>
    <dc:created>Fri Sep 21 20:48:53 CEST 2007</dc:created>
    <dc:source></dc:source>
    <dc:description/>
    <dc:format></dc:format>
  </schema>
  <schema xmlns="http://www.nuxeo.org/ecm/schemas/file/" name="file">
    <content>
      <encoding></encoding>
      <mime-type>application/octet-stream</mime-type>
      <data>cd1f161f.blob</data>
    </content>
    <filename>error.txt</filename>
  </schema>
  <schema xmlns="http://project.nuxeo.com/geide/schemas/uid/" name="uid">
    <minor_version>0</minor_version>
    <uid/>
    <major_version>1</major_version>
  </schema>
  <schema xmlns="http://www.nuxeo.org/ecm/schemas/common/" name="common">
    <icon-expanded/>
    <icon/>
    <size/>
  </schema>
</document>

The generated document contains one system section and one or more schema sections. The system section contains all system (internal) document properties such as the document type, path, lifecycle state and access control configuration. For each schema defined by the document type there is a schema entry containing the document properties that belong to that schema. The XSD that corresponds to a schema can be used to validate the content of its schema section. However, this is true only when blobs are inlined. By default, for performance reasons, blobs are stored outside the XML file, each in its own file.

So instead of encoding the blob in the XML file, a reference to an external file is kept: cd1f161f.blob
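
To make the structure concrete, here is a minimal sketch that reads the document type, the path and the external blob reference from a shortened copy of the XML above. It uses the JDK's DOM and XPath APIs as a stand-in (Nuxeo itself works with dom4j Documents), and the class name DocumentXmlInspector is hypothetical. The schema elements live in a namespace, so the sketch matches them by local name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

/** Hypothetical helper, not a Nuxeo class: evaluates XPath expressions on a document.xml. */
final class DocumentXmlInspector {

    /** Shortened copy of the sample export shown above. */
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<document repository=\"default\" id=\"633cf240-0c03-4326-8b3b-0960cf1a4d80\">\n"
        + "  <system>\n"
        + "    <type>File</type>\n"
        + "    <path>/default-domain/workspaces/ws/test</path>\n"
        + "  </system>\n"
        + "  <schema xmlns=\"http://www.nuxeo.org/ecm/schemas/file/\" name=\"file\">\n"
        + "    <content>\n"
        + "      <mime-type>application/octet-stream</mime-type>\n"
        + "      <data>cd1f161f.blob</data>\n"
        + "    </content>\n"
        + "    <filename>error.txt</filename>\n"
        + "  </schema>\n"
        + "</document>\n";

    /** XPath selecting the blob reference inside the "file" schema section. */
    static final String BLOB_DATA =
        "/document/*[local-name()='schema' and @name='file']"
        + "/*[local-name()='content']/*[local-name()='data']";

    /** Parses the XML and returns the trimmed string value of the XPath expression. */
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc).trim();
        } catch (Exception e) {
            throw new IllegalStateException("cannot read document XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(SAMPLE, "/document/system/type"));
        System.out.println(evaluate(SAMPLE, "/document/system/path"));
        System.out.println(evaluate(SAMPLE, BLOB_DATA));
    }
}
```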

Here is how the same blob will be serialized when inlining blobs (an option of the repository reader):

  <schema xmlns="http://www.nuxeo.org/ecm/schemas/file/" name="file">
    <content>
      <encoding></encoding>
      <mime-type>application/octet-stream</mime-type>
      <data>
       b3JnLmpib3NzLnJlbW90aW5nLkNhbm5vdENvbm5lY3RFeGNlcHRpb246IENhbiBub3QgZ2V0IGNv
       bm5lY3Rpb24gdG8gc2VydmVyLiAgUHJvYmxlbSBlc3RhYmxpc2hpbmcgc29ja2V0IGNvbm5lY3Rp 
       [...]
       </data>
    </content>
    <filename>error.txt</filename>
  </schema>

29.1.2. Inlining Blobs

There is an option to inline the blob content in the XML file as Base64-encoded text. This is the canonical format to use when the exported document data is to be validated against the XSD of the document schemas.

Inlining is less efficient than writing the raw blob data to external files, but it provides a way to encode the entire document content in a single file, in a well-known format that can be validated.
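
The encoding itself can be reproduced with the JDK. The sketch below is not Nuxeo code and the class name BlobInlining is hypothetical; it assumes the MIME variant of Base64 (lines of at most 76 characters), which is a guess based on the line-wrapped sample shown in the previous section. The MIME decoder ignores line breaks and surrounding whitespace, so inlined text can be decoded as it appears in the XML.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper, not a Nuxeo class: Base64 round trip for blob content. */
final class BlobInlining {

    /** Encodes raw blob bytes as line-wrapped Base64 text (MIME variant, an assumption). */
    static String encode(byte[] blob) {
        return Base64.getMimeEncoder().encodeToString(blob);
    }

    /** Decodes inlined Base64 text back to raw bytes, ignoring line breaks and whitespace. */
    static byte[] decode(String inlined) {
        return Base64.getMimeDecoder().decode(inlined);
    }

    public static void main(String[] args) {
        // first line of the inlined sample from the document.xml format section
        String line = "b3JnLmpib3NzLnJlbW90aW5nLkNhbm5vdENvbm5lY3RFeGNlcHRpb246IENhbiBub3QgZ2V0IGNv";
        System.out.println(new String(decode(line), StandardCharsets.UTF_8));
    }
}
```

Base64 text is roughly one third larger than the raw bytes, which is one reason why external blob files are the default.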

By default, blobs are not inlined when exporting documents from the repository. To activate inlining, call the following method on the DocumentModelReader you are using to fetch data from the repository:

reader.setInlineBlobs(true);

29.2. Document Pipe

An export process is a chain of three sub processes:

  1. fetching data from repository

  2. transforming the data if necessary

  3. writing the data to an external system

In the same way an import can be defined as a chain of three sub processes:

  1. fetching data from external sources

  2. transforming the data if necessary

  3. writing the data into the repository

We call the process chain used to perform imports and exports a Document Pipe.

In both cases (imports and exports) a document pipe deals with the same types of objects:

  1. A document reader

  2. Zero or more document transformers

  3. A document writer

So the DocumentPipe uses a reader to fetch data, passes it through the registered transformers, and then writes it out using a document writer.

See the API Examples section for examples of how to use a Document Pipe.
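
The reader / transformers / writer chain can be pictured with a toy model in plain Java. None of the types below are Nuxeo classes: ToyPipe, its Reader and Writer interfaces and the use of strings as "documents" are assumptions made for illustration only. The sketch shows documents being fetched in pages, passed through each transformer in registration order, and handed to the writer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy model of a document pipe; hypothetical, not the Nuxeo DocumentPipe. */
final class ToyPipe {

    /** Supplies documents page by page; an empty page means the source is exhausted. */
    interface Reader { List<String> read(int maxDocs); }

    /** Receives each page of documents leaving the pipe. */
    interface Writer { void write(List<String> page); }

    private final int pageSize;
    private final List<UnaryOperator<String>> transformers = new ArrayList<>();
    private Reader reader;
    private Writer writer;

    ToyPipe(int pageSize) { this.pageSize = pageSize; }

    void setReader(Reader reader) { this.reader = reader; }
    void setWriter(Writer writer) { this.writer = writer; }
    void addTransformer(UnaryOperator<String> transformer) { transformers.add(transformer); }

    /** Reads pages until the reader is exhausted, transforming and writing each page. */
    void run() {
        List<String> page;
        while (!(page = reader.read(pageSize)).isEmpty()) {
            List<String> out = new ArrayList<>();
            for (String doc : page) {
                for (UnaryOperator<String> transformer : transformers) {
                    doc = transformer.apply(doc);
                }
                out.add(doc);
            }
            writer.write(out);
        }
    }

    /** Convenience reader that serves an in-memory list in pages. */
    static Reader listReader(List<String> docs) {
        Iterator<String> it = docs.iterator();
        return maxDocs -> {
            List<String> page = new ArrayList<>();
            while (page.size() < maxDocs && it.hasNext()) {
                page.add(it.next());
            }
            return page;
        };
    }

    public static void main(String[] args) {
        ToyPipe pipe = new ToyPipe(2);
        pipe.setReader(listReader(Arrays.asList("doc1", "doc2", "doc3")));
        pipe.addTransformer(String::toUpperCase);
        pipe.setWriter(page -> System.out.println(page));
        pipe.run();
    }
}
```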

29.3. Document Reader

A document reader is responsible for reading some input data and converting it into a DOM representation. The DOM representation uses the format explained in the document.xml format section. Currently dom4j Documents are used as the DOM objects.

For example, a reader may extract documents from the repository and output them as XML DOM objects. Or it may read files from a file system and convert them into DOM objects so that they can be imported into a Nuxeo repository.

To change the way documents are extracted and transformed into a DOM representation, you can implement your own Document Reader. Currently Nuxeo provides several flavors of document readers:

  1. Repository readers - this category of readers is used to extract data from the repository as DOM objects. All of these readers extend DocumentModelReader:

    • SingleDocumentReader - reads a single document given its ID and exports it as a dom4j Document.

    • DocumentChildrenReader - reads the children of a given document and exports each one as a dom4j Document.

    • DocumentTreeReader - reads the entire subtree rooted at the given document and exports each node in the tree as a dom4j Document.

    • DocumentListReader - takes a list of document models as input and exports them as dom4j Documents. This is useful for exporting a search result, for example.

  2. External readers - used to read data as DOM objects from external sources such as file systems or databases. The following readers are provided:

    • XMLDirectoryReader - reads a directory tree in the format supported by Nuxeo (as described in the Export Format section). This can be used to import unzipped Nuxeo archives or hand-created document directories.

    • NuxeoArchiveReader - reads archives exported by Nuxeo EP in order to import them into a repository. Note that only zip archives created by the Nuxeo exporter are supported.

    • ZipReader - reads a zip archive and outputs DOM objects. This reader can read both Nuxeo zip archives and regular (hand-made) zip archives. Reading a Nuxeo archive is more efficient, because the entries of a Nuxeo zip archive are added in a predefined order that makes it possible to read the entire archive tree on the fly, without first unzipping the archive onto the file system. If the zip archive is not recognized as a Nuxeo archive, it is extracted into a temporary folder on the file system and the XMLDirectoryReader is used to read the content.

To create a custom reader you need to implement the interface org.nuxeo.ecm.core.api.io.DocumentReader.
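
The on-the-fly reading mentioned for ZipReader relies on consuming zip entries sequentially, in the order in which they were written. The sketch below illustrates that idea with the JDK's ZipInputStream; it is not the Nuxeo implementation, and the class name StreamingZipSketch and the in-memory archive are assumptions for demonstration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper, not a Nuxeo class: sequential reading of zip entries. */
final class StreamingZipSketch {

    /** Builds an in-memory zip whose entries are written in the given order. */
    static byte[] zip(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (String name : names) {
                out.putNextEntry(new ZipEntry(name));
                out.write(("content of " + name).getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reads the entry names one after another, in archive order, without unpacking to disk. */
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = in.getNextEntry(); entry != null; entry = in.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] archive = zip("workspace1/document.xml", "workspace1/doc1/document.xml");
        System.out.println(entryNames(archive));
    }
}
```

When a parent's document.xml is guaranteed to precede the entries of its children, a streaming reader of this kind can rebuild the tree in a single pass. An archive without that guarantee has to be unpacked first.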

29.4. Document Writer

A document writer is responsible for writing the documents that exit the pipe to a document store. This store can be a file system, a Nuxeo repository, or any database or other data storage, as long as you have a writer that supports it.

The following DocumentWriters are provided by Nuxeo:

  1. Repository writers - these write documents to a Nuxeo repository. They are useful for performing imports into the repository.

    • DocumentModelWriter - writes documents to a Nuxeo repository. This writer creates a new document model for each imported document.

    • DocumentModelUpdater - writes documents to a Nuxeo repository. This writer updates documents that have the same ID as the imported ones, and creates new documents otherwise.

  2. External writers - these write documents to an external storage. They are useful for performing exports from the repository.

    • XMLDocumentWriter - writes a document as an XML file with blobs inlined.

    • XMLDocumentTreeWriter - writes a list of documents into a single XML file with blobs inlined. The document tags are enclosed in a root tag:

      <documents> .. </documents>

    • XMLDirectoryWriter - writes documents as a folder tree on the file system. To read back the exported tree you may use the XMLDirectoryReader.

    • NuxeoArchiveWriter - writes documents into a Nuxeo zip archive. To read back the archive you may use the NuxeoArchiveReader.

To create a custom writer you need to implement the interface org.nuxeo.ecm.core.api.io.DocumentWriter.

29.5. Document Transformer

Document transformers are used to transform documents after they enter the pipe and before they are sent to the writer. This way you can remove, add or modify properties of the documents, or other information contained in the exported DOM object.

As documents are expressed as XML DOM objects, you can also use XSLT transformations inside your transformer.

To create a custom transformer you need to implement the interface org.nuxeo.ecm.core.api.io.DocumentTransformer.
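
As a rough illustration of the XSLT approach, the sketch below removes the access-control section from an exported document using an identity stylesheet with one empty template. It works on XML strings with the JDK's javax.xml.transform API; a real transformer would implement DocumentTransformer and operate on dom4j Documents, whose method signatures are not shown in this chapter and are therefore not reproduced here. The class name XsltSketch and the stylesheet are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper, not a Nuxeo class: applies an XSLT stylesheet to document XML. */
final class XsltSketch {

    /** Identity transform plus an empty template that drops the access-control element. */
    static final String STRIP_ACL =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"access-control\"/>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML and returns the result without an XML declaration. */
    static String apply(String xslt, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<document><system><type>File</type>"
            + "<access-control><acl name=\"inherited\"/></access-control>"
            + "</system></document>";
        System.out.println(apply(STRIP_ACL, xml));
    }
}
```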

29.6. API Examples

Performing exports and imports can be done by following these steps:

  1. Instantiate a new DocumentPipe:

    // create a pipe that will process 10 documents on each iteration
    DocumentPipe pipe = new DocumentPipeImpl(10);

    The page size argument is important when you run the pipe on a machine other than the one hosting the source of the data (the one from which the reader fetches data). It lets you fetch several documents at once, which improves performance.

  2. Create a new DocumentReader that will be used to fetch data and put it into the pipe. Depending on the data you want to import, you can choose one of the existing DocumentReader implementations or write your own if needed:

    reader = new DocumentTreeReader(docMgr, src, true);
    pipe.setReader(reader);

    In this example we use a DocumentTreeReader, which reads an entire subtree from the repository, rooted at the 'src' document.

    The docMgr argument represents a session to the repository, 'src' is the root of the tree to export, and the 'true' flag means that the root is excluded from the exported tree.

  3. Create a DocumentWriter that will be used to write the documents output by the pipe:

    writer = new XMLDirectoryWriter(new File("/tmp/export"));
    pipe.setWriter(writer);

    In this example we instantiate a writer that will write exported data onto the file system as a folder tree.

  4. Optionally, you may add one or more Document Transformers to transform the documents that enter the pipe:

    MyTransformer transformer = new MyTransformer();
    pipe.addTransformer(transformer);

  5. And now run the pipe:

    pipe.run();

29.6.1. Exporting data from a Nuxeo repository to a Zip archive

DocumentReader reader = null;
DocumentWriter writer = null;

try {
  // root of the subtree to export
  DocumentModel src = getTestWorkspace();
  // read the whole subtree under src, excluding src itself
  reader = new DocumentTreeReader(docMgr, src, true);
  writer = new NuxeoArchiveWriter(new File("/tmp/export.zip"));
  // creating a pipe
  DocumentPipe pipe = new DocumentPipeImpl(10);
  pipe.setReader(reader);
  pipe.setWriter(writer);
  pipe.run();
} finally {
  // always release the reader and the writer
  if (reader != null) {
    reader.close();
  }
  if (writer != null) {
    writer.close();
  }
}

29.6.2. Importing data from a Zip archive to a Nuxeo repository

DocumentReader reader = null;
DocumentWriter writer = null;

try {
  // read the documents from the archive
  reader = new ZipReader(new File("/tmp/export.zip"));
  // create the imported documents under the given repository path
  writer = new DocumentModelWriter(docMgr, "import-domain/Workspaces/ws");

  // creating a pipe
  DocumentPipe pipe = new DocumentPipeImpl(10);
  pipe.setReader(reader);
  pipe.setWriter(writer);
  pipe.run();
} finally {
  // always release the reader and the writer
  if (reader != null) {
    reader.close();
  }
  if (writer != null) {
    writer.close();
  }
}

29.6.3. Export a single document as an XML with blobs inlined.

DocumentReader reader = null;
DocumentWriter writer = null;

try {
  DocumentModel src = getTestWorkspace();
  reader = new SingleDocumentReader(docMgr, src);

  // inline blobs; setInlineBlobs is called through DocumentModelReader,
  // the class that all repository readers extend
  ((DocumentModelReader) reader).setInlineBlobs(true);
  // the output is a single XML file
  writer = new XMLDocumentWriter(new File("/tmp/export.xml"));

  // creating a pipe
  DocumentPipe pipe = new DocumentPipeImpl();

  // optionally adding a transformer
  pipe.addTransformer(new MyTransformer());
  pipe.setReader(reader);
  pipe.setWriter(writer);
  pipe.run();
} finally {
  // always release the reader and the writer
  if (reader != null) {
    reader.close();
  }
  if (writer != null) {
    writer.close();
  }
}