Chapter 23. Nuxeo Conversion Service

Table of Contents

23.1. Conversion Service vs Transformation Service
23.1.1. Motivations for the API changes
23.1.2. What has been improved
23.1.3. About compatibility
23.2. Using Conversion Service
23.2.1. Built-in converters
23.2.2. Conversion Service API
23.2.3. Configuring converter service
23.2.4. Contributing converters
23.2.5. Converters based on external command line tools

The nuxeo-core-convert module provides a service that manages the conversion of Blobs from one format to another.

nuxeo-core-convert is only available starting with 5.2-M4.

23.1. Conversion Service vs Transformation Service

In 5.2, the Conversion Service replaces the Transformation Service, which is now deprecated.

23.1.1. Motivations for the API changes

The Transformation Service had some API design issues that we wanted to correct. Because full-text indexing is now handled by the repository, we also needed a core service to manage full-text conversion. We decided to define a brand new API in a new service, and we changed the service name to avoid confusion and to be able to provide backward compatibility.

23.1.2. What has been improved

  • API is now simpler

    There are only converters, not transformers and plugins as before.

  • ConversionService includes a caching system

    This eliminates the need for a custom cache in every high-level service that uses converters (such as the preview service).

  • Data input/output is now handled via BlobHolder interface.

    There is only one data structure (no more TransformDocuments or plain Blobs). This also makes the caching system more efficient, since the link between the blobs and the associated DocumentModel is preserved when available.

  • Availability check API interface.

    ConversionService now provides an API to check whether a converter is available. This is useful when the converter depends on an external program that must be installed on the server (such as an OpenOffice server).

23.1.3. About compatibility

The Transformer API is now deprecated, and the old transform-service implementations and plugins have been removed from the default distribution. Nevertheless, we provide a transformer-compat bundle that handles compatibility between the old and the new API.

To activate this compatibility layer, you need to deploy:

  • nuxeo-platform-transform-api

  • nuxeo-platform-transform-compat

23.1.3.1. Using transformer API

The Transformer API is still available, but its implementation now wraps calls to the ConversionService. This means you can use the new converters from the old Transformation Service API. All Transformers included by default have been migrated to converters with the same name. Code that used transformers should still work in 5.2.

23.1.3.2. Contributed transformers and plugins

Contributions to the old Transformation Service are now registered with the ConversionService through specific converters that wrap transformers or plugins.

There are some limitations, though:

  • Transformers must be based on the TransformerImpl class

  • You may have to change the dependencies in pom.xml and MANIFEST.MF to point to the compat artifact instead of the core one.

23.2. Using Conversion Service

23.2.1. Built-in converters

Inside nuxeo-core-convert-plugins

  • pdf2text, xml2text, html2text, word2text, xl2text, ppt2text, oo2text

    text extractors for common office formats

  • rfc822totext

    text extractor for MIME-encoded mail

  • any2text

    meta-converter for text extraction

Inside nuxeo-platform-convert

  • pdf2html

    PDF to HTML conversion based on the pdftohtml command line tool

  • office2html

    converts standard office formats to HTML (uses OpenOffice)

  • any2html

    compound converter that converts any input to HTML

  • any2pdf

    uses OpenOffice to generate PDF

23.2.2. Conversion Service API

The Conversion Service can be accessed via the standard Nuxeo service lookup:

ConversionService conversionService = Framework.getService(ConversionService.class);

To convert a BlobHolder to a given destination mime type:

BlobHolder result = conversionService.convertToMimeType("text/plain", blobHolder, params);

params is a simple Map<String,Serializable> used to pass parameters to the converter (it can be null).

To use a known converter:

BlobHolder result = conversionService.convert("converterName", blobHolder, params);

To find a converter for a given conversion:

String converterName = conversionService.getConverterName(sourceMimeType, destinationMimeType);

To test whether a converter is available:

ConverterCheckResult checkResult = conversionService.isConverterAvailable("converterName");

This call can throw ConverterNotRegistered if the target converter does not exist at all. The ConverterCheckResult class provides:

  • an isAvailable() method

  • a getErrorMessage() method

    returns the error that occurred during the availability check

  • a getInstallationMessage() method

    returns the installation message supplied by the converter contributor
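
The calls above are typically combined: look up a converter, check that it is available, then convert. The sketch below illustrates that flow only. It compiles without Nuxeo on the classpath, so CheckResult, MiniConversionService and FakeService are hypothetical stand-ins that mirror the method names shown above; they are not the real Nuxeo types, plain Strings replace BlobHolders, and the assumption that getConverterName returns null when nothing matches should be checked against the real service.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ConverterCheckResult (not the Nuxeo class).
interface CheckResult {
    boolean isAvailable();
    String getInstallationMessage();
}

// Hypothetical stand-in for ConversionService; Strings replace BlobHolders.
interface MiniConversionService {
    String getConverterName(String sourceMimeType, String destinationMimeType);
    CheckResult isConverterAvailable(String converterName);
    String convert(String converterName, String input, Map<String, Serializable> params);
}

// In-memory fake so the sketch can run on its own.
class FakeService implements MiniConversionService {
    private final boolean available;

    FakeService(boolean available) {
        this.available = available;
    }

    public String getConverterName(String sourceMimeType, String destinationMimeType) {
        // Assumption: null means "no converter found".
        boolean known = "text/html".equals(sourceMimeType) && "text/plain".equals(destinationMimeType);
        return known ? "html2text" : null;
    }

    public CheckResult isConverterAvailable(String converterName) {
        return new CheckResult() {
            public boolean isAvailable() {
                return available;
            }

            public String getInstallationMessage() {
                return "install the external tool first";
            }
        };
    }

    public String convert(String converterName, String input, Map<String, Serializable> params) {
        // Naive tag stripping, only to produce a visible result.
        return input.replaceAll("<[^>]*>", "");
    }
}

class ConversionFlow {
    // Look up a converter, check availability, then convert.
    static String convertOrExplain(MiniConversionService service, String sourceMimeType,
            String destinationMimeType, String input) {
        String name = service.getConverterName(sourceMimeType, destinationMimeType);
        if (name == null) {
            return "no converter from " + sourceMimeType + " to " + destinationMimeType;
        }
        CheckResult check = service.isConverterAvailable(name);
        if (!check.isAvailable()) {
            return "converter " + name + " unavailable: " + check.getInstallationMessage();
        }
        return service.convert(name, input, new HashMap<String, Serializable>());
    }

    public static void main(String[] args) {
        System.out.println(convertOrExplain(new FakeService(true), "text/html", "text/plain", "<p>hello</p>"));
    }
}
```

Checking availability before converting lets the caller surface the installation message to an administrator instead of failing in the middle of a conversion.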

23.2.3. Configuring converter service

The Conversion Service supports a global configuration via an XML file in order to configure caching.

<?xml version="1.0"?>
<component name="org.nuxeo.ecm.core.convert.config">
  <extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl"
      point="configuration">

    <configuration>
      <!-- define directory location for caching : default to java default tmp dir (java.io.tmpdir) -->
      <cachingDirectory>/var/ConversionCache</cachingDirectory>
      <!-- GC interval in minutes (default = 10 minutes ) -->
      <gcInterval>10</gcInterval>
      <!-- maximum size for disk cache in KB (default to 10*1024) -->
      <diskCacheSize>1024</diskCacheSize>
      <!-- Enables or disables caching (default = true)-->
      <enableCache>true</enableCache>
    </configuration>
  </extension>

</component>

23.2.4. Contributing converters

To contribute a converter, you have to contribute a class that implements the org.nuxeo.ecm.core.convert.extension.Converter interface. This class will be associated with:

  • a converter name

  • a list of source mime-types

  • one destination mime-type

  • optional named parameters

<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl"
    point="converter">

  <converter name="html2text" class="org.nuxeo.ecm.core.convert.plugins.text.extractors.Html2TextConverter">
    <sourceMimeType>text/html</sourceMimeType>
    <sourceMimeType>text/xhtml</sourceMimeType>
    <destinationMimeType>text/plain</destinationMimeType>
    <parameters>
      <parameter name="myParam">myValue</parameter>
    </parameters>
  </converter>

</extension>

You can also contribute a converter that is a chain of existing converters (what was called a transformer in the 5.1 transform service API). To do this, the contributed converter does not define an implementation class, just a chain of either converters or mime-types. If mime-types are used, the conversion service automatically guesses the converter chain from the mime-type steps.

<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl"
    point="converter">

  <!-- explicit chain of 2 converters : converter1 + converter2 -->
  <converter name="chainedConverter">
    <sourceMimeType>some/mimetype</sourceMimeType>
    <destinationMimeType>some/other-mimetype</destinationMimeType>
    <conversionSteps>
      <subconverter>converter1</subconverter>
      <subconverter>converter2</subconverter>
    </conversionSteps>
  </converter>

  <!-- define chain via mime types : foo/bar1 => foo/bar2 => foo/bar3 -->
  <converter name="chainedMimeType">
    <sourceMimeType>foo/bar1</sourceMimeType>
    <destinationMimeType>foo/bar3</destinationMimeType>
    <conversionSteps>
      <step>foo/bar2</step>
    </conversionSteps>
  </converter>

</extension>

When using chained converters, the additional optional parameters are passed to each underlying converter.
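
To make the mime-type based chain more concrete, here is a minimal sketch of how a chain such as foo/bar1 => foo/bar2 => foo/bar3 could be resolved into converter names. It is an illustration only, not the ConversionService implementation: the ChainResolver class, its registry keyed by "source->destination", and the converter names used in main are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: resolves a chain of mime-types into converter names.
class ChainResolver {
    // "source->destination" -> name of the converter handling that pair
    private final Map<String, String> registry = new HashMap<String, String>();

    void register(String converterName, String sourceMimeType, String destinationMimeType) {
        registry.put(sourceMimeType + "->" + destinationMimeType, converterName);
    }

    // Walks source -> step1 -> ... -> destination and picks one converter per hop.
    List<String> resolve(String sourceMimeType, List<String> steps, String destinationMimeType) {
        List<String> mimeTypes = new ArrayList<String>();
        mimeTypes.add(sourceMimeType);
        mimeTypes.addAll(steps);
        mimeTypes.add(destinationMimeType);

        List<String> chain = new ArrayList<String>();
        for (int i = 0; i + 1 < mimeTypes.size(); i++) {
            String key = mimeTypes.get(i) + "->" + mimeTypes.get(i + 1);
            String converterName = registry.get(key);
            if (converterName == null) {
                throw new IllegalStateException("No converter registered for " + key);
            }
            chain.add(converterName);
        }
        return chain;
    }

    public static void main(String[] args) {
        ChainResolver resolver = new ChainResolver();
        resolver.register("bar1ToBar2", "foo/bar1", "foo/bar2");
        resolver.register("bar2ToBar3", "foo/bar2", "foo/bar3");
        System.out.println(resolver.resolve("foo/bar1", Arrays.asList("foo/bar2"), "foo/bar3"));
    }
}
```

In this sketch a missing hop fails fast with an exception; running each resolved converter in order with the same parameter map would match the parameter-passing behaviour described above.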

Converters based on external tools (such as command line tools or an OpenOffice server) can implement the ExternalConverter interface. This interface adds an isConverterAvailable() method that is called to check converter availability.
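
As an illustration of what such an availability check might do for a command line tool, the sketch below looks for an executable in the directories listed in the PATH environment variable. This is an assumption about one possible strategy, not the check performed by Nuxeo; the CommandAvailability class and its methods are hypothetical, and the code uses only the Java standard library.

```java
import java.io.File;
import java.util.regex.Pattern;

// Hypothetical helper: checks whether a command can be found on the PATH.
class CommandAvailability {

    // Returns true if an executable file named 'command' exists in one of
    // the directories of 'pathValue' (formatted like the PATH variable).
    static boolean isOnPath(String command, String pathValue) {
        if (command == null || command.isEmpty() || pathValue == null) {
            return false;
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            File candidate = new File(dir, command);
            if (candidate.isFile() && candidate.canExecute()) {
                return true;
            }
        }
        return false;
    }

    // Convenience overload that reads the PATH of the current process.
    static boolean isOnPath(String command) {
        return isOnPath(command, System.getenv("PATH"));
    }

    public static void main(String[] args) {
        System.out.println("pdftohtml available: " + isOnPath("pdftohtml"));
    }
}
```

This simple lookup does not handle Windows executable extensions such as .exe, and it only proves that the file exists, not that the tool runs correctly.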

23.2.5. Converters based on external command line tools

Many conversion tools come as command line executables, so in some cases it is useful to wrap these command lines in a converter.

For that purpose, we provide a base class for converters that rely on a command line wrapped by the Nuxeo commandLine service.

The base class org.nuxeo.ecm.platform.convert.plugins.CommandLineBasedConverter handles all the dirty work; you only have to override the methods that define the command line parameters and parse the output.

<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl"
    point="converter">
  <!-- converter based on the pdftohtml command line -->
  <converter name="pdf2html" class="org.nuxeo.ecm.platform.convert.plugins.PDF2HtmlConverter">
    <sourceMimeType>application/pdf</sourceMimeType>
    <destinationMimeType>text/html</destinationMimeType>
    <parameters>
      <parameter name="CommandLineName">pdftohtml</parameter>
    </parameters>
  </converter>

</extension>