The nuxeo-core-convert
provides a service to
manage the conversion of Blobs from one format to another.
nuxeo-core-convert
is only available starting
with 5.2-M4.
In 5.2, the Conversion Service replaces the Transformation service, which is now deprecated.
The Transformation service had some API design issues that we wanted to correct. Because full-text indexing is now handled by the repository, we also needed a core service to manage full-text conversion. We decided to define a brand new API with a new service, and we changed the service name to avoid any confusion and to be able to provide backward compatibility.
The API is now simpler
There are only converters, not transformers and plugins as before.
ConversionService includes a caching system
This eliminates the need for a custom cache in each high-level service that uses converters (such as the preview service).
Data input/output is now handled via the
BlobHolder
interface.
There is only one data structure (no more TransformDocuments or
plain Blobs). This also makes the caching system more efficient
since the link between the blobs and the associated
DocumentModel
is preserved when
available.
Availability check API
ConversionService now provides an API to check whether a converter is available. This is useful when the converter depends on an external program that must be installed on the server (such as an OpenOffice server).
The Transformer API is now deprecated, and the old transform-service implementations and plugins have been removed from the default distribution. Nevertheless, we provide a transformer-compat bundle that handles compatibility between the old and the new API.
To activate this compatibility layer, you need to deploy:
nuxeo-platform-transform-api
nuxeo-platform-transform-compat
The Transformers API is still available, but the implementation now wraps calls to the ConversionService. This means you can use the new converters from the old Transformation service API. All Transformers included by default have been migrated to converters with the same name. Code that was using transformers should still work in 5.2.
Contributions to the old Transformation service are now contributed to the ConversionService using specific converters that wrap transformers or plugins.
There are some limitations, though:
Transformers must be based on the TransformerImpl class
You may have to change the dependencies in pom.xml and MANIFEST.MF to point to the compat artifact instead of the core one.
Inside nuxeo-core-convert-plugins
pdf2text, xml2text, html2text, word2text, xl2text, ppt2text, oo2text
text extractors for common office formats
rfc822totext
text extractor for MIME-encoded mails
any2text
meta-converter for text extraction
Inside nuxeo-platform-convert
pdf2html
PDF to HTML conversion based on the pdftohtml command line tool
office2html
converts standard office formats to HTML (uses OpenOffice)
any2html
compound converter that converts any input to HTML
any2pdf
uses OpenOffice to generate PDF
The Conversion Service can be accessed via the standard Nuxeo service lookup:
ConversionService conversionService = Framework.getService(ConversionService.class);
To convert a BlobHolder
to a given
destination mime type:
BlobHolder result = conversionService.convertToMimeType("text/plain", blobHolder, params);
params is a simple Map<String,Serializable>
used to pass parameters to the converter (it can be null).
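As a small illustration, a parameter map of this type could be built as shown here. This is only a sketch: the class name ConversionParamsSketch is hypothetical, and the parameter name myParam is borrowed from the sample converter contribution in this chapter. Whether a given converter reads a given parameter depends on that converter.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, for illustration only. */
class ConversionParamsSketch {

    /** Builds the kind of Map<String, Serializable> expected as the params argument. */
    static Map<String, Serializable> buildParams() {
        Map<String, Serializable> params = new HashMap<>();
        // "myParam" / "myValue" mirror the sample contribution; real names are converter specific.
        params.put("myParam", "myValue");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(buildParams());
    }
}
```

The resulting map would then be passed as the last argument of the conversion calls.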
To use a known converter:
BlobHolder result = conversionService.convert("converterName", blobHolder, params);
To find a converter for a given conversion:
String converterName = conversionService.getConverterName(sourceMimeType, destinationMimeType);
To test if a converter is available:
ConverterCheckResult checkResult = conversionService.isConverterAvailable("converterName");
This call can throw
ConverterNotRegistred
if the target converter
does not exist at all. The ConverterCheckResult
class provides:
an isAvailable() method
a getErrorMessage() method
returns the error that occurred during the availability check
a getInstallationMessage() method
returns the installation message that was contributed by the converter's contributor
The conversion service supports a global configuration via an XML file in order to configure caching.
<?xml version="1.0"?>
<component name="org.nuxeo.ecm.core.convert.config">
  <extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl"
    point="configuration">
    <configuration>
      <!-- define directory location for caching : default to java default tmp dir (java.io.tmpdir) -->
      <cachingDirectory>/var/ConversionCache</cachingDirectory>
      <!-- GC interval in minutes (default = 10 minutes ) -->
      <gcInterval>10</gcInterval>
      <!-- maximum size for disk cache in KB (default to 10*1024) -->
      <diskCacheSize>1024</diskCacheSize>
      <!-- Enables or disables caching (default = true)-->
      <enableCache>true</enableCache>
    </configuration>
  </extension>
</component>
To contribute a converter, you have to contribute a class that
implements the
org.nuxeo.ecm.core.convert.extension.Converter
interface. This class will be associated with:
a converter name
a list of source mime-types
one destination mime-type
optional named parameters
<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl"
  point="converter">
  <converter name="html2text"
    class="org.nuxeo.ecm.core.convert.plugins.text.extractors.Html2TextConverter">
    <sourceMimeType>text/html</sourceMimeType>
    <sourceMimeType>text/xhtml</sourceMimeType>
    <destinationMimeType>text/plain</destinationMimeType>
    <parameters>
      <parameter name="myParam">myValue</parameter>
    </parameters>
  </converter>
</extension>
You can also contribute a converter that is a chain of existing converters (what was called a transformer in the 5.1 transform service API). To do this, the contributed converter does not have to define an implementation class, just a chain of either converters or mime-types. If mime-types are used, the conversion service will automatically guess the converter chain from the mime-type steps.
<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl"
  point="converter">
  <!-- explicit chain of 2 converters : converter1 + converter2 -->
  <converter name="chainedConverter">
    <sourceMimeType>some/mimetype</sourceMimeType>
    <destinationMimeType>some/other-mimetype</destinationMimeType>
    <conversionSteps>
      <subconverter>converter1</subconverter>
      <subconverter>converter2</subconverter>
    </conversionSteps>
  </converter>
  <!-- define chain via mime types : foo/bar1 => foo/bar2 => foo/bar3 -->
  <converter name="chainedMimeType">
    <sourceMimeType>foo/bar1</sourceMimeType>
    <destinationMimeType>foo/bar3</destinationMimeType>
    <conversionSteps>
      <step>foo/bar2</step>
    </conversionSteps>
  </converter>
</extension>
When using chained converters, the additional optional parameters are passed to each underlying converter.
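To make the mime-type based chain more concrete, the following sketch models how a chain such as foo/bar1 => foo/bar2 => foo/bar3 could be turned into a list of converter names. It is an illustrative model only, not the ConversionService implementation: the class ChainResolutionSketch, its registry keyed by source and destination mime-type, and the assumption that converter1 and converter2 handle the two hops are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative model only; not the Nuxeo implementation. */
class ChainResolutionSketch {

    // Hypothetical registry: "source->destination" mime-type pair mapped to a converter name.
    private final Map<String, String> registry = new HashMap<>();

    void register(String converterName, String sourceMimeType, String destinationMimeType) {
        registry.put(sourceMimeType + "->" + destinationMimeType, converterName);
    }

    /** Resolves one converter per hop: source -> step1 -> ... -> destination. */
    List<String> resolve(String source, List<String> steps, String destination) {
        List<String> hops = new ArrayList<>(steps);
        hops.add(destination);
        List<String> chain = new ArrayList<>();
        String current = source;
        for (String next : hops) {
            String name = registry.get(current + "->" + next);
            if (name == null) {
                throw new IllegalStateException("No converter from " + current + " to " + next);
            }
            chain.add(name);
            current = next;
        }
        return chain;
    }

    public static void main(String[] args) {
        ChainResolutionSketch sketch = new ChainResolutionSketch();
        // Assumed for the example: converter1 and converter2 cover the two hops.
        sketch.register("converter1", "foo/bar1", "foo/bar2");
        sketch.register("converter2", "foo/bar2", "foo/bar3");
        System.out.println(sketch.resolve("foo/bar1", Arrays.asList("foo/bar2"), "foo/bar3"));
    }
}
```

In this model, a missing hop fails fast with an exception. How the real service behaves when no converter matches a step is not covered here.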
Converters based on external tools (such as a command line tool or
an OpenOffice server) can implement the
ExternalConverter
interface. This interface adds
an isConverterAvailable() method that is called to check
converter availability.
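The kind of check such a method might perform can be sketched in plain Java. The example below is an assumption-laden illustration, not Nuxeo code: ExternalToolCheckSketch and its CheckResult class are hypothetical stand-ins (CheckResult is not the ConverterCheckResult class), and the check simply looks for an executable in the directories of a PATH-like value.

```java
import java.io.File;

/** Illustrative sketch only; CheckResult is a hypothetical stand-in type. */
class ExternalToolCheckSketch {

    static final class CheckResult {
        final boolean available;
        final String errorMessage; // null when the tool was found

        CheckResult(boolean available, String errorMessage) {
            this.available = available;
            this.errorMessage = errorMessage;
        }
    }

    /**
     * Looks for an executable file named 'command' in the directories listed in
     * 'pathValue'. Platform-specific suffixes such as ".exe" are not handled here.
     */
    static CheckResult check(String command, String pathValue) {
        if (pathValue != null) {
            for (String dir : pathValue.split(File.pathSeparator)) {
                if (dir.isEmpty()) {
                    continue;
                }
                File candidate = new File(dir, command);
                if (candidate.isFile() && candidate.canExecute()) {
                    return new CheckResult(true, null);
                }
            }
        }
        return new CheckResult(false, command + " was not found on the search path");
    }

    public static void main(String[] args) {
        CheckResult result = check("pdftohtml", System.getenv("PATH"));
        System.out.println(result.available ? "pdftohtml is available" : result.errorMessage);
    }
}
```

A real converter would more likely delegate this check to the service that wraps the external tool, and would report the outcome through the result type defined by the conversion API.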
Many conversion tools come as command line executables, so in some cases it is useful to wrap these command lines in a converter.
For that purpose, we provide a base class for converters that are based on a command line wrapped by the Nuxeo commandLine service.
The base class
org.nuxeo.ecm.platform.convert.plugins.CommandLineBasedConverter
handles all the dirty work, and you only have to override the methods to
define the parameters of the command line and the parsing of the
output.
<extension target="org.nuxeo.ecm.core.convert.service.ConversionServiceImpl"
  point="converter">
  <!-- converter based on the pdftohtml command line -->
  <converter name="pdf2html"
    class="org.nuxeo.ecm.platform.convert.plugins.PDF2HtmlConverter">
    <sourceMimeType>application/pdf</sourceMimeType>
    <destinationMimeType>text/html</destinationMimeType>
    <parameters>
      <parameter name="CommandLineName">pdftohtml</parameter>
    </parameters>
  </converter>
</extension>