public abstract class TikaPoweredContentTransformer extends AbstractContentTransformer2
Base class for ContentTransformer implementations which are powered by
Apache Tika.
To use Tika to transform some content into Text, HTML or XML, either create
an implementation of this class or use the auto-detect transformer,
TikaAutoContentTransformer.
For now, all transformers are registered as regular, rather than explicit,
transformations. This should allow you to register your own explicit
transformers and have them take priority.

Modifier and Type | Field and Description |
---|---|
protected static java.lang.String | LINE_BREAK: Windows carriage return line feed pair. |
protected java.util.List | sourceMimeTypes |
static java.lang.String | WRONG_FORMAT_MESSAGE_ID |
transformerDebug
Modifier | Constructor and Description |
---|---|
protected | TikaPoweredContentTransformer(java.util.List sourceMimeTypes) |
protected | TikaPoweredContentTransformer(java.lang.String[] sourceMimeTypes) |
Modifier and Type | Method and Description |
---|---|
protected org.apache.tika.parser.ParseContext | buildParseContext(org.apache.tika.metadata.Metadata metadata, java.lang.String targetMimeType, TransformationOptions options): By default returns a ParseContext that does not recurse. |
protected org.xml.sax.ContentHandler | getContentHandler(java.lang.String targetMimeType, java.io.Writer output): Returns an appropriate Tika ContentHandler for the requested content type. |
protected abstract org.apache.tika.parser.Parser | getParser(): Returns the correct Tika Parser to process the document. |
boolean | isTransformableMimetype(java.lang.String sourceMimetype, java.lang.String targetMimetype, TransformationOptions options): Can we do the requested transformation via Tika? We support transforming to HTML, XML or Text. |
void | transformInternal(org.alfresco.service.cmr.repository.ContentReader reader, org.alfresco.service.cmr.repository.ContentWriter writer, TransformationOptions options): Method to be implemented by subclasses wishing to make use of the common infrastructural code provided by this class. |
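
The constructors and the isTransformableMimetype check summarized above can be pictured with a small stand-in. This is only a sketch under stated assumptions, not the Alfresco implementation: the class name TikaMimetypeCheckSketch is hypothetical, the TransformationOptions argument is left out, the String[] constructor is assumed to delegate to the List one, and the three target mimetype strings are assumed from the phrase "HTML, XML or Text".

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in, not the Alfresco class: models only the mimetype check. */
class TikaMimetypeCheckSketch {
    // Assumed target types, inferred from "HTML, XML or Text" in the method summary.
    private static final List<String> TARGET_MIMETYPES =
            Arrays.asList("text/plain", "text/html", "text/xml");

    // Mirrors the protected sourceMimeTypes field of the documented class.
    private final List<String> sourceMimeTypes;

    TikaMimetypeCheckSketch(List<String> sourceMimeTypes) {
        this.sourceMimeTypes = sourceMimeTypes;
    }

    // Assumption: the array form simply wraps the array as a list.
    TikaMimetypeCheckSketch(String[] sourceMimeTypes) {
        this(Arrays.asList(sourceMimeTypes));
    }

    /** True only if the source type is registered and the target is text, HTML or XML. */
    boolean isTransformableMimetype(String sourceMimetype, String targetMimetype) {
        return sourceMimeTypes.contains(sourceMimetype)
                && TARGET_MIMETYPES.contains(targetMimetype);
    }
}
```

A transformer built with `new String[] {"application/pdf"}` would, under these assumptions, accept a transformation to text/plain and reject one to image/png.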
checkTransformable, getTransformationTime, recordTime, register, setProperties, setRegistry, toString, transform, transform, transform
getBeanName, getLimits, getLimits, getLimits, getMaxPages, getMaxSourceSizeKBytes, getMaxSourceSizeKBytes, getMimetypeLimits, getPageLimit, getReadLimitKBytes, getReadLimitTimeMs, getTimeoutMs, isPageLimitSupported, isTransformable, isTransformable, isTransformableSize, setBeanName, setLimits, setMaxPages, setMaxSourceSizeKBytes, setMimetypeLimits, setPageLimit, setPageLimitsSuported, setReaderLimits, setReadLimitKBytes, setReadLimitTimeMs, setTimeoutMs, setTransformerDebug
getMimetype, getMimetypeService, isExplicitTransformation, isSupportedTransformation, setExplicitTransformations, setMimetypeService, setSupportedTransformations
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
isExplicitTransformation
protected java.util.List sourceMimeTypes
protected static final java.lang.String LINE_BREAK
public static final java.lang.String WRONG_FORMAT_MESSAGE_ID
protected TikaPoweredContentTransformer(java.util.List sourceMimeTypes)
protected TikaPoweredContentTransformer(java.lang.String[] sourceMimeTypes)
protected abstract org.apache.tika.parser.Parser getParser()
Returns the correct Tika Parser to process the document. If you are unsure
which parser to use, see TikaAutoContentTransformer, which makes use of the
Tika auto-detection.
public boolean isTransformableMimetype(java.lang.String sourceMimetype, java.lang.String targetMimetype, TransformationOptions options)
Specified by: isTransformableMimetype in interface ContentTransformer
Overrides: isTransformableMimetype in class AbstractContentTransformerLimits
Parameters:
sourceMimetype - the source mimetype
targetMimetype - the target mimetype
options - the transformation options
protected org.xml.sax.ContentHandler getContentHandler(java.lang.String targetMimeType, java.io.Writer output) throws javax.xml.transform.TransformerConfigurationException
Throws: javax.xml.transform.TransformerConfigurationException
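
To make the getContentHandler contract more concrete, here is a minimal sketch of how a SAX ContentHandler might be chosen per target mimetype using only JDK classes. It is not the Alfresco source: the class name ContentHandlerSketch, the method name handlerFor, and the three mimetype strings are assumptions for illustration. It does show one plausible reason the signature declares TransformerConfigurationException, since creating a JDK TransformerHandler can throw it.

```java
import java.io.IOException;
import java.io.Writer;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper, not part of Alfresco or Tika. */
class ContentHandlerSketch {
    static ContentHandler handlerFor(String targetMimeType, final Writer output)
            throws TransformerConfigurationException {
        if ("text/plain".equals(targetMimeType)) {
            // Plain text: ignore markup events and copy character data to the writer.
            return new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) throws SAXException {
                    try {
                        output.write(ch, start, length);
                    } catch (IOException e) {
                        throw new SAXException(e);
                    }
                }
            };
        }

        // Markup targets: pick the serialization method for the JDK identity transformer.
        String method;
        if ("text/html".equals(targetMimeType)) {
            method = "html";
        } else if ("text/xml".equals(targetMimeType)) {
            method = "xml";
        } else {
            throw new IllegalArgumentException("Unsupported target mimetype: " + targetMimeType);
        }

        SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler(); // may throw TransformerConfigurationException
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, method);
        handler.setResult(new StreamResult(output));
        return handler;
    }
}
```

A parser that emits SAX events, such as the one returned by getParser(), could then be pointed at the returned handler so that the parsed content is serialized to the writer in the requested form.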
protected org.apache.tika.parser.ParseContext buildParseContext(org.apache.tika.metadata.Metadata metadata, java.lang.String targetMimeType, TransformationOptions options)
public void transformInternal(org.alfresco.service.cmr.repository.ContentReader reader, org.alfresco.service.cmr.repository.ContentWriter writer, TransformationOptions options) throws java.lang.Exception
Description copied from class: AbstractContentTransformer2
Method to be implemented by subclasses wishing to make use of the common infrastructural code provided by this class.
Specified by: transformInternal in class AbstractContentTransformer2
Parameters:
reader - the source of the content to transform
writer - the target to which to write the transformed content
options - a map of options to use when performing the transformation. The map will never be null.
Throws: java.lang.Exception - exceptions will be handled by this class - subclasses can throw anything

Copyright © 2005 - 2010 Alfresco Software, Inc. All Rights Reserved.