org.apache.nutch.parse.msword
Class WordExtractor
java.lang.Object
org.apache.nutch.parse.msword.WordExtractor
- public class WordExtractor
- extends Object
Extracts the text from a Word 6.0/95/97/2000/XP document.
- Author:
- Ryan Ackley; Andy Hedges (code to extract all msword properties)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
WordExtractor
public WordExtractor()
- Constructor
extractText
public String extractText(InputStream in)
throws Exception
- Gets the text from a Word document.
- Parameters:
in
- The InputStream representing the Word file.
- Throws:
Exception
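A minimal calling sketch for extractText, based only on the signature above. So that the block compiles without the Nutch jar, it declares a hypothetical stand-in interface, TextSource, with the same method shape. The helper name readOrEmpty and the stub lambda are also assumptions, not part of this API. Because the method declares the broad Exception type, a caller has to catch or propagate it. The sketch uses InputStream.readAllBytes, so it assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ExtractTextSketch {

    /** Hypothetical stand-in mirroring WordExtractor.extractText(InputStream). */
    interface TextSource {
        String extractText(InputStream in) throws Exception;
    }

    /** Returns the extracted text, or an empty string if extraction fails. */
    static String readOrEmpty(TextSource source, InputStream in) {
        // try-with-resources closes the stream whether or not extraction succeeds.
        try (InputStream stream = in) {
            return source.extractText(stream);
        } catch (Exception e) {
            // extractText declares Exception, so every failure lands here.
            return "";
        }
    }

    public static void main(String[] args) {
        // Stub that only decodes the bytes. With Nutch on the classpath this
        // would presumably be: TextSource source = new WordExtractor()::extractText;
        TextSource stub = in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(readOrEmpty(stub, in));
    }
}
```

Returning an empty string on failure is one possible policy, suited to batch processing where a single unreadable file should not stop the run. Callers that need to tell failures apart would rethrow or log instead.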
extractProperties
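public Properties extractProperties(InputStream in)
throws IOException
- Extracts the properties from a Word document.
- Parameters:
in
- The InputStream representing the Word file.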
- Throws:
IOException
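A minimal sketch of consuming the result of extractProperties, assuming the return type is java.util.Properties (the signature above gives only the simple name). The stand-in interface PropertySource, the helper lookup, and the key "title" are hypothetical; this page does not list which property keys are populated.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class ExtractPropertiesSketch {

    /** Hypothetical stand-in mirroring WordExtractor.extractProperties(InputStream). */
    interface PropertySource {
        Properties extractProperties(InputStream in) throws IOException;
    }

    /** Looks up one property, falling back to a default when the key is absent. */
    static String lookup(PropertySource source, InputStream in, String key, String fallback)
            throws IOException {
        // Close the stream once the properties have been read.
        try (InputStream stream = in) {
            Properties props = source.extractProperties(stream);
            return props.getProperty(key, fallback);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stub returning one fixed property; the key name is an assumption.
        PropertySource stub = in -> {
            Properties p = new Properties();
            p.setProperty("title", "Quarterly report");
            return p;
        };
        InputStream in = new ByteArrayInputStream(new byte[0]);
        System.out.println(lookup(stub, in, "title", "(untitled)"));
    }
}
```

Using getProperty with a default avoids null checks when a document lacks a given property.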
Copyright © 2006 The Apache Software Foundation