org.apache.nutch.parse.msword
Class WordExtractor

java.lang.Object
  extended by org.apache.nutch.parse.msword.WordExtractor

public class WordExtractor
extends Object

Extracts the text from a Word 6.0/95/97/2000/XP document.

Author:
Ryan Ackley, Andy Hedges (code to extract all MS Word properties)

Constructor Summary
WordExtractor()
          Creates a new WordExtractor.
 
Method Summary
 Properties extractProperties(InputStream in)
          Gets the properties from a Word document.
 String extractText(InputStream in)
          Gets the text from a Word document.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordExtractor

public WordExtractor()
Creates a new WordExtractor.

Method Detail

extractText

public String extractText(InputStream in)
                   throws Exception
Gets the text from a Word document.

Parameters:
in - The InputStream representing the Word file.
Throws:
Exception
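
A minimal sketch of how a caller might handle the broad `throws Exception` declared by extractText. So that it compiles without Nutch on the classpath, a hypothetical `TextSource` interface stands in for the method; it is not part of Nutch. With Nutch available, `new WordExtractor()::extractText` should fit the same shape. The helper name `extractOrNull` and the fake extractor are also assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ExtractTextSketch {

    /** Hypothetical stand-in with the same shape as WordExtractor.extractText. */
    interface TextSource {
        String extractText(InputStream in) throws Exception;
    }

    /** Calls the extractor, closes the stream, and returns null on any failure. */
    static String extractOrNull(TextSource extractor, InputStream in) {
        try (InputStream stream = in) {
            return extractor.extractText(stream);
        } catch (Exception e) {
            // extractText declares a broad `throws Exception`, so it is caught here.
            return null;
        }
    }

    public static void main(String[] args) {
        // Fake extractor, used only so the sketch runs without a real Word file.
        TextSource fake = in -> new String(in.readAllBytes(), StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractOrNull(fake, in));
    }
}
```

Returning null on failure is only one option; a caller that needs to distinguish a corrupt file from an I/O problem would probably rethrow or log the exception instead.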

extractProperties

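public Properties extractProperties(InputStream in)
                             throws IOException
Gets the properties from a Word document.

Parameters:
in - The InputStream representing the Word file.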
Throws:
IOException
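
A minimal sketch of reading the result of extractProperties, assuming the returned `Properties` is `java.util.Properties` (this page does not show the import). The `PropertySource` interface is a hypothetical stand-in so the sketch compiles without Nutch, and the key `exampleKey` is made up; the property names actually produced by WordExtractor are not listed on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class ExtractPropertiesSketch {

    /** Hypothetical stand-in with the same shape as WordExtractor.extractProperties. */
    interface PropertySource {
        Properties extractProperties(InputStream in) throws IOException;
    }

    /** Reads the properties and copies the string-valued entries into a sorted map. */
    static Map<String, String> readSorted(PropertySource extractor, InputStream in)
            throws IOException {
        try (InputStream stream = in) {
            Properties props = extractor.extractProperties(stream);
            Map<String, String> sorted = new TreeMap<>();
            for (String name : props.stringPropertyNames()) {
                sorted.put(name, props.getProperty(name));
            }
            return sorted;
        }
    }

    public static void main(String[] args) throws IOException {
        // Fake extractor with a made-up key, so the sketch runs without a real Word file.
        PropertySource fake = in -> {
            Properties p = new Properties();
            p.setProperty("exampleKey", "exampleValue");
            return p;
        };
        System.out.println(readSorted(fake, new ByteArrayInputStream(new byte[0])));
    }
}
```

Because extractProperties declares only `IOException`, the helper lets it propagate rather than catching a broader type.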


Copyright © 2006 The Apache Software Foundation