org.apache.nutch.parse.msword
Class MSWordParser

java.lang.Object
  extended by org.apache.nutch.parse.msword.MSWordParser
All Implemented Interfaces:
Parser

public class MSWordParser
extends Object
implements Parser

Parser for the MIME type application/msword. It is based on org.apache.poi.*; how well it performs has yet to be evaluated.

Author:
John Xing. Note on 2004-06-14 by Xing: Some code is stacked here for convenience (see inline comments). It may be moved to a more appropriate place when the new codebase stabilizes, especially after the indexing code is written.
Andy Hedges: code to extract all msword properties.

Field Summary
 
Fields inherited from interface org.apache.nutch.parse.Parser
X_POINT_ID
 
Constructor Summary
MSWordParser()
           
 
Method Summary
 Parse getParse(Content content)
          Creates the parse for some content.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MSWordParser

public MSWordParser()
Method Detail

getParse

public Parse getParse(Content content)
Description copied from interface: Parser
Creates the parse for some content.

Specified by:
getParse in interface Parser


Copyright © 2006 The Apache Software Foundation