org.apache.nutch.analysis
Class NutchDocumentAnalyzer

java.lang.Object
  extended by org.apache.lucene.analysis.Analyzer
      extended by org.apache.nutch.analysis.NutchDocumentAnalyzer

public class NutchDocumentAnalyzer
extends Analyzer

The analyzer used for Nutch documents. It tokenizes text with NutchDocumentTokenizer, a lexical analyzer defined in JavaCC, and applies no stop list, so documents are analyzed consistently with query parsing.
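
The effect of omitting a stop list can be illustrated with a small standalone sketch. It does not use NutchDocumentTokenizer or any Nutch or Lucene class: the class StopListSketch, its tokenize method, and the sample stop words are all hypothetical, and the tokenization rule (lowercase, split on non-alphanumerics) is an assumption chosen only for illustration. It shows that a term dropped at index time can no longer be matched by a query that still contains it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not the Nutch tokenizer.
class StopListSketch {

    // Lowercases the text, splits on runs of non-alphanumeric characters,
    // and drops any token found in stopWords.
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> noStops = Collections.emptySet();
        Set<String> someStops = new HashSet<>(Arrays.asList("the", "a", "of"));

        // Without a stop list, every term survives and stays searchable.
        System.out.println(tokenize("The Who", noStops));   // [the, who]

        // With a stop list, "the" is lost, so a query for the phrase
        // "the who" would have nothing to match against at that position.
        System.out.println(tokenize("The Who", someStops)); // [who]
    }
}
```

If the indexer and the query parser disagree about which terms to keep, phrase queries that include a dropped term cannot match; analyzing both sides the same way avoids that mismatch.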


Field Summary
static Analyzer ANCHOR_ANALYZER
          Analyzer used to analyze anchors.
static Analyzer CONTENT_ANALYZER
          Analyzer used to index textual content.
static int INTER_ANCHOR_GAP
          The number of unused term positions between anchors in the anchor field.
 
Constructor Summary
NutchDocumentAnalyzer()
           
 
Method Summary
 TokenStream tokenStream(String fieldName, Reader reader)
          Returns a new token stream for text from the named field.
 
Methods inherited from class org.apache.lucene.analysis.Analyzer
getPositionIncrementGap, tokenStream
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CONTENT_ANALYZER

public static final Analyzer CONTENT_ANALYZER
Analyzer used to index textual content.


INTER_ANCHOR_GAP

public static final int INTER_ANCHOR_GAP
The number of unused term positions between anchors in the anchor field.

See Also:
Constant Field Values
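
As an illustration of what a gap of unused term positions achieves, the following standalone sketch assigns positions to the terms of several anchors and checks whether two terms are adjacent. It is not the Nutch implementation: the class AnchorGapSketch, its methods, the whitespace tokenization, and the gap value of 4 are assumptions made for the example; the actual value of INTER_ANCHOR_GAP is listed under Constant Field Values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not the Nutch or Lucene indexing code.
class AnchorGapSketch {

    // Gives each term of each anchor a position, leaving `gap` unused
    // positions between the last term of one anchor and the first of the next.
    static Map<String, List<Integer>> index(List<String> anchors, int gap) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int pos = 0;
        for (String anchor : anchors) {
            for (String term : anchor.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                positions.computeIfAbsent(term, k -> new ArrayList<>()).add(pos++);
            }
            pos += gap; // skip unused positions before the next anchor
        }
        return positions;
    }

    // True if `second` occurs at the position immediately after `first`,
    // which is what an exact two-term phrase query requires.
    static boolean phraseMatches(Map<String, List<Integer>> positions,
                                 String first, String second) {
        List<Integer> a = positions.getOrDefault(first, Collections.emptyList());
        List<Integer> b = positions.getOrDefault(second, Collections.emptyList());
        for (int p : a) {
            if (b.contains(p + 1)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> anchors = Arrays.asList("apache nutch", "search engine");

        // No gap: "nutch search" matches even though the words come
        // from two different anchors.
        System.out.println(phraseMatches(index(anchors, 0), "nutch", "search")); // true

        // With a gap (4 is an arbitrary illustrative value), the phrase
        // no longer matches across the anchor boundary.
        System.out.println(phraseMatches(index(anchors, 4), "nutch", "search")); // false

        // Phrases inside a single anchor are unaffected.
        System.out.println(phraseMatches(index(anchors, 4), "apache", "nutch")); // true
    }
}
```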

ANCHOR_ANALYZER

public static final Analyzer ANCHOR_ANALYZER
Analyzer used to analyze anchors.

Constructor Detail

NutchDocumentAnalyzer

public NutchDocumentAnalyzer()

Method Detail

tokenStream

public TokenStream tokenStream(String fieldName,
                               Reader reader)
Returns a new token stream for text from the named field.



Copyright © 2006 The Apache Software Foundation