org.apache.nutch.analysis
Class NutchDocumentAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.nutch.analysis.NutchDocumentAnalyzer
- public class NutchDocumentAnalyzer
- extends Analyzer
The analyzer used for Nutch documents. Uses the JavaCC-defined lexical
analyzer NutchDocumentTokenizer with no stop list, which keeps document
analysis consistent with query parsing.
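The consistency argument can be made concrete with a small, Lucene-free sketch. SharedTokenizer and its methods are hypothetical. Its letter-or-digit splitting is an assumption and is not the grammar of NutchDocumentTokenizer. It only shows why running one tokenizer with no stop list over both documents and queries keeps phrase matching intact.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical stand-in for a tokenizer shared by indexing and query parsing.
 * It lowercases and splits on any character that is not a letter or digit;
 * it is NOT NutchDocumentTokenizer, whose JavaCC grammar is richer.
 */
final class SharedTokenizer {
    private SharedTokenizer() {}

    /** Tokenizes text, dropping any term found in stopWords. */
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            // A sentinel separator at i == length flushes the last token.
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                String term = current.toString();
                if (!stopWords.contains(term)) {
                    tokens.add(term);
                }
                current.setLength(0);
            }
        }
        return tokens;
    }

    /** Tokenizes with no stop list, the policy described for NutchDocumentAnalyzer. */
    static List<String> tokenize(String text) {
        return tokenize(text, Collections.<String>emptySet());
    }

    /** True if the query's terms appear as a contiguous run in the document's terms. */
    static boolean phraseMatches(String document, String query) {
        List<String> queryTerms = tokenize(query);
        return !queryTerms.isEmpty()
                && Collections.indexOfSubList(tokenize(document), queryTerms) >= 0;
    }
}
```

In this sketch, phraseMatches("Tickets for The Who tonight", "the who") succeeds. If the document side had dropped "the" as a stop word while the query side kept it, the same phrase would no longer be found.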
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
CONTENT_ANALYZER
public static final Analyzer CONTENT_ANALYZER
- Analyzer used to index textual content.
INTER_ANCHOR_GAP
public static final int INTER_ANCHOR_GAP
- The number of unused term positions between anchors in the anchor
field.
- See Also:
- Constant Field Values
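The effect of the gap on phrase matching can be modeled without Lucene. This is an assumption-laden illustration: AnchorPositions, index and adjacent are hypothetical names, the tokenization is a plain split on non-word characters, and the gap is passed in as a parameter rather than read from INTER_ANCHOR_GAP, whose value is not given on this page. Before the first token of each anchor after the first, the sketch leaves gap term positions unused.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical model of term positions in a multi-valued anchor field.
 * Tokens normally advance the position by 1; before the first token of each
 * later anchor, 'gap' positions are skipped and left unused.
 */
final class AnchorPositions {
    private AnchorPositions() {}

    /** A term and the absolute position it would be indexed at. */
    static final class Posting {
        final String term;
        final int position;

        Posting(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    static List<Posting> index(List<String> anchors, int gap) {
        List<Posting> postings = new ArrayList<>();
        int position = -1; // no token emitted yet
        for (String anchor : anchors) {
            boolean startOfAnchor = true;
            for (String term : anchor.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.isEmpty()) {
                    continue; // split() yields "" when text starts with a separator
                }
                int increment = 1;
                if (startOfAnchor && position >= 0) {
                    increment += gap; // leave 'gap' unused positions between anchors
                }
                position += increment;
                postings.add(new Posting(term, position));
                startOfAnchor = false;
            }
        }
        return postings;
    }

    /** True if 'first' is immediately followed by 'second', as an exact phrase query requires. */
    static boolean adjacent(List<Posting> postings, String first, String second) {
        for (Posting a : postings) {
            for (Posting b : postings) {
                if (a.term.equals(first) && b.term.equals(second)
                        && b.position == a.position + 1) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With the anchors "Apache Nutch" and "search engine" and a gap of 4, nutch lands at position 1 and search at position 6. They are not adjacent, so an exact phrase query for "nutch search" cannot match across the two anchors; with a gap of 0 they would be.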
ANCHOR_ANALYZER
public static final Analyzer ANCHOR_ANALYZER
- Analyzer used to analyze anchors.
NutchDocumentAnalyzer
public NutchDocumentAnalyzer()
tokenStream
public TokenStream tokenStream(String fieldName,
Reader reader)
- Returns a new token stream for text from the named field.
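The description of tokenStream does not say how the field name is used. One plausible reading, given the two analyzer fields above, is that it dispatches to ANCHOR_ANALYZER for the anchor field and to CONTENT_ANALYZER otherwise. The sketch below models that idea in plain Java. The SimpleAnalyzer interface, the field name "anchor", the gap value and the "term/increment" output format are all hypothetical and are not Lucene or Nutch API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, Lucene-free sketch of per-field analyzer dispatch. Tokens are
 * rendered as "term/positionIncrement" strings so the output is easy to inspect.
 */
final class FieldDispatch {
    private FieldDispatch() {}

    /** Minimal contract mirroring tokenStream(String fieldName, Reader reader). */
    interface SimpleAnalyzer {
        List<String> tokenStream(String fieldName, Reader reader);
    }

    /** Hypothetical gap value for illustration; not the real INTER_ANCHOR_GAP. */
    static final int GAP = 4;

    static String readAll(Reader reader) {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[256];
        try {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    /** Splits on non-word characters; the first token gets firstIncrement, the rest 1. */
    static List<String> tokens(String text, int firstIncrement) {
        List<String> out = new ArrayList<>();
        int increment = firstIncrement;
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            out.add(term + "/" + increment);
            increment = 1;
        }
        return out;
    }

    static final SimpleAnalyzer CONTENT = (field, reader) -> tokens(readAll(reader), 1);

    /** Each anchor is analyzed as its own stream, so its first token carries the gap. */
    static final SimpleAnalyzer ANCHOR = (field, reader) -> tokens(readAll(reader), 1 + GAP);

    /** Top-level analyzer: choose a delegate by field name, then forward the call. */
    static final SimpleAnalyzer DOCUMENT = (field, reader) ->
            ("anchor".equals(field) ? ANCHOR : CONTENT).tokenStream(field, reader);
}
```

Here, analyzing "Search Engine" for the "anchor" field yields [search/5, engine/1], while any other field name yields [search/1, engine/1]. Keeping the dispatch inside one top-level analyzer means a caller can hand a single Analyzer to an indexer and still get field-specific behavior.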
Copyright © 2006 The Apache Software Foundation