NutchDocumentAnalyzer (Nutch 0.7.2 API)

org.apache.nutch.analysis
Class NutchDocumentAnalyzer

java.lang.Object
  org.apache.lucene.analysis.Analyzer
      org.apache.nutch.analysis.NutchDocumentAnalyzer

public class NutchDocumentAnalyzer
extends Analyzer

The analyzer used for Nutch documents. Uses the JavaCC-defined lexical analyzer NutchDocumentTokenizer, with no stop list. This keeps it consistent with query parsing.
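
The consistency point can be illustrated without any Nutch or Lucene classes. The following is a minimal plain-Java sketch, not the Nutch implementation: `StopListSketch`, its `tokenize` helper, and the example stop list are all hypothetical, and a simple non-word-character split stands in for NutchDocumentTokenizer. It shows that if documents were indexed with a stop list while the query side kept every word, some query terms would have nothing left to match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of the Nutch API.
class StopListSketch {

    // Made-up stop list for the example; Nutch's document analyzer uses none.
    static final Set<String> EXAMPLE_STOP_WORDS =
            new HashSet<>(Arrays.asList("to", "be", "or", "not"));

    // Lowercases the text, splits on non-word characters, and drops stop words.
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "To be or not to be";
        // With a stop list, every term of this phrase is discarded at index time.
        System.out.println(tokenize(text, EXAMPLE_STOP_WORDS));
        // With no stop list, the indexed terms are the same words a query would contain.
        System.out.println(tokenize(text, Collections.<String>emptySet()));
    }
}
```

Under these assumptions the first call yields an empty list and the second yields all six words, so only the second form can be matched by a query that keeps the same words.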

Field Summary
`static Analyzer`	`ANCHOR_ANALYZER` Analyzer used to analyze anchors.
`static Analyzer`	`CONTENT_ANALYZER` Analyzer used to index textual content.
`static int`	`INTER_ANCHOR_GAP` The number of unused term positions between anchors in the anchor field.

Constructor Summary
`NutchDocumentAnalyzer()`

Method Summary
`TokenStream`	`tokenStream(String fieldName, Reader reader)` Returns a new token stream for text from the named field.

Methods inherited from class org.apache.lucene.analysis.Analyzer

getPositionIncrementGap, tokenStream

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

CONTENT_ANALYZER

public static final Analyzer CONTENT_ANALYZER

Analyzer used to index textual content.

INTER_ANCHOR_GAP

public static final int INTER_ANCHOR_GAP

The number of unused term positions between anchors in the anchor field.

See Also: Constant Field Values
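
To make "unused term positions between anchors" concrete, here is a minimal plain-Java sketch of how such a gap could affect position numbering. It is an illustration under stated assumptions, not the Nutch implementation: `AnchorPositionSketch`, `assignPositions`, and the gap value of 4 are hypothetical (the real value is the one listed under Constant Field Values), and tokens are split on whitespace for simplicity. A gap like this presumably keeps positional matches, such as phrase matches, from spanning two unrelated anchors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the Nutch API.
class AnchorPositionSketch {

    // Assumed value for the example, not the actual INTER_ANCHOR_GAP constant.
    static final int EXAMPLE_GAP = 4;

    // Assigns a term position to each whitespace-separated token of each anchor,
    // leaving `gap` positions unused between consecutive non-empty anchors.
    static List<Integer> assignPositions(List<String> anchors, int gap) {
        List<Integer> positions = new ArrayList<>();
        int next = 0;
        for (String anchor : anchors) {
            String trimmed = anchor.trim();
            if (trimmed.isEmpty()) {
                continue; // an empty anchor contributes no tokens and no gap
            }
            if (!positions.isEmpty()) {
                next += gap; // skip the unused positions before this anchor
            }
            for (String token : trimmed.split("\\s+")) {
                positions.add(next++);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> anchors = Arrays.asList("apache nutch", "open source search");
        // prints [0, 1, 6, 7, 8]: positions 2 through 5 are left unused
        System.out.println(assignPositions(anchors, EXAMPLE_GAP));
    }
}
```

With the assumed gap of 4, the last token of the first anchor sits at position 1 and the first token of the second anchor at position 6, so the two are never adjacent.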

ANCHOR_ANALYZER

public static final Analyzer ANCHOR_ANALYZER

Analyzer used to analyze anchors.

Constructor Detail