NutchDocumentTokenizer (Nutch 0.7.2 API)

org.apache.nutch.analysis
Class NutchDocumentTokenizer

java.lang.Object
  org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.Tokenizer
          org.apache.nutch.analysis.NutchDocumentTokenizer

All Implemented Interfaces: NutchAnalysisConstants

public final class NutchDocumentTokenizer
extends Tokenizer
implements NutchAnalysisConstants

The tokenizer used for Nutch document text. It is implemented on top of the JavaCC-generated lexical analyzer, NutchAnalysisTokenManager, which is shared with the query parser.

Field Summary

Fields inherited from class org.apache.lucene.analysis.Tokenizer

input

Fields inherited from interface org.apache.nutch.analysis.NutchAnalysisConstants

ACRONYM, APOSTROPHE, ATSIGN, C_PLUS_PLUS, C_SHARP, CJK, COLON, DEFAULT, DIGIT, DOT, EOF, IRREGULAR_WORD, LETTER, MINUS, PLUS, QUOTE, SIGRAM, SLASH, tokenImage, WHITE, WORD, WORD_PUNCT

Constructor Summary
`NutchDocumentTokenizer(Reader reader)` Construct a tokenizer for the text in a Reader.

Method Summary
`static void`	`main(String[] args)` For debugging.
`Token`	`next()` Returns the next token in the stream, or null at EOF.
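
The `next()` contract above (return one token per call, then `null` at EOF) implies a simple pull loop on the caller's side. The following is a minimal sketch of that loop, not code from Nutch. To stay runnable without the Nutch and Lucene jars, it uses a hypothetical stand-in class, `WhitespaceTokenizer`, which only splits on whitespace and returns plain strings. The names `TokenLoopSketch`, `WhitespaceTokenizer` and `collect` are assumptions made for illustration. The real tokenizer applies the JavaCC grammar and returns Lucene `Token` objects instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TokenLoopSketch {

    /**
     * Hypothetical stand-in for a pull-style tokenizer. It is NOT the Nutch
     * class: it splits on whitespace only and returns Strings, not Tokens.
     */
    static final class WhitespaceTokenizer {
        private final Reader input;

        WhitespaceTokenizer(Reader reader) {
            this.input = reader;
        }

        /** Returns the next whitespace-delimited token, or null at EOF. */
        String next() throws IOException {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) {
                        return sb.toString(); // end of the current token
                    }
                    // otherwise: still skipping leading whitespace
                } else {
                    sb.append((char) c);
                }
            }
            // EOF: emit a trailing token if one was in progress, else null
            return sb.length() > 0 ? sb.toString() : null;
        }

        void close() throws IOException {
            input.close();
        }
    }

    /** Drains the tokenizer with the "loop until next() returns null" idiom. */
    static List<String> collect(Reader reader) throws IOException {
        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer(reader);
        List<String> tokens = new ArrayList<>();
        try {
            for (String t = tokenizer.next(); t != null; t = tokenizer.next()) {
                tokens.add(t);
            }
        } finally {
            tokenizer.close(); // closes the underlying Reader
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // prints [Hello, Nutch, world]
        System.out.println(collect(new StringReader("Hello Nutch world")));
    }
}
```

With Nutch and its bundled Lucene on the classpath, the same loop would presumably be written against `new NutchDocumentTokenizer(reader)`, with each returned `Token` inspected through accessors such as `termText()`. The token boundaries would then follow the Nutch grammar rather than plain whitespace.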

Methods inherited from class org.apache.lucene.analysis.Tokenizer

close

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail