org.apache.nutch.analysis
Class NutchAnalysis

java.lang.Object
  extended by org.apache.nutch.analysis.NutchAnalysis
All Implemented Interfaces:
NutchAnalysisConstants

public class NutchAnalysis
extends Object
implements NutchAnalysisConstants

The JavaCC-generated Nutch lexical analyzer and query parser.


Field Summary
 org.apache.nutch.analysis.Token jj_nt
           
 boolean lookingAhead
           
 org.apache.nutch.analysis.Token token
           
 NutchAnalysisTokenManager token_source
           
 
Fields inherited from interface org.apache.nutch.analysis.NutchAnalysisConstants
ACRONYM, APOSTROPHE, ATSIGN, C_PLUS_PLUS, C_SHARP, CJK, COLON, DEFAULT, DIGIT, DOT, EOF, IRREGULAR_WORD, LETTER, MINUS, PLUS, QUOTE, SIGRAM, SLASH, tokenImage, WHITE, WORD, WORD_PUNCT
 
Constructor Summary
NutchAnalysis(org.apache.nutch.analysis.CharStream stream)
           
NutchAnalysis(NutchAnalysisTokenManager tm)
           
 
Method Summary
 ArrayList compound(String field)
          Parse a compound term that is interpreted as an implicit phrase query.
 void disable_tracing()
           
 void enable_tracing()
           
 org.apache.nutch.analysis.ParseException generateParseException()
           
 org.apache.nutch.analysis.Token getNextToken()
           
 org.apache.nutch.analysis.Token getToken(int index)
           
 void infix()
          Parse characters that can be used to form compound terms.
static boolean isStopWord(String word)
          True iff word is a stop word.
static void main(String[] args)
          For debugging.
 void nonOpInfix()
          Parse infix characters except plus and minus.
 void nonOpOrTerm()
          Parse anything but a term or an operator (plus, minus, or quote).
 void nonTerm()
          Parse anything but a term or a quote.
 void nonTermOrEOF()
           
 Query parse()
          Parse a query.
static Query parseQuery(String queryString)
          Parse the text of a query string into a Query.
 ArrayList phrase(String field)
          Parse an explicitly quoted phrase query.
 void ReInit(org.apache.nutch.analysis.CharStream stream)
           
 void ReInit(NutchAnalysisTokenManager tm)
           
 String term()
          Parse a single term.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

token_source

public NutchAnalysisTokenManager token_source

token

public org.apache.nutch.analysis.Token token

jj_nt

public org.apache.nutch.analysis.Token jj_nt

lookingAhead

public boolean lookingAhead

Constructor Detail

NutchAnalysis

public NutchAnalysis(org.apache.nutch.analysis.CharStream stream)

NutchAnalysis

public NutchAnalysis(NutchAnalysisTokenManager tm)

Method Detail

isStopWord

public static boolean isStopWord(String word)
True iff word is a stop word. Stop words are only removed from queries. Every word is indexed.
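A minimal sketch of query-time stop-word removal, assuming a small hypothetical stop-word set (this page does not say which words Nutch treats as stop words). The class QueryStopWords and its filterQueryTerms helper are illustrative names, not part of Nutch; they only show that filtering is applied to query terms, while indexing would keep every word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: stop words are dropped from queries, never from the index. */
final class QueryStopWords {
  // Placeholder set for illustration only; NOT Nutch's actual stop-word list.
  private static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

  static boolean isStopWord(String word) {
    return STOP_WORDS.contains(word);
  }

  /** Removes stop words from the terms of a query; an indexer would keep them all. */
  static List<String> filterQueryTerms(List<String> terms) {
    List<String> kept = new ArrayList<>();
    for (String term : terms) {
      if (!isStopWord(term)) {
        kept.add(term);
      }
    }
    return kept;
  }
}
```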


parseQuery

public static Query parseQuery(String queryString)
                        throws IOException
Parse the text of a query string into a Query.

Throws:
IOException

main

public static void main(String[] args)
                 throws Exception
For debugging.

Throws:
Exception

parse

public final Query parse()
                  throws org.apache.nutch.analysis.ParseException
Parse a query.

Throws:
org.apache.nutch.analysis.ParseException

phrase

public final ArrayList phrase(String field)
                       throws org.apache.nutch.analysis.ParseException
Parse an explicitly quoted phrase query. Note that this may return a single term, a trivial phrase.

Throws:
org.apache.nutch.analysis.ParseException
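The phrase rule can be sketched on raw characters as follows. PhraseSketch is hypothetical and assumes that an unterminated quote runs to the end of the input; the real parser consumes tokens from NutchAnalysisTokenManager rather than scanning characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: collect the terms of an explicitly quoted phrase. */
final class PhraseSketch {
  /** Returns the terms between the quote at 'start' and the next quote (or end of input). */
  static List<String> phrase(String query, int start) {
    if (query.charAt(start) != '"') {
      throw new IllegalArgumentException("a phrase must start with a quote");
    }
    int end = query.indexOf('"', start + 1);
    // Assumption: a missing closing quote means the phrase extends to the end of the input.
    String body = (end < 0) ? query.substring(start + 1) : query.substring(start + 1, end);
    List<String> terms = new ArrayList<>();
    for (String t : body.trim().split("\\s+")) {
      if (!t.isEmpty()) {
        terms.add(t);
      }
    }
    return terms; // a one-element list is the "trivial phrase"
  }
}
```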

compound

public final ArrayList compound(String field)
                         throws org.apache.nutch.analysis.ParseException
Parse a compound term that is interpreted as an implicit phrase query. Compounds are a sequence of terms separated by infix characters. Note that this may return a single term, a trivial compound.

Throws:
org.apache.nutch.analysis.ParseException
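A character-level sketch of compound splitting, using a hypothetical infix set (. / : @ ' + -) chosen for illustration; the grammar's actual infix() alternatives are not listed on this page. CompoundSplitter is an illustrative name, and the returned list plays the role of the implicit phrase's terms.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split a compound term into the terms of an implicit phrase. */
final class CompoundSplitter {
  // Assumed infix characters, for illustration only.
  private static final String INFIX = "./:@'+-";

  static List<String> compound(String text) {
    List<String> terms = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (char ch : text.toCharArray()) {
      if (INFIX.indexOf(ch) >= 0) {
        // An infix character ends the term collected so far.
        if (current.length() > 0) {
          terms.add(current.toString());
          current.setLength(0);
        }
      } else {
        current.append(ch);
      }
    }
    if (current.length() > 0) {
      terms.add(current.toString());
    }
    return terms; // a one-element list is the "trivial compound"
  }
}
```

For example, www.apache.org yields the three-term phrase www apache org, while nutch yields a trivial one-term compound.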

term

public final String term()
                  throws org.apache.nutch.analysis.ParseException
Parse a single term.

Throws:
org.apache.nutch.analysis.ParseException

nonTerm

public final void nonTerm()
                   throws org.apache.nutch.analysis.ParseException
Parse anything but a term or a quote.

Throws:
org.apache.nutch.analysis.ParseException

nonTermOrEOF

public final void nonTermOrEOF()
                        throws org.apache.nutch.analysis.ParseException
Throws:
org.apache.nutch.analysis.ParseException

nonOpOrTerm

public final void nonOpOrTerm()
                       throws org.apache.nutch.analysis.ParseException
Parse anything but a term or an operator (plus, minus, or quote).

Throws:
org.apache.nutch.analysis.ParseException
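To illustrate how a leading plus or minus marks a clause, here is a hypothetical whitespace-based classifier. ClauseSketch only records which operator, if any, precedes each clause; it makes no claim about what Nutch's Query does with that information, and it ignores quoted phrases for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: classify whitespace-separated query clauses by leading operator. */
final class ClauseSketch {
  enum Kind { PLUS, MINUS, NONE }

  record Clause(Kind kind, String text) {}

  static List<Clause> clauses(String query) {
    List<Clause> out = new ArrayList<>();
    for (String raw : query.trim().split("\\s+")) {
      if (raw.isEmpty()) {
        continue;
      }
      char op = raw.charAt(0);
      // A lone "+" or "-" is kept as plain text rather than treated as an operator.
      if (op == '+' && raw.length() > 1) {
        out.add(new Clause(Kind.PLUS, raw.substring(1)));
      } else if (op == '-' && raw.length() > 1) {
        out.add(new Clause(Kind.MINUS, raw.substring(1)));
      } else {
        out.add(new Clause(Kind.NONE, raw));
      }
    }
    return out;
  }
}
```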

infix

public final void infix()
                 throws org.apache.nutch.analysis.ParseException
Parse characters that can be used to form compound terms.

Throws:
org.apache.nutch.analysis.ParseException

nonOpInfix

public final void nonOpInfix()
                      throws org.apache.nutch.analysis.ParseException
Parse infix characters except plus and minus.

Throws:
org.apache.nutch.analysis.ParseException
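The relationship between infix() and nonOpInfix() can be sketched as a pair of character predicates. The infix set in InfixSketch is an illustrative assumption; the point is only that plus and minus are excluded because they can also act as query operators.

```java
/** Hypothetical sketch: infix characters with and without the plus/minus operators. */
final class InfixSketch {
  // Assumed infix set for illustration; not copied from the Nutch grammar.
  private static final String INFIX = "./:@'+-";

  static boolean isInfix(char c) {
    return INFIX.indexOf(c) >= 0;
  }

  /** Like isInfix, but excludes '+' and '-', which may also be query operators. */
  static boolean isNonOpInfix(char c) {
    return isInfix(c) && c != '+' && c != '-';
  }
}
```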

ReInit

public void ReInit(org.apache.nutch.analysis.CharStream stream)

ReInit

public void ReInit(NutchAnalysisTokenManager tm)

getNextToken

public final org.apache.nutch.analysis.Token getNextToken()

getToken

public final org.apache.nutch.analysis.Token getToken(int index)
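JavaCC-generated parsers conventionally treat getToken(0) as the current (most recently consumed) token and getToken(n) as a peek n tokens ahead that consumes nothing, while getNextToken() advances. The hypothetical TokenCursor below mimics that contract over a fixed list of strings; null stands in for the EOF token a real token manager would return.

```java
import java.util.List;

/** Hypothetical stand-in mimicking JavaCC lookahead conventions over a fixed token list. */
final class TokenCursor {
  private final List<String> tokens;
  private int pos = -1; // index of the current token; -1 before anything is consumed

  TokenCursor(List<String> tokens) {
    this.tokens = tokens;
  }

  /** Consumes and returns the next token, or null at end of input. */
  String getNextToken() {
    if (pos + 1 >= tokens.size()) {
      return null;
    }
    return tokens.get(++pos);
  }

  /** Peeks without consuming: 0 is the current token, 1 the next, and so on. */
  String getToken(int index) {
    int i = pos + index;
    return (i >= 0 && i < tokens.size()) ? tokens.get(i) : null;
  }
}
```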

generateParseException

public org.apache.nutch.analysis.ParseException generateParseException()

enable_tracing

public final void enable_tracing()

disable_tracing

public final void disable_tracing()


Copyright © 2006 The Apache Software Foundation