Package org.apache.nutch.analysis.lang

Text document language identifier.


Class Summary
HTMLLanguageParser      An HtmlParseFilter that looks for possible indications of content language.
LanguageIdentifier      Identifies the language of content, based on statistical analysis.
LanguageIndexingFilter  An IndexingFilter that adds a lang (language) field to the document.
LanguageQueryFilter     A QueryFilter that handles "lang:" query clauses.
NGramProfile            This class represents an n-gram profile.
 

Package org.apache.nutch.analysis.lang Description


Language profiles are based on material from the Europarl corpus (http://www.isi.edu/~koehn/europarl/).
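
To illustrate the general idea behind identifying a language from n-gram profiles, here is a minimal, self-contained sketch. It is not the code of LanguageIdentifier or NGramProfile: the class name NGramSketch, its methods, the cosine-similarity scoring, and the tiny inline training sentences are all assumptions made for illustration. Real profiles would be built from a much larger corpus.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of n-gram based language identification (not the Nutch API). */
public class NGramSketch {

    /** Counts character n-grams of lengths minN..maxN in the lower-cased text. */
    public static Map<String, Integer> profile(String text, int minN, int maxN) {
        Map<String, Integer> counts = new HashMap<>();
        // Collapse runs of non-letters into '_' and pad both ends,
        // so that word boundaries contribute to the n-grams.
        String s = "_" + text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}]+", "_") + "_";
        for (int n = minN; n <= maxN; n++) {
            for (int i = 0; i + n <= s.length(); i++) {
                counts.merge(s.substring(i, i + n), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity between two n-gram count vectors, in the range [0, 1]. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * (double) e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * (double) other;
            }
        }
        for (int v : b.values()) {
            normB += v * (double) v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    /** Returns the key of the reference profile most similar to the text, or null if there are none. */
    public static String identify(String text, Map<String, Map<String, Integer>> refs) {
        Map<String, Integer> p = profile(text, 1, 3);
        String best = null;
        double bestScore = -1;
        for (Map.Entry<String, Map<String, Integer>> e : refs.entrySet()) {
            double score = cosine(p, e.getValue());
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy reference profiles; the training sentences are made up for this sketch.
        Map<String, Map<String, Integer>> refs = new LinkedHashMap<>();
        refs.put("en", profile("the quick brown fox jumps over the lazy dog and the cat", 1, 3));
        refs.put("de", profile("der schnelle braune fuchs springt über den faulen hund und die katze", 1, 3));
        System.out.println(identify("the dog and the fox", refs));
    }
}
```

Cosine similarity is used here only because it is short to write down. Other scoring schemes, such as comparing the rank order of the most frequent n-grams, fit the same profile structure.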



Copyright © 2006 The Apache Software Foundation