Packages that use Parse | |
org.apache.nutch.analysis.lang | Text document language identifier. |
org.apache.nutch.indexer | Maintain Lucene full-text indexes. |
org.apache.nutch.indexer.basic | A basic indexing plugin. |
org.apache.nutch.indexer.more | An indexing plugin that indexes additional fields. |
org.apache.nutch.parse | |
org.apache.nutch.parse.html | An HTML document parsing plugin. |
org.apache.nutch.parse.js | |
org.apache.nutch.parse.msword | A Word document parsing plugin. |
org.apache.nutch.parse.pdf | A pdf parsing plugin. |
org.apache.nutch.parse.text | A plain text parsing plugin. |
org.creativecommons.nutch | Sample plugins that parse and index Creative Commons metadata. |
Uses of Parse in org.apache.nutch.analysis.lang |
Methods in org.apache.nutch.analysis.lang that return Parse | |
Parse |
HTMLLanguageParser.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Scan the HTML document looking at possible indications of content language. |
Methods in org.apache.nutch.analysis.lang with parameters of type Parse | |
Parse |
HTMLLanguageParser.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Scan the HTML document looking at possible indications of content language. |
Document |
LanguageIndexingFilter.filter(Document doc,
Parse parse,
FetcherOutput fo)
|
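HTMLLanguageParser.filter receives the page's DOM tree (a DocumentFragment) and scans it for indications of the content language. The sketch below is a simplified, hypothetical illustration of that kind of scan using only the standard org.w3c.dom API. It is not Nutch's implementation: the class name LanguageHintScanner is invented, and the two hints it checks (a `lang` attribute on `<html>` and a `<meta http-equiv="Content-Language">` tag) are assumptions chosen for the example.

```java
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not Nutch's HTMLLanguageParser. The hints checked
// here are assumptions made for illustration.
class LanguageHintScanner {

    /** Returns the first language hint found in document order, or null. */
    static String detectLanguage(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            String name = el.getNodeName();

            // Hint 1 (assumed): <html lang="...">
            if (name.equalsIgnoreCase("html")) {
                String lang = el.getAttribute("lang").trim();
                if (!lang.isEmpty()) {
                    return lang;
                }
            }

            // Hint 2 (assumed): <meta http-equiv="Content-Language" content="...">
            if (name.equalsIgnoreCase("meta")
                    && el.getAttribute("http-equiv").equalsIgnoreCase("content-language")) {
                String content = el.getAttribute("content").trim();
                if (!content.isEmpty()) {
                    return content;
                }
            }
        }

        // No hint on this node: search the children depth-first.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            String found = detectLanguage(children.item(i));
            if (found != null) {
                return found;
            }
        }
        return null;
    }
}
```

Because the method accepts any Node, it works the same way on a full Document or on a DocumentFragment like the one passed to filter. Element.getAttribute returns an empty string for a missing attribute, which is why the checks test for emptiness rather than null.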
Uses of Parse in org.apache.nutch.indexer |
Methods in org.apache.nutch.indexer with parameters of type Parse | |
Document |
IndexingFilter.filter(Document doc,
Parse parse,
FetcherOutput fo)
Adds fields or otherwise modifies the document that will be indexed for a parse. |
static Document |
IndexingFilters.filter(Document doc,
Parse parse,
FetcherOutput fo)
Run all defined filters. |
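IndexingFilters.filter runs all defined filters, so each IndexingFilter sees the document as the previous filter left it. A minimal sketch of that chaining, assuming simplified stand-in types: SimpleDoc, SimpleParse, SimpleIndexingFilter and IndexingFilterChain are hypothetical names, not Nutch classes, and the FetcherOutput argument is omitted to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Document being built for the index.
class SimpleDoc {
    final Map<String, String> fields = new LinkedHashMap<>();
}

// Hypothetical stand-in for a Parse: only the extracted title and text.
class SimpleParse {
    final String title;
    final String text;

    SimpleParse(String title, String text) {
        this.title = title;
        this.text = text;
    }
}

// Hypothetical stand-in for IndexingFilter (FetcherOutput omitted).
interface SimpleIndexingFilter {
    SimpleDoc filter(SimpleDoc doc, SimpleParse parse);
}

// Runs every registered filter in order, handing each one the document
// returned by the previous filter.
class IndexingFilterChain {
    private final List<SimpleIndexingFilter> filters = new ArrayList<>();

    IndexingFilterChain add(SimpleIndexingFilter filter) {
        filters.add(filter);
        return this;
    }

    SimpleDoc run(SimpleDoc doc, SimpleParse parse) {
        for (SimpleIndexingFilter filter : filters) {
            doc = filter.filter(doc, parse);
        }
        return doc;
    }
}
```

A basic filter could be registered as `chain.add((doc, p) -> { doc.fields.put("title", p.title); return doc; })`. Threading the returned document through the loop, rather than mutating one shared instance, lets a filter substitute a different document if it needs to.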
Uses of Parse in org.apache.nutch.indexer.basic |
Methods in org.apache.nutch.indexer.basic with parameters of type Parse | |
Document |
BasicIndexingFilter.filter(Document doc,
Parse parse,
FetcherOutput fo)
|
Uses of Parse in org.apache.nutch.indexer.more |
Methods in org.apache.nutch.indexer.more with parameters of type Parse | |
Document |
MoreIndexingFilter.filter(Document doc,
Parse parse,
FetcherOutput fo)
|
Uses of Parse in org.apache.nutch.parse |
Classes in org.apache.nutch.parse that implement Parse | |
class |
ParseImpl
The result of parsing a page's raw content. |
Methods in org.apache.nutch.parse that return Parse | |
Parse |
HtmlParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page. |
Parse |
ParseStatus.getEmptyParse()
A convenience method that returns an empty Parse carrying this status. |
Parse |
Parser.getParse(Content c)
Creates the parse for some content. |
static Parse |
HtmlParseFilters.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Run all defined filters. |
Methods in org.apache.nutch.parse with parameters of type Parse | |
Parse |
HtmlParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page. |
static Parse |
HtmlParseFilters.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Run all defined filters. |
Uses of Parse in org.apache.nutch.parse.html |
Methods in org.apache.nutch.parse.html that return Parse | |
Parse |
HtmlParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.js |
Methods in org.apache.nutch.parse.js that return Parse | |
Parse |
JSParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
Parse |
JSParseFilter.getParse(Content c)
|
Methods in org.apache.nutch.parse.js with parameters of type Parse | |
Parse |
JSParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
|
Uses of Parse in org.apache.nutch.parse.msword |
Methods in org.apache.nutch.parse.msword that return Parse | |
Parse |
MSWordParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.pdf |
Methods in org.apache.nutch.parse.pdf that return Parse | |
Parse |
PdfParser.getParse(Content content)
|
Uses of Parse in org.apache.nutch.parse.text |
Methods in org.apache.nutch.parse.text that return Parse | |
Parse |
TextParser.getParse(Content content)
|
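The HtmlParser, MSWordParser, PdfParser and TextParser plugins above each implement Parser.getParse(Content), and Nutch chooses among them according to the content's MIME type. The sketch below shows one way such a type-based lookup could work. SketchParser and ParserRegistry are hypothetical stand-ins (a parser here simply turns bytes into text), and Nutch's own plugin-selection logic is not reproduced.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for Parser: raw content bytes in, extracted text out.
interface SketchParser {
    String getText(byte[] content);
}

// Maps a MIME type such as "text/html" to the parser registered for it.
class ParserRegistry {
    private final Map<String, SketchParser> byType = new HashMap<>();

    void register(String contentType, SketchParser parser) {
        byType.put(normalize(contentType), parser);
    }

    SketchParser lookup(String contentType) {
        SketchParser parser = byType.get(normalize(contentType));
        if (parser == null) {
            throw new IllegalArgumentException("no parser registered for " + contentType);
        }
        return parser;
    }

    // Drops parameters such as "; charset=UTF-8" and lower-cases the type.
    private static String normalize(String contentType) {
        return contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
    }
}
```

Normalizing the key lets a header value like `Text/Plain; charset=UTF-8` reach the parser registered under `text/plain`. An unknown type fails loudly instead of silently falling back to a default parser.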
Uses of Parse in org.creativecommons.nutch |
Methods in org.creativecommons.nutch that return Parse | |
Parse |
CCParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page. |
Methods in org.creativecommons.nutch with parameters of type Parse | |
Document |
CCIndexingFilter.filter(Document doc,
Parse parse,
FetcherOutput fo)
|
Parse |
CCParseFilter.filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page. |
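CCParseFilter extracts Creative Commons metadata from a page's DOM tree. Creative Commons license markup conventionally links to the license with `rel="license"`, so one plausible first step is collecting those links. The hypothetical LicenseLinkFinder below sketches that step with only org.w3c.dom. It is an assumption about part of the work, not the plugin's actual code.

```java
import java.util.ArrayList;
import java.util.List;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: collects license URLs marked with rel="license".
class LicenseLinkFinder {

    /** Returns href values of <a>/<link> elements whose rel includes "license". */
    static List<String> findLicenseUrls(Node root) {
        List<String> urls = new ArrayList<>();
        collect(root, urls);
        return urls;
    }

    private static void collect(Node node, List<String> urls) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            String name = el.getNodeName();
            if (name.equalsIgnoreCase("a") || name.equalsIgnoreCase("link")) {
                // rel holds space-separated tokens, e.g. rel="nofollow license".
                for (String token : el.getAttribute("rel").trim().split("\\s+")) {
                    if (token.equalsIgnoreCase("license")) {
                        String href = el.getAttribute("href").trim();
                        if (!href.isEmpty()) {
                            urls.add(href);
                        }
                        break;
                    }
                }
            }
        }
        // Visit children in document order so results keep page order.
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), urls);
        }
    }
}
```

Treating rel as a token list rather than comparing the whole attribute to `"license"` also catches links that carry several relations at once.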