org.apache.nutch.parse.js
Class JSParseFilter
java.lang.Object
org.apache.nutch.parse.js.JSParseFilter
- All Implemented Interfaces:
- HtmlParseFilter, Parser
- public class JSParseFilter
- extends Object
- implements HtmlParseFilter, Parser
This class is a heuristic link extractor for JavaScript files and
code snippets. The general idea of two-pass regex matching comes from
Heritrix. Parts of the code come from OutlinkExtractor.java
by Stephan Strittmatter.
- Author:
- Andrzej Bialecki
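The two-pass regex matching mentioned in the class description can be illustrated with a minimal, standalone sketch. This is not the code of JSParseFilter itself. It assumes that the first pass collects quoted string literals from the script and that the second pass keeps only the literals that look like URLs. The class name TwoPassLinkSketch, the method extractCandidates, and both patterns are hypothetical and chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of two-pass regex link extraction.
// It does not use or reproduce any Nutch or Heritrix class.
public class TwoPassLinkSketch {

    // Pass 1 (assumed pattern): single- or double-quoted string literals,
    // allowing backslash-escaped characters inside the quotes.
    private static final Pattern STRING_LITERAL = Pattern.compile(
            "\"((?:[^\"\\\\]|\\\\.)*)\"|'((?:[^'\\\\]|\\\\.)*)'");

    // Pass 2 (assumed pattern): an absolute http(s) URL, or a relative path
    // that ends in one of a few common web file extensions.
    private static final Pattern URL_LIKE = Pattern.compile(
            "https?://\\S+|/?[\\w.\\-]+(?:/[\\w.\\-]+)*\\.(?:html?|js|php|jsp)",
            Pattern.CASE_INSENSITIVE);

    // Returns the string literals in the script that look like links,
    // in the order in which they appear.
    public static List<String> extractCandidates(String script) {
        List<String> links = new ArrayList<>();
        Matcher literals = STRING_LITERAL.matcher(script);
        while (literals.find()) {
            // Exactly one of the two groups is set, depending on the quote style.
            String literal = literals.group(1) != null
                    ? literals.group(1)
                    : literals.group(2);
            // The whole literal must match the URL-like pattern.
            if (URL_LIKE.matcher(literal).matches()) {
                links.add(literal);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String script = "var a = 'http://example.com/a.js';\n"
                + "var msg = \"hello world\";\n"
                + "window.location = \"/docs/index.html\";";
        for (String link : extractCandidates(script)) {
            System.out.println(link);
        }
    }
}
```

Because the approach is heuristic, a sketch like this will miss URLs that are assembled at run time and may report string literals that merely resemble paths. Resolving relative candidates against the base URL of the page is left out.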
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final Logger LOG
JSParseFilter
public JSParseFilter()
filter
public Parse filter(Content content,
Parse parse,
HTMLMetaTags metaTags,
DocumentFragment doc)
- Description copied from interface:
HtmlParseFilter
- Adds metadata or otherwise modifies a parse of HTML content, given
the DOM tree of a page.
- Specified by:
filter
in interface HtmlParseFilter
getParse
public Parse getParse(Content c)
- Description copied from interface:
Parser
- Creates the parse for some content.
- Specified by:
getParse
in interface Parser
main
public static void main(String[] args)
throws Exception
- Throws:
Exception
Copyright © 2006 The Apache Software Foundation