org.apache.nutch.net
Class RegexURLFilter
java.lang.Object
org.apache.nutch.net.RegexURLFilter
All Implemented Interfaces:
URLFilter

public class RegexURLFilter
extends Object
implements URLFilter
Filters URLs based on a file of regular expressions. The file is named by either
(1) the property "urlfilter.regex.file" in ./conf/nutch-default.xml, or
(2) the attribute "file" in plugin.xml of this plugin.
The attribute "file" takes precedence if it is defined.
The format of this file is:
[+-]<regex>
where a plus means go ahead and index the URL and a minus means do not.
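
The rule format described above, a leading plus or minus followed by a regular expression, can be illustrated with a small standalone sketch. This is not the Nutch implementation: the class name RegexRulesSketch is hypothetical, java.util.regex is swapped in for the ORO patterns the real class uses, and three behaviours are assumptions made for the example rather than facts taken from this page: lines starting with '#' are treated as comments, the first rule that matches anywhere in the URL decides the outcome, and a rejected URL is reported as null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in, not the Nutch class: parses "[+-]<regex>" lines and applies them. */
public class RegexRulesSketch {

    /** One parsed line: the sign and the compiled pattern. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Builds the rule list from the text of a rules file, one rule per line. */
    public RegexRulesSketch(String rulesText) {
        for (String line : rulesText.split("\\R")) {
            if (line.isEmpty()) {
                continue;
            }
            char sign = line.charAt(0);
            // Assumption: comment lines ('#') and lines starting with whitespace are skipped.
            if (sign == '#' || Character.isWhitespace(sign)) {
                continue;
            }
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns the URL if accepted, or null if rejected (assumed convention). */
    public String filter(String url) {
        for (Rule rule : rules) {
            // Assumption: the first rule whose pattern occurs in the URL wins.
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        // Assumption: a URL that matches no rule is rejected.
        return null;
    }

    public static void main(String[] args) {
        String rulesText = String.join("\n",
            "# skip common image suffixes",
            "-\\.(gif|jpg|png)$",
            "+^http://([a-z0-9]*\\.)*example\\.org/",
            "-.");
        RegexRulesSketch filter = new RegexRulesSketch(rulesText);
        System.out.println(filter.filter("http://www.example.org/index.html"));
        System.out.println(filter.filter("http://www.example.org/logo.png"));
        System.out.println(filter.filter("http://other.example.com/"));
    }
}
```

In this sketch, rule order matters: the image rule is listed before the host rule so that an image on an otherwise accepted host is still rejected, and the final "-." line makes the rejection of everything else explicit.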
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
RegexURLFilter
public RegexURLFilter()
throws IOException,
org.apache.oro.text.regex.MalformedPatternException
RegexURLFilter
public RegexURLFilter(String filename)
throws IOException,
org.apache.oro.text.regex.MalformedPatternException
filter
public String filter(String url)
Specified by:
filter in interface URLFilter
main
public static void main(String[] args)
throws IOException,
org.apache.oro.text.regex.MalformedPatternException
Throws:
IOException
org.apache.oro.text.regex.MalformedPatternException
Copyright © 2006 The Apache Software Foundation