org.apache.nutch.net
Class RegexURLFilter

java.lang.Object
  extended by org.apache.nutch.net.RegexURLFilter
All Implemented Interfaces:
URLFilter

public class RegexURLFilter
extends Object
implements URLFilter

Filters URLs based on a file of regular expressions. The file is named by (1) the property "urlfilter.regex.file" in ./conf/nutch-default.xml and (2) the attribute "file" in this plugin's plugin.xml. The "file" attribute takes precedence if it is defined.
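
The precedence between the two sources can be sketched as a small lookup. This is a minimal sketch, assuming the plugin.xml attribute and the configuration are already available as a String and a java.util.Properties. RulesFileResolver and resolve are hypothetical names, and treating a blank attribute as undefined is also an assumption. The plugin itself presumably obtains these values through Nutch's own plugin and configuration classes.

```java
import java.util.Properties;

// Hypothetical helper illustrating the documented precedence; not a Nutch class.
final class RulesFileResolver {
    private RulesFileResolver() {}

    /**
     * Returns the name of the regex rules file.
     *
     * @param pluginFileAttr value of the "file" attribute from plugin.xml, or null if absent
     * @param conf           configuration properties (e.g. loaded from nutch-default.xml)
     * @return the attribute if defined, otherwise the "urlfilter.regex.file" property,
     *         or null if neither is set
     */
    static String resolve(String pluginFileAttr, Properties conf) {
        // (2) The plugin.xml attribute wins when it is defined.
        // Assumption: an empty or all-blank attribute counts as "not defined".
        if (pluginFileAttr != null && !pluginFileAttr.trim().isEmpty()) {
            return pluginFileAttr.trim();
        }
        // (1) Otherwise fall back to the configuration property.
        return conf.getProperty("urlfilter.regex.file");
    }
}
```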

The format of this file is one rule per line:

 [+-]<regex>

where a leading plus means go ahead and index URLs matching the regex, and a leading minus means do not.
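
A rule file in this format can be parsed and applied roughly as follows. This sketch uses java.util.regex rather than the Jakarta ORO library that RegexURLFilter depends on, so PatternSyntaxException stands in for MalformedPatternException. It also assumes, without confirmation from this page, that rules are tried in file order with the first match deciding, that a pattern may match anywhere in the URL, that blank lines and lines starting with # are ignored, and that a URL matching no rule is rejected. RegexRuleList is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch of a "[+-]<regex>" rule list using java.util.regex instead of Jakarta ORO.
final class RegexRuleList {
    private static final class Rule {
        final boolean accept;    // true for '+', false for '-'
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    RegexRuleList(Reader source) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            // Assumption: blank lines and '#' lines are comments.
            if (line.isEmpty() || line.charAt(0) == '#') {
                continue;
            }
            char sign = line.charAt(0);
            // Assumption: a line without a leading '+' or '-' is an error.
            if (sign != '+' && sign != '-') {
                throw new IOException("Rule must start with '+' or '-': " + line);
            }
            // Pattern.compile throws PatternSyntaxException for a malformed regex,
            // playing the role of ORO's MalformedPatternException here.
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns url if the first matching rule is '+', otherwise null. */
    String filter(String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).find()) {   // match anywhere in the URL
                return r.accept ? url : null;
            }
        }
        return null;   // assumption: URLs matching no rule are rejected
    }
}
```

Because the first matching rule decides, order matters: with -\.(gif|jpg|png)$ placed before +^http://([a-z0-9]*\.)*apache\.org/, image URLs on apache.org are rejected before the accepting rule is consulted.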


Field Summary
 
Fields inherited from interface org.apache.nutch.net.URLFilter
X_POINT_ID
 
Constructor Summary
RegexURLFilter()
           
RegexURLFilter(String filename)
           
 
Method Summary
 String filter(String url)
           
static void main(String[] args)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RegexURLFilter

public RegexURLFilter()
               throws IOException,
                      org.apache.oro.text.regex.MalformedPatternException

RegexURLFilter

public RegexURLFilter(String filename)
               throws IOException,
                      org.apache.oro.text.regex.MalformedPatternException

Method Detail

filter

public String filter(String url)
Specified by:
filter in interface URLFilter
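
This method's contract comes from URLFilter. Assuming the usual reading of that interface, where a filter returns the URL (possibly rewritten) to keep it and null to drop it, several filters can be combined as shown. UrlFilterLike and FilterChain are local, hypothetical stand-ins, not org.apache.nutch.net.URLFilter.

```java
import java.util.Arrays;
import java.util.List;

// Local stand-in mirroring the single documented method; not the Nutch interface.
interface UrlFilterLike {
    /** Returns the URL to keep it (possibly modified), or null to reject it. */
    String filter(String url);
}

// Hypothetical chain: applies filters in order and stops at the first rejection.
final class FilterChain {
    private final List<UrlFilterLike> filters;

    FilterChain(UrlFilterLike... filters) {
        this.filters = Arrays.asList(filters);
    }

    String filter(String url) {
        String current = url;
        for (UrlFilterLike f : filters) {
            current = f.filter(current);
            if (current == null) {
                return null;   // rejected: later filters are not consulted
            }
        }
        return current;
    }
}
```

Stopping at the first null means later filters never receive a URL that an earlier one has already rejected.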

main

public static void main(String[] args)
                 throws IOException,
                        org.apache.oro.text.regex.MalformedPatternException
Throws:
IOException
org.apache.oro.text.regex.MalformedPatternException
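
This page does not document what main does. A common role for a main method on a filter class is a quick command-line check. The sketch below assumes that role: it reads one URL per line and echoes each one prefixed with + if the filter keeps it or - if it is rejected. UrlFilterTester and the http-only stand-in filter are hypothetical; a RegexURLFilter's filter method could be passed in place of the stand-in, for example as a method reference.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

// Hypothetical command-line tester; not the documented main(String[]).
final class UrlFilterTester {
    private UrlFilterTester() {}

    /** Reads one URL per line; writes "+url" if the filter keeps it, "-url" otherwise. */
    static void run(UnaryOperator<String> filter, Reader input, Writer output) throws IOException {
        BufferedReader in = new BufferedReader(input);
        PrintWriter out = new PrintWriter(output);
        String line;
        while ((line = in.readLine()) != null) {
            String result = filter.apply(line);
            if (result != null) {
                out.println("+" + result);
            } else {
                out.println("-" + line);
            }
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in filter for illustration: keep only http:// URLs.
        UnaryOperator<String> httpOnly = u -> u.startsWith("http://") ? u : null;
        run(httpOnly,
            new InputStreamReader(System.in, StandardCharsets.UTF_8),
            new PrintWriter(System.out));
    }
}
```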


Copyright © 2006 The Apache Software Foundation