org.apache.nutch.protocol.http
Class RobotRulesParser

java.lang.Object
  extended by org.apache.nutch.protocol.http.RobotRulesParser

public class RobotRulesParser
extends Object

This class handles the parsing of robots.txt files. It emits RobotRules objects, which describe the download permissions specified in a robots.txt file.

Author:
Tom Pierce, Mike Cafarella, Doug Cutting

Nested Class Summary
static class RobotRulesParser.RobotRuleSet
          This class holds the rules which were parsed from a robots.txt file, and can test paths against those rules.
 
Field Summary
static Logger LOG
           
 
Constructor Summary
RobotRulesParser()
           
RobotRulesParser(String[] robotNames)
          Creates a new RobotRulesParser which will use the supplied robotNames when choosing which stanza to follow in robots.txt files.
 
Method Summary
static boolean isAllowed(URL url)
           
static void main(String[] argv)
          Command-line entry point for testing.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final Logger LOG
Constructor Detail

RobotRulesParser

public RobotRulesParser()

RobotRulesParser

public RobotRulesParser(String[] robotNames)
Creates a new RobotRulesParser which will use the supplied robotNames when choosing which stanza to follow in robots.txt files. Any name in the array may be matched. The order of the robotNames determines the precedence: if several names are matched, only the rules associated with the robot name having the smallest index will be used.
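
The precedence rule above can be illustrated with a small standalone sketch. It is not the Nutch implementation: the class RobotNamePrecedence and the method bestMatchIndex are hypothetical names introduced here, and case-insensitive comparison of agent names is an assumption. The sketch only shows how "smallest index wins" could be resolved when several of the supplied robotNames appear in a robots.txt file.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of "smallest index wins"; not part of Nutch.
public class RobotNamePrecedence {

    /**
     * Returns the index in robotNames of the highest-precedence name that
     * also appears among the agent names found in a robots.txt file,
     * or -1 if none of the names match.
     * Assumption: agent names are compared case-insensitively.
     */
    static int bestMatchIndex(String[] robotNames, List<String> agentsInFile) {
        // Walk robotNames in order, so the first hit has the smallest index.
        for (int i = 0; i < robotNames.length; i++) {
            for (String agent : agentsInFile) {
                if (robotNames[i].equalsIgnoreCase(agent.trim())) {
                    return i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Precedence order: "MyCrawler" first, then "Nutch", then the wildcard.
        String[] robotNames = {"MyCrawler", "Nutch", "*"};
        // Agent names assumed to have been read from a robots.txt file.
        List<String> agentsInFile = Arrays.asList("*", "nutch");

        int idx = bestMatchIndex(robotNames, agentsInFile);
        System.out.println(idx >= 0 ? robotNames[idx] : "no match");
    }
}
```

Here both "Nutch" and "*" match, but "Nutch" has the smaller index in robotNames, so only the stanza for that agent would be followed.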

Method Detail

isAllowed

public static boolean isAllowed(URL url)
                         throws ProtocolException,
                                IOException
Throws:
ProtocolException
IOException

main

public static void main(String[] argv)
Command-line entry point for testing.



Copyright © 2006 The Apache Software Foundation