org.apache.nutch.protocol.http
Class RobotRulesParser

java.lang.Object
  org.apache.nutch.protocol.http.RobotRulesParser

public class RobotRulesParser
extends Object
This class handles the parsing of robots.txt files. It emits RobotRulesParser.RobotRuleSet objects, which describe the download permissions granted by the file.
Author:
Tom Pierce, Mike Cafarella, Doug Cutting
Nested Class Summary

static class RobotRulesParser.RobotRuleSet
          This class holds the rules which were parsed from a robots.txt file, and can test paths against those rules.
Constructor Summary

RobotRulesParser()

RobotRulesParser(String[] robotNames)
          Creates a new RobotRulesParser which will use the supplied robotNames when choosing which stanza to follow in robots.txt files.
Method Summary

static boolean isAllowed(URL url)

static void main(String[] argv)
          Command-line main for testing.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail

LOG
public static final Logger LOG
Constructor Detail
RobotRulesParser
public RobotRulesParser()
RobotRulesParser
public RobotRulesParser(String[] robotNames)
Creates a new RobotRulesParser which will use the supplied robotNames when choosing which stanza to follow in robots.txt files. Any name in the array may be matched. The order of the robotNames determines the precedence: if multiple names are matched, only the rules associated with the robot name having the smallest index will be used.
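The smallest-index precedence rule above can be sketched in plain Java. The method name pickStanza and the data shapes are illustrative only, not part of the Nutch API:

```java
import java.util.Arrays;
import java.util.List;

public class PrecedenceSketch {
    // Given the configured robotNames (in precedence order) and the agent
    // names that actually appear in a robots.txt file, return the matched
    // name with the smallest index, or null if none match.
    static String pickStanza(String[] robotNames, List<String> stanzaAgents) {
        for (String name : robotNames) {    // iterate in precedence order
            if (stanzaAgents.contains(name)) {
                return name;                // smallest index wins
            }
        }
        return null;                        // no configured name matched
    }

    public static void main(String[] args) {
        String[] robotNames = {"nutchbot", "nutch", "*"};
        List<String> agents = Arrays.asList("nutch", "*");
        // "nutch" is chosen even though "*" also matches, because
        // "nutch" appears earlier in robotNames.
        System.out.println(pickStanza(robotNames, agents)); // prints "nutch"
    }
}
```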
Method Detail

isAllowed
public static boolean isAllowed(URL url)
                         throws ProtocolException,
                                IOException

Throws:
ProtocolException
IOException
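Conceptually, isAllowed fetches the robots.txt for the URL's host and tests the URL's path against the parsed rules; the path test amounts to prefix matching against Disallow entries. A minimal sketch of that matching step, assuming a hypothetical isPathAllowed helper and a plain list of disallowed prefixes (the real Nutch internals also fetch and cache the file):

```java
import java.util.List;

public class RobotRulesSketch {
    // Illustrative prefix-based robots.txt path check; this is a sketch of
    // the idea behind RobotRulesParser.isAllowed(URL), not the actual code.
    static boolean isPathAllowed(List<String> disallowPrefixes, String path) {
        for (String prefix : disallowPrefixes) {
            if (path.startsWith(prefix)) {
                return false;  // path falls under a Disallow rule
            }
        }
        return true;           // no rule forbids this path
    }

    public static void main(String[] args) {
        List<String> disallow = List.of("/cgi-bin/", "/tmp/");
        System.out.println(isPathAllowed(disallow, "/cgi-bin/search")); // false
        System.out.println(isPathAllowed(disallow, "/index.html"));     // true
    }
}
```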
main
public static void main(String[] argv)

Command-line main for testing.
Copyright © 2006 The Apache Software Foundation