org.apache.nutch.db
Class WebDBReader

java.lang.Object
  extended by org.apache.nutch.db.WebDBReader
All Implemented Interfaces:
IWebDBReader

public class WebDBReader
extends Object
implements IWebDBReader

WebDBReader implements the read-only operations on the web database. The write operations are in WebDBWriter.

Author:
Mike Cafarella

Constructor Summary
WebDBReader(NutchFileSystem nfs, File dbDir)
          Open a web db reader for the named directory.
 
Method Summary
 void close()
          Shut down this reader.
 Link[] getLinks(MD5Hash md5)
          Get all the links whose source is the given MD5 hash.
 Link[] getLinks(UTF8 url)
          Get all the hyperlinks that link TO the indicated URL.
 Page getPage(String url)
          Get the Page with the given URL from the pagedb.
 Page[] getPages(MD5Hash md5)
          Get Pages from the pagedb according to their content hash.
 Enumeration links()
          Return all the links, ordered by target URL.
static void main(String[] argv)
          WebDBReader.main() provides some handy utilities for looking through the contents of the webdb.
 long numLinks()
          Return the number of links in our db.
 long numPages()
          Return the number of pages in the db.
 boolean pageExists(MD5Hash md5)
          Test whether a certain piece of content is in the database, without returning the Page(s) themselves.
 Enumeration pages()
          Iterate through all the Pages, sorted by URL
 Enumeration pagesByMD5()
          Iterate through all the Pages, sorted by MD5
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WebDBReader

public WebDBReader(NutchFileSystem nfs,
                   File dbDir)
            throws IOException,
                   FileNotFoundException
Open a web db reader for the named directory.

Method Detail

close

public void close()
           throws IOException
Shut down this reader.

Specified by:
close in interface IWebDBReader
Throws:
IOException
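
Because close() can throw IOException and most read methods can too, callers usually want the reader closed on every exit path. The following is a minimal sketch of that pattern, not code from Nutch: ReaderLike is a hypothetical stand-in interface that declares only numPages() and close() with the signatures listed on this page, so the example runs without a web db on disk.

```java
import java.io.IOException;

final class CloseInFinallySketch {
    /** Hypothetical stand-in exposing only the two reader methods used below. */
    interface ReaderLike {
        long numPages();
        void close() throws IOException;
    }

    /** Reads the page count, then closes the reader even if the read throws. */
    static long countThenClose(ReaderLike reader) throws IOException {
        try {
            return reader.numPages();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory fake so the sketch runs without a real web db.
        ReaderLike fake = new ReaderLike() {
            public long numPages() { return 3L; }
            public void close() { System.out.println("closed"); }
        };
        System.out.println(countThenClose(fake));
    }
}
```

With a real WebDBReader the same try/finally shape would wrap whichever read calls are needed between construction and close().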

getPage

public Page getPage(String url)
             throws IOException
Get the Page with the given URL from the pagedb.

Specified by:
getPage in interface IWebDBReader
Throws:
IOException

getPages

public Page[] getPages(MD5Hash md5)
                throws IOException
Get Pages from the pagedb according to their content hash.

Specified by:
getPages in interface IWebDBReader
Throws:
IOException

pageExists

public boolean pageExists(MD5Hash md5)
                   throws IOException
Test whether a certain piece of content is in the database, without returning the Page(s) themselves.

Specified by:
pageExists in interface IWebDBReader
Throws:
IOException
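
One plausible use of pageExists is filtering a batch of content hashes down to those not yet stored, since only a yes/no answer is needed. The sketch below assumes that use case; ExistenceCheck is a hypothetical stand-in for the single method involved, with the type parameter K playing the role of MD5Hash, and the fake db uses plain hex strings as keys.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UnseenContentSketch {
    /** Hypothetical stand-in: K plays the role of MD5Hash. */
    interface ExistenceCheck<K> {
        boolean pageExists(K md5) throws IOException;
    }

    /** Keeps only the hashes the db does not already hold, in input order. */
    static <K> List<K> unseen(ExistenceCheck<K> db, List<K> hashes) throws IOException {
        List<K> out = new ArrayList<K>();
        for (K h : hashes) {
            if (!db.pageExists(h)) {
                out.add(h);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Fake db backed by a set of hex strings instead of real MD5Hash keys.
        final Set<String> stored = new HashSet<String>(Arrays.asList("aa11", "bb22"));
        ExistenceCheck<String> fake = new ExistenceCheck<String>() {
            public boolean pageExists(String md5) { return stored.contains(md5); }
        };
        System.out.println(unseen(fake, Arrays.asList("aa11", "cc33")));
    }
}
```

When the Page objects are needed anyway, calling getPages directly avoids a second lookup.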

pages

public Enumeration pages()
                  throws IOException
Iterate through all the Pages, sorted by URL

Specified by:
pages in interface IWebDBReader
Throws:
IOException
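
pages() returns a raw java.util.Enumeration, so callers have to cast each element themselves. As a sketch of one way to handle that, the helper below drains any Enumeration into a typed list. The helper name is an assumption, and the demo feeds it an Enumeration of strings rather than real Page objects so that it runs without Nutch on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class EnumerationDrainSketch {
    /** Copies every element of an Enumeration into a list, casting each to the given type. */
    static <T> List<T> drain(Enumeration<?> e, Class<T> type) {
        List<T> out = new ArrayList<T>();
        while (e.hasMoreElements()) {
            // Class.cast throws ClassCastException if an element has an unexpected type.
            out.add(type.cast(e.nextElement()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for reader.pages(): an Enumeration over two URL strings.
        Enumeration<String> fake =
            Collections.enumeration(Arrays.asList("http://a.example/", "http://b.example/"));
        List<String> urls = drain(fake, String.class);
        System.out.println(urls.size());
    }
}
```

Draining holds every element in memory at once, so for a large web db it is usually better to process elements inside the loop instead of collecting them.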

pagesByMD5

public Enumeration pagesByMD5()
                       throws IOException
Iterate through all the Pages, sorted by MD5

Specified by:
pagesByMD5 in interface IWebDBReader
Throws:
IOException

numPages

public long numPages()
Return the number of pages in the db.

Specified by:
numPages in interface IWebDBReader

getLinks

public Link[] getLinks(UTF8 url)
                throws IOException
Get all the hyperlinks that link TO the indicated URL.

Specified by:
getLinks in interface IWebDBReader
Throws:
IOException

getLinks

public Link[] getLinks(MD5Hash md5)
                throws IOException
Get all the links whose source is the given MD5 hash.

Specified by:
getLinks in interface IWebDBReader
Throws:
IOException

links

public Enumeration links()
Return all the links, ordered by target URL.

Specified by:
links in interface IWebDBReader

numLinks

public long numLinks()
Return the number of links in our db.

Specified by:
numLinks in interface IWebDBReader

main

public static void main(String[] argv)
                 throws FileNotFoundException,
                        IOException
WebDBReader.main() provides some handy utilities for looking through the contents of the webdb.

Throws:
FileNotFoundException
IOException


Copyright © 2006 The Apache Software Foundation