org.apache.nutch.db
Class DBSectionReader

java.lang.Object
  extended by org.apache.nutch.db.DBSectionReader

public class DBSectionReader
extends Object

DBSectionReader reads a discrete portion of a WebDB. It may implement its methods with either a local MapFile.Reader object or (eventually) a remote-machine network interface. For the moment, only the MapFile.Reader implementation exists; much of its code was moved from the earlier, pre-distributed version of WebDBReader.

Author:
Mike Cafarella

Constructor Summary
DBSectionReader(NutchFileSystem nfs, File sectionFile, WritableComparator comparator)
          Right now we assume we're getting a File that is a MapFile.Reader directory.
 
Method Summary
 void close()
           
 Vector getLinks(MD5Hash md5)
          Get all the links that originate from the content with the given MD5 hash.
 Vector getLinks(UTF8 url)
          Get all the hyperlinks that link TO the indicated URL.
 Page getPage(UTF8 url, Page p)
          Fetch a Page with the given URL, and fill it into the pre-allocated Page 'p'.
 Vector getPages(MD5Hash md5)
          Get Pages from the db according to their content hash.
 Enumeration links()
          Return all the links, sorted by target URL.
 boolean pageExists(MD5Hash md5)
          Test whether a certain piece of content is in the db, but don't bother returning it.
 Enumeration pages()
          Iterate through all the Pages, sorted by URL.
 Enumeration pagesByMD5()
          Iterate through all the Pages, sorted by MD5.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DBSectionReader

public DBSectionReader(NutchFileSystem nfs,
                       File sectionFile,
                       WritableComparator comparator)
                throws IOException
Right now we assume we're getting a File that is a MapFile.Reader directory. But in the future we could also check for existence of a "remote-network" file, similar to the way we do now for distributed index reading. Then, we would either create a MapFile.Reader or a network client for one.
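
The planned local-versus-remote choice described above could look roughly like the following sketch. It is not Nutch code: the class SectionBackendSketch, the Backend enum, and the marker file name "remote-network" are hypothetical stand-ins, and only the local MapFile.Reader path exists in the documented class.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class SectionBackendSketch {
    /** Hypothetical backends; the documented class only implements the local one. */
    enum Backend { LOCAL_MAPFILE, REMOTE_NETWORK }

    /** Hypothetical marker file name, borrowed from the wording of the description. */
    static final String REMOTE_MARKER = "remote-network";

    /** Picks the remote backend only if the marker file exists in the section directory. */
    static Backend chooseBackend(File sectionDir) {
        File marker = new File(sectionDir, REMOTE_MARKER);
        return marker.exists() ? Backend.REMOTE_NETWORK : Backend.LOCAL_MAPFILE;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("section").toFile();
        System.out.println(chooseBackend(dir));        // no marker file yet
        new File(dir, REMOTE_MARKER).createNewFile();  // simulate a remote section
        System.out.println(chooseBackend(dir));
    }
}
```

Keeping the decision in one small function means callers would not need to know which backend serves a given section.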

Method Detail

getPage

public Page getPage(UTF8 url,
                    Page p)
             throws IOException
Fetch a Page with the given URL, and fill it into the pre-allocated Page 'p'.

Throws:
IOException
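
The pre-allocated Page 'p' lets a caller reuse one object across many lookups instead of allocating a new one per call. The sketch below illustrates that calling pattern with hypothetical stand-ins (PageStub, ReaderStub, an in-memory map) rather than the real Page and DBSectionReader types. Returning null on a miss is an assumption of this sketch, not something stated above.

```java
import java.util.HashMap;
import java.util.Map;

class PreallocatedPageSketch {
    /** Hypothetical stand-in for Page: a mutable record the reader overwrites. */
    static final class PageStub {
        String url;
        float score;

        void set(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for a section reader, backed by an in-memory map. */
    static final class ReaderStub {
        private final Map<String, Float> scores = new HashMap<>();

        void put(String url, float score) {
            scores.put(url, score);
        }

        /** Fills 'p' and returns it on a hit; returns null on a miss (assumed convention). */
        PageStub getPage(String url, PageStub p) {
            Float s = scores.get(url);
            if (s == null) {
                return null;
            }
            p.set(url, s);
            return p;
        }
    }

    public static void main(String[] args) {
        ReaderStub reader = new ReaderStub();
        reader.put("http://example.org/", 1.5f);

        PageStub reusable = new PageStub();  // allocated once, reused for every lookup
        String[] urls = {"http://example.org/", "http://missing.example/"};
        for (String url : urls) {
            PageStub hit = reader.getPage(url, reusable);
            System.out.println(url + " found=" + (hit != null));
        }
    }
}
```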

getPages

public Vector getPages(MD5Hash md5)
                throws IOException
Get Pages from the db according to their content hash.

Throws:
IOException

pageExists

public boolean pageExists(MD5Hash md5)
                   throws IOException
Test whether a certain piece of content is in the db, but don't bother returning it.

Throws:
IOException

pages

public Enumeration pages()
                  throws IOException
Iterate through all the Pages, sorted by URL.

Throws:
IOException
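
Because pages() returns a java.util.Enumeration, callers walk it with hasMoreElements() and nextElement(). The sketch below shows that loop over a hypothetical stand-in that yields URL strings in sorted order. The real method yields Page objects, and String ordering here only approximates whatever ordering the WebDB uses for its URL keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.TreeSet;

class PagesEnumerationSketch {
    /** Hypothetical stand-in for pages(): yields the given URLs in sorted order. */
    static Enumeration<String> pages(List<String> urls) {
        return Collections.enumeration(new TreeSet<>(urls));
    }

    /** Drains an Enumeration into a list using the classic hasMoreElements loop. */
    static List<String> drain(Enumeration<String> e) {
        List<String> out = new ArrayList<>();
        while (e.hasMoreElements()) {
            out.add(e.nextElement());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://b.example/", "http://a.example/");
        for (String url : drain(pages(urls))) {
            System.out.println(url);
        }
    }
}
```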

pagesByMD5

public Enumeration pagesByMD5()
                       throws IOException
Iterate through all the Pages, sorted by MD5.

Throws:
IOException

getLinks

public Vector getLinks(UTF8 url)
                throws IOException
Get all the hyperlinks that link TO the indicated URL.

Throws:
IOException

getLinks

public Vector getLinks(MD5Hash md5)
                throws IOException
Get all the links that originate from the content with the given MD5 hash.

Throws:
IOException
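
The two getLinks overloads look links up by different keys: one by the URL a link points to, the other by an MD5 hash. A minimal way to picture this is a pair of indexes over the same set of links, as in the sketch below. LinkStub, LinkIndexSketch, and the field names are hypothetical, and reading the MD5 key as the hash of the linking page's content is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LinkIndexSketch {
    /** Hypothetical stand-in for a link: source content hash plus target URL. */
    static final class LinkStub {
        final String fromMd5;
        final String toUrl;

        LinkStub(String fromMd5, String toUrl) {
            this.fromMd5 = fromMd5;
            this.toUrl = toUrl;
        }
    }

    private final Map<String, List<LinkStub>> byTargetUrl = new HashMap<>();
    private final Map<String, List<LinkStub>> bySourceMd5 = new HashMap<>();

    /** Registers the link in both indexes. */
    void add(LinkStub link) {
        byTargetUrl.computeIfAbsent(link.toUrl, k -> new ArrayList<>()).add(link);
        bySourceMd5.computeIfAbsent(link.fromMd5, k -> new ArrayList<>()).add(link);
    }

    /** Analogue of getLinks(UTF8 url): links that point TO the URL. */
    List<LinkStub> linksTo(String url) {
        return byTargetUrl.getOrDefault(url, Collections.<LinkStub>emptyList());
    }

    /** Analogue of getLinks(MD5Hash md5): links keyed by the given hash. */
    List<LinkStub> linksFrom(String md5) {
        return bySourceMd5.getOrDefault(md5, Collections.<LinkStub>emptyList());
    }

    public static void main(String[] args) {
        LinkIndexSketch index = new LinkIndexSketch();
        index.add(new LinkStub("md5-a", "http://x.example/"));
        index.add(new LinkStub("md5-a", "http://y.example/"));
        index.add(new LinkStub("md5-b", "http://x.example/"));
        System.out.println("links to x: " + index.linksTo("http://x.example/").size());
        System.out.println("links from md5-a: " + index.linksFrom("md5-a").size());
    }
}
```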

links

public Enumeration links()
                  throws IOException
Return all the links, sorted by target URL.

Throws:
IOException

close

public void close()
           throws IOException
Throws:
IOException


Copyright © 2006 The Apache Software Foundation