org.apache.nutch.db
Class WebDBWriter

java.lang.Object
  extended by org.apache.nutch.db.WebDBWriter
All Implemented Interfaces:
IWebDBWriter

public class WebDBWriter
extends Object
implements IWebDBWriter

This wrapper class allows write operations to the linkdb and pagedb to be reordered. It is useful only for write-only tools such as UpdateDatabaseTool. WebDBWriter is a traditional single-pass database writer: it does not cache instructions on disk, although it does hold them in memory and may re-sort them there. It does nothing in a distributed fashion. Other implementors of IWebDBWriter provide disk caching and distributed operation.

Author:
Mike Cafarella
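
The buffering behaviour described above can be pictured with a small, self-contained sketch. It is not the Nutch implementation: BufferedWriterSketch, Instruction, and the TreeSet standing in for the page database are all hypothetical names chosen for illustration. The sketch assumes that write instructions are collected in memory, sorted by key, and then applied in one ordered pass when the writer is closed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Illustrative stand-in only: these names are hypothetical and not part of Nutch.
final class BufferedWriterSketch {
    enum Op { ADD, DELETE }

    // One buffered write instruction against a URL-keyed store.
    static final class Instruction {
        final Op op;
        final String url;
        final long seq; // arrival order, keeps same-key instructions in call order

        Instruction(Op op, String url, long seq) {
            this.op = op;
            this.url = url;
            this.seq = seq;
        }
    }

    private final List<Instruction> buffer = new ArrayList<>();
    private final TreeSet<String> db;
    private long nextSeq = 0;

    BufferedWriterSketch(TreeSet<String> db) {
        this.db = db;
    }

    // Writes are only recorded here; nothing touches the store yet.
    void addPage(String url) {
        buffer.add(new Instruction(Op.ADD, url, nextSeq++));
    }

    void deletePage(String url) {
        buffer.add(new Instruction(Op.DELETE, url, nextSeq++));
    }

    // Sort the buffered instructions by key, then apply them in one ordered pass.
    void close() {
        buffer.sort(Comparator.comparing((Instruction i) -> i.url)
                              .thenComparingLong(i -> i.seq));
        for (Instruction i : buffer) {
            if (i.op == Op.ADD) {
                db.add(i.url);
            } else {
                db.remove(i.url);
            }
        }
        buffer.clear();
    }

    public static void main(String[] args) {
        TreeSet<String> db = new TreeSet<>();
        db.add("http://example.org/old");
        BufferedWriterSketch writer = new BufferedWriterSketch(db);
        writer.addPage("http://example.org/b");
        writer.deletePage("http://example.org/old");
        writer.addPage("http://example.org/a");
        writer.close();
        System.out.println(db);
    }
}
```

In this sketch the store is untouched until close() runs, which is why such a writer suits tools that only write and never need to read their own pending changes.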

Nested Class Summary
static class WebDBWriter.LinkInstruction
          Holds an instruction over a Link.
static class WebDBWriter.LinkInstructionWriter
          LinkInstructionWriter very efficiently writes a LinkInstruction to a SequenceFile.Writer.
static class WebDBWriter.PageInstruction
          PageInstruction holds an operation over a Page.
static class WebDBWriter.PageInstructionWriter
          PageInstructionWriter very efficiently writes a PageInstruction to a SequenceFile.Writer.
 
Constructor Summary
WebDBWriter(NutchFileSystem fs, File dbDir)
          Create a WebDBWriter.
 
Method Summary
 void addLink(Link lr)
          Add a link to the link database.
 void addPage(Page page)
          Add a page to the page database.
 void addPageIfNotPresent(Page page)
          Add the page only if it is not already in the database; an existing entry is not replaced.
 void addPageIfNotPresent(Page page, Link link)
          Add the page only if it is not already in the database; if the page is inserted, the given Link is inserted as well.
 void addPageWithScore(Page page)
          Add a page to the page database, with a brand-new score.
 void close()
          Shut down the writer.
static void createWebDB(NutchFileSystem nfs, File dbDir)
          Create the WebDB for the first time.
 void deleteLink(MD5Hash md5)
          Remove links with the given MD5 from the db.
 void deletePage(String url)
          Remove a page from the page database.
static void main(String[] argv)
          WebDBWriter.main() provides some handy methods for testing the WebDB.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WebDBWriter

public WebDBWriter(NutchFileSystem fs,
                   File dbDir)
            throws IOException
Create a WebDBWriter.

Method Detail

createWebDB

public static void createWebDB(NutchFileSystem nfs,
                               File dbDir)
                        throws IOException
Create the WebDB for the first time.

Throws:
IOException

close

public void close()
           throws IOException
Shut down the writer.

Specified by:
close in interface IWebDBWriter
Throws:
IOException

addPage

public void addPage(Page page)
             throws IOException
Add a page to the page database.

Specified by:
addPage in interface IWebDBWriter
Throws:
IOException

addPageWithScore

public void addPageWithScore(Page page)
                      throws IOException
Add a page to the page database, with a brand-new score.

Specified by:
addPageWithScore in interface IWebDBWriter
Throws:
IOException

addPageIfNotPresent

public void addPageIfNotPresent(Page page)
                         throws IOException
Add the page only if it is not already in the database; an existing entry is not replaced.

Specified by:
addPageIfNotPresent in interface IWebDBWriter
Throws:
IOException

addPageIfNotPresent

public void addPageIfNotPresent(Page page,
                                Link link)
                         throws IOException
Add the page only if it is not already in the database; an existing entry is not replaced. If the new Page is inserted, the given Link is inserted as well.

Specified by:
addPageIfNotPresent in interface IWebDBWriter
Throws:
IOException
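
The conditional semantics of this overload can be illustrated with a minimal, self-contained sketch. It is an assumption-laden stand-in rather than the Nutch code: AddIfNotPresentSketch, its in-memory map and set, and the use of plain strings in place of Page and Link are all hypothetical. Unlike the real method, which returns void, the sketch returns a boolean so that the outcome is easy to observe.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in only: these names and types are hypothetical, not Nutch code.
final class AddIfNotPresentSketch {
    private final Map<String, Float> pages = new HashMap<>(); // url -> score
    private final Set<String> links = new HashSet<>();        // link target URLs

    // Inserts the page only when its URL is absent. The link is stored only
    // when the page was actually inserted. Returns true on insertion.
    boolean addPageIfNotPresent(String url, float score, String linkTarget) {
        if (pages.containsKey(url)) {
            return false; // keep the existing page; the link is not stored
        }
        pages.put(url, score);
        links.add(linkTarget);
        return true;
    }

    Float scoreOf(String url) {
        return pages.get(url);
    }

    boolean hasLink(String linkTarget) {
        return links.contains(linkTarget);
    }
}
```

Calling the sketch twice with the same URL leaves the first score in place and discards the second call's link, which mirrors the "insert both or neither" behaviour described above.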

deletePage

public void deletePage(String url)
                throws IOException
Remove a page from the page database.

Specified by:
deletePage in interface IWebDBWriter
Throws:
IOException

addLink

public void addLink(Link lr)
             throws IOException
Add a link to the link database.

Specified by:
addLink in interface IWebDBWriter
Throws:
IOException

deleteLink

public void deleteLink(MD5Hash md5)
                throws IOException
Remove links with the given MD5 from the db.

Throws:
IOException

main

public static void main(String[] argv)
                 throws FileNotFoundException,
                        IOException
WebDBWriter.main() provides some handy methods for testing the WebDB.

Throws:
FileNotFoundException
IOException


Copyright © 2006 The Apache Software Foundation