org.apache.nutch.tools
Class UpdateDatabaseTool

java.lang.Object
  extended by org.apache.nutch.tools.UpdateDatabaseTool

public class UpdateDatabaseTool
extends Object

This class takes the output of the fetcher and updates the page and link DBs accordingly. Eventually, as the database scales, this will be broken into several phases, each consuming and emitting batch files, but, for now, we're doing it all here.

Author:
Doug Cutting

Field Summary
static boolean IGNORE_INTERNAL_LINKS
           
static Logger LOG
           
static int MAX_OUTLINKS_PER_PAGE
           
static float NEW_EXTERNAL_LINK_FACTOR
           
static float NEW_INTERNAL_LINK_FACTOR
           
 
Constructor Summary
UpdateDatabaseTool(IWebDBWriter webdb, boolean additionsAllowed, int maxCount)
          Take in the WebDBWriter, instantiated elsewhere.
 
Method Summary
 void close()
          Shut everything down.
static void main(String[] args)
          Create the UpdateDatabaseTool, and pass in a WebDBWriter.
 void updateForSegment(NutchFileSystem nfs, String directory)
          Iterate through items in the FetcherOutput.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NEW_INTERNAL_LINK_FACTOR

public static final float NEW_INTERNAL_LINK_FACTOR

NEW_EXTERNAL_LINK_FACTOR

public static final float NEW_EXTERNAL_LINK_FACTOR

MAX_OUTLINKS_PER_PAGE

public static final int MAX_OUTLINKS_PER_PAGE

IGNORE_INTERNAL_LINKS

public static final boolean IGNORE_INTERNAL_LINKS

LOG

public static final Logger LOG

Constructor Detail

UpdateDatabaseTool

public UpdateDatabaseTool(IWebDBWriter webdb,
                          boolean additionsAllowed,
                          int maxCount)
Take in the WebDBWriter, instantiated elsewhere.

Method Detail

updateForSegment

public void updateForSegment(NutchFileSystem nfs,
                             String directory)
                      throws IOException
Iterate through items in the FetcherOutput. For each one, determine whether the pages need to be added to the webdb, or what fields need to be changed.

Throws:
IOException
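
The add-or-update decision described above can be illustrated with a small, self-contained sketch. This is not the Nutch implementation: FetchedItem, PageEntry, applyFetchedItems and the in-memory map are hypothetical stand-ins invented for illustration. The meaning given to additionsAllowed (unknown pages are added only when the flag is true) is an assumption based on the constructor parameter's name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UpdateSketch {

    /** Hypothetical stand-in for one fetched item; not a Nutch class. */
    static final class FetchedItem {
        final String url;
        final long fetchTime;

        FetchedItem(String url, long fetchTime) {
            this.url = url;
            this.fetchTime = fetchTime;
        }
    }

    /** Hypothetical stand-in for a page DB entry; not a Nutch class. */
    static final class PageEntry {
        final String url;
        long lastFetchTime;

        PageEntry(String url, long lastFetchTime) {
            this.url = url;
            this.lastFetchTime = lastFetchTime;
        }
    }

    /**
     * Walks the fetched items once. A page that is already known has its
     * fields updated; an unknown page is added only if additions are allowed
     * (assumed semantics). Returns the number of pages newly added.
     */
    static int applyFetchedItems(Map<String, PageEntry> pageDb,
                                 List<FetchedItem> items,
                                 boolean additionsAllowed) {
        int added = 0;
        for (FetchedItem item : items) {
            PageEntry existing = pageDb.get(item.url);
            if (existing != null) {
                // Known page: change only the fields that need changing.
                existing.lastFetchTime = item.fetchTime;
            } else if (additionsAllowed) {
                // Unknown page: add it to the in-memory "page DB".
                pageDb.put(item.url, new PageEntry(item.url, item.fetchTime));
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        Map<String, PageEntry> pageDb = new LinkedHashMap<>();
        pageDb.put("http://example.org/", new PageEntry("http://example.org/", 100L));

        List<FetchedItem> items = Arrays.asList(
                new FetchedItem("http://example.org/", 200L),
                new FetchedItem("http://example.org/new", 200L));

        int added = applyFetchedItems(pageDb, items, true);
        System.out.println(added + " added, " + pageDb.size() + " total");
    }
}
```

The real tool works against an IWebDBWriter and a segment directory rather than an in-memory map; the sketch only shows the shape of the per-item decision.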

close

public void close()
           throws IOException
Shut everything down.

Throws:
IOException

main

public static void main(String[] args)
                 throws Exception
Create the UpdateDatabaseTool, and pass in a WebDBWriter.

Throws:
Exception


Copyright © 2006 The Apache Software Foundation