org.apache.nutch.db
Class Page

java.lang.Object
  extended by org.apache.nutch.db.Page
All Implemented Interfaces:
Cloneable, Comparable, Writable, WritableComparable

public class Page
extends Object
implements WritableComparable, Cloneable

A row in the Page Database.

   type    name       description
 ---------------------------------------------------------------
   byte    VERSION    - A byte indicating the version of this entry.
   String  URL        - The URL of a page.  This is the primary key.
   128bit  ID         - The MD5 hash of the contents of the page.
   64bit   DATE       - The date this page should be refetched.
   byte    RETRIES    - The number of times we've failed to fetch this page.
   byte    INTERVAL   - Interval, in days, at which this page should be refreshed.
   float   SCORE      - Multiplied into the score for hits on this page.
   float   NEXTSCORE  - Multiplied into the score for hits on this page.
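
The row layout above can be illustrated with a small, self-contained sketch. PageRowSketch is a hypothetical stand-in, not the real org.apache.nutch.db.Page; the field order, the version value of 1, and the use of writeUTF for the URL are assumptions made for illustration only, and the actual on-disk encoding may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for one Page Database row; NOT org.apache.nutch.db.Page.
final class PageRowSketch {
    byte version = 1;                 // assumed version value
    String url = "";                  // primary key
    final byte[] id = new byte[16];   // 128-bit MD5 of the page contents
    long date;                        // 64-bit refetch date
    byte retries;                     // failed fetch attempts
    byte interval;                    // refresh interval in days
    float score;
    float nextScore;

    // Writes the fields in the order the table lists them (assumed order).
    void write(DataOutputStream out) throws IOException {
        out.writeByte(version);
        out.writeUTF(url);
        out.write(id);
        out.writeLong(date);
        out.writeByte(retries);
        out.writeByte(interval);
        out.writeFloat(score);
        out.writeFloat(nextScore);
    }

    // Reads the fields back in the same order, overwriting this instance.
    void readFields(DataInputStream in) throws IOException {
        version = in.readByte();
        url = in.readUTF();
        in.readFully(id);
        date = in.readLong();
        retries = in.readByte();
        interval = in.readByte();
        score = in.readFloat();
        nextScore = in.readFloat();
    }

    // Convenience helper: serialize a row to a byte array.
    static byte[] toBytes(PageRowSketch row) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            row.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Convenience helper: rebuild a row from a byte array.
    static PageRowSketch fromBytes(byte[] bytes) {
        PageRowSketch row = new PageRowSketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            row.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return row;
    }
}
```

Round-tripping a row through toBytes and fromBytes should give back the same field values, which is the property a Writable-style write/readFields pair is meant to provide.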
 

Author:
Mike Cafarella, Doug Cutting

Nested Class Summary
static class Page.Comparator
          Compares pages by MD5, then by URL.
static class Page.UrlComparator
          Compares pages by URL only.
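
The two orderings named above can be sketched without the Nutch classes. PageKeySketch and both comparator constants are hypothetical names introduced here; comparing digests as unsigned bytes is an assumption, and the real Page.Comparator and Page.UrlComparator may be implemented differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical key holding only the two fields the comparators look at.
final class PageKeySketch {
    final byte[] md5;
    final String url;

    PageKeySketch(byte[] md5, String url) {
        this.md5 = md5;
        this.url = url;
    }

    // Lexicographic comparison of digests, treating each byte as unsigned (assumed).
    static int compareDigests(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Orders by MD5 first and falls back to the URL when digests are equal.
    static final Comparator<PageKeySketch> BY_MD5_THEN_URL = (a, b) -> {
        int c = compareDigests(a.md5, b.md5);
        return c != 0 ? c : a.url.compareTo(b.url);
    };

    // Orders by URL only and ignores the digest.
    static final Comparator<PageKeySketch> BY_URL = (a, b) -> a.url.compareTo(b.url);

    // Returns the URLs of the keys after sorting a copy with the given order.
    static List<String> sortedUrls(List<PageKeySketch> keys, Comparator<PageKeySketch> order) {
        List<PageKeySketch> copy = new ArrayList<>(keys);
        copy.sort(order);
        List<String> urls = new ArrayList<>();
        for (PageKeySketch k : copy) {
            urls.add(k.url);
        }
        return urls;
    }
}
```

Sorting by MD5 first groups pages with identical content next to each other, while the URL-only order follows the primary key.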
 
Constructor Summary
Page()
          Construct a page ready to be read by readFields(DataInput).
Page(String urlString, float score)
           
Page(String urlString, float score, float nextScore, long nextFetch)
           
Page(String urlString, float score, long nextFetch)
           
Page(String urlString, MD5Hash md5)
          Construct a new, default page, due to be fetched.
 
Method Summary
 Object clone()
           
 int compareTo(Object o)
          Compare to another Page object
 long computeDomainID()
          Compute domain ID from URL
 boolean equals(Object o)
           
 byte getFetchInterval()
           
 MD5Hash getMD5()
           
 long getNextFetchTime()
           
 float getNextScore()
           
 int getNumOutlinks()
           
 byte getRetriesSinceFetch()
           
 float getScore()
           
 UTF8 getURL()
           
 int hashCode()
           
static Page read(DataInput in)
           
 void readFields(DataInput in)
          Reads the fields of this object from in.
 void set(Page that)
          Copy the contents of another instance into this instance.
 void setFetchInterval(byte fetchInterval)
           
 void setMD5(MD5Hash md5)
           
 void setNextFetchTime(long nextFetch)
           
 void setNumOutlinks(int numOutlinks)
           
 void setRetriesSinceFetch(int retries)
           
 void setScore(float score)
           
 void setScore(float score, float nextScore)
           
 void setURL(String url)
           
 String toString()
          Return a printable string representation of the Page.
 String toTabbedString()
          A tab-delimited text version of the Page's data.
 void write(DataOutput out)
          Write this page's fields to the output stream.
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Page

public Page()
Construct a page ready to be read by readFields(DataInput).


Page

public Page(String urlString,
            MD5Hash md5)
     throws MalformedURLException
Construct a new, default page, due to be fetched.


Page

public Page(String urlString,
            float score)
     throws MalformedURLException

Page

public Page(String urlString,
            float score,
            long nextFetch)
     throws MalformedURLException

Page

public Page(String urlString,
            float score,
            float nextScore,
            long nextFetch)
     throws MalformedURLException

Method Detail

readFields

public void readFields(DataInput in)
                throws IOException
Description copied from interface: Writable
Reads the fields of this object from in. For efficiency, implementations should attempt to re-use storage in the existing object where possible.

Specified by:
readFields in interface Writable
Throws:
IOException
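
The storage re-use advice can be sketched as follows: a single instance is allocated once and refilled for every record in a stream. ReusableRecordSketch, its two fields, and the helper methods are hypothetical and only illustrate the pattern; they do not reflect the actual fields read by Page.readFields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record that refills its own buffer instead of allocating a new one.
final class ReusableRecordSketch {
    final byte[] digest = new byte[16];  // allocated once, overwritten on every read
    long date;

    void readFields(DataInput in) throws IOException {
        in.readFully(digest);  // fills the existing array in place
        date = in.readLong();
    }

    // Reads `count` records into ONE instance and sums their dates.
    static long sumDates(byte[] encoded, int count) {
        ReusableRecordSketch rec = new ReusableRecordSketch();
        long total = 0;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            for (int i = 0; i < count; i++) {
                rec.readFields(in);
                total += rec.date;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Builds a test stream: a zeroed 16-byte digest followed by each date.
    static byte[] encode(long... dates) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            for (long d : dates) {
                out.write(new byte[16]);
                out.writeLong(d);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

Because the loop never creates a second record, scanning a large stream produces no per-record garbage beyond what the stream itself allocates.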

set

public void set(Page that)
Copy the contents of another instance into this instance.


write

public void write(DataOutput out)
           throws IOException
Write this page's fields to the output stream.

Specified by:
write in interface Writable
Throws:
IOException

compareTo

public int compareTo(Object o)
Compare to another Page object

Specified by:
compareTo in interface Comparable

read

public static Page read(DataInput in)
                 throws IOException
Throws:
IOException

getURL

public UTF8 getURL()

setURL

public void setURL(String url)
            throws MalformedURLException
Throws:
MalformedURLException

getMD5

public MD5Hash getMD5()

setMD5

public void setMD5(MD5Hash md5)

getNextFetchTime

public long getNextFetchTime()

setNextFetchTime

public void setNextFetchTime(long nextFetch)

getRetriesSinceFetch

public byte getRetriesSinceFetch()

setRetriesSinceFetch

public void setRetriesSinceFetch(int retries)

getFetchInterval

public byte getFetchInterval()

setFetchInterval

public void setFetchInterval(byte fetchInterval)

getNumOutlinks

public int getNumOutlinks()

setNumOutlinks

public void setNumOutlinks(int numOutlinks)

getScore

public float getScore()

getNextScore

public float getNextScore()

setScore

public void setScore(float score)

setScore

public void setScore(float score,
                     float nextScore)

computeDomainID

public long computeDomainID()
                     throws MalformedURLException
Compute domain ID from URL

Throws:
MalformedURLException

toString

public String toString()
Return a printable string representation of the Page.


toTabbedString

public String toTabbedString()
A tab-delimited text version of the Page's data.
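
A tab-delimited rendering of this kind might look like the sketch below. TabbedRowSketch is a hypothetical helper; the set of columns, their order, and the number formatting are assumptions, since the actual output of toTabbedString is not specified here.

```java
// Hypothetical helper that joins a page's values with tab characters.
final class TabbedRowSketch {
    static String toTabbedString(String url, String md5Hex, long nextFetch,
                                 int retries, int intervalDays,
                                 float score, float nextScore) {
        // One value per column, separated by a single tab (assumed column order).
        return String.join("\t",
                url,
                md5Hex,
                Long.toString(nextFetch),
                Integer.toString(retries),
                Integer.toString(intervalDays),
                Float.toString(score),
                Float.toString(nextScore));
    }
}
```

A line produced this way can be split on the tab character to recover the individual columns, provided none of the values contains a tab itself.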


equals

public boolean equals(Object o)

hashCode

public int hashCode()

clone

public Object clone()


Copyright © 2006 The Apache Software Foundation