Package org.apache.nutch.db

Web database: tracks page fetches and link structure.


Interface Summary
IWebDBReader IWebDBReader is the read interface to the consolidated page/link database.
IWebDBWriter IWebDBWriter is the write interface to the consolidated page/link database.
 

Class Summary
DBKeyDivision DBKeyDivision exists for other DB classes to figure out how to find the right distributed-DB section.
DBSectionReader DBSectionReader reads a discrete portion of a WebDB.
DistributedWebDBReader DistributedWebDBReader implements all the read-only parts of accessing our web database.
DistributedWebDBWriter This is a wrapper class that allows us to reorder write operations to the linkdb and pagedb.
DistributedWebDBWriter.LinkInstruction Holds an instruction over a Link.
DistributedWebDBWriter.LinkInstruction.MD5Comparator Sorts the instruction first by MD5, then by opcode.
DistributedWebDBWriter.LinkInstruction.UrlComparator Sorts the instruction first by URL, then by opcode.
DistributedWebDBWriter.LinkInstructionWriter LinkInstructionWriter very efficiently writes a LinkInstruction to an EditSectionGroupWriter.
DistributedWebDBWriter.PageInstruction PageInstruction holds an operation over a Page.
DistributedWebDBWriter.PageInstruction.PageComparator Sorts the instruction first by Page, then by opcode.
DistributedWebDBWriter.PageInstruction.UrlComparator Sorts the instruction first by URL, then by opcode.
DistributedWebDBWriter.PageInstructionWriter PageInstructionWriter very efficiently writes a PageInstruction to an EditSectionGroupWriter.
EditSectionGroupReader The EditSectionGroupReader will read in an edits-file that was built in a distributed way.
EditSectionGroupWriter The EditSectionGroupWriter maintains a set of EditSectionWriter objects.
EditSectionGroupWriter.KeyExtractor Edit instructions are Comparable, but they also have an "inner" key like MD5Hash or URL that is also Comparable.
EditSectionGroupWriter.LinkMD5Extractor Gets the MD5 from a LinkInstruction.
EditSectionGroupWriter.LinkURLExtractor Gets the URL from a LinkInstruction.
EditSectionGroupWriter.PageMD5Extractor Gets the MD5 from a PageInstruction.
EditSectionGroupWriter.PageURLExtractor Gets the URL from a PageInstruction.
EditSectionWriter EditSectionWriter writes a discrete portion of a WebDB.
Link A row in the Link Database.
Link.MD5Comparator MD5Comparator is the opposite of UrlComparator: the MD5 comes first.
Link.UrlComparator UrlComparator uses the standard ordering, in which the URL comes first.
Page A row in the Page Database.
Page.Comparator Compares pages by MD5, then by URL.
Page.UrlComparator Compares pages by URL only.
WebDBAnchors Utility that extracts the set of anchor texts for a URL from the database.
WebDBInjector This class takes a flat file of URLs and adds them as entries into a pagedb.
WebDBReader The WebDBReader implements all the read-only parts of accessing our web database.
WebDBWriter This is a wrapper class that allows us to reorder write operations to the linkdb and pagedb.
WebDBWriter.LinkInstruction Holds an instruction over a Link.
WebDBWriter.LinkInstruction.MD5Comparator Sorts the instruction first by MD5, then by opcode.
WebDBWriter.LinkInstruction.UrlComparator Sorts the instruction first by URL, then by opcode.
WebDBWriter.LinkInstructionWriter LinkInstructionWriter very efficiently writes a LinkInstruction to a SequenceFile.Writer.
WebDBWriter.PageInstruction PageInstruction holds an operation over a Page.
WebDBWriter.PageInstruction.PageComparator Sorts the instruction first by Page, then by opcode.
WebDBWriter.PageInstruction.UrlComparator Sorts the instruction first by URL, then by opcode.
WebDBWriter.PageInstructionWriter PageInstructionWriter very efficiently writes a PageInstruction to a SequenceFile.Writer.
 

Package org.apache.nutch.db Description

Web database: tracks page fetches and link structure.
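
Several comparators in this package are described as sorting an instruction "first by URL, then by opcode". The following is a minimal sketch of that two-level ordering only. It does not use the Nutch classes: the Instruction type, the ADD and DEL opcode values, and the sorted helper are hypothetical stand-ins introduced for illustration, and the real opcode constants and instruction fields are not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class InstructionOrderingSketch {

    // Hypothetical opcode values; not the constants used by WebDBWriter.
    static final int ADD = 0;
    static final int DEL = 1;

    /** Hypothetical stand-in for an edit instruction: a URL key plus an opcode. */
    static final class Instruction {
        final String url;
        final int opcode;

        Instruction(String url, int opcode) {
            this.url = url;
            this.opcode = opcode;
        }

        @Override
        public String toString() {
            return url + "#" + opcode;
        }
    }

    /** Orders by URL first; instructions with equal URLs are ordered by opcode. */
    static final Comparator<Instruction> URL_THEN_OPCODE =
        Comparator.comparing((Instruction i) -> i.url)
                  .thenComparingInt(i -> i.opcode);

    /** Returns a new list holding the given edits in URL-then-opcode order. */
    static List<Instruction> sorted(Instruction... edits) {
        List<Instruction> list = new ArrayList<>(Arrays.asList(edits));
        list.sort(URL_THEN_OPCODE);
        return list;
    }

    public static void main(String[] args) {
        List<Instruction> out = sorted(
            new Instruction("http://b.example/", ADD),
            new Instruction("http://a.example/", DEL),
            new Instruction("http://a.example/", ADD));
        // Prints: [http://a.example/#0, http://a.example/#1, http://b.example/#0]
        System.out.println(out);
    }
}
```

With an ordering like this, all pending edits for one URL end up adjacent, and within that group they appear in a fixed opcode order. An MD5-first comparator would follow the same pattern with the hash as the primary key.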



Copyright © 2006 The Apache Software Foundation