Package org.apache.nutch.tools

Interface Summary
PruneIndexTool.PruneChecker This interface can be used to implement additional checking on matching documents.
 

Class Summary
CrawlTool  
DistributedAnalysisTool DistributedAnalysisTool performs link-analysis by reading exclusively from an IWebDBReader, and writing to an IWebDBWriter.
FetchListTool This class takes an IWebDBReader, computes a relevant subset, and then emits the subset.
FetchListTool.SortableScore SortableScore is simply a WritableComparable float.
LinkAnalysisTool LinkAnalysisTool performs link-analysis by using the DistributedAnalysisTool.
ParseSegment Parse contents in one segment.
PruneIndexTool This tool prunes existing Nutch indexes of unwanted content.
PruneIndexTool.PrintFieldsChecker This checker's main function is to print selected field values from each document just before it is deleted.
PruneIndexTool.StoreUrlsChecker This checker's main function is to store the URL of each document to be deleted in a text file.
SegmentMergeTool This class cleans up accumulated segment data and merges it into a single segment (or optionally multiple segments) containing no duplicates.
SegmentMergeTool.SegmentMergeStatus  
UpdateDatabaseTool This class takes the output of the fetcher and updates the page and link DBs accordingly.
UpdateSegmentsFromDb Update scores and links in a set of segments from the current information in a web database.
UpdateSegmentsFromDb.BySegmentComparator Used internally only.
UpdateSegmentsFromDb.ByUrlComparator Used internally only.
UpdateSegmentsFromDb.SegmentPage Used internally only.
UpdateSegmentsFromDb.Update Used internally only.
WebDBAdminTool The WebDBAdminTool is for Nutch administrators who need special access to the webdb.
 



Copyright © 2006 The Apache Software Foundation