java.lang.Object org.apache.nutch.tools.DistributedAnalysisTool
DistributedAnalysisTool performs link analysis by reading exclusively from an IWebDBReader and writing to an IWebDBWriter. This tool can be used in phases via the command line to compute the LinkAnalysis score across many machines.

For a single iteration of LinkAnalysis, you must have:

1) An "initRound" step that writes down how the work should be divided. It requires the input "db" directory and outputs a "dist" directory, which must be made available to later steps.

2) As many simultaneous "computeRound" steps as you like, but this number must be determined in step 1. The steps may be run on different machines, on the same machine, or any mix of the two. Each step requires the "db" and "dist" directories (or copies) as inputs. Each run will output an "instructions file".

3) A "completeRound" step, which integrates the results of all the "computeRound" steps. It writes to a "db" directory. It assumes that all the instructions files have been gathered into a single "dist" input directory. If you're running everything on a single filesystem, this happens with no extra work. If not, you will have to gather the files by hand (or with a script).

For more iterations, repeat steps 1 - 3.
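The three phases described above can be sketched as a small driver. This is only an illustration of the call order, not the tool's own code: `RoundSteps` is a hypothetical interface that mirrors the `initRound`, `computeRound` and `completeRound` signatures listed in the Method Summary, and `RecordingSteps` is a stub that records calls so the sketch runs without Nutch on the classpath. The meaning of the boolean returned by `initRound`, the process id range of 0 to numProcesses - 1, and the "dist" and "scores" paths are all assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class LinkAnalysisRoundSketch {

    /** Hypothetical stand-in mirroring the three round methods in the Method Summary. */
    interface RoundSteps {
        boolean initRound(int numProcesses, File distDir) throws IOException;
        void computeRound(int processId, File distDir) throws IOException;
        void completeRound(File distDir, File scoreFile) throws IOException;
    }

    /** One iteration: init once, compute once per process, complete once. */
    static void runOneIteration(RoundSteps tool, int numProcesses,
                                File distDir, File scoreFile) throws IOException {
        // Assumption: a false return means the round could not be prepared.
        if (!tool.initRound(numProcesses, distDir)) {
            throw new IOException("initRound did not prepare " + distDir);
        }
        // Assumption: process ids run from 0 to numProcesses - 1.
        // In a real deployment each call could run on its own machine, in parallel.
        for (int processId = 0; processId < numProcesses; processId++) {
            tool.computeRound(processId, distDir);
        }
        // All instructions files must already be gathered in distDir here.
        tool.completeRound(distDir, scoreFile);
    }

    /** Stub that only records the call order, so the sketch runs without Nutch. */
    static final class RecordingSteps implements RoundSteps {
        final List<String> calls = new ArrayList<>();

        public boolean initRound(int numProcesses, File distDir) {
            calls.add("initRound(" + numProcesses + ")");
            return true;
        }

        public void computeRound(int processId, File distDir) {
            calls.add("computeRound(" + processId + ")");
        }

        public void completeRound(File distDir, File scoreFile) {
            calls.add("completeRound");
        }
    }

    /** Runs one iteration against the stub and returns the recorded call order. */
    static List<String> demo(int numProcesses) {
        RecordingSteps stub = new RecordingSteps();
        try {
            // "dist" and "scores" are placeholder paths, not names used by the tool.
            runOneIteration(stub, numProcesses, new File("dist"), new File("scores"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stub.calls;
    }

    public static void main(String[] args) {
        System.out.println(demo(3));
    }
}
```

The loop runs the compute steps one after another only to keep the sketch short. The point it illustrates is the ordering constraint: the number of processes is fixed by the init step, and the complete step runs only after every compute step has written its instructions file.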
Field Summary
static Logger | LOG
static long | OUTLINK_LIMIT
Constructor Summary
DistributedAnalysisTool(NutchFileSystem nfs, File dbDir) | Give the pagedb and linkdb files and their cache sizes
Method Summary
void | completeRound(File distDir, File scoreFile) | This method collates and executes all the instructions computed by the many executors of computeRound().
void | computeRound(int processId, File distDir) | This method is invoked by one of the many processes involved in LinkAnalysis.
boolean | initRound(int numProcesses, File distDir) | This method prepares the ground for a set of processes to distribute a round of LinkAnalysis work.
static void | main(String[] argv) | Kick off the link analysis.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final Logger LOG
public static final long OUTLINK_LIMIT
Constructor Detail
public DistributedAnalysisTool(NutchFileSystem nfs, File dbDir) throws IOException, FileNotFoundException
Method Detail
public boolean initRound(int numProcesses, File distDir) throws IOException
Throws: IOException
public void computeRound(int processId, File distDir) throws IOException
Throws: IOException
public void completeRound(File distDir, File scoreFile) throws IOException
Throws: IOException
public static void main(String[] argv) throws IOException
Throws: IOException