java.lang.Object
  org.apache.nutch.tools.ParseSegment
Parse contents in one segment.
It assumes that the given segment contains ./fetcher_output/, which is typically generated by a non-parsing fetcher run (i.e., the fetcher was started with option -noParsing).
Contents in one segment are parsed and saved in steps; the parse(), sort(), and save() methods in the Method Summary correspond to them.
In the end, ./fetcher/ should be identical to the one produced by a fetcher run WITHOUT option -noParsing.
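The parse, sort, and save flow can be modeled in a few lines. The following is only a minimal sketch under stated assumptions, not the Nutch implementation: the class `SegmentPipelineSketch`, its nested types `Parsed` and `Saved`, and the in-memory lists are all hypothetical stand-ins for ParserOutput, ParseData, ParseText, and the on-disk files. It illustrates why a sort step is needed: entries parsed by multiple threads finish in arbitrary order, so they are keyed by entry number and re-ordered before being split.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Illustrative in-memory model of the parse -> sort -> save flow; not Nutch code. */
final class SegmentPipelineSketch {

    /** Hypothetical stand-in for one ParserOutput record, keyed by entry number. */
    static final class Parsed {
        final int entry;
        final String data; // stand-in for ParseData
        final String text; // stand-in for ParseText
        Parsed(int entry, String data, String text) {
            this.entry = entry;
            this.data = data;
            this.text = text;
        }
    }

    /** Hypothetical stand-in for the two outputs written by the save step. */
    static final class Saved {
        final List<String> parseData = new ArrayList<>();
        final List<String> parseText = new ArrayList<>();
    }

    /** Step 1: "parse" each entry on a thread pool; completion order is arbitrary. */
    static List<Parsed> parse(List<String> contents, int threadCount) {
        List<Parsed> unsorted = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < contents.size(); i++) {
            final int entry = i;
            pool.execute(() -> {
                String raw = contents.get(entry);
                // Toy "parsing": record the raw length and the trimmed text.
                unsorted.add(new Parsed(entry, "length=" + raw.length(), raw.trim()));
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("parse step timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("parse step interrupted", e);
        }
        return unsorted;
    }

    /** Step 2: restore the original entry order. */
    static List<Parsed> sort(List<Parsed> unsorted) {
        List<Parsed> sorted = new ArrayList<>(unsorted);
        sorted.sort(Comparator.comparingInt((Parsed p) -> p.entry));
        return sorted;
    }

    /** Step 3: split each sorted record into its data part and its text part. */
    static Saved save(List<Parsed> sorted) {
        Saved out = new Saved();
        for (Parsed p : sorted) {
            out.parseData.add(p.data);
            out.parseText.add(p.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> contents = Arrays.asList(" first page ", "second page", " third ");
        Saved saved = save(sort(parse(contents, 3)));
        System.out.println(saved.parseData);
        System.out.println(saved.parseText);
    }
}
```

Keying each record by its entry number is what lets the final outputs line up entry for entry with the input, regardless of which thread finished first.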
By default, the intermediates ./parser.unsorted and ./parser.sorted are removed at the end; use option -noClean to keep them. In either case, ./fetcher_output/ is kept intact.
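The cleanup rule can be sketched as follows. This is a hedged illustration only: it uses `java.nio.file` on the local disk rather than NutchFileSystem, it assumes the two intermediates are plain files, and the class `IntermediateCleanupSketch` with its methods `cleanIntermediates` and `demoSegment` is hypothetical. Only the names parser.unsorted, parser.sorted, and fetcher_output come from the description above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Illustrative cleanup of intermediates in a segment directory; not Nutch code. */
final class IntermediateCleanupSketch {

    /** Intermediate names taken from the class description. */
    static final String[] INTERMEDIATES = {"parser.unsorted", "parser.sorted"};

    /**
     * Removes the intermediates when clean is true and returns how many were deleted.
     * With clean set to false (the -noClean case) nothing is touched.
     * fetcher_output is never touched in either case.
     */
    static int cleanIntermediates(Path segmentDir, boolean clean) {
        if (!clean) {
            return 0;
        }
        int removed = 0;
        try {
            for (String name : INTERMEDIATES) {
                if (Files.deleteIfExists(segmentDir.resolve(name))) {
                    removed++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return removed;
    }

    /** Builds a throwaway segment directory with both intermediates and fetcher_output. */
    static Path demoSegment() {
        try {
            Path dir = Files.createTempDirectory("segment");
            Files.createFile(dir.resolve("parser.unsorted"));
            Files.createFile(dir.resolve("parser.sorted"));
            Files.createDirectory(dir.resolve("fetcher_output"));
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = demoSegment();
        System.out.println("removed: " + cleanIntermediates(dir, true));
        System.out.println("fetcher_output kept: " + Files.exists(dir.resolve("fetcher_output")));
    }
}
```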
Check Fetcher.java and FetcherOutput.java for further discussion.
Field Summary
static Logger LOG
Constructor Summary
ParseSegment(NutchFileSystem nfs, String directory, boolean dryRun)
    ParseSegment constructor
Method Summary
static void main(String[] args)
    main method
void parse()
    Parse contents by multiple threads and save as unsorted ParserOutput.
void save()
    Split sorted ParserOutput into ParseData and ParseText, and generate new FetcherOutput with updated status.
void setClean(boolean clean)
    Set whether to clean up intermediates.
static void setLogLevel(Level level)
    Set the logging level.
void setThreadCount(int threadCount)
    Set the thread count.
void sort()
    Sort ParserOutput.
void status()
    Display the status of the parser run.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final Logger LOG
Constructor Detail
public ParseSegment(NutchFileSystem nfs, String directory, boolean dryRun) throws IOException
Method Detail
public void setThreadCount(int threadCount)
public static void setLogLevel(Level level)
public void setClean(boolean clean)
public void status()
public void parse() throws IOException, InterruptedException
    Throws: IOException, InterruptedException
public void sort() throws IOException
    Throws: IOException
public void save() throws IOException
    Throws: IOException
public static void main(String[] args) throws Exception
    Throws: Exception
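The method descriptions suggest a calling order: configure the instance, then parse(), sort(), and save(). The sketch below illustrates that order under stated assumptions. Because the real class needs a NutchFileSystem, it uses a hypothetical interface `SegmentStages` that copies only instance-method signatures listed on this page, plus a hypothetical recording stub; whether main(String[] args) calls the methods in exactly this sequence is an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative call order for the three stages; not Nutch code. */
final class ParseSegmentCallOrderSketch {

    /** Hypothetical interface copying instance-method signatures listed on this page. */
    interface SegmentStages {
        void setThreadCount(int threadCount);
        void setClean(boolean clean);
        void parse() throws IOException, InterruptedException;
        void sort() throws IOException;
        void save() throws IOException;
    }

    /** Hypothetical stub that records each call instead of touching a segment. */
    static final class RecordingStages implements SegmentStages {
        final List<String> calls = new ArrayList<>();
        public void setThreadCount(int threadCount) { calls.add("setThreadCount(" + threadCount + ")"); }
        public void setClean(boolean clean) { calls.add("setClean(" + clean + ")"); }
        public void parse() { calls.add("parse()"); }
        public void sort() { calls.add("sort()"); }
        public void save() { calls.add("save()"); }
    }

    /** Configures the stages, then runs them in the order implied by their descriptions. */
    static void run(SegmentStages stages, int threads, boolean clean)
            throws IOException, InterruptedException {
        stages.setThreadCount(threads);
        stages.setClean(clean);
        stages.parse(); // produces unsorted output
        stages.sort();  // orders that output
        stages.save();  // splits the sorted output
    }

    /** Runs the stub and returns the recorded calls, wrapping checked exceptions. */
    static List<String> recordedCalls(int threads, boolean clean) {
        RecordingStages stub = new RecordingStages();
        try {
            run(stub, threads, clean);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return stub.calls;
    }

    public static void main(String[] args) {
        System.out.println(recordedCalls(4, true));
    }
}
```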