java.lang.Object
  org.apache.nutch.segment.SegmentReader
This class holds together all data readers for an existing segment. Some convenience methods are also provided, to read from the segment and to reposition the current pointer.
Field Summary
ArrayFile.Reader contentReader
ArrayFile.Reader fetcherReader
long finished
    The time when fetching of this segment finished, as recorded in fetcher output data.
boolean isParsed
static Logger LOG
NutchFileSystem nfs
ArrayFile.Reader parseDataReader
ArrayFile.Reader parseTextReader
File segmentDir
long size
long started
    The time when fetching of this segment started, as recorded in fetcher output data.
Constructor Summary
SegmentReader(File dir)
    Open a segment for reading.
SegmentReader(File dir, boolean autoFix)
    Open a segment for reading.
SegmentReader(NutchFileSystem nfs, File dir)
    Open a segment for reading.
SegmentReader(NutchFileSystem nfs, File dir, boolean autoFix)
    Open a segment for reading.
SegmentReader(NutchFileSystem nfs, File dir, boolean withContent, boolean withParseText, boolean withParseData, boolean autoFix)
    Open a segment for reading.
Method Summary
void close()
    Close all readers.
void dump(boolean sorted, PrintStream output)
    Dump the segment's content in human-readable format.
static boolean fixSegment(NutchFileSystem nfs, File dir, boolean withContent, boolean withParseText, boolean withParseData, boolean dryrun)
    Attempt to fix a partially corrupted segment.
boolean get(long n, FetcherOutput fo, Content co, ParseText pt, ParseData pd)
    Get a specified entry from the segment.
static boolean isParsedSegment(NutchFileSystem nfs, File segdir)
long key()
    Return the current key position.
static void main(String[] args)
    Command-line wrapper.
boolean next(FetcherOutput fo, Content co, ParseText pt, ParseData pd)
    Read values from all open readers.
void reset()
    Reset all readers.
void seek(long n)
    Seek to a position in all readers.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Field Detail
public static final Logger LOG
public ArrayFile.Reader fetcherReader
public ArrayFile.Reader contentReader
public ArrayFile.Reader parseTextReader
public ArrayFile.Reader parseDataReader
public boolean isParsed
public long started
public long finished
public long size
public File segmentDir
public NutchFileSystem nfs
Constructor Detail
public SegmentReader(File dir) throws Exception
dir - directory containing segment data
Exception
public SegmentReader(NutchFileSystem nfs, File dir) throws Exception
nfs - filesystem
dir - directory containing segment data
Exception
public SegmentReader(File dir, boolean autoFix) throws Exception
dir - directory containing segment data
autoFix - if true, and the segment is corrupted, attempt to fix errors and try to open it again. If the segment is corrupted and autoFix is false, or the errors could not be corrected, an Exception is thrown.
Exception
public SegmentReader(NutchFileSystem nfs, File dir, boolean autoFix) throws Exception
nfs - filesystem
dir - directory containing segment data
autoFix - if true, and the segment is corrupted, attempt to fix errors and try to open it again. If the segment is corrupted and autoFix is false, or the errors could not be corrected, an Exception is thrown.
Exception
public SegmentReader(NutchFileSystem nfs, File dir, boolean withContent, boolean withParseText, boolean withParseData, boolean autoFix) throws Exception
If the segment was created with the no-parse option (see FetcherOutput.DIR_NAME_NP), withParseText and withParseData are automatically forced to false.
nfs - NutchFileSystem to use
dir - directory containing segment data
withContent - if true, read Content, otherwise ignore it
withParseText - if true, read ParseText, otherwise ignore it
withParseData - if true, read ParseData, otherwise ignore it
autoFix - if true, and the segment is corrupt, try to fix it automatically. If this parameter is false and the segment is corrupt, or if fixing was unsuccessful, an Exception is thrown.
Exception
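The autoFix behaviour described above amounts to "try to open; on failure, repair once and try again". The following is a minimal sketch of that control flow only. It does not use the Nutch classes: AutoFixSketch, Opener, Fixer and openWithAutoFix are hypothetical names introduced for illustration, and the real SegmentReader may order or report these steps differently.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the open/fix/retry flow; not part of Nutch.
class AutoFixSketch {

    /** Something that can be opened and may fail when the data is corrupt. */
    interface Opener<T> {
        T open() throws Exception;
    }

    /** A repair step; returns true if the repair succeeded. */
    interface Fixer {
        boolean fix() throws Exception;
    }

    static <T> T openWithAutoFix(Opener<T> opener, Fixer fixer, boolean autoFix) throws Exception {
        try {
            return opener.open();
        } catch (Exception first) {
            if (!autoFix) {
                throw first; // corrupt and the caller did not ask for repair
            }
            if (!fixer.fix()) {
                throw new Exception("segment is corrupt and could not be fixed", first);
            }
            return opener.open(); // second and final attempt; may still throw
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicBoolean repaired = new AtomicBoolean(false);
        // Simulated segment that only opens once it has been repaired.
        Opener<String> opener = () -> {
            if (!repaired.get()) {
                throw new Exception("corrupt");
            }
            return "opened";
        };
        Fixer fixer = () -> {
            repaired.set(true);
            return true;
        };
        System.out.println(openWithAutoFix(opener, fixer, true));
    }
}
```

With autoFix set to false the first failure is rethrown unchanged, which matches the "an Exception is thrown" wording in the parameter description.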
Method Detail
public static boolean isParsedSegment(NutchFileSystem nfs, File segdir) throws Exception
Exception
public static boolean fixSegment(NutchFileSystem nfs, File dir, boolean withContent, boolean withParseText, boolean withParseData, boolean dryrun)
Attempt to fix a partially corrupted segment. See the MapFile.fix(NutchFileSystem, File, Class, Class, boolean) method.
nfs - filesystem
dir - segment directory
withContent - if true, fix content, otherwise ignore it
withParseText - if true, fix parse_text, otherwise ignore it
withParseData - if true, fix parse_data, otherwise ignore it
dryrun - if true, only show what would be done without performing any actions
Returns true if the segment was fixed successfully, otherwise false.
public boolean get(long n, FetcherOutput fo, Content co, ParseText pt, ParseData pd) throws IOException
n - position of the entry
fo - storage for FetcherOutput data. Must not be null.
co - storage for Content data, or null.
pt - storage for ParseText data, or null.
pd - storage for ParseData data, or null.
IOException
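The parameter list of get suggests a caller-allocated, fill-in style: the caller passes holder objects, the reader populates them, and a null holder apparently means "do not load this part". The sketch below illustrates that pattern with hypothetical in-memory stand-ins (GetSketch, Meta, Body, Store). It is not the Nutch API, and the meaning of the boolean result (true when the entry exists) is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the "fill the supplied holders" pattern; not part of Nutch.
class GetSketch {

    /** Mandatory holder, playing the role of FetcherOutput. */
    static final class Meta {
        String url;
    }

    /** Optional holder, playing the role of Content, ParseText or ParseData. */
    static final class Body {
        String text;
    }

    static final class Store {
        private final List<String> urls;
        private final List<String> texts;
        int bodyReads = 0; // counts how often the optional part was actually loaded

        Store(List<String> urls, List<String> texts) {
            this.urls = urls;
            this.texts = texts;
        }

        /** Fill the holders for entry n; a null body holder skips that part. */
        boolean get(long n, Meta meta, Body body) {
            if (meta == null) {
                throw new IllegalArgumentException("meta must not be null");
            }
            if (n < 0 || n >= urls.size()) {
                return false; // assumption: a missing entry is reported as false
            }
            meta.url = urls.get((int) n);
            if (body != null) {
                body.text = texts.get((int) n);
                bodyReads++;
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Store store = new Store(
                Arrays.asList("http://a.example/", "http://b.example/"),
                Arrays.asList("page A", "page B"));
        Meta meta = new Meta();
        Body body = new Body();
        store.get(1, meta, body); // load both parts
        System.out.println(meta.url + " -> " + body.text);
        store.get(0, meta, null); // metadata only; the body is never touched
        System.out.println(meta.url + " (body reads so far: " + store.bodyReads + ")");
    }
}
```

Reusing the same holder objects across calls avoids allocating a new object per entry, which is one plausible reason for this calling convention.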
public boolean next(FetcherOutput fo, Content co, ParseText pt, ParseData pd) throws IOException
IOException
public void seek(long n) throws IOException
IOException
public long key()
public void reset() throws IOException
IOException
public void close()
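Taken together, next, seek, key and reset describe a cursor over the segment entries. The sketch below shows the usual way such a cursor is driven, using a hypothetical in-memory stand-in (CursorSketch, Entry, InMemoryReader) rather than the Nutch classes. The convention that key() reports the index of the entry the next call to next() will read is an assumption; the page does not state it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in; it mirrors only the method names, not Nutch itself.
class CursorSketch {

    /** Caller-allocated holder that next() fills in. */
    static final class Entry {
        String url;
    }

    static final class InMemoryReader {
        private final List<String> urls;
        private int pos = 0; // index of the entry the next call to next() will read

        InMemoryReader(List<String> urls) {
            this.urls = urls;
        }

        /** Read the entry at the current position and advance; false at the end. */
        boolean next(Entry out) {
            if (pos >= urls.size()) {
                return false;
            }
            out.url = urls.get(pos);
            pos++;
            return true;
        }

        /** Reposition the cursor; positions past the end are clamped to the end. */
        void seek(long n) {
            if (n < 0) {
                throw new IllegalArgumentException("negative position: " + n);
            }
            pos = (int) Math.min(n, urls.size());
        }

        long key() {
            return pos;
        }

        void reset() {
            pos = 0;
        }
    }

    /** Rewind, then drain the reader with a while (next(...)) loop. */
    static List<String> readAll(InMemoryReader reader) {
        List<String> seen = new ArrayList<>();
        Entry entry = new Entry(); // one holder reused for every record
        reader.reset();
        while (reader.next(entry)) {
            seen.add(entry.url);
        }
        return seen;
    }

    public static void main(String[] args) {
        InMemoryReader reader = new InMemoryReader(
                Arrays.asList("http://a.example/", "http://b.example/", "http://c.example/"));
        System.out.println(readAll(reader));
        reader.seek(1);
        Entry entry = new Entry();
        reader.next(entry);
        System.out.println(entry.url + " at key " + reader.key());
    }
}
```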
public void dump(boolean sorted, PrintStream output) throws Exception
sorted - if true, sort segment entries by URL (ascending). If false, output entries in the order they occur in the segment.
output - where to dump to
Exception
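The sorted flag selects between two output orders. The sketch below shows the difference on plain URL strings written to a PrintStream. DumpOrderSketch, orderForDump and the simplified dump method are hypothetical; how the real dump method sorts entries internally, and what it prints per entry, is not described on this page.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the two output orders; not part of Nutch.
class DumpOrderSketch {

    /** Return the URLs in the order a dump would print them. */
    static List<String> orderForDump(List<String> urlsInSegmentOrder, boolean sorted) {
        List<String> ordered = new ArrayList<>(urlsInSegmentOrder); // copy; input stays untouched
        if (sorted) {
            Collections.sort(ordered); // natural String order, i.e. ascending
        }
        return ordered;
    }

    /** Write one URL per line to the given stream. */
    static void dump(List<String> urlsInSegmentOrder, boolean sorted, PrintStream output) {
        for (String url : orderForDump(urlsInSegmentOrder, sorted)) {
            output.println(url);
        }
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://b.example/", "http://c.example/", "http://a.example/");
        dump(urls, false, System.out); // segment order
        dump(urls, true, System.out);  // ascending by URL
    }
}
```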
public static void main(String[] args) throws Exception
Exception