java.lang.Object
  org.apache.hadoop.mapred.FileInputFormat<K,V>
public abstract class FileInputFormat<K extends WritableComparable,V extends Writable>
A base class for file-based InputFormat.
FileInputFormat is the base class for all file-based InputFormats. It provides generic implementations of validateInput(JobConf) and getSplits(JobConf, int).
Implementations of FileInputFormat can also override the isSplitable(FileSystem, Path) method to ensure that input files are not split up and are processed as a whole by Mappers.
Field Summary | |
---|---|
static org.apache.commons.logging.Log | LOG |
Constructor Summary | |
---|---|
FileInputFormat() |
Method Summary | |
---|---|
protected long | computeSplitSize(long goalSize, long minSize, long blockSize) |
abstract RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) Get the RecordReader for the given InputSplit. |
InputSplit[] | getSplits(JobConf job, int numSplits) Splits files returned by listPaths(JobConf) when they're too big. |
protected boolean | isSplitable(FileSystem fs, Path filename) Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be. |
protected Path[] | listPaths(JobConf job) List input directories. |
protected void | setMinSplitSize(long minSplitSize) |
void | validateInput(JobConf job) Check for validity of the input-specification for the job. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final org.apache.commons.logging.Log LOG
Constructor Detail |
---|
public FileInputFormat()
Method Detail |
---|
protected void setMinSplitSize(long minSplitSize)
protected boolean isSplitable(FileSystem fs, Path filename)
Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be.
FileInputFormat implementations can override this and return false to ensure that individual input files are never split up, so that Mappers process entire files.
Parameters:
fs - the file system that the file is on
filename - the file name to check
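The override described above can be sketched without any Hadoop dependency. The classes below are hypothetical stand-ins, not the real `FileInputFormat`; in particular, `isSplitable` here takes a plain `String` rather than `(FileSystem, Path)`, and `countSplits` is an invented helper that only illustrates how the flag would influence the number of splits.

```java
// Hypothetical stand-in for the base class; NOT org.apache.hadoop.mapred.FileInputFormat.
abstract class FileFormatSketch {
    // Default mirrors the documented behaviour: files are usually splitable.
    protected boolean isSplitable(String filename) {
        return true;
    }

    // Invented helper: how many pieces a file of fileLength bytes would be cut into.
    // Assumes fileLength > 0 and splitSize > 0.
    final long countSplits(String filename, long fileLength, long splitSize) {
        if (!isSplitable(filename)) {
            return 1; // the whole file goes to a single Mapper
        }
        return (fileLength + splitSize - 1) / splitSize; // ceiling division
    }
}

// Keeps the default: files may be split.
class DefaultFormatSketch extends FileFormatSketch {
}

// Overrides isSplitable to return false so every file is processed as a whole.
class WholeFileFormatSketch extends FileFormatSketch {
    @Override
    protected boolean isSplitable(String filename) {
        return false;
    }
}
```

With a 250-byte file and a 100-byte split size, `DefaultFormatSketch` yields three pieces while `WholeFileFormatSketch` yields one.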
public abstract RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit.
It is the responsibility of the RecordReader to respect record boundaries while processing the logical split to present a record-oriented view to the individual task.
Specified by:
getRecordReader in interface InputFormat<K extends WritableComparable,V extends Writable>
Parameters:
split - the InputSplit
job - the job that this split belongs to
Returns:
a RecordReader
Throws:
IOException
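To illustrate what "respecting record boundaries" can mean, here is a minimal sketch for newline-delimited records held in a byte array. It is an assumption about one common strategy, not the code of any Hadoop RecordReader: a reader whose split does not start at offset 0 backs up one byte and discards everything through the next newline, and every reader finishes the last line it has started even if that line runs past the end of its split. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not a Hadoop class.
class LineSplitReaderSketch {
    // Returns the lines "owned" by the byte range [start, start + length) of data.
    static List<String> readLines(byte[] data, int start, int length) {
        int end = Math.min(start + length, data.length);
        int pos = start;
        if (start > 0) {
            // Back up one byte and skip through the next newline, so a line
            // that began in the previous split is left to the previous reader.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // first byte after the newline
        }
        List<String> records = new ArrayList<>();
        // A line belongs to this split if it starts before the split's end;
        // it is read in full even when it extends beyond that end.
        while (pos < end) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }
}
```

For the 17-byte input `"alpha\nbeta\ngamma\n"` cut into the ranges `[0, 8)` and `[8, 17)`, the first reader returns `alpha` and `beta`, and the second returns only `gamma`, so each record is seen exactly once even though the cut falls in the middle of `beta`.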
protected Path[] listPaths(JobConf job) throws IOException
List input directories.
Parameters:
job - the job to list input paths for
Throws:
IOException - if zero items.
public void validateInput(JobConf job) throws IOException
Description copied from interface: InputFormat
Check for validity of the input-specification for the job.
This method is used to validate the input directories when a job is submitted so that the JobClient can fail early, with a useful error message, in case of errors (for example, when an input directory does not exist).
Specified by:
validateInput in interface InputFormat<K extends WritableComparable,V extends Writable>
Parameters:
job - job configuration.
Throws:
InvalidInputException - if the job does not have valid input
IOException
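The fail-early idea can be sketched against the local file system with `java.nio.file`. This is only an illustration under stated assumptions: the class name, the `List<Path>` parameter, and the error messages are hypothetical, and a plain `IOException` stands in for `InvalidInputException`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Hadoop implementation.
class InputValidationSketch {
    // Checks every input directory up front and reports all problems at once,
    // so a misconfigured job is rejected before any work is scheduled.
    static void validateInput(List<Path> inputDirs) throws IOException {
        if (inputDirs.isEmpty()) {
            throw new IOException("No input paths specified");
        }
        List<String> problems = new ArrayList<>();
        for (Path dir : inputDirs) {
            if (!Files.exists(dir)) {
                problems.add("Input path does not exist: " + dir);
            }
        }
        if (!problems.isEmpty()) {
            throw new IOException(String.join("; ", problems));
        }
    }
}
```

Collecting every missing path before throwing gives the submitter one complete error message instead of one failure per resubmission.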
public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException
Splits files returned by listPaths(JobConf) when they're too big.
Specified by:
getSplits in interface InputFormat<K extends WritableComparable,V extends Writable>
Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
Returns:
an array of InputSplits for the job.
Throws:
IOException
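As a rough picture of "splitting files when they're too big", the sketch below carves one file into consecutive byte ranges of at most `splitSize` bytes. It is a simplification under assumptions: `Range` is a hypothetical stand-in for an InputSplit, empty files produce no ranges here, and the real implementation may handle the final chunk and block locations differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the Hadoop implementation.
class FileSplitSketch {
    // Stand-in for an InputSplit: the byte range [start, start + length) of one file.
    static final class Range {
        final long start;
        final long length;

        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }
    }

    // Cuts a file into ranges no longer than splitSize; a non-splitable file
    // (or a non-positive splitSize) yields a single range covering the whole file.
    static List<Range> split(long fileLength, long splitSize, boolean splitable) {
        List<Range> ranges = new ArrayList<>();
        if (fileLength <= 0) {
            return ranges; // assumption: empty files contribute no ranges
        }
        if (!splitable || splitSize <= 0) {
            ranges.add(new Range(0, fileLength));
            return ranges;
        }
        long offset = 0;
        while (offset < fileLength) {
            long len = Math.min(splitSize, fileLength - offset);
            ranges.add(new Range(offset, len));
            offset += len;
        }
        return ranges;
    }
}
```

A 250-byte splitable file with a 100-byte split size gives the ranges (0, 100), (100, 100) and (200, 50); the same file marked non-splitable gives the single range (0, 250).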
protected long computeSplitSize(long goalSize, long minSize, long blockSize)
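This page gives no description for computeSplitSize, so the following is a sketch under an assumption rather than a statement of the method's contract: the split size is taken to be the goal size capped at the block size, but never smaller than the minimum size. The `goalSize` helper, which divides the total input size by the requested number of splits, is likewise hypothetical.

```java
// Hypothetical sketch; the formula is an assumption, not taken from this page.
class SplitSizeSketch {
    // Goal size capped at the block size, with minSize as a lower bound.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Assumed derivation of the goal: total input bytes per requested split.
    // A request for zero splits is treated as a request for one.
    static long goalSize(long totalSize, int numSplits) {
        return totalSize / (numSplits == 0 ? 1 : numSplits);
    }
}
```

Under this assumption, 1 GiB of input with 4 requested splits has a goal of 256 MiB, which a 64 MiB block size caps at 64 MiB; raising the minimum above the block size would force larger splits.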