java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<K,V>
public abstract class FileInputFormat<K extends WritableComparable,V extends Writable>
A base class for file-based InputFormat.
FileInputFormat is the base class for all file-based
InputFormats. It provides generic implementations of
validateInput(JobConf) and getSplits(JobConf, int).
Implementations of FileInputFormat can also override the
isSplitable(FileSystem, Path) method to ensure input files are
not split up and are processed as a whole by Mappers.
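As a rough illustration of that division of labour, here is a minimal, self-contained sketch that does not use Hadoop at all. BaseFileFormatSketch, ChunkRange, planSplits, WholeFileFormatSketch and FormatSketchDemo are hypothetical names, files are reduced to a name and a length, and a fixed chunk size stands in for the real split-size calculation. The base class owns the generic splitting loop and consults an overridable isSplitable hook, so a subclass that returns false gets exactly one split per file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class mirroring the pattern described above: the generic
// splitting logic lives here, and subclasses only customise the hooks.
abstract class BaseFileFormatSketch {

    // Hypothetical split: a byte range [start, start + length) of one file.
    static final class ChunkRange {
        final String file;
        final long start;
        final long length;

        ChunkRange(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return file + "[" + start + ", " + (start + length) + ")";
        }
    }

    // Hook: by default every file may be cut into several chunks.
    protected boolean isSplitable(String filename) {
        return true;
    }

    // Generic splitting loop shared by all subclasses (fixed chunk size for brevity).
    List<ChunkRange> planSplits(String filename, long fileLength, long chunkSize) {
        List<ChunkRange> splits = new ArrayList<>();
        if (!isSplitable(filename)) {
            splits.add(new ChunkRange(filename, 0, fileLength)); // whole file, one split
            return splits;
        }
        for (long start = 0; start < fileLength; start += chunkSize) {
            splits.add(new ChunkRange(filename, start, Math.min(chunkSize, fileLength - start)));
        }
        return splits;
    }
}

// A subclass that never splits, so each mapper would see an entire file.
class WholeFileFormatSketch extends BaseFileFormatSketch {
    @Override
    protected boolean isSplitable(String filename) {
        return false;
    }
}

class FormatSketchDemo {
    public static void main(String[] args) {
        BaseFileFormatSketch splitting = new BaseFileFormatSketch() {}; // default hook
        BaseFileFormatSketch whole = new WholeFileFormatSketch();
        System.out.println(splitting.planSplits("part-0", 250, 100)); // three chunks
        System.out.println(whole.planSplits("part-0", 250, 100));     // one chunk
    }
}
```

Running FormatSketchDemo prints three chunks for the default format and a single 250-byte chunk for the whole-file subclass.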
| Field Summary | |
|---|---|
| static org.apache.commons.logging.Log | LOG |
| Constructor Summary | |
|---|---|
| FileInputFormat() | |
| Method Summary | | |
|---|---|---|
| protected long | computeSplitSize(long goalSize, long minSize, long blockSize) | |
| abstract RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) | Get the RecordReader for the given InputSplit. |
| InputSplit[] | getSplits(JobConf job, int numSplits) | Splits files returned by listPaths(JobConf) when they're too big. |
| protected boolean | isSplitable(FileSystem fs, Path filename) | Is the given file splittable? Usually true, but not if the file is stream-compressed. |
| protected Path[] | listPaths(JobConf job) | List input directories. |
| protected void | setMinSplitSize(long minSplitSize) | |
| void | validateInput(JobConf job) | Check for validity of the input-specification for the job. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final org.apache.commons.logging.Log LOG
| Constructor Detail |
|---|
public FileInputFormat()
| Method Detail |
|---|
protected void setMinSplitSize(long minSplitSize)
protected boolean isSplitable(FileSystem fs,
Path filename)
FileInputFormat implementations can override this and return
false to ensure that individual input files are never split up
so that Mappers process entire files.
Parameters:
fs - the file system that the file is on
filename - the file name to check
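The method summary notes that stream-compressed files are usually not splittable. A minimal sketch of that kind of check, assuming compression can be recognised from the filename suffix: CompressionAwareSplitCheck and its suffix list are hypothetical, and a real implementation would more likely consult the job's configured compression codecs than a hard-coded list.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical isSplitable-style check: a file is treated as splittable unless its
// name ends in a suffix that we ASSUME marks a stream-compressed file.
class CompressionAwareSplitCheck {

    // Assumed suffixes; not taken from Hadoop's configuration.
    private static final List<String> STREAM_COMPRESSED_SUFFIXES = List.of(".gz", ".deflate");

    static boolean isSplitable(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        for (String suffix : STREAM_COMPRESSED_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return false; // a compressed stream can only be decoded from its start
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSplitable("logs/day1.txt"));    // true
        System.out.println(isSplitable("logs/day2.txt.gz")); // false
    }
}
```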
public abstract RecordReader<K,V> getRecordReader(InputSplit split,
JobConf job,
Reporter reporter)
throws IOException
Description copied from interface: InputFormat
Get the RecordReader for the given InputSplit.
It is the responsibility of the RecordReader to respect
record boundaries while processing the logical split to present a
record-oriented view to the individual task.
Specified by:
getRecordReader in interface InputFormat<K extends WritableComparable,V extends Writable>
Parameters:
split - the InputSplit
job - the job that this split belongs to
Returns:
RecordReader
Throws:
IOException
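A common way for a reader of newline-terminated records to respect record boundaries is to let each split own every record that starts inside it: the reader skips the partial record at the front of its split (unless the split begins at offset 0 or right after a newline) and reads past the split's end to finish its last record. The sketch below applies that rule to an in-memory byte array; LineSplitReaderSketch and readLines are hypothetical names, not Hadoop's record reader API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: read the newline-terminated records owned by the split
// [start, end) of an in-memory "file". A record belongs to the split in which it starts.
class LineSplitReaderSketch {

    static List<String> readLines(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Back up one byte and skip to just past the next newline. If the byte before
            // start is already a newline, this lands exactly on start; otherwise it skips
            // the tail of a record owned by the previous split.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++;
        }
        List<String> records = new ArrayList<>();
        // Read every record that starts before end, even if it runs past end.
        while (pos < end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1; // step over the newline (or past end of data)
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        // The boundary at offset 8 falls inside "beta"; the first split finishes it.
        System.out.println(readLines(data, 0, 8));           // [alpha, beta]
        System.out.println(readLines(data, 8, data.length)); // [gamma, delta]
    }
}
```

Because the two splits in main meet inside "beta", the first split finishes that record and the second skips it, so every line is read exactly once.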
protected Path[] listPaths(JobConf job)
throws IOException
List input directories.
Parameters:
job - the job to list input paths for
Throws:
IOException - if zero items.
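The detail above only says that listPaths lists the job's input paths and fails when there are none. A minimal local-filesystem analogue, assuming each input directory is expanded into the regular files it directly contains: ListInputPathsSketch is hypothetical, takes the directories as a list instead of reading them from a JobConf, and uses java.nio.file rather than Hadoop's FileSystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical local-filesystem analogue of listPaths: expand each input directory
// into the regular files it directly contains, and fail if nothing is found.
class ListInputPathsSketch {

    static List<Path> listPaths(List<Path> inputDirs) throws IOException {
        List<Path> result = new ArrayList<>();
        for (Path dir : inputDirs) {
            // Files.list also throws an IOException if dir does not exist.
            try (Stream<Path> entries = Files.list(dir)) {
                entries.filter(Files::isRegularFile).sorted().forEach(result::add);
            }
        }
        if (result.isEmpty()) {
            throw new IOException("No input paths found in " + inputDirs); // "if zero items"
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("input"); // left behind for inspection
        Files.writeString(dir.resolve("part-00000"), "hello\n");
        System.out.println(listPaths(List.of(dir))); // one file
    }
}
```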
public void validateInput(JobConf job)
throws IOException
Description copied from interface: InputFormat
Check for validity of the input-specification for the job.
This method is used to validate the input directories when a job is
submitted so that the JobClient can fail early, with a useful
error message, in case of errors, for example when an input directory does not exist.
Specified by:
validateInput in interface InputFormat<K extends WritableComparable,V extends Writable>
Parameters:
job - job configuration.
Throws:
InvalidInputException - if the job does not have valid input
IOException
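Failing early is most helpful when the user sees every problem at once. This sketch checks that each input directory exists and reports all missing ones in a single exception. It runs on the local filesystem via java.nio.file; InputValidationSketch and MultiInputException are hypothetical, the latter standing in for InvalidInputException, whose constructor is not documented on this page.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for InvalidInputException: carries every problem found.
class MultiInputException extends IOException {
    final List<String> problems;

    MultiInputException(List<String> problems) {
        super(String.join("; ", problems));
        this.problems = problems;
    }
}

class InputValidationSketch {

    // Check all input directories before any work starts, collecting every error
    // so the user sees the complete list in one message.
    static void validateInput(List<Path> inputDirs) throws MultiInputException {
        List<String> problems = new ArrayList<>();
        for (Path dir : inputDirs) {
            if (!Files.exists(dir)) {
                problems.add("Input path does not exist: " + dir);
            }
        }
        if (!problems.isEmpty()) {
            throw new MultiInputException(problems);
        }
    }

    public static void main(String[] args) throws IOException {
        Path good = Files.createTempDirectory("in");
        Path missing = good.resolve("no-such-dir");
        try {
            validateInput(List.of(good, missing));
        } catch (MultiInputException e) {
            System.out.println(e.getMessage()); // names only the missing directory
        }
    }
}
```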
public InputSplit[] getSplits(JobConf job,
int numSplits)
throws IOException
Splits files returned by listPaths(JobConf) when
they're too big.
Specified by:
getSplits in interface InputFormat<K extends WritableComparable,V extends Writable>
Parameters:
job - job configuration.
numSplits - the desired number of splits, a hint.
Returns:
an array of InputSplits for the job.
Throws:
IOException
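Because numSplits is only a hint, a getSplits implementation has to turn it into a split size. One plausible scheme, sketched below: divide the total input size by numSplits to get a goal size, clamp it with the rule max(minSize, min(goalSize, blockSize)), then cut every file into chunks of that size. The clamping rule is an assumption about computeSplitSize, files are modelled only by their lengths and all are treated as splittable, and SplitPlannerSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of turning the numSplits hint into byte-range splits.
class SplitPlannerSketch {

    // Each split is {fileIndex, start, length}.
    static List<long[]> getSplits(long[] fileLengths, int numSplits, long blockSize, long minSize) {
        long totalSize = 0;
        for (long len : fileLengths) {
            totalSize += len;
        }
        // The hint only sets a target size; 0 is treated as 1 to avoid dividing by zero.
        long goalSize = totalSize / Math.max(numSplits, 1);
        // Assumed clamping rule; minSize is forced to at least 1 so the loop always advances.
        long splitSize = Math.max(Math.max(minSize, 1), Math.min(goalSize, blockSize));

        List<long[]> splits = new ArrayList<>();
        for (int f = 0; f < fileLengths.length; f++) {
            for (long start = 0; start < fileLengths[f]; start += splitSize) {
                long length = Math.min(splitSize, fileLengths[f] - start);
                splits.add(new long[] {f, start, length});
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] files = {300 * mb, 100 * mb};
        // 400 MB requested as 2 splits: goal 200 MB, capped at the 64 MB block size.
        List<long[]> splits = getSplits(files, 2, 64 * mb, 1);
        System.out.println(splits.size() + " splits"); // 7 splits
    }
}
```

With a 64 MB block size, asking for 2 splits of 400 MB still yields 7 splits (5 for the first file, 2 for the second), because no split may exceed a block.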
protected long computeSplitSize(long goalSize,
long minSize,
long blockSize)
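This page gives no description for computeSplitSize. The rule commonly attributed to the old mapred FileInputFormat is max(minSize, min(goalSize, blockSize)): aim for goalSize, but never exceed one block and never drop below minSize. Treat that as an assumption to check against the source of your Hadoop version; ComputeSplitSizeSketch below is a hypothetical standalone version of the rule.

```java
// Sketch of the split-size rule ASSUMED for computeSplitSize (not confirmed by this page):
// aim for goalSize, but never exceed one block and never fall below minSize.
class ComputeSplitSizeSketch {

    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(computeSplitSize(200 * mb, 1, 64 * mb) / mb);      // 64: capped by block size
        System.out.println(computeSplitSize(10 * mb, 1, 64 * mb) / mb);       // 10: goal is below a block
        System.out.println(computeSplitSize(10 * mb, 32 * mb, 64 * mb) / mb); // 32: raised to minSize
    }
}
```

Under this rule a minSize larger than the block size produces splits that span several blocks.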