java.lang.Object
  org.apache.hadoop.conf.Configuration
    org.apache.hadoop.mapred.JobConf
public class JobConf
A map/reduce job configuration.

JobConf is the primary interface for a user to describe a map-reduce job to the Hadoop framework for execution. The framework tries to faithfully execute the job as described by JobConf; however, while some job parameters are straightforward to set (e.g. setNumReduceTasks(int)), other parameters interact subtly with the rest of the framework and/or the job configuration and are relatively more complex for the user to control finely (e.g. setNumMapTasks(int)).
JobConf typically specifies the Mapper, combiner (if any), Partitioner, Reducer, InputFormat and OutputFormat implementations to be used. It also indicates the set of input files (setInputPath(Path)/addInputPath(Path)) and where the output files should be written (setOutputPath(Path)).

Optionally, JobConf is used to specify other advanced facets of the job, such as the Comparators to be used, files to be put in the DistributedCache, whether or not intermediate and/or job outputs are to be compressed (and how), and debuggability via user-provided scripts (setMapDebugScript(String)/setReduceDebugScript(String)) for doing post-processing on task logs, the task's stdout, stderr and syslog.
Here is an example of how to configure a job via JobConf:

```java
// Create a new JobConf
JobConf job = new JobConf(new Configuration(), MyJob.class);

// Specify various job-specific parameters
job.setJobName("myjob");

job.setInputPath(new Path("in"));
job.setOutputPath(new Path("out"));

job.setMapperClass(MyJob.MyMapper.class);
job.setCombinerClass(MyJob.MyReducer.class);
job.setReducerClass(MyJob.MyReducer.class);

job.setInputFormat(SequenceFileInputFormat.class);
job.setOutputFormat(SequenceFileOutputFormat.class);
```
See also: JobClient, ClusterStatus, Tool, DistributedCache
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.apache.hadoop.conf.Configuration |
---|
Configuration.IntegerRanges |
Constructor Summary | |
---|---|
JobConf()
Construct a map/reduce job configuration. |
|
JobConf(Class exampleClass)
Construct a map/reduce job configuration. |
|
JobConf(Configuration conf)
Construct a map/reduce job configuration. |
|
JobConf(Configuration conf,
Class exampleClass)
Construct a map/reduce job configuration. |
|
JobConf(Path config)
Construct a map/reduce configuration. |
|
JobConf(String config)
Construct a map/reduce configuration. |
Method Summary | |
---|---|
void |
addInputPath(Path dir)
Add a Path to the list of inputs for the map-reduce job. |
void |
deleteLocalFiles()
|
void |
deleteLocalFiles(String subdir)
|
Class<? extends Reducer> |
getCombinerClass()
Get the user-defined combiner class used to combine map-outputs before being sent to the reducers. |
boolean |
getCompressMapOutput()
Are the outputs of the maps to be compressed? |
InputFormat |
getInputFormat()
Get the InputFormat implementation for the map-reduce job,
defaults to TextInputFormat if not specified explicitly. |
Class |
getInputKeyClass()
Deprecated. Call RecordReader.createKey() . |
Path[] |
getInputPaths()
Get the list of input Path s for the map-reduce job. |
Class |
getInputValueClass()
Deprecated. Call RecordReader.createValue() . |
String |
getJar()
Get the user jar for the map-reduce job. |
String |
getJobEndNotificationURI()
Get the uri to be invoked in-order to send a notification after the job has completed (success/failure). |
String |
getJobName()
Get the user-specified job name. |
JobPriority |
getJobPriority()
Get the JobPriority for this job. |
boolean |
getKeepFailedTaskFiles()
Should the temporary files for failed tasks be kept? |
String |
getKeepTaskFilesPattern()
Get the regular expression that is matched against the task names to see if we need to keep the files. |
String[] |
getLocalDirs()
|
Path |
getLocalPath(String pathString)
Constructs a local file name. |
String |
getMapDebugScript()
Get the map task's debug script. |
SequenceFile.CompressionType |
getMapOutputCompressionType()
Get the SequenceFile.CompressionType for the map outputs. |
Class<? extends CompressionCodec> |
getMapOutputCompressorClass(Class<? extends CompressionCodec> defaultValue)
Get the CompressionCodec for compressing the map outputs. |
Class<? extends WritableComparable> |
getMapOutputKeyClass()
Get the key class for the map output data. |
Class<? extends Writable> |
getMapOutputValueClass()
Get the value class for the map output data. |
Class<? extends Mapper> |
getMapperClass()
Get the Mapper class for the job. |
Class<? extends MapRunnable> |
getMapRunnerClass()
Get the MapRunnable class for the job. |
boolean |
getMapSpeculativeExecution()
Should speculative execution be used for this job for map tasks? Defaults to true . |
int |
getMaxMapAttempts()
Get the configured number of maximum attempts that will be made to run a map task, as specified by the mapred.map.max.attempts
property. |
int |
getMaxMapTaskFailuresPercent()
Get the maximum percentage of map tasks that can fail without the job being aborted. |
int |
getMaxReduceAttempts()
Get the configured number of maximum attempts that will be made to run a reduce task, as specified by the mapred.reduce.max.attempts
property. |
int |
getMaxReduceTaskFailuresPercent()
Get the maximum percentage of reduce tasks that can fail without the job being aborted. |
int |
getMaxTaskFailuresPerTracker()
Expert: Get the maximum number of failures of a given job per tasktracker. |
int |
getNumMapTasks()
Get the configured number of map tasks for this job. |
int |
getNumReduceTasks()
Get the configured number of reduce tasks for this job. |
OutputFormat |
getOutputFormat()
Get the OutputFormat implementation for the map-reduce job,
defaults to TextOutputFormat if not specified explicitly. |
Class<? extends WritableComparable> |
getOutputKeyClass()
Get the key class for the job output data. |
WritableComparator |
getOutputKeyComparator()
Get the WritableComparable comparator used to compare keys. |
Path |
getOutputPath()
Get the Path to the output directory for the map-reduce job. |
Class<? extends Writable> |
getOutputValueClass()
Get the value class for job outputs. |
WritableComparator |
getOutputValueGroupingComparator()
Get the user defined WritableComparable comparator for
grouping keys of inputs to the reduce. |
Class<? extends Partitioner> |
getPartitionerClass()
Get the Partitioner used to partition Mapper -outputs
to be sent to the Reducer s. |
boolean |
getProfileEnabled()
Get whether the task profiling is enabled. |
Configuration.IntegerRanges |
getProfileTaskRange(boolean isMap)
Get the range of maps or reduces to profile. |
String |
getReduceDebugScript()
Get the reduce task's debug script. |
Class<? extends Reducer> |
getReducerClass()
Get the Reducer class for the job. |
boolean |
getReduceSpeculativeExecution()
Should speculative execution be used for this job for reduce tasks? Defaults to true . |
String |
getSessionId()
Get the user-specified session identifier. |
boolean |
getSpeculativeExecution()
Deprecated. Use getMapSpeculativeExecution() or
getReduceSpeculativeExecution() instead.
Should speculative execution be used for this job?
Defaults to true. |
Path |
getSystemDir()
Get the system directory where job-specific files are to be placed. |
String |
getUser()
Get the reported username for this job. |
Path |
getWorkingDirectory()
Get the current working directory for the default file system. |
void |
setCombinerClass(Class<? extends Reducer> theClass)
Set the user-defined combiner class used to combine map-outputs before being sent to the reducers. |
void |
setCompressMapOutput(boolean compress)
Should the map outputs be compressed before transfer? Uses the SequenceFile compression. |
void |
setInputFormat(Class<? extends InputFormat> theClass)
Set the InputFormat implementation for the map-reduce job. |
void |
setInputKeyClass(Class theClass)
Deprecated. Not used |
void |
setInputPath(Path dir)
Set the Path of the input directory for the map-reduce job. |
void |
setInputValueClass(Class theClass)
Deprecated. Not used |
void |
setJar(String jar)
Set the user jar for the map-reduce job. |
void |
setJarByClass(Class cls)
Set the job's jar file by finding an example class location. |
void |
setJobEndNotificationURI(String uri)
Set the uri to be invoked in-order to send a notification after the job has completed (success/failure). |
void |
setJobName(String name)
Set the user-specified job name. |
void |
setJobPriority(JobPriority prio)
Set JobPriority for this job. |
void |
setKeepFailedTaskFiles(boolean keep)
Set whether the framework should keep the intermediate files for failed tasks. |
void |
setKeepTaskFilesPattern(String pattern)
Set a regular expression for task names that should be kept. |
void |
setMapDebugScript(String mDbgScript)
Set the debug script to run when the map tasks fail. |
void |
setMapOutputCompressionType(SequenceFile.CompressionType style)
Set the SequenceFile.CompressionType for the map outputs. |
void |
setMapOutputCompressorClass(Class<? extends CompressionCodec> codecClass)
Set the given class as the CompressionCodec for the map outputs. |
void |
setMapOutputKeyClass(Class<? extends WritableComparable> theClass)
Set the key class for the map output data. |
void |
setMapOutputValueClass(Class<? extends Writable> theClass)
Set the value class for the map output data. |
void |
setMapperClass(Class<? extends Mapper> theClass)
Set the Mapper class for the job. |
void |
setMapRunnerClass(Class<? extends MapRunnable> theClass)
Expert: Set the MapRunnable class for the job. |
void |
setMapSpeculativeExecution(boolean speculativeExecution)
Turn speculative execution on or off for this job for map tasks. |
void |
setMaxMapAttempts(int n)
Expert: Set the number of maximum attempts that will be made to run a map task. |
void |
setMaxMapTaskFailuresPercent(int percent)
Expert: Set the maximum percentage of map tasks that can fail without the job being aborted. |
void |
setMaxReduceAttempts(int n)
Expert: Set the number of maximum attempts that will be made to run a reduce task. |
void |
setMaxReduceTaskFailuresPercent(int percent)
Set the maximum percentage of reduce tasks that can fail without the job being aborted. |
void |
setMaxTaskFailuresPerTracker(int noFailures)
Set the maximum number of failures of a given job per tasktracker. |
void |
setNumMapTasks(int n)
Set the number of map tasks for this job. |
void |
setNumReduceTasks(int n)
Set the requisite number of reduce tasks for this job. |
void |
setOutputFormat(Class<? extends OutputFormat> theClass)
Set the OutputFormat implementation for the map-reduce job. |
void |
setOutputKeyClass(Class<? extends WritableComparable> theClass)
Set the key class for the job output data. |
void |
setOutputKeyComparatorClass(Class<? extends WritableComparator> theClass)
Set the WritableComparable comparator used to compare keys. |
void |
setOutputPath(Path dir)
Set the Path of the output directory for the map-reduce job. |
void |
setOutputValueClass(Class<? extends Writable> theClass)
Set the value class for job outputs. |
void |
setOutputValueGroupingComparator(Class theClass)
Set the user defined WritableComparable comparator for
grouping keys in the input to the reduce. |
void |
setPartitionerClass(Class<? extends Partitioner> theClass)
Set the Partitioner class used to partition
Mapper -outputs to be sent to the Reducer s. |
void |
setProfileEnabled(boolean newValue)
Set whether the system should collect profiler information for some of the tasks in this job. The information is stored in the user log directory. |
void |
setProfileTaskRange(boolean isMap,
String newValue)
Set the ranges of maps or reduces to profile. |
void |
setReduceDebugScript(String rDbgScript)
Set the debug script to run when the reduce tasks fail. |
void |
setReducerClass(Class<? extends Reducer> theClass)
Set the Reducer class for the job. |
void |
setReduceSpeculativeExecution(boolean speculativeExecution)
Turn speculative execution on or off for this job for reduce tasks. |
void |
setSessionId(String sessionId)
Set the user-specified session identifier. |
void |
setSpeculativeExecution(boolean speculativeExecution)
Deprecated. Use setMapSpeculativeExecution(boolean) or
setReduceSpeculativeExecution(boolean) instead.
Turn speculative execution on or off for this job. |
void |
setUser(String user)
Set the reported username for this job. |
void |
setWorkingDirectory(Path dir)
Set the current working directory for the default file system. |
Methods inherited from class org.apache.hadoop.conf.Configuration |
---|
addResource, addResource, addResource, entries, get, get, get, getBoolean, getClass, getClass, getClassByName, getClassLoader, getConfResourceAsInputStream, getConfResourceAsReader, getFile, getFloat, getInt, getLocalPath, getLong, getObject, getRange, getRaw, getResource, getStrings, iterator, main, set, set, setBoolean, setClass, setClassLoader, setInt, setLong, setObject, setQuietMode, toString, write |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail

public JobConf()
Construct a map/reduce job configuration.

public JobConf(Class exampleClass)
Construct a map/reduce job configuration.
exampleClass - a class whose containing jar is used as the job's jar.

public JobConf(Configuration conf)
Construct a map/reduce job configuration.
conf - a Configuration whose settings will be inherited.

public JobConf(Configuration conf, Class exampleClass)
Construct a map/reduce job configuration.
conf - a Configuration whose settings will be inherited.
exampleClass - a class whose containing jar is used as the job's jar.

public JobConf(String config)
Construct a map/reduce configuration.
config - a Configuration-format XML job description file.

public JobConf(Path config)
Construct a map/reduce configuration.
config - a Configuration-format XML job description file.

Method Detail
public String getJar()
Get the user jar for the map-reduce job.

public void setJar(String jar)
Set the user jar for the map-reduce job.
jar - the user jar for the map-reduce job.

public void setJarByClass(Class cls)
Set the job's jar file by finding an example class location.
cls - the example class.

public Path getSystemDir()
Get the system directory where job-specific files are to be placed.

public String[] getLocalDirs() throws IOException
Throws: IOException

public void deleteLocalFiles() throws IOException
Throws: IOException

public void deleteLocalFiles(String subdir) throws IOException
Throws: IOException

public Path getLocalPath(String pathString) throws IOException
Constructs a local file name.
Throws: IOException
public void setInputPath(Path dir)
Set the Path of the input directory for the map-reduce job.
dir - the Path of the input directory for the map-reduce job.

public void addInputPath(Path dir)
Add a Path to the list of inputs for the map-reduce job.
dir - Path to be added to the list of inputs for the map-reduce job.

public Path[] getInputPaths()
Get the list of input Paths for the map-reduce job.

public String getUser()
Get the reported username for this job.

public void setUser(String user)
Set the reported username for this job.
user - the username for this job.

public void setKeepFailedTaskFiles(boolean keep)
Set whether the framework should keep the intermediate files for failed tasks.
keep - true if the framework should keep the intermediate files for failed tasks, false otherwise.

public boolean getKeepFailedTaskFiles()
Should the temporary files for failed tasks be kept?

public void setKeepTaskFilesPattern(String pattern)
Set a regular expression for task names that should be kept.
pattern - the java.util.regex.Pattern to match against the task names.

public String getKeepTaskFilesPattern()
Get the regular expression that is matched against the task names to see if we need to keep the files.

public void setWorkingDirectory(Path dir)
Set the current working directory for the default file system.
dir - the new current working directory.

public Path getWorkingDirectory()
Get the current working directory for the default file system.
public Path getOutputPath()
Get the Path to the output directory for the map-reduce job.

Some applications need to create/write-to side-files, which differ from the actual job-outputs.

In such cases there could be issues with two instances of the same TIP (running simultaneously, e.g. speculative tasks) trying to open or write to the same file (path) on HDFS. Hence the application-writer has to pick unique names per task-attempt (e.g. using the taskid, say task_200709221812_0001_m_000000_0), not just per TIP.

To get around this, the Map-Reduce framework helps the application-writer by maintaining a special ${mapred.output.dir}/_${taskid} sub-directory on HDFS for each task-attempt, where the output of the task-attempt goes. On successful completion of the task-attempt, the files in ${mapred.output.dir}/_${taskid} (only) are promoted to ${mapred.output.dir}. Of course, the framework discards the sub-directories of unsuccessful task-attempts. This is completely transparent to the application.

The application-writer can take advantage of this by creating any side-files required in ${mapred.output.dir} during execution of a task, i.e. via getOutputPath(), and the framework will move them out similarly; thus one doesn't have to pick unique paths per task-attempt.

Note: the value of ${mapred.output.dir} during execution of a particular task-attempt is actually ${mapred.output.dir}/_${taskid}, not the value set by setOutputPath(Path). So, just create any side-files in the path returned by getOutputPath() from the map/reduce task to take advantage of this feature.

The entire discussion holds true for maps of jobs with reducer=NONE (i.e. 0 reduces), since the output of the map, in that case, goes directly to HDFS.
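The promote-on-success behavior described above can be illustrated with a small, self-contained sketch. This is plain Java NIO standing in for HDFS, and the directory layout below mimics the contract only; it is not the framework's actual implementation:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SideFilePromotion {
    // Simulate promoting a successful task-attempt's files from
    // ${mapred.output.dir}/_${taskid} up into ${mapred.output.dir}.
    static void promote(Path outputDir, String taskId) throws IOException {
        Path attemptDir = outputDir.resolve("_" + taskId);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(attemptDir)) {
            for (Path f : files) {
                Files.move(f, outputDir.resolve(f.getFileName()));
            }
        }
        Files.delete(attemptDir); // discard the now-empty attempt directory
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("mapred-output-dir");
        String taskId = "task_200709221812_0001_m_000000_0";

        // During execution the task writes its side-file into the
        // attempt-scoped sub-directory (what getOutputPath() resolves to).
        Path attemptDir = Files.createDirectory(out.resolve("_" + taskId));
        Files.write(attemptDir.resolve("side-file.txt"), "side data".getBytes());

        // On successful completion the framework promotes the files.
        promote(out, taskId);

        System.out.println(Files.exists(out.resolve("side-file.txt"))); // true
        System.out.println(Files.exists(out.resolve("_" + taskId)));    // false
    }
}
```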
Returns: the Path to the output directory for the map-reduce job.

public void setOutputPath(Path dir)
Set the Path of the output directory for the map-reduce job.
dir - the Path of the output directory for the map-reduce job.

public InputFormat getInputFormat()
Get the InputFormat implementation for the map-reduce job, defaults to TextInputFormat if not specified explicitly.
Returns: the InputFormat implementation for the map-reduce job.

public void setInputFormat(Class<? extends InputFormat> theClass)
Set the InputFormat implementation for the map-reduce job.
theClass - the InputFormat implementation for the map-reduce job.

public OutputFormat getOutputFormat()
Get the OutputFormat implementation for the map-reduce job, defaults to TextOutputFormat if not specified explicitly.
Returns: the OutputFormat implementation for the map-reduce job.

public void setOutputFormat(Class<? extends OutputFormat> theClass)
Set the OutputFormat implementation for the map-reduce job.
theClass - the OutputFormat implementation for the map-reduce job.

public Class getInputKeyClass()
Deprecated. Call RecordReader.createKey().

public void setInputKeyClass(Class theClass)
Deprecated. Not used.

public Class getInputValueClass()
Deprecated. Call RecordReader.createValue().

public void setInputValueClass(Class theClass)
Deprecated. Not used.
public void setCompressMapOutput(boolean compress)
Should the map outputs be compressed before transfer? Uses the SequenceFile compression.
compress - should the map outputs be compressed?

public boolean getCompressMapOutput()
Are the outputs of the maps to be compressed?
Returns: true if the outputs of the maps are to be compressed, false otherwise.

public void setMapOutputCompressionType(SequenceFile.CompressionType style)
Set the SequenceFile.CompressionType for the map outputs.
style - the SequenceFile.CompressionType to control how the map outputs are compressed.

public SequenceFile.CompressionType getMapOutputCompressionType()
Get the SequenceFile.CompressionType for the map outputs.
Returns: the SequenceFile.CompressionType for map outputs, defaulting to SequenceFile.CompressionType.RECORD.

public void setMapOutputCompressorClass(Class<? extends CompressionCodec> codecClass)
Set the given class as the CompressionCodec for the map outputs.
codecClass - the CompressionCodec class that will compress the map outputs.

public Class<? extends CompressionCodec> getMapOutputCompressorClass(Class<? extends CompressionCodec> defaultValue)
Get the CompressionCodec for compressing the map outputs.
defaultValue - the CompressionCodec to return if not set
Returns: the CompressionCodec class that should be used to compress the map outputs.
Throws: IllegalArgumentException - if the class was specified, but not found

public Class<? extends WritableComparable> getMapOutputKeyClass()
Get the key class for the map output data.

public void setMapOutputKeyClass(Class<? extends WritableComparable> theClass)
Set the key class for the map output data.
theClass - the map output key class.

public Class<? extends Writable> getMapOutputValueClass()
Get the value class for the map output data.

public void setMapOutputValueClass(Class<? extends Writable> theClass)
Set the value class for the map output data.
theClass - the map output value class.

public Class<? extends WritableComparable> getOutputKeyClass()
Get the key class for the job output data.

public void setOutputKeyClass(Class<? extends WritableComparable> theClass)
Set the key class for the job output data.
theClass - the key class for the job output data.

public WritableComparator getOutputKeyComparator()
Get the WritableComparable comparator used to compare keys.

public void setOutputKeyComparatorClass(Class<? extends WritableComparator> theClass)
Set the WritableComparable comparator used to compare keys.
theClass - the WritableComparable comparator used to compare keys.
See also: setOutputValueGroupingComparator(Class)
public WritableComparator getOutputValueGroupingComparator()
Get the user defined WritableComparable comparator for grouping keys of inputs to the reduce. See setOutputValueGroupingComparator(Class) for details.

public void setOutputValueGroupingComparator(Class theClass)
Set the user defined WritableComparable comparator for grouping keys in the input to the reduce.

This comparator should be provided if the equivalence rules for keys for sorting the intermediates are different from those for grouping keys before each call to Reducer.reduce(WritableComparable, java.util.Iterator, OutputCollector, Reporter).

For key-value pairs (K1,V1) and (K2,V2), the values (V1, V2) are passed in a single call to the reduce function if K1 and K2 compare as equal.

Since setOutputKeyComparatorClass(Class) can be used to control how keys are sorted, this can be used in conjunction to simulate secondary sort on values.

Note: This is not a guarantee of the reduce sort being stable in any sense. (In any case, with the order of available map-outputs to the reduce being non-deterministic, it wouldn't make that much sense.)

theClass - the comparator class to be used for grouping keys. It should extend WritableComparator.
See also: setOutputKeyComparatorClass(Class)
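To make the sort-vs-group distinction concrete, here is a plain-Java sketch: ordinary Comparators stand in for the WritableComparators, and the composite (word, count) key is a hypothetical example, not anything the framework defines:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SecondarySortSketch {
    // Composite key: the "natural" key plus a secondary value to sort by.
    record Key(String word, int count) {}

    // Sort comparator (the role of setOutputKeyComparatorClass):
    // by word ascending, then by count descending.
    static final Comparator<Key> SORT =
            Comparator.comparing(Key::word)
                      .thenComparing(Comparator.comparingInt(Key::count).reversed());

    // Grouping comparator (the role of setOutputValueGroupingComparator):
    // keys with the same word fall into one reduce group.
    static final Comparator<Key> GROUP = Comparator.comparing(Key::word);

    // Sort with SORT, then start a new group wherever GROUP sees a changed
    // key; this mirrors how the framework feeds values to Reducer.reduce().
    static List<List<Key>> reduceGroups(List<Key> input) {
        List<Key> keys = new ArrayList<>(input);
        keys.sort(SORT);
        List<List<Key>> groups = new ArrayList<>();
        for (Key k : keys) {
            if (groups.isEmpty()
                    || GROUP.compare(groups.get(groups.size() - 1).get(0), k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Key>> g = reduceGroups(List.of(
                new Key("b", 1), new Key("a", 2), new Key("a", 7), new Key("b", 5)));
        // Two groups (one per word); within each group the counts arrive in
        // descending order, so the first value of each group is the maximum.
        System.out.println(g);
    }
}
```

Because the grouping comparator ignores the count, each word's values reach the "reducer" in the order imposed by the sort comparator: that is the secondary sort.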
public Class<? extends Writable> getOutputValueClass()
Get the value class for job outputs.

public void setOutputValueClass(Class<? extends Writable> theClass)
Set the value class for job outputs.
theClass - the value class for job outputs.

public Class<? extends Mapper> getMapperClass()
Get the Mapper class for the job.

public void setMapperClass(Class<? extends Mapper> theClass)
Set the Mapper class for the job.
theClass - the Mapper class for the job.

public Class<? extends MapRunnable> getMapRunnerClass()
Get the MapRunnable class for the job.

public void setMapRunnerClass(Class<? extends MapRunnable> theClass)
Expert: Set the MapRunnable class for the job. Typically used to exert greater control on Mappers.
theClass - the MapRunnable class for the job.

public Class<? extends Partitioner> getPartitionerClass()
Get the Partitioner used to partition Mapper-outputs to be sent to the Reducers.
Returns: the Partitioner used to partition map-outputs.

public void setPartitionerClass(Class<? extends Partitioner> theClass)
Set the Partitioner class used to partition Mapper-outputs to be sent to the Reducers.
theClass - the Partitioner used to partition map-outputs.

public Class<? extends Reducer> getReducerClass()
Get the Reducer class for the job.

public void setReducerClass(Class<? extends Reducer> theClass)
Set the Reducer class for the job.
theClass - the Reducer class for the job.

public Class<? extends Reducer> getCombinerClass()
Get the user-defined combiner class used to combine map-outputs before being sent to the reducers. Typically the combiner is the same as the Reducer for the job, i.e. getReducerClass().

public void setCombinerClass(Class<? extends Reducer> theClass)
Set the user-defined combiner class used to combine map-outputs before being sent to the reducers.

The combiner is a task-level aggregation operation which, in some cases, helps to cut down the amount of data transferred from the Mapper to the Reducer, leading to better performance.

Typically the combiner is the same as the Reducer for the job, i.e. setReducerClass(Class).

theClass - the user-defined combiner class used to combine map-outputs.

public boolean getSpeculativeExecution()
Deprecated. Use getMapSpeculativeExecution() or getReduceSpeculativeExecution() instead.
Should speculative execution be used for this job? Defaults to true.
Returns: true if speculative execution is to be used for this job, false otherwise.

public void setSpeculativeExecution(boolean speculativeExecution)
Deprecated. Use setMapSpeculativeExecution(boolean) or setReduceSpeculativeExecution(boolean) instead.
Turn speculative execution on or off for this job.
speculativeExecution - true if speculative execution should be turned on, else false.

public boolean getMapSpeculativeExecution()
Should speculative execution be used for this job for map tasks? Defaults to true.
Returns: true if speculative execution is to be used for map tasks for this job, false otherwise.

public void setMapSpeculativeExecution(boolean speculativeExecution)
Turn speculative execution on or off for this job for map tasks.
speculativeExecution - true if speculative execution should be turned on for map tasks, else false.

public boolean getReduceSpeculativeExecution()
Should speculative execution be used for this job for reduce tasks? Defaults to true.
Returns: true if speculative execution is to be used for reduce tasks for this job, false otherwise.

public void setReduceSpeculativeExecution(boolean speculativeExecution)
Turn speculative execution on or off for this job for reduce tasks.
speculativeExecution - true if speculative execution should be turned on for reduce tasks, else false.

public int getNumMapTasks()
Get the configured number of map tasks for this job. Defaults to 1.

public void setNumMapTasks(int n)
Set the number of map tasks for this job.

Note: This is only a hint to the framework. The actual number of spawned map tasks depends on the number of InputSplits generated by the job's InputFormat.getSplits(JobConf, int). A custom InputFormat is typically used to accurately control the number of map tasks for the job.

The number of maps is usually driven by the total size of the inputs, i.e. the total number of blocks of the input files.

The right level of parallelism for maps seems to be around 10-100 maps per node, although it has been set up to 300 or so for very cpu-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.

The default behavior of file-based InputFormats is to split the input into logical InputSplits based on the total size, in bytes, of the input files. However, the FileSystem blocksize of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size.

Thus, if you expect 10TB of input data and have a blocksize of 128MB, you'll end up with 82,000 maps, unless setNumMapTasks(int) is used to set it even higher.
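The arithmetic behind that figure can be checked directly. This is a hedged sketch of the default file-based split sizing as described above (block size as upper bound, mapred.min.split.size as lower bound); the helper is illustrative and not the framework's actual FileInputFormat code:

```java
public class MapCountEstimate {
    // Effective split size under the defaults described above: the block
    // size caps the split, and minSplitSize can only push it larger.
    static long splitSize(long blockSize, long minSplitSize) {
        return Math.max(minSplitSize, blockSize);
    }

    // One map task per split, rounding the last partial split up.
    static long numSplits(long totalBytes, long blockSize, long minSplitSize) {
        long split = splitSize(blockSize, minSplitSize);
        return (totalBytes + split - 1) / split; // ceiling division
    }

    public static void main(String[] args) {
        long tenTB = 10L * 1024 * 1024 * 1024 * 1024; // 10 TB of input
        long block = 128L * 1024 * 1024;              // 128 MB blocksize
        // 10 TB / 128 MB = 81,920 maps -- the "82,000" quoted above.
        System.out.println(numSplits(tenTB, block, 1));
    }
}
```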
n - the number of map tasks for this job.
See also: InputFormat.getSplits(JobConf, int), FileInputFormat, FileSystem.getDefaultBlockSize(), FileStatus.getBlockSize()
public int getNumReduceTasks()
Get the configured number of reduce tasks for this job. Defaults to 1.

public void setNumReduceTasks(int n)
Set the requisite number of reduce tasks for this job.

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum).

With 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75 the faster nodes will finish their first round of reduces and launch a second wave of reduces, doing a much better job of load balancing.

Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures.

The scaling factors above are slightly less than whole numbers to reserve a few reduce slots in the framework for speculative tasks, failures, etc.
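As a sketch, the heuristic above translates to the following; the node and slot counts are made-up inputs for illustration:

```java
public class ReduceCountEstimate {
    // factor * (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum)
    static int reduces(double factor, int nodes, int reduceSlotsPerNode) {
        return (int) (factor * nodes * reduceSlotsPerNode);
    }

    public static void main(String[] args) {
        int nodes = 20, slotsPerNode = 2; // hypothetical cluster

        // Single wave: all reduces launch immediately as maps finish.
        System.out.println(reduces(0.95, nodes, slotsPerNode)); // 38

        // Two waves: faster nodes pick up a second round, which
        // load-balances better across heterogeneous nodes.
        System.out.println(reduces(1.75, nodes, slotsPerNode)); // 70
    }
}
```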
It is legal to set the number of reduce-tasks to zero. In this case the output of the map-tasks goes directly to the distributed file-system, to the path set by setOutputPath(Path). Also, the framework doesn't sort the map-outputs before writing them out to HDFS.

n - the number of reduce tasks for this job.

public int getMaxMapAttempts()
Get the configured number of maximum attempts that will be made to run a map task, as specified by the mapred.map.max.attempts property. If this property is not already set, the default is 4 attempts.

public void setMaxMapAttempts(int n)
Expert: Set the number of maximum attempts that will be made to run a map task.
n - the number of attempts per map task.

public int getMaxReduceAttempts()
Get the configured number of maximum attempts that will be made to run a reduce task, as specified by the mapred.reduce.max.attempts property. If this property is not already set, the default is 4 attempts.

public void setMaxReduceAttempts(int n)
Expert: Set the number of maximum attempts that will be made to run a reduce task.
n - the number of attempts per reduce task.

public String getJobName()
Get the user-specified job name.

public void setJobName(String name)
Set the user-specified job name.
name - the job's new name.

public String getSessionId()
Get the user-specified session identifier.

public void setSessionId(String sessionId)
Set the user-specified session identifier.
sessionId - the new session id.

public void setMaxTaskFailuresPerTracker(int noFailures)
Set the maximum number of failures of a given job per tasktracker. If the number of task failures exceeds noFailures, the tasktracker is blacklisted for this job.
noFailures - maximum no. of failures of a given job per tasktracker.

public int getMaxTaskFailuresPerTracker()
Expert: Get the maximum number of failures of a given job per tasktracker.

public int getMaxMapTaskFailuresPercent()
Get the maximum percentage of map tasks that can fail without the job being aborted; a map task is declared as failed only after getMaxMapAttempts() attempts. Defaults to zero, i.e. any failed map-task results in the job being declared as JobStatus.FAILED.

public void setMaxMapTaskFailuresPercent(int percent)
Expert: Set the maximum percentage of map tasks that can fail without the job being aborted; a map task is declared as failed only after getMaxMapAttempts() attempts.
percent - the maximum percentage of map tasks that can fail without the job being aborted.

public int getMaxReduceTaskFailuresPercent()
Get the maximum percentage of reduce tasks that can fail without the job being aborted; a reduce task is declared as failed only after getMaxReduceAttempts() attempts. Defaults to zero, i.e. any failed reduce-task results in the job being declared as JobStatus.FAILED.

public void setMaxReduceTaskFailuresPercent(int percent)
Set the maximum percentage of reduce tasks that can fail without the job being aborted; a reduce task is declared as failed only after getMaxReduceAttempts() attempts.
percent - the maximum percentage of reduce tasks that can fail without the job being aborted.

public void setJobPriority(JobPriority prio)
Set JobPriority for this job.
prio - the JobPriority for this job.

public JobPriority getJobPriority()
Get the JobPriority for this job.

public boolean getProfileEnabled()
Get whether the task profiling is enabled.

public void setProfileEnabled(boolean newValue)
Set whether the system should collect profiler information for some of the tasks in this job. The information is stored in the user log directory.
newValue - true means it should be gathered

public Configuration.IntegerRanges getProfileTaskRange(boolean isMap)
Get the range of maps or reduces to profile.
isMap - is the task a map?

public void setProfileTaskRange(boolean isMap, String newValue)
Set the ranges of maps or reduces to profile.
newValue - a set of integer ranges of the map ids

public void setMapDebugScript(String mDbgScript)
Set the debug script to run when the map tasks fail.
The debug script can aid debugging of failed map tasks. The script is given the task's stdout, stderr, syslog and jobconf files as arguments.

The debug command, run on the node where the map failed, is:
$script $stdout $stderr $syslog $jobconf

The script file is distributed through the DistributedCache APIs. The script needs to be symlinked.

Here is an example of how to submit a script:

```java
job.setMapDebugScript("./myscript");
DistributedCache.createSymlink(job);
DistributedCache.addCacheFile("/debug/scripts/myscript#myscript");
```

mDbgScript - the script name

public String getMapDebugScript()
Get the map task's debug script.
See also: setMapDebugScript(String)
public void setReduceDebugScript(String rDbgScript)
Set the debug script to run when the reduce tasks fail.

The debug script can aid debugging of failed reduce tasks. The script is given the task's stdout, stderr, syslog and jobconf files as arguments.

The debug command, run on the node where the reduce failed, is:
$script $stdout $stderr $syslog $jobconf

The script file is distributed through the DistributedCache APIs. The script file needs to be symlinked.

Here is an example of how to submit a script:

```java
job.setReduceDebugScript("./myscript");
DistributedCache.createSymlink(job);
DistributedCache.addCacheFile("/debug/scripts/myscript#myscript");
```

rDbgScript - the script name

public String getReduceDebugScript()
Get the reduce task's debug script.
See also: setReduceDebugScript(String)
public String getJobEndNotificationURI()
Get the uri to be invoked in order to send a notification after the job has completed (success/failure).
Returns: the job end notification uri, null if it hasn't been set.
See also: setJobEndNotificationURI(String)

public void setJobEndNotificationURI(String uri)
Set the uri to be invoked in order to send a notification after the job has completed (success/failure).

The uri can contain 2 special parameters: $jobId and $jobStatus. Those, if present, are replaced by the job's identifier and completion-status respectively.

This is typically used by application-writers to implement chaining of Map-Reduce jobs in an asynchronous manner.
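The parameter substitution works like this simple sketch; the URI, job id, and status values are hypothetical, and plain string replacement stands in for whatever the framework actually does internally:

```java
public class JobEndNotification {
    // Replace the two special parameters the framework recognizes.
    static String expand(String uri, String jobId, String jobStatus) {
        return uri.replace("$jobId", jobId).replace("$jobStatus", jobStatus);
    }

    public static void main(String[] args) {
        String uri = "http://myhost:8080/jobdone?id=$jobId&status=$jobStatus";
        System.out.println(expand(uri, "job_200709221812_0001", "SUCCEEDED"));
        // -> http://myhost:8080/jobdone?id=job_200709221812_0001&status=SUCCEEDED
    }
}
```

A chained driver could poll or serve this URI and submit the next job once a SUCCEEDED status arrives.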
uri - the job end notification uri
See also: JobStatus, Job Completion and Chaining