This section describes the properties to be set in mapred-site.xml.
Property | Description | Default value | Value to be set |
---|---|---|---|
io.sort.factor | Number of segments (intermediate results) merged at a time when Map output is written to disk | 10 | 50 |
io.sort.mb | Size (MB) of the memory buffer that holds Map task output | 100 | The lesser of the following values: |
mapred.child.java.opts | Launch options passed to the JVM that executes Map/Reduce tasks | -Xmx200m | -server -Xmx$$m -Djava.net.preferIPv4Stack=TRUE, where $$ = ( physicalMemorySizeInMb - 2048 ) / ( mapred.tasktracker.map.tasks.maximum + mapred.tasktracker.reduce.tasks.maximum ) |
mapred.child.ulimit | Maximum size (KB) of the process (address) space for Map/Reduce tasks | None | 0 (unlimited) |
mapred.compress.map.output | Whether to compress Map task output | false | true |
mapred.local.dir | Directory for storing intermediate files of MapReduce jobs | ${hadoop.tmp.dir}/mapred/local | /var/lib/hadoop/mapred/local |
mapred.max.tracker.failures | Maximum number of retries on the same TaskTracker when Map/Reduce tasks fail | 4 | 40 |
mapred.reduce.parallel.copies | Number of Map results fetched in parallel from other TaskTrackers by a TaskTracker that executes Reduce tasks | 5 | 20 |
mapred.reduce.tasks | Default number of Reduce tasks run within a MapReduce job | 1 | mapred.tasktracker.reduce.tasks.maximum * |
mapred.task.tracker.http.address | Address and port number of the TaskTracker HTTP server | 0.0.0.0:50060 | 0.0.0.0:50060 (reset) |
mapred.tasktracker.map.tasks.maximum | Number of Map tasks executed simultaneously by a single TaskTracker | 2 | The greater of the following values: |
mapred.tasktracker.reduce.tasks.maximum | Number of Reduce tasks executed simultaneously by a single TaskTracker | 2 | The greater of the following values: |
mapred.userlog.limit.kb | Maximum size (KB) of the userlog output by a task | 0 | 1024 |
mapred.userlog.retain.hours | Time (hours) a userlog is retained after a job completes | 24 (1 day) | 168 (1 week) (*1) |
mapreduce.history.server.embedded | Whether to run the job history server embedded in the JobTracker JVM rather than in a dedicated JVM | None | true |
mapreduce.tasktracker.group | Group to which the TaskTracker process belongs | None | hadoop |
mapreduce.tasktracker.outofband.heartbeat | Whether to send an out-of-band heartbeat to the JobTracker when a task completes | false | false (reset) |
*1: Specify a value within the range that HADOOP_LOG_DIR can accommodate.
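
As a rough illustration, some of the fixed values from the table could be written in mapred-site.xml as sketched below. Only properties whose values are fully specified in the table are shown. The heap size in mapred.child.java.opts is not a recommended value: it assumes a hypothetical node with 16384 MB of physical memory, 8 simultaneous Map tasks, and 4 simultaneous Reduce tasks, and it assumes the result of the formula is rounded down.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Merge up to 50 segments at a time -->
  <property>
    <name>io.sort.factor</name>
    <value>50</value>
  </property>

  <!-- Hypothetical sizing, not from the table:
       (16384 - 2048) / (8 + 4) = 1194 (rounded down, an assumption) -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-server -Xmx1194m -Djava.net.preferIPv4Stack=TRUE</value>
  </property>

  <!-- Compress Map task output -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
  </property>

  <!-- Intermediate file storage directory -->
  <property>
    <name>mapred.local.dir</name>
    <value>/var/lib/hadoop/mapred/local</value>
  </property>

  <!-- Fetch Map results with 20 parallel copies -->
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>20</value>
  </property>

  <!-- Retain userlogs for 168 hours (1 week); see note *1 -->
  <property>
    <name>mapred.userlog.retain.hours</name>
    <value>168</value>
  </property>
</configuration>
```

The remaining properties in the table follow the same name/value pattern. Properties whose values depend on the node or cluster (such as io.sort.mb and the two tasks.maximum settings) have to be calculated for the actual environment before they are added.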