Interstage Big Data Parallel Processing Server V1.0.1 User's Guide
FUJITSU Software

C.3 mapred-site.xml

This section describes the properties to be set in mapred-site.xml.

Property

Default value/Value to be set

io.sort.factor

Number of segments (Map processing results written to the disk) that are merged at one time when sorting

Default value

10

Value to be set

50

io.sort.mb

Size of memory (MB) for retaining output results of Map tasks

Default value

100

Value to be set

The lesser of the following values:

  • 2047

  • (value of $$ in mapred.child.java.opts) * 0.5
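As a worked example of the rule above, the following Python sketch picks the io.sort.mb value for a given task JVM heap size. The function name io_sort_mb and the choice to round down to a whole number of MB are assumptions made for illustration; they are not part of the product.

```python
def io_sort_mb(child_heap_mb):
    """Return the lesser of 2047 and half of the task JVM heap size (MB).

    child_heap_mb is the $$ value used in -Xmx$$m of mapred.child.java.opts.
    Rounding down to a whole number of MB is an assumption of this sketch.
    """
    return min(2047, child_heap_mb // 2)


print(io_sort_mb(1024))  # 512  (half of the heap is the smaller value)
print(io_sort_mb(8192))  # 2047 (the 2047 MB cap is the smaller value)
```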

mapred.child.java.opts

Launch option specified in the JVM that executes Map/Reduce tasks

Default value

-Xmx200m

Value to be set

-server -Xmx$$m -Djava.net.preferIPv4Stack=TRUE

where $$ (the maximum heap size in MB) = ( physicalMemorySizeInMb - 2048 ) / ( mapred.tasktracker.map.tasks.maximum + mapred.tasktracker.reduce.tasks.maximum )
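The calculation of $$ can be sketched in Python as follows. The function names child_heap_mb and child_java_opts, the rounding down to a whole number of MB, and the sample server (32768 MB of physical memory, 7 Map slots, 7 Reduce slots) are assumptions made for illustration only.

```python
def child_heap_mb(physical_memory_mb, map_slots, reduce_slots):
    """Heap size (MB) per task JVM.

    Sets aside 2048 MB of physical memory and divides the remainder evenly
    among the Map and Reduce task slots of one TaskTracker.
    Rounding down is an assumption of this sketch.
    """
    return (physical_memory_mb - 2048) // (map_slots + reduce_slots)


def child_java_opts(physical_memory_mb, map_slots, reduce_slots):
    """Build the mapred.child.java.opts string using the calculated heap size."""
    heap = child_heap_mb(physical_memory_mb, map_slots, reduce_slots)
    return f"-server -Xmx{heap}m -Djava.net.preferIPv4Stack=TRUE"


# (32768 - 2048) / (7 + 7) = 2194.28..., rounded down to 2194
print(child_java_opts(32768, 7, 7))
# -server -Xmx2194m -Djava.net.preferIPv4Stack=TRUE
```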

mapred.child.ulimit

Maximum size (KB) of the virtual memory (address space) of the processes that execute Map/Reduce tasks

Default value

None

Value to be set

0 (unlimited)

mapred.compress.map.output

Whether to compress Map task output results

Default value

false

Value to be set

true

mapred.local.dir

Intermediate file storage directory for MapReduce jobs

Default value

${hadoop.tmp.dir}/mapred/local

Value to be set

/var/lib/hadoop/mapred/local

mapred.max.tracker.failures

Maximum number of retries within the same TaskTracker when Map/Reduce tasks fail

Default value

4

Value to be set

40

mapred.reduce.parallel.copies

Number of parallel transfers used by a TaskTracker executing Reduce tasks to obtain Map results from other TaskTrackers

Default value

5

Value to be set

20

mapred.reduce.tasks

Maximum number of Reduce tasks operated within a MapReduce job

Default value

1

Value to be set

mapred.tasktracker.reduce.tasks.maximum * numberOfSlaveServers

mapred.task.tracker.http.address

Address and port number of the TaskTracker HTTP server

Default value

0.0.0.0:50060

Value to be set

0.0.0.0:50060 (reset)

mapred.tasktracker.map.tasks.maximum

Number of Map tasks executed simultaneously by a single TaskTracker

Default value

2

Value to be set

The greater of the following values:

  • numberOfCpuCores - 1

  • totalNumberOfPhysicalDisksComprisingThePdfs / numberOfSlaveServers (rounded up to a whole number)
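The rule above can be sketched in Python as follows. The function name map_tasks_maximum and the sample figures (8 CPU cores, 5 slave servers, 10 or 48 physical disks) are assumptions made for illustration only.

```python
import math


def map_tasks_maximum(cpu_cores, pdfs_physical_disks, slave_servers):
    """Return the greater of (CPU cores - 1) and the number of PDFS physical
    disks per slave server, rounded up to a whole number."""
    disks_per_server = math.ceil(pdfs_physical_disks / slave_servers)
    return max(cpu_cores - 1, disks_per_server)


# 10 disks / 5 servers = 2, so the CPU-based value (8 - 1 = 7) is greater
print(map_tasks_maximum(8, 10, 5))  # 7

# 48 disks / 5 servers = 9.6, rounded up to 10, which exceeds 7
print(map_tasks_maximum(8, 48, 5))  # 10
```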

mapred.tasktracker.reduce.tasks.maximum

Number of Reduce tasks executed simultaneously by a single TaskTracker

Default value

2

Value to be set

The greater of the following values:

  • numberOfCpuCores - 1

  • totalNumberOfLunsComprisingThePdfs / numberOfSlaveServers (rounded up to a whole number)
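As a worked example, the following Python sketch calculates the number of Reduce slots per TaskTracker and then derives mapred.reduce.tasks from it, using the product given for that property earlier in this table. The function names and the sample figures (8 CPU cores, 20 LUNs, 5 slave servers) are assumptions made for illustration only.

```python
import math


def reduce_tasks_maximum(cpu_cores, pdfs_luns, slave_servers):
    """Return the greater of (CPU cores - 1) and the number of PDFS LUNs
    per slave server, rounded up to a whole number."""
    luns_per_server = math.ceil(pdfs_luns / slave_servers)
    return max(cpu_cores - 1, luns_per_server)


def reduce_tasks(reduce_slots_per_tracker, slave_servers):
    """mapred.reduce.tasks = Reduce slots per TaskTracker * slave servers."""
    return reduce_slots_per_tracker * slave_servers


slots = reduce_tasks_maximum(8, 20, 5)  # max(8 - 1, 20 / 5) = max(7, 4)
print(slots)                   # 7
print(reduce_tasks(slots, 5))  # 35
```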

mapred.userlog.limit.kb

Maximum size (KB) of a userlog output by a task

Default value

0

Value to be set

1024

mapred.userlog.retain.hours

Time (hours) a userlog is retained after a job is completed

Default value

24 (1 day)

Value to be set

168 (1 week) (*1)

mapreduce.history.server.embedded

Whether to operate the job history server within the JobTracker JVM (true) or to launch a JVM dedicated to job histories (false)

Default value

None

Value to be set

true

mapreduce.tasktracker.group

Group to which the TaskTracker process belongs

Default value

None

Value to be set

hadoop

mapreduce.tasktracker.outofband.heartbeat

Whether the TaskTracker sends an additional (out-of-band) heartbeat to the JobTracker as soon as a task is completed, without waiting for the next regular heartbeat

Default value

false

Value to be set

false (reset)

*1: Specify a retention time for which the retained logs fit within the capacity available to HADOOP_LOG_DIR.
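To illustrate how values from this table are written in mapred-site.xml, the following Python sketch renders a few of the fixed "Value to be set" entries in the configuration/property/name/value layout used by Hadoop configuration files. The function name render_mapred_site is an assumption made for illustration, and only a subset of the properties is shown.

```python
from xml.sax.saxutils import escape


def render_mapred_site(properties):
    """Render property name/value pairs as Hadoop configuration XML."""
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


# A subset of the fixed values listed in the table above
xml_text = render_mapred_site({
    "io.sort.factor": "50",
    "mapred.compress.map.output": "true",
    "mapred.local.dir": "/var/lib/hadoop/mapred/local",
})
print(xml_text)
```

Values that depend on the hardware, such as mapred.child.java.opts, would need to be calculated for each environment before being added in the same way.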