Interstage Big Data Parallel Processing Server V1.0.1 User's Guide
FUJITSU Software

C.4 pdfs-site.xml

This section describes the properties to be set in pdfs-site.xml.

| Property | Description | Value to be set |
|---|---|---|
| pdfs.fs.local.basedir | DFS mount directory path | Path joining BDPP_PDFS_MOUNTPOINT and BDPP_HADOOP_TOP_DIR in bdpp.conf. Example: if bdpp.conf contains BDPP_PDFS_MOUNTPOINT=/mnt/pdfs and BDPP_HADOOP_TOP_DIR=/hadoop, this path is /mnt/pdfs/hadoop. |
| pdfs.fs.local.homedir | Home directory path for users in the DFS FileSystem class | /user |
| pdfs.security.authorization | Whether to use the MapReduce job user authentication of the DFS | true |
| pdfs.fs.local.buffer.size | Default buffer size (bytes) used during Read/Write | 524288 (512 KB) (*1) |
| pdfs.fs.local.block.size | Data size (bytes) into which Map tasks are split for MapReduce jobs | 268435456 (256 MB) |
| pdfs.fs.local.posix.umask | Whether to reflect the process umask value in access permissions set when creating a file or directory | true (*2) |
| pdfs.fs.local.cache.location | Whether to use the cache local MapReduce feature | true |
| pdfs.fs.local.cache.minsize | Size (bytes) of files excluded from the cache local MapReduce feature | 1048576 (1 MB) |
| pdfs.fs.local.cache.procs | Number of multiplex executions when the cache local MapReduce feature obtains the memory cache information | 40 |

*1: Use the greater of this value and the io.file.buffer.size value set in "C.2 core-site.xml".

*2: true: uses the process umask value (POSIX-compatible). false: does not use the umask value (HDFS-compatible).
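
As a minimal sketch of how the values above fit together, the following Python script derives pdfs.fs.local.basedir from bdpp.conf, applies note *1 to the buffer size, and renders the properties as XML. It assumes that pdfs-site.xml uses the standard Hadoop configuration layout (a configuration element containing property entries with name and value children). The helper names and the io.file.buffer.size value of 131072 are illustrative assumptions, not taken from this guide.

```python
import posixpath
import xml.etree.ElementTree as ET


def parse_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def basedir(conf):
    """Join BDPP_PDFS_MOUNTPOINT and BDPP_HADOOP_TOP_DIR into one path."""
    mount = conf["BDPP_PDFS_MOUNTPOINT"].rstrip("/") or "/"
    top = conf["BDPP_HADOOP_TOP_DIR"].strip("/")
    return posixpath.join(mount, top)


def build_properties(conf, io_file_buffer_size):
    """Collect the values from the table in this section.

    io_file_buffer_size is the io.file.buffer.size value from core-site.xml.
    Note *1 is read here as: take the greater of 524288 and that value.
    """
    return {
        "pdfs.fs.local.basedir": basedir(conf),
        "pdfs.fs.local.homedir": "/user",
        "pdfs.security.authorization": "true",
        "pdfs.fs.local.buffer.size": str(max(524288, io_file_buffer_size)),
        "pdfs.fs.local.block.size": "268435456",
        "pdfs.fs.local.posix.umask": "true",
        "pdfs.fs.local.cache.location": "true",
        "pdfs.fs.local.cache.minsize": "1048576",
        "pdfs.fs.local.cache.procs": "40",
    }


def render_site_xml(properties):
    """Render properties in the Hadoop-style layout (assumed for this file)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    bdpp_conf = "BDPP_PDFS_MOUNTPOINT=/mnt/pdfs\nBDPP_HADOOP_TOP_DIR=/hadoop\n"
    conf = parse_conf(bdpp_conf)
    # 131072 is a placeholder for the io.file.buffer.size value actually in use.
    print(render_site_xml(build_properties(conf, io_file_buffer_size=131072)))
```

With the bdpp.conf settings from the example in the table, the generated output contains /mnt/pdfs/hadoop as the value of pdfs.fs.local.basedir.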

Information

The cache local MapReduce feature

When pdfs.fs.local.cache.location is enabled (true), this feature obtains information about which nodes retain the target file in memory cache (the memory cache retention node information) when a MapReduce job starts. It then gives priority to assigning the Map task to a node that holds the cache, which speeds up Map phase processing.
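
The prioritization described here can be illustrated with a small Python sketch. It is not the product's scheduler: the function name, the data structures, and the one-Map-task-per-file simplification are all hypothetical. It only shows the general idea of preferring a node that holds the file in memory cache and falling back to any node with a free slot.

```python
def assign_map_tasks(tasks, cache_nodes, free_slots):
    """Assign each task to a node, preferring nodes that hold its cache.

    tasks:       list of file paths (one Map task per file, a simplification)
    cache_nodes: dict mapping a path to the set of nodes caching that file
    free_slots:  dict mapping a node name to its number of free Map slots
    Returns a dict mapping each path to a node name, or None if no slot is left.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is unchanged
    assignment = {}
    for path in tasks:
        # Nodes that hold the cache and still have a free slot come first.
        cached = [n for n in sorted(cache_nodes.get(path, ())) if slots.get(n, 0) > 0]
        # Otherwise fall back to any node with a free slot.
        candidates = cached or [n for n in sorted(slots) if slots[n] > 0]
        if not candidates:
            assignment[path] = None
            continue
        node = candidates[0]
        slots[node] -= 1
        assignment[path] = node
    return assignment
```

For example, if node2 caches both /a and /b but has only one free slot, /a is assigned to node2 and /b falls back to another node with capacity.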

Because obtaining the memory cache retention node information is costly, the feature can be configured to skip files smaller than the size specified in pdfs.fs.local.cache.minsize.
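
The size threshold can be sketched as a simple filter. The function name and the input format below are hypothetical; the only details taken from this section are the 1048576-byte value and the rule that files smaller than the threshold are skipped.

```python
CACHE_MINSIZE = 1048576  # pdfs.fs.local.cache.minsize from the table (1 MB)


def files_to_query(file_sizes, minsize=CACHE_MINSIZE):
    """Return the paths for which cache information would be obtained.

    file_sizes maps a file path to its size in bytes. Files smaller than
    minsize are skipped; a file of exactly minsize bytes is kept.
    """
    return [path for path, size in file_sizes.items() if size >= minsize]
```

A 4096-byte file would be skipped under this rule, while a 256 MB file would still be queried.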