This section describes the properties to be set in pdfs-site.xml.
Property | Description | Value to be set |
---|---|---|
pdfs.fs.local.basedir | DFS mount directory path | Path joining BDPP_PDFS_MOUNTPOINT and BDPP_HADOOP_TOP_DIR in bdpp.conf. Example: if bdpp.conf contains BDPP_PDFS_MOUNTPOINT=/mnt/pdfs and BDPP_HADOOP_TOP_DIR=/hadoop, this path is /mnt/pdfs/hadoop. |
pdfs.fs.local.homedir | Home directory path for users in the DFS FileSystem class | /user |
pdfs.security.authorization | Whether to use the MapReduce job user authentication of the DFS | true |
pdfs.fs.local.buffer.size | Default buffer size (bytes) used during Read/Write | 524288 (512 KB) (*1) |
pdfs.fs.local.block.size | Data size (bytes) into which Map tasks are split for MapReduce jobs | 268435456 (256 MB) |
pdfs.fs.local.posix.umask | Whether to reflect the process umask value in access permissions set when creating a file or directory | true (*2) |
pdfs.fs.local.cache.location | Whether to use the cache local MapReduce feature | true |
pdfs.fs.local.cache.minsize | Size (bytes) of files excluded from the cache local MapReduce feature | 1048576 (1 MB) |
pdfs.fs.local.cache.procs | Number of multiplex executions when the cache local MapReduce feature obtains the memory cache information | 40 |
*1: Use the greater of the value to be set or the io.file.buffer.size value in "C.2 core-site.xml".
*2: If true, the umask value is used (POSIX-compatible). If false, the umask value is not used (HDFS-compatible).
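
As an illustration, the settings listed in this section might be written in pdfs-site.xml as in the following sketch. It assumes that pdfs-site.xml uses the usual Hadoop site-file layout (a configuration element containing property entries with name and value), which this section does not state explicitly. The basedir value is the one from the bdpp.conf example in this section, so replace it with the path for your environment.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the standard Hadoop <configuration>/<property> layout. -->
<configuration>
  <!-- BDPP_PDFS_MOUNTPOINT (/mnt/pdfs) joined with BDPP_HADOOP_TOP_DIR (/hadoop) -->
  <property>
    <name>pdfs.fs.local.basedir</name>
    <value>/mnt/pdfs/hadoop</value>
  </property>
  <property>
    <name>pdfs.fs.local.homedir</name>
    <value>/user</value>
  </property>
  <property>
    <name>pdfs.security.authorization</name>
    <value>true</value>
  </property>
  <!-- 512 KB; see note *1 regarding io.file.buffer.size in core-site.xml -->
  <property>
    <name>pdfs.fs.local.buffer.size</name>
    <value>524288</value>
  </property>
  <!-- 256 MB -->
  <property>
    <name>pdfs.fs.local.block.size</name>
    <value>268435456</value>
  </property>
  <!-- true: POSIX-compatible; false: HDFS-compatible (note *2) -->
  <property>
    <name>pdfs.fs.local.posix.umask</name>
    <value>true</value>
  </property>
  <property>
    <name>pdfs.fs.local.cache.location</name>
    <value>true</value>
  </property>
  <!-- 1 MB -->
  <property>
    <name>pdfs.fs.local.cache.minsize</name>
    <value>1048576</value>
  </property>
  <property>
    <name>pdfs.fs.local.cache.procs</name>
    <value>40</value>
  </property>
</configuration>
```

The byte values are written out in full because the sizes in this section are given in bytes.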
Information
The cache local MapReduce feature
When pdfs.fs.local.cache.location is enabled (true), this feature obtains, at the start of a MapReduce job, information on which nodes hold the target file in their memory cache. It then gives priority to those nodes when assigning Map tasks, which speeds up Map phase processing.
Because obtaining the memory cache node information is costly, the feature can be configured to skip files smaller than the size specified in pdfs.fs.local.cache.minsize.