When a MapReduce job that uses the cache local MapReduce feature runs, the DFS collects file cache information from all nodes in the Hadoop cluster.
To collect this information, the user who executes the MapReduce job runs remote commands (SSH by default) from the node where the job starts against the remote nodes. Remote command execution must therefore be configured in advance.
The pdfs.fs.local.cache.location property specifies whether the cache local feature is used. The default is "true" (enabled).
The steps below are not required if the pdfs.fs.local.cache.location property is set to "false" (disabled).
Example
If the user is bdppuser1 and the DFS mount directory is /mnt/pdfs:
Because the settings are shared by all nodes, bdppuser1 runs the following commands on any one node where the DFS is mounted:
$ cd ~ <Enter>
$ ssh-keygen -t rsa -N "" -f id_hadoop <Enter>
$ echo -n "command=\"/opt/FJSVpdfs/sbin/pdfscachelocal.sh\",no-pty,no-port-forwarding,no-X11-forwarding " > authorized_keys <Enter>
$ cat id_hadoop.pub >> authorized_keys <Enter>
$ hadoop fs -mkdir .pdfs <Enter>
$ hadoop fs -chmod 700 .pdfs <Enter>
$ hadoop fs -moveFromLocal id_hadoop id_hadoop.pub authorized_keys .pdfs/ <Enter>
$ hadoop fs -chmod 600 .pdfs/id_hadoop .pdfs/authorized_keys <Enter>
$ hadoop fs -chmod 644 .pdfs/id_hadoop.pub <Enter>
The authorized_keys file created above contains an entry like the following:
$ hadoop fs -cat .pdfs/authorized_keys <Enter>
command="/opt/FJSVpdfs/sbin/pdfscachelocal.sh",no-pty,no-port-forwarding,no-X11-forwarding ssh-rsa publicKey bdppuser1@localNodeName
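In an authorized_keys entry, the options field (command="...",no-pty,...) must be separated from the key type (ssh-rsa) by a space; without it, sshd will not recognize the key. The following is a minimal sketch of a helper that builds the entry in one step, assuming the forced command path and public key file from the example above. The function name make_forced_command_entry is hypothetical and is not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: print one authorized_keys line that restricts the
# given public key to a single forced command with no PTY, port forwarding,
# or X11 forwarding. Note the space between the options and the key.
make_forced_command_entry() {
    forced_cmd=$1    # e.g. /opt/FJSVpdfs/sbin/pdfscachelocal.sh
    pubkey_file=$2   # e.g. id_hadoop.pub
    # $(cat ...) drops the key file's trailing newline; printf adds one back.
    printf 'command="%s",no-pty,no-port-forwarding,no-X11-forwarding %s\n' \
        "$forced_cmd" "$(cat "$pubkey_file")"
}
```

For example, make_forced_command_entry /opt/FJSVpdfs/sbin/pdfscachelocal.sh id_hadoop.pub > authorized_keys could take the place of the echo and cat commands in the example.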
Append these settings to .ssh/authorized_keys in the home directory of bdppuser1 on all slave servers and development servers.
To do this, bdppuser1 runs the following:
$ xargs -ti ssh {} "umask 0077; mkdir -p .ssh; cat /mnt/pdfs/hadoop/user/bdppuser1/.pdfs/authorized_keys >> .ssh/authorized_keys" < /etc/hadoop/slaves <Enter>
$ ssh develop "umask 0077; mkdir -p .ssh; cat /mnt/pdfs/hadoop/user/bdppuser1/.pdfs/authorized_keys >> .ssh/authorized_keys" <Enter>
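Running the distribution commands a second time appends a duplicate entry to each node. The sketch below is one way to make this step safe to repeat, assuming the paths from this example (bdppuser1, /mnt/pdfs mounted on every node). It uses a while loop with ssh -n (so that ssh does not consume the host list on standard input) instead of xargs, and appends the entry only if an identical line is not already present. The function names remote_append_cmd and distribute_entry are hypothetical.

```shell
#!/bin/sh
# Print a remote shell snippet that creates ~/.ssh with private permissions
# and appends the entry file $1 to ~/.ssh/authorized_keys only if an
# identical line is not already there (so it can be rerun safely).
remote_append_cmd() {
    printf 'umask 0077; mkdir -p .ssh; grep -qxF -f "%s" .ssh/authorized_keys 2>/dev/null || cat "%s" >> .ssh/authorized_keys' "$1" "$1"
}

# Apply the snippet on every non-blank host listed in file $2.
# ssh -n keeps ssh from reading the host list on standard input.
distribute_entry() {
    src=$1
    hosts_file=$2
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        echo "== $host"
        ssh -n "$host" "$(remote_append_cmd "$src")"
    done < "$hosts_file"
}
```

For example, distribute_entry /mnt/pdfs/hadoop/user/bdppuser1/.pdfs/authorized_keys /etc/hadoop/slaves covers the slave servers, and ssh develop "$(remote_append_cmd /mnt/pdfs/hadoop/user/bdppuser1/.pdfs/authorized_keys)" covers the development server.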
You can check the settings by running the following remote command, where remoteNodeName is the name of the node to check:
$ echo /dummy | ssh -o IdentityFile=/mnt/pdfs/hadoop/user/bdppuser1/.pdfs/id_hadoop -o StrictHostKeyChecking=no -o BatchMode=no remoteNodeName /opt/FJSVpdfs/sbin/pdfscachelocal.sh <Enter>
2 (*1)
*1: Normally "2" is output.
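To repeat this check on every slave server, a loop such as the following can be used. This is a sketch under the assumptions of this example: KEY points to the private key stored in the DFS, and "2" is treated as the expected output as noted above. Unlike the manual check, it uses BatchMode=yes so that a host where key authentication fails reports an error instead of prompting for a password. The function names probe_host and check_hosts are hypothetical.

```shell
#!/bin/sh
# Private key created in the example (assumed path; adjust for your user).
KEY=/mnt/pdfs/hadoop/user/bdppuser1/.pdfs/id_hadoop

# Run the same remote check as the manual example against one host and
# print whatever the remote script outputs (normally "2").
probe_host() {
    echo /dummy | ssh -o IdentityFile="$KEY" -o StrictHostKeyChecking=no \
        -o BatchMode=yes "$1" /opt/FJSVpdfs/sbin/pdfscachelocal.sh
}

# Check every non-blank host in the given file; print OK/FAIL per host and
# return 1 if any host did not answer "2".
check_hosts() {
    status=0
    while IFS= read -r host; do
        [ -n "$host" ] || continue
        res=$(probe_host "$host" 2>/dev/null)
        if [ "$res" = 2 ]; then
            echo "OK   $host"
        else
            echo "FAIL $host (got: $res)"
            status=1
        fi
    done < "$1"
    return "$status"
}
```

For example, check_hosts /etc/hadoop/slaves lists each slave server with OK or FAIL and exits with a nonzero status if any node needs attention.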