SSH authentication key settings to be used by the cache local function
When a MapReduce job that uses the cache local function is executed, DFS fetches file cache information from all Hadoop cluster nodes.
To fetch this information, the user executing the MapReduce job runs remote commands (over SSH by default) on the remote nodes from the node where the job was started, so remote command execution must be enabled in advance.
Use the pdfs.fs.local.cache.location property to set whether or not the cache local function is used. The default is "true" (enabled).
Note that the tasks below are not required if "false" (disabled) is set for the pdfs.fs.local.cache.location property.
Example
If the user is bdppuser1 and the DFS mount directory is /mnt/pdfs:
Because the settings are shared by all nodes, bdppuser1 executes the commands below on any one of the nodes where DFS is mounted.
$ cd ~
$ ssh-keygen -t rsa -N "" -f id_hadoop
$ echo -n "command=\"/opt/FJSVpdfs/sbin/pdfscachelocal.sh\",no-pty,no-port-forwarding,no-X11-forwarding " > authorized_keys
$ cat id_hadoop.pub >> authorized_keys
$ hadoop fs -mkdir .pdfs
$ hadoop fs -chmod 700 .pdfs
$ hadoop fs -moveFromLocal id_hadoop id_hadoop.pub authorized_keys .pdfs/
$ hadoop fs -chmod 600 .pdfs/id_hadoop .pdfs/authorized_keys
$ hadoop fs -chmod 644 .pdfs/id_hadoop.pub
An entry like the one below is set in the authorized_keys file shown above.
$ hadoop fs -cat .pdfs/authorized_keys
command="/opt/FJSVpdfs/sbin/pdfscachelocal.sh",no-pty,no-port-forwarding,no-X11-forwarding ssh-rsa public key bdppuser1@local node name
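The structure of this entry (a forced command and restriction options, followed by the public key) can be sketched as a small shell function. This is only an illustration of the format produced by the echo and cat commands above; the function name build_entry is hypothetical and is not part of the product.

```shell
# Hypothetical helper (not part of the product): print one authorized_keys
# entry that restricts a public key to a single forced command.
# Usage: build_entry <forced command path> <public key file>
build_entry() {
    forced_command=$1
    public_key_file=$2
    # Options come first, then one space, then the contents of the .pub file.
    printf 'command="%s",no-pty,no-port-forwarding,no-X11-forwarding %s\n' \
        "$forced_command" "$(cat "$public_key_file")"
}
```

For example, build_entry /opt/FJSVpdfs/sbin/pdfscachelocal.sh id_hadoop.pub > authorized_keys would produce the same single-line entry as the two commands in the example.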
Append the settings to .ssh/authorized_keys in the home directory of the user bdppuser1 on all slave nodes and development servers.
bdppuser1 executes the following:
$ xargs -ti ssh {} "umask 0077; mkdir -p .ssh; cat /mnt/pdfs/hadoop/user/bdppuser1/.pdfs/authorized_keys >> .ssh/authorized_keys" < /etc/hadoop/slaves
$ ssh develop "umask 0077; mkdir -p .ssh; cat /mnt/pdfs/hadoop/user/bdppuser1/.pdfs/authorized_keys >> .ssh/authorized_keys"
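Because the commands above append with ">>", running them a second time adds the same entry to .ssh/authorized_keys again. If that matters in your environment, a variant along the following lines could be used on each node instead of the plain cat. It is a minimal sketch, and the function name append_once is hypothetical.

```shell
# Hypothetical helper: append each line of SRC to DEST unless an identical
# line is already present, so repeated runs do not create duplicates.
# Usage: append_once <source file> <destination file>
append_once() {
    src=$1
    dest=$2
    touch "$dest"
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip empty lines in the source file.
        [ -z "$line" ] && continue
        # -F: fixed string, -x: match the whole line, -q: no output.
        grep -Fxq -- "$line" "$dest" || printf '%s\n' "$line" >> "$dest"
    done < "$src"
}
```

On a node, this would be called as append_once /mnt/pdfs/hadoop/user/bdppuser1/.pdfs/authorized_keys .ssh/authorized_keys after the umask and mkdir steps.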
To check the settings, execute the script remotely as shown below, specifying the host name of the node to be checked in place of "remote node name":
$ echo /dummy | ssh -o IdentityFile=/mnt/pdfs/hadoop/user/bdppuser1/.pdfs/id_hadoop -o StrictHostKeyChecking=no -o BatchMode=no remote node name /opt/FJSVpdfs/sbin/pdfscachelocal.sh
2 (*1)
(*1) Normally "2" is output.
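To repeat this check for every node, the command can be wrapped in a loop over a node list. The following is a sketch only: it assumes the same user, mount directory, and script path as the example, and the function names check_node and check_all and the SSH_CMD and KEY_FILE variables are hypothetical.

```shell
# Assumed values taken from the example above; adjust for your environment.
SSH_CMD=${SSH_CMD:-ssh}
KEY_FILE=${KEY_FILE:-/mnt/pdfs/hadoop/user/bdppuser1/.pdfs/id_hadoop}
SCRIPT=/opt/FJSVpdfs/sbin/pdfscachelocal.sh

# Succeeds if the remote script prints "2" for the given node.
check_node() {
    result=$(echo /dummy | "$SSH_CMD" -o IdentityFile="$KEY_FILE" \
        -o StrictHostKeyChecking=no -o BatchMode=no "$1" "$SCRIPT" 2>/dev/null)
    [ "$result" = "2" ]
}

# Checks every node named in the given file (one host name per line).
# Returns 0 if all nodes passed, 1 otherwise.
check_all() {
    failed=0
    while IFS= read -r node; do
        [ -z "$node" ] && continue
        if check_node "$node"; then
            echo "OK   $node"
        else
            echo "FAIL $node"
            failed=1
        fi
    done < "$1"
    return "$failed"
}
```

For example, check_all /etc/hadoop/slaves prints one OK or FAIL line per slave node.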
MapReduce job user authentication key settings
The tasks here are not required if the tasks in "SSH authentication key settings to be used by the cache local function" have already been performed.
Because DFS checks an authentication key when it authenticates the MapReduce job user, an authentication key file must be created in advance. Use the pdfs.security.authorization property in the pdfs-site.xml file to set whether or not job user authentication is used. The default is "false" (disabled).
Note that the tasks below are not required if "false" (disabled) is set for pdfs.security.authorization.
Refer to "pdfs-site.xml file" for information on the pdfs.security.authorization property.
Example
If the user is bdppuser1 and the DFS mount directory is /mnt/pdfs:
Because the settings are shared by all nodes, bdppuser1 executes the commands below on any one of the nodes where DFS is mounted.
$ cd ~
$ cat > id_hadoop
any keyword character string
ctrl-d
$ hadoop fs -mkdir .pdfs
$ hadoop fs -chmod 700 .pdfs
$ hadoop fs -moveFromLocal id_hadoop .pdfs/
$ hadoop fs -chmod 600 .pdfs/id_hadoop
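Instead of typing the keyword by hand, the id_hadoop file could be generated. The sketch below assumes that any character string is acceptable as the keyword, as the example states, and uses a random hexadecimal string. The function names make_keyword and write_key_file are hypothetical.

```shell
# Hypothetical helper: print a 32-character hexadecimal string
# built from 16 bytes read from /dev/urandom.
make_keyword() {
    od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Hypothetical helper: write a new keyword to the given file.
# The subshell keeps the restrictive umask from affecting the caller.
write_key_file() {
    (
        umask 077
        printf '%s\n' "$(make_keyword)" > "$1"
    )
}
```

Running write_key_file id_hadoop would replace the cat > id_hadoop step; the hadoop fs commands that follow stay the same.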