The DFS setup sequence is as follows:
- Registering DFS Management Server Information to the Management Partition
- Generating the DFS File System Configuration Information
To build a master server replicated configuration, perform the setup on the primary master server first and then on the secondary master server.
Point
The table below provides examples of the information specified during setup, such as device names, for the following file system configuration. Substitute the actual values for your system when you set up.
| Item | Example |
| --- | --- |
| Management partition | /dev/disk/by-id/scsi-1FUJITSU_300000370105 |
| Representative partition | /dev/disk/by-id/scsi-1FUJITSU_300000370106 |
| File data partition | /dev/disk/by-id/scsi-1FUJITSU_300000370107 |
| File system ID | 1 |
| Logical file system name | pdfs1 |
| Master servers | master1 (primary), master2 (secondary) |
| Slave servers | slave1, slave2, slave3, slave4, slave5 |
| Development server | develop |
| Collaboration server | collaborate |
A by-id name generated by the udev function is used for shared disk device names.
Use either the udevinfo or udevadm command to ascertain the by-id name from the conventional compatible device name.
An example of checking the by-id name is shown below.
Example
Determining by-id names from compatible device names
Under Red Hat(R) Enterprise Linux(R) 5:
# udevinfo -q symlink -n /dev/sdb <Enter>
disk/by-id/scsi-1FUJITSU_300000370105
# udevinfo -q symlink -n /dev/sdc <Enter>
disk/by-id/scsi-1FUJITSU_300000370106
# udevinfo -q symlink -n /dev/sdd <Enter>
disk/by-id/scsi-1FUJITSU_300000370107
# udevinfo -q symlink -n /dev/sde <Enter>
disk/by-id/scsi-1FUJITSU_300000370108
Under Red Hat(R) Enterprise Linux(R) 6:
# udevadm info -q symlink -n /dev/sdb <Enter>
block/8:48 disk/by-id/scsi-1FUJITSU_300000370105
# udevadm info -q symlink -n /dev/sdc <Enter>
block/8:48 disk/by-id/scsi-1FUJITSU_300000370106
# udevadm info -q symlink -n /dev/sdd <Enter>
block/8:48 disk/by-id/scsi-1FUJITSU_300000370107
# udevadm info -q symlink -n /dev/sde <Enter>
block/8:48 disk/by-id/scsi-1FUJITSU_300000370108
Note
In order to use the by-id name checked using the udevinfo or udevadm command, "/dev/" must be added at the start of the name.
If shared disk partition information is changed using the fdisk, parted, or similar command, refer to "4.2.4 Partition Information of Shared Disk Device Modified with fdisk(8) is not Reflected" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" and refresh the partition information at all servers.
See
Refer to the online manual pages for details of the udevinfo and udevadm commands.
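As an additional cross-check (not part of the product procedure), the by-id symbolic links can be listed directly; each link points to the corresponding compatible device name:

# ls -l /dev/disk/by-id/ <Enter>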
Point
The by-id name is a device name generated from the unique identification information set in the hard disk.
Use of the by-id names enables each server to always use the same device name to access a specific disk.
The DFS management partition can operate in either Logical Unit (physical) units or disk partition (logical) units. If volume copy using ETERNUS SF AdvancedCopy Manager is performed, take into account the device units supported by ETERNUS SF AdvancedCopy Manager.
Refer to the "ETERNUS SF AdvancedCopy Manager Operation Guide" for ETERNUS SF AdvancedCopy Manager details.
When creating a master server replicated configuration, before creating a management partition, be sure to check that a cluster partition has not occurred.
Perform this action on both the primary master server and the secondary master server.
Execute the cftool(1M) command.
Confirm that the displayed state (in the State column) is the same for the two master servers.
# cftool -n <Enter>
Node     Number  State  Os     Cpu
master1  1       UP     Linux  EM64T
master2  2       UP     Linux  EM64T
See
Refer to the online help of cftool for details of the cftool(1M) command.
If the display result is not identical on the two master servers, cluster partitioning has occurred.
Cancel the cluster partitioning if this is the case.
See
Refer to the "4.2.1 Corrective Action when the pdfsfrmd Daemon Does Not Start" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for information on canceling cluster partitions.
Initialize the management partition.
Perform this action on the primary master server.
Specify the -c option and the path name of the management partition in the pdfssetup command and execute.
# pdfssetup -c /dev/disk/by-id/scsi-1FUJITSU_300000370105 <Enter>
See
Refer to pdfssetup under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for details of the pdfssetup command.
Register master server information to the management partition.
This must be done on the primary master server first and then the secondary master server in order to build a master server replicated configuration.
Register master server information to the management partition.
Specify the -a option in the pdfssetup command and execute.
# pdfssetup -a /dev/disk/by-id/scsi-1FUJITSU_300000370105 <Enter>
Check the registered master server information.
The registered information can be checked by executing the pdfssetup command without any options specified.
# pdfssetup <Enter>
HOSTID    CIPNAME     MP_PATH
80380000  master1RMS  yes
80380001  master2RMS  yes
The management partition path name that has been set can be checked by executing the pdfssetup command with the -p option specified.
# pdfssetup -p <Enter>
/dev/disk/by-id/scsi-1FUJITSU_300000370105
Start the pdfsfrmd daemon in order to start operations.
This must be done on the primary master server first and then the secondary master server in order to build a master server replicated configuration.
# pdfsfrmstart <Enter>
See
Refer to pdfsfrmstart under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for details of the pdfsfrmstart command.
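To confirm that the pdfsfrmd daemon has started on each master server, a generic check such as the following can be used (this is not part of the product procedure):

# ps -ef | grep pdfsfrmd <Enter>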
Create the DFS in the partitions to be used.
Perform this action on the primary master server.
Create the file system.
Specify the following options and the representative partition in the pdfsmkfs command and execute.
dataopt option
Specify y to separate the file data area from the representative partition.
blocksz option
Specify the data block size. 8388608 (8MB) is recommended.
data option
Specify the path names of the file data partitions separated with commas.
node option
Specify the host name of the master server (the host name corresponding to the NIC that connects public LAN).
Separate the primary master server and the secondary master server with a comma when building a master server replicated configuration. This option can be omitted if a master server replicated configuration is not required.
Note
Specify the node option as the final option.
When creating a master server replicated configuration:
# pdfsmkfs -o dataopt=y,blocksz=8388608,data=/dev/disk/by-id/scsi-1FUJITSU_300000370107,data=/dev/disk/by-id/scsi-1FUJITSU_300000370108,node=master1,master2 /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
See
Refer to "3.3.2 File System Configuration Design" for information on data block size.
Refer to "pdfsmkfs" in the "Appendix A Command Reference" of the "Primesoft Distributed File System for Hadoop V1 User's Guide" for information on the pdfsmkfs command.
Confirm the file system information created.
# pdfsinfo /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
/dev/disk/by-id/scsi-1FUJITSU_300000370106:
FSID special                                            size     Type  mount
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)     25418  META  -----
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)      5120  LOG   -----
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370107 (880)   7341778  DATA  -----
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370108 (896)   6578704  DATA  -----
A user ID must be set in the DFS so that the mapred user can execute the Hadoop JobTracker and TaskTracker.
This section describes the procedure for setting the mapred user in the DFS.
Perform this action on the primary master server.
Set the user ID.
Use the pdfsadm command to set the user ID for executing MapReduce in the MAPRED variable.
# pdfsadm -o MAPRED=mapred /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
Check that the user ID has been set.
# pdfsinfo -e /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
/dev/disk/by-id/scsi-1FUJITSU_300000370106:
MAPRED=mapred
See
Refer to "pdfsadm" under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for the method for deleting a set MAPRED variable and other pdfsadm command details.
Register the information for the slave servers, development servers and collaboration servers (DFS client information) in the connection authorization list.
Create and register the connection authorization list on the primary master server and then distribute to the secondary master server in order to build a master server replicated configuration.
Information
The DFS manages the DFS clients that can connect to the master server (MDS).
Create a connection authorization list on the master server and register the host names of the servers that will connect to the DFS.
Check the file system ID.
Check the target file system ID in the file system information recorded in the management partition.
# pdfsinfo /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
/dev/disk/by-id/scsi-1FUJITSU_300000370106:
FSID special                                            size     Type  mount
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)     25418  META  -----
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)      5120  LOG   -----
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370107 (880)   7341778  DATA  -----
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370108 (896)   6578704  DATA  -----
Create a file listing approved connections.
# cd /etc/pdfs <Enter>
# cp ./server.conf.sample server.conf.1 <Enter>
Note
Place the connection authorization list file under /etc/pdfs at the master server.
In the connection authorization list file name, change only the file system ID part, not the other part (server.conf.).
Register the host names (the host names corresponding to the NICs connected to the public LAN) of servers (slave servers, development servers, and collaboration servers) permitted to connect in the connection authorization list file.
Use the following format to enter the names:
CLIENT hostNameToBePermittedToConnect
Example
When permitting connection of slave1, slave2, slave3, slave4, slave5, develop, and collaborate:
# cat /etc/pdfs/server.conf.1 <Enter>
#
# Copyright (c) 2012 FUJITSU LIMITED. All rights reserved.
#
# /etc/pdfs/server.conf.<FSID>
#
# List of client hostnames of a file system.
#
# Notes:
#  Do not describe hostnames of management servers.
#
# example:
#CLIENT nodeac1
#CLIENT nodeac2
#CLIENT nodeac3
#CLIENT nodeac4
#CLIENT nodeac5
CLIENT develop      <-- development environment server you are adding
CLIENT collaborate  <-- collaboration server you are adding
CLIENT slave1       <-- slave server you are adding
CLIENT slave2       <-- slave server you are adding
CLIENT slave3       <-- slave server you are adding
CLIENT slave4       <-- slave server you are adding
CLIENT slave5       <-- slave server you are adding
Check the content of the connection authorization list file.
Mount is not possible at the master server if there is an error in the connection authorization list. Therefore, check the following:
- The total number of master servers, slave servers, development servers, and collaboration servers does not exceed the maximum number of shared servers.
- The specified slave server, development server, and collaboration server hosts can be referenced correctly via the network (a sample check is shown after this list).
- The slave server, development server, and collaboration server entries are not duplicated.
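The following is a minimal sketch of a name-resolution check for the registered hosts (a generic command, not part of the product procedure; the host names are those used in this example configuration):

# for h in slave1 slave2 slave3 slave4 slave5 develop collaborate; do getent hosts $h || echo "$h: not resolvable"; done <Enter>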
Distribute the updated connection authorization list file to the secondary master server. (only when creating a master server replicated configuration)
# cd /etc/pdfs <Enter>
# scp -p ./server.conf.1 root@master2:/etc/pdfs/server.conf.1 <Enter>
Create the mount point and add the DFS entry to /etc/fstab.
This must be done on both the primary master server and the secondary master server in order to build a master server replicated configuration.
Creating the mount point
Create the mount point for mounting the disk partitions on the storage system used as the DFS.
The mount point created must be the same as the value specified in the BDPP_PDFS_MOUNTPOINT parameter in bdpp.conf.
Example
Create the mount point "pdfs" under "/mnt".
# mkdir /mnt/pdfs <Enter>
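To cross-check the created mount point against the BDPP_PDFS_MOUNTPOINT parameter, a check such as the following can be used; the bdpp.conf path shown here is only an assumption and may differ in your installation:

# grep BDPP_PDFS_MOUNTPOINT /etc/opt/FJSVbdpp/conf/bdpp.conf <Enter>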
fstab settings
Add the DFS entry to /etc/fstab.
The parameters specified in the fields for the added entry are as follows:
Field 1 (fs_spec)
Specify the representative partition of the DFS to be mounted.
Field 2 (fs_file)
Specify the mount point you created above.
Field 3 (fs_vfstype)
Specify pdfs.
Field 4 (fs_mntops)
Specify the mount options to be used when mount is performed.
Ensure that the noauto option is specified.
Determine other option specifications as shown below.
| Item to be checked | Option to be specified |
| --- | --- |
| If either of the following applies: | noatime |
| If not performing mount at DFS management server startup | noatrc |
| If mounting DFS as read only | ro |
See
Refer to pdfsmount under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for mount option details.
Field 5 (fs_freq)
Specify 0.
Field 6 (fs_passno)
Specify 0.
Example
This example shows the mount point "/mnt/pdfs" and DFS representative partitions defined at "/etc/fstab".
LABEL=/ / ext3 defaults 1 1
LABEL=/boot /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
LABEL=SWAP-sda3 swap swap defaults 0 0
/dev/disk/by-id/scsi-1FUJITSU_300000370106 /mnt/pdfs pdfs noauto,noatime 0 0
Note
The entries in /etc/fstab must be the same on both the primary master server and the secondary master server in order to build a master server replicated configuration.
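A quick way to confirm that the DFS entries match on the two master servers (a generic check, not part of the product procedure; assumes root ssh access from the primary to the secondary master server):

# grep pdfs /etc/fstab <Enter>
# ssh master2 grep pdfs /etc/fstab <Enter>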
Mount the DFS file system.
Perform this action only on the primary master server.
# pdfsmntgl /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
Note
The pdfsmntgl command mounts the DFS on all DFS management servers. For this reason, it is not necessary to mount on the secondary master server.
Ensure that the DFS mount sequence is: master servers, slave servers, development servers and collaboration servers. If the slave servers, development servers or collaboration servers are mounted first, then mount will fail, because the master server (MDS) does not exist.
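To confirm afterwards that the DFS has been mounted, a generic check such as the following can be used (not part of the product procedure):

# mount | grep pdfs <Enter>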
Generate the DFS configuration information file on the master server.
Perform this action on the primary master server.
Check the file system ID.
Check the target file system ID in the file system information recorded in the management partition.
# pdfsinfo /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
/dev/disk/by-id/scsi-1FUJITSU_300000370106:
FSID special                                            size     Type  mount
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)     25418  META  -----
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)      5120  LOG   -----
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370107 (880)   7341778  DATA  -----
   1 /dev/disk/by-id/scsi-1FUJITSU_300000370108 (896)   6578704  DATA  -----
Generate the DFS configuration file with the pdfsmkconf command.
# pdfsmkconf <Enter>
Rename the generated configuration information file so that the file system ID part is replaced by the logical file system name.
# cd pdfsmkconf_out <Enter>
# mv ./client.conf.1 client.conf.pdfs1 <Enter>
Note
The DFS configuration information file is created as pdfsmkconf_out/client.conf.fsid in the directory where the pdfsmkconf command is executed.
Do not change any part of the DFS configuration information file name other than the file system ID part (the client.conf. part must remain unchanged).
Confirm that the user ID for executing MapReduce that was set in "6.1.3.7 Setting the User ID for Executing MapReduce" has been set properly in the DFS configuration information file.
# cat ./client.conf.pdfs1 <Enter>
FSID 1
MDS master1 29000
MDS master2 29000
DEV /dev/disk/by-id/scsi-1FUJITSU_300000370107 0 7341778
DEV /dev/disk/by-id/scsi-1FUJITSU_300000370108 0 6578704
MAPRED mapred
Point
The generated DFS configuration information files must be placed in /etc/pdfs on the DFS clients (slave servers, development servers, and collaboration servers).
When setting up the slave servers, development servers, and collaboration servers explained later, distribute the DFS configuration information file on the master server to each of the servers.
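For example, the file could be copied with scp in the same way as the connection authorization list; the destination path follows the placement rule above, and root access to the destination server is assumed:

# scp -p pdfsmkconf_out/client.conf.pdfs1 root@slave1:/etc/pdfs/client.conf.pdfs1 <Enter>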
See
Refer to the pdfsmkconf command under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for details of pdfsmkconf.