
6.1.3 DFS Setup

The DFS setup sequence is as follows:

  1. Checking Shared Disk Settings

  2. Checking Cluster Status

  3. Management Partition Initialization

  4. Registering DFS Management Server Information to the Management Partition

  5. Starting the pdfsfrmd daemon

  6. Creating the File System

  7. Setting the User ID for Executing MapReduce

  8. Registering the DFS Client Information

  9. Creating the Mount Point and Configuring fstab Settings

  10. Mounting

  11. Generating the DFS File System Configuration Information



To build a master server replicated configuration, perform the setup on the primary master server first and then on the secondary master server.

Point

The list below provides examples of the information (such as device names) that is specified during setup, based on the following file system configuration. Substitute the actual information for your own system when performing the setup.

  • Management partition: /dev/disk/by-id/scsi-1FUJITSU_300000370105

  • Representative partition: /dev/disk/by-id/scsi-1FUJITSU_300000370106

  • File data partitions: /dev/disk/by-id/scsi-1FUJITSU_300000370107, /dev/disk/by-id/scsi-1FUJITSU_300000370108

  • File system ID: 1

  • Logical file system name: pdfs1

  • Master servers: master1 (primary), master2 (secondary)

  • Slave servers: slave1, slave2, slave3, slave4, slave5

  • Development server: develop

  • Collaboration server: collaborate


6.1.3.1 Checking Shared Disk Settings

A by-id name generated by the udev function is used for shared disk device names.

Use either the udevinfo or udevadm command to ascertain the by-id name from the conventional compatible device name.

An example of checking the by-id name is shown below.

Example

Determining by-id device names from conventional compatible device names

  • Under Red Hat(R) Enterprise Linux(R) 5:

    # udevinfo -q symlink -n /dev/sdb <Enter>
    disk/by-id/scsi-1FUJITSU_300000370105
    # udevinfo -q symlink -n /dev/sdc <Enter>
    disk/by-id/scsi-1FUJITSU_300000370106
    # udevinfo -q symlink -n /dev/sdd <Enter>
    disk/by-id/scsi-1FUJITSU_300000370107
    # udevinfo -q symlink -n /dev/sde <Enter>
    disk/by-id/scsi-1FUJITSU_300000370108
  • Under Red Hat(R) Enterprise Linux(R) 6:

    # udevadm info -q symlink -n /dev/sdb <Enter>
    block/8:48 disk/by-id/scsi-1FUJITSU_300000370105
    # udevadm info -q symlink -n /dev/sdc <Enter>
    block/8:48 disk/by-id/scsi-1FUJITSU_300000370106
    # udevadm info -q symlink -n /dev/sdd <Enter>
    block/8:48 disk/by-id/scsi-1FUJITSU_300000370107
    # udevadm info -q symlink -n /dev/sde <Enter>
    block/8:48 disk/by-id/scsi-1FUJITSU_300000370108

Note

  • To use a by-id name obtained with the udevinfo or udevadm command, "/dev/" must be added at the start of the name (see the sketch after this note).

  • If shared disk partition information is changed using the fdisk, parted, or similar command, refer to "4.2.4 Partition Information of Shared Disk Device Modified with fdisk(8) is not Reflected" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" and refresh the partition information at all servers.
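
As an illustration of the first note above, the by-id name extracted from the udevadm output can be prefixed with "/dev/" as follows (a sketch only; the shell variable name is illustrative and the output format is the Red Hat(R) Enterprise Linux(R) 6 format shown above):

# BYID=$(udevadm info -q symlink -n /dev/sdb | tr ' ' '\n' | grep '^disk/by-id/scsi-') <Enter>
# echo /dev/${BYID} <Enter>
/dev/disk/by-id/scsi-1FUJITSU_300000370105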

See

Refer to the online manual pages for details of the udevinfo and udevadm commands.

Point

  • The by-id name is a device name generated from the unique identification information set in the hard disk.
    Use of the by-id names enables each server to always use the same device name to access a specific disk.

  • The DFS management partition can operate in either Logical Unit (physical) units or disk partition (logical) units. If volume copy using ETERNUS SF AdvancedCopy Manager is performed, take into account the device units supported by ETERNUS SF AdvancedCopy Manager.
    Refer to the "ETERNUS SF AdvancedCopy Manager Operation Guide" for ETERNUS SF AdvancedCopy Manager details.


6.1.3.2 Checking Cluster Status

When creating a master server replicated configuration, be sure to check that cluster partitioning has not occurred before creating the management partition.

Perform this action on both the primary master server and the secondary master server.

  1. Execute the cftool(1M) command.
    Confirm that the displayed state (in the State column) is the same for the two master servers.

    # cftool -n <Enter>
    Node    Number State       Os      Cpu
    master1 1      UP          Linux   EM64T
    master2 2      UP          Linux   EM64T

    See

    Refer to the online help of cftool for details of the cftool(1M) command.

  2. If the display result is not identical on the two master servers, cluster partitioning has occurred.
    Cancel the cluster partitioning if this is the case.

    See

    Refer to the "4.2.1 Corrective Action when the pdfsfrmd Daemon Does Not Start" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for information on canceling cluster partitions.


6.1.3.3 Management Partition Initialization

Initialize the management partition.

Perform this action on the primary master server.

Specify the -c option and the path name of the management partition in the pdfssetup command and execute.

# pdfssetup -c /dev/disk/by-id/scsi-1FUJITSU_300000370105 <Enter>

See

Refer to pdfssetup under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for details of the pdfssetup command.


6.1.3.4 Registering DFS Management Server Information to the Management Partition

Register master server information to the management partition.

This must be done on the primary master server first and then the secondary master server in order to build a master server replicated configuration.

  1. Register master server information to the management partition.

    Specify the -a option in the pdfssetup command and execute.

    # pdfssetup -a /dev/disk/by-id/scsi-1FUJITSU_300000370105 <Enter>
  2. Check the registered master server information.
    The registered information can be checked by executing the pdfssetup command without any options specified.

    # pdfssetup <Enter>
    HOSTID      CIPNAME     MP_PATH
    80380000    master1RMS     yes
    80380001    master2RMS     yes

    The management partition path name that has been set can be checked by executing the pdfssetup command with the -p option specified.

    # pdfssetup -p <Enter>
    /dev/disk/by-id/scsi-1FUJITSU_300000370105

6.1.3.5 Starting the pdfsfrmd daemon

Start the pdfsfrmd daemon in order to start operations.

This must be done on the primary master server first and then the secondary master server in order to build a master server replicated configuration.

# pdfsfrmstart <Enter>
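
Whether the daemon has started can be confirmed with a generic process check such as the following (an illustration only, not a product-specific command; it assumes the daemon process name is pdfsfrmd, as in the section title):

# ps -e | grep pdfsfrmd <Enter>

Confirm that a pdfsfrmd process line is displayed on both the primary and the secondary master server.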

See

Refer to pdfsfrmstart under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for details of the pdfsfrmstart command.


6.1.3.6 Creating the File System

Create the DFS in the partitions to be used.

Perform this action on the primary master server.

  1. Create the file system.

    Specify the following options and the representative partition in the pdfsmkfs command and execute.

    • dataopt option

      Specify y to separate the file data area from the representative partition.

    • blocksz option

      Specify the data block size. 8388608 (8MB) is recommended.

    • data option

      Specify the path names of the file data partitions separated with commas.

    • node option

      Specify the host name of the master server (the host name corresponding to the NIC connected to the public LAN).

      When building a master server replicated configuration, separate the primary master server and the secondary master server with a comma. This option can be omitted if a master server replicated configuration is not required (a sketch with the option omitted is shown after this procedure).

      Note

      Specify the node option as the final option.

    When building a master server replicated configuration:

    # pdfsmkfs -o dataopt=y,blocksz=8388608,data=/dev/disk/by-id/scsi-1FUJITSU_300000370107,data=/dev/disk/by-id/scsi-1FUJITSU_300000370108,node=master1,master2 /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>

    See

    • Refer to "3.3.2 File System Configuration Design" for information on data block size.

    • Refer to "pdfsmkfs" in the "Appendix A Command Reference" of the "Primesoft Distributed File System for Hadoop V1 User's Guide" for information on the pdfsmkfs command.

  2. Confirm the file system information created.

    # pdfsinfo /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
    /dev/disk/by-id/scsi-1FUJITSU_300000370106: 
    FSID special                                             size Type mount
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)   25418 META -----
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)    5120  LOG -----
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370107 (880) 7341778 DATA -----
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370108 (896) 6578704 DATA -----
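
As noted for the node option in step 1, a master server replicated configuration is not mandatory. The following is a hedged sketch of the pdfsmkfs command with the node option omitted (single master server), using the partition names from the configuration example at the start of this section:

# pdfsmkfs -o dataopt=y,blocksz=8388608,data=/dev/disk/by-id/scsi-1FUJITSU_300000370107,data=/dev/disk/by-id/scsi-1FUJITSU_300000370108 /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>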

6.1.3.7 Setting the User ID for Executing MapReduce

The user ID under which Hadoop JobTracker and TaskTracker are executed (the mapred user) must be set in the DFS.

This section describes the procedure for setting the mapred user in the DFS.

Perform this action on the primary master server.

  1. Set the user ID.
    Use the pdfsadm command to set the user ID for executing MapReduce in the MAPRED variable.

    # pdfsadm -o MAPRED=mapred /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
  2. Check that the user ID has been set.

    # pdfsinfo -e /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
    /dev/disk/by-id/scsi-1FUJITSU_300000370106: 
    MAPRED=mapred

See

Refer to "pdfsadm" under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for the method for deleting a set MAPRED variable and other pdfsadm command details.


6.1.3.8 Registering the DFS Client Information

Register the information for the slave servers, development servers and collaboration servers (DFS client information) in the connection authorization list.

Create and register the connection authorization list on the primary master server and then distribute it to the secondary master server in order to build a master server replicated configuration.

Information

The DFS manages the DFS clients that can connect to the master server (MDS).

Create a connection authorization list on the master server and register the host names of the servers that will connect to the DFS.

  1. Check the file system ID.
    Check the target file system ID in the file system information recorded in the management partition.

    # pdfsinfo /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
    /dev/disk/by-id/scsi-1FUJITSU_300000370106: 
    FSID special                                             size Type mount
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)   25418 META -----
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)    5120  LOG -----
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370107 (880) 7341778 DATA -----
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370108 (896) 6578704 DATA -----
  2. Create a file listing approved connections.

    # cd /etc/pdfs <Enter>
    # cp ./server.conf.sample server.conf.1 <Enter>

    Note

    Place the connection authorization list file under /etc/pdfs at the master server.

    In the connection authorization list file name, change only the file system ID part; do not change the "server.conf." part.

  3. In the connection authorization list file, register the host names of the servers permitted to connect (slave servers, development servers, and collaboration servers), using the host names that correspond to the NICs connected to the public LAN.
    Use the following format to enter the names:

    CLIENT hostNameToBePermittedToConnect

    Example

    When permitting connection of slave1, slave2, slave3, slave4, slave5, develop, and collaborate:

    # cat /etc/pdfs/server.conf.1 <Enter>
    #
    # Copyright (c) 2012 FUJITSU LIMITED. All rights reserved.
    #
    #   /etc/pdfs/server.conf.<FSID>
    #
    # List of client hostnames of a file system.
    #
    # Notes: 
    #   Do not describe hostnames of management servers.
    #
    # example: 
    #CLIENT nodeac1
    #CLIENT nodeac2
    #CLIENT nodeac3
    #CLIENT nodeac4
    #CLIENT nodeac5
    CLIENT develop    <-- development environment server you are adding
    CLIENT collaborate  <-- collaboration server you are adding
    CLIENT slave1    <-- slave server you are adding
    CLIENT slave2    <-- slave server you are adding
    CLIENT slave3    <-- slave server you are adding
    CLIENT slave4    <-- slave server you are adding
    CLIENT slave5    <-- slave server you are adding
  4. Check the content of the connection authorization list file.
    Mount is not possible at the master server if there is an error in the connection authorization list. Therefore, check the following:

    • The total number of master servers, slave servers, development servers, and collaboration servers does not exceed the maximum number of shared servers.

    • The specified slave server, development server, and collaboration server hosts can be referenced correctly via the network (see the sketch after this procedure).

    • The slave server, development server, and collaboration server specifications are not duplicated.

  5. Distribute the updated connection authorization list file to the secondary master server (only when creating a master server replicated configuration).

    # cd /etc/pdfs <Enter>
    # scp -p ./server.conf.1 root@master2:/etc/pdfs/server.conf.1 <Enter>
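
As a supplement to the check in step 4, whether each registered host can be referenced via the network can be verified with a generic name-resolution check such as the following (an illustration only, not a product command; the host names are those of the example connection authorization list above):

# for h in slave1 slave2 slave3 slave4 slave5 develop collaborate; do getent hosts $h || echo "$h cannot be resolved"; done <Enter>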

6.1.3.9 Creating the Mount Point and Configuring fstab Settings

Create the mount point and add the DFS entry to /etc/fstab.

This must be done on both the primary master server and the secondary master server in order to build a master server replicated configuration.

Creating the mount point

Create the mount point for mounting the disk partitions on the storage system used as the DFS.

The mount point created must be the same as the value specified in the BDPP_PDFS_MOUNTPOINT parameter in bdpp.conf.

Example

Create the mount point "pdfs" under "/mnt".

# mkdir /mnt/pdfs <Enter>

fstab settings

Add the DFS entry to /etc/fstab.

The fields of the added entry are, in order: the by-id name of the DFS representative partition, the mount point, the file system type "pdfs", the mount options ("noauto,noatime" in the example below), and "0 0" for the dump and fsck fields.

Example

This example shows the DFS representative partition defined in /etc/fstab with the mount point "/mnt/pdfs".

LABEL=/                 /                       ext3    defaults        1 1
LABEL=/boot             /boot                   ext3    defaults        1 2
tmpfs                   /dev/shm                tmpfs   defaults        0 0
devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
sysfs                   /sys                    sysfs   defaults        0 0
proc                    /proc                   proc    defaults        0 0
LABEL=SWAP-sda3         swap                    swap    defaults        0 0

/dev/disk/by-id/scsi-1FUJITSU_300000370106    /mnt/pdfs       pdfs    noauto,noatime   0 0

Note

The entries in /etc/fstab must be the same on both the primary master server and the secondary master server in order to build a master server replicated configuration.


6.1.3.10 Mounting

Mount the DFS file system.

Perform this action only on the primary master server.

# pdfsmntgl /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>

Note

  • The pdfsmntgl command mounts the DFS on all DFS management servers. For this reason, it is not necessary to mount on the secondary master server.

  • Ensure that the DFS mount sequence is: master servers, then slave servers, development servers, and collaboration servers. If the slave servers, development servers, or collaboration servers are mounted first, the mount will fail because the master server (MDS) is not yet running.
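
Whether the DFS has been mounted can be confirmed on each master server with a generic check such as the following (an illustration only, not a product-specific command):

# mount | grep pdfs <Enter>

Confirm that a line for the mount point (/mnt/pdfs in this example) is displayed.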


6.1.3.11 Generating the DFS File System Configuration Information

Generate the DFS configuration information file on the master server.

Perform this action on the primary master server.

  1. Check the file system ID.
    Check the target file system ID in the file system information recorded in the management partition.

    # pdfsinfo /dev/disk/by-id/scsi-1FUJITSU_300000370106 <Enter>
    /dev/disk/by-id/scsi-1FUJITSU_300000370106: 
    FSID special                                             size Type mount
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)   25418 META -----
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370106 (864)    5120  LOG -----
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370107 (880) 7341778 DATA -----
       1 /dev/disk/by-id/scsi-1FUJITSU_300000370108 (896) 6578704 DATA -----
  2. Generate the DFS configuration file with the pdfsmkconf command.

    # pdfsmkconf <Enter>
  3. Rename the generated configuration information file, changing the file system ID part to the logical file system name.

    # cd pdfsmkconf_out <Enter>
    # mv ./client.conf.1 client.conf.pdfs1 <Enter>

    Note

    The DFS configuration information file is created as pdfsmkconf_out/client.conf.fsid in the directory where the pdfsmkconf command is executed.

    Do not change any part of the DFS configuration information file name other than the file system ID part (leave the "client.conf." part as-is).

  4. Confirm that the user ID for executing MapReduce that was set in "6.1.3.7 Setting the User ID for Executing MapReduce" has been set properly in the DFS configuration information file.

    # cat ./client.conf.pdfs1 <Enter>
    FSID 1
    MDS master1 29000
    MDS master2 29000
    DEV /dev/disk/by-id/scsi-1FUJITSU_300000370107 0 7341778
    DEV /dev/disk/by-id/scsi-1FUJITSU_300000370108 0 6578704
    MAPRED mapred 

Point

The generated DFS configuration information files must be placed in /etc/pdfs on the DFS clients (slave servers, development servers, and collaboration servers).

When setting up the slave servers, development servers, and collaboration servers as explained later, distribute the DFS configuration information file from the master server to each of those servers (a sketch is shown below).
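
The following is a hedged sketch of one way to copy the file to each DFS client, following the scp style used earlier in this chapter (run from the pdfsmkconf_out directory created above; the actual distribution step is described in the later setup sections):

# for h in slave1 slave2 slave3 slave4 slave5 develop collaborate; do scp -p ./client.conf.pdfs1 root@${h}:/etc/pdfs/client.conf.pdfs1; done <Enter>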

See

Refer to the pdfsmkconf command under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for details of pdfsmkconf.