Interstage Big Data Parallel Processing Server V1.0.1 User's Guide
FUJITSU Software

3.3.2 File System Configuration Design

Make the following decisions when designing the configuration of the file system.


Capacity to be allocated to shared disk devices

Estimate the total size of the file data to be stored in the DFS, then use that figure to estimate the capacity that the DFS requires on the shared disk devices.

| Area type | Estimation method |
|---|---|
| (a) File data area | Space required for file data |
| (b) Metadata area | If file data area <= 1 TB: 100 GB; if file data area > 1 TB: 300 GB |
| (c) Update log area | Estimation is not required because it is contained within the metadata area. |
| (d) Management partition | Constant value (no estimation required): 1 GB |
| Total | (a)+(b)+(d) |
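The estimation table above can be sketched as a small calculation. This is only an illustration: the function name and the use of binary units (1 GB = 2^30 bytes, 1 TB = 2^40 bytes) are assumptions, not something the guide specifies.

```python
GB = 1024 ** 3  # assumption: binary units
TB = 1024 ** 4


def estimate_shared_disk_capacity(file_data_bytes: int) -> dict:
    """Return the per-area estimate and the total, in bytes.

    Hypothetical helper that mirrors the table: (a) + (b) + (d).
    The update log area (c) lives inside the metadata area, so it
    adds nothing to the total.
    """
    if file_data_bytes < 0:
        raise ValueError("file data size must not be negative")

    # (b) Metadata area: 100 GB up to and including 1 TB of file data, else 300 GB
    metadata = 100 * GB if file_data_bytes <= 1 * TB else 300 * GB
    # (d) Management partition: constant 1 GB
    management = 1 * GB

    return {
        "file_data": file_data_bytes,
        "metadata": metadata,
        "management": management,
        "total": file_data_bytes + metadata + management,
    }


estimate = estimate_shared_disk_capacity(5 * TB)
print(estimate["total"] // GB)  # 5421 (5120 GB + 300 GB + 1 GB)
```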


Partitions used

Determine the shared disk device partitions to be used in the DFS based on the size estimated above, and check the device name.

From the available partitions, decide which ones to use as the management partition, the representative partition, and the file data partitions.

Example

Examples of the partitions for the areas that comprise the file systems:

- Management partition:
  /dev/disk/by-id/scsi-1FUJITSU_300000370105

- Representative partition:
  /dev/disk/by-id/scsi-1FUJITSU_300000370106

- File data partitions:
  /dev/disk/by-id/scsi-1FUJITSU_300000370107
  /dev/disk/by-id/scsi-1FUJITSU_300000370108

Point

  • Separate the metadata area and file data area and allocate them to separate partitions. The management partition also needs a separate partition.

  • The metadata area and file data area of the DFS can be deployed to a single partition, but deploying them to separate partitions distributes the I/O and can improve throughput.
    If the DFS is constructed with multiple partitions, representative partitions are used as the partitions for the metadata area.

  • A maximum of 256 partitions can be used.

  • A by-id name generated by the udev function is used for shared disk device names (refer to "6.1.3.1 Checking Shared Disk Settings" for details).
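The points above can be turned into a quick sanity check of a partition plan before the file system is created. The sketch below is not part of the product. The function name is hypothetical, and it assumes that the 256-partition limit applies to the partitions that make up the file system (the representative partition plus the file data partitions), which the guide does not state explicitly.

```python
BY_ID_PREFIX = "/dev/disk/by-id/"
MAX_FS_PARTITIONS = 256  # assumption: counts representative + file data partitions


def check_partition_plan(management: str, representative: str, file_data: list) -> list:
    """Return a list of problems found in the plan (empty if none)."""
    problems = []
    fs_partitions = [representative, *file_data]
    all_partitions = [management, *fs_partitions]

    # Shared disk devices are referred to by their udev by-id names.
    for name in all_partitions:
        if not name.startswith(BY_ID_PREFIX):
            problems.append(f"not a by-id name: {name}")

    # Each role needs its own partition.
    if len(set(all_partitions)) != len(all_partitions):
        problems.append("a partition is assigned to more than one role")

    if len(fs_partitions) > MAX_FS_PARTITIONS:
        problems.append(f"more than {MAX_FS_PARTITIONS} file system partitions")

    return problems


# The example layout from this section passes the check.
plan_problems = check_partition_plan(
    management="/dev/disk/by-id/scsi-1FUJITSU_300000370105",
    representative="/dev/disk/by-id/scsi-1FUJITSU_300000370106",
    file_data=[
        "/dev/disk/by-id/scsi-1FUJITSU_300000370107",
        "/dev/disk/by-id/scsi-1FUJITSU_300000370108",
    ],
)
print(plan_problems)  # []
```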

Information

Deploying the metadata area and file data area to separate partitions distributes the I/O processes to each area, thus avoiding conflicts.
To separate the file data area, specify the dataopt option of pdfsmkfs when creating the file system.


Size of the file data area

If it is likely that the DFS size will be extended, allow for future extensions by estimating the maximum extended size when creating the file system.

Specify the maximum size using pdfsmkfs with the maxdsz option when creating the file system.

See

Refer to "pdfsmkfs" in the "Appendix A Command Reference" of the "Primesoft Distributed File System for Hadoop V1 User's Guide" for information on the size of the file data area.


Data block size

The blocksz option of the pdfsmkfs command can be used to specify the data block size during DFS creation. Specification of the data block size enables contiguous allocation of shared disk device area, which enables efficient input-output processing.

A data block size of 8 MB is recommended if the DFS is used by Hadoop.

Note

If 8 MB is specified as the data block size, an 8 MB area is used on the shared disk even if the file size is less than 8 MB.
If a large number of small files will be stored, give priority to space efficiency and do not specify a data block size.
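To see how much space small files can cost with a large data block size, the following sketch compares the data stored with the area allocated. It assumes that every non-empty file occupies a whole number of data blocks. The guide states this only for files smaller than one block, so the rounding for larger files is an assumption, as are the function names.

```python
KB = 1024
MB = 1024 * KB


def allocated_bytes(file_size: int, block_size: int) -> int:
    """Area used on the shared disk, rounded up to whole data blocks."""
    if file_size <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def space_efficiency(file_size: int, block_size: int) -> float:
    """Fraction of the allocated area that holds actual file data."""
    return file_size / allocated_bytes(file_size, block_size)


# A 64 KB file with an 8 MB data block size still uses a full 8 MB block.
print(allocated_bytes(64 * KB, 8 * MB) // MB)  # 8
print(space_efficiency(64 * KB, 8 * MB))       # 0.0078125
```

In this example less than 1% of the allocated area holds data, which is why the default block size is the better choice when most files are small.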

See

Refer to "3.3.2.1 Relationship between File System Size, Data Block Size and Maximum File Size" for the relationship between the data block size, the maximum file system size, and the maximum file size.


3.3.2.1 Relationship between File System Size, Data Block Size and Maximum File Size

If a file system is created without specifying a data block size value, the data block size is calculated automatically on the basis of the file data area size or the maximum size of the partitions comprising the file system. The greater the file system size, the greater the data block size. The greater the data block size, the greater the maximum file size.

The table below shows the relationship between the file system size, the data block size and the maximum file size when a file system is created without the data block size being specified.

Table 3.1 Relationship between file system size, data block size and maximum file size

| File system size | Data block size | Maximum file size |
|---|---|---|
| Up to 1 TB | 8 KB | 1 TB - 8 KB |
| (1 TB + 1 Byte) to 2 TB | 8 KB to 16 KB | (1 TB - 8 KB) to (2 TB - 16 KB) |
| (2 TB + 1 Byte) to 4 TB | 8 KB to 32 KB | (1 TB - 8 KB) to (4 TB - 32 KB) |
| (4 TB + 1 Byte) to 8 TB | 8 KB to 64 KB | (1 TB - 8 KB) to (8 TB - 64 KB) |
| (8 TB + 1 Byte) to 16 TB | 16 KB to 128 KB | (2 TB - 16 KB) to (16 TB - 128 KB) |
| (16 TB + 1 Byte) to 32 TB | 32 KB to 256 KB | (4 TB - 32 KB) to (32 TB - 256 KB) |
| (32 TB + 1 Byte) to 64 TB | 64 KB to 512 KB | (8 TB - 64 KB) to (64 TB - 512 KB) |
| (64 TB + 1 Byte) to 128 TB | 128 KB to 1 MB | (16 TB - 128 KB) to (128 TB - 1 MB) |
| (128 TB + 1 Byte) to 256 TB | 256 KB to 1 MB | (32 TB - 256 KB) to (128 TB - 1 MB) |
| (256 TB + 1 Byte) to 512 TB | 512 KB to 1 MB | (64 TB - 512 KB) to (128 TB - 1 MB) |
| (512 TB + 1 Byte) to 1 PB | 1 MB | 128 TB - 1 MB |
| (1 PB + 1 Byte) to 2 PB | 2 MB | 256 TB - 2 MB |

Note

The file system size is not just the file data area size. It also includes the sizes of areas such as the metadata area and the update log area. Therefore, near the boundary values for file system sizes in the above table, the data block size values might be one step smaller.
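Table 3.1 can be encoded as a lookup for planning purposes, as in the sketch below. Every row of the table is consistent with a maximum file size of (2^27 - 1) data blocks. That formula is an observation drawn from the table values, not a rule stated in this guide. The names and the binary units are also assumptions, and the lookup ignores the boundary effect described in the note above.

```python
KB, MB, TB, PB = 1024, 1024 ** 2, 1024 ** 4, 1024 ** 5

# (upper bound of file system size, smallest block size, largest block size)
TABLE_3_1 = [
    (1 * TB, 8 * KB, 8 * KB),
    (2 * TB, 8 * KB, 16 * KB),
    (4 * TB, 8 * KB, 32 * KB),
    (8 * TB, 8 * KB, 64 * KB),
    (16 * TB, 16 * KB, 128 * KB),
    (32 * TB, 32 * KB, 256 * KB),
    (64 * TB, 64 * KB, 512 * KB),
    (128 * TB, 128 * KB, 1 * MB),
    (256 * TB, 256 * KB, 1 * MB),
    (512 * TB, 512 * KB, 1 * MB),
    (1 * PB, 1 * MB, 1 * MB),
    (2 * PB, 2 * MB, 2 * MB),
]


def block_size_range(fs_size: int) -> tuple:
    """Return (min, max) automatic data block size for a file system size."""
    if fs_size <= 0:
        raise ValueError("file system size must be positive")
    for upper, smallest, largest in TABLE_3_1:
        if fs_size <= upper:
            return smallest, largest
    raise ValueError("file system size is beyond the range of Table 3.1")


def max_file_size(block_size: int) -> int:
    """Maximum file size implied by the table: (2**27 - 1) data blocks."""
    return block_size * (2 ** 27 - 1)


smallest, largest = block_size_range(3 * TB)
print(smallest // KB, largest // KB)            # 8 32
print(max_file_size(8 * KB) == 1 * TB - 8 * KB)  # True
```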

See

Refer to "B.3 Limit Values" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for the file system maximum values.

The file system data block size can be changed during file system creation. Refer to pdfsmkfs under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for details.