Make the following decisions when designing the configuration of the file system.
Estimate the total size of the file data to be stored in the DFS, and from that figure estimate the capacity that the DFS requires on the shared disk devices.
Area type | Estimation method |
---|---|
(a) File data area | Space required for file data |
(b) Metadata area | If file data area <= 1 TB: 100 GB |
(c) Update log area | Estimation is not required because it is contained within the metadata area. |
(d) Management partition | Constant value (no estimation required): 1 GB |
Total | (a)+(b)+(d) |
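The estimation in the table above can be sketched as a short calculation. This is only an illustration: the function name `estimate_dfs_capacity` is hypothetical, binary units (1 GB = 1024^3 bytes) are assumed, and because the table here gives a metadata size only for a file data area of 1 TB or less, the sketch asks the caller to supply the metadata size for anything larger.

```python
GB = 1024 ** 3  # assumption: binary units
TB = 1024 ** 4


def estimate_dfs_capacity(file_data_bytes, metadata_bytes=None):
    """Return the shared disk capacity in bytes for areas (a) + (b) + (d)."""
    if file_data_bytes <= 0:
        raise ValueError("file data size must be positive")
    if metadata_bytes is None:
        if file_data_bytes > TB:
            # The table above only covers file data areas up to 1 TB.
            raise ValueError("supply metadata_bytes for a file data area over 1 TB")
        metadata_bytes = 100 * GB  # (b) metadata area
    management_bytes = 1 * GB      # (d) management partition, constant
    # (c) update log area is contained in the metadata area, so it is not added.
    return file_data_bytes + metadata_bytes + management_bytes


if __name__ == "__main__":
    total = estimate_dfs_capacity(500 * GB)
    print(f"{total // GB} GB")
```

For a 500 GB file data area the sketch adds 100 GB of metadata and 1 GB for the management partition.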
Determine the shared disk device partitions to be used in the DFS based on the size estimated above, and check their device names.
From the partitions available, determine which are to be used as the management partition, the representative partition, and the file data partitions.
Example
Examples of the partitions for the areas that comprise the file systems:
- Management partition: /dev/disk/by-id/scsi-1FUJITSU_300000370105
- Representative partition: /dev/disk/by-id/scsi-1FUJITSU_300000370106
- File data partition: /dev/disk/by-id/scsi-1FUJITSU_300000370107
Point
Separate the metadata area and file data area and allocate them to separate partitions. The management partition also needs a separate partition.
The metadata area and file data area of the DFS can be deployed to a single partition, but by deploying them to separate partitions, the I/O distribution can result in improved throughput.
If the DFS is constructed with multiple partitions, representative partitions are used as the partitions for the metadata area.
A maximum of 256 partitions can be used.
A by-id name generated by the udev function is used for shared disk device names (refer to "6.1.3.1 Checking Shared Disk Settings" for details).
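The points above can be checked against a planned layout before the file system is created. The sketch below is an illustration only: `check_layout` is a hypothetical helper, not part of the product, and it assumes that the 256-partition limit counts the representative partition together with the file data partitions, which the text does not state explicitly.

```python
BY_ID_PREFIX = "/dev/disk/by-id/"
MAX_PARTITIONS = 256  # assumption: representative + file data partitions


def check_layout(management, representative, file_data):
    """Return a list of problems found in a planned partition layout."""
    problems = []
    devices = [management, representative, *file_data]
    # Shared disk devices are referred to by their udev by-id names.
    for dev in devices:
        if not dev.startswith(BY_ID_PREFIX):
            problems.append(f"not a by-id name: {dev}")
    # Management, metadata (representative) and file data areas
    # should each be on a separate partition.
    if len(set(devices)) != len(devices):
        problems.append("a partition is assigned to more than one area")
    if 1 + len(file_data) > MAX_PARTITIONS:
        problems.append("more than 256 partitions in the file system")
    return problems


if __name__ == "__main__":
    print(check_layout(
        "/dev/disk/by-id/scsi-1FUJITSU_300000370105",
        "/dev/disk/by-id/scsi-1FUJITSU_300000370106",
        ["/dev/disk/by-id/scsi-1FUJITSU_300000370107"],
    ))
```

An empty list means that none of the checks in this sketch found a problem; it does not replace the checks described in "6.1.3.1 Checking Shared Disk Settings".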
Information
Deploying the metadata area and file data area to separate partitions distributes the I/O processes to each area, thus avoiding conflicts.
Specify separation of the file data area using pdfsmkfs with the dataopt option when creating the file system.
If it is likely that the DFS size will be extended, allow for future extensions by estimating the maximum extended size when creating the file system.
Specify the maximum size using pdfsmkfs with the maxdsz option when creating the file system.
See
Refer to "pdfsmkfs" in the "Appendix A Command Reference" of the "Primesoft Distributed File System for Hadoop V1 User's Guide" for information on the size of the file data area.
The blocksz option of the pdfsmkfs command can be used to specify the data block size when the DFS is created. Specifying a data block size allows contiguous areas of the shared disk device to be allocated, which makes I/O processing more efficient.
A data block size of 8 MB is recommended if the DFS is used by Hadoop.
Note
If 8 MB is specified as the data block size, an 8 MB area is used on the shared disk even if the file size is less than 8 MB.
If a large number of small files will be stored, give priority to space efficiency and do not specify a data block size.
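The trade-off in this note can be illustrated with a small calculation. The sketch assumes that disk space is always consumed in whole data blocks, so a file occupies its size rounded up to a multiple of the block size; the text above confirms this only for files smaller than one block. The helper name `allocated_bytes` and the sample workload are hypothetical.

```python
KB = 1024
MB = 1024 ** 2


def allocated_bytes(file_size, block_size):
    """Space consumed by one file if space is handed out in whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


if __name__ == "__main__":
    # Hypothetical workload: 10,000 files of 100 KB each.
    files = [100 * KB] * 10_000
    for block_size in (8 * KB, 8 * MB):
        used = sum(allocated_bytes(size, block_size) for size in files)
        print(f"block size {block_size // KB} KB: {used / MB:.1f} MB used")
```

With an 8 MB block size every one of these small files occupies a full 8 MB, whereas with an 8 KB block size each occupies 104 KB.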
See
Refer to "3.3.2.1 Relationship between File System Size, Data Block Size and Maximum File Size" for the relationship between the data block size, the maximum file system size, and the maximum file size.
If a file system is created without specifying a data block size value, the data block size is calculated automatically on the basis of the file data area size or the maximum size of the partitions comprising the file system. The greater the file system size, the greater the data block size. The greater the data block size, the greater the maximum file size.
The table below shows the relationship between the file system size, the data block size and the maximum file size when a file system is created without the data block size being specified.
File system size | Data block size | Maximum file size |
---|---|---|
Up to 1 TB | 8 KB | 1 TB - 8 KB |
(1 TB + 1 Byte) to 2 TB | 8 KB to 16 KB | (1 TB - 8 KB) to (2 TB - 16 KB) |
(2 TB + 1 Byte) to 4 TB | 8 KB to 32 KB | (1 TB - 8 KB) to (4 TB - 32 KB) |
(4 TB + 1 Byte) to 8 TB | 8 KB to 64 KB | (1 TB - 8 KB) to (8 TB - 64 KB) |
(8 TB + 1 Byte) to 16 TB | 16 KB to 128 KB | (2 TB - 16 KB) to (16 TB - 128 KB) |
(16 TB + 1 Byte) to 32 TB | 32 KB to 256 KB | (4 TB - 32 KB) to (32 TB - 256 KB) |
(32 TB + 1 Byte) to 64 TB | 64 KB to 512 KB | (8 TB - 64 KB) to (64 TB - 512 KB) |
(64 TB + 1 Byte) to 128 TB | 128 KB to 1 MB | (16 TB - 128 KB) to (128 TB - 1 MB) |
(128 TB + 1 Byte) to 256 TB | 256 KB to 1 MB | (32 TB - 256 KB) to (128 TB - 1 MB) |
(256 TB + 1 Byte) to 512 TB | 512 KB to 1 MB | (64 TB - 512 KB) to (128 TB - 1 MB) |
(512 TB + 1 Byte) to 1 PB | 1 MB | 128 TB - 1 MB |
(1 PB + 1 Byte) to 2 PB | 2 MB | 256 TB - 2 MB |
Note
The file system size is not just the file data area size. It also includes the sizes of areas such as the metadata area and the update log area. Therefore, near the boundary values for file system sizes in the above table, the data block size values might be one step smaller.
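The values in the table follow a regular pattern: each maximum file size equals the data block size multiplied by 2^27, minus one data block. This is an observation drawn from the listed values, not a formula stated in this guide, and the name `max_file_size` below is hypothetical. A minimal sketch that reproduces the table entries under that assumption:

```python
KB = 1024
MB = 1024 ** 2
TB = 1024 ** 4

# Inferred from the table: 1 TB / 8 KB = 2**27 blocks.
BLOCKS_PER_FILE = 2 ** 27


def max_file_size(block_size):
    """Maximum file size implied by the table for a given data block size."""
    return block_size * BLOCKS_PER_FILE - block_size


if __name__ == "__main__":
    for block_size in (8 * KB, 128 * KB, 1 * MB, 2 * MB):
        size = max_file_size(block_size)
        whole_tb = (size + block_size) // TB
        print(f"{block_size // KB} KB -> {whole_tb} TB - {block_size // KB} KB")
```

The limits in "B.3 Limit Values" remain authoritative; the sketch only covers data block sizes that appear in the table.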
See
Refer to "B.3 Limit Values" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for the file system maximum values.
The file system data block size can be changed during file system creation. Refer to pdfsmkfs under "Appendix A Command Reference" in the "Primesoft Distributed File System for Hadoop V1 User's Guide" for details.