3.2.2 Design the Server Configuration

When designing the server configuration, make the following decisions for each server type.

Environment in which to install the servers

Decide whether to install this product on a physical environment or a virtual environment.

When installing on a virtual environment, install all of the master servers (primary and secondary), slave servers, and development servers on the virtual environment.

Install the virtualization software and build the virtual environment beforehand.

Note

Except for collaboration servers, a server configuration cannot mix servers installed on physical environments with servers installed on virtual environments.

See

Refer to the manuals of the server virtualization software you are using for information on how to build virtual environments.

Master server configuration

Decide whether to use a replicated configuration for the master servers.

This product supports a replicated configuration for the master server (a 1:1 active/standby HA cluster configuration). A replicated configuration requires two master servers (a primary and a secondary). If a replicated configuration is not necessary, you can configure the system with only one master server.

A replicated configuration is recommended because it resolves the single point of failure issue in Hadoop.

Note

When installing on a virtual environment (VMware)

Do not deploy a virtual machine on which the master server is installed to a VMware HA cluster. In addition, such a virtual machine cannot use features such as VMware vMotion.

Refer to "Appendix H Using PRIMECLUSTER in a VMware Environment" in the "PRIMECLUSTER Installation and Administration Guide 4.3" for points to note when using VMware.

Slave server configuration

Decide how many slave servers to use. This product allows you to scale out by adding slave servers. The time required for processing depends on factors such as the number of slave servers, the Hadoop applications, and the volume and characteristics of the data to be processed, so it is recommended that you estimate the required number of servers by performing prototype tests before using the system in a production environment. In addition, determine the maximum number of slave servers, allowing for any future expansion.
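As a rough illustration of turning a prototype test into a slave server count, the following minimal sketch assumes (hypothetically) that processing time grows in proportion to data volume and shrinks in proportion to the number of slave servers. Real Hadoop jobs rarely scale this cleanly, so the function takes a hypothetical efficiency factor; the function and parameter names are illustrative only and are not part of this product.

```python
import math


def estimate_slave_count(prototype_slaves: int,
                         prototype_seconds: float,
                         target_seconds: float,
                         data_growth: float = 1.0,
                         efficiency: float = 1.0) -> int:
    """Rough slave server estimate from a single prototype measurement.

    Hypothetical model: processing time grows in proportion to data volume
    and shrinks in proportion to the number of slave servers, scaled by an
    efficiency factor in (0, 1] because real jobs rarely scale perfectly.
    """
    if prototype_slaves < 1 or prototype_seconds <= 0 or target_seconds <= 0:
        raise ValueError("prototype size and times must be positive")
    if data_growth <= 0:
        raise ValueError("data_growth must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Total work in "slave-seconds", scaled to the expected production data volume.
    work = prototype_slaves * prototype_seconds * data_growth
    # Number of slave servers needed to finish that work within the target time.
    return math.ceil(work / (target_seconds * efficiency))
```

For example, if 4 slave servers process the prototype data in 3600 seconds, the production data is expected to double, and the target is 900 seconds, `estimate_slave_count(4, 3600, 900, data_growth=2.0)` returns 32; with `efficiency=0.5` it returns 64. Treat the result only as a starting point for further prototype tests.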

Point

A maximum of 128 master servers, slave servers, development servers, and collaboration servers can be set up.
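The configuration rules stated in this section can be checked together before installation. The following minimal sketch assumes a hypothetical plan format, a list of (role, environment) pairs, and assumes that the 128-server limit applies to the combined total of all server types. It checks three things: one master server (or two for a replicated configuration), the server limit, and that only collaboration servers use a different environment from the others.

```python
from collections import Counter

MAX_SERVERS = 128  # assumed to be the combined total of all server types
ROLES = {"master", "slave", "development", "collaboration"}
ENVIRONMENTS = {"physical", "virtual"}


def check_plan(servers: list[tuple[str, str]], replicated: bool) -> list[str]:
    """Return the problems found in a hypothetical (role, environment) plan."""
    problems = []
    for role, env in servers:
        if role not in ROLES or env not in ENVIRONMENTS:
            problems.append(f"unknown role or environment: {role!r}, {env!r}")
    # A replicated configuration needs a primary and a secondary master server.
    roles = Counter(role for role, _ in servers)
    expected = 2 if replicated else 1
    if roles["master"] != expected:
        problems.append(f"expected {expected} master server(s), found {roles['master']}")
    if len(servers) > MAX_SERVERS:
        problems.append(f"{len(servers)} servers exceeds the maximum of {MAX_SERVERS}")
    # Only collaboration servers may use a different environment from the rest.
    envs = {env for role, env in servers if role != "collaboration"}
    if len(envs) > 1:
        problems.append("physical and virtual environments are mixed "
                        "outside the collaboration servers")
    return problems
```

An empty list means that the plan satisfies these rules; it does not replace the prototype tests recommended above.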

Number of machines in the virtual environment

Decide how many physical machines to use to host the virtual environment.

If many virtual machines are built on one physical machine, distributed processing performance may reach a ceiling, and you may not achieve the desired results. Where this ceiling lies depends on factors such as the Hadoop application and the volume and other characteristics of the data being processed. For this reason, perform validation using prototypes before starting actual operations, so that you can properly estimate the number of virtual machines per physical machine.
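One way to read the results of such prototype validation is to measure the same Hadoop job with different numbers of virtual machines per physical machine and stop where adding virtual machines no longer helps much. The following minimal sketch does this; the input format and the 5% cut-off (`min_gain`) are hypothetical choices for illustration, not values recommended by this product.

```python
def vms_per_host(measurements: dict[int, float], min_gain: float = 0.05) -> int:
    """Choose a virtual machine count per physical machine from prototype results.

    measurements maps "virtual machines per physical machine" to the measured
    throughput of the same Hadoop job (for example, GB processed per hour).
    Returns the last count reached before adding more virtual machines improves
    throughput by less than min_gain (a hypothetical relative cut-off).
    """
    if not measurements:
        raise ValueError("no measurements")
    if any(t <= 0 for t in measurements.values()):
        raise ValueError("throughput values must be positive")
    counts = sorted(measurements)
    best = counts[0]
    for prev, curr in zip(counts, counts[1:]):
        # Relative improvement from the previous measured count to this one.
        gain = (measurements[curr] - measurements[prev]) / measurements[prev]
        if gain < min_gain:
            break  # performance has reached its ceiling on this machine
        best = curr
    return best
```

For example, with measured throughputs of 100, 190, 260, 270, and 265 for 1 to 5 virtual machines, the gain from 3 to 4 virtual machines is under 4%, so the function returns 3.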

Create the virtual machines (guest OSs) on which this product is to be installed beforehand.

See

Refer to the manuals of the server virtualization software product for information on how to create virtual machines.

Note

When installing on a virtual environment (KVM)

The name of the virtual machine on which the master server is installed (the domain name of the guest OS) must match the host name of that virtual machine. For all virtual machines except the one on which the collaboration server is installed, specify virtual machine names that match the following parameters in bdpp.conf, which is specified during installation:

BDPP_PRIMARY_NAME (primary master server host name)
BDPP_SECONDARY_NAME (secondary master server host name)
BDPP_SERVER_NAME (slave server host name, development server host name)
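Before installation, the naming rules above can be checked for each guest. The following minimal sketch assumes that bdpp.conf consists of simple KEY=value lines (an assumption; see the installation procedure for its actual format) and that the domain name and host name have already been collected, for example from the KVM host and the guest. The function names and the sample names `master1`, `master2`, and `slave1` are hypothetical.

```python
def parse_bdpp_conf(text: str) -> dict[str, str]:
    """Parse KEY=value lines (assumed format); blank lines and # comments are skipped."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip().strip('"')  # tolerate quoted values
    return params


# bdpp.conf parameter that each role's virtual machine name must match.
ROLE_PARAM = {
    "primary": "BDPP_PRIMARY_NAME",
    "secondary": "BDPP_SECONDARY_NAME",
    "slave": "BDPP_SERVER_NAME",
    "development": "BDPP_SERVER_NAME",
}


def check_vm_name(role: str, vm_name: str, host_name: str,
                  conf: dict[str, str]) -> list[str]:
    """Check a KVM guest's domain name against its host name and bdpp.conf."""
    if role == "collaboration":
        return []  # the collaboration server's virtual machine is excluded
    if role not in ROLE_PARAM:
        raise ValueError(f"unknown role: {role}")
    problems = []
    # Master server rule: domain name of the guest OS must equal its host name.
    if role in ("primary", "secondary") and vm_name != host_name:
        problems.append(f"domain name {vm_name!r} differs from host name {host_name!r}")
    param = ROLE_PARAM[role]
    expected = conf.get(param)
    if expected is None:
        problems.append(f"{param} is not set in bdpp.conf")
    elif vm_name != expected:
        problems.append(f"domain name {vm_name!r} does not match {param}={expected!r}")
    return problems
```

Pass in the bdpp.conf that will be used when installing on that virtual machine; an empty result means that the names are consistent with the rules above.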