PRIMECLUSTER Concepts Guide 4.3
FUJITSU Software

3.2.3 Properties

When designing cluster interconnects, it is important for users to consider the following:

3.2.3.1 Bandwidth

PRIMECLUSTER requires very little bandwidth for its own use: less than 0.002 Mbps on each of the cluster interconnects.
For this reason, there is no need to consider the bandwidth for the following conditions:

Refer to the table below as an example of bandwidth use. Suppose that the configuration shown in "Figure 3.2 Typical four-node cluster" has two 100 Mbps Ethernets configured for the cluster interconnect. Assume that the available bandwidth for each cluster interconnect is 80 Mbps, and that the end-user application needs 36 Mbps on each node for the cluster file system and other activities. (This is only an example; the actual bandwidth used varies depending on the application.)

Table 3.1 Example of interconnects with two 100 Mbps Ethernet boards

Item                            Bandwidth     Total bandwidth
------------------------------  ------------  -----------------------------------------------------
100 Mbps Ethernet x 2           80 Mbps       160 Mbps (= 2 Interconnects x 80 Mbps)
PRIMECLUSTER requirements       0.002 Mbps    0.016 Mbps (= 4 Nodes x 2 Interconnects x 0.002 Mbps)
User application requirements   36 Mbps       144 Mbps (= 36 Mbps x 4 Nodes)

Total use = (PRIMECLUSTER requirements + User application requirements) / Total bandwidth of the 100 Mbps Ethernets * 100 = (0.016 + 144) / 160 * 100 = 90.01%

In this example, the workload uses just over 90 percent of the available bandwidth of the two Fast Ethernet interconnects.
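The calculation above can be reproduced with a short script. The following is a minimal sketch; the function and parameter names are illustrative and are not part of any PRIMECLUSTER tool, and the input values are the assumptions stated for Table 3.1.

```python
def total_use_percent(nodes, interconnects, usable_mbps, pc_mbps, app_mbps_per_node):
    """Return interconnect use as a percentage of the total usable bandwidth.

    All names here are illustrative; the formula follows the worked example.
    """
    # Total usable bandwidth across all interconnects
    total_bandwidth = interconnects * usable_mbps
    # PRIMECLUSTER's own traffic: per node, per interconnect
    primecluster = nodes * interconnects * pc_mbps
    # Application traffic: per node
    application = nodes * app_mbps_per_node
    return (primecluster + application) / total_bandwidth * 100


# Values from Table 3.1: four nodes, two interconnects, 80 Mbps usable each
use = total_use_percent(
    nodes=4, interconnects=2, usable_mbps=80, pc_mbps=0.002, app_mbps_per_node=36
)
print(f"Total use: {use:.2f}%")  # Total use: 90.01%
```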

Note

It is recommended that an initial installation keep at least 30 percent of the bandwidth capacity available. As total use nears 100 percent, the latency of the interconnect increases and a heartbeat disconnection may be detected falsely.

For this example workload and configuration, one additional Fast Ethernet interconnect should be added to provide the excess capacity. The table below shows the same calculation with this addition.

Table 3.2 Example of interconnects with three 100 Mbps Ethernet boards

Item                            Bandwidth     Total bandwidth
------------------------------  ------------  -----------------------------------------------------
100 Mbps Ethernet x 3           80 Mbps       240 Mbps (= 3 Interconnects x 80 Mbps)
PRIMECLUSTER requirements       0.002 Mbps    0.024 Mbps (= 4 Nodes x 3 Interconnects x 0.002 Mbps)
User application requirements   36 Mbps       144 Mbps (= 36 Mbps x 4 Nodes)

Total use = (PRIMECLUSTER requirements + User application requirements) / Total bandwidth of the 100 Mbps Ethernets * 100 = (0.024 + 144) / 240 * 100 = 60.01%

This new configuration leaves approximately 40 percent of the bandwidth available, which is a comfortable margin. PRIMECLUSTER supports a maximum of eight interconnect devices.
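The sizing decision in this section (adding interconnects until at least 30 percent of the bandwidth remains available, up to the maximum of eight devices) can be sketched as a simple search. The function name and its parameters are hypothetical and serve only to illustrate the arithmetic; the input values are those assumed in the example.

```python
def min_interconnects(nodes, usable_mbps, pc_mbps, app_mbps_per_node,
                      headroom_percent=30.0, max_devices=8):
    """Return (count, use_percent) for the smallest interconnect count that
    keeps the required headroom, or None if max_devices is not enough.

    Illustrative helper only; not part of PRIMECLUSTER.
    """
    limit = 100.0 - headroom_percent
    for count in range(1, max_devices + 1):
        # PRIMECLUSTER traffic grows with the interconnect count;
        # application traffic does not.
        load = nodes * count * pc_mbps + nodes * app_mbps_per_node
        use = load / (count * usable_mbps) * 100
        if use <= limit:
            return count, use
    return None


result = min_interconnects(nodes=4, usable_mbps=80, pc_mbps=0.002,
                           app_mbps_per_node=36)
if result is not None:
    count, use = result
    print(f"{count} interconnects, {use:.2f}% total use")
```

With the example workload, the search stops at three interconnects, which matches Table 3.2.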

3.2.3.2 Latency

As previously stated, PRIMECLUSTER relies on heartbeat requests and responses to determine that nodes or other resources are functional. When a heartbeat is not received within a preset interval, PRIMECLUSTER starts recovery actions. The Cluster Foundation (CF) software on each node sends a heartbeat request every 200 ms on each interconnect to every other node in the cluster. The request is repeated every 200 ms, up to 50 times, until the timeout period (default: 10 seconds) elapses. If the target node has not responded by then, CF marks that node as LEFTCLUSTER.
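The relationship between the 200 ms interval, the 10-second default timeout, and the figure of 50 heartbeat requests can be checked with a few lines of arithmetic. This is only a sketch of the timing described above; the constant and function names are illustrative and do not correspond to CF configuration parameters.

```python
HEARTBEAT_INTERVAL_MS = 200    # fixed interval between heartbeat requests
DEFAULT_TIMEOUT_MS = 10_000    # default timeout period (10 seconds)


def heartbeats_before_timeout(timeout_ms=DEFAULT_TIMEOUT_MS,
                              interval_ms=HEARTBEAT_INTERVAL_MS):
    """Number of heartbeat requests that fit into the timeout period."""
    return timeout_ms // interval_ms


def requests_per_node_per_second(nodes, interconnects,
                                 interval_ms=HEARTBEAT_INTERVAL_MS):
    """Heartbeat requests one node sends per second: one per interval,
    on each interconnect, to every other node in the cluster."""
    return (nodes - 1) * interconnects * 1000 // interval_ms


print(heartbeats_before_timeout())         # 50
print(requests_per_node_per_second(4, 2))  # 30
```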

The 200 ms interval is a reasonable design value for the maximum latency in the cluster interconnects. It is long enough for a small message and its response to span transcontinental distances. The interval is fixed and cannot be changed.

3.2.3.3 Reliability

Ethernet as an interconnect technology has not shown any problems with PRIMECLUSTER. The communications protocol used by PRIMECLUSTER is ICF. ICF guarantees that messages are delivered to its clients correctly and in order. However, ICF was designed with fairly reliable communications in mind. When the cluster interconnect is reliable, ICF has very low overhead, but when it is unreliable, the overhead of ICF increases. This is similar to other protocols such as TCP/IP: errors in the interconnect result in messages being resent.

Note

  • Resending messages consumes bandwidth and increases latency and should be avoided at all times.

  • An Ethernet error rate greater than 1 error per 1,000,000 bytes indicates a problem in the Ethernet layer that should be investigated. (Use the command netstat(1) or ip(1) to find the error rate.)
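The threshold in the note above can be applied to interface counters as follows. This is a minimal sketch: it assumes that the error count and the byte count for an interface have already been read from the system tools, and the function names and sample counter values are hypothetical.

```python
# Threshold from the note: 1 error per 1,000,000 bytes
ERROR_THRESHOLD = 1 / 1_000_000


def error_rate(errors, bytes_transferred):
    """Return the number of errors per byte transferred."""
    if bytes_transferred <= 0:
        raise ValueError("bytes_transferred must be positive")
    return errors / bytes_transferred


def needs_investigation(errors, bytes_transferred):
    """True if the error rate exceeds 1 error per 1,000,000 bytes."""
    return error_rate(errors, bytes_transferred) > ERROR_THRESHOLD


# Hypothetical counter values for two interfaces
print(needs_investigation(errors=12, bytes_transferred=5_000_000))  # True
print(needs_investigation(errors=3, bytes_transferred=5_000_000))   # False
```

Some tools report errors per packet rather than per byte, so the counters may need to be converted before this comparison is meaningful.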

3.2.3.4 Device interface (Solaris)

Note

This section is for Solaris only.

PRIMECLUSTER depends on the DLPI (Data Link Provider Interface) for devices in Solaris. If a device does not support a DLPI interface, PRIMECLUSTER does not recognize the device as eligible for use as a cluster interconnect. In addition, the device must appear to be an Ethernet device. Some devices support TCP/IP, but are not Ethernet-type devices. Keep in mind that PRIMECLUSTER does not use TCP/IP for its cluster interconnects; rather it uses the Ethernet protocols. It is advisable to choose an interconnect device from the list of supported devices (refer to "Software Release Guide PRIMECLUSTER" and "PRIMECLUSTER Installation Guide").

3.2.3.5 Security

With the PRIMECLUSTER family of products, it is assumed that the cluster interconnects are private networks; however, it is possible to use public networks as cluster interconnects because ICF does not interfere with other protocols running on the physical media. The security model for running PRIMECLUSTER depends on physical separation of the cluster interconnect networks.

Note

For reasons of security, it is strongly recommended not to use public networks for the cluster interconnects.

The use of public networks for the cluster interconnects allows any machine on that public network to join the cluster (assuming that it is installed with the PRIMECLUSTER products). Once joined, an unauthorized user, through the node, would have full access to all cluster services.