PRIMECLUSTER Concepts Guide 4.6
FUJITSU Software

3.2.3 Consideration of items during design

When designing cluster interconnects, it is important for users to consider the following:

3.2.3.1 Bandwidth

PRIMECLUSTER does not require much bandwidth for its own use. PRIMECLUSTER requires less than 0.002 Mbps on each of the cluster interconnects.
For this reason, there is no need to consider the bandwidth for the following conditions:

Refer to the table below as an example of bandwidth use. Suppose that, in the configuration shown in Figure 3.2 Typical four-node cluster, two 100 Mbps Ethernets are configured for the cluster interconnect. Assume that the available bandwidth for each cluster interconnect is 80 Mbps, and that the end-user application needs 36 Mbps on each node for the cluster file system and other activities. (This is an example. The actual bandwidth used by an application varies, depending on the application.)

Table 3.1 Example of cluster interconnects with two 100 Mbps Ethernet boards

Item                             Bandwidth     Total bandwidth
100 Mbps Ethernet x 2            80 Mbps       160 Mbps (= 2 Interconnects x 80 Mbps)
PRIMECLUSTER requirements        0.002 Mbps    0.016 Mbps (= 4 Nodes x 2 Interconnects x 0.002 Mbps)
User application requirements    36 Mbps       144 Mbps (= 36 Mbps x 4 Nodes)

Total use = (PRIMECLUSTER requirements + User application requirements) / Total bandwidth of 100 Mbps Ethernet boards x 100
= (0.016 + 144) / 160 x 100 = 90%

In this example, the combined workload uses just over 90 percent of the bandwidth available on the two fast-Ethernet interconnects.
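The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in Table 3.1; the function name and parameter names are hypothetical and are not part of any PRIMECLUSTER tool.

```python
def total_use_percent(nodes, interconnects, usable_mbps_per_interconnect,
                      app_mbps_per_node, pc_mbps_per_link=0.002):
    """Return total interconnect use as a percentage of usable bandwidth.

    pc_mbps_per_link is the PRIMECLUSTER requirement per node and per
    interconnect (0.002 Mbps in the example above).
    """
    total_bandwidth = interconnects * usable_mbps_per_interconnect
    primecluster = nodes * interconnects * pc_mbps_per_link
    application = nodes * app_mbps_per_node
    return (primecluster + application) / total_bandwidth * 100


# Values from the example: 4 nodes, 2 interconnects, 80 Mbps usable each,
# 36 Mbps of application traffic per node.
use_two = total_use_percent(nodes=4, interconnects=2,
                            usable_mbps_per_interconnect=80,
                            app_mbps_per_node=36)
print(f"Total use: {use_two:.2f}%")  # Total use: 90.01%
```

Substituting measured application bandwidth for the assumed 36 Mbps gives the figure for a real workload.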

Note

It is recommended that an initial installation have at least 30 percent of its bandwidth capacity available. As total use nears 100 percent, the latency of the cluster interconnect increases, which may cause a false detection of heartbeat failure.

For this example workload and configuration, one additional fast-Ethernet interconnect should be added to provide the excess capacity. The table below shows the same calculation with this addition.

Table 3.2 Example of cluster interconnects with three 100 Mbps Ethernet boards

Item                             Bandwidth     Total bandwidth
100 Mbps Ethernet x 3            80 Mbps       240 Mbps (= 3 Interconnects x 80 Mbps)
PRIMECLUSTER requirements        0.002 Mbps    0.024 Mbps (= 4 Nodes x 3 Interconnects x 0.002 Mbps)
User application requirements    36 Mbps       144 Mbps (= 36 Mbps x 4 Nodes)

Total use = (PRIMECLUSTER requirements + User application requirements) / Total bandwidth of 100 Mbps Ethernet boards x 100
= (0.024 + 144) / 240 x 100 = 60%

This new configuration gives a comfortable 40 percent available bandwidth margin, which means that required margins (30 percent or more) are secured. PRIMECLUSTER supports a maximum of four cluster interconnect devices. In the example above, triple redundant cluster interconnects are used.
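As a rough planning aid, the sizing reasoning in this section can be expressed as a search for the smallest interconnect count that leaves the recommended margin. The sketch below assumes the same formula as the tables, a 30 percent margin, and the four-device limit stated above; the function and constant names are hypothetical.

```python
MAX_INTERCONNECTS = 4  # maximum number of cluster interconnect devices supported


def min_interconnects(nodes, usable_mbps_per_interconnect, app_mbps_per_node,
                      margin_percent=30.0, pc_mbps_per_link=0.002):
    """Return (count, use_percent) for the smallest interconnect count that
    leaves at least margin_percent of bandwidth free, or None if even the
    maximum number of interconnects is not enough."""
    for count in range(1, MAX_INTERCONNECTS + 1):
        demand = nodes * count * pc_mbps_per_link + nodes * app_mbps_per_node
        use = demand / (count * usable_mbps_per_interconnect) * 100
        if 100 - use >= margin_percent:
            return count, use
    return None


result = min_interconnects(nodes=4, usable_mbps_per_interconnect=80,
                           app_mbps_per_node=36)
print(f"{result[0]} interconnects, {result[1]:.2f}% use")
# 3 interconnects, 60.01% use
```

A return value of None would indicate that the workload cannot meet the margin with 100 Mbps Ethernet at all, in which case faster interconnect hardware would need to be considered.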

3.2.3.2 Latency

As previously stated, PRIMECLUSTER relies on heartbeat requests and responses to determine that nodes or other resources are functional. When a heartbeat is not received within a preset interval, PRIMECLUSTER starts recovery actions. The Cluster Foundation (CF) software on each node sends a heartbeat request every 200 ms on each interconnect to every other node in the cluster. A heartbeat request is therefore sent up to 50 times, once every 200 ms, until the timeout period (default: 10 seconds) expires. If there is no response from the target node by then, CF marks that node as LEFTCLUSTER.

The 200 ms interval is a reasonable design value for the maximum latency of the cluster interconnects. It is long enough for a small message and its response to span transcontinental distances. The interval is fixed and cannot be changed.
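The relationship between the 200 ms interval and the 10-second default timeout can be illustrated with a short sketch. This is not CF's implementation: the function names, the "UP" label, and the choice to treat a silence of exactly the timeout length as a failure are assumptions made for illustration.

```python
HEARTBEAT_INTERVAL_MS = 200   # fixed heartbeat interval
TIMEOUT_MS = 10_000           # default timeout period (10 seconds)


def requests_before_timeout(interval_ms=HEARTBEAT_INTERVAL_MS,
                            timeout_ms=TIMEOUT_MS):
    """Number of heartbeat requests that fit into one timeout period."""
    return timeout_ms // interval_ms


def node_state(last_response_ms, now_ms, timeout_ms=TIMEOUT_MS):
    """Classify a node by how long it has been silent (illustrative only)."""
    silent_for = now_ms - last_response_ms
    return "LEFTCLUSTER" if silent_for >= timeout_ms else "UP"


print(requests_before_timeout())   # 50
print(node_state(0, 9_800))        # UP
print(node_state(0, 10_000))       # LEFTCLUSTER
```

Because roughly 50 requests go unanswered before a node is marked LEFTCLUSTER, a single delayed or lost heartbeat does not by itself trigger recovery.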

3.2.3.3 Reliability

Ethernet as an interconnect technology has not shown any problems with PRIMECLUSTER. The communications protocol used by PRIMECLUSTER is ICF. ICF guarantees that messages are delivered correctly and in order to its clients. However, ICF was designed with fairly reliable communications in mind. When the cluster interconnect is reliable, ICF has very low overhead, but when it is unreliable, the overhead of ICF increases. This is similar to other protocols like TCP/IP; errors in the cluster interconnect will result in messages being resent.

Note

  • Resending messages consumes bandwidth and also lengthens the response wait time. To avoid resending messages, it is important to use a highly reliable cluster interconnect.

  • An Ethernet error rate greater than 1 error per 1,000,000 bytes indicates that there is some problem with the Ethernet layer that should be investigated. (Use the command netstat(1) or ip(1) to find the error rate.)
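The error-rate threshold in the note can be checked with simple arithmetic once the error and byte counters for an interface have been read from the system tools. The sketch below takes the counters as plain numbers; the function name and the sample values are hypothetical, and how the counters are collected is left to the administrator.

```python
BYTES_PER_ALLOWED_ERROR = 1_000_000  # threshold: 1 error per 1,000,000 bytes


def exceeds_error_threshold(errors, bytes_transferred):
    """Return True if the error rate is greater than 1 error per
    1,000,000 bytes. Integer arithmetic avoids rounding issues."""
    return errors * BYTES_PER_ALLOWED_ERROR > bytes_transferred


# Hypothetical counter values for two interfaces.
print(exceeds_error_threshold(errors=3, bytes_transferred=5_000_000))   # False
print(exceeds_error_threshold(errors=12, bytes_transferred=5_000_000))  # True
```

Comparing counters taken at two points in time, rather than totals since boot, gives a better picture of the current state of the link.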

3.2.3.4 Device interface (Solaris)

PRIMECLUSTER depends on the DLPI (Data Link Provider Interface) for devices in Solaris. If a device does not support a DLPI interface, PRIMECLUSTER does not recognize the device as eligible for use as a cluster interconnect. In addition, the device must appear to be an Ethernet device. Some devices support TCP/IP, but are not Ethernet-type devices. Keep in mind that PRIMECLUSTER does not use TCP/IP for its cluster interconnects; rather it uses the Ethernet protocols.

3.2.3.5 Security

With the PRIMECLUSTER family of products, it is assumed that the cluster interconnects are private networks; however, it is possible to use public networks as cluster interconnects because ICF does not interfere with other protocols running on the physical media. The security model for running PRIMECLUSTER depends on physical separation of the cluster interconnect networks.

Note

For reasons of security, it is strongly recommended not to use public networks for the cluster interconnects.

The use of public networks for the cluster interconnects allows any machine on that public network to join the cluster (assuming that it is installed with the PRIMECLUSTER products). Once joined, an unauthorized user, through the node, would have full access to all cluster services.