Sets a function that monitors whether communication is possible with the GS/SURE system that acts as the other end of communication in GS/SURE linkage mode. To set the monitoring target, use the "hanetobserv" command. See "7.5 hanetobserv Command" for how to set it. To set the monitoring interval, use the "hanetpoll" command. See "7.7 hanetpoll Command" for how to set it.
Note
GS/SURE linkage mode (operation mode "c") must be set before this setting is made.
If the local system runs in a cluster, node switching can occur when the GS/SURE system (remote host) stops. If none of the monitored remote systems defined with the "hanetobserv" command returns a response, the condition is recognized as a local NIC failure and the node is switched. Consequently, when all GS/SURE systems (remote hosts) stop operating, no monitored remote system responds, and an unnecessary node switch occurs. To avoid this, the operational node and the standby node can monitor each other to identify network failures, so that the node is not switched by mistake when all remote systems stop operating.
When operating in a cluster, use the "hanetobserv" command to set monitoring from both the operational node and the standby node. Because the remote node must be identifiable from both the operational node and the standby node, a takeover IP address must be used for the virtual IP address.
This section describes the transfer route error detection sequence.
In GS/SURE linkage mode, the ping command is issued to the real IP address of the target set with the remote host monitoring function and to the IP addresses of the other nodes in the cluster. The time it takes for an error to be detected is as follows:
Error detection time:
Error detection time = monitoring interval (in seconds) x (monitoring frequency - 1) + ping timeout period (*1)
*1: If the monitoring interval is 1 second, the ping timeout period is 1 second. Otherwise, the ping timeout period is 2 seconds.
With the default values, the error detection time is as follows:
5 sec x (5 times - 1) + 2 sec = 22 sec
Note that if the target detects an error first, it will determine that an error has occurred on the transfer route without waiting for the error detection by ping monitoring.
The settings for the error detection time can be changed by using the "hanetpoll" command. For more details on how to make settings, see "7.7 hanetpoll Command".
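The error detection formula above can be checked with a short calculation. The following is a minimal sketch only. The function names `ping_timeout` and `error_detection_time` are hypothetical and are not part of Redundant Line Control function. The default arguments (5 seconds, 5 times) are taken from the default example above.

```python
def ping_timeout(interval_sec: int) -> int:
    """Ping timeout period per note (*1): 1 s if the interval is 1 s, else 2 s."""
    return 1 if interval_sec == 1 else 2


def error_detection_time(interval_sec: int = 5, count: int = 5) -> int:
    """Error detection time = interval x (count - 1) + ping timeout period.

    interval_sec: monitoring interval in seconds
    count:        monitoring frequency (number of times)
    """
    if interval_sec < 1 or count < 1:
        raise ValueError("interval_sec and count must be 1 or greater")
    return interval_sec * (count - 1) + ping_timeout(interval_sec)


if __name__ == "__main__":
    # Default values: 5 x (5 - 1) + 2 = 22
    print(error_detection_time())
    # 1-second interval: 1 x (5 - 1) + 1 = 5
    print(error_detection_time(interval_sec=1, count=5))
```

A calculation like this can be useful when choosing an interval and frequency for the "hanetpoll" command so that the resulting detection time fits the timeouts of applications that use the network.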
Figure 3.9 Transfer path error detection sequence
Information
Ping monitoring is performed at regular intervals (in seconds), so up to one monitoring interval can elapse between the time the monitoring destination fails and the time the next ping is sent. Therefore, it takes up to 27 seconds (22 seconds + 5 seconds by default) to detect a failure after it has occurred.
If applications monitor the network, configure their monitoring time so that they do not detect an error while Redundant Line Control function is switching the transfer route.
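The relationship between the moment of failure and the worst-case detection time can be illustrated with a simplified model. The sketch below assumes that pings are sent at exact multiples of the monitoring interval starting at time 0. This is an assumption made for illustration, not a description of the actual implementation, and the function name `detection_delay` is hypothetical.

```python
import math


def detection_delay(failure_time: float,
                    interval: float = 5.0,
                    count: int = 5,
                    timeout: float = 2.0) -> float:
    """Seconds from the failure until the error is detected.

    Simplified model: pings go out at t = 0, interval, 2 x interval, ...
    The first ping sent at or after the failure is the first one to fail.
    """
    first_failed_ping = math.ceil(failure_time / interval) * interval
    wait_for_next_ping = first_failed_ping - failure_time
    return wait_for_next_ping + interval * (count - 1) + timeout


if __name__ == "__main__":
    # Failure exactly when a ping is sent: no waiting, 22 seconds.
    print(detection_delay(0.0))
    # Failure halfway between two pings: 2.5 seconds of waiting is added.
    print(detection_delay(7.5))
    # Failure just after a ping: the delay approaches 27 seconds.
    print(detection_delay(5.001))
```

In this model the delay ranges from 22 seconds to just under 27 seconds with the default values, which matches the upper bound stated above.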
Note
If the ping command does not return within 30 seconds of being run, a hang-up is detected and it is determined that an error has occurred on the transfer route before the command is run again. The hang-up can be detected only when patch 914233-10 or later is applied in a Solaris 10 environment.
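The idea of treating a probe command that never returns as a failure can be sketched in general terms as follows. This is an illustration of the concept only, not how Redundant Line Control function is implemented. The function name `run_probe` and the result strings are hypothetical, and only the 30-second limit comes from the note above.

```python
import subprocess
import sys

HANG_LIMIT_SEC = 30  # limit taken from the note above


def run_probe(command, hang_limit=HANG_LIMIT_SEC):
    """Run one probe command and classify the result.

    Returns "ok" if the command exits with status 0, "failed" if it exits
    with a nonzero status, and "hang" if it does not finish within
    hang_limit seconds (the child process is killed in that case).
    """
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=hang_limit,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Stand-in commands are used here instead of ping so that the sketch
    # runs anywhere; a short limit keeps the demonstration quick.
    print(run_probe([sys.executable, "-c", "pass"]))
    print(run_probe([sys.executable, "-c", "import time; time.sleep(5)"],
                    hang_limit=0.5))
```

In a real check, the command list would hold the ping invocation for the platform in use. The ping options differ between operating systems, so they are left out here.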
This section describes the transfer route recovery detection sequence.
In GS/SURE linkage mode, the ping command is issued to the real IP address of the target set with the remote host monitoring function. After a transfer route error has been detected, Redundant Line Control function performs recovery monitoring by ping to check whether the transfer route has recovered. The time it takes for recovery to be detected is as follows:
Recovery detection time:
Recovery detection time = recovery monitoring interval (in seconds)
Note that if the target detects the recovery first, it will determine that the transfer route has recovered without waiting for the recovery detection by ping monitoring.
The settings for the recovery detection time can be changed by using the "hanetpoll" command. For more details on how to make settings, see "7.7 hanetpoll Command".
Figure 3.10 Transfer path recovery detection sequence