The SF provides the interface for managing the shutdown of cluster nodes when error conditions occur. The SF also advises other PRIMECLUSTER products of the successful completion of node shutdown so that recovery operations can begin.
The SF is made up of the following major components:
The Shutdown Daemon (SD)
One or more Shutdown Agents (SA)
sdtool(1M) command
Shutdown Daemon (SD)
The SD is started at system boot time and is responsible for the following:
Monitoring the state of all cluster nodes
Monitoring the state of all registered SAs
Reacting to indications of cluster node failure and eliminating the nodes forcibly
Resolving split-brain conditions
Notifying other PRIMECLUSTER products that nodes were forcibly eliminated
Periodically (every 10 minutes) checking the route used to forcibly eliminate cluster nodes
The SD uses SAs to perform most of its work with regard to cluster node monitoring and forced node elimination. In addition to the SAs, the SD interfaces with the Cluster Foundation layer's ENS system to receive node failure indications and to advertise node elimination completion.
The SD starts each SA periodically (every 10 minutes) to check the route used to forcibly eliminate cluster nodes.
The SD reflects the result of this route check in the test status (Test State) of each SA, as displayed by the sdtool(1M) command.
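The periodic route check described above can be pictured with a minimal sketch. This is not the SD's implementation: the function name check_routes, the per-agent check callables, and the state strings "TestWorked" and "TestFailed" are assumptions chosen for illustration. Only the 10-minute interval comes from the text.

```python
from typing import Callable, Dict

# The 10-minute interval stated in the text, expressed in seconds.
CHECK_INTERVAL_SECONDS = 10 * 60


def check_routes(route_checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run each agent's route check once and return a test state per agent.

    route_checks maps a (hypothetical) agent name to a callable that returns
    True when the route used for forced elimination is usable.
    """
    states: Dict[str, str] = {}
    for agent, check in route_checks.items():
        try:
            ok = check()
        except Exception:
            # A check that raises is treated the same as a failed check.
            ok = False
        # Illustrative state strings (an assumption, not taken from the text).
        states[agent] = "TestWorked" if ok else "TestFailed"
    return states
```

A scheduler would call check_routes once every CHECK_INTERVAL_SECONDS and publish the returned states, which is the role the Test State column plays in the text.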
Shutdown Agents (SA)
The SA's role is to attempt to shut down a remote cluster node in a manner in which the shutdown can be guaranteed. Some SAs are shipped with the SF product, but the available SAs may differ depending on the architecture of the cluster node on which SF is installed. SF allows any PRIMECLUSTER service layer product to shut down a node whether RMS is running or not.
An SA is responsible for shutting down a cluster node and verifying the shutdown. Each SA uses a specific method to perform the node shutdown, for example:
SA_blade provides an SA for the Fujitsu Technology Solutions Blade servers.
SA_ipmi provides an SA for IPMI-based systems.
SA_lkcd provides an SA that uses the kernel panic status of other nodes.
SA_mmb provides an SA that uses the management board (MMB) on PRIMEQUEST nodes.
SA_icmp provides an SA that checks whether a node to be stopped is in the active or inactive state by using a network route.
SA_vmchkhost provides an SA for systems that use the KVM virtual machine function.
SA_libvirtgp and SA_libvirtgr provide SAs for systems that use the KVM virtual machine function.
See "7.2 Available SAs" for more information on SAs.
If more than one SA is used, the first SA in the configuration file is used as the primary SA. The SD always uses the primary SA. The remaining secondary SAs are used as fallback SAs only if the primary SA fails for some reason.
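The primary and fallback ordering can be sketched as follows. This is a minimal illustration under stated assumptions, not the SD's actual logic: the ShutdownAgent type, the eliminate_node function, and the agent names in the usage lines are all hypothetical. Each agent is modeled as a callable that returns True when it has shut down the target node and verified the shutdown.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for an SA: a name plus a callable that returns True
# when the target node was shut down and the shutdown was verified.
ShutdownAgent = Tuple[str, Callable[[str], bool]]


def eliminate_node(node: str, agents: List[ShutdownAgent]) -> Optional[str]:
    """Try agents in configuration order and return the name of the first
    agent that succeeds, or None if every agent fails."""
    for name, shutdown in agents:
        try:
            if shutdown(node):
                return name
        except Exception:
            # An agent that raises counts as failed; fall back to the next one.
            continue
    return None


# Usage: the first entry plays the primary SA, later entries are fallbacks.
def failing_agent(node: str) -> bool:
    return False


def working_agent(node: str) -> bool:
    return True


agents = [("primary_sa", failing_agent), ("secondary_sa", working_agent)]
used = eliminate_node("node1", agents)
```

In this run the primary agent fails, so used holds the name of the secondary agent. List order is the only thing that distinguishes primary from secondary, which mirrors the role of entry order in the configuration file.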
sdtool command
The sdtool(1M) command is the command line interface for interacting with the SD. With it the administrator can:
Start and stop the SD (although this is typically done with an RC script run at boot time)
View the current state of the SAs
Force the SD to reconfigure itself based on new contents of its configuration file
Dump the contents of the current SF configuration
Enable/disable SD debugging output
Eliminate a cluster node
Note
Although the sdtool(1M) command provides a cluster node elimination capability, the preferred method for controlled shutdown of a cluster node is the /sbin/shutdown command.