Fault resource identification is a function that, when a failure occurs in a resource or node registered to a cluster application, outputs a message to syslogd(8) and Cluster Admin and records the failed resource in Resource Fault History.
After setting the initial configuration of the resource database, specify the settings for enabling fault resource identification and operator intervention request. An example of a message displayed by fault resource identification is shown below.
6750 A resource failure occurred. SysNode:node1RMS userApplication:app0 Resource:apl1
The operator intervention request function displays a query-format message to the operator if, when a cluster application is started, a failed resource or a node on which RMS has not been started is found. Operator intervention request messages are output to syslogd(8) and Cluster Admin.
1421 userApplication "app0" was not started automatically because all SysNodes that make up userApplication were not started within the prescribed time. Forcibly start userApplication in SysNode "node1RMS"? (no/yes) Message number: 1001 Warning: When userApplication is forcibly started, the safety check becomes disabled. If the operation is used incorrectly, data may be damaged and the consistency may be lost. Check that userApplication to be forcibly started is not online in the cluster before executing the forced startup.
See
For details on the messages displayed by the fault resource identification function and the messages displayed by the operator intervention request function, see "3.2 CRM View Messages" and "4.2 Operator Intervention Messages" in the "PRIMECLUSTER Messages."
This section describes procedures for operating fault resource identification and operator intervention request.
Note
After PRIMECLUSTER is installed, fault resource identification and operator intervention request are initially disabled. The following Cluster Admin functions are also disabled:
Messages for fault resource identification and operator intervention request are not displayed to Cluster Admin.
The list of resources that are currently affected by faults is not displayed in the Resource Fault History screen of Cluster Admin.
The fault history of the resources is not displayed in the Resource Fault History screen of Cluster Admin.
To view the manual pages of each command, add "/etc/opt/FJSVcluster/man" to the MANPATH variable.
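As a minimal sketch, the MANPATH change could be made for the current shell session as follows. Only the directory comes from the note above. Placing it first and keeping any existing MANPATH value after it is an assumption about the desired search order.

```shell
# Prepend the PRIMECLUSTER manual directory to MANPATH for this session.
# Any value MANPATH already had is kept after the new entry.
MANPATH="/etc/opt/FJSVcluster/man:${MANPATH}"
export MANPATH
```

To make the setting persistent, the same two lines would normally go in the login profile of the administrator account.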
Preparation prior to displaying fault resource identification and operator intervention request messages
The fault resource identification and operator intervention request messages are output through syslogd(8) / rsyslogd(8). These messages are logged with the priority (facility.level) daemon.err. For details on the priority, see the manual page for syslog.conf(5) / rsyslog.conf(5).
If the fault resource identification and operator intervention request messages need to be output to the console, execute the following procedure on all the nodes.
Log in to the node with system administrator privileges.
Check the setting of syslogd (/etc/syslog.conf) or rsyslogd (/etc/rsyslog.conf) to see that daemon.err is set to be displayed on the console. Follow the steps for your OS version.
RHEL5
Check the setting of syslogd in /etc/syslog.conf to see that daemon.err is set to be displayed on the console.
(Example) daemon.err is set to be displayed on the console.
daemon.err /dev/console
For further details on /etc/syslog.conf, see the manual pages of syslog.conf(5).
If daemon.err is not set to be displayed on the console, change the setting of syslogd in /etc/syslog.conf.
To enable this change, restart the system log daemon by executing the following command.
# /etc/init.d/syslog restart
RHEL6
Check the setting of rsyslogd in /etc/rsyslog.conf to see that daemon.err is set to be displayed on the console.
(Example) daemon.err is set to be displayed on the console.
daemon.err /dev/console
For further details on /etc/rsyslog.conf, see the manual pages of rsyslog.conf(5).
If daemon.err is not set to be displayed on the console, change the setting of rsyslogd in /etc/rsyslog.conf.
To enable this change, restart the system log daemon by executing the following command.
# /etc/init.d/rsyslog restart
Start the console.
If you are using a graphical environment, execute the following command to start the console. This step is not required in a text-based environment or in a remote session using SSH or Telnet.
# xterm -C
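The syslog check in the procedure above can be sketched as a small script. The function name `console_rule_present` and the matching pattern are illustrative assumptions, not part of PRIMECLUSTER. The pattern only recognizes an uncommented rule that names daemon.err explicitly, so broader selectors that also cover daemon.err (for example `*.err`) are not detected.

```shell
# Return 0 if the given syslog configuration file has an active
# (uncommented) rule whose selector includes daemon.err and whose
# action is /dev/console; return non-zero otherwise.
console_rule_present() {
    conf_file="$1"
    grep -Eq '^[[:space:]]*[^#[:space:]]*daemon\.err[^[:space:]]*[[:space:]]+/dev/console([[:space:]]|$)' "$conf_file"
}

# Check whichever file exists on this node:
# /etc/syslog.conf (RHEL5) or /etc/rsyslog.conf (RHEL6).
for conf in /etc/syslog.conf /etc/rsyslog.conf; do
    [ -f "$conf" ] || continue
    if console_rule_present "$conf"; then
        echo "$conf: daemon.err is sent to /dev/console"
    else
        echo "$conf: no daemon.err rule for /dev/console found"
    fi
done
```

If the script reports that no rule was found, add the `daemon.err /dev/console` line and restart the system log daemon as described above.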
Enabling the operation of fault resource identification and operator intervention request
Execute the "clsetparam" command to enable fault resource identification and operator intervention request. Execute this procedure on any one node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clsetparam -p AppWatch ON
Execute the "clsetparam" command, and check that the parameters are set so that the operation of fault resource identification and operator intervention request is enabled. Execute this procedure on any node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clsetparam -p AppWatch
ON
Restart all the operating nodes.
If a node is stopped, fault resource identification and operator intervention request begin operating on that node the next time it is started.
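The set-then-check sequence above could be wrapped in a helper like the following. This is a sketch under stated assumptions: the function name `set_appwatch` and the `CLSETPARAM` override variable are hypothetical, and the script assumes that running `clsetparam -p AppWatch` without a value prints the current setting (`ON` or `OFF`) on standard output.

```shell
# Path to the command; can be overridden through the environment.
CLSETPARAM="${CLSETPARAM:-/etc/opt/FJSVcluster/bin/clsetparam}"

# Set AppWatch to ON or OFF, then read the value back to confirm it.
set_appwatch() {
    want="$1"
    case "$want" in
        ON|OFF) ;;
        *) echo "usage: set_appwatch ON|OFF" >&2; return 2 ;;
    esac

    "$CLSETPARAM" -p AppWatch "$want" || return 1

    # Assumption: without a value, the command prints the current setting.
    current=$("$CLSETPARAM" -p AppWatch) || return 1

    if [ "$current" = "$want" ]; then
        echo "AppWatch is $current; restart all operating nodes to apply it."
    else
        echo "AppWatch is '$current', expected '$want'" >&2
        return 1
    fi
}
```

Running `set_appwatch ON` on one cluster node covers the two clsetparam steps. The node restart still has to be carried out separately.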
Disabling the operation of fault resource identification and operator intervention request
To cancel the setting previously made for "Enabling the operation of fault resource identification and operator intervention request", perform the following procedure:
Execute the "clsetparam" command to disable fault resource identification and operator intervention request. Execute this procedure on any one node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clsetparam -p AppWatch OFF
Execute the "clsetparam" command, and check that the parameters are set so that the operation of fault resource identification and operator intervention request is disabled. Execute this procedure on any node that is part of the cluster system.
# /etc/opt/FJSVcluster/bin/clsetparam -p AppWatch
OFF
Restart all the operating nodes.
If a node is stopped, fault resource identification and operator intervention request stop operating on that node the next time it is started.