7.5 Notes on Operation

This section provides notes on operating a PRIMECLUSTER system.

Do not stop RMS while RMS is being started

If RMS is stopped while it is starting, heartbeats between nodes are interrupted and the node on which RMS was stopped may be forcibly shut down.

Stop RMS only after its startup processing (the state transition processing of the cluster applications) has completed.

Use hvshut -a to stop RMS on all the nodes simultaneously

If the hvshut -l command is executed on all the nodes simultaneously, RMS may not stop, and the hvshut command may time out or hang.
To stop RMS on all the nodes, execute the hvshut -a command on any one of the nodes in the cluster system.
To stop RMS on an individual node, execute the hvshut -l command on that node.

If the hvshut -l command was mistakenly executed on all the nodes simultaneously and the hvshut command timed out, stop or restart all the nodes. If the hvshut command hangs, stop RMS forcibly using the hvshut -f command, and then stop or restart all the nodes.
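
As an illustration of the rule above, the following is a minimal sketch of a helper that prints the hvshut command matching the intended scope. The function name rms_stop_command and the scope labels all, local, and force are hypothetical and are not part of PRIMECLUSTER; only the hvshut options -a, -l, and -f are taken from this section. The helper only prints the command and does not execute it.

```shell
# Sketch: print the hvshut invocation for the intended scope.
# "all", "local" and "force" are hypothetical labels used only in this example.
rms_stop_command() {
    case "$1" in
        all)   echo "hvshut -a" ;;  # run on any ONE node; stops RMS on all the nodes
        local) echo "hvshut -l" ;;  # run on the node where RMS is to be stopped
        force) echo "hvshut -f" ;;  # forcible stop, for when hvshut has hung
        *)     echo "usage: rms_stop_command all|local|force" >&2
               return 1 ;;
    esac
}
```

For example, rms_stop_command all prints hvshut -a, which is then executed on one node only rather than executing hvshut -l on every node.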

Do not stop operating system services after stopping RMS

Even if RMS is stopped using the hvshut command, other PRIMECLUSTER services (CF, SF, CRM, and so on) continue to run.

Therefore, if you stop or restart operating system services to modify operating system information (such as network information), heartbeat monitoring by CF fails and an unexpected switchover occurs.

When modifying operating system information, be sure to do so after stopping all PRIMECLUSTER services (unloading CF) or in single-user mode.

Create cluster applications used in RMS before starting RMS

If RMS is started before any cluster application has been created, the error message (CML,14) is output and RMS does not start.

For an overview of cluster applications and the methods for creating them, see "Chapter 6 Building Cluster Applications."

If operating systems hang up or slow down on a node in a cluster, a healthy node may be forcibly stopped.

If the operating system hangs up or slows down on a node in a cluster due to system load and so on, CF or RMS detects LEFTCLUSTER and the Shutdown Facility forcibly stops the node.

The Shutdown Facility forcibly stops a node according to the survival priority. Therefore, if the operating system on the failed node recovers from the hang-up or slowdown before a healthy node has forcibly stopped it, the healthy node may be forcibly stopped first.

When a system volume on a disk device cannot be referred to because all paths failed in a SAN boot/iSCSI boot configuration, the PRIMECLUSTER failure detection function may not operate, depending on the status of the system.

Because a node that cannot refer to its system volume is unstable, place the node in the panic state using one of the following methods.

When you can log in to a cluster node other than the relevant node

Stop the relevant node using the sdtool command.

# sdtool -k <the relevant node>

When you cannot log in to any of the nodes

Set the node to panic status manually with one of the following methods.

Press <Alt> + <SysRq> + <C> on the system console.
Press the NMI button.

For details, see "Linux user guide."

When you start cluster applications manually or confirm the message of a resource failure, check whether a resource with the "MONITORONLY" attribute is in the fault state.

If you start or switch over cluster applications before the failure of the resource with the "MONITORONLY" attribute is resolved, cluster inconsistencies or data corruption may occur.

When Firewall is configured and its state module is used, do not restart the iptables service or the ip6tables service during PRIMECLUSTER operation.

When the state module is used in Firewall, restarting the iptables service or the ip6tables service initializes the communication status information, and subsequent communication may not work correctly. As a result, neither applications nor PRIMECLUSTER can work correctly. When you change the Firewall settings, perform one of the following operations instead:

Restart the cluster node.
Apply the change with iptables-restore or ip6tables-restore.
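
The second operation can be sketched as follows. This is a minimal example, not a prescribed procedure: the function name apply_rules is hypothetical, and the rules file paths in the usage note are the usual RHEL locations and should be confirmed in your environment. The --test option of iptables-restore and ip6tables-restore parses the rules without committing them, so a syntax error is caught before the running rule set is changed.

```shell
# Sketch: load saved firewall rules without restarting the iptables service.
# $1: restore command (iptables-restore or ip6tables-restore)
# $2: rules file (assumed to be in iptables-save format)
apply_rules() {
    restore_cmd="$1"
    rules_file="$2"
    if [ ! -r "$rules_file" ]; then
        echo "cannot read rules file: $rules_file" >&2
        return 1
    fi
    # Parse only (--test); nothing is committed if this step fails.
    "$restore_cmd" --test < "$rules_file" || return 1
    # Load the rules; the service itself is not restarted.
    "$restore_cmd" < "$rules_file"
}
```

Assuming the rules are saved in /etc/sysconfig/iptables and /etc/sysconfig/ip6tables, the function would be called as root as follows:

# apply_rules iptables-restore /etc/sysconfig/iptables
# apply_rules ip6tables-restore /etc/sysconfig/ip6tables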

The following error messages may be output to the console and syslog during system startup in a RHEL6 environment

The following messages may be output to the console and syslog during system startup in a RHEL6 environment. They do not affect operation.

kernel: Disabling lock debugging due to kernel taint
kernel: clonltrc: module license 'Proprietary' taints kernel.
kernel: symsrv: module license 'Proprietary' taints kernel.
kernel: symsrv: applying 16k kernel stack fix up
kernel: cf: module license 'Proprietary' taints kernel.
kernel: cf: applying 16k kernel stack fix up
kernel: sha: module license 'Proprietary' taints kernel.

The following error messages may be output to the console and syslog during system startup in a RHEL7 environment

The following messages may be output to the console and syslog during system startup in a RHEL7 environment. They do not affect operation.

kernel: Request for unknown module key 'FUJITSU Software: Fujitsu BIOS DB FJMW Certificate: Hexadecimal, forty-digit' err -11
kernel: Disabling lock debugging due to kernel taint
kernel: clonltrc: module license 'Proprietary' taints kernel.
kernel: clonltrc: module verification failed: signature and/or required key missing - tainting kernel
kernel: sfdsk_lib: module verification failed: signature and/or required key missing - tainting kernel
kernel: sha: module license 'Proprietary' taints kernel.
kernel: sha: module verification failed: signature and/or required key missing - tainting kernel
kernel: symsrv: module license 'Proprietary' taints kernel.
kernel: symsrv: applying kernel_stack fix up
kernel: symsrv: module verification failed: signature and/or required key missing - tainting kernel
kernel: cf: applying kernel_stack fix up
kernel: poffinhibit_ipdv: module verification failed: signature and/or required key missing - tainting kernel
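
When reviewing startup logs, it can help to separate the messages listed above from other kernel messages. The following is a minimal sketch under stated assumptions: the function name is_known_startup_message and the variable KNOWN_MODULES are hypothetical, the module names and message texts are copied from the RHEL6 and RHEL7 lists in this section, and the certificate message is matched by its fixed prefix only. It is not a PRIMECLUSTER tool.

```shell
# Sketch: return 0 if a log line is one of the startup messages listed above.
# Module names are those that appear in the lists in this section.
KNOWN_MODULES='clonltrc|symsrv|cf|sha|sfdsk_lib|poffinhibit_ipdv'

is_known_startup_message() {
    printf '%s\n' "$1" | grep -E -q \
        -e "kernel: (${KNOWN_MODULES}): module license 'Proprietary' taints kernel" \
        -e "kernel: (${KNOWN_MODULES}): module verification failed: signature and/or required key missing - tainting kernel" \
        -e "kernel: (${KNOWN_MODULES}): applying (16k kernel stack|kernel_stack) fix up" \
        -e "kernel: Disabling lock debugging due to kernel taint" \
        -e "kernel: Request for unknown module key 'FUJITSU Software: Fujitsu BIOS DB FJMW Certificate: "
}
```

Assuming syslog is written to /var/log/messages (the usual RHEL location), kernel messages that are not in the lists could be printed like this:

# grep 'kernel:' /var/log/messages | while IFS= read -r line; do is_known_startup_message "$line" || printf '%s\n' "$line"; done

Restricting the patterns to the listed module names is deliberate: a taint message from any other module is not covered by this note and is left visible.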