F.1.9 Cluster System Related Error

PRIMECLUSTER Global Disk Services Configuration and Administration Guide 4.1 (Solaris(TM) Operating System)

If one of the following cluster-system-related errors occurs, take the action described for that situation.

- The error message "ERROR: class: cannot operate in cluster environment, ..." is output, and no operation can be performed on the class class.
- The PRIMECLUSTER CF clinitreset(1M) command ends abnormally with error message #6675.
- Cluster applications become "Inconsistent".
- The GFS Shared File System is not mounted on node startup.

(1) The error message "ERROR: class: cannot operate in cluster environment, ..." is output, and no operation can be performed on the class class.

[Explanation]

A local class that was created while the cluster control facility was inactive cannot be used directly in a cluster system. When the cluster control facility is activated, the following message is output to the system log and to the GDS daemon log file, and the local class becomes unusable.

ERROR: class: cannot operate in cluster environment, created when cluster control facility not ready

This error message is output in the following cases:

- The local class class was created on a node whose cluster initial configuration was not yet complete, and the cluster initial configuration was executed afterwards.
- The local class class was created in single-user mode.
- The local class class was created on a standalone node that was later converted to a cluster system.

[Resolution]

Make the local class available in the cluster system by method a) or method b) below. Method a) is generally recommended; use method b) if you want to avoid backing up and restoring the volume data.

Method a) Re-creating the local class in the cluster system:

1) Boot the node in single-user mode.

2) Back up volume data if necessary.

3) Delete the class.

4) Boot the node in multi-user mode to reactivate the cluster control facility.

5) Re-create the class and volumes deleted in step 3).

6) Restore the volume data backed up in step 2) as needed.

Method b) Converting the local class to one for a cluster system:

Convert the local class to a class for a cluster system using the following procedure. The example assumes that the class name is Class1.

1) Boot the node in single-user mode.

ok boot -s

~
INIT:SINGLE USER MODE
Type control-d to proceed with normal startup,
(or give root password for system maintenance): password

2) Stop the GDS management daemon, sdxservd.

# /etc/opt/FJSVsdx/bin/sdx_stop -S

sfdsk: received shutdown request
sfdsk: volume status log updated successfully, class=0x40000004
#

Confirm that the sdxservd daemon has stopped, that is, that no sdxservd process is listed, as follows.

# ps -e | grep sdxservd
#
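The visual check above can also be made scriptable. The following is a minimal sketch, assuming a POSIX shell and a POSIX awk (on Solaris, /usr/xpg4/bin/awk or nawk rather than the old /usr/bin/awk); the function name sdxservd_listed is hypothetical and not part of GDS. It reads a ps -e listing on standard input, so it can be checked against saved output as well as against a live system.

```shell
#!/bin/sh
# Hypothetical helper, not part of GDS.
# Reads a "ps -e" listing (columns: PID TTY TIME CMD) on standard input.
# Exits 0 if a process whose command name is exactly "sdxservd" appears,
# and 1 otherwise. NR > 1 skips the header line.
sdxservd_listed() {
    awk 'NR > 1 && $NF == "sdxservd" { found = 1 } END { exit !found }'
}

# Example use before step 3):
#   ps -e | sdxservd_listed && { echo "sdxservd still running" >&2; exit 1; }
```

Comparing the whole last field, rather than grepping for a substring, avoids matching unrelated commands whose names merely contain "sdxservd".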

3) Back up the configuration database for the local class Class1.

# rm -rf /var/opt/FJSVsdx/backup/DB/Class1
# /etc/opt/FJSVsdx/bin/sdxcltrandb -B -c Class1

sdxsavedb: INFO: /dev/rdsk/c0t1d0s0: backup succeeded
sdxsavedb: INFO: /dev/rdsk/c1t1d0s0: backup succeeded
sdxsavedb: INFO: Class1: backup succeeded

# cd /var/opt/FJSVsdx/backup/DB/Class1
# ls -l

-rw-r--r--   1 root     other   14164992 May  6 09:00 c0t1d0s0
-rw-r--r--   1 root     other   14164992 May  6 09:00 c1t1d0s0

Verify that at least 150 MB of free space is available under /var/opt/FJSVsdx/backup/DB; if it is insufficient, expand it.
If an error occurs, do not proceed to the following steps; instead, re-create the local class Class1 using method a).
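The 150 MB free-space check in step 3) can be scripted as well. This is a minimal sketch, assuming POSIX "df -P -k" output (on Solaris, /usr/xpg4/bin/df supports -P; /usr/bin/df may not); the function name has_free_mb is hypothetical and not part of GDS.

```shell
#!/bin/sh
# Hypothetical helper, not part of GDS.
# Usage: has_free_mb DIR MIN_MB
# Exits 0 if the file system holding DIR has at least MIN_MB megabytes
# (MIN_MB * 1024 KB) available, and 1 otherwise (including when df fails).
has_free_mb() {
    dir=$1
    min_mb=$2
    # In POSIX "df -P -k" output, line 2 describes the file system and
    # field 4 is the available space in 1024-byte blocks.
    avail_kb=$(df -P -k "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] || return 1
    [ "$avail_kb" -ge $((min_mb * 1024)) ]
}

# Example use before running sdxcltrandb -B:
#   has_free_mb /var/opt/FJSVsdx/backup/DB 150 || echo "less than 150 MB free" >&2
```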

4) Convert the configuration database for the local class Class1 to that for a cluster system.

# /etc/opt/FJSVsdx/bin/sdxcltrandb -C -c Class1

sdxconvertdb: INFO: /dev/rdsk/c0t1d0s0: conversion succeeded
sdxconvertdb: INFO: /dev/rdsk/c1t1d0s0: conversion succeeded
sdxconvertdb: INFO: Class1: conversion succeeded

If an error occurs, restore the configuration database by performing step 7-3) and the subsequent steps, and then re-create the local class Class1 using method a).

5) Reboot the node in multi-user mode to reactivate the cluster control facility.

# init 0

~
SDX:sdxshutdown: ERROR: connection timeout
~

ok boot

~
Console Login:

The following messages are output during shutdown, but they do not indicate a problem.

SDX:sdxshutdown: INFO: waiting for a response from sdxservd daemon...
SDX:sdxshutdown: ERROR: connection timeout

6) Verify that the configuration database for the local class Class1 was converted successfully.

# sdxinfo -C -c Class1

OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    Node1           0

Confirm that the node identifier is displayed correctly in the SCOPE field. If it is, the procedure is complete.

If the SCOPE field is not displayed correctly, the Class1 configuration database was not converted successfully. In that case, restore the configuration database by performing step 7-1) and the subsequent steps, and then re-create the local class Class1 using method a).
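The SCOPE check in step 6), and again in step 7-5), can be automated by parsing the sdxinfo -C output shown above. This is a minimal sketch, assuming the column layout OBJ NAME TYPE SCOPE SPARE from the example and a POSIX awk; the function name scope_is is hypothetical, and Node1 is only the example node identifier, so substitute the real one.

```shell
#!/bin/sh
# Hypothetical helper, not part of GDS.
# Usage: sdxinfo -C -c CLASS | scope_is CLASS EXPECTED_NODE
# Exits 0 only if the "class" line for CLASS is present and its SCOPE
# column (field 4) equals EXPECTED_NODE; exits 1 otherwise.
scope_is() {
    class=$1
    node=$2
    awk -v c="$class" -v n="$node" '
        $1 == "class" && $2 == c { seen = 1; if ($4 == n) ok = 1 }
        END { exit !(seen && ok) }'
}

# Example use in step 6):
#   sdxinfo -C -c Class1 | scope_is Class1 Node1 || echo "SCOPE check failed" >&2
```

If the SCOPE column were empty, field 4 would hold the SPARE value instead, so the comparison fails as intended.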

7) If an error occurred in step 4) or 6), restore the configuration database backed up in step 3).

7-1) Boot the node in single-user mode.

ok boot -s

~
INIT:SINGLE USER MODE
Type control-d to proceed with normal startup,
(or give root password for system maintenance): password

7-2) Stop the GDS management daemon, sdxservd.

# /etc/opt/FJSVsdx/bin/sdx_stop -S

sfdsk: received shutdown request
sfdsk: volume status log updated successfully, class=0x40000004
#

Confirm that the sdxservd daemon has stopped, that is, that no sdxservd process is listed, as follows.

# ps -e | grep sdxservd
#

7-3) Restore the configuration database for the local class Class1.

# /etc/opt/FJSVsdx/bin/sdxcltrandb -R -c Class1

sdxrestoredb: INFO: /dev/rdsk/c0t1d0s0: restore succeeded
sdxrestoredb: INFO: /dev/rdsk/c1t1d0s0: restore succeeded
sdxrestoredb: INFO: Class1: restore succeeded

7-4) Reboot the node in single-user mode.

# init 0

~
SDX:sdxshutdown: ERROR: connection timeout
~

ok boot -s

~
INIT:SINGLE USER MODE
Type control-d to proceed with normal startup,
(or give root password for system maintenance): password

The following messages are output during shutdown, but they do not indicate a problem.

SDX:sdxshutdown: INFO: waiting for a response from sdxservd daemon...
SDX:sdxshutdown: ERROR: connection timeout

7-5) Verify that the configuration database for the local class Class1 was restored successfully.

# sdxinfo -C -c Class1

OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    Node1           0

Confirm that the node identifier is displayed correctly in the SCOPE field. If it is, the restoration is complete.

(2) The PRIMECLUSTER CF clinitreset(1M) command ends abnormally with error message #6675.

[Explanation]

If a class exists in the cluster system, initializing the PRIMECLUSTER resource database with the PRIMECLUSTER CF clinitreset command fails, and the following error message is output.

FJSVcluster: ERROR: clinitreset: 6675: Cannot run this command
because Global Disk Services has already been set up.

When a node containing a shadow class is rebooted because of an event such as a shutdown or panic, the shadow class is deleted, but the /dev/sfdsk/Class Name directory is not. If the clinitreset command is executed in this state, it also fails with the error message above.

[Resolution]

1) On all nodes in the cluster system, check the object configuration and delete any class that exists. Deleting a class destroys its volume data, so back up the volume data in advance if necessary.
- When using GDS Management View, see "Removals."
- When using commands, see "Command Reference."
2) On all nodes in the cluster system, check whether any class directory exists in the /dev/sfdsk directory, and delete any that does. The following example shows the case where the directory of class Class1 exists.

_adm and _diag are special files used by GDS; do not delete them.

# cd /dev/sfdsk
# ls
_adm _diag Class1
# rm -rf Class1
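Before running rm -rf on each node, it can help to list only the candidate class directories so that _adm and _diag are never touched. This is a minimal sketch, assuming a POSIX shell; the function name list_class_dirs and its optional directory argument are hypothetical (the argument exists only so the function can be tried on a scratch directory). It deliberately lists and does not delete.

```shell
#!/bin/sh
# Hypothetical helper, not part of GDS.
# Usage: list_class_dirs [DIR]   (DIR defaults to /dev/sfdsk)
# Prints the name of each subdirectory of DIR, skipping the GDS special
# entries _adm and _diag. Nothing is deleted; review the list first.
list_class_dirs() {
    dir=${1:-/dev/sfdsk}
    for path in "$dir"/*; do
        [ -d "$path" ] || continue          # only directories are class entries
        name=${path##*/}                    # strip the leading path
        case $name in
            _adm|_diag) continue ;;         # never touch GDS special files
        esac
        printf '%s\n' "$name"
    done
}
```

After confirming that every listed name is a leftover class directory, delete it as in the example above.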

(3) Cluster applications become "Inconsistent".

[Explanation]

If a shared class is not used as an RMS resource, the volumes in the class are started on node startup. If a cluster application that uses those volumes is then started, it becomes "Inconsistent" because the volumes are already active. By default, classes are not used as RMS resources. A class can be made available as an RMS resource in either of the following ways:

- Registering it as a resource used by a cluster application through the userApplication Configuration Wizard of Web-Based Admin View
- Specifying it in the hvgdsetup -a command

[Resolution]

Make the shared class available as an RMS resource with one of the following methods, and then restart the cluster application.

- If the class is not registered as a resource used by the cluster application, register it through the userApplication Configuration Wizard.
- Alternatively, execute the following command.

# /usr/opt/reliant/bin/hvgdsetup -a Class Name
~
Do you want to continue with these processes ? y

(4) The GFS Shared File System is not mounted on node startup.

[Explanation]

If a shared class is used as an RMS resource, the volumes in the class are not started on node startup, so the GFS Shared File System on those volumes is not mounted on node startup. By default, classes are not used as RMS resources, but a class can be made available as an RMS resource in either of the following ways:

- Registering it as a resource used by a cluster application through the userApplication Configuration Wizard of Web-Based Admin View
- Specifying it in the hvgdsetup -a command

[Resolution]

Take one of the following actions.

a) When using the shared class as an RMS resource, do not create the GFS Shared File System on volumes in that class; create it on volumes in a different class instead.
b) When not using the shared class as an RMS resource, make the class unavailable as an RMS resource again with one of the following methods, and then reboot the system.

- If the class is registered as a resource used by the cluster application, remove it through the userApplication Configuration Wizard.
- If the class is not registered as a resource used by the cluster application, execute the following command.

# /usr/opt/reliant/bin/hvgdsetup -d Class Name
~
Do you want to continue with these processes ? y
~
Do you need to start volumes in the specified disk class ? n
