F.1.9 Cluster System Related Error

If a cluster system related error occurs, identify which of the following situations applies and take the action described for it.

(1) The error message "ERROR: class: cannot operate in cluster environment, ..." is output, and the operation cannot be conducted on the class class.
(2) The PRIMECLUSTER CF clinitreset(1M) command ends abnormally with error message 6675.
(3) Cluster applications become "Inconsistent."
(4) The GFS Shared File System is not mounted on node startup.
(5) The disk space of a file system on a shared disk is "Full" (100%).

(1) The error message "ERROR: class: cannot operate in cluster environment, ..." is output, and the operation cannot be conducted on the class class.

Explanation

A local class created while the cluster control facility was inactive cannot be used directly in a cluster system. When the cluster control facility is activated, the following message is output to the system log and the GDS daemon log file, and the local class becomes inoperable.

ERROR: class: cannot operate in cluster environment, created when cluster control facility not ready

This error message will be output when:

The cluster initial configuration was executed after the local class class had been created on a node on which that configuration was incomplete.
The local class class was created in single user mode.
The single node on which the local class class was created was changed over to a cluster system.

Resolution

Make the local class available in a cluster system by method a) or method b) below. Method a) is generally recommended; use method b) to avoid having to back up and restore the volume data.

Method a) Re-creating the local class in the cluster system:

1) Activate the node in single user mode.

2) Back up volume data if necessary.

3) Delete the class.

4) Re-activate the cluster control facility on the node booted in multi-user mode.

5) Re-create the class and volumes deleted in step 3).

6) Restore the volume data backed up in step 2) as needed.

Method b) Converting the local class to one for a cluster system:

Convert the local class to a class for a cluster system using the following procedure. The example below assumes that the class name is Class1.

Note

You need to delete and re-create the classes which were created while the cluster control facility was active.

1) Confirm the classes to be converted or re-created for each node.

# sdxinfo -C
OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    (local)     0
class  Class2  local    Node1       0
class  Class3  shared   Node1:Node2 0

Classes which should be converted:
Classes whose TYPE field is "local" and whose SCOPE field is "(local)".
In the above example, Class1 meets the definition.

Classes which should be re-created:
Classes whose SCOPE field shows a proper class scope (node identifier).
In the above example, Class2 and Class3 meet the definition.
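
The classification rule above can be sketched as a small filter over the sdxinfo -C listing. This is only an illustration, assuming a POSIX shell and a POSIX awk; classify_classes is a hypothetical helper name, not a GDS command, and the sample input is the listing shown in step 1).

```shell
# classify_classes (hypothetical helper): reads `sdxinfo -C` output on stdin
# and prints, for each class line, whether it should be converted or re-created.
classify_classes() {
    awk '$1 == "class" {
        if ($3 == "local" && $4 == "(local)")
            print $2 ": convert"      # TYPE local, SCOPE (local)
        else
            print $2 ": re-create"    # SCOPE already shows a node identifier
    }'
}

# Sample input copied from the listing above.
# On a real node the input would come from: sdxinfo -C | classify_classes
classify_classes <<'EOF'
OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    (local)     0
class  Class2  local    Node1       0
class  Class3  shared   Node1:Node2 0
EOF
```

For the sample listing, Class1 is reported as "convert" and Class2 and Class3 as "re-create".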

2) Back up the volume data of the class to be re-created as needed.

3) Delete the class to be re-created.

4) Boot the node, on which the local class Class1 to be converted exists, in single user mode.

ok boot -s

5) Stop the GDS management daemon, sdxservd, on the node where the local class Class1 exists.

# /etc/opt/FJSVsdx/bin/sdx_stop -S
sfdsk: received shutdown request
sfdsk: volume status log updated successfully, class=0x40000004
#

Confirm that the sdxservd daemon has stopped by checking that the following command displays no sdxservd process.

# ps -e | grep sdxservd
#
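
When this check is scripted rather than read by eye, an exit status is easier to act on than empty output. The following is a minimal sketch, assuming a POSIX shell and awk and assuming that the command name is the last column of the ps -e listing; sdxservd_stopped is a hypothetical helper name.

```shell
# sdxservd_stopped (hypothetical helper): reads a `ps -e` listing on stdin and
# succeeds (exit status 0) only when no line has sdxservd as its command name.
sdxservd_stopped() {
    awk '$NF == "sdxservd" { found = 1 } END { exit found ? 1 : 0 }'
}

# Intended use on the node (not executed here):
#   ps -e | sdxservd_stopped && echo "sdxservd is stopped"
```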

6) Back up the configuration database on the node where the local class Class1 exists.

# rm -rf /var/opt/FJSVsdx/backup/DB/Class1
# /etc/opt/FJSVsdx/bin/sdxcltrandb -B -c Class1
sdxsavedb: INFO: /dev/rdsk/c0t1d0s0: backup succeeded
sdxsavedb: INFO: /dev/rdsk/c1t1d0s0: backup succeeded
sdxsavedb: INFO: Class1: backup succeeded

# cd /var/opt/FJSVsdx/backup/DB/Class1
# ls -l
-rw-r--r--   1 root     other   14164992 May  6 09:00 c0t1d0s0
-rw-r--r--   1 root     other   14164992 May  6 09:00 c1t1d0s0

Note

Verify that there is at least 150 [MB] of free space under /var/opt/FJSVsdx/backup/DB, and if it is insufficient, expand it.
If an error occurs, perform the following procedure instead of proceeding to the subsequent steps:
After re-creating the local class Class1 according to method a), re-create the class deleted in step 3) and restore data.
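
The free-space check in the note can be sketched as follows. This assumes a POSIX shell and awk and the usual df -k column layout, in which the available kilobytes are the third field from the end of the last line; has_free_mb is a hypothetical helper name.

```shell
# has_free_mb (hypothetical helper): reads `df -k <directory>` output on stdin
# and succeeds when the available space is at least the given number of MB.
# Counting fields from the end also copes with df wrapping a long device
# name onto a line of its own.
has_free_mb() {
    awk -v need_mb="$1" '
        NF >= 3 { avail_kb = $(NF - 2) }
        END { exit !(avail_kb >= need_mb * 1024) }'
}

# Intended use on the node (not executed here):
#   df -k /var/opt/FJSVsdx/backup/DB | has_free_mb 150 || echo "expand the file system first"
```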

7) On the node where the local class Class1 exists, convert the configuration database for Class1 to that for a cluster system.

# /etc/opt/FJSVsdx/bin/sdxcltrandb -C -c Class1
sdxconvertdb: INFO: /dev/rdsk/c0t1d0s0: conversion succeeded
sdxconvertdb: INFO: /dev/rdsk/c1t1d0s0: conversion succeeded
sdxconvertdb: INFO: Class1: conversion succeeded

Note

If an error occurs, follow the steps from 10-3) to restore the configuration database, and re-create the local class Class1 according to method a). Re-create the class deleted in step 3) and restore data.

8) Re-activate the cluster control facility by rebooting the node, on which the local class Class1 exists, in multi-user mode.

# init 0
...
SDX:sdxshutdown: ERROR: connection timeout
...
ok boot
...
Console Login:

Note

The following messages are output during shutdown, but they do not indicate a problem.

SDX:sdxshutdown: INFO: waiting for a response from sdxservd daemon...
SDX:sdxshutdown: ERROR: connection timeout

9) On the node where the local class Class1 exists, verify that the configuration database for the local class Class1 was converted successfully.

# sdxinfo -C -c Class1
OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    Node1       0

Confirm that the node identifier is displayed properly in the SCOPE field. If it is displayed properly, the process is finished.

Note

If the SCOPE field does not show the node identifier, the Class1 configuration database was not converted successfully. In that case, restore the configuration database by following the steps from 10-1) onward, and re-create the local class Class1 according to method a). Then re-create the class deleted in step 3) and restore its data.
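
The SCOPE check in step 9) can be sketched as a filter over the sdxinfo output. This assumes a POSIX shell and awk, and it treats a missing class line or a SCOPE of "(local)" as an unsuccessful conversion, which is an interpretation of the text above rather than a documented rule; scope_converted is a hypothetical helper name.

```shell
# scope_converted (hypothetical helper): reads `sdxinfo -C -c <class>` output
# on stdin and succeeds only if the class line exists and its SCOPE field is
# something other than "(local)".
scope_converted() {
    awk -v cls="$1" '
        $1 == "class" && $2 == cls { seen = 1; ok = ($4 != "(local)") }
        END { exit !(seen && ok) }'
}

# Sample input copied from step 9) above.
# On a real node: sdxinfo -C -c Class1 | scope_converted Class1
scope_converted Class1 <<'EOF' && echo "Class1: converted"
OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    Node1       0
EOF
```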

10) If an error occurred in step 7) or 9), restore the configuration database backed up in step 6).

Perform the following steps on the node where the local class Class1 exists.

10-1) Activate the node in single user mode.

ok boot -s

10-2) Stop the GDS management daemon, sdxservd.

# /etc/opt/FJSVsdx/bin/sdx_stop -S
sfdsk: received shutdown request
sfdsk: volume status log updated successfully, class=0x40000004
#

Confirm that the sdxservd daemon has stopped by checking that the following command displays no sdxservd process.

# ps -e | grep sdxservd
#

10-3) Restore the configuration database for the local class Class1.

# /etc/opt/FJSVsdx/bin/sdxcltrandb -R -c Class1
sdxrestoredb: INFO: /dev/rdsk/c0t1d0s0: restore succeeded
sdxrestoredb: INFO: /dev/rdsk/c1t1d0s0: restore succeeded
sdxrestoredb: INFO: Class1: restore succeeded

10-4) Re-activate the node in single user mode.

# init 0
...
SDX:sdxshutdown: ERROR: connection timeout
...
ok boot -s

Note

The following messages are output during shutdown, but they do not indicate a problem.

SDX:sdxshutdown: INFO: waiting for a response from sdxservd daemon...
SDX:sdxshutdown: ERROR: connection timeout

10-5) Verify that the configuration database for the local class Class1 was restored normally.

# sdxinfo -C -c Class1
OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    Node1       0

Confirm that the node identifier is displayed properly in the SCOPE field. If it is displayed properly, the restoration is finished.

11) After converting the local class, re-create the class deleted in step 3).

Then restore the volume data backed up in step 2) as needed.

(2) The PRIMECLUSTER CF clinitreset(1M) command ends abnormally with error message 6675.

Explanation

When a class exists in a cluster system, an attempt to initialize the PRIMECLUSTER resource database with the PRIMECLUSTER CF clinitreset command fails with the following error message.

FJSVcluster: ERROR: clinitreset: 6675: Cannot run this command because Global Disk Services has already been set up.

When a node containing a shadow class is rebooted because of an event such as shutdown or panic, the shadow class is deleted, but the /dev/sfdsk/Class_Name directory is not deleted. If the clinitreset command is executed in this state, it also fails with the error message above.

Resolution

On all nodes in the cluster system, view the configuration of objects and delete any class that exists. Deleting a class destroys its volume data, so back up the volume data in advance if necessary.

See
- For using GDS Management View, see "5.5 Removals."
- For using commands, see "Appendix D Command Reference."

On all nodes in the cluster system, check whether a class directory exists in the /dev/sfdsk directory, and delete any that exists. The following shows an example when a directory of class Class1 exists. _adm and _diag are special files used by GDS and cannot be deleted.

# cd /dev/sfdsk
# ls
_adm _diag Class1
# rm -rf Class1
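
Before deleting anything, it can help to list the leftover class directories while leaving _adm and _diag out. The following is a minimal sketch, assuming a POSIX shell; list_class_dirs is a hypothetical helper name, and it only prints names, it removes nothing.

```shell
# list_class_dirs (hypothetical helper): prints every subdirectory of the given
# directory (default /dev/sfdsk) except the GDS special entries _adm and _diag.
list_class_dirs() {
    dir=${1:-/dev/sfdsk}
    for entry in "$dir"/*; do
        [ -d "$entry" ] || continue      # skip non-directories and an unmatched glob
        name=${entry##*/}
        case $name in
            _adm|_diag) ;;               # special entries used by GDS: never list them
            *) printf '%s\n' "$name" ;;
        esac
    done
}

list_class_dirs /dev/sfdsk
```

Each name printed is a candidate for the rm -rf shown above, after confirming that no class of that name still exists.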

(3) Cluster applications become "Inconsistent."

Explanation

If a shared class is not to be used as an RMS resource, volumes included in the class are started on node startup. If a cluster application that uses those volumes is then started, the cluster application becomes "Inconsistent" because the volumes are already active. By default, classes are not to be used as RMS resources. Classes can be made available as RMS resources either by:

Registering them in resources used by cluster applications through the Web-Based Admin View's userApplication Configuration Wizard
Specifying them and using the hvgdsetup -a command

Resolution

Make the shared class available as an RMS resource with one of the following methods. After performing the procedures, restart the cluster application.

If the class is not registered in resources used by the cluster application, register it through the userApplication Configuration Wizard.

If the class is registered in resources used by the cluster application, execute the following command.

# /opt/SMAW/SMAWRrms/bin/hvgdsetup -a Class_Name
...
Do you want to continue with these processes ? [yes/no] yes

(4) The GFS Shared File System is not mounted on node startup.

Explanation

If a shared class is to be used as an RMS resource, volumes included in the class are not started on node startup. Therefore, the GFS Shared File System on those volumes is not mounted on node startup. By default, classes are not to be used as RMS resources, but they are made available as RMS resources either by:

Registering them in resources used by cluster applications through the Web-Based Admin View's userApplication Configuration Wizard
Specifying them and using the hvgdsetup -a command

Resolution

Take one of the following actions.

a) When using the shared class as an RMS resource, do not create the GFS Shared File System on volumes in the class, but create it on volumes in a different class.

b) When not using the shared class as an RMS resource, make the class unavailable as an RMS resource again with one of the following methods. After performing the procedures, reboot the system.

If the class is registered in resources used by the cluster application, remove it through the userApplication Configuration Wizard.

If the class is not registered in resources used by the cluster application, execute the following command.

# /opt/SMAW/SMAWRrms/bin/hvgdsetup -d Class_Name
...
Do you want to continue with these processes ? [yes/no] yes
...
Do you need to start volumes in the specified disk class ? [yes/no] no

(5) The disk space of a file system on a shared disk is "Full" (100%).

Explanation

The disk space may become "Full" (100%) while using the switchover file system created on a shared class volume.

Resolution

The recovery procedure is shown below.

1. Check the volume

On a node which is not the target for the recovery, confirm that the volume containing the target file is stopped.

Execute the following command on a node other than the recovery target.

# sdxinfo -V -c class

Example) When the class name is "c0" and the volume name is "v0"

# sdxinfo -V -c c0
OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
...
volume v0      c0      g0      off  on  131072   163839   32768    STOP
...

Make sure that STATUS is STOP in the line whose NAME is "v0".
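
The STATUS check can also be sketched as a filter over the sdxinfo -V listing. This assumes a POSIX shell and awk and that STATUS is the last column, as in the example above; volume_status is a hypothetical helper name.

```shell
# volume_status (hypothetical helper): reads `sdxinfo -V -c <class>` output on
# stdin and prints the STATUS (last) field of the named volume.
volume_status() {
    awk -v vol="$1" '$1 == "volume" && $2 == vol { print $NF }'
}

# Sample input copied from the example above.
# On a real node: sdxinfo -V -c c0 | volume_status v0
volume_status v0 <<'EOF'
OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume v0      c0      g0      off  on  131072   163839   32768    STOP
EOF
```

For the sample listing this prints STOP, which is the state required before continuing with the next step.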

2. Start the volume

On the target node for the recovery, execute the following command.

# sdxvolume -N -c class -v volume

Example) When the class name is "c0" and the volume name is "v0"

# sdxvolume -N -c c0 -v v0

3. Mount the file system

On the target node for the recovery, execute the following command.

Example) When the class name is "c0", the volume name is "v0", the file system type is "ufs", and the mount point is "/mnt"

# mount -F ufs /dev/sfdsk/c0/dsk/v0 /mnt

4. Delete unnecessary files

On the target node for the recovery, delete unnecessary data under <mount_point>.

5. Unmount the file system

On the target node for the recovery, execute the following command.

Example) When the mount point is "/mnt"

# umount /mnt

6. Stop the volume

On the target node for the recovery, execute the following command.

# sdxvolume -F -c class -v volume

Example) When the class name is "c0" and the GDS volume name is "v0"

# sdxvolume -F -c c0 -v v0

7. Clear the "Faulted" state of the cluster application

Execute the following command on all nodes which compose the cluster.

# hvutil -c userApplication_name

Example) When the cluster application name is "app1"

# hvutil -c app1

8. Start the cluster application

Execute the following command on an active node.

# hvswitch userApplication_name SysNode

Example) When the cluster application of node1 is "app1"

# hvswitch app1 node1RMS
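
Steps 1 to 8 above can be summarized for a given configuration with a dry-run helper that only prints the commands and executes nothing. This is a sketch, assuming a POSIX shell; print_recovery_plan and its argument order are hypothetical, and the commands printed are the ones shown in the steps above.

```shell
# print_recovery_plan (hypothetical helper): prints the recovery commands
# without running them.
# Arguments: class volume fstype mount_point userApplication_name SysNode
print_recovery_plan() {
    class=$1 volume=$2 fstype=$3 mnt=$4 app=$5 sysnode=$6
    printf '1. sdxinfo -V -c %s   (on a node other than the recovery target)\n' "$class"
    printf '2. sdxvolume -N -c %s -v %s\n' "$class" "$volume"
    printf '3. mount -F %s /dev/sfdsk/%s/dsk/%s %s\n' "$fstype" "$class" "$volume" "$mnt"
    printf '4. (delete unnecessary files under %s by hand)\n' "$mnt"
    printf '5. umount %s\n' "$mnt"
    printf '6. sdxvolume -F -c %s -v %s\n' "$class" "$volume"
    printf '7. hvutil -c %s   (on all nodes in the cluster)\n' "$app"
    printf '8. hvswitch %s %s   (on an active node)\n' "$app" "$sysnode"
}

# Values taken from the examples in this procedure.
print_recovery_plan c0 v0 ufs /mnt app1 node1RMS
```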

See

For the hvutil and hvswitch commands, see the hvutil(1M) and hvswitch(1M) manual pages.