PRIMECLUSTER Global Disk Services Configuration and Administration Guide 4.1 (Solaris(TM) Operating System)


F.1.9 Cluster System Related Error

If one of the following cluster system related errors occurs, take the action indicated for that situation.


 

(1) The error message "ERROR: class: cannot operate in cluster environment, ..." is output, and operations cannot be performed on the class named in the message.

 

[Explanation]

A local class created while the cluster control facility was inactive cannot be used directly in a cluster system. When the cluster control facility is activated, the following message is output to the system log and the GDS daemon log file, and the local class becomes inoperable.


ERROR: class: cannot operate in cluster environment, created when cluster control facility not ready


This error message will be output when:

[Resolution]

Make the local class available in a cluster system using method a) or method b) below. Method a) is generally recommended; use method b) when you want to avoid backing up and restoring the volume data.

Method a) Re-creating the local class in the cluster system:

1) Activate the node in single user mode.

2) Back up volume data if necessary.

3) Delete the class.

4) Re-activate the cluster control facility on the node booted in multi-user mode.

5) Re-create the class and volumes deleted in step 3).

6) Restore the volume data backed up in step 2) as needed.


Method b) Converting the local class to one for a cluster system:

Convert the local class to a class for a cluster system using the following procedure. The example below assumes the class name is Class1.

1) Activate the node in single user mode.

ok boot -s

~
INIT:SINGLE USER MODE
Type control-d to proceed with normal startup,
(or give root password for system maintenance): password

2) Stop the GDS management daemon, sdxservd.

# /etc/opt/FJSVsdx/bin/sdx_stop -S

sfdsk: received shutdown request
sfdsk: volume status log updated successfully, class=0x40000004
#

Confirm that the sdxservd daemon has stopped as follows. No sdxservd process information should be displayed.

# ps -e | grep sdxservd
#
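The check above can be wrapped in a small helper for use in a script. The following is a minimal sketch, not part of GDS: the function name daemon_absent is hypothetical, and it assumes a POSIX shell and a POSIX awk (on Solaris, possibly nawk or /usr/xpg4/bin/awk) and that ps -e prints the command name as the last field.

```shell
# Hypothetical helper, not a GDS command.
# Reads `ps -e` output on stdin and succeeds (exit 0) only if no line
# has the given name as its last field (assumed to be the command name).
daemon_absent() {
    awk -v name="$1" '
        BEGIN { found = 0 }
        $NF == name { found = 1 }
        END { exit found }
    '
}

# Usage on the node:
#   if ps -e | daemon_absent sdxservd; then echo "sdxservd is stopped"; fi
```

Matching on the whole last field, rather than using grep, avoids a false hit on a process whose name merely contains the string sdxservd.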

3) Back up the configuration database for the local class Class1.

# rm -rf /var/opt/FJSVsdx/backup/DB/Class1
# /etc/opt/FJSVsdx/bin/sdxcltrandb -B -c Class1

sdxsavedb: INFO: /dev/rdsk/c0t1d0s0: backup succeeded
sdxsavedb: INFO: /dev/rdsk/c1t1d0s0: backup succeeded
sdxsavedb: INFO: Class1: backup succeeded

# cd /var/opt/FJSVsdx/backup/DB/Class1
# ls -l

-rw-r--r--   1 root     other   14164992 May  6 09:00 c0t1d0s0
-rw-r--r--   1 root     other   14164992 May  6 09:00 c1t1d0s0
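Before continuing to the conversion, it may be worth checking the backup directory in a script rather than by eye. The sketch below is an illustration only: the function name backup_files_ok is hypothetical, and treating "at least one regular file, none of them empty" as a sign of a usable backup is an assumption, not a check documented by GDS.

```shell
# Hypothetical helper, not a GDS command.
# Succeeds only if the directory exists, contains at least one regular
# file, and none of its regular files is empty.
backup_files_ok() {
    dir=$1
    [ -d "$dir" ] || return 1
    count=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip non-files and an unmatched glob
        [ -s "$f" ] || return 1      # an empty backup file is suspicious
        count=$((count + 1))
    done
    [ "$count" -gt 0 ]
}

# Usage on the node:
#   backup_files_ok /var/opt/FJSVsdx/backup/DB/Class1 || echo "check the backup"
```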

4) Convert the configuration database for the local class Class1 to that for a cluster system.

# /etc/opt/FJSVsdx/bin/sdxcltrandb -C -c Class1

sdxconvertdb: INFO: /dev/rdsk/c0t1d0s0: conversion succeeded
sdxconvertdb: INFO: /dev/rdsk/c1t1d0s0: conversion succeeded
sdxconvertdb: INFO: Class1: conversion succeeded


If an error occurs, restore the configuration database by following step 7-3) onward, and then re-create the local class Class1 according to method a).
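When the backup, conversion, or restore is driven from a script, the command output can be checked for the success lines shown in this section. This is a minimal sketch under stated assumptions: the function name all_succeeded is hypothetical, the output is assumed to follow the "INFO: ...: ... succeeded" pattern of the samples here, and any other non-blank line is treated as a failure because the format of error output is not shown in this section.

```shell
# Hypothetical helper, not a GDS command.
# Reads command output on stdin and succeeds only if there is at least
# one "INFO: ... succeeded" line and no other non-blank line.
all_succeeded() {
    awk '
        NF == 0 { next }                       # ignore blank lines
        / INFO: .* succeeded$/ { ok++; next }  # success line as in the samples
        { bad++ }                              # anything else counts as a problem
        END { if (ok > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Usage on the node:
#   /etc/opt/FJSVsdx/bin/sdxcltrandb -C -c Class1 2>&1 | all_succeeded \
#       || echo "conversion may have failed; see step 7"
```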

5) Re-activate the cluster control facility by rebooting the node in multi-user mode.

# init 0

~
SDX:sdxshutdown: ERROR: connection timeout
~

ok boot

~
Console Login:


The following messages are output during shutdown, but they do not indicate a problem.

SDX:sdxshutdown: INFO: waiting for a response from sdxservd daemon...
SDX:sdxshutdown: ERROR: connection timeout


6) Verify that the configuration database for the local class Class1 was converted successfully.

# sdxinfo -C -c Class1

OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    Node1           0


Confirm that the node identifier is displayed properly in the SCOPE field. If it is displayed properly, the process is finished.


If the SCOPE field does not show the node identifier correctly, the Class1 configuration database was not converted successfully. In that case, restore the configuration database by following step 7-1) onward, and then re-create the local class Class1 according to method a).
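The SCOPE check can also be scripted against the sdxinfo output. The following is a rough sketch, assuming the column layout shown in this section (OBJ, NAME, TYPE, SCOPE, SPARE) and a POSIX awk; the function name scope_matches is hypothetical, and the expected node identifier (Node1 here) must be supplied by the operator.

```shell
# Hypothetical helper, not a GDS command.
# Reads `sdxinfo -C` output on stdin and succeeds only if the row for
# class $1 shows $2 in the SCOPE column (fourth whitespace-separated field).
# Assumes SCOPE is never blank; a blank SCOPE would shift SPARE into field 4.
scope_matches() {
    awk -v cls="$1" -v node="$2" '
        BEGIN { ok = 0 }
        $1 == "class" && $2 == cls && $4 == node { ok = 1 }
        END { exit (ok ? 0 : 1) }
    '
}

# Usage on the node:
#   sdxinfo -C -c Class1 | scope_matches Class1 Node1 && echo "SCOPE is Node1"
```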

7) If an error occurred in step 4) or 6), restore the configuration database backed up in step 3).

7-1) Activate the node in single user mode.

ok boot -s

~
INIT:SINGLE USER MODE
Type control-d to proceed with normal startup,
(or give root password for system maintenance): password

7-2) Stop the GDS management daemon, sdxservd.

# /etc/opt/FJSVsdx/bin/sdx_stop -S

sfdsk: received shutdown request
sfdsk: volume status log updated successfully, class=0x40000004
#


Confirm that the sdxservd daemon has stopped as follows. No sdxservd process information should be displayed.

# ps -e | grep sdxservd
#

7-3) Restore the configuration database for the local class Class1.

# /etc/opt/FJSVsdx/bin/sdxcltrandb -R -c Class1

sdxrestoredb: INFO: /dev/rdsk/c0t1d0s0: restore succeeded
sdxrestoredb: INFO: /dev/rdsk/c1t1d0s0: restore succeeded
sdxrestoredb: INFO: Class1: restore succeeded

7-4) Re-activate the node in single user mode.

# init 0

~
SDX:sdxshutdown: ERROR: connection timeout
~

ok boot -s

~
INIT:SINGLE USER MODE
Type control-d to proceed with normal startup,
(or give root password for system maintenance): password


The following messages are output during shutdown, but they do not indicate a problem.

SDX:sdxshutdown: INFO: waiting for a response from sdxservd daemon...
SDX:sdxshutdown: ERROR: connection timeout


7-5) Verify that the configuration database for the local class Class1 was restored normally.

# sdxinfo -C -c Class1

OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    Node1           0


Confirm that the node identifier is displayed properly in the SCOPE field. If it is displayed properly, the restoration is finished.

 


 

(2) The PRIMECLUSTER CF clinitreset(1M) command ends abnormally and outputs error message 6675.

 

[Explanation]

When a class exists in a cluster system, an attempt to initialize the PRIMECLUSTER resource database with the PRIMECLUSTER CF clinitreset command fails with the following error message.

FJSVcluster: ERROR: clinitreset: 6675: Cannot run this command
because Global Disk Services has already been set up.

When a node containing a shadow class is rebooted because of an event such as a shutdown or panic, the shadow class is deleted, but the /dev/sfdsk/Class Name directory is not. If the clinitreset command is executed in this state, it also fails with the error message above.

 

[Resolution]

  1. On all nodes in the cluster system, view the object configuration and delete any classes that exist. Deleting a class destroys its volume data, so back up the volume data in advance if necessary.


  2. On all nodes in the cluster system, check whether any class directories exist in the /dev/sfdsk directory, and delete any that do. The following example shows the case where a directory for class Class1 exists.

    _adm and _diag are special files used by GDS and cannot be deleted.

    # cd /dev/sfdsk
    # ls
    _adm _diag Class1
    # rm -rf Class1
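    To avoid removing _adm or _diag by mistake, the leftover class directories can be listed first and reviewed before anything is deleted. The following is a minimal sketch, not a GDS tool: the function name leftover_class_dirs is hypothetical, and it only prints names; it deletes nothing.

```shell
# Hypothetical helper, not a GDS command.
# Prints the names of directories under the given base (default /dev/sfdsk),
# skipping the GDS special files _adm and _diag. It never deletes anything.
leftover_class_dirs() {
    base=${1:-/dev/sfdsk}
    for entry in "$base"/*; do
        name=${entry##*/}
        case $name in
            _adm|_diag) continue ;;   # special files used by GDS
        esac
        if [ -d "$entry" ]; then
            printf '%s\n' "$name"
        fi
    done
    return 0
}

# Usage on each node:
#   leftover_class_dirs /dev/sfdsk
```

    Review the printed names against the classes deleted in step 1 before removing any directory.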

     


 

(3) Cluster applications become "Inconsistent".

 

[Explanation]

If a shared class is not to be used as an RMS resource, volumes included in the class are started on node startup. If a cluster application that uses those volumes is started in this state, the cluster application becomes "Inconsistent" because the volumes are already active. By default, classes are not to be used as RMS resources. Classes can be made available as RMS resources either by:

 

[Resolution]

Make the shared class available as an RMS resource with one of the following methods. After performing the procedures, restart the cluster application.


 

(4) The GFS Shared File System is not mounted on node startup.

 

[Explanation]

If a shared class is to be used as an RMS resource, volumes included in the class are not started on node startup. Therefore, the GFS Shared File System on those volumes is not mounted on node startup. By default, classes are not to be used as RMS resources, but they are made available as RMS resources either by:

 

[Resolution]

Take one of the following actions.

a) When using the shared class as an RMS resource, do not create the GFS Shared File System on volumes in that class; create it on volumes in a different class.
b) When not using the shared class as an RMS resource, make the class unavailable as an RMS resource again with one of the following methods. After performing the procedure, reboot the system.




All Rights Reserved, Copyright(C) FUJITSU LIMITED 2005