
D.1.10 Cluster System Related Error

For cluster system related errors, take the action indicated for whichever of the following situations applies.

(1) The error message "ERROR: class: cannot operate in cluster environment, ..." is output, and the operation cannot be conducted on the class class.

Explanation

A local class created while the cluster control facility was inactive cannot be used directly in a cluster system. When the cluster control facility is activated, the following message is output to the system log and the GDS daemon log file, and the local class becomes nonoperational.

ERROR: class: cannot operate in cluster environment, created when cluster control facility not ready

This error message is output when the cluster control facility is activated on a node on which such a local class exists.

Resolution

Re-create the local class in the cluster system according to the following steps.


1) In the CF main window of Cluster Admin, execute [Stop CF] in the [Tools] menu to stop CF.

2) Back up volume data if necessary.

3) Delete the class.

4) In the CF main window of Cluster Admin, execute [Load driver] to start CF.

5) Re-create the class and volumes deleted in step 3).

6) Restore the volume data backed up in step 2) as needed.
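
The following is a minimal sketch of steps 2) through 6) for a hypothetical single-disk local class; the class name Class1, disk name Disk1, physical device sda, volume name Volume1, volume size (32768 blocks), and backup file path are examples only, so adjust each command to your configuration.

2) # sdxvolume -N -c Class1 -v Volume1
   # dd if=/dev/sfdsk/Class1/dsk/Volume1 of=/backup/Volume1.img bs=1M
3) # sdxvolume -F -c Class1 -v Volume1
   # sdxvolume -R -c Class1 -v Volume1
   # sdxdisk -R -c Class1 -d Disk1      (the class is deleted when its last disk is removed)
5) # sdxdisk -M -c Class1 -a type=local -d sda=Disk1
   # sdxvolume -M -c Class1 -v Volume1 -s 32768
6) # dd if=/backup/Volume1.img of=/dev/sfdsk/Class1/dsk/Volume1 bs=1M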


See

For information on how to operate the CF main window of Cluster Admin, see "PRIMECLUSTER Cluster Foundation (CF) Configuration and Administration Guide."


(2) The PRIMECLUSTER CF clinitreset command fails, outputting the error message "FJSVcluster: ERROR: clinitreset: 6675: ..."

Explanation

If the PRIMECLUSTER resource database is initialized with the PRIMECLUSTER CF clinitreset command while a class exists in the cluster system, the clinitreset command fails, outputting the following error message.

FJSVcluster: ERROR: clinitreset: 6675: Cannot run this command because Global Disk Services has already been set up.

When a node containing a shadow class is rebooted because of an event such as a shutdown or panic, the shadow class is deleted, but the /dev/sfdsk/Class_Name directory is not. If the clinitreset command is executed in this state, it also fails, outputting the above error message.

Resolution

  1. On all nodes in the cluster system, view the object configuration and delete any existing classes. Deleting a class loses its volume data, so back up the data in advance if necessary.

  2. On all nodes in the cluster system, check whether any class directories exist under the /dev/sfdsk directory, and delete them if they do. The following shows an example where a directory for class Class1 exists.
    _adm and _diag are special files used by GDS and must not be deleted.

    # cd /dev/sfdsk
    # ls
    _adm _diag Class1
    # rm -rf Class1
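
After this cleanup has been completed on every node, initializing the resource database should no longer fail with the 6675 error. The following is a minimal verification sketch; the clinitreset path follows the usual FJSVcluster command layout and is shown here as an assumption.

    # ls /dev/sfdsk
    _adm _diag
    # /etc/opt/FJSVcluster/bin/clinitreset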

(3) The cluster application becomes "Inconsistent."

Explanation

If a shared class is not used as an RMS resource, volumes included in the class are started on node startup. If a cluster application that uses those volumes is then started, the cluster application becomes "Inconsistent" because the volumes are already active. By default, classes are not used as RMS resources; a class must be explicitly made available as an RMS resource.

Resolution

Make the shared class available as an RMS resource. After performing the procedure, restart the cluster application (see the sketch below).
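
As an illustration of the restart, a cluster application can be taken offline and switched back online with the following RMS commands; the application name app1 and SysNode name node1RMS are examples only.

    # hvutil -f app1
    # hvswitch app1 node1RMS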

(4) The GFS Shared File System is not mounted on node startup.

Explanation

If a shared class is used as an RMS resource, volumes included in the class are not started on node startup. Therefore, the GFS Shared File System on those volumes is not mounted on node startup either. By default, classes are not used as RMS resources, but they can be made available as RMS resources.

Resolution

Take one of the following actions.

a) When using the shared class as an RMS resource, do not create the GFS Shared File System on volumes in the class, but create it on volumes in a different class (see the sketch after this list).

b) When not using the shared class as an RMS resource, make the class unavailable as an RMS resource again. After performing the procedure, reboot the system.
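
As an illustration of action a), the GFS Shared File System is created on a volume in another class. The sketch below assumes class c1, volume v1, and sharing nodes node1 and node2; check the GFS documentation for the exact sfcmkfs options for your environment.

    # sfcmkfs -o node=node1,node2 /dev/sfdsk/c1/dsk/v1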

(5) The error message "ERROR: class: cannot operate shared objects, ..." is output, and the shared class class cannot be created; or the error message "ERROR: cluster communication failure" is output, and automatic resource registration fails.

Explanation

This phenomenon occurs when the GDS package is installed before other products, such as PRIMECLUSTER HA Server or PRIMECLUSTER Enterprise Edition, are installed and the cluster system is set up.

To determine whether this is the cause in your case, check the installation dates and times of the FJSVsdx-bas and FJSVclapi packages. If the FJSVsdx-bas package was installed earlier than the FJSVclapi package, this is the cause of the current trouble.

Example

Results of checking installation dates and times

  • Installation date and time of the FJSVsdx-bas package

    # rpm -qi FJSVsdx-bas
    Name        : FJSVsdx-bas                  Relocations: (not relocatable)
    ...
    Install Date: Tue Jan 26 15:27:22 2010         Build Host: xxxxxxxx
    ...
  • Installation date and time of the FJSVclapi package

    # rpm -qi FJSVclapi
    Name        : FJSVclapi                    Relocations: (not relocatable)
    ...
    Install Date: Wed Jan 27 19:08:13 2010         Build Host: xxxxxxxx
    ...

In this case, the installation date and time of the FJSVsdx-bas package is earlier than that of the FJSVclapi package. From this you can conclude that the GDS package had already been installed before PRIMECLUSTER HA Server, PRIMECLUSTER Enterprise Edition, or another such product was installed.
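
The two installation timestamps can also be compared in one step with rpm's query format (a convenience sketch; the date output format depends on the rpm version):

    # rpm -q --queryformat '%{NAME}: %{INSTALLTIME:date}\n' FJSVsdx-bas FJSVclapi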

Resolution

Install the FJSVsdx-bas package again, overwriting the existing one.

  1. Insert CD2, used for installing PRIMECLUSTER, into the CD-ROM drive and mount it. In the following, the CD mount point is defined as <CDROM_DIR>.

    # mount /media/cdrom
  2. Install the FJSVsdx-bas package again, overwriting the existing one.

    # cd <CDROM_DIR>/Linux/pkgs
    # rpm -Uvh --force <package name>

    <Package name> differs depending on the distribution. Specify the name according to the chart shown below.

        Distribution        Package name
        ------------------  ----------------------------
        RHEL8(Intel64)      FJSVsdx-bas.rhel8_x86_64.rpm
        RHEL9(Intel64)      FJSVsdx-bas.rhel9_x86_64.rpm

  3. Reboot the system.

    # /sbin/shutdown -r now

(6) The disk space of a file system on a shared disk is "Full" (100%).

Explanation

The disk space may become "Full" (100%) while using the switchover file system created on a shared class volume.

Resolution

The recovery procedure is shown below.

  1. Check the volume

    On a node which is not the target for the recovery, confirm that the volume containing the target file is stopped.

    Execute the following command on a node other than the recovery target.

    # sdxinfo -V -c class

    Example) When the class name is "c0" and the volume name is "v0"

    # sdxinfo -V -c c0
    OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
    ------ ------- ------- ------- ---- --- -------- -------- -------- --------
    ...
    volume v0      c0      g0      off  on  131072   163839   32768    STOP
    ...

    Make sure that STATUS is STOP on the line where NAME is "v0."

  2. Start the volume

    On the target node for the recovery, execute the following command.

    # sdxvolume -N -c class -v volume

    Example) When the class name is "c0" and the volume name is "v0"

    # sdxvolume -N -c c0 -v v0
  3. Mount the file system

    On the target node for the recovery, execute the following command.

    Example) When the class name is "c0", the volume name is "v0", the file system type is "ext4", and the mount point is "/mnt"

    # mount -t ext4 /dev/sfdsk/c0/dsk/v0 /mnt
  4. Delete unnecessary files

    On the target node for the recovery, delete unnecessary data under <mount_point>.
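
    Example) When the mount point is "/mnt" and the unnecessary file is /mnt/old.log (the file name is hypothetical)

    # df -h /mnt
    # rm /mnt/old.log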

  5. Unmount the file system

    On the target node for the recovery, execute the following command.

    Example) When the mount point is "/mnt"

    # umount /mnt
  6. Stop the volume

    On the target node for the recovery, execute the following command.

    # sdxvolume -F -c class -v volume

    Example) When the class name is "c0" and the GDS volume name is "v0"

    # sdxvolume -F -c c0 -v v0
  7. Clear the "Faulted" state of the cluster application

    Execute the following command on all nodes which compose the cluster.

    # hvutil -c userApplication_name

    Example) When the cluster application name is "app1"

    # hvutil -c app1
  8. Start the cluster application

    Execute the following command on an active node.

    # hvswitch userApplication_name SysNode

    Example) When the cluster application of node 1 is "app1"

    # hvswitch app1 node1RMS
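
    After the switch, the state of the cluster application can be checked, for example, with the RMS hvdisp command (a supplementary sketch):

    # hvdisp -a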

See

For the hvutil and hvswitch commands, see the hvutil(1M) and hvswitch(1M) manual pages.