For cluster-system-related errors, identify which of the following situations applies and take the action indicated for it.
Explanation
A local class created while the cluster control facility was inactive cannot be used directly in a cluster system. When the cluster control facility is activated, the following message is output to the system log and the GDS daemon log file, and the local class becomes nonoperational.
ERROR: class: cannot operate in cluster environment, created when cluster control facility not ready
This error message will be output when:
The initial cluster configuration was executed after the local class class had been created on a node where that configuration was still incomplete.
The local class class was created in single user mode.
The single node on which the local class class was created was changed over to a cluster system.
Resolution
Make the local class available in the cluster system with method a) or method b) below. Use method a) in general; use method b) when you want to avoid backing up and restoring the volume data.
Method a) Re-creating the local class in the cluster system:
1) Activate the node in single user mode.
2) Back up volume data if necessary.
3) Delete the class.
4) Re-activate the cluster control facility on the node booted in multi-user mode.
5) Re-create the class and volumes deleted in step 3).
6) Restore the volume data backed up in step 2) as needed.
Method b) Converting the local class to one for a cluster system:
Convert the local class to a class for a cluster system taking the following procedures. The following illustrates the procedures when the class name is Class1.
Note
You need to delete and re-create the classes which were created while the cluster control facility was active.
1) Confirm the classes to be converted or re-created for each node.
# sdxinfo -C
OBJ NAME TYPE SCOPE SPARE
------ ------- -------- ----------- -----
class Class1 local (local) 0
class Class2 local Node1 0
class Class3 shared Node1:Node2 0
Classes which should be converted
Classes whose TYPE field is "local" and SCOPE field is "(local)."
In the above example, Class1 meets the definition.
Classes which should be re-created
Classes whose SCOPE field properly indicates a class scope (node identifier).
In the above example, Class2 and Class3 meet the definition.
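The classification above can be scripted. The following is a minimal sketch that parses sdxinfo -C output and separates classes to convert from classes to re-create; a here-string stands in for the real command output (which mirrors the example above), since sdxinfo is GDS-specific.

```shell
# Minimal sketch: classify the classes listed by `sdxinfo -C`.
# Classes whose SCOPE is "(local)" must be converted; all others
# must be deleted and re-created. A here-string stands in for real
# `sdxinfo -C` output, since the command is GDS-specific.
sdxinfo_output='OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    (local)     0
class  Class2  local    Node1       0
class  Class3  shared   Node1:Node2 0'

# On each "class" line, NAME is field 2 and SCOPE is field 4.
to_convert=$(printf '%s\n' "$sdxinfo_output" |
    awk '$1 == "class" && $4 == "(local)" { print $2 }')
to_recreate=$(printf '%s\n' "$sdxinfo_output" |
    awk '$1 == "class" && $4 != "(local)" { print $2 }' | tr '\n' ' ')

echo "convert:   $to_convert"
echo "re-create: $to_recreate"
```

Run the same awk filters against live sdxinfo -C output on each node to build the two lists.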
2) Back up the volume data of the class to be re-created as needed.
3) Delete the class to be re-created.
4) Boot the node, on which the local class Class1 to be converted exists, in single user mode.
ok boot -s
5) Stop the GDS management daemon, sdxservd, on the node where the local class Class1 exists.
# /etc/opt/FJSVsdx/bin/sdx_stop -S
Confirm that the sdxservd daemon was stopped (information on sdxservd daemon processes is not displayed) in the following manner.
# ps -e | grep sdxservd
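The manual check above can be made script-friendly by testing grep's exit status; this is a generic sh sketch, not a GDS tool. The bracket pattern keeps grep from matching its own command line in the ps output.

```shell
# Sketch: succeed only when no sdxservd process remains in `ps -e`
# output. The [s] bracket keeps grep from matching its own command
# line if ps happens to list it.
daemon_stopped() {
    ! ps -e | grep '[s]dxservd' > /dev/null
}

if daemon_stopped; then
    echo "sdxservd is stopped; safe to continue"
else
    echo "sdxservd is still running; stop it before continuing" >&2
fi
```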
6) Back up the configuration database on the node where the local class Class1 exists.
# rm -rf /var/opt/FJSVsdx/backup/DB/Class1
# cd /var/opt/FJSVsdx/backup/DB/Class1
Note
Verify that there is free space equal to or larger than 150 [MB] under /var/opt/FJSVsdx/backup/DB, and if it is insufficient, expand it.
If an error occurs, perform the following procedure instead of proceeding to the subsequent steps:
After re-creating the local class Class1 according to method a), re-create the class deleted in step 3) and restore data.
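The free-space check in the Note above can be sketched with portable tools. The 150 MB threshold is from the Note; the directory defaults to / here so the sketch runs anywhere, with the real backup path shown in a comment.

```shell
# Sketch: check that the filesystem holding a directory has at least
# 150 MB available. `df -Pk` prints POSIX single-line records in
# 1 KB blocks; the "Available" column is field 4.
need_mb=150
dir=/     # in the real procedure: /var/opt/FJSVsdx/backup/DB

avail_mb=$(df -Pk "$dir" | awk 'NR == 2 { printf "%d", $4 / 1024 }')
if [ "$avail_mb" -ge "$need_mb" ]; then
    echo "OK: ${avail_mb} MB available under $dir"
else
    echo "NG: only ${avail_mb} MB available under $dir; expand it" >&2
fi
```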
7) On the node where the local class Class1 exists, convert the configuration database for Class1 to that for a cluster system.
# /etc/opt/FJSVsdx/bin/sdxcltrandb -C -c Class1
Note
If an error occurs, follow the steps from 10-3) to restore the configuration database, and re-create the local class Class1 according to method a). Re-create the class deleted in step 3) and restore data.
8) Re-activate the cluster control facility by rebooting the node, on which the local class Class1 exists, in multi-user mode.
# init 0
Note
The following messages are output during shutdown, but there are no problems.
SDX:sdxshutdown: INFO: waiting for a response from sdxservd daemon...
SDX:sdxshutdown: ERROR: connection timeout
9) On the node where the local class Class1 exists, verify that the configuration database for the local class Class1 was converted successfully.
# sdxinfo -C -c Class1
Confirm that the node identifier is displayed properly in the SCOPE field. If it is displayed properly, the process is finished.
Note
If the SCOPE field is improper, the Class1 configuration database was not converted successfully. In that case, restore the configuration database by taking the steps from 10-1) onward, re-create the local class Class1 according to method a), then re-create the class deleted in step 3) and restore data.
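The SCOPE verification in step 9) can also be automated by extracting the SCOPE field from the sdxinfo output. This sketch uses a here-string in place of the real `sdxinfo -C -c Class1` output; the node identifier "Node1" is an assumed example value.

```shell
# Sketch: extract the SCOPE field for Class1 and decide whether the
# conversion succeeded. The here-string stands in for the output of
# `sdxinfo -C -c Class1` after a successful conversion ("Node1" is
# an assumed example node identifier).
sdxinfo_output='OBJ    NAME    TYPE     SCOPE       SPARE
------ ------- -------- ----------- -----
class  Class1  local    Node1       0'

scope=$(printf '%s\n' "$sdxinfo_output" |
    awk '$1 == "class" && $2 == "Class1" { print $4 }')

if [ -n "$scope" ] && [ "$scope" != "(local)" ]; then
    echo "conversion succeeded: SCOPE=$scope"
else
    echo "conversion failed: SCOPE=$scope; restore per step 10)" >&2
fi
```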
10) Restore the configuration database backed up in step 6) when an error occurred in step 7) or 9).
Perform the following steps on the node where the local class Class1 exists.
10-1) Activate the node in single user mode.
ok boot -s
10-2) Stop the GDS management daemon, sdxservd.
# /etc/opt/FJSVsdx/bin/sdx_stop -S
Confirm that the sdxservd daemon was stopped (information on sdxservd daemon processes is not displayed) in the following manner.
# ps -e | grep sdxservd
10-3) Restore the configuration database for the local class Class1.
# /etc/opt/FJSVsdx/bin/sdxcltrandb -R -c Class1
10-4) Re-activate the node in single user mode.
# init 0
Note
The following messages are output during shutdown, but there are no problems.
SDX:sdxshutdown: INFO: waiting for a response from sdxservd daemon...
SDX:sdxshutdown: ERROR: connection timeout
10-5) Verify that the configuration database for the local class Class1 was restored normally.
# sdxinfo -C -c Class1
Confirm that the node identifier is displayed properly in the SCOPE field. If it is displayed properly, the restoration is finished.
11) After converting the local class, re-create the class deleted in step 3).
Then restore the volume data backed up in step 2) as needed.
Explanation
If a class exists in a cluster system, an attempt to initialize the PRIMECLUSTER resource database with the PRIMECLUSTER CF clinitreset command fails, and the following error message is output.
FJSVcluster: ERROR: clinitreset: 6675: Cannot run this command because Global Disk Services has already been set up.
When a node containing a shadow class is rebooted because of an event such as a shutdown or panic, the shadow class is deleted, but the /dev/sfdsk/Class_Name directory is not. If the clinitreset command is executed in this state, it also fails and outputs the error message above.
Resolution
On all nodes in the cluster system, view the configuration of objects and delete a class if any exists. If a class is deleted, volume data will be lost. If necessary, back up volume data in advance.
See
For using GDS Management View, see "5.5 Removals."
For using commands, see "Appendix D Command Reference."
On all nodes in the cluster system, check whether a class directory exists in the /dev/sfdsk directory, and delete a class directory if any exists. The following shows an example when a directory of class Class1 exists.
_adm and _diag are special files used by GDS and cannot be deleted.
# cd /dev/sfdsk
# rm -rf Class1
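The check-and-delete step above can be sketched as a loop that removes class directories while preserving the special _adm and _diag files. This sketch works on a scratch directory it creates itself; in the real procedure, sfdsk_dir would be /dev/sfdsk, so verify the entries before deleting anything.

```shell
# Sketch: remove class directories under a /dev/sfdsk-style directory
# while keeping the special _adm and _diag files. A scratch directory
# with sample entries is used here; in the real procedure sfdsk_dir
# would be /dev/sfdsk.
sfdsk_dir=$(mktemp -d)
mkdir -p "$sfdsk_dir/Class1"
touch "$sfdsk_dir/_adm" "$sfdsk_dir/_diag"

for entry in "$sfdsk_dir"/*; do
    name=$(basename "$entry")
    case $name in
        _adm|_diag) ;;                       # special GDS files: keep
        *) echo "removing class directory: $name"
           rm -rf "$entry" ;;
    esac
done
```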
Explanation
If a shared class is not to be used as an RMS resource, volumes included in the class are started on node startup. If a cluster application that uses those volumes is started there, the cluster application becomes "Inconsistent" because the volumes are already active. By default, classes are not to be used as RMS resources. Classes can be made available as RMS resources either by:
Registering them in resources used by cluster applications through the Web-Based Admin View's userApplication Configuration Wizard
Specifying them and using the hvgdsetup -a command
Resolution
Make the shared class available as an RMS resource with one of the following methods. After performing the procedures, restart the cluster application.
If the class is not registered in resources used by the cluster application, register it through the userApplication Configuration Wizard.
If the class is registered in resources used by the cluster application, execute the following command.
# /opt/SMAW/SMAWRrms/bin/hvgdsetup -a Class_Name
...
Do you want to continue with these processes ? [yes/no] yes
Explanation
If a shared class is to be used as an RMS resource, volumes included in the class are not started on node startup. Therefore, the GFS Shared File System on those volumes is not mounted on node startup. By default, classes are not to be used as RMS resources, but they are made available as RMS resources either by:
Registering them in resources used by cluster applications through the Web-Based Admin View's userApplication Configuration Wizard
Specifying them and using the hvgdsetup -a command
Resolution
Take one of the following actions.
a) When using the shared class as an RMS resource, do not create the GFS Shared File System on volumes in the class, but create it on volumes in a different class.
b) When not using the shared class as an RMS resource, make the class unavailable as an RMS resource again with one of the following methods. After performing the procedures, reboot the system.
If the class is registered in resources used by the cluster application, remove it through the userApplication Configuration Wizard.
If the class is not registered in resources used by the cluster application, execute the following command.
# /opt/SMAW/SMAWRrms/bin/hvgdsetup -d Class_Name
...
Do you want to continue with these processes ? [yes/no] yes
...
Do you need to start volumes in the specified disk class ? [yes/no] no
(5) The disk space of a file system on a shared disk is "Full" (100%).
Explanation
The disk space may become "Full" (100%) while using the switchover file system created on a shared class volume.
Resolution
The recovery procedure is shown below.
1. Check the volume
Confirm that the volume containing the target file is stopped. Execute the following command on a node other than the recovery target.
# sdxinfo -V -c class
Example) When the class name is "c0" and the volume name is "v0"
# sdxinfo -V -c c0
OBJ NAME CLASS GROUP SKIP JRM 1STBLK LASTBLK BLOCKS STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
...
volume v0 c0 g0 off on 131072 163839 32768 STOP
...
Make sure that STATUS is STOP on the line whose NAME is "v0."
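The STATUS check above can be scripted by picking the status column off the volume line. A here-string stands in for real `sdxinfo -V -c c0` output, mirroring the example; the class and volume names are those used in the example.

```shell
# Sketch: confirm that volume v0 of class c0 is in STOP status before
# starting recovery. The here-string stands in for real
# `sdxinfo -V -c c0` output, as shown in the example above.
sdxinfo_output='OBJ    NAME    CLASS   GROUP   SKIP JRM 1STBLK   LASTBLK  BLOCKS   STATUS
------ ------- ------- ------- ---- --- -------- -------- -------- --------
volume v0      c0      g0      off  on  131072   163839   32768    STOP'

# STATUS is the last field on the "volume" line for v0.
status=$(printf '%s\n' "$sdxinfo_output" |
    awk '$1 == "volume" && $2 == "v0" { print $NF }')

if [ "$status" = "STOP" ]; then
    echo "v0 is stopped; proceed with recovery"
else
    echo "v0 status is $status; stop the volume first" >&2
fi
```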
2. Start the volume
On the target node for the recovery, execute the following command.
# sdxvolume -N -c class -v volume
Example) When the class name is "c0" and the volume name is "v0"
# sdxvolume -N -c c0 -v v0
3. Mount the file system
On the target node for the recovery, execute the following command.
Example) When the class name is "c0", the volume name is "v0", the file system type is "ufs", and the mount point is "/mnt"
# mount -F ufs /dev/sfdsk/c0/dsk/v0 /mnt
4. Delete unnecessary files
On the target node for the recovery, delete unnecessary data under <mount_point>.
5. Unmount the file system
On the target node for the recovery, execute the following command.
Example) When the mount point is "/mnt"
# umount /mnt
6. Stop the volume
On the target node for the recovery, execute the following command.
# sdxvolume -F -c class -v volume
Example) When the class name is "c0" and the GDS volume name is "v0"
# sdxvolume -F -c c0 -v v0
7. Clear the "Faulted" state of the cluster application
Execute the following command on all nodes which compose the cluster.
# hvutil -c userApplication_name
Example) When the cluster application name is "app1"
# hvutil -c app1
8. Start the cluster application
Execute the following command on an active node.
# hvswitch userApplication_name SysNode
Example) When the cluster application of node1 is "app1"
# hvswitch app1 node1RMS
See
For the hvutil and hvswitch commands, see the hvutil(1M) and hvswitch(1M) manual pages.