PRIMECLUSTER Wizard for Oracle 4.5 Configuration and Administration Guide
FUJITSU Software

A.1.2 AutoRecover or Failover

AutoRecover, userApplication failover or degeneration occurred because of an Oracle instance resource failure.

[Case 1] (Standby Operation, Scalable Operation with Oracle RAC, Single-Node Cluster Operation)

If there is insufficient space for archived redo logs and the data update performed by the monitoring SQL hangs, an Oracle resource might fail.

The service might then stop on both the operating node and the standby nodes, because the shared disk has insufficient space and the userApplication failover therefore also fails on the standby node.
Check the Oracle database alert log to identify the cause of the failure.
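
As one way to check the alert log, the sketch below searches a log file for error codes that typically accompany a full archive destination. The function name find_archiver_errors is hypothetical, the alert log path must be supplied by the caller because it depends on the installation, and the two codes (ORA-00257, archiver error; ORA-19809, limit exceeded for recovery files) are only examples, not a complete list.

```shell
#!/bin/sh
# Hypothetical helper: print archiver-related errors found in an alert log,
# each prefixed with its line number.
# $1 = path to the Oracle alert log (installation dependent, not assumed here).
find_archiver_errors() {
    alert_log=$1
    # ORA-00257: archiver error
    # ORA-19809: limit exceeded for recovery files
    grep -n -E 'ORA-(00257|19809)' "$alert_log"
}
```

For example, `find_archiver_errors /path/to/alert_<SID>.log` prints nothing and returns a non-zero status when neither code is present.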

Perform the following procedure to back up the archived redo logs and secure enough disk space.

  1. Stop RMS on both the operating node and standby nodes.

    # hvshut -a
  2. On the operating node, mount the volume where the archived redo logs are stored.

    • zpool used

      # sdxvolume -N -c <class>
      # zpool import -d /dev/sfdsk/<class>/dsk <mountpoint>
    • zpool not used

      # sdxvolume -N -c <class> -v <volume>
      # mount -F ufs /dev/sfdsk/<class>/dsk/<volume> <mountpoint>
  3. Move to the mountpoint mounted in Step 2 and back up the archived redo logs.

    # cd <mountpoint>
    # mv ./<the directory of archived redo logs>/<archived redo logs> /<the destination for backup>/.
  4. Unmount the volume mounted in Step 2.

    • zpool used

      # cd /
      # zpool export <mountpoint>
      # sdxvolume -F -c <class>
    • zpool not used

      # cd /
      # umount <mountpoint>
      # sdxvolume -F -c <class> -v <volume>
  5. Start RMS on all nodes by executing the hvcm command on any one of the nodes.

    # hvcm -a
  6. Clear the faulted state of userApplication.

    # hvutil -c <userApplication>
  7. Start userApplication by executing the hvswitch command on any one of the nodes.

    # hvswitch <userApplication> <SysNode>
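
Before unmounting the volume in Step 4, it can be useful to confirm that moving the archived redo logs in Step 3 actually freed space. The following is a minimal sketch, not part of the product: the function names and the default threshold of 80 percent are assumptions, and it relies on a df that accepts the POSIX -P option (on Solaris this may mean /usr/xpg4/bin/df).

```shell
#!/bin/sh
# Hypothetical helpers for checking file system usage after Step 3.

# Read "df -P" output on stdin and print the Capacity column of the
# first file system line without the trailing '%'.
parse_use_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the use percentage of the file system that holds path $1.
fs_use_percent() {
    df -P -k "$1" | parse_use_percent
}

# Succeed when usage of $1 is at or below $2 percent.
# The default of 80 is an arbitrary value chosen for illustration.
has_enough_space() {
    used=$(fs_use_percent "$1")
    [ -n "$used" ] && [ "$used" -le "${2:-80}" ]
}
```

A possible use is `has_enough_space <mountpoint> 70 || echo "still low on space"`, run while the volume is still mounted.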

[Case 2] (Standby Operation, Scalable Operation with Oracle RAC, Single-Node Cluster Operation)

If a monitoring timeout occurs twice in a row, a resource failure occurs. If the following error message is output to syslog, a monitoring timeout is the cause of the problem:
“ERROR: 0226: Watch Timeout occurred”
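
To check whether this message has been logged, a search such as the sketch below may help. The function name count_watch_timeouts is hypothetical, and the syslog file location is an assumption that depends on the operating system and syslog configuration (for example, /var/adm/messages on Solaris or /var/log/messages on many Linux systems).

```shell
#!/bin/sh
# Hypothetical helper: count lines in a log file that contain the
# Watch Timeout message quoted above.
# $1 = path to the syslog file (system dependent).
count_watch_timeouts() {
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c 'ERROR: 0226: Watch Timeout occurred' "$1"
}
```

For example, `count_watch_timeouts /var/adm/messages` prints the number of matching lines in that file.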

Take corrective action on the Oracle database.

In Oracle instance monitoring of PRIMECLUSTER Wizard for Oracle, if the Oracle database does not reply within a specified time, this is treated as a monitoring timeout. At the first monitoring timeout, the resource only enters the Warning state. If the timeout occurs twice in a row, a resource failure is determined.
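
The two-in-a-row rule described above can be illustrated with a small sketch. This is not the product's implementation: the function name classify_monitoring, the input words "ok" and "timeout", and the state labels Online and Faulted are assumptions made for illustration (only Warning is named in the text above).

```shell
#!/bin/sh
# Illustrative model of the rule: one timeout gives Warning, two
# consecutive timeouts give a resource failure.
# Arguments: a sequence of monitoring results, each "ok" or "timeout".
# Prints the state after the last result, or Faulted as soon as two
# consecutive timeouts are seen.
classify_monitoring() {
    consecutive=0
    state=Online
    for result in "$@"; do
        if [ "$result" = timeout ]; then
            consecutive=$((consecutive + 1))
        else
            consecutive=0    # a reply resets the count
        fi
        if [ "$consecutive" -ge 2 ]; then
            state=Faulted
            break
        elif [ "$consecutive" -eq 1 ]; then
            state=Warning
        else
            state=Online
        fi
    done
    echo "$state"
}
```

In this model, `classify_monitoring ok timeout ok timeout` ends in Warning because the two timeouts are not consecutive, while `classify_monitoring ok timeout timeout` ends in Faulted.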