Oracle instance Monitoring
The monitoring procedure for an Oracle instance is as follows:
1. Check the Oracle background processes (PMON, SMON) every 30 seconds (fixed interval).
   If the process status can be confirmed after the Oracle instance is activated, go to step 2.
2. su - <Oracle user>
3. Connect locally to the Oracle instance as the SYSTEM user.
4. If the database can be confirmed to be in the OPEN state, go to step 5.
5. Check whether the Oracle background processes (PMON, SMON, DBWn, LGWR, CKPT) are alive.
   The monitoring interval can be changed with the "Interval" setting; its default value is 30 seconds.
6. Check whether SQL (INSERT, UPDATE, DELETE and COMMIT) can be executed properly using the monitoring table in the SYSTEM user's default tablespace.
   The SQL monitoring is triggered according to the "Interval" setting, but the elapsed time since the last SQL monitoring is checked first. The SQL is executed only when 60 seconds or more have passed.
7. Monitoring PDB
   When UsePDB of the Oracle instance resource is set to yes, the PDBs are monitored according to the "Interval" setting.
   OPEN_MODE of each PDB is checked using the V$PDBS view.
   In an Oracle Data Guard/Oracle Active Data Guard environment, this step is executed when the CDB has been started to the OPEN state, in which the PDBs can be started. For details, see "Starting and Stopping CDB and PDB" in "F.1 Feature Outline".
8. The connection to the Oracle instance is re-established once every 24 hours.
On the standby node, only step 1 is executed, to confirm that the Oracle background processes (PMON, SMON) do not exist.
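The relationship between the "Interval" setting and the 60-second rule for SQL monitoring can be illustrated with a minimal Python sketch. This is not the product's implementation: the names sql_check_due and simulate are hypothetical, and the sketch assumes that the very first monitoring cycle always runs the SQL check.

```python
from typing import List, Optional

INTERVAL_SEC = 30      # default "Interval" value stated in the text
SQL_MIN_GAP_SEC = 60   # SQL runs only when 60 seconds or more have passed


def sql_check_due(now: float, last_sql_check: Optional[float],
                  min_gap: float = SQL_MIN_GAP_SEC) -> bool:
    """Return True when the SQL check should run in this monitoring cycle."""
    if last_sql_check is None:
        # Assumption: no earlier SQL check means the check runs at once.
        return True
    return now - last_sql_check >= min_gap


def simulate(cycles: int, interval: float = INTERVAL_SEC) -> List[float]:
    """Return the cycle times (in seconds) at which the SQL check runs."""
    ran_at: List[float] = []
    last: Optional[float] = None
    for i in range(cycles):
        now = i * interval
        if sql_check_due(now, last):
            ran_at.append(now)
            last = now
    return ran_at
```

With the default 30-second interval, simulate(5) returns [0, 60, 120]: the process check happens in every cycle, while the SQL check is skipped in every other cycle.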
SYSTEM user password
PRIMECLUSTER Wizard for Oracle monitors the Oracle instance as the SYSTEM user, so the SYSTEM user's password must be registered. Refer to "4.3 clorapass - Register Password for Monitoring".
Monitoring table (FAILSAFEORACLE_<ORACLE_SID>)
PRIMECLUSTER Wizard for Oracle creates the monitoring table in the SYSTEM user's default tablespace if the table does not exist. The table is not deleted afterwards.
Warning notification
If any of the following symptoms is detected, PRIMECLUSTER Wizard for Oracle notifies RMS of the Warning state. Because this is not the Fault state, a failover does not occur.
The Oracle instance cannot be connected to because the SYSTEM user's password registered with the "clorapass" command is incorrect (ORA-01017 detected).
The Oracle instance cannot be connected to because the SYSTEM user account is locked (ORA-28000 detected).
The Oracle instance cannot be connected to because the SYSTEM user's password has expired (ORA-28001 detected).
The Oracle instance cannot be connected to because a max session or max process error has occurred (ORA-00018 or ORA-00020 detected).
The monitoring timeout occurs because SQL does not reply within a certain period of time.
If the monitoring timeout occurs, the SQL is executed again. If a reply from SQL is then received, the Online state is notified.
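The connection errors listed above can be summarised as a small lookup, shown here as a Python sketch. It only restates the five codes named in this section. The names WARNING_ORA_CODES and is_warning_only are hypothetical, and the actual handling of each ORA-xxxxx error in the product is governed by the action definition file, not by code like this.

```python
# ORA codes that this section says lead to the Warning state (no failover).
WARNING_ORA_CODES = {
    "ORA-01017": "incorrect SYSTEM user password registered with clorapass",
    "ORA-28000": "SYSTEM user account is locked",
    "ORA-28001": "SYSTEM user password has expired",
    "ORA-00018": "max session error",
    "ORA-00020": "max process error",
}


def is_warning_only(ora_code: str) -> bool:
    """Return True if the code is one of the Warning-only errors above."""
    return ora_code.strip().upper() in WARNING_ORA_CODES
```

For instance, is_warning_only("ORA-28000") is True, while is_warning_only("ORA-04031") is False because that error is not in the Warning list.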
Oracle database errors that cause failover
If Oracle database errors are detected, PRIMECLUSTER Wizard for Oracle notifies RMS of the Offline state. The Oracle instance resources then enter the resource failure state and a failover occurs.
If the AutoRecover(A) flag of an Oracle instance resource is selected, the Oracle instance is restarted before failover when the resource failure occurs. For details about AutoRecover(A), refer to "2.2.7.1 Oracle Resource Creation and Registration".
In the following cases, the Offline state is notified to RMS:
The background processes (PMON, SMON, DBWn, LGWR and CKPT) do not exist.
Example
For example, this applies in the following cases:
The Oracle instance terminates abnormally.
The Oracle instance is stopped without stopping the monitoring.
Oracle database errors (ORA-xxxxx) are returned after executing SQL.
Oracle database errors (ORA-xxxxx) detected during monitoring will be handled in accordance with the action definition file (/opt/FJSVclora/etc/FJSVclorafm.actionlist).
If the Oracle database errors defined as Of in the action definition file are detected, the Offline state is notified.
See "Appendix G (Information) Action Definition File".
Example
For example, this applies in the following case:
ORA-04031 (out of memory in the shared pool) occurs.
The monitoring timeout occurs twice in a row after executing SQL.
If SQL does not reply within 300 seconds (default), the monitoring timeout occurs and the Oracle resource enters the Warning state. PRIMECLUSTER Wizard for Oracle then reconnects to the Oracle instance. If there is again no reply within 300 seconds during the reconnection, the Offline state is notified.
The monitoring timeout can be changed with the "WatchTimeout" setting; its default value is 300 seconds.
Example
For example, this applies in the following cases:
The Oracle database hangs because there is no space left for archive logs.
The system load is too high.
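The "twice in a row" rule for monitoring timeouts can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's code: the function name states_after is hypothetical, each list element stands for one SQL attempt, and True means a reply arrived within WatchTimeout.

```python
from typing import List


def states_after(replies: List[bool]) -> List[str]:
    """Map a sequence of SQL attempts to the state notified after each one.

    A single timeout gives Warning; a second consecutive timeout gives
    Offline; any reply resets the counter and gives Online.
    """
    states: List[str] = []
    consecutive_timeouts = 0
    for replied in replies:
        if replied:
            consecutive_timeouts = 0
            states.append("Online")
        else:
            consecutive_timeouts += 1
            states.append("Warning" if consecutive_timeouts == 1 else "Offline")
    return states
```

For the sequence [True, False, True, False, False] the sketch yields Online, Warning, Online, Warning, Offline: an isolated timeout recovers to Online, and only two timeouts in a row lead to Offline.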
Monitoring PDB
OPEN_MODE of each PDB is checked using the V$PDBS view. If OPEN_MODE is "READ WRITE", the PDB is judged to be normal; otherwise it is judged to be abnormal. The PDBs are monitored according to the "Interval" setting. When the state of a PDB has changed since the last monitoring, a message is output to syslog. Neither a restart nor a failover is executed because of a PDB fault.
When the state of a PDB becomes normal, the following message is output.
FSP_PCLW-ORACLE_FJSVclora: INFO: 9142: OPEN_MODE of PDB <PDB name> was OPEN. (CDB=<ORACLE_SID of CDB> PDB=<PDB name> OPEN_MODE=<state of PDB>)
When the state of a PDB becomes abnormal, the following message is output.
FSP_PCLW-ORACLE_FJSVclora: ERROR: 9242: clorapdbmon detected OPEN_MODE of PDB <PDB name> is invalid. (CDB=<ORACLE_SID of CDB> PDB=<PDB name> OPEN_MODE=<state of PDB>)
For details about monitoring PDBs in an Oracle Data Guard/Oracle Active Data Guard environment, see "F.1 Feature Outline".
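The PDB judgment described above (normal only when OPEN_MODE is "READ WRITE", with a message only on a state change) can be sketched in Python. The names is_normal and state_changes are hypothetical. The sketch assumes that "state" means the normal/abnormal judgment, that only PDBs seen in both the previous and the current check are compared, and it ignores the Oracle Data Guard case covered in "F.1 Feature Outline".

```python
from typing import Dict, List, Tuple


def is_normal(open_mode: str) -> bool:
    """A PDB is judged normal only when OPEN_MODE is READ WRITE."""
    return open_mode == "READ WRITE"


def state_changes(previous: Dict[str, str],
                  current: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (PDB name, new judgment) for every PDB whose judgment changed.

    Both arguments map a PDB name to its OPEN_MODE value.
    """
    changes: List[Tuple[str, str]] = []
    for name, mode in sorted(current.items()):
        if name not in previous:
            continue  # assumption: nothing to compare against, no message
        was_normal = is_normal(previous[name])
        now_normal = is_normal(mode)
        if was_normal != now_normal:
            changes.append((name, "normal" if now_normal else "abnormal"))
    return changes
```

A PDB that moves from "READ WRITE" to "MOUNTED" would be reported once as abnormal; if it stays "MOUNTED" in the next check, nothing further is reported.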
Note
Failover occurs according to the setting of AutoSwitchOver of userApplication (cluster application).
If AutoSwitchOver=ResourceFailure (at resource failure) is selected, a userApplication fails over when a resource failure occurs.
For details about the settings of userApplication (cluster application), refer to the "PRIMECLUSTER Installation and Administration Guide".
Listener Monitoring
The monitoring procedure for a Listener is as follows:
1. Make sure that the Listener process is alive by using the ps command.
2. Make sure that the net service name is valid by using the tnsping command.
   The tnsping monitoring is triggered according to the "Interval" setting, but the elapsed time since the last tnsping is checked first. tnsping is executed only when 60 seconds or more have passed.
Note
When TNSName is set, tnsping is executed. For details about TNSName, refer to "2.2.7.1 Oracle Resource Creation and Registration".
On the standby node, only step 1 is executed, to confirm that the Listener processes do not exist.
Monitoring timeout
If the tnsping command does not reply within a certain period of time, a monitoring timeout is detected and the Oracle Listener resource enters the Warning state. If the monitoring timeout occurs twice in a row, the resource is considered faulty and a failover is performed.
The monitoring timeout (the time to wait for a reply from the Oracle Listener) can be changed with WatchTimeout.
Failover
If Oracle listener errors are detected, PRIMECLUSTER Wizard for Oracle notifies RMS of the Offline state. The Oracle listener resources then enter the resource failure state and a failover occurs.
If the AutoRecover(A) flag of an Oracle listener resource is selected, the Oracle listener is restarted before failover when the resource failure occurs. For details about AutoRecover(A), refer to "2.2.7.1 Oracle Resource Creation and Registration".
In the following cases, the Offline state is notified to RMS:
The listener process does not exist.
The tnsping command fails.
The monitoring timeout occurs twice in a row.
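The three Offline conditions above, together with the Warning state for a single timeout, can be combined into one decision function. The following Python sketch is illustrative only: the name listener_state and the string values "ok", "fail" and "timeout" are hypothetical, and None stands for the case where TNSName is not set and tnsping is therefore not executed.

```python
from typing import Optional, Tuple


def listener_state(process_alive: bool, tnsping: Optional[str],
                   prior_timeouts: int) -> Tuple[str, int]:
    """Return (state, consecutive timeout count) for one monitoring cycle.

    tnsping is "ok", "fail", "timeout", or None when tnsping is skipped.
    prior_timeouts is the number of consecutive timeouts seen so far.
    """
    if not process_alive:
        return "Offline", 0          # the listener process does not exist
    if tnsping is None or tnsping == "ok":
        return "Online", 0
    if tnsping == "fail":
        return "Offline", 0          # the tnsping command fails
    if tnsping == "timeout":
        timeouts = prior_timeouts + 1
        # First timeout: Warning. Second timeout in a row: Offline.
        return ("Warning" if timeouts < 2 else "Offline"), timeouts
    raise ValueError(f"unexpected tnsping result: {tnsping!r}")
```

Calling the function cycle by cycle and feeding the returned counter back in reproduces the behavior described in this section: a first timeout yields Warning, and a timeout in the following cycle yields Offline.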
Note
Failover occurs according to the setting of AutoSwitchOver of userApplication (cluster application).
If AutoSwitchOver=ResourceFailure (at resource failure) is selected, a userApplication fails over when a resource failure occurs.
For details about the settings of userApplication (cluster application), refer to the "PRIMECLUSTER Installation and Administration Guide".
Oracle ASM instance Monitoring
Oracle ASM is not monitored. The NullDetector flag is automatically enabled.