Oracle instance Monitoring
The monitoring procedure for an Oracle instance is as follows:
1. Check the Oracle background processes (PMON, SMON) every 30 seconds (fixed).
   Once the processes can be confirmed after the Oracle instance starts, go to step "2".
2. su - <Oracle user>
3. Connect locally to the Oracle instance as the SYSTEM user.
   If the database can be confirmed to be in the OPEN state, go to step "5".
4. Check whether the Oracle background processes (PMON, SMON, DBWn, LGWR, CKPT) are alive.
   The monitoring interval can be changed with the "Interval" setting; the default value is 30 seconds.
5. Check whether SQL (INSERT, UPDATE, DELETE and COMMIT) can be executed correctly against the monitoring table in the SYSTEM user's default tablespace.
   SQL monitoring follows the "Interval" setting, but the elapsed time since the last SQL monitoring is checked first. The SQL is executed only when 60 seconds or more have passed.
6. Reconnect to the Oracle instance once every 24 hours.
On the standby node, step 1 is executed to confirm that the Oracle background processes (PMON, SMON) do not exist.
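The background process checks in the procedure above can be illustrated with a small sketch. It is not the product's implementation. It assumes the usual `ora_<process>_<ORACLE_SID>` process naming as it appears in `ps` output, treats DBWn as a writer with a one-character suffix, and uses a hypothetical helper name (`missing_background_processes`) and made-up sample output.

```python
import re

# Background processes named in the monitoring procedure.
# "dbw[0-9a-z]" stands for DBWn (assumption: one-character writer suffix).
REQUIRED = {
    "PMON": "pmon",
    "SMON": "smon",
    "DBWn": "dbw[0-9a-z]",
    "LGWR": "lgwr",
    "CKPT": "ckpt",
}


def missing_background_processes(ps_output, sid, names=tuple(REQUIRED)):
    """Return the requested process names that do not appear in ps_output."""
    missing = []
    for name in names:
        # Assumed naming: ora_<process>_<ORACLE_SID> at the end of a ps line.
        pattern = re.compile(
            r"\bora_%s_%s\s*$" % (REQUIRED[name], re.escape(sid)),
            re.MULTILINE,
        )
        if not pattern.search(ps_output):
            missing.append(name)
    return missing


# Made-up ps output for an instance whose SID is ORCL; CKPT is absent.
sample = """\
oracle    2101     1  0 10:00 ?  00:00:01 ora_pmon_ORCL
oracle    2115     1  0 10:00 ?  00:00:02 ora_smon_ORCL
oracle    2109     1  0 10:00 ?  00:00:03 ora_dbw0_ORCL
oracle    2111     1  0 10:00 ?  00:00:01 ora_lgwr_ORCL
"""

print(missing_background_processes(sample, "ORCL"))                    # ['CKPT']
print(missing_background_processes(sample, "ORCL", ("PMON", "SMON")))  # []
```

On a standby node the same helper could be read the other way round: the expected result for PMON and SMON is that both are reported as missing.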
SYSTEM user password
PRIMECLUSTER Wizard for Oracle monitors the Oracle instance as the SYSTEM user, so the SYSTEM user's password must be registered. For details, refer to "4.3 clorapass - Register Password for Monitoring".
Monitoring table (FAILSAFEORACLE_<ORACLE_SID>)
If the monitoring table does not exist, PRIMECLUSTER Wizard for Oracle creates it in the SYSTEM user's default tablespace. The table is never deleted automatically.
Warning notification
If any of the following symptoms is detected, PRIMECLUSTER Wizard for Oracle notifies RMS of the Warning state. This is not the Fault state, so no failover occurs.
- The connection to the Oracle instance fails because the SYSTEM user's password registered with the "clorapass" command is incorrect (ORA-01017 detected).
- The connection to the Oracle instance is refused because the SYSTEM user account is locked (ORA-28000 detected).
- The connection to the Oracle instance is refused because the SYSTEM user's password has expired (ORA-28001 detected).
- The connection to the Oracle instance is refused because a max session or max process error occurs (ORA-00018 or ORA-00020 detected).
- A monitoring timeout occurs because no reply to the SQL is received within a certain period of time.
  If a monitoring timeout occurs, the SQL is executed again. If a reply to the SQL is then received, the Online state is notified.
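The Warning-only error codes listed above can be summarized as a lookup table. The following is a minimal sketch, not the product's logic: the function name `warning_reasons` and the reason strings are illustrative assumptions, and only the ORA codes themselves come from the text.

```python
import re

# ORA codes that the text lists as Warning-only (no failover).
# The reason strings are paraphrases added for illustration.
WARNING_CODES = {
    "ORA-01017": "incorrect SYSTEM user password",
    "ORA-28000": "SYSTEM user account locked",
    "ORA-28001": "SYSTEM user password expired",
    "ORA-00018": "max session error",
    "ORA-00020": "max process error",
}


def warning_reasons(message):
    """Return (code, reason) pairs for every Warning-only code in message."""
    codes = re.findall(r"ORA-\d{5}", message)
    return [(code, WARNING_CODES[code]) for code in codes if code in WARNING_CODES]


print(warning_reasons("ORA-28000: the account is locked"))
# [('ORA-28000', 'SYSTEM user account locked')]
print(warning_reasons("ORA-04031: unable to allocate 4096 bytes of shared memory"))
# []
```

An empty result only means that the error is not one of the Warning-only codes. How other errors are handled is decided elsewhere, for example by the action definition file.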
Oracle database errors that cause failover
If Oracle database errors are detected, PRIMECLUSTER Wizard for Oracle notifies RMS of the Offline state. The Oracle instance resource then enters the resource failure state and a failover occurs.
If the AutoRecover(A) flag of the Oracle instance resource is selected, the Oracle instance is restarted before failover when an Oracle instance resource failure occurs. For details about AutoRecover(A), refer to "2.2.7.1 Oracle Resource Creation and Registration".
The Offline state is notified to RMS in the following cases:
- The background processes (PMON, SMON, DBWn, LGWR and CKPT) do not exist.
  Example
  This applies, for example, in the following cases:
  - The Oracle instance terminates abnormally.
  - The Oracle instance is stopped without stopping the monitoring.
- Oracle database errors (ORA-xxxxx) are returned after executing SQL.
  Oracle database errors (ORA-xxxxx) detected during monitoring are handled in accordance with the action definition file (/opt/FJSVclora/etc/FJSVclorafm.actionlist).
  If an Oracle database error defined as "Of" in the action definition file is detected, the Offline state is notified.
  See "Appendix F (Information) Action Definition File".
  Example
  This applies, for example, in the following case:
  - ORA-04031 (out of memory in the shared pool) occurs.
- The monitoring timeout occurs twice in a row after executing SQL.
  If no reply to the SQL is received within 300 seconds (default), a monitoring timeout occurs and the Oracle resource enters the Warning state. PRIMECLUSTER Wizard for Oracle then reconnects to the Oracle instance. If no reply is received within 300 seconds during the reconnection either, the Offline state is notified.
  The monitoring timeout can be changed with the "WatchTimeout" setting; the default value is 300 seconds.
  Example
  This applies, for example, in the following cases:
  - Oracle Database hangs because the archive logs run out of space.
  - The system load is too high.
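The "twice in a row" timeout rule described above can be sketched as a small state transition. This is an illustrative model only: the function names are hypothetical, each attempt is reduced to a boolean (a reply arrived within WatchTimeout or it did not), and the reconnection itself is not modeled.

```python
def next_state(consecutive_timeouts, sql_replied):
    """Return (state, consecutive_timeouts) after one SQL monitoring attempt."""
    if sql_replied:
        # A reply resets the counter and the resource is reported Online.
        return "Online", 0
    consecutive_timeouts += 1
    if consecutive_timeouts >= 2:
        # Second timeout in a row: the Offline state is notified.
        return "Offline", consecutive_timeouts
    # First timeout: Warning only, followed by a reconnection attempt.
    return "Warning", consecutive_timeouts


def replay(replies):
    """Feed a sequence of attempt outcomes through next_state."""
    states, count = [], 0
    for replied in replies:
        state, count = next_state(count, replied)
        states.append(state)
    return states


print(replay([True, False, True, False, False]))
# ['Online', 'Warning', 'Online', 'Warning', 'Offline']
```

A single timeout followed by a reply returns the resource to Online, which is why a brief stall produces only a Warning.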
Note
Failover occurs according to the setting of AutoSwitchOver of userApplication(cluster application).
If AutoSwitchOver=ResourceFailure (at resource failure) is selected, the userApplication fails over when a resource failure occurs.
For details about the settings of userApplication(cluster application), refer to the "PRIMECLUSTER Installation and Administration Guide".
Listener Monitoring
The monitoring procedure for a Listener is as follows:
1. Confirm with the ps command that the Listener process is alive.
2. Confirm with the "tnsping" command that the net service name is valid.
   Monitoring with tnsping follows the "Interval" setting, but the elapsed time since the last tnsping is checked first. tnsping is executed only when 60 seconds or more have passed.
Note
When TNSName is set, tnsping is executed. For details about TNSName, refer to "2.2.7.1 Oracle Resource Creation and Registration".
On the standby node, step 1 is executed to confirm that the Listener processes do not exist.
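The interaction between "Interval" and the 60-second minimum gap, which applies to tnsping here and to SQL monitoring for the Oracle instance, can be worked through with a short simulation. It is a sketch under stated assumptions: the function name `check_times` is hypothetical, monitoring ticks are taken to fall exactly every `interval` seconds, and the first tick is assumed to always run the check.

```python
MIN_GAP_SECONDS = 60  # minimum gap between two tnsping (or SQL) checks, from the text


def check_times(interval, duration):
    """Return the tick times (in seconds) at which the check would run."""
    runs, last_run = [], None
    for now in range(0, duration + 1, interval):
        # Assumption: the very first tick always runs the check.
        if last_run is None or now - last_run >= MIN_GAP_SECONDS:
            runs.append(now)
            last_run = now
    return runs


print(check_times(30, 180))  # [0, 60, 120, 180]
print(check_times(45, 180))  # [0, 90, 180]
print(check_times(90, 180))  # [0, 90, 180]
```

With the default Interval of 30 seconds the check runs on every second tick. An Interval of 60 seconds or more makes it run on every tick.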
Monitoring timeout
If the tnsping command does not reply within a certain period of time, a monitoring timeout is assumed and the Oracle Listener resource is put into the Warning state. If the monitoring timeout occurs twice in a row, the resource is considered faulted and a failover is performed.
The monitoring timeout (the time to wait for a reply from the Oracle Listener) can be changed with WatchTimeout.
Failover
If Oracle listener errors are detected, PRIMECLUSTER Wizard for Oracle notifies RMS of the Offline state. The Oracle listener resource then enters the resource failure state and a failover occurs.
If the AutoRecover(A) flag of the Oracle listener resource is selected, the Oracle listener is restarted before failover when an Oracle listener resource failure occurs. For details about AutoRecover(A), refer to "2.2.7.1 Oracle Resource Creation and Registration".
The Offline state is notified to RMS in the following cases:
- The listener process does not exist.
- The tnsping command fails.
- The monitoring timeout occurs twice in a row.
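The three Offline conditions above, together with the single-timeout Warning, can be combined into one decision function. This is a hedged sketch rather than the product's code: `listener_state`, its parameters, and the result strings "ok", "failed" and "timeout" are all hypothetical names chosen for illustration.

```python
def listener_state(process_alive, tnsping_result=None, previous_timeouts=0):
    """Map one monitoring pass to the state reported for the Listener.

    tnsping_result is "ok", "failed", "timeout", or None when tnsping
    was not run (for example because TNSName is not set).
    previous_timeouts counts the timeouts immediately before this pass.
    """
    if not process_alive:
        return "Offline"   # the listener process does not exist
    if tnsping_result == "failed":
        return "Offline"   # the tnsping command fails
    if tnsping_result == "timeout":
        # The first timeout is only a Warning; the second in a row is Offline.
        return "Offline" if previous_timeouts >= 1 else "Warning"
    return "Online"


print(listener_state(True, "ok"))            # Online
print(listener_state(True, "timeout"))       # Warning
print(listener_state(True, "timeout", 1))    # Offline
print(listener_state(False))                 # Offline
```

Whether an Offline result actually leads to a failover still depends on the AutoRecover(A) flag and on the AutoSwitchOver setting of the userApplication.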
Note
Failover occurs according to the setting of AutoSwitchOver of userApplication(cluster application).
If AutoSwitchOver=ResourceFailure (at resource failure) is selected, the userApplication fails over when a resource failure occurs.
For details about the settings of userApplication(cluster application), refer to the "PRIMECLUSTER Installation and Administration Guide".
Oracle ASM instance Monitoring
The Oracle ASM instance is not monitored. The NullDetector flag is enabled automatically.