RCI SA is the Shutdown Agent for SPARC Enterprise M-series.
Setup and configuration
Hardware setup of the RCI is performed only by qualified support personnel. Contact field engineers for more information. In addition, refer to the manual shipped with the unit and to any relevant PRIMECLUSTER Release Notices for configuration details.
Shutdown Agent
There are two kinds of RCI SAs:
SA_pprcip: panics the node through RCI.
SA_pprcir: resets the node through RCI.
The RCI log files are as follows:
/var/opt/SMAWsf/log/SA_pprcip.log
/var/opt/SMAWsf/log/SA_pprcir.log
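As a convenience when reviewing these logs, the following is a minimal sketch of a shell helper that prints the last lines of each RCI SA log. The function name rci_tail_logs and its two optional arguments (log directory and line count) are hypothetical and not part of PRIMECLUSTER. The sketch also assumes a POSIX tail that accepts the -n option.

```shell
# Hypothetical helper (not part of PRIMECLUSTER): print the last lines of
# each RCI SA log file named in this section.
#   $1  log directory (defaults to /var/opt/SMAWsf/log)
#   $2  number of lines to show per file (defaults to 20)
rci_tail_logs() {
    log_dir=${1:-/var/opt/SMAWsf/log}
    lines=${2:-20}
    for name in SA_pprcip.log SA_pprcir.log; do
        file="$log_dir/$name"
        if [ -r "$file" ]; then
            echo "== $file =="
            # Assumes a POSIX tail that supports -n
            tail -n "$lines" "$file"
        else
            echo "== $file (not readable) =="
        fi
    done
}
```

Calling rci_tail_logs with no arguments reads the default log directory. Passing a directory and a line count, for example rci_tail_logs /var/opt/SMAWsf/log 50, overrides both defaults.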
How to check the RCI Monitoring Agent when an RCI error is detected
When an RCI error is detected, the RCI Monitoring Agent stops monitoring only the node on which the error occurred, so monitoring of the other nodes is not disrupted.
For details on restoring the RCI Monitoring Agent, see "4.5 Error Messages" in "PRIMECLUSTER Messages." The procedure for checking the RCI monitoring status is described below.
How to check the RCI monitoring status
Check the Shutdown Facility on all the nodes as follows:
# /opt/SMAW/bin/sdtool -s
An RCI error is detected before the Shutdown Facility is started.
If InitFailed is displayed as the Init State of the agents SA_pprcip.so and SA_pprcir.so for any cluster node, an RCI transmission failure has occurred between that node and the other nodes. That node is excluded from monitoring and elimination.
In the following example, an RCI transmission failure occurred between the node on which the sdtool command was executed and the other nodes:
# /opt/SMAW/bin/sdtool -s
Cluster Host  Agent         SA State  Shut State  Test State  Init State
------------  ------------  --------  ----------  ----------  ----------
node01        SA_pprcip.so  Idle      Unknown     Unknown     InitFailed
node01        SA_pprcir.so  Idle      Unknown     Unknown     InitFailed
node02        SA_pprcip.so  Idle      Unknown     Unknown     InitFailed
node02        SA_pprcir.so  Idle      Unknown     Unknown     InitFailed
node03        SA_pprcip.so  Idle      Unknown     Unknown     InitFailed
node03        SA_pprcir.so  Idle      Unknown     Unknown     InitFailed
Refer to /var/adm/messages and take corrective action according to the error message instructions.
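The InitFailed check described above can be scripted. The following is a minimal sketch, assuming the six-column layout of the sdtool -s output shown in this section and a POSIX awk. The function name rci_init_failed_hosts is hypothetical and not part of PRIMECLUSTER.

```shell
# Hypothetical helper (not part of PRIMECLUSTER): read `sdtool -s` output
# on standard input and print each cluster host whose RCI agents report
# InitFailed in the Init State column.
rci_init_failed_hosts() {
    awk '
        # Agent rows have exactly six whitespace-separated fields:
        # host, agent, SA state, Shut state, Test state, Init state.
        # The header and separator lines do not match the agent names.
        NF == 6 && ($2 == "SA_pprcip.so" || $2 == "SA_pprcir.so") && $6 == "InitFailed" {
            # Print each host only once, even if both agents failed
            if (!seen[$1]++) print $1
        }
    '
}
```

The function would typically be fed from the command itself, for example /opt/SMAW/bin/sdtool -s | rci_init_failed_hosts. Empty output suggests that no RCI agent is in the InitFailed state. The messages in /var/adm/messages remain the authoritative source for corrective action.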
An RCI error is detected after the Shutdown Facility is started.
If Unknown or TestFailed is displayed as the Test State of the agents SA_pprcip.so and SA_pprcir.so for any node, an RCI transmission failure has occurred between that node and the other nodes. That node is excluded from monitoring and elimination.
In the following example, an RCI transmission failure occurred between node02, where the sdtool command was executed, and the other nodes:
# /opt/SMAW/bin/sdtool -s
Cluster Host  Agent         SA State  Shut State  Test State  Init State
------------  ------------  --------  ----------  ----------  ----------
node01        SA_pprcip.so  Idle      Unknown     TestWorked  InitWorked
node01        SA_pprcir.so  Idle      Unknown     TestWorked  InitWorked
node02        SA_pprcip.so  Idle      Unknown     TestFailed  InitWorked
node02        SA_pprcir.so  Idle      Unknown     TestFailed  InitWorked
node03        SA_pprcip.so  Idle      Unknown     TestWorked  InitWorked
node03        SA_pprcir.so  Idle      Unknown     TestWorked  InitWorked
Refer to /var/adm/messages and take corrective action according to the error message instructions.
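The Test State check can be scripted in the same way. The following is a minimal sketch, assuming the six-column layout of the sdtool -s output shown in this section and a POSIX awk. The function name rci_test_failed_agents is hypothetical and not part of PRIMECLUSTER. The result is meaningful only after the Shutdown Facility has started, because Test State is also Unknown before startup.

```shell
# Hypothetical helper (not part of PRIMECLUSTER): read `sdtool -s` output
# on standard input and print host, agent, and Test State for every RCI
# agent whose Test State is TestFailed or Unknown.
rci_test_failed_agents() {
    awk '
        # Agent rows have exactly six whitespace-separated fields;
        # field 5 is the Test State column in the layout shown above.
        NF == 6 && ($2 == "SA_pprcip.so" || $2 == "SA_pprcir.so") && ($5 == "TestFailed" || $5 == "Unknown") {
            print $1, $2, $5
        }
    '
}
```

For example, piping the output above through the function (/opt/SMAW/bin/sdtool -s | rci_test_failed_agents) would list the two node02 agents with TestFailed and nothing for node01 or node03.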
Note
When an RCI transmission failure is detected, the node that uses the failed transmission route is excluded from monitoring and elimination until the Shutdown Facility is restarted.
If multiple nodes use the same RCI address, error message No.7004 is output and the RCI Monitoring Agent daemon terminates abnormally.
If you turn off a node for maintenance, error message No.7003 appears on the other nodes. Take corrective action once the node has been started again after maintenance.