7.5 Examples of Threshold Monitoring

This section presents key examples of threshold monitoring to help users determine which situations require thresholds and what types of thresholds to set for them.

Case 1: Online application system at company A

Material - System operation standard and performance requirements (excerpt)

Online application service hours: 8:00 to 18:00 every day
Online application busy hours: 12:00 to 15:00 every day
This system requires that operations at the operator terminals remain stress-free even under the workload of the busy hours.
Therefore, the target I/O response time shall be "30 ms or less," which is a general standard.
The target I/O response time outside the busy hours shall be "10 ms or less," one third of 30 ms, in proportion to the workload (the workload during the busy hours is about three times that of other hours).
During the busy hours, processing for data reference, updating, and addition may occur concurrently and continue for up to 60 minutes.
If I/O responses of 30 ms or more occur for a total period equivalent to 10% (6 minutes) of this continuous processing, operations at the operator terminals may become stressful. Therefore, configure settings so that an alarm log is generated when this state occurs.
If I/O responses during the busy hours drop to 10 ms or less, the same as the performance target for other hours, the I/O response delays that occurred before then shall be regarded as transient.
Therefore, no alarm log needs to be generated in this case.
The event log need not be displayed each time an alarm log is generated; displaying it once a day is sufficient.
(This is because the system administrator checks the condition report once a day.)
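The numeric values in this case's setting table follow from simple arithmetic on the material. The Python sketch below reproduces that derivation; the constant and function names are hypothetical and are not parameters of the monitoring software.

```python
# Derive the Case 1 threshold settings from the material.
# All names here are hypothetical, chosen only for illustration.
BUSY_TARGET_MS = 30          # target I/O response during the busy hours
WORKLOAD_RATIO = 3           # busy-hour workload is about 3x that of other hours
INTERVAL_MINUTES = 60        # longest continuous busy-hour processing
TOLERANCE_FRACTION = 0.10    # 10% of that period may exceed the target


def derive_case1_settings() -> dict:
    # Off-peak target (used as the rearm value) scales with the workload ratio.
    rearm_ms = BUSY_TARGET_MS / WORKLOAD_RATIO
    # 10% of 60 minutes, expressed in seconds.
    tolerance_s = round(INTERVAL_MINUTES * 60 * TOLERANCE_FRACTION)
    return {
        "threshold_ms": BUSY_TARGET_MS,
        "rearm_ms": rearm_ms,
        "interval_minutes": INTERVAL_MINUTES,
        "tolerance_total_s": tolerance_s,
    }


if __name__ == "__main__":
    print(derive_case1_settings())
```

Running it prints the threshold (30 ms), the rearm value (10 ms), the threshold monitoring interval (60 minutes), and the total-time tolerance (360 seconds) used in the setting table.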

Illustration of operational status of company A's online application system (changes in LogicalVolume response time)

An example of threshold monitoring settings for company A's online application system

Number corresponding to Material	Setting item	Setting
1	Threshold Monitoring Time	8:00-18:00
2	Alarm Display Time	12:00-15:00
3	Target	LogicalVolume Response
3	Threshold	30 ms
4	Threshold Monitoring Interval	60 minutes
4	Alarm Tolerance Level	Total time: 360 seconds
5	Rearm	10 ms
6	Alarm Display Frequency	Day by day
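The interaction of the threshold, the total-time tolerance, and the rearm value can be illustrated with a small simulation. The Python sketch below is a simplified model under stated assumptions, not the monitoring software's actual algorithm: it assumes LogicalVolume response samples taken every 60 seconds within the Threshold Monitoring Time, resets the accumulated time at the start of each 60-minute interval, and treats any sample at or below the rearm value as resetting the accumulated time. The function name total_time_alarms is hypothetical.

```python
from typing import List, Sequence


def total_time_alarms(
    samples_ms: Sequence[float],
    sample_s: int = 60,
    threshold_ms: float = 30.0,
    rearm_ms: float = 10.0,
    tolerance_total_s: int = 360,
    interval_s: int = 3600,
) -> List[int]:
    """Return the indices of samples at which an alarm log would be generated.

    Assumed model (hypothetical, for illustration only):
    - samples are taken every `sample_s` seconds from the start of monitoring;
    - each sample at or above `threshold_ms` adds `sample_s` seconds to a running total;
    - a sample at or below `rearm_ms` resets the total, because earlier delays are
      treated as transient;
    - the total also resets at the start of each `interval_s` monitoring interval;
    - when the total reaches `tolerance_total_s`, an alarm log is generated and the
      total restarts from zero.
    """
    alarms: List[int] = []
    samples_per_interval = interval_s // sample_s
    total_s = 0
    for i, value in enumerate(samples_ms):
        if i % samples_per_interval == 0:
            total_s = 0  # start of a new threshold monitoring interval
        if value <= rearm_ms:
            total_s = 0  # rearm: response recovered to the off-peak target
        elif value >= threshold_ms:
            total_s += sample_s  # time above threshold need not be continuous
            if total_s >= tolerance_total_s:
                alarms.append(i)
                total_s = 0
    return alarms


if __name__ == "__main__":
    # Hypothetical one-minute samples (ms): slow responses scattered through the hour.
    scattered = [35, 20] * 6 + [15] * 48  # 6 minutes at 35 ms in total
    recovered = [35, 5] * 6 + [15] * 48   # same delays, each followed by a 5 ms sample
    print(total_time_alarms(scattered))
    print(total_time_alarms(recovered))
```

With these sample values, the first series produces an alarm at index 10, when the accumulated time reaches 360 seconds even though the slow samples are not consecutive; the second produces none, because each recovery to 5 ms rearms the accumulation.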

Case 2: Online shopping system of company B

Material - System operation standard and performance requirements (excerpt)

Online application service hours: 24 hours a day, 365 days a year
Online application busy hours: Cannot be specified.
In this system, the number of accesses is expected to increase gradually as the number of member customers grows after production operation starts, and the load on storage is assumed to increase gradually as well. Measures need to be taken when the busy rate of storage resources (CM and disks) exceeds 60% to 80%.
This system executes credit card transactions every 5 minutes, so product retrieval and order processing must run without stress during the five minutes immediately before each transaction. If a storage resource remains busy (its busy rate exceeds 60% to 80%) for five consecutive minutes, transactions may be affected. Therefore, configure settings so that an alarm log is generated when this state occurs.
An event log shall be displayed every time an alarm log is generated. The system administrator checks the condition report whenever an event log is displayed.

Illustration of operational status of company B's online shopping system (changes in CM busy rate)

An example of threshold monitoring settings for company B's online shopping system

Number corresponding to Material	Setting item	Setting
1	Threshold Monitoring Time	0:00-24:00
2	Alarm Display Time	0:00-24:00
3	Target	CM Busy Rate
3	Threshold	60%
4	Alarm Tolerance Level	Continuous time: 300 seconds
5	Alarm Display Frequency	All

Number corresponding to Material	Setting item	Setting
1	Threshold Monitoring Time	0:00-24:00
2	Alarm Display Time	0:00-24:00
3	Target	RAIDGroup Busy Rate
3	Threshold	80%
4	Alarm Tolerance Level	Continuous time: 300 seconds
5	Alarm Display Frequency	All
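A continuous-time tolerance counts only an uninterrupted run at or above the threshold. The Python sketch below applies that rule to the two settings for this case; the function name, the 60-second sampling interval, and the choice to restart counting after each alarm are assumptions made for illustration, not documented behavior of the monitoring software.

```python
from typing import Dict, List, Sequence


def continuous_time_alarms(
    samples_pct: Sequence[float],
    threshold_pct: float,
    tolerance_continuous_s: int = 300,
    sample_s: int = 60,
) -> List[int]:
    """Return indices of samples at which an alarm log would be generated.

    Assumed model (hypothetical): a run is a series of consecutive samples at or
    above the threshold; any sample below the threshold ends the run. An alarm log
    is generated when the run reaches the tolerance, after which counting restarts.
    """
    alarms: List[int] = []
    run_s = 0
    for i, value in enumerate(samples_pct):
        if value >= threshold_pct:
            run_s += sample_s
            if run_s >= tolerance_continuous_s:
                alarms.append(i)
                run_s = 0
        else:
            run_s = 0  # the busy state was interrupted
    return alarms


# Thresholds from the two setting tables for company B (busy rate, %).
THRESHOLDS: Dict[str, float] = {"CM Busy Rate": 60.0, "RAIDGroup Busy Rate": 80.0}

if __name__ == "__main__":
    # Hypothetical one-minute busy-rate samples (%).
    cm = [55, 65, 70, 72, 68, 61, 40]
    raid = [85, 90, 79, 88, 92, 95, 83, 81]
    print(continuous_time_alarms(cm, THRESHOLDS["CM Busy Rate"]))
    print(continuous_time_alarms(raid, THRESHOLDS["RAIDGroup Busy Rate"]))
```

Here the CM alarm fires at index 5, after five consecutive samples at or above 60%; the RAID group alarm fires only at index 7, because the 79% sample at index 2 breaks the first run.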

Case 3: Batch processing with multiple database servers (cluster system) of company C

Material - System operation standard and performance requirements

System service hours: 24 hours a day, 365 days a year
Batch processing hours: 20:00 to 23:00 every night
This cluster system is a three-node Oracle RAC system. Batch processing performance is currently not a problem because the amount of data processed is small. However, as the data volume grows, FC path transfer between the FC switch and the storage may become a performance bottleneck.
If an FC path bottleneck occurs, it must be eliminated quickly.
Treat a state in which port throughput reaches about 80% of the maximum transfer capability as an FC path bottleneck, and configure settings so that an alarm log is generated when this state continues for 30 minutes or more.
An event log need not be displayed each time an alarm log is generated; it can be displayed just once during the batch processing hours, even if alarm logs are generated more than once. The system administrator checks the condition report when an event log is displayed.
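Because the threshold is expressed as a percentage of the port's maximum transfer capability, it can help to see the equivalent absolute throughput and how the 30-minute condition is checked. The Python sketch below assumes a hypothetical maximum of 800 MB/s per port (substitute the actual capability of the ports in question) and one-minute throughput samples; all names are illustrative.

```python
from typing import Sequence

MAX_MB_S = 800.0       # hypothetical maximum transfer capability of one port (MB/s)
THRESHOLD_PCT = 80.0   # "about 80%" of the maximum, from the material
TOLERANCE_S = 30 * 60  # "30 minutes or more", i.e. 1,800 seconds


def threshold_mb_s(max_mb_s: float, threshold_pct: float) -> float:
    """Absolute throughput corresponding to the percentage threshold."""
    return max_mb_s * threshold_pct / 100.0


def longest_run_s(samples_mb_s: Sequence[float], max_mb_s: float,
                  threshold_pct: float, sample_s: int = 60) -> int:
    """Longest uninterrupted time (s) with throughput at or above the threshold."""
    longest = current = 0
    for mb_s in samples_mb_s:
        pct = 100.0 * mb_s / max_mb_s  # throughput as % of maximum capability
        current = current + sample_s if pct >= threshold_pct else 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    print(threshold_mb_s(MAX_MB_S, THRESHOLD_PCT))  # 640.0 with the assumed maximum
    night = [700.0] * 30 + [500.0] * 10             # hypothetical batch-window samples
    run = longest_run_s(night, MAX_MB_S, THRESHOLD_PCT)
    print(run, run >= TOLERANCE_S)
```

With the assumed 800 MB/s maximum, the 80% threshold corresponds to 640 MB/s, and 30 consecutive one-minute samples at 700 MB/s just satisfy the 1,800-second condition.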

Illustration of batch processing with multiple database servers (cluster system) at company C (changes in port throughput)

An example of threshold monitoring settings for batch processing with multiple database servers (cluster system) at company C

Number corresponding to Material	Setting item	Setting
1	Threshold Monitoring Time	0:00-24:00
2	Alarm Display Time	20:00-23:00
3	Target	Port Throughput
3	Threshold	80%
4	Alarm Tolerance Level	Continuous time: 1,800 seconds
5	Alarm Display Frequency	Every monitoring time
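The three cases differ in when, and how often, an event is displayed for alarm logs that have already been generated. The Python sketch below models that filtering under simple assumptions: alarm logs generated outside the Alarm Display Time are kept but produce no event; "All" displays an event for every remaining alarm log; and the once-per-period behavior requested for companies A and C is modeled as at most one event per calendar day, which, for display windows that do not cross midnight (12:00-15:00 and 20:00-23:00), is also one event per window. The function name and the minute-of-day window representation are hypothetical.

```python
from datetime import date, datetime
from typing import List, Sequence, Set


def displayed_events(
    alarm_times: Sequence[datetime],
    start_min: int,
    end_min: int,
    once_per_day: bool,
) -> List[datetime]:
    """Return the alarm-log times for which an event would be displayed.

    Assumed model (hypothetical): the Alarm Display Time is [start_min, end_min)
    in minutes after midnight and does not cross midnight; alarm logs outside it
    produce no event. With once_per_day, only the first alarm log inside the
    window on each calendar day produces an event.
    """
    shown: List[datetime] = []
    days_shown: Set[date] = set()
    for t in sorted(alarm_times):
        minute_of_day = t.hour * 60 + t.minute
        if not (start_min <= minute_of_day < end_min):
            continue  # alarm log exists, but no event outside the display time
        if once_per_day:
            if t.date() in days_shown:
                continue  # an event was already displayed for this day
            days_shown.add(t.date())
        shown.append(t)
    return shown


if __name__ == "__main__":
    # Hypothetical alarm-log times over two days.
    alarms = [
        datetime(2024, 1, 1, 9, 30),
        datetime(2024, 1, 1, 12, 10),
        datetime(2024, 1, 1, 13, 45),
        datetime(2024, 1, 2, 14, 0),
    ]
    case_a = displayed_events(alarms, 12 * 60, 15 * 60, once_per_day=True)  # 12:00-15:00, once a day
    case_b = displayed_events(alarms, 0, 24 * 60, once_per_day=False)       # 0:00-24:00, All
    print(case_a)
    print(case_b)
```

With these samples, the company A configuration displays two events (12:10 on the first day and 14:00 on the second): the 9:30 alarm log falls outside the display time, and the 13:45 one repeats the same day. The company B configuration displays all four.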