The ELM mechanism is best illustrated with a few simple examples. The following discussion assumes the cluster consists of two or more nodes named A, B, C, etc.
The lock name for each node has the format "RMS<nnnnnn>", where <nnnnnn> is the 6-digit CF node ID. Therefore, the correspondence of every node's name, CF node ID, and lock name is easily determined. The locks are maintained clusterwide by CF, and any node can request access to any lock.
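For illustration only, the following Python sketch shows how a lock name can be derived from a CF node ID under the format described above; the function name and sample IDs are assumptions for this example, not part of ELM.

    # Illustrative only: build the ELM lock name from a CF node ID,
    # zero-padded to 6 digits as described above.
    def elm_lock_name(cf_node_id):
        return "RMS{:06d}".format(cf_node_id)

    print(elm_lock_name(3))    # RMS000003
    print(elm_lock_name(42))   # RMS000042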
When a node first requests access to a lock, the request is granted immediately, because the node will initially receive it in the NULL state. This is a special, neutral state that does not conflict with that lock's state on any other node. The local node can then issue a request to ELM to convert the lock to another state. Once the request is granted, the node can hold it in that state, or it can convert it to another state. Those are the only possible operations.
Besides the NULL state, ELM supports only two other states:
concurrent read (CR)
Multiple nodes can hold the lock in this state. This will prevent other nodes from converting the lock to the exclusive state.
exclusive (EX)
Only one node in the cluster can hold the lock in this state. This will prevent other nodes from converting the lock to the concurrent read or exclusive states.
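The conflict rules implied by these descriptions can be summarized as a small compatibility check. The following Python sketch illustrates only those rules; the names are assumptions and this is not the ELM interface.

    # Compatibility of ELM lock states, as described above: two states are
    # compatible if different nodes may hold the same lock in them at once.
    CONFLICTS = {("CR", "EX"), ("EX", "CR"), ("EX", "EX")}

    def compatible(requested, held):
        # NULL conflicts with nothing; CR conflicts with EX;
        # EX conflicts with both CR and EX.
        return (requested, held) not in CONFLICTS

    assert compatible("CR", "CR")        # multiple nodes may hold CR
    assert compatible("NULL", "EX")      # NULL never conflicts
    assert not compatible("CR", "EX")    # an EX holder blocks CR requests
    assert not compatible("EX", "CR")    # CR holders block an EX request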
The conversion operation is the central control mechanism for ELM. Requesting a conversion that would conflict with other nodes does not cause an error condition. Instead, the request is put into a queue until ELM can grant the request. The requesting process is put in a wait state, and when it reawakens, it knows its request has been granted.
When a node converts a lock back to the NULL state, it effectively releases the lock, and allows other nodes to convert the lock to their desired state. The requests will be granted in the order they were queued.
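The following Python sketch models this queueing behavior under the assumptions above: a conflicting conversion is queued rather than rejected, and queued requests are granted in order as locks drop back to the NULL state. The class and method names are illustrative, not the ELM API.

    # A toy, single-process model of ELM conversion queueing. Illustrative only.
    from collections import deque

    CONFLICTS = {("CR", "EX"), ("EX", "CR"), ("EX", "EX")}

    class ElmLock:
        def __init__(self):
            self.holders = {}          # node name -> held state (NULL is implicit)
            self.waiters = deque()     # queued (node, requested state) conversions

        def convert(self, node, state):
            """Return True if granted immediately, False if the request sleeps."""
            if self._conflicts(node, state):
                self.waiters.append((node, state))
                return False
            self._grant(node, state)
            return True

        def _conflicts(self, node, state):
            return any((state, held) in CONFLICTS
                       for other, held in self.holders.items() if other != node)

        def _grant(self, node, state):
            if state == "NULL":
                self.holders.pop(node, None)   # converting to NULL releases the lock
            else:
                self.holders[node] = state
            # Wake queued requests, in order, as long as they no longer conflict.
            while self.waiters and not self._conflicts(*self.waiters[0]):
                self._grant(*self.waiters.popleft())

    # Example: A holds its own lock in EX; B's CR request sleeps until A releases.
    lock_A = ElmLock()
    lock_A.convert("A", "EX")          # granted immediately
    print(lock_A.convert("B", "CR"))   # False: B's request sleeps in the queue
    lock_A.convert("A", "NULL")        # A releases; B's CR request is now granted
    print(lock_A.holders)              # {'B': 'CR'}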
For example, suppose A has converted its lock to the EX state. B can request access to A's lock and receive it immediately in the NULL state. If B then issues a request to convert it to the CR state, the request will wait until A (or ELM itself) converts the lock to the NULL or CR state, because neither of these conflicts with B's request. Therefore, when B successfully converts A's lock, it knows that A is no longer holding the lock in the EX state.
Since a request to access another node's lock is granted immediately and does not affect the ELM lock mechanism, that step is omitted in the following discussions.
Assume that A is the first node to start RMS. The following sequence occurs:
A converts its own lock to EX mode. This happens immediately, since no other nodes have completed their startup at this point.
A initiates its UDP heartbeat and waits for responses from other nodes. It does not yet attempt to convert the locks of any other node at this point.
Assume that B is the second node to start RMS. The following sequence occurs:
B converts its own lock to EX mode. This happens immediately, since A has not requested access to B's lock at this point.
B initiates its UDP heartbeat and waits for responses from other nodes. It does not attempt to convert the locks of any other nodes at this point.
A detects the first heartbeat from B. A issues a request to convert B's lock to the CR state. Since B holds its lock in the EX state, A's request goes to sleep while it waits in the ELM queue.
Since A's request is sleeping, B must be holding its own lock. Therefore, B must be online, and A marks it accordingly. A continues to mark B as online as long as its request remains asleep.
B detects the heartbeat from A and executes a similar sequence. B tries to convert A's lock to CR mode. But A holds its lock in the EX state, so B's request goes to sleep while it waits in the ELM queue.
As long as B's request remains asleep, B continues to mark A as online.
At this point, both nodes hold their own locks in the EX state, and each has issued a request to convert the other's lock to the CR state. Both requests are sleeping, but that has no effect on anything else either node is doing. However, the unfulfilled request on each node indicates the other base monitor is online.
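The two-node startup can be traced with a simplified model. The sketch below is self-contained and purely illustrative: each lock is represented only by the node holding it in the EX state, and a sleeping CR request is treated as the sign that the peer is online. All names are assumptions for this example.

    # Illustrative trace of the two-node startup described above.
    locks = {"A": None, "B": None}     # lock name -> node holding it in EX
    online = {"A": set(), "B": set()}  # node -> peers it currently marks online

    def start_rms(node):
        locks[node] = node             # convert own lock to EX (granted immediately)

    def on_first_heartbeat(node, peer):
        # Request CR on the peer's lock. If the peer holds it in EX, the request
        # sleeps, which proves the peer's base monitor is up; mark it online.
        if locks[peer] == peer:
            online[node].add(peer)     # request is sleeping -> peer is online
        else:
            online[node].discard(peer)

    start_rms("A")                     # A starts first, holds its lock in EX
    start_rms("B")                     # B starts second, holds its lock in EX
    on_first_heartbeat("A", "B")       # A sees B's heartbeat; its CR request sleeps
    on_first_heartbeat("B", "A")       # B sees A's heartbeat; its CR request sleeps
    print(online)                      # {'A': {'B'}, 'B': {'A'}}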
Assume C is the third node to start RMS. The sequence of operations with each of A and B is similar to the second node's startup above:
1. C converts its own lock to the EX state, initiates its heartbeat, and then waits for heartbeats from the other nodes.
2. When it first receives the heartbeat from one of the other nodes, it tries to convert that node's lock to the CR state, and the request goes to sleep.
3. As long as the request remains asleep, C marks the other node as online.
C performs steps 2 and 3 whenever it receives a heartbeat from a node for the first time.
At the same time, the other online nodes receive C's heartbeat for the first time, and they execute steps 2 and 3 with C's lock.
When the entire cluster is online, the states of the locks on each node are as follows:
The node holds its own lock in the EX state.
The node has issued requests to convert every other node's lock to the CR state, and all of these requests are sleeping.
When CF has issued a LEFTCLUSTER event for a remote node, or when the base monitor on a remote node goes down, ELM converts that node's lock to the NULL state on every node in the cluster.
For example, suppose node B has just gone down. The sequence of events on node A is typical of every other online node:
A's pending request to convert B's lock to the CR state wakes up. A now holds that lock in the CR state.
A immediately converts the lock to the NULL state and marks B as being offline.
A continues to mark B as offline until it receives a heartbeat from B. At that point, A restarts the lock cycle by requesting to convert B's lock to the CR state.
The same sequence would occur if only node B's base monitor went down. The major difference is the ordering of events: if the node goes down, the LEFTCLUSTER event precedes the ELM lock release; if only the base monitor goes down, the ELM lock release precedes the LEFTCLUSTER event. Either condition causes RMS to initiate a node elimination.
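The wake-up path on node A can be sketched as follows, continuing the simplified model. The callback names are assumptions for this example; in the real system the base monitor reacts to ELM and the UDP heartbeat, not to plain function calls.

    # Illustrative only: what node A does when its sleeping CR request on B's
    # lock wakes up (B or B's base monitor went down), and when B later returns.
    peer_state = {"B": "online"}       # A currently marks B online
    pending_cr = {"B"}                 # A has a sleeping CR request on B's lock

    def on_cr_granted(peer):
        # The sleeping request woke up: the lock is held briefly in CR,
        # converted straight back to NULL, and the peer is marked offline.
        pending_cr.discard(peer)
        peer_state[peer] = "offline"

    def on_heartbeat(peer):
        # The peer's heartbeat is seen again: restart the cycle by requesting CR
        # on its lock; the request sleeps and the peer is marked online again.
        pending_cr.add(peer)
        peer_state[peer] = "online"

    on_cr_granted("B")                 # B went down; ELM released B's lock
    print(peer_state)                  # {'B': 'offline'}
    on_heartbeat("B")                  # B rejoined the cluster
    print(peer_state)                  # {'B': 'online'}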
The ELM method proactively alerts the other base monitors when an outage occurs somewhere in the cluster. They do not have to wait for heartbeat timeouts to expire.
Note that ELM handles a graceful shutdown in much the same way, but in this case, the node itself releases the lock. Also, at the RMS level, no node elimination is necessary.
When a remote node is busy, its base monitor may respond very slowly. ELM is a state-based method and cannot detect this condition. Therefore, RMS depends on the time-based UDP heartbeat to decide when the remote response has become unacceptably slow.
For example, suppose node B is not down, but its base monitor is responding very slowly. The following sequence will occur on one of the nodes in the cluster, which we will assume is node A for this discussion:
A's request to convert B's lock to the CR state continues to sleep, because B is still up.
B's heartbeat period expires (default: 30 seconds), and then its heartbeat recovery period expires (default: 600 seconds).
Based on the heartbeat loss, A directs SF to eliminate B. (Note that ELM still detects no problem.)
When B is eliminated, ELM releases its lock on every node.
A's request to convert B's lock is granted. A immediately releases the lock and marks B as offline.
A waits for B's heartbeat, and the lock cycle starts again.
Other nodes may detect the loss of B's heartbeat before their lock request wakes up, in which case they will also initiate a node elimination of B.
This illustrates why ELM needs the UDP heartbeat as a backup. Without UDP, the ELM lock requests could remain in the queue well beyond the point where the remote node provides no useful services.
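As a final illustration, the timer check below sketches the time-based side of the decision. The constants match the defaults quoted above; the function name and the assumption that the recovery period is measured after the heartbeat period expires are part of this sketch only, not RMS internals.

    # Illustrative only: the time-based check that backs up ELM. The sleeping
    # CR request says nothing about a slow base monitor, so elimination is
    # driven by the heartbeat timers instead.
    HEARTBEAT_PERIOD = 30      # seconds (default heartbeat timeout quoted above)
    RECOVERY_PERIOD = 600      # seconds (default heartbeat recovery period)

    def should_eliminate(seconds_since_last_heartbeat):
        # Assumes the recovery period starts once the heartbeat period expires.
        return seconds_since_last_heartbeat > HEARTBEAT_PERIOD + RECOVERY_PERIOD

    print(should_eliminate(100))   # False: B is slow, but within the limits
    print(should_eliminate(700))   # True: direct SF to eliminate B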