Continuous Availability (CA) separates the vRealize Operations cluster into two fault domains and protects the analytics cluster against the loss of a fault domain.
Cluster Management
Clusters consist of a primary node, a primary replica node, a witness node, data nodes, and remote collector nodes.
Enabling continuous availability within vRealize Operations does not provide a disaster recovery solution.
When you enable continuous availability, data is stored in duplicate on two analytics nodes within the cluster, with the two copies placed in different fault domains. Because of this duplication, continuous availability doubles the system's compute and capacity requirements.
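As a rough illustration of that doubling, here is a minimal sizing sketch in Python; the node counts are assumed example values, not official sizing guidance.

    # Illustrative only: enabling CA pairs every analytics node across two
    # fault domains and adds a witness node. Consult the official vRealize
    # Operations sizing guidelines for real deployments.
    def ca_node_layout(analytics_nodes_without_ca: int) -> dict:
        """Return an example node layout once CA is enabled."""
        per_domain = analytics_nodes_without_ca  # each node gains a pair
        return {
            "fault_domain_1": per_domain,   # includes the primary node
            "fault_domain_2": per_domain,   # includes the primary replica node
            "witness": 1,                   # tiebreaker only, no data
            "total": 2 * per_domain + 1,
        }

    print(ca_node_layout(4))
    # {'fault_domain_1': 4, 'fault_domain_2': 4, 'witness': 1, 'total': 9}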
If the primary node or the primary replica node is permanently lost, replace the lost node; the replacement becomes the new primary replica node. If you need that replacement to serve as the primary node, take the current primary node offline and wait until the primary replica node is promoted to primary. Then bring the former primary node back online, and it becomes the new primary replica node.
Fault Domains
Fault domains consist of analytics nodes separated into two zones.
A fault domain consists of one or more analytics nodes grouped according to their physical location in the data center. When configured, the two fault domains enable vRealize Operations to tolerate the failure of an entire physical location, as well as failures of resources dedicated to a single fault domain.
Witness Node
The witness node is a member of the cluster but is not one of the analytics nodes.
To enable CA within vRealize Operations, deploy the witness node in the cluster. The witness node neither collects nor stores data.
The witness node serves as a tiebreaker when a decision must be made about the availability of vRealize Operations after the network connection between the two fault domains is lost.
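The following is a conceptual sketch of the tiebreaker idea in Python; vRealize Operations' internal quorum protocol is not public, so this illustrates only the decision, not the actual implementation.

    # Conceptual sketch: after a network partition between the fault
    # domains, the domain that can still reach the witness stays online.
    def surviving_domain(fd1_sees_witness: bool, fd2_sees_witness: bool) -> str:
        """Decide which fault domain remains available after a partition."""
        if fd1_sees_witness and fd2_sees_witness:
            return "both"            # witness still reaches both domains
        if fd1_sees_witness:
            return "fault_domain_1"  # witness breaks the tie in favor of FD 1
        if fd2_sees_witness:
            return "fault_domain_2"
        return "none"                # witness unreachable: no safe tiebreak

    assert surviving_domain(True, False) == "fault_domain_1"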
Analytics Nodes
Analytics nodes consist of a primary node, primary replica node, and data nodes.
When you enable continuous availability, you protect vRealize Operations from data loss if an entire fault domain is lost. However, if both nodes of a replicated pair are lost, one in each fault domain, permanent data loss can occur.
Within each fault domain, deploy analytics nodes on separate hosts to reduce the chance of data loss if a host fails. You can use DRS anti-affinity rules to ensure that the vRealize Operations nodes remain on separate hosts.
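A sketch of creating such an anti-affinity rule with pyVmomi follows; the vCenter address, credentials, cluster name, and VM names are placeholders, and error handling is omitted.

    # Sketch: pin two vRealize Operations analytics nodes to separate hosts
    # with a DRS anti-affinity rule. All names below are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    si = SmartConnect(host="vcenter.example.com",
                      user="administrator@vsphere.local", pwd="secret",
                      sslContext=ssl._create_unverified_context())
    content = si.RetrieveContent()

    def find_by_name(vimtype, name):
        """Look up a managed object by name via a container view."""
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vimtype], True)
        try:
            return next(obj for obj in view.view if obj.name == name)
        finally:
            view.Destroy()

    cluster = find_by_name(vim.ClusterComputeResource, "Cluster-FD1")
    vms = [find_by_name(vim.VirtualMachine, n)
           for n in ("vrops-data-01", "vrops-data-02")]

    # The rule forbids DRS from placing these VMs on the same host.
    rule = vim.cluster.AntiAffinityRuleSpec(
        name="vrops-nodes-separate-hosts", enabled=True, mandatory=True, vm=vms)
    spec = vim.cluster.ConfigSpecEx(
        rulesSpec=[vim.cluster.RuleSpec(operation="add", info=rule)])
    cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
    Disconnect(si)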
Collector Group
When you enable continuous availability, you can create collector groups to collect data from adapters within each fault domain.
Collector groups have no correlation with fault domains. A collector group collects data and provides it to the analytics nodes; vRealize Operations then decides how to store the data.
If the node running the adapter collection fails, the adapter is automatically moved to another node in the collector group.
In principle, you can install collectors anywhere, provided the networking requirements are met. From a failover perspective, however, avoid placing all the collectors within a single fault domain: if all collectors reside in one fault domain and a network outage affects that domain, vRealize Operations stops receiving data.
The recommendation is to keep remote collectors outside the fault domains, or to place half of the remote collectors in fault domain 1 and the other half in fault domain 2.
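A trivial sketch of the even-split option follows; the collector names are hypothetical.

    # Split remote collectors evenly across the two fault domains.
    collectors = ["rc-01", "rc-02", "rc-03", "rc-04"]
    placement = {
        "fault_domain_1": collectors[0::2],  # rc-01, rc-03
        "fault_domain_2": collectors[1::2],  # rc-02, rc-04
    }
    print(placement)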
Assign all normal adapters to collector groups, not to individual nodes. Hybrid adapters require two-way communication between the adapter and the monitored endpoint.
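A hedged sketch of that assignment through the vRealize Operations Suite REST API follows; the endpoint paths, the collectorGroupId field, and all identifiers are assumptions to verify against the Suite API documentation for your version.

    # Assumed Suite API usage: fetch an adapter instance, point it at a
    # collector group instead of an individual node, and push the update.
    # Paths, field names, and IDs are assumptions; verify before use.
    import requests

    BASE = "https://vrops.example.com/suite-api/api"
    HEADERS = {"Authorization": "vRealizeOpsToken <token>",  # acquired earlier
               "Accept": "application/json",
               "Content-Type": "application/json"}

    adapter = requests.get(f"{BASE}/adapters/<adapter-id>",
                           headers=HEADERS, verify=False).json()
    adapter["collectorGroupId"] = "<collector-group-id>"     # assumed field
    requests.put(f"{BASE}/adapters", json=adapter, headers=HEADERS, verify=False)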
For more information about adapters, see Adapter and Management Packs Considerations.