High Availability (HA)

Review and follow the best practices for high availability (HA).

Understand what High Availability (HA) provides (and does not provide) before activating or deactivating it

Activating HA requires double the resources, because data is stored redundantly on two nodes rather than on only one node as it is when HA is deactivated. Because the data is stored on two nodes, the total capacity is reduced by approximately 50%.

Review the VMware Aria Operations Sizing Guidelines for more information.
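The capacity trade-off described above can be sketched as simple arithmetic. This is a minimal illustration only: the function name and the object count are hypothetical and are not taken from the Sizing Guidelines, and the 50% factor is the approximate figure stated above.

```python
def usable_capacity(raw_capacity: float, ha_enabled: bool) -> float:
    """Approximate usable capacity of a cluster.

    With HA, data is stored on two nodes instead of one, so roughly
    half of the raw capacity remains usable.
    """
    if raw_capacity < 0:
        raise ValueError("raw_capacity must be non-negative")
    return raw_capacity / 2 if ha_enabled else raw_capacity


# Illustrative figure only, not a value from the Sizing Guidelines.
raw = 40_000  # objects the cluster could hold with HA deactivated
print(usable_capacity(raw, ha_enabled=False))  # 40000
print(usable_capacity(raw, ha_enabled=True))   # 20000.0
```

Use the Sizing Guidelines for real numbers; the sketch only shows why a cluster sized for a given load without HA needs about twice the resources with HA.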

HA allows losing only one data node for the cluster to remain functional
It is important to understand and weigh the cost of the extra resources against the benefits that HA provides.
Activate HA only after all nodes in the cluster have been added and are online
Add all data nodes to the cluster before activating HA. On new deployments, add data nodes to build the cluster to the appropriate size, and then activate HA. If you are adding new data nodes to an existing cluster, add as many data nodes as necessary, and then activate HA. The goal is to minimize the number of times you activate HA. The activation process can be very disruptive, so perform it only when necessary.
Deploy all analytics nodes for a single VMware Aria Operations cluster in the same data center
All analytics nodes must reside in the same data center so that latency requirements are consistently met, which provides efficient cross-node communication and optimal cluster performance.
Deploy analytics cluster nodes on separate hosts for redundancy and isolation
If possible, establish a 1:1 mapping of nodes to hosts. This protects the cluster: if one host goes down, only one node is lost and the cluster remains functional. If a 1:1 mapping of nodes to hosts is not possible, make sure to place the primary node and the primary replica node on different hosts. This safeguards the cluster if one of these hosts goes down.
Use anti-affinity rules to keep nodes on separate hosts in the vSphere cluster
To keep nodes on different hosts, use anti-affinity rules that prevent nodes from being grouped on the same host. The idea is to prevent multiple nodes from going down when a single host fails.
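The placement guidance above can be checked with a small script. The following is a minimal sketch, assuming you already have a mapping of node names to host names from your own inventory; the function names, node names, and host names are hypothetical and no vSphere API is used.

```python
from collections import defaultdict


def hosts_with_multiple_nodes(placement: dict[str, str]) -> dict[str, list[str]]:
    """Return each host that runs more than one cluster node.

    `placement` maps node name -> host name. Any host returned here
    would take down several nodes at once if it failed.
    """
    by_host = defaultdict(list)
    for node, host in placement.items():
        by_host[host].append(node)
    return {host: sorted(nodes) for host, nodes in by_host.items() if len(nodes) > 1}


def share_host(placement: dict[str, str], node_a: str, node_b: str) -> bool:
    """True if the two nodes (for example primary and replica) are on one host."""
    return placement[node_a] == placement[node_b]


# Hypothetical inventory: node names are deliberately role-independent.
placement = {
    "aria-ops-01": "esx-a",  # currently the primary node
    "aria-ops-02": "esx-b",  # currently the primary replica node
    "aria-ops-03": "esx-b",  # data node
}
print(hosts_with_multiple_nodes(placement))  # {'esx-b': ['aria-ops-02', 'aria-ops-03']}
print(share_host(placement, "aria-ops-01", "aria-ops-02"))  # False
```

An empty result from the first check corresponds to a 1:1 mapping of nodes to hosts. When that is not achievable, the second check covers the minimum requirement that the primary node and the primary replica node sit on different hosts.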
Name nodes independent of role
Node roles can change, so naming a node after a specific role can be confusing. For example, a node named "Primary" may no longer be the actual primary node after the replica node is promoted. Role-independent names avoid this confusion.
HA is not a substitute for a backup and recovery (B and R) plan
HA allows the cluster to remain functional only when one node is lost, so a separate backup and recovery solution must be used. See the VMware Suite Documentation for supported backup utilities and procedures.
HA is not a Disaster Recovery (DR) strategy
HA for VMware Aria Operations is not a disaster recovery mechanism, so a separate DR solution must be used. See the VMware Suite Documentation. HA will allow the cluster to continue running if either the primary node, the replica node, or one data node fails. The entire cluster does not recover if multiple nodes fail at the same time.
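The failure tolerance described above can be expressed as a one-line rule. This is a minimal sketch for an HA-enabled cluster only; the function and node names are hypothetical, and it models nothing beyond the statement that HA tolerates the loss of a single node.

```python
def ha_cluster_stays_functional(all_nodes: set[str], failed_nodes: set[str]) -> bool:
    """Apply the HA rule from the text: at most one failed node is tolerated.

    The failed node may be the primary node, the replica node, or a data
    node; the role does not matter, only the count.
    """
    unknown = failed_nodes - all_nodes
    if unknown:
        raise ValueError(f"unknown nodes: {sorted(unknown)}")
    return len(failed_nodes) <= 1


nodes = {"aria-ops-01", "aria-ops-02", "aria-ops-03"}
print(ha_cluster_stays_functional(nodes, {"aria-ops-02"}))                 # True
print(ha_cluster_stays_functional(nodes, {"aria-ops-01", "aria-ops-03"}))  # False
```

A site-wide outage fails every node at once, which is why the rule above returns False for that case and a separate DR solution is needed.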
Hosts need to reside on the same storage
For performance and consistency, all hosts must use the same storage.