Network Extension High Availability protects extended networks from a Network Extension failure at either the source or remote site.

Overview

The Network Extension High Availability (HA) setup requires four Network Extension appliances: two at the source site and two at the remote site. Together, these two pairs form the HA group, which is the mechanism for managing Network Extension High Availability. Appliances at the same site require similar configuration and must have access to the same set of resources.

As in a standalone Network Extension deployment, where one Network Extension appliance at the source site pairs with another standalone Network Extension appliance at the remote site, each Network Extension appliance of an HA group pairs with another Network Extension appliance of the same HA group at the remote site. This pairing relationship between the two appliances does not change.

Through a process of role negotiation, the appliances of a pair are either both Active or both Standby. During a failover event, the Standby pair takes over the Network Extension service from the Active pair.

The four Network Extension appliances of an HA group negotiate their roles of Active or Standby automatically after the HA group is formed. Following role negotiation, appliances of a pair are either both Active or both Standby. A heartbeat signal between the Active pair and the Standby pair at each site synchronizes the two appliances. A loss of heartbeats between the Active pair or the Standby pair at either the source or the remote site triggers a failover.
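As an illustration only, the heartbeat-loss trigger described above can be modeled as a timeout check per site. This is a minimal sketch, not HCX code: the `SiteHeartbeat` type, the function names, and the 3-second timeout are all hypothetical, since this topic does not state the actual heartbeat interval or detection mechanism.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical value: the real HCX heartbeat timing is not given in this topic.
HEARTBEAT_TIMEOUT_S = 3.0


@dataclass
class SiteHeartbeat:
    """Last heartbeat observed between the Active and Standby appliances at one site."""
    site: str         # "source" or "remote"
    last_seen: float  # timestamp (seconds) of the most recent heartbeat


def heartbeat_lost(hb: SiteHeartbeat, now: float,
                   timeout: float = HEARTBEAT_TIMEOUT_S) -> bool:
    """Return True if no heartbeat has been seen at this site within the timeout."""
    return (now - hb.last_seen) > timeout


def should_fail_over(heartbeats: Iterable[SiteHeartbeat], now: float,
                     timeout: float = HEARTBEAT_TIMEOUT_S) -> bool:
    """A loss of heartbeats at either site is enough to trigger a failover."""
    return any(heartbeat_lost(hb, now, timeout) for hb in heartbeats)
```

For example, `should_fail_over([SiteHeartbeat("source", 10.0), SiteHeartbeat("remote", 5.0)], now=11.0)` evaluates to true under these assumptions, because the remote site has been silent for longer than the assumed timeout.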

HCX uses vSphere DRS host anti-affinity to place the Active and the Standby appliances on separate hosts.

When an HA group is operating normally, the HA state is Healthy. If a problem is detected with an Active appliance, the peer Standby appliance pair starts failover actions and sets its role to Active. At this point, the HA state for the group switches to Degraded. Following failover, if the failed Network Extension appliance recovers, this appliance and its peer appliance in the pair renegotiate their roles to be Standby, and the HA group returns to the Healthy state.

An HA-enabled Network Extension appliance can enter the Failed state when it encounters unexpected conditions during HA-related operations. The HA group also enters the Failed state when one of its appliances is in the Failed state. The Recover selection in the HA Management tab can recover an HA group from the Failed state by redeploying the appliance that is in the Failed state.
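The state transitions described above can be summarized as a small state machine. The following is a minimal sketch for reasoning about the behavior, not an HCX API: the class, method, and pair names are hypothetical, and two points are labeled assumptions in the code because this topic does not state them (the role of a failed pair before it recovers, and the state of the group immediately after the Recover action completes).

```python
from enum import Enum
from typing import Dict, Optional


class Role(Enum):
    ACTIVE = "Active"
    STANDBY = "Standby"


class HAState(Enum):
    HEALTHY = "Healthy"
    DEGRADED = "Degraded"
    FAILED = "Failed"


class HAGroup:
    """Toy model of an HA group: two appliance pairs sharing one role each."""

    def __init__(self) -> None:
        # Both appliances of a pair always hold the same role.
        self.roles: Dict[str, Optional[Role]] = {
            "pair-1": Role.ACTIVE,
            "pair-2": Role.STANDBY,
        }
        self.state = HAState.HEALTHY

    def active_pair(self) -> str:
        return next(n for n, r in self.roles.items() if r is Role.ACTIVE)

    def on_active_problem(self) -> None:
        """Problem on the Active pair: the Standby pair takes over."""
        failed = self.active_pair()
        for name in self.roles:
            # Assumption: the failed pair has no negotiated role until it recovers.
            self.roles[name] = None if name == failed else Role.ACTIVE
        self.state = HAState.DEGRADED

    def on_pair_recovered(self) -> None:
        """The recovered pair renegotiates to Standby; there is no failback."""
        for name, role in self.roles.items():
            if role is None:
                self.roles[name] = Role.STANDBY
        self.state = HAState.HEALTHY

    def on_unexpected_condition(self) -> None:
        """An appliance in the Failed state puts the whole group in Failed."""
        self.state = HAState.FAILED

    def recover(self) -> None:
        """Models the Recover selection (redeploy of the failed appliance)."""
        if self.state is HAState.FAILED:
            # Assumption: a successful redeploy returns the group to Healthy.
            self.state = HAState.HEALTHY
```

In this model, after `on_active_problem()` followed by `on_pair_recovered()`, the group is Healthy again but `pair-2` remains Active and `pair-1` is now Standby. The roles do not swap back automatically.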

For a summary of Network Extension High Availability operational states and roles, see Monitoring Network Extension High Availability.

Considerations for Network Extension High Availability

  • Systems using Network Extension HA must have license entitlement to HCX Enterprise services.

  • Network Extension HA provides only appliance-level resiliency. Appliance uplink resiliency is achieved using the Application Path Resiliency feature in the Service Mesh or by using multiple HCX uplinks.

  • Each Active and Standby pair is managed as an HA group, which includes upgrading and redeploying appliances. The process for redeploying and upgrading HA groups is the same as with standalone appliances, except that the operation is applied to both the Active and Standby appliances at both the source and remote sites. For a list of HA group management operations, see Managing Network Extension High Availability.

  • Network Extension High Availability protects against one Network Extension appliance failure in an HA group. More than one appliance failure in the same HA setup at the same time disrupts the Network Extension service.

  • Network Extension HA operates in Active/Standby mode.

  • Network Extension HA operates without pre-emption, with no automatic failback of an appliance pair to the Active role.

  • Network Extension HA Standby appliances are assigned IP addresses from the Network Profile IP pool.

Limitations for Network Extension High Availability

  • Following a failover event, policy-routed Mobility Optimized Networking (MON) traffic takes longer to recover than traffic that is not policy-routed. This is because the MON service needs time to rediscover the next hop for the policy-routed traffic after the failover.

  • Network Extension HA does not support Storage DRS anti-affinity.