A failure domain allows automatic recovery of a failed NSX Edge node, based on the allocation rules set in the NSX Edge cluster. Before configuring a Tier-0 stateful Active-Active (A-A) gateway, associate the NSX Edge nodes with different failure domains.

A stateful A-A cluster expands or shrinks as you add or remove NSX Edge nodes. In a stateful A-A cluster, NSX automatically creates sub-clusters from the available NSX Edge nodes. Each sub-cluster works as a pair of active and backup NSX Edge nodes. When one of the NSX Edge nodes in a sub-cluster fails, the failure domain associated with that NSX Edge node automatically recovers it.

In this procedure, you associate NSX Edge nodes with different failure domains.
Note: Ensure that NSX Edge-1 and NSX Edge-2 of sub-cluster-1 belong to two different failure domains.
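
The placement rule in the note can be checked before you build the cluster. The following Python sketch assumes that you have already recorded which two Edge nodes form each sub-cluster and which failure domain each node belongs to. The data layout and the function name find_misplaced_pairs are illustrative only and are not part of the NSX API.

```python
from typing import Dict, List, Tuple


def find_misplaced_pairs(
    sub_clusters: Dict[str, Tuple[str, str]],
    node_failure_domain: Dict[str, str],
) -> List[str]:
    """Return the sub-clusters whose two Edge nodes share a failure domain."""
    misplaced = []
    for name, (edge_a, edge_b) in sub_clusters.items():
        if node_failure_domain[edge_a] == node_failure_domain[edge_b]:
            misplaced.append(name)
    return misplaced


# Hypothetical inventory that mirrors the names used in this topic.
sub_clusters = {"sub-cluster-1": ("Edge-1", "Edge-2")}
node_failure_domain = {"Edge-1": "FD1A-Edge1", "Edge-2": "FD2A-Edge2"}

print(find_misplaced_pairs(sub_clusters, node_failure_domain))  # prints []
```

An empty list means that every pair spans two failure domains. Any sub-cluster name in the result identifies a pair that needs to be reassigned.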

Procedure

  1. Using the API, create a failure domain for each Edge node that you will add to the stateful A-A cluster, for example, FD1A-Edge1 and FD2A-Edge2. Set the parameter preferred_active_edge_services to true in the failure domains for both Edge 1 and Edge 2.
    POST /api/v1/failure-domains
    {
    "display_name": "FD1A-Edge1",
    "preferred_active_edge_services": "true"
    }
    
    POST /api/v1/failure-domains
    {
    "display_name": "FD2A-Edge2",
    "preferred_active_edge_services": "true"
    }
  2. Using the API, associate each Edge node with its failure domain. First call the GET /api/v1/transport-nodes/<transport-node-id> API to get the data about the Edge node. Use the result of the GET API as the input for the PUT /api/v1/transport-nodes/<transport-node-id> API, with the additional property, failure_domain_id, set to the ID of the failure domain. For example:
    GET /api/v1/transport-nodes/<transport-node-id>
    Response:
    {
        "resource_type": "TransportNode",
        "description": "Updated NSX configured Test Transport Node",
        "id": "77816de2-39c3-436c-b891-54d31f580961",
        ...
    }
    PUT /api/v1/transport-nodes/<transport-node-id>
    {
        "resource_type": "TransportNode",
        "description": "Updated NSX configured Test Transport Node",
        "id": "77816de2-39c3-436c-b891-54d31f580961",
        ...
        "failure_domain_id": "<UUID>"
    }
    
  3. Using the API, configure the Edge cluster to allocate nodes based on failure domain. First call the GET /api/v1/edge-clusters/<edge-cluster-id> API to get the data about the Edge cluster. Use the result of the GET API as the input for the PUT /api/v1/edge-clusters/<edge-cluster-id> API, with the additional property, allocation_rules, set appropriately. For example:
    GET /api/v1/edge-clusters/<edge-cluster-id>
    Response:
    {
        "_revision": 0,
        "id": "bf8d4daf-93f6-4c23-af38-63f6d372e14e",
        "resource_type": "EdgeCluster",
        ...
    }
    PUT /api/v1/edge-clusters/<edge-cluster-id>
    {
        "_revision": 0,
        "id": "bf8d4daf-93f6-4c23-af38-63f6d372e14e",
        "resource_type": "EdgeCluster",
        ...
        "allocation_rules": [
            {
                "action": {
                    "enabled": true,
                    "action_type": "AllocationBasedOnFailureDomain"
                }
            }
        ]
    }
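
Steps 2 and 3 both follow the same pattern: read the object, add one property, and send the whole object back. The following Python sketch shows only how the PUT bodies might be built from the GET responses. The helper names are hypothetical, the HTTP calls are left to whatever client you already use, and keeping any existing allocation rules is an assumption of this sketch rather than documented NSX behavior.

```python
import copy
import json
from typing import Any, Dict

# Allocation rule taken from the step 3 example in this topic.
FAILURE_DOMAIN_RULE = {
    "action": {
        "enabled": True,
        "action_type": "AllocationBasedOnFailureDomain",
    }
}


def with_failure_domain(node: Dict[str, Any], failure_domain_id: str) -> Dict[str, Any]:
    """Build the transport node PUT body from its GET response."""
    body = copy.deepcopy(node)  # every field returned by GET is sent back unchanged
    body["failure_domain_id"] = failure_domain_id
    return body


def with_failure_domain_allocation(cluster: Dict[str, Any]) -> Dict[str, Any]:
    """Build the Edge cluster PUT body from its GET response."""
    body = copy.deepcopy(cluster)
    rules = body.setdefault("allocation_rules", [])
    if FAILURE_DOMAIN_RULE not in rules:  # do not add the rule twice
        rules.append(copy.deepcopy(FAILURE_DOMAIN_RULE))
    return body


# Abbreviated GET response using the values from step 3.
cluster = {
    "_revision": 0,
    "id": "bf8d4daf-93f6-4c23-af38-63f6d372e14e",
    "resource_type": "EdgeCluster",
}
print(json.dumps(with_failure_domain_allocation(cluster), indent=4))
```

Copying the GET response instead of writing the body by hand keeps _revision and the other returned fields intact, which matches the instruction to use the result of the GET API as the input for the PUT API.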

Results

The NSX Edge nodes are now associated with different failure domains. You can use them to create a cluster and configure a Tier-0 gateway in A-A Stateful HA mode.