If an NSX Controller fails, you might still have two working controllers. First try to resolve the issue without redeploying the controller cluster.

Problem

NSX Controller cluster failure.

Procedure

  1. Log in to the vSphere Web Client.
  2. From Networking & Security, click Installation > Management.
  3. In the NSX Controller nodes section, check the Peers column. Green boxes indicate that there are no peer connectivity errors in the cluster. A red box indicates an error with a peer. Click the box to view details.
  4. If the Peers column shows a problem in the controller cluster, log in to the CLI of each NSX Controller for a detailed diagnosis. Run the show control-cluster status command to diagnose the state of each controller. All controllers in the cluster should have the same cluster UUID; however, the cluster UUID might not be the same as the UUID of the master controller. For information about deployment issues, see NSX Controller Deployment Issues.
  5. You can try the following methods to resolve the issue before redeploying the controller node:
    1. Check that the controller is powered on.
    2. Ping from the affected controller to the other nodes and the manager, and from them back to the affected controller, to check the network paths. If you find any network issues, address them as described in NSX Controller Deployment Issues.
    3. Check the Internet Protocol Security (IPSec) status using the following CLI commands.
      • Verify if IPSec is enabled using the show control-cluster network ipsec status command.

      • Verify the status of the IPSec tunnels using the show control-cluster network ipsec tunnels command.

      You can also include the IPSec status information when you report the issue to VMware technical support.

    4. If the issue is not a network issue, then you can choose whether to reboot or redeploy.

    If you reboot, reboot only one controller at a time. However, if more than one node in the controller cluster has failed, reboot all of them at the same time. After rebooting a node in a healthy cluster, always confirm that the cluster has re-formed correctly, and then confirm that resharding has completed correctly.

  6. If you decide to redeploy, you must first delete the broken controller and then deploy a new controller.
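
The cluster UUID comparison in step 4 and the reboot rule in step 5 can be sketched in a few lines of Python. This is only an illustration, not part of NSX: the function names, controller names, and UUID values are hypothetical, and the script does not talk to any controller. It assumes you have already read each cluster UUID by hand from the show control-cluster status output. Treating the most commonly reported UUID as the reference is an assumption of this sketch, and so is reading "reboot all of them" as "all failed nodes".

```python
from collections import Counter


def mismatched_cluster_uuids(uuid_by_controller):
    """Return controllers whose cluster UUID differs from the most common one.

    uuid_by_controller maps a controller name to the cluster UUID read
    manually from that controller's `show control-cluster status` output.
    """
    if not uuid_by_controller:
        return {}
    # Assumption: the UUID reported by most controllers is the reference.
    reference, _ = Counter(uuid_by_controller.values()).most_common(1)[0]
    return {
        name: uuid
        for name, uuid in uuid_by_controller.items()
        if uuid != reference
    }


def plan_reboots(failed_controllers):
    """Group failed controllers into reboot batches.

    One failed node is rebooted on its own. If more than one node has
    failed, all failed nodes go into a single batch (rebooted together).
    """
    failed = list(failed_controllers)
    if len(failed) > 1:
        return [failed]
    return [[name] for name in failed]


if __name__ == "__main__":
    # Hypothetical values copied by hand from each controller's CLI.
    observed = {
        "controller-1": "uuid-a",
        "controller-2": "uuid-a",
        "controller-3": "uuid-b",
    }
    print(mismatched_cluster_uuids(observed))
    print(plan_reboots(["controller-3"]))
```

A non-empty result from mismatched_cluster_uuids points at the controllers to examine first. After any reboot of a node in a healthy cluster, the cluster re-formation and resharding checks described in step 5 still apply.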

What to do next

Delete the affected controller as described in Delete an NSX Controller.

Redeploy a new controller as described in Redeploy an NSX Controller.