NSX Manager is down or unavailable if a majority of the nodes in the cluster are down.
Problem
Solution
- SSH to each of the affected NSX Manager nodes and run the following commands:
- Run get filesystem-stats and verify that /config and /image are not 100% full.
- Run get core-dumps to verify that no core dumps have been generated on the NSX Manager.
- Verify there was no datastore outage. See NSX Manager Cluster status Degraded As Datastore-related Components Are Down.
- Check the logs for out-of-memory errors. See /var/log/proton/proton-tomcat-wrapper.log.
- To restore clustering and the UI, any two nodes in a three-node cluster must be up. If you cannot bring any failed node back up but a healthy node is available, do one of the following to restore clustering:
- Deploy a new manager node (as a fourth member node), join it to the existing cluster, and then detach one of the failed nodes using the CLI command detach node <node-uuid> or the API call POST /api/v1/cluster/<node-uuid>?action=remove_node. Run the command from one of the healthy nodes. Alternatively, deactivate the cluster as described in the next bullet.
- (Optional) Run the deactivate cluster command on the active node so that you end up with a single-node cluster. Then add new nodes to rebuild a 3-member NSX Manager cluster.
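The API form of the detach step can be sketched in Python as follows. This is a minimal illustration, not an official client: it only builds the POST request to /api/v1/cluster/<node-uuid>?action=remove_node and does not send it. The function name, the host name, and the use of HTTP basic authentication are assumptions made for this example; adapt them to your environment.

```python
import base64
import urllib.parse
import urllib.request


def build_remove_node_request(manager_host, node_uuid, username, password):
    """Build (but do not send) the POST request that detaches a failed node.

    manager_host should be one of the healthy NSX Manager nodes.
    """
    # Path and query string come from the step above; the UUID is quoted defensively.
    url = "https://{}/api/v1/cluster/{}?action=remove_node".format(
        manager_host, urllib.parse.quote(node_uuid, safe="")
    )
    # Assumption: HTTP basic authentication is used for this call.
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": "Basic " + token},
    )
```

To send the request, pass the returned object to urllib.request.urlopen together with TLS settings that suit your deployment. Building the request separately makes it easy to review the exact URL before anything is changed on the cluster.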
Note: NSX Manager nodes that are removed from the cluster should be powered off and deleted.
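The out-of-memory log check from the Solution steps can be sketched in Python as follows, for a copy of the log taken off the node. This is a minimal sketch under stated assumptions: the search pattern (Java's OutOfMemoryError or a plain "out of memory" phrase) is a guess at how such errors appear and is not taken from the NSX documentation, and the function and constant names are hypothetical.

```python
import re

# Assumption: out-of-memory events appear as Java's "OutOfMemoryError" or as a
# plain "out of memory" phrase; adjust the pattern to match your logs.
OOM_PATTERN = re.compile(r"OutOfMemoryError|out[ -]of[ -]memory", re.IGNORECASE)

# Log path named in the Solution steps.
DEFAULT_LOG = "/var/log/proton/proton-tomcat-wrapper.log"


def find_oom_lines(lines):
    """Return (line_number, text) pairs for lines that look like OOM errors."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if OOM_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log(path=DEFAULT_LOG):
    """Scan one log file and return its suspected out-of-memory lines."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)
```

Calling scan_log() with no argument reads the path given in the steps above; pass a different path to scan a copied or rotated log. An empty result only means the assumed pattern did not match, so it does not rule out memory problems.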