NSX Manager is slow to load and tasks fail with the message "server is overloaded" or "too many requests".

Problem

The NSX Manager UI fails to load with the following error: "Some appliance components are not functioning properly. Component health: POLICY:UNKNOWN, MANAGER:UNKNOWN, SEARCH:UNKNOWN, NO". Clustering-related commands also fail from both the CLI and the API.

Solution

  1. SSH to each of the affected NSX Manager nodes and run the following checks:
    1. Run get file-system-stats and verify that /config and /image are not 100% full.
    2. Run get core-dumps to verify that no core dumps have been generated on the NSX Manager.
    3. Verify there was no datastore outage as outlined in step 1b above.
    4. Check the logs for out-of-memory errors. See /var/log/proton/proton-tomcat-wrapper.log.
  2. To restore clustering and the UI, at least two nodes of a three-node cluster must be up. If you cannot bring any failed node back up but still have one healthy node, do one of the following to restore clustering:
    • Deploy a new Manager node as a fourth member, join it to the existing cluster, and then detach one of the failed nodes using the CLI command detach node <node-uuid> or the API call POST /api/v1/cluster/<node-uuid>?action=remove_node. Run these commands from one of the healthy nodes.
    • Run the deactivate cluster command on the active node so that you end up with a single-node cluster. Then add new nodes to rebuild a three-member NSX Manager cluster.
      Note: NSX Manager nodes that are removed from the cluster should be powered off and deleted.
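
Two parts of the procedure above can be scripted: the out-of-memory log check in step 1 and the node-detach API call in step 2. The following Python is a minimal sketch, not an official tool. The function names find_oom_lines and build_remove_node_request are hypothetical. It assumes that the proton log reports memory exhaustion with the standard JVM string java.lang.OutOfMemoryError, and that the API accepts HTTP basic authentication; verify both in your environment. Only the API path is taken from the steps above.

```python
import base64
import urllib.request
from typing import Iterable, List

# Assumption: memory exhaustion appears in the log as the standard JVM
# marker below. Adjust the marker if your logs use different wording.
OOM_MARKER = "java.lang.OutOfMemoryError"


def find_oom_lines(lines: Iterable[str], marker: str = OOM_MARKER) -> List[str]:
    """Return the log lines that contain the out-of-memory marker."""
    return [line.rstrip("\n") for line in lines if marker in line]


def build_remove_node_request(
    manager_host: str, node_uuid: str, username: str, password: str
) -> urllib.request.Request:
    """Build, but do not send, the POST request that detaches a failed node.

    manager_host should be one of the healthy nodes. Basic authentication
    is an assumption of this sketch, not something the article specifies.
    """
    url = f"https://{manager_host}/api/v1/cluster/{node_uuid}?action=remove_node"
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
```

To use the log check, open /var/log/proton/proton-tomcat-wrapper.log on the affected node and pass the file object to find_oom_lines; an empty result means the marker was not found. The request builder returns an unsent request so that the URL and node UUID can be reviewed before it is passed to urllib.request.urlopen.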