A phantom controller can be a live controller virtual machine (VM) or a non-existent VM, and it might or might not be participating in the cluster. NSX Manager synchronizes the list of all VMs from the vCenter Server inventory. A phantom controller is created when the vCenter Server or host deletes a controller VM without a request from NSX Manager, or when the vCenter Server inventory changes the reference MOID of the controller VM.
When a controller is created from NSX, its configuration information is stored in NSX Manager. NSX Manager deploys the new controller VM through the vCenter Server.
The NSX administrator provides the configuration, including an IP address pool, to NSX Manager to create a controller. NSX Manager takes an IP address from the pool and pushes that IP address, with the rest of the controller configuration, to the vCenter Server as a VM creation request. NSX Manager then waits for vCenter Server to confirm the status of the request.
The controller creation process was successful: If the controller VM is created successfully, vCenter Server starts the controller VM. NSX Manager stores the Managed Object ID (MOID) of the VM with the rest of the controller’s configuration information. The MOID (or MO-REF) is a unique identifier that vCenter Server assigns to every object in its inventory. vCenter Server also uses this MOID to track the VM for as long as the VM remains part of the vCenter Server inventory.
The controller creation process was not successful: If the IP and network connection configurations were incorrect, then NSX Manager might not be able to contact vCenter Server. NSX Manager waits for a preset amount of time for the controller to create a single-node controller cluster (if it is the first controller) or to join the active cluster. If the timer expires, NSX Manager requests vCenter Server to delete the VM. The IP address is returned to the pool and NSX declares that controller creation failed.
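The success and failure paths above can be sketched as a small simulation. This is only an illustration of the described sequence, not NSX Manager code: the names IpPool, ControllerRecord, create_controller, and the three callbacks are hypothetical, and the preset timer is reduced to a single joined_cluster check that reports whether the controller formed or joined the cluster in time.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class IpPool:
    """Hypothetical stand-in for the IP address pool supplied by the administrator."""
    free: List[str]

    def allocate(self) -> str:
        # Take the next free address out of the pool.
        return self.free.pop(0)

    def release(self, ip: str) -> None:
        # Return an address to the pool after a failed creation.
        self.free.append(ip)


@dataclass
class ControllerRecord:
    """What the manager keeps after a successful creation (assumed fields)."""
    ip: str
    moid: str


def create_controller(
    pool: IpPool,
    deploy_vm: Callable[[str], Optional[str]],   # returns a MOID, or None on failure
    joined_cluster: Callable[[str], bool],       # True if the node joined before the timer expired
    delete_vm: Callable[[str], None],
) -> Optional[ControllerRecord]:
    ip = pool.allocate()
    moid = deploy_vm(ip)
    if moid is not None and joined_cluster(moid):
        # Success: store the MOID with the rest of the configuration.
        return ControllerRecord(ip=ip, moid=moid)
    # Failure: delete the VM if one was created, then return the IP to the pool.
    if moid is not None:
        delete_vm(moid)
    pool.release(ip)
    return None
```

The point of the sketch is the cleanup on failure: the VM is deleted and the address goes back to the pool, so a failed creation leaves no record behind.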
How Phantom Controller Gets Created
When NSX Manager requests the deletion of a controller, vCenter Server uses the MOID to find the controller VM to delete.
However, if any vCenter activity results in the removal of the controller VM from the vCenter Server inventory, vCenter Server removes the MOID from its database. The controller VM can still be alive and active as far as NSX Manager is concerned even after it is removed from the vCenter inventory, but for vCenter Server the controller VM no longer exists. Removing the VM from the inventory does not necessarily delete the VM. If the VM is still active, it is still participating or attempting to participate in the NSX controller cluster.
The following are the most common examples of how a phantom controller gets created:
The vCenter Server administrator removes the host that contains the controller VM from the inventory and later adds the host back. When the host is removed, vCenter Server deletes all the MOIDs associated with the host and the VMs within it. When the host is added back, vCenter Server assigns brand new MOIDs to the host and the VMs. From the vCenter Server’s perspective, the host and VMs are brand new objects. For NSX users, and for all practical purposes, the host and VMs are still the same, and the applications that run within the host and VMs do not change.
The vCenter Server administrator deletes the controller VM through vCenter Server or using Host Management. The deletion was not initiated by NSX Manager.
Deletion in this case also includes any host or storage failure that results in the loss of the VM. In this case, the VM is lost to vCenter Server and also lost to the cluster and NSX Manager. But because the deletion was not initiated by NSX Manager, both NSX Manager and the controller cluster think that the controller is still valid. The controller status returned to NSX Manager indicates that this controller node is down and not part of the cluster, and this status is displayed in the UI. NSX Manager also has logs indicating that the controller is no longer reachable.
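The host removal case can be illustrated with a toy inventory model. It is a sketch under stated assumptions, not the vCenter API: the Inventory class, its methods, and the "vm-N" identifier format are all hypothetical, and the only behavior modeled is that an object re-entering the inventory receives a new MOID.

```python
import itertools


class Inventory:
    """Toy model of an inventory that hands out MOIDs (hypothetical, not the vCenter API)."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._vms = {}  # maps MOID -> instance UUID

    def add_vm(self, instance_uuid: str) -> str:
        # Every object that enters the inventory gets a fresh identifier,
        # even if the same VM was registered before.
        moid = f"vm-{next(self._counter)}"
        self._vms[moid] = instance_uuid
        return moid

    def remove_vm(self, moid: str) -> None:
        # Removal drops the identifier; the VM itself may still be running.
        self._vms.pop(moid, None)

    def has_moid(self, moid: str) -> bool:
        return moid in self._vms


inventory = Inventory()
stored_moid = inventory.add_vm("uuid-controller-1")  # MOID recorded at creation time
inventory.remove_vm(stored_moid)                     # host removed from the inventory
new_moid = inventory.add_vm("uuid-controller-1")     # host added back later

# The recorded MOID no longer resolves, although the VM is unchanged.
print(inventory.has_moid(stored_moid), stored_moid != new_moid)
```

A manager that looks up the controller only by the recorded MOID finds nothing, which is how the same VM ends up being treated as a phantom.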
What to Do When You See Phantom Controller
Synchronize controllers as described in NSX Controller Is Disconnected.
Review the log entries. If the controller VM was accidentally deleted or became corrupted, you must use the Forcefully Delete option to clear the entry from the NSX Manager database. For details, refer to Delete an NSX Controller.
After deleting the controller, confirm that:
The controller VM is actually deleted.
The show controller-cluster startup-nodes command shows only valid controllers.
The syslog entries for NSX Manager no longer show an extra controller.
Starting with NSX 6.2.7, NSX Manager checks the vCenter inventory to ensure that the controller VM still exists in the inventory based on the original MOID. If NSX Manager cannot find the controller VM in the inventory, NSX Manager searches for the VM using the VM’s instance UUID. The instance UUID is stored within the VM, so it does not change even when the VM is added back to the vCenter inventory. If NSX Manager is able to find the VM with the instance UUID, NSX Manager updates its database with the new MOID.
However, if you clone the controller VM, the cloned VM has the same properties as the original VM but a new instance UUID. Because the instance UUID does not match, NSX Manager cannot detect the MOID for the cloned VM.
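The lookup order described above (original MOID first, then instance UUID) can be sketched as follows. This is a minimal illustration, not the NSX Manager implementation: the function name reconcile_moid and the representation of the inventory as a dictionary from MOID to instance UUID are assumptions made for the example.

```python
from typing import Dict, Optional


def reconcile_moid(
    stored_moid: str,
    instance_uuid: str,
    inventory: Dict[str, str],  # hypothetical snapshot: MOID -> instance UUID
) -> Optional[str]:
    """Return the MOID to keep in the database, or None if the VM cannot be found."""
    # Step 1: the original MOID still exists in the inventory.
    if stored_moid in inventory:
        return stored_moid
    # Step 2: fall back to the instance UUID, which survives re-registration.
    for moid, uuid in inventory.items():
        if uuid == instance_uuid:
            return moid
    # No match: for example, a clone carries a different instance UUID.
    return None


# VM re-registered under a new MOID: the UUID fallback finds it.
print(reconcile_moid("vm-42", "uuid-a", {"vm-97": "uuid-a"}))
# Cloned VM with a new instance UUID: nothing matches.
print(reconcile_moid("vm-42", "uuid-a", {"vm-98": "uuid-b"}))
```

In the first call the database would be updated with the new MOID; in the second the controller remains unresolved and has to be handled manually.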
Log Entries for Phantom Controller
The following error-level log entries are seen when a phantom controller is detected:
2017-07-31 22:15:05.844 UTC ERROR NVPStatusCheck ControllerServiceImpl:2146 - Controller <#> does not exist, might be deleted already. Skip saving its connectivity info.
2017-07-31 22:15:05.769 UTC ERROR NVPStatusCheck ControllerServiceImpl:2580 - the node is created by this NSX Manager <#>, but database has no record and delete might be in progress.
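If you collect NSX Manager logs in a file, a short script can flag these entries. The sketch below assumes the messages appear with the wording shown above. The function name find_phantom_entries and the patterns are illustrative, and the value captured as ref is whatever token appears in place of the <#> placeholder.

```python
import re
from typing import Iterable, List, Tuple

# Patterns built from the two sample messages; the wording is assumed to be stable.
PHANTOM_PATTERNS = [
    re.compile(
        r"ERROR\s+NVPStatusCheck\s+ControllerServiceImpl:\d+ - "
        r"Controller (?P<ref>\S+) does not exist"
    ),
    re.compile(
        r"ERROR\s+NVPStatusCheck\s+ControllerServiceImpl:\d+ - "
        r"the node is created by this NSX Manager (?P<ref>\S+), "
        r"but database has no record"
    ),
]


def find_phantom_entries(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (ref, line) pairs for log lines that match a phantom controller message."""
    hits = []
    for line in lines:
        for pattern in PHANTOM_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((match.group("ref"), line.strip()))
                break  # one hit per line is enough
    return hits
```

For example, passing the lines of a saved log file to find_phantom_entries returns only the entries that match either message, which makes it easier to confirm after a forced deletion that no new entries appear.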