This section provides information about troubleshooting installation issues.
Basic Infrastructure Services
The following services must be running on the appliances and hypervisors, and on vCenter Server if it is used as a compute manager.
Make sure that no firewall is blocking traffic between NSX-T components and hypervisors, and that the required ports are open between the components.
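One way to confirm that a port is reachable is a plain TCP connection attempt. The following is a minimal sketch, not an NSX-T tool: the function name port_is_open is hypothetical, and the result only describes reachability from the machine where the script runs, so it is most meaningful when run from a host on the same network path as the hypervisor.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Example call (addresses are placeholders; use your own manager address):
#   port_is_open("192.168.110.19", 5671)
```

A False result does not distinguish a firewall drop from a service that is not listening, so follow up with the host-level checks later in this section.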
To flush the DNS cache on the NSX Manager, SSH as root to the manager and run the following command:
root@nsx-mgr-01:~# /etc/init.d/resolvconf restart
[ ok ] Restarting resolvconf (via systemctl): resolvconf.service.
You can then check the DNS configuration file.
root@nsx-mgr-01:~# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.253.1
search mgt.sg.lab
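If you need to compare the DNS settings of several appliances, the file contents can be parsed rather than read by eye. The sketch below is an illustration only; parse_resolv_conf is a hypothetical helper, and it handles just the nameserver and search keywords shown in the output above.

```python
def parse_resolv_conf(text: str) -> dict:
    """Collect nameserver and search entries from resolv.conf-style text."""
    result = {"nameserver": [], "search": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        keyword, values = fields[0], fields[1:]
        if keyword == "nameserver" and values:
            result["nameserver"].append(values[0])
        elif keyword == "search":
            # Only the last search line is used by the glibc resolver.
            result["search"] = values
    return result
```

For the file shown above, the function would report one name server (192.168.253.1) and one search domain (mgt.sg.lab).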
Checking Communication from Host to Controller and Manager
On an ESXi host using NSX-T CLI commands:
esxi-01.corp.local> get managers
- 192.168.110.19 Connected
esxi-01.corp.local> get controllers
Controller IP    Port  SSL      Status     Is Physical Master  Session State  Controller FQDN
192.168.110.16   1235  enabled  connected  true                up             NA
On a KVM host using NSX-T CLI commands:
kvm-01> get managers
- 192.168.110.19 Connected
kvm-01> get controllers
Controller IP    Port  SSL      Status     Is Physical Master  Session State  Controller FQDN
192.168.110.16   1235  enabled  connected  true                up             NA
On an ESXi host using host CLI commands:
[root@esxi-01:~] esxcli network ip connection list | grep 1235
tcp 0 0 192.168.110.53:42271 192.168.110.16:1235 ESTABLISHED 67702 newreno netcpa
[root@esxi-01:~] esxcli network ip connection list | grep 5671
tcp 0 0 192.168.110.253:11721 192.168.110.19:5671 ESTABLISHED 2103688 newreno mpa
tcp 0 0 192.168.110.253:30977 192.168.110.19:5671 ESTABLISHED 2103688 newreno mpa
On a KVM host using host CLI commands:
root@kvm-01:/home/vmware# netstat -nap | grep 1235
tcp 0 0 192.168.110.55:53686 192.168.110.16:1235 ESTABLISHED 2554/netcpa
root@kvm-01:/home/vmware# netstat -nap | grep 5671
tcp 0 0 192.168.110.55:50108 192.168.110.19:5671 ESTABLISHED 2870/mpa
tcp 0 0 192.168.110.55:50110 192.168.110.19:5671 ESTABLISHED 2870/mpa
root@kvm-01:/home/vmware# tcpdump -i ens32 port 1235 | grep kvm-01
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens32, link-type EN10MB (Ethernet), capture size 262144 bytes
<truncated output>
03:46:27.040461 IP nsxcontroller01.corp.local.1235 > kvm-01.corp.local.38754: Flags [P.], seq 3315301231:3315301275, ack 2671171555, win 323, length 44
03:46:27.040509 IP kvm-01.corp.local.38754 > nsxcontroller01.corp.local.1235: Flags [.], ack 44, win 1002, length 0
^C
<truncated output>
root@kvm-01:/home/vmware# tcpdump -i ens32 port 5671 | grep kvm-01
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens32, link-type EN10MB (Ethernet), capture size 262144 bytes
03:51:16.802934 IP kvm-01.corp.local.58954 > nsxmgr01.corp.local.amqps: Flags [P.], seq 1153:1222, ack 1790, win 259, length 69
03:51:16.823328 IP nsxmgr01.corp.local.amqps > kvm-01.corp.local.58954: Flags [P.], seq 1790:1891, ack 1222, win 254, length 101
^C
<truncated output>
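When checking many hosts, the connection listings above can be filtered programmatically. The sketch below assumes the first six whitespace-separated fields are protocol, receive queue, send queue, local address, remote address, and state, which matches both the esxcli and netstat lines shown in this section. The function name established_to_port is hypothetical.

```python
def established_to_port(listing: str, port: int) -> list:
    """Return (local, remote) pairs for ESTABLISHED TCP connections to a remote port."""
    matches = []
    for line in listing.splitlines():
        fields = line.split()
        # Expected leading fields: proto, recv-q, send-q, local, remote, state.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        # The port is the text after the last colon of the remote address.
        if state == "ESTABLISHED" and remote.rsplit(":", 1)[-1] == str(port):
            matches.append((local, remote))
    return matches
```

An empty list for port 1235 suggests the host has no session to the controller, and an empty list for port 5671 suggests it has no session to the manager.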
Host Registration Failure
If NSX-T uses the wrong IP address, host registration will fail. This can happen when a host has multiple IP addresses. Trying to delete the transport node leaves it in the Orphaned state. To resolve the issue:
Go to Fabric > Nodes > Hosts, edit the host and remove all IP addresses except the management one.
Click on the errors and select Resolve.
KVM Host Issues
KVM host issues are sometimes caused by insufficient disk space. The /boot directory can fill up quickly and cause errors such as:
Failed to install software on host
No space left on device
You can run the command df -h to check available storage. If the /boot directory is at 100%, you can do the following:
Run sudo dpkg --list 'linux-image*' | grep ^ii to see all the kernels installed.
Run uname -r to see your currently running kernel. Do not remove this kernel (linux-image).
Use apt-get purge to remove images you don't need anymore. For example, run sudo apt-get purge linux-image-3.13.0-32-generic linux-image-3.13.0-33-generic.
Reboot the host.
In NSX Manager, check the errors and select Resolve.
Make sure the VMs are powered on.
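The kernel cleanup steps above can be partly automated by comparing the dpkg listing with the running kernel. The sketch below is only an aid for building the purge list: removable_kernel_packages is a hypothetical helper, it assumes versioned package names of the form linux-image-<release>, and it keeps only the running kernel, so review the result before purging anything.

```python
import re

# Matches versioned kernel image packages such as linux-image-3.13.0-32-generic;
# metapackages such as linux-image-generic do not match.
_VERSIONED = re.compile(r"^linux-image-(?:extra-)?(\d\S*)$")


def removable_kernel_packages(dpkg_listing: str, running_release: str) -> list:
    """List installed versioned kernel images other than the running release.

    dpkg_listing is the text output of: dpkg --list 'linux-image*'
    running_release is the text output of: uname -r
    """
    purge = []
    for line in dpkg_listing.splitlines():
        fields = line.split()
        # Only lines whose status column is "ii" describe installed packages.
        if len(fields) < 2 or fields[0] != "ii":
            continue
        name = fields[1].split(":")[0]  # drop an architecture suffix if present
        match = _VERSIONED.match(name)
        if match and match.group(1) != running_release.strip():
            purge.append(name)
    return purge
```

The returned names can be passed to sudo apt-get purge as in the step above.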
Configuration Error when Deploying an Edge VM
After deploying an Edge VM, NSX Manager shows the VM's status as configuration error. The manager log has a message similar to the following:
nsx-manager NSX - FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP16027" subcomp="manager"] Edge 758ad396-0754-11e8-877e-005056abf715 is not ready for configuration error occurred, error detail is NSX Edge configuration has failed. The host does not support required cpu features: ['aes'].
To resolve the issue, restart the edge datapath service and then restart the VM.
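When searching a large manager log for this condition, it can help to extract the error code and the list of missing CPU features from each matching line. The sketch below assumes the message format shown above; parse_edge_config_error is a hypothetical helper and not part of NSX-T.

```python
import re


def parse_edge_config_error(log_line: str) -> dict:
    """Extract the errorCode value and the missing CPU features from a log line."""
    code = re.search(r'errorCode="([^"]+)"', log_line)
    features = re.search(r"required cpu features: \[([^\]]*)\]", log_line)
    return {
        "error_code": code.group(1) if code else None,
        # Feature names appear as single-quoted items inside the brackets.
        "missing_cpu_features": (
            re.findall(r"'([^']+)'", features.group(1)) if features else []
        ),
    }
```

For the message shown above, the function would report error code MP16027 and the missing feature aes.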
Force Removing a Transport Node
You can remove a transport node that is stuck in the Orphaned state by making the following API call:
DELETE https://<NSX Manager>/api/v1/transport-nodes/<TN ID>?force=true
NSX Manager does not validate whether any active VMs are running on the host. You are responsible for deleting the N-VDS and VIBs. If the node was added through a compute manager, delete the compute manager first and then delete the node. The transport node will be deleted as well.
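The DELETE call above can be prepared with the Python standard library. The sketch below only builds the request object. It assumes HTTP basic authentication with placeholder credentials, and build_force_delete_request is a hypothetical helper name. Sending the request is left to the caller because a forced delete performs no validation.

```python
import base64
import urllib.request


def build_force_delete_request(
    manager: str, node_id: str, user: str, password: str
) -> urllib.request.Request:
    """Build a DELETE request for /api/v1/transport-nodes/<TN ID>?force=true."""
    url = f"https://{manager}/api/v1/transport-nodes/{node_id}?force=true"
    # Assumption: the manager accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url, method="DELETE", headers={"Authorization": f"Basic {token}"}
    )


# To send the request (not done here):
#   urllib.request.urlopen(build_force_delete_request(...))
```

If the manager uses a self-signed certificate, urlopen rejects the connection unless you supply an SSL context that trusts that certificate.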