This section provides information about troubleshooting installation issues.

Basic Infrastructure Services

The following services must be running on the appliances and hypervisors, and also on vCenter Server if it is used as a compute manager.

  • NTP

  • DNS

Make sure that no firewall is blocking traffic between NSX-T components and hypervisors, and that the required ports are open between the components.
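
One way to spot-check that a port is reachable from a hypervisor is sketched below. It assumes that the nc (netcat) utility is available on the host and that it supports the -z and -w options. The function name check_port is illustrative. The addresses and ports in the usage comment are taken from the controller (1235) and manager (5671) examples later in this section, and the complete list of required ports is not covered here.

```shell
# Report whether a TCP port on a remote host accepts connections.
# Usage: check_port <host> <port>
# Assumption: nc is installed and supports -z (connect only) and -w (timeout).
check_port() {
    if nc -z -w 3 "$1" "$2" >/dev/null 2>&1; then
        echo "open $1:$2"
    else
        echo "blocked-or-closed $1:$2"
        return 1
    fi
}

# Example, run manually from a hypervisor:
#   check_port 192.168.110.16 1235   # controller
#   check_port 192.168.110.19 5671   # manager
```

A "blocked-or-closed" result does not say whether a firewall or a stopped service is responsible, so treat it only as a prompt to look further.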

To flush the DNS cache on NSX Manager, log in to the manager over SSH as root and run the following command:

root@nsx-mgr-01:~# /etc/init.d/resolvconf restart
[ ok ] Restarting resolvconf (via systemctl): resolvconf.service.

You can then check the DNS configuration file:

root@nsx-mgr-01:~# cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.253.1
search mgt.sg.lab

Checking Communication from Host to Controller and Manager

On an ESXi host using NSX-T CLI commands:

esxi-01.corp.local> get managers
- 192.168.110.19   Connected
 
esxi-01.corp.local> get controllers
 Controller IP    Port     SSL         Status       Is Physical Master   Session State  Controller FQDN
 192.168.110.16   1235   enabled     connected             true               up               NA

On a KVM host using NSX-T CLI commands:

kvm-01> get managers
- 192.168.110.19   Connected
 
kvm-01> get controllers
 Controller IP    Port     SSL         Status       Is Physical Master   Session State  Controller FQDN
 192.168.110.16   1235   enabled     connected             true               up               NA

On an ESXi host using host CLI commands:

[root@esxi-01:~] esxcli network ip  connection list | grep 1235
tcp         0       0  192.168.110.53:42271                        192.168.110.16:1235   ESTABLISHED     67702  newreno  netcpa
[root@esxi-01:~]
[root@esxi-01:~] esxcli network ip  connection list | grep 5671
tcp         0       0  192.168.110.253:11721             192.168.110.19:5671   ESTABLISHED   2103688  newreno  mpa
tcp         0       0  192.168.110.253:30977             192.168.110.19:5671   ESTABLISHED   2103688  newreno  mpa

On a KVM host using host CLI commands:

root@kvm-01:/home/vmware# netstat -nap | grep 1235
tcp        0      0 192.168.110.55:53686    192.168.110.16:1235     ESTABLISHED 2554/netcpa
root@kvm-01:/home/vmware#
root@kvm-01:/home/vmware# netstat -nap | grep 5671
tcp        0      0 192.168.110.55:50108    192.168.110.19:5671     ESTABLISHED 2870/mpa
tcp        0      0 192.168.110.55:50110    192.168.110.19:5671     ESTABLISHED 2870/mpa

root@kvm-01:/home/vmware# tcpdump -i ens32 port 1235 | grep kvm-01
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens32, link-type EN10MB (Ethernet), capture size 262144 bytes
<truncated output>
03:46:27.040461 IP nsxcontroller01.corp.local.1235 > kvm-01.corp.local.38754: Flags [P.], seq 3315301231:3315301275, ack 2671171555, win 323, length 44
03:46:27.040509 IP kvm-01.corp.local.38754 > nsxcontroller01.corp.local.1235: Flags [.], ack 44, win 1002, length 0
^C
<truncated output>
root@kvm-01:/home/vmware#

root@kvm-01:/home/vmware# tcpdump -i ens32 port 5671 | grep kvm-01
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens32, link-type EN10MB (Ethernet), capture size 262144 bytes
03:51:16.802934 IP kvm-01.corp.local.58954 > nsxmgr01.corp.local.amqps: Flags [P.], seq 1153:1222, ack 1790, win 259, length 69
03:51:16.823328 IP nsxmgr01.corp.local.amqps > kvm-01.corp.local.58954: Flags [P.], seq 1790:1891, ack 1222, win 254, length 101
^C
<truncated output>
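
The connection checks above can also be scripted. The following is a minimal sketch, not a supported tool: the function name count_established is hypothetical, and it assumes the column layout shown in the netstat output above (foreign address in column 5, connection state in column 6).

```shell
# Count ESTABLISHED connections whose foreign address ends in the given port.
# Reads connection-list output (for example from netstat -nap) on stdin.
# Usage: netstat -nap | count_established <port>
count_established() {
    awk -v port="$1" '
        $6 == "ESTABLISHED" && $5 ~ (":" port "$") { n++ }
        END { print n + 0 }   # print 0 when nothing matched
    '
}

# Example, run manually on the KVM host:
#   netstat -nap | count_established 1235   # netcpa sessions to the controller
#   netstat -nap | count_established 5671   # mpa sessions to the manager
```

A count of 0 for either port suggests that the host has no session to the controller or the manager. The esxcli output shown earlier has the same positions for the foreign address and state columns, so the same filter should apply to it under that assumption.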

Host Registration Failure

Host registration fails if NSX-T uses the wrong IP address for the host, which can happen when the host has multiple IP addresses. Trying to delete the transport node then leaves it in the Orphaned state. To resolve the issue:

  • Go to Fabric > Nodes > Hosts, edit the host, and remove all IP addresses except the management address.

  • Click on the errors and select Resolve.

KVM Host Issues

KVM host issues are sometimes caused by insufficient disk space. The /boot directory can fill up quickly and cause errors such as:

  • Failed to install software on host

  • No space left on device

You can run the command df -h to check available storage. If the /boot directory is at 100%, you can do the following:

  • Run sudo dpkg --list 'linux-image*' | grep ^ii to see all the kernels installed.

  • Run uname -r to see your currently running kernel. Do not remove the linux-image package for this kernel.

  • Use apt-get purge to remove images you no longer need. For example, run sudo apt-get purge linux-image-3.13.0-32-generic linux-image-3.13.0-33-generic.

  • Reboot the host.

  • In NSX Manager, check the errors and select Resolve.

  • Make sure the VMs are powered on.
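
The kernel selection in the steps above can be sketched as a small filter. The function name removable_kernels is hypothetical. It assumes the usual dpkg --list column layout (status in column 1, package name in column 2) and considers only packages named linux-image-<version>, so metapackages are left alone.

```shell
# List installed kernel image packages other than the running kernel.
# Reads the output of: dpkg --list 'linux-image*'   on stdin.
# Usage: removable_kernels <running-kernel-version>
removable_kernels() {
    if [ -z "$1" ]; then
        echo "usage: removable_kernels <running-kernel-version>" >&2
        return 1
    fi
    awk -v running="$1" '
        # "ii" marks an installed package; skip the package for the running kernel
        $1 == "ii" && $2 ~ /^linux-image-[0-9]/ && $2 != ("linux-image-" running) { print $2 }
    '
}

# Example, run manually on the KVM host:
#   sudo dpkg --list 'linux-image*' | removable_kernels "$(uname -r)"
```

The function only prints candidates. Review the list before passing any of the names to sudo apt-get purge.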

Configuration Error when Deploying an Edge VM

After deploying an Edge VM, NSX Manager shows the VM's status as configuration error. The manager log has a message similar to the following:

nsx-manager NSX - FABRIC [nsx@6876 comp="nsx-manager" errorCode="MP16027" subcomp="manager"] Edge 758ad396-0754-11e8-877e-005056abf715 is not ready for configuration error occurred, error detail is NSX Edge configuration has failed. The host does not support required cpu features: ['aes'].

Restarting the edge datapath service and then the VM should resolve the issue.
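
As a supplementary check, you can confirm whether the aes CPU flag named in the log message is visible to a Linux system. This is a minimal sketch under the assumption that you have a shell where /proc/cpuinfo is readable, for example on a KVM host. Whether such a shell is available on the Edge VM itself is not covered here, and the function name has_cpu_flag is hypothetical.

```shell
# Succeed if the given CPU flag appears on a "flags" line of a cpuinfo-style file.
# Usage: has_cpu_flag <flag> [file]   (file defaults to /proc/cpuinfo)
has_cpu_flag() {
    awk -v flag="$1" '
        # Line layout: "flags : fpu vme ..." so the flag names start at field 3
        $1 == "flags" { for (i = 3; i <= NF; i++) if ($i == flag) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/cpuinfo}"
}

# Example, run manually:
#   has_cpu_flag aes && echo "aes present" || echo "aes missing"
```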

Force Removing a Transport Node

You can remove a transport node that is stuck in the Orphaned state by making the following API call:

DELETE https://<NSX Manager>/api/v1/transport-nodes/<TN ID>?force=true

NSX Manager does not validate whether any active VMs are running on the host. You are responsible for deleting the N-VDS and the VIBs. If the node was added through a compute manager, delete the compute manager first and then delete the node. The transport node is deleted as well.
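
The API call above can be issued with curl, for example. The sketch below makes several assumptions: curl is installed, basic authentication with an NSX Manager user is acceptable, and the -k option (which skips certificate verification) is tolerable in your lab. The function names are hypothetical. Only the URL path and the force=true parameter come from the call shown above.

```shell
# Build the force-delete URL for a transport node.
# Usage: tn_force_delete_url <nsx-manager> <transport-node-id>
tn_force_delete_url() {
    printf 'https://%s/api/v1/transport-nodes/%s?force=true\n' "$1" "$2"
}

# Send the DELETE request; curl prompts for the password of <user>.
# Usage: force_delete_tn <nsx-manager> <transport-node-id> <user>
# Assumption: -k is used because the manager certificate may not be trusted.
force_delete_tn() {
    curl -k -u "$3" -X DELETE "$(tn_force_delete_url "$1" "$2")"
}

# Example, run manually (replace the placeholders with your own values):
#   force_delete_tn <NSX Manager> <TN ID> admin
```

Because the call skips validation, confirm the transport node ID before sending the request.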