You can use the NSX Command Line Interface (CLI) to troubleshoot problems.

Table 1. Checking the NSX Installation on ESXi Host—Commands Run from NSX Manager
Description Commands on NSX Manager Notes
List all clusters to get the cluster IDs show cluster all View all cluster information
List all the hosts in the cluster to get the host IDs show cluster clusterID View the list of hosts in the cluster, the host IDs, and the host-prep installation status
List all the VMs on a host show host hostID View particular host information, VMs, VM IDs, and power status
Table 2. Names of VIBs and Modules Installed on Hosts to Use in Commands
NSX version ESXi version VIBs Modules
Any 6.3.x 5.5 esx-vxlan and esx-vsip vdl2, vdrb, vsip, dvfilter-switch-security, bfd, traceflow
6.3.2 and earlier 6.0 and later esx-vxlan and esx-vsip vdl2, vdrb, vsip, dvfilter-switch-security, bfd, traceflow
6.3.3 and later 6.0 and later esx-nsxv nsx-vdl2, nsx-vdrb, nsx-vsip, nsx-dvfilter-switch-security, nsx-core, nsx-bfd, nsx-traceflow
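The version rules in this table can be encoded as a small lookup so that the VIB and module names used in the later commands are picked consistently. The following Python sketch is illustrative only: the function and constant names are hypothetical, and it follows the table literally (the ESXi 5.5 row covers only NSX 6.3.x; for ESXi 6.0 and later, NSX 6.3.2 and earlier use the legacy names and NSX 6.3.3 and later use esx-nsxv).

```python
# Hypothetical helper that encodes the "Names of VIBs and Modules Installed on Hosts" table.

LEGACY_VIBS = ("esx-vxlan", "esx-vsip")
LEGACY_MODULES = ("vdl2", "vdrb", "vsip", "dvfilter-switch-security", "bfd", "traceflow")
NSXV_VIBS = ("esx-nsxv",)
NSXV_MODULES = ("nsx-vdl2", "nsx-vdrb", "nsx-vsip", "nsx-dvfilter-switch-security",
                "nsx-core", "nsx-bfd", "nsx-traceflow")


def parse_version(text):
    """Turn a dotted version such as '6.3.2' into (6, 3, 2) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def vibs_and_modules(nsx_version, esxi_version):
    """Return (VIB names, module names) for an NSX/ESXi version pair, per the table."""
    nsx = parse_version(nsx_version)
    esxi = parse_version(esxi_version)
    if esxi[:2] == (5, 5):
        # Row 1: any NSX 6.3.x on ESXi 5.5 uses the legacy names.
        if nsx[:2] != (6, 3):
            raise ValueError("the table lists only NSX 6.3.x for ESXi 5.5")
        return LEGACY_VIBS, LEGACY_MODULES
    if esxi < (6, 0):
        raise ValueError("ESXi version not covered by the table")
    # Rows 2 and 3: ESXi 6.0 and later; the naming changes at NSX 6.3.3.
    if nsx <= (6, 3, 2):
        return LEGACY_VIBS, LEGACY_MODULES
    return NSXV_VIBS, NSXV_MODULES


def module_get_commands(nsx_version, esxi_version):
    """Build one 'esxcli system module get' command per module to check."""
    _, modules = vibs_and_modules(nsx_version, esxi_version)
    return [f"esxcli system module get -m {name}" for name in modules]
```

For example, module_get_commands("6.3.3", "6.5") returns seven commands, starting with esxcli system module get -m nsx-vdl2, which you can then run on the host one by one.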
Table 3. Checking the NSX Installation on ESXi Host—Commands Run from Host
Description Commands on Host Notes

VIBs present depend on the NSX and ESXi versions.

See table Names of VIBs and Modules Installed on Hosts for details on which modules to check on your installation.

esxcli software vib get --vibname <name>

Check the version/date installed

esxcli software vib list displays a list of all VIBs on the system

List all the modules currently loaded in the system esxcli system module list Older equivalent command: vmkload_mod -l | grep -E 'vdl2|vdrb|vsip|dvfilter-switch-security' (quote the pattern so the shell does not treat | as a pipe)

Modules present depend on the NSX and ESXi versions.

See table Names of VIBs and Modules Installed on Hosts for details on which modules to check on your installation.

esxcli system module get -m <name> Run the command for each module
Two User World Agents (UWAs): the control plane agent (netcpa) and the firewall agent (vsfwd)

/etc/init.d/vShield-Stateful-Firewall status

/etc/init.d/netcpad status

Check the UWA connections: port 1234 to the controllers and port 5671 to NSX Manager

esxcli network ip connection list | grep 1234

esxcli network ip connection list | grep 5671

Controller TCP connection

Message bus TCP connection

Check EAM status In the vSphere Web Client, go to Administration > vSphere ESX Agent Manager
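The two grep checks for ports 1234 and 5671 can also be run against a saved copy of the command output. This Python sketch assumes the usual column order of esxcli network ip connection list (Proto, Recv Q, Send Q, Local Address, Foreign Address, State, World ID, CC Algo, World Name); verify that against the output on your own host before relying on it. The helper names are hypothetical.

```python
# Sketch: confirm the two UWAs have ESTABLISHED TCP sessions, using saved output
# of "esxcli network ip connection list".
# ASSUMED column order (check it against your own host's output):
# Proto  Recv Q  Send Q  Local Address  Foreign Address  State  World ID  CC Algo  World Name

CONTROLLER_PORT = 1234  # netcpa (control plane agent) to the NSX Controllers
MANAGER_PORT = 5671     # vsfwd (firewall agent) to the NSX Manager message bus


def established_peers(output, remote_port):
    """Return remote IPs that have an ESTABLISHED TCP session on remote_port."""
    peers = []
    for line in output.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and anything that is not TCP.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        host, _, port = foreign.rpartition(":")
        if port == str(remote_port) and state == "ESTABLISHED":
            peers.append(host)
    return peers


def check_uwa_connections(output):
    """Summarize controller and NSX Manager sessions found in the output."""
    return {
        "controllers": established_peers(output, CONTROLLER_PORT),
        "nsx_manager": established_peers(output, MANAGER_PORT),
    }
```

Pass the saved text to check_uwa_connections; an empty list for either key suggests that the corresponding agent has no established connection and is worth checking with the status commands above.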
Table 4. Checking the NSX Installation on ESXi Host—Host Networking Commands
Description Host Networking Commands Notes
List physical NICs/vmnic esxcli network nic list Check the NIC type, driver type, link status, MTU
Physical NIC details esxcli network nic get -n vmnic# Check the driver and firmware versions along with other details
List vmk NICs with IP addresses/MAC/MTU, and so on esxcli network ip interface ipv4 get To ensure VTEPs are correctly instantiated
Details of each vmk NIC, including vDS information esxcli network ip interface list To ensure VTEPs are correctly instantiated
Details of each vmk NIC, including vDS info for VXLAN vmks esxcli network ip interface list --netstack=vxlan To ensure VTEPs are correctly instantiated
Find the VDS name associated with this host’s VTEP esxcli network vswitch dvs vmware vxlan list To ensure VTEPs are correctly instantiated
Ping from VXLAN-dedicated TCP/IP stack ping ++netstack=vxlan -I vmk1 x.x.x.x To troubleshoot VTEP communication issues: add the options -d -s 1572 (-d sets the don't-fragment bit, -s sets the payload size) to make sure that the MTU of the transport network is correct for VXLAN
View routing table of VXLAN-dedicated TCP/IP stack esxcli network ip route ipv4 list -N vxlan To troubleshoot VTEP communication issues
View ARP table of VXLAN-dedicated TCP/IP stack esxcli network ip neighbor list -N vxlan To troubleshoot VTEP communication issues
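The -d -s 1572 test works because of simple header arithmetic: with the don't-fragment bit set, the packet either fits the transport MTU or fails. The Python sketch below works through the numbers, assuming IPv4 without IP options. The VXLAN overhead it uses (inner Ethernet 14 + VXLAN 8 + UDP 8 + outer IPv4 20 bytes) is standard VXLAN framing rather than something stated in the table, and the function names are hypothetical.

```python
# Worked arithmetic for the VTEP MTU test (IPv4, no IP options assumed).
IPV4_HEADER = 20     # IPv4 header
ICMP_HEADER = 8      # ICMP echo header
UDP_HEADER = 8       # VXLAN rides on UDP
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # guest frame's destination MAC, source MAC, EtherType
INNER_VLAN_TAG = 4   # present only when the guest frame carries an 802.1Q tag


def ping_packet_size(payload):
    """IPv4 packet size produced by 'ping -s <payload>' (payload + ICMP + IPv4)."""
    return payload + ICMP_HEADER + IPV4_HEADER


def vxlan_outer_packet_size(guest_mtu, tagged=False):
    """Outer IPv4 packet size after VXLAN-encapsulating a full-size guest packet."""
    inner_frame = guest_mtu + INNER_ETHERNET + (INNER_VLAN_TAG if tagged else 0)
    return inner_frame + VXLAN_HEADER + UDP_HEADER + IPV4_HEADER


print(ping_packet_size(1572))                      # 1600
print(vxlan_outer_packet_size(1500))               # 1550
print(vxlan_outer_packet_size(1500, tagged=True))  # 1554
```

A 1572-byte payload produces a 1600-byte IP packet, which matches the 1600-byte MTU commonly recommended for NSX transport networks and leaves headroom above the 1550 bytes (1554 with an inner 802.1Q tag) that a full 1500-byte guest packet needs after encapsulation.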
Table 5. Checking the NSX Installation on ESXi Host—Host Log Files
Description Log File Notes
From NSX Manager show manager log follow Tails the NSX Manager logs

For live troubleshooting

Installation-related logs for a host /var/log/esxupdate.log

Host related issues

VMkernel warnings, messages, alerts, and availability report

/var/log/vmkernel.log

/var/log/vmksummary.log

/var/log/vmkwarning.log

Module load failures are captured /var/log/syslog IXGBE driver failures

NSX module dependency failures are key indicators

On vCenter, ESX Agent Manager is responsible for updates In vCenter logs, eam.log
Table 6. Checking Logical Switching—Commands Run from NSX Manager
Description Command on NSX Manager Notes
List all logical switches show logical-switch list all Lists all the logical switches with their UUIDs (for use in the API), transport zone, and vdnscope
Table 7. Logical Switching—Commands Run from NSX Controller
Description Commands on Controller Notes
Find the controller that is the owner of the VNI show control-cluster logical-switches vni 5000 Note the controller IP address in the output and SSH to it
Find all the hosts that are connected to this controller for this VNI show control-cluster logical-switch connection-table 5000 The source IP address in the output is the management interface of the host, and the port number is the source port of the TCP connection
Find the VTEPs registered to host this VNI show control-cluster logical-switches vtep-table 5002
List the MAC addresses learned for VMs on this VNI show control-cluster logical-switches mac-table 5002 Verify that the MAC address is actually on the VTEP reporting it
List the ARP cache populated by the VM IP updates show control-cluster logical-switches arp-table 5002 ARP cache expires in 180 seconds
For a specific host/controller pair, find out which VNIs the host has joined show control-cluster logical-switches joined-vnis <host_mgmt_ip>
Table 8. Logical Switching—Commands Run from Hosts
Description Command on Hosts Notes
Check whether the host VXLAN configuration is in sync esxcli network vswitch dvs vmware vxlan get Shows the sync state and port used for encapsulation
View attached VMs and their local switch port IDs for datapath captures net-stats -l A convenient way to get the switch port for a specific VM
Verify VXLAN kernel module vdl2 is loaded esxcli system module get -m vdl2 Shows full detail of the specified module.

Verify the version

Verify correct VXLAN VIB version is installed

See table Names of VIBs and Modules Installed on Hosts for details on which VIBs to check on your installation.

esxcli software vib get --vibname esx-vxlan

or

esxcli software vib get --vibname esx-nsxv
Shows full detail of the specified VIB

Verify the version and date

Verify the host knows about other hosts in the logical switch esxcli network vswitch dvs vmware vxlan network vtep list --vxlan-id=5001 --vds-name=Compute_VDS Shows the list of all the VTEPs that this host knows about that are hosting VNI 5001
Verify the control plane is up and active for a logical switch esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS Make sure the controller connection is up and the port/MAC count matches the VMs on the LS on this host
Verify host has learnt MAC addresses of all VMs esxcli network vswitch dvs vmware vxlan network mac list --vds-name Compute_VDS --vxlan-id=5000 This should list all the MACs for the VNI 5000 VMs on this host
Verify the host has locally cached ARP entries for remote VMs esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5000 Verify the host has locally cached ARP entries for remote VMs
Verify the VM is connected to the LS and mapped to a local vmknic

Also shows which vmknic ID a VM dvPort is mapped to

esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5000 The vdrport is always listed as long as the VNI is attached to a router
View vmknic IDs and the switch port/uplink each is mapped to esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=DSwitch-Res01
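Several checks in this table and in the controller table compare two views of the same logical switch: the controller's (for example, show control-cluster logical-switches vtep-table 5001) and the host's (esxcli network vswitch dvs vmware vxlan network vtep list --vxlan-id=5001 --vds-name=Compute_VDS). The Python sketch below shows that comparison once you have copied the addresses from both outputs into lists. The addresses are placeholders, and the optional ignore argument is a hypothetical convenience for entries you expect to differ, such as the host's own VTEP if your host does not list itself.

```python
# Sketch: diff the controller's view of a VNI against one host's view.
# Inputs are lists of VTEP IP addresses (or MAC addresses) copied by hand from
# the controller and host command output; the sample values are placeholders.


def compare_views(controller_view, host_view, ignore=()):
    """Return (missing_on_host, unknown_to_controller), each sorted as strings."""
    skip = set(ignore)
    controller = set(controller_view) - skip
    host = set(host_view) - skip
    return sorted(controller - host), sorted(host - controller)


missing, unknown = compare_views(
    controller_view=["10.20.0.11", "10.20.0.12", "10.20.0.13"],
    host_view=["10.20.0.12"],
    ignore=["10.20.0.11"],  # for example, this host's own VTEP
)
print("missing on host:", missing)        # ['10.20.0.13']
print("unknown to controller:", unknown)  # []
```

The same function works for the MAC tables: compare the controller's mac-table entries with the host's mac list for the same VNI.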
Table 9. Checking Logical Switching—Log Files
Description Log File Notes
Hosts are always connected to controllers hosting their VNIs /etc/vmware/netcpa/config-by-vsm.xml This file should always list all the controllers in the environment The config-by-vsm.xml file is created by the netcpa process
The config-by-vsm.xml file is pushed by NSX Manager using vsfwd

If the config-by-vsm.xml file is not correct, look at the vsfwd log

/var/log/vsfwd.log

Parse through this file looking for errors

To restart the process: /etc/init.d/vShield-Stateful-Firewall stop|start

Connection to controller is made using netcpa /var/log/netcpa.log

Parse through this file looking for errors

Logical switching module logs are in vmkernel.log /var/log/vmkernel.log Look for logical switching module entries prefixed with “VXLAN:”
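Parsing netcpa.log or vsfwd.log for errors, or vmkernel.log for logical switching entries, can be done with a short filter. This Python sketch assumes you have copied the log files off the host (or run it wherever Python is available). Plain keyword matching is a heuristic, so treat hits as leads rather than a diagnosis. The function names are hypothetical.

```python
import re


def matching_lines(lines, pattern):
    """Return lines that match pattern (case-insensitive), without trailing newlines."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if regex.search(line)]


def scan_log(path, pattern):
    """Apply matching_lines to a log file, tolerating non-UTF-8 bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return matching_lines(handle, pattern)


# Example calls (paths as listed in the table):
#   scan_log("/var/log/netcpa.log", r"error")
#   scan_log("/var/log/vsfwd.log", r"error")
#   scan_log("/var/log/vmkernel.log", r"VXLAN:")  # also matches "vxlan:"
```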
Table 10. Checking Logical Routing—Commands Run from NSX Manager
Description Commands on NSX Manager Notes
Commands for ESG show edge CLI commands for Edge Services Gateway (ESG) start with 'show edge'
Commands for DLR Control VM show edge CLI commands for Distributed Logical Router (DLR) Control VM start with 'show edge'
Commands for DLR show logical-router CLI commands for Distributed Logical Router (DLR) start with 'show logical-router'
List all edges show edge all List all the edges that support the central CLI
List all the services and deployment details of an edge show edge edgeID View Edge Service Gateway Information
List the command options for edge show edge edgeID ? View details, such as version, log, NAT, routing table, firewall, configuration, interface, and services
View routing details show edge edgeID ip ? View routing info, BGP, OSPF and other details
View routing table show edge edgeID ip route View the routing table at Edge
View routing neighbor show edge edgeID ip ospf neighbor View routing neighbor relationship
View logical router connection information show logical-router host hostID connection Verify that the number of connected LIFs is correct, the teaming policy is right, and the appropriate vDS is being used
List all logical router instances running on the host show logical-router host hostID dlr all

Verify the number of LIFs and routes

Controller IP should be the same on all hosts for a logical router

Control Plane Active should be yes

--brief gives a compact response

Check the routing table on the host show logical-router host hostID dlr dlrID route

This is the routing table pushed by the controller to all the hosts in the transport zone

This must be the same across all the hosts

If some routes are missing on a few hosts, try the route sync command from the controller (sync control-cluster logical-routers route-to-host)

The E flag means routes are learned via ECMP

Check the LIFs for a DLR on the host show logical-router host hostID dlr dlrID interface (all | intName) verbose The LIF information is pushed to hosts from the controller

Use this command to ensure the host knows about all the LIFs it should

Table 11. Checking Logical Routing—Commands Run from NSX Controller
Description Commands on NSX Controller Notes
Find all the Logical Router Instances show control-cluster logical-routers instance all This should list the logical router instance and all the hosts in the transport zone which should have the logical router instance on them

In addition, shows the controller that is servicing this logical router

View details of each logical router show control-cluster logical-routers instance 0x570d4555 The IP column shows the vmk0 IP addresses of all hosts where this DLR exists
View all the interfaces CONNECTED to the logical router show control-cluster logical-routers interface-summary 0x570d4555 The IP column shows the vmk0 IP addresses of all hosts where this DLR exists
View all the routes learned by this logical router show control-cluster logical-routers routes 0x570d4555 Note that the IP column shows the vmk0 IP addresses of all hosts where this DLR exists
Show all established network connections, similar to netstat output show network connections of-type tcp Check whether the host you are troubleshooting has an established netcpa connection to the controller
Sync interfaces from controller to host sync control-cluster logical-routers interface-to-host <logical-router-id> <host-ip> Useful if a new interface was connected to the logical router but is not synced to all hosts
Sync routes from controller to host sync control-cluster logical-routers route-to-host <logical-router-id> <host-ip> Useful if some routes are missing on a few hosts but are available on the majority of hosts
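The DLR routing table must be the same on every host, and the route-to-host sync command targets hosts that are missing routes the majority has. The Python sketch below turns that rule into commands. It assumes you have collected each host's prefixes by hand from show logical-router host hostID dlr dlrID route; the host IPs and prefixes are placeholders, 0x570d4555 is the example ID used in this table, and the majority threshold (strictly more than half of the hosts) is an assumption of the sketch.

```python
from collections import Counter


def route_sync_commands(logical_router_id, routes_by_host):
    """routes_by_host maps a host IP to the route prefixes seen in its DLR table.

    A route counts as expected if strictly more than half of the hosts have it
    (an assumed threshold); every host lacking an expected route gets a sync command.
    """
    host_count = len(routes_by_host)
    seen = Counter(route for routes in routes_by_host.values() for route in set(routes))
    expected = {route for route, count in seen.items() if count * 2 > host_count}
    commands = []
    for host_ip, routes in sorted(routes_by_host.items()):
        if expected - set(routes):
            commands.append(
                f"sync control-cluster logical-routers route-to-host {logical_router_id} {host_ip}"
            )
    return commands


example = {
    "192.168.110.51": {"10.1.1.0/24", "10.2.2.0/24"},
    "192.168.110.52": {"10.1.1.0/24", "10.2.2.0/24"},
    "192.168.110.53": {"10.1.1.0/24"},
}
for command in route_sync_commands("0x570d4555", example):
    print(command)
# sync control-cluster logical-routers route-to-host 0x570d4555 192.168.110.53
```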
Table 12. Checking Logical Routing—Commands Run from Edge
Description Commands on Edge or Logical Router Control VM Notes
View configuration show configuration <global | bgp | ospf | …>
View the routes learned show ip route Make sure the routing and forwarding tables are in sync
View the forwarding table show ip forwarding Make sure the routing and forwarding tables are in sync
View the distributed logical router interfaces show interface

First NIC shown in the output is the distributed logical router interface

The distributed logical router interface is not a real vNIC on that VM

All the subnets attached to the distributed logical router are of type INTERNAL

View the other interfaces (management) show interface

Management/HA interface is a real vNIC on the logical router Control VM

If HA was enabled without specifying an IP address, 169.254.x.x/30 is used

If the management interface is given an IP address, it appears here

Debug the protocol

debug ip ospf

debug ip bgp

Useful to see issues with the configuration (such as mismatched OSPF areas, timers, and wrong ASN)

Note: debug output is seen only on the Edge console (not over an SSH session)

OSPF commands

show configuration ospf

show ip ospf interface

show ip ospf neighbor

show ip route ospf

show ip ospf database

show tech-support (and look for strings “EXCEPTION” and “PROBLEM”)

BGP commands

show configuration bgp

show ip bgp neighbor

show ip bgp

show ip route bgp

show ip forwarding

show tech-support (look for strings “EXCEPTION” and “PROBLEM”)

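When show tech-support output is long, searching a saved copy for “EXCEPTION” and “PROBLEM” is easier than scrolling. A minimal Python sketch, assuming the output has been captured to text; the match is case-sensitive because the strings are given in upper case here, find_markers is a hypothetical name, and the sample text is invented for illustration.

```python
def find_markers(text, markers=("EXCEPTION", "PROBLEM")):
    """Return (line number, line) for every line containing one of the markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line))
    return hits


sample = "first line\nline with PROBLEM\nthird line\nline with EXCEPTION"
for number, line in find_markers(sample):
    print(number, line)
```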
Table 13. Checking Logical Routing—Log Files from Hosts
Description Log File Notes
Distributed Logical Router instance information is pushed to hosts by vsfwd and saved in XML format /etc/vmware/netcpa/config-by-vsm.xml

If the distributed logical router instance is missing on the host, first look at this file to see if the instance is listed

If not, restart vsfwd

Also, use this file to ensure that all of the controllers are known to the host

The above file is pushed by NSX Manager using vsfwd

If the config-by-vsm.xml file is not correct, look at the vsfwd log

/var/log/vsfwd.log

Parse through this file looking for errors

To restart the process: /etc/init.d/vShield-Stateful-Firewall stop|start

Connection to controller is made using netcpa /var/log/netcpa.log

Parse through this file looking for errors

Logical switching module logs are in vmkernel.log /var/log/vmkernel.log Look for logical switching module entries prefixed with “vxlan:”
Table 14. Controller Debugging—Command Run from NSX Manager
Description Command (On NSX Manager) Notes
List all controllers with state show controller list all Shows the list of all controllers and their running state
Table 15. Controller Debugging—Command Run from NSX Controller
Description Command (On Controller) Notes
Check controller cluster status show control-cluster status

Should always show 'Join complete' and 'Connected to Cluster Majority'

Check the stats for flapping connections and messages show control-cluster core stats The dropped counter should not change
View the node's activity in relation to joining the cluster initially or after a restart show control-cluster history Useful for troubleshooting cluster join issues
View list of nodes in the cluster show control-cluster startup-nodes Note that the list does not have to contain only active cluster nodes

This should have a list of all the currently deployed controllers

This list is used by a starting controller to contact other controllers in the cluster

Show all established network connections, similar to netstat output show network connections of-type tcp Check whether the host you are troubleshooting has an established netcpa connection to the controller
To restart the controller process restart controller Only restarts the main controller process

Forces a re-connection to the cluster

To reboot the controller node restart system Reboots the controller VM
Table 16. Controller Debugging—Log Files on NSX Controller
Description Log File Notes
View controller history and recent joins, restarts, and so on show control-cluster history Great troubleshooting tool for controller issues, especially around clustering
Check for slow disk show log cloudnet/cloudnet_java-zookeeper<timestamp>.log filtered-by fsync A reliable way to check for slow disks is to look for "fsync" messages in the cloudnet_java-zookeeper log

If fsync takes more than 1 second, ZooKeeper prints this message, which is a good indication that something else was using the disk at that time

Check for slow/malfunctioning disk show log syslog filtered-by collectd Messages about “collectd” tend to correlate with slow or malfunctioning disks
Check disk space usage show log syslog filtered-by freespace: A background job called “freespace” periodically cleans up old logs and other files from the disk when space usage reaches a threshold. If the disk is small or filling up very fast, you will see many freespace messages, which could indicate that the disk filled up
Find currently active cluster members show log syslog filtered-by Active cluster members Lists the node-id for currently active cluster members. You may need to look in older syslogs because this message is not printed all the time.
View the core controller logs show log cloudnet/cloudnet_java-zookeeper.20150703-165223.3702.log There may be multiple ZooKeeper logs; look at the latest timestamped file

This file has information about controller cluster master election and other information related to the distributed nature of controllers

View the core controller logs show log cloudnet/cloudnet.nsx-controller.root.log.INFO.20150703-165223.3668 Main controller working logs, like LIF creation, connection listener on 1234, sharding
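The fsync check can also be done on a copied log, which makes it easy to list how long each slow sync took. This Python sketch assumes the message wording used by upstream Apache ZooKeeper ("fsync-ing the write ahead log in ... took NNNms ..."); the controller's ZooKeeper build may word it differently, so adjust the pattern if nothing matches. The 1000 ms default mirrors the 1-second figure above, and slow_fsyncs is a hypothetical name.

```python
import re

# Assumed to match upstream ZooKeeper's slow-fsync warning; an illustrative line
# in that format: "fsync-ing the write ahead log in SyncThread:0 took 1203ms ..."
FSYNC_PATTERN = re.compile(r"fsync.*?took (\d+)\s*ms")


def slow_fsyncs(lines, threshold_ms=1000):
    """Return the reported fsync durations (ms) that exceed threshold_ms."""
    durations = []
    for line in lines:
        match = FSYNC_PATTERN.search(line)
        if match and int(match.group(1)) > threshold_ms:
            durations.append(int(match.group(1)))
    return durations
```

Frequent or very large values around the same time as “collectd” or “freespace” messages point toward the disk rather than the controller process.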
Table 17. Checking Distributed Firewall—Commands Run from NSX Manager
Description Commands on NSX Manager Notes
View a VM's information show vm vmID Details such as DC, Cluster, Host, VM Name, vNICs, dvfilters installed
View particular virtual NIC information show vnic vnicID Details such as vNIC name, MAC address, port group, applied filters
View all cluster information show dfw cluster all Cluster Name, Cluster Id, Datacenter Name, Firewall Status
View particular cluster information show dfw cluster clusterID Host Name, Host Id, Installation Status
View dfw related host information show dfw host hostID VM Name, VM Id, Power Status
View details within a dvfilter show dfw host hostID filter filterID <option> List rules, stats, address sets, and so on for each vNIC
View DFW information for a VM show dfw vm vmID View VM's name, VNIC ID, filters, and so on
View VNIC details show dfw vnic vnicID View VNIC name, ID, MAC address, portgroup, filter
List the filters installed per vNIC show dfw host hostID summarize-dvfilter Find the VM/vNIC of interest and get the name field to use in the next commands as filter
View rules for a specific filter/vNIC show dfw host hostID filter filterID rules

show dfw vnic vnicID

View details of an address set show dfw host hostID filter filterID addrsets The rules display only address sets; use this command to expand what is part of an address set
Spoofguard details per vNIC show dfw host hostID filter filterID spoofguard Check whether SpoofGuard is enabled and what the current IP/MAC is
View details of flow records show dfw host hostID filter filterID flows If flow monitoring is enabled, host sends flow information periodically to NSX Manager

Use this command to see flows per vNIC

View statistics for each rule for a vNIC show dfw host hostID filter filterID stats This is useful to see if rules are being hit
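To see whether a rule is being hit, run the stats command twice and compare the counters. The Python sketch below assumes you have transcribed a hit count per rule ID from each run into a dictionary; the rule IDs and counts are placeholders, and the sketch does not parse the command's output.

```python
def rules_hit_between(before, after):
    """Return rule IDs whose hit counter increased between two snapshots.

    A rule missing from the first snapshot is treated as starting at zero.
    """
    return sorted(rule for rule, hits in after.items() if hits > before.get(rule, 0))


first_run = {1001: 5, 1002: 0}
second_run = {1001: 9, 1002: 0, 1003: 2}
print(rules_hit_between(first_run, second_run))  # [1001, 1003]
```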
Table 18. Checking Distributed Firewall—Commands Run from Hosts
Description Commands on Host Notes
Lists VIBs downloaded on the host.

See table Names of VIBs and Modules Installed on Hosts for details on which VIBs to check on your installation.

esxcli software vib list | grep esx-vsip

or

esxcli software vib list | grep esx-nsxv

Check that the right VIB version is downloaded
Details on system modules currently loaded

See table Names of VIBs and Modules Installed on Hosts for details on which modules to check on your installation.

esxcli system module get -m vsip

or

esxcli system module get -m nsx-vsip

Check to make sure that the module was installed/loaded
Process list ps | grep vsfwd Check whether the vsfwd process is running with several threads
Daemon command /etc/init.d/vShield-Stateful-Firewall {start|stop|status|restart} Check if the daemon is running and restart if needed
View network connection esxcli network ip connection list | grep 5671 Check if the host has TCP connectivity to NSX Manager
Table 19. Checking Distributed Firewall—Log Files on Hosts
Description Log Notes
Process log /var/log/vsfwd.log vsfwd daemon log, useful for vsfwd process, NSX Manager connectivity, and RabbitMQ troubleshooting
Packet logs dedicated file /var/log/dfwpktlogs.log Dedicated log file for packet logs
Packet capture at the dvfilter pktcap-uw --dvfilter nic-1413082-eth0-vmware-sfw.2 --outfile test.pcap
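The filter name passed to pktcap-uw is the name field from show dfw host hostID summarize-dvfilter. This Python sketch checks a name against the pattern of the example above (nic-<number>-eth<N>-<agent>.<slot>) and builds the capture command with shell-safe quoting. The pattern is inferred from that single example, reading the number as the VM's world ID is an assumption, and the helper names and default output file name are hypothetical.

```python
import re
import shlex

# Pattern inferred from the example name nic-1413082-eth0-vmware-sfw.2.
FILTER_NAME = re.compile(
    r"^nic-(?P<world_id>\d+)-(?P<vnic>eth\d+)-(?P<agent>[\w-]+)\.(?P<slot>\d+)$"
)


def describe_filter(name):
    """Split a dvfilter name into its parts, or raise ValueError if it does not fit."""
    match = FILTER_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected dvfilter name: {name!r}")
    return {
        "world_id": int(match["world_id"]),  # assumed to be the VM's world ID
        "vnic": match["vnic"],
        "agent": match["agent"],
        "slot": int(match["slot"]),
    }


def pktcap_command(name, outfile=None):
    """Build a pktcap-uw command for one dvfilter, quoting both arguments."""
    info = describe_filter(name)  # validate the name before building the command
    outfile = outfile or f"{info['vnic']}-{info['world_id']}.pcap"
    return f"pktcap-uw --dvfilter {shlex.quote(name)} --outfile {shlex.quote(outfile)}"


print(pktcap_command("nic-1413082-eth0-vmware-sfw.2", "test.pcap"))
```

With the example filter and test.pcap, this reproduces the command in the table exactly; omitting the output file name derives one from the vNIC and number instead.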