Table 1. Checking the NSX Installation on ESXi Host—Commands Run from NSX Manager

Description

Commands on NSX Manager

Notes

List all clusters to get the cluster IDs

show cluster all

View all cluster information

List all the hosts in the cluster to get the host IDs

show cluster CLUSTER-ID

View the list of hosts in the cluster, the host-ids, and the host-prep installation status

List all the VMs on a host

show host HOST-ID

View particular host information, VMs, VM IDs, and power status

Table 2. Checking the NSX Installation on ESXi Host—Commands Run from Host

Description

Commands on Host

Notes

Three VIBs are loaded:

esx-vxlan; esx-vsip; esx-dvfilter-switch-security

esxcli software vib get --vibname <name>

Check the version/date installed

esxcli software vib list displays a list of all VIBs on the system

List all the system modules currently loaded in the system

esxcli system module list

Older equivalent command: vmkload_mod -l | grep -E 'vdl2|vdrb|vsip|dvfilter-switch-security'

Four Modules are loaded:

vdl2, vdrb, vsip, dvfilter-switch-security

esxcli system module get -m <name>

Run the command for each module

Two User World Agents (UWAs): netcpad, vsfwd

/etc/init.d/vShield-Stateful-Firewall status

/etc/init.d/netcpad status

Check UWA connections: port 1234 to the controllers and port 5671 to NSX Manager

esxcli network ip connection list | grep 1234

esxcli network ip connection list | grep 5671

Controller TCP connection

Message bus TCP connection

Check EAM status

Web UI, check Administration > vCenter ESX Agent Manager
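
The VIB and module checks in Table 2 can be scripted against saved command output. The following is a minimal sketch, not a VMware tool: it assumes the output of esxcli software vib list or esxcli system module list has been captured as text and that the name is the first column of each row. The function name missing_names and the sample listing are hypothetical.

```python
# Names taken from Table 2.
REQUIRED_VIBS = ("esx-vxlan", "esx-vsip", "esx-dvfilter-switch-security")
REQUIRED_MODULES = ("vdl2", "vdrb", "vsip", "dvfilter-switch-security")


def missing_names(listing, required):
    """Return the required names that never appear as the first column of a line.

    Assumption: the name is the first whitespace-separated field of each row.
    """
    first_columns = {line.split()[0] for line in listing.splitlines() if line.split()}
    return [name for name in required if name not in first_columns]


# Hypothetical, abbreviated listing used only to exercise the function.
sample = """\
Name                      Is Loaded  Is Enabled
vdl2                           true        true
vsip                           true        true
dvfilter-switch-security       true        true
"""

print(missing_names(sample, REQUIRED_MODULES))
```

An empty result means every required name was found; any name returned is a candidate for a closer look with esxcli system module get -m <name> or esxcli software vib get --vibname <name>.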

Table 3. Checking the NSX Installation on ESXi Host—Host Networking Commands

Description

Host Networking Commands

Notes

List physical NICs/vmnic

esxcli network nic list

Check the NIC type, driver type, link status, MTU

Physical NIC details

esxcli network nic get -n vmnic#

Check the driver and firmware versions along with other details

List vmk NICs with IP addresses/MAC/MTU, and so on

esxcli network ip interface ipv4 get

To ensure VTEPs are correctly instantiated

Details of each vmk NIC, including vDS information

esxcli network ip interface list

To ensure VTEPs are correctly instantiated

Details of each vmk NIC, including vDS info for VXLAN vmks

esxcli network ip interface list --netstack=vxlan

To ensure VTEPs are correctly instantiated

Find the VDS name associated with this host’s VTEP

esxcli network vswitch dvs vmware vxlan list

To ensure VTEPs are correctly instantiated

Ping from VXLAN-dedicated TCP/IP stack

ping ++netstack=vxlan -I vmk1 x.x.x.x

To troubleshoot VTEP communication issues: add the options -d (do not fragment) and -s 1572 (payload size) to verify that the MTU of the transport network is correct for VXLAN

View routing table of VXLAN-dedicated TCP/IP stack

esxcli network ip route ipv4 list -N vxlan

To troubleshoot VTEP communication issues

View ARP table of VXLAN-dedicated TCP/IP stack

esxcli network ip neighbor list -N vxlan

To troubleshoot VTEP communication issues
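
The -s 1572 value used with the VXLAN ping follows from header arithmetic: the ICMP payload plus an 8-byte ICMP header and a 20-byte IPv4 header gives the size of the IP packet. The sketch below assumes a transport MTU of 1600 bytes, which is not stated in the table; adjust the mtu argument to match your environment. The function names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def vtep_ping_command(vmk, target, mtu=1600):
    """Build the ping command from Table 3 with -d (do not fragment) and -s set.

    Assumption: the transport network MTU is 1600 unless another value is passed.
    """
    size = icmp_payload_for_mtu(mtu)
    return f"ping ++netstack=vxlan -d -s {size} -I {vmk} {target}"


print(icmp_payload_for_mtu(1600))              # 1572
print(vtep_ping_command("vmk1", "x.x.x.x"))
```

If the ping succeeds without -d -s but fails with them, the path between the VTEPs probably does not carry packets of the assumed size.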

Table 4. Checking the NSX Installation on ESXi Host—Host Log Files

Description

Log File

Notes

From NSX Manager

show manager log follow

Tails the NSX Manager logs

For live troubleshooting

Any installation related logs for a host

/var/log/esxupdate.log

Host related issues

VMkernel warning, messages, alerts, and availability report

/var/log/vmkernel.log

/var/log/vmksummary.log

/var/log/vmkwarning.log

Module load failure is captured

/var/log/syslog

IXGBE driver failure

NSX module dependency failures are key indicators

On vCenter, ESX Agent Manager is responsible for updates

In vCenter logs, eam.log

Table 5. Checking Logical Switching—Commands Run from NSX Manager

Description

Command on NSX Manager

Notes

List all logical switches

show logical-switch list all

List all the logical switches, their UUIDs to be used in API, transport zone, and vdnscope

Table 6. Logical Switching—Commands Run from NSX Controller

Description

Commands on Controller

Notes

Find the controller that is the owner of the VNI

show control-cluster logical-switches vni 5000

Note the controller IP address in the output and SSH to it

Find all the hosts that are connected to this controller for this VNI

show control-cluster logical-switches connection-table 5000

The source IP address in the output is the management interface of the host, and the port number is the source port of the TCP connection

Find the VTEPs registered to host this VNI

show control-cluster logical-switches vtep-table 5002

List the MAC addresses learned for VMs on this VNI

show control-cluster logical-switches mac-table 5002

Map that the MAC address is actually on the VTEP reporting it

List the ARP cache populated by the VM IP updates

show control-cluster logical-switches arp-table 5002

ARP cache expires in 180 secs

For a specific host/controller pair, find out which VNIs the host has joined

show control-cluster logical-switches joined-vnis <host_mgmt_ip>

Table 7. Logical Switching—Commands Run from Hosts

Description

Command on Hosts

Notes

Check if the host VXLAN is in-sync or not

esxcli network vswitch dvs vmware vxlan get

Shows the sync state and port used for encapsulation

View VM attached and local switch port ID for datapath captures

net-stats -l

A convenient way to get the switch port for a specific VM

Verify VXLAN kernel module vdl2 is loaded

esxcli system module get -m vdl2

Shows full detail of the specified module.

Verify the version

Verify correct VXLAN VIB version is installed

esxcli software vib get --vibname esx-vxlan

Shows full detail of the specified VIB

Verify the version and date

Verify the host knows about other hosts in the logical switch

esxcli network vswitch dvs vmware vxlan network vtep list --vxlan-id=5001 --vds-name=Compute_VDS

Shows the list of all the VTEPs that this host knows about that are hosting VNI 5001

Verify control plane is up and active for a Logical switch

esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS

Make sure the controller connection is up and the Port/Mac count matches the VMs on the LS on this host

Verify host has learnt MAC addresses of all VMs

esxcli network vswitch dvs vmware vxlan network mac list --vds-name Compute_VDS --vxlan-id=5000

This should list all the MACs for the VNI 5000 VMs on this host

Verify host has locally cached ARP entries for remote VMs

esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5000

Lists the ARP entries that this host has cached locally for remote VMs on VNI 5000

Verify VM is connected to LS & mapped to a local VMKnic

Also shows what vmknic ID a VM dvPort is mapped to

esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5000

The vdrport is always listed as long as the VNI is attached to a router

View vmknic IDs and what switchport/uplink they are mapped to

esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=DSwitch-Res01
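
Most host-side checks in Table 7 differ only in the VDS name and the VNI. As a convenience, the sketch below builds the command strings for one logical switch so they can be pasted into an ESXi shell in order. It only formats the commands listed in the table; the function name logical_switch_checks is hypothetical, and nothing is executed.

```python
def logical_switch_checks(vds_name, vni):
    """Return the Table 7 esxcli commands for one VDS and one VNI, in a sensible order."""
    base = "esxcli network vswitch dvs vmware vxlan"
    scope = f"--vds-name={vds_name} --vxlan-id={vni}"
    return [
        f"{base} get",                                      # sync state, encapsulation port
        f"{base} network list --vds-name={vds_name}",       # control plane status per VNI
        f"{base} network vtep list {scope}",                # VTEPs known for this VNI
        f"{base} network mac list {scope}",                 # MAC addresses learned
        f"{base} network arp list {scope}",                 # ARP entries cached
        f"{base} network port list {scope}",                # VM port to vmknic mapping
        f"{base} vmknic list --vds-name={vds_name}",        # vmknic to uplink mapping
    ]


for command in logical_switch_checks("Compute_VDS", 5000):
    print(command)
```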

Table 8. Checking Logical Switching—Log Files

Description

Log File

Notes

Hosts are always connected to controllers hosting their VNIs

/etc/vmware/netcpa/config-by-vsm.xml

This file should always list all the controllers in the environment. The config-by-vsm.xml file is created by the netcpa process

Vsfwd only provides channel for netcpa

Netcpad connects to vsfwd on port 15002

The config-by-vsm.xml file is pushed by NSX Manager using vsfwd

If the config-by-vsm.xml file is not correct, look at the vsfwd log

/var/log/vsfwd.log

Parse through this file looking for errors

To restart process: /etc/init.d/vShield-Stateful-Firewall stop|start

Connection to controller is made using netcpa

/var/log/netcpa.log

Parse through this file looking for errors

VDL2 module logs are in vmkernel.log

/var/log/vmkernel.log

Check VDL2 module logs in /var/log/vmkernel.log; look for entries prefixed with “VXLAN:”
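
"Parse through this file looking for errors" can be approximated with a small filter run on a copy of the log. The sketch below is an assumption-laden starting point: the keyword pattern (error, fail, warn) is a guess rather than a list of documented NSX message strings, and the sample lines are placeholders, not real log output. The VXLAN prefix match is case-insensitive.

```python
import re

# Assumed keywords; tune them to the messages you actually see.
ERROR_PATTERN = re.compile(r"error|fail|warn", re.IGNORECASE)


def vxlan_lines(log_text):
    """Lines that mention the VXLAN module prefix, in any letter case."""
    return [line for line in log_text.splitlines() if "vxlan:" in line.lower()]


def error_lines(log_text):
    """Lines that contain one of the assumed error keywords."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.search(line)]


# Placeholder text only; these are not real vmkernel.log entries.
sample = """\
line-1 VXLAN: placeholder text about a logical switch
line-2 NetPort: placeholder text for another subsystem
line-3 vxlan: placeholder text, join failed
"""

print(len(vxlan_lines(sample)))
print(error_lines(sample))
```

The same error_lines function can be pointed at the contents of /var/log/vsfwd.log or /var/log/netcpa.log.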

Table 9. Checking Logical Routing—Commands Run from NSX Manager

Description

Commands on NSX Manager

Notes

Commands for ESG

show edge

CLI commands for Edge Services Gateway (ESG) start with 'show edge'

Commands for DLR Control VM

show edge

CLI commands for Distributed Logical Router (DLR) Control VM start with 'show edge'

Commands for DLR

show logical-router

CLI commands for Distributed Logical Router (DLR) start with 'show logical-router'

List all edges

show edge all

List all the edges that support the central CLI

List all the services and deployment details of an edge

show edge EDGE-ID

View Edge Services Gateway information

List the command options for edge

show edge EDGE-ID ?

View details, such as version, log, NAT, routing table, firewall, configuration, interface, and services

View routing details

show edge EDGE-ID ip ?

View routing info, BGP, OSPF and other details

View routing table

show edge EDGE-ID ip route

View the routing table at Edge

View routing neighbor

show edge EDGE-ID ip ospf neighbor

View routing neighbor relationship

View logical routers connection information

show logical-router host hostID connection

Verify that the number of LIFs connected are correct, the teaming policy is right and the appropriate vDS is being used

List all logical router instances running on the host

show logical-router host hostID dlr all

Verify the number of LIFs and routes

Controller IP should be same on all hosts for a logical router

Control Plane Active should be yes

--brief gives a compact response

Check the routing table on the host

show logical-router host hostID dlr dlrID route

This is the routing table pushed by the controller to all the hosts in the transport zone

This must be same across all the hosts

If some of the routes are missing on a few hosts, try the sync command from the controller (see the NSX Controller routing commands)

The E flag means routes are learned via ECMP

Check the LIFs for a DLR on the host

show logical-router host hostID dlr dlrID interface (all | intName) verbose

The LIF information is pushed to hosts from the controller

Use this command to ensure the host knows about all the LIFs it should
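
Because the DLR routing table must be the same on every host, a quick way to spot a host that is out of step is to compare each host's routes against the union of all routes seen. The sketch below assumes the output of show logical-router host hostID dlr dlrID route has already been reduced to (destination, next hop) pairs per host; parsing the CLI output is not shown. The host IDs and prefixes are hypothetical.

```python
def missing_routes(routes_by_host):
    """Map each host to the routes that other hosts have but it lacks.

    Hosts that hold every route seen anywhere are omitted from the result.
    """
    all_routes = set().union(*routes_by_host.values())
    gaps = {}
    for host, routes in routes_by_host.items():
        absent = all_routes - set(routes)
        if absent:
            gaps[host] = absent
    return gaps


# Hypothetical data: host-b lacks one route that host-a and host-c have.
sample_routes = {
    "host-a": {("0.0.0.0/0", "192.0.2.1"), ("198.51.100.0/24", "192.0.2.1")},
    "host-b": {("0.0.0.0/0", "192.0.2.1")},
    "host-c": {("0.0.0.0/0", "192.0.2.1"), ("198.51.100.0/24", "192.0.2.1")},
}

print(missing_routes(sample_routes))
```

A non-empty result identifies the hosts where a route sync from the controller is worth trying.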

Table 10. Checking Logical Routing—Commands Run from NSX Controller

Description

Commands on NSX Controller

Notes

Find all the Logical Router Instances

show control-cluster logical-routers instance all

This should list the logical router instance and all the hosts in the transport zone which should have the logical router instance on them

In addition, shows the controller that is servicing this logical router

View details of each logical router

show control-cluster logical-routers instance 0x570d4555

The IP column shows the vmk0 IP addresses of all hosts where this DLR exists

View all the interfaces CONNECTED to the logical router

show control-cluster logical-routers interface-summary 0x570d4555

The IP column shows the vmk0 IP addresses of all hosts where this DLR exists

View all the routes learned by this logical router

show control-cluster logical-routers routes 0x570d4555

Note that the IP column shows the vmk0 IP addresses of all hosts where this DLR exists

Show all established network connections, similar to netstat output

show network connections of-type tcp

Check whether the host you are troubleshooting has an established netcpa connection to the controller

Sync interfaces from controller to host

sync control-cluster logical-routers interface-to-host <logical-router-id> <host-ip>

Useful if a new interface was connected to the logical router but is not synced to all hosts

Sync routes from controller to host

sync control-cluster logical-routers route-to-host <logical-router-id> <host-ip>

Useful if some routes are missing on a few hosts but are available on the majority of hosts
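
When several hosts need a resync, the two sync commands above can be generated per host rather than typed by hand. The sketch below only formats the command strings from Table 10; the function name sync_commands and the host IP addresses are hypothetical, and the router ID is the example value used in the table.

```python
SYNC_TARGETS = {
    "interface": "interface-to-host",
    "route": "route-to-host",
}


def sync_commands(kind, logical_router_id, host_ips):
    """Build one controller sync command per host for 'interface' or 'route'."""
    if kind not in SYNC_TARGETS:
        raise ValueError(f"kind must be one of {sorted(SYNC_TARGETS)}")
    prefix = f"sync control-cluster logical-routers {SYNC_TARGETS[kind]}"
    return [f"{prefix} {logical_router_id} {ip}" for ip in host_ips]


# Hypothetical host management IPs.
for command in sync_commands("route", "0x570d4555", ["192.0.2.11", "192.0.2.12"]):
    print(command)
```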

Table 11. Checking Logical Routing—Commands Run from Edge

Description

Commands on Edge or Logical Router Control VM

Notes

View configuration

show configuration <global | bgp | ospf | …>

View the routes learned

show ip route

Make sure the routing and forwarding tables are in sync

View the forwarding table

show ip forwarding

Make sure the routing and forwarding tables are in sync

View the vDR interfaces

show interface

First NIC shown in the output is the vDR interface

The VDR interface is not a real vNIC on that VM

All the subnets attached to VDR are of type INTERNAL

View the other interfaces (management)

show interface

Management/HA interface is a real vNIC on the logical router Control VM

If HA was enabled without specifying an IP address, 169.254.x.x/30 is used

If the management interface is given an IP address, it appears here

Debug the protocol

debug ip ospf

debug ip bgp

Useful to see issues with the configuration (such as mismatched OSPF areas, timers, and wrong ASN)

Note: output appears only on the Edge console (not in an SSH session)

OSPF commands

show configuration ospf

show ip ospf interface

show ip ospf neighbor

show ip route ospf

show ip ospf database

show tech-support (and look for strings “EXCEPTION” and “PROBLEM”)

BGP commands

show configuration bgp

show ip bgp neighbor

show ip bgp

show ip route bgp

show ip forwarding

show tech-support (look for strings “EXCEPTION” and “PROBLEM”)
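
The show tech-support output is long, so searching a saved copy for the two strings named above is usually easier than reading it on screen. The following is a small sketch under that assumption; the function name flagged_lines is hypothetical and the sample text is a placeholder, not real Edge output.

```python
def flagged_lines(text, markers=("EXCEPTION", "PROBLEM")):
    """Return (line number, line) pairs for lines containing any marker string."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if any(marker in line for marker in markers)
    ]


# Placeholder text standing in for a saved tech-support dump.
sample = """\
placeholder line about routing
placeholder line with EXCEPTION in it
placeholder line about interfaces
placeholder line with PROBLEM in it
"""

for number, line in flagged_lines(sample):
    print(number, line)
```

The match is case-sensitive because the table quotes the strings in upper case; pass a different markers tuple if your build prints them differently.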

Table 12. Checking Logical Routing—Log Files from Hosts

Description

Log File

Notes

VDR instance information is pushed to hosts by vsfwd and saved in XML format

/etc/vmware/netcpa/config-by-vsm.xml

If VDR instance is missing on the host, first look at this file to see if the instance is listed

If not, restart vsfwd

Also, use this file to ensure that all of the controllers are known to the host

The above file is pushed by NSX Manager using vsfwd

If the config-by-vsm.xml file is not correct, look at the vsfwd log

/var/log/vsfwd.log

Parse through this file looking for errors

To restart process: /etc/init.d/vShield-Stateful-Firewall stop|start

Connection to controller is made using netcpa

/var/log/netcpa.log

Parse through this file looking for errors

VDL2 module logs are in vmkernel.log

/var/log/vmkernel.log

Check VDL2 module logs in /var/log/vmkernel.log; look for entries prefixed with “vxlan:”

Table 13. Controller Debugging—Command Run from NSX Manager

Description

Command (On NSX Manager)

Notes

List all controllers with state

show controller list all

Shows the list of all controllers and their running state

Table 14. Controller Debugging—Command Run from NSX Controller

Description

Command (On Controller)

Notes

Check controller cluster status

show control-cluster status

Should always show 'Join complete' and 'Connected to Cluster Majority'

Check the stats for flapping connections and messages

show control-cluster core stats

The dropped counter should not change

View the node's activity in relation to joining the cluster initially or after a restart

show control-cluster history

This is great for troubleshooting cluster join issues

View list of nodes in the cluster

show control-cluster startup-nodes

Note that the list does not have to contain ONLY active cluster nodes

This should have a list of all the currently deployed controllers

This list is used by a starting controller to contact other controllers in the cluster

Show all established network connections, similar to netstat output

show network connections of-type tcp

Check whether the host you are troubleshooting has an established netcpa connection to the controller

To restart the controller process

restart controller

Only restarts the main controller process

Forces a re-connection to the cluster

To reboot the controller node

restart system

Reboots the controller VM
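
The two phrases that show control-cluster status should always contain can be checked automatically when output from several controllers is collected. The sketch below assumes the output was saved as text and that the phrases appear as quoted in Table 14; matching is case-insensitive in case the capitalization differs. The function name and sample text are hypothetical.

```python
EXPECTED_PHRASES = ("Join complete", "Connected to Cluster Majority")


def missing_status_phrases(output):
    """Return the expected phrases that do not appear anywhere in the output."""
    lowered = output.lower()
    return [phrase for phrase in EXPECTED_PHRASES if phrase.lower() not in lowered]


# Placeholder text, not real controller output.
sample = """\
placeholder status line: Join complete
placeholder status line: some other majority state
"""

print(missing_status_phrases(sample))
```

An empty list means both phrases were found; anything else points to a controller that warrants a look at show control-cluster history.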

Table 15. Controller Debugging—Log Files on NSX Controller

Description

Log File

Notes

View controller history and recent joins, restarts, and so on

show control-cluster history

Great troubleshooting tool for controller issues especially around clustering

Check for slow disk

show log cloudnet/cloudnet_java-zookeeper<timestamp>.log filtered-by fsync

A reliable way to check for slow disks is to look for "fsync" messages in the cloudnet_java-zookeeper log

If sync takes more than 1 second, ZooKeeper prints this message, and it is a good indication that something else was utilizing the disk at that time

Check for slow/malfunctioning disk

show log syslog filtered-by collectd

Messages like the one in the sample output about “collectd” tend to correlate with slow or malfunctioning disks

Check for diskspace usage

show log syslog filtered-by freespace:

There is a background job called “freespace” that periodically cleans up old logs and other files from the disk when the space usage reaches some threshold. In some cases, if the disk is small and/or filling up very fast, you’ll see a lot of freespace messages. This could be an indication that the disk filled up

Find currently active cluster members

show log syslog filtered-by Active cluster members

Lists the node-id for currently active cluster members. May need to look in older syslogs as this message is not printed all the time.

View the core controller logs

show log cloudnet/cloudnet_java-zookeeper.20150703-165223.3702.log

There may be multiple ZooKeeper logs; look at the file with the latest timestamp

This file has information about controller cluster master election and other information related to the distributed nature of controllers

View the core controller logs

show log cloudnet/cloudnet.nsx-controller.root.log.INFO.20150703-165223.3668

Main controller working logs, like LIF creation, connection listener on 1234, sharding
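
Picking the latest ZooKeeper log by eye is error-prone when many files exist. The sketch below selects it from a list of file names, assuming the pattern seen in the example above: a YYYYMMDD-HHMMSS timestamp followed by a numeric suffix. That reading of the file name is an assumption, and the second file name in the sample is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: cloudnet_java-zookeeper.<YYYYMMDD-HHMMSS>.<number>.log
STAMP = re.compile(r"cloudnet_java-zookeeper\.(\d{8}-\d{6})\.\d+\.log$")


def latest_zookeeper_log(names):
    """Return the file name with the newest embedded timestamp, or None."""
    stamped = []
    for name in names:
        match = STAMP.search(name)
        if match:
            when = datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
            stamped.append((when, name))
    return max(stamped)[1] if stamped else None


names = [
    "cloudnet_java-zookeeper.20150701-101500.2811.log",  # hypothetical older file
    "cloudnet_java-zookeeper.20150703-165223.3702.log",  # name from the table
]
print(latest_zookeeper_log(names))
```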

Table 16. Checking Distributed Firewall—Commands Run from NSX Manager

Description

Commands on NSX Manager

Notes

View a VM's information

show vm VM-ID

Details such as DC, Cluster, Host, VM Name, vNICs, dvfilters installed

View particular virtual NIC information

show vnic VNIC-ID

Details such as VNIC name, MAC address, portgroup, applied filters

View all cluster information

show dfw cluster all

Cluster Name, Cluster Id, Datacenter Name, Firewall Status

View particular cluster information

show dfw cluster CLUSTER-ID

Host Name, Host Id, Installation Status

View dfw related host information

show dfw host HOST-ID

VM Name, VM Id, Power Status

View details within a dvfilter

show dfw host HOST-ID filter filterID <option>

List rules, stats, address sets etc for each VNIC

View DFW information for a VM

show dfw vm VM-ID

View VM's name, VNIC ID, filters, and so on

View VNIC details

show dfw vnic VNIC-ID

View VNIC name, ID, MAC address, portgroup, filter

List the filters installed per vNIC

show dfw host hostID summarize-dvfilter

Find the VM/vNIC of interest and get the name field to use in the next commands as filter

View rules for a specific filter/vNIC

show dfw host hostID filter filterID rules

show dfw vnic nicID

View details of an address set

show dfw host hostID filter filterID addrsets

The rules display only the names of address sets; use this command to expand the contents of an address set

Spoofguard details per vNIC

show dfw host hostID filter filterID spoofguard

Check whether SpoofGuard is enabled and what the current IP/MAC is

View details of flow records

show dfw host hostID filter filterID flows

If flow monitoring is enabled, host sends flow information periodically to NSX Manager

Use this command to see flows per vNIC

View statistics for each rule for a vNIC

show dfw host hostID filter filterID stats

This is useful to see if rules are being hit

Table 17. Checking Distributed Firewall—Commands Run from Hosts

Description

Commands on Host

Notes

Lists VIBs downloaded on the host

esxcli software vib list | grep vsip

Check to make sure the right VIB version is downloaded

Details on system modules currently loaded

esxcli system module get -m vsip

Check to make sure that the module was installed/loaded

Process list

ps | grep vsfwd

Check whether the vsfwd process is running with several threads

Daemon command

/etc/init.d/vShield-Stateful-Firewall {start|stop|status|restart}

Check whether the daemon is running and restart it if needed

View network connection

esxcli network ip connection list | grep 5671

Check if the host has TCP connectivity to NSX Manager
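
Piping the connection list through grep 5671 shows any line that mentions the port, whatever its state. A slightly stricter check, sketched below, keeps only lines whose state is ESTABLISHED and where some address field ends in the given port. It assumes the output of esxcli network ip connection list was saved as text with whitespace-separated columns; the sample listing is abbreviated and hypothetical, and the function name is not part of any VMware tool.

```python
def established_to_port(listing, port):
    """Return lines in ESTABLISHED state that have an address ending in :port."""
    suffix = f":{port}"
    hits = []
    for line in listing.splitlines():
        fields = line.split()
        if "ESTABLISHED" in fields and any(field.endswith(suffix) for field in fields):
            hits.append(line)
    return hits


# Hypothetical, abbreviated listing; real output has more columns.
sample = """\
Proto  Recv Q  Send Q  Local Address     Foreign Address  State        World Name
tcp         0       0  192.0.2.51:23251  192.0.2.15:5671  ESTABLISHED  vsfwd
tcp         0       0  192.0.2.51:41002  192.0.2.31:1234  ESTABLISHED  netcpa-worker
tcp         0       0  192.0.2.51:41003  192.0.2.32:1234  TIME_WAIT
"""

print(len(established_to_port(sample, 5671)))  # message bus connection to NSX Manager
print(len(established_to_port(sample, 1234)))  # controller connections
```

The function does not distinguish local from foreign addresses, so a local listener on the same port would also match; inspect the returned lines before drawing conclusions.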

Table 18. Checking Distributed Firewall—Log Files on Hosts

Description

Log

Notes

Process log

/var/log/vsfwd.log

vsfwd daemon log, useful for vsfwd process, NSX Manager connectivity, and RabbitMQ troubleshooting

Packet logs dedicated file

/var/log/dfwpktlogs.log

Dedicated log file for packet logs

Packet capture at the dvfilter

pktcap-uw --dvfilter nic-1413082-eth0-vmware-sfw.2 --outfile test.pcap