You can use the NSX Command Line Interface (CLI) to troubleshoot problems.
Description | Commands on NSX Manager | Notes |
---|---|---|
List all clusters to get the cluster IDs | show cluster all | View all cluster information |
List all the hosts in the cluster to get the host IDs | show cluster clusterID | View the list of hosts in the cluster, the host IDs, and the host-prep installation status |
List all the VMs on a host | show host hostID | View particular host information, VMs, VM IDs, and power status |
NSX version | ESXi version | VIBs | Modules |
---|---|---|---|
6.3.2 and earlier | 6.0 and later | esx-vxlan and esx-vsip | vdl2, vdrb, vsip, dvfilter-switch-security, bfd, traceflow |
6.3.3 and later | 6.0 and later | esx-nsxv | nsx-vdl2, nsx-vdrb, nsx-vsip, nsx-dvfilter-switch-security, nsx-core, nsx-bfd, nsx-traceflow |
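
The version split in the table above can be expressed as a small lookup. This is a minimal POSIX shell sketch, not a VMware tool: the function name `vibs_for_nsx` is hypothetical, and it assumes the NSX version is supplied as a dotted string such as 6.3.2.

```shell
# Hypothetical helper: print the VIB names the table lists for an NSX version.
vibs_for_nsx() {
  # Split "major.minor.patch" on dots into positional parameters.
  old_ifs=$IFS; IFS=.; set -- $1; IFS=$old_ifs
  major=${1:-0}; minor=${2:-0}; patch=${3:-0}
  # 6.3.3 and later ship a single VIB; earlier releases ship two.
  if [ "$major" -gt 6 ] ||
     { [ "$major" -eq 6 ] && [ "$minor" -gt 3 ]; } ||
     { [ "$major" -eq 6 ] && [ "$minor" -eq 3 ] && [ "$patch" -ge 3 ]; }; then
    echo "esx-nsxv"
  else
    echo "esx-vxlan esx-vsip"
  fi
}
```

For example, `vibs_for_nsx 6.4.0` prints `esx-nsxv`, which is the name to pass to `esxcli software vib get --vibname`.
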
Description | Commands on Host | Notes |
---|---|---|
VIBs present depend on the NSX and ESXi versions. See table Names of VIBs and Modules Installed on Hosts for details on which modules to check on your installation. | esxcli software vib get --vibname <name> | Check the version and date installed. esxcli software vib list displays a list of all VIBs on the system |
List all the system modules currently loaded in the system | esxcli system module list | Older equivalent command: vmkload_mod -l \| grep -E "vdl2\|vdrb\|vsip\|dvfilter-switch-security" |
Modules present depend on the NSX and ESXi versions. See table Names of VIBs and Modules Installed on Hosts for details on which modules to check on your installation. | esxcli system module get -m <name> | Run the command for each module |
Two User World Agents (UWA): control plane agent, firewall agent | /etc/init.d/vShield-Stateful-Firewall status and /etc/init.d/netcpad status | |
Check UWA connections: port 1234 to controllers and port 5671 to NSX Manager | esxcli network ip connection list \| grep 1234 and esxcli network ip connection list \| grep 5671 | Port 1234 is the controller TCP connection. Port 5671 is the message bus TCP connection |
Check EAM status | vSphere Web Client, check Administration > vSphere ESX Agent Manager | |
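
The table asks you to run `esxcli system module get -m <name>` once per module. A loop saves retyping. The sketch below is an illustration rather than a supported script: the helper name `check_nsx_modules` and the `RUN` dry-run variable are hypothetical, and the module list is the NSX 6.3.3 and later set from the table of VIBs and modules. Substitute the earlier names on older installations.

```shell
# Module names for NSX 6.3.3 and later, copied from the VIBs and modules table.
NSX_MODULES="nsx-vdl2 nsx-vdrb nsx-vsip nsx-dvfilter-switch-security nsx-core nsx-bfd nsx-traceflow"

# Hypothetical helper: run the per-module check for every name in the list.
# Set RUN=echo to print the commands instead of executing them (dry run).
check_nsx_modules() {
  for module in $NSX_MODULES; do
    ${RUN:-} esxcli system module get -m "$module"
  done
}
```

Running `RUN=echo` and then `check_nsx_modules` on a workstation prints the seven commands without needing an ESXi shell. On a host, leave `RUN` unset.
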
Description | Host Networking Commands | Notes |
---|---|---|
List physical NICs/vmnic | esxcli network nic list | Check the NIC type, driver type, link status, MTU |
Physical NIC details | esxcli network nic get -n vmnic# | Check the driver and firmware versions along with other details |
List vmk NICs with IP addresses/MAC/MTU, and so on | esxcli network ip interface ipv4 get | To ensure VTEPs are correctly instantiated |
Details of each vmk NIC, including vDS information | esxcli network ip interface list | To ensure VTEPs are correctly instantiated |
Details of each vmk NIC, including vDS info for VXLAN vmks | esxcli network ip interface list --netstack=vxlan | To ensure VTEPs are correctly instantiated |
Find the VDS name associated with this host’s VTEP | esxcli network vswitch dvs vmware vxlan list | To ensure VTEPs are correctly instantiated |
Ping from VXLAN-dedicated TCP/IP stack | ping ++netstack=vxlan -I vmk1 x.x.x.x | To troubleshoot VTEP communication issues: add the options -d -s 1572 to make sure that the MTU of the transport network is correct for VXLAN |
View routing table of VXLAN-dedicated TCP/IP stack | esxcli network ip route ipv4 list -N vxlan | To troubleshoot VTEP communication issues |
View ARP table of VXLAN-dedicated TCP/IP stack | esxcli network ip neighbor list -N vxlan | To troubleshoot VTEP communication issues |
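
The 1572-byte size in the ping note follows from header arithmetic: an unfragmented IPv4 ICMP echo carries a 20-byte IP header and an 8-byte ICMP header, so a transport MTU of 1600 leaves 1600 - 28 = 1572 bytes of payload. The sketch below assumes IPv4 without options. The function names are hypothetical, and the 1600-byte MTU is inferred from the 1572 figure, not stated in the table.

```shell
# Hypothetical helper: largest ICMP payload that fits in one unfragmented packet.
# Assumes IPv4 without options: 20-byte IP header + 8-byte ICMP header.
ping_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# Print (not run) the ping command for a given MTU, VTEP vmknic, and target IP.
vtep_ping_cmd() {
  echo "ping ++netstack=vxlan -d -s $(ping_payload_for_mtu "$1") -I $2 $3"
}
```

If the transport network uses a different MTU, pass that value instead, so that the -d (do not fragment) test probes the size you actually configured.
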
Description | Log File | Notes |
---|---|---|
From NSX Manager | show manager log follow | Tails the NSX Manager logs. Use for live troubleshooting |
Any installation related logs for a host | /var/log/esxupdate.log | |
Host-related issues: VMkernel warnings, messages, alerts, and availability report | /var/log/vmkernel.log, /var/log/vmksummary.log, /var/log/vmkwarning.log | |
Module load failure is captured | /var/log/syslog | IXGBE driver failures and NSX module dependency failures are key indicators |
On vCenter, ESX Agent Manager is responsible for updates | In vCenter logs, eam.log | |
Description | Command on NSX Manager | Notes |
---|---|---|
List all logical switches | show logical-switch list all | List all the logical switches, their UUIDs to be used in API, transport zone, and vdnscope |
Description | Commands on Controller | Notes |
---|---|---|
Find the controller that is the owner of the VNI | show control-cluster logical-switches vni 5000 | Note the controller IP address in the output and SSH to it |
Find all the hosts that are connected to this controller for this VNI | show control-cluster logical-switches connection-table 5000 | The source IP address in the output is the management interface of the host, and the port number is the source port of the TCP connection |
Find the VTEPs registered to host this VNI | show control-cluster logical-switches vtep-table 5002 | |
List the MAC addresses learned for VMs on this VNI | show control-cluster logical-switches mac-table 5002 | Verify that the MAC address is actually on the VTEP reporting it |
List the ARP cache populated by the VM IP updates | show control-cluster logical-switches arp-table 5002 | ARP cache expires in 180 seconds |
For a specific host/controller pair, find out which VNIs the host has joined | show control-cluster logical-switches joined-vnis <host_mgmt_ip> | |
Description | Command on Hosts | Notes |
---|---|---|
Check if the host VXLAN is in-sync or not | esxcli network vswitch dvs vmware vxlan get | Shows the sync state and port used for encapsulation |
View VM attached and local switch port ID for datapath captures | net-stats -l | A convenient way to get the switch port for a specific VM |
Verify VXLAN kernel module vdl2 is loaded | esxcli system module get -m vdl2 | Shows full detail of the specified module. Verify the version |
Verify correct VXLAN VIB version is installed. See table Names of VIBs and Modules Installed on Hosts for details on which VIBs to check on your installation. | esxcli software vib get --vibname esx-vxlan or esxcli software vib get --vibname esx-nsxv | Shows full detail of the specified VIB. Verify the version and date |
Verify the host knows about other hosts in the logical switch | esxcli network vswitch dvs vmware vxlan network vtep list --vxlan-id=5001 --vds-name=Compute_VDS | Shows list of all the VTEPs that this host knows about that are hosting VNI 5001 |
Verify control plane is up and active for a logical switch | esxcli network vswitch dvs vmware vxlan network list --vds-name Compute_VDS | Make sure the controller connection is up and the Port/MAC count matches the VMs on the LS on this host |
Verify host has learnt MAC addresses of all VMs | esxcli network vswitch dvs vmware vxlan network mac list --vds-name Compute_VDS --vxlan-id=5000 | This should list all the MACs for the VNI 5000 VMs on this host |
Verify host has locally cached ARP entry for remote VMs | esxcli network vswitch dvs vmware vxlan network arp list --vds-name Compute_VDS --vxlan-id=5000 | Verify host has locally cached ARP entry for remote VMs |
Verify VM is connected to LS and mapped to a local vmknic. Also shows what vmknic ID a VM dvPort is mapped to | esxcli network vswitch dvs vmware vxlan network port list --vds-name Compute_VDS --vxlan-id=5000 | The vdrport is always listed as long as the VNI is attached to a router |
View vmknic IDs and what switchport/uplink they are mapped to | esxcli network vswitch dvs vmware vxlan vmknic list --vds-name=DSwitch-Res01 | |
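
The vtep, mac, arp, and port listings above share the same command shape and differ only in one keyword, so checking a single VNI can be wrapped in a loop. The following is a minimal sketch. The helper name `vxlan_vni_checks` and the `RUN` dry-run variable are hypothetical, while the esxcli namespace and options are the ones shown in the table.

```shell
# Hypothetical helper: run the vtep, mac, arp, and port listings for one VNI.
# $1 = VDS name (for example Compute_VDS), $2 = VNI (for example 5000).
# Set RUN=echo to print the commands instead of executing them (dry run).
vxlan_vni_checks() {
  for table in vtep mac arp port; do
    ${RUN:-} esxcli network vswitch dvs vmware vxlan network "$table" list \
      --vds-name="$1" --vxlan-id="$2"
  done
}
```

On a host, `vxlan_vni_checks Compute_VDS 5000` runs the four listings in sequence, so the VTEP, MAC, ARP, and port views of the same VNI can be compared side by side.
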
Description | Log File | Notes |
---|---|---|
Hosts are always connected to controllers hosting their VNIs | /etc/vmware/netcpa/config-by-vsm.xml | This file should always list all the controllers in the environment. The config-by-vsm.xml file is created by the netcpa process |
The config-by-vsm.xml file is pushed by NSX Manager using vsfwd. If the config-by-vsm.xml file is not correct, look at the vsfwd log | /var/log/vsfwd.log | Parse through this file looking for errors. To restart the process: /etc/init.d/vShield-Stateful-Firewall stop\|start |
Connection to controller is made using netcpa | /var/log/netcpa.log | Parse through this file looking for errors |
Logical switching module logs are in vmkernel.log | /var/log/vmkernel.log | Check for logical switching module entries in /var/log/vmkernel.log, which are prefixed with “VXLAN:” |
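
To pull only the logical switching entries out of vmkernel.log, filter on the prefix. This is a small sketch with a hypothetical helper name. It matches case-insensitively, on the assumption that the prefix may be logged as either "VXLAN:" or "vxlan:".

```shell
# Hypothetical helper: show the most recent VXLAN-prefixed lines from a log file.
# $1 = log path (defaults to /var/log/vmkernel.log)
# $2 = number of lines to show (defaults to 20)
vxlan_log_tail() {
  grep -i 'vxlan:' "${1:-/var/log/vmkernel.log}" | tail -n "${2:-20}"
}
```

Called with no arguments on a host, it prints the last 20 matching lines. Pass a path to inspect a rotated or copied log.
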
Description | Commands on NSX Manager | Notes |
---|---|---|
Commands for ESG | show edge | CLI commands for Edge Services Gateway (ESG) start with 'show edge' |
Commands for DLR Control VM | show edge | CLI commands for Distributed Logical Router (DLR) Control VM start with 'show edge' |
Commands for DLR | show logical-router | CLI commands for Distributed Logical Router (DLR) start with 'show logical-router' |
List all edges | show edge all | List all the edges that support the central CLI |
List all the services and deployment details of an edge | show edge edgeID | View Edge Services Gateway information |
List the command options for edge | show edge edgeID ? | View details, such as version, log, NAT, routing table, firewall, configuration, interface, and services |
View routing details | show edge edgeID ip ? | View routing info, BGP, OSPF and other details |
View routing table | show edge edgeID ip route | View the routing table at Edge |
View routing neighbor | show edge edgeID ip ospf neighbor | View routing neighbor relationship |
View bgp routing | show edge edgeID ip bgp | View entries in the Border Gateway Protocol (BGP) routing table |
View logical routers connection information | show logical-router host hostID connection | Verify that the number of LIFs connected is correct, the teaming policy is right, and the appropriate vDS is being used |
List all logical router instances running on the host | show logical-router host hostID dlr all | Verify the number of LIFs and routes. Controller IP should be the same on all hosts for a logical router. Control Plane Active should be yes. --brief gives a compact response |
Check the routing table on the host | show logical-router host hostID dlr dlrID route | This is the routing table pushed by the controller to all the hosts in the transport zone. It must be the same across all the hosts. If some routes are missing on a few hosts, try the sync command on the controller (sync control-cluster logical-routers route-to-host). The E flag means routes are learned via ECMP |
Check the LIFs for a DLR on the host | show logical-router host hostID dlr dlrID interface (all \| intName) verbose | The LIF information is pushed to hosts from the controller. Use this command to ensure the host knows about all the LIFs it should |
Show the routing log | show log routing [follow \| reverse] | follow: Update the displayed log. reverse: Show the log in reverse chronological order. |
Description | Commands on NSX Controller | Notes |
---|---|---|
Find all the Logical Router Instances | show control-cluster logical-routers instance all | This should list the logical router instance and all the hosts in the transport zone that should have the logical router instance on them. In addition, it shows the controller that is servicing this logical router |
View details of each logical router | show control-cluster logical-routers instance 0x570d4555 | The IP column shows the vmk0 IP addresses of all hosts where this DLR exists |
View all the interfaces CONNECTED to the logical router | show control-cluster logical-routers interface-summary 0x570d4555 | The IP column shows the vmk0 IP addresses of all hosts where this DLR exists |
View all the routes learned by this logical router | show control-cluster logical-routers routes 0x570d4555 | Note that the IP column shows the vmk0 IP addresses of all hosts where this DLR exists |
Show all established network connections, similar to netstat output | show network connections of-type tcp | Check whether the host you are troubleshooting has a netcpa connection in the Established state to the controller |
Sync interfaces from controller to host | sync control-cluster logical-routers interface-to-host <logical-router-id> <host-ip> | Useful if a new interface was connected to the logical router but is not synced to all hosts |
Sync routes from controller to host | sync control-cluster logical-routers route-to-host <logical-router-id> <host-ip> | Useful if some routes are missing on a few hosts but are available on the majority of hosts |
Description | Commands on Edge or Logical Router Control VM | Notes |
---|---|---|
View configuration | show configuration <global \| bgp \| ospf \| …> | |
View the routes learned | show ip route | Make sure the routing and forwarding tables are in sync |
View the forwarding table | show ip forwarding | Make sure the routing and forwarding tables are in sync |
View the distributed logical router interfaces | show interface | The first NIC shown in the output is the distributed logical router interface. The distributed logical router interface is not a real vNIC on that VM. All the subnets attached to the distributed logical router are of type INTERNAL |
View the other interfaces (management) | show interface | The Management/HA interface is a real vNIC on the logical router Control VM. If HA was enabled without specifying an IP address, 169.254.x.x/30 is used. If the management interface is given an IP address, it appears here |
Debug the protocol | debug ip ospf or debug ip bgp | Useful to see issues with the configuration (such as mismatched OSPF areas, timers, and wrong ASN). Note: output is only seen on the console of the Edge (not via an SSH session) |
OSPF commands | show configuration ospf, show ip ospf interface, show ip ospf neighbor, show ip route ospf, show ip ospf database, show tech-support (and look for strings “EXCEPTION” and “PROBLEM”) | |
BGP commands | show configuration bgp, show ip bgp neighbor, show ip bgp, show ip route bgp, show ip forwarding, show tech-support (and look for strings “EXCEPTION” and “PROBLEM”) | |
Description | Log File | Notes |
---|---|---|
Distributed Logical Router instance information is pushed to hosts by vsfwd and saved in XML format | /etc/vmware/netcpa/config-by-vsm.xml | If a distributed logical router instance is missing on the host, first look at this file to see if the instance is listed. If not, restart vsfwd. Also, use this file to ensure that all of the controllers are known to the host |
The above file is pushed by NSX Manager using vsfwd. If the config-by-vsm.xml file is not correct, look at the vsfwd log | /var/log/vsfwd.log | Parse through this file looking for errors. To restart the process: /etc/init.d/vShield-Stateful-Firewall stop\|start |
Connection to controller is made using netcpa | /var/log/netcpa.log | Parse through this file looking for errors |
Logical switching module logs are in vmkernel.log | /var/log/vmkernel.log | Check for logical switching module entries in /var/log/vmkernel.log, which are prefixed with “vxlan:” |
Description | Command (On NSX Manager) | Notes |
---|---|---|
List all controllers with state | show controller list all | Shows the list of all controllers and their running state |
Description | Command (On Controller) | Notes |
---|---|---|
Check controller cluster status | show control-cluster status | Should always show 'Join complete' and 'Connected to Cluster Majority' |
Check the stats for flapping connections and messages | show control-cluster core stats | The dropped counter should not change |
View the node's activity in relation to joining the cluster initially or after a restart | show control-cluster history | This is great for troubleshooting cluster join issues |
View list of nodes in the cluster | show control-cluster startup-nodes | The list does not have to contain only active cluster nodes. It should list all the currently deployed controllers. A starting controller uses this list to contact other controllers in the cluster |
Show all established network connections, similar to netstat output | show network connections of-type tcp | Check whether the host you are troubleshooting has a netcpa connection in the Established state to the controller |
To restart the controller process | restart controller | Only restarts the main controller process. Forces a re-connection to the cluster |
To reboot the controller node | restart system | Reboots the controller VM |
Description | Log File | Notes |
---|---|---|
View controller history and recent joins, restarts, and so on | show control-cluster history | Great troubleshooting tool for controller issues, especially around clustering |
Check for slow disk | show log cloudnet/cloudnet_java-zookeeper<timestamp>.log filtered-by fsync | A reliable way to check for slow disks is to look for "fsync" messages in the cloudnet_java-zookeeper log. If a sync takes more than 1 second, ZooKeeper prints this message, and it is a good indication that something else was using the disk at that time |
Check for slow/malfunctioning disk | show log syslog filtered-by collectd | Messages about “collectd” in the output tend to correlate with slow or malfunctioning disks |
Check for disk space usage | show log syslog filtered-by freespace: | There is a background job called “freespace” that periodically cleans up old logs and other files from the disk when the space usage reaches some threshold. In some cases, if the disk is small and/or filling up very fast, you’ll see a lot of freespace messages. This could be an indication that the disk filled up |
Find currently active cluster members | show log syslog filtered-by Active cluster members | Lists the node-id for currently active cluster members. May need to look in older syslogs as this message is not printed all the time. |
View the core controller logs | show log cloudnet/cloudnet_java-zookeeper.20150703-165223.3702.log | There may be multiple zookeeper logs; look at the latest timestamped file. This file has information about controller cluster master election and other information related to the distributed nature of controllers |
View the core controller logs | show log cloudnet/cloudnet.nsx-controller.root.log.INFO.20150703-165223.3668 | Main controller working logs, like LIF creation, connection listener on 1234, sharding |
Description | Commands on NSX Manager | Notes |
---|---|---|
View a VM's information | show vm vmID | Details such as DC, Cluster, Host, VM Name, vNICs, dvfilters installed |
View particular virtual NIC information | show vnic vnicID | Details such as VNIC name, MAC address, pg, applied filters |
View all cluster information | show dfw cluster all | Cluster Name, Cluster Id, Datacenter Name, Firewall Status |
View particular cluster information | show dfw cluster clusterID | Host Name, Host Id, Installation Status |
View dfw related host information | show dfw host hostID | VM Name, VM Id, Power Status |
View details within a dvfilter | show dfw host hostID filter filterID <option> | List rules, stats, address sets, and so on for each VNIC |
View DFW information for a VM | show dfw vm vmID | View VM's name, VNIC ID, filters, and so on |
View VNIC details | show dfw vnic vnicID | View VNIC name, ID, MAC address, portgroup, filter |
List the filters installed per vNIC | show dfw host hostID summarize-dvfilter | Find the VM/vNIC of interest and get the name field to use in the next commands as filter |
View rules for a specific filter/vNIC | show dfw host hostID filter filterID rules and show dfw vnic vnicID | |
View details of an address set | show dfw host hostID filter filterID addrsets | The rules display only address sets. Use this command to expand what is part of an address set |
Spoofguard details per vNIC | show dfw host hostID filter filterID spoofguard | Check if SpoofGuard is enabled and what the current IP/MAC is |
View details of flow records | show dfw host hostID filter filterID flows | If flow monitoring is enabled, the host sends flow information periodically to NSX Manager. Use this command to see flows per vNIC |
View statistics for each rule for a vNIC | show dfw host hostID filter filterID stats | This is useful to see if rules are being hit |
Description | Commands on Host | Notes |
---|---|---|
Lists VIBs downloaded on the host. See table Names of VIBs and Modules Installed on Hosts for details on which VIBs to check on your installation. | esxcli software vib list \| grep esx-vsip or esxcli software vib list \| grep esx-nsxv | Check to make sure the right VIB version is downloaded |
Details on system modules currently loaded. See table Names of VIBs and Modules Installed on Hosts for details on which modules to check on your installation. | esxcli system module get -m vsip or esxcli system module get -m nsx-vsip | Check to make sure that the module was installed/loaded |
Process list | ps \| grep vsfwd | View if the vsfwd process is running with several threads |
Daemon command | /etc/init.d/vShield-Stateful-Firewall {start\|stop\|status\|restart} | Check if the daemon is running and restart if needed |
View network connection | esxcli network ip connection list \| grep 5671 | Check if the host has TCP connectivity to NSX Manager |
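
A plain `grep 5671` also matches the number where it appears in other columns, so it can help to narrow the match to the port and the connection state. The filter below is a sketch under assumptions: the helper name is hypothetical, and it assumes the connection list prints addresses as ip:port followed by whitespace and shows the state as ESTABLISHED. Check both against the output on your host.

```shell
# Hypothetical helper: count ESTABLISHED connections that involve a TCP port.
# Reads `esxcli network ip connection list` output on stdin; $1 = port number.
# Assumes addresses appear as ip:port followed by whitespace.
established_on_port() {
  grep ":$1[[:space:]]" | grep -c ESTABLISHED
}
```

For example, `esxcli network ip connection list | established_on_port 5671` prints the number of established message bus connections. A result of 0 suggests that vsfwd has no working connection to NSX Manager.
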
Description | Log | Notes |
---|---|---|
Process log | /var/log/vsfwd.log | vsfwd daemon log, useful for vsfwd process, NSX Manager connectivity, and RabbitMQ troubleshooting |
Packet logs dedicated file | /var/log/dfwpktlogs.log | Dedicated log file for packet logs |
Packet capture at the dvfilter | pktcap-uw --dvfilter nic-1413082-eth0-vmware-sfw.2 --outfile test.pcap | |
Description | Command on NSX Manager | Notes |
---|---|---|
Show all packet capture sessions | show packet capture sessions | Shows details of all packet capture sessions. |
Show packet capture file content | debug packet capture display session <capture-id> parameters [optional parameters] | Shows the packet capture file content. |
Capture vNIC | debug packet capture host <host-id> vnic <vnic-id> dir <direction> parameters [optional parameters] | Captures packets for a specific VM vNIC. Direction has two options, input and output. Input is for traffic going into the vNIC, and output is for traffic going out from the vNIC. |
Capture vdrPort | debug packet capture host <host-id> vdrport dir <direction> parameters [optional parameters] | Captures packets for a specific port of virtual distributed router (vDR). Direction has two options, input and output. Input is for traffic going into vDR, and output is for traffic going out from vDR. |
Capture VMKNic | debug packet capture host <host-id> vmknic <vmknic-name> dir <direction> parameters [optional parameters] | Captures packets for a specific VMKNic. Direction has two options, input and output. Input is for traffic going into VMKNic, and output is for traffic going out from VMKNic. |
Delete packet capture session | debug packet capture clear session <capture-id> | Deletes a specific packet capture session. |