This topic provides information on understanding and troubleshooting VMware NSX 6.x Distributed Firewall (DFW).

Problem

  • Publishing Distributed Firewall rules fails.
  • Updating Distributed Firewall rules fails.

Cause

Validate that each troubleshooting step below is true for your environment. Each step provides instructions or a link to a document to eliminate possible causes and take corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. After each step, re-attempt to update/publish the Distributed Firewall rules.

Solution

  1. Verify that the NSX VIBs are successfully installed on each of the ESXi hosts in the cluster. To do this, on each of the ESXi host that is on the cluster, run these commands.
    # esxcli software vib list | grep vsip
    esx-vsip                       6.0.0-0.0.4744062  VMware  VMwareCertified   2017-01-04 
    
    # esxcli software vib list | grep vxlan
    esx-vxlan                      6.0.0-0.0.4744062  VMware  VMwareCertified   2017-01-04
    
    Starting in NSX 6.3.3 with ESXi 6.0 or later, the esx-vxlan and esx-vsip VIBs are replaced with esx-nsxv.
    # esxcli software vib list | grep nsxv
    esx-nsxv                       6.0.0-0.0.6216823  VMware  VMwareCertified   2017-08-10
  2. On the ESXi hosts, verify the vShield-Stateful-Firewall service is in a running state.
    For example:
    # /etc/init.d/vShield-Stateful-Firewall status
    
    vShield-Stateful-Firewall is running
  3. Verify that the Message Bus is communicating properly with the NSX Manager.
    The process is automatically launched by the watchdog script and restarts the process if it terminates for an unknown reason. Run this command on each of the ESXi hosts on the cluster.
    For example:
    # ps | grep vsfwd
    
    107557 107557 vsfwd /usr/lib/vmware/vsfw/vsfwd
    

    There should be at least 12 or more vsfwd processes running in the command output. If there are less (most likely only 2) processes running, vsfwd is not running correctly.

  4. Verify that port 5671 is opened for communication in the firewall configuration.
    This command shows the VSFWD connectivity to the RabbitMQ broker. Run this command on ESXi hosts to see a list of connections from the vsfwd process on the ESXi host to the NSX Manager. Ensure that the port 5671 is open for communication in any of the external firewall on the environment. Also, there should be at least two connections on port 5671. There can be more connections on port 5671 as there are NSX Edge virtual machines deployed on the ESXi host which also establish connections to the RMQ broker.
    For example:
    # esxcli network ip connection list |grep 5671
    
    tcp         0       0  192.168.110.51:30133            192.168.110.15:5671   ESTABLISHED  10949155  newreno  vsfwd
    tcp         0       0  192.168.110.51:39156            192.168.110.15:5671   ESTABLISHED  10949155  newreno  vsfwd
    
  5. Verify that VSFWD is configured.

    This command should display the NSX Manager IP address.

    # esxcfg-advcfg -g /UserVars/RmqIpAddress
  6. If you are using a host-profile for this ESXi host, verify that the RabbitMQ configuration is not set in the host profile.
  7. Verify if the RabbitMQ credentials of the ESXi host are out of sync with the NSX Manager. Download the NSX Manager Tech Support Logs. After gathering all the NSX Manager Tech Support logs, search all the logs for entries similar to:
    Replace host-420 with the mo-id of the suspect host.
    PLAIN login refused: user 'uw-host-420' - invalid credentials.
  8. If such entries are found on the logs for the suspected ESXi host, resynchronize the message bus.

    To resynchronize the message bus, use REST API. To better understand the issue, collect the logs immediately after the Message Bus is resynchronized.

    HTTP Method : POST 
    Headers , 
    Authorization : base64encoded value of username password
    Accept : application/xml
    Content-Type : application/xml
    Request:
    
    POST https://NSX_Manager_IP/api/2.0/nwfabric/configure?action=synchronize
    
    Request Body:
    
    <nwFabricFeatureConfig>
    <featureId>com.vmware.vshield.vsm.messagingInfra</featureId>
    <resourceConfig>
    <resourceId>{HOST/CLUSTER MOID}</resourceId>
    </resourceConfig>
    </nwFabricFeatureConfig>
  9. Use the export host-tech-support <host-id> scp <uid@ip:/path> command to gather host-specific firewall logs.
    For example:
    nsxmgr# export host-tech-support host-28 scp [email protected]
    Generating logs for Host: host-28...
    
  10. Use the show dfw host host-id summarize-dvfilter command to verify that the firewall rules are deployed on a host and are applied to virtual machines.
    In the output, module: vsip shows that the DFW module is loaded and running. The name shows the firewall that is running on each vNic.

    You can get the host IDs by running the show dfw cluster all command to get the cluster domain IDs, followed by the show dfw cluster domain-id to get the host IDs.

    For example:
    # show dfw host host-28 summarize-dvfilter
    
    Fastpaths:
    agent: dvfilter-faulter, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: dvfilter
    agent: ESXi-Firewall, refCount: 5, rev: 0x1010000, apiRev: 0x1010000, module: esxfw
    agent: dvfilter-generic-vmware, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: dvfilter-generic-fastpath
    agent: dvfilter-generic-vmware-swsec, refCount: 4, rev: 0x1010000, apiRev: 0x1010000, module: dvfilter-switch-security
    agent: bridgelearningfilter, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: vdrb
    agent: dvfg-igmp, refCount: 1, rev: 0x1010000, apiRev: 0x1010000, module: dvfg-igmp
    agent: vmware-sfw, refCount: 4, rev: 0x1010000, apiRev: 0x1010000, module: vsip
    
    Slowpaths:
    
    Filters:
    world 342296 vmm0:2-vm_RHEL63_srv_64-shared-846-3f435476-8f54-4e5a-8d01-59654a4e9979 vcUuid:'3f 43 54 76 8f 54 4e 5a-8d 01 59 65 4a 4e 99 79'
     port 50331660 2-vm_RHEL63_srv_64-shared-846-3f435476-8f54-4e5a-8d01-59654a4e9979.eth1
      vNic slot 2
       name: nic-342296-eth1-vmware-sfw.2
       agentName: vmware-sfw
       state: IOChain Attached
       vmState: Detached
       failurePolicy: failClosed
       slowPathID: none
       filter source: Dynamic Filter Creation
      vNic slot 1
       name: nic-342296-eth1-dvfilter-generic-vmware-swsec.1
       agentName: dvfilter-generic-vmware-swsec
       state: IOChain Attached
       vmState: Detached
       failurePolicy: failClosed
       slowPathID: none
       filter source: Alternate Opaque Channel
     port 50331661 (disconnected)
      vNic slot 2
       name: nic-342296-eth2-vmware-sfw.2 <======= DFW filter
       agentName: vmware-sfw       
       state: IOChain Detached
       vmState: Detached
       failurePolicy: failClosed
       slowPathID: none
       filter source: Dynamic Filter Creation
     port 33554441 2-vm_RHEL63_srv_64-shared-846-3f435476-8f54-4e5a-8d01-59654a4e9979
      vNic slot 2
       name: nic-342296-eth0-vmware-sfw.2<========= DFW filter
       agentName: vmware-sfw    
       state: IOChain Attached
       vmState: Detached
       failurePolicy: failClosed
       slowPathID: none
       filter source: Dynamic Filter Creation
    
  11. Run the show dfw host hostID filter filterID rules command.
    For example:
    # show dfw host host-28 filter nic-79396-eth0-vmware-sfw.2 rules
     
    ruleset domain-c33 {
      # Filter rules
      rule 1012 at 1 inout protocol any from addrset ip-securitygroup-10 to addrset ip-securitygroup-10 drop with log;
      rule 1013 at 2 inout protocol any from addrset src1013 to addrset src1013 drop;
      rule 1011 at 3 inout protocol tcp from any to addrset dst1011 port 443 accept;
      rule 1011 at 4 inout protocol icmp icmptype 8 from any to addrset dst1011 accept;
      rule 1010 at 5 inout protocol tcp from addrset ip-securitygroup-10 to addrset ip-securitygroup-11 port 8443 accept;
      rule 1010 at 6 inout protocol icmp icmptype 8 from addrset ip-securitygroup-10 to addrset ip-securitygroup-11 accept;
      rule 1009 at 7 inout protocol tcp from addrset ip-securitygroup-11 to addrset ip-securitygroup-12 port 3306 accept;
      rule 1009 at 8 inout protocol icmp icmptype 8 from addrset ip-securitygroup-11 to addrset ip-securitygroup-12 accept;
      rule 1003 at 9 inout protocol ipv6-icmp icmptype 136 from any to any accept;
      rule 1003 at 10 inout protocol ipv6-icmp icmptype 135 from any to any accept;
      rule 1002 at 11 inout protocol udp from any to any port 67 accept;
      rule 1002 at 12 inout protocol udp from any to any port 68 accept;
      rule 1001 at 13 inout protocol any from any to any accept;
    }
    
    ruleset domain-c33_L2 {
      # Filter rules
      rule 1004 at 1 inout ethertype any from any to any accept;
    
  12. Run the show dfw host hostID filter filterID addrsetscommand.
    For example:
    # show dfw host host-28 filter nic-342296-eth2-vmware-sfw.2 addrsets 
    
    addrset dst1011 {
    ip 172.16.10.10,
    ip 172.16.10.11,
    ip 172.16.10.12,
    ip fe80::250:56ff:feae:3e3d,
    ip fe80::250:56ff:feae:f86b,
    }
    addrset ip-securitygroup-10 {
    ip 172.16.10.11,
    ip 172.16.10.12,
    ip fe80::250:56ff:feae:3e3d,
    ip fe80::250:56ff:feae:f86b,
    }
    addrset ip-securitygroup-11 {
    ip 172.16.20.11,
    ip fe80::250:56ff:feae:23b9,
    }
    addrset ip-securitygroup-12 {
    ip 172.16.30.11,
    ip fe80::250:56ff:feae:d42b,
    }
    addrset src1013 {
    ip 172.16.10.12,
    ip 172.17.10.11,
    ip fe80::250:56ff:feae:cf88,
    ip fe80::250:56ff:feae:f86b,
    }
    
  13. If you have validated each of the above troubleshooting steps and cannot publish firewall rules to the host virtual machines, execute a host-level force synchronization via the NSX Manager UI or via the following REST API call.
    URL : [https:]https://<nsx-mgr-ip>/api/4.0/firewall/forceSync/<host-id>
    HTTP Method : POST 
    Headers , 
    Authorization : base64encoded value of username password
    Accept : application/xml
    Content-Type : application/xml
    

Solution

Notes:
  • Ensure that VMware Tools is running on the virtual machines if firewall rules do not use IP addresses. For more information, see https://kb.vmware.com/kb/2084048.

    VMware NSX 6.2.0 introduced the option to discover the virtual machine IP address using DHCP snooping or ARP snooping. These new discovery mechanisms enable NSX to enforce IP address-based security rules on virtual machines that do not have VMware Tools installed. For more information, see the NSX 6.2.0 Release Notes.

    DFW is activated as soon as the host preparation process is completed. If a virtual machine needs no DFW service at all, it can be added in the exclusion list functionality (by default, NSX Manager, NSX Controllers and Edge Services Gateways are automatically excluded from DFW function). There is a possibility that the vCenter Server access gets blocked after creating a Deny All rule in DFW. For more information, see https://kb.vmware.com/kb/2079620.

  • When troubleshooting VMware NSX 6.x Distributed Firewall (DFW) with VMware Technical Support, these are required:
    • Output of the command show dfw host hostID summarize-dvfilter on each of the ESXi host on the cluster.
    • Distributed Firewall Configuration from the Networking and Security > Firewall > General tab and click Export Configuration. This exports the Distributed Firewall configuration to an XML format.
    • NSX Manager logs. For more information, see https://kb.vmware.com/kb/2074678.
    • vCenter Server logs. For more information, see https://kb.vmware.com/kb/1011641.