VMware NSX-T Data Center 3.1.3   |  22 July 2021  |  Build 18328989

Check regularly for additions and updates to these release notes.

What's New

NSX-T Data Center 3.1.3 is a maintenance release focused on quality and bug fixing. We have also enhanced the monitoring capabilities with new events and alarms for the firewall and the DNS Forwarder.

Events and Alarms

  • Firewall Alarms
    • To notify when the number of connections (TCP half open, ICMP, UDP, or other IP) is approaching the threshold or has exceeded the threshold. The default threshold value for all the alarms is configurable.
      • TCP Half Open Flow Count High
      • TCP Half Open Flow Count Exceeded
      • ICMP Flow Count High
      • ICMP Flow Count Exceeded
      • UDP Flow Count High
      • UDP Flow Count Exceeded
      • IP Flow Count High
      • IP Flow Count Exceeded
  • DNS Forwarder
    • DNS Forwarder Upstream Server Timeout - To notify when the DNS forwarder does not receive a timely response from the upstream server, as this may impact FQDNs.

Compatibility and System Requirements

For compatibility and system requirements information, see the NSX-T Data Center Installation Guide.

API and CLI Resources

See code.vmware.com to use the NSX-T Data Center APIs or CLIs for automation.

Available Languages

NSX-T Data Center has been localized into multiple languages: English, German, French, Japanese, Simplified Chinese, Korean, Traditional Chinese, and Spanish. Because NSX-T Data Center localization utilizes the browser language settings, ensure that your settings match the desired language.

Document Revision History

July 22, 2021. First edition.
August 10, 2021. Second edition. Added issue 2732839.
August 17, 2021. Third edition. Added issue 2805986.
September 23, 2021. Fourth edition. Added resolved issue 2764808.

Resolved Issues

  • Fixed Issue 2523421: LDAP authentication does not work properly when configured with an external load balancer (configured with round-robin connection persistence).

    The API LDAP authentication won't work reliably and will only work if the load balancer forwards the API request to a particular Manager.

  • Fixed Issue 2596696: NsxTRestException observed in policy logs when creating SegmentPort from the API.

    NsxTRestException observed in policy logs. The SegmentPort cannot be created using the API.

  • Fixed Issue 2647620: In an NSX configured environment with a large number of Stateless Hosts (TransportNodes), workload VMs on some Stateless hosts may lose connectivity temporarily when upgrading Management Plane nodes to 3.0.0 and above.

    This is applicable only to Stateless ESX Hosts configured for NSX 3.0.0 and above.

  • Fixed Issue 2555383: Internal server error during API execution.

    An internal server error is observed during API call execution. The API returns a 500 error and does not give the desired output.

  • Fixed Issue 2681092: You can switch from the active Global Manager to the stand-by Global Manager even when the certificate of the latter has expired.

    The expired certificate on the standby Global Manager continues to allow communication when it shouldn't.

  • Fixed Issue 2694496: Accessing VDI through Webclient/UAGs throws an error.

    When trying to access VDI from the Horizon portal, it times out with an error on port "22443".

  • Fixed Issue 2603550: Some VMs are vMotioned and lose network connectivity during UA nodes upgrade.

    During an upgrade of NSX UA nodes, some VMs may be migrated by DRS and lose network connectivity after the migration.

  • Fixed Issue 2638674: Azure Mellanox driver maintenance upgrades can cause North-South traffic disruption.

    When Azure performs a maintenance event for Mellanox devices that involves a hot add of a Mellanox device, PCGs that are in the North-South path might experience a North-South traffic loss.

  • Fixed Issue 2752246: NTLM/Server-keepalive enabled on L7 virtual server can cause Nginx Core when load balancer connection reuses port quickly.

    Load balancer service crashes due to Nginx core.

  • Fixed Issue 2732839: SNMP trap is not generated for some alarms.

    SNMP traps for certain edge datapath alarms are not sent.

  • Fixed Issue 2764808: Traffic redirection fails with Clustered and HostBasedDeployment and using a LogicalSwitch to configure MGMT network.

    You will not be able to use E-W ServiceDeployments with a Segment/LogicalSwitch backing MGMT nic.

Known Issues

The known issues are grouped as follows.

General Known Issues
  • Issue 2355113: Unable to install NSX Tools on RedHat and CentOS Workload VMs with accelerated networking enabled in Microsoft Azure.

    In Microsoft Azure when accelerated networking is enabled on RedHat (7.4 or later) or CentOS (7.4 or later) based OS and with NSX Agent installed, the ethernet interface does not obtain an IP address.

    Workaround: After booting up RedHat or CentOS based VM in Microsoft Azure, install the latest Linux Integration Services driver available at https://www.microsoft.com/en-us/download/details.aspx?id=55106 before installing NSX tools.

  • Issue 2490064: Attempting to disable VMware Identity Manager with "External LB" toggled on does not work.

    After enabling VMware Identity Manager integration on NSX with "External LB", if you attempt to then disable integration by switching "External LB" off, after about a minute, the initial configuration will reappear and overwrite local changes.

    Workaround: When attempting to disable vIDM, do not toggle the External LB flag off; only toggle off vIDM Integration. This will cause that config to be saved to the database and synced to the other nodes.

  • Issue 2526769: Restore fails on multi-node cluster.

    When starting a restore on a multi-node cluster, restore fails and you will have to redeploy the appliance.

    Workaround: Deploy a new setup (one node cluster) and start the restore.

  • Issue 2523212: The nsx-policy-manager becomes unresponsive and restarts.

    API calls to nsx-policy-manager will start failing, with service being unavailable. You will not be able to access policy manager until it restarts and is available.

    Workaround: Invoke API with at most 2000 objects.
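The workaround for Issue 2523212 amounts to splitting a large request into smaller batches. The following is a minimal sketch, assuming the objects are already collected in a Python list. `chunk_objects` and `MAX_OBJECTS_PER_CALL` are hypothetical names used only for illustration, and sending each batch is left to whatever API client is already in use.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Limit taken from the workaround above: at most 2000 objects per API call.
MAX_OBJECTS_PER_CALL = 2000


def chunk_objects(objects: Sequence[T], size: int = MAX_OBJECTS_PER_CALL) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` objects, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(objects), size):
        yield list(objects[start:start + size])
```

Each yielded batch can then be submitted as its own API call.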

  • Issue 2482580: IDFW/IDS configuration is not updated when an IDFW/IDS cluster is deleted from vCenter.

    When a cluster with IDFW/IDS enabled is deleted from vCenter, the NSX management plane is not notified of the necessary updates. This results in inaccurate count of IDFW/IDS enabled clusters. There is no functional impact. Only the count of the enabled clusters is wrong.

    Workaround: None.

  • Issue 2534933: Certificates that have LDAP based CDPs (CRL Distribution Point) fail to apply as tomcat/cluster certs.

    You can't use CA-signed certificates that have LDAP CDPs as cluster/tomcat certificate.

    Workaround: See VMware knowledge base article 78794.

  • Issue 2557287: TNP updates done after backup are not restored.

    You won't see any TNP updates done after backup on a restored appliance.

    Workaround: Take a backup after any updates to TNP.

  • Issue 2468774: When option 'Detect NSX configuration change' is enabled, backups are taken even when there is no configuration change.

    Too many backups are being taken because backups are being taken even when there are no configuration changes.

    Workaround: Increase the time associated with this option, thereby reducing the rate at which backups are taken.

  • Issue 2534921: Not specifying inter_sr_ibgp property in a PATCH API call will prevent other fields from being updated in the BgpRoutingConfig entity.

    The PATCH API call fails to update the BGP routing config entity with the error message "BGP inter SR routing requires global BGP and ECMP flags enabled." BgpRoutingConfig will not be updated.

    Workaround: Specify inter_sr_ibgp property in the PATCH API call to allow other fields to be changed.
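One way to apply the workaround for Issue 2534921 is to copy the current `inter_sr_ibgp` value into every PATCH body. The sketch below assumes the current BgpRoutingConfig has already been read and is available as a Python dict. `build_bgp_patch` is a hypothetical helper, and the `local_as_num` field in the usage note is only an illustrative field to change.

```python
from typing import Any, Dict


def build_bgp_patch(current: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a PATCH body that always carries the inter_sr_ibgp property."""
    if "inter_sr_ibgp" not in current and "inter_sr_ibgp" not in changes:
        # Without a known value there is nothing safe to send.
        raise ValueError("inter_sr_ibgp is unknown; read the current config first")
    body = dict(changes)
    # Keep an explicit change if present, otherwise reuse the current value.
    body.setdefault("inter_sr_ibgp", current.get("inter_sr_ibgp"))
    return body
```

For example, `build_bgp_patch(current, {"local_as_num": "65001"})` returns the requested change plus the existing `inter_sr_ibgp` value, so the other field can be updated.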

  • Issue 2566121: A UA node stopped accepting any New API calls with the message, "Some appliance components are not functioning properly."

    The UA node stops accepting any New API calls with the message, "Some appliance components are not functioning properly." There are around 200 connections stuck in CLOSE_WAIT state. These connections are not yet closed. New API call is rejected.

    Workaround: Restart proton service (service proton restart) or restart unified appliance node.

  • Issue 2574281: Policy will only allow a maximum of 500 VPN Sessions.

    NSX claims support of 512 VPN Sessions per edge in the large form factor, however, due to Policy doing auto plumbing of security policies, Policy will only allow a maximum of 500 VPN Sessions. Upon configuring the 501st VPN session on Tier0, the following error message is shown:
    {'httpStatus': 'BAD_REQUEST', 'error_code': 500230, 'module_name': 'Policy', 'error_message': 'GatewayPolicy path=[/infra/domains/default/gateway-policies/VPN_SYSTEM_GATEWAY_POLICY] has more than 1,000 allowed rules per Gateway path=[/infra/tier-0s/inc_1_tier_0_1].'}

    Workaround: Use Management Plane APIs to create additional VPN Sessions.

  • Issue 2628503: DFW rule remains applied even after forcefully deleting the manager nsgroup.

    Traffic may still be blocked when forcefully deleting the nsgroup.

    Workaround: Do not forcefully delete an nsgroup that is still used by a DFW rule. Instead, make the nsgroup empty or delete the DFW rule.

  • Issue 2631703: When doing backup/restore of an appliance with vIDM integration, vIDM configuration will break.

    Typically, when an environment has been upgraded and/or restored, attempting to restore an appliance where vIDM integration is up and running will cause that integration to break, and you will have to reconfigure it.

    Workaround: After restore, manually reconfigure vIDM.

  • Issue 2638673: SRIOV vNICs for VMs are not discovered by inventory.

    SRIOV vNICs are not listed in the Add new SPAN session dialog. You will not see SRIOV vNICs when adding a new SPAN session.

    Workaround: None.

  • Issue 2639424: Remediating a Host in a vLCM cluster with Host-based Service VM Deployment will fail after 95% Remediation Progress is completed.

    The remediation progress for a Host will be stuck at 95% and then fail after the 70-minute timeout is reached.

    Workaround: See VMware knowledge base article 81447.

  • Issue 2636855: Maximum capacity alarm raised when System-wide Logical Switch Ports is over 25K.

    A maximum capacity alarm is raised when the number of system-wide Logical Switch Ports exceeds 25K. However, in a PKS scale environment the limit for container ports is 60K, so more than 25K Logical Switch Ports is a normal case.

    Workaround: None.

  • Issue 2636771: Search can return a resource that is tagged with multiple tag pairs when the tag and scope match any value of tag and scope across those pairs.

    This affects search queries with a condition on tag and scope. The filter may return extra data if the tag and scope match any pair.

    Workaround: None.
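To illustrate the behavior described in Issue 2636771, the sketch below post-filters search results on the client so that a resource is kept only when a single tag pair carries both the requested scope and the requested tag. It assumes each result is a dict whose `tags` list holds `{"scope": ..., "tag": ...}` pairs. The function names are hypothetical, and this is not a documented workaround.

```python
from typing import Any, Dict, Iterable, List


def has_exact_tag_pair(resource: Dict[str, Any], scope: str, tag: str) -> bool:
    """True only when one single tag pair carries both the scope and the tag."""
    return any(
        pair.get("scope") == scope and pair.get("tag") == tag
        for pair in resource.get("tags", [])
    )


def filter_by_tag_pair(resources: Iterable[Dict[str, Any]], scope: str, tag: str) -> List[Dict[str, Any]]:
    """Drop results whose scope and tag matched on different pairs."""
    return [r for r in resources if has_exact_tag_pair(r, scope, tag)]
```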

  • Issue 2662225: Traffic loss is experienced when the active edge node becomes a non-active edge node while an S-N traffic stream is flowing.

    The current S->N stream is running on the multicast active node. The preferred route on the TOR to the source should be through the multicast active edge node only.
    Bringing up another edge can take over the multicast active node role (the lower-rank edge is the active multicast node). The current S->N traffic will experience loss for up to four minutes. This will not impact a new stream, or the current stream if it is stopped and started again.

    Workaround: Current S->N traffic will recover automatically within 3.5 to 4 minutes. For faster recovery, disable multicast and then re-enable it through configuration.

  • Issue 2610851: Namespaces, Compute Collection, L2VPN Service grid filtering might return no data for a few combinations of resource type filters.

    Applying multiple filters for a few types at the same time returns no results even though data matching the criteria is available. This is not a common scenario, and the filter fails only on these grids for the following combinations of filter attributes:

    • For Namespaces grid ==> On Cluster Name and Pods Name filter
    • For Network Topology page  ==> On L2VPN service applying a remote ip filter
    • For Compute Collection ==> On ComputeManager filter

    Workaround: You can apply one filter at a time for these resource types.

  • Issue 2587257: In some cases, PMTU packet sent by NSX-T edge is ignored upon receipt at the destination.

    PMTU discovery fails resulting in fragmentation and reassembly, and packet drop. This results in performance drop or outage in traffic.

    Workaround: None.

  • Issue 2587513: Policy shows error when multiple VLAN ranges are configured in bridge profile binding.

    You will see an "INVALID VLAN IDs" error message.

    Workaround: Create multiple bridge endpoints with the VLAN ranges on the segment instead of one with all VLAN ranges.

  • Issue 2682480: Possible false alarm for NCP health status.

    The NCP health status alarm may be unreliable in the sense that it is raised when NCP system is healthy.

    Workaround: None.

  • Issue 2690457: When joining an MP to an MP cluster where publish_fqdns is set on the MP cluster and where the external DNS server is not configured properly, the proton service may not restart properly on the joining node.

    The joining manager will not work and the UI will not be available.

    Workaround: Configure the external DNS server with forward and reverse DNS entries for all Manager nodes.
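Before joining a node, the forward and reverse entries mentioned in the workaround for Issue 2690457 can be checked from any machine that uses the same DNS server. The following is a minimal sketch using Python's standard `socket` module. `check_dns_entries` is a hypothetical helper, the check covers IPv4 only, and the host names and addresses in the usage note are placeholders.

```python
import socket
from typing import Callable, Tuple


def check_dns_entries(
    fqdn: str,
    expected_ip: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple] = socket.gethostbyaddr,
) -> bool:
    """Return True when fqdn resolves to expected_ip and expected_ip maps back to fqdn."""
    try:
        forward_ok = forward(fqdn) == expected_ip
        reverse_name = reverse(expected_ip)[0]
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
    # Compare names case-insensitively and ignore a trailing dot.
    return forward_ok and reverse_name.rstrip(".").lower() == fqdn.rstrip(".").lower()
```

Running `check_dns_entries("mgr1.example.com", "10.0.0.11")` for every Manager node gives a quick indication of whether both record types are in place.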

  • Issue 2685550: FW Rule realization status is always shown as "In Progress" when applied to bridged segments.

    When applying FW Rules to an NSGroup that contains bridged segments as one of its members, realization status will always be shown as in progress. You won't be able to check the realization status of FW Rules applied to bridged segments.

    Workaround: Manually remove bridged segment from the NSGroup members list.

  • Issue 2684574: If the edge has 6K+ routes for Database and Routes, the Policy API times out.

    These Policy APIs for the OSPF database and OSPF routes return an error if the edge has 6K+ routes:
    /tier-0s/<tier-0s-id>/locale-services/<locale-service-id>/ospf/routes
    /tier-0s/<tier-0s-id>/locale-services/<locale-service-id>/ospf/routes?format=csv
    /tier-0s/<tier-0s-id>/locale-services/<locale-service-id>/ospf/database
    /tier-0s/<tier-0s-id>/locale-services/<locale-service-id>/ospf/database?format=csv

    If the edge has 6K+ routes for Database and Routes, the Policy API times out. This is a read-only API and has an impact only if the API/UI is used to download 6k+ routes for OSPF routes and database.

    Workaround: Use the CLI commands to retrieve the information from the edge.

  • Issue 2692344: If you delete the Avi Enforcement point, it deletes all the realized objects from the policy, which deletes all default object’s realized entities from the policy. Adding new enforcement point fails to re-sync the default object from the Avi Controller. 

    You will not be able to use the system-default objects after deletion and recreation of the Enforcement point of AVIConnectionInfo.

    Workaround: The enforcement point should not be deleted. If there are any changes it can be updated but should not be deleted.

  • Issue 2636420: Host will go to "NSX install skipped" state and cluster in "Failed" state post restore if "Remove NSX" is run on cluster post backup.

    "NSX Install Skipped" will be shown for host.

    Workaround: Following restore, run "Remove NSX" on the cluster again to reach the state that was present following backup (not configured state).

  • Issue 2646702: IDS events detected by the appliance will not be preserved during Configuration Backup operation.

    After restore of configuration backup on to the new appliance, all previously detected IDS events cannot be retrieved and are not visible on the new appliance.

    Workaround: None.

  • Issue 2668717: Intermittent traffic loss might be observed for E-W routing between the vRA created networks connected to segments sharing Tier-1.

    In cases where vRA creates multiple segments and connects to a shared ESG, V2T will convert such a topology to a shared Tier-1 connected to all segments on the NSX-T side. During the host migration window, intermittent traffic loss might be observed for E-W traffic between workloads connected to the segments sharing the Tier-1.

  • Issue 2558576: Global Manager and Local Manager versions of a global profile definition can differ and might have an unknown behavior on Local Manager.

    Global DNS, session, or flood profiles created on Global Manager cannot be applied to a local group from UI, but can be applied from API. Hence, an API user can accidentally create profile binding maps and modify global entity on Local Manager.

    Workaround: Use the UI to configure the system.

  • Issue 2730109: When Edge is powering ON, Edge tries to form an OSPF neighborship with the peer using its routerlink IP address as the OSPF router ID even though a loopback is present.

    After reloading Edge, OSPF selects the downlink IP address (the higher IP address) as the router ID until it receives the OSPF router ID configuration, due to the configuration sequencing order. The neighbor entry with the older router ID eventually becomes stale once the peer receives an OSPF HELLO with the new router ID, and it expires on the peer when the dead timer runs out.

    Workaround: None.

  • Issue 2798540: RedHat 7.0-based VM will not be operable after a reboot and may become inaccessible.

    VM might lose SSH connectivity and may become inoperable. Key NSX functions might not work.

    Workaround: None.

  • Issue 2734742: Memory reservation fails for hosts that are being upgraded without a reboot.

    Memory reservation in NSX-T 3.1.2 fails for hosts that are being upgraded without a reboot (which can be verified using the local CLI on the host: "localcli system visorfs ramdisk list"), causing a loss of connectivity between the host and the control plane.

    Workaround: Perform a reboot of the ESX host, and the memory reservation will take effect.

Installation Known Issues
  • Issue 2562189: Transport node deletion goes on indefinitely when the NSX Manager is powered off during the deletion operation.

    If the NSX Managers are powered off while transport node deletion is in progress, the transport node deletion may go on indefinitely if there is no user intervention.

    Workaround: Once the Managers are back up, prepare the node again and start the deletion process again.

Upgrade Known Issues
  • Issue 2693576: Transport Node shows "NSX Install Failed" after KVM RHEL 7.9 upgrade to RHEL 8.2.

    After RHEL 7.9 upgrade to 8.2, dependencies nsx-opsagent and nsx-cli are missing. Host is marked as install failed. Resolving the failure from the UI doesn't work: Failed to install software on host. Unresolved dependencies: [PyYAML, python-mako, python-netaddr, python3]

    Workaround: Manually install the NSX RHEL 8.2 vibs after the host OS upgrade and resolve it from the UI.

  • Issue 2550492: During an upgrade, the message, "The credentials were incorrect or the account specified has been locked" is displayed temporarily and the system recovers automatically.

    Transient error message during upgrade.

    Workaround: None.

NSX Edge Known Issues
  • Issue 2283559: https://<nsx-manager>/api/v1/routing-table and https://<nsx-manager>/api/v1/forwarding-table MP APIs return an error if the edge has 65k+ routes for RIB and 100k+ routes for FIB.

    If the edge has 65k+ routes for RIB and 100k+ routes for FIB, the request from MP to Edge takes more than 10 seconds and results in a timeout. This is a read-only API and has an impact only if you need to download the 65k+ routes for RIB and 100k+ routes for FIB using the API/UI.

    Workaround: There are two options to fetch the RIB/FIB.

    • These APIs support filtering options based on network prefixes or type of route. Use these options to download the routes of interest.
    • Use the CLI if the entire RIB/FIB table is needed. The CLI has no timeout for this operation.
  • Issue 2521230: BFD status displayed under ‘get bgp neighbor summary’ may not reflect the latest BFD session status correctly.

    BGP and BFD can set up their sessions independently. As part of ‘get bgp neighbor summary’ BGP also displays the BFD state. If the BGP is down, it will not process any BFD notifications and will continue to show the last known state. This could lead to displaying stale state for the BFD.

    Workaround: Rely on the output of ‘get bfd-sessions’ and check the ‘State’ field to get the most up-to-date BFD status.

  • Issue 2805986: Unable to deploy NSX-T managed edge VM.

    NSX-T Edge deployment fails when done using ESX UI.
     

    Workaround: Use vCenter UI to deploy, or use OVF tool to deploy edge.
     

Security Known Issues
  • Issue 2491800: AR channel SSL certificates are not periodically checked for their validity, which could lead to using an expired/revoked certificate for an existing connection.

    The connection would be using an expired/revoked SSL certificate.

    Workaround: Restart the APH on the Manager node to trigger a reconnection.

  • Issue 2689449: Incorrect inventory may be seen if the Public Cloud Gateway (PCG) is rebooting.

    The managed state of managed instances is shown as unknown. Some inventory information, such as managed state, errors and quarantine status will not be available to the Cloud Service Manager.

    Workaround: Wait for PCG to be up, and either wait for periodic sync or trigger account sync.

Federation Known Issues
  • Issue 2630813: SRM recovery for compute VMs will lose all the NSX tags applied to VM and Segment ports.

    If an SRM recovery test or run is initiated, the replicated compute VMs in the disaster recovery location will not have any NSX tags applied.

  • Issue 2601493: Concurrent config onboarding is not supported on Global Manager in order to prevent heavy processing load.

    Although parallel config onboarding executions do not interfere with each other, multiple such executions on GM would make GM slow for other operations in general.

    Workaround: Security Admin / Users must sync up maintenance windows to avoid initiating config onboarding concurrently.

  • Issue 2613113: If onboarding is in progress, and restore of Local Manager is done, the status on Global Manager does not change from IN_PROGRESS.

    UI shows IN_PROGRESS in Global Manager for Local Manager onboarding. Unable to import the configuration of the restored site.

    Workaround: Use the Local Manager API to start the onboarding of the Local Manager site, if required.

  • Issue 2625009: Inter-SR iBGP sessions keep flapping when intermediate routers or physical NICs have an MTU lower than or equal to that of the inter-SR port.

    This can impact inter-site connectivity in Federation topologies.

    Workaround: Keep the pNIC MTU and the intermediate routers' MTU bigger than the global MTU (i.e., the MTU used by the inter-SR port). Because of encapsulation, packets become larger than the MTU and do not go through.
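The MTU condition in the workaround for Issue 2625009 can be checked with a simple comparison: every pNIC and intermediate router on the path needs an MTU strictly larger than the inter-SR port MTU. The sketch below assumes the MTU values have already been collected by hand or from inventory. `undersized_hops` and the hop names in the usage note are hypothetical.

```python
from typing import Dict, List


def undersized_hops(inter_sr_mtu: int, hop_mtus: Dict[str, int]) -> List[str]:
    """Return the hops whose MTU is not strictly larger than the inter-SR port MTU."""
    # "Lower or equal" is the failing condition named in the issue description.
    return sorted(name for name, mtu in hop_mtus.items() if mtu <= inter_sr_mtu)
```

For example, with an inter-SR MTU of 1500, `undersized_hops(1500, {"pnic-esx01": 1700, "tor-a": 1500})` flags `tor-a`, because an equal MTU leaves no room for the encapsulation overhead.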

  • Issue 2606452: Onboarding is blocked when trying to onboard via API.

    Onboarding API fails with the error message, "Default transport zone not found at site". 

    Workaround: Wait for fabric sync between Global Manager and Local Manager to complete.

  • Issue 2643749: Unable to nest a group from a custom region created on a specific site into a group that belongs to the system-created site-specific region.

    You will not see the group created in the site-specific custom region when selecting a child group as a member of the group in the system-created region with the same location.

  • Issue 2649240: Deletion is slow when a large number of entities are deleted using individual delete APIs.

    It takes significant time to complete the deletion process.

    Workaround: Use hierarchical API to delete in bulk.
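As an illustration of the bulk approach for Issue 2649240, the sketch below builds one hierarchical request body that marks several groups for deletion instead of issuing one delete call per group. It assumes the hierarchical API's `Child*` wrapper objects with a `marked_for_delete` flag and a domain named `default`. `build_bulk_group_delete` is a hypothetical helper, and the exact body shape and endpoint should be confirmed against the API reference for your version.

```python
from typing import Any, Dict, Iterable


def build_bulk_group_delete(group_ids: Iterable[str], domain_id: str = "default") -> Dict[str, Any]:
    """Build one hierarchical request body that marks every listed group for deletion."""
    children = [
        {
            "resource_type": "ChildGroup",
            "marked_for_delete": True,  # assumption: flag used to delete a child object
            "Group": {"resource_type": "Group", "id": group_id},
        }
        for group_id in group_ids
    ]
    return {
        "resource_type": "Infra",
        "children": [
            {
                "resource_type": "ChildDomain",
                "Domain": {
                    "resource_type": "Domain",
                    "id": domain_id,
                    "children": children,
                },
            }
        ],
    }
```

The resulting dict is sent as the body of a single hierarchical PATCH request, so all deletions are processed together.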

  • Issue 2649499: Firewall rule creation takes a long time when individual rules are created one after the other.

    Slow API takes more time to create rules.

    Workaround: Use Hierarchical API to create several rules.
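For Issue 2649499, several rules can be wrapped in one hierarchical body under a single security policy rather than created one after the other. The sketch below makes the same assumptions about the `Child*` wrapper format and the `default` domain. `build_bulk_rule_create` is a hypothetical helper, and the rule dicts are supplied by the caller, so no rule fields are prescribed here.

```python
from typing import Any, Dict, Iterable


def build_bulk_rule_create(
    policy_id: str,
    rules: Iterable[Dict[str, Any]],
    domain_id: str = "default",
) -> Dict[str, Any]:
    """Wrap caller-supplied rule definitions in one hierarchical request body."""
    rule_children = [
        # Copy each rule so the caller's dicts are not modified.
        {"resource_type": "ChildRule", "Rule": {**rule, "resource_type": "Rule"}}
        for rule in rules
    ]
    return {
        "resource_type": "Infra",
        "children": [
            {
                "resource_type": "ChildDomain",
                "Domain": {
                    "resource_type": "Domain",
                    "id": domain_id,
                    "children": [
                        {
                            "resource_type": "ChildSecurityPolicy",
                            "SecurityPolicy": {
                                "resource_type": "SecurityPolicy",
                                "id": policy_id,
                                "children": rule_children,
                            },
                        }
                    ],
                },
            }
        ],
    }
```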

  • Issue 2652418: Slow deletion when large number of entities are deleted.

    Deletion will be slower.

    Workaround: Use the hierarchical API for bulk deletion.

  • Issue 2655539: Host names are not updated on the Location Manager page of the Global Manager UI when updating the host names using the CLI.

    The old host name is shown.

    Workaround: None.

  • Issue 2658687: Global Manager switchover API reports failure when transaction fails, but the failover happens.

    API fails, but Global Manager switchover completes.

    Workaround: None.

  • Issue 2658092: Onboarding fails when NSX Intelligence is configured on Local Manager.

    Onboarding fails with a principal identity error, and you cannot onboard a system that has a principal identity user.

    Workaround: Create a temporary principal identity user with the same principal identity name that is used by NSX Intelligence.

  • Issue 2622576: Failures due to duplicate configuration are not propagated correctly to user.

    While onboarding is in progress, you see an "Onboarding Failure" message.

    Workaround: Restore Local Manager and retry onboarding.

  • Issue 2679614: When the API certificate is replaced on the Local Manager, the Global Manager's UI will display the message, "General Error has occurred."

    When the API certificate is replaced on the Local Manager, the Global Manager's UI will display the message, "General Error has occurred."

    Workaround:

    1. Open the "Location Manager" of the Global Manager UI.
    2. Click the "ACTION" tab under the affected Local Manager and then enter the new thumbprint.
    3. If this does not work, off-board the Local Manager and then re-onboard the Local Manager.
  • Issue 2663483: The single-node NSX Manager will disconnect from the rest of the NSX Federation environment if you replace the APH-AR certificate on that NSX Manager.

    This issue is seen only with NSX Federation and with the single node NSX Manager Cluster. The single-node NSX Manager will disconnect from the rest of the NSX Federation environment if you replace the APH-AR certificate on that NSX Manager.

    Workaround: A single-node NSX Manager cluster is not a supported deployment option. Deploy a three-node NSX Manager cluster.
