VMware NSX-T Data Center 3.1.2   |  17 April 2021  |  Build 17883596

Check regularly for additions and updates to these release notes.

What's in the Release Notes

The release notes cover the following topics:

What's New

NSX-T Data Center 3.1.2 provides a variety of new features and enhancements for virtualized networking and security across private, public, and multi-clouds. Highlights include new features and enhancements in the following focus areas.

NSX Cloud

  • HCS on Azure with NSX Cloud Testing
    • Additional HCS topologies tested and validated with NSX Cloud.
  • Additional NSX Cloud OS Support: Added support for the following operating systems:
    • Windows Server 2012, 2016, and 2019 Standard, and Windows 10 Enterprise.
    • Red Hat Enterprise Linux 7.0, 7.1, 7.2, 7.3, 8.1, 8.2, 8.3.

Events and Alarms

  • Load Balancer
    • Load Balancer/Distributed Load Balancer Status Degraded, Load Balancer Service Memory Usage Very High.
  • Edge Health
    • Edge node Pool Member Capacity In Use Very High, Edge node Load Balancer Capacity In Use High.
  • IPAM
    • IP Block Usage Very High, IP Pool Usage Very High.
  • Edge NIC Out of Receive Buffer
    • Based on customer feedback, the severity of this alarm has been changed from Critical to Warning, and the default threshold value has been changed from 0.1% to 2%.


  • Rolling Packet Capture to troubleshoot Datapath Issues on Edge
    • You can now enable rolling packet capture on the Edge through nsxcli, with up to 25 files and a maximum size of 100 MB per file. This allows a packet capture to run for a longer duration while troubleshooting intermittent datapath issues.

NVDS to VDS Migration

  • NVDS to VDS migration - support for parallel cluster upgrade:
    • You can now migrate NVDS host switches to VDS switches while upgrading ESXi hosts to vSphere 7.0 U2, where the host clusters are being upgraded in parallel. A maximum of 4 clusters can be upgraded in parallel to support this feature.
  • Support VSAN file service & share nothing architecture VMs in NVDS to VDS migration
    • You can now migrate NVDS host switches to VDS switches on a host that has either a VSAN File Service or VSAN Share Nothing Architecture VM connected to the NVDS on that host.

N-VDS NSX-T Host Switch Deprecation Announcement

NSX-T 3.0.0 and later can run on the vSphere VDS switch, version 7.0 and later. This provides tighter integration with vSphere and easier NSX-T adoption for customers adding NSX-T to their vSphere environment.

Please be aware that VMware intends to remove support of the NSX-T N-VDS virtual switch on ESXi hosts in an upcoming NSX-T release, which will be generally available no sooner than one year from the date of this message, April 17, 2021. N-VDS will remain the supported virtual switch on KVM, NSX-T Edge nodes, native public cloud NSX agents, and bare metal workloads.

It is recommended that new deployments of NSX-T and vSphere take advantage of this close integration and deploy using VDS switch version 7.0 and later. In addition, for existing deployments of NSX-T that use the N-VDS on ESXi hosts, VMware recommends moving toward the use of NSX-T on VDS. To make this process easy, VMware has provided both a CLI based switch migration tool, which was first made available in NSX-T 3.0.2, and a GUI based Upgrade Readiness Tool, which was first made available in NSX-T 3.1.1 (see NSX documentation for more details on these tools).

The following deployment considerations are recommended when moving from N-VDS to VDS:

  • The N-VDS and VDS APIs are different, and the backing types in the VM and vmKernel interface APIs differ between the N-VDS and VDS switches. As you move to VDS in your environment, you must invoke the VDS APIs instead of the N-VDS APIs. This ecosystem change must be made before converting the N-VDS to VDS. Refer to KB https://kb.vmware.com/s/article/79872 for more details.

           Note: The N-VDS and VDS APIs themselves are unchanged; only which APIs you invoke changes.
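As a rough illustration of the ecosystem change described above, the two switch types use different vNIC backing types in the vSphere APIs. The sketch below is illustrative Python only; the dictionary shapes mimic the pyVmomi backing types (VirtualEthernetCardOpaqueNetworkBackingInfo for N-VDS, VirtualEthernetCardDistributedVirtualPortBackingInfo for VDS) and are an assumption, not the exact schema:

```python
# Illustrative sketch only: these dictionaries mimic the *shape* of vSphere
# VM vNIC backing payloads. Real automation goes through pyVmomi / the
# vSphere APIs; field names here are assumptions for illustration.

def nvds_vnic_backing(opaque_network_id: str) -> dict:
    """vNIC backing for a VM attached to an N-VDS (opaque network)."""
    return {
        "type": "VirtualEthernetCardOpaqueNetworkBackingInfo",
        "opaqueNetworkId": opaque_network_id,
        "opaqueNetworkType": "nsx.LogicalSwitch",
    }

def vds_vnic_backing(portgroup_key: str, switch_uuid: str) -> dict:
    """vNIC backing for a VM attached to a VDS (distributed virtual portgroup)."""
    return {
        "type": "VirtualEthernetCardDistributedVirtualPortBackingInfo",
        "port": {"portgroupKey": portgroup_key, "switchUuid": switch_uuid},
    }
```

Any tooling that builds the N-VDS shape must be adapted to build the VDS shape before the switch migration.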

  • VDS is configured through vCenter. N-VDS is vCenter independent. With NSX-T support on VDS and the eventual deprecation of N-VDS, NSX-T will be closely tied to vCenter and vCenter will be required to enable NSX in vSphere environments.

Compatibility and System Requirements

For compatibility and system requirements information, see the NSX-T Data Center Installation Guide.

API and CLI Resources

See code.vmware.com to use the NSX-T Data Center APIs or CLIs for automation.

Available Languages

NSX-T Data Center has been localized into multiple languages: English, German, French, Japanese, Simplified Chinese, Korean, Traditional Chinese, and Spanish. Because NSX-T Data Center localization utilizes the browser language settings, ensure that your settings match the desired language.

Document Revision History

April 17, 2021. First edition.
April 30, 2021. Second edition. Added issues 2663064, 2689867, 2692347, 2692436, 2697537, 2697550, 2697824, 2699857, 2709978, 2715237, 2718052, 2719526, 2719973, 2727859, 2730109, 2732839, 2734742, 2742234, 2690344, 2707873.
August 30, 2021. Third edition. Added resolved issue 2690014.
September 17, 2021. Fourth edition. Added known issue 2761589.

Resolved Issues

  • Fixed Issue 2742234: OSPF instance is not found after upgrade.

    The OSPF instance is not present for 10 to 15 minutes after an upgrade from NSX-T 2.5.0 to 3.1.0, or from NSX-T 3.1.0 to 3.1.1.

  • Fixed Issue 2692347: network_path parameter is not supported.

    network_path parameter is not supported in VS VIP API calls.

  • Fixed Issue 2520803: Encoding format for Manual Route Distinguisher and Route Target configuration in EVPN deployments.

    You can configure the manual route distinguisher in either Type-0 or Type-1 encoding; however, Type-1 encoding is highly recommended for configuring the Manual Route Distinguisher in EVPN deployments. Only Type-0 encoding is allowed for Manual Route Target configuration.

  • Fixed Issue 2537989: Clearing VIP (Virtual IP) does not clear vIDM integration on all nodes.

    If VMware Identity Manager is configured on a cluster with a Virtual IP, disabling the Virtual IP does not result in the VMware Identity Manager integration being cleared throughout the cluster. You will have to manually fix vIDM integration on each individual node if the VIP is disabled.

  • Fixed Issue 2521071: For a Segment created in Global Manager, if it has a BridgeProfile configuration, then the Layer2 bridging configuration is not applied to individual NSX sites.

    The consolidated status of the Segment remains at "ERROR". This is due to a failure to create the bridge endpoint at a given NSX site. You will not be able to successfully configure a BridgeProfile on Segments created via Global Manager.

  • Fixed Issue 2532127: LDAP user can't log in to NSX if the user's Active Directory entry does not contain the UPN (userPrincipalName) attribute and contains only the sAMAccountName attribute.

    User authentication fails and the user is unable to log in to the NSX user interface.

  • Fixed Issue 2560981: On upgrade, vIDM config may not persist.

    If you use vIDM, you will have to log in again after a successful upgrade and re-enable vIDM on the cluster.

  • Fixed Issue 2596162: Unable to update the nsxaHealthStatus for a switch when the switch name contains a single quote.

     NSX configuration state is at partial success because the health status of a switch could not be updated. 

  • Fixed Issue 2610718: Attempting to wire vIDM to NSX using the nsx-cli fails if lb_enable and vidm_enable flags are not explicitly specified.

    The error "An error occurred attempting to update the vidm properties" appears. You will only be able to wire vIDM using the UI, directly through the REST API, or through the CLI when the lb_enable and vidm_enable flags are explicitly defined.

  • Fixed Issue 2641990: During Edge vMotion, there can be multicast traffic loss up to 30 seconds (default pim hello interval).

    When the Edge is vMotioned and IGMP snooping is enabled on the TOR, the TOR needs to learn the new Edge location. The TOR learns this when it receives any multicast control or data traffic from the Edge. Multicast traffic is lost for up to 30 seconds during Edge vMotion.

  • Fixed Issue 2691432: Restore may fail.

    Restore may not work in some cases.

  • Fixed Issue 2690996: Cross-site packet forwarding may fail on KVM nodes if system assigned l2 forwarder vtep group id conflicts with VTEP label assigned to transport nodes.

    VMs attached to a stretched segment may lose cross-location connectivity. Cross-site traffic does not work for the conflicting segments in KVM deployments.


  • Fixed Issue 2694707: The operational status of firewall rules on cloud VMs may show unknown for some rules in case an HA failover of public cloud gateways happens.

    The operational status of firewall rules on NSX Policy UI may show unknown. There is no functional impact. All rules are successfully realized. The status should clear itself and become healthy when both public cloud gateways are online.

  • Fixed Issue 2697111: Unable to use "import CRL" functionality using Global Manager UI.

    When trying to import a CRL, the operation fails because the UI calls the wrong URL. You will not be able to use the Import CRL option on the Global Manager.

  • Fixed Issue 2674689: If the transport node is updated between URT and the start of migration, it loses the extra config profile.

    Migration of transport node fails in TN_Validate Stage.


  • Fixed Issue 2697549: If a GI service is deployed on the cluster, URT ApplyTopology fails because URT is unable to make changes to a transport node with a deployed GI service.

    URT ApplyTopology returns overall status of APPLY_TOPOLOGY_FAILED.


  • Fixed Issue 2687948: LR does not work after a switchover from IP address to FQDN.

    A "Fetching LR status timed out" error is observed in the UI, and Global Manager log replication stops.

  • Fixed Issue 2680854: Second attempt for Config Onboarding for a site is failing after rollback is successful on Global Manager.

    The Config Onboarding status is stuck at "In progress" indefinitely. You cannot onboard the site's configuration a second time after the first attempt ends in rollback.

  • Fixed Issue 2702168: After upgrading from NSX-T 3.0 to NSX-T 3.1, you cannot make any changes to VRF LR.

    If TIER0_EVPN_TEP_IP was added in VRF LR redistribution rule, you are unable to make any changes to VRF LR. A validation error states that "TIER0_EVPN_TEP_IP" is not supported for VRF LR.

  • Fixed Issue 2688584: Fetching LR sync status timed out because one LR node hit TransactionAbortedException and shut down its thread pool.

    You will not be able to switch over, and LR will stop.

  • Fixed Issue 2679344: Logging in to NSX-T Manager node as an LDAP user in a scaled Active Directory configuration may take a long time or fail.

    Logging in takes a long time or times out and may fail.

  • Fixed Issue 2711497: NSX Cloud Upgrade from an older version to NSX-T 3.1.1 may temporarily move the agented VMs to error state.

    You will lose access to the VMs and there could be application downtime until the PCG is upgraded.

  • Fixed Issue 2723546: North-South traffic is lost when the primary PCG goes in a standby mode during upgrade.

    The primary PCG goes in a standby mode during upgrade. The secondary PCG becomes active and all VMs connect to the secondary PCG. During this time, the North-South traffic goes down.

    The primary PCG becomes active when the upgrade completes. All VMs connect back to it and the North-South traffic is resumed.

  • Fixed Issue 2738345: BGP extended large community fails when it is configured with regex.

    If a BGP extended large community is configured with regex, the FRR CLI fails and the configuration does not take effect, so BGP route filtering does not work.

  • Fixed Issue 2534089: When the IDS service is enabled on a transport node (host), virtual machine traffic on IDS-enabled hosts will stop flowing unexpectedly.

    When enabling the NSX IDS/IPS (in either detect-only or detect-and-prevent mode) on a vSphere cluster and applying IDS/IPS to workloads, the lockup condition can get triggered just by having the IDPS engine enabled. As a result, all traffic to and from all workloads on the hypervisor subject to IDS/IPS or Deep Packet Inspection Services (L7 App-ID) will be dropped. Traffic not subject to IDS/IPS or Deep Packet Inspection is not impacted and as soon as IDS/IPS is disabled or no longer applied to traffic, traffic flow is restored.

    This issue is fixed in ESXi 7.0.2.

  • Fixed Issue 2663064: Upgrading NSX-T from 3.0.2 to 3.1.1 does not trigger upgrade of category_id for ethernet sections.

    When NSX-T is upgraded from 3.0.2 to 3.1.1, the upgrade for category_id for ethernet sections from 10 to 250 is not triggered and ethernet rules are kept incorrectly in prefilter bucket on hosts. This results in Ethernet (L2) rules not being enforced or not working as expected.

    Modify ethernet sections from Policy UI or APIs to trigger the publish of rules from policy to proton.

  • Fixed Issue 2689867: Segment ports are stuck in delete pending state.

    After segment ports are detached from Global Manager, on Local Manager they show up as stuck in delete pending state.  

  • Fixed Issue 2692436: VM tags are not retained after more than one cross-site migration of VM with vMotion.

    New or updated VM tags cannot be retrieved if a VM is migrated with vMotion across more than one site with the same storage shared cross site. Tags might need to be added manually to a VM on the new site.

  • Fixed Issue 2697550: A call to nanosleep() gets stuck that blocks an RPC library thread in nsx-exporter.

    After upgrading from NSX-T 2.5.2 to 3.1.1, an RPC library thread in nsx-exporter gets blocked due to nanosleep() call getting stuck. This results in operational status being down for logical port. Other services provided by nsx-exporter might also not function.

  • Fixed Issue 2697824: When upgrading from NSX-T 2.5.2 to NSX-T 3.1.1 host upgrade fails with an error.

    When upgrading from NSX-T 2.5.2 to NSX-T 3.1.1, the host upgrade fails with an 'unloading module nsxt-vswitch' error if the teaming policy (TeamPolicyUpDelay) configuration is set in minutes and ENS is enabled, or an uplink flaps during the upgrade.

  • Fixed Issue 2699857: FQDN traffic matches against unexpected rules.

    FQDN traffic matches against an invalid rule under the following conditions:

    • IDS is enabled along with FQDN attributes in a Context Profile.
    • A change is made to an FQDN context profile by adding or deleting an FQDN attribute, and the rule in question triggers an FQDN revalidation that fails when traffic is matched against the rule.
  • Fixed Issue 2709978: Error while connecting to load balancer VIP.

    A load balancer Application Rule does not work, so a "502 Bad Gateway" error is received while connecting to the load balancer VIP.

  • Fixed Issue 2715237: Purple Screen of Death (PSOD) occurs under certain condition when service insertion is enabled.

    When service insertion is enabled, certain operations such as vMotion or heavy traffic cause deadlock and PSOD on random hosts.

  • Fixed Issue 2718052: Firewall rule fails to get realized on the management plane.

    A firewall rule with only raw IPv4 addresses in source and IPv6 addresses in destination or vice-versa gets created on policy, but the rule fails to get realized on the management plane as this combination is not supported. There is no validation of such a combination on policy.

  • Fixed Issue 2719526: vSwitch drops the Internet Group Management Protocol (IGMP) reports that come from overlay client port to uplinks.

    The 'igmp report' message, sent from a VM on an overlay segment, is not received on another machine on a VLAN in the physical network.

  • Fixed Issue 2719973: Routes learned from inter-SR remain stale.

    Due to a BGP peer GR mode toggle, or on restart of a 'Restart'-mode peer, routes learned from inter-SR remain stale. This results in inter_sr_vrf showing stale imported routes, and 'get route' output showing stale ISR routes.

  • Fixed Issue 2727859: The API GET : https://NSX-IP/policy/api/v1/infra/realized-state/virtual-machines returns a NullPointerException.

    The API returns a NullPointerException. This issue is seen if some of the VMs on the VC MOB have either the OS name or the Computer Name (but not both) populated in the VM guest info.

  • Fixed Issue 2690344: VMs on the same ESXi host and HostSwitch as the Edge have North-South connectivity issues while ICMP traffic is not affected.

    In a collapsed cluster deployment where the Edge VTEP vNIC is not connected to a VLAN trunk, and uplink vNICs use VLAN 0 in the Edge uplink profile with ESX flow cache enabled, North-South traffic for workload VMs on the host is impacted: the flow cache function on the host sets the transport VLAN on all encapsulated packets. Since the Edge does not expect VLAN-tagged packets, the packets are dropped.

  • Fixed Issue 2697537: There is up to a 4 minute delay in creating first logical switch after enabling lockdown mode.

    Creation of first logical switch is delayed by 4 minutes after enabling lockdown mode.

  • Fixed Issue 2707873: L2 forwarder configured from Policy or MP missing from Edge.

    The L2 forwarder created by the federation configuration is missing from the Edge. Traffic is lost for flows that should be forwarded over the missing L2 forwarder or stretched lswitch, and syslog displays the following error:

    Invalid LogSwitchForwarderContextMsg, service context ID [num] in use by lswitch [uuid]

  • Fixed Issue 2690014: Control Channel To Transport Node “Down" alarms are not cleared despite the channel being up.

    This is cosmetic and does not impact the operation of CCP or the Transport Nodes. See VMware knowledge base article 85168 for details.

Known Issues

The known issues are grouped as follows.

General Known Issues
  • Issue 2734742: NestDB memory reservation fails for hosts that are being upgraded without a reboot.

    NestDB memory reservation in NSX-T 3.1.2 fails for hosts that are upgraded without a reboot (which can be verified using the local CLI on the host: "localcli system visorfs ramdisk list"), causing a loss of connectivity between the host and the control plane.

    Workaround: Perform a reboot of the ESX host, and the NestDB memory reservation will take effect.

  • Issue 2732839: SNMP trap is not generated for some alarms.

    SNMP traps for certain edge datapath alarms are not sent.

  • Issue 2329273: No connectivity between VLANs bridged to the same segment by the same edge node.

    Bridging a segment twice on the same edge node is not supported. However, it is possible to bridge two VLANs to the same segment on two different edge nodes.

    Workaround: None 

  • Issue 2355113: Unable to install NSX Tools on RedHat and CentOS Workload VMs with accelerated networking enabled in Microsoft Azure.

    In Microsoft Azure when accelerated networking is enabled on RedHat (7.4 or later) or CentOS (7.4 or later) based OS and with NSX Agent installed, the ethernet interface does not obtain an IP address.

    Workaround: After booting up RedHat or CentOS based VM in Microsoft Azure, install the latest Linux Integration Services driver available at https://www.microsoft.com/en-us/download/details.aspx?id=55106 before installing NSX tools.

  • Issue 2490064: Attempting to disable VMware Identity Manager with "External LB" toggled on does not work.

    After enabling VMware Identity Manager integration on NSX with "External LB", if you attempt to then disable integration by switching "External LB" off, after about a minute, the initial configuration will reappear and overwrite local changes.

    Workaround: When attempting to disable vIDM, do not toggle the External LB flag off; only toggle off vIDM Integration. This will cause that config to be saved to the database and synced to the other nodes.

  • Issue 2526769: Restore fails on multi-node cluster.

    When starting a restore on a multi-node cluster, restore fails and you will have to redeploy the appliance.

    Workaround: Deploy a new setup (one node cluster) and start the restore.

  • Issue 2523212: The nsx-policy-manager becomes unresponsive and restarts.

    API calls to nsx-policy-manager will start failing, with service being unavailable. You will not be able to access policy manager until it restarts and is available.

    Workaround: Invoke API with at most 2000 objects.
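One way to respect that limit from a client is to batch requests; a minimal generic sketch in Python (the invoke_api callable is a hypothetical placeholder for whatever client call you actually use):

```python
from typing import Callable, Iterable, List

def chunked(items: List[str], max_objects: int = 2000) -> Iterable[List[str]]:
    """Yield slices of at most max_objects items each."""
    for start in range(0, len(items), max_objects):
        yield items[start:start + max_objects]

def call_in_batches(invoke_api: Callable[[List[str]], None],
                    items: List[str]) -> int:
    """Invoke the (hypothetical) API once per batch of <= 2000 objects.

    Returns the number of batches sent."""
    batches = 0
    for batch in chunked(items):
        invoke_api(batch)
        batches += 1
    return batches
```

For example, 4500 objects would be sent as three calls of 2000, 2000, and 500 objects.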

  • Issue 2482580: IDFW/IDS configuration is not updated when an IDFW/IDS cluster is deleted from vCenter.

    When a cluster with IDFW/IDS enabled is deleted from vCenter, the NSX management plane is not notified of the necessary updates. This results in inaccurate count of IDFW/IDS enabled clusters. There is no functional impact. Only the count of the enabled clusters is wrong.

    Workaround: None.

  • Issue 2534933: Certificates that have LDAP based CDPs (CRL Distribution Point) fail to apply as tomcat/cluster certs.

    You can't use CA-signed certificates that have LDAP CDPs as cluster/tomcat certificate.

    Workaround: See VMware knowledge base article 78794.

  • Issue 2557287: TNP updates done after backup are not restored.

    You won't see any TNP updates done after backup on a restored appliance.

    Workaround: Take a backup after any updates to TNP.

  • Issue 2468774: When option 'Detect NSX configuration change' is enabled, backups are taken even when there is no configuration change.

    Too many backups are taken because backups occur even when there are no configuration changes.

    Workaround: Increase the time associated with this option, thereby reducing the rate at which backups are taken.

  • Issue 2523421: LDAP authentication does not work properly when configured with an external load balancer (configured with round-robin connection persistence).

    The API LDAP authentication won't work reliably and will only work if the load balancer forwards the API request to a particular Manager.

    Workaround: None.

  • Issue 2534921: Not specifying inter_sr_ibgp property in a PATCH API call will prevent other fields from being updated in the BgpRoutingConfig entity.

    The PATCH API call fails to update the BGP routing config entity with the error message "BGP inter SR routing requires global BGP and ECMP flags enabled." The BgpRoutingConfig is not updated.

    Workaround: Specify inter_sr_ibgp property in the PATCH API call to allow other fields to be changed.

  • Issue 2566121: A UA node stopped accepting any New API calls with the message, "Some appliance components are not functioning properly."

    The UA node stops accepting new API calls with the message, "Some appliance components are not functioning properly." There are around 200 connections stuck in the CLOSE_WAIT state; these connections are never closed, and new API calls are rejected.

    Workaround: Restart proton service (service proton restart) or restart unified appliance node.

  • Issue 2574281: Policy will only allow a maximum of 500 VPN Sessions.

    NSX supports 512 VPN sessions per Edge in the large form factor; however, because Policy auto-plumbs security policies, Policy allows a maximum of 500 VPN sessions. Upon configuring the 501st VPN session on Tier0, the following error message is shown:
    {'httpStatus': 'BAD_REQUEST', 'error_code': 500230, 'module_name': 'Policy', 'error_message': 'GatewayPolicy path=[/infra/domains/default/gateway-policies/VPN_SYSTEM_GATEWAY_POLICY] has more than 1,000 allowed rules per Gateway path=[/infra/tier-0s/inc_1_tier_0_1].'}

    Workaround: Use Management Plane APIs to create additional VPN Sessions.

  • Issue 2596696: NsxTRestException observed in policy logs when creating SegmentPort from the API.

    An NsxTRestException is observed in the policy logs, and the SegmentPort cannot be created using the API.

    Workaround: Either populate the Id field in PortAttachmentDto or pass it as null in the API input.
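A sketch of the two accepted input shapes; field names other than the attachment id are illustrative, not the exact PortAttachmentDto schema:

```python
import uuid
from typing import Optional

def segment_port_body(attachment_id: Optional[str]) -> dict:
    """Illustrative SegmentPort request body: the attachment 'id' is either
    explicitly populated or explicitly null -- never omitted."""
    return {
        "display_name": "example-port",       # illustrative name, not real schema
        "attachment": {"id": attachment_id},  # a real ID, or None (serialized as null)
    }

populated = segment_port_body(str(uuid.uuid4()))  # id populated
explicit_null = segment_port_body(None)           # id passed as null
```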

  • Issue 2628503: DFW rule remains applied even after forcefully deleting the manager nsgroup.

    Traffic may still be blocked when forcefully deleting the nsgroup.

    Workaround: Do not forcefully delete an nsgroup that is still used by a DFW rule. Instead, make the nsgroup empty or delete the DFW rule.

  • Issue 2631703: When doing backup/restore of an appliance with vIDM integration, vIDM configuration will break.

    Typically, when an environment has been upgraded and/or restored, attempting to restore an appliance where vIDM integration is up and running causes that integration to break, and you will have to reconfigure it.

    Workaround: After restore, manually reconfigure vIDM.

  • Issue 2638673: SRIOV vNICs for VMs are not discovered by inventory.

    SRIOV vNICs are not listed in Add new SPAN session dialog. You will not see SRIOV vNICs when adding new SPAN session.

    Workaround: None.

  • Issue 2647620: In an NSX configured environment with a large number of Stateless Hosts (TransportNodes), workload VMs on some Stateless hosts may lose connectivity temporarily when upgrading Management Plane nodes to 3.0.0 and above.

    This is applicable only to Stateless ESX Hosts configured for NSX 3.0.0 and above.

    Workaround: None.

  • Issue 2639424: Remediating a Host in a vLCM cluster with Host-based Service VM Deployment will fail after 95% Remediation Progress is completed.

    The remediation progress for a Host is stuck at 95% and then fails after the 70-minute timeout completes.

    Workaround: See VMware knowledge base article 81447.

  • Issue 2636855: Maximum capacity alarm raised when System-wide Logical Switch Ports is over 25K.

    A maximum capacity alarm is raised when system-wide Logical Switch Ports exceed 25K. However, for a PKS-scale environment, the limit for container ports is 60K, so more than 25K Logical Switch Ports is a normal case in a PKS environment.

    Workaround: None.

  • Issue 2636771: Search can return a resource tagged with multiple tag pairs if the tag and scope in the query match any tag/scope pair on the resource.

    This affects search queries with conditions on tag and scope. The filter may return extra data if the tag and scope match any pair.

    Workaround: None.

  • Issue 2643610: Load balancer statistics APIs are not returning stats.

    The statistics fields in the API response are not populated, so you cannot see load balancer statistics.

    Workaround: Reduce the number of load balancers configured.

  • Issue 2555383: Internal server error during API execution.

    Internal server error observed during API call execution. API will result in 500 error and not give the desired output.

    Workaround: This error is encountered because the session is invalidated. In this case, re-execute the session-creation API to create a new session.

  • Issue 2662225: When active edge node becomes non-active edge node during flowing S-N traffic stream, traffic loss is experienced.

    The current S->N stream is running on the multicast active node, and the preferred route on the TOR to the source should be through the multicast active edge node only.
    Bringing up another edge can take over as the multicast active node (the lower-rank edge becomes the active multicast node). The current S->N traffic will experience loss of up to four minutes. This does not impact new streams, nor the current stream if it is stopped and started again.

    Workaround: The current S->N traffic recovers automatically within 3.5 to 4 minutes. For faster recovery, disable and re-enable multicast through configuration.

  • Issue 2610851: Namespaces, Compute Collection, and L2VPN Service grid filtering might return no data for a few combinations of resource-type filters.

    Applying multiple filters to a few types at the same time returns no results even though matching data is available. This is not a common scenario; the filter fails on these grids only for the following combinations of filter attributes:

    • For Namespaces grid ==> On Cluster Name and Pods Name filter
    • For Network Topology page  ==> On L2VPN service applying a remote ip filter
    • For Compute Collection ==> On ComputeManager filter

    Workaround: You can apply one filter at a time for these resource types.

  • Issue 2587257: In some cases, PMTU packet sent by NSX-T edge is ignored upon receipt at the destination.

    PMTU discovery fails resulting in fragmentation and reassembly, and packet drop. This results in performance drop or outage in traffic.

    Workaround: None.

  • Issue 2587513: Policy shows error when multiple VLAN ranges are configured in bridge profile binding.

    You will see an "INVALID VLAN IDs" error message.

    Workaround: Create multiple bridge endpoints with the VLAN ranges on the segment instead of one with all VLAN ranges.

  • Issue 2682480: Possible false alarm for NCP health status.

    The NCP health status alarm may be unreliable: it can be raised even when the NCP system is healthy.

    Workaround: None.

  • Issue 2690457: When joining an MP to an MP cluster where publish_fqdns is set on the MP cluster and where the external DNS server is not configured properly, the proton service may not restart properly on the joining node.

    The joining manager will not work and the UI will not be available.

    Workaround: Configure the external DNS server with forward and reverse DNS entries for all Manager nodes.

  • Issue 2685550: FW Rule realization status is always shown as "In Progress" when applied to bridged segments.

    When applying FW Rules to an NSGroup that contains bridged segments as one of its members, realization status will always be shown as in progress. You won't be able to check the realization status of FW Rules applied to bridged segments.

    Workaround: Manually remove bridged segment from the NSGroup members list.

  • Issue 2694496: Accessing VDI through Web client/UAGs throws an error.

    When trying to access VDI from the Horizon portal, it times out with an error on port "22443".

    Workaround: Reboot the VDI.

  • Issue 2684574: If the edge has 6K+ routes for Database and Routes, the Policy API times out.

    The Policy APIs for the OSPF database and OSPF routes time out if the edge has 6K+ routes. These are read-only APIs, and there is an impact only if the API/UI is used to download 6K+ OSPF routes or database entries.

    Workaround: Use the CLI commands to retrieve the information from the edge.

  • Issue 2603550: Some VMs are vMotioned and lose network connectivity during UA nodes upgrade.

    During NSX UA nodes upgrading, you may find some VMs are migrated by DRS and lose network connectivity after the migration.

    Workaround: Change the DRS automation mode to manual before performing UA upgrade.

  • Issue 2622240: NVDS to CVDS Migration is triggered only for ESX upgrades that cross the 7.0.2 (X.Y.Z-U.P) release.

    Migration is not triggered for any "U.P" (update-patch) upgrades. The ESX version is specified as X.Y.Z-U.P, where X = Major, Y = Minor, Z = Maintenance, U = Update, P = Patch.

    Workaround: NVDS to CVDS migration must be started using the API/UI:
    POST https://{{nsxmanager-ip}}/api/v1/transport-nodes/{{transportnode-id}}?action=migrate_to_vds
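A minimal Python sketch of invoking that documented endpoint (the manager address, transport node ID, and credentials are placeholders; auth and TLS verification handling are environment-specific):

```python
def migrate_url(manager_ip: str, transport_node_id: str) -> str:
    """Build the documented migrate_to_vds endpoint URL."""
    return (f"https://{manager_ip}/api/v1/transport-nodes/"
            f"{transport_node_id}?action=migrate_to_vds")

def migrate_to_vds(manager_ip: str, transport_node_id: str, auth):
    """POST the migration action for one transport node.

    Requires the third-party 'requests' package; credentials and certificate
    verification must match your NSX Manager setup."""
    import requests  # third-party; assumed available in the caller's environment
    return requests.post(migrate_url(manager_ip, transport_node_id), auth=auth)
```

For example, migrate_url("10.0.0.1", "tn-1") produces the same URL shape as the documented POST above.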

  • Issue 2692344: If you delete the Avi Enforcement point, all realized objects, including all default objects' realized entities, are deleted from Policy, and adding a new enforcement point fails to re-sync the default objects from the Avi Controller.

    You will not be able to use the system-default objects after deletion and recreation of the Enforcement point of AVIConnectionInfo.

    Workaround: Do not delete the enforcement point. If there are any changes, update it instead of deleting it.

  • Issue 2636420: Host will go to "NSX install skipped" state and cluster in "Failed" state post restore if "Remove NSX" is run on cluster post backup.

    "NSX Install Skipped" will be shown for host.

    Workaround: After restore, run "Remove NSX" on the cluster again to return to the state that was present after backup (not-configured state).

  • Issue 2646702: IDS events detected by the appliance will not be preserved during Configuration Backup operation.

    After restore of configuration backup on to the new appliance, all previously detected IDS events cannot be retrieved and are not visible on the new appliance.

    Workaround: None.

  • Issue 2668717: Intermittent traffic loss might be observed for E-W routing between the vRA created networks connected to segments sharing Tier-1.

    In cases where vRA creates multiple segments and connects to a shared ESG, V2T will convert such a topology to a shared Tier-1 connected to all segments on the NSX-T side. During the host migration window, intermittent traffic loss might be observed for E-W traffic between workloads connected to the segments sharing the Tier-1.

  • Issue 2638674: Azure Mellanox driver maintenance upgrades can cause North-South traffic disruption.

    When Azure performs a maintenance event for Mellanox devices that involves a hot add of Mellanox device, PCGs that are in the North-South path might experience a North-South traffic loss.

    Workaround: Reboot the PCG to recover the edge datapath and restore North-South traffic connectivity.

  • Issue 2558576: Global Manager and Local Manager versions of a global profile definition can differ and might have an unknown behavior on Local Manager.

    Global DNS, session, or flood profiles created on Global Manager cannot be applied to a local group from UI, but can be applied from API. Hence, an API user can accidentally create profile binding maps and modify global entity on Local Manager.

    Workaround: Use the UI to configure the system.

  • Issue 2752246: NTLM/Server-keepalive enabled on L7 virtual server can cause Nginx Core when load balancer connection reuses port quickly.

    Load balancer service crashes due to Nginx core.

    Workaround: Disable the NTLM/Server-keepalive feature in http profile.

  • Issue 2730109: When the Edge is powering on, it attempts OSPF neighborship with the peer using its routerlink IP address as the OSPF router ID, even though a loopback is present.

    After the Edge reloads, OSPF selects the downlink IP address (the higher IP address) as the router ID until it receives the OSPF router-id configuration, due to the configuration sequencing order. The neighbor entry with the older router ID eventually becomes stale once an OSPF HELLO with the new router ID is received, and expires on the peer after the dead timer expires.

    Workaround: None.

  • Issue 2761589: Default layer 3 rule configuration changes from DENY_ALL to ALLOW_ALL on Management Plane after upgrading from NSX-T 2.x to NSX-T 3.x.

    This issue occurs only when rules are not configured via Policy, and the default layer 3 rule on the Management Plane has the DROP action. After upgrade, the default layer 3 rule configuration changes from DENY_ALL to ALLOW_ALL on Management Plane.

    Workaround: From the policy UI, set the action of the default layer 3 rule to DROP after the upgrade.
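
    The same change can be sketched as a Policy API PATCH rather than a UI action; the policy path below is illustrative and should be verified against the actual default layer 3 rule path in your environment:

```python
import json

# Illustrative policy path for the default layer 3 rule; verify the real
# path in your environment before use.
DEFAULT_L3_RULE_PATH = ("/infra/domains/default/security-policies/"
                        "default-layer3-section/rules/default-layer3-rule")

def build_drop_patch(manager: str) -> tuple:
    """Return (url, body) for a PATCH that sets the default rule action to DROP."""
    url = f"https://{manager}/policy/api/v1{DEFAULT_L3_RULE_PATH}"
    body = json.dumps({"action": "DROP"})
    return url, body

url, body = build_drop_patch("nsx-mgr.example.com")
```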

Installation Known Issues
  • Issue 2562189: Transport node deletion goes on indefinitely when the NSX Manager is powered off during the deletion operation.

    If the NSX Managers are powered off while transport node deletion is in progress, the transport node deletion may go on indefinitely if there is no user intervention.

    Workaround: Once the Managers are back up, prepare the node again and start the deletion process again.

Upgrade Known Issues
  • Issue 2693576: Transport Node shows "NSX Install Failed" after KVM RHEL 7.9 upgrade to RHEL 8.2.

    After the RHEL 7.9 to 8.2 upgrade, the dependencies nsx-opsagent and nsx-cli are missing, and the host is marked as install failed. Resolving the failure from the UI does not work: "Failed to install software on host. Unresolved dependencies: [PyYAML, python-mako, python-netaddr, python3]"

    Workaround: Manually install the NSX RHEL 8.2 vibs after the host OS upgrade and resolve it from the UI.

  • Issue 2550492: During an upgrade, the message "The credentials were incorrect or the account specified has been locked" is displayed temporarily and the system recovers automatically.

    Transient error message during upgrade.

    Workaround: None.

NSX Edge Known Issues
  • Issue 2283559: https://<nsx-manager>/api/v1/routing-table and https://<nsx-manager>/api/v1/forwarding-table MP APIs return an error if the edge has 65k+ routes for RIB and 100k+ routes for FIB.

    If the edge has 65k+ routes for RIB and 100k+ routes for FIB, the request from MP to Edge takes more than 10 seconds and results in a timeout. This is a read-only API and has an impact only if they need to download the 65k+ routes for RIB and 100k+ routes for FIB using API/UI.

    Workaround: There are two options to fetch the RIB/FIB.

    • These APIs support filtering options based on network prefixes or route type. Use these options to download only the routes of interest.
    • Use the CLI if the entire RIB/FIB table is needed; the CLI does not time out.
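
    For the filtering option, a small helper that appends filter parameters to the routing-table request; the parameter names (e.g., network_prefix) are illustrative, so consult the NSX-T API reference for the exact names your version supports:

```python
from urllib.parse import urlencode

def build_filtered_rib_url(manager: str, **filters: str) -> str:
    """Build a routing-table query that downloads only routes of interest
    instead of the full 65k+ RIB. Filter names here are illustrative."""
    base = f"https://{manager}/api/v1/routing-table"
    return f"{base}?{urlencode(sorted(filters.items()))}" if filters else base

url = build_filtered_rib_url("nsx-mgr.example.com", network_prefix="10.0.0.0/8")
```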
  • Issue 2521230: BFD status displayed under ‘get bgp neighbor summary’ may not reflect the latest BFD session status correctly.

    BGP and BFD set up their sessions independently. The 'get bgp neighbor summary' output also displays the BFD state, but if BGP is down, it does not process BFD notifications and continues to show the last known state, which may be stale.

    Workaround: Rely on the output of ‘get bfd-sessions’ and check the ‘State’ field to get the most up-to-date BFD status.

Security Known Issues
  • Issue 2491800: AR channel SSL certificates are not periodically checked for their validity, which could lead to using an expired/revoked certificate for an existing connection.

    The connection would be using an expired/revoked SSL certificate.

    Workaround: Restart the APH on the Manager node to trigger a reconnection.

  • Issue 2689449: Incorrect inventory may be seen if the Public Cloud Gateway (PCG) is rebooting.

    The managed state of managed instances is shown as unknown. Some inventory information, such as managed state, errors, and quarantine status, will not be available to the Cloud Service Manager.

    Workaround: Wait for PCG to be up, and either wait for periodic sync or trigger account sync.

Federation Known Issues
  • Issue 2630813: SRM recovery for compute VMs will lose all the NSX tags applied to VM and Segment ports.

    If a SRM recovery test or run is initiated, the replicated compute VMs in the disaster recovery location will not have any NSX tags applied.

  • Issue 2601493: Concurrent config onboarding is not supported on Global Manager in order to prevent heavy processing load.

    Although parallel config onboarding executions do not interfere with each other, multiple concurrent executions would make the Global Manager slow and sluggish for other operations.

    Workaround: Security admins/users must coordinate maintenance windows to avoid initiating config onboarding concurrently.

  • Issue 2613113: If onboarding is in progress, and restore of Local Manager is done, the status on Global Manager does not change from IN_PROGRESS.

    UI shows IN_PROGRESS in Global Manager for Local Manager onboarding. Unable to import the configuration of the restored site.

    Workaround: Use the Local Manager API to start the onboarding of the Local Manager site, if required.

  • Issue 2625009: Inter-SR iBGP sessions keep flapping, when intermediate routers or physical NICs have lower or equal MTU as the inter-SR port.

    This can impact inter-site connectivity in Federation topologies.

    Workaround: Keep the pNIC MTU and intermediate routers' MTU larger than the global MTU (i.e., the MTU used by the inter-SR port). Because of encapsulation, packet size exceeds the underlay MTU and the packets do not go through.
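
    The arithmetic behind this workaround can be sketched as follows; the 100-byte headroom mirrors the common Geneve-encapsulation guidance (outer Ethernet/IP/UDP/Geneve headers plus options) and is an assumption to validate against your deployment:

```python
# Assumed Geneve encapsulation headroom (outer headers + options), following
# the common guidance of keeping the underlay MTU >= overlay MTU + 100 bytes.
GENEVE_HEADROOM = 100

def min_underlay_mtu(global_mtu: int) -> int:
    """Smallest pNIC / intermediate-router MTU that avoids dropping
    encapsulated inter-SR packets of up to `global_mtu` bytes."""
    return global_mtu + GENEVE_HEADROOM

def inter_sr_session_stable(global_mtu: int, pnic_mtu: int) -> bool:
    # Sessions flap when encapsulated packets exceed the underlay MTU.
    return pnic_mtu >= min_underlay_mtu(global_mtu)
```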

  • Issue 2606452: Onboarding is blocked when trying to onboard via API.

    Onboarding API fails with the error message, "Default transport zone not found at site". 

    Workaround: Wait for fabric sync between Global Manager and Local Manager to complete.

  • Issue 2643749: Unable to nest a group from a custom region created on a specific site into a group that belongs to the system-created site-specific region.

    You will not see the group created in the site-specific custom region when selecting a child group as a member of a group in the system-created region for the same location.

  • Issue 2649240: Deletion is slow when a large number of entities are deleted using individual delete APIs.

    It takes significant time to complete the deletion process.

    Workaround: Use hierarchical API to delete in bulk.

  • Issue 2649499: Firewall rule creation takes a long time when individual rules are created one after the other.

    Creating rules one at a time through the API is slow.

    Workaround: Use the hierarchical API to create rules in bulk.
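
    As an illustration of the hierarchical approach, a single PATCH to /policy/api/v1/infra can carry many rules at once. The nesting below follows the Policy hierarchical-API child-wrapper pattern, but field names should be verified against your version's API reference:

```python
def build_bulk_rules_payload(domain_id: str, policy_id: str, rules: list) -> dict:
    """Wrap many firewall rules into one hierarchical-API body so they are
    created in a single PATCH instead of one API call per rule."""
    child_rules = [
        {"resource_type": "ChildRule",
         "Rule": {**rule, "resource_type": "Rule"}}
        for rule in rules
    ]
    return {
        "resource_type": "Infra",
        "children": [{
            "resource_type": "ChildDomain",
            "Domain": {
                "resource_type": "Domain",
                "id": domain_id,
                "children": [{
                    "resource_type": "ChildSecurityPolicy",
                    "SecurityPolicy": {
                        "resource_type": "SecurityPolicy",
                        "id": policy_id,
                        "children": child_rules,
                    },
                }],
            },
        }],
    }

payload = build_bulk_rules_payload(
    "default", "app-policy",
    [{"id": f"rule-{i}", "action": "ALLOW"} for i in range(3)])
```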

  • Issue 2652418: Slow deletion when large number of entities are deleted.

    Deletion will be slower.

    Workaround: Use the hierarchical API for bulk deletion.

  • Issue 2655539: Host names are not updated on the Location Manager page of the Global Manager UI when updating the host names using the CLI.

    The old host name is shown.

    Workaround: None.

  • Issue 2658687: Global Manager switchover API reports failure when transaction fails, but the failover happens.

    API fails, but Global Manager switchover completes.

    Workaround: None.

  • Issue 2630819: Do not change Local Manager certificates after the Local Manager is registered on the Global Manager.

    When Federation and PKS need to be used on the same LM, the PKS tasks to create an external VIP and change the LM certificate should be done before registering the LM on the GM. If done in the reverse order, communication between LM and GM will not be possible after the LM certificate change, and the LM has to be registered again.

  • Issue 2658092: Onboarding fails when NSX Intelligence is configured on Local Manager.

    Onboarding fails with a principal identity error, and you cannot onboard a system with a principal identity user.

    Workaround: Create a temporary principal identity user with the same principal identity name that is used by NSX Intelligence.

  • Issue 2622576: Failures due to duplicate configuration are not propagated correctly to user.

    While onboarding is in progress, you see an "Onboarding Failure" message.

    Workaround: Restore Local Manager and retry onboarding.

  • Issue 2679614: When the API certificate is replaced on the Local Manager, the Global Manager's UI will display the message, "General Error has occurred."

    Workaround:
    1. Open the "Location Manager" of the Global Manager UI.
    2. Click the "ACTION" tab under the affected Local Manager and then enter the new thumbprint.
    3. If this does not work, off-board the Local Manager and then re-onboard the Local Manager.
  • Issue 2681092: You can switch from the active Global Manager to the stand-by Global Manager even when the certificate of the latter has expired.

    The expired certificate on the standby Global Manager continues to allow communication when it shouldn't.

    Workaround: Ensure that certificates have not expired. Alarms are raised when certificates are about to expire.

  • Issue 2663483: The single-node NSX Manager will disconnect from the rest of the NSX Federation environment if you replace the APH-AR certificate on that NSX Manager.

    This issue is seen only with NSX Federation and a single-node NSX Manager cluster.

    Workaround: Single-node NSX Manager cluster deployment is not a supported deployment option; deploy a three-node NSX Manager cluster instead.
