VMware NSX 4.2.1 | 10 OCT 2024 | Build 24304122
Check for additions and updates to these release notes.
This release resolves CVE-2024-38818, CVE-2024-38817, and CVE-2024-38815. For more information on these vulnerabilities and their impact on VMware products, see VMSA-2024-0020.
NSX 4.2.1 provides a variety of new features, offering new functionalities for virtualized networking and security for private clouds. Highlights include new features and enhancements in the following focus areas:
NSX now supports VRF configuration at the Global Manager level. This allows customers a single place to configure stretched T0 Gateways.
A number of enhancements that help you provide high availability for networking including TEP Groups, monitoring for dual DPU deployments, and improved alarms on the NSX Edge are now supported.
As part of our multi-tenancy feature set we now offer the exposure of NSX Projects as folders in VMware vCenter.
Custom IDPS Signatures: VMware vDefend Advanced Threat Prevention (ATP) now includes custom signature support, enabling customers to import their own signatures or Suricata-based signatures from third-party threat feeds. Both Distributed IDS/IPS and Gateway IDS/IPS can enforce these custom signatures. With this added layer of security and flexibility, customers are better equipped to protect their organizations against advanced threats.
Gateway Firewall scale enhancements for multiple scenarios, including support for up to 2500 rules per section.
Security operational enhancements including additional alerts.
Malware File Analysis Test Drive: vDefend Firewall users can now test drive Malware File Analysis, allowing customers to upload artifacts (files/URLs) for comprehensive malware analysis and obtain verdicts especially for highly evasive malware and zero day threats via the VMware Advanced Threat Cloud Service.
In addition, many other capabilities are added in every area of the product. More details are available below.
Networking
Layer 2 Networking
High Availability for TEP Groups: NSX 4.2 introduced TEP Groups, enabling virtual network traffic to be load balanced across multiple Tunnel Endpoints on the ESXi host. This release adds High Availability to those TEP Groups, based on the Bidirectional Forwarding Detection (BFD) session state, ensuring continuity of workload network traffic when a TEP Group member is brought down.
Policy APIs for NSX Enhanced Data Path: Enables REST API-based automated configuration and monitoring of Enhanced Data Path operation. Prior to this release, equivalent operations were available only in the NSX CLI.
DPU Monitoring is provided as data plane counters in the NSX UI and API, consistent with the RX/TX counter detail presented for performance NICs.
Layer 3 Networking
5-tuple hashing for ESXi Transport Node ECMP: Enhances the configuration workflow of 5-tuple based hashing for ESXi transport nodes. The legacy 5-tuple configuration is a global parameter that affects all nodes. This option can now be enabled per host or per TNP (Transport Node Profile). The configuration is available only via API.
Inter-SR Support for Tier-0 VRF Gateways: Introduces support of Inter-SR Routing for Tier-0 VRF Gateways in Active/Active Edge node deployments in addition to the existing Inter-SR support for the parent Tier-0. Deployments using VRFs will be able to deploy Active/Active Edge nodes with improved availability.
Edge Platform
Clean Edge Stale Entry: Enables clean up of stale Edge entries, simplifying the management of your NSX environment for a more streamlined experience.
Auto-Refresh Edge Node Settings: Automatically refreshes settings that have been overwritten via vCenter or Edge CLI, providing a unified experience for managing Edge node configurations.
Edge Alarms: Enhances visibility by introducing more detailed alarms, such as those triggered by Edge Datapath Deadlock detection, read-only datastore status, and low disk space issues, helping you respond to critical events more effectively.
Security
Gateway Firewall
Increased scale for vDefend Gateway Firewall rules per section: now supports up to 2500 rules per Gateway Firewall section. Gateway Firewall alarms for scale limits:
System-wide gateway firewall rules: Alarm when GFW system wide rule scale limit is met.
Maximum rules per Edge: Alarm to alert when GFW rules reach maximum supported rules on an Edge.
Maximum SRs and bridges per Edge: Alert when maximum supported gateways / bridges hosted per Edge are met.
Distributed Firewall
Security Usage Report is now available as a CSV File, with details on current security features in use and associated license core counts for vDefend Firewall and vDefend Advanced Threat Prevention. Introduces a new NSX API to collect security usage data using "GET /license/security-usage?format=csv".
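A minimal sketch of pulling this report with curl, assuming basic authentication with an admin account; the path is shown as documented above and may need the standard API prefix for your deployment (see the NSX API Guide):
curl -k -u 'admin:<password>' "https://<nsx-manager>/license/security-usage?format=csv" -o security-usage.csv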
Intrusion Detection & Prevention System (IDS/IPS)
Custom Signature Support: vDefend IDS/IPS now supports the ability to upload your own custom Suricata-based signatures for use in both the Gateway and Distributed IDS/IPS feature sets.
Signature Thresholds: vDefend IDS/IPS now supports advanced system signature functionality to configure alerting thresholds for signatures to reduce noisy signatures and also change actions on detection of threats.
Malware Prevention
Malware Protection Test Drive: Lets VMware Firewall users leverage malware protection capabilities without requiring NAPP or an ATP SKU. Once activated, you can upload artifacts (files/URLs) for comprehensive malware analysis via our cloud backend, which includes both static and dynamic analysis.
Platform and Operations
Automation
Enhanced tooling to migrate from Management API to Policy API: The Management to Policy tooling has been enhanced to support the promotion of Management API objects (MP API) to Policy objects in cases of mixed mode - when configurations from both API (MP and Policy) are intertwined in same setup.
Multi-Tenancy and Virtual Private Cloud
Networking Folders in vCenter for Projects & VPC: Introduces the ability to have NSX-managed folders in the vCenter Networking tab reflecting Projects and VPCs. You can now have vSphere PortGroups organized according to the NSX tenancy model, enhancing visibility of network tenancy from vCenter and allowing you to apply RBAC on networks from a specific Project or VPC. This feature has requirements on the vSphere and vCenter versions; review the documentation.
Lifecycle Management (Installation and Upgrade) & Certificate Management
Upgrade: Support for in-place upgrade for VUM-based clusters in VCF deployments thus allowing for faster host upgrades.
Certificate Management:
NSX automatically renews some self-signed X.509 certificates before they expire. Renewed certificates preserve their values and properties, such as key length and validity period.
NSX UI now allows users to enter Subject Alternative Names (SAN) to create certificates. Two standard formats are supported: A list of FQDNs with wildcards, like *.example.com, and a list of IP addresses.
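For illustration, the two supported SAN list formats look like the following (values are hypothetical):
*.example.com, nsx01.example.com
10.10.10.11, 10.10.10.12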
Scale
VMware has made several updates to the maximum scale supported by NSX. Details are provided on the VMware Configuration Maximums tool.
Federation
Tier-0 VRF Gateway Support Federation: This release introduces support for Tier-0 VRF Gateway configuration from Global Manager. Tier-0 VRF Gateway across multiple locations is also supported with the same topologies and operation modes as previously supported for stretched parent Tier-0.
1. Feature Deprecations
Reminder of the deprecation for NSX Manager APIs and NSX Advanced UIs
2. Entitlement Changes
Entitlement Change for the NSX Load Balancer
In a future major release of NSX, VMware intends to change the entitlement of the built-in NSX load balancer (a.k.a. NSX-T Load Balancer). This load balancer will only support load balancing for Aria Automation, IaaS Control Plane (Supervisor Cluster), and load balancing of VCF infrastructure components.
VMware recommends that customers who need general purpose and advanced load balancing features purchase Avi Load Balancer. Avi provides a superset of the NSX load balancing functionality including GSLB, advanced analytics, container ingress, application security, and WAF.
Existing entitlement to the built-in NSX load balancer for customers using NSX 4.x will remain for the duration of the NSX 4.x release series.
3. API Deprecations and Behavior Changes
Following industry-wide recommendations to enhance security and encryption of data for clients that consume NSX services, support for Transport Layer Security (TLS) 1.1 is terminated in NSX.
Starting with NSX 4.2, TLS 1.2 and 1.3 are the supported versions. However, if you upgrade to NSX 4.2 from a previous NSX release that uses TLS 1.1, NSX will maintain TLS 1.1.
Users are advised to use tools that are compatible with TLS 1.3.
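As a quick, optional check that your tooling and the NSX Manager endpoint can negotiate TLS 1.3, a sketch using OpenSSL (requires OpenSSL 1.1.1 or later; substitute your manager FQDN):
openssl s_client -connect <nsx-manager>:443 -tls1_3 < /dev/null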
To simplify API consumption, refer to the new pages containing a list of deprecated and removed APIs and Types in the NSX API Guide.
Removed APIs: To review the removed APIs from NSX, view the Removed Methods category in the NSX API Guide. It lists the APIs removed and the version when removed.
Deprecated APIs: To review the deprecated APIs from NSX, view the Deprecated Methods category in the NSX API Guide. It lists the deprecated APIs still available in the product.
For compatibility and system requirements information, see the VMware Product Interoperability Matrices and the NSX Installation Guide.
For instructions about upgrading NSX components, see the NSX Upgrade Guide.
This release is not supported for NSX Cloud customers deployed with AWS/Azure workloads. Do not upgrade your environment in this scenario.
Note: Customers upgrading to NSX 3.2.1 or earlier are recommended to run the NSX Upgrade Evaluation Tool before starting the upgrade process. The tool is designed to ensure success by checking the health and readiness of your NSX Manager repository prior to upgrading. For customers upgrading to NSX 3.2.2 or later, the tool is already integrated into the Upgrade workflow as part of the upgrade pre-checks; no separate action is needed.
Upgrade Integration Issues Due to Download Site Decommission
The NSX upgrade experience is impacted due to the decommissioning of downloads.vmware.com. See knowledge base article 372634 before upgrading.
Beginning with the next major release, we will be reducing the number of supported localization languages. The three supported languages will be:
Japanese
Spanish
French
The following languages will no longer be supported:
Italian, German, Korean, Traditional Chinese, and Simplified Chinese.
Impact:
Customers who have been using the deprecated languages will no longer receive updates or support in these languages.
All user interfaces, help documentation, and customer support will be available only in English or in the three supported languages mentioned above.
Because NSX localization utilizes the browser language settings, ensure that your settings match the desired language.
Revision Date | Edition | Changes
---|---|---
October 9, 2024 | 1 | Initial edition
October 10, 2024 | 2 | Added CVE-2024-38817 to resolved security vulnerabilities in this release, updated API Deprecation section, added known issue 3352650, and updated known issue 3432205.
October 17, 2024 | 3 | Moved 3407134 to resolved issues.
Fixed Issue 3407134: In NSX-T versions prior to 3.1.2 users may experience the root partition getting full.
In rare cases, the root directory may get filled up with remnants of a third-party vendor repo that is no longer used. This may cause an upgrade to fail. Starting with VMware NSX-T Data Center 3.1.2, a different vendor was used and these signatures are no longer required.
Workaround: While upgrading to NSX-T version 3.1.2 and later, remove the /home/secureall/secureall/policy/trustwave-repo folder to clear partition space and restart the upgrade. For details, refer to the knowledge base article 372374.
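A sketch of the folder removal, assuming a root shell on the affected NSX Manager node:
rm -rf /home/secureall/secureall/policy/trustwave-repo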
Fixed Issue 3394991: SHA would crash if it connects to NAPP by FQDN and a network disruption happens with no DNS cache.
A crash core dump file would be generated, raising a crash alert on the UI.
Fixed Issue 3405911: In rare scenarios, nsx-exporter on ESX host may crash.
There will be a brief service interruption, which may delay data updates on the MP. The nsx-exporter process restarts automatically and resumes operation.
Fixed Issue 3432205: Auditor users can acquire Enterprise Admin privileges when groups are used.
Users may get higher permissions due to corruption in the user-role bindings when they belong to groups named with lowercase letters.
Fixed Issue 3429630: UDP packets from source port 53 are allowed on the LM/GM.
UDP packets from source port 53 are allowed on the LM/GM. Security scanners will flag this.
Fixed Issue 3421157: Advertised static routes over inter-vrf routing are not deleted on VRF delete.
On a target VRF, users will see stale advertised routes.
Fixed Issue 3421098: After replacing CBM_CLUSTER_MANAGER certificate on manager nodes, the cluster status stays DEGRADED forever.
The manager components are not functioning properly, and UI is not available.
Fixed Issue 3408608: Configuration mismatch for Edge node network interface fails to resolve even with a valid segment or distributed port group connected.
Fixed Issue 3408318: Edges created prior to 4.2.0, without cliUsername, were not able to upgrade.
No new configurations can be updated on the Edge transport node.
Fixed Issue 3402485: IPDiscovery profile on Segment/Port falls back to Default configuration for a brief period during updates to the Segment/Port - Profile association.
DFW rules can behave in an unexpected manner.
Fixed Issue 3402208: If TCP multiplexing is enabled in the LB pool, some HTTP packets to the LB L7 virtual server are not forwarded to the correct TCP port of the pool member and the HTTP connection is broken.
Some HTTP connections to LB L7 virtual server cannot be established.
Fixed Issue 3420193: Logs that have a newline character embedded in them are logged across multiple lines.
Some logs span multiple lines.
Fixed Issue 3417349: A traffic disruption (2 to 3 seconds) when vMotion occurs if the vMotioned VM is the first VM on the host.
North-south traffic sees a 2 second disruption when vMotion occurs if the vMotioned VM is the first VM on the host.
Fixed Issue 3415013: The ND Suppression cache for an LS bound to the DR (ESX) is not getting cleaned up.
Traffic to the Gateway from a VM will get dropped or lost.
Fixed Issue 3413637: Discrepancy in VTEP HA Switch Profile when it has been configured from the API and UI.
Customers can miss the multi TEP HA profile configuration from the host switch, and it may lead to the TEP HA feature getting disabled on the host.
Fixed Issue 3409559: XLarge LB Perf Profile cannot be found on the Edge while applying it via the Edge CLI.
LB service performance cannot be optimized according to the LB Perf Profile.
Fixed Issue 3405942: "service_status_unknown" alerts are raised for the Messaging-manager service.
Fixed Issue 3405672: NSX-T VDR cannot resolve the SR backplane MAC when the VM and Edge are on the same host.
This was caused by two different values for the same config key (com.vmware.port.extraConfig.vdl2.nestedTNConfig) in the logical port.
Fixed Issue 3402015: Brownfield configuration import failed because a trace-flow configuration exists on the LM.
Users see an error message on the UI or in the API response of the start configuration onboarding API indicating failure of the config onboarding.
Users will see a message to restore the LM from backup.
Fixed Issue 3399480: VMs are still present in groups even though their tags have been deleted.
Customer may find VMs still present in groups even though their tags have been deleted.
Fixed Issue 3395382: The Liagent service is in the running state prior to the NSX upgrade when a Log Insight server is configured. The Liagent service will not run after the NSX upgrade, even if Log Insight servers are configured in the NSX appliance.
Customers have to start the Liagent service manually after the upgrade.
Fixed Issue 3394029: The Policy Latency Profile is stuck deleting.
When deleting the Policy Latency profile, it gets stuck deleting due to a Corfu update conflict.
Fixed Issue 3429123: Datapath CPU utilization in the monitor tab of the Edge shows N/A when it is 0%. This is a UI-only issue.
This might misinform the customer about the Datapath CPU utilization of the Edge, showing that the metric is not available whereas it has a value of 0 percent.
Fixed Issue 3409177: In rare scenarios, alarms generated in the UA may be marked as resolved after a restart of the nsx-opsagent-appliance service.
An open alarm might be resolved despite the alarm condition still persisting in the system if nsx-opsagent-appliance was restarted.
Fixed Issue 3408734: After upgrade and reboot, vmware-snmpd is not running in NSX Manager.
SNMP functionality in NSX Manager is impacted.
Fixed Issue 3371076: Static Route Entry doesn't appear on Tier-0 SR.
nsd still has a stale entry that has the same ifindex.
Fixed Issue 3422927: An NSX Edge node interface deployed from the GM is not realized on the LM Edge.
Uplink deletion gets stuck completely.
Fixed Issue 3422686: Interface (uplink) deletion during asynchronous crypto operations (enqueue/dequeue) resulted in an invalid iface reference in the mbuf, causing a datapathd crash.
Edge failover happens due to DP crash on edge. Traffic flowing through the Edge node may get interrupted for a small duration (failover duration).
Fixed Issue 3420120: BGP Neighbor API fails with the error "Invalid route map (error code 503056)."
Users are unable to make BGP configuration changes, which affects operations.
Fixed Issue 3411831: When a locale other than English is used in the browser and the Uplink profile is saved, it does not get saved properly due to teaming policies and ends up in a validation error.
Users are unable to create an Uplink profile with non-English locales.
Fixed Issue 3407006: Global config GlobalReplicationMode and tep-group feature should both not be enabled at the same time.
Incorrect configuration is pushed to the dataplane. A BFD session is formed between edges.
Fixed Issue 3405848: Multiple NSX-backed VLANs went down during the NSX Manager upgrade due to incorrect bridge configuration being pushed.
Fixed Issue 3402602: After edge failover, HA status of edge nodes is not reflected immediately on UI. It shows stale information.
After failover, HA status on UI will get updated after 5 minutes.
Fixed Issue 3399603: LDAP users are unable to access NSX.
LDAP users see the error "Access Denied" after logging into NSX.
Fixed Issue 3396695: When using the Transport Node Collection API, an NPE will be thrown if the Transport Node Profile (TNP) ID is not specified.
The TNC will not be created and an NPE is thrown to the user.
Fixed Issue 3396017: Operations db disk usage high alarm is raised for high disk usage in /nonconfig folder.
No functional impact.
Fixed Issue 3395506: When stale host transport node entries are detected, the user is notified with a clear message indicating the next steps.
When stale host transport node entries are detected, the user receives a clear error message with the next steps to take.
Fixed Issue 3402143: Uninstalling NSX security-only on a cluster fails.
Customers won't be able to uninstall NSX security-only on the cluster.
Fixed Issue 3392551: Core dump happened on a Nginx process on a T1 standby edge node.
A core dump happened on an Nginx process on a T1 standby edge node. This issue occurred during load balancer reconfiguration and HA syncing.
Alarms were triggered, and the problematic load balancer service was downgraded to a standalone mode for a while.
Fixed Issue 3392421: Memory of SHA on UA keeps increasing due to memory leak.
The memory usage on Manager node keeps increasing, even hitting 90% and raising alarms.
Fixed Issue 3392316: If all DPUs are down in a SmartNIC setup, NSX world agent's connections to NSX proxy get marked as down.
This results in Aggregation service on MP marking unrelated component status (eg. vmnic0) as down due to heartbeat timeout with the user world agents on the Host. For a SmartNIC setup, vmnic0 PNIC interface status on the UI will show as down.
Fixed Issue 3391640: In Federation setup, Local Manager to Global Manager connectivity was broken.
Local Manager to Global Manager connectivity was broken.
Fixed Issue 3385062: The TransportNodeProfile was stored with a transportZone path starting with /infra in transport_zone_endpoints. Attempts to fetch it return the /global-infra transportZone in transport_zone_endpoints.
This caused unexpected behavior on the UI side.
Fixed Issue 3382950: Error logs about "Failed to get VIF" for the VDR port.
The issue occurred when end-to-end latency was enabled on the customer's setup, and Configuration Errors were reported in Log Insight.
Fixed Issue 3382874: If a customer connects to NSX with their browser, is redirected to VIDM for login, and then takes no action for a long time, then login may fail or the browser may be redirected to a VIDM page instead of NSX.
Login fails or browsers are redirected to the wrong location.
Fixed Issue 3382202: LDAP users with certain special characters in their usernames or passwords cannot log into NSX.
Customers with LDAP users with special characters in their usernames or passwords cannot log in to NSX.
Fixed Issue 3380334: T0 SR routing is down because service_frr is not running.
In some rare conditions, when the system hits resource and storage issues, NSD fails to start service_frr.
Fixed Issue 3380317: The NSX Manager cluster becomes unavailable in the UI and no appliance/manager node is visible when the total count of TNs is high.
Customers may see a delay in getting the cluster status in the NSX Manager UI in some race conditions (when the total count of TNs is high).
Fixed Issue 3371543: Active phase-1 IKE SA memory was leaked in the quicksec code. Due to this, all sessions go down one by one with the reason "Out of memory".
As sessions go to Down status, there is traffic impact.
Fixed Issue 3297011: In the case of multiple ESP SPIs for a single tunnel, NSX Edge may start using an old ESP SPI which is not acceptable to a few VPN vendors. These vendors silently drop the ESP packets sent by NSX Edge.
The VPN datapath for the specific tunnel will be disrupted. BGP over RBVPN will also go down.
Fixed Issue 3396626: Unable to search with IP address in Firewall page on Global Manager using domain credentials.
VIDM- or AD-integrated users are unable to perform firewall rule searches using IP Addresses/VMs. The same search works properly if logged in using the admin user.
Fixed Issue 3427000/3431505: PSOD can occur during vMotion import.
PSOD can occur if the vMotion import connection state information contains an extreme number (millions) of L7 attributes.
Fixed Issue 3415047: NSX Edge VM fails to start after deployment, with the datapath down.
After the upgrade OS reboot, the dispatcher and dataplane services fail and will not start on the Edge. This is caused by some older Intel CPUs and results in the upgrade pausing.
Fixed Issue 3410308: DFW policy installation failed due to time-based rules when the payload has white space while creating the firewall scheduler.
Rule realization will be in a failed state in the UI with the error "internal error occurred on firewall kernel." New and existing rules will not be realized on the hosts.
Fixed Issue 3407134: NSX-T upgrades for versions prior to 3.1.2 cannot complete due to root partition size.
The root directory is filled up with remnants of a third-party vendor repo that is no longer used and causes the upgrade to fail. Starting with VMware NSX-T Data Center 3.1.2, a different vendor was used and these signatures are no longer required.
Workaround: Remove the /home/secureall/secureall/policy/trustwave-repo folder to clear partition space and restart the upgrade. For details, refer to the knowledge base article 372374.
Fixed Issue 3401797: Firewall rules are not displayed correctly in View Gateway Firewall for a Tier-1 gateway.
Rules display inaccurately, and VIDM- or AD-integrated users cannot use IP addresses or VMs to search firewall rules.
Fixed Issue 3401745: Unable to reorder security policies due to an internal failure with sequence_number -1.
When attempting to create the 200th security policy in NSX using the action=revise&operation=insert_bottom API, the API fails.
Fixed Issue 3399527: NSX upgrade from 3.2.3 to 4.1.2.1 fails due to StaleFirewallSectionEntityBarrierCleanUpMigrationTask.
Upgrade stops and is unable to proceed.
Fixed Issue 3377639: DHCP relay traffic is dropped by IDPS if the DHCP relay is configured on a Tier-1 gateway.
DHCP traffic is interrupted for a few minutes.
Fixed Issue 3431076: Routes are missing in the T0 user VRF.
Traffic relying on those routes is dropped or blackholed.
Fixed Issue 3427488: Users who are added to NSX via LDAP groups may receive elevated permissions that they should not have.
LDAP users may be able to modify NSX objects when they should not be able to.
Fixed Issue 3423329: After upgrading to NSX 4.2, if an LDAP identity source was configured prior to the upgrade, resolution of nested LDAP groups is incorrectly turned off.
Users cannot log on to the NSX Manager using LDAP.
Fixed Issue 3411698: Cross-subnet traffic drops after Edge bridge fails back on exiting maintenance mode.
Traffic is lost from seconds to minutes after active bridge fail back.
Fixed Issue 3402448: Edge-agent crashes due to a segmentation fault.
The Edge-agent restarts after the crash. Certain Edge operations do not work during this time.
Fixed Issue 3401519: After toggling the gateway connectivity (off--on) of a downlink segment with bridging, VLAN to VNI cross-subnet traffic is down after edge bridge failover.
Traffic is down until the ARP resolution timeout.
Fixed Issue 3398452: In extra-large deployments, the capacity dashboard shows the label "Maximum capacity displayed above is for large appliance only."
The footer shows incorrect information.
Fixed Issue 3395578: After uninstalling security from a cluster, the discovered DVPG is not deleted if it is a part of a dynamic group.
Even after the group is deleted, the discovered DVPG will not get deleted. This could result in stale discovered DVPGs on the MP.
Fixed Issue 3357794: Some Baremetal Edges with in-band mgmt configured may lose management connectivity when entering maintenance mode.
If performing an Edge upgrade, management connectivity may be lost when the Edge enters maintenance mode at the beginning of the upgrade, which will cause the upgrade to fail.
Fixed Issue 3431505: PSOD can occur during vMotion import.
PSOD can occur if the vMotion import connection state information contains an extreme number (millions) of L7 attributes.
Fixed Issue 3361383: Advanced Threat Prevention feature may take excessive time to analyze and provide results of file inspection during high load periods.
This occurs when many files are downloaded in a short period (for example, a guest OS upgrade is performed). During this time the result of malware analysis is delayed. With extreme loads, it might take from 1 to 4 hours for results to arrive.
Fixed Issue 3410849: User is able to delete policies/sections in the filtered view that could unintentionally delete rules not visible in the filtered result set.
Filtered views show only a subset of rules. Deleting an entire policy based on a filtered view could unintentionally remove rules that are not visible in the filtered result set.
Fixed Issue 3393742: The Malware SVM doesn't get an IP from the static pool.
This occurs when, on the MPS Service Deployment UI screen, you choose Static IP allocation, do not select any IP Pool from the list, and choose to deploy.
Fixed Issue 3298108: During maintenance mode upgrade to NSX 4.1.2 with ESX version at 8.* or ESX version upgrade to 8.* with NSX version at 4.1.2, underlay gateway information is lost, resulting in overlay datapath outage.
Downtime due to overlay VM traffic outage may occur.
Issue 3352650: Duplicate VTEP IPs may get assigned to host and Edge TNs.
In certain user misconfigurations and other corner cases, tunnels go down for the impacted TNs and recover after reconfiguration.
Workaround: Uninstall all TNs (hosts/Edges) where duplicates are seen, then re-install the TNs.
Issue 3417784: NSX Manager deletion stuck in UI at 1% for failed NSX Manager.
Customer can't delete NSX Manager.
Workaround: Delete NSX Manager using one of the following APIs:
POST https://<manager_ip>/api/v1/cluster/nodes/deployments/<deleted-vm-id>?action=delete
POST https://<manager_ip>/api/v1/cluster/nodes/deployments/<deleted-vm-id>?action=delete&force_delete=true
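A sketch of invoking the first form with curl, assuming basic authentication with an Enterprise Admin account (substitute the manager IP, credentials, and the VM ID of the failed manager):
curl -k -u 'admin:<password>' -X POST "https://<manager_ip>/api/v1/cluster/nodes/deployments/<deleted-vm-id>?action=delete"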
Issue 3405216: Upgrade might fail if nsxdp-cli / nsxcli commands are run in the background once the upgrade is initiated.
Unload of NSX-T modules can fail, causing upgrade failure.
Workaround: Do not run nsxdp-cli / nsxcli commands until the upgrade completes.
Issue 3355669: During VLAN-to-overlay migration, VMs are allowed to migrate despite a lack of bridging between the VLAN and segment.
If the customer unknowingly migrates the VM without the check, they can continue the migration with the VM data path broken.
Workaround: None.
Issue 3432380: IPFIX data received is different than the actual data flowing between VMs.
When the total number of flows for a VM exceeds 100K, the IPFIX active stats are not accurate and can be counted multiple times due to incorrect updates to active flows. vRNI utilizes DFW IPFIX to tabulate the amount of data exchanged between VMs.
Workaround: If possible, reduce total active flows for one filter to less than 100K. Alternatively, increase the threshold to 150K active flows per filter in nsx-exporter. Long active flows won't be treated as new flows.
Issue 3429787: TN Flow Exporter disconnected alarm appears on UI sometimes.
Flows failed to reach Security Intelligence and nsxcli showed zero acknowledged counters.
Workaround: Restart the exporter using /etc/init.d/nsx-exporter restart. After the restart, the exporter will use the latest host certificate for the SSL negotiation with the Kafka broker.
Issue 3391073: For high-scale setups with over 20K rules, the CLI runs out of memory after a 30-minute wait.
In customer environments, customers won't run this command; the CLI is for internal debugging purposes only. The CLI cannot handle the number of configured rules. This happens only under heavy stress.
Workaround: Use the following command that runs in less than a minute:
edge-appctl -t /var/run/vmware/edge/dpd.ctl fw/show 1a631fa9-d790-41bb-8eff-0cd036d5d7c4 ruleset | python -m json.tool
Issue 3426922: Service Deployment UI stops responding during Upgrade-In-Progress state when updating an SVM to a new OVF.
After using the Service Deployment -> Change appliance workflow to update an SVM to a new OVF, if that OVF is unreachable, the OVF is restored back to the original OVF. Even though the service deployment UI stops responding, the API responds with "Deployment Successful." There is no functionality loss. The deployment is working and the API returns the correct status.
Workaround: Power off any of the SVMs. The deployment status will change to a failed/error state. Click Resolve on the UI.
Issue 3389383: The standby Edge connection table displays the old firewall rule ID.
The default rule action on the standby Edge displays the old value in get firewall connections output.
Workaround: None.
Issue 3422264: GFW firewall rules are getting reapplied on Edge nodes after TLS is enabled on Tier-1.
All of the GFW rules are getting repushed to the Edge node after toggling the TLS feature for that gateway. This causes temporary performance degradation, but has no functional impact.
Workaround:
1. If you have not configured any TLS rules, do not use the TLS enable/disable switch.
2. The sequence to add a TLS rule is to first create and add TLS rules and later enable the TLS switch so that the TLS rules are realized.
Issue 3428873: The UI may display confusing information as to why a custom signature is invalid.
After uploading a custom signature or cloning an existing system signature that results in an invalid signature, the reason displayed in the UI may include information that is not helpful. For example, if you upload a custom signature that depends on a Lua script, the invalid error message: "original_signature - Invalid dependency on unknown Lua script" displays with no explanation of what "original_signature" is.
Workaround: None.
Issue 3431486: The /var/log directory size on standby nodes grows to 100% before rotation, then shrinks back.
On standby nodes, the rotation of log files may not occur as expected causing them to grow too large. No visible impact as this happens on the standby and resolves itself.
Workaround: None.
Issue 3389911: Firewall TCP Reset is not generated for bridge packets with VLAN tagging.
The connection continues to retry and then times out, since the TCP Reset (RST) packet is not sent by the firewall after matching the REJECT rule on the bridge port.
Workaround: Use the DROP action instead of REJECT for L4 rules with the bridge firewall when VLAN tagging is present. With a REJECT rule there is no workaround; the connection will time out and a reset will not be sent.
Issue 3431914: NSX Intelligence data collection service is disabled on a few hosts after upgrading to 4.2.1.
Data Collection service is disabled on a few hosts or clusters after upgrade. No data will be received from the affected TNs.
Workaround: Navigate to System > Security Intelligence on the NSX Manager UI and toggle data collection for the affected TNs or clusters. This will reset the config and enable the collection service on them.
Issue 3411866: Malware prevention events overload the PostgreSQL database, resulting in the unavailability of the NSX Application Platform UI.
When Malware Prevention events exceed two million in less than 14 days, the PostgreSQL database becomes overloaded. As a result, the NSX Application Platform UI shows the UNAVAILABLE state, and new Malware Prevention events do not appear on the UI.
Workaround: Delete the records from the PostgreSQL database and vacuum the database to resolve the problem. See KB article 320807.
Issue 3396277: Upgrade integration issues due to download.vmware.com site decommission.
Some integration issues occur because download.vmware.com is no longer available.
UI Notifications listing the NSX releases available for upgrade will not be available.
You will not be able to download the release binaries automatically to the NSX appliance.
The NSX pre-upgrade checks bundle (PUB), that is, asynchronous prechecks or dynamic prechecks, will not be downloaded automatically to the NSX appliance.
Workaround: Before upgrading, refer to the knowledge base article 372634 for details.
Issue 3391130: Onboarding NSX Federation 4.2.0 Local Manager to 4.1.1 Global Manager fails with compatibility error.
The issue occurs when GM is 4.1.1 and the LM version of NSX is greater than the GM.
Workaround: Upgrade Global Manager version to NSX Federation 4.1.2.
Issue 3215655: While upgrading the NSX Application Platform, some periodic cronjobs might not run if an older repository URL is blocked before the repository URL is updated to point to the new repository that contains the uploaded target version charts and images.
ImagePullBackoff error after the old repository URL is blocked before the repository URL is updated to point to the new repository. The NSX Application Platform upgrade might complete but certain periodic NSX Intelligence cron jobs might not be able to run after the upgrade completes.
Workaround: Log in to the NSX Manager and use the following command to manually delete the failed or stuck jobs.
napp-k delete job <job-name>
Also, avoid blocking access to an older repository URL before the NSX Application Platform upgrade has completed.
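If you need to identify the failed or stuck jobs before deleting them, a listing sketch, assuming napp-k passes standard kubectl verbs through to the NSX Application Platform cluster:
napp-k get jobs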
Issue 3364365: NSX upgrade fails due to the kernel module load failure.
Live upgrade or maintenance upgrade will fail and the system will require a reboot that increases the upgrade time.
Issue 3380679: DPU fail over due to uplink down impacts the TCP traffic.
If the distributed firewall Policy is set to "TCP Strict" mode, DPU fail over due to uplink status impacts the TCP traffic.
Workaround: In the NSX Manager distributed firewall policy, disable and enable the "TCP Strict" mode to recover or resume the traffic.
Issue 3377155: The API response for creating a DFW Security Policy takes a long time (~275 seconds) due to duplicate Groups validation.
Create security policies in chunks, since creating full-scale policies with rules could take longer on large-scale setups.
Issue 3391552: High connection rates and concurrent configuration changes, including group membership and firewall rule updates, can cause NSX Edge to run out of memory.
New sessions will be impacted.
Issue 3384652: During upgrade, when existing DFW flows are migrated from 3.2.4 to 4.2.0, the flows are not classified correctly resulting in matching incorrect rules.
Existing L7 flows might hit incorrect rules during and after the upgrade.
Workaround: None. New flows after the migration will not have this problem.
Issue 3332991: 0.0.0.0 IP address is inaccurately treated as 0.0.0.0/0 [ANY] in DFW rules.
Firewall rules with zero IPs (both v4 and v6) in source or destination fields were treated and enforced as ANY match.
Workaround: No good workaround exists, since support for the zero IP address has never been present in NSX.
Issue 3402184: Some TLS Inspection stats will be reported improperly on NAPP.
Gateway Firewall TLS stats are not updated properly during the failover scenarios of the Active-Standby edge deployments. ACTIVE-STANDBY failover stats will reset to 0.
Issue 3276632: IPv4/IPv6 BGP sessions fail to establish due to IPv4/IPv6 addresses missing on the interfaces.
IPv4/IPv6 BGP sessions are stuck in an idle state. That is, sessions are not established.
Traffic through the problematic BGP session might be disrupted. However, BGP would gracefully restart on its own.
To recover the missing IPv4/IPv6 addresses, you can rescan the interfaces by running the following commands on the edge CLI:
Edge> get logical-routers
Edge> vrf <vrf_id of SERVICE_ROUTER_TIER0>
Edge(tier0_sr)> set debug
Edge(tier0_sr)> start rescan interfaces
Edge(tier0_sr)> exit
Issue 3272782: After a host upgrade from baseline remediation, the TN state of hosts is shown as install failed with errors in the 'Configuration complete' step. In the error message, you can see "Node has invalid version 4.1.2.0.0-8.0.22293677 of software nsx-monitoring" for all builtin_ids of the host.
If a user tries to monitor the status of the OS upgrade through automation, there is a chance that incorrect reporting is shown temporarily. The issue can be fixed by using the same resolver workflow that is followed when a host TN creation fails. On the UI, click the Install Failed status of the host. A popup appears with the error message. Click Resolve.
Workaround: None.
Issue 3273294: A member in a group uses the short IPv6 address format, but in earlier releases the long format address was used.
There is no functional/security impact. It is a visibility-related change of behavior.
Workaround: None.
Issue 3268012: The special wildcard character "^" in Custom Fully Qualified Domain Name (FQDN) values is available starting from GM version 4.1.2 onward. In federation deployments where LMs/sites are on lower versions, GM-created firewall rules consisting of context profiles that have Custom FQDNs with "^" will have non-deterministic behavior on the datapath.
Non-deterministic behavior on the datapath of GM-created firewall rules consisting of context profiles that have Custom FQDNs with "^".
Workaround:
If feasible, on the 4.1.2 GM, remove or update the custom FQDN to remove "^", the context profile consuming that custom FQDN, or the rules consuming the context profile.
If step 1 is not feasible, upgrade the lower version LMs (4.1.1) to the 4.1.2 version.
Issue 3242530: New NSX-T Segments are not appearing in vCenter.
Unable to deploy new segments.
Workaround: Export and import DVS without preserving DVS IDs.
Issue 3278718: Failure in packet capture (PCAP) export if the PCAP file has not been received by the NSX Manager.
Users will not be able to export the requested PCAPs as the request will fail.
Issue 3261593: IDFW alarms will be reset after upgrade.
After upgrade, the existing alarms will be reset. These alarms will be re-created if the issues remain and the corresponding operations are performed.
Workaround: None.
Issue 3233352: Request payload validations (including password strength) are bypassed on redeploy.
The alarm cannot be resolved, and editing the TN configuration is not allowed until the password is fixed.
Workaround: Fix the invalid passwords by using the API POST https://<nsx-manager>/api/v1/transport-nodes/<node-id>?action=addOrUpdatePlacementReferences documented in the NSX-T Data Center REST API Reference Guide.
Issue 3262712: An IPv4-compatible IPv6 address of the format ::<ipv4> gets converted to its equivalent IPv6 address in the effective membership API response.
There is no functional or security impact. The effective membership API response for IPv4-compatible IPv6 addresses will be different.
Workaround: None. This is a change of behavior introduced in NSX 4.1.2.
Issue 3261528: An LB Admin is able to create a Tier-1 Gateway, but while deleting the Tier-1 Gateway the page redirects to the login page and the LB Admin needs to log in again. After logging in, the Tier-1 gateway is not deleted from the list/table.
LB Admins cannot delete Tier-1 gateways they created.
Workaround:
Log in as one of the following users:
enterprise_admin, cloud_admin, site_reliability_engineer, network_engineer, security_engineer, org_admin, project_admin, or vpc_admin (vpc_admin to delete the security-config policy resource).
Issue 3236772: After removing the vIDM configuration, logs show that background tasks are still attempting to reach the invalid vIDM.
Logs for NAPI will show the following error message after vIDM configuration is removed: Error reaching given VMware Identity Manager address <vIDM-FQDN> | [Errno -2] Name or service not known.
Workaround: None.
Issue 2787353: Host transport node (TN) creation via vLCM workflow fails when host has undergone specific host movements in VC.
Users will not be able to create a host TN.
Workaround: Follow the regular resolver workflow for the vLCM cluster level from NSX UI.
Issue 3167100: New tunnel configurations take several minutes to be observed in the UI.
It takes several minutes to observe the new tunnel information after configuring the host node.
Workaround: None.
Issue 3245183: The "join" CSM command adds CSM to the MP cluster, but does not add the Manager account on CSM.
It will not be possible to continue with any other CSM work unless the Manager account is added on CSM.
Workaround:
Run the join command without including CSM login credentials.
Example:
join <manager-IP> cluster-id <MP-cluster-ID> username <MP-username> password <MP-password> thumbprint <MP-thumbprint>
Add NSX Manager details in CSM through UI.
a. Go to System -> Settings.
b. Click Configure on the Associated NSX Node tile.
c. Provide NSX Manager details (username, password, and thumbprint).
Issue 3214034: Internal T0-T1 transit subnet prefix change after Tier-0 creation is not supported by the ESX datapath from Day 1.
In cases where a Tier-1 router is created without an SR, traffic loss can happen if the transit subnet IP prefix is changed.
Workaround: Instead of changing the transit subnet IP prefix, delete and add the Logical Router Port with a new transit subnet IP.
Issue 3248603: NSX Manager File system is corrupted or goes into read only mode.
In the /var/log/syslog, you may see log messages similar to the log lines below.
2023-06-30T01:34:55.506234+00:00 nos-wld-nsxtmn02.vcf.netone.local kernel - - - [6869346.074509] sd 2:0:1:0: [sdb] tag#1 CDB: Write(10) 2a 00 04 af de e0 00 02 78 00
2023-06-30T01:34:55.506238+00:00 nos-wld-nsxtmn02.vcf.netone.local kernel - - - [6869346.074512] print_req_error: 1 callbacks suppressed
2023-06-30T01:34:55.506240+00:00 nos-wld-nsxtmn02.vcf.netone.local kernel - - - [6869346.074516] print_req_error: I/O error, dev sdb, sector 78634720
2023-06-30T01:34:55.513497+00:00 nos-wld-nsxtmn02.vcf.netone.local kernel - - - [6869346.075123] EXT4-fs warning: 3 callbacks suppressed
2023-06-30T01:34:55.513521+00:00 nos-wld-nsxtmn02.vcf.netone.local kernel - - - [6869346.075127] EXT4-fs warning (device dm-8): ext4_end_bio:323: I/O error 10 writing to inode 4194321 (offset 85286912 size 872448 starting block 9828828)
Appliance may not work as normal.
Workaround: See knowledge base article 330478 for details.
Issue 3010038: On a two-port LAG that serves Edge Uniform Passthrough (UPT) VMs, if the physical connection to one of the LAG ports is disconnected, the uplink will be down, but Virtual Functions (VFs) used by those UPT VMs will continue to be up and running as they get connectivity through the other LAG interface.
No impact.
Workaround: None.
Issue 2490064: Attempting to disable VMware Identity Manager with "External LB" toggled on does not work.
After enabling VMware Identity Manager integration on NSX with "External LB", if you attempt to then disable integration by switching "External LB" off, after about a minute, the initial configuration will reappear and overwrite local changes.
Workaround: When attempting to disable vIDM, do not toggle the External LB flag off; only toggle off vIDM Integration. This will cause that config to be saved to the database and synced to the other nodes.
Issue 2558576: Global Manager and Local Manager versions of a global profile definition can differ and might have an unknown behavior on Local Manager.
Global DNS, session, or flood profiles created on Global Manager cannot be applied to a local group from the UI, but can be applied from the API. Hence, an API user can accidentally create profile binding maps and modify a global entity on the Local Manager.
Workaround: Use the UI to configure the system.
Issue 3224295: IPv4/IPv6 BGP sessions fail to establish due to IPv4/IPv6 addresses missing on the interfaces.
Traffic over the problematic BGP session would be disrupted. However, BGP would gracefully restart on its own.
Workaround: See knowledge base article 322523 for details.