VMware SD-WAN 4.3.1 | 02 April 2024
Check for additions and updates to these release notes.
This release is recommended for all customers who require the features and functionality first made available in Release 4.3.0, as well as those customers impacted by the issues listed below which have been resolved since Release 4.3.0.
Release 4.3.1 Orchestrators, Gateways, and Hub Edges support all previous VMware SD-WAN Edge versions greater than or equal to Release 3.2.0.
This means releases prior to 3.2.0 are not supported.
The following interoperability combinations were explicitly tested:
Orchestrator | Gateway | Edge (Hub) | Edge (Branch/Spoke)
4.3.1 | 3.4.6 | 3.4.6 | 3.4.6
4.3.1 | 4.3.1 | 3.4.6 | 3.4.6
4.3.1 | 4.3.1 | 4.3.1 | 3.4.6
4.3.1 | 4.3.1 | 3.4.6 | 4.3.1
4.3.1 | 4.2.2 | 4.2.2 | 4.2.2
4.3.1 | 4.3.1 | 4.2.2 | 4.2.2
4.3.1 | 4.3.1 | 4.3.1 | 4.2.2
4.3.1 | 4.3.1 | 4.2.2 | 4.3.1
4.3.1 | 4.3.0 | 4.3.0 | 4.3.0
4.3.1 | 4.3.1 | 4.3.0 | 4.3.0
4.3.1 | 4.3.1 | 4.3.1 | 4.3.0
4.3.1 | 4.3.1 | 4.3.0 | 4.3.1
4.5.0 | 4.3.1 | 4.3.0 | 4.3.1
4.5.0 | 4.5.0 | 4.3.0 | 4.3.1
4.3.1 | 4.3.1 | 3.2.2 | 3.2.2
4.3.1 | 4.3.1 | 3.3.2 P2 | 3.3.2 P2
Note: Release 3.x did not properly support AES-256-GCM, which meant that customers using AES-256 were always using their Edges with GCM disabled (AES-256-CBC). If a customer is using AES-256, they must explicitly disable GCM from the Orchestrator prior to upgrading their Edges to a 4.x Release. Once all their Edges are running a 4.x release, the customer may choose between AES-256-GCM and AES-256-CBC.
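The upgrade rule in this Note can be read as a simple pre-upgrade check. The sketch below is illustrative only: the `EdgeState` model, its fields, and the function names are hypothetical (in practice the GCM setting is changed on the Orchestrator), and the check merely encodes the rule stated above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeState:
    # Hypothetical per-Edge view, for illustration only.
    name: str
    version: str       # e.g. "3.4.6" or "4.3.1"
    uses_aes256: bool
    gcm_enabled: bool  # GCM setting as configured on the Orchestrator


def major_release(version: str) -> int:
    """Return the major release number of a dotted version string."""
    return int(version.split(".")[0])


def edges_needing_gcm_disabled(edges: List[EdgeState]) -> List[str]:
    """List Edges whose GCM setting must be turned off before upgrading.

    While any Edge is still on 3.x, AES-256 must be configured with GCM
    disabled (AES-256-CBC). Once every Edge runs 4.x, either mode is allowed.
    """
    if all(major_release(e.version) >= 4 for e in edges):
        return []
    return [e.name for e in edges if e.uses_aes256 and e.gcm_enabled]


mixed = [
    EdgeState("branch-1", "3.4.6", uses_aes256=True, gcm_enabled=True),
    EdgeState("hub-1", "4.3.1", uses_aes256=True, gcm_enabled=False),
]
print(edges_needing_gcm_disabled(mixed))  # ['branch-1']
```

In this sketch the check returns an empty list only when every Edge already runs 4.x or no AES-256 Edge still has GCM enabled.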
VMware SD-WAN Releases 3.2.x and 3.3.x for all components, and 3.4.x for the Orchestrator and Gateway, have reached End of Support.
Releases 3.2.x and 3.3.x reached End of General Support (EOGS) on December 15, 2021, and End of Technical Guidance (EOTG) on March 15, 2022.
Release 3.4.x for the Orchestrator and Gateway reached End of General Support (EOGS) on March 30, 2022, and will reach End of Technical Guidance (EOTG) on September 30, 2022.
Note: This is for the Orchestrator and Gateway only. 3.4.x for the Edge is scheduled to enter its End of Support window beginning on December 31, 2022.
For more information, please consult the Knowledge Base article: Announcement: End of Support Life for VMware SD-WAN Release 3.x (84151)
VMware SD-WAN Release 4.0.x has reached End of Support, and Releases 4.2.x and 4.3.x are approaching End of Support for Gateways and Orchestrators.
Release 4.0.x reached End of General Support (EOGS) on September 30, 2022, and End of Technical Guidance (EOTG) on December 31, 2022.
Release 4.2.x Orchestrators and Gateways reached End of General Support (EOGS) on December 30, 2022, and will reach End of Technical Guidance (EOTG) on March 30, 2023.
Release 4.2.x Edges will reach End of General Support (EOGS) on June 30, 2023, and End of Technical Guidance (EOTG) on September 30, 2025.
Release 4.3.x Orchestrators and Gateways will reach End of General Support (EOGS) on June 30, 2023, and End of Technical Guidance (EOTG) on September 30, 2023.
Release 4.3.x Edges will reach End of General Support (EOGS) on June 30, 2023, and End of Technical Guidance (EOTG) on September 30, 2025.
For more information, please consult the Knowledge Base article: Announcement: End of Support Life for VMware SD-WAN Release 4.x (88319)
VMware Security Advisory 2024-0008
VMSA-2024-0008 documents VMware's response to CVE-2024-22247, which details a missing authentication and protection mechanism vulnerability which impacts all supported SD-WAN Edges.
More information on mitigating this vulnerability is found in the KB article: VMware Response to CVE-2024-22247 (VMSA-2024-0008) (97391).
Potential Issue With Sites Using a High Availability Topology
A site where a pair of Edges is deployed in a High Availability topology may encounter an issue where the Standby Edge reboots one or more times to resolve an Active-Active state. The Standby Edge reboot(s) can disrupt customer traffic, and the impact is greater on sites using an Enhanced HA topology because there the Standby Edge also passes customer traffic. The issue is tracked as #85369 under the Edge/Gateway Resolved Issues section of these Release Notes and is resolved in Edge build R431-20220608-GA.
Mixing Wi-Fi Capable and Non-Wi-Fi Capable Edges in High Availability Is Not Supported
Beginning in 2021, VMware SD-WAN introduced Edge models which do not include a Wi-Fi module: the Edge models 510N, 610N, 620N, 640N, and 680N. While these models appear identical to their Wi-Fi capable counterparts except for Wi-Fi, deploying a Wi-Fi capable Edge and a Non-Wi-Fi capable Edge of the same model (for example, an Edge 640 and an Edge 640N) as a High Availability pair is not supported. Customers should ensure that the Edges deployed as a High Availability pair are of the same type: both Wi-Fi capable, or both Non-Wi-Fi capable.
BGPv4 Filter Configuration Delimiter Change for AS-PATH Prepending
Through Release 3.x, the VMware SD-WAN BGPv4 filter configuration for AS-PATH prepending supported both comma and space delimiters. Beginning in Release 4.0.0, VMware SD-WAN supports only a space delimiter in an AS-PATH prepending configuration. Customers upgrading from 3.x to 4.x need to edit their AS-PATH prepending configurations to replace commas with spaces prior to upgrade to avoid incorrect BGP best route selection.
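When many filters need updating, the comma-to-space edit can be scripted. A minimal sketch, assuming the prepend value is available as a plain string of AS numbers in asplain (integer) notation; the function name is hypothetical, and exporting the value from or re-applying it to the Orchestrator is not covered here.

```python
import re


def normalize_as_path_prepend(value: str) -> str:
    """Rewrite a 3.x-style AS-PATH prepend string in the space-delimited 4.x form.

    Commas, spaces, or a mix of both are accepted as input separators.
    Assumes AS numbers are written in asplain (plain integer) notation.
    """
    tokens = [t for t in re.split(r"[,\s]+", value.strip()) if t]
    for token in tokens:
        if not re.fullmatch(r"[0-9]+", token):
            raise ValueError(f"not an asplain AS number: {token!r}")
    return " ".join(tokens)


print(normalize_as_path_prepend("65001,65001, 65001"))  # 65001 65001 65001
```

Values that are already space-delimited pass through unchanged, so the function is safe to run over every prepend entry.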
Reverse-Path Forwarding (RPF) Enabled by Default
In previous releases, packets with an unknown source were allowed from the LAN interface of a VMware SD-WAN Edge, because reverse-path forwarding (RPF) was not enabled by default on the Edge's LAN interfaces. As part of Fixed Issue 52628, first added with Release 3.4.5, this behavior is changed: RPF is enabled on all Edge LAN interfaces, and packets from a LAN interface are allowed only if they are sourced from the configured LAN subnet.
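The acceptance rule described above can be modeled in a few lines. This is a simplified sketch of the stated behavior, not the Edge's implementation; the function name and the example subnet are hypothetical.

```python
import ipaddress
from typing import List


def lan_source_allowed(src_ip: str, lan_subnets: List[str]) -> bool:
    """Accept a packet arriving on a LAN interface only if its source address
    lies inside one of that interface's configured LAN subnets."""
    src = ipaddress.ip_address(src_ip)
    return any(src in ipaddress.ip_network(net) for net in lan_subnets)


lan = ["192.168.10.0/24"]  # hypothetical configured LAN subnet
print(lan_source_allowed("192.168.10.25", lan))  # True: sourced from the LAN subnet
print(lan_source_allowed("10.9.9.9", lan))       # False: unknown source
```

Hosts that legitimately send from addresses outside the configured subnet (for example, behind a downstream router) would be dropped under this rule unless their subnet is also configured.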
Zscaler Tunnels Now Use IKEv2
Once an Orchestrator and Gateways are upgraded to Release 4.3.0 or above, all Non SD-WAN Destinations via Gateway which use a Zscaler type will have their tunnels change to IKEv2 and no longer use IKEv1.
Extended Upgrade Time for Edge 3x00 Models
Upgrades to this version may take longer than normal (3-5 minutes) on Edge 3x00 models (i.e., 3400, 3800, and 3810). This is due to a firmware upgrade which resolves issue 53676. If an Edge 3400 or 3800 already upgraded its firmware while running Release 3.4.5 or later, 4.0.2, 4.2.0 or later, or 4.3.0, the Edge upgrades in the normal time. For more information, please consult Fixed Issue 53676 in the 3.4.5, 4.0.2, 4.2.0, or 4.3.0 Release Notes.
Limitation with Azure Virtual WAN Automation and BGP over IPsec on Edge and Gateway
The BGP over IPsec on Edge and Gateway feature is not compatible with Azure Virtual WAN Automation from Edge or Gateway. Only static routes are supported when automating connectivity from an Edge or Gateway to an Azure vWAN.
Limitation When Disabling Autonegotiation on VMware SD-WAN Edge Models 520, 540, 620, 640, 680, 3400, 3800, and 3810
When a user disables autonegotiation to hardcode speed and duplex on ports GE1 - GE4 on a VMware SD-WAN Edge model 620, 640 or 680; on ports GE3 or GE4 on an Edge 3400, 3800, or 3810; or on an Edge 520/540 when an SFP with a copper interface is used on ports SFP1 or SFP2, the user may find that even after a reboot the link does not come up.
This is because each of the listed Edge models uses the Intel Ethernet Controller i350, which cannot dynamically detect the appropriate wires to transmit and receive on (auto-MDIX) when autonegotiation is not used on both sides of the link. If both sides of the connection are transmitting and receiving on the same wires, the link will not be detected. If the peer side also does not support auto-MDIX without autonegotiation, and the link does not come up with a straight-through cable, a crossover Ethernet cable will be needed to bring the link up.
For more information, please see the KB article Limitation When Disabling Autonegotiation on VMware SD-WAN Edge Models 520, 540, 620, 640, 680, 3400, 3800, and 3810 (87208).
December 15th, 2021. First Edition
December 21st, 2021. Second Edition.
Added a new Orchestrator build R431-20211217-GA to Orchestrator Resolved Issues. This Orchestrator build remediates CVE-2021-44228, the Apache Log4j vulnerability, by updating to Log4j version 2.16.0. For more information on the Apache Log4j vulnerability, please consult the VMware Security Advisory VMSA-2021-0028.5.
Added the Important Note: Limitation When Disabling Autonegotiation on VMware SD-WAN Edge Models 520, 540, 620, 640, 680, 3400, 3800, and 3810. This note covers an issue that may be encountered when configuring a forced speed on some Ethernet ports of the listed Edge models.
January 7th, 2022. Third Edition.
Added a new Edge build R431-20211221-GA to the Edge Resolved Issues section. This build is the new Edge GA build for Release 4.3.1.
This Edge build includes the fixed issues #70933 and #76564, which are each documented in this section.
February 17th, 2022. Fourth Edition.
Added a new Edge build R431-20220118-GA to the Edge Resolved Issues section. This build is the new Edge GA build for Release 4.3.1.
This Edge build includes the fixed issues #64343 and #74149, which are each documented in this section.
Added Fixed Issue #72718 to the Edge/Gateway Resolved Issues section of the original GA build. This issue was resolved with the original build but was omitted from the original release notes due to an internal ticket labelling error.
February 25th, 2022. Fifth Edition.
Added a new Orchestrator GA build R431-20220222-GA to the Orchestrator Resolved section.
This Orchestrator build remediates Apache Log4j vulnerabilities CVE-2021-44228 (which was first addressed in Orchestrator build R431-20211217-GA with Log4j version 2.16.0) and CVE-2021-45046, by updating to Log4j version 2.17.0. For updated information on the Apache Log4j vulnerabilities and their impact on VMware products, please consult the VMware Security Advisory VMSA-2021-0028.9.
This Orchestrator build also adds fixed issues #76036, #80613, and #81498, which are documented in this section.
March 3rd, 2022. Sixth Edition.
Added a new Edge build R431-20220302-GA to the Edge Resolved Issues section. This build is the new Edge GA build for Release 4.3.1.
This Edge build includes the fixed issues #53951, #55327, #80010, #80551, #80654, #82463, and #82652, which are each documented in this section.
Two Open Issues are added: #72925 and #83747, which are documented in the Edge/Gateway Known Issues section.
Added Important Note: Mixing Wi-Fi Capable and Non-Wi-Fi Capable Edges in High Availability Is Not Supported.
March 4th, 2022. Seventh Edition.
Open Issue #83747 is removed from Edge/Gateway Known Issues. This ticket was erroneously included in the Sixth Edition for Edge Release R431-20220302-GA due to a miscommunication about the symptoms for this issue and the impact to the customer, neither of which warranted inclusion in the Release Notes.
March 23rd, 2022. Eighth Edition.
Added a new Edge build R431-20220316-GA to the Edge Resolved Issues section. This build is the new Edge GA build for Release 4.3.1.
Edge build R431-20220316-GA includes the fixed issues #61797, #70586, #77525, and #77625, which are each documented in this section.
Under the Compatibility section, added a new Warning that Release 3.4.x software is approaching End of Support for the Orchestrator and Gateway, with End of General Support (EOGS) on March 30, 2022, and End of Technical Guidance (EOTG) on June 30, 2022. This is for the Orchestrator and Gateway only. The 3.4.x Edge software is scheduled to enter its End of Support window beginning on December 31, 2022.
Added a new Important Note regarding the Limitation with Azure Virtual WAN Automation and BGP over IPsec on Edge and Gateway. The note reads: "The BGP over IPsec on Edge and Gateway feature is not compatible with Azure Virtual WAN Automation from Edge or Gateway. Only static routes are supported when automating connectivity from an Edge or Gateway to an Azure vWAN."
Added Issue #84825 to the Edge/Gateway Known Issues section.
March 29th, 2022. Ninth Edition.
This edition corrects or adds several tickets related to a symptom where a site deployed with a High Availability topology experiences an Active-Active state which results in multiple failovers and/or Standby Edge reboots. The corrections and additions are as follows:
Amended Edge Fixed Issue #77625 to correct the root cause which previously listed "HA thread starvation" and now lists "inverted HA thread priority" as the root cause.
The root cause previously associated with #77625 (HA thread starvation) is amended to "HA Edge thread suspension" and is assigned to a new issue: #85369. This issue is added to the Edge/Gateway Known Issues section and remains under investigation with this edition.
Amended fixed issue #67201 to provide more precise description for the symptom and what is fixed.
Amended fixed issue #80654 to add a Note that this fix, which is included in rollup build R431-20220302-GA and later, corrects a regression introduced by the fix for #67201 in the original GA build R431-20211208-GA.
Added new issues #79220 and #85156 to the Edge/Gateway Known Issues section.
March 31st, 2022. Tenth Edition.
Added a new Edge build R431-20220331-GA to the Edge/Gateway Resolved Issues section. This build is the new Edge and Gateway GA build for Release 4.3.1.
Edge build R431-20220331-GA includes the fixed issues #65695, #68923, #78003, #80897, #81517, #81575, #81920, and #82314, which are each documented in this section.
The fixed Edge/Gateway issues are broken out as follows:
Edge and Gateway Fix: #80897
Edge Only Fixes: #65695, #68923, #78003, and #81517.
Gateway Only Fixes: #81575, #81920, and #82314.
April 14th, 2022. Eleventh Edition.
Added a new Edge build R431-20220407-GA to the Edge/Gateway Resolved Issues section. This build is the new Edge GA build for Release 4.3.1.
Edge build R431-20220407-GA includes the fixed issues #58791, #65466, #83029, #83928, #83402, and #86103 which are each documented in this section.
Added Open Issue #62701 to the Edge/Gateway Known Issues section as this issue remains unresolved on all releases at this time.
April 19th, 2022. Twelfth Edition.
Added an additional Fixed Issue #84847 to Edge build R431-20220407-GA. The code for this fix was included in the validated R431-20220407-GA build, but Engineering did not specifically validate the fix for #84847, so that ticket was omitted from the April 14th edition of the Release Notes. Engineering has since validated the fix for #84847 in R431-20220407-GA, and as of this edition of the Release Notes the issue is listed as fixed.
May 6th, 2022. Thirteenth Edition.
Added a new Orchestrator rollup build R431-20220429-GA to the Orchestrator Resolved section. This is the second Orchestrator rollup build and is the new Orchestrator GA build for Release 4.3.1.
Orchestrator rollup build R431-20220429-GA includes fixed issues #84152 and #84969, which are documented in this section.
Added a new warning in the Compatibility section regarding Release 4.0.x approaching End of Support.
May 12th, 2022. Fourteenth Edition
Added a new Edge/Gateway rollup build R431-20220509-GA to the Edge/Gateway Resolved section. This is the seventh Edge/Gateway rollup build and is the new Edge and Gateway GA build for Release 4.3.1.
Edge build R431-20220509-GA includes the fixes for issues #81809, #83209, #84136, and #85459, which are each documented in this section.
Gateway build R431-20220509-GA includes the fixes for issues #65466 and #74316. Issue #65466 can affect either the Edge or the Gateway, and the Edge fix was included in the sixth rollup build: R431-20220407-GA. However, the Gateway fix was not available at that time.
May 18th, 2022. Fifteenth Edition
Added a new Edge/Gateway rollup build R431-20220510-GA to the Edge/Gateway Resolved section. This is the eighth Edge/Gateway rollup build and is the new Edge/Gateway GA build for Release 4.3.1.
Edge/Gateway build R431-20220510-GA includes the fixes for issues #64627 and #78568 which are each documented in this section.
May 26th, 2022. Sixteenth Edition
Added a new Edge/Gateway rollup build R431-20220518-GA to the Edge/Gateway Resolved section. This is the ninth Edge/Gateway rollup build and is the new Edge/Gateway GA build for Release 4.3.1.
Edge/Gateway build R431-20220518-GA includes the fixes for issues #75772 and #88796 which are each documented in this section.
Added Issue #88796 as a new Orchestrator Known Issue. This ticket tracks the issue as it applies to the Orchestrator OVA only, as the fix is included in the latest Gateway build.
Added Issue #85461 to the Edge/Gateway Known Issues section.
June 13th, 2022. Seventeenth Edition
Added a new Important Note: "Potential Issue With Sites Using a High Availability Topology" regarding ongoing issues with customer sites using a High Availability topology for a pair of Edges. This issue continues to be tracked by Issue #85369 located in Edge/Gateway Known Issues.
Under Compatibility, amended the End of Life dates for Release 4.2.x Edge software. The Edge software is broken out as a separate item and now reads: "Release 4.2.x Edges will reach End of General Support (EOGS) on June 30, 2023, and End of Technical Guidance (EOTG) September 30, 2023." The separate Orchestrator and Gateway entry retains the same End of Life dates as before.
Revised Fixed Issue #53951 in the Edge/Gateway Resolved Issues section to include another scenario that could impact a customer that encounters this issue in the field.
Moved Issue #76016 from the Orchestrator Resolved Issues section and placed it in the Orchestrator Known Issues section. This ticket was erroneously listed as 'Fixed', which is not the case as of this edition of the 4.3.1 Release Notes.
June 27th, 2022. Eighteenth Edition
Added a new Edge/Gateway rollup build R431-20220608-GA to the Edge/Gateway Resolved section. This is the tenth Edge/Gateway rollup build and is the new Edge/Gateway GA build for Release 4.3.1.
Edge/Gateway build R431-20220608-GA includes the fixes for issues #78678, #83083, and #85369 which are each documented in this section.
July 6th, 2022. Nineteenth Edition
Added Open Issues #88604 and #91746 to the Edge/Gateway Known Issues section.
Moved Issue #74149 from the Resolved Issues section for Edge rollup build R431-20220118-GA to the Edge/Gateway Known Issues section. This issue was included as 'fixed' in error and the fix for this issue has not been included in any 4.3.1 Edge build and remains open on Release 4.3.1.
July 14th, 2022. Twentieth Edition
Added Open Issues #81859, #91365 and #92676 to the Edge/Gateway Known Issues section.
July 22nd, 2022. Twenty-First Edition
Added a new Orchestrator rollup build R431-20220715-GA to the Orchestrator Resolved section. This is the third Orchestrator rollup build and is the new Orchestrator GA build for Release 4.3.1.
Orchestrator rollup build R431-20220715-GA includes fixed issues #76016 and #88796, which are documented in this section.
August 26th, 2022. Twenty-Second Edition.
Added Open Issue #89217 to the Edge/Gateway Known Issues section.
Removed Open Issue #49712 from Edge/Gateway Known Issues as Engineering concluded it was caused by a configuration error rather than a defect in the code.
September 9th, 2022. Twenty-Third Edition.
Added Open Issues #72245, #81224, #87552, #89873, and #93383 to the Edge/Gateway Known Issues section.
September 14th, 2022. Twenty-Fourth Edition.
Removed Issue #61797 "Route backtracking is not supported on the VMware SD-WAN Edge which results in false reachability routes" from the Edge/Gateway Resolved Issues section of Edge build R431-20220316-GA. This issue was included erroneously, and because it is an enhancement, it is being removed entirely from the Release Notes rather than relocated to Known Issues.
September 28th, 2022. Twenty-Fifth Edition.
Added Open Issues #86098, #94204, #96441, #96888, and #98136 to the Edge/Gateway Known Issues section.
November 7th, 2022. Twenty-Sixth Edition.
Revised and republished these notes using a new publication tool.
November 22nd, 2022. Twenty-Seventh Edition.
January 30th, 2023. Twenty-Eighth Edition.
Revised Fixed Ticket #89217 to reflect a revised Edge version (R5012-20230123-GA-103475) and Platform Firmware version (R131-20221216-GA) needed to resolve the issue. The ticket also adds a link to the KB Article that covers #89217 and which includes step-by-step instructions for upgrading a 6x0 Edge.
In the Compatibility section, revised the Important Note regarding End of Support for 4.2.x and added Release 4.3.x to reflect newly revised dates for the SD-WAN Edge software.
February 17th, 2023. Twenty-Ninth Edition.
Removed Issue #39659 from the Edge/Gateway Known Issues section as this is a duplicate of another ticket, #39501 which was resolved in Release 4.3.0.
April 2nd, 2024. Thirtieth Edition.
Added an Important Note regarding CVE-2024-22247, which details a missing authentication and protection mechanism vulnerability that impacts an SD-WAN Edge. VMware's response to this vulnerability is documented in VMSA-2024-0008. More information on mitigating this vulnerability is found in the KB article: VMware Response to CVE-2024-22247 (VMSA-2024-0008) (97391).
Edge/Gateway version R431-20220608-GA was released on 06-27-2022 and is the 10th Edge/Gateway rollup for Release 4.3.1.
This Edge/Gateway rollup build addresses the below critical issues since the 9th Edge/Gateway rollup, version R431-20220518-GA.
Fixed Issue 85369: For a site deployed with a High Availability topology, the customer may observe traffic disruptions and possibly multiple reboots of the VMware SD-WAN Standby Edge.
A condition triggered by load and system events delays the Active Edge's delivery of HA heartbeats to the Standby Edge. The delay causes the Standby Edge to miss heartbeats and incorrectly assume the Active role, resulting in an Active-Active state. To recover from the Active-Active state, the Standby Edge reboots, possibly multiple times.
If the site does become Active-Active, a conventional HA setup would experience minimal traffic disruption since the Standby Edge does not pass traffic in this topology, but on an Enhanced HA deployment, where the Standby Edge is also passing traffic, the reboot(s) would disrupt some customer traffic.
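The failure mode above can be illustrated with a generic missed-heartbeat detector. This is a hypothetical model, not the Edge's HA code: the function name, the 1-second heartbeat spacing, and the 3-second dead interval are assumptions chosen for illustration, since the actual HA timers are not stated in these notes.

```python
from typing import List, Optional


def standby_takeover_time(arrivals: List[float], dead_interval: float,
                          end: float) -> Optional[float]:
    """Return the time at which a Standby would assume the Active role, or None.

    `arrivals` are the times (seconds) at which heartbeats from the Active Edge
    reached the Standby. If more than `dead_interval` seconds pass without a
    heartbeat, the Standby concludes the peer is down and goes Active, even if
    the Active Edge is still running and its heartbeats were merely delayed.
    """
    last_seen = 0.0
    for t in sorted(arrivals):
        if t - last_seen > dead_interval:
            return last_seen + dead_interval  # the next heartbeat came too late
        last_seen = t
    if end - last_seen > dead_interval:
        return last_seen + dead_interval
    return None


# Heartbeats arrive on time every second: no takeover.
print(standby_takeover_time([1.0, 2.0, 3.0, 4.0, 5.0], dead_interval=3.0, end=5.0))  # None
# A busy Active Edge delays heartbeats between 2 s and 6 s.
print(standby_takeover_time([1.0, 2.0, 6.0, 7.0], dead_interval=3.0, end=7.0))  # 5.0
```

In the second case the Active Edge never failed, yet the Standby goes Active at 5.0 s, which is the Active-Active condition the fix addresses.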
Fixed Issue 83083: A VMware SD-WAN Gateway upgraded to Release 4.3.1 or later may experience a slow memory leak which can lead to the Gateway's service restarting to clear the memory.
Gateway restarts can disrupt customer traffic for the 30-45 seconds it takes for the Gateway service to restart. Each time an Operator user runs the debug.py --flow_dump all all all command on the Gateway, the Gateway leaks some of its memory. Running this debug command a sufficient number of times will cause the Gateway's memory usage to reach a critical level and trigger a Gateway service restart to clear the memory.
For a Gateway without the fix, an Operator must avoid running the debug.py --flow_dump all all all command on the Gateway. If using this debug command is unavoidable, monitor the memory usage and schedule maintenance windows to preemptively restart the service to clear the memory prior to an unscheduled restart.
Fixed Issue 78678: On a site deployed with a High Availability topology, the VMware SD-WAN Edge performing the Standby role may get rebooted while processing synchronization messages from the Active Edge.
When the Standby Edge is handling a high number of flow synchronization messages, the SD-WAN service may detect a buffer overflow condition and trigger a reboot of the Standby Edge. In terms of impact, a conventional HA setup would experience minimal traffic disruption since the Standby Edge does not pass traffic in this topology, but on an Enhanced HA deployment, where the Standby Edge is also passing traffic, the reboot(s) would disrupt some customer traffic.
Edge/Gateway version R431-20220518-GA was released on 05-24-2022 and is the 9th Edge/Gateway rollup for Release 4.3.1.
This Edge/Gateway rollup build addresses the below critical issues since the 8th Edge/Gateway rollup, version R431-20220510-GA.
Fixed Issue 88796: When deploying either a VMware SASE Orchestrator or a VMware SD-WAN Gateway and using an OVA on vSphere, the OVF properties set as part of the deployment (password, network information, etc.) are not applied to the image and the system cannot be accessed after deployment.
This only affects a new system deployed from an OVA using OVF/vApp properties (versus using ISO files). This issue is caused by upstream changes to cloud-init in recent updates.
Without the fix, the workaround is for the Operator to deploy the system using a cloud-init user-data ISO file.
This fix is for the Gateway OVA only. The issue as it impacts the Orchestrator OVA is tracked with the same ticket #88796 but under the Orchestrator Resolved Issues section for build R431-20220715-GA.
Fixed Issue 75772: For a customer using Edge Network Intelligence where a VMware SD-WAN Edge has Analytics active, the Edge may experience a memory leak that results in the Edge restarting its service to clear the memory leak.
Where Analytics is enabled and DHCP is enabled on an Edge interface, client connectivity events cause memory usage to increase. Over a sufficient period of time, memory usage may reach a critical threshold, and the Edge would defensively restart the Edge service to restore normal memory levels. As with any memory leak issue, the less memory an Edge model has (for example, an Edge 510, 520, or 610), the more vulnerable the Edge is to a service restart.
Edge/Gateway version R431-20220510-GA was released on 05-17-2022 and is the 8th Edge/Gateway rollup for Release 4.3.1.
This Edge/Gateway rollup build addresses the below critical issues since the 7th Edge/Gateway rollup, version R431-20220509-GA.
Fixed Issue 78568: For a customer using BGP and connected to VMware SD-WAN Partner Gateways, a Partner Gateway may continue to advertise a VMware SD-WAN Edge's VLAN subnet after that subnet's advertise flag is set to False.
The routes continue to be advertised because when the Edge breaks the BGP neighbor adjacency state with an L3 BGP neighbor, one of the connected Partner Gateways maintains ownership of the Edge VLAN subnet. Stale routes on a Partner Gateway negatively affect customer traffic and can lead to entire customer flows getting dropped because traffic is routed to now non-existent routes.
Without a fix, the only way to remediate the issue and clear the stale BGP routes is for a Partner or Operator to restart the Partner Gateway service in a suitable maintenance window.
Fixed Issue 64627: A VMware SD-WAN Gateway may experience a Dataplane Service restart with a brief disruption in traffic.
When a large number of subpaths are configured on WAN links for a VMware SD-WAN Edge, or the Edge's management tunnels with its connected Gateway flap frequently, the Gateway's memory counters may become exhausted, which triggers a restart of the Gateway to recover.
Edge/Gateway version R431-20220509-GA was released on 05-11-2022 and is the 7th Edge/Gateway rollup for Release 4.3.1.
This Edge/Gateway rollup build addresses the below critical issues since the 6th Edge/Gateway rollup, version R431-20220407-GA.
The Gateway rollup build also addresses issue #65466. The Edge fix for this issue was included in the previous (6th) Edge rollup build, but the Gateway fix was not available at that time.
Fixed Issue 85459: An attempt to SSH to an Edge, either from an Edge LAN-side client or from a remote branch Edge client, may not work after LAN-side NAT rules are configured.
SSH reply packets from the Edge's SSH process pass through the Edge's dataplane service. When LAN-side NAT rules are configured, the SSH reply packets may match those rules and be sent to a destination other than the client that originated the SSH traffic, which causes the SSH attempt to fail.
Fixed Issue 84136: Customer may observe high CPU utilization and poor traffic performance on a VMware SD-WAN Edge upgraded to Release 4.3.1.
This issue occurs on an Edge where more than 400 IP rules are configured under the Configure > Firewall > Edge Access section (Support Access or SNMP Access allowed IP addresses). In that scenario, when the Edge tries to send the firewall configuration, the management process maxes out the CPU and times out, and then the process repeats.
On customer sites that are also using a High Availability topology, the symptoms would include "HA unknown events" because the Active Edge is not sending heartbeats within the expected time window.
Fixed Issue 83209: For customers using OSPF in their enterprise, OSPF routing may not work as expected.
The issue occurs when there is a change in the OSPF router-id and the Edge service is restarted. Only loopback interfaces and interfaces with the 'Advertise' flag enabled are considered for router-id selection. When a new loopback interface is configured with a higher IP address, that loopback IP address is selected as the router-id once the Edge service restarts, and if the Edge is elected as the DR (Designated Router), the issue is seen.
Without the fix, the only workaround is to force the use of the old Router ID. To bring back the old Router ID, enable Advertise Flag on the respective interface (an Edge service restart will be required).
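The selection rule described above (only loopbacks and Advertise-enabled interfaces are eligible, and a new loopback with a higher address takes over) can be sketched as follows. The `EdgeInterface` model, interface names, and addresses are hypothetical, and "highest eligible address wins" is an assumption drawn from the description rather than confirmed Edge logic.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EdgeInterface:
    # Hypothetical interface model for illustration only.
    name: str
    ip: str
    is_loopback: bool
    advertise: bool


def select_router_id(interfaces: List[EdgeInterface]) -> Optional[str]:
    """Return the highest IPv4 address among eligible interfaces, or None.

    Eligible: loopback interfaces and interfaces with the Advertise flag enabled.
    """
    eligible = [ipaddress.IPv4Address(i.ip) for i in interfaces
                if i.is_loopback or i.advertise]
    return str(max(eligible)) if eligible else None


before = [
    EdgeInterface("GE3", "10.0.0.1", is_loopback=False, advertise=True),
    EdgeInterface("GE2", "192.168.1.1", is_loopback=False, advertise=False),
]
after = before + [EdgeInterface("lo1", "172.16.0.1", is_loopback=True, advertise=False)]
print(select_router_id(before))  # 10.0.0.1 (GE2 is not eligible)
print(select_router_id(after))   # 172.16.0.1: the new loopback wins after a restart
```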
Fixed Issue 81809: When a user attempts to SSH to a VLAN IP on a VMware SD-WAN Edge from a remote client sitting behind another Edge or even from a VMware SD-WAN Gateway, the SSH attempt fails.
An SSH attempt from a LAN client to an Edge VLAN IP works properly. Originally, the Edge's Management IP was used to SSH to the Edge. However, after the Edge Management IP was deprecated, there was no way for the user to SSH to the Edge over the overlay from a remote Edge client, because the Loopback IP does not support SSH.
Fixed Issue 74316: A VMware SD-WAN Spoke Edge may not connect to any or all of the assigned Hub Edge Clusters, even if the Edge has a service restart or a full reboot.
There is an issue with the cluster reassignment logic which creates a cluster assignment mapping without the cluster member's endpoint information in a specific Cluster-member-to-Super-Gateway overlay flap scenario. As a result, Spoke Edges assigned to the Hub Cluster member subsequently fail to receive the endpoint information of the Hub Cluster member, leading to no overlays between Spoke Edges and Hub Clusters.
Without the fix the only way to temporarily remediate the condition is for someone with Gateway access to trigger a cluster reassignment manually on the Super Gateways.
Edge version R431-20220407-GA was released on 04-13-2022 and is the 6th Edge rollup for Release 4.3.1.
This Edge rollup build addresses the below critical issues since the 5th Edge rollup, version R431-20220331-GA.
Fixed Issue 86103: For a customer enterprise that uses RADIUS authentication, client users at some sites may be unable to connect to VMware SD-WAN Edges and pass traffic.
The issue is caused by the Edge incorrectly categorizing fragmented RADIUS packets that have the DF (Don't Fragment) bit set in the IP header as non-fragmented. One or more of these packets fail to reach multiple Edges, with the result that traffic relying on RADIUS authentication will not pass for those Edges. This issue can occur in any topology, including Hub/Spoke and simple Branch-to-Branch.
Without the fix the only workaround is to configure the RADIUS server to not set the DF bit in the IP header while sending fragmented packets.
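A minimal Python sketch of IPv4 fragment classification, for illustration only (it is not the Edge's code). Whether a packet is a fragment is determined by the MF (More Fragments) flag and the fragment offset, not by DF, so a classifier that lets DF influence the decision can misfile fragments like the RADIUS packets described above. The helper `make_header` is hypothetical and builds a 20-byte header with only the flags/offset field populated.

```python
import struct

DF_BIT = 0x4000       # Don't Fragment flag
MF_BIT = 0x2000       # More Fragments flag
OFFSET_MASK = 0x1FFF  # fragment offset, in 8-byte units

def is_fragment(ipv4_header: bytes) -> bool:
    """Return True if the IPv4 packet is a fragment.

    MF set means a first or middle fragment; a non-zero offset means a middle
    or last fragment. DF is ignored: it does not say whether this packet is
    itself a fragment.
    """
    if len(ipv4_header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    # Bytes 6-7 of the IPv4 header hold the 3 flag bits and the 13-bit offset.
    (flags_offset,) = struct.unpack("!H", ipv4_header[6:8])
    return bool(flags_offset & MF_BIT) or (flags_offset & OFFSET_MASK) != 0

def make_header(flags_offset: int) -> bytes:
    # Hypothetical test helper: zeroed header except the flags/offset field.
    return bytes(6) + struct.pack("!H", flags_offset) + bytes(12)
```

For example, a first fragment with DF and MF both set, or a last fragment with DF set and a non-zero offset, is still a fragment; a packet with only DF set is not.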
Fixed Issue 84847: Customers using either USB-based LTE modems or VMware SD-WAN Edge LTE models (510-LTE or 610-LTE) may experience intermittent issues with building tunnels from the CELL interface after the modem is reset.
The issue occurs when the LTE modem is reset in either of the following scenarios:
On an Edge using a USB modem, by removing the modem from the USB port and plugging it back in.
On an Edge-LTE model, after an Edge reboot or after resetting the CELL1 interface via Test & Troubleshoot > Remote Diagnostics > Reset USB Modem > CELL1.
In either scenario the underlying network device changes from wwan0 to wwan1 and the Edge does not honor this new name because it appears to be a duplicate interface.
Without the fix the workaround to restore the LTE interface is to restart the Edge Service through Remote Actions > Restart Service.
Fixed Issue 83402: On a VMware SD-WAN Edge with multiple WAN links, one or more WAN links may stop passing traffic.
On the WAN link(s) that stop passing traffic, the DHCP-acquired address is not renewed and the WAN interface's address is lost. The issue occurs when multiple interfaces acquire IP addresses using DHCP and the DHCP server is in a different network from the client. The outgoing interface of a unicast DHCP renew packet is determined through a route lookup. Since there are multiple default routes with different metric values learned through different interfaces, the DHCP request packets might be sent out of a different interface. Without the fix, an onsite user would need to unplug the affected WAN link from the Edge and plug it back in to force it to get its IP address again.
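An illustrative Python sketch (not the Edge's implementation) of why a unicast renew can leave on the wrong link: a lookup for an off-subnet DHCP server matches every default route, and the lowest metric wins regardless of which interface holds the lease. The interface names `GE3`/`GE4`, the metrics, and the `pin_to_lease` option are hypothetical; the release note does not describe how the fix selects the interface.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Route:
    prefix: ipaddress.IPv4Network
    interface: str
    metric: int

def route_lookup(routes: List[Route], dst: str) -> Optional[Route]:
    """Longest-prefix match; among equal-length matches, the lowest metric wins."""
    addr = ipaddress.ip_address(dst)
    matches = [r for r in routes if addr in r.prefix]
    if not matches:
        return None
    return max(matches, key=lambda r: (r.prefix.prefixlen, -r.metric))

def renew_egress(routes: List[Route], server_ip: str, lease_interface: str,
                 pin_to_lease: bool) -> Optional[str]:
    """Choose the interface a unicast DHCP renew is sent on.

    pin_to_lease=False follows the routing table (the behavior described above);
    pin_to_lease=True always uses the interface that holds the lease.
    """
    if pin_to_lease:
        return lease_interface
    route = route_lookup(routes, server_ip)
    return route.interface if route else None

# Two DHCP-learned default routes with different metrics (hypothetical values).
default = ipaddress.ip_network("0.0.0.0/0")
routes = [Route(default, "GE3", 0), Route(default, "GE4", 10)]
```

With these routes, a renew for a lease held on `GE4` would follow the metric-0 route out of `GE3` unless it is pinned to the lease interface.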
Fixed Issue 83928: A VMware SD-WAN Edge may experience high CPU usage and poor customer traffic performance.
Users may also observe poor QoE scores on the Orchestrator's Monitor > Edge > QoE screen for that Edge. The issue is caused by an ACL (Access Control List) rule being instantiated multiple times in the Edge. Processing this many ACL rules at once strains the Edge's CPU capacity, leaving the Edge unable to process customer traffic properly.
Fixed Issue 83029: For either a standalone VMware SD-WAN Edge or a site deployed with a High Availability topology where one or more PPPoE links are used, if the PPPoE endpoint IP changes after either an Edge interface flap for that PPPoE link or when an HA site experiences a failover, traffic would not pass on the affected PPPoE link(s).
On a site that uses PPPoE links, a change in the PPPoE endpoint IP means no customer traffic passes. The issue is caused by a stale default route: a route using the old IP address of the PPPoE endpoint that is not deleted from the Edge after a new PPPoE endpoint IP address is received.
Without the fix, an onsite user would need to either disconnect each PPPoE cable and reconnect it to force a renegotiation or reboot the Edge, which would also force a renegotiation.
Fixed Issue 65466: A VMware SD-WAN Gateway or VMware SD-WAN Edge processing a large BGP route exchange may experience a Dataplane Service Failure and restart when running certain debug commands or generating a diagnostic bundle.
An Edge or Gateway processing a large number of routes (for example, an Edge advertising 50K BGP routes, or a Gateway learning more than 100K BGP routes from Edges) can encounter this issue if the debug command dispcnt (with parameters) is also run. The dispcnt debug command is used to monitor capacity drops and can be run either by a Partner Operator on the respective device's CLI or by a user during diagnostic bundle creation. When this command is run on an Edge or Gateway with a large number of routes and another event (for example, a route delete) occurs such that the original variable points to a memory location that is now stale, the illegal memory access results in a Dataplane Service failure.
Note: The Gateway fix for 65466 is included in the 7th Gateway rollup build R431-20220509-GA and later. The Gateway fix was not available at the time the 6th rollup was released.
Fixed Issue 58791: A site deployed with a High Availability topology where BGP is used may encounter an issue where the VMware SD-WAN Edge repeatedly fails over.
This issue affects HA sites configured within a Hub/Spoke topology where the HA site has greater than 512 BGPv4 filter prefixes configured.
When BGP is used with multiple network commands configured, the Standby Edge, while coming up, parses all of the configurations symmetrically and spawns vtysh for every network command; as a result, the verp thread does not get to run. The delayed verp thread delays heartbeat processing, which causes the Standby Edge to believe the Active Edge is down. The Standby Edge then becomes Active, which leads to a split-brain (Active/Active) state. To recover from the split-brain state, the Standby Edge restarts, which merely repeats the cycle.
Without the fix, the workaround is to reduce the number of BGP filter prefix configurations by aggregating them, bringing the total below 512 (256 Inbound and 256 Outbound filters).
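The aggregation workaround can be checked offline before changing the configuration. The sketch below uses Python's `ipaddress.collapse_addresses`, which merges prefixes only when they exactly cover a larger block, so the aggregate covers the same address space. The prefix values are made up, and treating the limit as at most 256 per direction is an assumption drawn from the workaround text.

```python
import ipaddress
from typing import Iterable, List

MAX_INBOUND = 256   # assumption: per-direction limits from the workaround above
MAX_OUTBOUND = 256

def aggregate(prefixes: Iterable[str]) -> List[ipaddress.IPv4Network]:
    """Merge adjacent or contained prefixes into the smallest covering set."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return list(ipaddress.collapse_addresses(nets))

def within_limits(inbound: int, outbound: int) -> bool:
    return inbound <= MAX_INBOUND and outbound <= MAX_OUTBOUND

# Hypothetical inbound filter list: 256 + 64 contiguous /24 prefixes.
inbound = [f"10.1.{i}.0/24" for i in range(256)] + [f"10.2.{i}.0/24" for i in range(64)]
```

Here 320 entries collapse to two (`10.1.0.0/16` and `10.2.0.0/18`). Note that on many BGP implementations an exact-match prefix-list entry for a /16 does not match the individual /24 routes unless the entry also allows longer prefix lengths, so review filter semantics after aggregating.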
Note: A previous version of this ticket description stated this was also a fix for HA sites with BGP match and set operations. That part of the issue is not fixed with this ticket and is tracked with Issue #84825.
Edge and Gateway version R431-20220331-GA was released on 03-31-2022 and addresses the below critical issues since Edge version R431-20220316-GA and Gateway version R431-20211208-GA:
Fixed Issue 82314: When a VMware SD-WAN Gateway is upgraded it may throw an exception with a loss of connectivity when using Intel x710 PCIe pass-through based NICs.
When the Gateway is upgraded, part of the upgrade is a kernel change; the i40e driver installation fails and the driver is not available on the Gateway. Without the i40e driver, x710 PCIe pass-through based NICs do not operate properly on the Gateway, resulting in either performance degradation or a loss of connectivity. This issue does not affect a Gateway which uses Virtio or VMNet based NICs.
Fixed Issue 81920: A VMware SD-WAN Gateway deployed using a KVM-based VM which uses Intel x710 SR-IOV-based NICs may have connectivity issues after a Gateway software upgrade.
The issue arises out of the iavf Linux Virtual Function Drivers not installing correctly on a Gateway upgrade and as a result x710 SR-IOV based NICs do not work on the upgraded Gateway. This issue does not affect a Gateway which uses Virtio or VMNet based NICs.
Fixed Issue 81575: A VMware SD-WAN Gateway deployed using a VMware OVA-based VM which uses Intel x710 SR-IOV-based NICs may have connectivity issues after a Gateway software upgrade.
The issue arises out of the iavf Linux Virtual Function Drivers not installing correctly on a Gateway upgrade and as a result x710 SR-IOV based NICs do not work on the upgraded Gateway. This issue does not affect a Gateway which uses Virtio or VMNet based NICs.
Fixed Issue 81517: On a site deployed with an Enhanced High Availability topology which uses VMware SD-WAN Edge 6x0 models, the HA link state is not updating properly.
The HA link is the link that connects the Enhanced HA Edge pair and if this link does not update properly the site could have issues with customer traffic since the Standby Edge also passes customer traffic.
Fixed Issue 80897: For a customer enterprise where VMware SD-WAN Edges are connected to VMware SD-WAN Partner Gateways, users may observe poor performance for customer traffic.
The poor performance is the result of routing issues stemming from the Partner Gateway distributing routes to the Edges where preferred secure static routes are available but the Edge does not properly label these routes as secure. The result is the Edge potentially advertising non-preferred non-secure routes over secure routes since all routes are treated equally when the expected behavior is to always prefer secure routes over non-secure routes.
Note: Both the Partner Gateway and customer Edges must be upgraded to a build that includes this fix to resolve the issue.
Fixed Issue 78003: For a customer using a Hub/Spoke topology, static tunnels from the VMware SD-WAN Spoke Edge to a Hub Edge may not form.
Typically if there are a large number of Dynamic Edge-to-Edge tunnels already established on the Spoke Edge, the maximum tunnel number check is hit on the Spoke for the static tunnel, and this check prevents static tunnel formation from the Spoke to the Hub.
Fixed Issue 68923: On a customer enterprise using BGP, a default route may be redistributed to a BGP peer though the reachable status for the installed default route is set to 'False'.
If a static route is configured on a VMware SD-WAN Edge pointing to any Edge interface, a BGP peer learns the default route from the Edge, and that interface is later disabled (changing the route's reachable flag to 'False'), the route continues to be advertised. Conversely, a route that is not redistributed because its interface is down continues not to be redistributed after the interface comes up and the route's status changes to 'True'. In both cases the cause is that the Edge does not readvertise the route when an interface status change alters the route's status.
Fixed Issue 65695: A customer may observe traffic failing when it is destined for a connected subnet.
The issue is that IPv4/IPv6 connected subnets are being redistributed to the overlay even after the 'Reachable' status goes to False. When the parent interface is down, the Edge service does not receive the 'down' notification for sub-interfaces and as a result the connected routes belonging to the sub-interfaces are not removed. Any traffic that would normally use those subnets when they are reachable is getting blackholed and failing completely.
Edge version R431-20220316-GA was released on 03-23-2022 and addresses the below critical issues since Edge version R431-20220302-GA:
Fixed Issue 77625: On a site deployed with a High Availability topology, a user may observe the VMware SD-WAN Standby Edge rebooting multiple times.
The site goes into an Active/Active (Split-Brain) state because of an HA thread priority inversion: a lower priority thread takes precedence and prevents a higher priority thread from running, which delays heartbeat processing and leads to the Standby Edge incorrectly being promoted to Active. In an Active/Active state the tie-break goes to the Active Edge, and the Standby Edge is rebooted to demote it back to its proper Standby status. In this case, though, the Active/Active event is detected multiple times, with the Standby Edge rebooting each time to recover the site. When this issue is encountered on a conventional HA topology the customer impact is minimal, as the Standby Edge does not pass customer traffic. On an Enhanced HA deployment, where the Standby Edge also passes traffic, the reboots disrupt some customer traffic.
Field issues have involved Edge 6x0 (610, 620, 640, 680) models but the issue is platform agnostic and could occur on other Edge models.
Fixed Issue 77525: For a site using a High Availability topology, when the VMware SD-WAN HA Edges are upgraded to a new software image, the Standby may fail to upgrade and the VMware SD-WAN Orchestrator UI incorrectly lists the Standby Edge's status as 'Active' even though it is not.
When the Active Edge detects the Standby Edge, it tries to fetch the Standby Edge's software version, and if the version is greater than 3.4.x, the Active Edge copies the network configuration file to the Standby Edge. While fetching the Standby Edge's software version, an exception may occur that is not handled by the Edge's HA code; this leaves an HA worker thread stuck, and further communication with the Standby Edge fails. At this point the management process between the Active and Standby Edges is broken, and anything having to do with the management plane, including software management, Standby Edge status, and configuration changes, is not synchronized between the Active and Standby Edges. As a result, the Standby Edge is falsely detected as Active, which appears on the Orchestrator as an Active/Active "split brain" state. It is not a true split brain, as the Standby Edge is still performing its proper role.
If there is an HA failover and the Standby Edge is promoted to Active, the Edge would be running with a mismatched set of configurations and software. The Orchestrator would detect the configuration mismatch and push the updated configuration to this Edge while also completing the software upgrade the Standby Edge previously missed. Since an Edge software upgrade requires a reboot, the customer would observe another failover while the newly Active Edge is rebooted and demoted back to Standby status.
This issue is not consistently encountered when an HA site upgrades the Edges' software. It can also occur in other situations where the Standby Edge has to upgrade its software, such as when bringing up a new HA site or when converting a standalone site to High Availability. These secondary scenarios are rarer than an HA Edge software upgrade.
Without the fix, a customer observing this issue would need to restart the Edge service or trigger an HA failover to clear the issue.
Fixed Issue 70586: When a routed interface on a VMware SD-WAN Edge is configured for 802.1x (uses RADIUS authentication), clients connected on that interface get silently de-authenticated whenever any other interface flaps (in other words, when any non-802.1x interface goes down and up in quick succession), and all of their traffic gets dropped until the client disconnects and then reconnects to the Edge.
The Edge does not check whether the interface that flapped is actually the one with authenticated 802.1x clients, and thus treats any interface flap as if it were an 802.1x interface flap and acts accordingly.
Without the fix, the only workaround is to force the client to physically disconnect and reconnect to get re-authenticated again.
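The missing check can be illustrated with a small Python sketch that tracks authenticated 802.1x clients per interface and, on a flap, de-authenticates only the clients on the interface that actually flapped. The class and the interface/MAC values are hypothetical and do not reflect the Edge's internal data structures.

```python
from collections import defaultdict
from typing import Dict, Set

class Dot1xSessions:
    """Hypothetical table of authenticated 802.1x clients, keyed by interface."""

    def __init__(self) -> None:
        self.clients: Dict[str, Set[str]] = defaultdict(set)

    def authenticate(self, interface: str, mac: str) -> None:
        self.clients[interface].add(mac)

    def on_interface_flap(self, interface: str) -> Set[str]:
        """De-authenticate only the clients on the interface that flapped.

        A flap on an interface with no 802.1x clients returns an empty set and
        leaves every other interface's sessions untouched.
        """
        # dict.pop does not trigger defaultdict's factory, so no empty entry is created.
        return self.clients.pop(interface, set())
```

For example, a flap on a non-802.1x interface such as `GE5` leaves a client authenticated on `GE2` in place.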
Edge version R431-20220302-GA was released on 03-03-2022 and addresses the below critical issues since Edge version R431-20220118-GA:
Fixed Issue 80551: On VMware SD-WAN Edge models which include an internal LTE modem (Edge 510-LTE and Edge 610-LTE), LTE tunnels via an IPv6 link become UNSTABLE when the IPv6 address on the CELL interface changes.
Whenever the IPv6 address on a CELL interface changes (for example, due to a DHCP lease expiration), the IPv6 tunnels become UNSTABLE. This is because the tunnels continue to use the old IPv6 address rather than the new one.
Without the fix the only way to remediate the issue is to restart the Edge's service.
Fixed Issue 82652: For a customer using a Cloud Security Service (CSS) where L7 Health Check is configured, the VMware SD-WAN Edge makes no attempt to recover an IPsec CSS tunnel that has been marked as down for more than five minutes.
In the current implementation of the L7 Health Check, the Edge sends L7 probes on all CSS tunnels; if those probes fail a set number of times, the Edge marks that tunnel as Down, continues to send L7 probes, and waits for the tunnel to come up on its own. The issue is that no attempt is made to recover a tunnel that has been Down for more than five minutes while IKE remains up (if IKE is also down, the IPsec tunnel is automatically reset after 20 seconds).
The fix in this ticket enhances the L7 Health Check by including an additional step for IPsec-based CSS tunnels: if an IPsec-based CSS tunnel remains down for longer than five minutes (no successful L7 probes) while IKE for the tunnel remains up during the same period, the Edge will tear down the IPsec tunnel and reset the IKE in an attempt to recover the CSS tunnel. L7 probes would continue to be sent while this occurs and if successful would mark the tunnel as Up. If the tunnel remained down, the same step would be applied after an additional five minutes.
This added behavior only applies to a CSS with IPsec tunnels and not ones using GRE tunnels.
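The added recovery step can be summarized as a small decision function. This Python sketch is a reading of the behavior described above, not VMware's code: the `CssTunnel` fields and the `should_force_reset` name are hypothetical, and it assumes the five-minute interval is measured from the later of when the tunnel was marked Down and the last forced reset.

```python
from dataclasses import dataclass
from typing import Optional

RECOVERY_INTERVAL_S = 5 * 60  # "longer than five minutes" per the release note

@dataclass
class CssTunnel:
    is_ipsec: bool                       # the added step does not apply to GRE tunnels
    down_since: Optional[float] = None   # when L7 probes marked the tunnel Down (None = Up)
    last_reset: Optional[float] = None   # when the last forced IPsec/IKE reset happened

def should_force_reset(t: CssTunnel, ike_up: bool, now: float) -> bool:
    """Return True when the IPsec tunnel should be torn down and IKE reset."""
    if t.down_since is None or not t.is_ipsec:
        return False
    if not ike_up:
        # IKE down is covered by the existing behavior (IPsec reset after 20 seconds).
        return False
    # Measure from the later of "marked Down" and "last forced reset", so a
    # tunnel that stays down is retried once per interval.
    reference = t.down_since if t.last_reset is None else max(t.down_since, t.last_reset)
    return now - reference > RECOVERY_INTERVAL_S
```

L7 probes keep running independently; a successful probe would clear `down_since` and stop further resets.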
Fixed Issue 82463: For a site configured with a Cloud Security Service (CSS), the VMware SD-WAN Edge may drop traffic destined for the CSS.
If the site is routing all internet traffic through a CSS, the impact of this issue can be significant. When the issue occurs, CSS packets are sent on the incorrect interface with the IP address of the actual interface as the source which leads to a failure in application access. The issue is caused by a potential race between the CSS context lookup thread and the outgoing interface selection thread which leads to the incorrect association of the outgoing interface with the flow and some flows on the CSS paths fail.
Without the fix, when experiencing the issue the user can remediate it by starting a new flow, or flushing all flows on the Edge by using Remote Diagnostics > Flush Flows.
Fixed Issue 80654: For a site configured with an Enhanced High Availability topology, a user may observe intermittent traffic drops on the VMware SD-WAN Standby Edge's WAN links.
When there are frequent path flaps (paths being added and deleted frequently), under certain timing scenarios the TCP connection between the Active and Standby Edges is reset, leading to packet drops for traffic traversing a WAN link on the Standby Edge.
Note: The fix for this issue resolves a regression introduced by the fix for #67201 in the original GA build R431-20211208-GA.
Fixed Issue 80010: For a customer enterprise using a Hub/Spoke topology where SD-WAN Reachable is also configured, the Spoke to Gateway path (using a public WAN link) via the Hub path does not come up if the Spoke-to-Hub path is point-to-point.
The SD-WAN Reachable feature, which is a passthrough for a Spoke Edge to connect to a Gateway through a connected Hub, is not supported if the Spoke Edge and the Hub Edge are connected by a point-to-point link (in other words, the Spoke's IP address matches the connected route on the Hub). The fix for this issue adds this functionality.
Fixed Issue 55327: The SSH connection from a VMware SD-WAN Gateway to a VMware SD-WAN Edge may not work if the tunnel from the Edge to the Gateway continuously flaps.
If the tunnel from Edge to Gateway flaps continuously, the route entry installed in the Edge for allowing the SSH connection from the Gateway may get deleted and cause the SSH connection to fail.
Fixed Issue 53951: A VMware SD-WAN Edge may experience either a failure of traffic sent direct to the internet or a loss of connectivity to the VMware SD-WAN Orchestrator and the Edge is marked as down.
This issue can affect an Edge in one of two scenarios:
For an Edge which uses public WAN links, when there is a flap (link goes down and then comes up) on a WAN link, the impact to the customer in this scenario is that traffic that is steered to the affected link and is also classified as Direct is dropped. This issue is especially impactful for a site where Business Policy rules are configured to force certain traffic to use one WAN link only while also being sent Direct.
When enabling HA on an Edge using PPPoE WAN links, the PPPoE interface IP changes and the old self route is deleted, but no new self route is added for the new PPPoE IP address. As a result, communication between the Orchestrator and the Edge no longer works.
Without the fix, the way to temporarily correct the issue is to either restart the Edge service to ensure Direct traffic is sent on the affected public WAN link, or reboot the Edge (where PPPoE links are used) which recovers the route to the Orchestrator.
Edge version R431-20220118-GA was released on 01-26-2022 and addresses the below critical issue since Edge version R431-20211221-GA:
Fixed Issue 64343: BGP routes learned from a peer are not tagged as uplink routes although uplink routes are setup for corresponding BGP neighbor.
The remote routes advertised to other VMware SD-WAN Edges and Gateways do not have the Uplink flag set on their respective BGP routes. This occurs when new BGP routes are learned from, or existing BGP routes are updated by, the BGP neighbor and are then advertised into the overlay network.
Note: The fix for this issue requires a VMware SD-WAN Orchestrator build which includes the fix for #77101. This fix is included in refreshed 4.5.0 Orchestrator build R450-20220215-GA released on 02/17/2022.
Edge version R431-20211221-GA was released on 12-24-2021 and addresses the below critical issues since Edge version R431-20211208-GA:
Fixed Issue 76564: For a site configured with a High Availability topology, when the VMware SD-WAN Edge's WAN interface is either enabled or disabled using the VMware SD-WAN Orchestrator UI, the site may experience a "split-brain" Active/Active state which is disruptive to customer traffic.
When an Edge's WAN Settings are changed, the Edge's network service is restarted and this leads to temporary HA packet loss which fools the Orchestrator into thinking the Active Edge is down and promotes the Standby Edge to Active which leads to the Active/Active "split-brain" state.
Fixed Issue 70933: After a configuration profile migration, a VMware SD-WAN Edge with High Availability enabled may experience multiple restarts.
During a configuration profile migration, only the device settings configuration is synchronized immediately with the Standby Edge. The remaining configurations are synchronized only in response to a heartbeat from the Standby Edge. When an Active Edge restarts to apply the latest configuration prior to receiving the heartbeat from the Standby Edge, the result will be a configuration mismatch between the Active and Standby Edge and this will cause multiple Edge restarts to synchronize the configurations of both HA Edges.
The below issues have been resolved since Edge Version R430-20211007-GA-61583-69704-59629-72423 and Gateway Version R430-20211020-GA-VCG.
Fixed Issue 21293: For a site using an Enhanced High Availability topology, Remote Diagnostic sections do not show proper information about the interface present on the VMware SD-WAN Standby Edge (the interface used in Enhanced HA mode).
Certain Remote Diagnostic sections which contain information or actions specific to interfaces (like Interface Status and Clear ARP cache) do not show information about the interface on the Standby Edge used in Enhanced HA mode.
Fixed Issue 40268: When a user changes the configuration of a VMware SD-WAN Hub or an Edge-to-Edge via Hub configuration, the Spoke Edge installs routes that are marked as 'False'.
The Spoke Edge installs routes in the FIB which are marked as False (as there is no tunnel from the Hub for those routes) and these routes stay in the FIB for ~2 minutes before being cleared out. In that time, these False routes may cause disruption to some networks.
Fixed Issue 44256: For an enterprise where two different sites deploy their VMware SD-WAN Edges as Hubs while also using a High Availability topology, and each site uses the other Hub site as a Hub in its profile, if one of the Hub sites triggers an HA failover, it may take up to 30 minutes for both Hub Edges to reestablish tunnels with each other.
On an HA failover, both Hub Edges try to initiate a tunnel with each other at the same time and neither replies to its peer: packets are exchanged between both Hubs, but IKE never succeeds. This leads to a deadlock that has been observed to take up to 30 minutes to resolve on its own. The issue is intermittent and does not occur after every HA failover.
Without this fix the only way to prevent this issue from occurring is to use a workaround where the customer configures only one of the two HA Hub sites to use the other Hub site as a Hub for itself. For example, where there are two HA Hub sites, Hub1 and Hub2, Hub1 could have Hub2 as a Hub for itself in its profile, but Hub2 must not use Hub1 as a Hub in its profile.
Fixed Issue 46489: If different Partner Gateway enabled profiles are assigned to multiple VMware SD-WAN Edges, the Edges will retain stale routing entries for the VMware SD-WAN Partner Gateways not assigned in their profile.
If different Partner Gateway enabled profiles are assigned to multiple Edges, the Edge keeps the routing entries which are learned from other Gateways, and those routes are considered stale entries. The customer impact is traffic not being routed correctly because the Edge is trying to send traffic to invalid routes for that profile.
Fixed Issue 49787: A user navigating to the Remote Diagnostics page for a VMware SD-WAN Edge on the VMware SD-WAN Orchestrator UI may observe the UI processing the request but the Edge's diagnostics page does not load.
The issue can occur for an Edge where certificates have been disabled and is the result of the VMware SD-WAN Edge's connection being continually reset because the Edge is repeatedly renewing its certificate.
Fixed Issue 50422: Peer MAC address may be learned incorrectly via ARP when using VLAN tagged routed interfaces.
If you have a VLAN tag assigned to a routed interface and the next hop sends untagged ARP requests, it will cause the untagged MAC to be learned and can cause traffic to black hole depending on which entry is learned first. Without this fix, the only workaround is to filter out untagged ARP requests from the next hop if you have a VLAN tag on the routed interface.
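A Python sketch of the filtering described in the workaround: an ARP entry is learned only when the frame's VLAN tag matches the routed interface's configured VLAN, so an untagged ARP request cannot overwrite the entry on a tagged interface. The `RoutedInterface` and `ArpTable` types and all addresses are hypothetical; the release note does not describe how the fix is implemented.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class RoutedInterface:
    name: str
    vlan_id: Optional[int]  # None means the interface is untagged

class ArpTable:
    """Hypothetical ARP cache that only learns from frames on the interface's own VLAN."""

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, str], str] = {}  # (interface, IP) -> MAC

    def learn(self, iface: RoutedInterface, frame_vlan: Optional[int],
              sender_ip: str, sender_mac: str) -> bool:
        if frame_vlan != iface.vlan_id:
            # e.g. an untagged ARP request arriving on a VLAN-tagged routed interface
            return False
        self.entries[(iface.name, sender_ip)] = sender_mac
        return True
```

Without such a check, whichever ARP (tagged or untagged) arrives first determines the learned MAC, which matches the "depending on which entry is learned first" behavior above.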
Fixed Issue 52628: Packets with an unknown source are allowed from the LAN interface of a VMware SD-WAN Edge.
With this issue packets which are sourced from a subnet different from the LAN subnet are allowed to pass through the Edge. This is caused by the Edge LAN interfaces not having reverse-path forwarding (RPF) enabled. The fix enables RPF on all Edge LAN interfaces and packets from LAN interfaces would be allowed only if the packets are sourced from the configured LAN subnet.
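The behavior of the fix, as described (packets from a LAN interface are allowed only when sourced from the configured LAN subnet), reduces to a membership test. This Python sketch simplifies reverse-path forwarding to that test; the function name and subnet values are hypothetical.

```python
import ipaddress
from typing import Iterable

def lan_source_allowed(src_ip: str, lan_subnets: Iterable[str]) -> bool:
    """Simplified strict source check for a LAN interface.

    Accept a packet only if its source address lies in one of the subnets
    configured on that LAN interface. Mixed IPv4/IPv6 comparisons are simply False.
    """
    addr = ipaddress.ip_address(src_ip)
    return any(addr in ipaddress.ip_network(s) for s in lan_subnets)
```

For a LAN configured as `10.0.1.0/24`, a packet from `10.0.1.25` passes while a spoofed source such as `172.16.5.9` is dropped.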
Fixed Issue 53359: BGP/BFD session may fail during some DDoS attack scenarios.
If traffic is flooded from the client connected to the routed interface to the LAN client, the BGP/BFD session can fail. Also when real-time high priority traffic is flooded to the overlay destination, the BGP/BFD session can fail.
Fixed Issue 54001: A VMware SD-WAN Edge is unable to send traffic after a Tx queue hang on SFP interfaces.
In rare cases, when the Edge sends an invalid sized packet (less than 17 bytes or greater than 1526 bytes) to DPDK, the transmit queue becomes stalled and causes any further traffic to not be forwarded by the Edge. Rebooting the Edge temporarily corrects the issue, but the problem can happen again when an invalid sized packet is sent from the Edge service to DPDK. Only upgrading to a level with the fix avoids this problem.
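A hypothetical Python guard illustrating the size bounds stated above: packets shorter than 17 bytes or longer than 1526 bytes are set aside instead of being handed to the transmit path. The release note does not say how the fix enforces this; the function and constant names are illustrative only.

```python
from typing import List, Sequence, Tuple

MIN_TX_LEN = 17    # shorter packets are "invalid sized" per the release note
MAX_TX_LEN = 1526  # longer packets are "invalid sized" per the release note

def split_valid_tx(packets: Sequence[bytes]) -> Tuple[List[bytes], List[bytes]]:
    """Split packets into (safe to hand to the driver, to be dropped)."""
    valid: List[bytes] = []
    dropped: List[bytes] = []
    for pkt in packets:
        if MIN_TX_LEN <= len(pkt) <= MAX_TX_LEN:
            valid.append(pkt)
        else:
            dropped.append(pkt)
    return valid, dropped
```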
Fixed Issue 54157: User may observe traffic drop on a VMware SD-WAN Gateway for the traffic from a datacenter server to a legacy client advertised through BGP over IPsec from Gateway.
In Release 4.4.x and earlier, it is not possible to distribute provider edge (PE) bound legacy routes to a Non SD-WAN Destination (NSD) via Gateway, or NSD routes to the PE. Release 4.5.0 provides data pipeline support between PE-bound destinations and an NSD via Gateway, including a route redistribution facility between PG-BGP and NSD-BGP.
Fixed Issue 54846: VMware SD-WAN SNMP MIBs use counters for Jitter, Latency, and Packet Loss.
In VMware SNMP MIBs, Latency, Jitter, and Packet Loss are defined as Counter64 which is not appropriate for these types. Counters should be used for data types that are ever increasing values and which never reset in SNMP like bytes Tx/Rx. In contrast, latency, jitter, and packet loss do not have ever increasing values but dynamically adjusted values and should not use counters.
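To see why the Counter64 type is a problem, consider how a typical poller treats counters: it reports the difference between two polls and interprets any decrease as a wrap-around. The Python sketch below shows that arithmetic. It is a generic illustration, not a specific NMS; for values that can rise and fall, SMIv2 defines gauge types such as Gauge32, although the release note does not say which type the corrected MIB uses.

```python
COUNTER64_MODULUS = 2 ** 64

def counter_delta(prev: int, curr: int, modulus: int = COUNTER64_MODULUS) -> int:
    """Increase between two polls of a counter, treating a decrease as a wrap-around."""
    return (curr - prev) % modulus
```

For a byte counter going from 1000 to 1500, the delta of 500 is meaningful. For latency falling from 50 ms to 20 ms, the same arithmetic reports an increase of 2^64 - 30, which is meaningless; a gauge would simply be read as 20.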
Fixed Issue 56218: For a customer site deployed with a High Availability topology or where HA has just been enabled, when the Edges are upgraded from 3.2.x to 3.4.x, the Standby Edge may go down.
When HA is enabled or the HA Edges are upgraded from 3.2.2 to 3.4.x after a WAN setting is configured using the Local UI, the HA interface (e.g. LAN1 or GE1 depending on the Edge model) will be removed from the Standby Edge and HA status will be set to HA_FAILED on the VMware SD-WAN Orchestrator.
Fixed Issue 57011: For a site configured with a High Availability topology, whenever segments are added and then deleted on that site, one of the HA Edges may experience a Dataplane Service failure and if the service failure is on the Active Edge, the site would also experience an HA failover.
When segments are added and then deleted from an HA site, there is the potential for stale segments (in other words, the deleted segments might still show up on one of the Edges in the HA pair). Due to this mismatch in segment information between the HA Edges, any event meant for the stale segment might be sent to the other Edge resulting in a Dataplane Service failure, an HA failover if the service failure is on the Active Edge, and the generation of a core dump that will be found on a diagnostic bundle taken after the failover.
Fixed Issue 58259: In some cases a customer may observe a VMware Non SD-WAN Destination tunnel down on the Gateway side with a Zscaler peer.
There are some cases when the Zscaler peer end deletes Phase 2 security association (SA) but the VMware SD-WAN Gateway still retains the SA. In these cases the tunnel will be torn down, and the customer will not be able to pass traffic.
Without the fix, the workaround is a phase2_sa_check.py script which walks over the Phase 2 SA table and checks if there is Phase 2 SA for which Phase1 SA is missing. If it finds one then the Gateway reestablishes the tunnel.
Fixed Issue 58453: Some Office365 packets are misclassified as SSL packets by the VMware SD-WAN Edge.
The VMware SD-WAN Deep Packet Inspection (DPI) engine is sometimes misclassifying packets that should be classified as Office365 as SSL instead. The impact is that these flows will be treated as SSL flows versus Office365 flows and that may mean they are treated with less priority, impacting the user experience.
Fixed Issue 59236: For sites using an Enhanced High Availability topology, tunnels are not formed if the WAN link connected to the Standby Edge is a Metanoia SFP and this behavior persists even after an HA failover.
For Enhanced HA, the WAN ports are blocked on the Standby Edge (in other words, the Edge does not allow TX on its WAN interfaces). In order to bring up a Metanoia SFP interface, there is a packet exchange needed between the hardware. As the Edge does not allow TX, the interface initialization does not succeed.
Fixed Issue 59629: On a customer site deployed with a High Availability topology, the user may observe the VMware SD-WAN Standby Edge restarting multiple times.
Both the Active and Standby Edge miss their HA heartbeat and both Edges become Active/Active (also known as "Split Brain"). To break the tie, the newly promoted Active Edge (the previous Standby Edge) will undergo a restart with a logging event: "Active/Active Panic". The fix for this issue involves promoting the priority of the HA Edge heartbeat thread so as to minimize the delay in processing the heartbeats which can be viewed as missed heartbeats causing the Active/Active state.
Fixed Issue 60010: For a site using VMware SD-WAN Edges with VNF deployed in a High Availability topology, the VNF on the Standby Edge is not accessible via SSH after a LAN-side port flap.
The LAN-side interface on the Standby VNF is normally in a disabled state. Due to the LAN-side port flap, it moves to a forwarding state, which results in a wrong MAC address port mapping on the bridge interface and makes the VNF inaccessible.
Fixed Issue 60073: DNS packets via a VMware SD-WAN Edge's PPPoE interface are not processed.
DNS packets that traverse an Edge's PPPoE interface are not processed and are dropped. This impacts DNS over PPPoE functionality, and the customer may observe issues such as CSS tunnels not coming up after an upgrade to Release 4.2.0 or later.
Fixed Issue 60184: A Branch VMware SD-WAN Edge installs routes marked with uplink community from a non-profile Hub Edge (Dynamic Branch-to-Branch) and prefers these routes before everything else.
When Dynamic Branch-to-Branch is used, the non-profile Hub Edge is treated as a Branch Edge, so the issue occurs as described when a dynamic tunnel comes up. Without the fix, the only workaround is to add Hubs to all profiles, but this does not scale on larger networks with 20+ Hub Edges due to the enormous number of routes that would be created.
Fixed Issue 60367: Stateful Firewall rules do not drop the first packet in a flow going to a VMware SD-WAN Edge IP even with a VLAN-specific drop rule in place.
Sending a ping to an Edge's VLAN IP succeeds even when VLAN-specific Stateful Firewall drop rules are in place. With such rules, the behavior is inconsistent: a ping to a VLAN host is dropped, while a ping to the Edge's VLAN IP is answered. With the fix, pings to both the Edge's VLAN IP and VLAN hosts are disallowed.
root@vc-client1:~# ping 10.0.2.1
PING 10.0.2.1 (10.0.2.1) 56(84) bytes of data.
64 bytes from 10.0.2.1: icmp_seq=1 ttl=62 time=1.37 ms
From 10.0.2.1 icmp_seq=2 Destination Net Unreachable
64 bytes from 10.0.2.1: icmp_seq=83 ttl=62 time=53.6 ms
From 10.0.2.1 icmp_seq=84 Destination Net Unreachable
64 bytes from 10.0.2.1: icmp_seq=173 ttl=62 time=126 ms
From 10.0.2.1 icmp_seq=174 Destination Net Unreachable
--- 10.0.2.1 ping statistics ---
194 packets transmitted, 3 received, +3 errors, 98% packet loss, time 193216ms
rtt min/avg/max/mdev = 1.373/60.671/126.962/51.510 ms
We also see a single response to SNMP queries:
$ snmpwalk -c public -v 2c 10.100.30.1
iso.3.6.1.2.1.1.1.0 = STRING: "VeloCloud EDGE5X0"
Timeout: No Response from 10.100.30.1
The pkt_tracker output shows both the allowed and the dropped packets:
21/03/26 00:22:33.283011,072|7|27428/5|vc_pkt_print_track:187 dir: wan_to_lan, 10.0.2.1:5201 <-> 10.0.4.25:44647 proto 6, app 1461, class 7, policy "User Default", reason "tun_send", count 20 path "2 3 5 11 20 47 48 49 54 58 60 65 74 "
21/03/26 00:22:34.282824,192|7|27416/63|vc_pkt_print_track:187 dir: wan_to_lan, 10.0.2.1:5201 <-> 10.0.4.25:44647 proto 6, app 1461, class 7, policy "User Default", reason "vcmp_inb_fw_drop", count 19 path "2 11 47 48 49 54 58 60 74 "
21/03/26 00:22:35.283832,832|7|27416/64|vc_pkt_print_track:187 dir: wan_to_lan, 10.0.2.1:5201 <-> 10.0.4.25:44647 proto 6, app 1461, class 7, policy "User Default", reason "vcmp_inb_fw_drop", count 18 path "2 11 47 48 49 54 58 60 74 "
21/03/26 00:22:36.284884,480|7|27416/65|vc_pkt_print_track:187 dir: wan_to_lan, 10.0.2.1:5201 <-> 10.0.4.25:44647 proto 6, app 1461, class 7, policy "User Default", reason "vcmp_inb_fw_drop", count 17 path "2 11 47 48 49 54 58 60 74 "
21/03/26 00:22:37.286092,544|7|27416/66|vc_pkt_print_track:187 dir: wan_to_lan, 10.0.2.1:5201 <-> 10.0.4.25:44647 proto 6, app 1461, class 7, policy "User Default", reason "vcmp_inb_fw_drop", count 16 path "2 11 47 48 49 54 58 60 74 "
21/03/26 00:22:54.623312,128|7|27428/6|vc_pkt_print_track:187 dir: wan_to_lan, 10.0.2.1:40125 <-> 10.0.4.25:53 proto 17, app 216, class 21, policy "User Default", reason "tun_send", count 15 path "2 3 5 11 20 47 48 49 54 58 60 65 72 74 "
21/03/26 00:22:54.623384,576|7|27424/3196|vc_pkt_print_track:187 dir: lan_to_wan, 10.0.2.1:40125 <-> 10.0.4.25:53 proto 17, app 216, class 21, policy "User Default", reason "cloud_to_edge_drop", count 14 path "7 31 32 34 71 74 "
21/03/26 00:22:55.622983,680|7|27416/68|vc_pkt_print_track:187 dir: wan_to_lan, 10.0.2.1:40125 <-> 10.0.4.25:53 proto 17, app 216, class 21, policy "User Default", reason "vcmp_inb_fw_drop", count 13 path "2 11 47 48 49 54 58 60 74 "
21/03/26 00:22:56.624076,544|7|27416/69|vc_pkt_print_track:187 dir: wan_to_lan, 10.0.2.1:40125 <-> 10.0.4.25:53 proto 17, app 216, class 21, policy "User Default", reason "vcmp_inb_fw_drop", count 12 path "2 11 47 48 49 54 58 60 74 "
Fixed Issue 61344: When a VMware SD-WAN Edge has more than 180 Mbps of traffic flowing through it, new traffic subsequently initiated through the Edge experiences slowness.
When more than 180 Mbps of traffic is flowing through an Edge, new traffic is buffered at the VMware SD-WAN Gateway and is not processed by the Gateway within the expected time, so that traffic experiences added latency.
Fixed Issue 61361: When applying a software update to upgrade a VMware SD-WAN Edge 3400, 3800, or 3810 to Edge Release 3.4.5, 4.0.2, or 4.2.1, there is a chance these Edge models may not boot back up immediately after the update.
Release 3.4.5, 4.0.2, and 4.2.1 include a particular firmware update for the complex programmable logic device (CPLD), and the update triggers a reboot that can sometimes get "stuck", requiring a manual power cycle to restart the system.
Without the fix, a local user needs to manually power cycle the Edge to complete the update.
Fixed Issue 61403: For a site deployed with a High Availability topology where the VMware SD-WAN Edges are LTE models (in other words, the Edge 510-LTE or Edge 610-LTE), tunnels do not form on the Standby Edge's LTE link when HA is enabled.
When HA is enabled with an unactivated Standby Edge that has an LTE link and a 3.4.4, 4.0.1, or 4.2.0 image, tunnels do not form on the Standby Edge's LTE link after the Edge is subsequently activated and upgraded.
Fixed Issue 61502: During activation of a VMware SD-WAN Edge, the download of the new software image to be applied is delayed indefinitely.
In an environment with unreliable network connectivity or certain types of traffic throttling, the HTTPS download of the new software image can get stuck. Without this fix, if this happens, power cycle the Edge and wait a couple of minutes. The download should restart automatically, although it starts over from the beginning.
Fixed Issue 61583: If a customer enables High Availability for a site, the VMware SD-WAN Edge goes offline, the site goes down, and all customer traffic is disrupted.
When HA is enabled, the Edge goes offline for at least ~5 minutes, and customer traffic is disrupted during that period. The Edge may then roll back to the previous configuration and resume operation, though it continues to operate as a standalone Edge and HA remains unavailable. However, if the Edge does not successfully roll back to the last configuration, it stays down until a local user performs a factory reset followed by an RMA Reactivation (with HA not enabled) to restore connectivity at that site.
For more information, consult the KB article "Enabling High Availability on a VMware SD-WAN Edge using Release 4.3.0 GA or 4.4.0 GA may cause the Edge to go offline" (84396).
Fixed Issue 61657: SNMP on a VMware SD-WAN Gateway is impacted when IPv6 routes are configured for attributes like publicIpAddr, localIpAddr, nextHop, and peerIp which results in errors when running an SNMP walk for that Gateway.
SNMP relies on the parsing of IP addresses using a '.' delimiter. In the case of an IPv6 address there is also a ':' delimiter and this results in failure for SNMP walks. The fix for this issue changes the parsing logic in SNMP.
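To illustrate why a '.'-delimited parser breaks on IPv6, the following is a minimal Python sketch, not the Gateway's SNMP code (which these notes do not describe). It contrasts a hypothetical dotted-quad split with a version-agnostic approach that parses the address first and then emits the length-prefixed octet encoding used for InetAddress index objects in RFC 4001. Both function names are hypothetical.

```python
import ipaddress
from typing import List


def naive_index(addr: str) -> List[int]:
    # Assumes dotted-quad IPv4 input. For "2001:db8::1" the split on "."
    # yields a single field, and int() raises ValueError.
    return [int(part) for part in addr.split(".")]


def inet_address_index(addr: str) -> List[int]:
    # Version-agnostic: parse the address first, then emit a length prefix
    # followed by one sub-identifier per octet (4 for IPv4, 16 for IPv6).
    packed = ipaddress.ip_address(addr).packed
    return [len(packed), *packed]


print(inet_address_index("10.0.2.1"))  # [4, 10, 0, 2, 1]
```

Parsing with the standard ipaddress module rather than splitting on a delimiter means the same code path handles both address families.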
Fixed Issue 61725: For a site using a High Availability topology where USB WAN links are used, running the Remote Diagnostic "HA Info" will result in errors.
When a USB/LTE modem is (or was previously) present only on the VMware SD-WAN Active Edge and not on the Standby Edge, the Active Edge tries to fetch USB/LTE interface details from the Standby Edge and throws an error because the USB/LTE interface is not present on the Standby Edge.
Fixed Issue 61759: ISP name and Bandwidth are shown as empty on the VMware SD-WAN Edge Local UI (Overview page and Routed Interface Properties page).
If a routed interface has more than one overlay, the Local UI is expected to show the details of the one with the latest "last active" value, since it is designed to show details for only one. Instead, when a user opens the Local UI Overview or Details page, the bandwidth and ISP values for routed interfaces with multiple overlays are shown as empty.
Fixed Issue 62197: A VMware SD-WAN Gateway may restart its Dataplane Service.
The Gateway encounters a memory leak which occurs while syncing routes from itself to the VMware SD-WAN Orchestrator. When memory consumption reaches critical levels, the Gateway's dataplane service restarts to clear the memory, causing a brief disruption in customer traffic using the Gateway.
Fixed Issue 62280: The VMware SD-WAN Edge's LAN subinterface is not shown in a traceroute from a routed host to a client connected through Edge-to-Edge.
When a traceroute is run from a host that is not directly connected to the Edge to a client in an Edge-to-Edge topology, the Edge's interface IP is not displayed in the output. This happens only when the VMware SD-WAN Gateway configuration is not enabled on the Edge interface on the path to the host.
Without the fix, the only workaround is to enable the Gateway configuration on the Edge interface connecting to that traceroute host.
Fixed Issue 62373: When bringing up a High Availability site configured for a unique MAC address, the vMAC is programmed from a routed interface to a switched interface.
The unique MAC address will not be programmed for the HA Edge when the interface type is changed from a WAN interface to a Switched interface and this leads to traffic loss.
Without the fix, the workaround to resolve this issue is to restart the HA Edges, which will program the proper MAC on the now switched interface.
Fixed Issue 62897: If an Operator User logs into a VMware SD-WAN Gateway and runs the tcpdump command on either eth0 or eth1, the results are not delivered correctly.
tcpdump is a critical Gateway debugging command for Operators, and a lack of correct output greatly impairs their efforts. Without the fix, the correct output can be obtained by piping the tcpdump output to cat, for example: tcpdump -nnplei eth0 | cat
Fixed Issue 62552: A site may experience intermittent periods of high packet loss and connectivity issues.
This is caused by the API that checks for ARP resolution reporting a successful resolution for a device while returning a MAC address of 00:00:00:00:00:00. This address is kept in the ARP cache, and any packets intended for a device whose MAC is listed as all zeros are dropped. In this issue, many such successful ARP resolutions with all-zero MAC addresses are delivered, causing high packet loss and connectivity issues.
Note: There is a previous issue 60130 with the same underlying behavior and cause but which includes a different resolution from 62552. With 60130, the resolution was a defensive workaround, while 62552 includes a complete fix that prevents any recurrence of this issue.
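The defensive idea behind this class of fix can be sketched in Python as a cache that refuses to treat an all-zero MAC as a successful resolution. This is a hypothetical illustration only: the ArpCache class and its methods are assumptions, and the notes do not describe how the Edge's ARP handling is implemented.

```python
from typing import Dict, Optional


def is_zero_mac(mac: str) -> bool:
    # Accepts "00:00:00:00:00:00" as well as shortened forms such as "0:0:0:0:0:0".
    return all(int(octet, 16) == 0 for octet in mac.split(":"))


class ArpCache:
    """Hypothetical ARP cache that treats an all-zero MAC as unresolved."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def record_resolution(self, ip: str, mac: str) -> bool:
        """Store a resolution; return False if it was rejected."""
        if is_zero_mac(mac):
            # Never cache an all-zero MAC: frames sent to it would be dropped.
            # Remove any existing entry so the address gets re-resolved.
            self._entries.pop(ip, None)
            return False
        self._entries[ip] = mac.lower()
        return True

    def lookup(self, ip: str) -> Optional[str]:
        return self._entries.get(ip)
```

Rejecting the entry at insertion time keeps a bad answer from ever reaching the forwarding path, instead of cleaning it up after packets have already been dropped.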
Fixed Issue 62685: If LAN side NAT is configured with the same outside IP for different LAN subnets with NAT type as source, traffic destined for the Cloud will not work.
For the outside IP used in LAN side NAT rules, a static route is configured and advertised to the remote branches. For the return traffic to be routed to the correct LAN subnet, the route lookup should be based on the Inside IP configured in the LAN side NAT rule instead of the next hop in the static route. However, for return traffic from the cloud, the route lookup is based on the next hop in the static route, so the traffic can be routed to the wrong LAN subnet.
Fixed Issue 62736: On a hardware-based VMware SD-WAN Edge, when a user accesses the Local User Interface, the routed interface properties page shows an empty MAC address field and an incorrect IP address for PPPoE interfaces.
Starting with Release 4.3.x, interface details in the Local UI are fetched from NETLINK events instead of by constantly polling for updates. Because PPPoE interfaces themselves do not have MAC addresses (their base interfaces do), the MAC address is shown as empty. Also, the wrong NETLINK attribute was used to report the IP address (IFA_ADDRESS instead of IFA_LOCAL). DHCP and static interfaces do not have this issue because the values of both attributes are the same for them.
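The IFA_ADDRESS versus IFA_LOCAL distinction follows the Linux kernel's rtnetlink convention: on point-to-point links IFA_ADDRESS carries the peer address, and the local address is in IFA_LOCAL. The following minimal Python sketch assumes the attributes have already been decoded from an RTM_NEWADDR message into a dictionary; the function name and sample addresses are hypothetical, not the Local UI's code.

```python
from typing import Mapping, Optional


def local_address(attrs: Mapping[str, str]) -> Optional[str]:
    """Choose the local address from decoded RTM_NEWADDR attributes.

    On point-to-point links such as PPPoE, IFA_ADDRESS holds the peer's
    address and IFA_LOCAL holds the local one. On broadcast interfaces the
    two are equal, and some messages (for example, many IPv6 addresses)
    carry only IFA_ADDRESS, so fall back to it when IFA_LOCAL is absent.
    """
    return attrs.get("IFA_LOCAL") or attrs.get("IFA_ADDRESS")


# Hypothetical decoded attributes for a PPPoE interface and an Ethernet one.
pppoe = {"IFA_ADDRESS": "203.0.113.1", "IFA_LOCAL": "203.0.113.45"}
ethernet = {"IFA_ADDRESS": "192.0.2.10", "IFA_LOCAL": "192.0.2.10"}
print(local_address(pppoe), local_address(ethernet))  # 203.0.113.45 192.0.2.10
```

Reading IFA_ADDRESS alone would report 203.0.113.1 (the PPPoE peer) for the first interface, which matches the symptom described above, while the Ethernet interface looks correct either way.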
Fixed Issue 63056: A VMware SD-WAN Edge may experience a kernel panic with a resulting reboot and core.
The mutex mon process fails with SIGXCPU and a core is triggered. The fix allows all threads to use both cores and moves the communication between the Edge Dataplane Service and frr to Unix domain sockets, which provide the benefits of TCP sockets without the heavy kernel overhead and with better performance.
Fixed Issue 63141: For a site using an Enhanced High Availability topology where Metanoia ADSL2+ SFP modules are used, the ADSL2+ SFP modules fail to come up on a failover.
When an Enhanced HA Edge fails over or if the network is restarted, an Edge with an ADSL2+ configuration fails to come up.
Fixed Issue 63359: For a site configured with a High Availability topology and OSPF where the VMware SD-WAN Edges use an MGMT IP Edge build, when these Edges are upgraded from a 3.4.x to a 4.2.x MGMT IP build, OSPF connectivity may be broken post-upgrade.
When the HA Edges are upgraded to a 4.2.x MGMT IP build, they may select 169.254.2.2 as their Router ID. This is not the expected behavior, because the Edge's Router ID selection should not take the HA interface's IP address into account. This Router ID breaks OSPF connectivity, and there is a complete disconnection because route exchange no longer occurs.
Without the fix, the only workaround is to restart the Edge service (triggering an HA failover) as this will force a reselection of the Router ID which should be a correct one after the restart.
Fixed Issue 63362: For a site using an Enhanced High Availability Topology, a DHCP/PPPoE enabled interface stops sending traffic after the Standby Edge is either rebooted, or power cycled.
In an Enhanced HA topology, if DHCP/PPPoE is enabled on a proxy interface (in other words, the HA link state is set to USE_PEER), the interface fails to get an address from the server after the Standby Edge either reboots or is power cycled.
Without the fix, the only workaround is to either change the dynamic address to a static address type or do a forced HA failover to get an IP address from the server.
Fixed Issue 63513: For a customer using VMware Edge Network Intelligence, the software version displayed for the VMware SD-WAN Edge is not updated after an Edge software upgrade.
The Edge has in fact upgraded to the latest Edge Network Intelligence version, but it reports the older version number to the VMware SD-WAN Orchestrator, and that is what the customer observes. After the Edge is upgraded from an older to a newer release, the customer continues to see the older release version for Edge Network Intelligence.
Fixed Issue 63645: A VMware SD-WAN Edge may experience memory corruption, which results in a Dataplane Service Failure during a tunnel flap.
Reference counting is used to keep the Edge's multi-threaded system from accessing already-freed memory. In one scenario, the reference counting added to an internal data structure that contains Gateway information can cause the Edge's memory to become corrupted.
Fixed Issue 63692: On a VMware SD-WAN Gateway where Federal Information Processing Standards (FIPS) is enabled during cloud-init, BGP sessions do not come up.
If a user enables FIPS on the Gateway during cloud-init and then configures BGP, the user would observe that BGP neighborship does not come up.
Fixed Issue 63725: For a customer who deploys a Non SD-WAN Destination (NSD) via Gateway where redundant VMware SD-WAN Gateways have also been configured, when NSD traffic fails over from the Primary Gateway to the Secondary Gateway, the traffic fails.
This issue is caused by the lack of a route to the peer destination on the Secondary Gateway. When the traffic fails over to the Secondary Gateway, the Gateway sends the traffic directly because it has no route to the NSD's peer destination. Because the traffic is sent directly when trying to connect with the peer destination, all NSD via Gateway traffic fails.
Fixed Issue 63752: On a VMware SD-WAN Gateway where Federal Information Processing Standards (FIPS) is enabled, if a user attempts to generate a diagnostic bundle, the attempt will either fail or timeout.
A Gateway configured for a FIPS mode of operation enforces application security profiles which prevent some diagnostic data from being collected on the system and this causes the diagnostic bundle generation to fail.
Fixed Issue 63983: If a partner is using a monitoring tool (for example, Prometheus) to collect CPU and memory utilization metrics for a VMware SD-WAN Gateway, the user of this tool also sees CPU and memory usage outputs for the Orchestrator.
Because there is no prefix added to differentiate CPU and Memory statistics, the monitoring tool would collect CPU and memory usage for both the Gateway and also the Orchestrator.
Fixed Issue 64078: If a partner is using a monitoring tool (for example, Prometheus) to collect throughput metrics for a VMware SD-WAN Gateway, and a particular Gateway uses a bonded interface, the throughput statistics for that Gateway would be inaccurate.
The Gateway is only exporting throughput counters for eth interfaces and not exporting statistics for bonded interfaces, which is a major issue since many Gateways use bonded interfaces.
Fixed Issue 64184: When a user enables High Availability on a site using two VMware SD-WAN Hardware Edges, the user may observe that when the Standby Edge software upgrade step is reached, the upgrade does not occur and the Standby Edge remains in an inactive state.
This issue happens rarely under the above conditions. When it does occur, the cause is that, after HA is enabled, an HA worker thread on the Active Edge is ended while it is invoking the Standby Edge image upgrade, which leaves the Standby Edge in an inactive state.
Fixed Issue 64205: User will observe a high number of handoff queue drops of VCMP Data for a VMware SD-WAN Gateway, leading to a poor user experience.
When there are continuous flow create events, the packet processing on VCMP (VeloCloud Management Protocol) Data thread gets slower. This fix reduces the VCMP Data thread load by redirecting VCMP Control messages to a different thread and by eliminating some of the continuous log messages.
Fixed Issue 64713: For a customer site that uses an Enhanced High Availability topology, if a user restarts the Edge service or makes a configuration change that results in the Edge service restarting, the customer may observe that the restart takes much longer than expected before the Edge recovers.
When this issue occurs, there is a race condition between the Edge process and other competing processes that results in the delay in starting up. In the diagnostic bundle logs, the user would see a line with the phrase "FATAL: Cannot get hugepage information".
Fixed Issue 64951: A customer using Zscaler as a Cloud Security Service (CSS) may observe ZSCALER_MONITOR_FAILED events on the VMware SD-WAN Orchestrator events page when an L7 Health Check is done.
These events are false: the Zscaler tunnel is actually intact, and the spurious events cause confusion for the user.
Fixed Issue 64633: A customer who uses a Non SD-WAN Destination (NSD) via Gateway to connect to a VMware Cloud (VMC) on AWS peer may observe intermittent traffic drops lasting ~30 seconds each.
This issue is observed with VMware Cloud (VMC) on AWS only. The peer starts an IKE rekey 30 seconds before the security association (SA) expiration and after each rekey the peer retains the old SA and uses it until its expiration, while the VMware SD-WAN Gateway deletes the inbound SA. The deletion of the inbound SA causes the traffic drop with this peer. The frequency of this issue is contingent on the peer's rekey policy. If the peer rekeys every 45 minutes, then this issue would happen every 45 minutes, if 12 hours, then every 12 hours. The traffic will recover automatically after ~30 seconds by itself, when the peer switches to the new SA.
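The timing described above can be modeled with a small Python sketch: if the receiving side keeps a replaced inbound SA until its own expiry, traffic the peer still sends on the old SPI is accepted during the ~30-second overlap. This is a simplified, hypothetical model (an SA table keyed by SPI with absolute expiry times); it is not the Gateway's IPsec implementation.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class InboundSA:
    spi: int
    expires_at: float  # seconds on a hypothetical monotonic clock


class InboundSATable:
    """Hypothetical inbound SA table that keeps a replaced SA until it expires."""

    def __init__(self) -> None:
        self._sas: Dict[int, InboundSA] = {}

    def install(self, sa: InboundSA) -> None:
        # On rekey, add the new SA without deleting the old one: the peer may
        # keep sending on the old SPI until that SA's own lifetime ends.
        self._sas[sa.spi] = sa

    def accepts(self, spi: int, now: float) -> bool:
        self._expire(now)
        return spi in self._sas

    def _expire(self, now: float) -> None:
        for spi in [s for s, sa in self._sas.items() if sa.expires_at <= now]:
            del self._sas[spi]


table = InboundSATable()
table.install(InboundSA(spi=0x1001, expires_at=3600.0))  # original SA
table.install(InboundSA(spi=0x1002, expires_at=7200.0))  # peer rekeys at t=3570
print(table.accepts(0x1001, now=3580.0))  # True: old SA still in its lifetime
```

In the faulty behavior, the old inbound SA is deleted at t=3570, so packets on SPI 0x1001 are dropped until the peer switches to the new SA at t=3600, matching the ~30-second drop.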
Fixed Issue 64961: A VMware SD-WAN Edge may experience a Dataplane Service Failure and restart that service if processing IP packets that include options.
The processing of IP packets with options could result in a Dataplane Service Failure due to incorrect parsing of the options fields (the parsing continues beyond the end of the options list). The Dataplane Service failure is triggered by mutex mon. Without this fix, the only way to minimize the risk of this issue is to avoid setting options other than Record Route (RR) and No Operation (NOP) in user-traffic IP packets.
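For reference, a bounds-checked walk of the IPv4 options area, following the option layout in RFC 791, looks like the Python sketch below. It shows the kind of check the fix implies (never read past the end of the options list) and is not the Edge's parser. End of Option List (type 0) and No Operation (type 1) are single octets. Every other option carries a length octet that counts the type and length octets themselves.

```python
from typing import List, Tuple

EOOL = 0  # End of Option List (RFC 791)
NOP = 1   # No Operation (RFC 791)


def parse_ipv4_options(options: bytes) -> List[Tuple[int, bytes]]:
    """Return (option_type, option_data) pairs without reading past the end.

    Raises ValueError on a malformed length instead of walking beyond the buffer.
    """
    parsed = []
    i = 0
    while i < len(options):
        opt_type = options[i]
        if opt_type == EOOL:
            break  # End of list: any remaining bytes are padding.
        if opt_type == NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            raise ValueError("truncated option: missing length octet")
        length = options[i + 1]
        # length < 2 would loop forever; i + length past the end overruns the buffer.
        if length < 2 or i + length > len(options):
            raise ValueError(f"bad length {length} for option type {opt_type}")
        parsed.append((opt_type, options[i + 2:i + length]))
        i += length
    return parsed


# A NOP followed by a Record Route option (type 7, length 7, pointer 4, one empty slot).
print(parse_ipv4_options(bytes([1, 7, 7, 4, 0, 0, 0, 0])))  # [(7, b'\x04\x00\x00\x00\x00')]
```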
Fixed Issue 65037: A HTTPS/SSL connection may fail to establish because of a corrupt certificate if the certificate has special characters or spaces in the SSL common name field.
The VMware SD-WAN Edge inspects all user traffic passing through it so that it can identify the application to which the traffic belongs. This is needed for correctly applying business policies and for the VMware SD-WAN Orchestrator to display per-application statistics on the Edge's Monitoring page. However, an issue in the application identification code caused a byte in the SSL common name to be overwritten when the common name contained special characters or spaces, thereby corrupting it.
Fixed Issue 65186: For a customer site using multiple WAN links, if there is a business policy configured to use one link with a Preferred or Mandatory policy, the traffic type covered by the business policy continues to be load balanced across all available links.
Even though the Business Policy is configured to route traffic to one WAN Link using a mandatory or preferred configuration, traffic would be load balanced on multiple WAN links.
Fixed Issue 65219: A KVM SR-IOV type VMware SD-WAN Gateway using an i40evf driver drops customer packets of 1500 bytes or greater.
Packets with a data size of less than 1496 bytes are not dropped. If a user attempts to SSH into the Gateway host under these conditions, the user will observe a hang.
Fixed Issue 65293: The throughput performance of a VMware SD-WAN Gateway deployed in AWS and running with Amazon's Elastic Network Adapter (ENA) driver is degraded when using Release 4.x.
This issue will occur if the Gateway is upgraded to a 4.x build (from 3.x) or on a new deployment using a 4.x build. Gateways using Release 4.0.0 or later have DPDK v19.11, and starting from DPDK v19.02, Amazon's ENA driver uses low-latency queuing (LLQ). However, for LLQ to work efficiently the write-combine for memory setting must be enabled per the ENA reference guide. If memory mapping is not write-combined, a Gateway deployed on AWS experiences high CPU usage, significantly impacting throughput. The fix for this issue enables write-combining on the ENA adapter for Gateways deployed on AWS.
Fixed Issue 65432: A traceroute from a client which is LAN-side connected to a VMware SD-WAN Edge to a DC server via a VMware SD-WAN Gateway does not display the Gateway IP in the traceroute output.
On initiating a traceroute from the LAN client to the DC which is reachable through the Gateway, the traceroute displays all the hops except the Gateway IP.
Fixed Issue 65521: A VMware SD-WAN Edge may encounter a Dataplane Service failure and restart as a result.
An Edge service restart disrupts customer traffic for ~5-10 seconds. The Edge Dataplane Service fails while processing an unexpected control message during a VeloCloud Management Protocol (VCMP) tunnel creation handshake. This issue is not dependent on network topology, number of flows, or throughput. It is both rare and random, but has the potential to occur on any type of customer enterprise.
Fixed Issue 65526: The VMware SD-WAN Orchestrator generates Alerts and Events for a VMware SD-WAN Edge in a "Degraded" state which never reaches an "Offline/Down" state.
When a VMware SD-WAN Edge initially loses connectivity to the Orchestrator (on a heartbeat check), this state is called "Degraded". Should the Edge's loss of connectivity to the Orchestrator continue, the Edge is then marked as Offline/Down, and it is this second state in which an "Edge Down" Event should be posted on the Orchestrator's Monitor > Events page and a matching Alert sent out according to the customer's Alerts configuration. However, the Orchestrator is generating an Event and sending an Alert for an Edge in a Degraded state, resulting in a possibly large number of spurious Edge Down Events and Alert notifications for the customer.
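The intended behavior amounts to a small state machine in which only the transition into Offline/Down raises an "Edge Down" Event and Alert. The Python sketch below models it with hypothetical heartbeat thresholds; the notes do not state the Orchestrator's actual thresholds or how it tracks Edge state.

```python
from enum import Enum


class EdgeState(Enum):
    CONNECTED = "connected"
    DEGRADED = "degraded"
    DOWN = "down"


def classify(missed_heartbeats: int, degraded_after: int, down_after: int) -> EdgeState:
    # Thresholds are hypothetical; the notes do not state the real values.
    if missed_heartbeats >= down_after:
        return EdgeState.DOWN
    if missed_heartbeats >= degraded_after:
        return EdgeState.DEGRADED
    return EdgeState.CONNECTED


def should_raise_edge_down(previous: EdgeState, current: EdgeState) -> bool:
    # Only the transition into DOWN produces an "Edge Down" Event and Alert;
    # entering or leaving DEGRADED produces neither.
    return current is EdgeState.DOWN and previous is not EdgeState.DOWN


history = [0, 1, 2, 0, 1, 2, 3, 4]  # missed heartbeats at each check
states = [classify(n, degraded_after=1, down_after=3) for n in history]
alerts = sum(should_raise_edge_down(prev, cur)
             for prev, cur in zip([EdgeState.CONNECTED] + states, states))
print(alerts)  # 1
```

With the faulty behavior, each entry into Degraded in this history would also have produced an alert, giving three instead of one.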
Fixed Issue 65539: A BGP session established between two devices across two different branches does not come up when the customer has upgraded their VMware SD-WAN Edges to Release 4.2.x.
When a customer upgrades their Edges from a lower version to Release 4.2.x, the BGP sessions between 2 LAN devices of different branches established over VCMP tunnels will not come up.
Fixed Issue 65839: For flows initiated from the clients behind a VMware SD-WAN Hub Edge to the LAN behind a Spoke Edge, the return traffic from the spoke is routed via the Partner Gateway if the default route is advertised from the Partner Gateway.
The expected behavior is for a flow that originates from a Hub Edge to return by the Hub Edge as well. If there is no default route or Edge-to-Edge route advertised from the Hub Edge to the Spoke Edge, the route lookup on the Spoke Edge for the return traffic matches the Partner Gateway default route and the return traffic is routed to the Partner Gateway instead of the Hub Edge.
Fixed Issue 65929: When a user turns on High Availability on the VMware SD-WAN Orchestrator UI for a customer site using physical Edges, the Edge may go offline immediately and no traffic is passed by the Edge.
During startup, an HA Edge blocks traffic until the Edge's dataplane service starts. However, the dataplane needs some information from the management process to start, and the management process itself can block while trying to resolve a DNS name for the Orchestrator, because its DNS query is blocked by the startup traffic block described above. The management process therefore needs to resolve the Orchestrator address asynchronously so that it does not prevent the Edge's dataplane service from starting.
Fixed Issue 65985: For a customer using Dynamic Edge-to-Edge, a VMware SD-WAN Edge in their network may abruptly drop all tunnels and then be unable to build tunnels to any other sites in the network.
Once the site drops all its tunnels, the Edge's maximum tunnel value becomes corrupted and shows a negative value for the maximum number of tunnels. This corrupted value prevents the Edge from forming any new Dynamic Edge-to-Edge tunnels to other Edges. The impact is severe because the Edge cannot communicate with any other site in the network.
Without the fix the only way to clear this issue is to perform an Edge service restart or an HA failover for HA sites.
Fixed Issue 66119: A VMware SD-WAN Virtual Edge deployed in a remote region of the world may not activate when using cloud-init.
If network latency between the Edge and the VMware SD-WAN Orchestrator is greater than 1 second, the Virtual Edge fails auto-activation via cloud-init during initial deployment.
Fixed Issue 66325: Customer traffic that should be matching a BGP learned route is instead matching a business policy with the potential for disrupting customer traffic.
If a customer enterprise uses a business policy that configures a Source as the IP of a routed client and the destination as Internet, traffic that should match a BGP learned route instead uses the business policy, including whatever the traffic classification is for that policy (for example, Real Time), which can cause customer traffic disruptions.
Fixed Issue 66355: For a customer where the Stateful Firewall is enabled and at least one LAN side NAT (Many:1) rule is configured, inter-VLAN flows do not work.
With Many:1 LAN side NAT rules, the TCP state is not maintained properly for the inter-VLAN traffic and with Stateful Firewall also enabled, the packets will be dropped.
Fixed Issue 66366: For a customer using multicast with a large number of neighbors, a VMware SD-WAN Edge may experience a Dataplane Service failure and restart, causing a brief disruption in customer traffic.
"Large number of neighbors" is defined as ~1600 PIM neighbors. In the case where this issue happens, while traffic is running for a group from 1600 Spoke Edges to one receiver behind a Hub Edge, the PIM service fails and this in turn causes the Edge service to also fail, causing the restart.
Fixed Issue 66636: VMware SD-WAN Edge does not honor source interface configuration for RADIUS authentication traffic when the source is a loopback interface.
When a user configures RADIUS on a Profile or Edge and specifies a loopback interface as the desired source interface for outgoing authentication traffic, the Edge fails to create a NAT rule as expected due to a parse error stemming from an inconsistency in the expected versus actual type of the "port" parameter for the authentication service that is dispatched from the VMware SD-WAN Orchestrator. This value should be an integer, and the Orchestrator API validation logic has been modified accordingly.
Fixed Issue 66676: When a Business Policy NAT is configured, the return traffic from the VMware SD-WAN Gateway may not NAT back to the original source IP.
During NAT entry insertion, older entries are expected to be deleted. However, because not all keys were used for the hash table lookup, older entries were sometimes not deleted, which caused the NAT entry insertion error.
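Why the full key matters can be shown with a hypothetical Python NAT table that keeps a forward map and a reverse map for return traffic. When an entry for the same original flow is re-inserted, the stale reverse mapping is found and removed only because the lookup uses the complete five-tuple. The class, field names, and addresses are illustrative assumptions. The notes do not describe the Gateway's NAT data structures.

```python
from typing import Dict, NamedTuple, Optional


class FlowKey(NamedTuple):
    proto: int
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class NatTable:
    """Hypothetical NAT table: forward map (original -> translated) plus a
    reverse map (translated -> original) used to restore return traffic."""

    def __init__(self) -> None:
        self.forward: Dict[FlowKey, FlowKey] = {}
        self.reverse: Dict[FlowKey, FlowKey] = {}

    def insert(self, original: FlowKey, translated: FlowKey) -> None:
        # Find the previous entry with the full five-tuple key so that its
        # stale reverse mapping is removed before the new one is added.
        old = self.forward.pop(original, None)
        if old is not None:
            self.reverse.pop(old, None)
        self.forward[original] = translated
        self.reverse[translated] = original

    def restore(self, translated: FlowKey) -> Optional[FlowKey]:
        # Keyed by the translated tuple in its outbound orientation.
        return self.reverse.get(translated)
```

Re-inserting a flow with a new translated port leaves exactly one reverse entry, so return traffic maps back to the original source IP.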
Fixed Issue 66714: User is unable to use a hostname for DHCP Option 150 on a VMware SD-WAN Edge.
If a user configures a hostname for DHCP Option 150, an attempt by a DHCP client to obtain an IPv4 address from the Edge results in dnsmasq error messages in the Edge logs that refer to the hostname as a bad IP address, and the DHCP client obtains no IP address from the Edge's DHCP service. While RFC 5859 specifies IPv4 addresses rather than a hostname for this option, other current networking devices allow a hostname for Option 150, so customers who use hostnames on those devices need the same configuration to work on Edges without breaking the Edge's DHCP service.
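Per RFC 5859, option 150 on the wire is the option code, a length, and then four octets per TFTP server IPv4 address, so a hostname has to be resolved before it can be encoded. The Python sketch below shows one such approach. It is an assumption-laden illustration: the notes do not say where or how the fix resolves hostnames, and the resolve parameter exists only so that a fake resolver can be substituted.

```python
import ipaddress
import socket
from typing import Callable, Iterable

OPTION_TFTP_SERVER_ADDRESS = 150  # RFC 5859


def encode_option_150(servers: Iterable[str],
                      resolve: Callable[[str], str] = socket.gethostbyname) -> bytes:
    """Encode DHCPv4 option 150 as code, length, then 4 octets per server.

    Entries that are not literal IPv4 addresses are treated as hostnames and
    resolved to IPv4 first, since the option itself can only carry addresses.
    """
    packed = b""
    for server in servers:
        try:
            addr = ipaddress.IPv4Address(server)
        except ipaddress.AddressValueError:
            addr = ipaddress.IPv4Address(resolve(server))
        packed += addr.packed
    if not packed or len(packed) > 255:
        raise ValueError("option 150 needs 1 to 63 IPv4 addresses")
    return bytes([OPTION_TFTP_SERVER_ADDRESS, len(packed)]) + packed


print(encode_option_150(["192.0.2.10"]).hex())  # 9604c000020a
```

For example, encode_option_150(["tftp.example.com"], resolve=lambda host: "198.51.100.7") encodes the resolved address using a stand-in resolver, so the example does not depend on live DNS.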
Fixed Issue 66794: If a VMware SD-WAN Orchestrator is upgraded to Release 4.3.0, a user may be unable to configure Port Forwarding or 1:1 NAT rules for a VMware SD-WAN Edge.
For an Edge where the firewall is on but has no rules configured, on the Edge's Configure > Firewall page if the user configures a Port Forwarding rule or a 1:1 NAT rule and attempts to save that rule, the VMware SD-WAN Orchestrator will not save the rule and instead displays a 'networkSegments is not iterable' error on the page. This issue is caused by the Orchestrator using segmentID as the array index for array networkSegments.
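The difference between indexing an array by a segment ID and looking a segment up by its ID can be sketched in Python. The record shape, including the segmentId field name, is a hypothetical stand-in for the Orchestrator's networkSegments data, which these notes do not describe.

```python
from typing import Any, Dict, List, Optional


def find_segment(network_segments: List[Dict[str, Any]],
                 segment_id: int) -> Optional[Dict[str, Any]]:
    # Match on each record's own ID instead of using the ID as a list
    # position: segment IDs need not be contiguous or start at zero.
    return next((s for s in network_segments if s.get("segmentId") == segment_id), None)


segments = [{"segmentId": 0, "name": "Global Segment"}, {"segmentId": 3, "name": "Voice"}]
print(find_segment(segments, 3))  # {'segmentId': 3, 'name': 'Voice'}
# By contrast, segments[3] raises IndexError because only two records exist.
```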
Fixed Issue 66801: For a customer site using a High Availability topology and a VNF, the customer may not be able to connect to a VNF to perform trust establishment from a management server.
The issue is seen at HA sites when routed interfaces are DHCP enabled and there is no default route present in the kernel route table. In that case the kernel responds with "ICMP destination unreachable".
Without the fix, the workaround is to add a default route on the Standby Edge so that the Edge does not send "ICMP Unreachable" back to the VNF VM, which causes the SSH connection to reset.
Fixed Issue 66901: When a customer upgrades their VMware SD-WAN Edges to Release 3.2.2, the customer may observe that some of the Edges do not come up post-upgrade.
The issue happens very rarely, but when it does, the customer sees Orchestrator event messages (and log messages) indicating that the Edge's user process is ending and the Edge is shutting down. The Edge then powers down instead of coming back up, and the Orchestrator shows the Edge's status as "Down" with an "Edge down" event message.
Without the fix, the way to remediate this issue is to power the Edge back on and perform a manual factory reset. After the reset, the Edge functions as before and needs to be reactivated to that customer site.
Fixed Issue 67060: VMware SD-WAN Edge may show a large memory utilization which may potentially cause an Edge service restart if sufficiently high.
The issue is a memory leak that manifests as a slow and continuous increase in memory utilization. It occurs when multiple HTTP request packets are sent for a single flow; the leak happens specifically while the Edge is parsing those HTTP request packets.
Fixed Issue 67083: VMware SD-WAN Edge may experience a Dataplane Service failure and restart as a result, with a brief disruption of customer traffic.
In a few scenarios, VeloCloud Management Protocol (VCMP) data packets are processed with the wrong parameters (for example, a data packet is misclassified as a control packet), which triggers an exception and the service restart.
Fixed Issue 67173: When the same route is learned from multiple IBGP neighbors, the second best route selected from the BGP process is being used by the VMware SD-WAN Edge resulting in a black-holing of certain customer traffic.
Due to an issue in the Free Range Routing suite (FRR), IBGP was sending multiple next hops to the Edge, and the Edge was picking the second best (the last in next-hop order) to update the forwarding information base (FIB). The fix includes a command in the BGP process to send only the best next hop to the Edge.
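The difference between installing the last next hop received and installing the best one can be sketched as follows. The `NextHop` structure and the `preferred` flag are hypothetical stand-ins for the BGP decision result; the FRR command used by the fix is not named in these notes and is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NextHop:
    address: str
    preferred: bool  # True for the path the BGP decision process selected as best


def fib_next_hop_buggy(next_hops: List[NextHop]) -> NextHop:
    # Bug pattern: installs whichever next hop comes last in the list.
    return next_hops[-1]


def fib_next_hop_fixed(next_hops: List[NextHop]) -> NextHop:
    # Install only the best path selected by BGP.
    return next(nh for nh in next_hops if nh.preferred)
```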
Fixed Issue 67191: For a customer using a Cloud Security Service (CSS), the Layer 7 Health Check returns an erroneous failure and the CSS tunnels are torn down.
When there are a large number of Non SD-WAN Destination (NSD) tunnels on a VMware SD-WAN Gateway, the Virtual Tunnel Interface (VTI) IP can fall out of the given subnet mask /24 range, which is defined for the probes to be processed by the Gateway dataplane service. This is what causes an erroneous L7 Health Check failure. The fix updates the mask to /16 to accept the L7 for processing in the Gateway's dataplane service.
Without the fix, the only remediation is for an operator with access to the Gateway to manually edit the file /opt/vc/bin/gwd_ip_setup.sh to change the mask (169.254.0.0/24 to 169.254.0.0/16), followed by a Gateway service restart.
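The effect of widening the mask can be checked with Python's standard `ipaddress` module. The VTI address used below is a made-up example of an address assigned once many NSD tunnels exist; the sketch covers only the subnet arithmetic, not the contents of gwd_ip_setup.sh.

```python
import ipaddress

OLD_PROBE_NET = ipaddress.ip_network("169.254.0.0/24")  # mask before the fix
NEW_PROBE_NET = ipaddress.ip_network("169.254.0.0/16")  # mask after the fix


def accepted_for_probe(vti_ip: str, probe_net) -> bool:
    """Return True if an L7 Health Check probe for this VTI IP falls in the probe range."""
    return ipaddress.ip_address(vti_ip) in probe_net
```

A /24 covers only 169.254.0.0 through 169.254.0.255, so an address such as 169.254.3.17 falls outside it, while the /16 covers 169.254.0.0 through 169.254.255.255.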
Fixed Issue 67197: A customer network may experience periodic disruption of multicast service in deployments with more than 1500 sources associated with a multicast group.
The PIM stack's join-prune message handling logic fails with an exception when handling join-prune updates in deployments with more than 1500 sources associated with a multicast group.
Without the fix, the only way to prevent this issue is to limit the total number of multicast sources to 1000.
Fixed Issue 68785: DHCP INFORM packets are dropped by the VMware SD-WAN Edge software when received on an interface configured as a DHCP relay.
DHCP clients can request additional network information like DNS server or the Gateway address using the DHCP INFORM message once it has acquired an IP address. When the Edge is configured as a relay agent, these INFORM messages should be forwarded to the DHCP server but are getting dropped.
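A relay decides what to forward based on the DHCP Message Type (option 53, whose values are defined in RFC 2132). The Python sketch below, with hypothetical function and set names, shows a client-to-server filter that omits DHCPINFORM (type 8) next to one that includes it. It covers only that classification step, not the rest of the Edge's relay path.

```python
# DHCP message types carried in option 53 (RFC 2132)
DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, DHCPDECLINE = 1, 2, 3, 4
DHCPACK, DHCPNAK, DHCPRELEASE, DHCPINFORM = 5, 6, 7, 8

CLIENT_TO_SERVER_BUGGY = {DHCPDISCOVER, DHCPREQUEST, DHCPDECLINE, DHCPRELEASE}
CLIENT_TO_SERVER_FIXED = CLIENT_TO_SERVER_BUGGY | {DHCPINFORM}


def message_type(options: bytes) -> int:
    """Return the option 53 value from a DHCP options field (after the magic cookie)."""
    i = 0
    while i < len(options):
        code = options[i]
        if code == 0:  # Pad option: a single octet with no length field
            i += 1
            continue
        if code == 255:  # End option
            break
        length = options[i + 1]
        if code == 53 and length == 1:
            return options[i + 2]
        i += 2 + length
    raise ValueError("no DHCP Message Type option")


def should_relay(options: bytes, client_to_server: set) -> bool:
    return message_type(options) in client_to_server
```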
Fixed Issue 68829: For VMware SD-WAN Edge LTE models (in other words, the Edge 510-LTE or Edge 610-LTE), IPv6 paths are not formed over their LTE interfaces.
The udp6 packets are sent out with a 0 checksum, causing them to be dropped on the next hop. This resulted in the SD-WAN management paths being in an INIT state. The correct behavior is to populate checksum for udp6 packets.
Fixed Issue 68840: For a customer using a High Availability topology, SNMP polling is not able to retrieve LAN and WAN information from the VMware SD-WAN Standby Edge.
For an HA SNMP GET, the Standby LAN/WAN counts (vceHaStandbyLanItfNum and vceHaStandbyWanItfNum) are either displayed partially or not at all.
Fixed Issue 67201: For a site using a High Availability topology, the customer may observe multiple reboots of the VMware SD-WAN Standby Edge with a potential disruption to customer traffic.
When the Standby Edge is detected, the Active Edge synchronizes all path information to it. However, when there are a large number of path synchronization messages, the way the Edge processes them can lead either to a Dataplane Service failure on the Standby Edge or to a thread priority inversion that delays heartbeat processing, which can lead to an Active/Active state. In either case, the customer impact on a conventional HA topology would be minimal, since the Standby Edge is not passing customer traffic. However, on an Enhanced HA deployment, where the Standby Edge also passes traffic, the reboot(s) would disrupt some customer traffic. An HA Edge that includes this fix uses an improved code path for processing path information, which prevents the issue from occurring.
Fixed Issue 67694: A customer may experience a Cloud Security Service (CSS) tunnel failure because the L7 Health Check probes get conditionally backhauled.
L7 Health Check probes should never be conditionally backhauled; when they are, the probes fail and the CSS tunnels are erroneously marked as down.
Fixed Issue 67259: Multicast traffic flow is disrupted when the PIM process restarts multiple times and PIM neighbors do not come up.
On a scale setup with 1600 PIM neighbors, when the PIM process was restarted multiple times while traffic was running from 700 Spoke Edges to a receiver behind a Hub Edge, only 570+ of the 1600 PIM neighbors came up after one of the restarts. The only way to clear this issue is to restart the Edge service.
Fixed Issue 67745: A VMware SD-WAN Edge that has a WAN link to certain ISP-provided routers may experience customer traffic issues if that ISP router goes down and then comes back up.
When a WAN link from the Edge to some ISP routers (this issue was found with an ISP router used by Spectrum) goes down, or the ISP router goes down and then comes back up, the ISP router performs a diagnostic that includes briefly assigning the Edge a private IP in the subnet 192.168.100.0/24 and only then assigning the public IP address. However, the Edge installs the connected route for 192.168.100.0/24 and does not clear it after it gets the public IP address.
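The stale route can be pictured with a toy per-interface table of connected routes. This sketch assumes a single IPv4 address per interface, and the class and method names are hypothetical. The buggy path adds the new connected route but keeps the one from the temporary address. The fixed path replaces the route when the address changes.

```python
import ipaddress


class ConnectedRoutes:
    """Toy table of connected routes per interface (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.routes = {}  # interface name -> set of IPv4Network

    def set_address_buggy(self, ifname: str, iface: str) -> None:
        # Bug pattern: adds the new connected route but never removes the old one.
        self.routes.setdefault(ifname, set()).add(ipaddress.ip_interface(iface).network)

    def set_address_fixed(self, ifname: str, iface: str) -> None:
        # Replace the interface's connected route whenever its address changes.
        self.routes[ifname] = {ipaddress.ip_interface(iface).network}
```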
Fixed Issue 67790: For a customer enterprise which uses either BGP or OSPF and has configured an inbound filter(s) to ignore certain routes, when Dynamic Cost Calculation (DCC) is enabled on this enterprise, the inbound filter(s) will no longer be in effect and traffic will attempt to use those routes.
Prior to DCC being enabled, the forwarding information base (FIB) will not include the routes that were set to IGNORE on the BGP/OSPF inbound filter. After DCC is enabled the FIB now includes these routes and traffic will attempt to use these routes with the potential for significant traffic disruption for the customer enterprise.
Without the fix, the only workaround is to restart OSPF/BGP for the inbound filter to be properly applied.
Fixed Issue 67869: When a VMware SD-WAN Hub Edge was previously configured as single stack IPv4 and later changed to dual stack IPv6 preferred, the older IPv4 tunnel does not get disconnected.
When this issue manifests, the customer observes an incorrect tunnel count because the IPv4 tunnels are not torn down; in effect, there are twice as many tunnels as there should be.
Fixed Issue 67889: SNMPv3 polling may not work properly for a customer polling multiple VMware SD-WAN Edges.
The issue is that VMware SD-WAN allows snmpd to generate a random engine ID for each Edge, and in many cases the same engine ID is generated for multiple Edges. When that happens, one of the Edges sharing the engine ID does not report statistics back to the collector.
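One way to avoid duplicate engine IDs is to derive them deterministically from a per-device value. These notes do not say how the fix derives the engine ID, so the sketch below is only an assumption. It uses the RFC 3411 snmpEngineID layout: the enterprise number with the high bit set, a format octet (3 = MAC address), and then the MAC. The enterprise number 45346 is taken from the OID prefix quoted under Issue 69497, and the function name is hypothetical.

```python
ENTERPRISE_NUMBER = 45346  # from the OID prefix 1.3.6.1.4.1.45346 in the VMware SD-WAN MIBs
FORMAT_MAC = 3             # RFC 3411 snmpEngineID format octet: MAC address


def engine_id_from_mac(mac: str) -> bytes:
    """Build an RFC 3411 style engine ID from a device MAC address."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 48-bit MAC address")
    prefix = (0x80000000 | ENTERPRISE_NUMBER).to_bytes(4, "big")  # high bit marks the new format
    return prefix + bytes([FORMAT_MAC]) + mac_bytes
```

Because every Edge has a distinct MAC address, two Edges built this way cannot share an engine ID.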
Fixed Issue 67947: With LAN side NAT configured, inter-VLAN traffic does not work after a route version update.
The LAN side NAT rule is skipped for inter-VLAN traffic, but when the route version is updated, the route lookup uses the wrong IP, which can cause traffic failure.
Fixed Issue 68994: Customers who deploy a Non SD-WAN Destination (NSD) tunnel from a VMware SD-WAN Edge with a VMware SD-WAN Gateway may observe the tunnel flapping.
This issue is observed at tunnel establishment or at IKE rekey. Either the Edge or the Gateway deletes the security associations (SAs) based on IKESAID=0, which causes tunnel flapping. The tunnel automatically stabilizes, but the time needed to do so is not consistent, which can further impact customer traffic to the NSD.
Fixed Issue 69194: If a user moves a USB modem from one USB port to a different port on a VMware SD-WAN Edge, the Edge may experience a Dataplane Service failure and restart as a result.
USB ports are incorrectly being bound to the DPDK AF_PACKET drivers. This driver does not support port removal and could cause the Edge's dataplane service to fail when the USB dongle is moved from one port to another.
Fixed Issue 69497: The VMware SD-WAN MIBs show the vceLinkVpnState SNMP object even though it is no longer a valid object.
VMware SD-WAN no longer shows a differentiated VPN state on the VMware SD-WAN Orchestrator but still exposes this in SNMP. To be specific, the SNMP Collector polls for SNMP OID 1.3.6.1.4.1.45346.1.1.2.3.2.2.1.26, which it should no longer do.
Fixed Issue 69681: If a VMware SD-WAN Edge is configured with Hot Standby WAN links and also uses SNMP polling, the user will observe SNMP errors.
Error messages are similar to the following:
ERROR [oids (10028:MainThread:10028)] [VCE.Path]<update>: Path failed update buffer: KeyError('HOTSTANDBY_IDLE',)
INFO [oids (10028:MainThread:10028)] [VCE]<update_if_stale>: Current MIB buffer size: 217
DEBUG [oids (10028:MainThread:10028)] [VCE.Link]<ip2octet>: Failed to convert IP to Octet for caller <class 'vcsnmp.oids.Link'>[publicIpAddress] on []: ValueError("invalid literal for int() with base 10: ''",), used default value[00 00 00 00] instead
The cause of the issue is that SNMP path states do not include Hot Standby link states, and this causes SNMP issues, including these error messages.
Fixed Issue 69704: Enabling High Availability on a site using a VMware SD-WAN Edge 6x0 platform (610, 620, 640, 680) may break the Edge's communication with the VMware SD-WAN Orchestrator.
After enabling HA, due to certain timing conditions related to how long 6x0 interfaces take to come up, the Orchestrator communication breaks. This results in HA not coming up and the Edge loses complete connectivity to the Orchestrator, meaning the Orchestrator will mark the Edge as down and no further configuration changes can be made.
Fixed Issue 70041: A partner upgrading their VMware SD-WAN Gateways to Release 4.3.0 would observe that they are no longer able to ping the Gateway's VRF IP address.
When the ping fails, the route_drop counter increments. Pinging the VRF IP address from a handoff interface was disabled in Release 4.3.0 and is restored with the fix in Release 4.3.1.
Fixed Issue 70154: For a customer enterprise where the Stateful Firewall is enabled, the user will observe packet drops when sending bidirectional pings between branch clients with the same ICMP ID.
If a ping is initiated from Client A in Branch 1 to Client B in Branch 2 and vice versa, the ICMP states for both the pings will be tracked with the same flow object if the ICMP ID is the same and this can lead to multiple packet drops because of the sequence number check.
Without this fix, the workaround is to either disable the Stateful Firewall or to generate ping with different ICMP IDs.
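The collision can be illustrated by comparing two ways of keying ICMP echo state. The record type and both key functions below are hypothetical and do not describe the Edge's actual flow key. A direction-agnostic key built from the sorted endpoints and the ICMP ID maps the A-to-B and B-to-A pings to one flow. A key that keeps track of the initiator separates them and still matches each ping with its own reply.

```python
from typing import NamedTuple


class IcmpEcho(NamedTuple):
    src: str
    dst: str
    icmp_id: int
    is_request: bool  # True for Echo Request, False for Echo Reply


def flow_key_shared(pkt: IcmpEcho) -> tuple:
    # Direction-agnostic key: sorted endpoints plus the ICMP ID.
    lo, hi = sorted((pkt.src, pkt.dst))
    return (lo, hi, pkt.icmp_id)


def flow_key_per_initiator(pkt: IcmpEcho) -> tuple:
    # Key on who sent the Echo Request, so pings in opposite directions stay separate.
    initiator, responder = (pkt.src, pkt.dst) if pkt.is_request else (pkt.dst, pkt.src)
    return (initiator, responder, pkt.icmp_id)
```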
Fixed Issue 70310: For a customer using multiple segments, when one or more segments are deleted or disabled, a VMware SD-WAN Edge may suffer a Dataplane Service failure and restart that service, causing a brief interruption of customer traffic.
When a segment is deleted, the Edge does not fully clean up the memory associated with this deleted segment. There are scenarios where the Active Edge synchronizes events to the Standby Edge by referencing such segments which results in a service failure on the Standby Edge as these segments are not present.
Fixed Issue 70349: In rare instances NAT'd traffic stops working on a VMware SD-WAN Gateway. Restarting the Gateway's service will clear the condition.
Due to a race condition on the Gateway's dataplane process, the Gateway may fail to establish a connection with natd (the daemon in the Gateway that manages NAT allocations) and so the Gateway will not be able to allocate NAT entries. This will cause all flows NAT'd via the Gateway to fail.
Fixed Issue 70416: VMware SD-WAN Gateway may show a high CPU load that results in latency and packet loss for Edges using it as their Primary Gateway.
This issue is caused by the Gateway's fast path threads (IKE, VCMP Data, etc.) spending between 15-20% of their cycles doing InetNtop operations. The fix for this issue removes InetNtop operations and replaces them with a more efficient data formatting process.
Fixed Issue 70438: Customers that have traffic which relies on NAT may experience disruptions in this traffic when a VMware SD-WAN Gateway is upgraded to Release 4.3.0, or restarts while running 4.3.0.
On Gateway startup there may be a race condition that deletes the NAT entry from "gwd_cloud_read" while the NAT entry could already be cached in the flow (fc->nat_tup) in the SEND_TO_WAN direction. Even though the NAT entry is deleted, the flow will still use the cached nat_tup for applying NAT. Meanwhile another flow can get the same source port allocated with a new NAT entry.
Fixed Issue 70590: An attempt to generate a diagnostic bundle on a VMware SD-WAN Gateway using Release 4.3.0 may fail.
Diagnostic bundles generated on a Gateway running Release 4.3.0 fail due to the diagnostic bundle exceeding the size limit configured on the Orchestrator. The excessive size of the diagnostic bundle is caused by audit logs that grow large over time.
Without this fix, the only way to successfully generate a diagnostic bundle on a Gateway is for an Operator User to log into the Gateway and, prior to triggering the diagnostic bundle on the Orchestrator, the Operator User needs to delete the audit log files that are under the Gateway's /var/log/audit directory.
Fixed Issue 70789: Customer may experience random drops in traffic due to IPsec Anti-Replay detection.
If either a VMware SD-WAN Edge or VMware SD-WAN Gateway receives two packets which each update the cache entry sequence number, then it is possible that the first packet will update the replay window incorrectly, which may trigger IPsec Anti-Replay detection which would cause the IPsec packet to drop.
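For context, anti-replay is commonly implemented as a sliding window over sequence numbers, as described in RFC 4303. The sketch below is a minimal single-threaded version with a 64-packet window. The class name is hypothetical, and these notes do not say exactly how the Edge and Gateway cache sequence numbers. In this sketch, the check and the window update happen in a single call. That is the property that breaks when two packets update shared sequence state separately, because the window can then advance incorrectly.

```python
class ReplayWindow:
    """Sliding anti-replay window in the style of RFC 4303 (64 packets wide)."""

    def __init__(self, size: int = 64) -> None:
        self.size = size
        self.highest = 0  # highest sequence number accepted so far
        self.bitmap = 0   # bit i set => sequence (highest - i) already seen

    def check_and_update(self, seq: int) -> bool:
        """Return True if seq is accepted, False if it is a replay or too old."""
        if seq == 0:
            return False
        if seq > self.highest:
            # New highest sequence: slide the window forward and mark seq as seen.
            shift = seq - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size or self.bitmap & (1 << offset):
            return False  # outside the window, or already received
        self.bitmap |= 1 << offset
        return True
```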
Fixed Issue 70855: A VMware SD-WAN Gateway may drop traffic originating from the VMware SD-WAN Orchestrator and as a result may cause any Gateway configuration update from the Orchestrator to fail.
When the Gateway load is high (for example, ~1.6 million flows on the Gateway with a NAT object count of ~800K), the number of packet buffers in the system will be depleted and this can sometimes cause the Orchestrator traffic to be dropped on the Gateway. Once the Gateway enters this state, the Orchestrator traffic is always dropped even if packet buffers become available.
Without the fix, the only remediation of this issue is to restart the Gateway.
Fixed Issue 70954: VMware SD-WAN Edge may experience multiple Dataplane Service Failures if the Edge has a Business Policy configured with a mandatory link to a Zscaler Cloud Security Service (CSS) and the interface for that mandatory link fails.
The Edge should be dropping the traffic to Zscaler when the mandatory interface drops, versus suffering a service failure.
Fixed Issue 71052: When the number of customer enterprises connecting to a VMware SD-WAN Gateway is greater than 285, the Gateway experiences a Dataplane Service failure.
Beginning with Gateway Release 4.3.0, the Gateway's ability to monitor customers was enhanced by adding new counters to track the packet and flow related information at the customer enterprise level. The issue is that the number of counters initialized for the customer enterprises exhausts after 285 customer enterprises, and the counter initialization for any further new customer will fail, causing the Gateway's Dataplane Service to fail and forcing a service restart.
Fixed Issue 71513: A user looking at the Gateways > Monitor tab on a VMware SD-WAN Orchestrator UI would observe that the Handoff Queue Drops always show a value of 0 for a VMware SD-WAN Gateway using Release 4.3.0.
Gateways running Release 4.3.0 or above do not report handoff queue drops to the Orchestrator due to incorrect formatting and this blocks the Operator from getting a clear picture of this particular troubleshooting data point.
Fixed Issue 71977: When a user enables VMware Edge Network Intelligence on a VMware SD-WAN Edge, the Edge experiences a Dataplane Service failure and generates a core.
This issue occurs if the number of sessions created dynamically in the Edge exceeds the maximum allowed limit.
Fixed Issue 72100: For a site using an Enhanced High Availability topology with a pair of VMware SD-WAN Edge 610s, tunnels do not establish via the WAN link on the Standby Edge 610.
The issue is seen only in case of Edge Model 610 where tunnels are not being established via the Standby Edge. Prior to enabling HA, if the VLAN on GE1 interface does not have an IP address then the SD-WAN service does not map the hardware switch port to GE1 once HA is enabled. As a result the packets are dropped on the Standby Edge.
Without the fix, the workaround for this issue is to add an IP address to the VLAN so that the interface GE1 is mapped to a port on the hardware switch.
Fixed Issue 72567: If a user deliberately configures asymmetric routing via an underlay (MPLS with no WAN-overlay) in the forward path through a fixed_link Business Policy and then via an overlay (for example, a Hub Edge) in the reverse path, the flow will break and traffic for that flow will not be successful.
This is a case of asymmetric routing of cloud flows by design, in which the Edge has a business policy to force flows out of the underlay (INTF_ROUTED). The remote Edge routes the response back via a Hub Edge (overlay), sees the flow as a new, locally initiated flow, and sends a QoS synchronization to update the flow routing parameters, which breaks the flow routing. With the fix, the QoS synchronization is rejected to prevent the configured link_mode from being overwritten.
Fixed Issue 72688: VMware SD-WAN Edge randomly restarts its Dataplane Service with resulting service interruptions due to the restarts.
Packets pinned to one decryption thread are rejected by the non-owning thread when they float to another decryption thread. While these packets were being rejected, the associated QAT crypto reference was incorrectly released, leading to exceptions in the Dataplane Service and a service failure and restart.
Fixed Issue 72703: A VMware SD-WAN Edge routed interface configured with a sub-interface allows flows whose routes belong to the sub-interfaces of the interface.
A flow matching a route belonging to a sub-interface is allowed on its parent interface even though the parent's reverse path forwarding (RPF) mode is set to 'specific'. The root cause is that the source_route lookup ignores the VLAN in the match set.
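A 'specific' RPF check asks whether the route back to a packet's source points at the interface the packet arrived on. The Python sketch below uses a hypothetical route table and interface names to show why the VLAN must be part of that comparison. If only the interface name is compared, a source reachable via the VLAN 20 sub-interface passes the check when the packet arrives untagged on the parent interface.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    prefix: ipaddress.IPv4Network
    interface: str
    vlan: Optional[int]  # None for the parent (untagged) interface


def source_route(routes: List[Route], src_ip: str) -> Optional[Route]:
    """Longest-prefix match for the packet's source address."""
    addr = ipaddress.ip_address(src_ip)
    matches = [r for r in routes if addr in r.prefix]
    return max(matches, key=lambda r: r.prefix.prefixlen, default=None)


def rpf_specific_buggy(routes, src_ip, in_if, in_vlan) -> bool:
    r = source_route(routes, src_ip)
    # Bug pattern: compares only the interface name and ignores the VLAN.
    return r is not None and r.interface == in_if


def rpf_specific_fixed(routes, src_ip, in_if, in_vlan) -> bool:
    r = source_route(routes, src_ip)
    # The source route must point back at the same interface and VLAN.
    return r is not None and (r.interface, r.vlan) == (in_if, in_vlan)
```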
Fixed Issue 72718: On a customer site using a Non SD-WAN Destination (NSD) via Edge, where Internet Backhaul is configured to route traffic through the NSD via Edge, the traffic is failing for destinations using the Backhaul rule.
Backhauling to an NSD via Edge is failing because of an issue stemming from not overwriting the Destination ID. Earlier traffic that was not being backhauled to the NSD matches a default cloud route whose destination is the Gateway. The SD-WAN service fails to overwrite the older cloud route destination with the newer NSD logical ID and thus the traffic fails when matching the backhaul rule.
Fixed Issue 72939: A user may observe a VMware SD-WAN Gateway's stale flow count increasing steadily until it reaches the maximum number of flows supported by that Gateway's specifications, at which point the Gateway can no longer create new flows, disrupting users connected to it.
The increasing stale flow count is tied to sites sending traffic from a Non SD-WAN Destination to a Partner Gateway handoff. Two functions that were added, "gw_nvs_to_pg_pkt" and "gw_send_nvs_pkt", do not release flow counts after the lookup.
Without the fix, the only remediation is to restart the Gateway service.
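The general pattern of a lookup that takes a reference and must release it can be sketched in Python. The flow table below is a hypothetical toy, not the Gateway's implementation. A lookup whose reference is never released keeps an expired flow alive, so it is counted as stale. Wrapping the lookup in a context manager guarantees the release.

```python
from contextlib import contextmanager


class FlowTable:
    """Toy flow table whose lookups take a reference (illustrative only)."""

    def __init__(self) -> None:
        self.refcount = {}    # flow key -> outstanding references
        self.expired = set()  # flows removed from the table but not yet freed

    def add(self, key) -> None:
        self.refcount[key] = 1  # reference held by the table itself

    def lookup(self, key):
        self.refcount[key] += 1  # the caller now holds a reference
        return key

    def release(self, key) -> None:
        self.refcount[key] -= 1
        if self.refcount[key] == 0:
            del self.refcount[key]  # last reference gone: the flow is freed
            self.expired.discard(key)

    def expire(self, key) -> None:
        self.expired.add(key)
        self.release(key)  # drop the table's own reference

    def stale_count(self) -> int:
        return len(self.expired)


@contextmanager
def flow_ref(table: FlowTable, key):
    flow = table.lookup(key)
    try:
        yield flow
    finally:
        table.release(key)  # always drop the lookup reference
```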
Fixed Issue 73251: Users who need to authenticate via RADIUS may find they are unable to authenticate because fragmented traffic is failing to be sent from the VMware SD-WAN Edge.
RADIUS traffic is always fragmented, and this issue especially impacts users authenticating over a wireless link. When this issue occurs, the fragmented packet count exceeds what DPDK can handle on the affected interface. The fix proactively resets the fragmentation count to avoid traffic disruption.
Fixed Issue 73320: When a VMware SD-WAN Edge interface is configured with a DHCP address type and when the IP address assigned to the interface is updated, some of the direct flows can fail due to NAT failure.
When the DHCP lease expires, the IP address is removed for the interface. Before a new IP is assigned to the interface, NAT entries for any new flows during this transient time will be created with "0.0.0.0" as the source NAT IP and the packets will be dropped. Once a valid IP address is assigned to the interface, the existing NAT entry is not deleted and a new NAT entry is not created with a valid IP causing the traffic to be dropped.
Fixed Issue 73987: A VMware SD-WAN Edge may experience a Dataplane Service Failure when attempting to generate a diagnostic bundle.
The issue is triggered by the part of diagnostic bundle collection in which the Edge gathers the output of "vcdgbdump -r remote_routes". When there are a large number of remote routes, traffic is running over those routes, and a diagnostic bundle collection is triggered, lock contention can occur.
Fixed Issue 74313: For a customer site using an LTE model of a VMware SD-WAN Edge (in other words, the Edge 510-LTE or Edge 610-LTE), the Edge loses communication to the VMware SD-WAN Orchestrator post-reboot of the Edge if the Edge only has an LTE WAN link.
The connectivity between the Orchestrator and the Edge is lost, as the default route reachability via LTE interface becomes false after an Edge reboot. As a result the Edge will be marked as down by the Orchestrator even though customer traffic is still passing on the LTE WAN link.
Without the fix, the only way to recover Edge reachability is to perform an Edge service restart.
Fixed Issue 74482: For a customer using a Hub-Spoke network topology, the route from a VMware SD-WAN Spoke Edge is not learned on a Hub Edge if that Hub Edge is also a Spoke for a different Hub Edge.
For this issue to hit, the customer would also have Edge to Edge via Hub configured. When customer deployments have an Edge that is deployed as a Hub for certain Spokes and as a Spoke for certain Hubs, such Edges do not receive the Spoke routes from the VMware SD-WAN Gateway. This issue has a major impact on a network with this topology because of the missing routes.
Without this fix, the only way to prevent this issue is to disable Edge to Edge via Hub on the Spoke Edges.
Fixed Issue 75309: A user may observe multicast traffic drop for a specific group on a VMware SD-WAN Spoke Edge.
Multicast traffic for a specific group is not received at the Spoke Edge when traffic is sourced from a Hub Edge. IGMP groups are not populated in the IGMP group table of the Spoke Edge which leads to traffic loss. This can be observed in the Spoke Edge's diagnostic bundle logs for igmp.
Fixed Issue 75968: When a user configures a local IP address hand-off for a VMware SD-WAN Gateway, and then removes that handoff, the handoff is not removed from the Gateway which causes the tunnel path to go down.
This issue can cause major disruptions since the Gateway cannot build a new tunnel.
Without the fix, the user would need to restart the Gateway to clear the issue.
Fixed Issue 76726: A VMware SD-WAN Gateway may experience a Dataplane Service Failure and generate a core.
The Gateway code that handles management protocol control packet exchanges between an Edge and the Gateway contained asserts; when one of these asserts was hit, the Gateway failed and generated a core, causing an outage for multiple customers. This issue is fixed by removing the asserts and adding proper error handling.
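The general technique of replacing asserts with error handling is shown below. The VCMP control packet format is not described in these notes, so the 4-byte header layout and all names here are invented for illustration. Instead of aborting the whole process on one malformed packet, the parser raises a specific error, and the caller counts and drops just that packet.

```python
import struct


class CtrlPacketError(ValueError):
    """Raised for a malformed control packet instead of aborting the process."""


HEADER = struct.Struct("!BBH")  # hypothetical header: version, type, payload length


def parse_ctrl_packet(data: bytes):
    # An assert such as `assert len(data) >= HEADER.size` would bring down the
    # whole process on one bad packet; here only that packet is rejected.
    if len(data) < HEADER.size:
        raise CtrlPacketError("truncated header")
    version, msg_type, length = HEADER.unpack_from(data)
    if len(data) - HEADER.size < length:
        raise CtrlPacketError("payload shorter than advertised length")
    return version, msg_type, data[HEADER.size:HEADER.size + length]


def handle(data: bytes, drops: dict) -> bool:
    """Process one packet; count and drop malformed ones instead of crashing."""
    try:
        parse_ctrl_packet(data)
    except CtrlPacketError as err:
        drops[str(err)] = drops.get(str(err), 0) + 1
        return False
    return True
```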
Orchestrator version R431-20220715-GA was released on 07-22-2022 and is the 3rd Orchestrator rollup for Release 4.3.1.
This Orchestrator rollup build addresses the below critical issues since the 2nd Orchestrator rollup, version R431-20220429-GA.
Fixed Issue 88796: When deploying either a VMware SASE Orchestrator or a VMware SD-WAN Gateway and using an OVA on vSphere, the OVF properties set as part of the deployment (password, network information, etc.) are not applied to the image and the system cannot be accessed after deployment.
This only affects a new system deployed from an OVA using OVF/vApp properties (versus using ISO files). This issue is caused by upstream changes to cloud-init in recent updates.
On an Orchestrator that did not include the fix, the Operator would need to deploy the system using a cloud-init user-data ISO file.
Note: This entry tracks the Orchestrator OVA only. The fix for the Gateway OVA is tracked with the same ticket #88796 but is listed under the Edge/Gateway Resolved Issues section for Gateway build R431-20220518-GA.
Fixed Issue 76016: When a Partner Gateway hand-off is set after upgrading a VMware SD-WAN Orchestrator to Release 4.3.0 or later, any subcomponent inside the hand-off, like BGP/BFD/static routes are not allowed to be deleted from the Orchestrator UI or API by a Partner administrator.
This issue occurred due to a regression in the logic for handling subcomponent configuration for deletion on the Orchestrator's backend. Previously, a fix was added for mixed Gateway Pools (Cloud + Partner Gateways) so that if a Partner administrator modified a Partner Gateway, it would not affect the Cloud Gateway hand-off configuration. However, this fix did not account for cases involving the deletion of subcomponents.
On an Orchestrator that does not include this fix there are two workarounds:
The Partner could request an Operator user (for example, VMware Support) to perform the deletion without any problems as this issue only affects Partner users.
The Partner user could remove the entire hand-off for the Partner Gateway and recreate it with the revised configuration.
Orchestrator version R431-20220429-GA was released on 05-05-2022 and is the 2nd Orchestrator rollup for Release 4.3.1.
This Orchestrator rollup build addresses the below critical issues since the 1st Orchestrator rollup, version R431-20220222-GA.
Fixed Issue 84969: When a VMware SD-WAN Edge running a 4.2.x Release which is also configured with an overridden non-default Management IP is upgraded to Release 4.3.x or higher on a VMware SD-WAN Orchestrator running 4.3.x or higher, the Edge may lose the configured overridden Management IP.
An Orchestrator running 4.3.x or higher is not automatically creating the loopback interface while also retaining the overridden non-default Management IP for an Edge, when that Edge is upgraded from 4.2.x to a 4.3.x or later build.
Fixed Issue 84152: When a customer generates a Top Talkers report for their enterprise, the Top Talker names may be listed as 'Unknown'.
"Top Talkers" are the top sources from all the flows in a given time range. The Top Talker name may not show if the client device is not present for the (Source IP + MAC Address) unique pair. This happens because the client devices are saved based on which Visibility Mode (IP Address or MAC Address) is configured for the VMware SD-WAN Edge. For example, an Orchestrator may save a device for (IP Address 1, MAC Address 1) and then the (IP Address 2, MAC Address 2) record is not be saved if Visibility Mode is set to IP Address. This would lead to the Top Talker corresponding to IP Address 2/MAC Address1 being marked as 'Unknown'.
Orchestrator version R431-20220222-GA was released on 02-23-2022 and is the 1st Orchestrator rollup for Release 4.3.1.
This Orchestrator rollup build remediates Apache Log4j vulnerabilities CVE-2021-44228 (which was first addressed in Orchestrator build R431-20211217-GA with Log4j version 2.16.0) and CVE-2021-45046, by updating to Log4j version 2.17.0. For updated information on the Apache Log4j vulnerabilities and their impact on VMware products, please consult the VMware Security Advisory VMSA-2021-0028.9
In addition, this Orchestrator rollup build also addresses the following critical issue since the original Orchestrator GA build, version R431-20211217-GA.
Fixed Issue 81498: When a VMware SD-WAN Edge running a 4.2.x Release which is also configured with an overridden non-default Management IP is upgraded to Release 4.3.x or higher on a VMware SD-WAN Orchestrator running 4.3.x or higher, the Edge may lose the configured overridden Management IP.
The Orchestrator running 4.3.x or higher is not automatically creating the loopback interface while also retaining the overridden non-default Management IP for an Edge when the Edge is upgraded from 4.2.x to a 4.3.x or later build.
Fixed Issue 80613: On a VMware SD-WAN Orchestrator configured for Disaster Recovery (DR), replication may fail between the Active and Standby Orchestrators with the user observing the Copying DB status as 'Failed' on the Orchestrator UI.
Since MySQL version 8.0.26, the master-data option of the mysqldump command has been deprecated and replaced with the source-data option. This issue is encountered because the Orchestrator DR process uses the mysqldump command with the master-data option; with the upgrade of MySQL to the latest version, this option no longer works, which breaks DR. To address this problem, an Orchestrator with this fix uses the source-data option instead of the master-data option for the mysqldump command during the DR process.
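A version-aware choice between the two option names can be sketched as follows. The Orchestrator's actual DR command line is not given in these notes, so the function name, the database name, and the extra `--single-transaction` flag are assumptions. The value `=2` asks mysqldump to write the replication coordinates as a comment in the dump.

```python
from typing import List, Tuple


def mysqldump_args(mysqldump_version: Tuple[int, int, int], database: str) -> List[str]:
    """Build mysqldump arguments for a replication snapshot (illustrative sketch).

    --source-data replaced --master-data in MySQL 8.0.26; the old name is deprecated.
    """
    if mysqldump_version >= (8, 0, 26):
        repl_flag = "--source-data=2"
    else:
        repl_flag = "--master-data=2"
    return ["mysqldump", "--single-transaction", repl_flag, database]
```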
Fixed Issue 76036: On a VMware SASE Orchestrator, attempting to open a Partner's 'Partner Overview' page, and/or the 'Configure > Customer' page for one of that Partner's customers, fails: the page does not load and displays the message "An unexpected error has occurred".
The Partner Overview page, and/or the Configure > Customer page for a customer supported by that Partner, may fail to load because the `enterpriseProxy/getEnterpriseProxyGatewayPools` API call times out. This is triggered when the pages include a large number of Gateway pools and Gateways, which can cause the `enterpriseProxy/getEnterpriseProxyGatewayPools` API call used by each page to time out, so the page does not load.
Orchestrator version R431-20211217-GA was released on 12-20-2021. This Orchestrator build remediates CVE-2021-44228, the Apache Log4j vulnerability, by updating to Log4j version 2.16.0. For more information on the Apache Log4j vulnerability, please consult the VMware Security Advisory VMSA-2021-0028.5.
The following issues have been resolved since Orchestrator version R430-20211112-GA.
Fixed Issue 77116: The VMware SD-WAN Orchestrator retains logs for less than 12 months.
Before this fix, the Orchestrator retained logs for only 6 months, but some customers need at least 12 months of logging history.
Fixed Issue 75879: In some instances, the VMware SD-WAN Orchestrator does not reply to a VMware SD-WAN Edge's routing events and this results in an increased load on the Orchestrator.
When an Orchestrator does not reply to an Edge's routing events, the Edge continuously retries sending the routes, which increases the load on the Orchestrator. This is the permanent fix for this issue.
Fixed Issue 75656: In some instances, after processing a VMware SD-WAN Edge's routing events, the VMware SD-WAN Orchestrator replies with an empty ack (which is as bad as no reply), and this results in increased load on the Orchestrator.
When an Orchestrator sends an empty ack in response to an Edge's routing events, the Edge continuously retries sending the routes, which increases the load on the Orchestrator.
Fixed Issue 74450: VMware SD-WAN Orchestrator does not export watermark counters to an external application like Telegraf or Prometheus.
This issue is linked to Issue 74446, whose fix creates watermark counters for Gateway Handoff Queue Drops. This fix ensures those metrics can be collected by an application such as Telegraf or Prometheus.
Fixed Issue 74446: Gateway Handoff Queue Drop counters are not serving the purpose of identifying traffic spikes when observed on the VMware SD-WAN Orchestrator UI.
Gateway Handoff Queue Drop is not sufficiently granular to identify traffic spikes. The fix for this adds new watermark counters: wmark_1min and wmark_5mins which will give the maximum depth of a handoff queue within 1 minute and 5 minutes respectively.
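The meaning of the new counters (the maximum handoff queue depth over the last 1 and 5 minutes) can be illustrated with a sliding-window maximum. The Python sketch below is hypothetical: the class name, the sampling model (timestamped depth samples fed in order), and the window arithmetic are assumptions for illustration and are not the Gateway's implementation; only the counter names wmark_1min and wmark_5mins come from this note.

```python
from collections import deque


class QueueWatermark:
    """Maximum handoff-queue depth observed within a trailing time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # (timestamp, depth) pairs kept in strictly decreasing depth order,
        # so the front entry is always the current maximum.
        self._samples = deque()

    def record(self, timestamp: float, depth: int) -> None:
        """Add a queue-depth sample; timestamps must be non-decreasing."""
        # Older samples that are not deeper than the new one can never be the max again.
        while self._samples and self._samples[-1][1] <= depth:
            self._samples.pop()
        self._samples.append((timestamp, depth))

    def value(self, now: float) -> int:
        """Return the watermark at time `now` (0 if no sample is in the window).

        `now` must be non-decreasing across calls, since expired samples are discarded.
        """
        while self._samples and self._samples[0][0] <= now - self.window_seconds:
            self._samples.popleft()
        return self._samples[0][1] if self._samples else 0


# Hypothetical instances mirroring the two counters named in the fix.
wmark_1min = QueueWatermark(60)
wmark_5mins = QueueWatermark(300)
```

A watermark captures a short spike that an averaged or cumulative drop counter would smooth over, which is why it helps identify traffic spikes.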
Fixed Issue 72841: VMware SD-WAN Orchestrator is inconsistent in generating "Gateway up" events. When a Gateway's state changes from OFFLINE to CONNECTED, the "Gateway up" event is not always logged.
The issue occurs inconsistently, but when it does happen, two or more Gateways have updateStateChange triggered at the same time. The more Gateways that experience a state change at the same moment, the more likely the issue is to occur.
Fixed Issue 71399: On a VMware SD-WAN Orchestrator deployed in a Disaster Recovery (DR) configuration, the Operator User may observe that the Standby Orchestrator has failed to synchronize with the Active Orchestrator.
On the Orchestrator UI under the Replication page, a user would observe all Sync activities as failed under the Activity Monitor. The DR synchronization failure happens on initial handshake where the Active Orchestrator fails to copy the configuration database to the Standby Orchestrator.
Fixed Issue 70018: A VMware SD-WAN Orchestrator running Release 4.3.0 or higher may not be able to form a Disaster Recovery pair.
The root cause prevents the Orchestrator from determining the amount of free disk space needed for DR, and DR pairing may fail as a result.
Fixed Issue 69534: For a customer using VMware Edge Network Intelligence, when a user clicks anywhere on the main Monitoring page of the customer in the Orchestrator's New UI, the links for 'Application Analytics' and 'Branch Analytics' disappear.
If the user reloads the page, those links will reappear until the user clicks anywhere on the same page and then they are gone once again. Because of this the user will not be able to view "Application Analytics" and "Branch Analytics" monitoring data unless they know to reload the page.
Fixed Issue 69514: Report generation may fail with error "Failed to update blob data for PDF".
When the user tries to create an offline report, report generation fails with an Internal error. This was caused by an incorrect version of the chart generation library.
Fixed Issue 69273: On a VMware SD-WAN Orchestrator using Release 4.3.0, when a customer has deployed a Cloud Security Service (CSS) and looks at the Monitor > Network Services page, the user would observe that the CSS public IP is truncated and is not showing the full IP when hovering over the status bubble.
The Public IP field is not wide enough to show a full IP address, for example: 255.255.255.255. In addition, the Orchestrator UI displays an undefined error when showing the overall status.
Fixed Issue 69196: Where a customer has a site that has configured a WAN link as a Backup and also uses a Cloud Security Service (CSS), a user may observe CSS Tunnel events for the Backup link.
If the user has Backup WAN links where CSS tunnels are configured, they will see events related to those links along with Active and Hot Standby links. Since the backup link is by definition inactive, these events are spurious and annoying to the customer.
Fixed Issue 69181: If a user configures a secondary fronted by IPsec Gateway in the Device Settings > Gateway Handoff section of the VMware SD-WAN Orchestrator, the IPsec tunnel is not established with the secondary fronted by IPsec Gateway.
The user would not observe the IPsec configuration applied and present when looking at debug.py --gateways.
Fixed Issue 69162: Sample test payload for a Webhook alert is missing the customer name when the test is initiated by a Partner Administrator.
When a Partner Administrator user initiates a Webhook alert test on behalf of a customer using a payload template that includes the special "customer" keyword, the VMware SD-WAN Orchestrator incorrectly substitutes an empty string for the customer name.
Fixed Issue 69046: On a VMware SD-WAN Orchestrator, connected VMware SD-WAN Edges may not receive their routing updates and as a result continue trying to use old routes.
When Edges queue more than 250 files in a span of 15 seconds and all of these files are small, containing only one or two route events, the routing events file enqueue job does not create jobs for the consumers to process. As a result, the file count in the file processing queue keeps increasing, and the enqueue job becomes long-running as the count grows. When this issue is hit, there are many pending routing events files, and their count increases steadily. Even though the Orchestrator's upload process, which handles routing requests, immediately responds with ACKs for all routing events, the Edges do not receive the ACKs; instead, "Connection timed out" messages appear in the Edge logs. This not only prevents Edges from getting their routing updates but also places stress on the Orchestrator's processing.
Fixed Issue 68702: On a VMware SD-WAN Orchestrator running Release 4.3.0, configuring a user role to deny permission for either "Update Profile Device Multicast Settings" or "Update Customer Edge Settings" privilege is not enforced by the Orchestrator.
The 4.3.0 Orchestrator does not include the privileges for either "Update Profile Device Multicast Settings" or "Update Customer Edge Settings".
Fixed Issue 68531: Customer Administrators with Superuser, Standard and Enterprise Network roles are not able to view “BGP Gateway Neighbor State (for BGP on IPsec via Gateway)” on the Edge Monitoring page of a VMware SD-WAN Orchestrator using Release 4.3.0.
Release 4.5.0 added the privilege for Customer Administrators with Superuser, Standard and Enterprise Network roles to view “BGP Gateway Neighbor State (for BGP on IPsec via GW)” on the Edge Monitoring page, but this privilege was not included in Release 4.3.0 Orchestrators.
Fixed Issue 68387: When a user attempts to create a new operator user or customer administrator with a non-native type (in other words, a user not using a username/password but SSO or RADIUS authentication), the VMware SD-WAN Orchestrator still requires the user to input a password for the new user.
When a user tries to create a new customer or operator user, the Orchestrator UI does not clear the password field. On the back end, instead of first checking whether the user type is non-native, the Orchestrator runs a password strength check and throws an error.
Fixed Issue 68321: For a customer whose enterprise uses a Hub and Spoke topology, when the VMware SD-WAN Orchestrator that the enterprise uses is upgraded to 4.3.0, the customer observes a large number of "BGP session established to edge neighbor" events being delivered every minute.
The Hub Edges are sending out these BGP state change events several times a minute and the customer's Events page would be flooded with these events which would make it much more difficult to identify meaningful events for that customer.
Fixed Issue 67701: When configuring a Business Policy Rule, the drop down list for Object Groups cannot be seen on the VMware SD-WAN Orchestrator UI when 20+ groups are configured.
Even with 5+ Object Groups (Address Group, Port Group) configured, the Object Group drop-down list appears near the bottom of the browser screen. With 20+ groups, the list is pushed completely off the screen, and it is impossible to see unless the user zooms the browser out so far that the text becomes too small to be usable.
Fixed Issue 67496: A VMware SD-WAN Orchestrator upgraded to Release 4.3.0 has a minor performance regression with regard to resource utilization.
The issue would not be noticed by a customer at the enterprise level, however the Orchestrator administrator would note a ~10% increase in resource utilization after upgrading to 4.3.0. The Orchestrator performance issues were caused by several database queries. Those queries each had a very minor inefficiency which could impact Orchestrator performance on very large scale deployments (6000+ Edges). The fix addresses those issues and restores Orchestrator performance to expected or better when compared with the previous release.
Fixed Issue 67336: When a user looks at the Orchestrator's Monitoring page for a VMware SD-WAN Edge, the Transport statistics show much lower values when compared to the Application statistics for that Edge.
The issue prevents a user from getting an accurate picture of throughput for a particular Edge, because the user cannot know which data set is correct. The discrepancy arises because Transport statistics do not include underlay-accounted traffic, while Application statistics do.
Fixed Issue 67153: Alert emails are being sent out even if the VMware SD-WAN Edge came up within the configured delay interval.
The VMware SD-WAN Orchestrator sends Edge Down/Up alert notifications even if the events happened within the configured delay interval.
Fixed Issue 66679: A user may find a VMware SD-WAN Orchestrator portal unresponsive after it is upgraded to 4.3.0.
Post-4.3.0 Orchestrator upgrade, the backend process does not start up as expected due to a side effect of the Bastion Orchestrator feature where Redis is being used as an intermediary for ensuring bastion settings have been configured. This causes the Orchestrator backend to run into an infinite loop because it is receiving Pub/Sub messages on 'Edge' channel subscriptions.
Fixed Issue 66678: When a VMware SD-WAN Orchestrator is upgraded to Release 4.3.0, the Non SD-WAN Destination via Gateway tunnels may be torn down and not rebuilt.
This issue is caused by a validation defect introduced within a feature added for Release 4.3.0. This defect causes the VMware SD-WAN Gateway heartbeat to fail which results in the Orchestrator not pushing Gateway configurations to their respective Gateways. Since the configurations are not sent, the NSD tunnels, which are part of the Gateway configuration, are not propagated to the Gateways and the tunnels eventually go down and do not recover. This issue affects Orchestrators which have been in use for a long time and where some of the NSD peers in these Orchestrators were not associated with any segment. Since the heartbeat failure is associated with Gateways, multiple customers could face an NSD via Gateway tunnel down issue.
Fixed Issue 66639: On a VMware SD-WAN Orchestrator using 4.3.0, when a customer site using a High Availability topology experiences an HA failover, the Orchestrator can process the HA events out of order, resulting in inconsistent alerting and the site possibly being marked as Down.
The Orchestrator relies on events to determine a site's HA state. When there is an HA failover, an "HA state" is sent in the event parameters at the same time as an HA_GOING_ACTIVE event. If the Orchestrator processes these out of order, the user can observe both incorrect alerting and the HA pair being marked as down.
Fixed Issue 66631: The Migration Tool does not work when attempting to migrate large customer enterprises.
A large customer enterprise is defined as one with 100 or more Edges. The migration tool fails at the step where it stringifies the whole data blob and writes it to a file: during the configuration export, the tool used JSON.stringify to convert all of the output data into a single string before writing it to the file, which fails when the configuration is very large.
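The general remedy for this class of failure is to serialize incrementally instead of materializing the entire export as one string. The Migration Tool itself uses JavaScript, and this note does not describe how the fix was implemented; the Python sketch below only illustrates the streaming technique using the standard library's json.JSONEncoder.iterencode, and the function name export_config is hypothetical.

```python
import json
from typing import Any, TextIO


def export_config(config: Any, out: TextIO) -> None:
    """Write a (possibly very large) configuration as JSON, chunk by chunk.

    iterencode yields the encoded document in small string pieces, so the
    complete JSON text never has to exist in memory as a single string.
    """
    encoder = json.JSONEncoder()
    for chunk in encoder.iterencode(config):
        out.write(chunk)
```

Python's own json.dump follows the same pattern internally, writing each chunk produced by iterencode to the file object.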
Fixed Issue 66597: On a VMware SD-WAN Orchestrator where a customer has a very large number of Edges deployed, adding multiple VMware SD-WAN Gateways to a Gateway Pool that the customer uses may cause a large number of Edges to show as down on the Orchestrator.
This issue was observed in the field with a customer who had ~7000 Edges connected to the Orchestrator. When the customer's Gateway Pool changes, the Orchestrator needs to push configuration changes to all of the Edges, and the control plane recalculations for more than 700 Edges in a 30-second window cause heartbeats and statistics pushes to fail with a 'POOL_ENQUEUELIMIT' error. Because of these heartbeat failures, the Edges show as down on the Orchestrator.
Fixed Issue 66177: Some Partner users are unable to see Path Statistics on the VMware SD-WAN Orchestrator.
This affects Partner administrators who are assigned role IDs 5, 6, or 8. Those roles translate as follows: 5 = IT Specialist; 6 = Superuser; 8 = Customer Support. The reason for this issue is that view path stats is a delegated privilege. The complete fix for this is addressed in releases 4.4.x and above.
In 4.3.1 the issue is corrected for Role ID 6 (Superuser), though not for 5 or 8.
Fixed Issue 66011: The VMware SD-WAN Orchestrator Portal API method linkQualityEvent/getLinkQualityEvents non-deterministically operates in "verbose" mode.
The linkQualityEvent/getLinkQualityEvents Orchestrator API method supports an "individualScores" option that lets clients request more detailed information on link QoE in the API response. This option produces detailed per-time-series-sample information in the result, which is slow and performance-intensive to generate, so the parameter defaults to false to avoid this performance impact. However, the server non-deterministically included this information even when the client had not requested it, with a resulting impact on performance.
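The intended contract (verbose output only on explicit request) can be sketched as a small handler. This is a hypothetical Python illustration, not the Orchestrator's code: only the parameter name individualScores comes from this note, while the response shape, the "score" field on each sample, and the summary fields are assumptions.

```python
from statistics import mean
from typing import Any


def summarize(samples: list[dict[str, Any]]) -> dict[str, Any]:
    """Cheap aggregate that is always returned (assumed shape, for illustration only)."""
    scores = [sample["score"] for sample in samples]
    return {"sampleCount": len(scores), "averageScore": mean(scores) if scores else None}


def build_link_quality_response(
    params: dict[str, Any], samples: list[dict[str, Any]]
) -> dict[str, Any]:
    """Hypothetical handler: per-sample detail is opt-in via individualScores (default False)."""
    response: dict[str, Any] = {"summary": summarize(samples)}
    # Only an explicit boolean True enables verbose mode; a missing key, False,
    # or a truthy non-boolean such as the string "false" keeps the cheap response.
    if params.get("individualScores", False) is True:
        response["individualScores"] = samples  # expensive per-sample detail
    return response
```

Making the decision depend only on the request parameter, and nothing else, is what keeps the behavior deterministic.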
Fixed Issue 65967: For a customer performing an upgrade of an on-premise VMware SD-WAN Orchestrator to Release 4.3.0, the Orchestrator services may not come up after the upgrade is complete and the Orchestrator will appear to be down.
This issue is the result of an upgrade script that cannot handle invalid data sent by certain versions of VMware SD-WAN Edges. Some of the Orchestrator's internal services cannot start properly, and restarting the portal and upload services does not clear the issue. The fix refactors the upgrade patch to skip invalid configurations and log an Operator Event listing all affected IDs, so that the Operator can fix them later.
Fixed Issue 65760: When an Operator User is looking at the Orchestrator Diagnostics page of the VMware SD-WAN Orchestrator UI, the user will observe that the Database Storage Info section is missing several data groups.
The following sections are missing from Database Storage: Database Process List; Database Status Variable; Database System Variable; Database Engine Status.
Fixed Issue 65558: A customer deployed on a VMware SD-WAN Orchestrator using Release 4.3.0 is not able to configure Syslog where the source interface is a VLAN.
When configuring Syslog with a source interface VLAN the attempt to save will result in the error "Syslog source interface VLAN-xxx does not exist on Segment <segment name>" on the Orchestrator UI.
Fixed Issue 65253: When configuring a Firewall Rule, the drop down list for Object Groups is unusable on the VMware SD-WAN Orchestrator UI when 20+ groups are configured.
Even with 5+ Object Groups (Address Group, Port Group) configured, the Object Group drop-down list appears near the bottom of the browser screen. With 20+ groups, the list is pushed completely off the screen, and it is impossible to see unless the user zooms the browser out so far that the text becomes too small to be usable.
Fixed Issue 64716: User is unable to generate reports on the VMware SD-WAN Orchestrator.
When the user attempts to generate a report, it fails and 'Error' is displayed in the Status column of the Reports page. The issue was introduced when one of the dependent packages was updated; the updated package contained a defect that caused all generated reports to fail.
Fixed Issue 64039: In some cases, a customer may observe their DHCP server as inactive.
The issue can be observed in the following scenario: after setting the addressing type, the user enables the DHCP server, enters its values, and clicks the Update button. If the user then opens the subinterface pop-up, the DHCP server shows as inactive, with all of the fields under DHCP server hidden.
Fixed Issue 63694: For a customer enterprise with a Hub and Spoke topology where their VMware SD-WAN Edges are running Release 4.2.x or earlier and are connected to a VMware SD-WAN Gateway running 4.3.0 or later, the Edges do not install the proper route order, which results in traffic disruptions.
The VMware SD-WAN Orchestrator is not properly handling uplink routes from a pre-4.2.x Edge on the one hand, and a post-4.2.x Edge on the other. Under their respective releases, the Edges behave differently in the way they send the uplink flag to the Orchestrator and this causes the Orchestrator to send the wrong route order to a 4.2.x or earlier Edge.
Fixed Issue 63622: The VMware SD-WAN Orchestrator UI does not log a Gateway Deleted event when a user deletes a VMware SD-WAN Gateway.
The Orchestrator should log an Event at both the Operator level and the Partner level (for Partner Gateways so deleted), and yet is not doing so.
Fixed Issue 63556: User has the option to add more than one TACACS server on the VMware SD-WAN Orchestrator UI.
While the user can add more than one TACACS server, this is not a valid configuration: if the first TACACS server fails, the second TACACS server does not take over in any case. The fix removes the option to add more than one TACACS server.
Fixed Issue 63518: For a customer enterprise using Release 3.3.x which upgrades to Release 4.3.0, a VMware SD-WAN Gateway connected to an Edge on this customer may not advertise learned BGP routes.
This issue is caused by the VMware SD-WAN Orchestrator failing to acknowledge the learned routes correctly: instead of replying with advertise: true and the corresponding cost, the Orchestrator sends the Gateway advertise: false, and the Gateway does not advertise the route.
Fixed Issue 62958: For a VMware SD-WAN Edge where Zscaler Cloud Security Service (CSS) IPsec automation is enabled, when its public WAN link IP address changes, the Edge may send an invalid value for the public IP as a transient state.
The validation for a public IP does not work as expected and may cause a Zscaler API error due to invalid parameters. This happens when the Edge's public WAN link enters a short transient state. Usually, this should not have any impact on existing CSS tunnels from the public WAN link.
Fixed Issue 62624: When a user attempts to uncheck the Partner Gateway box on Gateways > Overview page of the VMware SD-WAN Orchestrator UI, an error pops up which displays a Profile name only, with no indication which Customer owns the Profile.
This is a significant issue when a user needs to change the status of a VMware SD-WAN Gateway, because the user cannot tell which customers are using the Gateway: all the user can see is the Profile name, which means little without the Customer it belongs to.
Fixed Issue 62575: The VMware SD-WAN Edge does not honor expected Cloud Security Service (CSS) or Non-SD-WAN Destination site configuration for non-global segments when those capabilities are enabled via an Edge-specific override.
In some uncommon configuration scenarios (for example, in one case where Cloud Security Service was enabled exclusively on a non-global segment via an Edge-specific override), the Orchestrator incorrectly computed the Edge control plane configuration for non-global segments.
Fixed Issue 62355: User is unable to configure BGP options for a Non SD-WAN Destination with a Palo Alto Networks type.
A prior ticket removed the ability to configure BGP from Non SD-WAN Destinations (NSD) that did not support BGP. However, an NSD with a Palo Alto Networks type does support BGP and the ability to configure BGP was inadvertently removed from this type as well. The fix here restores those BGP configuration fields to the Palo Alto Networks type.
Fixed Issue 62058: The VMware SD-WAN Orchestrator displays WLAN interfaces for VMware SD-WAN Edge 510-N and 6x0-N models even though these models are not equipped with Wi-Fi.
The Orchestrator UI should hide WLAN interfaces for 510-N and 6x0-N Edge models. An Edge model with a '-N' designator is built without a Wi-Fi chip on board, so when one of these models is activated, the Orchestrator should use the model number to hide the WLAN interfaces. The impact to customers is minimal: although the WLAN interfaces are shown, the Orchestrator ignores any attempt to configure them.
Fixed Issue 59434: When a user navigates away from the Configure > Edge > Device page on the VMware SD-WAN Orchestrator UI, the web page shows a navigation pop-up that reads "The changes you made will be lost if you navigate away from this page" even though the user had made no changes on the Device Settings page.
The Orchestrator has data that shows a change has been made even though no change has been made, so the popup appears asking customers to save. This is a result of the wrong object being compared to the existing object to check the changes. Because of a wrong comparison, data was considered as modified. The fix replaces the wrong object with the correct object to compare and thus ensure no false requests to save.
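The principle behind the fix (compare the edited state against an untouched snapshot of what was loaded) can be sketched as follows. The Orchestrator UI is not written in Python, and the class and field names here are hypothetical; the sketch only shows a correct "unsaved changes" check based on deep equality.

```python
import copy
from typing import Any


class DeviceSettingsForm:
    """Hypothetical form model that reports unsaved changes only when something changed."""

    def __init__(self, loaded_settings: dict[str, Any]) -> None:
        # Snapshot of what the server returned; never mutated by the UI.
        self._baseline = copy.deepcopy(loaded_settings)
        # Working copy that the user edits.
        self.current = copy.deepcopy(loaded_settings)

    def has_unsaved_changes(self) -> bool:
        # Deep-compare the working copy with the pristine snapshot. Comparing
        # against any other object (e.g. one with different defaults or shape)
        # would report changes the user never made.
        return self.current != self._baseline
```

Deep-copying both sides prevents edits to nested values in the working copy from silently altering the baseline.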
Fixed Issue 58070: A user is not able to filter an OFC subnet based on segments when using the OFC page on the VMware SD-WAN Orchestrator UI.
The OFC subnet search filter does not work with a segment: if a user learns a prefix, views it on the OFC page, and then tries to search for the learned prefix using a search option with a segment, the search returns no result.
Fixed Issue 53751: Connected route validation in static route settings fails in a VLAN without cidrIp and cidrPrefix.
The issue occurs when the user creates a VLAN without a cidrIp and cidrPrefix.
Fixed Issue 48791: User is unable to switch a VMware SD-WAN Edge between Profiles when the Edge has an interface configured using Edge Override.
For example, a customer has two Configuration Profiles, Profile 1 and Profile 2, and associates an Edge with Profile 1. If the user then uses Edge Override to configure GE2 as routed and adds a static route for GE2, when the user later tries to assign this same Edge to Profile 2, the user observes an error that GE2 does not exist on Profile 2 as routed. This occurs because, when an Edge interface belonging to a profile has been configured using Edge Override, the VMware SD-WAN Orchestrator does not validate the presence of the Edge Override and so cannot switch the Profile.
Fixed Issue 48706: Users may not be able to save changes on the Configure > Edge > Device tab with the source interface selected under the Syslog configuration.
The error the user sees on the VMware SD-WAN Orchestrator is "Provided source interface is not present in the segment on segment: <Segment Name>." This is caused by the user creating and deleting a number of segments in such a way that the segment numbering is no longer sequential.
Fixed Issue 45078: When configuring a VNF for a customer on the VMware SD-WAN Orchestrator, if a VNF state is configured at the Profile level one way, and then configured a different way at the site level using Edge Override, when Edge Override is later disabled, the site continues to use the Edge Override settings and does not revert back to the Profile settings as expected.
This issue occurs when configuring a VNF Insertion parameter on a Configuration Profile where the opposite setting is configured for a site using Edge Override and later Edge Override is itself disabled, but the setting persists.
Open Issues in Release 4.3.1
The known issues are grouped as follows:
Issue 14655:
Plugging or unplugging an SFP adapter may cause the device to stop responding on the Edge 540, Edge 840, and Edge 1000 and require a physical reboot.
Workaround: The Edge must be physically rebooted. This may be done either on the Orchestrator using Remote Actions > Reboot Edge, or by power-cycling the Edge.
Issue 25504:
Static route costs greater than 255 may result in unpredictable route ordering.
Workaround: Use a route cost between 0 and 255.
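For customers who manage static routes with their own automation, the workaround can be enforced as a pre-check before pushing configuration. This Python sketch is hypothetical: the route dictionaries with a "cost" field are an assumed shape, not the Orchestrator's configuration schema; only the 0 to 255 range comes from the workaround.

```python
# Range recommended by the workaround for Issue 25504.
MIN_STATIC_ROUTE_COST = 0
MAX_STATIC_ROUTE_COST = 255


def is_safe_static_route_cost(cost: int) -> bool:
    """Return True if a static route cost falls inside the range 0 to 255."""
    return MIN_STATIC_ROUTE_COST <= cost <= MAX_STATIC_ROUTE_COST


def unsafe_static_routes(routes: list[dict]) -> list[dict]:
    """Hypothetical pre-check: list routes whose 'cost' is outside the recommended range."""
    return [route for route in routes if not is_safe_static_route_cost(route["cost"])]
```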
Issue 25742:
Underlay accounted traffic is capped at a maximum of the capacity towards the VMware SD-WAN Gateway, even if that is less than the capacity of a private WAN link which is not connected to the Gateway.
Issue 25758:
USB WAN links may not update properly when switched from one USB port to another until the VMware SD-WAN Edge is rebooted.
Workaround: Reboot the Edge after moving USB WAN links from one port to another.
Issue 25855:
A large configuration update on the Partner Gateway (e.g. 200 BGP-enabled VRFs) may cause latency to increase for approximately 2-3 seconds for some traffic via the VMware SD-WAN Gateway.
Workaround: No workaround available.
Issue 25921:
VMware SD-WAN Hub High Availability failover takes longer than expected (up to 15 seconds) when there are three thousand branch Edges connected to the Hub.
Issue 25997:
The VMware SD-WAN Edge may require a reboot to properly pass traffic on a routed interface that has been converted to a switched port.
Workaround: Reboot the Edge after making the configuration change.
Issue 26421:
The primary Partner Gateway for any branch site must also be assigned to a VMware SD-WAN Hub cluster for tunnels to the cluster to be established.
Issue 28175:
Business Policy NAT fails when the NAT IP overlaps with the VMware SD-WAN Gateway interface IP.
Issue 31210:
VRRP: ARP is not resolved in the LAN client for the VRRP virtual IP address when the VMware SD-WAN Edge is primary with a non-global CDE segment running on the LAN interface.
Issue 32731:
Conditional default routes advertised via OSPF may not be withdrawn properly when the route is turned off. Re-enabling and disabling the route will retract it successfully.
Issue 32960:
Interface “Autonegotiation” and “Speed” status might be displayed incorrectly on the Local Web UI for activated VMware SD-WAN Edges.
Issue 32981:
Hard-coding speed and duplex on a DPDK-enabled port may require a VMware SD-WAN Edge reboot for the configurations to take effect as it requires disabling DPDK.
Issue 35778:
When there are multiple user-defined WAN links on a single interface, only one of those WAN links can have a GRE tunnel to Zscaler.
Workaround: Use a different interface for each WAN link that needs to build GRE tunnels to Zscaler.
Issue 35807:
A DPDK routed interface will be disabled completely if the interface is disabled and re-enabled from the VMware SD-WAN Orchestrator.
Issue 36923:
Cluster name may not be updated properly in the NetFlow interface description for a VMware SD-WAN Edge which is connected to that Cluster as its Hub.
Issue 38682:
A VMware SD-WAN Edge acting as a DHCP server on a DPDK-enabled interface may not properly generate “New Client Device" events for all connected clients.
Issue 38767:
When a WAN overlay that has GRE tunnels to Zscaler configured is changed from auto-detect to user-defined, stale tunnels may remain until the next restart.
Workaround: Restart the Edge to clear the stale tunnel.
Issue 39134:
The System health statistic “CPU Percentage” may not be reported correctly on Monitor > Edge > System for the VMware SD-WAN Edge, and on Monitor > Gateways for the VMware SD-WAN Gateway.
Workaround: Use handoff queue drops, rather than CPU percentage, to monitor Edge capacity.
Issue 39608:
The output of the Remote Diagnostic “Ping Test” may display invalid content briefly before showing the correct results.
Issue 39624:
Ping through a subinterface may fail when the parent interface is configured with PPPoE.
Issue 39753:
Disabling Dynamic Branch-to-Branch VPN may cause existing flows currently being sent using Dynamic Branch-to-Branch to stall.
Issue 40096:
If an activated VMware SD-WAN Edge 840 is rebooted, there is a chance an SFP module plugged into the Edge will stop passing traffic even though the link lights and the VMware SD-WAN Orchestrator will show the port as 'UP'.
Workaround: Unplug the SFP module and then replug it back into the port.
Issue 40421:
Traceroute is not showing the path when passing through a VMware SD-WAN Edge with an interface configured as a switched port.
Issue 42872:
Enabling Profile Isolation on a Hub profile where a Hub cluster is associated does not revoke the Hub routes from the routing information base (RIB).
Issue 43373:
When the same BGP route is learnt from multiple VMware SD-WAN Edges, if this route is moved from preferred to eligible exit in the Overlay Flow Control, the Edge is not removed from the advertising list and continues to be advertised.
Workaround: Enable distributed cost calculation on the VMware SD-WAN Orchestrator.
Issue 44526:
In an enterprise where two different sites deploy their VMware SD-WAN Edges as Hubs using a High Availability topology, and each site uses the other Hub site as a Hub in its profile, an HA failover at one of the Hub sites may cause both Hub Edges to take up to 30 minutes to reestablish tunnels with each other.
On an HA failover, both Hub Edges try to initiate a tunnel with each other at the same time and neither replies to its peer: packets are exchanged between the two Hubs, but IKE never succeeds. This leads to a deadlock that has been observed to take up to 30 minutes to resolve on its own. The issue is intermittent and does not occur after every HA failover.
Workaround: To prevent this issue from occurring, the customer should configure only one of the two HA Hub sites to use the other Hub site as a Hub for itself. For example, where there are two HA Hub sites, Hub1 and Hub2, Hub1 could have Hub2 as a Hub for itself in its profile, but Hub2 must not use Hub1 as a Hub in its profile.
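To audit a configuration against this workaround, one can look for pairs of HA Hub sites that each list the other as a Hub. The Python sketch below is hypothetical: it assumes the caller has already extracted, for each HA Hub site, the set of Hub sites named in its profile (this note does not describe an API for doing so), and the function name is illustrative.

```python
def mutual_hub_pairs(hubs_by_site: dict[str, set[str]]) -> set[frozenset[str]]:
    """Return pairs of sites that each use the other as a Hub (the configuration to avoid).

    hubs_by_site maps each HA Hub site to the Hub sites listed in its profile;
    only HA Hub sites should be passed in, since the workaround applies to them.
    """
    pairs: set[frozenset[str]] = set()
    for site, hubs in hubs_by_site.items():
        for hub in hubs:
            if hub != site and site in hubs_by_site.get(hub, set()):
                pairs.add(frozenset((site, hub)))
    return pairs
```

Each returned pair is a candidate for the fix described above: keep the Hub reference on one side only.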
Issue 45302:
In a VMware SD-WAN Hub Cluster, if one Hub loses connectivity for more than 5 minutes to all of the VMware SD-WAN Gateways common to itself and its assigned Spoke Edges, in rare cases the Spokes may be unable to retain the Hub routes after 5 minutes. The issue resolves itself when the Hub regains contact with the Gateways.
Issue 46053:
BGP preference is not auto-corrected for overlay routes when the neighbor is changed to an uplink neighbor.
Workaround: An Edge Service Restart will correct this issue.
Issue 42278:
For a specific type of peer misconfiguration, the VMware SD-WAN Gateway may continuously send IKE init messages to a Non-SD-WAN peer. This issue does not disrupt user traffic to the Gateway; however, the Gateway logs will be filled with IKE errors and this may obscure useful log entries.
Issue 42388:
On a VMware SD-WAN Edge 540, an SFP port is not detected after disabling and re-enabling the interface from the VMware SD-WAN Orchestrator.
Issue 42488:
Traffic may be blackholed for the connected route of a VMware SD-WAN Edge interface that has no link connected. If the link on an Edge port is removed but the interface is not disabled, the Edge does not revoke the route from the Gateway, causing other Edges to forward traffic to the Edge with no link connected.
Workaround: Disable the interface if no link is connected.
Issue 44995:
OSPF routes are not revoked from VMware SD-WAN Gateways and VMware SD-WAN Spoke Edges when the routes are withdrawn from the Hub Cluster.
Issue 45189:
When source LAN-side NAT is configured, traffic from a VMware SD-WAN Spoke Edge to a Hub Edge is allowed even without a static route configured for the NAT subnet.
Issue 46216: For a Non SD-WAN Destination via Gateway or Edge where the peer is an AWS instance, when the peer initiates a Phase 2 rekey, the Phase 1 IKE SA is also deleted, forcing a rekey. This means the tunnel is torn down and rebuilt, causing packet loss during the rebuild.
Workaround: To avoid tunnel teardown, set the IPsec rekey timer on the Non SD-WAN Destination via Gateway/Edge, or on the Cloud Security Service (CSS), to less than 60 minutes. This prevents AWS from initiating the rekey.
Issue 46391:
For a VMware SD-WAN Edge 3800, the SFP1 and SFP2 interfaces each have issues with Multi-Rate SFPs (i.e. 1/10G) and should not be used in those ports.
Workaround: Use single-rate SFPs per the KB article VMware SD-WAN Supported SFP Module List (79270). Multi-Rate SFPs may be used with SFP3 and SFP4.
Issue 47681:
When a host on the LAN side of a VMware SD-WAN Edge uses the same IP as that Edge’s WAN interface, the connection from the LAN host to the WAN does not work.
Issue 47355:
When the same route is learned via local underlay BGP, Hub BGP, and/or static configuration on the Partner Gateway, the sorting order of the routes is incorrect, with the Hub BGP route preferred over the underlay BGP route.
Issue 48166:
A VMware SD-WAN Virtual Edge on KVM is not supported when using a Ciena virtualization OS and the Edge will experience recurring Dataplane Service Failures.
Issue 48175:
A VMware SD-WAN Edge running Release 3.4.2 will form an OSPF adjacency on a non-global segment if the non-global segment has an interface configured in the same IP range as an interface configured on the global segment.
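Because the trigger is an interface on a non-global segment that shares an IP range with an interface on the global segment, the condition can be screened for ahead of time. The Python sketch below uses the standard ipaddress module; the interface names and the flat name-to-CIDR mappings are hypothetical stand-ins for an Edge's actual configuration.

```python
import ipaddress


def overlapping_interfaces(global_ifaces: dict[str, str],
                           nonglobal_ifaces: dict[str, str]) -> list[tuple[str, str]]:
    """Return (non-global interface, global interface) pairs whose subnets overlap.

    Each mapping is hypothetical: interface name -> address in CIDR form,
    for example "10.1.1.1/24".
    """
    conflicts = []
    for ng_name, ng_addr in nonglobal_ifaces.items():
        ng_net = ipaddress.ip_interface(ng_addr).network
        for g_name, g_addr in global_ifaces.items():
            # overlaps() is True if either network contains part of the other.
            if ng_net.overlaps(ipaddress.ip_interface(g_addr).network):
                conflicts.append((ng_name, g_name))
    return conflicts


print(overlapping_interfaces({"GE3": "10.1.1.1/24"}, {"GE4.100": "10.1.1.2/24"}))
# [('GE4.100', 'GE3')]
```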
Issue 48530:
VMware SD-WAN Edge 6x0 models do not perform autonegotiation for triple-speed (10/100/1000 Mbps) copper SFPs.
Workaround: The Edge 520/540 supports triple-speed copper SFPs, but these models have been marked for End-of-Sale by Q1 2021.
Issue 48597:
Multihop BGP neighborship does not stay up if one of the two paths to the peer goes down. If there is a Multihop BGP neighborship with a peer reachable over multiple paths and one of them goes down, the user will notice that the BGP neighborship goes down and does not come up using the other available path(s). This includes the Local IP loopback neighborship case.
Workaround: There is no workaround for this issue.
Issue 50518:
On a VMware SD-WAN Gateway where PKI is enabled, if more than 6000 PKI tunnels attempt to connect to the Gateway, not all of the tunnels may come up because inbound SAs are not deleted.
Note: Tunnels using pre-shared key (PSK) authentication do not have this issue.
Issue 51036:
ifSpeed reports a value of 0 for operational VMware SD-WAN Edge interfaces when the Edge is polled via SNMP. This is expected behavior for DPDK-enabled ports. Currently, the only way to get speed values for DPDK-enabled ports is the command "debug.py --dpdk_ports", but the SNMP module running on an Edge does not use this command to extract speed values for DPDK-enabled ports. SNMP queries only the kernel interface, which does not populate speed values for DPDK ports.
Issue 51428:
Multicast traffic loss may be observed on a site where the VMware SD-WAN Edge has a sub-interface configured with PIM. When a sub-interface configured with PIM is moved from one segment to another on the fly, pimd (the process that manages PIM) may restart and the site may experience intermittent multicast traffic loss.
Workaround: Disable the sub-interface first, and then move the sub-interface to another segment. Once moved, re-enable the sub-interface.
Issue 52955:
A DHCPv6 Decline is not sent from the Edge, and DHCPv6 rebinding is not restarted, after a DAD failure in stateful DHCPv6. If the DHCPv6 server allocates an address that the kernel detects as a duplicate during a DAD check, the DHCPv6 client does not send a Decline. This leads to dropped traffic, because the interface address is marked as having failed the DAD check and is not used. This does not cause traffic looping in the network, but traffic blackholing will be seen.
Workaround: There is no workaround for this issue.
Issue 53147:
Non-default hop limit values advertised in router advertisements (RAs) are not honored on the VMware SD-WAN Edge. The hop limit of the tunnel is always set to the default value of 64: if a non-default hop limit is advertised through RAs, the Edge does not process the hop limit field in the packet and the value remains 64.
Workaround: There is no workaround for this issue.
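For comparison, RFC 4861 defines a Cur Hop Limit of 0 in a router advertisement as "unspecified", and a host should adopt any non-zero value it receives. The Python sketch below contrasts that expected processing with the behavior described for Issue 53147; the function names are illustrative only and do not correspond to Edge code.

```python
DEFAULT_HOP_LIMIT = 64  # default value stated in this note


def expected_hop_limit(ra_cur_hop_limit: int, current: int = DEFAULT_HOP_LIMIT) -> int:
    """Hop limit after processing an RA, following RFC 4861.

    A Cur Hop Limit of 0 means "unspecified by this router", so the
    current value is kept; any non-zero value replaces it.
    """
    return current if ra_cur_hop_limit == 0 else ra_cur_hop_limit


def observed_edge_hop_limit(ra_cur_hop_limit: int) -> int:
    """Behavior described in Issue 53147: the RA field is ignored."""
    return DEFAULT_HOP_LIMIT


print(expected_hop_limit(128), observed_edge_hop_limit(128))  # 128 64
```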
Issue 53219:
After a VMware SD-WAN Hub Cluster rebalances, a few Spoke Edges may not have their RPF interface/IIF set properly, and multicast traffic on the affected Spoke Edges will be impacted. This happens because, after a cluster rebalance, some of the Spoke Edges fail to send a PIM join.
Workaround: Restart the Edge Service on the affected Spoke Edge. The issue persists until this restart is performed.
Issue 53337:
Packet drops may be observed with an AWS instance of a VMware SD-WAN Gateway when throughput is above 3200 Mbps. When throughput exceeds 3200 Mbps with a packet size of 1300 bytes, packet drops are observed at RX and at the IPv4 BH handoff.
Workaround: There is no workaround for this issue.
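To express the 3200 Mbps threshold as a packet rate, the Python sketch below converts throughput and packet size to packets per second, assuming 1 Mbps = 10^6 bit/s and ignoring encapsulation overhead. Under those assumptions, 3200 Mbps of 1300-byte packets is roughly 307,692 packets per second.

```python
def packets_per_second(throughput_mbps: float, packet_size_bytes: int) -> float:
    """Packet rate needed to sustain a throughput at a fixed packet size.

    Assumes 1 Mbps = 1,000,000 bit/s and ignores encapsulation overhead.
    """
    bits_per_packet = packet_size_bytes * 8
    return throughput_mbps * 1_000_000 / bits_per_packet


print(round(packets_per_second(3200, 1300)))  # 307692
```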
Issue 53687:
When a VMware SD-WAN Spoke Edge has a preference for an IPv6 (or IPv4) tunnel, the MTU of the non-preferred IPv4 (or IPv6) tunnel also influences the MTU seen on the preferred tunnel. An Edge (Spoke or Hub) maintains a system-level MTU, which is the minimum of all link MTUs, and this MTU is exchanged as the advertised MTU. Because a non-preferred link's MTU is still considered when determining the system-level MTU, an MTU lower than the actual path MTU can be advertised.
Workaround: There is no workaround for this issue.
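The effect follows directly from taking a minimum over every link. The Python sketch below illustrates it with hypothetical link names and MTU values: the IPv4 link is preferred, but the lower MTU of the non-preferred IPv6 link determines the advertised system-level MTU.

```python
def advertised_mtu(link_mtus: dict[str, int]) -> int:
    """System-level MTU as described in this note: the minimum over all
    link MTUs, including non-preferred links."""
    return min(link_mtus.values())


def preferred_path_mtu(link_mtus: dict[str, int], preferred: list[str]) -> int:
    """MTU the preferred link(s) alone could actually carry."""
    return min(link_mtus[name] for name in preferred)


links = {"GE3-v4": 1500, "GE3-v6": 1400}  # hypothetical link names and MTUs
print(advertised_mtu(links))                  # 1400
print(preferred_path_mtu(links, ["GE3-v4"]))  # 1500
```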
Issue 53830: On a VMware SD-WAN Edge, some of the routes in the BGP view may not have the correct preference and advertise values when the DCC flag is enabled, causing an incorrect sorting order in the Edge's FIB.
When Distributed Cost Calculation (DCC) is enabled in a scaled scenario with a large number of routes on an Edge, the bgp_view log in an Edge diagnostic bundle may show that some of the routes were not correctly updated with the preference and advertise values. This issue, if found at all, would be found on a few Edges in a large enterprise (100+ Spoke Edges connected to either Hub Edges or Hub Clusters).
Workaround: This issue can be addressed by either relearning the underlay BGP routes or using the "Refresh" option on the OFC page of the VMware SD-WAN Orchestrator for the affected routes. Note that refreshing a route relearns the routes from all the Edges in the enterprise.
Issue 53934: In an enterprise where a VMware SD-WAN Hub Cluster is configured, if the primary Hub has Multihop BGP neighborships on the LAN side, the customer may experience traffic drops on a Spoke Edge when there is a LAN side failure or when BGP is disabled on all segments.
In a Hub cluster, the primary Hub has a Multihop BGP neighborship with a peer device to learn routes. If the physical interface on the Hub through which the BGP neighborship is established goes down, the BGP LAN route count may not drop to zero even though the BGP view is empty. This may prevent Hub Cluster rebalancing. The issue may also be observed when BGP is disabled for all segments and there are one or more Multihop BGP neighborships.
Workaround: Restart the Hub which had the LAN-side failure (or BGP disabled).
Issue 54099: Fragmented IPv6 packets will be dropped by a VMware SD-WAN Edge.
Any fragmented IPv6 packet is going to get dropped by an Edge.
Workaround: There is no workaround for this issue.
Issue 54378: On an interface with a static IPv6 address, the duplicate address detection (DAD) check fails due to Neighbor Advertisement (NA) drops.
The DAD check for the static address does not take place, so if another node in the network uses the same address as the configured static address, the duplicate is not detected.
Workaround: There is no workaround for this issue.
Issue 54536: Duplicate address detection (DAD) check not triggered after a VMware SD-WAN Edge reboot.
Because the DAD check is not performed after the reboot, a duplicate address in the network is not detected.
Workaround: There is no workaround for this issue.
Issue 54687: A DHCPv6 Solicit message is not sent by a VMware SD-WAN Edge after the server has been configured with a T1 value greater than T2.
If a DHCPv6 server initially provides a T1 value greater than T2, the Edge does not accept the provided prefix. After three attempts, the Edge stops sending DHCPv6 Solicit messages, even once the configuration has been corrected on the server. At that point, the issue clears only when the Edge's Dataplane Service is restarted.
Workaround: Restart the Edge Service.
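RFC 8415 directs a DHCPv6 client to discard an IA_NA or IA_PD option whose T1 is greater than T2 when both are non-zero, which is consistent with the Edge rejecting the prefix in this case. A minimal Python sketch of that validity check follows; it models only the timer comparison, not the Edge's Solicit retry behavior.

```python
def ia_timers_valid(t1: int, t2: int) -> bool:
    """RFC 8415 timer check for IA_NA/IA_PD options.

    An IA where T1 > T2 and both are non-zero must be discarded.
    T1 or T2 of 0 leaves the choice of timers to the client, so it is
    not treated as invalid here.
    """
    return not (t1 > 0 and t2 > 0 and t1 > t2)


print(ia_timers_valid(3600, 1800))  # False: server misconfiguration
print(ia_timers_valid(1800, 2880))  # True
```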
Issue 54731: When a user changes the IPv6 address range on the DHCPv6 server, Renew messages are sent at a high frequency until the rebind time (T2) is reached.
When an IP address that has been assigned to a VMware SD-WAN Edge is removed from the valid range of addresses provided by a server, the client keeps sending Renew messages to the server until T2 is reached. This may result in a large volume of DHCPv6 traffic.
Workaround: There is no workaround for this issue.
Issue 56454: When both the IPv4 and IPv6 links on an interface are configured as auto-discovered links and tunnels are also formed through the non-preferred link, the link stats do not display consolidated information for the IPv4 and IPv6 links.
When an interface has both IPv4 and IPv6 overlays configured as auto-discovered overlays and tunnels are formed over both links, the link stats reflect only the status of the preferred link. The traffic information and status of the non-preferred link are not correctly reflected. As a result, the statistics shown for the link on the Edge > Monitoring page, including bandwidth and throughput, should be used as a guideline to measure the performance of the tunnels formed over the preferred IP address family only.
Workaround: There is no workaround for this issue.
Issue 57957: If a DPDK interface is changed from Autonegotiate=on to Autonegotiate=off, the Edge unloads the KNI driver and loads the Linux driver for that interface during the Edge Service restart sequence (from vc_dpdk.py).
After loading the new Linux interface and renaming it with nameif, vc_dpdk.py also needs to invoke "set_interface_neg.py" to apply the auto-negotiation settings. However, because the Linux driver is reloaded for the new auto-negotiation settings, the bare-metal interface is no longer under DPDK control.
Workaround: There is no workaround for this issue.
Issue 59970: A customer may observe traffic drops from a VMware SD-WAN Edge to the datacenter through a Zscaler IaaS when switching from a primary to a secondary Gateway.
When the primary Gateway goes down and the Orchestrator switches to the secondary Gateway, the reverse path of the current traffic flow from a Zscaler Enforcement Node (ZEN) does not work.
Workaround: Reinitiate all traffic flows. Zscaler has been notified of this issue and has confirmed that the reverse traffic path is not working properly on their side.
Issue 61882: When a security parameter configuration change is made (e.g., SA lifetime change) from the VMware SD-WAN Orchestrator, the customer may observe traffic drop for a period of time.
This has been seen on a large-scale deployment with 1000+ Edges in a Hub/Spoke topology. If security parameters (lifetime, cryptography algorithms, authentication mode) are changed, current tunnels are brought down and then rebuilt. In a large-scale deployment this can affect traffic stability: the responder side (Hub Edge) may not be able to handle all the tunnels in time, which can cause a traffic drop. The tunnels eventually establish, but this can take some time depending on the number of existing tunnels.
Workaround: Perform such configuration changes in a maintenance window, since the recovery time depends on the number of existing tunnels and cannot be predicted.
Issue 62685: If LAN side NAT is configured with the same outside IP for different LAN subnets with NAT type as source, traffic destined for the Cloud will not work.
For the outside IP used in LAN-side NAT rules, a static route is configured and advertised to the remote branches. For return traffic to be routed to the correct LAN subnet, the route lookup should be based on the Inside IP configured in the LAN-side NAT rule rather than on the next hop in the static route. However, for return traffic from the cloud, the route lookup is based on the next hop in the static route, so traffic can be routed to the wrong LAN subnet.
Workaround: Use a different Outside IP for different LAN subnets.
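Applying the workaround means finding any outside IP that is reused across different inside subnets. The Python sketch below does this with the standard ipaddress module; the list of (inside subnet, outside IP) tuples is a hypothetical, simplified representation of source LAN-side NAT rules, and the addresses are documentation examples.

```python
import ipaddress
from collections import defaultdict


def shared_outside_ips(rules: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Return outside IPs that are reused across different inside subnets.

    rules is a hypothetical list of (inside_subnet, outside_ip) tuples
    for source LAN-side NAT rules.
    """
    by_outside = defaultdict(set)
    for inside, outside in rules:
        by_outside[ipaddress.ip_address(outside)].add(ipaddress.ip_network(inside))
    # Keep only outside IPs that map back to more than one inside subnet.
    return {str(ip): sorted(str(net) for net in nets)
            for ip, nets in by_outside.items() if len(nets) > 1}


rules = [("10.1.0.0/24", "192.0.2.10"), ("10.2.0.0/24", "192.0.2.10"),
         ("10.3.0.0/24", "192.0.2.11")]
print(shared_outside_ips(rules))
# {'192.0.2.10': ['10.1.0.0/24', '10.2.0.0/24']}
```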
Issue 62701: For a VMware SD-WAN Edge deployed as part of an Edge Hub Cluster, if Cloud VPN is not enabled under the Global Segment but is enabled under a Non-Global Segment, a control plane update sent by the Orchestrator may cause all the WAN links to flap on the Hub Edge.
The Hub Edge's WAN links going down and then up in rapid succession (flapping) impacts real-time traffic such as voice calls. This issue was observed on a customer deployment where Cloud VPN was not enabled on the Hub Edge's Global Segment, but the Cluster configuration was enabled, meaning this Hub Edge was part of a Cluster (and a Cluster configuration applies to all segments). When a configuration change is pushed to the Hub Edge, the Hub Edge's dataplane starts parsing data beginning with the Global Segment, where it sees that Cloud VPN is not enabled and erroneously concludes that clustering is disabled on this Global Segment. As a result, the Hub Edge tears down all tunnels from its WAN link(s), which causes link flaps on all of that Edge's WAN links. For any such incident, the WAN links go down and recover only once per control plane update.
The root cause of this issue remains under investigation.
Workaround: The workaround is to activate Cloud VPN on all segments, meaning the Global Segment and all Non-Global Segments.
Issue 62725: A VMware SD-WAN Edge in a network which uses BGP may experience high memory usage under certain rare conditions.
If an Edge learns a BGP route with a next hop IP address which is different from the peer IP address, the next hop will be tracked for reachability by the Edge's Next Hop Tracking (NHT) module. If BGP is then disabled when the tracked IP address is unreachable on the Edge, the tracked NHT entry may not get deleted. In rare cases where there are a lot of stale NHT entries, high Edge memory usage can be seen.
Workaround: Reboot the Edge to delete the leaked NHT entries.
Issue 62897: The debug command tcpdump does not work properly on a VMware SD-WAN Gateway.
When the tcpdump command is run on the eth0 or eth1 interface of the Gateway, the output is not correct. tcpdump.sh and vctcpdump also do not work. As an attempted fix, an AppArmor complain profile for vctcpdump (based on the tcpdump profile) that allows tcpdump to inherit vctcpdump's confinement was added, but tcpdump still does not work. In short, AppArmor causes stdout to stop working for tcpdump. This is a known issue with AppArmor.
Workaround: Pipe the tcpdump output to cat, for example: tcpdump -nnplei eth0 | cat
Issue 63125: When MTU is increased for any interface/link on a VMware SD-WAN Hub Edge, it will not be reflected in the path MTU on the Spoke Edge (for the paths with that Hub Edge).
If the user increases the MTU of an interface or link on a Hub, the Spoke Edge path does not pick up the changed MTU setting.
Workaround: Reboot the Spoke Edge; the increased MTU is then reflected in the path MTU on the Spoke.
Issue 65885: For a customer who has deployed a Non SD-WAN Destination (NSD) via Gateway and is using a redundant Gateway configuration, if the Primary Gateway goes down, a race condition can occur in which the NSD advertises the legacy routes learned via the Redundant Gateway to the Primary Gateway before the PG-BGP peership on the Primary Gateway comes up.
Since BGP multipath is not supported, the route that the Primary Gateway was supposed to learn over the handoff interface appears to be learned as the datacenter from the NSD. Although these are valid routes, this should not happen, since traffic should go via the NSD > Primary Gateway > Handoff Interface when the Primary Gateway is up. The impact is not on performance, as the traffic still reaches the destination, but it negatively affects the customer's network management: the customer expects traffic to come via one route and installs QoS policies to manage it, but the traffic takes another route not covered by those QoS policies.
Workaround: The customer should filter outbound routes on the Non SD-WAN Destination via Gateway so that it does not advertise a route learned via a Redundant Gateway to a Primary Gateway.
Issue 67458: When a VMware SD-WAN Hub Edge with a large number of Spoke Edges is upgraded to Release 4.2.1 or later, some tunnels to other Spoke Edges will not come up for the Hub Edge.
A large number of Spoke Edges means approximately 1000 or more. This issue is not consistent, but generally about one third of the VeloCloud Management Protocol (VCMP) tunnels are not established between the Hub Edge and the connected Spoke Edges. This is caused by the Hub Edge ignoring MP_INITs because the number of half-open TDs exceeds the Hub Edge's upper limit.
Workaround: Restarting the Edge Service will restore full tunnel connectivity.
Issue 72245: If a VMware SD-WAN Hub Edge is used as an internet breakout from an MPLS network, management (VCMP) tunnels from a connected Spoke Edge's private interface to any public Gateways may go down or not come up.
Management (VCMP) packets from a Spoke Edge's private interface to the public Gateway are sent via the Hub Edge. In this scenario, the Hub considers this flow as a direct flow and pushes these packets to the Internet via a public interface. However, due to routing issues, these flows can be marked as "Via Gateway" and this can impact them and cause the issue described above.
Workaround: There is no workaround for this issue.
Issue 72925: For a customer who uses SNMP polling to monitor their enterprise and deploys lower-end VMware SD-WAN Edge models (for example, Edge models 510, 520, or 610) running a 4.x software release, SNMP polling takes exceptionally long to process and can even time out.
This issue significantly reduces the effectiveness of SNMP polling for network monitoring when using Edges in the 510, 5x0, and 6x0 series. It is caused by the Release 4.x SNMP agent taking an unnecessarily long time to traverse the debug command list, which is not actually required for the SNMP process.
Workaround: There is no workaround for this issue.
Issue 74149: For a customer using a Zscaler type Cloud Security Service where the L7 Health Check is enabled, if a VMware SD-WAN Edge is rebooted while a WAN link is also down, the L7 Health Check process may not send probes to the Zscaler service even after both the Edge and the WAN link(s) are fully restored.
This issue is not consistent and happens rarely even when the listed conditions are met. If L7 Health Check is enabled and an Edge WAN interface transitions Up/Down while the Edge is restarting and initializing, the Edge may miss sending L7 probes.
Workaround: Without the fix, the only way to get the Edge to resume sending L7 Probes is to toggle (turn off, save changes, and then turn back on and save changes) L7 Health Check.
Issue 76292: A VMware SD-WAN Edge configured as a Spoke does not prefer an uplink route with the best BGP attributes even after forming a dynamic tunnel.
A customer may be affected by this behavior if the uplink route over a dynamic tunnel is expected to forward traffic. In that case, the route is not used, and all traffic for this route is routed toward the Hub.
Uplink routes were designed as a special route for breaking out from the Hub Edge and are not intended for any other use. They are external routes that VMware SD-WAN does not advertise in the same way as other VCRP routes; they are advertised only by Hubs to their Spokes, which means over static tunnels only.
Workaround: There is no workaround for this issue.
Issue 79220: For a site deployed with a High Availability topology, the customer may observe multiple reboots of the VMware SD-WAN Standby Edge with a potential disruption to customer traffic.
A flood of events from the Active Edge to the Standby Edge can overload some threads on the Standby Edge, delaying heartbeat processing and causing the Standby Edge to be incorrectly promoted to Active. In an Active-Active state the tie-break goes to the Active Edge, and the Standby Edge is rebooted to demote it back to its proper Standby status. When this issue is encountered on a conventional HA topology, the customer impact would be minimal because the Standby Edge does not pass customer traffic. On an Enhanced HA deployment, where the Standby Edge also passes traffic, the reboot(s) would disrupt some customer traffic.
In this ticket a few optimizations are made in the Edge to process events more efficiently from Active and Standby Edges and to minimize the number of events by preventing some invalid events from getting synchronized from the Active to the Standby.
Workaround: There is no workaround for this issue.
Issue 81224: On a site deployed with a High Availability topology, when the site experiences an HA failover, the OSPF route tags may not propagate post-HA failover.
On an HA failover, OSPF external LSAs (link state advertisements) do not have a route tag, which leads to improper routing with an adverse effect on customer traffic.
Workaround: OSPF needs to be restarted on the Edges that do not receive the correct route tag.
Issue 81859: When activating a VMware SD-WAN Edge 610-LTE, the CELL interface may not come up after the Edge completes its activation.
This issue is not consistent but when it occurs it can have a major impact if the Edge 610-LTE's only public link is the mobile CELL link as the Edge would be effectively down and intervention for this Edge would need to be local in the form of someone power cycling the Edge to recover it.
Workaround: If encountering this issue and the 610-LTE has other wired public WAN links, the user would need to either restart the Edge service through the Orchestrator using Remote Actions > Service Restart in a suitable maintenance window, or restart the Edge's modem to restore the CELL interface.
If the 610-LTE only uses a CELL interface for internet, someone local to the Edge would have to power cycle the Edge as it would be inaccessible through the Orchestrator.
If the 610-LTE Edge being activated only uses CELL for internet, the Edge should be activated with someone present to potentially power cycle it should it go down after completing activation.
Issue 82104: In rare cases, VMware SD-WAN Edges activated in a High Availability topology may be unable to communicate with a VMware SASE Orchestrator which will mark the site as down and preclude any intervention through the Orchestrator to the site.
This issue occurs only when an unusual and invalid configuration is applied to the HA Edges. The configuration specifies that the HA port is configured as "trunk" (which should not be allowed), with zero VLANs (also should not be allowed), but where "all VLANs" are set. Instead of throwing an error at this configuration and preventing a user from activating HA for the Edges, the Orchestrator allows it and this configuration triggers a management plane failure on the HA Edges which no longer send a heartbeat to the Orchestrator and the Orchestrator marks the site as down.
Workaround: Avoid using the configuration outlined above.
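Until the Orchestrator rejects this combination, it can be screened for before HA is activated. The Python sketch below flags it; the parameter names (mode, vlans, all_vlans) are hypothetical and do not reflect the Orchestrator's actual configuration schema.

```python
def ha_port_config_is_invalid(mode: str, vlans: list[int], all_vlans: bool) -> bool:
    """Flag the HA port combination described in Issue 82104.

    The problematic case is an HA port in "trunk" mode with zero VLANs
    listed but "all VLANs" set. Parameter names are hypothetical.
    """
    return mode == "trunk" and len(vlans) == 0 and all_vlans


print(ha_port_config_is_invalid("trunk", [], True))    # True: avoid this
print(ha_port_config_is_invalid("access", [], False))  # False
```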
Issue 84790: When a VMware SD-WAN Edge of any model other than the 510/510-LTE is rebooted, the Edge may erroneously report the critical event "Unable to launch service wifihang" to the VMware SASE Orchestrator.
The wifihang event message is designed for use only with the Edge 510/510-LTE models and alerts a customer to a problem with that Edge model's Wi-Fi process. When this event message is observed on any other Edge model, whether that model uses Wi-Fi or not (for example: the Edge 3400), the event message is spurious and the event can be safely ignored.
Workaround: A user can safely ignore the wifihang event message on any Edge other than an Edge 510 or 510-LTE as it is spurious.
Issue 84825: For a site deployed with a High Availability topology where BGP is configured, if the site has greater than 512 BGPv4 match/set rules configured, the customer may observe the HA Edge pair continuously failing over without ever recovering.
More than 512 BGPv4 match and set rules refers to a customer configuring more than 256 such rules on the inbound filter and 256 rules on the outbound filter. This issue would be disruptive to the customer, as the repeated failover would cause flows for real-time traffic like voice calls to be continuously dropped and then recreated. When HA Edges experience this issue, the process that synchronizes Edge CPU threads fails, causing the Edge to reboot to recover, but the promoted Edge also experiences the same issue and reboots in turn, with no recovery reached at the site.
Workaround: Without a fix for this issue, the customer must ensure that no more than 512 BGPv4 match and set rules are configured for an HA site.
If a site is experiencing this issue and has more than 512 BGPv4 match and set rules configured, the customer must immediately reduce the number of rules to 512 or fewer to recover the site.
Alternatively, if the customer must have more than 512 BGPv4 match and set rules, they can downgrade the HA Edges to Release 3.4.6 where this issue is not encountered, but at the cost of Edge features found in later releases. This can only be done if their Edge model is supported on 3.4.6 and the customer should confirm that is so before downgrading.
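The safe configuration is a combined total of no more than 512 BGPv4 match/set rules across the inbound and outbound filters of an HA site. The Python sketch below expresses that check; how the rule counts are obtained from the site's configuration is left as an assumption.

```python
MAX_HA_BGPV4_RULES = 512  # limit stated for Issue 84825


def bgpv4_rule_count_ok(inbound_rules: int, outbound_rules: int) -> bool:
    """Return True if the combined inbound and outbound BGPv4 match/set
    rule count stays within the limit for an HA site."""
    return inbound_rules + outbound_rules <= MAX_HA_BGPV4_RULES


print(bgpv4_rule_count_ok(256, 256))  # True
print(bgpv4_rule_count_ok(300, 256))  # False
```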
Issue 85156: For a site deployed with a High Availability topology, the customer may observe multiple reboots of the VMware SD-WAN Standby Edge with a potential disruption to customer traffic.
The HA control data synchronization processing logic on the Standby Edge for data received via TCP can lead to the data getting only partially read. This can cause multiple such short messages to be processed on the Standby which can slow down the Standby node. In low-end Edge platforms (for example, Edge models 510, 520, 610, 620), this slow down can significantly impact heartbeat processing between the Active and Standby which leads to the Standby Edge incorrectly being promoted to Active. In an Active-Active state the tie-break goes to the Active Edge and the Standby Edge is rebooted to demote it back to its proper Standby status. When this issue is encountered on a conventional HA topology the customer impact would be minimal as the Standby Edge does not pass customer traffic. On an Enhanced HA deployment, where the Standby Edge is also passing traffic, the reboot(s) would disrupt some customer traffic.
This ticket adds enhancements in the Edge TCP message processing logic to improve the performance on the Standby Edge and prevent a system slowdown.
Workaround: There is no workaround for this issue.
Issue 85461: If a VMware SD-WAN Edge is used to forward DNS, and LAN devices connected to the Edge are using the Edge for DNS forwarding, all DNS traffic may fail.
All DNS forwarding traffic is affected, not just Conditional DNS. Depending on the Edge software release, this issue can be encountered on an Edge as follows:
If the Edge is using Release 4.2.2, the Edge can encounter this issue if the Edge is using routed LAN ports with no Gateway IP address specified. Switched LAN ports + VLANs are not affected in 4.2.2.
If the Edge is using either Release 4.3.0/4.3.1, 4.5.0/4.5.1, or 5.0.0.x, the Edge can encounter the issue if the Edge is using switched LAN ports and VLANs, or the Edge is using routed LAN ports with no Gateway IP address specified.
For switched interfaces, the cause of the issue stems from the deprecation and removal of the Management IP interface in favor of a loopback interface in Releases 4.3.x, 4.5.x, and 5.0.0.x and later. Because DNS uses segment NAT, the DNS packet has no matching entry for the destination IP when the Edge does segment NAT table lookup and the Edge drops the packet.
For routed interfaces, the lack of a Gateway IP means the DNS packet is routed to the Edge as the next hop and the Edge does not forward the DNS packet further.
Workaround: The workaround for this issue is to either not use the Edge to forward DNS, or...
When using Edge Release 4.2.2: use either switched LAN ports or routed LAN ports that include a Gateway IP address.
When using Release 4.3.x or 4.5.x, use only routed LAN ports with a Gateway IP address specified.
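The conditions above can be summarized as a small decision function. The Python sketch below encodes only the release and port combinations listed in this note; the string values for port_type are illustrative, and releases not named in the note raise an error rather than guessing.

```python
AFFECTED_4X = {"4.3.0", "4.3.1", "4.5.0", "4.5.1"}


def dns_forwarding_at_risk(release: str, port_type: str, has_gateway_ip: bool) -> bool:
    """Return True if the conditions listed for Issue 85461 are met.

    release: Edge software release, e.g. "4.2.2", "4.3.1", or "5.0.0.1".
    port_type: "routed" or "switched" (switched LAN ports with VLANs).
    has_gateway_ip: whether a Gateway IP address is set on a routed port.
    """
    routed_without_gw = port_type == "routed" and not has_gateway_ip
    if release == "4.2.2":
        # In 4.2.2 only routed ports without a Gateway IP are affected.
        return routed_without_gw
    if release in AFFECTED_4X or release.startswith("5.0.0"):
        # Switched ports are also affected after the Management IP removal.
        return port_type == "switched" or routed_without_gw
    raise ValueError(f"release {release!r} is not covered by this note")


print(dns_forwarding_at_risk("4.2.2", "switched", False))  # False
print(dns_forwarding_at_risk("4.3.1", "switched", False))  # True
print(dns_forwarding_at_risk("4.5.0", "routed", True))     # False
```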
Issue 86098: For a site using an Enhanced High Availability topology where a PPPoE WAN link is used on the Standby Edge, a user may observe that the default proxy route is not installed in the Active Edge and traffic using that link fails.
When an Enhanced HA Edge pair comes up, the PPPoE link synchronizes with the Standby Edge and provides a default route with a next hop of 0.0.0.0. As a result, this route is not installed on the Active Edge and traffic using this link is dropped.
Workaround: There is no workaround for this issue.
Issue 87552: On a site using a Non SD-WAN Destination (NSD) via Edge, the VMware SD-WAN Edge Dataplane Service may periodically experience a failure and restart when Edge-to-NSD tunnels are unstable.
When an Edge-to-NSD tunnel tears down, the Edge incorrectly releases a previously chosen tunnel, which triggers an exception in the Edge Dataplane Service; a restart is required to restore the service. Restarting the Edge service results in a 10-15 second disruption of customer traffic.
Workaround: Limiting an NSD via Edge to one WAN link decreases the likelihood of this issue occurring.
Issue 88604: For a site using a High Availability topology, if a WAN interface goes down and then comes back up on a VMware SD-WAN Standby Edge, the event is not recorded on the VMware SASE Orchestrator.
A user has no visibility into Standby Edge interface events, which is especially impactful in Enhanced HA deployments where the Standby Edge also passes traffic.
Workaround: There is no workaround for this issue.
Issue 89217: A VMware SD-WAN Edge in the 6x0 model line (610, 610N, 610-LTE, 620, 620N, 640, 640N, 680, 680N) may suddenly power off for no apparent reason.
The affected 6x0 Edge has all lights off, both the front status LED and the rear Ethernet port lights, and can only be recovered by manually power cycling the Edge.
The cause of the issue is traced to a PIC microcontroller, exclusive to the Edge 6x0 line, running PIC firmware version v20M or earlier (v20L, v20K, v20J). This issue can only occur when the 6x0 Edge uses a PIC version of v20M or earlier, and even with these versions the odds of experiencing the power off issue are low (approximately 1 in 1,000). The issue cannot occur on a 6x0 Edge with PIC firmware version v20N or later.
Note: A 6x0 Edge's firmware, including its PIC version, can be determined on an Orchestrator using Release 5.x by going to the Monitor > Edge > Overview page for that Edge and clicking the dropdown information box next to the Edge name, which includes the Edge Information, Device Version, and Device Firmware. However, this only works on an Edge using Release 4.5.1.
The issue is resolved by upgrading the 6x0 Edge to Platform Firmware 1.3.1 (R131-20221216-GA), which includes PIC version v20N. To do this, the 6x0 Edge must be connected to a VMware SASE Orchestrator using Release 5.x (5.0.0 or later), and the 6x0 Edge must first be upgraded to Edge hotfix build R5012-20230123-GA-103475. Once the 6x0 Edge is upgraded to R5012-20230123-GA-103475, the user then updates the 6x0 Edge Platform Firmware to version R131-20221216-GA in the same way that an Edge's software version is modified.
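Checking whether a given PIC firmware version falls in the affected range is an ordered comparison. The sketch below assumes PIC versions always take the form `v<number><letter>`, as in the examples above (v20J through v20N); `pic_power_off_susceptible` is a hypothetical helper, not a VMware tool.

```python
import re
from typing import Tuple

# Assumed format, based on the versions named above: "v" + number + one letter.
_PIC_VERSION = re.compile(r"^v(\d+)([A-Z])$")
LAST_AFFECTED = "v20M"  # v20M and earlier can power off; v20N and later cannot


def parse_pic_version(version: str) -> Tuple[int, str]:
    match = _PIC_VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized PIC version: {version!r}")
    return int(match.group(1)), match.group(2)


def pic_power_off_susceptible(version: str) -> bool:
    # Tuples compare element by element: number first, then letter.
    return parse_pic_version(version) <= parse_pic_version(LAST_AFFECTED)
```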
For more information and a step-by-step guide to upgrading a 6x0 Edge to Platform Firmware 1.3.1, see the KB Article: VMware SD-WAN 6X0 model Edges may power off with no LEDs and require a power cycle to come back to a working state (88970). This KB article was updated on January 27th, 2023 to reflect the new Edge and Platform Software needed to resolve the issue.
For information on uploading a Platform Firmware bundle to an Orchestrator, consult the Platform Firmware and Factory Images with New Orchestrator UI section of the VMware SD-WAN Operator Guide.
For information on updating a 6x0 Edge’s Platform Firmware, consult the View or Modify Edge Information section of the VMware SD-WAN Administration Guide.
Workaround: To recover the Edge from the problem state:
Disconnect the Edge from the power source.
Wait 20 seconds.
Reconnect the Edge to the power source.
If the user does not wish to upgrade the platform firmware, they can ensure that power to the Edge is consistent and does not flap rapidly or frequently. A good way to ensure a reliable power source is to connect the 6x0 Edge to an Uninterruptible Power Supply (UPS).
If the user prefers to keep the Edge on a lower software release (for example, Release 4.3.1 or 4.5.1), they can temporarily upgrade the Edge to R5012-20230123-GA-103475, perform the Platform Firmware upgrade to version 1.3.1 (R131-20221216-GA) so that the PIC version is v20N, and then downgrade the Edge's software back to the preferred version. Downgrading the 6x0 Edge's software to an earlier version does not downgrade the Edge's Platform Firmware, so the Edge continues to use Platform Firmware version 1.3.1. In this use case, the customer's Edges must be on an Orchestrator using Release 5.x.
If a 6x0 Edge that has experienced this issue is on an Orchestrator that does not use Release 5.x and requires a PIC firmware update, the customer may contact VMware SD-WAN Support, who will manually update the Edge's PIC version.
Issue 91365: For a customer using Edge Network Intelligence, a VMware SD-WAN Edge where Analytics is configured experiences a memory leak that results in the Edge triggering an Edge Service restart to clear the memory.
When the Analytics function is enabled on an Edge, the Edge's Dataplane service begins leaking memory at a steady rate. When the leak reaches a critical level (60% memory utilization for longer than 90 seconds), the Edge triggers an unscheduled Service Restart to clear it. An Edge Service restart causes a 10-15 second disruption in customer traffic. In the field, the time it takes to trigger an Edge Service restart has been approximately 3 to 4 days; once the memory is cleared, the leak resumes, with roughly the same time window until the next Edge Service restart. The time it takes for the Edge to reach a critical memory usage level depends on the Edge model and the amount of information the Analytics feature records for that Edge.
Workaround: The customer has two options: a) temporarily turn off Analytics for the Edge until a fixed Edge build is delivered; or b) monitor the Edge's memory, and when memory utilization reaches 40% and the Orchestrator records a Memory Warning Event, schedule a manual Edge Service Restart in a maintenance window to clear the memory and minimize customer impact.
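The thresholds in this workaround (a Memory Warning Event at 40%, and a critical level of 60% sustained for longer than 90 seconds) can be sketched as a simple classifier over memory samples. The function name, the sample format, and treating exactly 60% as critical are assumptions for illustration; this is not how the Orchestrator computes its events.

```python
from typing import Iterable, Tuple

WARN_PCT = 40.0     # Memory Warning Event level cited in the workaround
CRIT_PCT = 60.0     # critical level that triggers an unscheduled Service Restart
CRIT_HOLD_S = 90.0  # the critical level must persist longer than this


def classify_memory(samples: Iterable[Tuple[float, float]]) -> str:
    """Classify time-ordered (timestamp_seconds, memory_pct) samples.

    Returns "critical" if memory stayed at or above CRIT_PCT for longer than
    CRIT_HOLD_S, "warning" if any sample reached WARN_PCT (time to schedule a
    manual Edge Service Restart), otherwise "ok".
    """
    status = "ok"
    crit_since = None  # timestamp when the current critical run started
    for ts, pct in samples:
        if pct >= CRIT_PCT:
            if crit_since is None:
                crit_since = ts
            if ts - crit_since > CRIT_HOLD_S:
                return "critical"
        else:
            crit_since = None  # run broken: reset the hold timer
        if pct >= WARN_PCT:
            status = "warning"
    return status
```

A brief spike above 60% that drops back within 90 seconds yields "warning", not "critical", because the hold timer resets.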
Issue 89873: A user may observe an increase in memory utilization on a VMware SD-WAN Edge resulting in a Memory Usage Warning Event on the Orchestrator and potentially an unscheduled Edge Service restart to recover the Edge's memory.
This issue occurs when UDP flows with unique IP addresses and ports are processed at a high rate on the Edge. Flow creation is handled asynchronously on the Edge, and when multiple packets of the same flow are enqueued to the flow creation service, the flow objects are leaked, resulting in an Edge memory leak. The impact is more commonly observed on entry-level Edge models (for example, the 510, 610, or 620), which have less memory, but over a long enough period every Edge model could reach a critical memory level (60% memory utilization for longer than 90 seconds) and restart. An unplanned Edge Service restart to clear the memory can cause a brief disruption in customer traffic.
Workaround: The only way to prevent this issue from impacting a customer site is to monitor the memory. When memory utilization reaches 40% and the Orchestrator records a Memory Warning Event, schedule an Edge Service Restart in a maintenance window to clear the memory and minimize customer impact.
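The leak pattern can be illustrated with a hypothetical model (not Edge code): without a check for an existing entry, the service allocates a flow object for every queued packet of the same flow and keeps only the last one in the flow table. Python garbage-collects the orphans, so the model counts them instead; in native code they would remain allocated.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

FlowKey = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port), UDP assumed


class FlowObject:
    def __init__(self, key: FlowKey) -> None:
        self.key = key


def drain_flow_queue(
    queue: Deque[FlowKey], check_existing: bool
) -> Tuple[Dict[FlowKey, FlowObject], List[FlowObject]]:
    """Hypothetical flow-creation service: returns the flow table and every object allocated."""
    table: Dict[FlowKey, FlowObject] = {}
    allocated: List[FlowObject] = []
    while queue:
        key = queue.popleft()
        if check_existing and key in table:
            continue  # flow already created by an earlier queued packet: reuse it
        flow = FlowObject(key)
        allocated.append(flow)
        table[key] = flow  # without the check, this orphans the earlier object
    return table, allocated
```

The number of orphaned objects is `len(allocated) - len(table)`: three queued packets of one flow leave two orphans without the check and none with it.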
Issue 91746: A VMware SD-WAN Edge using either wired or wireless 802.1x authentication (for example, RADIUS, Cisco ISE) may experience certificate authentication failures with all traffic requiring this authentication dropping at the Edge.
The issue is caused by the Edge improperly altering the L4 headers of IP-fragmented packets, which corrupts the packets before they exit the Edge. This primarily impacts UDP packets, and because these packets are used for 802.1x certificate authentication, the issue can cause 802.1x wired or wireless clients to fail authentication.
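In IPv4, only the first fragment of a fragmented datagram (fragment offset 0) carries the UDP header; later fragments carry only payload bytes. The sketch below reads the fragment offset from a raw IPv4 header to show which fragments an L4 header rewrite can safely touch. It illustrates standard IPv4 behavior (RFC 791), not the Edge's code; `carries_l4_header` is a hypothetical name.

```python
import struct


def carries_l4_header(ipv4_header: bytes) -> bool:
    """Return True if this IPv4 packet or fragment contains the start of the L4 header.

    Rewriting bytes at the L4-header position in any non-first fragment
    corrupts payload data instead of a UDP/TCP header.
    """
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    # Bytes 6-7: 3 flag bits followed by the 13-bit fragment offset (8-byte units).
    (flags_and_offset,) = struct.unpack("!H", ipv4_header[6:8])
    fragment_offset = flags_and_offset & 0x1FFF
    return fragment_offset == 0
```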
Workaround: On an Edge where this issue is encountered, the workarounds are to either, a) disable 802.1x authentication, or b) roll the Edge back to a prior Edge software build where 802.1x authentication worked properly because this issue is not present.
Issue 92676: For a customer deployment where a Non VMware SD-WAN Destination (NSD) via Gateway is configured to use redundant tunnels and redundant Gateways and is also using BGP over IPsec, if the Primary and Secondary Gateways advertise a prefix with an equal AS path to the Primary and Secondary NSD tunnels, the Primary NSD tunnel will prefer a redundant Gateway path over the Primary Gateway.
The impact of the Primary NSD over Gateway tunnel preferring the redundant Gateway path over the Primary Gateway is experienced only for return traffic to the Gateway from the NSD.
Workaround: Configure a higher metric (3 or more) on the redundant Gateway for the affected prefix; this helps the NSD's primary tunnel choose the Primary Gateway for return traffic.
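The effect of the metric workaround can be sketched with a deliberately simplified slice of BGP best-path selection: shortest AS path first, then lowest metric (MED), with any remaining tie settled by list order as a stand-in for the later tie-breakers. Real BGP evaluates more attributes (for example, local preference) and by default compares MED only between paths from the same neighboring AS; the names and AS numbers here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    gateway: str
    as_path: List[int]
    metric: int  # MED: lower is preferred


def best_path(paths: List[Path]) -> Path:
    # Simplified: shortest AS path, then lowest metric; min() keeps the first on a full tie.
    return min(paths, key=lambda p: (len(p.as_path), p.metric))
```

With equal AS paths and equal metrics, the redundant Gateway can win the tie; raising its metric to 3 makes the Primary Gateway path preferred.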
Issue 93383: A VMware SD-WAN Edge may suffer one or more Dataplane Service failures with a disruption in customer traffic.
The issue is caused by a rare mismatch between the number of interfaces stored in two different data structures on the Edge, which triggers an exception and causes the Edge service to fail one or more times. The Edge service must restart to recover, which in a non-HA deployment causes a 10-15 second disruption of customer traffic. However, if the Edge service fails three consecutive times, the Edge requires a reboot or power cycle to recover.
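The recovery escalation described above can be modeled as a counter of consecutive failures. This is a hypothetical sketch, not the Edge's supervisor; in particular, what resets the count (modeled here as an explicit `on_healthy` call) is an assumption.

```python
class ServiceSupervisor:
    """Hypothetical model of the recovery escalation described for Issue 93383."""

    MAX_CONSECUTIVE_FAILURES = 3

    def __init__(self) -> None:
        self.consecutive_failures = 0

    def on_failure(self) -> str:
        # Each failure is normally recovered by a service restart (10-15 s outage
        # in a non-HA deployment); the third consecutive one needs a reboot.
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.MAX_CONSECUTIVE_FAILURES:
            return "reboot-required"
        return "restart-service"

    def on_healthy(self) -> None:
        # Assumed: a period of healthy operation breaks the run of failures.
        self.consecutive_failures = 0
```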
Workaround: There is no workaround for this issue.
Issue 94204: A user may observe that attempts to generate a diagnostic bundle for a VMware SD-WAN Edge fail.
The Edge diagnostic bundles fail to complete because the Edge runs out of disk space. This can happen if the Edge has generated one or more cores: the Edge sends these cores to the /vnf/tmp folder and unpacks each one there, and because of a core's unpacked size, the folder fills quickly and the diagnostic bundle fails.
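A pre-flight free-space check of the kind that would avoid this failure mode is easy to sketch with the Python standard library. It is illustrative only: the Edge does not expose this function, the default reserve of 0 bytes is an assumption, and on an Edge the relevant path would be the /vnf/tmp folder named above.

```python
import shutil


def can_unpack(path: str, unpacked_size_bytes: int, reserve_bytes: int = 0) -> bool:
    """Return True if the filesystem holding `path` has room for an unpacked file.

    `reserve_bytes` keeps headroom for other writers (for example, the diagnostic bundle itself).
    """
    free = shutil.disk_usage(path).free
    return free >= unpacked_size_bytes + reserve_bytes
```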
Workaround: There is no workaround for this issue.
Issue 96441: On a site using a High Availability topology, the customer may observe frequent HA failovers.
The issue is triggered by the Edge marking the HA interface as down and then back up within 500-1000 ms, which can trigger an HA failover. However, these interface down events are spurious: a DPDK-enabled interface determines interface status by polling at a 500 ms interval, and with this method the underlying device driver can sometimes report a spurious interface down event. Each such event causes the Edge to mark the interface as down until the next poll (500 ms later) reports that the interface is up.
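The effect of polling can be sketched as follows: because a status is held until the next poll, a single spurious "down" sample keeps the interface marked down for a full 500 ms interval. `down_windows` is a hypothetical helper that takes a list of poll results (True means up).

```python
from typing import List, Tuple

POLL_INTERVAL_MS = 500  # polling interval stated in these notes


def down_windows(samples: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) windows during which the link is marked down.

    samples[i] is the status read at poll i, taken at i * POLL_INTERVAL_MS.
    """
    windows = []
    start = None
    for i, up in enumerate(samples):
        t = i * POLL_INTERVAL_MS
        if not up and start is None:
            start = t  # first poll that reports down
        elif up and start is not None:
            windows.append((start, t))  # next poll reporting up ends the window
            start = None
    if start is not None:
        windows.append((start, len(samples) * POLL_INTERVAL_MS))
    return windows
```

For example, one spurious down reading at the third poll produces a single 500 ms down window, enough to start an HA failover.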
Workaround: There is no workaround for this issue.
Issue 96888: In certain load conditions, the routing protocols for either BGP or OSPF may randomly restart, leading to route re-convergence and traffic disruption.
Under higher load conditions, the BGP and OSPF routing protocol processes wait longer than expected to be scheduled on the Edge CPU, which leads to a stall and restart of the routing protocol. The delay is caused by insufficient CPU bandwidth allocation and can occur on any Edge model.
Workaround: If an Edge is experiencing this issue, a customer may contact VMware Support for assistance or upgrade their Edge to Release 4.5.1, build R451-20220916-GA or later.
Issue 97321: When Edge Network Intelligence Analytics is activated on a VMware SD-WAN Edge, the Edge may trigger an Edge Service restart, which causes 10-15 seconds of customer traffic disruption.
When Analytics is enabled on the Edge, the Edge can experience an out of memory condition followed by a "double free" memory state. The Edge restarts its service to restore memory. This issue can recur multiple times while Analytics is activated.
Workaround: There is no workaround for this issue.
Issue 97559: On a customer site deployed with an Enhanced High Availability topology, a WAN link connected to the VMware SD-WAN Edge in a Standby role may show as down on the VMware SASE Orchestrator and not pass customer traffic even though the Edge's WAN interface where the WAN link is connected is up.
A user examining a tcpdump or diagnostic bundle logs would observe incoming ARP requests that the Standby Edge does not answer because its port is blocked.
In Enhanced HA, when an Edge assumes the role of Standby, the following events should occur in sequence:
Event 1: The Standby Edge blocks all ports.
Event 2: The Standby Edge then detects that it is deployed in Enhanced HA and unblocks its WAN ports to pass traffic.
When this issue occurs, Event 1 (the initial port blocking) takes an unexpectedly long time to complete, and Event 2 (the unblocking of the WAN ports) completes before Event 1 does. When Event 1 then completes, the final state is that all WAN ports are blocked on the Standby Edge.
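The race can be sketched by applying each event's effect in the order the events complete. This is a hypothetical model for illustration; the port names and functions are not Edge internals.

```python
from typing import Callable, Dict, List

LAN_PORTS = ["GE1", "GE2"]  # hypothetical port names
WAN_PORTS = ["GE3", "GE4"]

State = Dict[str, str]


def block_all_ports(state: State) -> None:
    """Event 1: the Standby Edge blocks every port."""
    for port in state:
        state[port] = "blocked"


def unblock_wan_ports(state: State) -> None:
    """Event 2: Enhanced HA detected, so the WAN ports are unblocked."""
    for port in WAN_PORTS:
        state[port] = "forwarding"


def final_state(completion_order: List[Callable[[State], None]]) -> State:
    # Apply each event's effect in the order the events complete.
    state = {port: "forwarding" for port in LAN_PORTS + WAN_PORTS}
    for event in completion_order:
        event(state)
    return state
```

In the expected order (Event 1, then Event 2) the WAN ports end up forwarding; when Event 1 completes last, every port, including the WAN ports, ends up blocked.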
Workaround: An HA failover that promotes the Standby Edge to Active brings up the HA Edge's WAN link(s).
Issue 98136: For customer enterprises using a Hub/Spoke topology where Dynamic Branch To Branch VPN is configured, client users behind an SD-WAN Spoke Edge may observe that some traffic has unexpected latency because the traffic uses a sub-optimal path.
Spoke Edge traffic affected by this issue uses a route that was initially a non-uplink route for a Hub Edge not included in the Profile the Spoke Edge uses. A Dynamic Branch To Branch VPN tunnel can form from the Spoke Edge to the Hub Edge because of traffic sent toward some other, unrelated prefix, and in this instance the non-uplink route is installed on the Spoke Edge.
Because of this non-uplink route, all traffic toward this prefix starts going through the Hub Edge. The route then becomes an uplink route (its community changes to the uplink community), but the previously installed non-uplink route is not revoked, and traffic continues to take the Hub Edge path for as long as the Dynamic Branch To Branch VPN tunnel remains up.
Workaround: Wait for the Dynamic Branch To Branch VPN tunnel to tear down, after which the uplink route will not be installed in the Spoke Edge when a new Dynamic Branch To Branch VPN tunnel is formed towards the Hub Edge.
Issue 21342:
When assigning Partner Gateways per-segment, the proper list of Gateway Assignments may not show under the Operator option "View" Gateways on the VMware SD-WAN Edge monitoring list.
Issue 24269:
Monitor > Transport > Loss does not graph observed WAN link loss, although the QoE graphs do reflect this loss.
Issue 25932:
The VMware SD-WAN Orchestrator allows VMware SD-WAN Gateways to be removed from the Gateway Pool even when they are in use.
Issue 32335:
The ‘End User Service Agreement’ (EUSA) page throws an error when a user is trying to accept the agreement.
Workaround: Ensure no leading or trailing spaces are found in Enterprise Name.
Issue 32435:
A VMware SD-WAN Edge override for a policy-based NAT configuration is permitted for tuples which are already configured at the profile level and vice versa.
Issue 32856:
Though a business policy is configured to use the Hub cluster to backhaul internet traffic, the user can unselect the Hub cluster from a profile on a VMware SD-WAN Orchestrator that has been upgraded from Release 3.2.1 to Release 3.3.x.
Issue 32913:
After enabling High Availability, Multicast details for the VMware SD-WAN Edge are not displayed on the Monitoring page. A failover resolves the issue.
Issue 35667:
When a VMware SD-WAN Edge is moved from one profile to another profile which has the same CSS setting but a different GRE CSS name (the same endpoints), some GRE tunnels will not show in monitoring.
Workaround: Disable and then re-enable GRE at the Edge level to resolve the issue.
Issue 35658:
When a VMware SD-WAN Edge is moved from one profile to another that has a different CSS setting (e.g., IPsec in profile1 and GRE in profile2), the Edge-level CSS settings continue to use the previous profile's CSS setting (e.g., IPsec rather than GRE).
Workaround: Disable and then re-enable GRE at the Edge level to resolve the issue.
Issue 36665:
If the VMware SD-WAN Orchestrator cannot reach the internet, user interface pages that require accessing the Google Maps API may fail to load entirely.
Issue 38056:
The Edge-Licensing export.csv file does not show region data.
Issue 38843:
When pushing an application map, there is no Operator event, and the Edge event is of limited utility.
Issue 39633:
The Super Gateway hyperlink does not work after a user assigns the Alternate Gateway as the Super Gateway.
Issue 39790:
The VMware SD-WAN Orchestrator allows a user to configure more than the supported 32 subinterfaces on a VMware SD-WAN Edge's routed interface. Configuring 33 or more subinterfaces on an interface causes a Dataplane Service failure on the Edge.
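The missing guardrail amounts to a per-interface count check. A minimal sketch, assuming a configuration represented as a mapping from interface name to a list of subinterface IDs (a hypothetical shape, not the Orchestrator's data model):

```python
from typing import Dict, List

MAX_SUBINTERFACES = 32  # supported per-interface limit stated in these notes


def validate_subinterfaces(config: Dict[str, List[int]]) -> List[str]:
    """Return an error for every interface that exceeds the subinterface limit."""
    errors = []
    for iface, subifs in config.items():
        if len(subifs) > MAX_SUBINTERFACES:
            errors.append(
                f"{iface}: {len(subifs)} subinterfaces configured, "
                f"maximum supported is {MAX_SUBINTERFACES}"
            )
    return errors
```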
Issue 41691:
On the Configure > Edge > Device page, a user cannot change the 'Number of addresses' field even though the DHCP pool is not exhausted.
Issue 43276:
A user cannot change the Segment type when a VMware SD-WAN Edge or Profile has a Partner Gateway configured.
Issue 47820:
If a VLAN is configured with DHCP disabled at the Profile level, the Edge has an Edge Override for this VLAN with DHCP enabled, and the DNS server field is set to none (no IP configured), the user cannot make any changes on the Configure > Edge > Device page and receives the error message ‘invalid IP address []’, which does not explain or point to the actual problem.
Issue 48085:
The VMware SD-WAN Orchestrator allows a user to delete a VLAN which is associated with an interface.
Issue 49225:
VMware SD-WAN Orchestrator does not enforce a limit of 32 total VLANs.
Issue 50531:
When two Operators of differing privileges use the same browser window when accessing the New UI on a 4.0.0 Release version of the VMware SD-WAN Orchestrator, and the Operator with lesser privileges tries to login after the Operator with higher privileges, that lesser privileged Operator will observe multiple errors stating that the "user does not have privilege".
Note: There is no escalation in privileges for the Operator with lower privileges, only the display of error messages.
Workaround: The next operator may refresh that page prior to logging in to prevent seeing the errors, or each Operator may use different browser windows to avoid this display issue.
Issue 51722: On the Release 4.0.0 VMware SD-WAN Orchestrator, the time range selector offers no range greater than two weeks for any statistic in the Monitor > Edge tabs.
The time range selector does not show options greater than "Past 2 Weeks" in the Monitor > Edge tabs, even if the retention period for a set of statistics is much longer than 2 weeks. For example, flow and link statistics are retained for 365 days by default (configurable), while path statistics are retained for only 2 weeks by default (also configurable). As a result, all monitor tabs are limited to the shortest retention period of any statistic type instead of allowing a user to select a time period consistent with the retention period of the statistic being viewed.
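The difference between the observed and expected behavior can be sketched with the default retention periods cited above. The function names are hypothetical and do not reflect Orchestrator code.

```python
from datetime import timedelta

# Default retention periods cited in these notes (both are configurable).
RETENTION = {
    "flow": timedelta(days=365),
    "link": timedelta(days=365),
    "path": timedelta(weeks=2),
}


def max_range_observed() -> timedelta:
    # Issue 51722: every Monitor > Edge tab is capped at the shortest retention period.
    return min(RETENTION.values())


def max_range_expected(statistic: str) -> timedelta:
    # Expected: cap each tab at the retention period of the statistic it displays.
    return RETENTION[statistic]
```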
Workaround: A user may use the "Custom" option in the time range selector to see data for more than 2 weeks.