Updated 25 April, 2023
VMware SD-WAN™ Orchestrator Version R422-20220715-GA
Check regularly for additions and updates to these release notes.
What's in the Release Notes
The release notes cover the following topics:
- Recommended Use
- Compatibility
- Important Notes
- Support for New Hardware Platforms
- Revision History
- Resolved Issues
- Known Issues
Recommended Use
This release is recommended for all customers who require the features and functionality first made available in Release 4.2.0, as well as those customers impacted by the issues listed below which have been resolved since Release 4.2.1.
Compatibility
Release 4.2.2 Orchestrators, Gateways, and Hub Edges support all previous VMware SD-WAN Edge versions greater than or equal to Release 3.0.0.
Note: this means releases prior to 3.0.0 are not supported.
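As a rough illustration of the rule above, the following Python sketch checks whether an Edge release string meets the 3.0.0 minimum. The function names and the assumption that a release is written as three dotted integers, optionally followed by a patch label such as `P2`, are illustrative only and are not part of any VMware tooling.

```python
import re

MINIMUM_EDGE_RELEASE = (3, 0, 0)  # minimum stated in the compatibility note


def parse_release(release: str) -> tuple:
    """Extract the leading dotted version, e.g. '3.3.2 P2' -> (3, 3, 2)."""
    match = re.match(r"\s*(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    return tuple(int(part) for part in match.groups())


def is_supported_edge_release(release: str) -> bool:
    """Return True if the Edge release is 3.0.0 or later."""
    return parse_release(release) >= MINIMUM_EDGE_RELEASE
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "3.10.0" as older than "3.9.0".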
The following interoperability combinations were explicitly tested:
Orchestrator | Gateway | Edge (Hub) | Edge (Branch/Spoke)
4.2.2 | 4.0.0 | 4.0.0 | 4.0.0
4.2.2 | 4.2.2 | 4.0.0 | 4.0.0
4.2.2 | 4.2.2 | 4.2.2 | 4.0.0
4.2.2 | 4.2.2 | 4.0.0, 4.0.2 | 4.2.2
4.2.2 | 4.0.0 | 4.2.2 | 4.0.0, 4.0.2
4.2.2 | 4.0.2 | 4.2.2 | 4.0.0, 4.0.2
4.2.2 | 4.0.0 | 4.2.2 | 4.2.2
4.2.2 | 3.4.6 | 3.4.6 | 3.4.6
4.2.2 | 4.2.2 | 3.4.4, 3.4.5, 3.4.6 | 3.4.6
4.2.2 | 4.2.2 | 4.2.2 | 3.4.4, 3.4.5, 3.4.6
4.2.2 | 4.2.2 | 3.4.4, 3.4.5, 3.4.6 | 4.2.2
4.2.2 | 3.3.2 P2 * | 3.3.2 P2 * | 3.3.2 P2 *
4.2.2 | 4.2.2 | 3.3.2 P2, 3.3.2 P3 * | 3.3.2 P2 *
4.2.2 | 4.2.2 | 4.2.2 | 3.3.2 P2, 3.3.2 P3 *
4.2.2 | 4.2.2 | 3.3.2 P2, 3.3.2 P3 * | 4.2.2
4.2.2 | 3.2.2 * | 3.2.2 * | 3.2.2 *
4.2.2 | 4.2.2 | 3.2.2 * | 3.2.2 *
4.2.2 | 4.2.2 | 4.2.2 | 3.2.2 *
4.2.2 | 4.2.2 | 3.2.2 * | 4.2.2
4.2.1 | 4.2.2 | 4.2.2 | 4.2.1
4.2.1 | 4.2.1 | 4.2.2 | 4.2.2
4.2.2 | 4.2.1 | 4.2.2 | 4.2.2
4.0.2 | 4.0.2 | 4.2.2 | 4.0.2
Warning: VMware SD-WAN Releases 3.2.x, 3.3.x, and 3.4.x have reached the End of Support.
- Release 3.4.x for the Orchestrator and Gateway reached End of General Support (EOGS) on March 30, 2022, and End of Technical Guidance (EOTG) on September 30, 2022.
- Release 3.4.x for the Edge reached End of General Support (EOGS) on December 31, 2022, and End of Technical Guidance (EOTG) on March 31, 2023.
- For more information please consult the Knowledge Base article: Announcement: End of Support Life for VMware SD-WAN Release 3.x (84151)
Warning: VMware SD-WAN Release 4.0.x has reached End of Support; Release 4.2.x has reached End of Support for Gateways and Orchestrators; and 4.3.x is approaching End of Support for Gateways and Orchestrators.
- Release 4.0.x reached End of General Support (EOGS) on September 30, 2022, and End of Technical Guidance (EOTG) on December 31, 2022.
- Release 4.2.x Orchestrators and Gateways reached End of General Support (EOGS) on December 30, 2022, and End of Technical Guidance (EOTG) on March 30, 2023.
- Release 4.2.x Edges will reach End of General Support (EOGS) on June 30, 2023, and End of Technical Guidance (EOTG) on September 30, 2025.
- Release 4.3.x Orchestrators and Gateways will reach End of General Support (EOGS) on June 30, 2023, and End of Technical Guidance (EOTG) on September 30, 2023.
- Release 4.3.x Edges will reach End of General Support (EOGS) on June 30, 2023, and End of Technical Guidance (EOTG) on September 30, 2025.
- For more information please consult the Knowledge Base article: Announcement: End of Support Life for VMware SD-WAN Release 4.x (88319).
Note: Release 3.x did not properly support AES-256-GCM, which meant that customers using AES-256 were always using their Edges with GCM deactivated (AES-256-CBC). If a customer is using AES-256, they must explicitly deactivate GCM from the Orchestrator prior to upgrading their Edges to a 4.x Release. Once all their Edges are running a 4.x release, the customer may choose between AES-256-GCM and AES-256-CBC.
Important Notes
Potential Issue With Sites Using a High-Availability Topology
A site where a pair of Edges are deployed in a High-Availability topology may encounter an issue where the Standby Edge reboots one or more times to resolve an Active-Active state. The Standby Edge reboot(s) can cause a disruption of customer traffic with the impact greater on sites using an Enhanced HA topology as the Standby Edge also passes customer traffic. The issue is being tracked by Issue #85369 under the Edge/Gateway Known Issues section of these Release Notes.
Mixing Wi-Fi Capable and Non-Wi-Fi Capable Edges in High Availability Is Not Supported
Beginning in 2021, VMware SD-WAN introduced Edge models which do not include a Wi-Fi module: the Edge models 510N, 610N, 620N, 640N, and 680N. While these models appear identical to their Wi-Fi capable counterparts except for Wi-Fi, deploying a Wi-Fi capable Edge and a Non-Wi-Fi capable Edge of the same model (for example, an Edge 640 and an Edge 640N) as a High-Availability pair is not supported. Customers should ensure that the Edges deployed as a High Availability pair are of the same type: both Wi-Fi capable, or both Non-Wi-Fi capable.
BGPv4 Filter Configuration Delimiter Change for AS-PATH Prepending
Through Release 3.x, the VMware SD-WAN BGPv4 filter configuration for AS-PATH prepending supported both comma-based and space-based delimiters. Beginning with Release 4.0.0, VMware SD-WAN supports only a space-based delimiter in an AS-PATH prepending configuration.
Customers upgrading from 3.x to 4.x must edit their AS-PATH prepending configurations to replace commas with spaces prior to upgrade to avoid incorrect BGP best route selection.
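For customers with many filters to review, the edit described above can be sketched in Python. The helper name `to_space_delimited` is hypothetical, and the sketch assumes each prepend value is a list of plain decimal AS numbers; it only rewrites a string and does not read from or write to the Orchestrator.

```python
import re


def to_space_delimited(as_path_prepend: str) -> str:
    """Rewrite a comma- or mixed-delimited AS list using single spaces."""
    # Split on any run of commas and/or whitespace, dropping empty tokens.
    tokens = [tok for tok in re.split(r"[,\s]+", as_path_prepend.strip()) if tok]
    for tok in tokens:
        # Assumption: AS numbers are written in plain decimal notation.
        if not tok.isdigit():
            raise ValueError(f"not an AS number: {tok!r}")
    return " ".join(tokens)
```

For example, `to_space_delimited("65001,65001,65001")` returns `"65001 65001 65001"`, and a value that is already space-delimited is returned unchanged.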
Extended Upgrade Time for Edge 3x00 Models
Upgrades to this version may take longer than normal (3-5 minutes) on Edge 3x00 models (i.e., 3400, 3800 and 3810). This is due to a firmware upgrade which resolves issue 53676. If an Edge 3400 or 3800 had previously upgraded its firmware when on Release 3.4.5, 4.0.2, or 4.2.1, then the Edge would upgrade as expected. For more information, please consult Fixed Issue 53676 in the respective release notes.
Limitation When Deactivating Autonegotiation on VMware SD-WAN Edge Models 520, 540, 620, 640, 680, 3400, 3800, and 3810
When a user deactivates autonegotiation to hardcode speed and duplex on ports GE1 - GE4 on a VMware SD-WAN Edge model 620, 640 or 680; on ports GE3 or GE4 on an Edge 3400, 3800, or 3810; or on an Edge 520/540 when an SFP with a copper interface is used on ports SFP1 or SFP2, the user may find that even after a reboot the link does not come up.
This is caused by each of the listed Edge models using the Intel Ethernet Controller i350, which has a limitation that when autonegotiation is not used on both sides of the link, it is not able to dynamically detect the appropriate wires to transmit and receive on (auto-MDIX). If both sides of the connection are transmitting and receiving on the same wires, the link will not be detected. If the peer side also does not support auto-MDIX without autonegotiation, and the link does not come up with a straight cable, then a crossover Ethernet cable will be needed to bring the link up.
For more information please see the KB article Limitation When Deactivating Autonegotiation on VMware SD-WAN Edge Models 520, 540, 620, 640, 680, 3400, 3800, and 3810 (87208).
Support for New Hardware Platforms
Edge 510N, Edge 610N, Edge 620N, Edge 640N, and Edge 680N
VMware plans to introduce several new SD-WAN Edge hardware models that do not include integrated Wi-Fi. These include the Edge 510N, 610N, 620N, 640N, and 680N. These Edge models will be supported in this release. Any Wi-Fi configurations made in the VMware SD-WAN/SASE Orchestrator’s settings will not impact these Edge models.
Note: Mixing Wi-Fi Capable and Non-Wi-Fi Capable Edges in High Availability Is Not Supported
While the Edge models 510N, 610N, 620N, 640N, and 680N appear identical to their Wi-Fi capable counterparts, deploying a Wi-Fi capable Edge and a Non-Wi-Fi capable Edge of the same model (for example, an Edge 640 and an Edge 640N) as a High-Availability pair is not supported. Customers should ensure that the Edges deployed as a High Availability pair are of the same type: both Wi-Fi capable, or both Non-Wi-Fi capable.
Document Revision History
September 29th, 2021. First Edition
December 21st, 2021. Second Edition
- Added a new Orchestrator build R422-20211216-GA to Orchestrator Resolved Issues. This Orchestrator build remediates CVE-2021-44228, the Apache Log4j vulnerability, by updating to Log4j version 2.16.0. For more information on the Apache Log4j vulnerability, please consult the VMware Security Advisory VMSA-2021-0028.5.
- Added to Important Notes the Note: Limitation When Deactivating Autonegotiation on VMware SD-WAN Edge Models 520, 540, 620, 640, 680, 3400, 3800, and 3810. This note covers an issue that may be encountered when configuring a forced speed on some Ethernet ports of the listed Edge models.
January 4th, 2022. Third Edition
- Added a new Edge build R422-20211216-GA to the Edge Resolved section. This build is the new Edge GA build for Release 4.2.2.
- This Edge build includes the fixed issues #53951, #67060, #70919, #70954, #72245, #72425, #73251, and #72688, which are each documented in this section.
January 21st, 2022. Fourth Edition
- Added a new Edge build R422-20220119-GA to the Edge Resolved section. This build is the new Edge GA build for Release 4.2.2.
- This Edge build includes the fixed issues #58791, #68785, #70933, #72498, #75992, #77040, #77525, and #77586, which are each documented in this section.
- Added a new Orchestrator build R422-20220112-GA to Orchestrator Resolved Issues. This Orchestrator build remediates Apache Log4j vulnerabilities CVE-2021-44228 (which was first addressed in Orchestrator build R422-20211216-GA with Log4j version 2.16.0) and CVE-2021-45046, by updating to Log4j version 2.17.0. For updated information on the Apache Log4j vulnerabilities and their impact on VMware products, please consult the VMware Security Advisory VMSA-2021-0028.9.
February 7th, 2022. Fifth Edition
- Moved Issue #55327 from Edge/Gateway Resolved Issues to Edge/Gateway Open Issues as the issue was recurring with the 4.2.2 GA Build.
February 18th, 2022. Sixth Edition
- Added a new Edge build R422-20220210-GA to the Edge Resolved section. This build is the new Edge GA build for Release 4.2.2.
- This Edge build includes the fixed issues #48017, #55327, #57281, #66691, #72859, #72925, #78003, #78300, #78678, and #81224, which are each documented in this section.
March 2nd, 2022. Seventh Edition
- Under Support for New Hardware Platforms, added an important note: "Note: Mixing Wi-Fi Capable and Non-Wi-Fi Capable Edges in High Availability Is Not Supported" to the Edge 510N, Edge 610N, Edge 620N, Edge 640N, and Edge 680N section.
March 16th, 2022. Eighth Edition
- Added a new Edge build R422-20220310-GA to the Edge Resolved section. This build is the new Edge GA build for Release 4.2.2.
- Edge build R422-20220310-GA includes the fixed issues #57597, #65695, #67745, #68923, #70586, #77625, #77642, #81221, and #83212, which are each documented in this section.
- Under the Compatibility section, added a new Warning that Release 3.4.x software is approaching End of Support for the Orchestrator and Gateway with End of General Support (EOGS) on March 30, 2022, and End of Technical Guidance (EOTG) June 30, 2022. This is for the Orchestrator and Gateway only. The 3.4.x Edge software is scheduled to enter its End of Support window beginning on December 31, 2022.
March 23rd, 2022. Ninth Edition
- Added Issue #84825, to the Edge/Gateway Known Issues section.
- Further edited Fixed Issue #58791 to remove the part stating that this ticket fixes an HA site configured with a large number of BGP match and set filters. That part of the issue is not fixed with this ticket and is tracked with Issue #84825.
March 31st, 2022. Tenth Edition
- Reclassified Fixed Issue #83212, found in the most recent Edge rollup build R422-20220310-GA, as an Open Issue. #83212 has been moved to the Edge/Gateway Known Issues section.
April 13th, 2022. Eleventh Edition
- Added Open Issue #62701 to the Edge/Gateway Known Issues section as this issue remains unresolved on all releases at this time.
- Included Gateway Build R422-20220310-GA in addition to the original Edge build R422-20220310-GA under Edge/Gateway Resolved Issues.
- Gateway build R422-20220310-GA was also released on March 15th, 2022 and is now the default build for 4.2.2 and features verified support for ESXi version 7.0. A customer who wants to deploy or upgrade a Gateway using a VMware hypervisor with ESXi version 7.0 should only use this build or later.
April 20th, 2022. Twelfth Edition
- Added a new Edge and Gateway build R422-20220419-GA to the Edge/Gateway Resolved section. This build is the new Edge and Gateway GA build for Release 4.2.2.
- Edge/Gateway build R422-20220419-GA includes the fixed issues #65466, #67336, #69194, #83946, and #84825, which are each documented in this section.
May 6th, 2022. Thirteenth Edition
- Removed Fixed Issue #67060 from the list of resolved issues in the Edge rollup build R422-20211216-GA as a duplicate which had been added in error. The fix for #67060 was included in the original GA build R422-20210923-GA and had been properly documented from initial publication of the 4.2.2 Release Notes.
- Added a new warning in the Compatibility section regarding Release 4.0.x approaching End of Support.
May 12th, 2022. Fourteenth Edition
- Added Open Issue #83437 to the Edge/Gateway Known Issues section.
May 18th, 2022. Fifteenth Edition
- Added a new Edge/Gateway rollup build R422-20220511-GA to the Edge/Gateway Resolved section. This is the sixth Edge/Gateway rollup build and is the new Edge/Gateway GA build for Release 4.2.2.
- Edge/Gateway build R422-20220511-GA includes the fix for issue #67201, which is documented in this section.
May 27th, 2022. Sixteenth Edition
- Added a new Edge/Gateway rollup build R422-20220518-GA to the Edge/Gateway Resolved section. This is the seventh Edge/Gateway rollup build and is the new Edge/Gateway GA build for Release 4.2.2.
- Edge/Gateway build R422-20220518-GA includes fixes for issues #80814, #82485, #88757, and #88796, which are each documented in this section.
- Added Issue #88796 as a new Orchestrator Known Issue. This ticket tracks the issue as it applies to the Orchestrator OVA only, as the fix is included in the latest Gateway build.
- Added Issues #85369 and #85461 to the Edge/Gateway Known Issues section.
June 13th, 2022. Seventeenth Edition
- Added a new Important Note: "Potential Issue With Sites Using a High-Availability Topology" regarding ongoing issues with customer sites using a High-Availability topology for a pair of Edges. This issue continues to be tracked by Issue #85369 located in Edge/Gateway Known Issues.
- Under Compatibility, amended the End of Life dates for Release 4.2.x Edge software. The Edge software is broken out as a separate item and now reads: "Release 4.2.x Edges will reach End of General Support (EOGS) on June 30, 2023, and End of Technical Guidance (EOTG) September 30, 2023." The separate Orchestrator and Gateway entry retains the same End of Life dates as before.
- Revised Fixed Issue #53951 in the Edge/Gateway Resolved Issues section to include another scenario that could impact a customer that encounters this issue in the field.
July 1st, 2022. Nineteenth Edition
- Added Open Issue #88604 to the Edge/Gateway Known Issues section.
July 13th, 2022. Twentieth Edition
- Added Open Issue #91365 to the Edge/Gateway Known Issues section.
July 26th, 2022. Twenty-First Edition
- Added a new Edge/Gateway rollup build R422-20220530-GA to the Edge/Gateway Resolved section. This is the eighth Edge/Gateway rollup build and is the new Edge/Gateway GA build for Release 4.2.2.
- Edge/Gateway build R422-20220530-GA includes fixes for issues #83437 and #87205, which are each documented in this section.
August 8th, 2022. Twenty-Second Edition
- Added a new Edge/Gateway rollup build R422-20220805-GA to the Edge/Gateway Resolved Issues section. This is the ninth Edge/Gateway rollup build and is the new Edge/Gateway GA build for Release 4.2.2.
- Edge/Gateway build R422-20220805-GA includes fixes for issues #52659, #59862, #80028, #85369, #87923, #89235, #90280, #90283, #93062, and #94395, which are each documented in this section.
August 10th, 2022. Twenty-Third Edition
- Added a new Orchestrator rollup build R422-20220715-GA to the Orchestrator Resolved Issues section. This is the third Orchestrator rollup build and is the new Orchestrator GA build for Release 4.2.2.
- Orchestrator build R422-20220715-GA includes fixes for issues #85883 and #88796, which are each documented in this section.
August 26th, 2022. Twenty-Fourth Edition.
- Added Open Issue #89217 to the Edge/Gateway Known Issues section.
- Removed Open Issue #49712 from Edge/Gateway Known Issues as Engineering concluded it was caused by a configuration error versus a defect in the code.
September 9th, 2022. Twenty-Fifth Edition.
- Added Open Issue #89217 to the Edge/Gateway Known Issues section.
- Added Fixed Issue #93383 to the Edge/Gateway Resolved Issues section for the 9th Rollup Build R422-20220805-GA as this was omitted in error.
September 28th, 2022. Twenty-Sixth Edition.
- Added Open Issues #86098, #94204, #96441, #96888, and #98136 to the Edge/Gateway Known Issues section.
October 3rd, 2022. Twenty-Seventh Edition
- Added Open Issue #59920 to the Edge/Gateway Known Issues section.
October 31st, 2022. Eleventh Edition.
- Added Open Issue #72491 to the Edge/Gateway Known Issues section.
December 6th, 2022. Twelfth Edition.
- Added Open Issue #59524 to the Edge/Gateway Known Issues section.
January 30th, 2023. Thirteenth Edition.
- Revised Known Issue #89217 to reflect a revised Edge version (R5012-20230123-GA-103475) and Platform Firmware version (R131-20221216-GA) needed to resolve the issue. The ticket also adds a link to the KB Article that covers #89217 and which includes step-by-step instructions for upgrading a 6x0 Edge.
- In the Compatibility section, revised the Important Note regarding End of Support for 4.2.x and added Release 4.3.x to reflect newly revised dates for the SD-WAN Edge software.
April 25th, 2023. Fourteenth Edition.
- Updated the Compatibility section to mark all 3.x releases as having reached their End of Service Life (EOSL). Also updated the 4.x section to mark 4.2.x Orchestrators and Gateways as End of Service Life (EOSL). Additional information may be found in the KB Article Announcement: End of Support Life for VMware SD-WAN Release 4.x (88319).
Resolved Issues
The resolved issues are grouped as follows.
Edge/Gateway Resolved Issues
Resolved in Edge/Gateway Version R422-20220805-GA
Edge/Gateway Version R422-20220805-GA was released on 08-08-2022 and is the 9th Edge/Gateway rollup for Release 4.2.2.
This Edge/Gateway rollup build addresses the below critical issues since the 8th Edge/Gateway rollup version R422-20220530-GA.
- Fixed Issue 52659: When a VMware SD-WAN Edge is configured as a DHCP relay agent, the Edge does not forward the DHCP NAK packets to a client.
If the DHCP server sends DHCP NAK packets, the Edge which is configured as DHCP relay will drop the packets without forwarding them.
- Fixed Issue 59862: When running the Remote Diagnostic "Interface Status", the test may fail with message "Error Reading data for test".
The issue is caused by a stray USB modem entry which is left behind in the VMware SD-WAN Edge's network configuration file after inserting and then removing a USB modem. The fix for this issue ensures that the stray data is handled gracefully, and the test runs properly. On Edges that do not have the fix, a reboot of the Edge clears the stray entry, and the test will run properly.
- Fixed Issue 80028: On a site deployed with a High Availability topology, the Standby Edge may experience a Dataplane Service failure and restart as a result.
This issue only occurs on the Standby Edge and never the Active Edge. The issue is caused by a race condition when the Deep Packet Inspection engine has invoked a cleanup while there still are packets being processed in the pipeline and could happen at any time. There is no impact to a customer using a standard HA configuration as the Standby Edge does not pass traffic, but on an Enhanced HA deployment, where the Standby Edge is also passing traffic, the restart of the Standby Edge service would briefly disrupt the customer traffic passing through the Standby for ~15 seconds.
- Fixed Issue 85369: For a site deployed with a High-Availability topology, the customer may observe traffic disruptions and possibly multiple reboots of the VMware SD-WAN Standby Edge.
A condition triggered by load and system events causes the Active Edge to experience delays in the timely delivery of HA heartbeats to the Standby Edge. The delay causes the Standby Edge to miss heartbeats and incorrectly assume the Active role causing an Active-Active state. To recover from the Active-Active state the Standby Edge reboots, possibly multiple times.
If the site does become Active-Active, a conventional HA setup would experience minimal traffic disruption since the Standby Edge does not pass traffic in this topology, but on an Enhanced HA deployment, where the Standby Edge is also passing traffic, the reboot(s) would disrupt some customer traffic.
- Fixed Issue 87923: When a malformed ICMP packet is sent to a VMware SD-WAN Edge, the Edge may experience a Dataplane Service failure and restart as a result.
The Edge does not validate the IP packet length (for example, an ICMP packet with an IP packet length of 24) and this can lead to memory corruption on the Edge, which triggers the Dataplane Service failure and restart.
- Fixed Issue 89235: On a customer enterprise which uses a Hub/Spoke topology and employs internet backhaul policies, backhaul traffic from a VMware SD-WAN Spoke Edge which is destined for the Internet may be dropped by the Hub Edge.
When this issue is encountered, the client users would notice issues for traffic destined for the Internet. The issue occurs after one of the following: an Edge power cycle (for example after a power outage), an Edge service restart, or a configuration change and is caused by a timing issue between the backhaul traffic originating from a Spoke Edge and the route advertised from the Spoke Edge.
When encountering this issue on a Spoke Edge without this fix, a user should flush the flows on the affected Spoke Edge to restore normal routing of backhaul traffic. This can be done on the Orchestrator through Remote Diagnostics > Flush Flows.
- Fixed Issue 90280: On a site deployed with a High Availability topology and configured to use Dynamic Edge-to-Edge, the VMware SD-WAN HA Edge may fail over unexpectedly.
This issue can be encountered at a site that has a high rate of creation and destruction of dynamic tunnels between other Edges. In such a scenario the Edge may incorrectly account for the number of interfaces that are up which can lead the Edge service to conclude that all the links are down and trigger an HA failover.
- Fixed Issue 90283: A customer may experience poor audio and/or video quality for VoIP and videotelephony calls if Underlay Accounting is turned on for the WAN link being used on the VMware SD-WAN Edge.
When checking the logs, the user would observe packet loss on bidirectional traffic when the traffic is asymmetric and one of the routes is via the underlay. In other words, when the routes for a flow are asymmetric such that in one direction the traffic takes an underlay route and in the reverse direction it takes an overlay path and where Underlay Accounting is toggled on for that WAN link, packet loss may be experienced on bidirectional flows which are typical of, but not limited to, VoIP and videotelephony calls.
- Fixed Issue 93062: When a user runs the Remote Diagnostic "Interface Status" on the VMware Orchestrator, the Orchestrator either returns an error for that test and does not complete or the test does not return results for routed interfaces.
The error message seen is "error reading data for test". If the test does complete, the results for routed interfaces are empty with no information about speed or duplex. Either way the Interface Status test does not work as intended. The issue is caused by the debug command that underlies Interface Status omitting DPDK-enabled ports.
- Fixed Issue 93383: A VMware SD-WAN Edge may suffer one or more Dataplane Service failures with a disruption in customer traffic.
The issue is caused by a rare instance of a mismatch of the number of interfaces stored in the Edge in two different data structures which triggers an exception and results in the Edge service failing one or more times. The Edge service needs to restart to recover which, in a non-HA deployment, would cause a 10-15 second disruption of customer traffic. However, if the Edge service fails three consecutive times, the Edge will require a reboot or power cycle to recover.
- Fixed Issue 94395: On a site deployed with a High Availability topology, HA failover may fail as the Standby Edge is not moved to an Active state after the Active Edge has failed, resulting in a disruption of customer traffic.
This issue can be encountered when more than one pair of HA Edges is connected to the same upstream WAN switches or broadcast network. In this scenario, the HA Edges may process non-peer HA WAN heartbeats, which affects the local HA state and leads to nondeterministic HA behavior, including the Standby Edge not being promoted to Active.
On an HA Edge pair using an Edge build that does not include a fix for this issue, the workaround is to avoid sharing the same broadcast network between two different HA pairs.
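To make the length problem behind Fixed Issue 87923 above concrete, the following Python sketch shows the kind of sanity check that rejects an ICMP packet whose IPv4 total length is 24: a minimal IPv4 header is 20 bytes and an ICMP header is 8 bytes, so 24 leaves too little room. This is not the Edge's implementation; the function name and the choice of checks are assumptions made for illustration.

```python
import struct

ICMP_PROTOCOL = 1
MIN_IPV4_HEADER_LEN = 20
ICMP_HEADER_LEN = 8  # type, code, checksum, and 4 bytes of rest-of-header


def ipv4_lengths_are_sane(packet: bytes) -> bool:
    """Return True if the IPv4 length fields are consistent with the buffer."""
    if len(packet) < MIN_IPV4_HEADER_LEN:
        return False  # too short to hold a minimal IPv4 header
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        return False  # not IPv4
    header_len = (version_ihl & 0x0F) * 4
    (total_len,) = struct.unpack_from("!H", packet, 2)
    if header_len < MIN_IPV4_HEADER_LEN:
        return False
    if total_len < header_len or total_len > len(packet):
        return False  # claimed length disagrees with header or buffer size
    if packet[9] == ICMP_PROTOCOL and total_len - header_len < ICMP_HEADER_LEN:
        return False  # e.g. total length 24 leaves only 4 bytes for ICMP
    return True
```

Checking the claimed lengths before any field is read at a computed offset is what keeps a malformed packet from causing an out-of-bounds access.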
___________________________________________________________________
Resolved in Edge/Gateway Version R422-20220530-GA
Edge/Gateway Version R422-20220530-GA was released on 06-01-2022 and is the 8th Edge/Gateway rollup for Release 4.2.2.
This Edge/Gateway rollup build addresses the below critical issues since the 7th Edge/Gateway rollup version R422-20220518-GA.
- Fixed Issue 83437: For a site configured with an Enhanced High Availability topology, when upgrading the VMware SD-WAN HA Edges to a 4.2.x release, a user may observe degraded performance at the site and one or more WAN interfaces may show as UP when they are actually disconnected on one of the HA Edges.
This is a platform issue related to how the interface is set during the HA Edge's boot cycle which, in some cases on Enhanced HA setups, can cause a WAN interface physically connected on only one HA Edge to be incorrectly flagged as connected to both units. This results in the WAN link on the affected interface being intermittently degraded up to 100% and dropping all traffic on that link.
A user can identify this issue by going to Remote Diagnostics and running the Interface Status diagnostic. For the affected WAN circuit, if the output reads: "Link detected: true" but the speed shows "0mps, half duplex", then this site is hitting that issue.
While this issue is most commonly encountered when upgrading the Enhanced HA Edges to a 4.2.x release, this could also happen on Enhanced HA Edges that are already upgraded to a 4.2.x release and later an HA Edge is rebooted as the boot cycle is when the issue is triggered.
For an HA Edge using a build that lacks this fix, the issue can be remediated by forcing an HA failover, which can be done through the Orchestrator using Remote Actions > Force HA Failover. While this corrects the condition it does not prevent recurrences of the same issue later if an HA Edge is later rebooted.
- Fixed Issue 87205: For a customer deploying a VMware SD-WAN Edge with a Partner Gateway, when an Edge learns new routes from the Partner Gateway, customer traffic may be disrupted.
This issue is caused by traffic matching the wrong Business Policy. For example, DHCP traffic destined for the Partner Gateway could instead be matched to the Internet Backhaul rule with a resulting disruption in customer traffic.
Without the fix, the issue is remediated by flushing the Edge's flows using the Remote Diagnostic "Flush Flows". This remediation does not prevent future potential occurrences when new routes are learned by the Edge to the Partner Gateway.
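The identification step described under Fixed Issue 83437 above can be sketched as a small text check. This Python example assumes the Interface Status output for the affected WAN interface has been copied into a string and that it contains the two phrases quoted in that note; the function name is hypothetical and the full output format is not specified in these release notes.

```python
import re


def shows_issue_83437_signature(interface_status_output: str) -> bool:
    """Return True if link is detected but speed reads 0mps, half duplex."""
    text = interface_status_output.lower()
    link_detected = "link detected: true" in text
    # The lookbehind keeps readings such as "100mps" from matching "0mps".
    zero_speed_half_duplex = re.search(r"(?<![\d.])0mps, half duplex", text)
    return link_detected and zero_speed_half_duplex is not None
```

The check should be applied to the output for one interface at a time, since mixing output from several interfaces could pair a link state from one with a speed from another.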
___________________________________________________________________
Resolved in Edge/Gateway Version R422-20220518-GA
Edge/Gateway Version R422-20220518-GA was released on 05-24-2022 and is the 7th Edge/Gateway rollup for Release 4.2.2.
This Edge/Gateway rollup build addresses the below critical issues since the 6th Edge/Gateway rollup version R422-20220511-GA.
- Fixed Issue 80814: On a VMware SD-WAN Edge where a Standard Firewall Allow rule is configured which has a local Edge client Source IP address and a remote client as the Destination IP Address, and which also has a "Deny All" rule for other traffic, the traffic from the remote client to the local client is dropped.
This issue is encountered when there is a VLAN IP address mismatch between the source and destination hosts. When the source and destination hosts are part of different VLANs, the SD-WAN service prefers the source/destination IP address of the first packet as it is in the Firewall lookup key. As a result, for overlay inbound flows, there is a mismatch and traffic hits the Deny All firewall rule.
Without the fix, the workaround for this issue is to reverse the rule so that it follows the direction of the first IP packet of the flow, which allows the packet to match the firewall rule.
- Fixed Issue 82485: On an entry level VMware SD-WAN Edge model (for example, Edge 510, 510-LTE, or 610) if a user runs the Remote Diagnostic "Route Table Dump", the Orchestrator UI page may time out and not return a result.
The issue is encountered if there are more than 16000 routes, as it takes the Edge more than 30 seconds to return the results. Because 30 seconds is the timeout limit for the page's WebSocket, no result is returned. The fix for the issue optimizes the route table walk to ensure timeouts do not occur.
- Fixed Issue 88757: A user running the Remote Diagnostic "Route Table Dump" on the Orchestrator UI may find the attempt times out and the page returns no result.
The Route Table Dump diagnostic times out because the WebSocket timeout is 30 seconds, and for a site with a large number of routes the time the debug command takes to deliver all the routes to the Orchestrator may exceed that. The fix lowers the timeout of the route dump process to less than 30 seconds and prevents the WebSocket from timing out before then, which ensures that the Route Table Dump returns a result.
- Fixed Issue 88796: When deploying either a VMware SASE Orchestrator or a VMware SD-WAN Gateway and using an OVA on vSphere, the OVF properties set as part of the deployment (password, network information, etc.) are not applied to the image and the system cannot be accessed after deployment.
This only affects a new system deployed from an OVA using OVF/vApp properties (versus using ISO files). This issue is caused by upstream changes to cloud-init in recent updates.
Without the fix, the workaround is for the Operator to deploy the system using a cloud-init user-data ISO file.
Note: This fix is for the Gateway OVA only. The issue as it impacts the Orchestrator OVA is tracked with the same ticket #88796 but under the Orchestrator section.
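The timeout relationship described under Fixed Issues 82485 and 88757 above can be illustrated with a short Python sketch: a long-running walk is given a budget shorter than the 30-second WebSocket timeout so that a response is always produced in time. The function name, the 25-second budget, and the choice to return partial results are all assumptions for illustration; the release notes state only that the dump timeout was lowered to less than 30 seconds.

```python
import time
from typing import Callable, Iterable, List, Tuple

WEBSOCKET_TIMEOUT_S = 30.0  # timeout stated in the release notes
DUMP_BUDGET_S = 25.0        # hypothetical value; the notes say only "less than 30 seconds"


def walk_with_deadline(
    routes: Iterable[str],
    budget_s: float = DUMP_BUDGET_S,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[str], bool]:
    """Collect routes until the budget is spent.

    Returns (collected_routes, complete), where complete is False if the
    walk stopped early because the budget ran out.
    """
    deadline = clock() + budget_s
    collected: List[str] = []
    for route in routes:
        if clock() >= deadline:
            return collected, False
        collected.append(route)
    return collected, True
```

Passing the clock as a parameter makes the deadline behavior easy to exercise without waiting in real time.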
___________________________________________________________________
Resolved in Edge/Gateway Version R422-20220511-GA
Edge/Gateway Version R422-20220511-GA was released on 05-16-2022 and is the 6th Edge/Gateway rollup for Release 4.2.2.
This Edge/Gateway rollup build addresses the below critical issue since the 5th Edge/Gateway rollup version R422-20220419-GA.
- Fixed Issue 67201: For a site using a High-Availability topology, the customer may observe multiple reboots of the VMware SD-WAN Standby Edge with a potential disruption to customer traffic.
When the Standby Edge is detected, the Active Edge synchronizes all the path information to the Standby Edge. However, where there are a large number of path synchronization messages, the way the Edge processes these messages can lead either to a Dataplane Service failure on the Standby Edge or to a thread priority inversion, which causes a delay in heartbeat processing and can lead to an Active/Active state. In either instance, on a conventional HA topology the customer impact would be minimal since the Standby Edge is not passing customer traffic. However, on an Enhanced HA deployment, where the Standby Edge is also passing traffic, the reboot(s) would disrupt some customer traffic. An HA Edge which includes this fix has an enhanced path information processing code path that prevents the issue from occurring.
___________________________________________________________________
Resolved in Edge and Gateway Version R422-20220419-GA
Edge/Gateway Version R422-20220419-GA was released on 04-20-2022 and addresses the below critical issues since Edge and Gateway version R422-20220310-GA.
- Fixed Issue 65466: A VMware SD-WAN Gateway or VMware SD-WAN Edge processing a large BGP route exchange may experience a Dataplane Service Failure and restart when running certain debug commands or generating a diagnostic bundle.
Either an Edge or Gateway processing a large number of routes (for example, an Edge advertising 50K BGP routes, or a Gateway learning +100K BGP routes from Edges), can encounter this issue if the debug command dispcnt (with parameters) is also run. The dispcnt debug command is used to monitor capacity drops and can be run either by a Partner Operator on the respective device's CLI or by a user during a diagnostic bundle creation. When this command is run on an Edge or Gateway with a large number of routes and another event (for example, route delete) occurs such that the original variable points to a memory location that is now stale, the result will be a Dataplane Service failure due to the illegal access to memory.
- Fixed Issue 67336: When a user looks at the Orchestrator's Monitoring page for a VMware SD-WAN Edge, the Transport statistics show much lower values when compared to the Application statistics for that Edge.
The issue prevents a user from getting an accurate picture of throughput for a particular Edge as the user could not know which data set is correct. The issue is the result of Transport statistics not including underlay accounting versus Application statistics which do.
- Fixed Issue 69194: If a user moves a USB modem from one USB port to a different port on a VMware SD-WAN Edge, the Edge may experience a Dataplane Service failure and restart as a result.
USB ports are incorrectly being bound to the DPDK AF_PACKET driver. This driver does not support port removal and could cause the Edge's Dataplane Service to fail when the USB dongle is moved from one port to another.
- Fixed Issue 83946: VMware SD-WAN Edge LAN-side clients may observe disruptions in traffic, and for a site using RADIUS authentication, client users may observe authentication failures.
Large packets will be fragmented and these fragmented packets can be dropped by the Edge. The packets are dropped due to a memory leak that occurs during fragment IP identification translation in some error scenarios. Once the Edge limit for fragmented packets is exceeded, further fragmented packets are dropped by the Edge.
For customers using RADIUS authentication, this can cause authentication failures where large packets from a wireless client are involved. For example, large packets from a wireless LAN controller (WLC) to a RADIUS server may be dropped.
- Fixed Issue 84825: When a large bulk routing configuration is applied on a VMware SD-WAN Edge in a single step, the Edge may experience repeated Dataplane Service failures resulting in repeated restarts of the Edge service to recover from each failure.
When a standalone (non-HA) Edge encounters this issue, there is significant impact to customer traffic because while a single Edge service restart disrupts traffic for ~15 seconds, repeated Edge service restarts would result in disruptions of ~60 seconds or more. On a site with a High-Availability topology, the customer would observe repeated failovers resulting from the Edge service restarts which would also disrupt customer traffic.
This issue occurs when a bulk routing configuration involving a large number of neighbors and route-maps is applied on an Edge in a single step. The Edge system faces great stress while converting these configurations into command specifications and applying them on routing protocols in a short span of time and this causes the repeated Edge service failures and restarts.
On an Edge build without the fix, to mitigate the risk of this issue a customer user would need to do the following:
- Instead of applying a large configuration in a single step, the configuration should be broken into multiple smaller sections with each section applied separately.
- The number of routing filters should be minimized.
- The Edge should only be deliberately restarted in a maintenance window and Edge service restarts should be generally avoided if there are a number of routing filters configured, as the entire Edge configuration is applied at once during restart which would greatly increase the risk of encountering this issue.
___________________________________________________________________
Resolved in Edge Version R422-20220310-GA
Edge version R422-20220310-GA was released on 03-15-2022 and addresses the below critical issue since Edge version R422-20220210-GA.
Gateway version R422-20220310-GA was released on 03-15-2022 and adds support for ESXi version 7.0 when using a VMware based hypervisor.
- Fixed Issue 57597: A customer using Internet Backhaul as part of their deployment may observe handoff queue drops and degraded performance after upgrading to 4.x software versions.
A lock was newly introduced in Edge software versions 4.0 and later for synchronizing applications and this lock can cause performance degradation in the IPv4 internet backhaul. The fix in this ticket removes the lock while still ensuring proper application synchronization.
- Fixed Issue 65695: A customer may observe traffic failing when it is destined for a connected subnet.
The issue is that IPv4/IPv6 connected subnets are being redistributed to the overlay even after the 'Reachable' status goes to False. When the parent interface is down, the Edge service does not receive the 'down' notification for sub-interfaces and as a result the connected routes belonging to the sub-interfaces are not removed. Any traffic that would normally use those subnets when they are reachable is getting blackholed and failing completely.
- Fixed Issue 67745: A VMware SD-WAN Edge which has a WAN link connected to certain ISP-provided routers may experience customer traffic issues if the ISP router goes down and then comes up.
When a WAN link from the Edge to an ISP-provided router (this issue was found with an ISP router used by Spectrum) goes down or the ISP router goes down and then back up, the ISP router may perform a diagnostic which includes briefly assigning a private IP to the Edge in the subnet 192.168.100.0/24, and then after that it assigns the public IP address. However the Edge installs the connected route for 192.168.100.0/0 and it is not cleared after it gets the public IP address.
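As an illustration of the cleanup this description implies, the following Python sketch replaces the connected route of the previous interface address when a new address is assigned. The function name and the 203.0.113.7/29 public address used in the usage note are hypothetical; only the 192.168.100.0/24 subnet comes from the description above, and the Edge's actual implementation is not documented here.

```python
import ipaddress
from typing import Optional, Set, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def update_connected_routes(
    routes: Set[IPNetwork],
    old_address: Optional[str],
    new_address: str,
) -> Set[IPNetwork]:
    """Return a new route set reflecting an interface address change.

    Addresses are given in "address/prefix" form, e.g. "192.168.100.10/24".
    """
    updated = set(routes)
    if old_address is not None:
        # Remove the connected route derived from the address being replaced.
        updated.discard(ipaddress.ip_interface(old_address).network)
    # Install the connected route for the newly assigned address.
    updated.add(ipaddress.ip_interface(new_address).network)
    return updated
```

For example, moving an interface from the temporary address 192.168.100.10/24 to a hypothetical public address 203.0.113.7/29 leaves only the 203.0.113.0/29 connected route in the set.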
- Fixed Issue 68923: On a customer enterprise using BGP, a default route may be redistributed to a BGP peer though the reachable status for the installed default route is set to 'False'.
If a static route is configured on an Edge pointing to any Edge interface, a BGP peer learns the default route from the Edge, and that interface is later deactivated (which changes the reachable flag for that route to False), the route continues to be advertised. The reverse is also true: a route that is not being redistributed because the interface was down continues to not be redistributed after the interface comes up and the route status is marked as 'True'. The cause in both instances is the Edge not readvertising the route on an interface status change to reflect the new route status.
- Fixed Issue 70586: When a routed interface on a VMware SD-WAN Edge is configured for 802.1x (uses RADIUS authentication), clients connected on that interface get silently de-authenticated whenever any other interface flaps (in other words, when any non-802.1x interface goes down and up in quick succession), and all of their traffic gets dropped until the client disconnects and then reconnects to the Edge.
The Edge is not checking that the interface that flapped is actually the one that had 802.1x clients authenticated, and thus treats any interface flap as if it were an 802.1x interface flap and acts accordingly.
Without the fix, the only workaround is to force the client to physically disconnect and reconnect to get re-authenticated.
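The missing check can be sketched in Python as below. This is an illustration under assumptions, not the Edge's code: the function name, the session table layout, and the interface and MAC values in the usage note are hypothetical.

```python
from typing import Dict, List, Set


def clients_to_deauthenticate(
    flapped_interface: str,
    dot1x_sessions: Dict[str, Set[str]],
) -> List[str]:
    """Return the clients that should lose authentication after a flap.

    `dot1x_sessions` maps an interface name to the set of client MAC
    addresses authenticated on that interface. Only clients on the
    interface that actually flapped are returned; a flap on any other
    interface yields an empty list.
    """
    return sorted(dot1x_sessions.get(flapped_interface, set()))
```

With sessions recorded only on a hypothetical interface GE3, a flap on GE4 returns an empty list, so the GE3 clients stay authenticated.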
- Fixed Issue 77625: On a site deployed with a High Availability topology, a user may observe the VMware SD-WAN Standby Edge rebooting multiple times.
The site goes into an Active/Active (Split-Brain) state due to HA threads being starved while processing the HA heartbeat packet. In an Active/Active state the tie-break goes to the Active Edge and the Standby Edge is rebooted to demote it back to its proper Standby status. In this case though, the Active/Active event is detected multiple times with Standby Edge reboots each time to recover the site.
Field issues have involved Edge 6x0 (610, 620, 640, 680) models but the issue is platform agnostic and could occur on other Edge models.
- Fixed Issue 77642: Customer may observe an increasingly high number of hand-off queue drops and packet drops leading to performance degradation on a VMware SD-WAN Edge.
On the Edge service there is a thread that monitors asynchronous flows that can become 100% utilized and this would cause the hand-off queue drops and resulting performance degradation.
- Fixed Issue 81221: If a customer configures a 1:1 NAT rule for a VMware SD-WAN Edge and that Edge is rebooted, the rule no longer works.
After the reboot, the Edge assigns the NAT address as the Edge interface address where the NAT rule is being applied and thus no tunnels are being built for traffic matching that rule.
Without the fix, the only remediation is to run the Remote Diagnostic "Flush NAT", which flushes the entire NAT table and reestablishes correct NAT rule operation.
___________________________________________________________________
Resolved in Edge Version R422-20220210-GA
Edge version R422-20220210-GA was released on 02-18-2022 and addresses the below critical issue since Edge version R422-20220119-GA.
- Fixed Issue 48017: OSPF and BGP routes may take a longer than expected time to converge on the Overlay Flow Control (OFC).
Under high load, a situation may arise where some or all the routes learned on a VMware SD-WAN Edge may not show up on the OFC or get the necessary advertise and preference value assigned (with Dynamic Cost Calculation (DCC) deactivated). This can lead to the Edge constantly retrying the syncing of such routes to the VMware SD-WAN Orchestrator that will further increase the load on the Orchestrator.
- Fixed Issue 55327: The SSH connection from a VMware SD-WAN Gateway to a VMware SD-WAN Edge may not work if the tunnel from the Edge to the Gateway continuously flaps.
If the tunnel from Edge to Gateway flaps continuously, the route entry installed in the Edge for allowing the SSH connection from the Gateway may get deleted and cause the SSH connection to fail.
- Fixed Issue 57281: A VMware SD-WAN Edge may get into a state where the Edge triggers an exception and reboots.
This issue can be encountered on a customer enterprise with a Hub/Spoke topology where multiple Hubs are used. The exception is triggered due to an invalid memory access on the destination routes on flow control and was caused by lack of a proper sanity check for such a situation.
- Fixed Issue 66691: On a VMware SD-WAN Edge model 6x0 (610/620/640/680), auto-negotiation status is not shown correctly.
Auto-negotiation is not supported on SFP1 and SFP2 as a result of the Intel x553 NIC used by all Edge 6x0 models. However, auto-negotiation is supported on GE1-GE6 (copper ports). But the Edge's ethtool communicates that auto-negotiation is always on for all ports due to a defect in the ixgbe driver.
- Fixed Issue 72859: A VMware SD-WAN Edge may experience a Dataplane Service Failure and restart as a result, causing customer traffic disruption of 10-15 seconds or a High-Availability failover depending on the customer topology.
This issue is caused when the Edge receives packet fragments where the protocol is undefined or something unexpected. When that occurs the Edge drops the packet, but prior to dropping the packet, the Edge performs a lookup and the Edge service always expects this lookup to succeed, even though it may actually fail. And when this lookup fails it is triggering an issue in the Edge Dataplane Service and causing it to fail and restart. The issue is fixed by including a NULL check which prevents the Dataplane Service from failing on a failed packet lookup.
- Fixed Issue 72925: For customers who do SNMP polling for monitoring their enterprise and who deploy lower model VMware SD-WAN Edges (for example, Edge models 510, 520, or 610) which are also running a 4.x software release, SNMP polling takes exceptionally long to process and can even time out.
This issue significantly reduces the effectiveness of SNMP polling for network monitoring when using Edges in the 510, 5x0, and 6x0 series. This issue is caused by the Release 4.x SNMP agent taking an unnecessarily long time to traverse the debug command list, which is not actually required for the SNMP process.
- Fixed Issue 78003: For a customer using a Hub/Spoke topology, static tunnels from the VMware SD-WAN Spoke Edge to a Hub Edge might not form.
Typically if there are lots of Dynamic Edge-to-Edge tunnels already established on the Spoke Edge, the maximum tunnel number check is hit on the Spoke for the static tunnel. This check prevents static tunnel formation from the Spoke to the Hub.
- Fixed Issue 78300: If a VMware SD-WAN Edge is using a WAN link configured to be a backup, a user may observe logs or Orchestrator Events which suggest that tunnels are coming up or going down for this link.
By design, tunnels do not get established for backup links. But any tunnel request from a remote end (typically a dynamic Edge-to-Edge tunnel) might change the link status as it goes through the stack. In this fix, care has been taken so that no logs indicate that any tunnel formation or teardown is going on for the backup link.
- Fixed Issue 78678: On a site deployed with a High Availability topology, the VMware SD-WAN Edge performing the Standby role may get rebooted while processing synchronization messages from the Active Edge.
When the Standby Edge is handling a high number of flow synchronization messages, the SD-WAN service may detect a buffer overflow condition and trigger a reboot of the Standby Edge.
- Fixed Issue 81224: On a site deployed with a High Availability topology, when the site experiences an HA failover, the OSPF route tags may not propagate post-HA Failover.
On an HA failover, OSPF external LSAs (link state advertisements) do not have a route tag, which leads to improper routing.
___________________________________________________________________
Resolved in Edge Version R422-20220119-GA
Edge version R422-20220119-GA was released on 01-21-2022 and addresses the below critical issue since Edge version R422-20211216-GA.
- Fixed Issue 58791: A site deployed with a High-Availability topology where BGP is used may encounter an issue where the VMware SD-WAN Edges repeatedly fail over.
This issue affects HA sites configured within a Hub/Spoke topology where the HA site has greater than 512 BGPv4 filter prefixes configured.
When BGP is used with multiple network commands configured, the Standby Edge parses all the configurations symmetrically as it comes up and spawns vtysh for every network command, which prevents the verp thread from running. The delayed verp thread results in a delay in heartbeat processing, which causes the Standby Edge to believe the Active Edge is down. The Standby Edge then becomes active, which leads to a split-brain (Active/Active) state. To recover from the split-brain state, the Standby Edge restarts, which merely repeats the cycle.
Without the fix the workaround is to reduce the number of BGP filter prefix configurations by aggregating them and getting the total number below 512 (256 Inbound, and 256 Outbound filters).
Note: A previous version of this ticket description stated this was also a fix for HA sites with BGP match and set operations. That part of the issue is not fixed with this ticket and is tracked with Issue #84825.
- Fixed Issue 68785: DHCP INFORM packets are dropped by the VMware SD-WAN Edge software when received on an interface configured as a DHCP relay.
DHCP clients can request additional network information like DNS server or the Gateway address using the DHCP INFORM message once it has acquired an IP address. When the Edge is configured as a relay agent, these INFORM messages should be forwarded to the DHCP server but are getting dropped.
- Fixed Issue 70933: After a configuration profile migration, a VMware SD-WAN Edge with High Availability enabled may experience multiple restarts.
During a configuration profile migration, only the device settings configuration is synchronized immediately with the Standby Edge. The remaining configurations are synchronized only in response to a heartbeat from the Standby Edge. When an Active Edge restarts to apply the latest configuration prior to receiving the heartbeat from the Standby Edge, the result will be a configuration mismatch between the Active and Standby Edge and this will cause multiple Edge restarts to synchronize the configurations of both HA Edges.
- Fixed Issue 72498: A customer may observe a VMware SD-WAN Edge consuming an increasingly large percentage of its memory and in models with smaller amounts of memory available (for example, Edge 510, 520, 610s) the Edge may initiate a service restart to clear the memory, which will cause a 5-10 second disruption of customer traffic.
This issue is caused by a memory leak. In a network deployment where Dynamic Edge to Edge is enabled and the Edge is dynamically building and tearing down a high number of tunnels with other Edges in the network, the Edge is not cleaning up the old IKE's from torn down tunnels and this slowly consumes memory over time with the potential in smaller memory Edges to reach a critical level, causing an Edge service restart to clear the memory.
Without the fix, a user could preemptively restart the Edge service in a maintenance window to clear the memory. But the Edge memory would just begin to slowly leak again.
- Fixed Issue 75992: On a customer enterprise with multiple VMware SD-WAN Edges where some Edges are running a 3.4.x Edge version and other Edges are running a 4.3.x Edge version, the customer may observe service and traffic disruptions.
In some customer deployments using a VMware SD-WAN Gateway running Gateway version 4.3.0 and Edges running a mix of 3.4.x and 4.3.x versions, the Edges running 3.4.x have been seen with some invalid routes which have a non-zero network address but a zero mask. When such a route gets installed into the FIB, this causes the routing issue.
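To make the invalid-route condition concrete, the Python sketch below flags a route whose address has bits set outside its mask, which includes the "non-zero network address with a zero mask" case described above. The function name is hypothetical and this is only an illustration of the condition, not the check the Edge software performs.

```python
import ipaddress


def is_well_formed_route(address: str, prefix_len: int) -> bool:
    """Return True if `address` has no bits set beyond `prefix_len`.

    A zero-length mask is only well formed with an all-zero address
    (the default route); anything else is rejected.
    """
    addr = ipaddress.ip_address(address)
    max_len = addr.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if not 0 <= prefix_len <= max_len:
        return False
    # Bits not covered by the mask must all be zero.
    host_mask = (1 << (max_len - prefix_len)) - 1
    return (int(addr) & host_mask) == 0
```

Under this check, 0.0.0.0 with a zero mask and 10.1.0.0 with a 16-bit mask are accepted, while 10.1.0.0 with a zero mask is rejected.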
- Fixed Issue 77040: When a customer deploys a Non SD-WAN Destination (NSD) of either type (via Gateway or via Edge), if the SA (Security Association) fails, there may be a memory leak that is specific to the type of NSD. For an NSD via Gateway, the memory leak would be observed on the VMware SD-WAN Gateway; for an NSD via Edge, the memory leak would be observed on the VMware SD-WAN Edge. In either case the leak could trigger a service restart with a disruption in customer traffic.
In either instance if the memory builds up sufficiently, the Gateway or Edge will trigger a defensive service restart to prevent a total memory exhaustion and clear up the built up memory. The memory leak is because the respective Edge/Gateway service is not automatically cleaning up the ik_ds structure which increments memory every 20 seconds and eventually causes the device to run out of memory.
- Fixed Issue 77525: For a site using a High-Availability topology, when the VMware SD-WAN HA Edges are upgraded to a new software image, the Standby may fail to upgrade and the VMware SD-WAN Orchestrator UI incorrectly lists the Standby Edge's status as 'Active' even though it is not.
When the Active Edge detects the Standby Edge it tries to fetch the Standby Edge's software version and if the version is greater than 3.4.x then the Active Edge copies the network configuration file to the Standby Edge. While fetching the Standby Edge software version, there may be an exception which is not handled by the Edge's HA code and this leads to an HA worker thread getting stuck, after which further communication with the Standby Edge fails. At this point the management process between the Active and Standby Edge is broken and anything having to do with the management plane, including software management, Standby Edge status, and configuration changes, will not be synchronized between the Active and Standby Edge. This results in the Standby Edge being falsely detected as Active, which appears as an Active/Active "split brain" state on the Orchestrator even though the Standby Edge is still performing its proper role.
If there is an HA failover and the Standby Edge is promoted to Active, the Edge would be with a mismatched set of configurations and software. The Orchestrator would detect the configuration mismatch and push the updated configuration to this Edge while also completing the software upgrade the Standby Edge previously missed. And since an Edge software upgrade requires a reboot, the customer would observe another failover while the newly Active Edge was rebooted and then demoted back to Standby status.
This issue is not consistently encountered when an HA site upgrades the Edges' software. In addition, this issue can also happen when bringing up a new HA site, or when a standalone site is brought up to High-Availability, anytime the Standby Edge has to upgrade its software. But these secondary scenarios are rarer than the case of HA Edges undergoing a software upgrade.
Without the fix, a customer observing this issue would need to restart the Edge service or trigger an HA failover to clear the issue.
- Fixed Issue 77586: On a site deployed with a High Availability topology, using Release 4.2.2 GA, and using OSPF, customer traffic may experience disruption.
The HA IP is being given by the Edge's service to the VMware SD-WAN's routing protocols and the OSPF service uses that in the LSA (link state advertisement). In a field reported case, the HA IP is chosen as the OSPF router-id when the Edge is running 4.2.1, and after upgrading the HA Edges to 4.2.2, the router-id is changed to another interface IP. However, the HA IP LSA persists and continues to be advertised, which disrupts the SPF (shortest path first) calculation and can lead to an outage in the network.
Note: This issue can be seen in any platform and any release without the fix and is not limited to the field case previously noted.
___________________________________________________________________
Resolved in Edge Version R422-20211216-GA
Edge version R422-20211216-GA was released on 12-20-2021 and addresses the below critical issue since Edge version R422-20210923-GA.
- Fixed Issue 53951: A VMware SD-WAN Edge may experience either a failure of traffic sent direct to the internet or a loss of connectivity to the VMware SD-WAN Orchestrator and the Edge is marked as down.
This issue can affect an Edge in one of two scenarios:
- For an Edge which uses public WAN links, when there is a flap (link goes down and then comes up) on a WAN link, the impact to the customer in this scenario is that traffic that is steered to the affected link and is also classified as Direct is dropped. This issue is especially impactful for a site where Business Policy rules are configured to force certain traffic to use one WAN link only while also being sent Direct.
- When enabling HA on an Edge using PPPoE WAN links, there is a change in the PPPoE interface IP and the old self route is deleted but with the new PPPoE IP address the new self route is not getting added. As a result the communication between the Orchestrator and the Edge no longer works.
Without the fix, the way to temporarily correct the issue is to either restart the Edge service to ensure Direct traffic is sent on the affected public WAN link, or reboot the Edge (where PPPoE links are used) which recovers the route to the Orchestrator.
- Fixed Issue 70919: On a VMware SD-WAN Edge using Release 4.2.1 GA that is also Wi-Fi capable, users may not be able to connect to the Edge's Wi-Fi and no SSID is broadcast.
When the Edge is not broadcasting Wi-Fi, the wlan0 interface (WLAN1 on the Orchestrator UI) is not detected when checking logs. This is the result of an exception in the Wi-Fi card's firmware that occurs under heavy traffic; this exception causes the Wi-Fi to fail.
In the kernel log a user would observe messages similar to the following:
2021-08-26T01:05:21.397 WARNIN kern kernel:[244841.443763] ath10k_pci 0000:01:00.0: SWBA overrun on vdev 1, skipped old beacon
2021-08-26T01:05:21.539 INFO kern kernel:[244841.586338] ieee80211 phy0: Hardware restart was requested
2021-08-26T01:05:24.207 WARNIN kern kernel:[244844.254081] ath10k_warn: 17 callbacks suppressed
2021-08-26T01:05:24.207 WARNIN kern kernel:[244844.254091] ath10k_pci 0000:01:00.0: failed to receive control response completion, polling..
2021-08-26T01:05:24.223 ERR kern kernel:[244844.270245] ath10k_pci 0000:01:00.0: firmware crashed! (guid n/a)
The remediation for this issue is a script that detects a firmware failure and reloads the Wi-Fi kernel modules. As part of the module reload, the script also performs a PCIe reset of the Wi-Fi device which restarts the firmware and enables the device for use again. The detection of a failure and the subsequent recovery may take from 30-40 seconds and during this time, Wi-Fi will not be available. This should be understood as a defensive fix versus a complete fix that prevents the issue from occurring in the first place.
Without the script, the user must reboot or power cycle the Edge to restore Wi-Fi.
- Fixed Issue 70954: A VMware SD-WAN Edge may experience multiple Dataplane Service Failures if the Edge has a Business Policy configured with a mandatory link to a Zscaler Cloud Security Service (CSS) and the interface for that mandatory link fails.
The Edge should be dropping the traffic to Zscaler when the mandatory interface drops, versus suffering a service failure.
- Fixed Issue 72245: If a VMware SD-WAN Hub Edge is used as an internet breakout from an MPLS network, management (VCMP) tunnels from a connected Spoke Edge's private interface to any public Gateways may go down or not come up.
Management (VCMP) packets from a Spoke Edge's private interface to the public Gateway are sent via the Hub Edge. In this scenario, the Hub considers this flow as a direct flow and pushes these packets into the internet via a public interface. However, due to routing issues, these flows can be marked as "Via Gateway" and this can impact these flows and cause the issue described above.
- Fixed Issue 72425: OSPF routes are not advertised to the remote sites, even though a user can see the route received by the local VMware SD-WAN Edge.
The issue is the result of a race condition in the Edge while processing OSPF NSM (Neighbor State Machine) events and route additions. In the problem state, while adding the OSPF route to the RIB (Routing Information Base), the Edge does not possess the OSPF neighbor information because the Edge has not yet processed the OSPF NSM state event. Due to this, the Edge adds the OSPF route with neighbor IP as 0. If the neighbor IP is 0 the Edge does not synchronize the route to the Orchestrator or advertise it to the Gateway and this is why the route is not seen in either Overlay Flow Control (OFC) or on the Gateway.
- Fixed Issue 72688: VMware SD-WAN Edge randomly restarts its Dataplane Service with resulting service interruptions due to the restarts.
Packets are pinned to a decryption thread, and when they float to another decryption thread they are rejected by the non-owning thread. In the process of rejecting packets, the associated QAT crypto reference was incorrectly released, leading to exceptions in the Dataplane Service and a failure and restart.
- Fixed Issue 73251: Users who need to authenticate via RADIUS may find they are unable to authenticate because fragmented traffic is failing to be sent from the VMware SD-WAN Edge.
RADIUS traffic is always fragmented and this issue impacts users trying to authenticate on a wireless link even more so. When this issue occurs, the fragmented packet count gets beyond what DPDK can handle on the affected interface. The fix proactively resets the fragmentation count to avoid traffic disruption.
Edge/Gateway Resolved Issues
Resolved in Version R422-20210923-GA
The below issues have been resolved since Edge version R421-20210624-GA-57011-60130 and Gateway version R421-20210407-GA.
- Fixed Issue 34037: A VMware SD-WAN Gateway may encounter a Dataplane Service failure after a peer tunnel drops.
When a peer tunnel goes dead, the cleanup process takes the fc->vpi and assigns it to NULL. There seems to be a few packets in the pipeline which still had a reference to this fc. As part of processing those packets fc->vpi is accessed which is NULL and hence the Gateway process hits an exception and restarts.
- Fixed Issue 34626: Invalid control packets from a VMware SD-WAN Edge to a VMware SD-WAN Gateway may cause the Gateway to suffer a Dataplane Service failure and restart.
For the flows from Edge to Gateway, the Edge sends a control message to the Gateway to synchronize the business policy actions for each flow. If the control message has an invalid action, it can cause the Gateway to restart when trying to route the data packets of the flow with an invalid action set.
- Fixed Issue 40268: Where a user changes the configuration of a VMware SD-WAN Hub or an Edge-to-Edge via Hub configuration, the Spoke Edge installs routes that are marked as 'False'.
The Spoke Edge installs routes in the FIB which are marked as False (as there is no tunnel from the Hub for those routes) and these routes stay in the FIB for ~2 minutes before being cleared out. In that time, these False routes may cause disruption to some networks.
- Fixed Issue 41837: The NAT IP address of the source and destination gets printed in a VMware SD-WAN Edge's log instead of the original IP address.
The Edge firewall logs should display the original source & destination IP address but instead the NAT'd IP gets printed, greatly undermining the usefulness of the firewall logs.
- Fixed Issue 43278: If a user configures an outbound BGP filter to match the default route or prefix, and then sets an AS Path prepend, the default route or prefix is advertised to the BGP neighbor, but no AS Path prepend occurs.
A BGP outbound neighbor filter set to match a prefix and set AS Path prepend does not work on any prefixes originated by using the BGP Advanced configuration "Networks" statement. This also does not work for a default route originated to a neighbor via the neighbor "DR Originate", via the BGP Advanced "Conditional" Default Route Advertise check box.
Also, when using a static route configuration on the VMware SD-WAN Edge, neither a default route nor a non-DR static route set to "Advertise" would be advertised to a BGP neighbor with the AS Path prepended; the only AS in the prefix is the Edge's own AS.
- Fixed Issue 44379: New flows may fail to create on a VMware SD-WAN Gateway.
The issue occurs as a result of the Gateway not starting its flow clean-up event at bootup, which causes flows via the Gateway to eventually fail.
- Fixed Issue 44526: For an enterprise where two different sites deploy their VMware SD-WAN Edges as Hubs while also using a high-availability topology, and each site uses the other Hub site as a Hub in its profile, if one of the Hub sites triggers an HA failover, it may take up to 30 minutes for both Hub Edges to reestablish tunnels with each other.
On an HA failover, both Hub Edges try to initiate a tunnel with each other at the same time and neither replies to the peer. The packet exchange between both Hubs occurs, but IKE never succeeds. This leads to a deadlock that has been observed to take up to 30 minutes to resolve on its own. The issue is intermittent and does not occur after every HA failover.
Without this fix the only way to prevent this issue from occurring is to use a workaround where the customer configures only one of the two HA Hub sites to use the other Hub site as a Hub for itself. For example, where there are two HA Hub sites, Hub1 and Hub2, Hub1 could have Hub2 as a Hub for itself in its profile, but Hub2 must not use Hub1 as a Hub in its profile.
- Fixed Issue 46489: If different Partner Gateway enabled profiles are assigned to multiple VMware SD-WAN Edges, the Edges will retain stale routing entries for the VMware SD-WAN Partner Gateways not assigned in their profile.
If different Partner Gateway enabled profiles are assigned to multiple Edges, the Edge keeps the routing entries which are learned from other Gateways, and those routes are considered stale entries. The customer impact is traffic not being routed correctly because the Edge is trying to send traffic to invalid routes for that profile.
- Fixed Issue 47244: On an activated VMware SD-WAN Edge 6x0 with DPDK enabled and using some Copper SFPs, the VMware SD-WAN Orchestrator UI will show the link as 'UP' even when no cable is inserted.
This issue results in user confusion as to the actual status of a particular SFP port for an Edge in the 6x0 model line.
Prior to this fix, the only way to resolve the issue was plugging in a random cable, and then unplugging that cable from the affected SFP port.
- Fixed Issue 48612: VMware SD-WAN Virtual Gateways and Edges using a X710/XL710 network adapter strip all Rx packet VLAN tags.
This issue would have a major impact for customers using a Gateway with such an adapter which also has DPDK enabled, as the Gateway would not be able to process VLAN tagged traffic on its handoff interface. This issue was traced to the i40evf DPDK driver, which sent opcode VIRTCHNL_OP_ADD_VLAN to the Physical Function (PF) to add VLAN filtered tags. However, the PF driver enabled VLAN tag stripping as part of the command to enable VLAN tag filtering, and consequently all VLAN tags were stripped.
- Fixed Issue 48958: A VMware SD-WAN Gateway may lose connectivity on a bonded interface.
When the VeloCloud Management Protocol (VCMP) and the WAN port are set to use the same port, the Partner Gateway VLAN handoff configuration may cause the Gateway to go offline due to ARP resolutions failing. With this fix, when VCMP and the WAN port are the same, the VLAN handoff configuration from the VMware SD-WAN Orchestrator will be rejected in the Gateway. Without the fix, the workaround is to not assign the same port for VCMP and WAN.
- Fixed Issue 50223: A VMware SD-WAN Edge using Release 3.4.x or higher will not send information for physical LAN interfaces via SNMP.
In releases prior to 3.4.x when VMware SD-WAN used net-snmp, LAN interfaces were sent via SNMP. In Release 3.4.x, we added our own snmpagent, which fetches the data from the command debug.py --interfaces and that output does not have information about LAN interfaces. The fix adds LAN interfaces to that command so that the snmpagent can send the data via SNMP.
- Fixed Issue 50422: Peer MAC address may be learned incorrectly via ARP when using VLAN tagged routed interfaces.
If you have a VLAN tag assigned to a routed interface and the next hop sends untagged ARP requests, it will cause the untagged MAC to be learned and can cause traffic to black hole depending on which entry is learned first.
Without this fix, the only workaround is to filter out untagged ARP requests from the next hop if you have a VLAN tag on the routed interface.
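The workaround above amounts to a simple filter rule. A minimal sketch follows; the function name and the representation of VLAN tags as optional integers are assumptions made for illustration and do not reflect the Edge's implementation.

```python
from typing import Optional


def should_drop_arp(interface_vlan: Optional[int], frame_vlan: Optional[int]) -> bool:
    """Return True for an untagged ARP frame arriving on a VLAN-tagged routed interface.

    None means "no VLAN tag". Frames arriving on untagged interfaces are never
    dropped by this rule.
    """
    return interface_vlan is not None and frame_vlan is None
```

Dropping such frames keeps the untagged MAC entry from being learned ahead of the tagged one.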
- Fixed Issue 52127: On a VMware SD-WAN Edge with BGP or OSPF enabled, the Edge may encounter a Dataplane Service failure following a timeout waiting for a blocking socket to return from Tx.
This is typically observed with the following signature:
#0 0x00007f5c09f34754 in send () from deps/lib/libpthread.so.0
#1 0x000000000092cc7a in vc_zeb_send_to_client (buf=0x7f5aa7fed580 "", length=<optimized out>, client=0x7f5ad402ec90) at /mnt/build/workspace/master-nightly-build/common/libs/drp/vc_zebra.c:188
The Edge routing and dataplane processes communicate with a TCP socket on the localhost. Depending on the runtime of some threads, it is possible for the task queuing system (ksoftirqd) on the local core to receive minimal runtime to deliver packets to the opened socket to the routing process, leading to a blocked Tx call for the OSPF and/or BGP thread.
The thread priority of the OSPF and BGP threads is now reclassified on all cores so that the kernel scheduler preempts these threads rather than relying on them to voluntarily yield the core. This allows for more runtime and cooperative pre-emption with ksoftirqd.
Note: This same issue is also tracked in duplicate ticket 39232, which is omitted from these Notes.
- Fixed Issue 53283: The link mode specified via a business policy is not honored when 1:1 NAT is also configured.
For outbound traffic, while finding the matching 1:1 NAT rule (if configured), the link mode configured via the business policy is ignored and the first matching 1:1 NAT rule is always selected. For example, the business policy asks to select a link as ‘mandatory’, but 1:1 NAT results in the selection of another link which is specified in the first matching 1:1 NAT rule. Without the fix, the only way to address the issue is to execute a Remote Diagnostic > Flush Flows for the affected VMware SD-WAN Edge.
- Fixed Issue 53750: A VMware SD-WAN Gateway may suffer a Dataplane Service failure and restart that service as a result.
If a VMware SD-WAN Edge detects loss for the traffic received from the Gateway, it sends a negative-acknowledgement (NACK) message to the Gateway to retransmit the lost packets. The Gateway checks the retransmission slots to retransmit the packets. Ideally the Gateway should stop retransmission once all the slots are checked, but the Gateway checks the retransmission slots repeatedly until it reaches the sequence number in the NACK message, and this can cause the monitor thread in the Gateway to detect this as a hung thread and restart the Gateway.
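The difference between the faulty and the intended retransmission scan can be illustrated with a minimal sketch. The slot ring, its size, and all names below are hypothetical and are not taken from the Gateway source; the sketch only shows why bounding the scan by the slot count caps the work done per NACK.

```python
from typing import Dict

NUM_SLOTS = 8  # hypothetical number of retransmission slots


def slot_checks_unbounded(slots: Dict[int, bytes], start_seq: int, nack_seq: int) -> int:
    """Faulty pattern: keep checking slots until the NACK sequence number is reached."""
    checks = 0
    seq = start_seq
    while seq <= nack_seq:
        slots.get(seq % NUM_SLOTS)  # the same slots are revisited on every wrap
        checks += 1
        seq += 1
    return checks


def slot_checks_bounded(slots: Dict[int, bytes], start_seq: int, nack_seq: int) -> int:
    """Intended pattern: stop once every slot has been checked once."""
    checks = 0
    seq = start_seq
    while seq <= nack_seq and checks < NUM_SLOTS:
        slots.get(seq % NUM_SLOTS)
        checks += 1
        seq += 1
    return checks
```

When the NACK sequence number is far ahead of the current position, the work done by the first function grows with the gap, while the second is capped at `NUM_SLOTS` checks.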
- Fixed Issue 54001: A VMware Edge is unable to send traffic after a Tx queue hang on SFP interfaces.
In rare cases, when the Edge sends an invalid sized packet (less than 17 bytes or greater than 1526 bytes) to DPDK, the transmit queue becomes stalled and causes any further traffic to not be forwarded by the Edge. Rebooting the Edge temporarily corrects the issue, but the problem can happen again when an invalid sized packet is sent from the Edge service to DPDK. Only upgrading to a level with the fix avoids this problem.
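The size bounds quoted above can be expressed as a simple guard. This is only a sketch of a defensive check using the stated limits; the function names are hypothetical, and the notes do not say that the actual fix is implemented this way.

```python
from typing import Iterable, List

MIN_TX_BYTES = 17    # packets shorter than this are described as invalid
MAX_TX_BYTES = 1526  # packets longer than this are described as invalid


def is_valid_tx_size(length: int) -> bool:
    """Return True when the packet length is within the stated valid range."""
    return MIN_TX_BYTES <= length <= MAX_TX_BYTES


def filter_tx(packets: Iterable[bytes]) -> List[bytes]:
    """Keep only packets whose size is valid for transmission."""
    return [p for p in packets if is_valid_tx_size(len(p))]
```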
- Fixed Issue 54136: On a site using a High Availability topology, a VMware SD-WAN Active Edge may initiate a Dataplane Service restart, resulting in an HA failover.
The Active Edge consumes a high amount of memory when there are a high number of flows (1.9 million flows per second). When memory consumption reaches a critical level, the Edge will restart to clear the memory and cause a failover.
- Fixed Issue 56218: For a customer site deployed with a High-Availability topology or where HA has just been enabled, when the Edges are upgraded from 3.2.x to 3.4.x, the Standby Edge may go down.
When HA is enabled or the HA Edges are upgraded from 3.2.2 to 3.4.x after a WAN setting is configured using the Local UI, the HA interface (e.g. LAN1 or GE1 depending on the Edge model) will be removed from the Standby Edge and HA status will be set to HA_FAILED on the VMware SD-WAN Orchestrator.
- Fixed Issue 56346: A customer may observe Handoff Queue Drops when looking at a VMware SD-WAN Edge's Monitor > System page.
A VCRP (VeloCloud Route Protocol) route event update leads to handoff queue drops in the VCMP (VeloCloud Management Plane) data thread. This is because when a route update is received, all the routes in the respective segment are invalidated. This leads to new route lookups in the data path. A particular function that is called as part of the route lookup does a costly hash enumerate operation leading to 40% increased VCMP data thread utilization. For the instance when this issue was found in the field, the quantity of handoff queue drops was not sufficient to impact network performance.
- Fixed Issue 56379: A VMware SD-WAN Gateway may exhaust its memory and suffer a Dataplane Service Failure and restart.
When a VMware SD-WAN Hub Edge is scaled for a high number of Spoke Edges (for example, ~4000), and when BGP flaps for this scaled Hub Edge's routes, the Hub Edge's Primary Gateway sends update messages to all of the Spoke Edges in a single instance. However, if there are multiple BGP flaps in rapid succession, this will build up multiple update messages before they are sent out. This accumulation of update messages can cause the memory to be exhausted and trigger a Gateway restart.
- Fixed Issue 56483: Packet loss, jitter, and latency values not showing in WAN link live monitoring on a VMware SD-WAN Orchestrator under the Monitor > Transport screen.
A user is unable to get real time data for packet loss, jitter, or latency for a particular WAN link under Monitor > Transport, with the graph showing as a flat line. In addition, when looking at the Monitor > Edge > Overview screen, all values for loss, jitter, and latency are expressed as '0'. Historical statistics will show correctly in Monitor > Transport; this issue only affects "Live Mode" statistics.
- Fixed Issue 56645: There are frequent WAN link flaps when a VMware SD-WAN Edge 610 is connected to certain Meraki Access Points.
When an Edge 610 is connected to a Meraki M36 access point (or similar models), the Ethernet link encounters frequent link drops. This is the result of a driver issue on the Edge 610.
- Fixed Issue 56876: A VMware SD-WAN Edge may encounter an issue related to memory management and trigger a kernel panic, which will result in an Edge reboot.
This resolved issue includes fixes for two different scenarios involving memory management on an Edge which triggers a kernel panic:
In the first scenario, where an Edge is using Dynamic Branch-to-Branch, the dynamic tunnels are created, and a small amount of memory is reserved for storing per-peer counters. When the dynamic tunnel is torn down, this memory is not cleaned up so as to optimize the bring up time the next time this same peer connects. On a small Edge (e.g., Edge 500, 510, 520, 610) which connects to a large number of different destinations over time, this can eventually exhaust available memory and trigger a kernel panic and an Edge reboot. Without this fix, a user needs to proactively restart the Edge's service if memory usage is greater than 90% of health statistics when looking at an Edge's Monitor > System screen on the VMware SD-WAN Orchestrator.
In the process of fixing the memory leak caused by Dynamic Branch-to-Branch, it was noted that malloc_trim (a process that clears up fragmented memory) was not being properly invoked and this process was modified as well for this fix. Not invoking malloc_trim properly can cause a different issue that can affect any Edge (not just smaller Edges). It does not require the Edge to be using Dynamic Branch-to-Branch, nor does it require Monitor > System to show memory usage exceeding 90%. This scenario is much more likely to occur if the Edge has a high number of flows.
- Fixed Issue 57011: For a site configured with a High-Availability topology, whenever segments are added and then deleted on that site, one of the HA Edges may experience a Dataplane Service failure and if the service failure is on the Active Edge, the site would also experience an HA failover.
When segments are added and then deleted from an HA site, there is the potential for stale segments (in other words, the deleted segments might still show up on one of the Edges in the HA pair). Due to this mismatch in segment information between the HA Edges, any event meant for the stale segment might be sent to the other Edge resulting in a Dataplane Service failure, an HA failover if the service failure is on the Active Edge, and the generation of a core dump that will be found on a diagnostic bundle taken after the failover.
- Fixed Issue 57859: A VMware SD-WAN Edge which has just been activated is unable to communicate with its VMware SD-WAN Bastion Orchestrator and is thus marked as offline by the Orchestrator.
The issue is seen when the WAN link selected to send traffic to the Bastion Orchestrator does not have any IP address assigned for sending direct traffic.
- Fixed Issue 58075: On a VMware SD-WAN Edge where High Availability has been enabled, an SNMP walk/query will get timed out.
SNMP query output will return only partial results and will ultimately get a timeout on a HA enabled Edge.
- Fixed Issue 58259: In some cases a customer may observe a VMware Non SD-WAN Destination tunnel down on the Gateway side with a Zscaler peer.
There are some cases when the Zscaler peer end deletes Phase 2 security association (SA) but the VMware SD-WAN Gateway still retains the SA. In these cases the tunnel will be torn down, and the customer will not be able to pass traffic.
Without the fix, the workaround is a phase2_sa_check.py script which walks over the Phase 2 SA table and checks if there is a Phase 2 SA for which the Phase 1 SA is missing. If it finds one, the Gateway reestablishes the tunnel.
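The kind of check described above can be sketched as follows. This is not the actual phase2_sa_check.py script; the `Phase2Sa` record, its fields, and the function name are hypothetical, and the sketch only shows the table walk that identifies Phase 2 SAs without a matching Phase 1 SA.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Phase2Sa:
    peer: str  # hypothetical identifier of the tunnel peer
    spi: int   # hypothetical identifier of the Phase 2 SA


def find_orphaned_phase2(phase2_table: Iterable[Phase2Sa],
                         phase1_peers: Set[str]) -> List[Phase2Sa]:
    """Return the Phase 2 SAs whose peer has no Phase 1 SA."""
    return [sa for sa in phase2_table if sa.peer not in phase1_peers]
```

Each entry returned would be a candidate for reestablishing the tunnel.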
- Fixed Issue 58527: When running the Remote Diagnostic "List Active Flows", the business policy name in the output is trimmed to 24 characters instead of the expected 32 characters.
In .edge.info, the configured biz_policy name is listed properly (even if it occupies the full length of the biz_policy_name field). However, when displaying the biz_policy_name in the user_flow_dump/flow_dump output, only 24 characters are used to store the policy name, so the configured biz_policy name is not completely displayed.
- Fixed Issue 58535: When a customer has configured a Stateful Firewall, and under Network & Flood Protection has also configured a Denylist, the Denylist automatically sets itself to the most aggressive settings for new connections and the Stateful Firewall blocks any new connection.
The issue has a critical impact for customers using a Stateful Firewall as it renders the Denylist feature unusable. Once the Denylist feature is enabled the Firewall Events are filled with the logs: "FLOOD_ATTACK_DETECTED" and "Blacklisting source: xxx.xxx.x.x exceeded CPS limit : 0 per source", where the IP address is the Edge's management IP address and CPS = Connections Per Second. The New Connection Threshold limit is being set to 0%, which effectively means any connection attempt will trigger the Denylist to block all connections. The default value of New Connection Threshold is 25%.
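The effect of a 0% threshold can be shown with a small arithmetic sketch. How the Edge derives the per-source limit from the percentage is not stated in these notes; the sketch assumes the limit is a percentage of some baseline connections-per-second value, and both function names and the baseline are hypothetical.

```python
def cps_limit(baseline_cps: int, threshold_pct: int) -> int:
    """Per-source connections-per-second limit as a percentage of a baseline (assumed model)."""
    return baseline_cps * threshold_pct // 100


def exceeds_limit(observed_cps: int, limit: int) -> bool:
    """Return True when a source's connection rate is above the limit."""
    return observed_cps > limit
```

Under this model a threshold of 0% yields a limit of 0, so a source making even one new connection per second exceeds it, which is consistent with the "exceeded CPS limit : 0 per source" log message.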
- Fixed Issue 58567: On a site configured with a High Availability topology where VNFs are also configured, there may be frequent HA failovers due to a VNF being down.
When a Check Point VNF is deployed with HA, the VMware SD-WAN Edge monitors the VNF state using SNMP queries. If the VNF state is marked down for 3 consecutive tries, the Edge determines the VNF to be down and initiates an HA switchover. The problem is that a Check Point VNF can intermittently take more than 1 second to respond to SNMP queries. As a result, the Edge can come to an erroneous conclusion regarding the Check Point VNF's status and mark it down when it is up. Repeating this mistake several times causes frequent HA failovers.
Without the fix, the only way to address this issue is to increase the number of SNMP retries to a higher value than 3 before determining that VNF is down. This can be configured in /opt/vc/etc/vnf/default.json by modifying the field "snmp_retries" and powering the VNF off and then on.
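The monitoring rule described above can be sketched as a consecutive-failure counter. The function name and the use of a 1 second timeout per query are assumptions drawn from the description; this is an illustration of why a larger retry count tolerates intermittent slow replies, not the Edge's implementation.

```python
from typing import Iterable


def vnf_considered_down(response_times_s: Iterable[float],
                        timeout_s: float = 1.0,
                        retries: int = 3) -> bool:
    """Return True once `retries` consecutive queries exceed the timeout."""
    consecutive = 0
    for response_time in response_times_s:
        if response_time > timeout_s:
            consecutive += 1
            if consecutive >= retries:
                return True
        else:
            consecutive = 0  # a timely reply resets the failure count
    return False
```

With three slow replies in a row, the default of 3 retries marks the VNF down, while a higher retry value does not.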
- Fixed Issue 58678: If Dynamic Edge-to-Edge is enabled and an invalid Dynamic Edge-to-Edge control message is received on a VMware SD-WAN Edge, the Edge can experience a Dataplane Service failure.
To create a tunnel to the peer Edge, the Edge requests Dynamic Edge-to-Edge information from the VMware SD-WAN Gateway. If the reply message from the Gateway is corrupt, it can cause the Edge to restart as proper validation is missing for some of the fields.
- Fixed Issue 58688: When using VMware Edge Network Intelligence an incorrect random IP address is associated with the client MAC address in the Edge Network Intelligence data.
The VMware SD-WAN Edge wrongly sends a WAN-side public IP address instead of the LAN-side one. Because of this the Edge Network Intelligence data shows mismatching IP and MAC association.
- Fixed Issue 58830: VMware SD-WAN Edge is dropping traffic from a routed client to a VCMP server with catch-all NAT subnet in Partner Gateway.
Ping from an Edge routed client to a VCMP server fails. The ping fails in the scenario where a default static route is advertised from a Partner Gateway to an Edge and the Edge itself has a local default static route configured pointing to an underlay L3 switch next hop for routed client reachability. Here the Edge drops the ICMP reply packet from the VCMP server with the error "rfc1918 cloud route match".
- Fixed Issue 59008: The link internalID for multiple USB links may be the same across several VMware SD-WAN Edges, causing incorrect USB link statistics on the VMware SD-WAN Orchestrator.
The USB links in different Edges can have the same internal ID assigned. As a result, the monitoring across different Edges for a customer is impacted, as some data will be missed.
- Fixed Issue 59236: For sites using an Enhanced High-Availability topology, tunnels are not formed if the WAN link connected to the Standby Edge is a Metanoia SFP and this behavior persists even after an HA failover.
For Enhanced HA, the WAN ports are blocked on the Standby Edge (in other words, the Edge does not allow TX on its WAN interfaces). In order to bring up a Metanoia SFP interface, there is a packet exchange needed between the hardware. As the Edge does not allow TX, the interface initialization does not succeed.
- Fixed Issue 59527: On a site configured with a High Availability topology and where VNFs are deployed in HA as well, a customer may experience repetitive outages due to HA failover happening repeatedly.
When a VNF HA is up and all the LAN-side links go down for both Edges, this issue is triggered and will continue until LAN connectivity is restored in at least one of the Edges in the HA pair.
- Fixed Issue 59629: On a customer site deployed with a High-Availability topology, the user may observe the VMware SD-WAN Standby Edge restarting multiple times.
Both the Active and Standby Edge miss their HA heartbeat and both Edges become Active/Active (also known as "Split Brain"). To break the tie, the newly promoted Active Edge (the previous Standby Edge) will undergo a restart with a logging event: "Active/Active Panic". The fix for this issue involves promoting the priority of the HA Edge heartbeat thread so as to minimize the delay in processing the heartbeats which can be viewed as missed heartbeats causing the Active/Active state.
- Fixed Issue 60006: When HA is enabled on hardware-based VMware SD-WAN Edges like the 620 and 640, the Standby Edge may reboot.
When HA is enabled on a 620 or 640 (these are the models on which this issue has been observed), the Standby Edge may detect an Active/Active panic and the Standby Edge would reboot to correct the Active/Active state. This issue is caused by the following: during Edge initialization there is a chance of a race condition between the HA interface initialization and the HA State-machine initialization. In other words, the HA state machine starts much earlier than the HA interface driver initialization completes and as a result the HA state machine detects no heartbeat from the peer Edge and moves to an Active state. This issue happens infrequently and should it happen for a particular site, it is unlikely it would happen twice in the same session. In other words the site is not expected to get into some endless cycle of Standby Edge reboots.
- Fixed Issue 60010: For a site using VMware SD-WAN Edges with VNF deployed in a High-Availability topology, the VNF on the Standby Edge is not accessible via SSH after a LAN-side port flap.
The LAN side interface on the Standby VNF is normally in a deactivated state. Due to the LAN-side port flap, it moves to a forwarding state, which results in a wrong MAC address port mapping on the bridge interface and makes the VNF inaccessible.
- Fixed Issue 60073: DNS packets via a VMware SD-WAN Edge's PPPoE interface are not processed.
DNS packets that traverse the Edge's PPPoE interface are not processed and are dropped. As a result, DNS over PPPoE functionality is impacted and a customer would observe, for example, issues such as CSS tunnels not coming up after an upgrade to Release 4.2.0 or later.
- Fixed Issue 60130: A site may experience intermittent periods of high packet loss and connectivity issues.
This is caused by the API that checks for ARP resolution telling the Edge there is a successful ARP resolution for a device while delivering a MAC address of 00:00:00:00. This address is kept in the ARP cache and any packets intended for the device where the MAC is listed as zero are dropped. In this issue, many such instances of successful ARPs with zero MAC addresses are delivered, causing high packet loss and connectivity issues.
This fix corrects issues with the cached value of MAC addresses in a flow (the most common cause for the problem), however this fix does not address a rarer scenario where the ARP caches itself and then returns a zero MAC. That will be addressed in 62552. Other than having an Edge image with the fix, there is no workaround for this issue.
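A defensive check against caching an all-zero MAC address can be sketched as below. The `ArpCache` class and its methods are hypothetical and are shown only to illustrate the idea of rejecting a zero MAC before it is stored; the notes do not describe how the actual fix is implemented.

```python
from typing import Dict, Optional


def is_zero_mac(mac: str) -> bool:
    """Return True when every octet of a colon- or dash-separated MAC is zero."""
    octets = mac.replace("-", ":").split(":")
    return all(int(octet, 16) == 0 for octet in octets)


class ArpCache:
    """Toy ARP cache that refuses to store an all-zero MAC address."""

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def learn(self, ip: str, mac: str) -> bool:
        """Store the mapping and return True, or return False for a zero MAC."""
        if is_zero_mac(mac):
            return False
        self._entries[ip] = mac.lower()
        return True

    def lookup(self, ip: str) -> Optional[str]:
        return self._entries.get(ip)
```

Rejecting the entry leaves the address unresolved, so a later ARP reply with a real MAC can still populate the cache.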
- Fixed Issue 60184: A Branch VMware SD-WAN Edge installs routes marked with uplink community from a non-profile Hub Edge (Dynamic Branch-to-Branch) and prefers these routes before everything else.
The non-profile Hub Edge is treated as a Branch Edge when Dynamic Branch-to-Branch is used. So, when there is a dynamic tunnel bring-up, the issue occurs as described. The only workaround is to add Hubs to all profiles but this cannot scale on larger networks where there are 20+ Hub Edges due to the enormous number of routes that would be created.
- Fixed Issue 60225: When running the Remote Diagnostic "Interface Status" for a VMware SD-WAN Edge, the output on the VMware SD-WAN Orchestrator for SFP interfaces shows the incorrect speed and duplex information.
The data on the Orchestrator is incorrect for SFP interfaces. For example, the Orchestrator shows 0 Mbps / half-duplex while the same data viewed directly on the Edge shows full duplex at 1000 Mbps, or something similar.
- Fixed Issue 60367: Stateful Firewall rules do not drop the first packet in a flow going to a VMware SD-WAN Edge IP even with a VLAN-specific drop rule in place.
Sending a ping to an Edge's VLAN IP is successful even with VLAN-specific Stateful Firewall drop rules. With VLAN-specific Stateful Firewall drop rules, the behavior is not consistent between a ping to a VLAN host and a ping to the VLAN IP of the Edge: the ping to the VLAN IP of the Edge is successful. The fix disallows ping to either the Edge VLAN IP or the VLAN host.
- Fixed Issue 60523: Ping fails to a routed-client IP address if a SLA probe is enabled.
The ICMP response packet is not processed by the Edge Dataplane Service if an SLA probe is enabled for the routed client IP address.
Without the fix, the only workaround is to deactivate the ICMP probe.
- Fixed Issue 60570: Path status may display the incorrect status when using the New UI on the VMware SD-WAN Orchestrator.
This is a display issue only and does not affect actual customer traffic. An example of this issue is the New UI showing a path as dead when an Edge diagnostic bundle log would show the path as stable.
- Fixed Issue 61361: When applying a software update to upgrade a VMware SD-WAN Edge 3400, 3800 and 3810 to Edge Release 3.4.5, 4.0.2, or 4.2.1, there is a chance the Edge models may not boot back up immediately after the update.
Release 3.4.5, 4.0.2, and 4.2.1 include a particular firmware update for the complex programmable logic device (CPLD), and the update triggers a reboot that can sometimes get "stuck", requiring a manual power cycle to restart the system.
Without the fix, a local user needs to manually power cycle the Edge to complete the update.
- Fixed Issue 61387: A VMware SD-WAN Gateway may suffer a Dataplane Service failure and restart when a user attempts to generate a diagnostic bundle for that Gateway.
This issue stems from a Gateway memory leak that can result in a high memory utilization on the Gateway. When the Gateway's memory usage is at a high enough level, generating a diagnostic bundle puts that memory usage into a critical state and triggers the Dataplane Service failure and restart.
- Fixed Issue 61433: In a cascaded Hub/Spoke topology where one VMware SD-WAN Edge is a Spoke to a Hub-cluster and also a Hub to another Spoke, the underlay routing changes on one of the Hub-cluster members might delete the routes from this Spoke/Hub Edge.
In the case of a cascaded Hub topology, the route removal on a Hub-cluster member triggered an unintentional delete message from the VMware SD-WAN Gateway to the Spoke Edge also serving as a Hub Edge. That message was due to an update from the deep Spoke that should have been ignored on the Gateway, but due to this issue, the Gateway sent a delete message to the Spoke/Hub Edge, resulting in the Edge losing those underlay routes (learned from the Hub cluster).
Without the fix in this build, there is no workaround for this issue involving a cascaded Hub topology. It is advised to avoid making an Edge a Spoke to a Hub-cluster and also a Hub to a deep Spoke, otherwise this issue may appear.
- Fixed Issue 61502: During activation of a VMware SD-WAN Edge, the download of the new software image to be applied is delayed indefinitely.
In an environment with unreliable network connectivity, or certain types of traffic throttling, the HTTPS download of the new software image can get stuck. Without this fix, should this scenario happen, please power cycle the Edge and wait for a couple of minutes. The download should restart automatically, though it will restart all the way from the beginning.
- Fixed Issue 61596: There is a performance degradation when Partner BGP is enabled with secure option and a static route is configured as unsecure, or vice versa.
The performance degradation is caused by a miscalculation of IP address maximum length when an unsecure static route picks up the secure flag from a BGP configuration. Initially the VMware SD-WAN Gateway does a route lookup and if it finds an insecure static route, the Gateway then checks if the BGP is enabled or not. If BGP is enabled, the Gateway checks to see the encryption set and then picks up the encryption set for BGP which is secure, and then fragmentation happens because the secure option is more conservative than insecure.
- Fixed Issue 61622: Google Drive traffic is misidentified as "Other TCP/UDP" or as "APP_UNKNOWN".
This traffic should be identified as "Google Documents (aka Google Drive)". The issue is caused by the Deep Packet Inspection (DPI) engine not having the most up-to-date subnets/ports for Google Drive.
- Fixed Issue 61725: For a site using a High-Availability topology where USB WAN links are used, running the Remote Diagnostic "HA Info" will result in errors.
When a USB/LTE modem is present or was previously present only on the VMware SD-WAN Active Edge and not on the Standby Edge, the Active Edge tries to fetch USB/LTE interface details on the Standby Edge, and the Edge throws an error since the USB/LTE interface is not present on the Standby Edge.
- Fixed Issue 61758: At a site using a VMware SD-WAN Edge 520 or 540, the customer would observe lower-than-expected throughput.
A customer can observe a drop of over 100 Mbps in throughput from what they should expect at a site using either a 520 or 540. The cause of this issue is stale ring buffers held open by a child process for a former Edge Dataplane parent process, which keeps a newly executed Edge process from opening ring buffers.
- Fixed Issue 62197: A VMware SD-WAN Gateway may restart its Dataplane Service.
The Gateway encounters a memory leak which occurs while syncing routes from itself to the VMware SD-WAN Orchestrator. When memory consumption reaches critical levels, the Gateway's dataplane service restarts to clear the memory, causing a brief disruption in customer traffic using the Gateway.
- Fixed Issue 62280: The VMware SD-WAN Edge's LAN subinterface is not showing in a traceroute from a routed host to a client connected through Edge-to-Edge.
When the traceroute is done from a host (not directly connected to the Edge) to a client in an Edge-to-Edge topology, the Edge's interface IP is not displayed in the output. This happens only when a VMware SD-WAN Gateway configuration is not done on the Edge interface on the path to the host.
Without the fix, the only workaround is to enable the Gateway configuration on the Edge interface connecting to that traceroute host.
- Fixed Issue 62552: A site may experience intermittent periods of high packet loss and connectivity issues.
This is caused by the API that checks for ARP resolution telling the Edge there is a successful ARP resolution for a device while delivering a MAC address of 00:00:00:00. This address is kept in the ARP cache and any packets intended for the device where the MAC is listed as zero are dropped. In this issue, many such instances of successful ARPs with zero MAC addresses are delivered, causing high packet loss and connectivity issues.
Note: Both issue 60130 and this issue have the same underlying behavior and cause, but the expected fixes for each ticket differ. 60130 will have a defensive workaround fix while 62552 will have a complete fix that prevents any recurrence of this issue.
- Fixed Issue 62620: On a site deployed with a High Availability topology, direct traffic to some of the destinations might stop working after HA failover.
The flows from the Active VMware SD-WAN Edge are synced to the Standby Edge along with the port allocated for the NAT entry so that when there is a failover, there is no disruption to traffic after failover. The port allocated on the Standby Edge is never freed even after the flow expires. So when there is a failover, there is a possibility of NAT port exhaustion and NAT failure. As a result, packets can get dropped in the Edge.
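The port leak described above can be illustrated with a toy port pool. The `NatPortPool` class and its method names are hypothetical; the sketch only shows that a port reserved for a synced flow must be returned to the pool when the flow expires, otherwise the pool is eventually exhausted.

```python
from typing import Dict, Iterable, Set


class NatPortPool:
    """Toy NAT port pool keyed by flow identifier."""

    def __init__(self, ports: Iterable[int]) -> None:
        self._free: Set[int] = set(ports)
        self._in_use: Dict[str, int] = {}

    def reserve(self, flow_id: str, port: int) -> bool:
        """Reserve a specific port for a flow, e.g. one synced from the Active Edge."""
        if port not in self._free:
            return False
        self._free.remove(port)
        self._in_use[flow_id] = port
        return True

    def expire(self, flow_id: str) -> None:
        """Release the flow's port back to the pool when the flow expires."""
        port = self._in_use.pop(flow_id, None)
        if port is not None:
            self._free.add(port)

    def free_count(self) -> int:
        return len(self._free)
```

If `expire` is never called for synced flows, `free_count` only ever decreases, which mirrors the exhaustion seen after a failover.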
- Fixed Issue 62685: If LAN side NAT is configured with the same outside IP for different LAN subnets with NAT type as source, traffic destined for the Cloud will not work.
For the outside IP used in LAN side NAT rules, we configure a static route and advertise it to the remote branches. For the return traffic to be routed to the correct LAN subnet, route lookup should be done based on the Inside IP configured in the LAN side NAT rule instead of the next hop in the static route. But for the return traffic from the cloud, the route lookup is done based on the next hop in the static route and traffic can get routed to the wrong LAN subnet.
- Fixed Issue 62815: Operator user is unable to access a VMware SD-WAN Edge via SSH through the Edge's VMware SD-WAN Primary Gateway.
The SSH session times out. When looking at the flow on the Edge, a flow is created, but the packet counters only show packets inbound, not outbound.
- Fixed Issue 62890: Application IDs are different when viewing statistics in Edge Network Intelligence and the VMware SD-WAN Orchestrator.
There are two different ways that applications are learned:
1. On the first packet, via a database of well-known SaaS application IP addresses and ports (e.g. Office 365), or by learning the IP of an application the Orchestrator had previously learned.
2. After a series of packets are analyzed using deep packet inspection (DPI).
The Edge Network Intelligence Application IDs were not being updated by #2. That means that first-packet applications like Office 365 are visible, but applications that require DPI (e.g. AnyDesk) are seen in the Orchestrator but not in Edge Network Intelligence.
- Fixed Issue 63056: A VMware SD-WAN Edge may encounter a kernel panic with a resulting reboot and core.
The mutex mon process fails with SIGXCPU and a core is triggered. The fix allows all threads to use both cores and moves the Edge Dataplane Service < > frr communication to Unix Domain sockets, which provide all the benefits of TCP sockets with better performance and without the heavy kernel overhead.
- Fixed Issue 63141: For a site using an Enhanced High-Availability topology where Metanoia ADSL2+ SFP modules are being used, on a failover the ADSL2+ SFP modules fail to come up.
When an HA Edge fails over or if the network is restarted, an Edge with an ADSL2+ configuration fails to come up.
- Fixed Issue 63359: For a site configured with a High-Availability topology and OSPF and where the VMware SD-WAN Edges are using a MGMT IP Edge build, when these Edges are upgraded from a 3.4.x to a 4.2.x MGMT-IP build, OSPF connectivity may be broken post-upgrade.
When the HA Edges are upgraded to a 4.2.x MGMT IP build, the HA system may define its Router ID as 169.254.2.2. This is not the expected behavior given that the Edge selection of Router ID should not take the HA interface's IP Address into account. This Router ID breaks OSPF connectivity and there is a complete disconnection as route exchange no longer occurs.
Without the fix, the only workaround is to restart the Edge service (triggering an HA failover) as this will force a reselection of the Router ID which should be a correct one after the restart.
- Fixed Issue 63362: For a site using an Enhanced High-Availability Topology, a DHCP/PPPoE enabled interface stops sending traffic after the Standby Edge is either rebooted, or power cycled.
In an Enhanced HA topology if DHCP/PPPoE is enabled on a proxy interface (in other words, the HA link state is set to USE_PEER) it fails to get an address from the server after the Standby Edge either reboots or power cycles.
Without the fix, the only workaround is to either change the dynamic address to a static address type or do a forced HA failover to get an IP address from the server.
- Fixed Issue 63513: For a customer using Edge Network Intelligence, the software version displayed for the VMware SD-WAN Edge is not updated after an Edge software upgrade.
The Edge has in fact upgraded to the latest Edge Network Intelligence version, but the Edge communicates the older version number to the VMware SD-WAN Orchestrator and this is what the customer observes. The customer encounters the issue after upgrading an Edge. After the Edge is upgraded from an older to a newer release, the customer continues to see the older release version for Edge Network Intelligence.
- Fixed Issue 64205: User will observe a high number of handoff queue drops of VCMP Data for a VMware SD-WAN Gateway, leading to a poor user experience.
When there are continuous flow create events, the packet processing on VCMP (VeloCloud Management Protocol) Data thread gets slower. This fix reduces the VCMP Data thread load by redirecting VCMP Control messages to a different thread and by eliminating some of the continuous log messages.
- Fixed Issue 64633: A customer who uses a Non SD-WAN (NSD) via Gateway to connect to a VMware Cloud (VMC) on AWS peer may observe an intermittent traffic drop lasting ~30 seconds each time.
This issue is observed with VMware Cloud (VMC) on AWS only. The peer starts an IKE rekey 30 seconds before the security association (SA) expiration and after each rekey the peer retains the old SA and uses it until its expiration, while the VMware SD-WAN Gateway deletes the inbound SA. The deletion of the inbound SA causes the traffic drop with this peer. The frequency of this issue is contingent on the peer's rekey policy. If the peer rekeys every 45 minutes, then this issue would happen every 45 minutes, if 12 hours, then every 12 hours. The traffic will recover automatically after ~30 seconds by itself, when the peer switches to the new SA.
- Fixed Issue 64961: A VMware SD-WAN Edge may experience a Dataplane Service Failure and restart that service if processing IP packets that include options.
The processing of IP packets with options could result in a Dataplane Service Failure due to incorrect parsing of the options fields (the parsing continues beyond the end of the options list). The Dataplane Service failure is triggered by mutex mon. Without this fix, the only way to minimize the risk of this issue is to avoid setting options other than Record Route (RR) and No Option (NOP) in the user-traffic IP packets.
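The class of problem described here, reading past the end of the options list, can be illustrated with a bounds-checked walk over an IPv4 options field. This is a minimal sketch of the general technique based on the option layout in RFC 791, not the Edge's parsing code; the function name `parse_ipv4_options` is hypothetical.

```python
def parse_ipv4_options(options: bytes) -> list:
    """Walk an IPv4 options field without reading past its end.

    Returns a list of (option_type, option_data) tuples and raises
    ValueError on a malformed option instead of over-reading.
    """
    EOL, NOP = 0, 1  # single-byte options defined in RFC 791
    parsed = []
    offset = 0
    while offset < len(options):
        opt_type = options[offset]
        if opt_type == EOL:
            break  # End of Option List: stop parsing
        if opt_type == NOP:
            parsed.append((NOP, b""))
            offset += 1
            continue
        if offset + 1 >= len(options):
            raise ValueError("option length byte missing")
        opt_len = options[offset + 1]  # length covers type and length bytes too
        if opt_len < 2 or offset + opt_len > len(options):
            raise ValueError("option length out of bounds")
        parsed.append((opt_type, options[offset + 2 : offset + opt_len]))
        offset += opt_len
    return parsed
```

The two length checks are the important part: every multi-byte option must declare a length of at least 2, and that length must fit inside the remaining bytes of the options field.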
- Fixed Issue 65037: An HTTPS/SSL connection may fail to establish because of a corrupt certificate if the certificate has special characters or spaces in the SSL common name field.
The VMware SD-WAN Edge inspects all user traffic passing through it so that it may identify the application to which the traffic belongs. This is needed for correctly applying business policies and also for the VMware SD-WAN Orchestrator to display per-application statistics on the Edge's Monitoring page. However, an issue in the application identification code caused a byte in the SSL common name to be overwritten when the SSL common name had special characters or spaces, thereby corrupting it.
- Fixed Issue 65186: For a customer site using multiple WAN links, if there is a business policy configured to use one link with a Preferred or Mandatory policy, the traffic type covered by the business policy continues to be load balanced across all available links.
Even though the Business Policy is configured to route traffic to one WAN Link using a mandatory or preferred configuration, traffic would be load balanced on multiple WAN links.
- Fixed Issue 65219: A KVM SR-IOV type VMware SD-WAN Gateway using an i40evf driver drops customer packets of 1500 bytes or greater.
Anything less than 1496 byte data size will not be dropped. If a user attempts to SSH into the Gateway host, the user will observe a hang based on the condition described.
- Fixed Issue 65293: The throughput performance of a VMware SD-WAN Gateway deployed in AWS and running with Amazon's Elastic Network Adapter (ENA) driver is degraded when using Release 4.x.
This issue will occur if the Gateway is upgraded to a 4.x build (from 3.x) or on a new deployment using a 4.x build. Gateways using Release 4.0.0 or later have DPDK v19.11, and starting from DPDK v19.02, Amazon's ENA driver uses low-latency queuing (LLQ). However, for LLQ to work efficiently the write-combine for memory setting must be enabled per the ENA reference guide. If memory mapping is not write-combined, a Gateway deployed on AWS experiences high CPU usage, significantly impacting throughput. The fix for this issue enables write-combining on the ENA adapter for Gateways deployed on AWS.
- Fixed Issue 65432: A traceroute from a client which is LAN-side connected to a VMware SD-WAN Edge to a DC server via a VMware SD-WAN Gateway does not display the Gateway IP in the traceroute output.
On initiating a traceroute from the LAN client to the DC which is reachable through the Gateway, the traceroute displays all the hops except the Gateway IP.
- Fixed Issue 65521: A VMware SD-WAN Edge may encounter a Dataplane Service failure and restart as a result.
An Edge service restart will disrupt customer traffic for ~5-10 seconds. The Edge Dataplane Service fails while processing an unexpected control message during a VeloCloud Management Protocol (VCMP) tunnel creation handshake. This issue is not dependent on network topology, number of flows, or throughput. It is both rare and random, but has the potential to occur on any type of customer enterprise.
- Fixed Issue 65539: A BGP session established between two devices across two different branches does not come up when the customer has upgraded their VMware SD-WAN Edges to Release 4.2.x.
When a customer upgrades their Edges from a lower version to Release 4.2.x, the BGP sessions between 2 LAN devices of different branches established over VCMP tunnels will not come up.
- Fixed Issue 65839: For flows initiated from the clients behind a VMware SD-WAN Hub Edge to the LAN behind a Spoke Edge, the return traffic from the spoke is routed via the Partner Gateway if the default route is advertised from the Partner Gateway.
The expected behavior is for a flow that originates from a Hub Edge to return by the Hub Edge as well. If there is no default route or Edge-to-Edge route advertised from the Hub Edge to the Spoke Edge, the route lookup on the Spoke Edge for the return traffic matches the Partner Gateway default route and the return traffic is routed to the Partner Gateway instead of the Hub Edge.
Without this fix, the only way to avoid this issue is to advertise a default route or an Edge-to-Edge route from the Hub Edge to the Spoke Edge.
- Fixed Issue 65985: For a customer using Dynamic Edge-to-Edge, a VMware SD-WAN Edge in their network may abruptly drop all tunnels and then be unable to build tunnels to any other sites in the network.
Once the site drops all its tunnels, the Edge's maximum tunnel value becomes corrupted and shows a negative value for the maximum # of tunnels. This corrupted value prevents the Edge from forming any new Dynamic Edge-to-Edge tunnels to other Edges. The impact is severe as the Edge cannot communicate with any other site in the network.
Without the fix, the only way to clear this issue is to perform an Edge service restart or an HA failover for HA sites.
- Fixed Issue 66355: For a customer where the Stateful Firewall is enabled and at least one LAN side NAT (Many:1) rule is configured, inter-VLAN flows do not work.
With Many:1 LAN side NAT rules, the TCP state is not maintained properly for the inter-VLAN traffic and with Stateful Firewall also enabled, the packets will be dropped.
- Fixed Issue 66366: For a customer using multicast with a large number of neighbors, a VMware SD-WAN Edge may experience a Dataplane Service failure and restart, causing a brief disruption in customer traffic.
"Large number of neighbors" is defined as ~1600 PIM neighbors. In the case where this issue happens, while traffic is running for a group from 1600 Spoke Edges to one receiver behind a Hub Edge, the PIM service fails and this in turn causes the Edge service to also fail, causing the restart.
- Fixed Issue 66676: When a Business Policy NAT is configured, the return traffic from the VMware SD-WAN Gateway may not NAT back to the original source IP.
During the NAT entry insertion in the code, it is expected to delete the older entries. However, due to not using all keys for hash table look up, older entries were not getting deleted in some instances and this was causing the NAT entry insertion error.
- Fixed Issue 66714: User is unable to use a hostname for DHCP Option 150 on a VMware SD-WAN Edge.
If a user configures a hostname for DHCP Option 150, the attempt to obtain an IPv4 address from the Edge with a DHCP client will result in dnsmasq error messages in the Edge logs which refer to the hostname as a bad IP address, and the DHCP client obtains no IP address from the DHCP service on the Edge. While RFC 5859 was designed to use IPv4 addresses instead of a hostname, other current networking devices allow for the usage of a hostname for Option 150. So customers who are using hostnames on other devices would need to accommodate Edge devices so that the DHCP service on the Edge does not break.
- Fixed Issue 66801: For a customer site using a High-Availability topology and a VNF, the customer may not be able to connect to a VNF to perform trust establishment from a management server.
The issue is seen at HA sites when routed interfaces are DHCP enabled and there is no default route present in the kernel route table. In that case the kernel responds with "ICMP destination unreachable".
Without the fix, the workaround to prevent this issue is to add a default route on the Standby Edge so that the Edge does not send "ICMP Unreachable" back to the VNF VM, which would cause the SSH connection to reset.
- Fixed Issue 67060: VMware SD-WAN Edge may show a large memory utilization which may potentially cause an Edge service restart if sufficiently high.
The issue is a memory leak which manifests as a slow and continuous increase in memory utilization. The issue occurs when multiple HTTP request packets are sent for a single flow; the memory leak specifically happens while the Edge is parsing the HTTP request packets.
- Fixed Issue 67083: VMware SD-WAN Edge may experience a Dataplane Service failure and restart as a result, with a brief disruption of customer traffic.
In a few scenarios the VeloCloud Management Protocol (VCMP) data packets are processed with the wrong parameters (for example, a data packet is misclassified as a control packet), which triggers an exception and the service restart.
- Fixed Issue 67173: When the same route is learned from multiple IBGP neighbors, the second best route selected from the BGP process is being used by the VMware SD-WAN Edge resulting in a black-holing of certain customer traffic.
Due to an issue in the Free Range Routing suite (FRR), IBGP was sending multiple next-hops to the Edge and it was picking the second best (last in the next hop order) to update the forwarding information base (FIB). The fix includes a command in the BGP process to send only the best next hop to the Edge.
- Fixed Issue 67197: A customer network may experience periodic disruption of multicast service in deployments with more than 1500 sources associated with a multicast group.
A software issue in the PIM stack's join-prune message handling logic fails with an exception when handling join-prune updates in deployments with more than 1500 sources associated with a multicast group.
Without the fix, the only way to prevent this issue is to limit the total number of multicast sources to 1000.
- Fixed Issue 67259: Multicast traffic flow is disrupted when the PIM process restarts multiple times and PIM neighbors do not come up.
On a scale setup with 1600 PIM neighbors, when restarting the PIM process multiple times while traffic is running from 700 Spoke Edges to a receiver behind a Hub Edge, after one of the restarts, only 570+ PIM neighbors came up out of the 1600 PIM neighbors. The only way to clear this issue is to restart the Edge service.
- Fixed Issue 67790: For a customer enterprise which uses either BGP or OSPF and has configured an inbound filter(s) to ignore certain routes, when Dynamic Cost Calculation (DCC) is enabled on this enterprise, the inbound filter(s) will no longer be in effect and traffic will attempt to use those routes.
Prior to DCC being enabled, the forwarding information base (FIB) will not include the routes that were set to IGNORE on the BGP/OSPF inbound filter. After DCC is enabled the FIB now includes these routes and traffic will attempt to use these routes with the potential for significant traffic disruption for the customer enterprise.
Without the fix, the only workaround is to restart OSPF/BGP for the inbound filter to be properly applied.
- Fixed Issue 68840: For a customer using a High-Availability topology, SNMP polling is not able to retrieve LAN and WAN information from the VMware SD-WAN Standby Edge.
For HA SNMP GET, the Standby LAN/WAN count (vceHaStandbyLanItfNum and vceHaStandbyWanItfNum) is displayed either partially or not at all.
- Fixed Issue 68994: Customers who deploy a Non SD-WAN Destination (NSD) tunnel from a VMware SD-WAN Edge with a VMware SD-WAN Gateway may observe the tunnel flapping.
This issue is observed at tunnel establishment or at IKE rekey. Either the Edge or the Gateway deletes the security associations (SAs) based on IKESAID=0 which causes tunnel flapping. The tunnel automatically stabilizes, but the time needed to do this is not consistent and that can further the impact to customer traffic to the NSD.
- Fixed Issue 69497: The VMware SD-WAN MIB shows the vceLinkVpnState SNMP object even though that is no longer a valid object.
VMware SD-WAN no longer shows a differentiated VPN state on the VMware SD-WAN Orchestrator but still exposes this in SNMP. To be specific, the SNMP Collector polls for SNMP OID 1.3.6.1.4.1.45346.1.1.2.3.2.2.1.26, which it should no longer do.
- Fixed Issue 69681: If a VMware SD-WAN Edge is configured with Hot Standby WAN links and also uses SNMP polling, the user will observe SNMP errors.
The error messages would be similar to the following:
ERROR [oids (10028:MainThread:10028)] [VCE.Path]<update>: Path failed update buffer: KeyError('HOTSTANDBY_IDLE',)
INFO [oids (10028:MainThread:10028)] [VCE]<update_if_stale>: Current MIB buffer size: 217
DEBUG [oids (10028:MainThread:10028)] [VCE.Link]<ip2octet>: Failed to convert IP to Octet for caller <class 'vcsnmp.oids.Link'>[publicIpAddress] on []: ValueError("invalid literal for int() with base 10: ''",), used default value[00 00 00 00] instead
The cause of the issue is that SNMP path states do not include Hot Standby link states, and this causes SNMP issues including error messages.
- Fixed Issue 70154: For a customer enterprise where the Stateful Firewall is enabled, the user will observe packet drops when sending bidirectional pings between branch clients with the same ICMP ID.
If a ping is initiated from Client A in Branch 1 to Client B in Branch 2 and vice versa, the ICMP states for both the pings will be tracked with the same flow object if the ICMP ID is the same and this can lead to multiple packet drops because of the sequence number check.
Without this fix, the workaround is to either deactivate the Stateful Firewall or to generate pings with different ICMP IDs.
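The collision described above can be illustrated with a small model. This is a sketch under stated assumptions and not the Edge's flow table: the function names are hypothetical, and the sequence number check is modeled simply as "drop any echo request whose sequence number is not higher than the last one seen on the same flow object".

```python
def shared_flow_key(initiator: str, responder: str, icmp_id: int) -> tuple:
    """Direction-agnostic key: pings in both directions with one ICMP ID collide."""
    return (frozenset((initiator, responder)), icmp_id)


def per_initiator_flow_key(initiator: str, responder: str, icmp_id: int) -> tuple:
    """Key that also records which endpoint sent the echo request."""
    return (initiator, responder, icmp_id)


def count_drops(echo_requests, key_fn) -> int:
    """Count echo requests dropped by the assumed sequence number check."""
    last_seq = {}
    drops = 0
    for initiator, responder, icmp_id, seq in echo_requests:
        key = key_fn(initiator, responder, icmp_id)
        if seq <= last_seq.get(key, -1):
            drops += 1  # looks like an old packet on this flow object
            continue
        last_seq[key] = seq
    return drops


# Client A is already at sequence 100 when Client B starts pinging back.
requests = [("A", "B", 7, 100), ("B", "A", 7, 1),
            ("A", "B", 7, 101), ("B", "A", 7, 2)]
```

In this model, `count_drops(requests, shared_flow_key)` drops both of Client B's echo requests, while `count_drops(requests, per_initiator_flow_key)` drops none. Using different ICMP IDs, as the workaround suggests, separates the keys in the same way.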
- Fixed Issue 70310: For a customer using multiple segments, when one or more segments are deleted or deactivated, a VMware SD-WAN Edge may suffer a Dataplane Service failure and restart that service, causing a brief interruption of customer traffic.
When a segment is deleted, the Edge does not fully clean up the memory associated with this deleted segment. There are scenarios where the Active Edge synchronizes events to the Standby Edge by referencing such segments which results in a service failure on the Standby Edge as these segments are not present.
- Fixed Issue 70789: Customer may experience random drops in traffic due to IPsec Anti-Replay detection.
If either a VMware SD-WAN Edge or VMware SD-WAN Gateway receives two packets which each update the cache entry sequence number, then it is possible that the first packet will update the replay window incorrectly, which may trigger IPsec Anti-Replay detection which would cause the IPsec packet to drop.
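For background, an anti-replay window of the kind used by IPsec can be sketched as a sliding bitmap over recent sequence numbers, in the style described in RFC 4303. This is a generic illustration and not the Edge or Gateway implementation; the class name `ReplayWindow` and the default window size of 64 are assumptions made for the example.

```python
class ReplayWindow:
    """Sliding-window replay check over packet sequence numbers."""

    def __init__(self, size: int = 64):
        self.size = size
        self.top = 0      # highest sequence number accepted so far
        self.bitmap = 0   # bit i set means sequence number (top - i) was seen

    def check_and_update(self, seq: int) -> bool:
        """Return True if the packet is accepted, False if it must be dropped."""
        if seq == 0:
            return False  # sequence number 0 is never valid
        if seq > self.top:
            # New highest sequence number: slide the window forward.
            shift = seq - self.top
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = seq
            return True
        offset = self.top - seq
        if offset >= self.size:
            return False  # older than the window covers
        if self.bitmap & (1 << offset):
            return False  # already seen: replay
        self.bitmap |= 1 << offset
        return True
```

The window state must be advanced exactly once per accepted packet. If the highest sequence number or the bitmap is updated incorrectly, later legitimate packets can fall outside the window or appear to have been seen already, and they are dropped as replays.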
Resolved in Orchestrator Build R422-20220715-GA
Orchestrator build R422-20220715-GA was released on 08-10-2022 and is the 3rd Orchestrator rollup for Release 4.2.2.
This Orchestrator rollup build addresses the below critical issues since the 2nd Orchestrator rollup, version R422-20220112-GA.
- Fixed Issue 88796: When deploying either a VMware SASE Orchestrator or a VMware SD-WAN Gateway and using an OVA on vSphere, the OVF properties set as part of the deployment (password, network information, etc.) are not applied to the image and the system cannot be accessed after deployment.
This only affects a new system deployed from an OVA using OVF/vApp properties (versus using ISO files). This issue is caused by upstream changes to cloud-init in recent updates.
On an Orchestrator without the fix, the workaround is for the Operator to deploy the system using a cloud-init user-data ISO file.
Note: This entry tracks the Orchestrator OVA only. The Gateway side of this issue is fixed with build R422-20220518-GA and later.
- Fixed Issue 85883: When a VMware SD-WAN Edge is preconfigured to have a primary active WAN link and also a backup WAN link on the VMware SD-WAN Orchestrator, if the Edge is activated using the WAN link designated as the backup, the Orchestrator will continue to show this link status as active even after the primary WAN link is connected.
This is a cosmetic issue as the WAN link configured as a backup is being used as configured, meaning that after the primary WAN link is connected to the Edge, the backup WAN link's management tunnels are torn down and no traffic is being passed. The issue is that the Orchestrator is not accurately displaying the WAN link's backup status on the Monitor > Edge > Overview page which can cause user confusion as to the WAN link's actual status.
___________________________________________________________________
Orchestrator Build R422-20220112-GA
Orchestrator build R422-20220112-GA was released on 01-21-2022 and is the 2nd Orchestrator rollup build for Release 4.2.2.
This Orchestrator build remediates Apache Log4j vulnerabilities CVE-2021-44228 (which was first addressed in Orchestrator build R422-20211216-GA with Log4j version 2.16.0) and CVE-2021-45046, by updating to Log4j version 2.17.0. For updated information on the Apache Log4j vulnerabilities and their impact on VMware products, please consult the VMware Security Advisory VMSA-2021-0028.9.
___________________________________________________________________
Orchestrator Build R422-20211216-GA
Orchestrator build R422-20211216-GA was released on 12-20-2021 and is the 1st Orchestrator Rollup Build for Release 4.2.2.
This Orchestrator build remediates CVE-2021-44228, the Apache Log4j vulnerability, by updating to Log4j version 2.16.0. For more information on the Apache Log4j vulnerability, please consult the VMware Security Advisory VMSA-2021-0028.5.
___________________________________________________________________
Resolved in Version R422-20210920-GA
The below issues have been resolved since Orchestrator version R421-20210415-GA.
- Fixed Issue 20900: If the MaxMind geolocation service is enabled and cannot reach the MaxMind server, new VMware SD-WAN Edge activations will not work.
The Edge creates an HTTPS connection to the VMware SD-WAN Orchestrator in order to activate. The default timeout for the request is 120 seconds and for the proxied connection, it is 60 seconds. While the Orchestrator is attempting to geolocate the Edge (IPv4 remote address), the upload service waits for the response from the MaxMind service in order to proceed with the activation. Hence, after 60 seconds, NGINX stops waiting for the upload service’s response and closes the connection. Therefore the activation fails because of a 504 timeout from NGINX.
With the new system property service.maxmind.timeout.seconds, the MaxMind API call is made with a custom timeout. If the timeout is reached, the Orchestrator proceeds with the activation workflow and the Edge is successfully activated.
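The behavior of bounding the geolocation call and continuing without its result can be sketched as follows. This is a minimal illustration of the pattern, not Orchestrator code: `geolocate_with_timeout` and `activate_edge` are hypothetical names, `lookup` stands in for the MaxMind call, and `timeout_seconds` plays the role of service.maxmind.timeout.seconds.

```python
import concurrent.futures


def geolocate_with_timeout(lookup, ip: str, timeout_seconds: float):
    """Run lookup(ip), but give up after timeout_seconds and return None."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(lookup, ip)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        return None  # geolocation is unavailable; the caller carries on
    finally:
        # Do not block the caller while a slow lookup finishes in the background.
        executor.shutdown(wait=False)


def activate_edge(ip: str, lookup, timeout_seconds: float) -> dict:
    """Hypothetical activation step: a missing location does not block activation."""
    location = geolocate_with_timeout(lookup, ip, timeout_seconds)
    return {"activated": True, "location": location}
```

The design point is that the timeout on the geolocation call is kept well below the 60 second proxy timeout, so the activation response is returned before NGINX closes the connection.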
- Fixed Issue 45078: When configuring a VNF for a customer on the VMware SD-WAN Orchestrator, if a VNF state is configured at the Profile level one way, and then configured a different way at the site level using Edge Override, when Edge Override is later deactivated, the site continues to use the Edge Override settings and does not revert back to the Profile settings as expected.
This happens when the VNF Insertion parameter is configured on a Configuration Profile, the opposite setting is configured for a site using Edge Override, and Edge Override is later deactivated, but the setting persists.
- Fixed Issue 48706: Users may not be able to save changes on the Configure > Edge > Device tab with the source interface selected under the Syslog configuration.
The error the user would see on the VMware SD-WAN Orchestrator is "Provided source interface is not present in the segment on segment: <Segment Name>." This is caused by the user creating and deleting a number of segments in such a way that the segment sequence is no longer sequential.
- Fixed Issue 48791: User is unable to switch a VMware SD-WAN Edge between Profiles when the Edge has an interface configured using Edge Override.
For example, a customer has two Configuration Profiles, Profile 1 and Profile 2, and associates an Edge with Profile 1. If the user then uses Edge Override to configure GE2 as routed and adds a static route for GE2, when the user later tries to assign this same Edge to Profile 2, the user will observe an error that GE2 does not exist on Profile 2 as routed. This issue occurs because when a user configures an Edge interface using Edge Override that belongs to a profile, the VMware SD-WAN Orchestrator is unable to switch because the Orchestrator is not validating the Edge Override presence.
- Fixed Issue 52379: The VMware SD-WAN Orchestrator sends out an ‘Edge Down’ alert email even if the VMware SD-WAN Edge recovers within the configured delay interval.
Administrators can be falsely alerted of an Edge being down in their network even though they configured a delay to allow an Edge to be down for a period of time before triggering that alert.
- Fixed Issue 52863: The VMware SD-WAN Orchestrator UI allows non-standard BGP timer configurations and does not throw an error.
While enabling Partner handoff configuration at the customer configuration page, when a user configures BGP keep/hold timers on the Orchestrator that do not comply with the BGP standard in RFC 4271, the Orchestrator allows the configuration to be saved. However, on the VMware SD-WAN Edge itself, FRR changes the keep/hold values to comply with standards. For instance, if a user configures a 2 second keep/5 second hold on the Orchestrator, the Edge FRR will change the keep value to 1 second so that 3 x keep is less than or equal to hold.
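The timer relationship in the example above can be written as a short check. This is a sketch of the rule as stated in this entry, not FRR's code; the function name `effective_keepalive` is hypothetical, and whole-second timer values are assumed.

```python
def effective_keepalive(keepalive: int, hold: int) -> int:
    """Clamp the keepalive so that 3 * keepalive <= hold (whole seconds)."""
    if hold == 0:
        return 0  # RFC 4271: a hold time of zero means keepalives are not sent
    if hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(keepalive, hold // 3)
```

For the configuration in the example, `effective_keepalive(2, 5)` returns 1, which matches the value the Edge ends up using. A compliant pair such as 60/180 is returned unchanged.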
- Fixed Issue 53525: When using the New UI on a VMware SD-WAN Orchestrator and viewing the Edge overview page, the Links column does not show the state of the link (e.g., Backup, Standby).
This link state information is correctly shown on the Old UI and with this fix will show as expected on the New UI.
- Fixed Issue 53652: When a customer enterprise that is using a custom application map is upgraded from 3.x to 4.x, the customer may observe random names for their custom applications created prior to the upgrade.
Whenever a custom application map is configured with an Application ID (appId) which already exists as part of a default initial application map, the VMware SD-WAN Orchestrator will always show the display name of the default initial application map and override the customer defined name. This is also true when the Orchestrator is upgraded from a lower version to a higher version and the higher version default initial application map has an appId which conflicts with an appId of the custom applications created in a lower version. After the Orchestrator upgrade, those custom applications will show an incorrect display name which is the display name of the appId for the higher version's default initial application map.
- Fixed Issue 53857: A VMware SD-WAN Orchestrator deployment which uses a KVM image based on Release 4.0.0 will fail to deploy.
The reason for the failure is that the KVM image has an incorrect virtual disk size and the volumes will not expand to the required size. On a deployment, the Orchestrator scripts automatically expand Orchestrator volumes to take 80% of the maximum size of the underlying disks (physical volumes). In this case, because of the incorrect virtual size, that expansion is inadequate for Orchestrator database requirements and the deployment fails. It is possible to deploy an Orchestrator using an older build without this fix, but the volumes must be resized manually.
- Fixed Issue 53919: On VMware SD-WAN Orchestrator UI, when viewing historical data (>2 weeks) for a small time range (example: 1 hour), no data is being returned even though data is there and can be viewed when using a larger time range.
When querying for Edge statistics in a large time range (example: last 31 days), data appears for the whole time range. However, when zooming in at the historical data and querying for a small time range, it now says no data available.
- Fixed Issue 54546: The VMware SD-WAN Orchestrator UI does not display the Cloud Security Service or Non SD-WAN via Edge tunnels correctly on the Monitor > Edges page.
The issue may happen when there are multiple VMware SD-WAN Edges that use WAN links through USB interfaces for CSS or NSD via Edge tunnels. The Orchestrator is sorting the tunnel events by time and the latest event per ‘datakey’ is being used to determine the effective state. Since the key’s value is the same for many entries, some of the tunnels were left out. This is a display issue only with no customer impact beyond showing a false status.
- Fixed Issue 55871: Some API calls to REST APIv2 (/sdwan) cause the server to produce HTTP 500 errors.
In some cases where customer data does not conform precisely to the schema that the API expects, the API produces an HTTP 500 error rather than return data which is inconsistent with the documented API schema. This behavior was driven by a design decision that has since been revisited. Calls to "GET /enterprises", "GET /enterprises/{enterpriseLogicalId}/edges", and "GET /enterprises/{enterpriseLogicalId}/clientDevices" are known to be affected.
- Fixed Issue 57046: At the creation of a customer on the VMware SD-WAN Orchestrator, Operator access is disallowed. But when the Orchestrator is upgraded to a 4.0.x release, the MODIFY_ENTERPRISE_CUSTOM_ROLES and VIEW_PATH_STATS privileges are added by system patches without checking the Operator access.
Customer settings show permissions delegated to the Operator, and yet the Operator does not have the permissions to, for example, read network services or modify Edges/Profiles. So there is a false status displayed for what an Operator may do on that customer's enterprise.
- Fixed Issue 57163: Customer cannot receive notifications via SNMP trap for Cloud Security Service (CSS) or Non SD-WAN Destinations (NSD) via Edge tunnel alerts.
The issue occurs when a customer wants to use an SNMP trap to receive CSS/NSD via Edge tunnel alerts, but the SNMP traps are not being triggered for those events.
- Fixed Issue 58127: A user will observe several issues with Enterprise Reporting.
The issues include an ID field on the reports that should not be shown, the username is missing in the generated report, and a 50 report limitation. This ticket fixes the first two issues. The 50 report limitation is a System Property configuration designed to prevent Orchestrator performance issues. While an Orchestrator administrator could increase the report cap, it would entail some performance risk on the Orchestrator.
- Fixed Issue 58627: Users configured to receive Alerts may receive a Link Up Alert when in reality the link remains down.
Sometimes after a link is marked as 'Down', statistics for that link that were generated before the link went down may not be sent to the VMware SD-WAN Orchestrator for up to a minute after the event. Once the Orchestrator receives these lagging link statistics, it is fooled into thinking the Link is back up and thus triggers a Link Up alert if the Alert settings are aggressive (e.g. 0 minute delay). The fix ensures that the Orchestrator does not interpret delayed link statistics as indicating that the link is now up.
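As a rough illustration of the behavior described for Issue 58627, the sketch below ignores link statistics whose generation time predates the most recent Link Down event, so that late-arriving samples cannot mark the link as up. The names used here (`LinkState`, `mark_down`, `apply_stats`, `sample_time`) are hypothetical and are not taken from the Orchestrator; the actual fix may work differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkState:
    """Hypothetical per-link state tracked by a monitoring service."""
    is_up: bool = True
    down_since: Optional[float] = None  # time of the last Link Down event

    def mark_down(self, event_time: float) -> None:
        self.is_up = False
        self.down_since = event_time

    def apply_stats(self, sample_time: float) -> bool:
        """Process one statistics sample; return True if a Link Up alert should fire.

        sample_time is when the statistics were generated on the Edge,
        not when they arrived.
        """
        if self.is_up:
            return False  # link already up, nothing to announce
        if self.down_since is not None and sample_time <= self.down_since:
            return False  # lagging sample from before the outage: ignore it
        self.is_up = True
        self.down_since = None
        return True


link = LinkState()
link.mark_down(event_time=100.0)
late = link.apply_stats(sample_time=95.0)    # generated before the link went down
fresh = link.apply_stats(sample_time=130.0)  # generated after the link went down
```

In this sketch the late sample leaves the link marked down, and only the sample generated after the Link Down event triggers a Link Up alert.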
- Fixed Issue 59094: When an operator is attempting to upgrade a VMware SD-WAN Orchestrator, the update script does not provide a proper warning message about the schema update requirements.
If an operator misses the step to apply schema changes on the larger tables, the Orchestrator services could encounter errors, and there is no easy way to find out which changes are missing. With this fix, a backend service restart regenerates any missing schema changes required on a large table.
- Fixed Issue 59689: When using the New UI on a VMware SD-WAN Orchestrator with an exceptionally high number of enterprises and Edges, the Monitor > Firewall Logs page may load slowly or stop responding completely.
This issue has been seen on a hosted Orchestrator with 200+ customer enterprises and thousands of Edges.
- Fixed Issue 60502: BGP routes learned on a VMware SD-WAN Edge may not be pushed to the VMware SD-WAN Orchestrator and VMware SD-WAN Gateway, with an "OFC cap reached" message observed.
This issue occurs when an Edge is processing 2000 or more routes. In such a case, the Edge sends a 2000-route batch to the routing API, and the upload takes around 300 seconds to process. However, NGINX rejects the call after a timeout of 60 seconds, so the Edge receives a rejection and calls the routing API again with the same 2000 routes. The upload processes the routes again successfully, but NGINX again rejects the call due to the timeout. The loop continues, and the Orchestrator ends up processing the same batch again and again, never learning the routes.
- Fixed Issue 60608: The dropdown to select an interface under the "Static Route Settings" section of a VMware SD-WAN Edge's device settings is unavailable.
When a user attempts to add a static route to an Edge on the VMware SD-WAN Orchestrator UI, the Interface will remain "Not Applicable" regardless of the IP Address of the next hop.
- Fixed Issue 61000: Newly created Operator Profiles may not be selectable from the Partner Overview page on the VMware SD-WAN Orchestrator UI.
When an Orchestrator has over 100 Operator Profiles and a user attempts to select some of them from the Partner Overview page, only 100 will be displayed in the UI. Without the fix, the only way to address this is to have VMware SD-WAN Technical Support assign the requested Operator Profile.
- Fixed Issue 61312: A VMware SD-WAN Orchestrator may encounter an issue where routes are no longer updated and the CPU utilization of the Orchestrator is near 100%, especially after the Orchestrator is upgraded.
This issue manifests when an Edge sends ~2K+ route updates to the Orchestrator's routing API. In those scenarios where the Orchestrator is unable to process the entire set of routes sent on a particular API call within 60 seconds, the call times out, which in turn results in the API call being rejected entirely. The Edge receives this rejection and attempts to push the same 2K+ routes to the Orchestrator again, leading to the same scenario as before, which creates a loop that overloads the Orchestrator's vCPU resources. When present, this issue can prevent route updates from being processed.
To address this issue, two system properties have been added:
edge.learnedRoute.maxRoutePerCall: This property ensures only a limited number of routes are processed from an Edge. If the property value is ‘200’, then 200 routes will be processed per Edge request, which ensures that an acknowledgment is sent to the Edge on time.
vco.learnedRoute.simultaneous.maxQueue: This property ensures only the configured number of Edges may have route requests queued at a time. If the property value is ‘8’, then only 8 Edges would be permitted to send route requests at a time, and those in excess of the configured value would be rejected immediately, prior to the routes being processed.
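The effect of the two properties added for Issue 61312 can be pictured with a small sketch. It is not the Orchestrator's code: `route_batches`, `RouteQueue`, and the sample route and Edge names are hypothetical. The parameters `max_route_per_call` and `max_queue` stand in for `edge.learnedRoute.maxRoutePerCall` and `vco.learnedRoute.simultaneous.maxQueue`. How the remaining routes are resent after each acknowledgment is not described in these notes, so plain chunking is assumed here.

```python
from typing import Iterator, List, Set


def route_batches(routes: List[str], max_route_per_call: int) -> Iterator[List[str]]:
    """Yield routes in chunks no larger than max_route_per_call."""
    if max_route_per_call <= 0:
        raise ValueError("max_route_per_call must be positive")
    for start in range(0, len(routes), max_route_per_call):
        yield routes[start:start + max_route_per_call]


class RouteQueue:
    """Admits route requests from at most max_queue Edges at a time."""

    def __init__(self, max_queue: int) -> None:
        self.max_queue = max_queue
        self.active: Set[str] = set()

    def try_admit(self, edge_id: str) -> bool:
        if edge_id in self.active:
            return True
        if len(self.active) >= self.max_queue:
            return False  # rejected immediately, before any routes are processed
        self.active.add(edge_id)
        return True

    def release(self, edge_id: str) -> None:
        self.active.discard(edge_id)


# 2000 made-up host routes, split into calls of 200 routes each.
routes = [f"10.0.{i // 256}.{i % 256}/32" for i in range(2000)]
batches = list(route_batches(routes, max_route_per_call=200))

# Ten Edges try to queue route requests while only eight slots exist.
queue = RouteQueue(max_queue=8)
admitted = [queue.try_admit(f"edge-{n}") for n in range(10)]
```

With these example values, the 2000 routes are split into ten calls of 200, and the ninth and tenth Edges are turned away until an earlier Edge is released.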
- Fixed Issue 61625: A VMware SD-WAN Hub Edge does not advertise all routes if there are more than ~250 routes to be advertised on OSPF.
Whatever the number of routes, whether it is 300 or 2000 or more, only ~250 routes will show as advertised and the remainder will have their advertise flag set to FALSE. This is due to a delay in processing routes beyond that ~250 amount and is resolved by processing a much smaller batch of routes per request.
- Fixed Issue 61852: Monitor > Firewall Logs page in the New UI does not display correct pagination information.
The page row count is incorrect for this section.
- Fixed Issue 62145: When a VMware SD-WAN Orchestrator is upgraded to 4.2.1 from lower releases, the migration fails with a unique constraint violation on the `logicalId` field of the client device table.
Release 4.2.1 has a long-running operation that runs on migration and adds `logicalId` to the client device table. This operation is performed only when a precondition query is satisfied. That precondition query was incorrect, which left the logicalId field empty. Adding a constraint on the logicalId field then caused a duplication error because more than one row had an empty string as its logicalId.
Without the fix, the only workaround is to manually run, during migration, the pending long-running query that adds a unique logicalId to every row of the client device table, and then run the unique constraint query.
- Fixed Issue 62624: When a user attempts to uncheck the Partner Gateway box on Gateways > Overview page of the VMware SD-WAN Orchestrator UI, an error pops up which displays a Profile name only, with no indication which Customer owns the Profile.
This is a significant issue when the status of a VMware SD-WAN Gateway needs to change, because the user cannot tell which customer(s) are using the Gateway: the Profile name alone means little without the Customer it belongs to.
- Fixed Issue 62654: User sees the Customer name when attempting to uncheck the Partner Gateway checkbox while the Partner Gateway is in use.
When a user unchecks the Partner Gateway checkbox for a particular Gateway on the VMware SD-WAN Orchestrator UI while the Gateway is used by one or more customers and by a customer profile as well, the Orchestrator shows only the name of the Profile and Edge, not the names of the customers using the Gateway.
- Fixed Issue 63556: User has the option to add more than one TACACS server on the VMware SD-WAN Orchestrator UI.
While the user can add more than one TACACS server, this is not a valid configuration. The reason is that if the first TACACS server fails, the second TACACS server will not take over in any case. The fix removes the option for adding more than one TACACS server.
- Fixed Issue 64039: In some cases, a customer may observe their DHCP server as inactive.
The issue can be observed in the following scenario: the user sets the addressing type values, enables the DHCP server, fills in its values, and clicks the Update button. If the user then opens the subinterface popup, they observe the DHCP server showing as inactive, with all the fields under DHCP server hidden.
- Fixed Issue 65253: When configuring a Firewall Rule, the drop down list for Object Groups is unusable on the VMware SD-WAN Orchestrator UI when 20+ groups are configured.
Even with 5+ Object Groups (Address Group, Port Group) configured, the Object Group drop down list appears near the bottom of the browser screen. With 20+ rules the Object Groups list is completely out of the screen, and it’s impossible to see it unless the user zooms out a lot on the browser but by then the text is so tiny as to also be unusable.
- Fixed Issue 65526: The VMware SD-WAN Orchestrator generates Alerts and Events for a VMware SD-WAN Edge in a "Degraded" state which never reaches an "Offline/Down" state.
When a VMware SD-WAN Edge initially loses connectivity to the Orchestrator (on a heartbeat check), this state is called "Degraded". Should the Edge's loss of connectivity to the Orchestrator continue, the Edge would then be marked as Offline/Down, and this second state is when an "Edge Down" Event should be posted on the Orchestrator's Monitor > Events page and a matching Alert sent out as appropriate to a Customer's Alerts configuration. However, the Orchestrator is generating an Event and sending an Alert for an Edge in a Degraded state, resulting in a possibly large number of spurious Edge Down Events and Alert notifications for the customer.
- Fixed Issue 66203: When a Gateway Pool is assigned to a Customer which contains both Cloud Hosted and Partner Gateways, the Partner is not able to modify the gateway handoff configuration for their managed Gateways.
The issue occurs when a Partner administrator user tries to modify the gateway handoff configuration in the Customer tab present on the customers page.
- Fixed Issue 66597: On a VMware SD-WAN Orchestrator where there is a customer with a very large number of Edges deployed, when adding multiple VMware SD-WAN Gateways to a Gateway Pool that customer is using, a large number of Edges may show as down on the Orchestrator.
This issue was observed in the field with a customer who had ~7000 Edges connected to the Orchestrator. When there is a change in the Gateway Pool for that customer, the Orchestrator needs to push configuration changes to all the Edges, and the control plane recalculations for more than 700 Edges in a 30 second window cause heartbeats/statistic pushes to fail with a 'POOL_ENQUEUELIMIT' error. Because of the heartbeat failures, the Edges show as down on the Orchestrator.
- Fixed Issue 66631: The Migration Tool does not work when attempting to migrate large customer enterprises.
A large customer enterprise is defined as one with 100 or more Edges. The migration tool fails at the step where it is supposed to stringify the whole data blob and write it to a file. When doing the configuration export, the migration tool was using JSON.stringify to stringify the output data and write it to the file, which fails when the configuration is huge.
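These notes do not say how the fix for Issue 66631 writes the export. One common way to avoid building a single huge string is to serialize incrementally and write each piece as it is produced. The Migration Tool itself is described as using JavaScript's JSON.stringify; the sketch below shows the general idea in Python with `json.JSONEncoder.iterencode`, which yields the encoded output in chunks. The function name `export_config` and the sample data are hypothetical.

```python
import io
import json
from typing import Any, TextIO


def export_config(config: Any, out: TextIO) -> int:
    """Write config as JSON to out chunk by chunk; return characters written."""
    written = 0
    for chunk in json.JSONEncoder().iterencode(config):
        out.write(chunk)
        written += len(chunk)
    return written


# Made-up configuration standing in for a large enterprise export.
config = {
    "enterprise": "example",
    "edges": [{"id": n, "name": f"edge-{n}"} for n in range(150)],
}
buffer = io.StringIO()  # stands in for a file opened with open(path, "w")
count = export_config(config, buffer)
round_trip = json.loads(buffer.getvalue())
```

Note that this only avoids holding the full serialized string in memory; the configuration object itself still has to fit. Python's `json.dump` writes to a file object in the same chunked way.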
- Fixed Issue 67153: Alert emails are being sent out even if the VMware SD-WAN Edge came up within the configured delay interval.
The VMware SD-WAN Orchestrator sends Edge Down / Up Alerts notifications even if the events happened within the configured delay interval.
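The intended behavior for Issue 67153 can be sketched as follows, under an assumed reading of the delay setting: an Edge Down alert is sent only if the Edge is still down once the configured delay has elapsed, and the matching Edge Up alert is sent only if the Down alert was. The function `alerts_to_send` and the event tuples are hypothetical, and the Orchestrator's real alert settings may be more detailed than this.

```python
from typing import List, Optional, Tuple

Event = Tuple[float, str]  # (time in minutes, "DOWN" or "UP")


def alerts_to_send(events: List[Event], delay_minutes: float) -> List[Event]:
    """Return the alerts for a time-ordered list of Edge Down/Up events.

    A DOWN alert is emitted only when the Edge stays down for at least
    delay_minutes; the matching UP alert is emitted only if the DOWN alert was.
    An outage still in progress at the end of the list produces no alert here.
    """
    alerts: List[Event] = []
    down_at: Optional[float] = None
    for time, kind in events:
        if kind == "DOWN" and down_at is None:
            down_at = time
        elif kind == "UP" and down_at is not None:
            if time - down_at >= delay_minutes:
                alerts.append((down_at + delay_minutes, "DOWN"))
                alerts.append((time, "UP"))
            down_at = None
    return alerts


# A one-minute blip followed by a ten-minute outage, with a five-minute delay.
events = [(0.0, "DOWN"), (1.0, "UP"), (10.0, "DOWN"), (20.0, "UP")]
alerts = alerts_to_send(events, delay_minutes=5.0)
```

In this example the one-minute blip produces no alerts, while the ten-minute outage produces one Down alert and one Up alert.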
- Fixed Issue 71399: On a VMware SD-WAN Orchestrator deployed in a Disaster Recovery (DR) configuration, the Operator User may observe that the Standby Orchestrator has failed to synchronize with the Active Orchestrator.
On the Orchestrator UI under the Replication page, a user would observe all Sync activities as failed under the Activity Monitor. The DR synchronization failure happens on initial handshake where the Active Orchestrator fails to copy the configuration database to the Standby Orchestrator.
Known Issues
Open Issues in Release 4.2.2
The known issues are grouped as follows.
Edge/Gateway Known Issues
- Issue 14655:
Plugging or unplugging an SFP adapter may cause the device to stop responding on the Edge 540, Edge 840, and Edge 1000 and require a physical reboot.
Workaround: The Edge must be physically rebooted. This may be done either on the Orchestrator using Remote Actions > Reboot Edge, or by power-cycling the Edge.
- Issue 25504:
Static route costs greater than 255 may result in unpredictable route ordering.
Workaround: Use a route cost between 0 and 255.
- Issue 25595:
A restart may be required for changes to static SLA on a WAN overlay to work properly.
Workaround: Restart the Edge after adding and removing Static SLA from the WAN overlay.
- Issue 25742:
Underlay accounted traffic is capped at a maximum of the capacity towards the VMware SD-WAN Gateway, even if that is less than the capacity of a private WAN link which is not connected to the Gateway.
- Issue 25758:
USB WAN links may not update properly when switched from one USB port to another until the VMware SD-WAN Edge is rebooted.
Workaround: Reboot the Edge after moving USB WAN links from one port to another.
- Issue 25855:
A large configuration update on the Partner Gateway (e.g. 200 BGP-enabled VRFs) may cause latency to increase for approximately 2-3 seconds for some traffic via the VMware SD-WAN Gateway.
Workaround: No workaround available.
- Issue 25921:
VMware SD-WAN Hub High Availability failover takes longer than expected (up to 15 seconds) when there are three thousand branch Edges connected to the Hub.
- Issue 25997:
The VMware SD-WAN Edge may require a reboot to properly pass traffic on a routed interface that has been converted to a switched port.
Workaround: Reboot the Edge after making the configuration change.
- Issue 26421:
The primary Partner Gateway for any branch site must also be assigned to a VMware SD-WAN Hub cluster for tunnels to the cluster to be established.
- Issue 28175:
Business Policy NAT fails when the NAT IP overlaps with the VMware SD-WAN Gateway interface IP.
- Issue 31210:
VRRP: ARP is not resolved in the LAN client for the VRRP virtual IP address when the VMware SD-WAN Edge is primary with a non-global CDE segment running on the LAN interface.
- Issue 32731:
Conditional default routes advertised via OSPF may not be withdrawn properly when the route is turned off. Reactivating and deactivating the route will retract it successfully.
- Issue 32960:
Interface “Autonegotiation” and “Speed” status might be displayed incorrectly on the Local Web UI for activated VMware SD-WAN Edges.
- Issue 32981:
Hard-coding speed and duplex on a DPDK-enabled port may require a VMware SD-WAN Edge reboot for the configurations to take effect, because the change requires DPDK to be deactivated.
- Issue 35778:
When there are multiple user-defined WAN links on a single interface, only one of those WAN links can have a GRE tunnel to Zscaler.
Workaround: Use a different interface for each WAN link that needs to build GRE tunnels to Zscaler.
- Issue 35807:
A DPDK routed interface will be deactivated completely if the interface is deactivated and reactivated from the VMware SD-WAN Orchestrator.
- Issue 36923:
Cluster name may not be updated properly in the NetFlow interface description for a VMware SD-WAN Edge which is connected to that Cluster as its Hub.
- Issue 38682:
A VMware SD-WAN Edge acting as a DHCP server on a DPDK-enabled interface may not properly generate "New Client Device" events for all connected clients.
- Issue 38767:
When a WAN overlay that has GRE tunnels to Zscaler configured is changed from auto-detect to user-defined, stale tunnels may remain until the next restart.
Workaround: Restart the Edge to clear the stale tunnel.
- Issue 39134:
The System health statistic “CPU Percentage” may not be reported correctly on Monitor > Edge > System for the VMware SD-WAN Edge, and on Monitor > Gateways for the VMware SD-WAN Gateway.
Workaround: Users should use handoff queue drops, not CPU percentage, for monitoring Edge capacity.
- Issue 39374:
Changing the order of VMware SD-WAN Partner Gateways assigned to a VMware SD-WAN Edge may not properly set Gateway 1 as the local Gateway to be used for bandwidth testing.
- Issue 39608:
The output of the Remote Diagnostic “Ping Test” may display invalid content briefly before showing the correct results.
- Issue 39624:
Ping through a subinterface may fail when the parent interface is configured with PPPoE.
- Issue 39659:
On a site configured for Enhanced High Availability, with one WAN link on each VMware SD-WAN Edge, when the standby Edge has only PPPoE connected and the active has only non-PPPoE connected, a split brain state (active/active) may be possible if the HA cable fails.
- Issue 39753:
Deactivating Dynamic Branch-to-Branch VPN may cause existing flows currently being sent using Dynamic Branch-to-Branch to stall.
- Issue 40096:
If an activated VMware SD-WAN Edge 840 is rebooted, there is a chance an SFP module plugged into the Edge will stop passing traffic even though the link lights and the VMware SD-WAN Orchestrator will show the port as 'UP'.
Workaround: Unplug the SFP module and then replug it back into the port.
- Issue 40421:
Traceroute is not showing the path when passing through a VMware SD-WAN Edge with an interface configured as a switched port.
- Issue 42278:
For a specific type of peer misconfiguration, the VMware SD-WAN Gateway may continuously send IKE init messages to a Non-SD-WAN peer. This issue does not disrupt user traffic to the Gateway; however, the Gateway logs will be filled with IKE errors and this may obscure useful log entries.
- Issue 42388:
On a VMware SD-WAN Edge 540, an SFP port is not detected after deactivating and reactivating the interface from the VMware SD-WAN Orchestrator.
- Issue 42488:
On a VMware SD-WAN Edge where VRRP is enabled for either a switched or routed port, if the cable is disconnected from the port and the Edge Service is restarted, the LAN connected routes are advertised.
Workaround: There is no workaround for this issue.
- Issue 42872:
Enabling Profile Isolation on a Hub profile where a Hub cluster is associated does not revoke the Hub routes from the routing information base (RIB).
- Issue 43373:
When the same BGP route is learnt from multiple VMware SD-WAN Edges, if this route is moved from preferred to eligible exit in the Overlay Flow Control, the Edge is not removed from the advertising list and continues to be advertised.
Workaround: Enable distributed cost calculation on the VMware SD-WAN Orchestrator.
- Issue 44832:
Traffic from one Non SD-WAN Destination via Edge to another Non SD-WAN Destination via Edge (i.e. 'hairpinning' or 'NAT loopback') is dropped on the VMware SD-WAN Edge.
- Issue 44995:
OSPF routes are not revoked from VMware SD-WAN Gateways and VMware SD-WAN Spoke Edges when the routes are withdrawn from the Hub Cluster.
- Issue 45189:
When source LAN-side NAT is configured, the traffic from a VMware SD-WAN Spoke Edge to a Hub Edge is allowed even without the static route configuration for the NAT subnet.
- Issue 45302:
In a VMware SD-WAN Hub Cluster, if one Hub loses connectivity for more than 5 minutes to all of the VMware SD-WAN Gateways common between itself and its assigned Spoke Edges, the Spokes may in rare conditions be unable to retain the hub routes after 5 minutes. The issue resolves itself when the Hub regains contact with the Gateways.
- Issue 46053:
BGP preference does not get auto-corrected for overlay routes when its neighbor is changed to an uplink neighbor.
Workaround: An Edge Service Restart will correct this issue.
- Issue 46137:
A VMware SD-WAN Edge running 3.4.x software does not initiate a tunnel with AES-GCM encryption even if the Edge is configured for GCM.
- Issue 46216:
On a Non SD-WAN Destinations via Gateway or Edge where the peer is an AWS instance, when the peer initiates Phase-2 re-key, the Phase-1 IKE is also deleted and forces a re-key. This means the tunnel is torn down and rebuilt, causing packet loss during the tunnel rebuild.
Workaround: To avoid tunnel destruction, configure the Non SD-WAN Destinations via Gateway/Edge or CSS IPsec rekey timer to less than 60 minutes. This prevents AWS from initiating the re-key.
- Issue 46391:
For a VMware SD-WAN Edge 3800, the SFP1 and SFP2 interfaces each have issues with Multi-Rate SFPs (i.e. 1/10G) and should not be used in those ports.
Workaround: Please use single-rate SFPs per the KB article VMware SD-WAN Supported SFP Module List (79270). Multi-Rate SFPs may be used with SFP3 and SFP4.
- Issue 46918:
A VMware SD-WAN Spoke Edge using the 3.4.2 Release does not update the private network id of a Cluster Hub node properly.
- Issue 47084:
A VMware SD-WAN Hub Edge cannot establish more than 750 PIM (Protocol-Independent Multicast) neighbors when it has 4000 Spoke Edges attached.
- Issue 47355:
When the same route is learned via local underlay BGP, Hub BGP and/or statically configured on the Partner Gateway, the sorting order of the routes is incorrect with the Hub BGP being preferred over the underlay BGP.
- Issue 47664:
In a Hub and Spoke configuration where Branch-to-Branch via Hub VPN is deactivated, trying to U-turn Branch-to-Branch traffic using a summary route on an L3 switch/router will cause routing loops.
Workaround: Configure Cloud VPN to enable Branch-to-Branch VPN and select “Use Hubs for VPN”.
- Issue 47681:
When a host on the LAN side of a VMware SD-WAN Edge uses the same IP as that Edge’s WAN interface, the connection from the LAN host to the WAN does not work.
- Issue 47787:
A VMware SD-WAN Spoke Edge configured with a backhaul business policy incorrectly sends traffic via the VMware SD-WAN Gateway path if that flow is initiated from the Hub Edge to that Spoke Edge.
- Issue 48166:
A VMware SD-WAN Virtual Edge on KVM is not supported when using a Ciena virtualization OS and the Edge will experience recurring Dataplane Service Failures.
- Issue 48175:
A VMware SD-WAN Edge running Release 3.4.2 will form an OSPF adjacency on a non-global segment if the non-global segment has an interface configured in the same IP range as an interface configured on the global segment.
- Issue 48530:
VMware SD-WAN Edge 6x0 models do not perform autonegotiation for triple speed (10/100/1000 Mbps) copper SFPs.
Workaround: Edge 520/540 supports triple speed copper SFPs but this model has been marked for End-of-Sale by Q1 2021.
- Issue 48597: Multihop BGP neighborship does not stay up if one of the two paths to the peer goes down.
If there is a Multihop BGP neighborship with a peer to which there are multiple paths and one of them goes down, the user will notice that the BGP neighborship goes down and does not come up using the other available path(s). This includes the Local IP-loopback neighborship case too.
Workaround: There is no workaround for this issue.
- Issue 48666:
IPsec-fronted Gateway Path MTU calculation does not account for the 61-byte IPsec overhead, resulting in a higher MTU advertisement to the LAN client and subsequent IPsec packet fragmentation.
Workaround: There is no workaround for this issue.
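The arithmetic behind Issue 48666 can be sketched as below. Only the 61-byte figure comes from the issue description; the 1500-byte link MTU is an assumed example value, the function names are hypothetical, and every other tunnel overhead is ignored for simplicity.

```python
IPSEC_OVERHEAD_BYTES = 61  # overhead figure quoted in Issue 48666


def needs_fragmentation(packet_bytes: int, link_mtu: int,
                        overhead: int = IPSEC_OVERHEAD_BYTES) -> bool:
    """True if the packet plus IPsec overhead no longer fits in link_mtu."""
    return packet_bytes + overhead > link_mtu


def safe_inner_mtu(link_mtu: int, overhead: int = IPSEC_OVERHEAD_BYTES) -> int:
    """Largest inner packet that still fits once the overhead is added."""
    return link_mtu - overhead


link_mtu = 1500           # assumed example value
advertised = link_mtu     # overhead ignored, as described in the issue
fragmented = needs_fragmentation(advertised, link_mtu)
corrected = safe_inner_mtu(link_mtu)
still_fragmented = needs_fragmentation(corrected, link_mtu)
```

A LAN client that sends packets at the advertised size ends up above the link MTU once the IPsec overhead is added, whereas an MTU reduced by the overhead fits without fragmentation.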
- Issue 49172:
A Policy Based NAT rule configured with the same NAT subnet for two different VMware SD-WAN Edges does not work.
- Issue 49738:
In some cases, when a VMware SD-WAN Spoke Edge is configured to use multiple Hub Edges, the Spoke Edge may not form tunnels to one of the Hubs configured in the Hub list.
- Issue 50518:
On a VMware SD-WAN Gateway where PKI is enabled, if >6000 PKI tunnels attempt to connect to the Gateway, the tunnels may not all come up because inbound SAs do not get deleted.
Note: Tunnels using pre-shared key (PSK) authentication do not have this issue.
- Issue 51428: Multicast traffic loss may be observed on a site where the VMware SD-WAN Edge has a sub-interface configured with PIM.
When a sub-interface configured with PIM is moved from a segment to another on the fly, pimd (the process that manages PIM) may restart and the site would experience intermittent multicast traffic loss.
Workaround: Deactivate the sub-interface first, and then move the sub-interface to another segment. Once moved, re-enable the sub-interface.
- Issue 51436: For a site using an Enhanced High-Availability topology while deploying a VMware SD-WAN Edge using an LTE modem, if the site gets into a "split-brain" state, the HA failover takes ~5-6 minutes.
As part of the recovery from a split-brain state, the LAN ports are brought down on the Active Edge and this impacts LAN traffic during the time the ports are down and until the site can recover.
Workaround: There is no workaround for this issue.
- Issue 52483: If underlay accounting is enabled for an interface, the VMware SD-WAN Edge wrongly forwards the traffic back to the same interface instead of forwarding to the overlay.
This behavior is caused by an issue with underlay accounting and a recursive route resolution.
Workaround: Toggle underlay accounting off for the affected interface.
- Issue 53219: After a VMware SD-WAN Hub Cluster rebalances, a few Spoke Edges may not have their RPF interface/IIF set properly.
On the affected Spoke Edges, multicast traffic will be impacted. After a cluster rebalance, some of the Spoke Edges fail to send a PIM join.
Workaround: This issue will persist until the affected Spoke Edge has an Edge Service restart.
- Issue 53337: Packet drops may be observed with an AWS instance of a VMware SD-WAN Gateway when the throughput is above 3200 Mbps.
When traffic exceeds a throughput of 3200 Mbps and a packet size of 1300 bytes, packet drops are observed at RX and at IPv4 BH handoff.
Workaround: There is no workaround for this issue.
- Issue 53359: BGP/BFD session may fail during some DDoS attack scenarios.
If traffic is flooded from the client connected to the routed interface to the LAN client, the BGP/BFD session can fail. Also when real-time high priority traffic is flooded to the overlay destination, the BGP/BFD session can fail.
Workaround: There is no workaround for this issue.
- Issue 53830: On a VMware SD-WAN Edge, some of the routes in BGP view may not have the correct preference and advertise values when DCC flag is enabled causing incorrect sorting order in the Edge's FIB.
When Distributed Cost Calculation (DCC) is enabled in a scaled scenario with a large number of routes on an Edge, some of the routes in the bgp_view log of an Edge diagnostic bundle may not be correctly updated with the preference and advertise values. This issue, if found at all, would be found in a few Edges as part of a large enterprise (100+ Spoke Edges connected to either Hub Edges or Hub Clusters).
Workaround: This issue can be addressed by either relearning the underlay BGP routes or performing a "Refresh" option on the OFC page of the VMware SD-WAN Orchestrator for the affected routes. Please note that performing a "Refresh" of a route would re-learn the routes from all the Edges in the enterprise.
- Issue 53934: In an enterprise where a VMware SD-WAN Hub Cluster is configured, if the primary Hub has Multihop BGP neighborships on the LAN side, the customer may experience traffic drops on a Spoke Edge when there is a LAN side failure or when BGP is deactivated on all segments.
In a Hub cluster, the primary Hub has Multihop BGP neighborship with a peer device to learn routes. If the physical interface on the Hub by which BGP neighborship is established goes down, then BGP LAN routes may not become zero despite BGP view being empty. This may cause Hub Cluster rebalancing to not happen. The issue may also be observed when BGP is deactivated for all segments and when there are one or more Multihop BGP neighborships.
Workaround: Restart the Hub which had the LAN-side failure (or BGP deactivated).
- Issue 54846: VMware SD-WAN SNMP MIBs use counters for Jitter, Latency, and Packet Loss.
In VMware SNMP MIBs, Latency, Jitter, and Packet Loss are defined as Counter64 which is not appropriate for these types. Counters should be used for data types that are ever increasing values and which never reset in SNMP like bytes Tx/Rx. In contrast, latency, jitter, and packet loss do not have ever increasing values but dynamically adjusted values and should not use counters.
Workaround: There is no workaround for this issue.
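To illustrate why counter semantics are a poor fit for the values in Issue 54846: SNMP pollers commonly take the difference between successive counter samples and treat a decrease as a wrap-around. The sketch below applies that arithmetic to a byte counter and to a latency value; the function name and sample numbers are hypothetical.

```python
COUNTER64_MODULUS = 2 ** 64


def counter_delta(previous: int, current: int) -> int:
    """Difference between two Counter64 samples, treating a decrease as a wrap."""
    return (current - previous) % COUNTER64_MODULUS


# Bytes transmitted only ever grow, so the delta is meaningful.
tx_bytes_delta = counter_delta(1_000_000, 1_250_000)

# Latency in milliseconds moves up and down. When it falls from 40 to 25,
# counter arithmetic interprets the drop as a wrap and reports a huge value.
latency_delta = counter_delta(40, 25)

# Read as a gauge, the latest sample is simply the current value.
latency_gauge = 25
```

A monitoring tool that applies counter logic to latency, jitter, or packet loss can therefore display very large, meaningless values whenever the metric improves, which is why a gauge-style type is the usual choice for such measurements.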
- Issue 59524: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart to recover if DHCP Relay is configured on the Edge.
The issue can occur during high stress conditions where the allocation of DHCP relay packets may fail. The Edge's DHCP Relay processing does not handle this properly, and continued DHCP allocation failures eventually trigger an exception on the Edge Service and a restart.
Workaround: There is no workaround for this issue, but it is resolved on Edge Version 4.3.0 and later.
- Issue 59920: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart when a configuration change is made to an Edge interface used by 32 or more paths.
A configuration change can include deleting the interface or updating the interface with new parameters. When a change is made on the interface using more than 32 paths, it triggers an exception in the Edge service that causes the restart.
Workaround: This issue is fixed in Release 4.3.0 and later. Contact VMware SD-WAN Support for the availability of a hotfix build for 4.2.2. On an Edge using a build without a fix for this issue, the user should only make changes to Edge interfaces in maintenance windows.
- Issue 61543: If more than one 1:1 NAT rule is configured on different interfaces with the same Inside IP, the inbound traffic can be received on one interface and the outbound packets of the same flow can be routed via different interface.
For the NAT flows from Outside to Inside, the 1:1 NAT rules will be matched against the Outside IP and the interface where the packets are received. For the outbound packets of the same flow, the VMware SD-WAN Edge will try to match the NAT rules again by comparing the Inside IP, and the outbound traffic can be routed via the interface configured in the first matching rule with "Outbound Traffic" enabled.
Workaround: There is no workaround for this issue outside of ensuring no more than one 1:1 NAT rule is configured with a particular Inside IP address.
- Issue 61716: A VMware SD-WAN Edge using Release 4.2.0 or higher may drop VMware SD-WAN Orchestrator generated packets due to a route look-up failure.
The process that manages management packets has a hard rule for picking the default cloud route for sending traffic. Release 4.2.x includes a fix which changed the sorting logic, and the Edge now prefers a default route learned via BGP/OSPF over the Cloud Route. In this scenario, the sorted default routes list the underlay BGP route as the most preferred one. In Fast Path, the Orchestrator packets are dropped since the route lookup does not return a Cloud Route for the destination but returns an underlay BGP route.
Workaround: There is no workaround for this issue.
- Issue 62275: Generating a diagnostic bundle on a VMware SD-WAN Gateway causes a spike in memory usage which could cause the Gateway to restart which also causes the diagnostic bundle to fail.
When a diagnostic bundle is fetched on a Gateway with a large number of links, the debug.py debugging command consumes a lot of memory to hold all the data and send it back to the Edge process after proper formatting. If there are other memory handling issues while this issue occurs the memory spike can exhaust the Gateway's memory, causing a restart to clear the memory.
- Issue 62552: A site may experience intermittent periods of high packet loss and connectivity issues.
This is caused by the API that checks for ARP resolution telling the Edge there is a successful ARP resolution for a device while delivering a MAC address of 00:00:00:00. This address is kept in the ARP cache, and any packets intended for the device where the MAC is listed as zero are dropped. In this issue, many such successful ARP resolutions with zero MAC addresses are delivered, causing high packet loss and connectivity issues.
Note: Both issue 60130 and this issue have the same underlying behavior and cause but the expected fixes for each ticket differ. 60130 will have a defensive workaround fix while 62552 will have a complete fix that prevents any recurrence of this issue.
Workaround: There is no workaround for this issue.
- Issue 62701: For a VMware SD-WAN Edge deployed as part of an Edge Hub Cluster, if Cloud VPN is not enabled under the Global Segment but is enabled under a Non-Global Segment, a control plane update sent by the Orchestrator may cause all the WAN links to flap on the Hub Edge.
The Hub Edge's WAN links going down, then up in rapid succession (flap) will impact real time traffic like voice calls. This issue was observed on a customer deployment where Cloud VPN was not enabled on the Hub Edge's Global segment, but the Cluster configuration was enabled, which means this Hub Edge was part of a Cluster (and a Cluster configuration is applicable to all segments). When a configuration change is pushed to the Hub Edge, the Hub Edge's dataplane will start parsing data and will start with the Global Segment, where it will see Cloud VPN not enabled, and the Hub Edge erroneously thinks clustering is deactivated on this Global Segment. As a result, the Hub Edge will tear down all tunnels from the Hub's WAN link(s), which will cause link flaps on all that Edge's WAN links. For any such incident the WAN links only go down and recover a single time per control plane update.
Workaround: The workaround is to activate Cloud VPN on all segments, meaning the Global Segment and all Non-Global Segments.
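The conditions for Issue 62701 and its workaround can be expressed as a small pre-change audit. The sketch below assumes a hypothetical in-memory representation of a Hub Edge's segments (the keys `name`, `is_global`, and `cloud_vpn` are illustrative and are not Orchestrator API fields).

```python
from typing import Dict, List

Segment = Dict[str, object]


def cloud_vpn_flap_risk(is_cluster_member: bool, segments: List[Segment]) -> bool:
    """True when the combination described in Issue 62701 is present."""
    global_off = any(s["is_global"] and not s["cloud_vpn"] for s in segments)
    non_global_on = any(not s["is_global"] and s["cloud_vpn"] for s in segments)
    return is_cluster_member and global_off and non_global_on


def segments_to_activate(segments: List[Segment]) -> List[str]:
    """Names of segments where Cloud VPN would need to be activated (the workaround)."""
    return [str(s["name"]) for s in segments if not s["cloud_vpn"]]
```

An empty result from `segments_to_activate` corresponds to the workaround state, where Cloud VPN is active on the Global Segment and all Non-Global Segments.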
- Issue 67458: When a VMware SD-WAN Hub Edge with a large number of Spoke Edges is upgraded to Release 4.2.1 or 4.2.2, some tunnels to other Spoke Edges will not come up for the Hub Edge.
A large number of Spoke Edges is understood to be ~1000 or more. This issue is not consistent, but generally ~1/3rd of the VeloCloud Management Protocol (VCMP) tunnels are not established between the Hub Edge and the connected Spoke Edges. This is caused by the Hub Edge ignoring the MP_INITs as the number of half open TDs exceeds the Hub Edge's upper limit.
Workaround: Restarting the Edge Service will restore full tunnel connectivity.
- Issue 69324: For a customer site deployed using a High-Availability topology, when HA is enabled on the site, both VMware SD-WAN Edges will display as Active.
This is not a true split-brain scenario because the Active and Standby Edges are actually performing their assigned roles; they are just reporting an incorrect status that shows up on the VMware SD-WAN Orchestrator. The prompts of both the Active and Standby Edge display as Active even though the HA verp command shows the proper Active and Standby states.
Workaround: To clear the issue, the user needs to either restart or reboot the Standby Edge.
- Issue 74291: A VMware SD-WAN Edge in a High-Availability topology may appear as offline after a failover despite having internet access and functional DNS.
This issue can occur after a High-Availability failover and is caused by a token error on the newly promoted Active Edge which results in a heartbeat failure to the Orchestrator. Without the heartbeat, the Orchestrator marks the Edge as down and a user would not be able to update the Edge's configuration through the Orchestrator.
Workaround: Without an Edge build with the fix, the way to remediate the issue is to locally force another failover either through the local UI or by power-cycling the Active Edge.
- Issue 83212: When looking at the VMware SASE Orchestrator for Monitor > Edge > Transport, there is a discrepancy between the Link and Application statistics table.
The Application and Link statistics should match, but the Application statistics show a higher value than the Link statistics. This issue most commonly occurs in a VMware SD-WAN Edge Hub Cluster topology where the Spoke Edge uses a single WAN link. If this single WAN link experiences some loss, the packets are retransmitted and are accounted twice in the Application statistics, which results in the observed discrepancy.
Workaround: There is no workaround for this issue.
- Issue 85369: For a site deployed with a High-Availability topology, the customer may observe customer traffic disruptions and possibly multiple reboots of the VMware SD-WAN Standby Edge.
A condition triggered by load and system events causes the Active Edge to experience delays in the timely delivery of HA heartbeats to the Standby Edge. The delay causes the Standby Edge to miss heartbeats and incorrectly assume the Active role causing an Active-Active state. To recover from the Active-Active state the Standby Edge reboots, possibly multiple times.
If the site does become Active-Active, a conventional HA setup would experience minimal traffic disruption since the Standby Edge does not pass traffic in this topology, but on an Enhanced HA deployment, where the Standby Edge is also passing traffic, the reboot(s) would disrupt some customer traffic.
Workaround: There is no workaround for this issue.
- Issue 85461: If a VMware SD-WAN Edge is used to forward DNS, and LAN devices connected to the Edge are using the Edge for DNS forwarding, all DNS traffic may fail.
All DNS forwarding traffic is affected, not just Conditional DNS. Depending on the Edge software, this issue can be encountered on an Edge as follows:
- If the Edge is using Release 4.2.2, the Edge can encounter this issue if the Edge is using routed LAN ports with no Gateway IP address specified. Switched LAN ports + VLANs are not affected in 4.2.2.
- If the Edge is using either Release 4.3.0/4.3.1, 4.5.0/4.5.1, or 5.0.0.x, the Edge can encounter the issue if the Edge is using switched LAN ports and VLANs, or the Edge is using routed LAN ports with no Gateway IP address specified.
For switched interfaces, the cause of the issue stems from the deprecation and removal of the Management IP interface in favor of a loopback interface in Release 4.3.0, 4.5.0, and 5.0.0.x and later. Because DNS uses segment NAT, the DNS packet has no matching entry for the destination IP when the Edge does segment NAT table lookup and the Edge drops the packet.
For routed interfaces, the lack of a Gateway IP means the DNS packet is routed to the Edge as the next hop and the Edge does not forward the DNS packet further.
Workaround: The workaround for this issue is to either not use the Edge to forward DNS, or...
- When using Edge Release 4.2.2: use either switched LAN ports or routed LAN ports that include a Gateway IP address.
- When using Release 4.3.x, 4.5.x, or 5.0.0.x use only routed LAN ports with a Gateway IP address specified.
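The release and LAN port combinations listed for Issue 85461 can be summarized as a lookup. This is a sketch of the matrix above, not a VMware tool: the function name, the `lan_mode` strings, and the `None` return for releases these notes do not cover are all assumptions made for illustration.

```python
from typing import Optional

# Releases named in the notes as affected on both switched and routed ports.
LATER_RELEASES = {"4.3.0", "4.3.1", "4.5.0", "4.5.1"}


def dns_forwarding_at_risk(release: str, lan_mode: str, has_gateway_ip: bool) -> Optional[bool]:
    """Return True/False per the Issue 85461 matrix, or None if the release is not covered.

    lan_mode is "switched" (switched LAN ports + VLANs) or "routed".
    has_gateway_ip only matters for routed LAN ports.
    """
    if lan_mode not in ("switched", "routed"):
        raise ValueError("lan_mode must be 'switched' or 'routed'")
    routed_without_gateway = lan_mode == "routed" and not has_gateway_ip
    if release == "4.2.2":
        # Switched LAN ports + VLANs are not affected in 4.2.2.
        return routed_without_gateway
    if release in LATER_RELEASES or release.startswith("5.0.0"):
        return lan_mode == "switched" or routed_without_gateway
    return None
```

For example, `dns_forwarding_at_risk("4.3.1", "routed", True)` returns `False`, which matches the workaround of using only routed LAN ports with a Gateway IP address on those releases.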
- Issue 86098: For a site using an Enhanced High-Availability topology where a PPPoE WAN link is used on the Standby Edge, a user may observe that the default proxy route is not installed in the Active Edge and traffic using that link fails.
When an Enhanced HA Edge pair come up, the PPPoE link synchronizes with the Standby Edge and provides a default route with a next hop of 0.0.0.0. As a result this route is not installed on the Active and traffic using this link is dropped.
Workaround: There is no workaround for this issue.
- Issue 88604: For a site using a High-Availability topology, if a WAN interface goes down and then comes back up on a VMware SD-WAN Standby Edge, the event is not recorded on the VMware SASE Orchestrator.
A user does not have visibility on Standby Edge interface events, which is especially impactful on Enhanced HA deployments where the Standby Edge is also passing traffic.
Workaround: There is no workaround for this issue.
- Issue 89217: A VMware SD-WAN Edge in the 6x0 model line (610, 610N, 610-LTE, 620, 620N, 640, 640N, 680, 680N) may suddenly power off for no reason.
The 6x0 Edge would have all lights off, both the front status LED and the rear Ethernet port lights, and can only be recovered by manually power cycling the Edge.
The cause of the issue is traced to a PIC microcontroller exclusive to the Edge 6x0 line which uses a PIC firmware version of v20M or earlier (v20L, v20K, v20J). This issue can only occur when the 6x0 Edge uses a PIC version of v20M or earlier, but even with this version the odds of experiencing the power off issue are rare (approximately 1/1,000). The issue cannot occur on a 6x0 Edge with a PIC firmware version of v20N or later.
Note: A 6x0 Edge's Firmware including PIC version can be determined on an Orchestrator using 5.x by going to the Monitor > Edge > Overview page for that Edge and clicking the dropdown information box next to the Edge name which includes the Edge Information, Device Version, and the Device Firmware. However this only works on an Edge using Release 4.5.1.
The issue is resolved by upgrading the 6x0 Edge to Platform Firmware 1.3.1 (R131-20221216-GA), which includes PIC version v20N. To do this the 6x0 Edge must be connected to a VMware SASE Orchestrator using Release 5.x (5.0.0 or later), and the 6x0 Edge must first be upgraded to Edge hotfix build R5012-20230123-GA-103475. Once the 6x0 Edge is upgraded to R5012-20230123-GA-103475, the user would then update the 6x0 Edge Platform Firmware to version R131-20221216-GA in the same way that an Edge's software version is modified.
For more information and a step-by-step guide to upgrading a 6x0 Edge to Platform Firmware 1.3.1, see the KB Article: VMware SD-WAN 6X0 model Edges may power off with no LEDs and require a power cycle to come back to a working state (88970). This KB article was updated on January 27th, 2023 to reflect the new Edge and Platform Software needed to resolve the issue.
For information on uploading a Platform Firmware bundle to an Orchestrator, consult the Platform Firmware and Factory Images with New Orchestrator UI section of the VMware SD-WAN Operator Guide.
For information on updating a 6x0 Edge’s Platform Firmware, consult the View or Modify Edge Information section of the VMware SD-WAN Administration Guide.
Workaround: To recover the Edge from the problem state:
- Disconnect the Edge from the power source.
- Wait 20 seconds.
- Reconnect the Edge to the power source.
If the user does not wish to upgrade the 6x0 Edge's platform firmware, the user can ensure the power to the Edge is consistent and does not flap rapidly or repeatedly. A good way to ensure a reliable power source is to connect the 6x0 Edge to an Uninterruptible Power Supply (UPS).
If the user prefers to keep the Edge on a lower software release (for example, Release 4.3.1, or 4.5.1), the customer can temporarily upgrade the Edge to R5012-20230123-GA-103475, perform the Platform Firmware upgrade to version 1.3.1 (R131-20221216-GA) so that the PIC version is v20N, and then downgrade the Edge’s software back to their preferred version. Downgrading the 6x0 Edge's software to an earlier version does not also downgrade the Edge's Platform Firmware and the Edge would continue to use Platform Firmware version 1.3.1. In this use case the customer Edges would need to be on an Orchestrator using Release 5.x.
If the 6x0 Edge is on an Orchestrator that does not use version 5.x and has experienced this issue and requires an update of its PIC firmware, the customer may reach out to VMware SD-WAN Support and they will manually update the Edge’s PIC version.
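When auditing a fleet of 6x0 Edges, the PIC firmware rule for Issue 89217 (v20M or earlier can be affected, v20N or later cannot) can be sketched as a comparison. The helper below is hypothetical and assumes that PIC versions follow the `v<number><letter>` pattern seen in these notes and order by number first, then letter; that ordering is an assumption, not documented behavior.

```python
import re
from typing import Tuple

FIXED_PIC_VERSION = "v20N"  # first PIC version where the issue cannot occur


def _parse_pic(version: str) -> Tuple[int, str]:
    """Split a string such as 'v20M' into (20, 'M')."""
    match = re.fullmatch(r"v(\d+)([A-Z])", version.strip())
    if match is None:
        raise ValueError(f"unrecognized PIC version: {version!r}")
    return int(match.group(1)), match.group(2)


def pic_can_power_off(version: str) -> bool:
    """True when the PIC version is earlier than v20N (for example v20M, v20L, v20K, v20J)."""
    return _parse_pic(version) < _parse_pic(FIXED_PIC_VERSION)
```

Edges for which `pic_can_power_off` returns `True` would be candidates for the Platform Firmware 1.3.1 upgrade described above.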
- Issue 91365: For a customer using Edge Network Intelligence, a VMware SD-WAN Edge where Analytics is configured experiences a memory leak that will result in the Edge triggering an Edge Service restart to clear the memory.
When the Analytics function is enabled on an Edge, the Edge's Dataplane service begins leaking memory at a steady rate that will result in the Edge needing to trigger an unscheduled Service Restart to clear the memory leak when it reaches a critical level (60% memory utilization for longer than 90 seconds). An Edge Service restart causes a 10-15 second disruption in customer traffic. In the field the time it takes to trigger an Edge Service restart has been ~3 to 4 days, and once the memory is cleared the memory leak will resume with the same general time window for the next Edge Service restart. The period when the Edge would reach a critical memory usage level depends on the Edge model and the amount of information the Analytics feature is recording for that Edge.
Workaround: The customer has two options, a) temporarily turn off Analytics for the Edge until a fixed Edge build is delivered; or b) monitor the Edge's memory. When memory utilization reaches 40% and the Orchestrator records a Memory Warning Event, schedule a manual Edge Service Restart in a maintenance window to clear the memory and ensure minimal customer impact.
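For option b) of the workaround, the two thresholds in this entry (a Memory Warning at 40% and a critical level of 60% held for longer than 90 seconds) can be sketched as a classifier over memory samples. This is a monitoring illustration only: the sample format and function name are hypothetical, and treating the thresholds as "at or above" is an assumption.

```python
from typing import List, Tuple

WARNING_PCT = 40.0       # Orchestrator records a Memory Warning Event
CRITICAL_PCT = 60.0      # critical level named in the notes
CRITICAL_HOLD_S = 90.0   # must be exceeded for longer than this many seconds


def memory_state(samples: List[Tuple[float, float]]) -> str:
    """Classify (timestamp_seconds, utilization_percent) samples, oldest first.

    Returns "critical" if utilization stayed at or above CRITICAL_PCT for a
    continuous span longer than CRITICAL_HOLD_S anywhere in the samples,
    "warning" if the latest sample is at or above WARNING_PCT, else "ok".
    """
    critical_since = None
    for timestamp, percent in samples:
        if percent >= CRITICAL_PCT:
            if critical_since is None:
                critical_since = timestamp
            if timestamp - critical_since > CRITICAL_HOLD_S:
                return "critical"
        else:
            critical_since = None
    if samples and samples[-1][1] >= WARNING_PCT:
        return "warning"
    return "ok"
```

A "warning" result is the point at which the workaround suggests scheduling a manual Edge Service Restart in a maintenance window, before the critical condition forces an unscheduled one.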
- Issue 94204: A user may observe that attempts to generate a diagnostic bundle for a VMware SD-WAN Edge fail.
The Edge diagnostic bundles fail to complete because the Edge runs out of disk space. This can happen if the Edge has generated one or more cores and is caused by the Edge sending these cores to the /vnf/tmp folder. Each core is unpacked in the /vnf/tmp folder and, because of its unpacked size, quickly fills this folder, which causes the diagnostic bundle to fail.
Workaround: There is no workaround for this issue.
- Issue 96441: On a site using a High Availability Topology, the customer may observe frequent HA failovers.
The issue is triggered by the HA interface being marked by the Edge as down and then coming back up within 500-1000ms which can trigger an HA failover. However, these interface down events are spurious and caused by a DPDK-enabled interface using polling with an interval of 500ms to determine interface status. Using this method, the underlying device driver can sometimes report a spurious interface down event and each event causes the Edge to mark the interface as down until the next poll of the interface status (in 500ms) reports that the interface is up.
Workaround: There is no workaround for this issue.
- Issue 96888: In certain load conditions, the routing protocols for either BGP or OSPF may randomly restart, leading to route re-convergence and traffic disruption.
Under higher load conditions the BGP and OSPF routing protocol processes are made to wait longer than expected by the Edge CPU to get scheduled and this leads to a stall and restart of the routing protocol. The routing protocol delay is caused by insufficient CPU bandwidth allocation and can occur on any Edge model.
Workaround: If an Edge is experiencing this issue, a customer may contact VMware Support for assistance or upgrade their Edge to Release 4.5.1, build R451-20220916-GA or later.
- Issue 98136: For customer enterprises using a Hub/Spoke topology where Dynamic Branch To Branch VPN is configured, client users behind a SD-WAN Spoke Edge may observe that some traffic has unexpected latency resulting from the traffic using a sub-optimal path.
Spoke Edge traffic that experiences this issue uses a route that was initially a non-uplink route for a Hub Edge not included in the Profile the Spoke Edge was using. A Dynamic Branch To Branch VPN tunnel can be formed from the Spoke Edge to the Hub Edge because of traffic being sent towards some other unrelated prefix and in this instance the non-uplink route is installed in the Spoke Edge.
As a result of this non-uplink route, all traffic towards this prefix starts going through the Hub Edge and the non-uplink route becomes uplink (community change to uplink community) but the non-uplink route installed previously is not revoked and the traffic takes the Hub Edge path as long as the Dynamic Branch To Branch VPN tunnel remains up.
Workaround: Wait for the Dynamic Branch To Branch VPN tunnel to tear down, after which the uplink route will not be installed in the Spoke Edge when a new Dynamic Branch To Branch VPN tunnel is formed towards the Hub Edge.
- Issue 19566:
After High Availability failover, the serial number of the standby VMware SD-WAN Edge may be shown as the active serial number in the Orchestrator.
- Issue 21342:
When assigning Partner Gateways per-segment, the proper list of Gateway Assignments may not show under the Operator option "View" Gateways on the VMware SD-WAN Edge monitoring list.
- Issue 24269:
Monitor > Transport > Loss does not graph observed WAN link loss, while QoE graphs do reflect this loss.
- Issue 25932:
The VMware SD-WAN Orchestrator allows VMware SD-WAN Gateways to be removed from the Gateway Pool even when they are in use.
- Issue 32335:
The ‘End User Service Agreement’ (EUSA) page throws an error when a user is trying to accept the agreement.
Workaround: Ensure no leading or trailing spaces are found in Enterprise Name.
- Issue 32435:
A VMware SD-WAN Edge override for a policy-based NAT configuration is permitted for tuples which are already configured at the profile level and vice versa.
- Issue 32856:
Though a business policy is configured to use the Hub cluster to backhaul internet traffic, the user can unselect the Hub cluster from a profile on a VMware SD-WAN Orchestrator that has been upgraded from Release 3.2.1 to Release 3.3.x.
- Issue 32913:
After Enabling High Availability, Multicast details for the VMware SD-WAN Edge are not displayed on the Monitoring Page. A failover resolves the issue.
- Issue 33026:
The ‘End User Service Agreement’ (EUSA) page does not reload properly after deleting the agreement.
- Issue 34828:
Traffic cannot pass between a VMware SD-WAN Spoke Edge using release 2.x and a Hub Edge using release 3.3.1.
- Issue 35658:
When a VMware SD-WAN Edge is moved from one profile to another which has a different CSS setting (e.g. IPsec in profile1 to GRE in profile2), the Edge level CSS settings will continue to use the previous CSS settings (e.g. IPsec versus GRE).
Workaround: Deactivate and then reactivate GRE at the Edge level to resolve the issue.
- Issue 35667:
When a VMware SD-WAN Edge is moved from one profile to another profile which has the same CSS setting but a different GRE CSS name (the same endpoints), some GRE tunnels will not show in monitoring.
Workaround: Deactivate and then reactivate GRE at the Edge level to resolve the issue.
- Issue 36665:
If the VMware SD-WAN Orchestrator cannot reach the internet, user interface pages that require accessing the Google Maps API may fail to load entirely.
- Issue 38056:
The Edge-Licensing export.csv file does not show region data.
- Issue 38843:
When pushing an application map, there is no Operator event, and the Edge event is of limited utility.
- Issue 39633:
The Super Gateway hyper link does not work after a user assigns the Alternate Gateway as the Super Gateway.
- Issue 39790:
The VMware SD-WAN Orchestrator allows a user to configure a VMware SD-WAN Edge’s routed interface to have greater than the supported 32 subinterfaces, creating the risk that a user can configure 33 or more subinterfaces on an interface which would cause a Dataplane Service Failure for the Edge.
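Until the Orchestrator enforces the limit itself, a configuration can be checked against the supported maximum of 32 subinterfaces before it is applied. The sketch below is a hypothetical pre-check, not an Orchestrator feature; the function name and the list-based input are assumptions.

```python
from typing import List

MAX_SUBINTERFACES = 32  # supported maximum per routed interface, per the notes


def check_subinterface_count(interface_name: str, subinterfaces: List[object]) -> int:
    """Return the subinterface count, or raise if it exceeds the supported limit."""
    count = len(subinterfaces)
    if count > MAX_SUBINTERFACES:
        raise ValueError(
            f"{interface_name}: {count} subinterfaces configured, "
            f"supported maximum is {MAX_SUBINTERFACES}"
        )
    return count
```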
- Issue 40341:
Though the Skype application is properly categorized on the backend as Real Time traffic, when editing the Skype Business Policy on the VMware SD-WAN Orchestrator, the Service Class may erroneously display “Transactional”.
- Issue 41691:
User cannot change the 'Number of addresses' field although the DHCP pool is not exhausted on the Configure > Edge > Device page.
- Issue 43276:
User cannot change the Segment type when a VMware SD-WAN Edge or Profile has a partner gateway configured.
- Issue 44153:
The VMware SD-WAN Orchestrator does not consistently send alert emails to the email addresses configured in the 'Alerts and Notifications' section.
- Issue 46254:
During a VMware SD-WAN Edge activation, the VMware SD-WAN Orchestrator does not detect a changed WAN link MTU or the presence of a VLAN ID for DHCP configured interfaces.
- Issue 47269:
The VMware SD-WAN 510-LTE interface may appear for Edge models that do not support an LTE interface.
- Issue 47713:
If a Business Policy Rule is configured while Cloud VPN is deactivated, the NAT configuration must be reconfigured upon enabling Cloud VPN.
- Issue 47820:
If a VLAN is configured with DHCP deactivated at the Profile level, while also having an Edge Override for this VLAN on that Edge with DHCP enabled, and there is an entry for the DNS server field set to none (no IP configured), the user will be unable to make any changes on the Configure > Edge > Device page and will get an error message of ‘invalid IP address []’ that does not explain or point to the actual problem.
- Issue 48085:
The VMware SD-WAN Orchestrator allows a user to delete a VLAN which is associated with an interface.
- Issue 48737:
On a VMware SD-WAN Orchestrator which is using the Release 4.0.0 new user interface, if a user is on a Monitor page and changes the Start & End time interval and then navigates between tabs, the Orchestrator does not update the Start & End time interval to the new values.
- Issue 49225:
VMware SD-WAN Orchestrator does not enforce a limit of 32 total VLANs.
- Issue 49790:
When a VMware SD-WAN Edge is activated to Release 4.0.0, the activation is posted twice in Events.
Workaround: Ignore the duplicate event.
- Issue 50531:
When two Operators of differing privileges use the same browser window when accessing the New UI on a 4.0.0 Release version of the VMware SD-WAN Orchestrator, and the Operator with lesser privileges tries to log in after the Operator with higher privileges, that lesser privileged Operator will observe multiple errors stating that the "user does not have privilege".
Note: There is no escalation in privileges for the Operator with lower privileges, only the display of error messages.
Workaround: The next operator may refresh that page prior to logging in to prevent seeing the errors, or each Operator may use different browser windows to avoid this display issue.
- Issue 51722: On the Release 4.0.0 VMware SD-WAN Orchestrator, the time range selector is no greater than two weeks for any statistic in the Monitor > Edge tabs.
The time range selector does not show options greater than "Past 2 Weeks" in Monitor > Edge tabs even if the retention period for a set of statistics is much longer than 2 weeks. For example, flow and link statistics are retained for 365 days by default (which is configurable), while path statistics are retained only for 2 weeks by default (also configurable). This issue is making all monitor tabs conform to the lowest retained type of statistic versus allowing a user to select a time period that is consistent with the retention period for that statistic.
Workaround: A user may use the "Custom" option in the time range selector to see data for more than 2 weeks.
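The difference between the behavior described in Issue 51722 and a per-statistic selector can be illustrated with the default retention periods quoted above. In this sketch only "Past 2 Weeks" and the retention defaults come from these notes; the other preset labels and both function names are hypothetical.

```python
from typing import List

# Default retention periods from the notes, in days (both are configurable).
RETENTION_DAYS = {"flow": 365, "link": 365, "path": 14}

# "Past 2 Weeks" appears in the notes; the longer presets are hypothetical examples.
PRESETS_DAYS = {"Past 2 Weeks": 14, "Past 30 Days": 30, "Past 12 Months": 365}


def presets_per_statistic(statistic: str) -> List[str]:
    """Presets that fit within the retention period of one statistic."""
    limit = RETENTION_DAYS[statistic]
    return [label for label, days in PRESETS_DAYS.items() if days <= limit]


def presets_lowest_common() -> List[str]:
    """Presets when every tab is capped by the shortest retention period (the issue)."""
    limit = min(RETENTION_DAYS.values())
    return [label for label, days in PRESETS_DAYS.items() if days <= limit]
```

With these defaults, `presets_lowest_common()` yields only "Past 2 Weeks", while `presets_per_statistic("flow")` would allow the longer ranges that currently require the "Custom" option.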
- Issue 60039: RMA Reactivation does not work when the VMware SD-WAN Edge model is changed.
When performing an RMA Reactivation for a site where the Edge model is also being changed, the VMware SD-WAN Orchestrator does not save the model change making the reactivation link ineffective. This only affects RMA Reactivations where the Edge model is changed, an RMA Reactivation where the Edge model remains the same will work as expected.
Workaround: If using a different Edge model for a site, the user would need to create a new Edge and manually apply all Edge-specific settings.