VMware SASE 4.5.2 | 12 April 2024

  • VMware SASE™ Orchestrator Version R452-20230730-GA

  • VMware SD-WAN™ Gateway Version R452-20230628-GA

  • VMware SD-WAN™ Edge Version R452-20240125-GA

Check for additions and updates to these release notes.

This release is recommended for all customers who require the features and functionality first made available in Release 4.5.0, as well as those customers impacted by the issues listed below which have been resolved since Release 4.5.1.

Important:

Release 4.5.2 contains all Edge, Gateway and Orchestrator fixes that are listed in the 4.5.1 Release Notes.

Caution:

Release 4.5.x Orchestrators and Gateways reached End of General Support (EOGS) on September 30, 2023. As a result there are no additional updated builds for the Gateway after R452-20230628-GA, and the Orchestrator after R452-20230730-GA.

Compatibility

Release 4.5.2 Orchestrators, Gateways, and Hub Edges support all previous VMware SD-WAN Edge versions greater than or equal to Release 4.2.0.

Note:

This means Edge releases prior to 4.2.0 are not supported.
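As a rough illustration of the rule above, a lower-bound check on an Edge release string might look like the following. This is only a sketch: the helper names `parse_release` and `edge_supported_by_452` are hypothetical, and the check covers only the 4.2.0 minimum stated above, not any other compatibility condition.

```python
MIN_EDGE_RELEASE = (4, 2, 0)  # minimum Edge release stated in the note above


def parse_release(version: str) -> tuple:
    """Parse a release string such as "4.3.2" or "4.5.1 RU3" into ints.

    Only the dotted part before any space is used, and missing components
    are treated as 0, so "4.2" parses as (4, 2, 0).
    """
    dotted = version.split()[0]
    parts = [int(p) for p in dotted.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def edge_supported_by_452(edge_version: str) -> bool:
    """True if the Edge release meets the 4.2.0 minimum (lower bound only)."""
    return parse_release(edge_version) >= MIN_EDGE_RELEASE


for release in ("4.0.1", "4.2.0", "4.5.1 RU3"):
    print(release, edge_supported_by_452(release))
```
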

The following SD-WAN interoperability combinations were explicitly tested:

Orchestrator    Gateway       Edge (Hub)    Edge (Branch/Spoke)
4.5.2           4.5.2         4.2.2         4.2.2
4.5.2           4.5.2         4.5.2         4.2.2
4.5.2           4.5.2         4.2.2         4.5.2
4.5.2           4.3.2         4.5.2         4.3.2
4.5.2           4.3.2         4.3.2         4.3.2
4.5.2           4.5.2         4.3.2         4.3.2
4.5.2           4.5.2         4.5.2         4.3.2
4.5.2           4.5.2         4.3.2         4.5.2
4.5.2           4.5.0         4.5.2         4.5.0
4.5.2           4.5.0         4.5.0         4.5.0
4.5.2           4.5.2         4.5.0         4.5.0
4.5.2           4.5.2         4.5.2         4.5.0
4.5.2           4.5.2         4.5.0         4.5.2
4.5.2           4.5.1 RU3     4.5.2         4.5.1 RU3
4.5.2           4.5.1 RU3     4.5.1 RU3     4.5.1 RU3
4.5.2           4.5.2         4.5.1 RU3     4.5.1 RU3
4.5.2           4.5.2         4.5.2         4.5.1 RU3
4.5.2           4.5.2         4.5.1 RU3     4.5.2
5.1.0           4.5.2         4.5.0         4.5.2
5.1.0           5.1.0         4.5.2         4.5.0
4.5.0           4.5.2         4.5.2         4.5.2
4.5.0           4.5.0         4.5.2         4.5.2
4.5.0           4.5.2         4.5.0         4.5.0

Important:

VMware SD-WAN Release 4.0.x has reached End of Support; Release 4.2.x and 4.3.x have reached End of Support for Gateways and Orchestrators; and 4.5.x is approaching End of Support for Gateways and Orchestrators.

  • Release 4.0.x reached End of General Support (EOGS) on September 30, 2022, and End of Technical Guidance (EOTG) December 31, 2022. 

  • Release 4.2.x Orchestrators and Gateways reached End of General Support (EOGS) on December 30, 2022, and End of Technical Guidance (EOTG) on March 30, 2023.

  • Release 4.2.x Edges reached End of General Support (EOGS) on June 30, 2023, and will reach End of Technical Guidance (EOTG) September 30, 2025.

  • Release 4.3.x Orchestrators and Gateways reached End of General Support (EOGS) on June 30, 2023, and End of Technical Guidance (EOTG) September 30, 2023.

  • Release 4.3.x Edges reached End of General Support (EOGS) on June 30, 2023, and will reach End of Technical Guidance (EOTG) September 30, 2025.

  • Release 4.5.x Orchestrators and Gateways reached End of General Support (EOGS) on September 30, 2023, and End of Technical Guidance (EOTG) on December 31, 2023.

  • For more information please consult the Knowledge Base article: Announcement: End of Support Life for VMware SD-WAN Release 4.x (88319).

Upgrade Paths for Orchestrator, Gateway, and Edge

Orchestrator

Due to infrastructure changes in the Orchestrator beginning in Release 4.0.0, any Orchestrator using a 3.x Release needs to be first upgraded to 4.0.0 prior to being upgraded to 4.5.2. Orchestrators using Release 4.0.0 or later can be upgraded to Release 4.5.2.  Thus, the upgrade paths for the Orchestrator are as follows:

Orchestrator using Release 3.x → 4.0.0 → 4.5.2.

Orchestrator using Release 4.x → 4.5.2.
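
The two Orchestrator paths above can be expressed as a small helper. This is an illustrative sketch only: the function name `orchestrator_upgrade_path` is hypothetical, and it assumes a release string whose first dotted component is the major version.

```python
def orchestrator_upgrade_path(current: str, target: str = "4.5.2") -> list:
    """Return the ordered releases to install, following the paths above.

    Hypothetical helper: a 3.x Orchestrator must first go to 4.0.0,
    while a 4.x Orchestrator can go straight to the target release.
    """
    if current == target:
        return []  # already on the target release
    major = int(current.split(".")[0])
    if major == 3:
        return ["4.0.0", target]
    if major == 4:
        return [target]
    raise ValueError(f"no upgrade path is documented here for release {current}")


print(orchestrator_upgrade_path("3.4.6"))
print(orchestrator_upgrade_path("4.2.2"))
```

The first call yields the two-step path through 4.0.0; the second yields the direct path.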

Gateway

Gateway upgrades from 3.x to 4.5.2 are not supported. In place of upgrading, a 3.x Gateway needs to be freshly deployed with the same VM attributes, and the old instance is then deprecated.

Upgrading a Gateway using Release 4.0.0 or later is fully supported for all Gateway types.

Edge

An Edge can be upgraded directly to Release 4.5.2 from any Release 3.x or later.

Important Notes

VMware Security Advisory 2024-0008

LAN-Side NAT Behavioral Change

Beginning in Release 4.5.0, when a LAN-side NAT is configured for many-to-one translations using Port Address Translation (PAT), traffic initiated from the opposite direction can allow unexpected access to fixed addresses based on the outside mask and original IP address. This new behavior applies to Destination NAT (DNAT), Source NAT (SNAT), and Source and Destination NAT (S+D NAT) rules.

For example, a SNAT rule with an inside network of 192.168.1.0/24 and an outside address of 10.1.1.100/32 permits outside-to-inside translation to 192.168.1.100.

To address this new behavior, SD-WAN now blocks traffic when a connection is initiated in the reverse PAT direction.

To restore the original behavior, a user needs to configure two rules of the same type as the original rule (SNAT, DNAT, S+D NAT) in a particular order. For example, using the earlier SNAT scenario a user needs to configure the following:

  1. SNAT rule with an inside network of 192.168.1.100/32 and an outside address of 10.1.1.100/32

  2. SNAT rule with an inside network of 192.168.1.0/24 and an outside address of 10.1.1.100/32

If the original rule is a DNAT or S+D NAT, then the user would need two DNAT or S+D NAT rules with the same structure and order.

Beginning in Release 4.5.0 and through 5.2.0, a user can determine if flows are dropped for this type of traffic in the dispcnt logs of a diagnostic bundle by searching for the counter lan_side_nat_reverse_pat_drop.
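
A minimal sketch of such a search, assuming the diagnostic bundle has already been extracted to a local directory and that the dispcnt output is plain text. The directory layout and the helper name `find_counter_lines` are assumptions; only the counter name comes from the note above.

```python
from pathlib import Path

COUNTER = "lan_side_nat_reverse_pat_drop"  # counter name from the note above


def find_counter_lines(bundle_dir: str, counter: str = COUNTER) -> list:
    """Return (file path, line) pairs for every line that mentions the counter.

    Assumes an extracted bundle of text files. Files are decoded leniently
    so that a stray binary file does not abort the scan.
    """
    hits = []
    for path in sorted(Path(bundle_dir).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for line in text.splitlines():
            if counter in line:
                hits.append((str(path), line.strip()))
    return hits
```

Calling `find_counter_lines` on the extracted bundle directory returns the matching lines, if any, for manual inspection; how the counter value is formatted in those lines is not assumed here.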

Potential Issue with Sites Using a High Availability Topology

A site where a pair of Edges are deployed in a High Availability topology may encounter an issue where the Standby Edge reboots one or more times to resolve an Active-Active state. The Standby Edge reboot(s) can cause a disruption of customer traffic with the impact greater on sites using an Enhanced HA topology as the Standby Edge also passes customer traffic. The issue is being tracked by Issue #85369, which is fixed in the 1st rollup build for Release 4.5.1: R451-20220701-GA. The issue is tracked under the Edge/Gateway Resolved Issues section for R451-20220701-GA in these Release Notes and it is strongly recommended that customers with HA sites upgrade their Edges to R451-20220701-GA at a minimum and preferably the latest 4.5.2 Edge software.

Accessing Cloud Web Security and Secure Access

A customer wishing to access VMware Cloud Web Security or VMware Secure Access must upgrade their Edges to Release 4.5.0 or later.  These services are inaccessible on Edges using a release earlier than 4.5.0.

Extended Upgrade Time for Edge 3x00 Models

Upgrades to this version may take longer than normal (3-5 minutes) on Edge 3x00 models (i.e., 3400, 3800 and 3810). This is due to a firmware upgrade which resolves issue 53676. If an Edge 3400 or 3800 had previously upgraded its firmware when on Release 4.2.1 or 4.3.0 or later then the Edge would upgrade as expected. For more information, please consult Fixed Issue 53676 in the respective release notes.

Limitation with BGP over IPsec on Edge and Gateway, and Azure Virtual WAN Automation

The BGP over IPsec on Edge and Gateway feature is not compatible with Azure Virtual WAN Automation from Edge or Gateway. Only static routes are supported when automating connectivity from an Edge or Gateway to an Azure vWAN.

Limitation When Deactivating Autonegotiation on VMware SD-WAN Edge Models 520, 540, 620, 640, 680, 3400, 3800, and 3810

When a user deactivates autonegotiation to hardcode speed and duplex on ports GE1 - GE4 on a VMware SD-WAN Edge model 620, 640 or 680; on ports GE3 or GE4 on an Edge 3400, 3800, or 3810; or on an Edge 520/540 when an SFP with a copper interface is used on ports SFP1 or SFP2, the user may find that even after a reboot the link does not come up.

This is caused by each of the listed Edge models using the Intel Ethernet Controller i350, which has a limitation that when autonegotiation is not used on both sides of the link, it is not able to dynamically detect the appropriate wires to transmit and receive on (auto-MDIX). If both sides of the connection are transmitting and receiving on the same wires, the link will not be detected. If the peer side also does not support auto-MDIX without autonegotiation, and the link does not come up with a straight cable, then a crossover Ethernet cable will be needed to bring the link up.

For more information please see the KB article Limitation When Deactivating Autonegotiation on VMware SD-WAN Edge Models 520, 540, 620, 640, 680, 3400, 3800, and 3810 (87208).

Mixing Wi-Fi Capable and Non-Wi-Fi Capable Edges in High Availability Is Not Supported 

Beginning in 2021, VMware SD-WAN introduced Edge models which do not include a Wi-Fi module: the Edge models 510N, 610N, 620N, 640N, and 680N. While these models appear identical to their Wi-Fi capable counterparts except for Wi-Fi, deploying a Wi-Fi capable Edge and a Non-Wi-Fi capable Edge of the same model (for example, an Edge 640 and an Edge 640N) as a High Availability pair is not supported. Customers should ensure that the Edges deployed as a High Availability pair are of the same type: both Wi-Fi capable, or both Non-Wi-Fi capable.

Available Languages

The VMware SASE Orchestrator using version 4.5.2 is localized into the following languages: Czech, English, European Portuguese, French, German, Italian, Spanish, Japanese, Korean, Simplified Chinese, and Traditional Chinese.

Document Revision History

April 12th, 2024. Seventeenth Edition.

  • Corrected the wording for Open Issue #118704 to change the workaround from a CLI action to an action on the Orchestrator UI to restart the Edge service to remediate the issue.

  • Added Open Issue #142366 to the Edge/Gateway Known Issues section.

April 2nd, 2024. Sixteenth Edition.

  • Added an Important Note regarding CVE-2024-22247, which details a missing authentication and protection mechanism vulnerability that impacts an SD-WAN Edge. VMware's response to this vulnerability is documented in VMSA-2024-0008. More information on mitigating this vulnerability is found in the KB article: VMware Response to CVE-2024-22247 (VMSA-2024-0008) (97391).

March 26th, 2024. Fifteenth Edition.

March 15th, 2024. Fourteenth Edition.

  • Removed Issue #118568 from the Edge/Gateway Resolved Issues section. This issue was previously located in the Edge build R452-20231025-GA section but the fix was never included in Release 4.5.2. The fix for this issue is available in any 5.2.0 or later Edge build.

March 6th, 2024. Thirteenth Edition.

  • Added Fixed Issue #97559 to the Edge/Gateway Resolved Issues section for the second Edge/Gateway rollup build R452-20231025-GA. This issue was omitted in error from the eighth edition of the release notes.

  • Added Fixed Issue #97759 to the Edge/Gateway Resolved Issues section for the original GA Edge/Gateway build R452-20230628-GA. This issue should have been included in the first edition of these release notes.

  • Added Open Issue #115089 to the Edge/Gateway Known Issues section.

February 5th, 2024. Twelfth Edition.

  • Added a new Edge rollup build R452-20240125-GA to the Edge/Gateway Resolved section. This is the fourth Edge rollup build and is the new default Edge GA build for Release 4.5.2.

  • Edge build R452-20240125-GA includes the fixes for issues #124844, #126520, #126571, #130901, #130907, #132274, and #132716, each of which is documented in this section.

  • Note:

    This is an Edge-only release. The default Gateway Release 4.5.2 build remains R452-20230628-GA.

December 14th, 2023. Eleventh Edition.

  • Added a new Edge rollup build R452-20231205-GA to the Edge/Gateway Resolved section. This is the third Edge rollup build and is the new default Edge GA build for Release 4.5.2.

  • Edge build R452-20231205-GA includes the fixes for issues #114562, #130368, and #133297, each of which is documented in this section.

  • Added Open Issues #131122 and #134088 to the Edge/Gateway Known Issues section.

  • Note:

    This is an Edge-only release. The default Gateway Release 4.5.2 build remains R452-20230628-GA.

November 2nd, 2023. Tenth Edition.

  • Moved Open Issue #103662 from Edge/Gateway Known Issues section to the Edge/Gateway Resolved Issues section. This ticket covered the behavior of High Availability Edges entering an Active/Active state and requiring the Standby to reboot. The causes of this HA Edge behavior are resolved through three tickets: #112115, #112131, and #118333, each of which is documented in the 4.5.2 Release Notes.

November 1st, 2023. Ninth Edition.

  • Added the 4.5.2 Gateway hotfix build number for Fixed Issue #116257 in the Edge/Gateway Resolved Issues section.

October 30, 2023. Eighth Edition.

  • Added a new Edge rollup build R452-20231025-GA to the Edge/Gateway Resolved section. This is the second Edge rollup build and is the new default Edge GA build for Release 4.5.2.

  • Edge build R452-20231025-GA includes the fixes for issues #62701, #72965, #74422, #103049, #103118, #105034, #105933, #109963, #110406, #110561, #110577, #111592, #112115, #112509, #116257, #116593, #116894, #118333, #119010, #119853, #121998, #122426, #122790, #122988, #123128, #123214, #123475, #123593, #123954, #124106, #124162, #125421, #125487, #126458, #126500, #126519, #127403, and #127603, each of which is documented in this section.

  • Note:

    This is an Edge-only release. The default Gateway Release 4.5.2 build remains R452-20230628-GA.

October 18th, 2023. Seventh Edition.

October 3rd, 2023. Sixth Edition.

  • Added Open Issue #105933 to the Edge/Gateway Known Issues section.

  • Updated the Compatibility section regarding which VMware SD-WAN 4.x software versions are either end of support life or entering their end of support life window. In particular, 4.5.x Orchestrators and Gateways entered this window starting on September 30th, 2023 by reaching End of General Support (EOGS).

August 24th, 2023. Fifth Edition.

  • Added an Available Languages section to make clear the languages into which the VMware SASE 4.5.2 Orchestrator is localized.

August 4th, 2023. Fourth Edition.

  • Added a new Edge rollup build R452-20230803-GA to the Edge/Gateway Resolved section. This is the first Edge rollup build and is the new default Edge GA build for Release 4.5.2.

  • Edge build R452-20230803-GA includes the fixes for issues #114938, #117037, and #122528, each of which is documented in this section.

    Note:

    This is an Edge-only release. The default Gateway Release 4.5.2 build remains R452-20230628-GA.

August 3rd, 2023. Third Edition.

  • Added a new Orchestrator rollup build R452-20230730-GA to the Orchestrator Resolved section. This is the first Orchestrator rollup build and is the new default Orchestrator GA build for Release 4.5.2.

  • Orchestrator build R452-20230730-GA includes the fixes for issues #64145 and #119080, each of which is documented in this section.

July 12th, 2023. Second Edition.

  • The first edition of the 4.5.2 Release Notes was published with an incomplete list of all fixed and open issues. The second edition corrects this and represents a complete record of what is fixed in Release 4.5.2 and also the tickets which remain open for this release.

July 6th, 2023. First Edition.

Edge and Gateway Resolved Issues

Important:

Release 4.5.2 includes all Edge and Gateway fixed issues listed in the 4.5.1 Release Notes.

Resolved in Edge Version R452-20240125-GA

Edge build R452-20240125-GA was released on 02-02-2024 and is the 4th Edge rollup for Release 4.5.2.

This Edge rollup build addresses the below critical issues since the 3rd Edge rollup, R452-20231205-GA.

  • Fixed Issue 124844: For a customer enterprise that uses a Hub/Spoke network topology and also deploys one or more sites with a High Availability Edge topology, Spoke Edge routes may be shared with another Spoke Edge even though branch to branch VPN is not enabled. Should this occur, these false routes (Reachability: False) present on the Active HA Edge are used for data traffic in preference to the Reachability: True routes, which impacts network connectivity and customer traffic.

    In this scenario one of the Spoke Edges is configured to act as a Hub in a different profile, and for this specific use case it would be expected for the Spoke Edge routes to be shared without branch to branch VPN enabled. The issue is that after HA failover, these stale routes installed on the newly Active HA Edge should be deleted and sometimes this does not happen which results in the issue documented above.

    On a site using HA Edges without a fix for this issue, the only way to temporarily remediate the issue is to reboot the HA Edges. A more lasting workaround is to enable route backtracking to resolve connectivity issues.

  • Fixed Issue 126520: Users in an enterprise with a large number of active applications may observe that traffic matching Business Policy rules is not always steered properly.

    In environments with many active applications, the Edge DNS cache can become full, which would trigger an alert every 10 minutes that DNS entries have been missed. In addition, when the DNS cache is full, this issue could also impact first packet routing based on business policies.

    Clearing the Edge DNS cache will temporarily relieve the issue.

  • Fixed Issue 126571: A VMware SD-WAN Edge may experience multiple kernel panics and core dumps resulting in the Edge restarting repeatedly.

    These kernel panics are the result of an out of memory (OOM) condition. While writing a core to the Edge's persistent storage, page allocation for the file system and I/O writes exacerbate the memory consumption and eventually cause the OOM condition.

    Since core dumps are write-only from the kernel, it is unnecessary to keep anything in the Edge's memory cache. As a result, the fix for this issue involves bypassing the Edge's page-cache completely and writing out the core dump and then synchronizing to flush the file system and I/O cache.

  • Fixed Issue 130901: User traffic may drop for flows which try to switch from "backhaul via CSS" to "Direct".

    The Edge does not allow switching the traffic path if the flow matches a different business policy rule which steers it via some other path/route. This handling was missing for the flows which start with "backhaul via CSS".

    On an Edge without a fix for this issue, the user can configure the business policy to steer traffic with a destination IP address.

  • Fixed Issue 130907: SNMP walk on ipAddrTable does not fetch the entire table.

    An SNMP walk on the ipAddrTable only fetches the IP Address; the rest of the fields are not fetched due to a corruption of the data of physical LAN interfaces.

  • Fixed Issue 132274: A VMware SD-WAN Edge may go offline on the Orchestrator if a user downgrades from a 5.x build to a 4.x build.

    When an Edge is loaded with a 5.x image, the DNS cache entry is created with both IPv4 and IPv6 addresses, for example:

    {"vco-fd00-aaaa-1-1--2-169.254.8.2": {"resolved": 1702377994, "addr":["fd00:aaaa:1:1::2", "169.254.8.2"]}}

    In this example, the field addr is a list but in 4.x it is expected as a scalar with only an IPv4 address.

    As a result, the Edge service does not accept the resolved FQDN that the Edge management process provides from the DNS cache, because the entry is in an unexpected format, and this causes the Edge to go offline.

    If an Edge without a fix for this issue encounters this issue, the workaround is to adjust the DNS cache entry to the 4.5 format in /velocloud/state/dns_cache.json and then restart the Edge service. Here is an example of a 4.5 DNS cache format: {"vco-fd00-aaaa-1-1--2-169.254.8.2": {"resolved": 1702377994, "addr": "169.254.8.2"}}.

  • Fixed Issue 132716: 1:1 NAT rule may not work on a VLAN-enabled PPPoE WAN link.

    When adding a self-ip to the self-ip table for IP-based WAN interfaces like PPPoE, the Edge process should always account for the VLAN. The issue is that when the Edge receives packets from this type of interface, the VLAN is not present and the lookup fails and all packets steered to the PPPoE link are dropped.
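
The DNS cache workaround described under Fixed Issue 132274 above can be sketched as a small conversion. This is an illustration based solely on the two example entries quoted in that note, not a supported tool: the helper name `to_45_format` is hypothetical, and it assumes every entry whose `addr` is a list contains at least one IPv4 address.

```python
import ipaddress
import json


def to_45_format(cache: dict) -> dict:
    """Collapse list-valued "addr" fields to a single IPv4 string.

    Entries that already hold a scalar "addr" are copied unchanged.
    """
    converted = {}
    for name, entry in cache.items():
        addr = entry.get("addr")
        if isinstance(addr, list):
            ipv4 = [a for a in addr if ipaddress.ip_address(a).version == 4]
            if not ipv4:
                raise ValueError(f"no IPv4 address found for {name}")
            entry = {**entry, "addr": ipv4[0]}
        converted[name] = entry
    return converted


# The 5.x-style entry quoted in the note for Fixed Issue 132274.
cache_5x = json.loads(
    '{"vco-fd00-aaaa-1-1--2-169.254.8.2": {"resolved": 1702377994, '
    '"addr": ["fd00:aaaa:1:1::2", "169.254.8.2"]}}'
)
print(json.dumps(to_45_format(cache_5x)))
```

Applied to the 5.x example, the helper keeps the `resolved` timestamp and replaces the `addr` list with the IPv4 address alone, matching the 4.5 example entry. The file /velocloud/state/dns_cache.json would still need to be written back and the Edge service restarted, as the workaround states.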

Resolved in Edge Version R452-20231205-GA

Edge build R452-20231205-GA was released on 12-14-2023 and is the 3rd Edge rollup for Release 4.5.2.

This Edge rollup build addresses the below critical issues since the 2nd Edge rollup, R452-20231025-GA.

  • Fixed Issue 114562: Rate limiting may not work for SSH flows.

    When a business policy is created to rate limit transit SSH flows, the rate limit setting is not honored although the business policy is applied successfully. This is because these flows are considered control flows even though the SSH traffic is not destined for the Edge but for remote devices.

  • Fixed Issue 130368: Direct traffic that matches a Business Policy rule where "Available" link steering with "Transport Group" is configured does not work as expected if the desired link is in an Unstable state.

    WAN links are usually marked as Unstable for high loss, but any factor that triggers an Unstable state is equally valid for this issue. The issue is the result of the Edge's link selection code for "Available" with "Transport Group" steering traffic to a better quality link even though the "Available" option only requires that the link be up, not that it also be of good quality.

  • Fixed Issue 133297: For a customer who uses the Self-Healing feature or who deploys sites with a High Availability topology, the Self-Healing feature and HA Split Brain prevention do not work as expected.

    In either case the customer would have Edges using 4.5.2 software which are connected to an SD-WAN Gateway using a software version earlier than 5.1.0.0 (that means the issue can occur when connected to any 4.5.x or 5.0.x Gateway).

    Both Self-Healing and HA Split Brain prevention rely on the Edge receiving critical management messages from the Gateway that the Edge must implement to correct what the SD-WAN and Edge Network Intelligence services have detected. With this issue the messages (for example: RMSG_CLIENT_FLUSH_FOR_NHID_DSTID and RMSG_CLIENT_HA_SPLITBRAIN) may not achieve their desired purpose for Edges running either 4.5.x or 5.0.x software while using Gateways which use a software version lower than 5.2.2.0. This is the result of an interoperability issue between the Edge and Gateway with regards to these messages.

    The third 4.5.2 Edge rollup version includes the Edge fix for this issue, but the customer must also ensure their Edges connect to a Gateway with 5.1.0.x or later software version to fully resolve this issue.

Resolved in Edge Version R452-20231025-GA

Edge build R452-20231025-GA was released on 10-30-2023 and is the 2nd Edge rollup for Release 4.5.2.

This Edge rollup build addresses the below critical issues since the 1st Edge rollup, R452-20230803-GA.

  • Fixed Issue 62701: For a VMware SD-WAN Edge deployed as part of an Edge Hub Cluster, if Cloud VPN is not enabled under the Global Segment but is enabled under a Non-Global Segment, a control plane update sent by the Orchestrator may cause all the WAN links to flap on the Hub Edge.

    The Hub Edge's WAN links going down, then up in rapid succession (flap) will impact real time traffic like voice calls. This issue was observed on a customer deployment where Cloud VPN was not enabled on the Hub Edge's Global Segment, but the Cluster configuration was enabled, which means this Hub Edge was part of a Cluster (and a Cluster configuration is applicable to all segments). When a configuration change is pushed to the Hub Edge, the Hub Edge's Dataplane starts parsing data beginning with the Global Segment, where it sees that Cloud VPN is not enabled, and the Hub Edge erroneously concludes that clustering is not enabled on this Global Segment. As a result, the Hub Edge tears down all tunnels from the Hub's WAN link(s), which causes link flaps on all of that Edge's WAN links. For any such incident the WAN links only go down and recover a single time per control plane update.

    For Edges without a fix for this issue, the workaround is to activate Cloud VPN on all segments, meaning the Global Segment and all Non-Global Segments.

  • Fixed Issue 72965: For a customer site deployed with a High Availability topology, a customer may observe a failover or in the case of an Enhanced HA deployment, disruption in traffic routed through the Standby Edge.

    The HA failover or Standby restart is the result of one of the HA Edges experiencing a Dataplane Service failure. The failure is the result of the HA Edge not handling HA link synchronization messages and lock/unlock order properly and this leads to a deadlock in HA threads and the service failure.

  • Fixed Issue 74422: In cases of High Availability, the Edge may go offline if only the Standby Edge has a WAN link which is up and has a valid IP address.

    This issue occurs when a WAN link has DHCP enabled where only the Standby Edge has a WAN link available. When the Standby WAN link receives an IP address from the DHCP server, it sends the interface details to the Active Edge. The Active Edge makes a call to add the IP address as a route; however, this function does not add the route to the Linux kernel. The Edge function only adds the route to the FIB (forwarding information base). As a result the Edge's management process throws an error as there is no route present in the Linux kernel route table for the packet to exit and the site is effectively offline.

  • Fixed Issue 97559: On a customer site deployed with an Enhanced High-Availability topology, a WAN link connected to the VMware SD-WAN Edge in a Standby role may show as down on the VMware SASE Orchestrator and not pass customer traffic even though the Edge's WAN interface where the WAN link is connected is up.

    A user looking at a tcpdump or diagnostic bundle logging would observe ARP requests coming in and the Standby Edge not responding as a result of its port being blocked. In Enhanced HA, when an Edge assumes the role of Standby, the following events should occur in sequence:

    1. The Standby Edge blocks all ports.

    2. The Standby Edge then detects that it is deployed in Enhanced HA and unblocks its WAN ports to pass traffic.

    When this issue occurs, Event 1 (the initial port blocking) takes an unexpectedly long time to complete, and the follow-up Event 2 (the unblocking of all WAN ports) completes before Event 1 does. Event 1 then completes, and thus the final state is that all WAN ports are blocked on the Standby Edge.

    On an HA Edge without a fix for this issue, the workaround is to force an HA failover that promotes the Standby Edge to Active, which brings up the HA Edge's WAN link(s).

  • Fixed Issue 103049: Polling a VMware SD-WAN Edge via SNMP may not work when SNMPv3 is configured.

    When a user turns on SNMP and sets up SNMPv3 user credentials via the Orchestrator prior to activating the Edge, if the user tries to poll the Edge via SNMP the Edge does not respond.

    On an Edge without a fix for this issue, the workaround is to change any SNMPv3 setting (like adding or updating a user) and then change it back to its original setting.

  • Fixed Issue 103118: WAN traffic may drop on a Standby Edge deployed in an Enhanced High Availability topology.

    Standby interfaces remain in a blocked state even though the site is configured in Enhanced HA mode and the Standby interfaces are up as expected.

  • Fixed Issue 103662: For a customer site configured with any High Availability topology, the user may observe that the VMware SD-WAN Edge in the Standby role has rebooted multiple times.

    The reboots are a result of the HA site experiencing an Active/Active (Split-Brain) state that SD-WAN corrects by rebooting the Standby Edge. These Active/Active states are the result of heartbeats from the Active Edge being delayed by other Edge processes, which leads the Standby Edge to conclude that the Active Edge has gone offline, and the Standby Edge is thus promoted to Active. As with any Active/Active situation, this issue is potentially disruptive to a site deployed with Enhanced HA since the Standby is also passing customer traffic through its WAN links.

    On Edges with HA activated, where there is no fix for this issue, the workaround is to increase the HA failover time from the default of 700ms to 7000ms.

  • Fixed Issue 105034: SNMP polling for a VMware SD-WAN Edge's CPU and memory always gets a zero as a response value.

    The SNMP polling for the CPU and memory as part of the Edge health stats always gets a response value of zero. The issue is resolved with CPU utilization now modified as "CPU load average" and memory utilization now populated as the response.

  • Fixed Issue 105933: A user cannot SSH to VMware SD-WAN Edge models 610/610-LTE or 520/540 via a routed interface.

    There is no drop rule for duplicate SSH packets which originate via an af-pkt driver used by the affected Edge's OS. Because of this the Edge kernel receives two SSH packets: one via the vce1 interface, and another direct SSH packet because of the nature of the driver. This causes the Edge kernel to reply to two SSH requests, which confuses the SSH client and results in the SSH failure.

    For an Edge without a fix for this issue, the user can add an IP table rule to drop the SSH packets received from interfaces other than vce1.

  • Fixed Issue 109963: A user cannot SSH to a VMware SD-WAN Virtual Edge.

    All Virtual Edge types (Azure, AWS, and so forth) are affected by this issue. When an SSH attempt is made, the Virtual Edge receives two SSH packets and this causes the Edge kernel to reply to two SSH requests, which confuses the SSH client and results in the SSH failure.

  • Fixed Issue 110406: For a customer site deployed with a High Availability topology, if the HA Edge pair are downgraded to an earlier software version, the Active Edge may not complete the downgrade.

    When encountering this issue, only the Standby Edge successfully downgrades to the specified software version while the Active Edge never downgrades, which means the site is effectively a standalone site and no longer HA.

    On an HA site where this issue is encountered without a fix, the workaround is to deactivate HA, downgrade the previous Active Edge as a Standalone, and then reactivate HA with both Edges now on the same downgraded version.

  • Fixed Issue 110577: For a customer site deployed with a High Availability topology where a large number of flows use NAT, when an HA failover is triggered client users may experience a longer than expected traffic disruption.

    With a large number of flows using NAT (~480K), the HA Standby Edge may experience high CPU utilization and a very slow flow synchronization rate with the Active Edge. If an HA failover occurs during this state, some flows may be broken and need to be reestablished and users would experience this as poor traffic quality.

  • Fixed Issue 111592: For a customer enterprise using a Hub/Spoke topology where Business Policies are configured to use internet backhaul, internet traffic using the backhaul rule may be either slow or not work at all.

    In some instances during the creation of the flow, the Business Policy matching is changed due to updated Deep Packet Inspection (DPI) information. This could lead to the loss of the logical ID of the Hub Edge or Non SD-WAN Destination, which is supposed to backhaul the packets.

  • Fixed Issue 112115: A VMware SD-WAN Edge under a high CPU load may experience a Dataplane Service failure and restart to recover.

    Under high CPU conditions, multiple service failures triggered by a mutex monitor can occur due to a lower priority thread acquiring the debug ring lock. The resolution to this issue is an enhancement to the Dataplane that makes that particular thread both lock-free and wait-free.

  • Fixed Issue 112509: A VMware SD-WAN Edge configured to use a VNF may experience a Dataplane Service failure and restart to recover.

    The issue is traced to SKB (network buffer) handling. In some instances the SKB allocation check is missing and this can trigger the Edge service failure.

  • Fixed Issue 116257: For a VMware SD-WAN Edge connected through a Partner Gateway where a NAT handoff is configured for a remote server, return traffic to the Edge may drop from that server.

    If the traffic is initially not encrypted from the Edge to the remote server and then updated with an encrypted flag, once the route is updated, the reverse traffic is dropped on the Edge due to a route lookup failure.

    The issue can be temporarily resolved by flushing flows on the affected Edge.

    Important:

    This is a fix for both the Edge and the Gateway and a customer needs to upgrade their Edge and be connected to a Gateway that is also upgraded to a build that includes the fix.

    For customers using a hosted Orchestrator, there is a 5.0.1.5 Gateway hotfix build R5015-20231031-GA-116257 which includes the Gateway half of the fix. If you need the complete fix for this issue, please contact Support to ensure your Gateways are on the 5.0.1.5 hotfix build R5015-20231031-GA-116257.

  • Fixed Issue 116593: For a customer enterprise configured to use IPv6 addresses and where routes are learned from Remote Access, a VMware SD-WAN Edge in that network may experience a Dataplane Service failure and need to restart to recover.

    The issue stems from the Edge experiencing a Dataplane Service failure caused by improper lock usage to protect RA route DLLs. The result is client users at the Edge observing traffic dropping due to the Edge restart and from routes dropping that are restored after the restart.

  • Fixed Issue 116894: 1:1 NAT does not work properly when the Outside IP address and Source IP address are in the same subnet.

    With this 1:1 NAT configuration the Edge changes the source port during the NAT translation, and as a result inbound traffic matching this rule is dropped.

  • Fixed Issue 118333: For a customer site deployed with a High Availability topology where the HA Edge pair is either a model 520, 540, or 610, the customer may observe multiple HA failovers due to the site experiencing an active-active (split brain) condition.

    VMware SD-WAN Edge models 520, 540, and 610 use a switch made by Marvell which, if internet backhaul is configured, can trigger a situation where the Standby Edge also becomes active while the Active Edge is not demoted. Active-Active states are resolved by rebooting the Standby Edge, and this is recorded in the Edge Events.

  • Fixed Issue 119010: On the VMware SD-WAN Edge models 520 and 540, the Edge may not forward traffic from a VLAN located on LAN ports 1-4 to a VLAN located on LAN ports 5-8, and vice-versa.

    The Edge models 520 and 540 have two LAN NIC cards, each with a bank of four ports for a total of 8 LAN ports. When a VLAN is configured for a LAN port on the first bank of ports and a different VLAN is configured for a LAN port on the second bank of ports, the Edge does not handle traffic between them properly and it is dropped.

  • Fixed Issue 119853: For a customer site deployed with a High Availability topology, when there is a TCP flap between the Active and Standby Edges, the client users for that HA Edge would observe traffic loss.

    When a TCP flap is triggered between the Active and Standby Edge, the HA Edges reset their link states, which causes paths using that link to also be torn down and rebuilt. This results in traffic loss for user traffic using that HA Edge interface.

  • Fixed Issue 121998: For a customer using the Stateful Firewall in a Hub/Spoke topology, traffic that matches a firewall rule configured for Spoke-to-Hub traffic where the rule includes a source VLAN may be dropped.

    When there is an application classification, business policy table, or firewall policy table version change, SD-WAN performs a firewall lookup for flows on its next packet. Due to a timing issue, that packet could be one from the management traffic (VCMP) side. As a result, during a firewall policy lookup key creation, SD-WAN swaps the Spoke Edge VLAN with the Hub Edge VLAN and this leads to not matching the rule and dropping that traffic.

    For an Edge without a fix for this issue, the workaround is to change the Source in the firewall rule from an Edge VLAN to 'Any'.

  • Fixed Issue 122426: If a customer performs an SNMP query for a VMware SD-WAN Edge interface configured to use DPDK, a customer may experience a longer than expected delay in getting results.

    The delay is caused by a back-end script for collecting the interface data that is not properly optimized.

  • Fixed Issue 122790: When a VMware SD-WAN Edge is upgraded to Release 4.5.2, the customer may observe that the Edge no longer communicates via Wi-Fi interfaces.

    For an Edge using exclusively Wi-Fi for client users this would be experienced as a total loss of communication. When a Wi-Fi configured Edge is upgraded, it generates the error wireless.generate_config if the customer has configured a Wi-Fi channel manually (versus letting the Orchestrator autoselect the Wi-Fi channel) and all Wi-Fi communication drops.

  • Fixed Issue 122988: For a customer site configured with a High Availability topology, a customer may observe that the Standby Edge restarts multiple times.

    This issue can be observed in Events. In addition, if an Enhanced HA topology is used where the Standby Edge also passes traffic, client user traffic using the WAN link(s) on the Standby Edge would also be impacted.

    The issue is triggered when the HA Edge packet forwarding thread is starved due to a file operation and the Standby Edge misses the HA heartbeat. This causes the Standby Edge to become active, which triggers an active/active state where the Standby Edge is restarted to recover.

    For a site without a fix for this issue, a user can increase the HA failover time from 700ms to 7000ms on a 5.2.0 or later Orchestrator.

  • Fixed Issue 123128: For a customer site configured with a High Availability topology where the HA Edges are Edge models 520, 540, 610, or 620, a customer may observe that the Standby Edge restarts multiple times.

    This issue can be observed in Events. In addition, if an Enhanced HA topology is used where the Standby Edge also passes traffic, client user traffic using the WAN link(s) on the Standby Edge would also be impacted.

    This issue can only occur on the listed Edge models, each of which uses its kernel service to forward traffic. The kernel threads run at a lower priority, and packets can be queued in the kernel threads for more than 700ms. When that happens the Standby Edge misses the HA heartbeat and becomes active, which triggers an active/active state where the Standby Edge is restarted to recover.

  • Fixed Issue 123214: Traffic using a static route may drop because it is using a different interface than the one configured.

    In earlier releases the original next hop of a recursive route was not maintained along with the route; only the resolved next hop was maintained. So if the resolution of the route's original next hop changes later, there is no way to get a new resolution because the original next hop itself is not stored.

    On Edges without a fix for this issue, the workaround is to remove the static route and then add it after the connected route to the next hop is added.

  • Fixed Issue 123475: Connected Static Route (CSR) type flows that are matched to a Source + Destination LAN Side NAT rule may drop.

    Source + Destination LAN Side NAT rules may inappropriately apply only a destination NAT to the first packet for a flow, and observe a NAT collision on the return packet, for CSR → CSR flows.

    Note:

    Source NAT and Source + Destination NAT rules are not supported for CSR → CSR traffic.

  • Fixed Issue 123593: For a customer site using a High Availability topology where the customer is also using Edge Network Intelligence with Analytics turned on, in rare conditions the VMware SD-WAN HA Edge may not retrieve the Analytics configurations from the Edge Network Intelligence back-end.

    It is possible for both the Active and Standby Edges to acquire the token from the Edge Network Intelligence back-end. If the Standby Edge obtains the token after the Active Edge, the Active Edge's token will be stale, resulting in this scenario.

  • Fixed Issue 123954: SSH to a loopback IP address in an Edge from a client connected to a remote Edge does not work.

    When this issue is encountered, for an SSH request received via overlay, the SSH packet is not being decoded properly in a fast path pipeline. As a result, the Edge process drops it.

  • Fixed Issue 124106: When LAN side NAT is configured for Many:1 translations where Port Address Translation (PAT) is used, traffic initiated from the opposite direction allows unexpected access to fixed addresses based on the outside mask and original IP address.

    For example, an SNAT rule with inside network, 192.168.1.0/24 and outside address 10.1.1.100/32 permits outside to inside translation to 192.168.1.100.

    The issue is resolved by blocking traffic when a connection is initiated in the reverse PAT direction. While a local configuration option has been added to allow LAN > WAN translations in case a customer requires this behavior, the option will be lost between software upgrades and customers should create explicit translations to activate the previous behavior.

    For releases 4.5 to 5.2, a counter named lan_side_nat_reverse_pat_drop will indicate when flows are dropped.

    For release 5.4 and later, 6 separate counters are used:

    • lan_side_nat_rev_pat_drop_snat1

    • lan_side_nat_rev_pat_drop_snat2

    • lan_side_nat_rev_pat_drop_dnat1

    • lan_side_nat_rev_pat_drop_dnat2

    • lan_side_nat_rev_pat_drop_sdnat1

    • lan_side_nat_rev_pat_drop_sdnat2
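
    The SNAT example above can be illustrated with a small sketch. It assumes one plausible reading of "based on the outside mask and original IP address": the host bits of the outside address are carried over into the inside network. The function name is hypothetical, and this only reproduces the example from this note; it is not the Edge's actual translation logic.

```python
import ipaddress


def implied_inside_address(inside_net: str, outside_addr: str) -> ipaddress.IPv4Address:
    """Combine the inside network prefix with the host bits of the outside address.

    Assumption: this mirrors the reverse-direction mapping described in the
    release note example; the real Edge behavior may differ.
    """
    inside = ipaddress.ip_network(inside_net)
    outside = ipaddress.ip_interface(outside_addr).ip
    # Keep only the bits of the outside address that fall in the inside host range.
    host_bits = int(outside) & int(inside.hostmask)
    return ipaddress.ip_address(int(inside.network_address) | host_bits)


print(implied_inside_address("192.168.1.0/24", "10.1.1.100/32"))
```

    With the values from the example, the sketch yields 192.168.1.100, the fixed inside address that becomes reachable from the outside before the fix.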

  • Fixed Issue 124162: When a user takes a packet capture on a VMware SD-WAN Edge interface, they may see a packet that appears to be corrupted.

    There is no actual packet corruption; the packet only appears corrupted in the PCAP file. The issue is due to a defect in the way the Edge writes packets to the packet capture interface: VLAN-tagged packets may be written incorrectly and show up as corrupted packets (invalid ether-type) in the PCAP file.

  • Fixed Issue 125421: A customer may observe that the WAN links on a VMware SD-WAN Edge are intermittently showing as down and then up on the Monitoring and Events page of the VMware SASE Orchestrator UI, with the potential the Edge may become unresponsive and fail to pass traffic until it is manually rebooted, or the Edge can experience a Dataplane Service failure and restart.

    This is an Edge memory leak issue that is encountered when the Edge Dataplane Service cannot open shared memory, causing stale PIs. This in turn causes open file descriptor exhaustion which will initially impact WAN links. However if this issue is sufficiently advanced and results in Edge memory exhaustion the Edge can:

    1. Become unresponsive and unreachable through the Orchestrator, which requires an on-site reboot/power cycle.

    2. Experience an Edge service failure with a core file generated, with the Edge restarting to recover.

  • Fixed Issue 125487: Edge-to-Edge traffic flow may be disrupted by an ARP resolution issue.

    When encountering this issue, the Edge is forwarding the ARP request to the next hop IP address using the primary interface IP address instead of the subinterface IP address. The issue is triggered during flow creation when a non-connected route is used to reach the destination, and if the Edge's subinterface is used for that connectivity, the Edge does not properly fill the source IP address for the subinterface case.

  • Fixed Issue 126458: For a customer site deployed with a High Availability topology where the HA Edges are Edge models 520/540, the customer may observe multiple HA failovers that are the result of an Active/Active state.

    The condition is triggered on HA configured 520/540 Edges, when the number of concurrent flows exceeds 300K.

    On Edge 520/540 HA Edges without a fix for this issue, the workaround is to increase the HA failover time from 700ms to 7000ms on the Configure > Edge > Device page, as this will reduce the chance of an Active/Active state.

  • Fixed Issue 126500: A VMware SD-WAN Edge model 3400 may experience a lower than expected throughput capacity when running Edge software version 5.0.1 or later.

    When the issue is experienced the DPDK thread is not running on the Edge 3400 which results in lower than expected throughput.

  • Fixed Issue 126519: For customer enterprises that subscribed to Edge Network Intelligence with Analytics activated, traffic using Extensible Authentication Protocol (EAP) is not identified or logged.

    Customers looking for EAP traffic in Analytics logs would see the EAP section list 0 instances matching it.

  • Fixed Issue 127403: On the Test & Troubleshoot > Remote Diagnostics page of the Orchestrator UI, when running the remote diagnostic Troubleshoot OSPF - List OSPF Redistributed Routes or Troubleshoot BGP - List BGP Redistributed Routes, the test returns an error with no data.

    After running either diagnostic the user observes an error: Error reading data for test.

  • Fixed Issue 127603: SNMP polling does not work as expected on VMware SD-WAN Edges using Release 4.5.2.

    In particular, the customer encounters a failure when querying SNMP statistics for object identifiers (OIDs) of type vceHealth.

Resolved in Edge Version R452-20230803-GA

Edge build R452-20230803-GA was released on 08-04-2023 and is the 1st Edge rollup for Release 4.5.2.

This Edge rollup build addresses the below critical issues since the original Edge GA build, R452-20230628-GA.

  • Fixed Issue 114938: When looking at Monitor > Edges > Destinations for a Customer Enterprise, a user may observe an incorrect domain name for a Destination.

    The issue is caused by the Edge's Deep Packet Inspection (DPI) engine adding invalid host names (for example, IP address or IP address:Port) into the Edge's DNS cache. These invalid host names can fill up the Edge's DNS cache fully and may lead to Max DNS reached events, and the valid host names cannot be added after this occurs.
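
    As an illustration of the kind of check that keeps such entries out of a host name cache, the following minimal sketch rejects strings that are really an IP address or an IP address:Port pair. The function names are hypothetical and the logic is an assumption for illustration, not the Edge's DPI or DNS cache code.

```python
import ipaddress


def is_ip_literal(host: str) -> bool:
    """Return True if host is an IP address or an 'IP address:Port' string."""
    candidate = host.strip()
    if candidate.startswith("["):
        # Bracketed IPv6 with an optional port, e.g. "[2001:db8::1]:443".
        candidate = candidate[1:].split("]", 1)[0]
    elif candidate.count(":") == 1:
        # Exactly one colon: possibly "IPv4:port", e.g. "203.0.113.7:443".
        head, port = candidate.split(":")
        if port.isdigit():
            candidate = head
    try:
        ipaddress.ip_address(candidate)
        return True
    except ValueError:
        return False


def cacheable_hostnames(names):
    """Keep only names that are not IP literals (hypothetical cache filter)."""
    return [name for name in names if not is_ip_literal(name)]


print(cacheable_hostnames(["example.com", "203.0.113.7", "203.0.113.7:443"]))
```

    Filtering before insertion means invalid entries never compete with valid host names for the limited cache space.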

  • Fixed Issue 117037: For a customer using a Hub/Spoke topology where multiple WAN links are used to send and receive traffic from the Spoke Edge to the Hub Edge, customers may observe lower than expected performance for traffic that is steered by Business Policies because the WAN links are not aggregating the WAN link's bandwidth.

    SD-WAN uses a counter for accounting the number of packets buffered in a resequencing queue. This counter is managed per peer and used to make sure only 4K packets are buffered per peer. Under some conditions, this counter can become negative. Prior to Release 4.2.x, when this counter became negative, the respective counter was immediately reset back to 0 after flushing the packets in the resequencing queue. However, starting in Release 4.3.x, this counter is updated automatically to ensure that the counter stays within expected bounds.

    The result of this change in behavior can cause cases where the counter accounting is incorrect and the resequencing queue can stay at a very high number to which SD-WAN reacts by flushing every single packet. This action not only prevents bandwidth aggregation but can reduce the effectiveness of flows that would otherwise be on a single link.

    On Edges without a fix for this issue, the workaround is to configure business policies that steer matching traffic to a single mandatory link.

  • Fixed Issue 122528: For a customer enterprise which uses WAN static routes with ICMP probes configured, the ICMP probes may stop functioning on multiple VMware SD-WAN Edges at once with all traffic using those routes dropping.

    Each Edge has an ICMP probe sequence counter with a maximum number of 65535 iterations. When this counter rolls over after 65535 iterations, the probes fail.

    On an Edge without a fix for this issue, the workaround is to remove the ICMP probe, restart the Edge service, and then restore the probe.
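
    The rollover condition can be illustrated with a generic 16-bit sequence counter. This is a minimal sketch of wraparound-safe handling in general, with hypothetical names; the release note does not describe how the Edge implements its fix.

```python
SEQ_MODULUS = 1 << 16  # a 16-bit counter holds the values 0..65535


def next_seq(seq: int) -> int:
    """Advance the sequence number, wrapping from 65535 back to 0."""
    return (seq + 1) % SEQ_MODULUS


def is_newer(candidate: int, reference: int) -> bool:
    """Serial-number style comparison that tolerates wraparound.

    A candidate counts as newer when it is ahead of the reference by less
    than half of the sequence space.
    """
    diff = (candidate - reference) % SEQ_MODULUS
    return 0 < diff < SEQ_MODULUS // 2


print(next_seq(65535))      # wraps to 0
print(is_newer(0, 65535))   # True: 0 follows 65535 after the wrap
```

    A plain comparison such as `candidate > reference` would treat sequence 0 as older than 65535, which is one common way a rollover breaks probe matching.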

Resolved in Edge/Gateway Version R452-20230628-GA

Edge and Gateway Version R452-20230628-GA was released on 07-06-2023 and resolves the following issues since Edge Version R451-20230112-GA-87923 and Gateway version R451-20220701-GA.

Edge build R452-20230628-GA includes a remediation for the VMware SD-WAN Bypass Authentication Vulnerability (CVE-2023-20899). For more information on this vulnerability, please consult the VMware Security Advisory VMSA-2023-0015.

  • Fixed Issue 26085: A customer using a Hub/Spoke topology and Partner Gateways may observe traffic being dropped at a VMware SD-WAN Spoke Edge if one of the Gateways is unconfigured from a Hub Edge.

    The traffic dropped is using a stale route for a Gateway that is no longer assigned. When a Gateway is unconfigured from a Hub Edge, the Gateway itself does not know this has occurred and treats the event like a simple tunnel down event. As a result the Gateway continues to provide the Spoke Edge with its route and the Spoke Edge does not remove the remote route (reachable via Hub Edge) because the Hub Edge is still reachable to the Spoke Edge.

    When this issue is present in the absence of a fixed build, the only way to remediate it is to flap the Spoke Edge to Gateway link.

  • Fixed Issue 42488: On a VMware SD-WAN Edge where VRRP is activated for either a switched or routed port, if the cable is disconnected from the port and the Edge Service is restarted, the LAN connected routes are advertised.

    If the link on a port is removed and the interface is not deactivated, the Edge does not revoke the route from the Gateway causing other Edges to forward the traffic to the Edge with no link connected. The customer impact is that traffic might blackhole for the connected route for interfaces which do not have a link connected.

    Without the fix the only workaround is to deactivate the interface if no link is connected.

  • Fixed Issue 51486: A user cannot SSH to a VMware SD-WAN Edge using a loopback interface.

    After the removal of the management interface and the introduction of loopback interfaces on the Edge, SSH to any Edge virtual interface (always up) is not supported.

    Beginning with Edge Release 4.5.2, a user can SSH to an Edge using a loopback interface.

  • Fixed Issue 53378: On a VMware SD-WAN Edge where a WAN link bandwidth is manually configured under WAN Settings, the bandwidth settings are honored on traffic using the Global Segment, but not honored on traffic using a Non-Global Segment.

    On a WAN link with a higher capacity than the manually configured capacity in WAN Settings, the Global Segment would enforce the lower configured value, but a non-Global Segment would use the actual capacity of the link.  This occurs despite Underlay Accounting being configured on the Edge interface that the WAN link is using.

  • Fixed Issue 56153: For a customer enterprise where a Non SD-WAN Destination via Gateway is deployed and where BGP over IPsec is being used, if an inbound BGP filter is unassigned by the customer, the filter is not removed on the VMware SD-WAN Gateway and the route map is applied with it.

    This issue can cause unexpected routing for the customer since they are expecting the inbound BGP filter to be inactive when it is still being used by the Gateway and Edge.

  • Fixed Issue 57170: A customer enterprise using BGP, private links, and a Partner Gateway may experience a loss of connectivity with clients behind the Partner Gateway towards the server behind a VMware SD-WAN Edge.

    The Internet traffic uses the MPLS network instead of the NAT handoff process.

  • Fixed Issue 63577: For a customer enterprise using a Zscaler type Cloud Security Service (CSS), should the primary tunnel go down and traffic fails over to the secondary tunnel and then the primary tunnel is restored, the traffic on the secondary tunnel immediately switches back to the primary tunnel.

    In this scenario, when the Zscaler primary tunnel comes back up, the existing flows on the secondary tunnel are expected to be maintained for at least 30 minutes before failing back to the primary tunnel. In this issue, the existing flows fail back over to the primary tunnel immediately after it is restored. This behavior holds across all tunnel types including IPsec and GRE.

  • Fixed Issue 64032: A VMware SD-WAN Edge configured with MH-BGP and BFD may lose routes after a neighbor flaps.

    The issue is the result of BGP routes from a neighbor not being installed in the route table. When MH-BGP neighborship and BFD are configured for the same IP address, BGP routes are not installed in the route table after BGP neighborship is flapped without affecting the BFD session by making the BGP configuration first incorrect and then correct.

    On an Edge without a fix for this issue, restarting the Edge service will resolve that particular instance of the issue.

  • Fixed Issue 68748: When a client device using Windows OS is connected to a VMware SD-WAN Edge, the Event report for "New Client Device Seen" shows the incorrect OS version.

    The Edge is truncating the description in the dhcp_fingerprints.json file and this prevents the customer from having the correct OS for that connected client device.

  • Fixed Issue 71719: A PPTP connection is not established along the Edge to Cloud path.

    Connection to the PPTP server behind the VMware SD-WAN Edge does not get established.

  • Fixed Issue 71745: A VMware SD-WAN Gateway or SD-WAN Edge may experience a Dataplane Service failure and restart as a result.

    Both Edges and Gateways have an internal library for managing UUIDs (universally unique identifiers). A very rare race condition in this library can cause a "use-after-free" issue that triggers a segmentation fault and a service failure for the respective Edge or Gateway. A factor that increases the risk of an Edge or Gateway experiencing this issue is frequent tunnel flaps (tunnels being torn down and rebuilt).

  • Fixed Issue 72384: Path MTU for IPv6 tunnel might not work correctly when the next hop MTU is 1280 (minimum possible for IPv6).

    For the IPv6 Path MTU calculation, the lower end of the search range was not 1280, and this caused the issue when running the binary search algorithm for Path MTU.
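
    A binary search for the Path MTU with the lower bound pinned to the IPv6 minimum of 1280 can be sketched as follows. The `probe` callback, the default upper bound of 1500, and the function name are assumptions for illustration; this is not the Edge's implementation.

```python
IPV6_MIN_MTU = 1280  # minimum link MTU that IPv6 requires


def discover_path_mtu(probe, upper: int = 1500, lower: int = IPV6_MIN_MTU) -> int:
    """Return the largest size in [lower, upper] for which probe(size) is True.

    Assumes probe(lower) succeeds, which holds for IPv6 when lower is 1280.
    """
    lo, hi = lower, upper
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            lo = mid       # mid fits: search the upper half
        else:
            hi = mid - 1   # mid is too large: search below it
    return lo


# Example: a path whose next hop MTU is exactly the IPv6 minimum.
print(discover_path_mtu(lambda size: size <= 1280))
```

    Because the search range includes 1280 itself, a next hop MTU equal to the minimum is still found rather than skipped.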

  • Fixed Issue 72395: When a user configures a routed interface with an IPv4 IP address which has no default gateway and the Edge is connected to a direct client, IPv4 traffic to the direct client works but IPv6 traffic from the direct client fails.

    Traffic to the IPv6 next hop does not work when the IPv4 next hop is not configured on the routed interface.

    Without a fix for this issue, a user would need to configure the IPv4 next hop on that interface.

  • Fixed Issue 74149: For a customer using a Zscaler type Cloud Security Service where the L7 Health Check is configured, if a VMware SD-WAN Edge is rebooted while a WAN link is also down, the L7 Health Check process may not send probes to the Zscaler service even after both the Edge and the WAN link(s) are fully restored.

    This issue is not consistent and happens rarely even when the listed conditions are met. When the Edge is being rebooted, and L7 Health check is configured, and if the Edge WAN interface undergoes a state transition Up/Down, during restart and initialization time, the Edge may miss sending L7 Probes.

    Without the fix, the only way to get the Edge to resume sending L7 Probes is to turn off and then turn back on L7 Health Check. 

  • Fixed Issue 74632: On a customer enterprise with a Hub/Spoke topology, IPv6 tunnels from a VMware SD-WAN Spoke Edge to a Hub Edge may not come up on Edge interfaces configured as IPv6 DHCP stateful.

    When an interface is configured with DHCP stateful, tunnels may not come up immediately if the router advertisement is received late.

  • Fixed Issue 75573: A VMware SD-WAN Gateway may experience a Dataplane Service failure, generate a core, and restart as a result.

    The Gateway service fails because the process vrf_init spawned a vrf worker thread and started handling configurations before vc_sinfo_vrf_hash was initialized.

  • Fixed Issue 75593: Customer deployments using BGP may experience issues with degraded performance because of suboptimal routing due to unexpected route preferences for uplink BGP routes.

    This issue is caused by a customer enterprise's BGP prefix advertise and preference values not being updated properly when the route prefix is updated from a non-uplink to an uplink or vice-versa, which results in asymmetric routing.

    A customer without a fix for this issue can disable and enable BGP uplink community to restore correct routing.

  • Fixed Issue 75668: The DSCP tag is reset for LAN side traffic when it is routed to an internal LAN destination.

    For routed/direct user traffic, the Edge resets the DSCP tag to 0. Traffic that ingresses and egresses on the same Edge (in other words, stays local to the Edge) has its DSCP tag modified to a CS0 DSCP marking, and the tag is also reset to CS0 for underlay traffic when it traverses the Edge.

  • Fixed Issue 75882: For a customer site configured with a High Availability topology, the Standby Edge may remain stuck in an Initializing state and not be available for failover.

    SD-WAN sends WAN side heartbeats when HA is enabled. The Standby Edge can experience a Dataplane Service failure when SD-WAN sends the heartbeat on the Standby Edge's interface before the interface's internal variables are initialized. The fix delays the sending of WAN side heartbeats until the interface is properly initialized.

  • Fixed Issue 76153: IPv6 tunnels between a VMware SD-WAN Edge and a VMware SD-WAN Gateway may flap between a STABLE and an UNSTABLE state.

    When the Edge and Gateway IPv6 addresses belong to the same IPv6 network, then the Gateway is not able to pick the correct destination MAC address for responding to the Edge, as a result the tunnels flap.

  • Fixed Issue 76348: For a customer site configured with a High Availability topology, IPv6 connected routes may be missing on the VMware SD-WAN Edge used as the Standby for some interfaces.

    IPv6 connected routes may be missing on the Standby Edge for interfaces configured with a DHCP stateless type on a non-global segment. This would only impact HA sites with Enhanced HA, as only in that topology are the Standby Edge's WAN links also used.

  • Fixed Issue 76574: A VMware SD-WAN Gateway may become flooded with "IPV6-IKE-POC: ike msg recvd" log entries.

    These entries are observed on large scale deployments of ~6K tunnels or more and are seen in DBGCTL. These "IPV6-IKE-POC" messages can overflow Gateway logs and impede troubleshooting.

  • Fixed Issue 76589: A VMware SD-WAN Edge Cluster may not perform a Spoke Edge rebalancing after a LAN-side failure.

    A Hub Cluster may not perform a Spoke Edge rebalancing after a LAN side failure with either IPv4 or IPv6 BGP neighborship on the Hub Cluster due to the Hub getting inaccurate data regarding Spoke Edges for all other Hub Clusters.

  • Fixed Issue 76591: A VMware SD-WAN Edge configured with IPv4/IPv6 Dual Stack on both LAN and WAN interfaces may experience a Dataplane Service failure and restart to recover.

    The issue is triggered during the IPv6 Neighbor Discovery process and is caused by a packet retransmission that leads to the service failure and core file.

  • Fixed Issue 76681: For a customer configured with IPv4/IPv6 Dual Stack WAN interfaces who also uses a Cloud Security Service (CSS) with a Zscaler type, the customer may observe duplicate tunnels established for the same WAN interface.

    Each WAN interface in Dual Stack gets an IPv4 tunnel, and an IPv6 tunnel even though IPv6 tunnels are not supported for Zscaler.

  • Fixed Issue 76837: A customer using BGP may observe that a peer router is not sending traffic to a VMware SD-WAN Edge within its network.

    Troubleshooting the issue would reveal that the default route via default-originate is not being advertised by the Edge. The issue is caused by a route map string associated with the default route being truncated and so the Edge does not match the default route with anything in its route map, and this results in the peer router either dropping traffic or sending it using an invalid route where the traffic is blackholed.

    Without a fix for this issue, a user would need to configure a static route on the peer router for the default route until it is possible to upgrade to an Edge version that includes the fix.

  • Fixed Issue 76880: A VMware SD-WAN Gateway may experience a Dataplane Service failure and restart as a result.

    There is no specific trigger for this issue; it is the result of a timing issue where the Gateway may not know the correct MTU for tunnels with other Edges for a very brief period of time. Any user packet might not get fragmented correctly due to this issue. The fix for this issue handles those packets by explicitly dropping them.

  • Fixed Issue 76966: When creating very large network configurations with more than 100 segments or VLANs, the DNS and DHCP services on the VMware SD-WAN Edge stop working.

    When such large configurations are sent to the Edge, the scripts that run on the Edge to configure and start the dnsmasq service (for DNS and DHCP) fail because of an overly long command line that is synthesized during the restart.

    There is no workaround to this issue beyond reducing the number of segments or VLANs.

  • Fixed Issue 77066: A VMware SD-WAN Gateway may experience a Dataplane Service failure and trigger a core and restart the service to recover.

    The issue is triggered by a memory corruption of the Gateway caused by two Gateway processes that respectively handle transmission and reception packets simultaneously trying to access the same node in a search tree.

  • Fixed Issue 77457: If a user tries to generate a packet capture (PCAP) for an interface on a standby VMware SD-WAN Edge, the VMware SASE Orchestrator reports that the PCAP has failed.

    When a user tries to generate a PCAP for the Standby Edge in an Enhanced High Availability deployment, the Orchestrator UI records the Request Status as Failed and the explanation "Failed to upload diagnostic bundle: 'unicode' does not have the buffer interface."

  • Fixed Issue 77608: For a customer enterprise with a Hub/Spoke topology, a VMware SD-WAN Edge may run out of memory and defensively restart to clear the state.

    A huge amount of IPv6 traffic (~1M flows) originating over a /64 subnet, combined with neighbor discovery failures, rapidly consumes the Edge's memory and triggers the restart.

  • Fixed Issue 77633: A customer enterprise configured with BGP may experience traffic issues due to stale routes persisting in route maps.

    If a user configures an outbound filter ::/0 with exact match True Permit and configures a default originate in neighbor in the additional options and then removes both from the neighbor, only the default originate is removed from Edge configuration. The filter route map is still associated to the neighbor.

  • Fixed Issue 77755: On a VMware SD-WAN Edge running Release 4.0.0 and later, if a customer deploys a VNF (Virtual Network Functions) image where the checksum is configured with capital letters, the VNF deployment fails due to a checksum mismatch and the image will be removed from the Edge.

    This issue is caused by 4.0.0 and later Edge software performing a case sensitive comparison of the Operator-configured checksum with the Edge calculated checksum. A configured checksum containing capital letters causes a checksum mismatch, even if the calculated checksum values match.

    Release 5.0.0 and later uses a case insensitive comparison to verify an Operator-configured checksum matches the calculated checksum on the Edge. 
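
    The difference between the two comparisons can be illustrated with a minimal sketch. The hash algorithm (SHA-256) and the function name are assumptions made for illustration only; the source does not state how the Edge computes or compares the checksum.

```python
import hashlib


def checksums_match(configured: str, image_bytes: bytes) -> bool:
    """Compare a configured hex checksum with the calculated one, ignoring case.

    SHA-256 is an assumption for this sketch; the comparison logic is the point.
    """
    calculated = hashlib.sha256(image_bytes).hexdigest()  # hexdigest() is lowercase
    # Normalize the configured value so "ABCD..." and "abcd..." are treated alike.
    return configured.strip().lower() == calculated


image = b"example-vnf-image"
digest = hashlib.sha256(image).hexdigest()

# A case sensitive comparison rejects an upper-case checksum of the same image.
print(digest.upper() == digest)                 # False
# The case insensitive comparison accepts it.
print(checksums_match(digest.upper(), image))   # True
```

    On an Edge that still performs the case sensitive comparison, entering the checksum in lowercase letters would avoid the mismatch described above.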

  • Fixed Issue 77917: On a VMware SD-WAN Edge running Edge Release 4.5.0 or 4.5.1, a diagnostic bundle generated by the Edge Local UI can be downloaded directly by an unauthorized user without authentication if they know the right URL.

    This does not affect Edge diagnostic bundles generated through the VMware SASE Orchestrator UI, only those generated while a user is logged into an Edge's Local UI using the Download Diagnostic Bundle option shown below.

    Local UI Diagnostic Bundle.

    Only diagnostic bundles triggered and downloaded to the Edge through the Local UI are subject to this vulnerability.


    There is no immediate risk if diagnostic bundles have not been generated directly on the Edge using the Local UI as diagnostic bundles generated via the Orchestrator UI are not stored on the Edge.

    Because the diagnostic bundle URL is composed of the date on which the bundle was generated, an attacker could guess it. Because the diagnostic bundle has the potential to contain sensitive user information, this is a potential security risk for customers using Edge Release 4.5.0 or 4.5.1 who match the conditions outlined above.

    The fix for this issue is as follows: once the diagnostic bundle generated by the authenticated user is ready, a secure download link comprised of the hash of "a secret + current time + relative path to file" is generated and returned to the user. The existing process, where the diagnostic bundle file is automatically downloaded by the user's browser once it is ready, does not change. The link that is returned is a secure link instead of a direct static link.

    The link expires after a timeout, so if another user uses the link after the timeout, a "410 Gone" message is shown.
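
    The general technique of a signed, expiring download link can be sketched as follows. This is not the Edge's implementation: the use of HMAC-SHA256, the secret value, the 300-second timeout, the function names, and the 403 status for a bad signature are all assumptions made for illustration. Only the inputs to the hash (a secret, the current time, and the relative path) and the "410 Gone" response come from the description above.

```python
import hashlib
import hmac

SECRET = b"example-secret"   # hypothetical; a real secret would be random and private
LINK_TTL_SECONDS = 300       # hypothetical timeout


def make_token(path: str, issued_at: int, secret: bytes = SECRET) -> str:
    """Hash the secret together with the issue time and the relative file path."""
    message = f"{issued_at}:{path}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def check_link(path: str, issued_at: int, token: str, now: float) -> int:
    """Return an HTTP-style status code for a download request."""
    expected = make_token(path, issued_at)
    # Constant-time comparison of the presented token with the expected one.
    if not hmac.compare_digest(expected.encode(), token.encode()):
        return 403           # token does not match this path and issue time
    if now - issued_at > LINK_TTL_SECONDS:
        return 410           # link has expired: "410 Gone"
    return 200


token = make_token("diag/bundle.tar.gz", issued_at=1000)
print(check_link("diag/bundle.tar.gz", 1000, token, now=1100))   # 200
print(check_link("diag/bundle.tar.gz", 1000, token, now=2000))   # 410
```

    Because the token depends on a secret that only the server knows, knowing the generation date alone is no longer enough to construct a working URL.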

    This fix remediates VMware SD-WAN Bypass Authentication Vulnerability (CVE-2023-20899). For more information on this vulnerability, please consult the VMware Security Advisory VMSA-2023-0015.

    Workaround: If using Edge build 4.5.0 or 4.5.1 the customer has options to prevent this vulnerability from being exposed:

    Leave the Local UI port closed in the Edge Firewall settings (which is the default setting unless explicitly overridden by the administrator).

    The default setting for Local Web UI Access in the Edge's Edge Security settings (under the Configure > Firewall page) is to Deny All, which is the recommended setting in almost all cases. In the very rare case where access to this UI is needed, it is recommended to only open up the Local Web UI to specific trusted IP addresses, and then disable it as soon as its need is over.

    A Local Web UI firewalled off as above will not be exposed to this vulnerability.

  • Fixed Issue 78026: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart if the user adds a new consolidated WAN overlay and deletes an old WAN overlay.

    When the configuration of a consolidated WAN overlay link that has a large number of tunnels (+5K) is modified, there is the chance for an Edge service failure.

  • Fixed Issue 78037: A VMware SD-WAN Edge may experience a spike in memory usage followed by a Dataplane Service failure where a DHCPv6 server is configured with more than 1K addresses.

    The issue occurs for both routed and switched interfaces, and can occur when more than 1K addresses are configured for clients on a DHCPv6 server. When over 1K clients are getting addresses, the quantity of DHCPv6 solicit packets generated can lead to Edge memory exhaustion and the failure of the Edge service.

  • Fixed Issue 78050: A VMware SD-WAN Edge Dataplane Service failure may occur when the PPTP server is present on the LAN side.

    When a PPTP server is present on the LAN side and a PPTP client from the Internet connects to it via inbound firewall rules, the Edge service fails due to a PPTP control channel lookup failure. This control channel lookup is needed to ensure the GRE data channel is sent out via the same link back to the PPTP client.

    This issue is seen only if PPTP traffic is seen on the Edge. So on an Edge without a fix, the only workaround is to not use PPTP sessions.

  • Fixed Issue 78435: A VMware SD-WAN Edge that is activated with a URL through the Local UI may throw an error that the Edge activation failed, when it actually succeeded.

    URL activation of an Edge through the Local UI throws the error "edge activation could not be completed".

    The issue occurs because the Edge refers to an older activation request with incorrect parameters when responding to the request for activation status. Meanwhile, the current activation request with the correct parameters is actually processing. As a result, the Local UI throws an error even though the Edge activation is processing correctly.

  • Fixed Issue 79335: IPv6 traffic may not pass through a VMware SD-WAN Edge.

    The interface hop limit value is set to 0 when a RA (router advertisement) is received with a 0 hop limit, and this can cause IPv6 drops.

    On an Edge without a fix for this, the user can advertise RA with a non-zero hop limit.
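
    RFC 4861 defines a Cur Hop Limit of 0 in a router advertisement as "unspecified (by this router)", so a receiver is expected to keep its current value rather than apply 0. A minimal sketch of that handling follows; the function name and the default of 64 are assumptions for illustration, and the source does not describe how the fix is implemented on the Edge.

```python
DEFAULT_HOP_LIMIT = 64  # assumed starting value for this sketch


def updated_hop_limit(current: int, ra_cur_hop_limit: int) -> int:
    """Return the interface hop limit after processing a router advertisement."""
    if not 0 <= ra_cur_hop_limit <= 255:
        raise ValueError("Cur Hop Limit is an 8-bit field")
    if ra_cur_hop_limit == 0:
        # 0 means "unspecified": keep the value already in use.
        return current
    return ra_cur_hop_limit


print(updated_hop_limit(DEFAULT_HOP_LIMIT, 0))     # 64
print(updated_hop_limit(DEFAULT_HOP_LIMIT, 255))   # 255
```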

  • Fixed Issue 79533: Branch to Branch routes may be missing from a routing table if the customer enterprise is configured with no handoff which would result in incoming no handoff traffic being dropped.

    When no handoff is configured after Branch to Branch routes are synchronized to a Gateway from an Edge, Branch to Branch routes are not populated into the Edge.

  • Fixed Issue 79550: A VMware SD-WAN Gateway or SD-WAN Edge may experience a Dataplane Service failure and restart.

    This issue can occur if a repeated management traffic (VCMP) fragment is received, causing the process to write beyond the allocated buffer for that packet and triggering the failure.

  • Fixed Issue 79619: For a customer site deployed with a High Availability topology, when the HA link between the Active and Standby Edge is unstable or disconnected, the Standby Edge does not fail over.

    The Standby Edge should fail over on any evidence of a heartbeat failure from the Active Edge, but with this issue it does not.

  • Fixed Issue 80028: On a site deployed with a High Availability topology, the Standby Edge may experience a Dataplane Service failure and restart as a result.

    This issue only occurs on the Standby Edge and never the Active Edge. The issue is caused by a race condition when the Deep Packet Inspection engine has invoked a cleanup while there still are packets being processed in the pipeline and could happen at any time.

    There is no impact to a customer using a standard HA configuration as the Standby Edge does not pass traffic, but on an Enhanced HA deployment, where the Standby Edge is also passing traffic, the restart of the Standby Edge service would briefly disrupt the customer traffic passing through the Standby for ~15 seconds. 

  • Fixed Issue 80149: If Layer 7 (L7) Health Check is activated for a Non SD-WAN Destination (NSD) or Cloud Security Service (CSS) where there are redundant tunnels, a customer may experience both tunnels simultaneously being marked as down and then coming up intermittently if there are transmission issues on the Primary tunnel.

    With this issue, L7 probes for both the Primary and Secondary Tunnels are sent via the Primary Tunnel Interface. If the Primary Tunnel interface has packet transmission failures (for example, high latency), it would affect both the Primary and Secondary L7 Probe packets and the tunnels would both get torn down simultaneously, impacting customer traffic for that NSD or CSS. 

  • Fixed Issue 80353: When running Remote Diagnostics > HA Info, the attempt may fail with "Error reading data for test".

    The specific condition that triggers the failure is if the Standby Edge is disconnected. The expected result is a response that displays the information for the Active Edge only.

  • Fixed Issue 80814: On a VMware SD-WAN Edge where a Standard Firewall Allow rule is configured which has a local Edge client Source IP address and a remote client as the Destination IP Address, and which also has a "Deny All" rule for other traffic, the traffic from the remote client to the local client is dropped.

    This issue is encountered when there is a VLAN IP address mismatch between the source and destination hosts. When the source and destination hosts are part of different VLANs, the SD-WAN service prefers the source/destination IP address of the first packet as it is in the Firewall lookup key. As a result, for overlay inbound flows, there is a mismatch and traffic matches the Deny All firewall rule.

    Without the fix, the workaround for this issue is to reverse the rule so that it follows the direction of the first IP packet of the flow, which allows the packet to match the firewall rule.

  • Fixed Issue 80881: For a site deployed with a High Availability topology, the VMware SD-WAN Edge in the Standby role may fail to get configuration changes from the Active Edge.

    When the Standby Edge first comes up, it may fail to synchronize with the Active Edge. As a result, it does not carry the latest configuration changes, including the configuration to upgrade the Standby to a new software image.

  • Fixed Issue 81353: For a VMware SD-WAN Virtual Edge with an Azure type, packets may be dropped at the Rx of interfaces due to a low buffer.

    The ring buffer setting was not applied to non-DPDK managed interfaces, which are used by Azure platforms. As a result, the NIC Rx ring buffer queues are set to a low number.

    On an Edge without a fix for this issue, rebooting the Edge can resolve the issue temporarily.

  • Fixed Issue 81355: VMware SD-WAN Gateways deployed using the Azure platform may experience issues with packets of a size greater than 1500 bytes.

    The packets greater than 1500 bytes are dropped with error message: pkt_too_big_drop. Packets much larger than 1500 bytes are dropped with error message: sock_too_big_dropp.

    The issue is the result of the Azure platform not using DPDK bounded interfaces. This keeps the Gateway's DPDK.json list empty, and as a result the DPDK network configurations do not initialize the Linux interface's TSO/GSO settings.

  • Fixed Issue 81627: A Business Policy which is configured to match applications (for example, YouTube or the BBC) with Direct and Interface Mandatory may not be honored when the interface goes down.

    If the DPI (Deep Packet Inspection) classification has not yet been done, the initial packets hit the "Default-Internet-Other" Direct Business Policy, for which the link policy is bw_balance. For example, on an Edge with WAN links GE3 and GE4 where the Business Policy configures GE3 as the mandatory interface, if SD-WAN steers traffic to GE4 while the initial DPI classification is "Default-Internet-Other", the flow continues to use the GE4 link because of the current link restore logic. As a result, direct traffic continues to work even though GE3 goes down.

  • Fixed Issue 81859: When activating a VMware SD-WAN Edge 610-LTE, the CELL interface may not come up after the Edge completes its activation.

    This issue is not consistent, but when it occurs it can have a major impact if the Edge 610-LTE's only public link is the mobile CELL link. In that case the Edge would be effectively down, and recovery would require someone local to power cycle the Edge.

    Note:

    If encountering this issue and the 610-LTE has other wired public WAN links, the user would need to either restart the Edge service through the Orchestrator using Remote Actions > Service Restart in a suitable maintenance window or restart the Edge's modem to restore the CELL interface.

    If the 610-LTE only uses a CELL interface for internet, someone local to the Edge would have to power cycle the Edge as it would be inaccessible through the Orchestrator.

    If the 610-LTE Edge being activated only uses CELL for internet, the Edge should be activated with someone present to potentially power cycle it should it go down after completing activation.

  • Fixed Issue 82104: In rare cases, VMware SD-WAN Edges activated in a High Availability topology may be unable to communicate with a VMware SASE Orchestrator which will mark the site as down and preclude any intervention through the Orchestrator to the site.

    This issue occurs only when an unusual and invalid configuration is applied to the HA Edges. The configuration specifies that the HA port is configured as "trunk" (which should not be allowed), with zero VLANs (also should not be allowed), but where "all VLANs" are set. Instead of throwing an error at this configuration and preventing a user from activating HA for the Edges, the Orchestrator allows it, and this configuration triggers a Management plane failure on the HA Edges which no longer send a heartbeat to the Orchestrator and the Orchestrator marks the site as down.

    On an Edge without a fix for this issue, avoid using the configuration outlined above.

  • Fixed Issue 82432: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart as a result.

    A large quantity of fragmented packets coming to an Edge can trigger an assert in net_ipfrag_alloc() which causes the Edge service to fail.

  • Fixed Issue 82457: A VMware SD-WAN Edge with an LTE model type (510-LTE or 610-LTE) may not activate if the CELL1 interface receives an IPv6 address from the activating Orchestrator.

    Activation is successful if the IPv4 Address is used.

  • Fixed Issue 82485: On an entry level VMware SD-WAN Edge model (for example, Edge 510, 510-LTE, or 610) if a user runs the Remote Diagnostic "Route Table Dump", the Orchestrator UI page may time out and not return a result.

    The issue is encountered if there are more than 16000 routes as it takes the Edge more than 30 seconds to return the results. 30 seconds is the timeout limit for the page's WebSocket and so no result is returned. The fix for the issue optimizes the route table walk to ensure timeouts do not occur.

  • Fixed Issue 82808: For a VMware SD-WAN Edge that is using a Cloud Security Service (CSS) and has turned on L7 Health Check, the customer may observe traffic failing using these CSS tunnels even though the VMware SASE Orchestrator continues to mark the tunnels as UP.

    Even though the L7 probe fails with a 4XX HTTP error, the VMware SD-WAN Gateway does not acknowledge the failure and does not inform the Orchestrator to mark the CSS tunnels as DOWN.

  • Fixed Issue 83040: A customer enterprise with a Hub/Spoke topology that uses both Partner Gateways and Non SD-WAN Destination (NSD) may observe traffic that should use an NSD instead uses a Hub.

    The Spoke Edge would have a business policy which backhauls traffic to the NSD. If a Partner Gateway handoff is also configured for it, the Spoke sends traffic that should use an NSD to the Hub Edge instead. The Hub in turn sends the traffic direct to the internet. If the Partner Gateway handoff is disabled, then this NSD traffic is routed properly.

  • Fixed Issue 83166: When a VMware SD-WAN Gateway is freshly deployed with an AWS c5.4xlarge instance type from the AWS Portal with IPv6 option selected, neither IPv6 nor the default routes are configured.

    As a result of IPv6 and default routes not being configured, the AWS Gateway IPv6 management tunnels are not forming, and the Gateway will not work.

  • Fixed Issue 83209: For customers using OSPF in their enterprise, OSPF routing may not work as expected.

    The issue occurs when there is a change in the OSPF router-id and the Edge service is restarted. Only loopback interfaces and interfaces with the 'Advertise' flag configured are considered for router-id selection. When a new loopback interface is configured with a higher IP address, the new loopback IP address is selected as the router-id upon restarting the Edge service, and if the Edge is elected as the DR (Designated Router), the issue is seen.

    Without the fix, the only workaround is to force the use of the old Router ID. To bring back the old Router ID, configure Advertise Flag on the respective interface (an Edge service restart will be required).

  • Fixed Issue 83651: A VMware SD-WAN Gateway may experience a Dataplane Service failure, generate a core, and restart as a result.

    When routes are revoked from the BGP redistribute table (having ~50K or more routes) and at the same time the Gateway receives BGP routes add/del from routing process, there is an issue with inserting the node into the hash table which ends up with duplicate nodes and triggers the service failure.

  • Fixed Issue 83694: When a user logs into a VMware SD-WAN Edge's Local UI, the VMware SASE Orchestrator does not record and display this action in Monitor > Events.

    The customer administrators would not be aware of any local user logins to an Edge's local user interface.

  • Fixed Issue 84000: A VMware SD-WAN Gateway connected to Edges deployed in a dual stack (IPv4/IPv6) configuration, where there is a high frequency of tunnel teardowns and creations with the Edges, may experience a memory leak which, if sufficiently high, would trigger a service restart.

    When a VCMP (Encrypted Management) tunnel is created and deleted multiple times for an Edge with a dual stack configuration, in the Gateway for that Edge there can be the appearance of a pi leak if the Gateway is operating at high scale. There is no real pi leak, but pi deletion is happening slowly and this slow delete rate can cause shared memory issues which may ultimately become critical.

    On a Gateway without the fix, a service restart will temporarily clear the memory.

  • Fixed Issue 84313: On a customer enterprise with a Hub/Spoke topology, an IPv6 overlay link may be advertised to the underlay peers of VMware SD-WAN Spoke Edges.

    On configuring an IPv6 address on the overlay and enabling advertise, the same address is getting advertised via the underlay as well.

  • Fixed Issue 84349: An Orchestrator or Gateway that is software upgraded via an SSH session may not complete the upgrade if the SSH session disconnects during the upgrade.

    When upgrades are done over an SSH session and the session disconnects, the upgrade process will be aborted which may leave the system in an inconsistent state.

    For a Gateway or Orchestrator without a fix for this issue, run the upgrade commands from the system console. If it must be done over a network session, use the nohup command and check /var/log/vcg_software_update.log for progress if disconnected. For example: nohup /opt/vc/bin/vcg_software_update. Please note that nohup will force upgrade to run in a non-interactive mode. By default, the installer will abort if upgrade integrity cannot be checked and will reboot the system after the upgrade is complete.

  • Fixed Issue 84360: When running Remote Diagnostics > Path Stats for a private or MPLS path to a VMware SD-WAN Gateway, the result is zero bytes Rx/Tx.

    The expectation is that private/MPLS links should have path stats for both Rx and Tx to the Gateways.

  • Fixed Issue 84501: The NAS-IP address is set as a loopback IP address by default in RADIUS packets.

    The NAS-IP address is set as a loopback IP address by default in the RADIUS packets sent from the Edge (Authenticator) to the RADIUS server. With the fix, the NAS-IP address is set to the IP address of the source interface selected in the RADIUS Authentication settings. If Auto is selected as the source interface, the loopback IP address is set as the NAS-IP address by default.

  • Fixed Issue 84741: A user may observe inaccurate throughput statistics on the Monitor > Transport screens.

    RX (incoming) packets and link statistics are not incremented to the Orchestrator for traffic which is sent direct on an interface where Reverse Path Forwarding (RPF) is deactivated on the WAN Overlay.

  • Fixed Issue 84790: When a VMware SD-WAN Edge with any model type other than 510/510-LTE is rebooted, the Edge may erroneously report the critical event Unable to launch service wifihang to the VMware SASE Orchestrator.

    The wifihang event message is designed for use only with the Edge 510/510-LTE models and alerts a customer to a problem with that Edge model's Wi-Fi process. When this event message is observed on any other Edge model, whether that model uses Wi-Fi or not (for example: the Edge 3400), the event message is spurious, and the event can be safely ignored.

    Even on an Edge without a fix for this issue, a user can safely ignore the wifihang event message on any Edge other than an Edge 510 or 510-LTE as it is spurious.

  • Fixed Issue 84828: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart to recover, causing a customer traffic disruption of 10-15 seconds.

    The Edge would generate a core with a SIGXCPU event. The issue is caused by an Edge exceeding the timeout value for executing a command, which leads the mutex monitor to conclude there is a stuck thread and trigger the core and restart. Even though a VMware Support Engineer can increase the timeout value for the mutex monitor, the Edge service does not respect the increased value.

  • Fixed Issue 85154: When a VMware SD-WAN Virtual Edge on AWS with instance type C4.xlarge is upgraded from an older Edge release to Release 4.5.1, and is then downgraded back to an older Edge release, the Edge goes into a deactivated state where the Edge does not form management tunnels with the Gateway and Orchestrator.

    The cause of the issue is the Orchestrator erroneously deactivating the Edge because of what the Orchestrator detects as a serial number mismatch.

    When an Edge is upgraded to 4.5.1 there is no workaround for this issue beyond NOT downgrading from Release 4.5.1 once the AWS Edge is on this release.

  • Fixed Issue 85156: For a site deployed with a High Availability topology, the customer may observe multiple reboots of the VMware SD-WAN Standby Edge with a potential disruption to customer traffic.

    The HA control data synchronization processing logic on the Standby Edge for data received via TCP can lead to the data getting only partially read. This can cause multiple such short messages to be processed on the Standby which can slow down the Standby node. In low-end Edge platforms (for example, Edge models 510, 520, 610, 620), this slow down can significantly impact heartbeat processing between the Active and Standby which leads to the Standby Edge incorrectly being promoted to Active. In an Active-Active state the tie-break goes to the Active Edge and the Standby Edge is rebooted to demote it back to its proper Standby status.

    When this issue is encountered on a conventional HA topology the customer impact would be minimal as the Standby Edge does not pass customer traffic. On an Enhanced HA deployment, where the Standby Edge is also passing traffic, the reboot(s) would disrupt some customer traffic. The fix for this issue adds enhancements in the Edge TCP message processing logic to improve the performance on the Standby Edge and prevent a system slowdown.
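
    The general problem of partial reads on a TCP stream can be illustrated with a minimal sketch that buffers incoming bytes and only hands complete messages to the processing logic. The 4-byte length prefix, the class name, and the method name are hypothetical; the source does not describe the HA synchronization message format or how the Edge fix is implemented.

```python
import struct
from typing import List


class MessageBuffer:
    """Accumulate TCP stream data and yield only complete framed messages."""

    HEADER = struct.Struct("!I")  # hypothetical 4-byte big-endian length prefix

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add received bytes and return every message that is now complete."""
        self._buf.extend(chunk)
        messages = []
        while len(self._buf) >= self.HEADER.size:
            (length,) = self.HEADER.unpack_from(self._buf)
            end = self.HEADER.size + length
            if len(self._buf) < end:
                break  # message only partially received; wait for more data
            messages.append(bytes(self._buf[self.HEADER.size:end]))
            del self._buf[:end]
        return messages


buf = MessageBuffer()
frame = struct.pack("!I", 5) + b"hello"
print(buf.feed(frame[:3]))    # [] (header incomplete)
print(buf.feed(frame[3:]))    # [b'hello']
```

    Buffering in this way means a short read never reaches the message handler as if it were a complete message.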

  • Fixed Issue 85448: A Business Rule that is configured to match a domain name in the address group may not be applied if the domains do not match the case of the actual domain.

    Domain names in address groups are case sensitive, which is a violation of the RFC https://datatracker.ietf.org/doc/html/rfc4343, which specifies that domain names should be case insensitive. This results in Business Rules not getting hit for domains where the case differs. 
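
    A case insensitive match of the kind RFC 4343 calls for can be sketched as below. The function names and the exact-match-only behavior are assumptions for illustration; the source does not describe how address group matching is implemented.

```python
import string
from typing import Iterable

# RFC 4343 case insensitivity applies to ASCII letters only.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def normalize_domain(name: str) -> str:
    """Fold ASCII upper-case letters to lower case, leaving other characters as-is."""
    return name.translate(_ASCII_FOLD)


def domain_in_group(domain: str, group: Iterable[str]) -> bool:
    """Return True if the domain matches any address group entry, ignoring ASCII case."""
    wanted = normalize_domain(domain)
    return any(normalize_domain(entry) == wanted for entry in group)


group = ["www.example.com"]
print("WWW.Example.COM" in group)                  # False (case sensitive)
print(domain_in_group("WWW.Example.COM", group))   # True
```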

  • Fixed Issue 85459: An attempt to SSH either from an Edge LAN-side client to an Edge, or from a remote branch Edge client to an Edge may not work after LAN side NAT rules are configured.

    SSH reply packets coming from the Edge's SSH process go through the Edge's Dataplane service. Because LAN side NAT rules are configured, the SSH reply packets may use those rules and go to a different destination than the original client that generated the SSH traffic, which causes the SSH attempt to the Edge to fail.

    On an Edge without a fix for this issue, the only workaround is to remove the NAT rule.

  • Fixed Issue 85637: When Non SD-WAN Destination (NSD) via BGP is configured for both the primary and secondary neighbors, and a user tries to deactivate 'default-originate' for the neighbors one after the other, only one default route is withdrawn on the peer.

    For this issue, the command clear ip bgp vrf <vrf_id> <nbr_ip> soft out is sent to Edge's routing process after the BGP configure application in edged_bgp_nbr_config_apply(). On the Edge routing side, this command ends up resetting a flag SUBGRP_STATUS_DEFAULT_ORIGINATE on update sub-group, because of which the default route is not withdrawn.

    Note that this issue is applicable to regular BGP too, and not just NSD via BGP.

  • Fixed Issue 85640: Under heavy load, a VMware SD-WAN Edge may experience a Dataplane Service failure due to a SIGXCPU signal being raised and restart to recover.

    Under heavy load, several threads performing various activities such as IP security packet processing and logging are starved of CPU resources and are not able to complete their tasks within the stipulated time frame. This leads to a SIGXCPU signal being raised and the subsequent termination of the Edge service process.

  • Fixed Issue 85679: Remotely debugging a VMware SD-WAN Edge using a GDB tool cannot be done.

    In some support scenarios a VMware Technical Support Engineer may work with a customer to remotely troubleshoot an Edge through an SSH session. One tool an Engineer may use is the GNU Project Debugger (GDB). In this issue, using the GDB tool terminates the SSH session, thus limiting Engineering's ability to live troubleshoot an Edge.

  • Fixed Issue 85752: A VMware SD-WAN Edge which uses a Partner Gateway may receive the same prefix Partner Gateway static route twice.

    The Edge should receive only one route from a Gateway at any given moment. This issue occurs when the same prefix route is configured on the Gateway page as NAT, and on the Gateway Handoff page as LAN tagged.

  • Fixed Issue 85892: The metrics tool Wavefront reports inconsistent metrics for Non SD-WAN Destination (NSD) endpoints if the NSD endpoints have redundancy enabled.

    While connecting to a redundancy-enabled NSD endpoint, the VMware SD-WAN Gateway service ends up creating internal counters with the same name for both endpoints (primary and secondary). This leads to inconsistent reporting in Wavefront.

  • Fixed Issue 86024: A VMware SD-WAN Edge may restart while revoking redistributed routes from BGP.

    While revoking a redistributed route from BGP, a deadlock can occur with the locks related to redistribution table nodes and this would lead to an exception and an Edge service restart.

  • Fixed Issue 86719: When a VMware SD-WAN Edge is upgraded from 3.4.x to 4.3.1, the Edge's OSPF sessions may not come up and the Edge would not receive routes.

    The OSPF neighborships could be down after the Edge is upgraded due to an OSPF area format mismatch between the Orchestrator and the Edge.

  • Fixed Issue 86994: On a customer enterprise where Dynamic Branch to Branch is activated, when attempting to troubleshoot a VMware SD-WAN Edge in this enterprise the dispcnt debugging command does not work.

    The dispcnt debug command does not provide all the counter values and fails with Domain (null) does not exist. This also fails when referring to the relevant logs in an Edge diagnostic bundle. This significantly hinders troubleshooting a customer network issue.

    This issue arises in enterprises where Dynamic Branch to Branch is activated due to the large quantity of tunnels that are created and torn down towards each peer. The counters to store various metrics of the peers are stored in a shared memory and over time, these shared memory segments get into a bad state due to a collision and the counters are not fetched by the dispcnt command.

    Without a fix for this issue, the user can only clear the state by performing a service restart of the affected Edge.

  • Fixed Issue 87056: Sending a DNS request to an IPv6 DNS server may fail.

    This issue is encountered if a loopback interface address needs to be chosen as the Source IP for sending the DNSv6 traffic.

    The only way to work around this issue is to configure DNSv6 traffic to use a source interface other than loopback.

  • Fixed Issue 87205: For a customer deploying a VMware SD-WAN Edge with a Partner Gateway, when an Edge learns new routes from the Partner Gateway, customer traffic may be disrupted.

    This issue is caused by traffic matching the wrong Business Policy. For example, DHCP traffic destined for the Partner Gateway could instead be matched to the Internet Backhaul rule with a resulting disruption in customer traffic.

    Without the fix, the issue is remediated by flushing the Edge's flows using the Remote Diagnostic "Flush Flows". This remediation does not prevent future potential occurrences when new routes are learned by the Edge to the Partner Gateway.

  • Fixed Issue 87233: For a VMware SD-WAN Gateway using the Telegraf service, this service may be down where the Gateway is used in a large scale deployment.

    Large scale deployment is defined as ~4K peers and ~6K tunnels (in a mix of IPv4 and IPv6) where traffic is running for a long time. The Telegraf service is started and stopped whenever vc_procmon is restarted. Any critical issue, such as low memory in the Gateway's Dataplane process, also stops and starts all related processes, and in doing so the post_exit scripts were manually stopping Telegraf as well. Because the Telegraf service is then administratively stopped, it does not restart automatically.

    On a Gateway without a fix for this issue, the Operator or Partner user needs to perform a manual Gateway service start or restart with "service telegraf start".

  • Fixed Issue 87538: For a customer site deployed with an Enhanced High Availability topology, traffic loss may be high for more than 10 minutes after an HA failover.

    With DHCP enabled on an Enhanced HA site, more traffic loss may be observed after an HA failover.

  • Fixed Issue 87543: For a customer site using a High Availability topology and also configured to use IPv6, the Active Edge may experience a Dataplane Service failure and restart upon an HA failover.

    When the Standby Edge becomes Active, as part of the HA synchronization message when the entire IPv6_entry is synchronized, the Edge erroneously synchronizes the dad_info pointer also which can trigger an exception and an Edge service failure.

  • Fixed Issue 87612: For a VMware SD-WAN Edge with VNF Insertion on one or more VLANs, client users on those VLANs are unable to obtain IP addresses from a DHCP Relay server.

    The Edge is not forwarding the DHCP relay packets and thus the client users are not receiving IP addresses.

    Without the fix, the only workaround is to disable VNF Insertion on the VLAN.

  • Fixed Issue 87710: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart as a result.

    The Edge determines tunnel direction from the object related to the peer of the tunnel. Due to a race condition between the tunnel and peer information object creation, the Edge may try to access the peer information when it is not yet created, and this triggers an exception and causes the Edge service to fail.

  • Fixed Issue 87899: On a customer enterprise using BGP for routing, users may observe performance issues arising from stale routes and BGP configurations.

    When this issue is encountered, stale configurations are left in the Edge due to a race condition between the SYNC and ASYNC command threads.

  • Fixed Issue 87956: For a VMware SD-WAN Edge using a single WAN interface onto which two or more user-defined WAN overlays with the same next hop are configured, if a WAN link connected to the single interface goes down and up, only one of the user-defined overlay tunnels is reestablished.

    For example, if there are user-defined overlays with different source IP addresses but the same next hop which steers traffic to a Hub Edge and a Gateway respectively, on a WAN link flap, only the tunnel to the Gateway is reestablished and the tunnel to the Hub Edge is not, resulting in an impact to customer traffic intended for the Hub Edge.

    By design the Edge does not support having multiple user-defined WAN overlays with the same next hop but the Edge does not perform a check on the configuration and instead treats it as valid. Only when the WAN link flaps does the Edge do a check and enforce a single user-defined WAN overlay to the exclusion of other WAN overlays. This is why the configuration "works" for the Edge when it is applied or when the Edge is restarted, and the configuration is reapplied. The fix for this issue allows multiple user-defined WAN overlays for the same interface and thus all the tunnels will be reestablished after a WAN link flap or any other circumstance that could tear down the overlay tunnels.

    Without the fix, the only way to restore all the tunnels is to restart the Edge service, which can be done on the Orchestrator using Test & Troubleshoot > Remote Actions > Restart Service.
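
    As a rough illustration of the configuration check that the text above says the Edge did not perform, the following Python sketch groups user-defined overlays by interface and next hop and reports any group with more than one member. The record fields (name, interface, source_ip, next_hop) and the sample addresses are hypothetical and do not reflect the Edge's actual configuration schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def overlays_sharing_next_hop(overlays: List[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Return {(interface, next_hop): [overlay names]} for groups of two or more."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for overlay in overlays:
        groups[(overlay["interface"], overlay["next_hop"])].append(overlay["name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical example: two overlays on GE3 with different source IPs, same next hop.
sample = [
    {"name": "to-hub", "interface": "GE3", "source_ip": "192.0.2.10", "next_hop": "192.0.2.1"},
    {"name": "to-gateway", "interface": "GE3", "source_ip": "192.0.2.11", "next_hop": "192.0.2.1"},
    {"name": "other", "interface": "GE4", "source_ip": "198.51.100.10", "next_hop": "198.51.100.1"},
]
print(overlays_sharing_next_hop(sample))
```

    On an Edge without the fix, a non-empty result from a check like this would suggest that the site is exposed to the tunnel loss described above after a WAN link flap.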

  • Fixed Issue 88055: On VMware SD-WAN Edge models 3x00, a customer may observe that when the throughput is sustained at 10 Gbps or greater, the WAN path latency may become stuck and degrade the stability and throughput of the Edge.

    In 10G environments with rapid clock drift between VCMP endpoints, WAN path latency measurements can get stuck which impairs the effectiveness of Dynamic Multipath Optimization (DMPO) and this leads to incorrect path selection and throughput degradation.

  • Fixed Issue 88148: For a site deployed with an Enhanced High Availability topology, a WAN link on the HA Standby Edge may not come up and instead remain in an "Initial" state.

    The issue occurs due to a race condition where the interface IP address for a "USE_PEER" interface is zeroed out. As part of the HA interface synchronization, the iface->ip_addr is checked for sanity, but not the interface IP and this causes the WAN link on the Standby Edge to remain in an Initial state since it has no IP address.

  • Fixed Issue 88152: SNMP requests to a VMware SD-WAN Edge's subinterface do not work.

    This is a day one behavior and any SNMP requests to a subinterface of the Edge will timeout. The fix for this issue adds support for these SNMP requests to an Edge's subinterface.

  • Fixed Issue 88207: On a customer site deployed with a High Availability topology, when there are TCP flaps between the Active and Standby Edges, an increase in memory utilization may be observed on the Standby Edge.

    Whenever there are TCP flaps between the Active and Standby Edge, an event is sent from the Active to the Standby Edge that creates duplicate entries in the End Point info table, which leads to an increase in the mod_edge_ep_info_t memory object. This can be observed in a diagnostic bundle for the .dce logs.

    For an HA pair without a fix for this issue, the only way to remediate the issue is to restart the Edge service on the Standby Edge, restart the Active Edge, and then restart the Edge service on the Standby Edge.

  • Fixed Issue 88317: On a VMware SD-WAN Edge which uses both public and private links and has SD-WAN Reachable configured, when a public link goes down, direct traffic does not use the private link as expected.

    When a business policy is set to prefer the public link, the flow does not use the SD-WAN reachable private link while the preferred public link is down. The fix adds logic that also allows SD-WAN reachable links when direct link selection looks for private links as a last resort.

  • Fixed Issue 88450: SSH through an IPv4 address is not working on a VMware SD-WAN Edge.

    When the IP Table rules are changed to allow an SSH packet coming from a WAN side client (for example, a VCMP Server), the rule is not present. As a result, SSH from a WAN side client to an Edge via an IPv4 address fails after the overlay preference change.

  • Fixed Issue 88550: For customers using Edge Network Intelligence, a VMware SD-WAN Edge is not able to communicate with the Edge Network Intelligence service when a DNS is not explicitly configured.

    When DNS is not configured explicitly, the Edge Network Intelligence service uses Google DNS by default. If DNS chooses a Loopback interface as a source interface, then reachability to the service is broken due to DNS lookup failure.

    For a customer enterprise not using an Edge build with the fix, the workaround is to configure DNS explicitly on the Orchestrator and choose a real interface as the source interface versus a Virtual Loopback interface.

  • Fixed Issue 88604: For a site using a High Availability topology, if a WAN interface goes down and then comes back up on a VMware SD-WAN Standby Edge, the event is not recorded on the VMware SASE Orchestrator.

    A user does not have visibility on Standby Edge interface events, which is especially impactful on Enhanced HA deployments where the Standby Edge is also passing traffic.

  • Fixed Issue 88757: A user running the Remote Diagnostic > Route Table Dump on the Orchestrator UI may find the attempt times out and the page returns no result.

    The Route Table Dump diagnostic times out because the WebSocket timeout is 30 seconds and, for a site with a large number of routes, the debug command may take longer than that to deliver all the routes to the Orchestrator. The fix lowers the timeout of the route dump process to less than 30 seconds so that it finishes before the WebSocket times out, which ensures that the Route Table Dump will return a result.
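
    The idea of giving the inner task a shorter time budget than the outer WebSocket can be sketched in Python as follows. This is only an illustration under stated assumptions: the 25 second budget, the function name collect_routes, and the choice to return a partial list with a completeness flag are all hypothetical, since the text above states only that the route dump timeout is now less than 30 seconds.

```python
import time
from typing import Callable, Iterable, List, Tuple

WEBSOCKET_TIMEOUT_S = 30.0  # outer timeout quoted in the note above
DUMP_BUDGET_S = 25.0        # hypothetical inner budget; must stay below the outer timeout


def collect_routes(routes: Iterable[str],
                   budget_s: float = DUMP_BUDGET_S,
                   clock: Callable[[], float] = time.monotonic) -> Tuple[List[str], bool]:
    """Gather routes until the source is exhausted or the budget is spent.

    Returns (collected_routes, complete). Stopping early leaves time to send
    a reply before the outer timeout fires.
    """
    deadline = clock() + budget_s
    collected: List[str] = []
    for route in routes:
        if clock() >= deadline:
            return collected, False
        collected.append(route)
    return collected, True


# Small example with documentation prefixes; finishes well inside the budget.
print(collect_routes(["192.0.2.0/24", "198.51.100.0/24"]))
```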

  • Fixed Issue 89235: Backhaul Traffic from a VMware SD-WAN Spoke Edge to the Internet may get dropped on the Edge used as the backhaul Hub.

    Due to a timing issue between backhaul traffic from the Spoke Edge and the route advertised from the Spoke Edge, the backhaul traffic is dropped on the Hub Edge. The issue can occur after an Edge service restart, an Edge power outage, or a configuration change. When looking at Edge logs, the user would observe an increasing quantity of drops for the nsch_drop_stale_pi counter, which indicates that packets are being dropped due to "stale PI". "PI" means "peer information", a logical representation of a peer device (Edge or Gateway), and a stale PI in this instance means that the Spoke Edge connected to the Hub Edge leaked the memory associated with this peer after disconnecting, rather than cleaning it up properly.

    On an enterprise without a fix for this issue, running Remote Diagnostic > Flush Flows on the Hub Edge remediates the issue by clearing out the stale PIs.

  • Fixed Issue 89364: For a site using an Enhanced High Availability topology, if a user runs Remote Diagnostics > Interface Status, the link speed of the Standby Edge interface shows as 0 Mbps / Half-Duplex.

    Speed and auto-negotiation details are not fetched from the Standby Edge where the interface is up, and the details are not displayed correctly.

  • Fixed Issue 89596: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart as a result, disrupting customer traffic.

    This issue can occur when a customer has configured NAT. When a new flow using NAT is established there is a very rare race condition which may trigger an exception in the Edge service that causes a failure and a restart.

    Without a fix for this issue, the only way to prevent the issue is to disable NAT.

  • Fixed Issue 89722: SNMP polling does not work when the SNMP server is on the public internet.

    Routing changes beginning in Release 4.3.x negatively impact customers wanting to use SNMP to poll a VMware SD-WAN Edge from an SNMP server on the public internet. Critically, when an SNMP request comes in on a public WAN link, the response is sent not only using a different interface from the interface the request came in on, but is sent through an SD-WAN Gateway, and this effectively breaks SNMP for this scenario.

  • Fixed Issue 90044: When a VMware SD-WAN Gateway is configured with an ICMP probe and the Gateway is restarted, the ICMP probe does not recover and remains down.

    The ICMP Probe state in debug.py --icmp reads as DOWN after a Gateway restart.

    On a Gateway without a fix for this issue, the workaround is to deactivate the ICMP probe and then reactivate it.

  • Fixed Issue 90098: For a customer enterprise where Branch-to-Branch VPN is configured, in some scenarios a tunnel can be tried endlessly even though it cannot ever come up due to a configuration change.

    The scenario involves an Edge trying to create a tunnel with a peer Edge that is either offline or had an IP address changed. The Edge does not realize the peer is unreachable and endlessly tries to create a tunnel to the non-existent destination which impairs overall performance and cannot be stopped by the customer.

    The issue is caused by the lack of expiration time limit for non-working Branch-to-Branch tunnels. In addition the issue is difficult to troubleshoot because there is no message generated about where the Edge is getting the Branch-to-Branch message reply and there is no debug command on the connected Gateway to display the valid Branch-to-Branch information for a peer.

  • Fixed Issue 90182: A VMware SD-WAN Edge in a BGP environment may experience a Dataplane Service failure and restart to recover.

    There is a race condition in accessing vrf_tun_sk: as one thread frees it, another thread de-references it. This issue is more likely to occur in a BGP environment when BGP flaps but could happen even if BGP did not flap.

  • Fixed Issue 90216: Traceroute might not show the correct IP address of a VMware SD-WAN Hub Edge when the traffic flow is from Client > Spoke Edge > Hub > Server.

    If a Spoke Edge has a configured Business Policy to backhaul its traffic to a Hub Edge with Transport Group configured to use Private Wired and Mandatory, when the traceroute packet reaches the Hub Edge, the Hub Edge responds with the incorrect IP address (in this case, the public IP address, instead of the private IP address) to the traceroute.

  • Fixed Issue 90283: A customer may experience poor audio and/or video quality for VoIP and videotelephony calls if Underlay Accounting is turned on for the WAN link being used on the VMware SD-WAN Edge.

    When checking the logs, the user would observe packets for bidirectional traffic where the traffic is asymmetrically routed, and one of the routes is via the underlay. In other words, when the routes for a flow are asymmetric such that in one direction the traffic takes an underlay route and in the reverse direction it takes an overlay path and where Underlay Accounting is toggled on for that WAN link, packet loss may be experienced on bidirectional flows which are typical of, but not limited to, VoIP and videotelephony calls.

  • Fixed Issue 90513: A user may be unable to SSH into a VMware SD-WAN Edge despite having configured an IP Address which was permitted for that activity.

    When adding an IP Address to be allowed for remote SSH into an Edge, there is the potential for a race condition in which the update command fails and the IP Tables are not updated. The result is that SSH is blocked to the Edge.

  • Fixed Issue 90797: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart to recover, causing a 10-15 second disruption in customer traffic for each restart.

    The issue is triggered by an invalid memory access event on the Edge's Deep Packet Inspection (DPI) Engine. The DPI Engine's flows are not cleaned up regularly leading to a memory corruption in that module. The fix for this issue deletes the flows for each DPI Engine flow as soon as the packet classification is completed.

    If this issue happens with Edges that are members of a Hub Cluster, there is also customer traffic disruption since there would be traffic rebalancing after each Cluster member restarts.

  • Fixed Issue 90851: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart if it is deployed in a large scale environment.

    A large scale environment is defined as one where the total number of Edge VLANs, Routed Interfaces, Loopback Interfaces, and Tunnel Interfaces (from Non SD-WAN Destinations via Edge) exceeds 280. When that total is exceeded, it leads to memory corruption and a service failure.
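
    The arithmetic behind the threshold can be sketched in a few lines of Python. The function names and the sample counts are hypothetical; only the four interface categories and the figure of 280 come from the text above.

```python
SCALE_LIMIT = 280  # threshold quoted in the note above


def logical_interface_total(vlans: int, routed: int, loopbacks: int, nsd_tunnels: int) -> int:
    """Sum the four interface categories named in the note."""
    return vlans + routed + loopbacks + nsd_tunnels


def exceeds_scale_limit(vlans: int, routed: int, loopbacks: int, nsd_tunnels: int) -> bool:
    """True when the combined total is above the threshold."""
    return logical_interface_total(vlans, routed, loopbacks, nsd_tunnels) > SCALE_LIMIT


# Hypothetical site: 250 VLANs, 8 routed, 4 loopback, and 20 NSD tunnel interfaces.
print(logical_interface_total(250, 8, 4, 20), exceeds_scale_limit(250, 8, 4, 20))
```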

  • Fixed Issue 90876: DNS fails on a Non-Global Segment for a one-hop away client who is connected to a VMware SD-WAN Edge either by a LAN interface or a routed sub-interface without a Gateway IP.

    The cause of the issue differs depending on which Edge interface type the one-hop away client is using.

    • If the one-hop away client is connected to an Edge via a LAN port, DNS resolution fails for the Non-Global Segment because the Edge routes the reply packet to the client via the VCE1 interface and the Edge process treats it as a Global segment. As a result, the reply packet is dropped as a static route is available in the Global Segment routing table.

    • If the one-hop away client is connected to an Edge via a routed sub-interface port which does not have a Gateway IP address and a static route for the client on Orchestrator, then DNS resolution fails for the client as the Edge does not have a route for the client. It matches the connected route and sends an ARP for the destination IP itself and ARP fails, and the reply is not sent.

    For an Edge that does not have a fix for this issue, the workaround for a client using an Edge's LAN is to only use the Global Segment.  For a client using a routed sub-interface, the workaround is to provide a Gateway IP address and, if that is not possible, only use the Global Segment.

  • Fixed Issue 91164: On a customer enterprise deployed with a Hub/Spoke topology where the VMware SD-WAN Hub Edge is configured for High Availability, the HA Hub Edge may not forward internet backhaul traffic after an HA failover.

    The issue is confined to a scenario where the Standby Edge does not set the destination route for internet backhaul flows when the backhaul flow is configured to route via a static route using a non-WAN overlay interface. When the Standby Edge is then promoted to Active in an HA failover these factors cause the internet backhaul traffic to fail.

  • Fixed Issue 91203: For a customer enterprise configured with a Hub/Spoke topology where the VMware SD-WAN Spoke Edge is configured to backhaul traffic through a Hub Edge, a user may observe poor traffic performance for backhauled flows.

    The backhaul leg on the Hub Edge is determined by the Source and Destination route types (in other words, Source = enterprise, Destination = cloud) but this approach may lead to inconsistent behavior as it depends on incidents based on route changes and can result in dropped packets for backhauled flows. The fix for this issue is to make the backhaul leg determination based on the Spoke Edge's messaging.

  • Fixed Issue 91333: For a customer enterprise that uses BGP for routing, if the uplink option is activated, the routing preference is not being updated for uplink routes.

    The user would instead observe an unexpected preference value for uplink routes which is the result of a timing sequence.

  • Fixed Issue 91705: Traffic sent for the first time may get incorrectly classified when it is destined for port 443.

    Instead of being classified correctly by a domain like google.com, the traffic is instead classified as the more general APP_TCP which means the traffic may not match business rules configured to match certain applications.

  • Fixed Issue 91720: For a customer enterprise that uses a Hub/Spoke topology, a user can remove a VMware SD-WAN Hub Edge from the Backhaul Hub configuration even though that Hub is being used with a Business Policy configured to use internet backhaul.

    Once a Business Policy for backhauling Spoke Edge traffic through a Hub Edge has been configured, the expected behavior is that the VMware SASE Orchestrator "locks" that Hub Edge and prevents a user from removing it from the Backhaul Hub configuration in the Configure > Device Settings section. However, with this issue the user can remove the Hub Edge and cause significant customer traffic disruption. 

  • Fixed Issue 91875: For a customer who has configured a WAN link as a Backup on a VMware SD-WAN Edge, they may observe the backup WAN link becoming active intermittently even though the conditions requiring the link to become active are not present.

    The issue is caused by a race condition on an Edge process that leads the Edge to erroneously conclude that the backup WAN link is needed and to build a tunnel for that link. The Edge has no failsafe for detecting and tearing down this erroneous tunnel.

  • Fixed Issue 92027: A customer who has configured dual stack (IPv4 and IPv6) on a VMware SD-WAN Edge interface and has also configured an IPv6 preference on the interface may observe that both IPv4 and IPv6 tunnels are formed from the WAN link towards the Gateway even though a tunnel preference is configured.

    The customer might see dual stack tunnels towards a Gateway that do not honor the overlay preference. While there is no functionality impact, this is not the expected behavior.

  • Fixed Issue 92400: On a customer site configured with a High Availability topology and where Edge interfaces are configured with subinterfaces, the Standby Edge takes a longer time to converge after an HA failover.

    Both Gratuitous ARP and Nexthop ARP are not sent from the subinterface, and this leads to a convergence time greater than the expected sub-seconds. 

  • Fixed Issue 92454: The Remote Diagnostic > Traceroute does not work when a domain name that only resolves to an IPv4 address is entered in the Destination field.

    If a domain name resolves to an IPv4 address only, the Traceroute command executed through Remote Diagnostics does not work. This is because the VMware SD-WAN Edge always tries to resolve the domain name for the IPv6 record and fails to find the IPv4 address.

    On an Edge without this fix, the workaround is to use the IPv4 address corresponding to the domain name directly in the Traceroute command. The IPv4 address can be obtained by supplying the domain name to the Remote Diagnostic > DNS Test.
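
    The resolution behavior described above, and one way a resolver could fall back from IPv6 to IPv4, can be sketched in Python. This is a hedged illustration rather than the Edge's implementation: the function name resolve_for_traceroute, the stub lookup, and the IPv6-first ordering are assumptions. The lookup parameter follows the calling convention of the standard socket.getaddrinfo function.

```python
import socket
from typing import Callable, Optional


def resolve_for_traceroute(name: str,
                           lookup: Callable = socket.getaddrinfo) -> Optional[str]:
    """Try an IPv6 lookup first, then fall back to IPv4 instead of giving up."""
    for family in (socket.AF_INET6, socket.AF_INET):
        try:
            results = lookup(name, None, family)
        except socket.gaierror:
            continue  # no record for this address family; try the next one
        if results:
            return results[0][4][0]  # address field of the first sockaddr tuple
    return None


def ipv4_only_lookup(name, port, family):
    """Stub resolver standing in for a domain that has no IPv6 record."""
    if family == socket.AF_INET6:
        raise socket.gaierror("no IPv6 record")
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, "", ("192.0.2.10", 0))]


print(resolve_for_traceroute("ipv4-only.example", lookup=ipv4_only_lookup))
```

    A resolver that stops after the failed IPv6 lookup would return nothing for the same domain, which matches the failure described for Edges without the fix.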

  • Fixed Issue 92459: Application map incorrectly classified the Outlook Web Application as a Business Application.

    This issue impacts customers who have specific business policy rules in place to match the Outlook Web Application in terms of how that traffic is prioritized and steered and are instead getting that traffic treated like a general Business Application which would likely have very different treatments defined for it. The issue is related to SD-WAN's DPI (deep packet inspection) engine.

  • Fixed Issue 92686: Router advertisement (RA) routes do not move to Route Reachable = False state when an Edge interface is brought down.

    This can lead to a traffic black-hole for a flow assigned this route. The Edge waits for the route to age-out because the RA does not listen to interface events.

  • Fixed Issue 92708: A VMware SD-WAN Edge running software version 4.5.0 or 4.5.1 may experience a Dataplane Service failure and restart as a result.

    The issue arises because a lock is not acquired when a particular field is modified. This leads the Edge service to attempt a double free of that field, which triggers an assert and causes the service failure.

  • Fixed Issue 92758: A site with a High Availability topology may experience several different issues on the VMware SD-WAN HA Edges including an incorrect LED status or an HA failure.

    The incorrect LED status on the Active Edge shows as Yellow instead of Green even though the Edge is up, and the WAN links are up and stable.

    This issue is traced to a shared memory corruption on the Edge which manifests in several forms. This can be confirmed by fetching the counters with getcntr tool for a specific domain such as vcedge.com. The output of the tool shows “Domain does not exist” and the counter name is not found.

    VMware SD-WAN relies on the ftok() function to derive keys for SYSV shared memory. ftok() uses the last 16 bits of the inode number for calculating the key. This can cause a key collision when two inode numbers share the same last 16 bits, which requires them to differ by a multiple of 64K. When such a collision occurs, the dynamic tunnel shared memory counters can corrupt global shared memory variables, resulting in several possible Edge issues including an incorrect LED status, unavailable counters, or an HA failure.
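
    The collision can be reproduced with a small Python model of the key layout. The sketch assumes the layout used by glibc (8 bits of project ID, 8 bits of device number, 16 bits of inode number); the C library on the Edge may differ, and the inode, device, and project ID values below are hypothetical.

```python
def ftok_key(inode: int, device: int, proj_id: int) -> int:
    """Model a glibc-style ftok() key: proj_id (8 bits), device (8 bits), inode (16 bits)."""
    return ((proj_id & 0xFF) << 24) | ((device & 0xFF) << 16) | (inode & 0xFFFF)


# Hypothetical inode numbers for two different files on the same device.
inode_a = 0x1234
inode_b = inode_a + 0x10000  # exactly 64K apart, so the low 16 bits match

key_a = ftok_key(inode_a, device=0x08, proj_id=ord("V"))
key_b = ftok_key(inode_b, device=0x08, proj_id=ord("V"))
key_c = ftok_key(inode_a + 1, device=0x08, proj_id=ord("V"))

print(hex(key_a), hex(key_b), key_a == key_b)  # different files, same key
print(key_a == key_c)                          # a neighbouring inode gets its own key
```

    Two unrelated shared memory segments that end up with the same key attach to the same region, which is consistent with the corruption described above.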

  • Fixed Issue 92927: When a VMware SD-WAN Edge interface is deactivated, the interface continues to have a link with a connected device.

    When an interface is selected to be deactivated through the Orchestrator, it is removed from the Edge's DPDK control and placed under the control of the Edge's kernel driver. However, the script which does this conversion sets the kernel admin up on the interface, so if the interface is connected, it will attempt to autonegotiate with the connected device.

  • Fixed Issue 93052: Client users behind a VMware SD-WAN Edge may observe degraded traffic quality including high latency and slow throughput speeds.

    The immediate cause of the issue is a Path FSM (Finite State Machine) thread running with 100% Edge CPU usage. When an Edge CPU is running at 100%, path quality is degraded.

    The Path FSM thread maxes out the Edge CPU because of unreliable counter values, which lead the thread to conclude that there are more messages in the queue it serves when there are actually none. As a result, the thread runs all the time without sleeping. The fix adds an API which checks the actual queue data structures to determine the state of the queue.
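
    The difference between trusting a counter and inspecting the queue itself can be shown with a short Python sketch. The class and function names are hypothetical, and the iteration cap exists only so the example terminates; it is not a model of the Edge's Path FSM code.

```python
from collections import deque
from typing import Callable


class MessageQueue:
    """A queue with a separate pending counter that can drift out of sync."""

    def __init__(self) -> None:
        self.items: deque = deque()
        self.pending_counter = 0  # bookkeeping value, updated separately from the deque

    def has_work_by_counter(self) -> bool:
        return self.pending_counter > 0

    def has_work_by_inspection(self) -> bool:
        return len(self.items) > 0


def spins_before_sleep(has_work: Callable[[], bool], max_iterations: int = 1000) -> int:
    """Count loop iterations a worker performs before it decides to sleep."""
    spins = 0
    while spins < max_iterations and has_work():
        spins += 1  # nothing to dequeue, so each pass only burns CPU
    return spins


queue = MessageQueue()
queue.pending_counter = 1  # stale value: claims one message, but the deque is empty

print(spins_before_sleep(queue.has_work_by_counter))     # runs until the iteration cap
print(spins_before_sleep(queue.has_work_by_inspection))  # sleeps straight away
```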

  • Fixed Issue 93055: On a customer site deployed with a High Availability topology, a VMware SD-WAN Edge may experience a Dataplane Service failure and restart as a result, triggering an HA failover.

    An HA failover on an Enhanced HA topology would cause disruption for traffic using a WAN link on the Active Edge. This issue is limited to builds prior to Release 5.x.

    When this issue is encountered the logs would show something similar to the following:

    ERROR [edged 7205(7212)] Process edged (pid 3022) exited on signal SIGABRT: restarting after 3.0 seconds
    INFO [Thread-3 7205(15804)] edged: posted ERROR event EDGE_SERVICE_FAILED

  • Fixed Issue 93062: When a user runs Remote Diagnostics > Interface Status on the VMware SASE Orchestrator, the Orchestrator either returns an error for that test and does not complete or the test does not return results for routed interfaces.

    The error message seen is "error reading data for test". If the test does complete, the results for routed interfaces are empty with no information about speed or duplex. Either way the Interface Status is broken. The issue is related to the debug command that underlies Interface Status omitting DPDK activated ports.

    On an Edge without a fix for this issue, the user would need to generate a diagnostic bundle for the Edge to see the status for routed interfaces.

  • Fixed Issue 93141: On a site deployed with a High Availability topology, a customer using an L2 switch upstream of the HA Edge pair may observe in the switch logs evidence of an L2 traffic loop, though there is no actual loop.

    The issue is caused by the HA Edge sending the HA interface heartbeat with the Virtual MAC address to the Orchestrator instead of the interface's actual MAC address, which is caused by the HA Edge storing the Virtual MAC address in its MAC file. As a result the connected L2 switch detects traffic from the same source MAC coming from two different Edge interfaces and would log it as an L2 loop. This issue is cosmetic at the log level as there is no actual L2 loop and there is no customer traffic disruption or loss of contact with the Orchestrator arising from this issue.

    On an HA pair of Edges without a fix for this issue, the customer can safely ignore L2 loop detection events from upstream switches that arise out of the Edge's HA interface (usually GE1). 

  • Fixed Issue 93237: A VMware SD-WAN Edge where 1000 or more Object Groups are configured will experience a Dataplane Service failure and restart to recover, which causes a 10-15 second customer traffic disruption.

    When 1000 or more Object Groups are configured in the Configure > Business Policy page of the Orchestrator UI, the configuration that is pushed to the Edge triggers an Edge memory corruption which causes the Edge service to fail and restart.

  • Fixed Issue 93853: A VMware SD-WAN Gateway under a heavy load may experience a Dataplane Service failure with a SIGXCPU code and restart the service to recover.

    Under heavy load, several Gateway threads performing various activities such as routing and logging are starved of CPU resource and are not able to complete the task within the stipulated time frame. The Gateway service interprets these lagging threads as deadlocked and raises the SIGXCPU signal with a subsequent Gateway Dataplane process termination.

  • Fixed Issue 93965: A VMware SD-WAN Edge may experience a Dataplane Service failure, generate a core, and restart to recover.

    Checking a core reveals that the Edge service exited with a SIGXCPU signal. The Edge operating system has a Unix socket used as a queue for interface event communication between threads. The issue is caused by the socket depth being too small which leads to thread blocking and results in the SIGXCPU signal being raised.

  • Fixed Issue 94014: If a user creates and deletes a large number of Segments, a VMware SD-WAN Edge in the enterprise may experience a Dataplane Service failure, generate a core, and restart to recover.

    When a user creates and deletes a large number of segments (128 segment enterprise), the Edge does not free the associated counters. These counters are left stale and the Edge process does not clean them up, which triggers the service failure.

  • Fixed Issue 94401: On a VMware SD-WAN Edge where the Stateful Firewall is enabled, a TCP Established flow may time out too quickly and get flushed.

    The TCP Established flow is treated as a TCP Non-Established flow and is subject to a shorter timeout. When there is a TCP Reset (RST) seen in a TCP flow, followed by a TCP 3-way handshake, even though the TCP state shows as Established, the flow gets flushed after being subjected to a Non-Established TCP flow timeout.
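
    The expected behavior, where a flow that completes a new 3-way handshake after a reset gets the Established timeout again, can be sketched as a small state tracker in Python. This is a simplified illustration and not the Edge's Stateful Firewall logic: the class name, the state names, and both timeout values are hypothetical.

```python
ESTABLISHED_TIMEOUT_S = 3600    # hypothetical value
NON_ESTABLISHED_TIMEOUT_S = 60  # hypothetical value


class FlowTracker:
    """Track a single TCP flow and pick its idle timeout from the current state."""

    def __init__(self) -> None:
        self.state = "NEW"

    def on_segment(self, flags: str) -> None:
        if flags == "RST":
            self.state = "RESET"
        elif flags == "SYN":
            self.state = "SYN_SENT"  # a new handshake may follow a reset
        elif flags == "SYN-ACK" and self.state == "SYN_SENT":
            self.state = "SYN_RECEIVED"
        elif flags == "ACK" and self.state == "SYN_RECEIVED":
            self.state = "ESTABLISHED"

    def timeout(self) -> int:
        if self.state == "ESTABLISHED":
            return ESTABLISHED_TIMEOUT_S
        return NON_ESTABLISHED_TIMEOUT_S


flow = FlowTracker()
for flags in ["SYN", "SYN-ACK", "ACK", "RST", "SYN", "SYN-ACK", "ACK"]:
    flow.on_segment(flags)

print(flow.state, flow.timeout())
```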

  • Fixed Issue 94430: For a customer enterprise that uses a Hub/Spoke topology where multiple Hubs are deployed, a user behind a VMware SD-WAN Spoke Edge may observe issues with traffic that is destined for a Hub Edge.

    Client traffic issues occur when the Spoke Edge forwards traffic towards a Hub different than the one expected to receive the traffic. The issue is caused by the AS path length for the remote BGP routes not being calculated properly in certain scenarios. Because of this, the routes from the Hubs that should have a lower routing preference instead end up having greater AS_PATH length and may be preferred.

    If encountering this issue without a fix, the customer can withdraw and re-advertise the route that is expected to be preferred.

  • Fixed Issue 94532: For a VMware SD-WAN Gateway using a 4.5.x version, an Operator may observe a large packet event which reads: "VeloCloud Gateway receiving packet size > 1500 bytes. Confirm GRO settings" even though the Gateway's packet size is less than 1500 bytes.

    The issue is caused by the 4.5.x Gateway receiving IPv6 packets, which are not supported in the 4.5.x release. The Gateway should drop those packets without allowing any additional packets in the pipeline.

  • Fixed Issue 94663: For a customer enterprise configured with a Hub/Spoke topology, when running Remote Diagnostics > List Paths for a VMware SD-WAN Edge being used as a Spoke, the output does not show the Hub Edge paths.

    The output only shows the paths to the Gateways due to a missing parameter when running this diagnostic. 

  • Fixed Issue 94775: On a customer enterprise using a Hub/Spoke topology where the VMware SD-WAN Spoke Edge backhauls their traffic through a Hub Edge, client users may observe traffic performance issues.

    The issue is caused by the wrong flag being set for backhauled traffic: the backhauled packets are handled on the Spoke Edge as if they were on a Hub Edge. This leads to route lookup issues on the Hub and the backhaul packets get dropped.

  • Fixed Issue 94828: If a user deletes an RAS (Remote Access Service) configuration, the VMware SD-WAN Gateway may experience a Dataplane Service failure, generate a core, and restart to recover.

    For RAS subnets, handoff pe_pr is used as pr. For handoff pe_pr, the peer_routes table is never created. However, when deleting the RAS subnets, an attempt is made to delete the RAS subnet from pr->peer_routes table, and this triggers a SIGSEGV and the resulting Gateway service failure.

  • Fixed Issue 94881: When looking at Gateways > Monitor on a VMware SASE Orchestrator UI, the connected peer count does not correspond to the actual number of VMware SD-WAN Edges connected to the Gateway.

    When the VMware SD-WAN Gateway exports its peer count metric to the Orchestrator, it includes not only Edges but Non SD-WAN Destinations, which should be excluded.
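
    The counting difference can be illustrated with a short Python sketch. The peer records, the type labels ("edge" and "nsd"), and the function names are hypothetical and do not reflect the Gateway's actual metric export format.

```python
from typing import Dict, List


def exported_peer_count_before_fix(peers: List[Dict[str, str]]) -> int:
    """Count every connected peer, which inflates the figure shown for Edges."""
    return len(peers)


def edge_peer_count(peers: List[Dict[str, str]]) -> int:
    """Count only peers that are Edges, excluding Non SD-WAN Destinations."""
    return sum(1 for peer in peers if peer["type"] == "edge")


# Hypothetical Gateway with three Edges and two Non SD-WAN Destinations connected.
connected = [
    {"name": "edge-1", "type": "edge"},
    {"name": "edge-2", "type": "edge"},
    {"name": "edge-3", "type": "edge"},
    {"name": "nsd-1", "type": "nsd"},
    {"name": "nsd-2", "type": "nsd"},
]
print(exported_peer_count_before_fix(connected), edge_peer_count(connected))
```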

  • Fixed Issue 94980: For a site deployed with a High Availability topology, the VMware SD-WAN Standby Edge may experience a Dataplane Service error and restart after a PPPoE WAN link is configured for the HA Edges.

    When examining the core generated by the Standby Edge, a user would see the message vc_is_use_cloud_gateway_set after the PPPoE link is configured.

    There is no workaround for this issue beyond configuring PPPoE links in a maintenance window to manage the risk of this action.

  • Fixed Issue 95047: When a security port scanning utility scans a VMware SD-WAN Edge where Edge Network Intelligence (Analytics) is not activated, the scan will report that Syslog Port 514 is closed, which means it could be accessible.

    Edge Network Intelligence listens on Port 514 (Syslog). If Analytics are not activated, the Port 514 is still accessible, but it will not respond to requests. Therefore, a port scanner reports the port as "closed" (in other words, the port is accessible but there is no application listening on it).

  • Fixed Issue 95073: On a customer enterprise using a Hub/Spoke topology where the VMware SD-WAN Spoke Edge backhauls their traffic through multiple Hub Edges, client users may observe significant issues for backhauled traffic.

    The issue is caused by a route lookup failure on the Spoke Edge for traffic matching a backhaul rule. The Spoke Edge drops the backhauled flows that fail to get a route to a Hub Edge.

  • Fixed Issue 95121: When a "locked SIM" (a SIM that is password locked) is used in a VMware SD-WAN Edge model 510-LTE or 610-LTE, the customer will experience failures establishing connection in the network.

    Path establishment fails when a locked LTE SIM card is used in the SIM slot of an Edge 510-LTE or 610-LTE because SIM unlock does not work from the Orchestrator. This is due to a lack of support for locked SIMs in the Edge's ModemManager scripts.

  • Fixed Issue 95399: The customer does not see either the EDGE_INTERFACE_UP or EDGE_INTERFACE_DOWN events on the VMware SASE Orchestrator.

    The issue is seen only when the customer monitors the events in the Orchestrator. Previously, the EDGE_INTERFACE_UP and EDGE_INTERFACE_DOWN events were triggered when an interface came up or went down. With Release 4.5.1 these events are not reported. The issue is caused by the addition of dhclient in the 4.5.1 release, because dhclient was not configured to send these events to the Orchestrator.

    On an Edge without a fix for this issue, the link alive/link dead events can be used for monitoring instead. However, the exact EDGE_INTERFACE_UP and EDGE_INTERFACE_DOWN events will not be available for monitoring.

  • Fixed Issue 95452: DNS resolution may fail when resolving a domain via a VMware SD-WAN Edge to a routed client name server.

    When changing the Edge routed interface to DNS proxy, the Edge does not give the interfaces time to come up after the configuration change.

  • Fixed Issue 95501: For a customer enterprise that uses a Hub/Spoke topology and BGP for routing, client users at VMware SD-WAN Spoke Edges may observe poor traffic performance.

    An administrator would observe that the Spoke Edge prefers routes marked with uplink community from a Hub not included in its profile over the Hub Edge configured to be used for that Spoke Edge. This is because the Spoke Edge traffic is taking a Dynamic Branch to Branch path for uplink prefixes.

    The issue is caused by SD-WAN resetting the uplink flag for routing messages received from a Hub Edge. As a result, when a Dynamic Branch to Branch tunnel is formed, direct routes are installed for these uplink prefixes leading to suboptimal routing and degraded traffic performance.

  • Fixed Issue 95503: In rare instances a customer may observe that a VMware SD-WAN Edge model 610, 610N, or 610-LTE shows the same MAC address for all Ethernet interfaces.

    An Edge 610 (any type) may show an eth0 MAC address ending with 0xF*. In such cases, GE1 through GE6 ports receive the same MAC address due to an issue with the script that calculates and allocates MAC addresses.

    The fix corrects this script behavior, and an affected Edge 610 type would properly calculate and allocate unique MAC addresses once the Edge is upgraded to a build that includes it.
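
    As an illustration of how per-port MAC addresses can be derived from a base address without collisions when the last octet is near 0xFF, the following minimal sketch treats the MAC as a single 48-bit integer so that any carry propagates into the next octet. The function name and the "base plus offset" scheme are assumptions for illustration only; the actual Edge script is not shown in these notes.

```python
from typing import List


def allocate_port_macs(base_mac: str, count: int) -> List[str]:
    """Derive `count` MAC addresses that follow `base_mac` (hypothetical scheme)."""
    base = int(base_mac.replace(":", ""), 16)
    macs = []
    for offset in range(1, count + 1):
        # Add on the whole 48-bit value so a carry out of the last octet
        # moves into the previous octet instead of being lost.
        value = (base + offset) & 0xFFFFFFFFFFFF
        raw = f"{value:012x}"
        macs.append(":".join(raw[i:i + 2] for i in range(0, 12, 2)))
    return macs
```

    With a base address ending in 0xFB, for example, the six derived addresses end in FC, FD, FE, FF and then roll over into the next octet, and all six remain unique.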

  • Fixed Issue 95603: If a Zscaler server changes its IP address, the DNS lookup continues to use the old IP address which causes Non SD-WAN Destination (NSD) tunnel failure.

    If a remote server changes its IP address, the L7 health check fails and does not recover. The fix for this issue discovers the IP address change and flushes the L7 health check table.

    On a VMware SD-WAN Edge without a fix for this issue, rebooting the Edge reestablishes the tunnel.

  • Fixed Issue 95650: When a user configures a VMware SD-WAN Edge's interface settings when there are 128 segments configured, the Edge may experience a Dataplane Service failure, generate a core, and restart.

    When the Edge is configured with 128 segments, the iptables rule application takes too much time to complete and this leads to a mutex monitor exception and the Edge service failure.

  • Fixed Issue 95821: For a customer site configured with a High Availability topology, if the user configures the VMware SD-WAN Edge's certificate mode as "Acquire", three certificates are generated with an integrated Certificate Authority.

    This issue has no functional impact, but the HA Edges should only generate two certificates, one for each HA Edge. The issue is caused by a missing check for certificate validation on the Standby Edge, which resulted in the Edge sending an additional CSR to the Orchestrator, resulting in the generation of an additional certificate.

  • Fixed Issue 95850: On a customer enterprise where OSPF is used, when a user generates a diagnostic bundle for a VMware SD-WAN Edge, the OSPF routes may flap during the bundle generation resulting in disrupted customer traffic.

    As part of the diagnostic bundle generation the commands vcdbgdump -r remote-routes and vcdbgdump -r remote_routes are run. As these commands take more than 40 seconds in a customer environment, the OSPF hellos that were queued to the event dispatcher thread were not processed. Due to this, the OSPF neighborship flaps, causing a network outage.

    On an Edge without a fix for this issue a customer should either not generate a diagnostic bundle except in a maintenance window or reach out to VMware SD-WAN Support to generate the bundle as they have internal tools to prevent the issue from occurring on a temporary basis.

  • Fixed Issue 95968: When two DHCP stateless IPv6 addresses are configured on an interface and one address is deprecated, the VMware SD-WAN Edge continues to use the deprecated address as the source address for newly created flows.

    When the Router Advertisement (RA) is periodically received, the Edge resets the deprecated flag for an entry even if a preferred value is 0 in the incoming router advertisement. This causes the source address selection to choose this as the best entry based on the longest prefix match until the time the Edge sets the deprecated flag again.
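
    The intended handling can be sketched as follows, assuming a simplified address entry with a single deprecated flag. The class and function names are hypothetical, and the longest prefix match step of real source address selection is left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ipv6AddressEntry:
    address: str
    deprecated: bool = False


def apply_router_advertisement(entry: Ipv6AddressEntry, preferred_lifetime: int) -> None:
    """Update the deprecated flag from the preferred lifetime in a received RA."""
    # A preferred lifetime of 0 means the address must stay deprecated,
    # so the flag is derived from the RA instead of being reset on every RA.
    entry.deprecated = preferred_lifetime == 0


def select_source(entries: List[Ipv6AddressEntry]) -> str:
    """Prefer non-deprecated addresses for new flows (simplified selection)."""
    preferred = [e for e in entries if not e.deprecated]
    candidates = preferred or entries
    return candidates[0].address
```

    In this sketch, an address whose RA keeps arriving with a preferred lifetime of 0 never becomes eligible again while another, non-deprecated address is present.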

  • Fixed Issue 96084: A VMware SD-WAN Edge may not be able to reach its VMware SASE Orchestrator due to a source route lookup failure.

    The Edge would show as offline on the Orchestrator because of this, but it only affects management traffic and customer traffic is not affected. This affects flows that go from the Orchestrator through the Gateway to the Edge and is caused when a customer has a Non SD-WAN Destination configured where the peer datacenter IP address matches the Orchestrator's IP address.

  • Fixed Issue 96385: VMware SD-WAN Edge traffic to the Orchestrator via the Gateway may get dropped after the Gateway is upgraded to 4.5.1 resulting in the Edge showing offline on the Orchestrator.

    This issue happens to Gateways configured with 64 GB RAM capacity and is caused by the NAT service exiting continuously after the upgrade to 4.5.1. When Edge traffic via the Gateway to the Orchestrator fails, SD-WAN will switch this traffic to Direct, so the impact is temporary and affects management traffic only.

  • Fixed Issue 96411: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart as a result, resulting in a 10-15 second interruption in customer traffic.

    The issue can occur on an Edge that experiences frequent link flaps (where a WAN link goes down and then returns in rapid succession). The issue is caused by a memory corruption which results in a double free state and an Edge service failure.

  • Fixed Issue 96553: For a customer enterprise site deployed with a High Availability topology, a user may observe that the VMware SD-WAN Edge used as a Standby has very high CPU utilization despite no traffic passing through this Edge.

    As noted, this issue can occur whether the Standby Edge is in standard HA or enhanced HA where it would be expected to pass traffic. The issue is caused by a ha_worker thread running 100% on the Standby Edge. The impact in standard HA is minimal but in Enhanced HA this can impact performance for the traffic using the Standby Edge.

  • Fixed Issue 96626: When a VMware SD-WAN Edge interface has a secondary IP address assigned to it, connections through the secondary IP address fail.

    Any request coming from another branch to an IP in the secondary network will generate an ARP from the primary IP address rather than the secondary IP address. As a result, the ARP would remain unresolved, leading to failure in the traffic going through the secondary IP address.

  • Fixed Issue 96739: When a user looks at the Monitor > Application tab for a VMware SD-WAN Edge on a VMware SASE Orchestrator, the screen may show Destination FQDNs with the wrong domain names.

    This issue can occur when the Edge's statistics reach its limit (known as an overflow condition), and instead of displaying these statistics as Overflow, the Orchestrator displays random domain names on the Application tab's Destination FQDN.

  • Fixed Issue 96799: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart as a result when handling ICMPv6 error packets.

    In a high scale setup (test system was a 7 site enterprise where each Edge has three segments and 1K routes per segment), the Edge service may fail when attempting to send ICMPv6 error packets. This is a rare occurrence and happens when the Edge service cannot allocate any additional packets.

  • Fixed Issue 96863: A VMware SD-WAN Edge where the WAN links prefer IPv6 may experience a Dataplane Service failure, resulting in a brief disruption of customer traffic.

    The issue can occur on either an Edge or a VMware SD-WAN Gateway when IPv6 is activated, resulting in a service failure and a restart to recover.

  • Fixed Issue 96870: A VMware SD-WAN Edge may experience a Dataplane Service failure and restart when handling ICMPv6 error packets.

    For this ticket there are multiple scenarios where an ICMPv6 error packet may trigger an Edge service failure, however the most likely cause is self-generated, self-destined packets.

    This is similar to Fixed Issue 96799, which also involves ICMPv6 error packets, but the cause and fix are different.

  • Fixed Issue 96925: For a customer enterprise deployed with a High Availability topology, the VMware SD-WAN Edge in the Standby role may experience a triple Dataplane Service failure and stop passing traffic until the service is manually restarted or the Edge is rebooted.

    If an Edge has three consecutive failures of its Dataplane Service, the Edge does not recover after the third one and remains deactivated. The issue is triggered when the HA Edge has more than 60 tunnels. In a standard HA setup this has less impact but in an enhanced HA setup where the Standby Edge is passing traffic on its WAN links, that traffic will be dropped. As noted, the only remediation to this issue is manually restarting the Standby's Edge service or rebooting the Standby Edge.

  • Fixed Issue 97073: For a customer enterprise site using an Enhanced High Availability topology, after an HA failover the users may observe that one of the WAN links goes down.

    Due to Enhanced HA link cleanup not being done properly, even if the link is up on the Active Edge, SD-WAN sends packets meant for that link to the Standby Edge. This causes link connectivity to go down.

    On an HA site not using a build with a fix for this issue, the link state can be corrected by a reboot of the Active Edge which triggers an HA failover. However, depending on the sequence of events happening when Enhanced HA is setup for an interface, this may not work with the first attempt. If the initial reboot of the Active Edge which forces an HA failover does not work, initiate additional reboots until the link is recovered.

  • Fixed Issue 97152: When a customer enterprise has a Business Policy configured with a Service Group as anything wired and Link Mode as "Available", traffic is not steered over to a wireless link when the wired link(s) goes down and client users at the site would observe their traffic that matches that rule is failing.

    When a Business Policy rule has an explicit Service Group of wired WAN links with a Link Mode of Available, and there are wireless links available at the site, the expectation is that traffic using that rule would fail over to the wireless link(s) if the wired links in the service group went down (in other words, became unavailable) to ensure the seamless flow of traffic matching that rule. In this issue, steering the traffic to the wireless links is not occurring.

  • Fixed Issue 97225: Primary and secondary networks are not installed in the routes of other VMware SD-WAN Edges after toggling primary and secondary IP addresses, resulting in several issues related to IPv6 addresses.

    Some of the ways this issue impacts a customer include:

    • IPv6 addresses missing on interfaces.

    • Tunnels not forming with IPv6 addresses.

    • Edge-to-Orchestrator communication being broken which has serious implications because the Orchestrator will mark the Edge as offline and will not allow further control or configuration of the Edge through the Orchestrator UI.

    The issue is caused by a race condition between the Edge Dataplane process and the Edge's network interface daemon (netifd) when they are restarted due to a change in a VLAN IP address. The race results in the IPv6 addresses being removed from interfaces, which leads to the above impacts.

  • Fixed Issue 97272: On a site deployed with a High Availability topology and which uses OSPF, when the HA Edges get into an Active-Active "Split Brain" condition, the users may observe that all traffic drops.

    All traffic drops due to the Standby Edge setting the link-state advertisement (LSA) age to the maximum value (3600) in the core-router, which causes the removal of the default route. When operating properly, the core-router has the LSA age synchronized with the Active Edge, but when an Active-Active state is encountered, the Standby Edge is also Active, and it sends the LSA to the core-router. Both the Active and Standby Edge have the same router ID but are sending different LSA ages to the core router. On seeing the mismatch, the core router sets the LSA age to the maximum value of 3600, which leads to the OSPF default route being removed.

    On an Edge without a fix for this issue the customer would need to restart OSPF on the Active Edge once the Active-Active state is resolved.

  • Fixed Issue 97321: From the time a user activates Edge Network Intelligence Analytics on a VMware SD-WAN Edge, the Edge can potentially trigger an Edge Service restart, each instance of which causes 10-15 seconds of customer traffic disruption.

    When Analytics is enabled on the Edge, the Edge can experience an out of memory condition followed by a "double free" memory state. The Edge restarts its service to restore memory.

    The symptoms for this issue can happen multiple times while Analytics are activated.

  • Fixed Issue 97759: For a VMware SD-WAN Gateway where Non SD-WAN Destinations via Gateway are connected, if the Gateway receives a packet from the NSD with a time to live (TTL) value of 1, the Gateway can experience a Dataplane Service failure and restart to recover.

    This issue can be experienced multiple times on the Gateway if the NSD is repeatedly sending packets with a TTL value of 1. The issue is caused by the Gateway treating a TTL with value 1 as having value 0, which results in a double free memory condition and the resulting service failure.

  • Fixed Issue 97992: On a site deployed with a High Availability topology, if an HA failover is triggered or the VMware SD-WAN Edge in the Standby role is restarted, the Active HA Edge may experience a Dataplane Service failure and restart, triggering another failover.

    The issue is rare, but when it does occur the Active Edge receives a route synchronization from the Standby Edge as a result of the failover or the Standby Edge's restart. If this route list is corrupted with an invalid address family segment-ID, this leads to an assert during the route installation and triggers the Active Edge to have a service failure.

  • Fixed Issue 97997: The Default route via the default-originate is not originated by the Edge.

    The route-map string associated with default-originate is truncated and doesn't match any route-map/prefixes. Therefore, the default route does not get originated.

  • Fixed Issue 98223: When Edge Network Intelligence Analytics is activated on a VMware SD-WAN Edge, the Edge may lose contact with the VMware SASE Orchestrator and cause the Orchestrator to mark the Edge as down on the Orchestrator UI.

    When Analytics is activated, the Edge communication with the Analytics backend sometimes gets mixed with the Edge communication with the Orchestrator. This results in a loss of communication with the Orchestrator which causes the Orchestrator to declare that the Edge is down when it is not.

  • Fixed Issue 98514: On a customer enterprise deployed with a High Availability topology, whenever a configuration change is applied to the VMware SD-WAN HA Edges, the user would observe an event stating "Management service failed" on the Standby Edge and that the management service is restarting as a result.

    Since this is the management service (which does not involve customer traffic), and on the Standby Edge, there is no negative impact to client users at the HA site when the Standby Edge's management service restarts. This is still a critical event recorded in Edge Events that would greatly concern the customer administrators.

  • Fixed Issue 98694: When a customer enterprise is configured with redundant static routes, if the primary route goes down, the alternate route(s) is not advertised, and traffic is dropped.

    When an interface goes down on a VMware SD-WAN Edge, the alternate routes are not advertised to the VMware SD-WAN Gateway even though the routes via the interface are now unreachable. Routes for the prefix will not be present in the Gateway even though there are alternate routes through other interfaces for those prefixes on the Edge. The issue is because the SD-WAN service sends a route delete without checking if there is an alternate reachable static route while handling an interface down.
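
    The missing check can be sketched as follows: when an interface goes down, a prefix is withdrawn only if no other static route for the same prefix uses an interface that is still up. The data model and function name are hypothetical and only illustrate the logic described above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StaticRoute:
    prefix: str
    interface: str


def prefixes_to_withdraw(
    routes: List[StaticRoute],
    interface_up: Dict[str, bool],
    down_interface: str,
) -> List[str]:
    """Return prefixes that lose all reachable static routes when an interface goes down."""
    affected = sorted({r.prefix for r in routes if r.interface == down_interface})
    withdraw = []
    for prefix in affected:
        # Look for an alternate static route through an interface that is still up.
        has_alternate = any(
            r.prefix == prefix
            and r.interface != down_interface
            and interface_up.get(r.interface, False)
            for r in routes
        )
        if not has_alternate:
            withdraw.append(prefix)
    return withdraw
```

    A prefix with a second static route through a healthy interface is therefore kept, and only prefixes with no remaining reachable route are deleted.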

  • Fixed Issue 99215: On VMware SD-WAN Edge models 610, 620, 640, and 680, when the user deactivates the SFP1 interface or configures it as switched, the SFP2 interface may stop receiving packets.

    On these Edge models, the SFP2 interface may stop receiving packets if SFP1 is configured as routed and the user chooses to either deactivate SFP1 or reconfigure SFP1 as switched.

  • Fixed Issue 99303: On a site deployed with a High Availability topology, the VMware SD-WAN Edge in the Standby role may experience a Dataplane Service failure, generate a core file, and restart to recover.

    The Standby Edge may suffer a service failure when handling a large number of flow/route synchronization messages from the Active Edge. On an Enhanced HA deployment, where the Standby Edge is also passing traffic, the restart would disrupt some customer traffic.

  • Fixed Issue 99714: When looking at the logs for a VMware SD-WAN Edge, a user may encounter repeated messaging that a "CWS Policy" is missing even though the customer does not use Cloud Web Security.

    The user would observe something similar to the following:
    
    2022-10-12T13:33:35.009 MSG	[ROUTING] vc_get_cws_pol_id:200 [S] CWS Policy info missing for 00000000-0000-0000-0000-000000000000

    This log is sent ~900 times per second and because of this the logs on the Edge roll over every ~10 minutes. While this messaging has no functional impact on the Edge, its frequency makes troubleshooting any Edge issue very difficult.

    The message can potentially show up if there is a Backhaul Business Policy configured and as noted above has nothing to do with whether the customer has configured Cloud Web Security.
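
    At roughly 900 messages per second, a ~10 minute log window holds on the order of 540,000 of these lines. On an Edge without a fix, a log excerpt can be made readable again by filtering the message out offline. The following is a minimal sketch that assumes the log lines follow the format shown above; the function name is hypothetical and this is not a VMware-provided tool.

```python
import re
from typing import Iterable, Iterator

# Matches the repeated message shown in the sample log line above.
CWS_NOISE = re.compile(r"vc_get_cws_pol_id:\d+ \[S\] CWS Policy info missing")


def drop_cws_noise(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that are not the repeated CWS Policy message."""
    for line in lines:
        if not CWS_NOISE.search(line):
            yield line
```

    The generator can be applied to any iterable of lines, for example an open file handle for a log excerpt copied off the Edge.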

  • Fixed Issue 99718: BGP neighbor does not get established when the secondary IP address on an SVI (Switch Virtual Interface) is used.

    When the Edge processes ingress packets, it verifies if the ingress packet's destination address matches with the ingress interface's IP address. Since only primary IP addresses are compared, packets with a destination IP address as a secondary IP address are dropped. As a result, the BGP session is not formed on this secondary IP.
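
    The corrected comparison can be sketched as a destination check that considers both the primary and the secondary addresses of the ingress interface. The function name and parameters are hypothetical and only illustrate the logic described above.

```python
from ipaddress import ip_address
from typing import Iterable


def is_local_destination(
    dst: str, primary_ip: str, secondary_ips: Iterable[str] = ()
) -> bool:
    """Return True if `dst` matches any address configured on the ingress interface."""
    target = ip_address(dst)
    # Include the secondary addresses, not only the primary one.
    local = {ip_address(primary_ip)} | {ip_address(ip) for ip in secondary_ips}
    return target in local
```

    With only the primary address in the comparison set, a BGP packet addressed to the secondary IP would be treated as not local and dropped, which is the behavior this fix removes.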

  • Fixed Issue 99942: When a VMware SD-WAN Gateway is upgraded, the Wavefront metrics may not be exported with an error showing on the Gateway logs.

    The Gateway log error would read: 2022-08-19T11:52:00Z E! [inputs.exec] Error in plugin: found ":", expected equals.

    Wavefront metrics are not exported if the enterprise name is in double quotes, which is the case for point tags and any user data in those double quotes is not uploaded to the Gateway.

  • Fixed Issue 100005: VMware SD-WAN Edge model 610 or 610-LTE returns incorrect value for ifSpeed - OID 1.3.6.1.2.1.2.2.1.5.

    When queried for interface speed, an Edge model 610 returns an incorrect value because the DPDK AF_PACKET driver has a hardcoded speed of 10G.
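
    For context, ifSpeed in the standard IF-MIB is a 32-bit gauge expressed in bits per second, so it cannot represent 10G and saturates at 4,294,967,295, with ifHighSpeed (in Mbps) carrying higher speeds. The sketch below shows how the two values relate to an actual link speed; the function name and the assumption that the negotiated speed is available in Mbps are illustrative only.

```python
GAUGE32_MAX = 2**32 - 1  # largest value a 32-bit SNMP gauge can report


def snmp_speed_values(link_speed_mbps: int) -> dict:
    """Map a link speed in Mbps to ifSpeed (bits/s, saturating) and ifHighSpeed (Mbps)."""
    bits_per_second = link_speed_mbps * 1_000_000
    return {
        "ifSpeed": min(bits_per_second, GAUGE32_MAX),
        "ifHighSpeed": link_speed_mbps,
    }
```

    A 1 Gbps port would therefore report an ifSpeed of 1,000,000,000, while a value derived from a hard-coded 10G speed would not reflect the real port speed.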

  • Fixed Issue 100010: For a VMware SD-WAN Edge which is deployed with private WAN links that are configured for both IPv4 and IPv6, when a user generates a diagnostic bundle or runs the Remote Diagnostic "List Paths" the Edge may experience a memory leak.

    When traffic is run through private links, SD-WAN first checks if the IPv4 address is configured and if so, the value is saved to a JSON file and again SD-WAN checks if the IPv6 address is configured. And if there is an IPv6 address, SD-WAN overwrites the previously stored value before appending it to the JSON array which leads to a memory leak. The greater the scale of traffic being processed by the Edge, the greater the leak of memory when the triggering actions are performed.

  • Fixed Issue 100172: If a user attempts to SSH to an Edge via a VMware SD-WAN Gateway, the Gateway may experience a Dataplane Service failure and generate a core followed by a restart to recover.

    The Gateway can encounter this issue when a user is attempting to SSH to an Edge via the Gateway and that SSH session generates a FRAG_NEEDED ICMP error message.

  • Fixed Issue 100237: For a customer enterprise where a Partner Gateway is used and the PG advertises a secure default route to a VMware SD-WAN Gateway, a client user may experience a failure to download a file direct from the Internet.

    The full scenario includes the Edge using multiple WAN links, having "Secure Default Route Override" configured, and a Business Policy created where the Network Service parameter is set to Direct. In this scenario the flow for traffic using this Business Policy can alternate between different WAN IP addresses, and a download would fail.

    Without a fix for this issue, the user must configure the Business Policy to limit all traffic to one WAN link by marking it as Mandatory.

  • Fixed Issue 100363: A VMware SD-WAN Gateway may experience a Dataplane Service failure and trigger a restart of the service, which results in a 1-5 second disruption of traffic.

    This issue occurred during stress testing, with the failure occurring at futex_abstimed_wait. It is the result of a deadlocked thread that triggers the service failure and restart.

  • Fixed Issue 100795: A VMware SD-WAN Edge may experience a Dataplane service failure when it is upgraded to new software.

    Despite the failure, the Edge does not generate a core with this issue, which occurs because the Edge service parses the route-add internal event incorrectly.

  • Fixed Issue 101102: When a VMware SD-WAN Edge that is initially assigned to a Hosted Gateway is reassigned to a Partner Gateway, SSH to the Edge through the Gateway stops working.

    When reassigned to a Partner Gateway, the Edge loses the vce1 IP address and as a result SSH through the Gateway does not work.

    On an Edge without a fix for this issue, a user initiated Edge Service restart remediates the issue.

  • Fixed Issue 101431: For a customer who subscribes to Edge Network Intelligence, when the user activates Analytics on a VMware SD-WAN Edge, the dashboard may display the message "No Management IP Assigned" for the Edge.

    In rare cases, the Edge does not send the Management IP address to the Edge Network Intelligence backend, and this results in the above message.

  • Fixed Issue 101448: On a customer site configured with a High Availability topology, when the customer moves from an Intermediate Certificate Authority (CA) to an Integrated CA, the VMware SD-WAN Edge in the Standby role may not generate a certificate.

    The Standby Edge fails to generate a certificate when moving from an intermediate certificate to an integrated certificate. Both HA Edges must have a certificate; otherwise, issues will occur when the Standby Edge is promoted to Active.

  • Fixed Issue 101592: For the physical version of a VMware SD-WAN Edge (for example, Edge 610, Edge 3200), if a user unplugs the cable from an Edge interface and then replugs it back in (also known as a hotplug event), the Edge does not always record that the cable is back in the interface and the Orchestrator would show that interface as offline.

    The Edge uses a process called netifd to bring up a configured interface automatically. For a hotplug event, the socket buffer size is not specified which can lead to a socket RX queue overrun.  This prevents netifd from handling hotplug events properly.

  • Fixed Issue 101753: A VMware SD-WAN Edge may appear offline on its VMware SASE Orchestrator even though it is up and passing traffic.

    The issue occurs because the Edge continues to source traffic to the Orchestrator from an IP address that is no longer available, so the return traffic is dropped as a result.

  • Fixed Issue 102026: Path loss will be reported at low throughput (less than 50Mbps) when the VCMP endpoints are in the same subnet for a VMware SD-WAN Gateway.

    When ARP refresh kicks in, there is a short interval where the interface mask is incorrectly set to /32. During this interval, there is a possibility where malformed packets (packets with no L2 encapsulation) could get sent out.

  • Fixed Issue 102607: A VMware SD-WAN Edge may experience a Dataplane Service failure if it is using a Non SD-WAN Destination via Gateway or Edge and BGP over NSD is also configured.

    The issue can be encountered when an NSD datacenter route and an Edge-to-Edge route use the same prefix. In this scenario, the packets destined and encrypted for the DC can reach the SD-WAN management tunnels and this can cause a memory leak or even a service failure.

  • Fixed Issue 102693: On a site configured with a High Availability topology, when a user attempts to determine what software version and factory build the VMware SD-WAN HA Edges are using, those fields may show as empty on the VMware SASE Orchestrator.

    When HA is activated for a pair of Edges, the Edges may not send the factory software and build versions to the Orchestrator on the initial heartbeat and as a result the Orchestrator cannot display them.

  • Fixed Issue 103116: When running Remote Diagnostics > Traceroute where the source is Gateway and the destination is a remote IP address with multiple midway hops, the output does not display all the hops properly.

    Because of an unexpected limit on the UDP port range being used, some traceroute UDP packets are not handled properly in the Edge. Because of this, not all midway hops are displayed when the destination is a large number of hops away.

  • Fixed Issue 103503: A VMware SD-WAN Edge may not be able to add new DNS entries to its cache as it has reached capacity and is exhausted.

    The Edge's DPI (Deep Packet Inspection) engine is adding an IP address to the hostname in the Edge's DNS cache and these unexpected IP addresses can exhaust the DNS cache and hinder performance related to quickly retrieving domains.

  • Fixed Issue 103527: A PPTP (Point-to-Point Tunneling Protocol) session is not reestablished after disconnecting.

    After a PPTP session is reconnected, the Edge sees the call request/reply re-transmits, but on receiving the transmitted reply the Edge returns an error without clearing the GRE-NAT entry. Further connection attempts are dropped due to the existing GRE-NAT entry.

    On an Edge without a fix for this issue, the workaround is to clear the NAT database by running Remote Diagnostics > Flush NAT.

  • Fixed Issue 103529: For a customer site using a High Availability topology where one or more 1:1 NAT rules are configured, after an HA failover traffic using a 1:1 NAT rule may be dropped.

    For an Edge in an HA setup with 1:1 NAT configured, when the respective flows are synchronized to the Standby Edge, the wrong destination route is selected due to missing information in flow-sync table, and wrong route selection can cause drops.

    On an HA Edge pair without a fix for this issue, running Remote Diagnostic > Flush Flows will temporarily resolve the issue until the next HA failover.

  • Fixed Issue 103558: On a customer enterprise using Edge Network Intelligence, when Analytics is activated for a VMware SD-WAN Edge the ENI dashboard may display "No Management IP Assigned" for that Edge.

    When Analytics is enabled, in rare cases the Edge does not send the Management IP address to the Edge Network Intelligence back-end.

  • Fixed Issue 103700: An application may get misclassified by SD-WAN and matched to the wrong Business Policy or Firewall Rule despite that application having a customized entry in the customer's application map.

    Applications in an application map with a mustNotPerformDpi tag can still be classified via SD-WAN's Deep Packet Inspection (DPI) engine. In a large scale deployment, a collision can occur while looking up the application classification via the fast database cache. Because of this, although an application is configured with mustNotPerformDpi, it will still be classified via DPI with a potentially unexpected classification.

  • Fixed Issue 103708: When new rules are added in a BGP filter configuration, there may be unexpected BGP routes received and sent by the VMware SD-WAN Edge.

    When new rules are added to the BGP filters from the Orchestrator, the prefix lists are added in the Edge's routing configuration without removing the old entries. This behavior results in stale route prefix lists and unexpected filtering behavior.

  • Fixed Issue 103962: The IPv6 and IPv4 connected routes redistributed into OSPFv3 or BGPv6 have different metrics, which can result in different routing for IPv6 traffic versus IPv4.

    Currently the IPv6 connected routes corresponding to a routed interface are installed with a different metric than IPv4 connected routes on the same interface. This is because of the different metrics given by the Edge OS kernel for IPv4 and IPv6 routes. When they are redistributed into dynamic protocols like OSPF/BGP this difference in the IPv4/IPv6 metrics is propagated.

  • Fixed Issue 103983: For a VMware SD-WAN Edge which has a Non SD-WAN Destination via Edge using redundant tunnels and has turned on the L7 Health Check feature, when the primary tunnel goes down, the backup tunnel also goes down, resulting in all traffic dropping that uses this NSD.

    The issue is caused by L7 probes going out the wrong path. This causes the Edge to view the secondary tunnel as down along with the primary tunnel when the probes fail for the primary tunnel.

  • Fixed Issue 104046: For a customer site deployed with a High Availability topology, the VMware SASE Orchestrator may display the Standby Edge as up when it is actually down.

    The scenarios where this happens are when either the HA Interface cable is disconnected between the HA Edges, or the Standby Edge is powered down. The issue is caused by the HA Active Edge sending an Active status when the Standby Edge is down due to a check by the Active Edge's management process that refers only to whether HA is configured while ignoring the actual status of the Standby Edge.

  • Fixed Issue 104141: Users behind a VMware SD-WAN Edge or customers connected to a VMware SD-WAN Gateway may experience significant issues for any traffic that is using that Edge or traversing that Gateway to the point that no traffic may be forwarded.

    When the issue is encountered, the Edge or Gateway has an unbounded number of memory buffers (mbufs) being consumed by the jitter buffer queue due to increasing management tunnel time stamps received from a peer. This triggers integer underflow in the jitter calculation, causing packets to be buffered effectively indefinitely. At first this only affects buffered flows, but over a long enough period the number of mbufs consumed for the jitter buffer queue approaches the total of mbufs available and the SD-WAN device (Edge or Gateway) can become unable to forward all traffic entirely. If this affects a Gateway it would only affect multi-path traffic that traverses the Gateway and customer traffic going direct would not be affected.

    Another ticket, #105744, also addresses the symptoms found here but fixes a separate cause. The difference between the two tickets: the fix included in #104141 addresses the memory buffers being consumed by the jitter buffer queue due to the increasing management time stamps received from the peer. The fix included in #105744 restricts the jitter buffer count to 25% of the total memory buffers, no matter what else happens, to ensure that this issue cannot recur.

    Without a fix for this issue for either the Edge or Gateway, a user can monitor the memory buffer (mbuf) usage on the Orchestrator and look for increased mbuf usage due to packets being queued in the jitter buffer. If the user does observe the issue, they can flush flows for the Edge (through Remote Diagnostics) or Gateway to temporarily alleviate the issue but the issue would eventually recur until the fix was applied.
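
    As a rough illustration of the two mechanisms described in this issue, the Python sketch below shows how an unsigned 32-bit subtraction wraps around instead of going negative, and how a hard cap of 25% bounds jitter buffer occupancy. The function names and the exact form of the calculation are assumptions made for illustration; they are not the actual Edge or Gateway code.

```python
# Hypothetical illustration only: unsigned 32-bit underflow in a hold-time
# calculation, and a hard cap on jitter-buffer occupancy.
MASK32 = 0xFFFFFFFF


def hold_time_u32(expected_ts, received_ts):
    """Unsigned 32-bit subtraction: wraps around instead of going negative."""
    return (expected_ts - received_ts) & MASK32


def may_enqueue(jitter_mbufs, total_mbufs, cap_fraction=0.25):
    """Allow buffering only while the jitter queue holds less than the cap."""
    return jitter_mbufs < total_mbufs * cap_fraction


# A received time stamp slightly ahead of the expected one yields a huge hold time.
print(hold_time_u32(1000, 1005))  # 4294967291
print(may_enqueue(249, 1000))     # True
print(may_enqueue(250, 1000))     # False
```

    With a cap of this kind in place, even a miscalculated hold time cannot let the jitter buffer queue consume all available mbufs.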

  • Fixed Issue 104275: A VMware SD-WAN Edge that is activated to a site where High Availability is also configured may show offline if the Edge's GE1 interface is down.

    When activating the Edge with "Active Standby Pair" configured and no cable connected on GE1, null entries are added to the DNS cache. This causes DNS resolution to fail, and so the Edge is shown as offline.

  • Fixed Issue 104487: Customer sites whose VMware SD-WAN Edges use a particular VMware SD-WAN Gateway as their Primary Gateway may experience issues with user traffic destined for the Gateway because the Gateway cannot connect to the internet even though it shows as up on Orchestrator monitoring.

    When this issue occurs, the Gateway fails to transmit packets to the remote access service (RAS) because these packets become stuck in the Gateway's transmit queue. As a result, Edges connected to this Gateway cannot build tunnels to it. This issue occurs only for data packets containing customer traffic and not for keepalive packets between the Gateway and the RAS, which is why the Gateway continues to show as up on Orchestrator monitoring despite the issue occurring. Customer traffic tagged as Direct to the Internet would not be affected by this issue as it does not use a Gateway to reach the Internet.

  • Fixed Issue 105102: When High Availability is configured for a customer site, there is a chance a VMware SD-WAN Edge may be deactivated and go into an offline state with no customer traffic passing.

    When the issue is encountered, it is the result of one or two heartbeats from the Standby Edge being leaked. The Orchestrator then sees both HA Edges heartbeating and, because one of the heartbeats does not match the Edge serial number, initiates a deactivation of the HA Edge. This issue can be disruptive in an Enhanced HA topology since both Edges are passing traffic.

  • Fixed Issue 105433: For a site using an Enhanced High Availability topology, the VMware SD-WAN HA Edges may go offline with the VMware SASE Orchestrator if a WAN interface on the Standby Edge flaps.

    The Standby Edge does not synchronize the dynamic IP address update to the Active Edge when the interface state changes and, as a result, connectivity between the HA Edge site and the Orchestrator fails. This only affects management traffic and does not affect customer traffic.

  • Fixed Issue 105440: When the Data Type for DHCP Option 43 is set to "Text" and the "Value" is configured as a text string that begins with a numeral, the option is ignored, and an error is reported.

    A typical example of this issue is when the Option 43 Value is configured as an IP address. The user sees an Event with a message "messages" : "dhcp.py:527: Invalid value for option 43: <text string configured>, ignored".

  • Fixed Issue 105514: The IPv6 and IPv4 connected routes redistributed into OSPFv3 or BGPv6 have different metrics.

    Currently the IPv6 connected routes corresponding to a routed interface are installed with a different metric than IPv4 connected routes on the same interface. This is because of the different metrics given by the kernel for IPv4 and IPv6 routes. When they are redistributed into dynamic protocols like OSPF/BGP this difference in the IPv4/IPv6 metrics is propagated.

  • Fixed Issue 105744: Users behind a VMware SD-WAN Edge or customers connected to a VMware SD-WAN Gateway may experience significant issues for any traffic that is using that Edge or traversing that Gateway to the point that no traffic may be forwarded.

    This ticket and Issue #104141 are directly related and have the same symptoms and cause which will be repeated here: when the issue is encountered, the Edge or Gateway has an unbounded number of memory buffers (mbufs) being consumed by the jitter buffer queue due to increasing management tunnel time stamps received from a peer. This triggers integer underflow in the jitter calculation, causing packets to be buffered effectively indefinitely. At first this only affects buffered flows, but over a long enough period the number of mbufs consumed for the jitter buffer queue approaches the total of mbufs available and the SD-WAN device (Edge or Gateway) can become unable to forward all traffic entirely. If this affects a Gateway it would only affect multi-path traffic that traverses the Gateway and customer traffic going direct would not be affected.

    The difference between the two tickets: the fix included in #104141 addresses the memory buffers being consumed by the jitter buffer queue due to the increasing management time stamps received from the peer. The fix included in #105744 restricts the jitter buffer count to 25% of the total memory buffers to ensure that this issue cannot recur.

    Without a fix for this issue on either the Edge or Gateway, a user can monitor mbuf usage on the Orchestrator and look for increased usage due to packets being queued in the jitter buffer. The user can flush flows for the Edge or Gateway to temporarily alleviate the issue, but the issue would eventually recur until the fix was applied.

  • Fixed Issue 106123: VMware SD-WAN may misclassify packets due to the DPI (Deep Packet Inspection) Engine not being the most current build.

    When an Edge or Gateway misclassifies a packet, this can lead to numerous issues for SD-WAN customers. The fix upgrades the Edge and Gateway's DPI engine to the most current build which ensures that the SD-WAN service is classifying customer traffic at a higher level of accuracy.

  • Fixed Issue 106252: For a site using an Enhanced High Availability topology, some flows may not get installed on the VMware SD-WAN Edge serving as the Standby.

    For flows that are peer initiated (originated from a different Edge), while synchronizing the flow from the Active Edge to the Standby Edge, the installation of the flow fails on the Standby. This results in such flows not being present on the Standby, which can cause traffic disruption after an HA failover.

  • Fixed Issue 106289: A VMware SD-WAN Hub Edge may drop packets on flows to connected Spoke Edges or backhaul flows.

    The Backhaul flow flag is set during the QoS synchronization process, but there is also a place in the code that sets it during flow creation. The Edge should set this flag only as a result of QoS synchronization message processing.

    Should this issue be encountered on a Hub Edge without a fix, flush the flows on the Hub Edge to temporarily remediate the issue.

  • Fixed Issue 106627: A customer who uses Layer 7 (L7) Health Check with a Non-SD-WAN Destination (NSD) or Cloud Security Service (CSS) where redundant tunnels are also configured may see that all tunnels show as down even though they are up.

    The issue is caused by the VMware SD-WAN Edge sending the L7 probes to the back-up tunnels instead of the primary tunnels and thus triggering a false indication that the tunnels are down.

  • Fixed Issue 106700: For users who have configured a loopback interface as the source interface for a Layer 7 (L7) Health Check on a VMware SD-WAN Edge, if a user changes any parameter of the loopback interface the L7 probes may fail, and the IPsec tunnels associated with that L7 Health Check would report as down.

    When the loopback interface configuration is changed in any way the L7 probes can be sent to an interface designated as “None” with IP address 0.0.0.0 and thus the probes fail which results in the IPsec tunnel being marked as down.

  • Fixed Issue 106721: On a site deployed with a High Availability topology, after an HA failover, an ICMP probe is in an INIT state for an unreachable next-hop and yet the corresponding static routes are advertised to the OSPF and BGP neighbors.

    This would attract traffic for the prefixes that are unreachable and potentially impact customer traffic.

    On HA Edges without a fix for this issue, the workaround is to deactivate and then reactivate the ICMP probe.

  • Fixed Issue 106898: On a customer site deployed with an Enhanced High Availability topology, the VMware SD-WAN Edge serving as the Active may fail to get the dynamic IP address from a DHCP server.

    In Enhanced HA, the Active Edge fails to get the dynamic IP address when a Standby Edge's subinterface where DHCP is configured is flapped.

  • Fixed Issue 106913: Hub external routes are not advertised to a Non SD-WAN Destination via Gateway via BGP on a VMware SD-WAN Gateway.

    This issue is the result of an inherited behavior from BGP on a Partner Gateway. It was intentional for Handoff BGP to avoid redistributing OSPF HUB external routes into a PG BGP to avoid a loop and NSD BGP inherited this behavior from the PG BGP.

    If the customer is experiencing this issue using a Gateway without a fix, the workaround is to add a static route in the Hub and advertise it.

  • Fixed Issue 107216: When running the Remote Diagnostic "Interface Status", the output displays an inaccurate link speed.

    When an interface is set to autonegotiate 'off', the interface is no longer running under DPDK with a silicon driver. The new driver in use for DPDK is 'af_packet', which leverages the underlying kernel driver. The new manual speed is not set after a PCI unbind from DPDK back to the kernel. As a result, the link speed returned by the ethtool debug command used by Interface Status is inaccurate.

  • Fixed Issue 107302: If the source interface chosen for Layer 7 (L7) Health Check probes is modified, including changing the IP address, the IPsec tunnel associated with the L7 Health Check may be marked as down for up to 30 seconds.

    It may take up to 30 seconds for the probes to be corrected. This may lead to the IPsec tunnel being marked down if enough probes fail before the configuration is corrected.

  • Fixed Issue 107309: When a customer configures the L7 Health Check for a Non SD-WAN Destination via Edge on a 4.x Orchestrator and the Orchestrator is upgraded to Release 5.x, if the customer attempts to modify the L7 probe retry value, the Edge does not apply the new value.

    For example, if the L7 Health Check probe retry value is 3 (the tunnel is marked as down on 3 failed probes) and the customer changes this value to 1, the L7 Health Check continues to use the original value of 3 retries before the tunnel is marked down. This issue is fixed on the Edge build as the Edge is not applying the new configuration it is receiving from the Orchestrator.

  • Fixed Issue 107317: For a customer using SNMP where the server is located on the internet/cloud, an SNMP walk may succeed for some VMware SD-WAN Edge interfaces and fail with a timeout on other interfaces.

    When the SNMP walk fails, the SNMP request enters one interface but the response exits a different interface, and these response packets never reach the SNMP server. The issue is caused by the Edge incorrectly classifying SNMP traffic such that it steers response packets to one interface regardless of the interface used for reception. As a result, SNMP walks work for the interface the Edge designates for SNMP responses but fail for all the others.

  • Fixed Issue 107356: For a customer who has deployed a Non SD-WAN Destination (NSD) with redundant tunnels and Layer 7 (L7) Health Checks activated, a secondary NSD tunnel may not come up for up to 30 seconds after the primary tunnel has gone down.

    For an NSD configured with redundant primary/secondary tunnels, the L7 state for the secondary tunnel is carried forward from instance to instance. If the secondary tunnel comes up and is then terminated, the L7 state may be marked as down to keep the secondary tunnel in the down state. When the secondary tunnel is later brought up again the L7 health check process may take up to 30 seconds to see the tunnel and resume sending L7 probes that verify the secondary tunnel is up.

  • Fixed Issue 107550: For a customer who has deployed a Non SD-WAN Destination via an Edge or Gateway, some IPsec encrypted packets may be dropped between the client device and the peer Datacenter.

    The current implementation for IPsec tunnels uses the inner IP header time-to-live (TTL) value. This implementation does not match the RFC requirement that the TTL value be constructed. If the packet originator uses a low TTL value, there is a good chance that this packet does not reach the destination.
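
    The difference between reusing the inner TTL and constructing a TTL for the tunnel packet can be sketched in Python as follows. The default of 64 and all function names are assumptions chosen for illustration; they are not values or identifiers from the SD-WAN software.

```python
# Hypothetical sketch: choosing the TTL of the outer IP header for a tunneled packet.
DEFAULT_OUTER_TTL = 64  # assumed local default, not a value from these release notes


def outer_ttl_copied(inner_ttl):
    """Behavior described for the issue: the tunnel packet reuses the inner TTL."""
    return inner_ttl


def outer_ttl_constructed(inner_ttl, default=DEFAULT_OUTER_TTL):
    """Constructed TTL: set locally, independent of the originator's TTL."""
    return default


def reaches_peer(outer_ttl, routers_on_path):
    """The encrypted packet arrives only if its TTL outlasts the routers on the path."""
    return outer_ttl > routers_on_path


# An originator that sends with TTL 3 across a path with 10 routers:
print(reaches_peer(outer_ttl_copied(3), 10))       # False
print(reaches_peer(outer_ttl_constructed(3), 10))  # True
```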

  • Fixed Issue 107654: A VMware SD-WAN Gateway may experience a Dataplane Service failure, generate a core, and restart to recover when the Transport LAN/VLAN is configured on the Partner Handoff page.

    The Gateway service does not handle this configuration correctly, which triggers an exception and a service failure. Going forward the Transport LAN/VLAN feature is no longer supported, and an existing configuration should be removed by the user as SD-WAN will remove it anyway.

  • Fixed Issue 107708: On a VMware SD-WAN Edge where the SD-WAN Overlay Rate Limit is configured, the SD-WAN Gateway may not adhere to the limit exactly when downstream traffic flows from the internet to the Edge.

    Traffic flowing from the Internet to the Edge is not rate limited by the Gateway exactly as configured. The SD-WAN Overlay Limit is exceeded by a few Mbps. This happens as the VCMP (management) overhead is not taken into account in the Gateway for calculating the rate limit.
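
    The effect of leaving per-packet overhead out of a rate-limit calculation can be shown with simple arithmetic. In the Python sketch below, the payload size (1400 bytes) and the overhead size (56 bytes) are purely hypothetical numbers chosen to make the arithmetic easy; the actual VCMP overhead is not stated in these release notes.

```python
# Hypothetical arithmetic: what happens when a limiter counts only payload bytes.
def wire_rate_mbps(payload_rate_mbps, payload_bytes, overhead_bytes):
    """Rate seen on the wire when only the payload is rate limited."""
    return payload_rate_mbps * (payload_bytes + overhead_bytes) / payload_bytes


def payload_budget_mbps(limit_mbps, payload_bytes, overhead_bytes):
    """Payload rate that keeps payload plus overhead within the configured limit."""
    return limit_mbps * payload_bytes / (payload_bytes + overhead_bytes)


# A 100 Mbps limit applied to payload only, with the assumed packet sizes:
print(wire_rate_mbps(100, 1400, 56))  # 104.0, a few Mbps above the limit
print(payload_budget_mbps(100, 1400, 56))
```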

  • Fixed Issue 108374: For a customer enterprise that uses Dynamic Branch-to-Branch and has configured LAN-Side NAT Rules, a route change may cause traffic to a remote LAN to fail.

    On a route change, such as Dynamic Branch-to-Branch tunnels, LAN-Side NAT may not be properly recalculated for existing flows and this causes them to break, impacting traffic destined for a peer Edge LAN subnet.

  • Fixed Issue 108610: For a customer enterprise where Firewall is turned on, when an Edge is upgraded from a 3.x Release to a 4.x Release, the Firewall blocks traffic that was not previously being blocked.

    When the Orchestrator is upgraded to 4.2.2 in the sequence 3.4.4 to 4.0.0 to 4.0.2 to 4.2.2, the Address Group configuration is not reflected on connected Edges that are upgraded from a 3.4.4 Edge Release to a 4.2.2 Edge Release. This is because the Edge expects a different version of the address groups JSON file in 3.4.4 than in 4.2.2, and an Orchestrator upgraded in the above sequence does not send the JSON files to the Edges in the new format until there is a configuration change. As a result, address group configurations do not work.

  • Fixed Issue 108630: On a customer site deployed with an Enhanced High-Availability topology, if the VMware SD-WAN HA Edge in the Standby role uses a PPPoE WAN link, tunnels fail to establish over the PPPoE interface.

    Tunnels fail to come up over PPPoE in Enhanced HA when the PPPoE interface is up and running on the Standby Edge. A customer would not notice this issue if the Active is using the PPPoE link, but on an HA failover, traffic would no longer pass on the now Standby Edge using this PPPoE link.

  • Fixed Issue 108739: When a VMware SD-WAN Virtual Edge is upgraded, the Edge may experience multiple issues including being deactivated on the first boot after the Edge software upgrade.

    The issue is caused by the cloud-init data not being preserved during the Edge software upgrade. Any virtual Edge when upgraded will rerun cloud-init, which can have potentially negative side effects.

    For this issue there is no workaround other than avoiding an upgrade completely and deploying a new Virtual Edge instance with the new image, or upgrading directly to an image with the fix to preserve cloud-init data (for example, Release 4.5.2 or 5.2.0).

  • Fixed Issue 108982: For a VMware SD-WAN Edge where ICMP Probes are configured, the customer may observe that the Edge Probes stop working with an INIT state.

    The ICMP probe timer can become corrupted which causes the probe state machine to be stuck in INIT state. The timer corruption is due to a race condition when multiple threads are trying to add/remove/expire the timer.

  • Fixed Issue 109500: When LAN side NAT uses the same inside and outside definition, the first packet of a direct flow is dropped.

    The issue is that the match information in the original NAT table is the same for the initial LAN Side NAT translation and for the direct NAT translation. This causes a conflict in the tables which drops the first packet.

  • Fixed Issue 109511: When the max tunnel count is encountered by a VMware SD-WAN Edge, the EDGE_TUNNEL_CAP_WARNING Event may not be seen on the VMware SASE Orchestrator.

    The Edge is not sending a message to the Orchestrator for a tunnel cap warning when the issue happens for the first time within a 24 hour period.
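
    A throttle that sends the first warning in a 24 hour window and suppresses repeats inside that window might look like the Python sketch below. The class and its suppression rule are assumptions for illustration, not the Edge's event logic; the case this issue concerns is the first call, which must return True.

```python
# Hypothetical event throttle: first event in a window is sent, repeats are suppressed.
class EventThrottle:
    def __init__(self, window_seconds=24 * 3600):
        self.window_seconds = window_seconds
        self.last_sent = None  # None means no event has been sent yet

    def should_send(self, now):
        """Return True if an event raised at time `now` (seconds) should be sent."""
        if self.last_sent is None or now - self.last_sent >= self.window_seconds:
            self.last_sent = now
            return True
        return False


throttle = EventThrottle()
print(throttle.should_send(0))      # True: first occurrence is reported
print(throttle.should_send(3600))   # False: repeat inside the window
print(throttle.should_send(86400))  # True: a new window has started
```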

  • Fixed Issue 109828: On a site deployed with a High Availability topology, the VMware SD-WAN Edge may experience a Dataplane Service failure, generate a core, and restart to recover.

    This issue can be experienced where the Edge is handling a large amount of traffic (defined as a three segment site which advertises and revokes 1k routes for each segment). On an Enhanced HA deployment, where the Standby Edge is also passing traffic, the service failure and restart would disrupt some customer traffic.

  • Fixed Issue 109830: Combinations of certain Point-to-Point Tunneling Protocol (PPTP) VPN Clients and Servers may not be able to immediately reestablish a connection after it is interrupted when using 1:1 NAT with the PPTP server behind the VMware SD-WAN Edge and the client is on the Internet/Cloud.

    This issue has been confirmed to occur with Windows Server 2016 and Windows 10 client but may occur with other versions as well. The issue is caused by the server reusing the same PPTP call-id for the new connection while the client uses a new call-id. When the server call-id is reused for the new connection, the firewall does not identify it as such.

    The same issue may be encountered with the network locations of the client and server swapped (in other words, the client is on the LAN behind the Edge and the PPTP server is on the Internet/Cloud).

    On an Edge without a fix for this issue, using Remote Diagnostics > Flush Flows to flush the stale PPTP connection from the NAT table will temporarily resolve the issue.

  • Fixed Issue 110320: For a customer subscribing to Edge Network Intelligence where Analytics is activated on their enterprise, when the name of a VMware SD-WAN Edge is changed on the VMware SASE Orchestrator, this change may not be reflected in the Edge Network Intelligence UI.

    The Edge Network Intelligence sub module in the Edge does not react to the Edge name change and the result is the name change is not reflected on the Edge Network Intelligence UI.

  • Fixed Issue 110044: On a large scale enterprise where a Datacenter site is deployed with a High Availability topology, the VMware SD-WAN Edge in the Standby role may experience multiple Dataplane Service failures, generating a core each time, and restarting each time.

    Large scale enterprise with a Datacenter HA Edge is understood as a Hub Edge with over 4K Spoke Edges and 500K NAT Entries. When that level of traffic is sent over a period of time, the Standby Edge can experience multiple Edge service failures with restarts each time. The impact in Standard HA is minimal but in Enhanced HA this would impact performance for the traffic using the Standby Edge.

  • Fixed Issue 110456: ICMP probe from a Partner Gateway or Cloud Gateway to a directly attached device may drop packets if the ICMP request code field is not 0.

    Depending on the vendor, some may inspect the code field for an ICMP request packet and deem it not correct if the field is not 0.

  • Fixed Issue 110473: For a customer enterprise using BGP, unexpected routes may be received/sent briefly, and flows may match the wrong Business Policies when a BGP neighbor is removed.

    When a BGP neighbor is removed from the Orchestrator, the inbound/outbound route-maps associated with it are removed first and then the neighbor is removed from the Edge routing configuration. This leads to a momentary leak of routes denied by those route-maps. This in turn affects the business policy behavior if a flow is created with the leaked routes.

    On an Edge without a fix for this issue, a user can run the Remote Diagnostic "Flush Flows" to remediate the issue.
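
    The ordering problem described for this issue can be modeled with a short Python sketch. The state model and function names are hypothetical; the sketch only shows that removing the route-maps before the neighbor leaves a window in which the neighbor is still present without its filters.

```python
# Hypothetical model of the two removal steps and the states they pass through.
def removal_states(route_maps_first):
    """List the (neighbor_present, filter_present) states after each removal step."""
    state = {"neighbor": True, "filter": True}
    steps = ["filter", "neighbor"] if route_maps_first else ["neighbor", "filter"]
    states = []
    for step in steps:
        state[step] = False
        states.append((state["neighbor"], state["filter"]))
    return states


def has_leak_window(states):
    """Routes can leak whenever the neighbor is still present without its filter."""
    return any(neighbor and not flt for neighbor, flt in states)


print(has_leak_window(removal_states(route_maps_first=True)))   # True
print(has_leak_window(removal_states(route_maps_first=False)))  # False
```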

  • Fixed Issue 110564: For a customer site deployed in a High Availability topology, the TCP session used to synchronize data between the Active and Standby Edge may go down. On an Enhanced HA deployment, this results in WAN link traffic not being forwarded on the Standby Edge.

    For either Standard HA or Enhanced HA deployments, there could be a scenario where a child process is using the port intended for TCP sessions between the Active and Standby Edge. In this scenario, the Active Edge cannot bring up the TCP server due to bind errors, which results in the Standby Edge's interface state not being exchanged. For Enhanced HA deployments, the result is that WAN link(s) cannot be used for forwarding traffic.
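
    The kind of bind error described here can be reproduced generically with Python's standard socket module. The sketch below is not the Edge's HA code: the first socket stands in for the process already holding the port, and the port number is whatever the operating system assigns.

```python
# Generic demonstration: a second listener cannot bind a TCP port that is in use.
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to bind a TCP listener; return the socket, or None on a bind error."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        sock.close()
        return None


# Stand-in for the process that already holds the port.
holder = try_bind(0)  # port 0 lets the OS pick a free port
port = holder.getsockname()[1]

second = try_bind(port)  # the intended server cannot bind the same port
print(second is None)
holder.close()
```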

  • Fixed Issue 110576: For a customer enterprise site deployed with a High Availability topology, when an interface is configured with IPv6, the link local address gets updated for GE1 without any IPv6 address on any interface.

    When High Availability is activated for a site, /proc/sys/net/ipv6/conf/eth1/disable_ipv6 is set to 0 instead of 1.
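
    On a Linux system, the setting named above can be read as a plain file. The helper below is a minimal Python sketch; the function name and the `root` parameter are hypothetical and exist only so the function can be pointed at a different directory for testing.

```python
# Minimal sketch: read the per-interface disable_ipv6 setting from procfs.
from pathlib import Path


def ipv6_disabled(interface, root="/proc/sys/net/ipv6/conf"):
    """Return True if disable_ipv6 is 1 for the interface, False if it is 0."""
    path = Path(root) / interface / "disable_ipv6"
    return path.read_text().strip() == "1"
```

    Calling `ipv6_disabled("eth1")` on a Linux host reads the same path quoted in this issue; a return value of False corresponds to the value 0 described above.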

  • Fixed Issue 110828: A VMware SD-WAN Gateway that is used for Non SD-WAN Destinations via Gateway may experience a Dataplane Service failure, generate a core, and restart to recover.

    When a corrupt packet from an NSD site is received on the Gateway, the Gateway may not process this corrupt packet correctly and trigger the Gateway service failure.

  • Fixed Issue 110962: For a customer enterprise site deployed with a High Availability topology, when the VMware SD-WAN Edge in the role of Standby is either rebooted, restarts, or is connected to the Active Edge, the Active Edge may experience a Dataplane Service failure, generate a core, and restart to recover, which will trigger an HA failover.

    When the Standby Edge reboots, restarts, or is connected to the Active Edge, the Active Edge's service encounters a deadlock that triggers the service failure.

  • Fixed Issue 110970: For a customer who subscribes to Edge Network Intelligence and has one or more sites deployed with a High Availability topology, when Analytics is activated for an HA site, Analytics may not work.

    Due to multiple race conditions, the Edge Network Intelligence thread may assume that the Active Edge is in the Standby role, and this stops Analytics functionality.

  • Fixed Issue 111073: A VMware SD-WAN Edge using Release 4.5.1 may report the wrong interface speed to SNMP.

    ifSpeed is a 32-bit value and if it cannot accommodate the value of the speed given by the Edge in bits per second (bps), then it is advised to refer to ifHighSpeed which gives the value in Mbps.
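
    The relationship between the two SNMP objects can be sketched in Python. The helper names are hypothetical; the sketch assumes the standard IF-MIB convention that ifSpeed reports its maximum value when the real speed does not fit in 32 bits, and that ifHighSpeed is expressed in units of 1,000,000 bits per second.

```python
# Sketch of how a speed in bps maps onto ifSpeed (32-bit) and ifHighSpeed (Mbps).
IFSPEED_MAX = 2**32 - 1  # largest value a 32-bit gauge can hold


def if_speed(bps):
    """ifSpeed: bits per second, pinned at the 32-bit maximum."""
    return min(bps, IFSPEED_MAX)


def if_high_speed(bps):
    """ifHighSpeed: speed in units of 1,000,000 bps, rounded to the nearest unit."""
    return (bps + 500_000) // 1_000_000


ten_gbps = 10_000_000_000
print(if_speed(ten_gbps))       # 4294967295, not the real speed
print(if_high_speed(ten_gbps))  # 10000
```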

  • Fixed Issue 111162: In a customer enterprise which uses a Partner Gateway and deploys Edges in High Availability, an HA Edge may have suboptimal routing when a PG route through a secondary Partner Gateway is selected as the best route.

    In the HA Edge, when there are A-A or A-S transitions there is a chance that the order for a PG route from a secondary Partner Gateway is set to 4 thereby becoming the best route. Usually primary Partner Gateway routes have higher order values.

  • Fixed Issue 111314: On a customer enterprise where Dynamic Cost Calculation is activated, traffic drops may be observed.

    This issue can occur if one of the Edges advertises more specific routes to all Edges in the enterprise before going offline. Once this Edge is offline, the routes remain in the FIB (forwarding information base) of every other Edge with Reachability = False, and this results in packet drops.

    On an Edge without a fix for this issue, a user can remediate the issue by manually initiating an Edge Service restart through the Remote Actions menu on the Orchestrator.

  • Fixed Issue 111646: A VMware SD-WAN Gateway under a high CPU load may experience a Dataplane Service failure and restart to recover.

    A user looking in the Gateway-generated core would see a mutex monitor exception and the message "Program terminated with signal SIGXCPU, CPU time limit exceeded". The issue is related to a Gateway process releasing a lower priority thread lock.

  • Fixed Issue 111765: For a customer enterprise site using an Enhanced High Availability topology where DHCP is configured on the VMware SD-WAN Edge interface, the Edge may fail to get a dynamic IP address from the DHCP server.

    In Enhanced HA, the HA Edge can fail to get the dynamic IP address when the DHCP configured interface connected to the Standby Edge is flapped. This can be the result of a Standby Edge restart or reboot, or anything that causes the interface to go down and come back up.

  • Fixed Issue 111788: On a VMware SD-WAN Gateway, when FIPS mode is activated, the Gateway may lose contact with the VMware SASE Orchestrator.

    The Management plane process may not start on a VMware SD-WAN Gateway when FIPS mode is activated. The impact is significant because the management process is responsible for sending heartbeats to the Orchestrator and with no heartbeats that communication channel is lost. The result of this is that the Gateway's information will not be synchronized with the Orchestrator and configuration changes made via the Orchestrator are not pushed to the Gateway.

  • Fixed Issue 111840: For a customer enterprise using more than 8 VMware SD-WAN Edges configured as Hubs, users may observe poor traffic performance due to sub-optimal routing.

    When a Spoke Edge is configured with multiple Hub Edges, the route via the Hub is getting preferred over a Branch-to-Branch direct route leading to sub-optimal routing.

    On Edges without a fix for this issue, the customer can configure the Hub Edges first followed by the VPN Hub Edges in the Branch to Hub site list.

  • Fixed Issue 111924: A customer may observe that across all their sites Multi Path traffic (in other words, traffic that traverses the VMware SD-WAN Gateway) is being dropped even though their VMware SD-WAN Edge's tunnels to the Gateway are up and stable.

    There is no limit on the maximum number of times a Gateway can re-transmit a VCMP packet (SD-WAN's management protocol), and such re-transmits can overwhelm low bandwidth links. These re-transmits will also cause packet build-up on the scheduler when the Edge has a low bandwidth link since the re-transmits cannot be drained fast enough. Eventually the scheduler queues become full and lead to the scheduler dropping packets from all Edges. Direct traffic that does not use the Gateway would not be affected by this issue.

    When this issue is encountered on a Gateway without a fix for this issue, the only remediation is for an Operator user to identify the Edges which are causing the packet buildup on the scheduler using the debug.py --qos_dump_net command and block them in the affected Gateway.
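
    A bounded re-transmit policy of the kind whose absence is described above can be sketched in Python. The function name and the cap of 3 re-transmits are assumptions for illustration only and are not taken from the Gateway software.

```python
# Hypothetical sketch: count the sends of one packet under a re-transmit cap.
def transmissions(acked_on_attempt, max_retransmits):
    """Return how many times a packet is sent: one original send plus
    re-transmits until an acknowledgment arrives or the cap is reached.
    `acked_on_attempt` is the attempt number that gets acknowledged,
    or None if no acknowledgment ever arrives."""
    sends = 0
    limit = 1 + max_retransmits
    while sends < limit:
        sends += 1
        if acked_on_attempt is not None and sends >= acked_on_attempt:
            break
    return sends


print(transmissions(acked_on_attempt=None, max_retransmits=3))  # 4, then give up
print(transmissions(acked_on_attempt=2, max_retransmits=3))     # 2
```

    Without such a cap, the first case would never terminate, which is how re-transmits can accumulate on a low bandwidth link.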

  • Fixed Issue 112131: For a customer site deployed with a High Availability topology where the customer has an Edge Network Intelligence license, activating Analytics for the HA site may result in the site experiencing an Active/Active (Split-Brain) state and the Standby Edge rebooting to resolve the state.

    Once Analytics is activated for an HA site, the Edge Network Intelligence service may starve the Edge's Dataplane threads, which delays HA packet processing. This results in both Edges declaring themselves as an Active Edge and SD-WAN has to reboot the Standby Edge to recover the situation. This issue can happen at any time, not just at the initial activation of Analytics. As with any Active/Active situation, this issue is potentially disruptive to a site deployed with Enhanced HA since the Standby is also passing customer traffic through its WAN links.

  • Fixed Issue 112313: For a customer enterprise deployed with a High Availability topology, when examining the Edge logs in a diagnostic bundle, the edge.log would be flooded with 'HAD' logs for two different MAC address updates.

    The issue is the result of cyclic peer MAC address updates between the Active and Standby Edge, due to different peer MAC addresses stored between the Active and Standby Edge. The issue's impact is that when any log is flooded with messages like this, troubleshooting some other problem becomes difficult, not only because of the messages themselves but also because the logs roll over much more quickly.

  • Fixed Issue 112325: On a customer enterprise that uses BGP for routing and uses multiple segments, when a VMware SD-WAN Edge in their enterprise is either service restarted or rebooted, BGP on non-global segments on the Edge's subinterfaces does not come up.

    If the Edge's main interface on the global segment and the subinterface on the non-global segment have the same IP address, the WAN overlay and BGP route is established only on the main interface after an Edge process restart or reboot.

  • Fixed Issue 112452: A customer may observe that a VMware SD-WAN Edge configured for High Availability is experiencing L2 loops on the Edge's WAN interfaces.

    When an interface is changed from switched to routed, an issue is seen with the MAC address in the origmacs file. If a virtual MAC address is stored for an interface or if there is no original MAC address for the interface stored in the file, the Edge uses the virtual MAC address to send out WAN heartbeats leading to a L2 loop being detected.

    On an HA Edge without a fix for this issue, the workaround is to have Support delete the origmacs file and then reboot first the Standby HA Edge, and then the Active HA Edge.

  • Fixed Issue 112882: When a VMware SD-WAN Edge is upgraded to a 4.5.x Release, SNMP may stop working on the Edge.

    Changes to entries in the SNMP segnat table did not have a corresponding "Update" API invocation. This led to the corresponding destination IP address/port getting stuck.

  • Fixed Issue 113153: For a customer enterprise deployed with a High Availability topology, the VMware SD-WAN Edge in the Active role may experience a Dataplane Service failure which will trigger an HA failover, when the Standby Edge is restarted or rebooted.

    The Dataplane Service failure on the Active Edge then triggers an HA failover. The issue can be encountered when the HA site is forwarding a large amount of traffic.

  • Fixed Issue 114004: A customer may observe that a VMware SD-WAN Edge experiences a memory leak if SNMP is configured on the Edge.

    The Edge memory leak is slow but if it is not addressed it will reach a critical level of memory usage that results in the Edge defensively triggering a service restart to clear the memory. This restart can disrupt customer traffic for 15-30 seconds while the Edge recovers and without a fix for this issue, the memory leak will simply start over.

  • Fixed Issue 114084: For a customer who has configured a Zscaler-type Cloud Security Service (CSS) with L7 Health Check for a VMware SD-WAN Edge, when updating the Zscaler Cloud Server on the VMware SASE Orchestrator, the updated details are not applied to the Edge.

    Despite the Orchestrator showing the new Zscaler Cloud Server configuration, the Edge and Gateway continue to send both traffic and the L7 probes through the old Zscaler server instead of the new one.

  • Fixed Issue 114288: A VMware SD-WAN Edge may experience a Dataplane Service failure, generate a core file, and restart to recover.

    The reference count that the Edge keeps for a link object can inadvertently reach zero, which destroys the object and triggers the Edge service failure.

  • Fixed Issue 114511: For an auto-discovered WAN link, unchecking the Path MTU Discovery option does not work, and the Edge continues to implement the feature and resize the MTU.

    Configuration for auto-discovered links is processed during a configuration update, but the Path MTU Discovery setting was not processed as part of that update.

  • Fixed Issue 114671: On a customer site deployed with a High Availability topology where one or more Cloud Security Services is also being used, when there is an HA failover the traffic going via CSS fails to resume post-failover.

    By design SD-WAN blocks traffic on the Standby Edge, and the CSS assumes that the primary tunnel is down and tries to establish the backup tunnel. The result post-failover is that both primary and backup tunnels are up, and this leads to CSS traffic taking the backup tunnel instead of primary tunnel which causes asymmetric traffic forwarding.

  • Fixed Issue 114854: When a packet capture is performed on a VMware SD-WAN Edge model 610 where a WAN link is configured to use a VLAN tag, this VLAN tag is missing for return traffic associated with that WAN link.

    The lack of the VLAN tag makes troubleshooting network issues more difficult and is specific to the Edge 610 with DPDK enabled.

  • Fixed Issue 115078: On a large scale customer enterprise with ~16K users and a flow creation rate of ~2K per second on Hub Edges, users can experience poor traffic quality due to high latency.

    The Gateway's Deep Packet Inspection (DPI) engine can be overloaded on Hub Edges when a high number of peer initiated flows are created. This is due to the port cache being populated with Source IP address and port # instead of the Destination IP address and port # for these peer flows.

  • Fixed Issue 115136: A customer may observe a gradual memory usage increase on a VMware SD-WAN Edge in a customer enterprise that uses BGP for routing.

    The Edge's BGP daemon is causing a gradual memory leak on the Edge over several days and can do this even when BGP is not configured for that Edge. If the memory leak continues for a sufficient period to bring the Edge's memory usage beyond the critical threshold of 60% of available RAM for more than 90 seconds, the Edge will defensively restart its service to clear the leak which can result in customer traffic disruption for 10-15 seconds. The only remediation without an Edge fix is to restart the BGP process by terminating it, or preemptively perform an HA failover/Edge service restart in a suitable service window.
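
    The restart condition described above (memory usage above 60% of available RAM for more than 90 seconds) is a "sustained threshold" rule. The following is a minimal Python sketch of that rule only. The function name and the sample format are hypothetical and are not part of any Edge software or API.

```python
from typing import Iterable, Optional, Tuple

CRITICAL_FRACTION = 0.60  # threshold quoted in the text
SUSTAIN_SECONDS = 90      # duration quoted in the text


def should_restart(samples: Iterable[Tuple[float, float]]) -> bool:
    """Return True once usage stays above the threshold for more than 90 s.

    `samples` is a hypothetical, time-ordered series of
    (timestamp_seconds, used_fraction_of_ram) pairs.
    """
    above_since: Optional[float] = None
    for timestamp, used in samples:
        if used > CRITICAL_FRACTION:
            if above_since is None:
                above_since = timestamp  # start of the current excursion
            if timestamp - above_since > SUSTAIN_SECONDS:
                return True
        else:
            above_since = None  # usage dropped back, reset the timer
    return False
```

    A brief dip below the threshold resets the timer, so only a continuous excursion longer than 90 seconds returns True in this sketch.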

  • Fixed Issue 115148: When a customer deploys a Cloud Security Service with redundant tunnels (in other words, active and standby) and L7 Health Check is toggled on, if the primary CSS tunnel goes down and then comes back up, the standby CSS tunnel may remain up.

    If the standby tunnel is up when L7 Health Check is toggled on and then this feature is toggled off prior to the primary CSS tunnel coming back up, the standby tunnel can remain in an up state indefinitely. The cause of the issue is that even though L7 Health Check is toggled off, the Gateway will check the state of L7 for the primary tunnel and its state will read as down, and as a result the Gateway concludes the primary tunnel is down and will keep the standby tunnel up.

    On an Edge without a fix for this issue, if a user performs a Remote Actions > Edge Service Restart, this will resolve the issue for that location.

  • Fixed Issue 115276: A customer may observe increased memory usage on a VMware SD-WAN Edge when UDP transit flows are sent at a high rate.

    Flow creation is handled asynchronously on Edges. When multiple packets of the same transit flow are enqueued to the Edge's flow create service, the flow objects get leaked resulting in the memory leak. If the memory leak continues for a sufficient period to bring the Edge's memory usage beyond the critical threshold of 70% of available RAM, the Edge will defensively restart its service to clear the leak which can result in customer traffic disruption for 10-15 seconds.

  • Fixed Issue 115604: A VMware SD-WAN Edge or Gateway may experience a Dataplane Service failure and generate a core with an Assert in the logging.

    When an Edge or Gateway processes a corrupted packet, the software can hit an assert where the actual user packet length is greater than the internal packet buffer. The Gateway is expected to drop this kind of packet and prevent it from being sent to the Edge, but instead it processes the packet, which results in the service failure and restart.

  • Fixed Issue 116589: When a VMware SD-WAN Edge is upgraded from Release 3.x to a 4.x or newer release, there may be cases where the configured address groups are not parsed.

    In release 3.x there is no configuration to set isakmpLifeMins and ipsecLifeMins. So when an Orchestrator is running 3.x it will send an Edge Property configuration file without isakmpLifeMins and ipsecLifeMins.

    When the Orchestrator is upgraded to 4.x+ it will have the fields isakmpLifeMins and ipsecLifeMins, but this configuration is not pushed to the 3.x Edge. If the Edge is also upgraded from 3.x to 4.x+, the Edge will start coming up with the last known good configuration (which does not include isakmpLifeMins and ipsecLifeMins).

    When the Edge comes up it finds that isakmpLifeMins and ipsecLifeMins, which are mandatory parameters, are not present, so it stops processing the Edge property configuration file at that point. None of the configurations that follow in the file, such as address groups, are processed. As a result, the address group configuration is not present.
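
    The failure mode described above can be modeled with a short Python sketch: a processor that aborts at the first section with missing mandatory keys never reaches the sections that follow. This is an illustration only. The section names ("ipsec", "addressGroups") and the function are hypothetical, and this is not the Edge's actual parser.

```python
MANDATORY_KEYS = ("isakmpLifeMins", "ipsecLifeMins")  # field names from the text


def apply_sections(config: dict, sections: list) -> list:
    """Process sections in order and abort at the first missing mandatory key.

    Returns the names of the sections that were applied. The "ipsec"
    section name is a hypothetical stand-in for wherever the two
    mandatory fields live.
    """
    applied = []
    for name in sections:
        if name == "ipsec":
            ipsec = config.get("ipsec", {})
            if any(key not in ipsec for key in MANDATORY_KEYS):
                break  # processing stops; later sections are never reached
        applied.append(name)
    return applied
```

    With a 3.x-style configuration that lacks both fields, the hypothetical "addressGroups" section that comes later in the order is skipped, which mirrors the symptom in this issue.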

  • Fixed Issue 116641: VMware SD-WAN Edge logs contain route lookup failure logs with error reason as "None".

    When traffic is failing, a customer may sometimes see route lookup failure logs with the error reason "None", which provides no value in troubleshooting the issue. The fix provides an actual reason, which assists the user in troubleshooting the issue.

    For example, where previously the failure log would look like this:

    20:40:41.796089,856|7|6825/10072|edged_ipv4_route_lookup_vcmp_transit:6880 Route lookup failure for tuple src_ip=10.0.1.25 dst_ip=169.254.6.18 segment_id=0 lookup failed due to [None]

    In Release 4.5.2 it looks like this (the change is the added sroute and droute fields and the specific failure reason in brackets):

    04:07:35.464563,968|7|9720/1917413|edged_ipv4_route_lookup_vcmp_transit:6958 Route lookup failure for tuple src_ip=10.0.1.25 dst_ip=169.254.6.18 segment_id=0 sroute=(nil) droute=(nil) lookup failed due to [No src and dst route]
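
    When scanning an edge.log for these entries, the failure reason can be pulled out with a regular expression. The following Python sketch is written against the two sample lines above only. The pattern and function name are assumptions, and other log variants may need a different pattern.

```python
import re

# Pattern derived from the two sample log lines; not an official log format.
ROUTE_FAILURE = re.compile(
    r"Route lookup failure for tuple "
    r"src_ip=(?P<src_ip>\S+) dst_ip=(?P<dst_ip>\S+) segment_id=(?P<segment_id>\d+)"
    r".*lookup failed due to \[(?P<reason>[^\]]*)\]"
)


def parse_route_failure(line: str):
    """Return the fields of a route lookup failure line as a dict, or None."""
    match = ROUTE_FAILURE.search(line)
    return match.groupdict() if match else None
```

    Filtering on a reason other than "None" is one way to separate entries written by a fixed build from those written by an older one.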

  • Fixed Issue 116916: SD-WAN Management (VCMP) paths may go down after the addition or deletion of an IPv6 kernel default route to any destination via a VMware SD-WAN Edge interface which is used by VCMP.

    VCMP paths go down upon addition or deletion of an IPv6 kernel default route involving any interface used by VCMP to form paths with other nodes (in other words, Edges or Gateways). On an Edge without a fix for this issue, there is no workaround beyond removing the IPv6 route and restarting the Edge service.

  • Fixed Issue 117943: On a VMware SD-WAN Edge with Wi-Fi capability, the automatic channel selection for some countries can result in the Edge choosing channels which result in either poor Wi-Fi performance or even complete failure of Wi-Fi to come up.

    In some countries like Great Britain, Wi-Fi takes a long time to come up when configured for the 5GHz band (or can even fail to come up). The automatic channel selection for the 5GHz band can end up selecting inappropriate channels in certain countries - either very low-power channels, or channels that require radar detection (which can delay or fail the startup).

    On an Edge without a fix for this issue, when selecting the 5GHz band in a European country, configure the radio channel to either 36, 40, 44 or 48 explicitly (instead of leaving it as "auto").

  • Fixed Issue 118591: For a customer site deployed with an Enhanced High Availability topology, a VMware SD-WAN Edge in the Standby role may have a WAN interface flap frequently.

    In Enhanced HA, when a high number of flows are sent or a high number of routes are installed, the Standby Edge WAN interface state can move from UP to DOWN and back to UP.

  • Fixed Issue 119491: For a VMware SD-WAN Edge where Edge Network Intelligence Analytics is activated, the customer may observe a gradual increase in Memory Usage on the Edge.

    The memory leak can happen on an Edge where Analytics is activated and which is also receiving RADIUS traffic. If the memory leak continues for a sufficient period to bring the Edge's memory usage beyond the critical threshold of 70% of available RAM, the Edge will defensively restart its service to clear the leak which can result in customer traffic disruption for 10-15 seconds.

Orchestrator Resolved Issues

Important:

Release 4.5.2 includes all Orchestrator fixed issues listed in the 4.5.1 Release Notes.

Resolved in Orchestrator Version R452-20230730-GA

Orchestrator build R452-20230730-GA was released on 08-03-2023 and is the 1st Orchestrator rollup for Release 4.5.2.

This Orchestrator rollup build addresses the below critical issues since the original Orchestrator GA build, R452-20230629-GA.

  • Fixed Issue 64145: A customer may not be able to configure Partner Gateway handoffs on the VMware SASE Orchestrator.

    As part of an earlier change to the Orchestrator, the handoff configurations under the "gatewayList" key can be unintentionally deleted as a byproduct of calling the "updateConfigurationModule" if the Partner Gateway was first deployed on an older Orchestrator build and had a legacy key inside of its handoff configuration that the Orchestrator UI no longer uses.

  • Fixed Issue 119080: For a customer site using a High Availability topology where Alerts for HA Failovers are also configured, whenever the site performs an HA failover, the VMware SASE Orchestrator sends multiple HA Failover alerts.

    When an HA failover happens, the Orchestrator does not update the HA Edge serial number change until after the new HA Edge takes over as Active and sends the HA State as active message. Up until this message is sent, multiple HA Failover alerts are triggered which causes confusion for the customer.

Resolved in Orchestrator Version R452-20230629-GA

Orchestrator Version R452-20230629-GA was released on 07-06-2023 and resolves the following issues since Orchestrator Version R451-20220831-GA.

  • Fixed Issue 75818: When a user generates a report on the VMware SASE Orchestrator, the link included in the email is misleading.

    The text used in the emails the Orchestrator sends for reports reads: "Click here to view your reports" which creates the expectation that clicking on that link will take the user to the reports section. However the user is required to first log onto the Orchestrator and then navigate to the Reporting screen to retrieve the report.

  • Fixed Issue 70987: A user may be unable to delete a VMware SD-WAN Edge from the VMware SASE Orchestrator.

    The Edges are offline, but their status is not updated to disconnected. Instead they stay in a degraded status and are thus not eligible for deletion.

  • Fixed Issue 75474: Customers may observe VMware SD-WAN Edges in their inventory for which they had performed an RMA.

    Maestro users do not have the privilege to delete Edges from a customer's inventory and this resulted in Edges remaining in a customer's inventory even though they had been returned.

  • Fixed Issue 84064: For a customer who is deploying a Microsoft Azure Virtual Hub, they have the option of configuring BGP on the VMware SASE Orchestrator.

    BGP is not officially supported on an automated "Microsoft Azure Virtual Hub" deployment. However, users are allowed to configure BGP on the Orchestrator, and should a user configure Azure automation with BGP, tunnels to the Gateway will go down and the Azure site will not pass traffic.

  • Fixed Issue 84152: When a customer generates a Top Talkers report for their enterprise, the Top Talker names may be listed as 'Unknown'.

    "Top Talkers" are the top sources from all the flows in a given time range. The Top Talker name may not show if the client device is not present for the (Source IP + MAC Address) unique pair. This happens because the client devices are saved based on which Visibility Mode (IP Address or MAC Address) is configured for the VMware SD-WAN Edge. For example, an Orchestrator may save a device for (IP Address 1, MAC Address 1) and then the (IP Address 2, MAC Address 2) record is not saved if Visibility Mode is set to IP Address. This would lead to the Top Talker corresponding to IP Address 2/MAC Address 1 being marked as 'Unknown'.

  • Fixed Issue 86189: On a VMware SASE Orchestrator deployed with a Disaster Recovery (DR) topology, a user may get emails for recurring reports which have already been deleted by them.

    The Standby Orchestrator instance would have the recurring reports job running on its instance. This means that even though the Active Orchestrator is not generating these reports, the Standby Orchestrator is generating them. This symptom is observed when the user has set up emailing for every report generation job.

  • Fixed Issue 93846: For Partners and Operators managing Edge inventory using ZTP, when a user tries to reassign a VMware SD-WAN Edge to a different customer that was previously assigned to one customer and then deleted, the VMware SASE Orchestrator returns the error "Edge is not found."

    The Orchestrator determines the Edge does not exist because it does not see the logical Edge after it is deleted from a customer enterprise and a user is unable to reassign it to another customer.

  • Fixed Issue 94169: A VMware SASE Orchestrator may experience significant performance issues, including the disk becoming full, because it is no longer processing files.

    The issue can be encountered where an Orchestrator file, after being processed successfully, does not get deleted from the file processing queue. The entry is eventually picked up for processing again after a suitable time has passed, although the file is no longer present on disk. This scenario is not handled well by the file processing workers, and they get stuck with an uncaught exception. These are VcoQueue workers with a configured timeout of 900 seconds, and once a worker is stuck it processes nothing, leading to processing delays and file buildup on the Orchestrator's disk.

    The overall cause is a lack of proper error handling in all stages of the pipeline where the Orchestrator processes files stored on disk. If a file is not present, the Orchestrator should catch the error properly, increment a metric, and continue execution without stalling.
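
    The intended handling (catch the missing file, increment a metric, continue) can be sketched in Python as follows. This is a minimal illustration under assumptions: the function name, the in-process `metrics` counter, and the `handler` callback are hypothetical and do not represent the Orchestrator's VcoQueue implementation.

```python
from collections import Counter

metrics = Counter()  # hypothetical in-process metric store


def process_queue(paths, handler):
    """Process queued file paths and skip entries whose file is gone.

    A missing file is counted and skipped, so one stale queue entry
    cannot stall the worker. Returns the paths that were processed.
    """
    processed = []
    for path in paths:
        try:
            with open(path, "rb") as handle:
                handler(handle.read())
        except FileNotFoundError:
            metrics["missing_file"] += 1  # record the stale entry
            continue                      # keep draining the queue
        processed.append(path)
    return processed
```

    The key design point is that the exception is handled per entry, inside the loop, so the remaining entries are still processed.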

  • Fixed Issue 94610: When a user initiates a forced High Availability failover through Remote Actions > Force HA Failover, the VMware SASE Orchestrator may not generate and send an Alert for the HA failover.

    Since the HA failover is forced by the Orchestrator, both the Active and Standby Edge anticipate the failover, and this can cause both the HA_GOING_ACTIVE and HA_READY messages to be sent in the same heartbeat by the HA Edge. If the ‘HA State’ sent in the heartbeat shows as ‘Ready,’ this fools the Orchestrator into not generating an Alert for the failover because it only sees this 'Ready' message and does not see the 'Going Active' one.
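
    The difference between reading only the final state in a heartbeat and reading every message it carries can be shown with a small Python sketch. The message names come from the text above. The functions and the list-of-strings heartbeat representation are hypothetical and do not describe how the Orchestrator is actually implemented or how the fix works.

```python
GOING_ACTIVE = "HA_GOING_ACTIVE"  # message names taken from the text
READY = "HA_READY"


def last_state_only(messages):
    """Look only at the final message; a trailing HA_READY hides the failover."""
    return bool(messages) and messages[-1] == GOING_ACTIVE


def any_message(messages):
    """Inspect every message carried in the heartbeat."""
    return GOING_ACTIVE in messages
```

    For a heartbeat carrying both messages, `last_state_only` misses the failover while `any_message` detects it.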

  • Fixed Issue 95519: When a customer applies a filter to Path Stats, the result never exceeds 100 even though the actual number of results exceeds that number.

    The issue is the result of an incorrect query string being constructed when the Orchestrator UI makes an API call with a higher limit. The default limit is 100, so whatever filter is applied from the UI is applied in the backend to an already limited result set. The filter should be applied before the limit, not the other way around.
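
    The effect of the ordering can be seen in a short Python sketch. The function names and the in-memory row list are illustrative assumptions; the real backend builds a database query rather than filtering Python lists.

```python
def limit_then_filter(rows, predicate, limit=100):
    """Order described as incorrect: truncate first, then filter."""
    return [row for row in rows[:limit] if predicate(row)]


def filter_then_limit(rows, predicate, limit=100):
    """Intended order: filter the full set, then truncate."""
    return [row for row in rows if predicate(row)][:limit]
```

    With 1000 rows of which half match the filter, the first function returns only the matches found among the first 100 rows, while the second returns a full page of 100 matching rows.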

  • Fixed Issue 95777: When upgrading a VMware SASE Orchestrator the i18n service may not come up after the upgrade is complete.

    The issue is that the upgrade script will stop the i18n service during the upgrade but does not restart it after the upgrade.

    The Operator user can still restart the i18n service manually with the command: systemctl start i18n-service.

  • Fixed Issue 95847: When a VMware SASE Orchestrator is upgraded, the Operator performing the upgrade may observe the schema upgrade was not successful and must be manually rerun.

    When an Orchestrator is upgraded to a version with a new ClickHouse schema (including 4.5.1) there is the potential for a race condition on the backend and the old schema version is not up and ready prior to being upgraded. As a result, the Operator needs to manually rerun the schema upgrade.

  • Fixed Issue 95922: A Partner Administrator with a Superuser role cannot manage Gateway migration activities for customers created after the VMware SASE Orchestrator is upgraded to 4.5.0 or 4.5.1.

    The Partner Superuser sees no Gateway migration actions for a customer created after the Orchestrator is upgraded.

  • Fixed Issue 98007: A user's attempts to change the Edge profile on a VMware SASE Orchestrator may fail with a UI displaying a validation error that does not assist in debugging the issue.

    In the fixed version, for debugging purposes the Orchestrator prints the error object to get the stack trace for Edge profile switch UI validation.

  • Fixed Issue 98518: If a user removes a VMware SD-WAN Gateway from a Gateway Pool where no customers are assigned, customers who use Partner Gateways may observe that their Partner Handoff Configurations have also been removed for multiple Gateways.

    When a Gateway is removed from a Gateway Pool, the Orchestrator checks for handoffs and is erroneously viewing some handoffs as not in use when they are. This results in the Orchestrator unsetting and then overriding the handoff configurations for that Gateway because of the erroneous understanding that there are no handoff configurations in use.

  • Fixed Issue 100110: A VMware SASE Orchestrator's disk may stop storing data for Edge statistics.

    The volume used for data.export runs out of space quickly because PATHSTATS and HEALTHSTATS files and directories are held beyond their retention period instead of being deleted.

    The workaround in the absence of a fix is to manually set up a cron job to delete the directories and files older than the retention period.
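
    A cleanup job of the kind described in the workaround could look like the following Python sketch, which a cron entry might invoke. It is a hedged illustration only: the function name is hypothetical, and the directory to clean and the retention period are not given here, so both must be supplied by the Operator. It deletes files by modification time and then removes any subdirectory left empty, and it should be tried on a copy of the data first.

```python
import os
import time
from typing import List, Optional


def delete_older_than(root: str, retention_days: float,
                      now: Optional[float] = None) -> List[str]:
    """Delete files under `root` whose mtime is older than the retention period.

    Subdirectories left empty afterwards are removed as well.
    Returns the list of deleted file paths.
    """
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    deleted = []
    # Walk bottom-up so a directory is visited after its contents.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                deleted.append(path)
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)  # remove directories that are now empty
    return deleted
```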

  • Fixed Issue 101449: If a user configures more than 32 subinterfaces on a VMware SD-WAN Edge using Release 4.3.x or later, the Orchestrator will throw an error and prevent the configuration from being applied.

    The restriction is designed to protect a customer enterprise that has Edges running a release below 4.3.x (for example, 4.2.2 or 3.4.6), where more than 32 subinterfaces are not supported. With this change, the Orchestrator will allow more than 32 subinterfaces to be configured and the customer will be warned to only do this if the Edge is using Release 4.3.0 or later.

  • Fixed Issue 111946: A user cannot see the paths on the Edge > Monitor > Paths tab on a VMware SASE Orchestrator when the peer list is greater than 100.

    When a user navigates to the Edge > Monitor > Paths tab, the Orchestrator's backend returns all records even if there are more than 100. This is because the backend omits the limit constraint, which is the maximum number of records that should be returned. The records that are returned after the limit count are unnormalized, meaning that they are not formatted in a way that is compatible with the UI. This causes an error in the UI. The Orchestrator should only return the records that are within the submitted limit.

  • Fixed Issue 116023: The VMware SASE Orchestrator default certificate authority (CA) expired on May 28th, 2023.

    The default CA loaded from data in development environments expired on 5/28/2023. Release 4.5.2 updates the certificate, the key, and references to the old CA.

Known Issues

Open Issues in Release 4.5.2

Edge/Gateway Known Issues

  • Issue 14655:

    Plugging or unplugging an SFP adapter may cause the device to stop responding on the Edge 540, Edge 840, and Edge 1000 and require a physical reboot.

    Workaround: The Edge must be physically rebooted. This may be done either on the Orchestrator using Remote Actions > Reboot Edge, or by power-cycling the Edge.

  • Issue 25504:

    Static route costs greater than 255 may result in unpredictable route ordering.

    Workaround: Use a route cost between 0 and 255.

  • Issue 25595:

    A restart may be required for changes to static SLA on a WAN overlay to work properly.

    Workaround: Restart the Edge after adding or removing a Static SLA on a WAN overlay.

  • Issue 25742:

    Underlay accounted traffic is capped at a maximum of the capacity towards the VMware SD-WAN Gateway, even if that is less than the capacity of a private WAN link which is not connected to the Gateway.

  • Issue 25758:

    USB WAN links may not update properly when switched from one USB port to another until the VMware SD-WAN Edge is rebooted.

    Workaround: Reboot the Edge after moving USB WAN links from one port to another.

  • Issue 25855:

    A large configuration update on the Partner Gateway (e.g. 200 BGP-enabled VRFs) may cause latency to increase for approximately 2-3 seconds for some traffic via the VMware SD-WAN Gateway.

    Workaround: No workaround available.

  • Issue 25921:

    VMware SD-WAN Hub High Availability failover takes longer than expected (up to 15 seconds) when there are three thousand branch Edges connected to the Hub.

  • Issue 25997:

    The VMware SD-WAN Edge may require a reboot to properly pass traffic on a routed interface that has been converted to a switched port.

    Workaround: Reboot the Edge after making the configuration change.

  • Issue 26421:

    The primary Partner Gateway for any branch site must also be assigned to a VMware SD-WAN Hub cluster for tunnels to the cluster to be established.

  • Issue 28175:

    Business Policy NAT fails when the NAT IP overlaps with the VMware SD-WAN Gateway interface IP.

  • Issue 31210:

    VRRP: ARP is not resolved in the LAN client for the VRRP virtual IP address when the VMware SD-WAN Edge is primary with a non-global CDE segment running on the LAN interface.

  • Issue 32731:

    Conditional default routes advertised via OSPF may not be withdrawn properly when the route is turned off. Reenabling and then deactivating the route will retract it successfully.

  • Issue 32960:

    Interface “Autonegotiation” and “Speed” status might be displayed incorrectly on the Local Web UI for activated VMware SD-WAN Edges.

  • Issue 32981:

    Hard-coding speed and duplex on a DPDK-enabled port may require a VMware SD-WAN Edge reboot for the configurations to take effect as it requires turning off DPDK.

  • Issue 34254:

    When a Zscaler CSS is created and the Global Segment has FQDN/PSK settings configured, these settings are copied to Non-Global Segments to form IPsec tunnels to a Zscaler CSS.

  • Issue 35778:

    When there are multiple user-defined WAN links on a single interface, only one of those WAN links can have a GRE tunnel to Zscaler.

    Workaround: Use a different interface for each WAN link that needs to build GRE tunnels to Zscaler.

  • Issue 36923:

    Cluster name may not be updated properly in the NetFlow interface description for a VMware SD-WAN Edge which is connected to that Cluster as its Hub.

  • Issue 38682:

    A VMware SD-WAN Edge acting as a DHCP server on a DPDK-enabled interface may not properly generate “New Client Device" events for all connected clients.

  • Issue 38767:

    When a WAN overlay that has GRE tunnels to Zscaler configured is changed from auto-detect to user-defined, stale tunnels may remain until the next restart.

    Workaround: Restart the Edge to clear the stale tunnel.

  • Issue 39134:

    The System health statistic “CPU Percentage” may not be reported correctly on Monitor > Edge > System for the VMware SD-WAN Edge, and on Monitor > Gateways for the VMware SD-WAN Gateway.

    Workaround: Users should monitor Edge capacity using handoff queue drops rather than CPU percentage.

  • Issue 39374:

    Changing the order of VMware SD-WAN Partner Gateways assigned to a VMware SD-WAN Edge may not properly set Gateway 1 as the local Gateway to be used for bandwidth testing.

  • Issue 39608:

    The output of the Remote Diagnostic “Ping Test” may display invalid content briefly before showing the correct results.

  • Issue 39624:

    Ping through a subinterface may fail when the parent interface is configured with PPPoE.

  • Issue 39753:

    Turning off Dynamic Branch-to-Branch VPN may cause existing flows currently being sent using Dynamic Branch-to-Branch to stall.

  • Issue 40096:

    If an activated VMware SD-WAN Edge 840 is rebooted, there is a chance an SFP module plugged into the Edge will stop passing traffic even though the link lights and the VMware SD-WAN Orchestrator will show the port as 'UP'.

    Workaround: Unplug the SFP module and then replug it back into the port.

  • Issue 40421:

    Traceroute is not showing the path when passing through a VMware SD-WAN Edge with an interface configured as a switched port.

  • Issue 42278:

    For a specific type of peer misconfiguration, the VMware SD-WAN Gateway may continuously send IKE init messages to a Non-SD-WAN peer. This issue does not disrupt user traffic to the Gateway; however, the Gateway logs will be filled with IKE errors, and this may obscure useful log entries.

  • Issue 42388:

    On a VMware SD-WAN Edge 540, an SFP port is not detected after deactivating and then reenabling the interface from the VMware SD-WAN Orchestrator.

  • Issue 42872:

    Enabling Profile Isolation on a Hub profile where a Hub cluster is associated does not revoke the Hub routes from the routing information base (RIB).

  • Issue 43373:

    When the same BGP route is learnt from multiple VMware SD-WAN Edges, if this route is moved from preferred to eligible exit in the Overlay Flow Control, the Edge is not removed from the advertising list and continues to be advertised.

    Workaround: Enable distributed cost calculation on the VMware SD-WAN Orchestrator.

  • Issue 44995:

    OSPF routes are not revoked from VMware SD-WAN Gateways and VMware SD-WAN Spoke Edges when the routes are withdrawn from the Hub Cluster.

  • Issue 45189:

    When source LAN side NAT is configured, the traffic from a VMware SD-WAN Spoke Edge to a Hub Edge is allowed even without the static route configuration for the NAT subnet.

  • Issue 45302:

    In a VMware SD-WAN Hub Cluster, if one Hub loses connectivity for more than 5 minutes to all of the VMware SD-WAN Gateways common between itself and its assigned Spoke Edges, the Spokes may in rare conditions be unable to retain the hub routes after 5 minutes. The issue resolves itself when the Hub regains contact with the Gateways.

  • Issue 46053:

    BGP preference does not get auto-corrected for overlay routes when its neighbor is changed to an uplink neighbor.

    Workaround: An Edge Service Restart will correct this issue.

  • Issue 46216:

    On a Non SD-WAN Destination via Gateway or Edge where the peer is an AWS instance, when the peer initiates a Phase-2 re-key, the Phase-1 IKE is also deleted and forces a re-key. This means the tunnel is torn down and rebuilt, causing packet loss during the tunnel rebuild.

    Workaround: To avoid tunnel destruction, configure the Non SD-WAN Destinations via Gateway/Edge or CSS IPsec rekey timer to less than 60 minutes. This prevents AWS from initiating the re-key.

  • Issue 46391:

    For a VMware SD-WAN Edge 3800, the SFP1 and SFP2 interfaces each have issues with Multi-Rate SFPs (i.e. 1/10G) and should not be used in those ports.

    Workaround: Please use single rate SFPs per the KB article VMware SD-WAN Supported SFP Module List (79270). Multi-Rate SFPs may be used with SFP3 and SFP4.

  • Issue 47084:

    A VMware SD-WAN Hub Edge cannot establish more than 750 PIM (Protocol-Independent Multicast) neighbors when it has 4000 Spoke Edges attached.

  • Issue 47664:

    In a Hub and Spoke configuration where Branch-to-Branch via Hub VPN is not enabled, trying to U-turn Branch-to-Branch traffic using a summary route on an L3 switch/router will cause routing loops.

    Workaround: Configure Cloud VPN to enable Branch-to-Branch VPN and select “Use Hubs for VPN”.

  • Issue 47681:

    When a host on the LAN side of a VMware SD-WAN Edge uses the same IP as that Edge’s WAN interface, the connection from the LAN host to the WAN does not work.

  • Issue 48166:

    A VMware SD-WAN Virtual Edge on KVM is not supported when using a Ciena virtualization OS and the Edge will experience recurring Dataplane Service Failures.

  • Issue 48502:

    In some scenarios, a VMware SD-WAN Hub Edge being used to backhaul internet traffic may experience a Dataplane Service Failure due to the improper handling of backhaul return packets.

  • Issue 48530:

    VMware SD-WAN Edge 6x0 models do not perform autonegotiation for triple speed (10/100/1000 Mbps) copper SFPs.

    Workaround: Edge 520/540 supports triple speed copper SFPs, but this model has been marked for End-of-Sale by Q1 2021.

  • Issue 48597: Multihop BGP neighborship does not stay up if one of the two paths to the peer goes down.

    If there is a Multihop BGP neighborship with a peer to which there are multiple paths and one of them goes down, the user will notice that the BGP neighborship goes down and does not come back up using the other available path(s). This includes the Local IP-loopback neighborship case too.

    Workaround: There is no workaround for this issue.

  • Issue 48666:

    IPsec-fronted Gateway Path MTU calculation does not account for 61 Byte IPsec overhead, resulting in higher MTU advertisement to LAN client and subsequent IPsec packet fragmentation.

    Workaround: There is no workaround for this issue.
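
    The effect of the missing overhead can be illustrated with simple arithmetic. The following is a minimal sketch, assuming a 1500-byte link MTU chosen purely for illustration; the function names are hypothetical and this is not the Gateway's actual Path MTU code.

```python
IPSEC_OVERHEAD_BYTES = 61  # overhead figure quoted in Issue 48666


def expected_path_mtu(link_mtu: int, overhead: int = IPSEC_OVERHEAD_BYTES) -> int:
    """Return the largest inner packet size that still fits once the overhead is added."""
    if link_mtu <= overhead:
        raise ValueError("link MTU must exceed the IPsec overhead")
    return link_mtu - overhead


def will_fragment(advertised_mtu: int, link_mtu: int,
                  overhead: int = IPSEC_OVERHEAD_BYTES) -> bool:
    """True when a packet of the advertised size no longer fits after encapsulation."""
    return advertised_mtu + overhead > link_mtu
```

    Under the assumed 1500-byte link, an advertised MTU of 1500 leaves no room for the 61-byte overhead, while an advertised MTU of 1439 fits.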

  • Issue 50518:

    On a VMware SD-WAN Gateway where PKI is enabled, if >6000 PKI tunnels attempt to connect to the Gateway, the tunnels may not all come up because inbound SAs do not get deleted.

    Note:

    Tunnels using pre-shared key (PSK) authentication do not have this issue.

  • Issue 51436: For a site using an Enhanced High Availability topology while deploying a VMware SD-WAN Edge using an LTE modem, if the site gets into a "split-brain" state, the HA failover takes ~5-6 minutes.

    As part of the recovery from a split-brain state, the LAN ports are brought down on the Active Edge, and this impacts LAN traffic during the time the ports are down and until the site can recover.

    Workaround: There is no workaround for this issue.

  • Issue 53219: After a VMware SD-WAN Hub Cluster rebalances, a few Spoke Edges may not have their RPF interface/IIF set properly.

    On the affected Spoke Edges, multicast traffic will be impacted. After a cluster rebalance, some of the Spoke Edges fail to send a PIM join.

    Workaround: This issue will persist until the affected Spoke Edge has an Edge Service restart.

  • Issue 53337: Packet drops may be observed with an AWS instance of a VMware SD-WAN Gateway when the throughput is above 3200 Mbps.

    When traffic exceeds a throughput of 3200 Mbps and a packet size of 1300 bytes, packet drops are observed at RX and at IPv4 BH handoff.

    Workaround: There is no workaround for this issue.

  • Issue 53359: BGP/BFD session may fail during some DDoS attack scenarios.

    If traffic is flooded from the client connected to the routed interface to the LAN client, the BGP/BFD session can fail. Also when real-time high priority traffic is flooded to the overlay destination, the BGP/BFD session can fail.

    Workaround: There is no workaround for this issue.

  • Issue 53934: In an enterprise where a VMware SD-WAN Hub Cluster is configured, if the primary Hub has Multihop BGP neighborships on the LAN side, the customer may experience traffic drops on a Spoke Edge when there is a LAN side failure or when BGP is not enabled on all segments.

    In a Hub cluster, the primary Hub has Multihop BGP neighborship with a peer device to learn routes. If the physical interface on the Hub by which BGP neighborship is established goes down, then BGP LAN routes may not become zero despite the BGP view being empty. This may cause Hub Cluster rebalancing to not happen. The issue may also be observed when BGP is not enabled for all segments and when there are one or more Multihop BGP neighborships.

    Workaround: Restart the Hub which had the LAN-side failure (or where BGP is not enabled).

  • Issue 57210: Even when a VMware SD-WAN Edge is working normally and is able to reach the internet, the LED in the Local UI's Overview page shows as "Red".

    The Edge's Local UI determines the Edge's connectivity by whether it can resolve a well-known name via Google's DNS resolver (8.8.8.8). If it cannot do so for any reason, then it thinks it is offline and shows the LED as red.

    Workaround: There is no workaround for this issue, except to ensure that DNS traffic to 8.8.8.8 can reach the destination and be resolved successfully.
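
    To confirm that DNS traffic to 8.8.8.8 is resolved successfully, a similar test can be run from a host on the same network. The following is a minimal sketch, not the Local UI's own logic: the release notes do not state which well-known name the Edge resolves, so "example.com" and all function names here are assumptions.

```python
import socket
import struct


def build_dns_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record (standard query, recursion desired)."""
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def response_ok(query: bytes, response: bytes) -> bool:
    """True if the reply matches the query ID, is a response, has RCODE 0 and an answer."""
    if len(response) < 12:
        return False
    txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    same_id = txid == struct.unpack("!H", query[:2])[0]
    is_response = bool(flags & 0x8000)
    rcode = flags & 0x000F
    return same_id and is_response and rcode == 0 and ancount > 0


def can_resolve(name: str = "example.com", server: str = "8.8.8.8",
                timeout: float = 2.0) -> bool:
    """Send one UDP query to the server; any socket error or timeout counts as failure."""
    query = build_dns_query(name)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server, 53))
            response, _ = sock.recvfrom(512)
    except OSError:
        return False
    return response_ok(query, response)
```

    Calling can_resolve() returns False when the query is blocked, times out, or comes back with an error code, which are the kinds of conditions under which the LED would be expected to show red.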

  • Issue 61543: If more than one 1:1 NAT rule is configured on different interfaces with the same Inside IP, the inbound traffic can be received on one interface and the outbound packets of the same flow can be routed via a different interface.

    For the NAT flows from Outside to Inside, the 1:1 NAT rules will be matched against the Outside IP and the interface where the packets are received. For the outbound packets of the same flow, the VMware SD-WAN Edge will try to match the NAT rules again by comparing the Inside IP, and the outbound traffic can be routed via the interface configured in the first matching rule with "Outbound Traffic" enabled.

    Workaround: There is no workaround for this issue outside of ensuring no more than one 1:1 NAT rule is configured with a particular Inside IP address.
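
    A configuration can be checked for this condition before it is applied. The following is a minimal sketch, assuming the 1:1 NAT rules have been exported into a list of dictionaries; the "inside_ip" and "interface" keys are hypothetical and do not reflect an Orchestrator data format.

```python
from collections import defaultdict


def find_shared_inside_ips(rules):
    """Group 1:1 NAT rules by Inside IP and return the IPs used by more than one rule."""
    by_inside = defaultdict(list)
    for rule in rules:
        by_inside[rule["inside_ip"]].append(rule["interface"])
    return {ip: ifaces for ip, ifaces in by_inside.items() if len(ifaces) > 1}
```

    Any Inside IP returned by find_shared_inside_ips appears in more than one rule and is a candidate for the asymmetric routing described above.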

  • Issue 63629: In a network topology where the VMware SD-WAN Hub Edge and Spoke Edge have different IP family preferences (in other words, IPv4/IPv6 dual stack), the customer can see more bandwidth allocated to the peer than expected.

    If both families IPv4 and IPv6 are enabled, the Edge internally creates two different link objects. The bandwidth values are added for both of them when it should be added only for one.

    Workaround: Do not configure different tunnel preferences if the Hub/Spoke topology has dual stack enabled.

  • Issue 64627: A VMware SD-WAN Gateway may experience a Dataplane Service restart, causing 3-5 seconds of traffic disruption.

    When there are a large number of sub-paths configured on a VMware SD-WAN Edge's WAN links or there are frequent flaps of the VeloCloud Management Plane (VCMP) tunnels, it may lead to the exhaustion of the counters and the ultimate Dataplane Service restart of the Gateway.

    Workaround: There is no workaround for this issue.

  • Issue 65560: Traffic from a customer to PE (Provider Edge) device fails.

    BGP neighborship between a Partner Gateway and Provider Edge does not get established when tag-type is selected as "none" on the handoff configuration. This is because ctag, stag values get picked from /etc/config/gatewayd instead of the handoff configuration on the Orchestrator when tag-type is "none".

    Workaround: Update the ctag, stag values to 0 each under vrf_vlan->tag_info in /etc/config/gatewayd. Do a vc_procmon restart.

  • Issue 67879: A Cloud Security Service (CSS) tunnel is deleted after a user changes a WAN Overlay setting from auto-detect to user-defined on a WAN interface setting.

    After saving the changes, the CSS tunnels do not come back up until the customer takes down and then puts back up the tunnel. Changing the WAN configuration will bring down the CSS tunnel and parse the CSS setup again. However, in some corner cases, the nvs_config->num_gre_links is 0 and the CSS tunnel fails to come up.

    Workaround: Deactivating and then reenabling the CSS setup will bring the CSS tunnel up.

  • Issue 68057: DHCPv6 release packet is not sent from the VMware SD-WAN Edge on the changing of a WAN interface address mode from DHCP stateful to static IPv6 address and the lease remains active till reaching its valid time.

    The DHCPv6 client possesses a lease which it does not release when the configuration change is done. The lease remains valid till its lifetime expires in the DHCPv6 server and is deleted.

    Workaround: Without the fix, there is no way of remediating this issue as the lease would remain active till valid lifetime.

  • Issue 68851: If a VMware SD-WAN Edge and VMware SD-WAN Gateway each have the same TCP syslog server configured, the TCP connection is not established from the Edge to the syslog server.

    If the Edge and Gateway each have the same TCP server and if the syslog packets from the Edge are routed via the Gateway, the syslog server sends a TCP reset to the Edge.

    Workaround: Send the syslog packets direct from the Edge instead of routing via a Gateway or configure a different syslog server for the Edge and Gateway.

  • Issue 69284: For a site using a High Availability topology where the Edges deploy VNFs in an HA configuration and are using Release 4.x, if these HA Edges are downgraded to a 3.4.x Release where HA VNFs are not supported, and then upgraded to 4.5.0, when the HA VNFs are reenabled, the Standby Edge VNF will not come up.

    The VNF state on the Standby Edge is communicated as down via SNMP. This issue is seen when the HA VNF pair is downgraded from a version supporting VNF-HA (Release 4.0+) to a release which does not support it while VNF is still enabled on the Orchestrator, and the Edge is then upgraded back to a version supporting VNF-HA with VNF enabled on the Orchestrator again.

    Workaround: VNF should first be deactivated in the case of an HA if the Edge is being downgraded to a version which does not support it.

  • Issue 69562: Traffic failure is observed on a VMware SD-WAN Gateway when both Partner Gateway-BGP and Non SD-WAN Destination-BGP have the same local Autonomous System Number (ASN).

    When both PG-BGP and NSD-BGP have the same local ASN and an NSD-BGP learned route is redistributed into a PG-BGP, the ASN will get prepended twice on the route before advertisement. This may cause some other BGP route to get preferred over this one due to a shorter AS path, possibly causing traffic matching the wrong route.

    Workaround: Without this fix, the workaround to this issue is to have different ASNs for PG-BGP and NSD-BGP.
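
    The impact of the extra prepend on route preference can be illustrated as follows. This is a minimal sketch that models only the AS path length comparison, which is one step of BGP best-path selection; the private ASNs and function names are assumptions for illustration.

```python
def advertised_as_path(learned_path, local_asn, prepend_count=1):
    """Return the AS path as advertised after prepending the local ASN."""
    return [local_asn] * prepend_count + list(learned_path)


def loses_on_as_path_length(path, competing_path):
    """True when the competing route wins purely on a shorter AS path."""
    return len(competing_path) < len(path)
```

    For example, with a learned path of [65010] and a competing path of [65020, 65030], a single prepend of ASN 65001 ties on length, while a double prepend makes the competing route strictly shorter and therefore preferred at this step.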

  • Issue 71719: PPTP Connection is not Established along Edge to Cloud path.

    Connection to the PPTP server behind the VMware SD-WAN Edge does not get established.

    Workaround: There is no workaround for this issue, not even an Edge restart or reboot.

  • Issue 72358: If the IP address of a VMware SD-WAN Orchestrator DNS name changes, the VMware SD-WAN Gateway's management plane process fails to resolve it properly and the Gateway will be unable to connect to the Orchestrator.

    The Gateway's management process periodically checks the DNS resolution of the Orchestrator's DNS name, to see if it has changed recently so that the Gateway can connect to the right host. The DNS resolution code has an issue in it so that all of these resolution checks fail, and the Gateway will keep using the old address and thus no longer be able to connect to the Orchestrator.

    Workaround: Until this issue is resolved, an Operator User should not change the IP address of the Orchestrator. If the Orchestrator's IP address must be changed, all Gateways connecting to that Orchestrator will have to be reactivated.

  • Issue 82415: A VMware SD-WAN Gateway deployed using a KVM image with Intel® Ethernet Server Adapter X710, SR-IOV, and Bond0 does not respond if activated to Release 4.5.1 or 5.0.0.0.

    For a Gateway so deployed, paths do not come up and the debug.py commands do not work.

    Workaround: Without this fix, Operators should avoid using these builds for this specific Gateway deployment until new builds with this issue fixed are rolled out.

  • Issue 83083: A VMware SD-WAN Gateway upgraded to Release 4.3.1 or later may experience a slow memory leak which can lead to the Gateway's service restarting to clear the memory.

    Gateway restarts can be disruptive to customer traffic for the 30-45 seconds it takes for the Gateway service to restart. Each time an Operator user runs the debug.py --flow_dump all all all command on the Gateway, the Gateway will leak some of its memory. Running this debug command a sufficient number of times will cause the Gateway's memory usage to reach a critical level and trigger a Gateway service restart to clear the memory.

    Workaround: Avoid running the debug.py --flow_dump all all all command on the Gateway. If using this debug command is unavoidable, monitor the memory usage and schedule maintenance windows to preemptively restart the service to clear the memory prior to an unscheduled restart.
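
    The memory monitoring mentioned in the workaround could be scripted along these lines. This is a minimal sketch, assuming a Linux host that exposes /proc/meminfo; it measures system-wide memory rather than the Gateway process itself, and the 80 percent threshold is a hypothetical value, since these release notes do not state the critical level.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Percentage of total memory that is not reported as available."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def needs_planned_restart(meminfo, threshold_percent=80.0):
    """True when usage has reached the (assumed) threshold for scheduling a restart."""
    return memory_used_percent(meminfo) >= threshold_percent
```

    The text passed to parse_meminfo would typically be read from /proc/meminfo, and the result of needs_planned_restart can be used to decide when to schedule a maintenance window.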

  • Issue 83212: When looking at the VMware SASE Orchestrator for Monitor > Edge > Transport, there is a discrepancy between the Link and Application statistics table.

    The Application and Link statistics should match but the Application statistics show a higher value than the Link statistics. This issue most commonly occurs in a VMware SD-WAN Edge Hub Cluster topology where the Spoke Edge uses a single WAN link. If this single WAN link experiences some loss, the packets are retransmitted and are counted twice in Application statistics, which results in the observed discrepancy.

    Workaround: There is no workaround for this issue, but Applications statistics will be correct versus Link statistics when this issue is encountered.

  • Issue 87982: A VMware SD-WAN Edge using a Metanoia-type SFP module with a private PPPoE WAN link may be unable to establish BGP peering and connect to other sites.

    VLAN tagged packets using a private PPPoE link are corrupted by the Edge and never reach their destination as a result. This issue does not affect public PPPoE links.

    Workaround: The workarounds are either to not VLAN tag the traffic for PPPoE link or make the link public.

  • Issue 88796: When deploying either a VMware SASE Orchestrator or a VMware SD-WAN Gateway and using an OVA on vSphere, the OVF properties set as part of the deployment (password, network information, etc.) are not applied to the image and the system cannot be accessed after deployment.

    This only affects new systems deployed from OVA using OVF/vApp properties (versus using ISO files). This issue is caused by upstream changes to cloud-init in recent updates.

    Workaround: Without the fix, the workaround is for the Operator to deploy the system using a cloud-init user-data ISO file.

  • Issue 89217: A VMware SD-WAN Edge in the 6x0 model line (610, 610N, 610-LTE, 620, 620N, 640, 640N, 680, 680N) may suddenly power off for no reason.

    The 6x0 Edge would have all lights off, both the front status LED and the rear Ethernet port lights and can only be recovered by manually power cycling the Edge.

    The cause of the issue is traced to a PIC microcontroller exclusive to the Edge 6x0 line which uses a PIC firmware version of v20M or earlier (v20L, v20K, v20J). This issue can only occur when the 6x0 Edge uses a PIC version of v20M or earlier, but even with this version the odds of experiencing the power off issue are rare (approximately 1/1,000). The issue cannot occur on a 6x0 Edge with a PIC firmware version of v20N or later.
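
    The version boundary described above can be expressed as a simple check. This is a minimal sketch, assuming PIC versions always take the form "v" plus a number plus a single uppercase letter and that they order by number and then letter, as the listed versions (v20J, v20K, v20L, v20M, v20N) suggest; the function names are hypothetical.

```python
import re

FIRST_FIXED_PIC = "v20N"  # per the text above, v20N or later is not affected

_PIC_PATTERN = re.compile(r"^v(\d+)([A-Z])$")


def _pic_key(version):
    """Split a PIC version such as 'v20M' into a sortable (number, letter) pair."""
    match = _PIC_PATTERN.match(version.strip())
    if not match:
        raise ValueError(f"unrecognized PIC version: {version!r}")
    return int(match.group(1)), match.group(2)


def pic_is_affected(version):
    """True when the PIC version is earlier than the first fixed version."""
    return _pic_key(version) < _pic_key(FIRST_FIXED_PIC)
```

    Under these assumptions, pic_is_affected returns True for v20M and earlier and False for v20N and later.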

    Note:

    A 6x0 Edge's Firmware including PIC version can be determined on an Orchestrator using 5.x by going to the Monitor > Edge > Overview page for that Edge and clicking the dropdown information box next to the Edge name which includes the Edge Information, Device Version, and the Device Firmware. However this only works on an Edge using Release 4.5.1.

    The issue is resolved by upgrading the 6x0 Edge to Platform Firmware 1.3.1 (R131-20221216-GA), which includes PIC version v20N. To do this the 6x0 Edge must be connected to a VMware SASE Orchestrator using Release 5.x (5.0.0 or later), and the 6x0 Edge must first be upgraded to Edge hotfix build R5012-20230327-GA-107522. Once the 6x0 Edge is upgraded to R5012-20230327-GA-107522, the user would then update the 6x0 Edge Platform Firmware to version R131-20221216-GA in the same way that an Edge's software version is modified.

    For more information and a step-by-step guide to upgrading a 6x0 Edge to Platform Firmware 1.3.1, see the KB Article: VMware SD-WAN 6X0 model Edges may power off with no LEDs and require a power cycle to come back to a working state (88970). This KB article was updated on April 4th, 2023 to reflect the new Edge and Platform Software needed to resolve the issue.

    For information on uploading a Platform Firmware bundle to an Orchestrator, consult the Platform Firmware and Factory Images with New Orchestrator UI section of the VMware SD-WAN Operator Guide.

    For information on updating a 6x0 Edge’s Platform Firmware, consult the View or Modify Edge Information section of the VMware SD-WAN Administration Guide.

    Workaround: To recover the Edge from the problem state:

    1. Disconnect the Edge from the power source.

    2. Wait 20 seconds.

    3. Reconnect the Edge to the power source.

    If the user does not wish to upgrade the 6x0 Edge's platform firmware, the user can ensure that the power to the Edge is consistent and does not flap rapidly or repeatedly. A good way to ensure a reliable power source is to connect the 6x0 Edge to an Uninterruptible Power Supply (UPS).

    If the customer prefers to keep the Edge on a lower software release (for example, Release 4.3.1 or 4.5.1), the customer can temporarily upgrade the Edge to R5012-20230327-GA-107522, perform the Platform Firmware upgrade to version 1.3.1 (R131-20221216-GA) so that the PIC version is v20N, and then downgrade the Edge’s software back to the preferred version. Downgrading the 6x0 Edge's software to an earlier version does not also downgrade the Edge's Platform Firmware and the Edge would continue to use Platform Firmware version 1.3.1. In this use case the customer Edges would need to be on an Orchestrator using Release 5.x.

    If the 6x0 Edge is on an Orchestrator that does not use version 5.x and has experienced this issue and requires an update of its PIC firmware, the customer may reach out to VMware SD-WAN Support and they will manually update the Edge’s PIC version.

  • Issue 90884: For a customer enterprise using a Hub Cluster/Spoke topology, when a Hub Edge in the Cluster is reassigned to one or more Spoke Edges, the users at those Spoke Edge locations may experience traffic failure.

    Hub Edges in a Cluster can be reassigned as part of a Cluster rebalance when an enterprise upgrades their Edge software, so this issue may be observed post-upgrade. When this issue is encountered, the VMware SD-WAN Gateway does not send the new Spoke Edge routes to the Hub Edge because the Gateway is expecting all Hub Edges to have all Spoke Edge routes, and thus these routes are not in the Hub Edge's routing table. As a result traffic between the Spoke Edges and the Hub Edge in the Cluster is impacted because the forwarding path is down.

    Workaround: If the issue is encountered in an enterprise not using Gateways with a fix for this issue, it can be temporarily resolved by performing a route reinit on the Hub Edge. However, the issue may recur on a new Hub Edge reassignment.

  • Issue 92481: If a WAN interface on a VMware SD-WAN Edge is configured as not enabled on the VMware SASE Orchestrator, the interface will still be reported as 'UP' by SNMP.

    The key debug process for interfaces output does not include the physical port details for Edge WAN interfaces (for example, GE3 or GE4 on an Edge 6x0 or 3x00 model). As a result when SNMP polls those interfaces it always returns a result of UP regardless of how these interfaces are configured.

    Workaround: There is no workaround for this issue.

  • Issue 98136: For customer enterprises using a Hub/Spoke topology where Dynamic Branch to Branch VPN is configured, client users behind an SD-WAN Spoke Edge may observe that some traffic has unexpected latency resulting from the traffic using a sub-optimal path.

    Spoke Edge traffic that experiences this issue uses a route that was initially a non-uplink route for a Hub Edge not included in the Profile the Spoke Edge was using. A Dynamic Branch to Branch VPN tunnel can be formed from the Spoke Edge to the Hub Edge because of traffic being sent towards some other unrelated prefix and in this instance the non-uplink route is installed in the Spoke Edge.

    As a result of this non-uplink route, all traffic towards this prefix starts going through the Hub Edge and the non-uplink route becomes uplink (community change to uplink community) but the non-uplink route installed previously is not revoked and the traffic takes the Hub Edge path as long as the Dynamic Branch To Branch VPN tunnel remains up.

    Workaround: Wait for the Dynamic Branch to Branch VPN tunnel to tear down, after which the uplink route will not be installed in the Spoke Edge when a new Dynamic Branch to Branch VPN tunnel is formed towards the Hub Edge.

  • Issue 98979: A VMware SD-WAN Edge may experience a reboot due to an out of memory condition.

    Depending on how much memory the Edge has allocated during runtime, a subsequent core generation may trigger a kernel out of memory condition, which in turn triggers a kernel panic and a resulting reboot. An Edge reboot takes about 2-3 minutes to complete, and customer traffic would be disrupted until the Edge completes the reboot.

    Workaround: There is no workaround for this issue.

  • Issue 110561: A Dynamic Branch to Branch tunnel may not come up and client users may observe latency for traffic intended for that tunnel, as the flows are instead routed through an SD-WAN Gateway.

    When there are a large number (~4K tunnels between ~4K Edges) of Dynamic Branch to Branch tunnels required due to high traffic between branch Edges, a few tunnels may not come up.

    Workaround: There is no workaround. However, despite the Branch to Branch tunnels not coming up, flows between impacted branches will continue via the Gateway.

  • Issue 113291: When a user runs the Remote Diagnostics "Troubleshoot BGP - Show BGP Neighbor details", "Troubleshoot BGP - Show BGP Neighbor Received Routes", or "Troubleshoot BGP - Show BGP Neighbor Learned Routes" the VMware SASE Orchestrator may return an error with no results.

    The error reads No such neighbor in this view/vrf. This occurs if there is a stale BGP entry and is caused by a logical issue in how the diagnostic fetches the vrf name: where an Edge has a stale entry, the command returns two entries with the first being the stale one.

    Workaround: The stale BGP entry needs to be deleted to ensure a successful Remote Diagnostic.

  • Issue 115089: For a customer enterprise site where the SD-WAN Edge is using version 4.5.1 or 4.5.2, client users may observe poor traffic performance including one-way VoIP calls.

    With this issue, the Edge drops traffic/packets with an encapsulation error resulting from an ARP lookup failure. The encapsulation or ARP issue is observed for an interface which was working before.

    Workaround: Disable underlay accounting for the impacted Edge interface, or upgrade the Edge to a 5.x build.

  • Issue 118704: A user may observe abnormally high latency values for paths measured between SD-WAN Edges and SD-WAN Gateways even though actual Edge-to-Gateway packet latency is much lower.

    A race condition has been identified with clock synchronization resulting in latency values measured incorrectly. This issue is cosmetic and there is no performance impact to customer traffic but it does negatively impact a customer's ability to properly monitor Edge links and paths.

    Workaround: Clock synchronization can be reset by restarting the Edge Service. This can be done using the Orchestrator UI by navigating to the Diagnostics > Remote Actions page, and then checking the affected Edge and selecting the Restart Service option.

  • Issue 119544: For a customer enterprise that deploys a Cloud Security Service (CSS) with Layer 7 Health Check turned on, if the ICMP echo response is turned off on a VMware SD-WAN Edge's loopback interface, the L7 Health Check will fail with a resulting teardown of the CSS tunnel.

    When the Edge tries to send an L7 health check request (HTTP SYN packet) it will reach the loopback interface since ICMP echo response is turned off, which results in the dropping of HTTP packets. Since the L7 health check does not get an ACK for the SYN packet it has sent, the L7 health check fails and leads to a teardown of the CSS tunnel, with any traffic using the CSS disrupted.

    Workaround: Turn on Echo response on the Edge's loopback interface.

  • Issue 125509: A customer enterprise using lower end VMware SD-WAN Edge models may experience flaps for BFD, BGP, or OSPF, depending on the routing protocol being used.

    On entry level Edge platforms (510, 520, 540, 610, and 620) at a high flow scale and coupled with dynamic routing and/or High Availability configuration, OSPF/BGP routing flaps may be observed when aggressive Hello and Dead interval timers are configured. In addition, if the customer also uses Edge Network Intelligence with Analytics turned on, the potential to encounter this issue increases.

    Workaround: If experiencing this issue, the workaround is to revert to default interval timers for OSPF (10, 40) or BGP (60, 180) or disable BFD entirely.
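
    A configuration can be screened for timers that are more aggressive than the defaults quoted in the workaround. This is a minimal sketch, assuming the timers have been collected into a dictionary keyed by protocol; that structure and the function name are hypothetical.

```python
# Default timers quoted in the workaround: OSPF (10, 40) and BGP (60, 180).
DEFAULT_TIMERS = {"ospf": (10, 40), "bgp": (60, 180)}


def aggressive_timers(configured):
    """Return the protocols whose configured timers are shorter than the defaults."""
    flagged = {}
    for proto, (first, second) in configured.items():
        default_first, default_second = DEFAULT_TIMERS[proto]
        if first < default_first or second < default_second:
            flagged[proto] = (first, second)
    return flagged
```

    Any protocol returned by aggressive_timers is a candidate for reverting to the default values on the affected Edge models.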

  • Issue 126336: When deploying a Partner Gateway, BGP neighborship may not come up between a provider edge (PE) and the Partner Gateway.

    When this issue occurs, the BGP neighborship does not establish between the PE and the Gateway. The PE remains stuck in a connect state and does not send the ACK for a TCP handshake.

  • Issue 127024: A customer enterprise using lower end VMware SD-WAN Edge models may experience flaps for BFD, BGP, or OSPF, depending on the routing protocol being used.

    On entry level Edge platforms (510, 520, 540, 610, and 620) at a high flow scale (33K flows with 50k routes) and coupled with dynamic routing, OSPF/BGP routing flaps may be observed when aggressive Hello and Dead interval timers are configured.

    Workaround: If experiencing this issue, the workaround is to revert to default interval timers for OSPF (10, 40) or BGP (60, 180) or disable BFD entirely.

  • Issue 131122: In some cases, UDP traffic which matches a configured Business Policy rule which includes a Policy-Based NAT (PBNAT) may not be steered as expected.

    The SD-WAN Gateway may not see the QoS synchronization for certain UDP flows (for example, DNS traffic). This results in UDP packets potentially not getting the expected routing and business policy rules applied. When the particular flow consists of a single packet (like a DNS flow) the impact can be significant as the entire flow is incorrectly steered.

    Workaround: This is a timing-based issue that does not occur consistently and has no workaround.

  • Issue 134088: ICMP probes may fail on a Partner Gateway while a BGP session is up to the same neighbor.

    In pre-5.x software versions, incoming ICMP probe response packets from the hand-off side are processed post-route lookup. The result of this route lookup caused these packets to be dropped in some instances, resulting in ICMP probes failing even though there is connectivity.

    Workaround: The Partner Gateway should be upgraded to a 5.x software version where this issue is resolved.

  • Issue 142366: For a customer enterprise site connected to Partner Gateways where one or more static routes are configured, client users working behind an SD-WAN Edge may observe intermittent traffic loss if a static route via the Primary Partner Gateway is unreachable.

    When the same static route is reachable via two or more Partner Gateways, if the route via the Partner Gateway in the Primary role is unreachable, traffic from an Edge can experience intermittent traffic loss. This issue is the result of the Edge API failing to properly check for reachability on a route lookup which causes the Edge to continue to use the Primary Partner Gateway even though reachability is false.

    Workaround: The issue can be temporarily remediated by the Partner shutting down the Primary Partner Gateway until the static route becomes reachable again. Shutting down the Primary Partner Gateway prevents the Edge from including it in route reachability lookups and ensures traffic matching that static route uses a secondary Partner Gateway. However this can be disruptive for all customers using this Partner Gateway as their Primary Gateway and should be done in a maintenance window by the Partner if possible.
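
    The expected behavior, where reachability is checked during route lookup, can be illustrated as follows. This is a minimal sketch of the intended selection logic only, not the Edge's implementation; the function name and the list-of-tuples input are assumptions.

```python
def select_partner_gateway(candidates):
    """Return the first reachable Partner Gateway from a list ordered by preference.

    candidates: list of (gateway_name, reachable) tuples, Primary first.
    Returns None when no candidate is reachable.
    """
    for name, reachable in candidates:
        if reachable:
            return name
    return None
```

    With the Primary marked unreachable, this selection falls through to the secondary Partner Gateway, which is the outcome the workaround forces by shutting down the Primary.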

Orchestrator Known Issues

  • Issue 21342:

    When assigning Partner Gateways per-segment, the proper list of Gateway Assignments may not show under the Operator option "View" Gateways on the VMware SD-WAN Edge monitoring list.

  • Issue 24269:

    Monitor > Transport > Loss does not graph observed WAN link loss, while the QoE graphs do reflect this loss.

  • Issue 25932:

    The VMware SD-WAN Orchestrator allows VMware SD-WAN Gateways to be removed from the Gateway Pool even when they are in use.

  • Issue 32335:

    The ‘End User Service Agreement’ (EUSA) page throws an error when a user is trying to accept the agreement.

    Workaround: Ensure no leading or trailing spaces are found in Enterprise Name.

  • Issue 32435:

    A VMware SD-WAN Edge override for a policy-based NAT configuration is permitted for tuples which are already configured at the profile level and vice versa.

  • Issue 32913:

    After Enabling High Availability, Multicast details for the VMware SD-WAN Edge are not displayed on the Monitoring Page. A failover resolves the issue.

  • Issue 33026:

    The ‘End User Service Agreement’ (EUSA) page does not reload properly after deleting the agreement.

  • Issue 35658:

    When a VMware SD-WAN Edge is moved from one profile to another which has a different CSS setting (e.g. IPsec in profile1 to GRE in profile2), the Edge level CSS settings will continue to use the previous CSS settings (e.g. IPsec versus GRE).

    Workaround: Deactivate and then reenable GRE at the Edge level to resolve the issue.

  • Issue 35667:

    When a VMware SD-WAN Edge is moved from one profile to another profile which has the same CSS setting but a different GRE CSS name (the same endpoints), some GRE tunnels will not show in monitoring.

    Workaround: Deactivate and then reenable GRE at the Edge level to resolve the issue.

  • Issue 36665:

    If the VMware SD-WAN Orchestrator cannot reach the internet, user interface pages that require accessing the Google Maps API may fail to load entirely.

  • Issue 38056:

    The Edge-Licensing export.csv file may not show the region data.

  • Issue 38843:

    When pushing an application map, there is no Operator event, and the Edge event is of limited utility.

  • Issue 39633:

    The Super Gateway hyper link does not work after a user assigns the Alternate Gateway as the Super Gateway.

  • Issue 39790:

    The VMware SD-WAN Orchestrator allows a user to configure a VMware SD-WAN Edge’s routed interface to have greater than the supported 32 subinterfaces, creating the risk that a user can configure 33 or more subinterfaces on an interface which would cause a Dataplane Service Failure for the Edge.
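
    Because the Orchestrator does not enforce the limit, a configuration can be checked before it is applied. This is a minimal sketch, assuming the subinterface counts have been gathered into a dictionary keyed by interface name; that structure and the function name are hypothetical.

```python
MAX_SUBINTERFACES = 32  # supported maximum per routed interface, per the text above


def interfaces_over_limit(subinterface_counts, limit=MAX_SUBINTERFACES):
    """Return the sorted names of interfaces configured with more than the limit."""
    return sorted(
        name for name, count in subinterface_counts.items() if count > limit
    )
```

    An empty result means no routed interface exceeds the supported 32 subinterfaces.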

  • Issue 40341:

    Though the Skype application is properly categorized on the backend as Real Time traffic, when editing the Skype Business Policy on the VMware SD-WAN Orchestrator, the Service Class may erroneously display “Transactional”.

  • Issue 41691:

    User cannot change the 'Number of addresses' field although the DHCP pool is not exhausted on the Configure > Edge > Device page.

  • Issue 43276:

    User cannot change the Segment type when a VMware SD-WAN Edge or Profile has a partner gateway configured.

  • Issue 47269:

    The VMware SD-WAN 510-LTE interface may appear for Edge models that do not support an LTE interface.

  • Issue 47713:

    If a Business Policy Rule is configured while Cloud VPN is not enabled, the NAT configuration must be reconfigured upon enabling Cloud VPN.

  • Issue 47820:

    If a VLAN is configured with DHCP not enabled at the Profile level, while also having an Edge Override for this VLAN on that Edge with DHCP enabled, and there is an entry for the DNS server field set to none (no IP configured), the user will be unable to make any changes on the Configure > Edge > Device page and will get an error message of ‘invalid IP address []’ that does not explain or point to the actual problem.

  • Issue 48085:

    The VMware SD-WAN Orchestrator allows a user to delete a VLAN which is associated with an interface.

  • Issue 48737:

    On a VMware SD-WAN Orchestrator which is using the Release 4.0.0 new user interface, if a user is on a Monitor page and changes the Start & End time interval and then navigates between tabs, the Orchestrator does not update Start & End interval time to the new values.

  • Issue 49225:

    VMware SD-WAN Orchestrator does not enforce a limit of 32 total VLANs.

  • Issue 49790:

    When a VMware SD-WAN Edge is activated to Release 4.0.0, the activation is posted twice in Events.

    Workaround: Ignore the duplicate event.

  • Issue 50531:

    When two Operators of differing privileges use the same browser window when accessing the New UI on a 4.0.0 Release version of the VMware SD-WAN Orchestrator, and the Operator with lesser privileges tries to log in after the Operator with higher privileges, that lesser privileged Operator will observe multiple errors stating that the "user does not have privilege".

    Note:

    There is no escalation in privileges for the Operator with lower privileges, only the display of error messages.

    Workaround: The next operator may refresh that page prior to logging in to prevent seeing the errors, or each Operator may use different browser windows to avoid this display issue.

  • Issue 51722: On the Release 4.0.0 VMware SD-WAN Orchestrator, the time range selector is no greater than two weeks for any statistic in the Monitor > Edge tabs.

    The time range selector does not show options greater than "Past 2 Weeks" in Monitor > Edge tabs even if the retention period for a set of statistics is much longer than 2 weeks. For example, flow and link statistics are retained for 365 days by default (which is configurable), while path statistics are retained only for 2 weeks by default (also configurable). As a result, all Monitor tabs conform to the statistic type with the shortest retention period instead of allowing a user to select a time period that is consistent with the retention period for that statistic.

    Workaround: A user may use the "Custom" option in the time range selector to see data for more than 2 weeks.

  • Issue 60039: RMA Reactivation does not work when the VMware SD-WAN Edge model is changed.

    When performing an RMA Reactivation for a site where the Edge model is also being changed, the VMware SD-WAN Orchestrator does not save the model change, making the reactivation link ineffective. This only affects RMA Reactivations where the Edge model is changed; an RMA Reactivation where the Edge model remains the same will work as expected.

    Workaround: If using a different Edge model for a site, the user would need to create a new Edge and manually apply all Edge-specific settings.

  • Issue 60522: On the VMware SASE Orchestrator UI, the user observes a large number of error messages when they try to remove a segment.

    The issue can be observed when adding a segment to a profile and then associating the segment with multiple VMware SD-WAN Edges. When the user attempts to remove the added segment from the profile, they will see a large number of error messages.

    Workaround: There is no workaround for this issue.
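
Because the Orchestrator does not itself enforce the 32-subinterface limit per routed interface or the limit of 32 total VLANs (Issue 49225), an administrator may want to check a planned configuration by hand before applying it. The following is a minimal sketch of such a check, not a VMware tool: the function name, the input shapes, and the interface names are hypothetical, and the counts are supplied by the user rather than read from any Orchestrator API.

```python
from typing import Dict, List

# Limits taken from the known issues in these release notes.
MAX_SUBINTERFACES = 32  # supported subinterfaces per routed interface
MAX_VLANS = 32          # total VLAN limit (Issue 49225)


def check_limits(subinterfaces_by_interface: Dict[str, int],
                 vlan_count: int) -> List[str]:
    """Return human-readable limit violations; an empty list means none.

    Both arguments are hypothetical inputs the user compiles from the
    planned configuration (interface name -> subinterface count, and the
    total number of VLANs).
    """
    problems: List[str] = []
    # Sort by interface name so the report order is deterministic.
    for name, count in sorted(subinterfaces_by_interface.items()):
        if count > MAX_SUBINTERFACES:
            problems.append(
                f"{name}: {count} subinterfaces exceeds {MAX_SUBINTERFACES}"
            )
    if vlan_count > MAX_VLANS:
        problems.append(f"{vlan_count} VLANs exceeds {MAX_VLANS}")
    return problems
```

For example, `check_limits({"GE3": 33}, 10)` reports the interface that exceeds the supported subinterface count, while a configuration at or below both limits yields an empty list.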
