The most common failure scenarios fall into two categories: configuration issues and control-plane issues. Management-plane issues are possible but uncommon.
Configuration Issues and Fixes
Common configuration issues and their effects are described in the table Common Configuration Issues and Effects.
Issues | Effects |
---|---|
Protocol and forwarding IP addresses are reversed for dynamic routing | Dynamic protocol adjacency does not come up |
Transport zone is not aligned to the DVS boundary | Distributed routing does not work on a subset of ESXi hosts (those missing from the transport zone) |
Dynamic routing protocol configuration mismatch (timers, MTU, BGP ASN, passwords, interface to OSPF area mapping) | Dynamic protocol adjacency does not come up |
DLR HA interface is assigned an IP address and redistribution of connected routes is enabled | DLR Control VM might attract traffic for the HA interface subnet and blackhole the traffic |
To resolve these issues, review the configuration and correct it as needed.
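Two of the rows in the table reduce to simple checks that can be scripted. The following Python sketch is illustrative only. The host names, function names, and input values are hypothetical, and it assumes the host lists and DLR settings have already been collected by hand or exported from your inventory. It does not call any NSX API.

```python
def hosts_missing_from_transport_zone(dvs_hosts, transport_zone_hosts):
    """Return hosts that are on the DVS but not in the transport zone."""
    return sorted(set(dvs_hosts) - set(transport_zone_hosts))


def ha_interface_blackhole_risk(ha_interface_ip, redistribute_connected):
    """True when the HA interface has an IP address and connected routes are redistributed."""
    return bool(ha_interface_ip) and bool(redistribute_connected)


# Hypothetical inventory, entered by hand for illustration.
dvs_hosts = ["esxi-01", "esxi-02", "esxi-03", "esxi-04"]
transport_zone_hosts = ["esxi-01", "esxi-02"]

missing = hosts_missing_from_transport_zone(dvs_hosts, transport_zone_hosts)
print("Hosts missing from the transport zone:", missing)

# Hypothetical DLR settings: HA interface has an IP and connected routes are redistributed.
risk = ha_interface_blackhole_risk("192.168.10.1", redistribute_connected=True)
print("HA interface blackhole risk:", risk)
```

Any host reported as missing is a candidate for the "distributed routing does not work on a subset of ESXi hosts" symptom, and a `True` risk flag corresponds to the HA interface row in the table.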
When necessary, use the debug ip ospf or debug ip bgp CLI commands and observe logs on the DLR Control VM or on the ESG console (not via SSH session) to detect protocol configuration issues.
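Before turning on debugging, it can help to compare the routing parameters of the two peers side by side. The following Python sketch shows one way to do that for the mismatch categories listed in the table (timers, MTU, BGP ASN, passwords, and interface to OSPF area mapping). The field names, dictionary layout, and sample values are assumptions made for this example. They are not an NSX data model, and the values would need to be copied in from each device's configuration.

```python
# Hypothetical field names for OSPF parameters that must agree on both neighbors.
OSPF_FIELDS = ("hello_interval", "dead_interval", "mtu", "area_id", "password")


def find_mismatches(local, peer, fields):
    """Return {field: (local_value, peer_value)} for each field whose values differ."""
    return {
        field: (local.get(field), peer.get(field))
        for field in fields
        if local.get(field) != peer.get(field)
    }


def bgp_asn_mismatch(local, peer):
    """True when either side's configured remote AS differs from the other side's local AS."""
    return (local["remote_as"] != peer["local_as"]
            or peer["remote_as"] != local["local_as"])


# Hypothetical values transcribed from a DLR and an ESG.
dlr_ospf = {"hello_interval": 10, "dead_interval": 40, "mtu": 1500,
            "area_id": 10, "password": "s3cret"}
esg_ospf = {"hello_interval": 10, "dead_interval": 40, "mtu": 1600,
            "area_id": 10, "password": "s3cret"}

mismatches = find_mismatches(dlr_ospf, esg_ospf, OSPF_FIELDS)
print(mismatches)  # {'mtu': (1500, 1600)}

dlr_bgp = {"local_as": 65001, "remote_as": 65002}
esg_bgp = {"local_as": 65002, "remote_as": 65001}
asn_problem = bgp_asn_mismatch(dlr_bgp, esg_bgp)
print(asn_problem)  # False
```

An empty result from `find_mismatches` does not prove the adjacency will form. It only rules out the listed parameters, so the debug commands and logs remain the authoritative check.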
Control-Plane Issues and Fixes
Control-plane issues are often caused by the following:
- Host Control Plane Agent (netcpa) being unable to connect to NSX Manager through the message bus channel provided by vsfwd
- Controller cluster having issues with handling the master role for DLR/VXLAN instances
Controller cluster issues related to handling of master roles can often be resolved by restarting one of the NSX Controllers (run restart controller in the Controller’s CLI).
For more information about troubleshooting control-plane issues, see Troubleshooting NSX Controller.