The most common failure scenarios fall into two categories: configuration issues and control-plane issues. Management-plane issues are possible but uncommon.

Configuration Issues and Fixes

Common configuration issues and their effects are described in Table 1.

Table 1. Common Configuration Issues and Effects

| Issues | Effects |
| --- | --- |
| Protocol and forwarding IP addresses are reversed for dynamic routing | Dynamic protocol adjacency won’t come up |
| Transport zone is not aligned to the DVS boundary | Distributed routing does not work on a subset of ESXi hosts (those missing from the transport zone) |
| Dynamic routing protocol configuration mismatch (timers, MTU, BGP ASN, passwords, interface to OSPF area mapping) | Dynamic protocol adjacency does not come up |
| DLR HA interface is assigned an IP address and redistribution of connected routes is enabled | DLR Control VM might attract traffic for the HA interface subnet and blackhole the traffic |

To resolve these issues, review the configuration and correct it as needed.
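As one way to review the transport zone alignment issue, the following minimal Python sketch compares two host lists and reports DVS members that are missing from the transport zone. The function name and the host names are hypothetical, and the sketch assumes you have already exported both lists (for example, from vCenter and NSX Manager); it does not query either system.

```python
def hosts_missing_from_transport_zone(dvs_hosts, transport_zone_hosts):
    """Return DVS member hosts that are not part of the transport zone."""
    return sorted(set(dvs_hosts) - set(transport_zone_hosts))


# Hypothetical inventory, for illustration only.
dvs_hosts = ["esxi-01", "esxi-02", "esxi-03", "esxi-04"]
transport_zone_hosts = ["esxi-01", "esxi-02"]

missing = hosts_missing_from_transport_zone(dvs_hosts, transport_zone_hosts)
print(missing)  # ['esxi-03', 'esxi-04']
```

Any host that appears in the result is a candidate for the "distributed routing does not work on a subset of ESXi hosts" symptom.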

When necessary, use the debug ip ospf or debug ip bgp CLI commands and observe logs on the DLR Control VM or on the ESG console (not via SSH session) to detect protocol configuration issues.
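Before enabling debugging, it can help to compare the two peers' settings side by side. The sketch below is an illustration under stated assumptions, not an NSX API: the parameter names in MUST_MATCH and the sample values are hypothetical, and the settings are assumed to have been collected by hand from each device. It covers parameters that must be identical on both sides; a BGP ASN check would instead compare the local AS on one side with the remote AS configured on the other.

```python
# Parameters that must agree between two peers (assumed names, illustration only).
MUST_MATCH = ("hello_interval", "dead_interval", "mtu", "area_id", "password")


def find_mismatches(local, peer, keys=MUST_MATCH):
    """Return {key: (local_value, peer_value)} for every key whose values differ."""
    return {
        key: (local.get(key), peer.get(key))
        for key in keys
        if local.get(key) != peer.get(key)
    }


# Hypothetical settings collected from the DLR and its upstream neighbor.
dlr = {"hello_interval": 10, "dead_interval": 40, "mtu": 1500,
       "area_id": 51, "password": "s3cret"}
upstream = {"hello_interval": 10, "dead_interval": 40, "mtu": 9000,
            "area_id": 51, "password": "s3cret"}

print(find_mismatches(dlr, upstream))  # {'mtu': (1500, 9000)}
```

An empty result means the compared parameters agree, so the cause is more likely one of the other issues in Table 1.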

Control-Plane Issues and Fixes

Control-plane issues are often caused by the following:

  • Host Control Plane Agent (netcpa) being unable to connect to NSX Manager through the message bus channel provided by vsfwd

  • Controller cluster having issues with handling the master role for DLR/VXLAN instances
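A basic TCP reachability test can help rule out plain network problems before looking at netcpa or vsfwd themselves. The following is a minimal sketch: the helper names are hypothetical, and the ports you pass in (commonly TCP 5671 for the NSX Manager message bus and TCP 1234 for the NSX Controllers) are assumptions to verify against the port requirements for your release. A successful connection only shows that the port is reachable from the machine running the script; it does not prove that the agent on the ESXi host is connected.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def check_all(targets, timeout=3.0):
    """Map each label to the reachability of its (host, port) pair."""
    return {
        label: can_connect(host, port, timeout)
        for label, (host, port) in targets.items()
    }
```

For example, calling check_all({"NSX Manager message bus": ("192.0.2.10", 5671), "NSX Controller": ("192.0.2.21", 1234)}) with your own addresses returns a dictionary of True or False values per target.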

Controller cluster issues related to the handling of master roles can often be resolved by restarting one of the NSX Controllers (run restart controller on the Controller’s CLI).

For more information about troubleshooting control-plane issues, see http://kb.vmware.com/kb/2125767.