When an IPSec VPN tunnel becomes unstable, start basic troubleshooting by gathering the NSX Data Center for vSphere product logs. You can then set up packet capture sessions on the data path and run NSX Edge CLI commands to determine the causes of the instability.

Use the following procedure to troubleshoot the causes of IPSec VPN tunnel instability.

Prerequisites

Before setting up packet capture sessions on the data path, ensure that the following requirements are met:
  • Packets can be sent and received on the UDP ports 500 and 4500.
  • Firewall rules permit data traffic to pass through ports 500 and 4500.
  • Firewall rules permit Encapsulating Security Payload (ESP) packets.
  • Local subnet routing over the IPSec interface is correctly configured.
  • The MTU configuration has been checked for fragmentation issues by sending a small ping payload and then a larger ping payload to the IP address at the far end of the tunnel.
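
The MTU check in the last prerequisite can be sketched as follows. This is a minimal sketch, not an NSX tool: it assumes a Linux host with the iputils `ping` command, where `-M do` forbids fragmentation, and the function names, the payload sizes, and the address used in the usage note are placeholders chosen for illustration.

```python
import subprocess

ICMP_HEADER = 8   # bytes
IPV4_HEADER = 20  # bytes, assuming no IP options


def ping_command(target, payload_size, count=3):
    """Build an iputils ping command with the Don't Fragment bit set."""
    return ["ping", "-M", "do", "-c", str(count), "-s", str(payload_size), target]


def packet_size(payload_size):
    """Size of the IPv4 packet that carries an ICMP payload of this size."""
    return payload_size + ICMP_HEADER + IPV4_HEADER


def probe(target, payload_sizes=(64, 1472)):
    """Ping with a small and then a larger payload; return {size: succeeded}."""
    results = {}
    for size in payload_sizes:
        completed = subprocess.run(
            ping_command(target, size), capture_output=True, text=True
        )
        # ping exits with status 0 only if at least one reply was received.
        results[size] = completed.returncode == 0
    return results
```

For example, `probe("192.0.2.10")` (a documentation placeholder for the IP address at the far end of the tunnel) returns a dictionary such as `{64: True, 1472: False}`. If the small payload succeeds and the larger one fails, an MTU or fragmentation problem along the path is a likely suspect.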

Procedure

  1. Gather the support logs from both sites.
    Important: Due to limited disk space on the NSX Edge appliance, you must redirect logs to a Syslog server. For more information, see the "Configure Remote Syslog Servers" section in the NSX Data Center for vSphere Administration Guide.
  2. Ensure that the IPSec VPN service on the NSX Edge is configured correctly to work with third-party hardware VPN firewall solutions, such as SonicWall and Watchguard. If necessary, contact the VPN vendor for any specific configuration information that you need.
  3. Set up a packet capture of IKE packets or ESP packets between the NSX Edge and third-party firewall.
  4. On the NSX Edge, record the real-time status while the issue is occurring. Run the following commands and record the results.
    Command                  Purpose
    show service ipsec       Check the status of the IPSec VPN service.
    show service ipsec sp    Check the status of the Security Policy.
    show service ipsec sa    Check the status of the Security Association (SA).
  5. While the issue is still occurring, capture the IPSec-related logs and output on the third-party VPN solution.
  6. Review the IPSec-related logs and output to identify issues. Verify that the IPSec VPN service is running, security policies are created, and security associations between the devices are configured.
    Common issues that you can spot from the logs are as follows:
    • Invalid ID: INVALID_ID_INFORMATION or PAYLOAD_MALFORMED
    • No trusted CA: INVALID_KEY_INFORMATION or a more specific error. For example, no RSA public key known for 'C=CN, ST=BJ, O=VMWare, OU=CINS, CN=left', or PAYLOAD_MALFORMED.
    • Proposed proxy-id is not found: INVALID_ID_INFORMATION or PAYLOAD_MALFORMED.
    • DPD no response from peer. For example, DPD: No response from peer - declaring peer dead.
  7. Check the tunnel failure message in the vSphere Web Client, in the NSX Edge CLI, or by using the NSX Data Center for vSphere REST APIs.
    For example, to view the failure message in the vSphere Web Client, double-click the NSX Edge, navigate to the IPSec VPN page, and do these steps:
    1. Click Show IPSec Statistics.
    2. Select the IPSec channel that is down.
    3. For the selected channel, select the tunnel that is down (disabled), and view the details of the tunnel failure.
      • In NSX 6.4.6 and later, click Disabled in the Tunnel State column.
      • In NSX 6.4.5 and earlier, click View Details in the Tunnel State column.

    The following table lists the possible causes of IPSec tunnel connectivity issues and the failure message that is associated with each cause.

    Cause                                          Failure message
    IKEv1 peer is not reachable.                   Version-IKEv1 Retransmitting IKE Message as no response from Peer.
    Mismatch in IKEv1 Phase 1 proposal.            Version-IKEv1 No Proposal Chosen. Check configured Encryption/Authentication/DH/IKE-Version.
    Mismatch in any one of the following:          Version-IKEv1 Authentication Failed. Check the configured secret or local/peer ID configuration.
    • IKEv1 PSK
    • IKEv1 ID
    • IKEv1 certificate
    Mismatch in IKEv1 Phase 2 proposal.            IPSec-SA Proposals or Traffic Selectors did not match.
    IKEv2 peer is not reachable.                   Version-IKEv2 Retransmitting IKE Message as no response from Peer.
    Mismatch in IKEv2 IKE SA proposal.             Version-IKEv2 No Proposal Chosen. Check configured Encrypt/Authentication/DH/IKEversion.
    Mismatch in IKEv2 IPSec SA proposal.           IPSec-SA Proposals or Traffic Selectors did not match.
    Mismatch in IKEv2 IPSec SA traffic selectors.  Traffic selectors did not match. Check left/right subnet configuration.
    Mismatch in any one of the following:          Version-IKEv2 Authentication Failed. Check the configured secret or local/peer ID configuration.
    • IKEv2 PSK
    • IKEv2 ID
    • IKEv2 certificate
  8. While the issue is still occurring, capture the runtime state, traffic state, and the packet capture sessions on the entire data path.
    To determine where the traffic has issues, ping from a private subnet on one side of the IPSec tunnel to another private subnet on the other side of the IPSec tunnel.
    1. Set up packet capture at points 1, 2, 3, and 4, as shown in the following figure.
    2. Ping from VM 1 to Host 2.
    3. Ping from Host 2 to VM 1.
    4. Check at which point the packet transfer failed or got dropped.
    Figure 1. Packet Capture at Various Points on the Data Path

    Diagram shows points 1, 2, 3, and 4 where you can capture packets.
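
The final sub-step of step 8, checking at which point the packet was dropped, can be sketched as a small helper. This is an illustrative sketch only. It assumes that you have already noted, for each capture point, whether the ping packets appeared in the capture, and that a ping from VM 1 to Host 2 passes points 1 through 4 in that order (and the reverse order from Host 2 to VM 1). The function names and the data layout are hypothetical, so adjust the traversal order to match your topology.

```python
# Assumed traversal order of the capture points; adjust to your topology.
VM1_TO_HOST2 = [1, 2, 3, 4]
HOST2_TO_VM1 = [4, 3, 2, 1]


def first_missing_point(seen, path):
    """Return the first capture point on `path` where the packet was not seen.

    seen: dict mapping a capture point to True if the packet appeared there.
    path: capture points in the order the packet is expected to cross them.
    Returns None if the packet was seen at every point.
    """
    for point in path:
        if not seen.get(point, False):
            return point
    return None


def describe(seen, path):
    """Summarize where the packet was last seen and where it went missing."""
    missing = first_missing_point(seen, path)
    if missing is None:
        return "packet seen at every capture point"
    index = path.index(missing)
    if index == 0:
        return f"packet never reached point {missing}"
    return f"packet seen at point {path[index - 1]} but not at point {missing}"
```

For example, if the ping from VM 1 appears in the captures at points 1 and 2 only, `describe({1: True, 2: True}, VM1_TO_HOST2)` reports that the packet was seen at point 2 but not at point 3, which narrows the investigation to the segment between those two points.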