In NSX for vSphere, the control plane agent (netcpa) works as a local agent daemon that communicates with NSX Manager and with the controller cluster. The Communication Channel Health feature is a proactive health check that periodically reports the status of the central control plane to local control plane channel to NSX Manager, where it is displayed in the NSX Manager UI. This report also serves as a heartbeat to detect the operational status of the NSX Manager to ESXi host netcpa channel. The feature provides error details during communication faults, generates an event when a channel goes into a wrong state, and generates heartbeat messages from NSX Manager to hosts.

Problem

Connectivity issues between netcpa and the controllers.

If any connection is missing, netcpa may not be working properly.

Procedure

  1. Validate the event log message when the channel goes into a wrong state by using the following API call:

    GET https://<vsm_host_ip>/api/2.0/vdn/inventory/host/{hostId}/connection/status

    The following is an example of the response:

    <?xml version="1.0" encoding="UTF-8"?>
    <hostConnStatus>
      <hostName>10.161.246.20</hostName>
      <hostId>host-21</hostId>
      <nsxMgrToFirewallAgentConn>UP</nsxMgrToFirewallAgentConn>
      <nsxMgrToControlPlaneAgentConn>UP</nsxMgrToControlPlaneAgentConn>
      <hostToControllerConn>DOWN</hostToControllerConn>
      <fullSyncCount>-1</fullSyncCount>
      <hostToControllerConnectionErrors>
        <hostToControllerConnectionError>
          <controllerIp>10.160.203.236</controllerIp>
          <errorCode>1255604</errorCode>
          <errorMessage>Connection Refused</errorMessage>
        </hostToControllerConnectionError>
        <hostToControllerConnectionError>
          <controllerIp>10.160.203.237</controllerIp>
          <errorCode>1255603</errorCode>
          <errorMessage>SSL Handshake Failure</errorMessage>
        </hostToControllerConnectionError>
      </hostToControllerConnectionErrors>
    </hostConnStatus>

    The following error codes are supported:

    1255602: Incomplete Controller Certificate
    1255603: SSL Handshake Failure
    1255604: Connection Refused
    1255605: Keep-alive Timeout
    1255606: SSL Exception
    1255607: Bad Message
    1255620: Unknown Error

  2. Validate the connections to the controllers from netcpa using the following command:

    esxcli network ip connection list | grep 1234

  3. Check whether any connections from netcpa to the controllers are in the CLOSED or CLOSE_WAIT state by running the following command:

    esxcli network ip connection list | grep "1234.*netcpa*" | egrep "CLOSED|CLOSE_WAIT"

  4. If netcpa has been down for a long time, the connections may not be present at all. To validate this, run the following command. On a healthy host, the output shows one ESTABLISHED connection for each controller.

    esxcli network ip connection list | grep "1234.*netcpa*" | grep ESTABLISHED

  5. To fix the issue, restart netcpa as follows:
    1. Log in as root to the ESXi host through SSH or through the console.
    2. Run the /etc/init.d/netcpad restart command to restart the netcpa agent on the ESXi host.
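
The checks in steps 1 through 4 can also be scripted. The following is a minimal Python sketch, not part of the product: it assumes that the connection status response from step 1 has already been retrieved as text and that the output of esxcli network ip connection list has been captured as a list of lines. The function names controller_errors and netcpa_states are hypothetical and chosen only for illustration. The error code table is copied from step 1, and the line filter mirrors the grep pattern used in steps 3 and 4.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Error codes as listed in step 1 of the procedure.
ERROR_CODES = {
    1255602: "Incomplete Controller Certificate",
    1255603: "SSL Handshake Failure",
    1255604: "Connection Refused",
    1255605: "Keep-alive Timeout",
    1255606: "SSL Exception",
    1255607: "Bad Message",
    1255620: "Unknown Error",
}


def controller_errors(status_xml):
    """Parse a hostConnStatus document.

    Returns the hostToControllerConn value and a list of
    (controller_ip, error_code, meaning) tuples, one per reported error.
    """
    root = ET.fromstring(status_xml.strip())
    state = root.findtext("hostToControllerConn")
    errors = []
    for err in root.iter("hostToControllerConnectionError"):
        code = int(err.findtext("errorCode"))
        meaning = ERROR_CODES.get(code, "Unrecognized code")
        errors.append((err.findtext("controllerIp"), code, meaning))
    return state, errors


def netcpa_states(connection_lines):
    """Count TCP states on lines that match the 1234/netcpa filter.

    Hypothetical helper: it only looks for the three states named in
    the procedure and ignores every other line.
    """
    counts = Counter()
    for line in connection_lines:
        if not re.search(r"1234.*netcpa", line):
            continue
        match = re.search(r"\b(ESTABLISHED|CLOSE_WAIT|CLOSED)\b", line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

With the example response from step 1, controller_errors returns the state DOWN and one entry for each of the two controllers. If netcpa_states reports fewer ESTABLISHED connections than there are controllers, or any CLOSED or CLOSE_WAIT entries, that points to the condition that step 5 addresses.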