In a large-scale environment with many transport nodes and VMs on ESXi hosts, the NSX agents that run on the ESXi hosts might time out when communicating with NSX Manager.

Problem

Some operations fail, for example attaching a VM vNIC to a logical switch. The /var/run/log/nsx-opsagent.log file contains messages such as:

level="ERROR" errorCode="MPA41542"] [MP_AddVnicAttachment] RPC call [0e316296-13-14] to NSX management plane timout
2017-05-15T05:32:13Z nsxa: [nsx@6876 comp="nsx-esx" subcomp="NSXA[VifHandlerThread:-2282640]" tid="1000017079" level="ERROR" errorCode="MPA42003"] [DoMpVifAttachRpc] MP_AddVnicAttachment() failed: RPC call to NSX management plane timout
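To confirm that a host is hitting this problem, you can scan the opsAgent log for ERROR entries that report the management-plane RPC timeout. The following is a minimal Python sketch, not a VMware tool. `find_mp_timeouts` and `scan_log` are hypothetical helper names. Matching on the literal text "NSX management plane timout" (spelled as it appears in the log lines above) is an assumption based only on these two sample messages.

```python
import re

# Assumption: timeout errors contain this exact phrase. The "timout" spelling
# is copied verbatim from the sample opsAgent log messages.
TIMEOUT_MARKER = "NSX management plane timout"
ERROR_CODE_RE = re.compile(r'errorCode="(MPA\d+)"')


def find_mp_timeouts(lines):
    """Return (error_code, line) pairs for ERROR lines reporting an MP RPC timeout."""
    hits = []
    for line in lines:
        # Skip anything that is not an ERROR entry about the MP timeout.
        if 'level="ERROR"' not in line or TIMEOUT_MARKER not in line:
            continue
        match = ERROR_CODE_RE.search(line)
        hits.append((match.group(1) if match else None, line.rstrip("\n")))
    return hits


def scan_log(path="/var/run/log/nsx-opsagent.log"):
    """Read the opsAgent log (default path from the text above) and return matches."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_mp_timeouts(log)
```

Calling `scan_log()` on an affected host should return entries with error codes such as MPA41542 and MPA42003. An empty result suggests the failures have a different cause.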

Cause

In a large-scale environment, some operations might take longer than usual and fail because they exceed the default timeout values.

Solution

  1. Increase the NSX agent timeout value.
    1. On the ESXi host, stop the NSX opsAgent with the following command:
      /etc/init.d/nsx-opsagent stop
    2. Edit the file /etc/vmware/nsx-opsagent/nsxa.json and change the vifOperationTimeout value from 25 to, for example, 55.
      "mp" : {
          /* timeout for VIF operation */
          "vifOperationTimeout" : 25,
      Note: This timeout value must be less than the hostd timeout value that you set in step 2.

    3. Start the NSX opsAgent with the following command:

      /etc/init.d/nsx-opsagent start

  2. Increase the hostd timeout value.
    1. On the ESXi host, stop the hostd agent with the following command:
      /etc/init.d/hostd stop
    2. Edit the file /etc/vmware/hostd/config.xml. Under <opaqueNetwork>, uncomment the entry for <taskTimeout> and change the value from 30 to, for example, 60.
      <opaqueNetwork>
          <!-- maximum message size allowed in opaque network manager IPC, in bytes. -->
          <!-- <maxMsgSize> 65536 </maxMsgSize> -->
          <!-- maximum wait time for opaque network response -->
          <!-- <taskTimeout> 30 </taskTimeout> -->
    3. Start the hostd agent with the following command:

      /etc/init.d/hostd start
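The two configuration edits and the ordering rule from the note (vifOperationTimeout must be less than the hostd taskTimeout) can be sketched in Python as pure text transformations. This is a minimal sketch, not a VMware tool. The function names `set_vif_timeout`, `set_task_timeout`, and `timeouts_consistent` are hypothetical. The sketch assumes the files are laid out as in the excerpts above: a single `"vifOperationTimeout" : <n>` pair in nsxa.json, and an `<opaqueNetwork>` element in config.xml that is closed by `</opaqueNetwork>`. It uses regular expressions rather than a JSON or XML parser because nsxa.json contains `/* ... */` comments, which Python's `json` module rejects.

```python
import re

# Assumption: nsxa.json stores the value as '"vifOperationTimeout" : <int>'.
VIF_TIMEOUT_RE = re.compile(r'("vifOperationTimeout"\s*:\s*)(\d+)')

# Assumption: <opaqueNetwork> has a matching close tag; edits stay inside it.
OPAQUE_RE = re.compile(r"<opaqueNetwork>.*?</opaqueNetwork>", re.DOTALL)

# A <taskTimeout> entry, optionally still wrapped in an XML comment.
TASK_TIMEOUT_RE = re.compile(
    r"(?:<!--\s*)?<taskTimeout>\s*\d+\s*</taskTimeout>(?:\s*-->)?"
)
XML_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def set_vif_timeout(nsxa_text, seconds):
    """Return nsxa.json text with vifOperationTimeout set to `seconds`."""
    new_text, count = VIF_TIMEOUT_RE.subn(
        lambda m: m.group(1) + str(seconds), nsxa_text
    )
    if count != 1:
        raise ValueError(f"expected 1 vifOperationTimeout entry, found {count}")
    return new_text


def set_task_timeout(config_text, seconds):
    """Return config.xml text with <taskTimeout> uncommented and set to `seconds`."""
    section = OPAQUE_RE.search(config_text)
    if section is None:
        raise ValueError("no <opaqueNetwork> section found")
    new_section, count = TASK_TIMEOUT_RE.subn(
        f"<taskTimeout> {seconds} </taskTimeout>", section.group(0)
    )
    if count != 1:
        raise ValueError(f"expected 1 taskTimeout entry, found {count}")
    return config_text[: section.start()] + new_section + config_text[section.end():]


def timeouts_consistent(nsxa_text, config_text):
    """True if vifOperationTimeout is below the active (uncommented) taskTimeout."""
    vif = VIF_TIMEOUT_RE.search(nsxa_text)
    section = OPAQUE_RE.search(config_text)
    if vif is None or section is None:
        return False
    # Only an uncommented <taskTimeout> counts, because the procedure requires
    # uncommenting it; a commented-out entry is treated as "not set".
    active = re.search(
        r"<taskTimeout>\s*(\d+)\s*</taskTimeout>",
        XML_COMMENT_RE.sub("", section.group(0)),
    )
    return active is not None and int(vif.group(2)) < int(active.group(1))
```

For example, while opsAgent and hostd are stopped, you might read each file, pass its contents through `set_vif_timeout(text, 55)` or `set_task_timeout(text, 60)`, and check that `timeouts_consistent(...)` returns True. Then write the results back before starting the services again. The functions raise an error instead of guessing when the expected entry is missing or appears more than once.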