You can run several commands to determine whether the NSX Edge is in a good state.

Edge Diagnosis

  • Check if vmtoolsd is running with this command:
    nsxedge> show process list
    Perimeter-Gateway-01-0> show process list
    %CPU %MEM    VSZ   RSZ STAT  STARTED     TIME COMMAND
     0.0  0.1   4244   720 Ss     May 16 00:00:15 init [3]
    ...
     0.0  0.1   4240   640 S      May 16 00:00:00 logger -p daemon debug -t vserrdd
     0.2  0.9  57192  4668 S      May 16 00:23:07 /usr/local/bin/vmtoolsd --plugin-pa
     0.0  0.4   4304  2260 SLs    May 16 00:01:54 /usr/sbin/watchdog
     ...
    
  • Check whether the Edge is in a good state by running the show eventmgr command. Use the output to verify that the query command is received and processed.
    nsxedge> show eventmgr
    -----------------------
    messagebus     : enabled
    debug          : 0
    profiling      : 0
    cfg_rx         : 1
    cfg_rx_msgbus  : 0
    cfg_rx_err     : 0
    cfg_exec_err   : 0
    cfg_resp       : 0
    cfg_resp_err   : 0
    cfg_resp_ln_err: 0
    fastquery_rx   : 0
    fastquery_err  : 0
    clearcmd_rx    : 0
    clearcmd_err   : 0
    ha_rx          : 0
    ha_rx_err      : 0
    ha_exec_err    : 0
    status_rx      : 16
    status_rx_err  : 0
    status_svr     : 10
    status_evt     : 0
    status_evt_push: 0
    status_ha      : 0
    status_ver     : 1
    status_sys     : 5
    status_cmd     : 0
    status_svr_err : 0
    status_evt_err : 0
    status_sys_err : 0
    status_ha_err  : 0
    status_ver_err : 0
    status_cmd_err : 0
    evt_report     : 1
    evt_report_err : 0
    hc_report      : 10962
    hc_report_err  : 0
    cli_rx         : 2
    cli_resp       : 1
    cli_resp_err   : 0
    counter_reset  : 0
    ---------- Health Status -------------
    system status  : good
    ha state       : active
    cfg version    : 7
    generation     : 0
    server status  : 1
    syslog-ng      : 1
    haproxy        : 0
    ipsec          : 0
    sslvpn         : 0
    l2vpn          : 0
    dns            : 0
    dhcp           : 0
    heartbeat      : 0
    monitor        : 0
    gslb           : 0
    ---------- System Events -------------
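
The two checks above can also be applied to saved console output. The following Python is a minimal sketch, assuming you have captured the text of show process list and show eventmgr yourself (for example, pasted from a console session). The function names are hypothetical and not part of NSX. Listing non-zero *_err counters is an assumption about what is worth reviewing; the output above does not define what each counter means.

```python
def parse_eventmgr(text):
    """Parse 'key : value' lines from captured `show eventmgr` output."""
    fields = {}
    for line in text.splitlines():
        # Skip rules and section headers such as '---------- Health Status ---'.
        if line.strip().startswith("-"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def nonzero_error_counters(fields):
    """Return counters named '*_err' whose value is a non-zero integer.

    Assumption: a non-zero error counter deserves a closer look.
    """
    return {
        key: int(value)
        for key, value in fields.items()
        if key.endswith("_err") and value.isdigit() and int(value) != 0
    }


def vmtoolsd_running(process_list):
    """True if any line of captured `show process list` output names vmtoolsd."""
    return any("vmtoolsd" in line for line in process_list.splitlines())


def needs_reboot(process_list, eventmgr_text):
    """Apply the rule: reboot if vmtoolsd is absent or system status is not good."""
    fields = parse_eventmgr(eventmgr_text)
    return (not vmtoolsd_running(process_list)
            or fields.get("system status") != "good")
```

For the sample output shown above, vmtoolsd appears in the process list and system status is good, so needs_reboot would return False.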
    

Edge Recovery

If vmtoolsd is not running or the NSX Edge is in a bad state, reboot the edge.

A reboot should be sufficient to recover from a crash. A redeploy should not be required.

Note: If a redeploy is performed, record all logging information from the old edge.

To debug a kernel crash, you need to obtain:

  • Either the vmss (VM suspend) or vmsn (VM snapshot) file for the edge VM while it is still in the crashed state. If there is a vmem file, it is also needed. These files can be used to extract a kernel core dump file, which VMware Support can analyze.
  • The Edge support log, generated right after the crashed edge has been rebooted (but not redeployed). You can also check the edge logs. See https://kb.vmware.com/kb/2079380.
  • A screenshot of the Edge console, which is also helpful although it does not usually contain the complete crash report.