There are several different commands you can run to determine if the Edge is in a good state.
Edge Diagnosis
- Check if vmtoolsd is running with this command:
    nsxedge> show process list

    Sample output:

    Perimeter-Gateway-01-0> show process list
    %CPU %MEM   VSZ  RSZ STAT STARTED      TIME COMMAND
     0.0  0.1  4244  720 Ss   May 16   00:00:15 init [3]
    ...
     0.0  0.1  4240  640 S    May 16   00:00:00 logger -p daemon debug -t vserrdd
     0.2  0.9 57192 4668 S    May 16   00:23:07 /usr/local/bin/vmtoolsd --plugin-pa
     0.0  0.4  4304 2260 SLs  May 16   00:01:54 /usr/sbin/watchdog
    ...
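If you capture the show process list output as text (for example, from an SSH or console session), a short script can check whether vmtoolsd is listed. The Python sketch below is illustrative only: it assumes the column layout of the sample output, where STARTED spans two tokens such as "May 16", and the helper names running_executables and vmtoolsd_running are hypothetical, not part of any NSX tool.

```python
import os

# Output captured from "show process list" (copied from the sample above).
SAMPLE = """\
%CPU %MEM   VSZ  RSZ STAT STARTED      TIME COMMAND
 0.0  0.1  4244  720 Ss   May 16   00:00:15 init [3]
...
 0.0  0.1  4240  640 S    May 16   00:00:00 logger -p daemon debug -t vserrdd
 0.2  0.9 57192 4668 S    May 16   00:23:07 /usr/local/bin/vmtoolsd --plugin-pa
 0.0  0.4  4304 2260 SLs  May 16   00:01:54 /usr/sbin/watchdog
...
"""

# Assumption: STARTED is two tokens ("May 16"), so the executable path is the
# 9th whitespace-separated field (index 8) of every process row.
COMMAND_FIELD = 8


def running_executables(output: str) -> set[str]:
    """Return the basenames of the executables in the COMMAND column."""
    names = set()
    for line in output.splitlines():
        fields = line.split()
        # Skip the header row, blank lines and "..." elision markers.
        if len(fields) <= COMMAND_FIELD or fields[0] == "%CPU":
            continue
        names.add(os.path.basename(fields[COMMAND_FIELD]))
    return names


def vmtoolsd_running(output: str) -> bool:
    """True if vmtoolsd is the command of some row (arguments are ignored)."""
    return "vmtoolsd" in running_executables(output)


if __name__ == "__main__":
    print("vmtoolsd running:", vmtoolsd_running(SAMPLE))
```

Only the executable itself is matched, so a process that merely mentions vmtoolsd in its arguments is not counted as vmtoolsd.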
- Check if the Edge is in a good state by running the show eventmgr command. You can also use this command to verify that the query command is received and processed. The Health Status section reports the overall state of the Edge (system status : good in this example):

    nsxedge> show eventmgr
    -----------------------
    messagebus     : enabled
    debug          : 0
    profiling      : 0
    cfg_rx         : 1
    cfg_rx_msgbus  : 0
    cfg_rx_err     : 0
    cfg_exec_err   : 0
    cfg_resp       : 0
    cfg_resp_err   : 0
    cfg_resp_ln_err: 0
    fastquery_rx   : 0
    fastquery_err  : 0
    clearcmd_rx    : 0
    clearcmd_err   : 0
    ha_rx          : 0
    ha_rx_err      : 0
    ha_exec_err    : 0
    status_rx      : 16
    status_rx_err  : 0
    status_svr     : 10
    status_evt     : 0
    status_evt_push: 0
    status_ha      : 0
    status_ver     : 1
    status_sys     : 5
    status_cmd     : 0
    status_svr_err : 0
    status_evt_err : 0
    status_sys_err : 0
    status_ha_err  : 0
    status_ver_err : 0
    status_cmd_err : 0
    evt_report     : 1
    evt_report_err : 0
    hc_report      : 10962
    hc_report_err  : 0
    cli_rx         : 2
    cli_resp       : 1
    cli_resp_err   : 0
    counter_reset  : 0
    ---------- Health Status -------------
    system status  : good
    ha state       : active
    cfg version    : 7
    generation     : 0
    server status  : 1
    syslog-ng      : 1
    haproxy        : 0
    ipsec          : 0
    sslvpn         : 0
    l2vpn          : 0
    dns            : 0
    dhcp           : 0
    heartbeat      : 0
    monitor        : 0
    gslb           : 0
    ---------- System Events -------------
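The show eventmgr output is a series of "key : value" lines grouped under banner lines, so it can be parsed mechanically. The Python sketch below is illustrative only: parse_eventmgr and edge_looks_healthy are hypothetical helpers, and the health rule they apply (system status is good and every *_err counter is 0) is an assumption made for this example, not a documented NSX criterion.

```python
# Trimmed copy of "show eventmgr" output from the example above.
SAMPLE = """\
nsxedge> show eventmgr
-----------------------
messagebus     : enabled
cfg_rx         : 1
cfg_rx_err     : 0
cfg_exec_err   : 0
status_rx      : 16
status_rx_err  : 0
hc_report      : 10962
hc_report_err  : 0
---------- Health Status -------------
system status  : good
ha state       : active
cfg version    : 7
---------- System Events -------------
"""


def parse_eventmgr(output: str) -> dict[str, str]:
    """Map each 'key : value' line to its value; lines without ':' are skipped."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # prompt and banner lines such as "---- Health Status ----"
        fields[key.strip()] = value.strip()
    return fields


def edge_looks_healthy(fields: dict[str, str]) -> bool:
    """Assumed heuristic: system status is good and no *_err counter is non-zero."""
    if fields.get("system status") != "good":
        return False
    return all(value == "0" for key, value in fields.items() if key.endswith("_err"))


if __name__ == "__main__":
    print("Edge looks healthy:", edge_looks_healthy(parse_eventmgr(SAMPLE)))
```

To confirm that a query was received and processed, one option is to run the command twice and compare the relevant counters between the two parsed results.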
Edge Recovery
If vmtoolsd is not running or the NSX Edge is in a bad state, reboot the Edge.
To recover from a crash, a reboot should be sufficient. A redeploy should not be required.
Note: If you do redeploy the Edge, first save all logging information from the old Edge.
To debug a kernel crash, you need to obtain:
- Either the vmss (VM suspend) or vmsn (VM snapshot) file for the Edge VM, captured while it is still in the crashed state. If there is a vmem file, it is also needed. These files can be used to extract a kernel core dump file, which VMware Support can analyze.
- The Edge support log, generated right after the crashed edge has been rebooted (but not redeployed). You can also check the edge logs. See https://kb.vmware.com/kb/2079380.
- A screenshot of the Edge console is also helpful, although it does not usually contain the complete crash report.
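The first item in this list reduces to a simple rule: at least one .vmss or .vmsn file is required, plus any .vmem file that exists. The Python sketch below applies that rule to a listing of the Edge VM's datastore folder. It is a hypothetical helper, not a VMware tool, and the file names in the usage example are made up for illustration.

```python
from pathlib import PurePath

# Hypothetical helper: the suffix rules come from the checklist above.
STATE_SUFFIXES = {".vmss", ".vmsn"}  # VM suspend / VM snapshot state files
MEMORY_SUFFIX = ".vmem"              # memory file, needed if present


def crash_files_to_collect(vm_dir_listing: list[str]) -> list[str]:
    """Pick the files needed for kernel crash analysis from a directory listing."""
    def suffix(name: str) -> str:
        return PurePath(name).suffix.lower()

    state_files = [f for f in vm_dir_listing if suffix(f) in STATE_SUFFIXES]
    if not state_files:
        raise ValueError(
            "no .vmss or .vmsn file found; capture one while the Edge VM "
            "is still in the crashed state"
        )
    memory_files = [f for f in vm_dir_listing if suffix(f) == MEMORY_SUFFIX]
    return sorted(state_files + memory_files)


if __name__ == "__main__":
    # Hypothetical file names for an Edge VM folder on a datastore.
    listing = [
        "Edge-01.vmx",
        "Edge-01.vmdk",
        "Edge-01-Snapshot1.vmsn",
        "Edge-01-Snapshot1.vmem",
        "vmware.log",
    ]
    print(crash_files_to_collect(listing))
```

The support log and console screenshot still have to be gathered separately; the sketch only covers the VM state files.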