If the NSX Intelligence appliance becomes unresponsive, check the status of the NSX Intelligence services.
Problem
The NSX Intelligence appliance has become unresponsive, or you have received an error message indicating that the appliance is not functioning as expected.
Cause
It is possible that one or more of the underlying NSX Intelligence services has stopped or is not in a healthy state.
Solution
- Log in to the NSX Intelligence appliance CLI host using an account with an Enterprise Administrator role.
- Check the status of the NSX Intelligence services using the get services command.
If all the NSX Intelligence services are functioning properly, you see an output similar to the following example.
    my_nsx-intel> get services
    Service name: druid
    Service state: running
    Coordinator health: good
    Broker health: good
    Historical health: good
    Overlord health: good
    MiddleManager health: good

    Service name: http
    Service state: running
    Session timeout: 1800
    Connection timeout: 30
    Redirect host: (not configured)
    Client API rate limit: 100 requests/sec
    Client API concurrency limit: 40
    Global API concurrency limit: 199

    Service name: kafka
    Service state: running
    Service health: good

    Service name: liagent
    Service state: stopped

    Service name: mgmt-plane-bus
    Service state: stopped

    Service name: node-mgmt
    Service state: running

    Service name: nsx-config
    Service state: running

    Service name: nsx-message-bus
    Service state: stopped

    Service name: nsx-upgrade-agent
    Service state: running

    Service name: ntp
    Service state: running
    Start on boot: True

    Service name: pace-server
    Service state: running

    Service name: postgres
    Service state: running
    Service health: good

    Service name: processing
    Service state: running

    Service name: snmp
    Service state: stopped
    Start on boot: False

    Service name: spark
    Service state: running
    Service health: good

    Service name: spark-job-scheduler
    Service state: running

    Service name: ssh
    Service state: running
    Start on boot: True

    Service name: syslog
    Service state: running

    Service name: ui-service
    Service state: running

    Service name: zookeeper
    Service state: running
    Service health: good
    my_nsx-intel>

A service state can either be running or stopped. A service health can be good or degraded.
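You can capture the get services output as text, for example by copying it from your SSH session. The capture method is an assumption; the NSX Intelligence CLI does not provide it. Once you have the text, a short Python sketch like the following can flag services that need attention.

- A service is treated as unhealthy if its Service state is stopped.
- A service is also treated as unhealthy if any health field reports degraded. This covers Service health and the per-component druid fields such as Broker health.

The names parse_services, find_unhealthy and ignore are hypothetical. The healthy sample output above still lists liagent, mgmt-plane-bus, nsx-message-bus and snmp as stopped. For that reason, the sketch lets you pass in the services you expect to be stopped. Confirm which ones those are on your own appliance.

```python
from __future__ import annotations

from collections.abc import Iterable


def parse_services(output: str) -> dict[str, dict[str, str]]:
    """Parse captured `get services` text into {service name: {field: value}}.

    Hypothetical helper: assumes one "Field: value" pair per line, with each
    service block starting at a "Service name:" line, as in the sample output.
    """
    services: dict[str, dict[str, str]] = {}
    current: dict[str, str] | None = None
    for raw in output.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip the CLI prompt and blank lines
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Service name":
            current = services.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return services


def find_unhealthy(services: dict[str, dict[str, str]],
                   ignore: Iterable[str] = ()) -> list[str]:
    """Return services that are stopped or report any degraded health field.

    `ignore` holds services you expect to be stopped on your appliance.
    """
    skip = set(ignore)
    flagged: list[str] = []
    for name, fields in services.items():
        if name in skip:
            continue
        stopped = fields.get("Service state") == "stopped"
        # Matches "Service health" and druid's "Broker health", "Overlord health", etc.
        degraded = any(key.endswith("health") and value == "degraded"
                       for key, value in fields.items())
        if stopped or degraded:
            flagged.append(name)
    return flagged
```

Suppose you capture the sample output above line by line and pass ignore={"liagent", "mgmt-plane-bus", "nsx-message-bus", "snmp"}. In that case, find_unhealthy returns an empty list.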
- You can also search the syslog file for output from the pace-monitor.sh health-check script, which logs the health of the NSX Intelligence services to syslog.
If all the services are functioning as expected, running the get log-file syslog | find pace-monitor command produces output similar to the following sample.
    my_nsx-intel> get log-file syslog | find pace-monitor
    <13>1 2019-08-30T03:19:20.409899+00:00 my_nsx-intel pace-monitor.sh - - - "_self": {
    <13>1 2019-08-30T03:19:20.410253+00:00 my_nsx-intel pace-monitor.sh - - - "href": "/node/pace/appliance-health",
    <13>1 2019-08-30T03:19:20.410623+00:00 my_nsx-intel pace-monitor.sh - - - "rel": "self"
    <13>1 2019-08-30T03:19:20.410908+00:00 my_nsx-intel pace-monitor.sh - - - },
    <13>1 2019-08-30T03:19:20.411162+00:00 my_nsx-intel pace-monitor.sh - - - "appliance-health": {
    <13>1 2019-08-30T03:19:20.411416+00:00 my_nsx-intel pace-monitor.sh - - - "status": "Following NSX Intelligence first boot services are either PENDING or FAILED - Token-Registration",
    <13>1 2019-08-30T03:19:20.411668+00:00 my_nsx-intel pace-monitor.sh - - - "sub-system-status": {
    <13>1 2019-08-30T03:19:20.411923+00:00 my_nsx-intel pace-monitor.sh - - - "app-services": {
    <13>1 2019-08-30T03:19:20.412280+00:00 my_nsx-intel pace-monitor.sh - - - "services": [],
    <13>1 2019-08-30T03:19:20.412528+00:00 my_nsx-intel pace-monitor.sh - - - "status": ""
    <13>1 2019-08-30T03:19:20.412807+00:00 my_nsx-intel pace-monitor.sh - - - },
    <13>1 2019-08-30T03:19:20.413075+00:00 my_nsx-intel pace-monitor.sh - - - "base-infra-services": {
    <13>1 2019-08-30T03:19:20.413303+00:00 my_nsx-intel pace-monitor.sh - - - "services": [
    <13>1 2019-08-30T03:19:20.413613+00:00 my_nsx-intel pace-monitor.sh - - - {
    <13>1 2019-08-30T03:19:20.413848+00:00 my_nsx-intel pace-monitor.sh - - - "druid-health": {
    <13>1 2019-08-30T03:19:20.414146+00:00 my_nsx-intel pace-monitor.sh - - - "broker": "good",
    <13>1 2019-08-30T03:19:20.414473+00:00 my_nsx-intel pace-monitor.sh - - - "coordinator": "good",
    <13>1 2019-08-30T03:19:20.414717+00:00 my_nsx-intel pace-monitor.sh - - - "historical": "good",
    <13>1 2019-08-30T03:19:20.414979+00:00 my_nsx-intel pace-monitor.sh - - - "middlemanager": "good",
    <13>1 2019-08-30T03:19:20.415295+00:00 my_nsx-intel pace-monitor.sh - - - "overlord": "good"
    <13>1 2019-08-30T03:19:20.415533+00:00 my_nsx-intel pace-monitor.sh - - - },
    <13>1 2019-08-30T03:19:20.415762+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "druid"
    <13>1 2019-08-30T03:19:20.415982+00:00 my_nsx-intel pace-monitor.sh - - - },
    <13>1 2019-08-30T03:19:20.416269+00:00 my_nsx-intel pace-monitor.sh - - - {
    <13>1 2019-08-30T03:19:20.416539+00:00 my_nsx-intel pace-monitor.sh - - - "health": "good",
    <13>1 2019-08-30T03:19:20.416772+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "kafka"
    <13>1 2019-08-30T03:19:20.416991+00:00 my_nsx-intel pace-monitor.sh - - - },
    <13>1 2019-08-30T03:19:20.417204+00:00 my_nsx-intel pace-monitor.sh - - - {
    <13>1 2019-08-30T03:19:20.417510+00:00 my_nsx-intel pace-monitor.sh - - - "health": "good",
    <13>1 2019-08-30T03:19:20.417745+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "postgres"
    <13>1 2019-08-30T03:19:20.418133+00:00 my_nsx-intel pace-monitor.sh - - - },
    <13>1 2019-08-30T03:19:20.418389+00:00 my_nsx-intel pace-monitor.sh - - - {
    <13>1 2019-08-30T03:19:20.418626+00:00 my_nsx-intel pace-monitor.sh - - - "health": "good",
    <13>1 2019-08-30T03:19:20.418855+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "spark"
    <13>1 2019-08-30T03:19:20.419157+00:00 my_nsx-intel pace-monitor.sh - - - },
    <13>1 2019-08-30T03:19:20.419435+00:00 my_nsx-intel pace-monitor.sh - - - {
    <13>1 2019-08-30T03:19:20.419684+00:00 my_nsx-intel pace-monitor.sh - - - "health": "good",
    <13>1 2019-08-30T03:19:20.419928+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "zookeeper"
    <13>1 2019-08-30T03:19:20.420165+00:00 my_nsx-intel pace-monitor.sh - - - }
    <13>1 2019-08-30T03:19:20.420496+00:00 my_nsx-intel pace-monitor.sh - - - ],
    <13>1 2019-08-30T03:19:20.420786+00:00 my_nsx-intel pace-monitor.sh - - - "status": ""
    <13>1 2019-08-30T03:19:20.421022+00:00 my_nsx-intel pace-monitor.sh - - - },
    <13>1 2019-08-30T03:19:20.421255+00:00 my_nsx-intel pace-monitor.sh - - - "first-boot-services": {
    <13>1 2019-08-30T03:19:20.421539+00:00 my_nsx-intel pace-monitor.sh - - - "services": [
    <13>1 2019-08-30T03:19:20.421777+00:00 my_nsx-intel pace-monitor.sh - - - {
    <13>1 2019-08-30T03:19:20.422010+00:00 my_nsx-intel pace-monitor.sh - - - "health": "degraded",
    <13>1 2019-08-30T03:19:20.422277+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "token-registration"
    <13>1 2019-08-30T03:19:20.422512+00:00 my_nsx-intel pace-monitor.sh - - - }
    <13>1 2019-08-30T03:19:20.422770+00:00 my_nsx-intel pace-monitor.sh - - - ],
    <13>1 2019-08-30T03:19:20.423012+00:00 my_nsx-intel pace-monitor.sh - - - "status": "Following NSX Intelligence first boot, services are either PENDING or FAILED - Token-Registration"
    <13>1 2019-08-30T03:19:20.423354+00:00 my_nsx-intel pace-monitor.sh - - - }
    <13>1 2019-08-30T03:19:20.423601+00:00 my_nsx-intel pace-monitor.sh - - - }
    <13>1 2019-08-30T03:19:20.423882+00:00 my_nsx-intel pace-monitor.sh - - - }
    <13>1 2019-08-30T03:19:20.424339+00:00 my_nsx-intel pace-monitor.sh - - - }
    <13>1 2019-08-30T03:19:20.972629+00:00 my_nsx-intel pace-monitor.sh - - - NSX Intelligence health OK.
    <30>1 2019-08-30T03:19:20.973076+00:00 my_nsx-intel pace-monitor 20804 - - <13>Aug 30 03:19:19 pace-monitor.sh: NSX Intelligence health OK.
    <182>1 2019-08-30T03:23:23.857Z my_nsx-intel NSX 21752 - [nsx@6876 comp="nsx-cli" subcomp="node-mgmt" username="admin" level="INFO"] CMD: get log-file syslog | find pace-monitor

If there is a problem with one of the services, you might see the following line when you run get log-file syslog | find pace-monitor.

    NSX Intelligence health DEGRADED. Return code not HTTP OK.
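You can check the result of the most recent health check without reading the whole JSON dump. To do so, scan the captured output of get log-file syslog | find pace-monitor for the one-line summary that pace-monitor.sh writes.

The following Python sketch makes two assumptions:

- The summary reads NSX Intelligence health OK. or NSX Intelligence health DEGRADED., as in the messages shown in this section.
- The captured lines are in chronological order.

The name latest_health is a hypothetical helper.

```python
from __future__ import annotations

import re

# One-line summary written by pace-monitor.sh, for example:
# "... pace-monitor.sh - - - NSX Intelligence health OK."
HEALTH_RE = re.compile(r"NSX Intelligence health (OK|DEGRADED)")


def latest_health(syslog_text: str) -> str | None:
    """Return "OK" or "DEGRADED" from the last summary line, or None if absent.

    Assumes the captured lines are in chronological order, so the last
    match corresponds to the most recent health check.
    """
    status = None
    for line in syslog_text.splitlines():
        match = HEALTH_RE.search(line)
        if match:
            status = match.group(1)
    return status


if __name__ == "__main__":
    sample = ("<13>1 2019-08-30T03:19:20.972629+00:00 my_nsx-intel "
              "pace-monitor.sh - - - NSX Intelligence health OK.")
    print(latest_health(sample))  # OK
```

The JSON "status" lines (Following NSX Intelligence first boot ...) and the CMD: line do not match the pattern, so they are ignored.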
- If you encounter either of the following outputs, restart the affected service using the restart service service-name command.
  - After you run the get services command, one of the services shows Service state: stopped or Service health: degraded.
  - After you run the get log-file syslog | find pace-monitor command, the output contains a message similar to NSX Intelligence health DEGRADED. Return code not HTTP OK.
For example, if the postgres service is stopped, or if it is running but its service health is degraded, run the following command.

    restart service postgres
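The restart rule in this step can be written as a small predicate. In this Python sketch, needs_restart and restart_command are hypothetical helper names. restart_command only builds the command string for you to type at the appliance CLI prompt; it does not connect to the appliance.

```python
from __future__ import annotations


def needs_restart(state: str, health: str | None = None) -> bool:
    """Apply this section's rule: restart a service that is stopped, or one
    that is running but reports degraded service health.

    `health` is None for services whose get services entry has no
    Service health field.
    """
    return state == "stopped" or (state == "running" and health == "degraded")


def restart_command(service_name: str) -> str:
    """Build the NSX Intelligence CLI command that restarts one service."""
    return f"restart service {service_name}"


if __name__ == "__main__":
    # The postgres example from this section: running, but degraded.
    if needs_restart("running", "degraded"):
        print(restart_command("postgres"))  # restart service postgres
```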
Important: You must use the restart service service-name command to restart NSX Intelligence services. If you use the stop service service-name and start service service-name commands instead, you must also manually restart each service that depends on service-name. The following list shows the dependency order in which the NSX Intelligence services must be restarted.

    zookeeper > druid > kafka > spark > spark-job-scheduler > nsx-config > processing > pace-server
For example, if you stop and then start the nsx-config service using the stop service service-name and start service service-name commands, you must also use the restart service service-name command to restart the processing and pace-server services.

When a service restarts, other services that depend on it might briefly go into a degraded state. If no errors occur, those services return to a stable state.
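If you did use stop service and start service, the following Python sketch lists the services that must then be restarted, in order.

It treats the documented order as a linear chain, in which every service depends on all the services to its left. That reading matches the nsx-config example in this section (processing and pace-server), but it is an interpretation, so confirm it for other services. Services outside the chain, such as postgres, return an empty list because this section lists no dependents for them. RESTART_ORDER and dependents_to_restart are hypothetical names.

```python
from __future__ import annotations

# Dependency order documented in this section, upstream first.
RESTART_ORDER = [
    "zookeeper",
    "druid",
    "kafka",
    "spark",
    "spark-job-scheduler",
    "nsx-config",
    "processing",
    "pace-server",
]


def dependents_to_restart(service_name: str) -> list[str]:
    """Services to restart, in order, after a manual stop/start of service_name.

    Assumes a linear chain: every service after service_name depends on it.
    """
    if service_name not in RESTART_ORDER:
        return []  # this section lists no dependents for other services
    position = RESTART_ORDER.index(service_name)
    return RESTART_ORDER[position + 1:]


if __name__ == "__main__":
    for dependent in dependents_to_restart("nsx-config"):
        print(f"restart service {dependent}")
    # restart service processing
    # restart service pace-server
```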