Check the Status of the NSX Intelligence Appliance

If the NSX Intelligence appliance becomes unresponsive, check the status of the NSX Intelligence services.

Problem

The NSX Intelligence appliance has become unresponsive, or you have received an error message indicating that the appliance is not functioning as expected.

Cause

One or more of the underlying NSX Intelligence services might have stopped or might not be in a healthy state.

Solution

Check the status of the NSX Intelligence services using the get services command.

If all the NSX Intelligence services are functioning properly, you see output similar to the following example.

```
my_nsx-intel> get services
Service name:                  druid
Service state:                 running
Coordinator health:            good
Broker health:                 good
Historical health:             good
Overlord health:               good
MiddleManager health:          good

Service name:                  http
Service state:                 running
Session timeout:               1800
Connection timeout:            30
Redirect host:                 (not configured)
Client API rate limit:         100 requests/sec
Client API concurrency limit:  40
Global API concurrency limit:  199

Service name:                  kafka
Service state:                 running
Service health:                good

Service name:                  liagent
Service state:                 stopped

Service name:                  mgmt-plane-bus
Service state:                 stopped

Service name:                  node-mgmt
Service state:                 running

Service name:                  nsx-config
Service state:                 running

Service name:                  nsx-message-bus
Service state:                 stopped

Service name:                  nsx-upgrade-agent
Service state:                 running

Service name:                  ntp
Service state:                 running
Start on boot:                 True

Service name:                  pace-server
Service state:                 running

Service name:                  postgres
Service state:                 running
Service health:                good

Service name:                  processing
Service state:                 running

Service name:                  snmp
Service state:                 stopped
Start on boot:                 False

Service name:                  spark
Service state:                 running
Service health:                good

Service name:                  spark-job-scheduler
Service state:                 running

Service name:                  ssh
Service state:                 running
Start on boot:                 True

Service name:                  syslog
Service state:                 running

Service name:                  ui-service
Service state:                 running

Service name:                  zookeeper
Service state:                 running
Service health:                good

my_nsx-intel>
```

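A service state is either running or stopped, and a service health is either good or degraded. The druid service reports a separate health value for each of its components (Coordinator, Broker, Historical, Overlord, and MiddleManager).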
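If you capture the get services output as text (for example, by copying it from an SSH session to the appliance), a short script can flag services whose state is stopped or whose health fields report degraded. The following Python sketch is illustrative only: the function names parse_services and find_unhealthy are hypothetical, the sample input is made up, and the parser assumes the output keeps the `Field name:  value` layout shown above.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_services(output: str) -> Dict[str, Dict[str, str]]:
    """Turn `get services` text into {service name: {field: value}}."""
    services: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for line in output.splitlines():
        if ":" not in line:
            continue  # blank separator lines and the bare CLI prompt
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Service name":
            current = services.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return services


def find_unhealthy(
    services: Dict[str, Dict[str, str]],
    only: Optional[Iterable[str]] = None,
) -> List[Tuple[str, str]]:
    """Return (service, reason) pairs for stopped or degraded services.

    `only` optionally restricts the check to a subset of service names.
    """
    wanted = set(only) if only is not None else None
    problems: List[Tuple[str, str]] = []
    for name, fields in services.items():
        if wanted is not None and name not in wanted:
            continue
        if fields.get("Service state") == "stopped":
            problems.append((name, "Service state: stopped"))
            continue
        # Covers "Service health" and per-component fields such as "Broker health".
        for key, value in fields.items():
            if key.endswith("health") and value == "degraded":
                problems.append((name, f"{key}: {value}"))
    return problems


# Made-up sample input for illustration.
sample = """my_nsx-intel> get services
Service name:                  druid
Service state:                 running
Broker health:                 degraded

Service name:                  postgres
Service state:                 stopped

Service name:                  kafka
Service state:                 running
Service health:                good
"""

print(find_unhealthy(parse_services(sample)))
# [('druid', 'Broker health: degraded'), ('postgres', 'Service state: stopped')]
```

Applied to the healthy example output above without `only`, this sketch would also report liagent, mgmt-plane-bus, nsx-message-bus, and snmp, which show a stopped state in that example, so passing `only` with the services you want to watch might be more useful.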

You can also check the syslog file for the output of the pace-monitor.sh health-check script, which logs the health of the NSX Intelligence services to syslog.

If all the services are functioning as expected, running the get log-file syslog | find pace-monitor command returns output similar to the following sample.

```
my_nsx-intel> get log-file syslog | find pace-monitor
<13>1 2019-08-30T03:19:20.409899+00:00 my_nsx-intel pace-monitor.sh - - -    "_self": {
<13>1 2019-08-30T03:19:20.410253+00:00 my_nsx-intel pace-monitor.sh - - -      "href": "/node/pace/appliance-health",
<13>1 2019-08-30T03:19:20.410623+00:00 my_nsx-intel pace-monitor.sh - - -      "rel": "self"
<13>1 2019-08-30T03:19:20.410908+00:00 my_nsx-intel pace-monitor.sh - - -    },
<13>1 2019-08-30T03:19:20.411162+00:00 my_nsx-intel pace-monitor.sh - - -    "appliance-health": {
<13>1 2019-08-30T03:19:20.411416+00:00 my_nsx-intel pace-monitor.sh - - -      "status": "Following NSX Intelligence first boot services are either PENDING or FAILED - Token-Registration",
<13>1 2019-08-30T03:19:20.411668+00:00 my_nsx-intel pace-monitor.sh - - -      "sub-system-status": {
<13>1 2019-08-30T03:19:20.411923+00:00 my_nsx-intel pace-monitor.sh - - -        "app-services": {
<13>1 2019-08-30T03:19:20.412280+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [],
<13>1 2019-08-30T03:19:20.412528+00:00 my_nsx-intel pace-monitor.sh - - -          "status": ""
<13>1 2019-08-30T03:19:20.412807+00:00 my_nsx-intel pace-monitor.sh - - -        },
<13>1 2019-08-30T03:19:20.413075+00:00 my_nsx-intel pace-monitor.sh - - -        "base-infra-services": {
<13>1 2019-08-30T03:19:20.413303+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [
<13>1 2019-08-30T03:19:20.413613+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.413848+00:00 my_nsx-intel pace-monitor.sh - - -              "druid-health": {
<13>1 2019-08-30T03:19:20.414146+00:00 my_nsx-intel pace-monitor.sh - - -                "broker": "good",
<13>1 2019-08-30T03:19:20.414473+00:00 my_nsx-intel pace-monitor.sh - - -                "coordinator": "good",
<13>1 2019-08-30T03:19:20.414717+00:00 my_nsx-intel pace-monitor.sh - - -                "historical": "good",
<13>1 2019-08-30T03:19:20.414979+00:00 my_nsx-intel pace-monitor.sh - - -                "middlemanager": "good",
<13>1 2019-08-30T03:19:20.415295+00:00 my_nsx-intel pace-monitor.sh - - -                "overlord": "good"
<13>1 2019-08-30T03:19:20.415533+00:00 my_nsx-intel pace-monitor.sh - - -              },
<13>1 2019-08-30T03:19:20.415762+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "druid"
<13>1 2019-08-30T03:19:20.415982+00:00 my_nsx-intel pace-monitor.sh - - -            },
<13>1 2019-08-30T03:19:20.416269+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.416539+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
<13>1 2019-08-30T03:19:20.416772+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "kafka"
<13>1 2019-08-30T03:19:20.416991+00:00 my_nsx-intel pace-monitor.sh - - -            },
<13>1 2019-08-30T03:19:20.417204+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.417510+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
<13>1 2019-08-30T03:19:20.417745+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "postgres"
<13>1 2019-08-30T03:19:20.418133+00:00 my_nsx-intel pace-monitor.sh - - -            },
<13>1 2019-08-30T03:19:20.418389+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.418626+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
<13>1 2019-08-30T03:19:20.418855+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "spark"
<13>1 2019-08-30T03:19:20.419157+00:00 my_nsx-intel pace-monitor.sh - - -            },
<13>1 2019-08-30T03:19:20.419435+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.419684+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
<13>1 2019-08-30T03:19:20.419928+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "zookeeper"
<13>1 2019-08-30T03:19:20.420165+00:00 my_nsx-intel pace-monitor.sh - - -            }
<13>1 2019-08-30T03:19:20.420496+00:00 my_nsx-intel pace-monitor.sh - - -          ],
<13>1 2019-08-30T03:19:20.420786+00:00 my_nsx-intel pace-monitor.sh - - -          "status": ""
<13>1 2019-08-30T03:19:20.421022+00:00 my_nsx-intel pace-monitor.sh - - -        },
<13>1 2019-08-30T03:19:20.421255+00:00 my_nsx-intel pace-monitor.sh - - -        "first-boot-services": {
<13>1 2019-08-30T03:19:20.421539+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [
<13>1 2019-08-30T03:19:20.421777+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.422010+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "degraded",
<13>1 2019-08-30T03:19:20.422277+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "token-registration"
<13>1 2019-08-30T03:19:20.422512+00:00 my_nsx-intel pace-monitor.sh - - -            }
<13>1 2019-08-30T03:19:20.422770+00:00 my_nsx-intel pace-monitor.sh - - -          ],
<13>1 2019-08-30T03:19:20.423012+00:00 my_nsx-intel pace-monitor.sh - - -          "status": "Following NSX Intelligence first boot, services are either PENDING or FAILED - Token-Registration"
<13>1 2019-08-30T03:19:20.423354+00:00 my_nsx-intel pace-monitor.sh - - -        }
<13>1 2019-08-30T03:19:20.423601+00:00 my_nsx-intel pace-monitor.sh - - -      }
<13>1 2019-08-30T03:19:20.423882+00:00 my_nsx-intel pace-monitor.sh - - -    }
<13>1 2019-08-30T03:19:20.424339+00:00 my_nsx-intel pace-monitor.sh - - -  }
<13>1 2019-08-30T03:19:20.972629+00:00 my_nsx-intel pace-monitor.sh - - -  NSX Intelligence health OK.
<30>1 2019-08-30T03:19:20.973076+00:00 my_nsx-intel pace-monitor 20804 - -  <13>Aug 30 03:19:19 pace-monitor.sh: NSX Intelligence health OK.
<182>1 2019-08-30T03:23:23.857Z my_nsx-intel NSX 21752 - [nsx@6876 comp="nsx-cli" subcomp="node-mgmt" username="admin" level="INFO"] CMD: get log-file syslog | find pace-monitor
```

If there is a problem with one of the services, you might see the following line when you run get log-file syslog | find pace-monitor.

```
NSX Intelligence health DEGRADED. Return code not HTTP OK.
```
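Because the pace-monitor.sh output spans many syslog lines, you might want to summarize it. The following Python sketch is a hypothetical helper, not part of NSX Intelligence: it takes the text returned by get log-file syslog | find pace-monitor and reports the last health verdict plus any service whose JSON entry contains a degraded value. It assumes, as in the sample above, that each service's health fields appear before its "service-name" key; the function and pattern names are illustrative.

```python
import re
from typing import List, Optional, Tuple

# Verdict line written by pace-monitor.sh, for example
# "NSX Intelligence health OK." or "NSX Intelligence health DEGRADED. ...".
VERDICT_RE = re.compile(r"NSX Intelligence health (OK|DEGRADED)")
# Any JSON field whose value is "good" or "degraded", such as "health" or "broker".
STATUS_RE = re.compile(r'"([\w-]+)": "(good|degraded)"')
SERVICE_RE = re.compile(r'"service-name": "([\w-]+)"')


def summarize_pace_monitor(log_text: str) -> Tuple[Optional[str], List[str]]:
    """Return (last verdict, degraded service names) from pace-monitor lines."""
    verdict: Optional[str] = None
    degraded: List[str] = []
    saw_degraded = False  # a degraded value was seen since the last service-name
    for line in log_text.splitlines():
        match = VERDICT_RE.search(line)
        if match:
            verdict = match.group(1)  # later verdicts overwrite earlier ones
            continue
        match = STATUS_RE.search(line)
        if match and match.group(2) == "degraded":
            saw_degraded = True
            continue
        match = SERVICE_RE.search(line)
        if match:
            name = match.group(1)
            if saw_degraded and name not in degraded:
                degraded.append(name)
            saw_degraded = False  # start fresh for the next service object
    return verdict, degraded
```

Run against the full sample output above, this sketch would return the OK verdict together with token-registration, because that sample reports token-registration as degraded while its final lines still read NSX Intelligence health OK.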

If you see either of the following results, restart the affected service by using the restart service service-name command.
- The get services output shows Service state: stopped or Service health: degraded for one of the services.
- The get log-file syslog | find pace-monitor output contains a message similar to NSX Intelligence health DEGRADED. Return code not HTTP OK.
For example, if the postgres service state is stopped, or its state is running but its service health is degraded, run the following command.
```
restart service postgres
```
Important: You must use the restart service service-name command to restart NSX Intelligence services. If you use the stop service service-name and start service service-name commands instead, you must also manually restart each service that depends on service-name. The following chain shows the dependency order in which the NSX Intelligence services must be restarted.
```
zookeeper > druid > kafka > spark > spark-job-scheduler > nsx-config > processing > pace-server 
```
For example, if you stop and then start the nsx-config service by using the stop service nsx-config and start service nsx-config commands, you must also use the restart service service-name command to restart the processing and pace-server services.
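The Important note implies a simple rule: after you stop and start one service, every service that comes after it in the dependency chain needs a restart. The following Python sketch assumes the chain is strictly linear, as written above, and only prints the NSX CLI commands rather than running them; the names RESTART_ORDER, dependents_to_restart, and follow_up_commands are hypothetical.

```python
from typing import List

# Restart dependency order, copied from the chain in this section.
RESTART_ORDER: List[str] = [
    "zookeeper", "druid", "kafka", "spark",
    "spark-job-scheduler", "nsx-config", "processing", "pace-server",
]


def dependents_to_restart(service: str) -> List[str]:
    """Services after `service` in the chain; raises ValueError if unknown."""
    position = RESTART_ORDER.index(service)
    return RESTART_ORDER[position + 1:]


def follow_up_commands(service: str) -> List[str]:
    """Commands to run after `stop service`/`start service` on `service`."""
    return [f"restart service {name}" for name in dependents_to_restart(service)]


for command in follow_up_commands("nsx-config"):
    print(command)
# restart service processing
# restart service pace-server
```

The output for nsx-config matches the example in the text; for pace-server, the last service in the chain, the sketch returns no follow-up commands.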
When a service restarts, other services that depend on it might briefly go into a degraded state. If no errors occur, those degraded services return to a stable state.
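Because dependent services might be degraded only briefly, it can make sense to re-check for a while before taking further action. This Python sketch is a generic polling loop under stated assumptions: `check` is any hypothetical callable you supply that returns the names of services that are currently stopped or degraded (for example, by capturing and parsing fresh get services output), and the 300-second timeout and 10-second interval are placeholder values, not documented NSX Intelligence limits.

```python
import time
from typing import Callable, List


def wait_until_stable(
    check: Callable[[], List[str]],
    timeout_s: float = 300.0,  # placeholder, not an NSX Intelligence default
    interval_s: float = 10.0,  # placeholder polling interval
) -> List[str]:
    """Poll `check` until it reports no unhealthy services or time runs out.

    Returns the last list reported by `check`; an empty list means stable.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        unhealthy = check()
        if not unhealthy or time.monotonic() >= deadline:
            return unhealthy
        time.sleep(interval_s)


# Simulated checks: kafka is degraded for two polls, then recovers.
results = iter([["kafka"], ["kafka"], []])
print(wait_until_stable(lambda: next(results), interval_s=0))  # []
```

A non-empty result after the timeout suggests that the degraded state is not just the brief effect of a restart.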