Check the Status of the NSX Intelligence Appliance

If the NSX Intelligence appliance stops responding, check the status of the NSX Intelligence services.

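Problem

The NSX Intelligence appliance has stopped responding, or you receive an error message indicating that the appliance is not working properly.

Cause

One or more of the underlying NSX Intelligence services might have stopped or might not be in a healthy state.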

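Solution

Log in to the NSX Intelligence appliance CLI host using an account with the Enterprise Administrator role.

Check the status of the NSX Intelligence services using the get services command.

If all NSX Intelligence services are working properly, you see output similar to the following example.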

my_nsx-intel> get services
Service name:                  druid
Service state:                 running
Coordinator health:            good
Broker health:                 good
Historical health:             good
Overlord health:               good
MiddleManager health:          good

Service name:                  http
Service state:                 running
Session timeout:               1800
Connection timeout:            30
Redirect host:                 (not configured)
Client API rate limit:         100 requests/sec
Client API concurrency limit:  40
Global API concurrency limit:  199

Service name:                  kafka
Service state:                 running
Service health:                good

Service name:                  liagent
Service state:                 stopped

Service name:                  mgmt-plane-bus
Service state:                 stopped

Service name:                  node-mgmt
Service state:                 running

Service name:                  nsx-config
Service state:                 running

Service name:                  nsx-message-bus
Service state:                 stopped

Service name:                  nsx-upgrade-agent
Service state:                 running

Service name:                  ntp
Service state:                 running
Start on boot:                 True

Service name:                  pace-server
Service state:                 running

Service name:                  postgres
Service state:                 running
Service health:                good

Service name:                  processing
Service state:                 running

Service name:                  snmp
Service state:                 stopped
Start on boot:                 False

Service name:                  spark
Service state:                 running
Service health:                good

Service name:                  spark-job-scheduler
Service state:                 running

Service name:                  ssh
Service state:                 running
Start on boot:                 True

Service name:                  syslog
Service state:                 running

Service name:                  ui-service
Service state:                 running

Service name:                  zookeeper
Service state:                 running
Service health:                good

my_nsx-intel>

The service state can be running or stopped. The service health can be good or degraded.
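
When the get services output has been saved as text, the state and health fields described above can be scanned automatically. The following is a minimal Python sketch, not part of the product: the function names parse_services and needs_attention are hypothetical, and SAMPLE is a shortened excerpt of the example output in which one Druid health value was changed to degraded for illustration. Note that the healthy example output above also lists some services as stopped (for example, liagent and snmp), so the flagged services are candidates to review rather than confirmed failures.

```python
def parse_services(output: str) -> list[dict[str, str]]:
    """Split saved `get services` output into one dict per service."""
    services = []
    current = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and CLI prompt lines have no "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Service name":
            current = {"name": value}
            services.append(current)
        elif current is not None:
            current[key] = value
    return services


def needs_attention(service: dict[str, str]) -> bool:
    """True if the service is stopped or any of its health fields is degraded."""
    if service.get("Service state") == "stopped":
        return True
    return any(
        key.lower().endswith("health") and value == "degraded"
        for key, value in service.items()
    )


# Illustrative excerpt; "Broker health" was altered to "degraded" for the demo.
SAMPLE = """my_nsx-intel> get services
Service name:                  druid
Service state:                 running
Coordinator health:            good
Broker health:                 degraded

Service name:                  postgres
Service state:                 running
Service health:                good

Service name:                  snmp
Service state:                 stopped
Start on boot:                 False
"""

flagged = [s["name"] for s in parse_services(SAMPLE) if needs_attention(s)]
print(flagged)  # ['druid', 'snmp']
```

Treating every field whose name ends in "health" as a health indicator lets the same check cover both the single Service health field and the per-component Druid fields.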

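You can also review the syslog file and search for the output of the pace-monitor.sh health check script, which logs the health of the NSX Intelligence services to the syslog file.

If all services are working properly, running the get log-file syslog | find pace-monitor command produces output similar to the following example.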

my_nsx-intel> get log-file syslog | find pace-monitor
<13>1 2019-08-30T03:19:20.409899+00:00 my_nsx-intel pace-monitor.sh - - -    "_self": {
<13>1 2019-08-30T03:19:20.410253+00:00 my_nsx-intel pace-monitor.sh - - -      "href": "/node/pace/appliance-health",
<13>1 2019-08-30T03:19:20.410623+00:00 my_nsx-intel pace-monitor.sh - - -      "rel": "self"
<13>1 2019-08-30T03:19:20.410908+00:00 my_nsx-intel pace-monitor.sh - - -    },
<13>1 2019-08-30T03:19:20.411162+00:00 my_nsx-intel pace-monitor.sh - - -    "appliance-health": {
<13>1 2019-08-30T03:19:20.411416+00:00 my_nsx-intel pace-monitor.sh - - -      "status": "Following NSX Intelligence first boot services are either PENDING or FAILED - Token-Registration",
<13>1 2019-08-30T03:19:20.411668+00:00 my_nsx-intel pace-monitor.sh - - -      "sub-system-status": {
<13>1 2019-08-30T03:19:20.411923+00:00 my_nsx-intel pace-monitor.sh - - -        "app-services": {
<13>1 2019-08-30T03:19:20.412280+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [],
<13>1 2019-08-30T03:19:20.412528+00:00 my_nsx-intel pace-monitor.sh - - -          "status": ""
<13>1 2019-08-30T03:19:20.412807+00:00 my_nsx-intel pace-monitor.sh - - -        },
<13>1 2019-08-30T03:19:20.413075+00:00 my_nsx-intel pace-monitor.sh - - -        "base-infra-services": {
<13>1 2019-08-30T03:19:20.413303+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [
<13>1 2019-08-30T03:19:20.413613+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.413848+00:00 my_nsx-intel pace-monitor.sh - - -              "druid-health": {
<13>1 2019-08-30T03:19:20.414146+00:00 my_nsx-intel pace-monitor.sh - - -                "broker": "good",
<13>1 2019-08-30T03:19:20.414473+00:00 my_nsx-intel pace-monitor.sh - - -                "coordinator": "good",
<13>1 2019-08-30T03:19:20.414717+00:00 my_nsx-intel pace-monitor.sh - - -                "historical": "good",
<13>1 2019-08-30T03:19:20.414979+00:00 my_nsx-intel pace-monitor.sh - - -                "middlemanager": "good",
<13>1 2019-08-30T03:19:20.415295+00:00 my_nsx-intel pace-monitor.sh - - -                "overlord": "good"
<13>1 2019-08-30T03:19:20.415533+00:00 my_nsx-intel pace-monitor.sh - - -              },
<13>1 2019-08-30T03:19:20.415762+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "druid"
<13>1 2019-08-30T03:19:20.415982+00:00 my_nsx-intel pace-monitor.sh - - -            },
<13>1 2019-08-30T03:19:20.416269+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.416539+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
<13>1 2019-08-30T03:19:20.416772+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "kafka"
<13>1 2019-08-30T03:19:20.416991+00:00 my_nsx-intel pace-monitor.sh - - -            },
<13>1 2019-08-30T03:19:20.417204+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.417510+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
<13>1 2019-08-30T03:19:20.417745+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "postgres"
<13>1 2019-08-30T03:19:20.418133+00:00 my_nsx-intel pace-monitor.sh - - -            },
<13>1 2019-08-30T03:19:20.418389+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.418626+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
<13>1 2019-08-30T03:19:20.418855+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "spark"
<13>1 2019-08-30T03:19:20.419157+00:00 my_nsx-intel pace-monitor.sh - - -            },
<13>1 2019-08-30T03:19:20.419435+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.419684+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
<13>1 2019-08-30T03:19:20.419928+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "zookeeper"
<13>1 2019-08-30T03:19:20.420165+00:00 my_nsx-intel pace-monitor.sh - - -            }
<13>1 2019-08-30T03:19:20.420496+00:00 my_nsx-intel pace-monitor.sh - - -          ],
<13>1 2019-08-30T03:19:20.420786+00:00 my_nsx-intel pace-monitor.sh - - -          "status": ""
<13>1 2019-08-30T03:19:20.421022+00:00 my_nsx-intel pace-monitor.sh - - -        },
<13>1 2019-08-30T03:19:20.421255+00:00 my_nsx-intel pace-monitor.sh - - -        "first-boot-services": {
<13>1 2019-08-30T03:19:20.421539+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [
<13>1 2019-08-30T03:19:20.421777+00:00 my_nsx-intel pace-monitor.sh - - -            {
<13>1 2019-08-30T03:19:20.422010+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "degraded",
<13>1 2019-08-30T03:19:20.422277+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "token-registration"
<13>1 2019-08-30T03:19:20.422512+00:00 my_nsx-intel pace-monitor.sh - - -            }
<13>1 2019-08-30T03:19:20.422770+00:00 my_nsx-intel pace-monitor.sh - - -          ],
<13>1 2019-08-30T03:19:20.423012+00:00 my_nsx-intel pace-monitor.sh - - -          "status": "Following NSX Intelligence first boot, services are either PENDING or FAILED - Token-Registration"
<13>1 2019-08-30T03:19:20.423354+00:00 my_nsx-intel pace-monitor.sh - - -        }
<13>1 2019-08-30T03:19:20.423601+00:00 my_nsx-intel pace-monitor.sh - - -      }
<13>1 2019-08-30T03:19:20.423882+00:00 my_nsx-intel pace-monitor.sh - - -    }
<13>1 2019-08-30T03:19:20.424339+00:00 my_nsx-intel pace-monitor.sh - - -  }
<13>1 2019-08-30T03:19:20.972629+00:00 my_nsx-intel pace-monitor.sh - - -  NSX Intelligence health OK.
<30>1 2019-08-30T03:19:20.973076+00:00 my_nsx-intel pace-monitor 20804 - -  <13>Aug 30 03:19:19 pace-monitor.sh: NSX Intelligence health OK.
<182>1 2019-08-30T03:23:23.857Z my_nsx-intel NSX 21752 - [nsx@6876 comp="nsx-cli" subcomp="node-mgmt" username="admin" level="INFO"] CMD: get log-file syslog | find pace-monitor

If one of the services has a problem, you might see the following line when you run get log-file syslog | find pace-monitor.

NSX Intelligence health DEGRADED. Return code not HTTP OK.
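
If the syslog output is saved to a file, the most recent health summary can be picked out with a short script. The sketch below is an illustration only: the function name latest_health is hypothetical, and the pattern assumes the summary lines look like the "health OK" and "health DEGRADED" lines in this article, with either an "NSX Intelligence" or a "PACE" prefix.

```python
import re

# Matches the summary text written by pace-monitor.sh, as shown in this article.
SUMMARY = re.compile(r"(?:NSX Intelligence|PACE) health (OK|DEGRADED)")


def latest_health(log_text: str):
    """Return 'OK' or 'DEGRADED' from the last summary line, or None if absent."""
    status = None
    for line in log_text.splitlines():
        if "pace-monitor" not in line:
            continue  # ignore syslog entries from other components
        match = SUMMARY.search(line)
        if match:
            status = match.group(1)  # later lines override earlier ones
    return status


# Summary line copied from the example output above.
example = (
    "<13>1 2019-08-30T03:19:20.972629+00:00 my_nsx-intel pace-monitor.sh"
    " - - -  NSX Intelligence health OK."
)
print(latest_health(example))  # OK
```

Only the last matching line is reported because the script logs repeatedly, and the newest entry reflects the current state.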

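If you encounter either of the following outputs, restart the affected service using the restart service service-name command.
- After you run the get services command, one of the services shows Service state: stopped or Service health: degraded.
- After you run the get log-file syslog | find pace-monitor command, the output shows a message similar to PACE health DEGRADED. Return code not HTTP OK.
For example, if the postgres service shows a state of stopped, or its state is running but its service health is degraded, run the following command.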
```
restart service postgres
```
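Important: You must use the restart service service-name command to restart an NSX Intelligence service. If you decide to use the stop service service-name and start service service-name commands together instead, you must also manually restart every service that depends on service-name. The following list shows the dependency order that you must follow when restarting NSX Intelligence services.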
```
zookeeper > druid > kafka > spark > spark-job-scheduler > nsx-config > processing > pace-server 
```
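For example, if you stop the nsx-config service and then start it again using the stop service service-name and start service service-name commands, you must also restart the processing and pace-server services using the restart service service-name command.
When a service restarts, other services that depend on it might temporarily enter a degraded state. If no errors occur, these degraded services return to a stable state.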
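
The dependency order above can be turned into a small lookup that lists which services to restart after a manual stop and start. This Python sketch is an interpretation rather than product behavior: it assumes the chain is strictly linear, so every service listed after the given one is treated as a dependent, and the names RESTART_ORDER and dependents_to_restart are hypothetical.

```python
# Dependency order copied from the list above, from first to last.
RESTART_ORDER = [
    "zookeeper",
    "druid",
    "kafka",
    "spark",
    "spark-job-scheduler",
    "nsx-config",
    "processing",
    "pace-server",
]


def dependents_to_restart(service: str) -> list[str]:
    """Services that follow `service` in the chain, in restart order."""
    if service not in RESTART_ORDER:
        return []  # services outside the chain have no listed dependents
    position = RESTART_ORDER.index(service)
    return RESTART_ORDER[position + 1:]


# Print the CLI commands to run after stopping and starting nsx-config.
for name in dependents_to_restart("nsx-config"):
    print(f"restart service {name}")
```

For nsx-config this yields processing and pace-server, which matches the example given above.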