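If the NSX Intelligence appliance stops responding, check the status of the NSX Intelligence services.
Problem
The NSX Intelligence appliance has stopped responding, or you receive an error message indicating that the appliance is not working properly.
Cause
One or more of the underlying NSX Intelligence services might have stopped or might not be in a healthy running state.
Solution
- Log in to the NSX Intelligence appliance CLI host using an account with the Enterprise Administrator role.
- Use the get services command to check the status of the NSX Intelligence services.
If all NSX Intelligence services are working properly, you see output similar to the following example.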
my_nsx-intel> get services
Service name: druid
Service state: running
Coordinator health: good
Broker health: good
Historical health: good
Overlord health: good
MiddleManager health: good

Service name: http
Service state: running
Session timeout: 1800
Connection timeout: 30
Redirect host: (not configured)
Client API rate limit: 100 requests/sec
Client API concurrency limit: 40
Global API concurrency limit: 199

Service name: kafka
Service state: running
Service health: good

Service name: liagent
Service state: stopped

Service name: mgmt-plane-bus
Service state: stopped

Service name: node-mgmt
Service state: running

Service name: nsx-config
Service state: running

Service name: nsx-message-bus
Service state: stopped

Service name: nsx-upgrade-agent
Service state: running

Service name: ntp
Service state: running
Start on boot: True

Service name: pace-server
Service state: running

Service name: postgres
Service state: running
Service health: good

Service name: processing
Service state: running

Service name: snmp
Service state: stopped
Start on boot: False

Service name: spark
Service state: running
Service health: good

Service name: spark-job-scheduler
Service state: running

Service name: ssh
Service state: running
Start on boot: True

Service name: syslog
Service state: running

Service name: ui-service
Service state: running

Service name: zookeeper
Service state: running
Service health: good

my_nsx-intel>
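The service state can be running or stopped. The service health can be good or degraded.
- You can also view the syslog file and search for the output of the pace-monitor.sh health check script, which logs the health of the NSX Intelligence services to the syslog file.
If all services are working properly, running the get log-file syslog | find pace-monitor command produces output similar to the following example.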
my_nsx-intel> get log-file syslog | find pace-monitor
<13>1 2019-08-30T03:19:20.409899+00:00 my_nsx-intel pace-monitor.sh - - - "_self": {
<13>1 2019-08-30T03:19:20.410253+00:00 my_nsx-intel pace-monitor.sh - - - "href": "/node/pace/appliance-health",
<13>1 2019-08-30T03:19:20.410623+00:00 my_nsx-intel pace-monitor.sh - - - "rel": "self"
<13>1 2019-08-30T03:19:20.410908+00:00 my_nsx-intel pace-monitor.sh - - - },
<13>1 2019-08-30T03:19:20.411162+00:00 my_nsx-intel pace-monitor.sh - - - "appliance-health": {
<13>1 2019-08-30T03:19:20.411416+00:00 my_nsx-intel pace-monitor.sh - - - "status": "Following NSX Intelligence first boot services are either PENDING or FAILED - Token-Registration",
<13>1 2019-08-30T03:19:20.411668+00:00 my_nsx-intel pace-monitor.sh - - - "sub-system-status": {
<13>1 2019-08-30T03:19:20.411923+00:00 my_nsx-intel pace-monitor.sh - - - "app-services": {
<13>1 2019-08-30T03:19:20.412280+00:00 my_nsx-intel pace-monitor.sh - - - "services": [],
<13>1 2019-08-30T03:19:20.412528+00:00 my_nsx-intel pace-monitor.sh - - - "status": ""
<13>1 2019-08-30T03:19:20.412807+00:00 my_nsx-intel pace-monitor.sh - - - },
<13>1 2019-08-30T03:19:20.413075+00:00 my_nsx-intel pace-monitor.sh - - - "base-infra-services": {
<13>1 2019-08-30T03:19:20.413303+00:00 my_nsx-intel pace-monitor.sh - - - "services": [
<13>1 2019-08-30T03:19:20.413613+00:00 my_nsx-intel pace-monitor.sh - - - {
<13>1 2019-08-30T03:19:20.413848+00:00 my_nsx-intel pace-monitor.sh - - - "druid-health": {
<13>1 2019-08-30T03:19:20.414146+00:00 my_nsx-intel pace-monitor.sh - - - "broker": "good",
<13>1 2019-08-30T03:19:20.414473+00:00 my_nsx-intel pace-monitor.sh - - - "coordinator": "good",
<13>1 2019-08-30T03:19:20.414717+00:00 my_nsx-intel pace-monitor.sh - - - "historical": "good",
<13>1 2019-08-30T03:19:20.414979+00:00 my_nsx-intel pace-monitor.sh - - - "middlemanager": "good",
<13>1 2019-08-30T03:19:20.415295+00:00 my_nsx-intel pace-monitor.sh - - - "overlord": "good"
<13>1 2019-08-30T03:19:20.415533+00:00 my_nsx-intel pace-monitor.sh - - - },
<13>1 2019-08-30T03:19:20.415762+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "druid"
<13>1 2019-08-30T03:19:20.415982+00:00 my_nsx-intel pace-monitor.sh - - - },
<13>1 2019-08-30T03:19:20.416269+00:00 my_nsx-intel pace-monitor.sh - - - {
<13>1 2019-08-30T03:19:20.416539+00:00 my_nsx-intel pace-monitor.sh - - - "health": "good",
<13>1 2019-08-30T03:19:20.416772+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "kafka"
<13>1 2019-08-30T03:19:20.416991+00:00 my_nsx-intel pace-monitor.sh - - - },
<13>1 2019-08-30T03:19:20.417204+00:00 my_nsx-intel pace-monitor.sh - - - {
<13>1 2019-08-30T03:19:20.417510+00:00 my_nsx-intel pace-monitor.sh - - - "health": "good",
<13>1 2019-08-30T03:19:20.417745+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "postgres"
<13>1 2019-08-30T03:19:20.418133+00:00 my_nsx-intel pace-monitor.sh - - - },
<13>1 2019-08-30T03:19:20.418389+00:00 my_nsx-intel pace-monitor.sh - - - {
<13>1 2019-08-30T03:19:20.418626+00:00 my_nsx-intel pace-monitor.sh - - - "health": "good",
<13>1 2019-08-30T03:19:20.418855+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "spark"
<13>1 2019-08-30T03:19:20.419157+00:00 my_nsx-intel pace-monitor.sh - - - },
<13>1 2019-08-30T03:19:20.419435+00:00 my_nsx-intel pace-monitor.sh - - - {
<13>1 2019-08-30T03:19:20.419684+00:00 my_nsx-intel pace-monitor.sh - - - "health": "good",
<13>1 2019-08-30T03:19:20.419928+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "zookeeper"
<13>1 2019-08-30T03:19:20.420165+00:00 my_nsx-intel pace-monitor.sh - - - }
<13>1 2019-08-30T03:19:20.420496+00:00 my_nsx-intel pace-monitor.sh - - - ],
<13>1 2019-08-30T03:19:20.420786+00:00 my_nsx-intel pace-monitor.sh - - - "status": ""
<13>1 2019-08-30T03:19:20.421022+00:00 my_nsx-intel pace-monitor.sh - - - },
<13>1 2019-08-30T03:19:20.421255+00:00 my_nsx-intel pace-monitor.sh - - - "first-boot-services": {
<13>1 2019-08-30T03:19:20.421539+00:00 my_nsx-intel pace-monitor.sh - - - "services": [
<13>1 2019-08-30T03:19:20.421777+00:00 my_nsx-intel pace-monitor.sh - - - {
<13>1 2019-08-30T03:19:20.422010+00:00 my_nsx-intel pace-monitor.sh - - - "health": "degraded",
<13>1 2019-08-30T03:19:20.422277+00:00 my_nsx-intel pace-monitor.sh - - - "service-name": "token-registration"
<13>1 2019-08-30T03:19:20.422512+00:00 my_nsx-intel pace-monitor.sh - - - }
<13>1 2019-08-30T03:19:20.422770+00:00 my_nsx-intel pace-monitor.sh - - - ],
<13>1 2019-08-30T03:19:20.423012+00:00 my_nsx-intel pace-monitor.sh - - - "status": "Following NSX Intelligence first boot, services are either PENDING or FAILED - Token-Registration"
<13>1 2019-08-30T03:19:20.423354+00:00 my_nsx-intel pace-monitor.sh - - - }
<13>1 2019-08-30T03:19:20.423601+00:00 my_nsx-intel pace-monitor.sh - - - }
<13>1 2019-08-30T03:19:20.423882+00:00 my_nsx-intel pace-monitor.sh - - - }
<13>1 2019-08-30T03:19:20.424339+00:00 my_nsx-intel pace-monitor.sh - - - }
<13>1 2019-08-30T03:19:20.972629+00:00 my_nsx-intel pace-monitor.sh - - - NSX Intelligence health OK.
<30>1 2019-08-30T03:19:20.973076+00:00 my_nsx-intel pace-monitor 20804 - - <13>Aug 30 03:19:19 pace-monitor.sh: NSX Intelligence health OK.
<182>1 2019-08-30T03:23:23.857Z my_nsx-intel NSX 21752 - [nsx@6876 comp="nsx-cli" subcomp="node-mgmt" username="admin" level="INFO"] CMD: get log-file syslog | find pace-monitor
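If one of the services has a problem, you might see the following line when you run get log-file syslog | find pace-monitor.
NSX Intelligence health DEGRADED. Return code not HTTP OK.
- If you encounter either of the following outputs, use the restart service service-name command to restart the affected service.
  - After you run the get services command, one of the services shows Service state: stopped or Service health: degraded.
  - After you run the get log-file syslog | find pace-monitor command, the output shows a message similar to PACE health DEGRADED. Return code not HTTP OK.
For example, if the state of the postgres service is stopped, or its state is running but its health is degraded, run the following command.
restart service postgres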
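Important: You must use the restart service service-name command to restart an NSX Intelligence service. If you decide to use the stop service service-name and start service service-name commands together instead, you must also manually restart every service that depends on service-name. The following list shows the dependency order that you must follow when restarting NSX Intelligence services.
zookeeper > druid > kafka > spark > spark-job-scheduler > nsx-config > processing > pace-server
For example, if you stop the nsx-config service and then start it again using the stop|start service service-name commands, you must also restart the processing and pace-server services using the restart service service-name command. When a service restarts, other services that depend on it might temporarily enter a degraded state. If no errors occur, these degraded services return to a stable state.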
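The checks and the dependency rule described above can be sketched in Python, for example to post-process CLI output that you have saved to a file. This is only an illustrative sketch, not a VMware tool. The function names (parse_services, needs_restart, dependents_to_restart) and the SAMPLE text are hypothetical. It assumes the get services output has one "field: value" pair per line, and it assumes that every service listed after a given service in the dependency order counts as dependent on it, which matches the nsx-config example but is an interpretation of the list.

```python
# Dependency order quoted from the text above.
DEPENDENCY_ORDER = [
    "zookeeper", "druid", "kafka", "spark", "spark-job-scheduler",
    "nsx-config", "processing", "pace-server",
]

# Hypothetical, shortened sample of saved `get services` output.
SAMPLE = """\
Service name: zookeeper
Service state: running
Service health: good
Service name: postgres
Service state: running
Service health: degraded
Service name: nsx-config
Service state: stopped
Service name: snmp
Service state: stopped
Start on boot: False
"""


def parse_services(text):
    """Return {service name: {lower-cased field: value}}."""
    services = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # prompt lines and blank lines carry no field
        key, value = key.strip().lower(), value.strip()
        if key == "service name":
            current = services.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return services


def needs_restart(services, watch):
    """Watched services that are stopped or report a degraded health field."""
    flagged = []
    for name in watch:
        fields = services.get(name)
        if fields is None:
            continue  # service not present in the captured output
        stopped = fields.get("service state") == "stopped"
        degraded = any(
            key.endswith("health") and value == "degraded"
            for key, value in fields.items()
        )
        if stopped or degraded:
            flagged.append(name)
    return flagged


def dependents_to_restart(service):
    """Services listed after `service` in the dependency order (assumption)."""
    if service not in DEPENDENCY_ORDER:
        return []
    return DEPENDENCY_ORDER[DEPENDENCY_ORDER.index(service) + 1:]


if __name__ == "__main__":
    parsed = parse_services(SAMPLE)
    # The watch list is a caller's choice; postgres is added here as an example.
    for name in needs_restart(parsed, DEPENDENCY_ORDER + ["postgres"]):
        print(f"restart service {name}")
    # After a manual stop/start of nsx-config, restart its dependents too.
    for name in dependents_to_restart("nsx-config"):
        print(f"restart service {name}")
```

The watch list is passed in explicitly because the sample output above shows some services, such as snmp, in the stopped state on an appliance that is working properly, so flagging every stopped service would be misleading.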