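If the NSX Intelligence appliance stops responding, check the status of the NSX Intelligence services.

Problem

The NSX Intelligence appliance has stopped responding, or you receive an error message indicating that the appliance is not working properly.

Cause

One or more of the underlying NSX Intelligence services might have stopped or might not be in a healthy state.

Solution

  1. Log in to the NSX Intelligence appliance CLI host using an account with the Enterprise Administrator role.
  2. Check the status of the NSX Intelligence services using the get services command.
    If all of the NSX Intelligence services are working properly, you see output similar to the following example.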
    my_nsx-intel> get services
    Service name:                  druid
    Service state:                 running
    Coordinator health:            good
    Broker health:                 good
    Historical health:             good
    Overlord health:               good
    MiddleManager health:          good
    
    Service name:                  http
    Service state:                 running
    Session timeout:               1800
    Connection timeout:            30
    Redirect host:                 (not configured)
    Client API rate limit:         100 requests/sec
    Client API concurrency limit:  40
    Global API concurrency limit:  199
    
    Service name:                  kafka
    Service state:                 running
    Service health:                good
    
    Service name:                  liagent
    Service state:                 stopped
    
    Service name:                  mgmt-plane-bus
    Service state:                 stopped
    
    Service name:                  node-mgmt
    Service state:                 running
    
    Service name:                  nsx-config
    Service state:                 running
    
    Service name:                  nsx-message-bus
    Service state:                 stopped
    
    Service name:                  nsx-upgrade-agent
    Service state:                 running
    
    Service name:                  ntp
    Service state:                 running
    Start on boot:                 True
    
    Service name:                  pace-server
    Service state:                 running
    
    Service name:                  postgres
    Service state:                 running
    Service health:                good
    
    Service name:                  processing
    Service state:                 running
    
    Service name:                  snmp
    Service state:                 stopped
    Start on boot:                 False
    
    Service name:                  spark
    Service state:                 running
    Service health:                good
    
    Service name:                  spark-job-scheduler
    Service state:                 running
    
    Service name:                  ssh
    Service state:                 running
    Start on boot:                 True
    
    Service name:                  syslog
    Service state:                 running
    
    Service name:                  ui-service
    Service state:                 running
    
    Service name:                  zookeeper
    Service state:                 running
    Service health:                good
    
    my_nsx-intel>

    The service state can be running or stopped. The service health can be good or degraded.

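  3. You can also review the syslog file and search for output from the pace-monitor.sh health check script, which logs the health of the NSX Intelligence services to the syslog file.
    If all of the services are working properly, running the get log-file syslog | find pace-monitor command produces output similar to the following example.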
    my_nsx-intel> get log-file syslog | find pace-monitor
    <13>1 2019-08-30T03:19:20.409899+00:00 my_nsx-intel pace-monitor.sh - - -    "_self": {
    <13>1 2019-08-30T03:19:20.410253+00:00 my_nsx-intel pace-monitor.sh - - -      "href": "/node/pace/appliance-health",
    <13>1 2019-08-30T03:19:20.410623+00:00 my_nsx-intel pace-monitor.sh - - -      "rel": "self"
    <13>1 2019-08-30T03:19:20.410908+00:00 my_nsx-intel pace-monitor.sh - - -    },
    <13>1 2019-08-30T03:19:20.411162+00:00 my_nsx-intel pace-monitor.sh - - -    "appliance-health": {
    <13>1 2019-08-30T03:19:20.411416+00:00 my_nsx-intel pace-monitor.sh - - -      "status": "Following NSX Intelligence first boot services are either PENDING or FAILED - Token-Registration",
    <13>1 2019-08-30T03:19:20.411668+00:00 my_nsx-intel pace-monitor.sh - - -      "sub-system-status": {
    <13>1 2019-08-30T03:19:20.411923+00:00 my_nsx-intel pace-monitor.sh - - -        "app-services": {
    <13>1 2019-08-30T03:19:20.412280+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [],
    <13>1 2019-08-30T03:19:20.412528+00:00 my_nsx-intel pace-monitor.sh - - -          "status": ""
    <13>1 2019-08-30T03:19:20.412807+00:00 my_nsx-intel pace-monitor.sh - - -        },
    <13>1 2019-08-30T03:19:20.413075+00:00 my_nsx-intel pace-monitor.sh - - -        "base-infra-services": {
    <13>1 2019-08-30T03:19:20.413303+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [
    <13>1 2019-08-30T03:19:20.413613+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.413848+00:00 my_nsx-intel pace-monitor.sh - - -              "druid-health": {
    <13>1 2019-08-30T03:19:20.414146+00:00 my_nsx-intel pace-monitor.sh - - -                "broker": "good",
    <13>1 2019-08-30T03:19:20.414473+00:00 my_nsx-intel pace-monitor.sh - - -                "coordinator": "good",
    <13>1 2019-08-30T03:19:20.414717+00:00 my_nsx-intel pace-monitor.sh - - -                "historical": "good",
    <13>1 2019-08-30T03:19:20.414979+00:00 my_nsx-intel pace-monitor.sh - - -                "middlemanager": "good",
    <13>1 2019-08-30T03:19:20.415295+00:00 my_nsx-intel pace-monitor.sh - - -                "overlord": "good"
    <13>1 2019-08-30T03:19:20.415533+00:00 my_nsx-intel pace-monitor.sh - - -              },
    <13>1 2019-08-30T03:19:20.415762+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "druid"
    <13>1 2019-08-30T03:19:20.415982+00:00 my_nsx-intel pace-monitor.sh - - -            },
    <13>1 2019-08-30T03:19:20.416269+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.416539+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
    <13>1 2019-08-30T03:19:20.416772+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "kafka"
    <13>1 2019-08-30T03:19:20.416991+00:00 my_nsx-intel pace-monitor.sh - - -            },
    <13>1 2019-08-30T03:19:20.417204+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.417510+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
    <13>1 2019-08-30T03:19:20.417745+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "postgres"
    <13>1 2019-08-30T03:19:20.418133+00:00 my_nsx-intel pace-monitor.sh - - -            },
    <13>1 2019-08-30T03:19:20.418389+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.418626+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
    <13>1 2019-08-30T03:19:20.418855+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "spark"
    <13>1 2019-08-30T03:19:20.419157+00:00 my_nsx-intel pace-monitor.sh - - -            },
    <13>1 2019-08-30T03:19:20.419435+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.419684+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
    <13>1 2019-08-30T03:19:20.419928+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "zookeeper"
    <13>1 2019-08-30T03:19:20.420165+00:00 my_nsx-intel pace-monitor.sh - - -            }
    <13>1 2019-08-30T03:19:20.420496+00:00 my_nsx-intel pace-monitor.sh - - -          ],
    <13>1 2019-08-30T03:19:20.420786+00:00 my_nsx-intel pace-monitor.sh - - -          "status": ""
    <13>1 2019-08-30T03:19:20.421022+00:00 my_nsx-intel pace-monitor.sh - - -        },
    <13>1 2019-08-30T03:19:20.421255+00:00 my_nsx-intel pace-monitor.sh - - -        "first-boot-services": {
    <13>1 2019-08-30T03:19:20.421539+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [
    <13>1 2019-08-30T03:19:20.421777+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.422010+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "degraded",
    <13>1 2019-08-30T03:19:20.422277+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "token-registration"
    <13>1 2019-08-30T03:19:20.422512+00:00 my_nsx-intel pace-monitor.sh - - -            }
    <13>1 2019-08-30T03:19:20.422770+00:00 my_nsx-intel pace-monitor.sh - - -          ],
    <13>1 2019-08-30T03:19:20.423012+00:00 my_nsx-intel pace-monitor.sh - - -          "status": "Following NSX Intelligence first boot, services are either PENDING or FAILED - Token-Registration"
    <13>1 2019-08-30T03:19:20.423354+00:00 my_nsx-intel pace-monitor.sh - - -        }
    <13>1 2019-08-30T03:19:20.423601+00:00 my_nsx-intel pace-monitor.sh - - -      }
    <13>1 2019-08-30T03:19:20.423882+00:00 my_nsx-intel pace-monitor.sh - - -    }
    <13>1 2019-08-30T03:19:20.424339+00:00 my_nsx-intel pace-monitor.sh - - -  }
    <13>1 2019-08-30T03:19:20.972629+00:00 my_nsx-intel pace-monitor.sh - - -  NSX Intelligence health OK.
    <30>1 2019-08-30T03:19:20.973076+00:00 my_nsx-intel pace-monitor 20804 - -  <13>Aug 30 03:19:19 pace-monitor.sh: NSX Intelligence health OK.
    <182>1 2019-08-30T03:23:23.857Z my_nsx-intel NSX 21752 - [nsx@6876 comp="nsx-cli" subcomp="node-mgmt" username="admin" level="INFO"] CMD: get log-file syslog | find pace-monitor
    
    If one of the services has a problem, you might see the following line when you run get log-file syslog | find pace-monitor.
    NSX Intelligence health DEGRADED. Return code not HTTP OK.
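  4. If you see either of the following outputs, restart the affected service using the restart service service-name command.
    • After you run the get services command, one of the services shows Service state: stopped or Service health: degraded.
    • After you run the get log-file syslog | find pace-monitor command, the output shows a message similar to PACE health DEGRADED. Return code not HTTP OK.
    For example, if the postgres service shows a state of stopped, or a state of running but a service health of degraded, run the following command.
    restart service postgres
    Important: You must use the restart service service-name command to restart an NSX Intelligence service. If you instead use the stop service service-name and start service service-name commands together, you must also manually restart every service that depends on service-name. The following list shows the dependency order to follow when restarting NSX Intelligence services.
    zookeeper > druid > kafka > spark > spark-job-scheduler > nsx-config > processing > pace-server
    For example, if you stop the nsx-config service and then start it again using the stop|start service service-name commands, you must also restart the processing and pace-server services using the restart service service-name command.

    When a service is restarted, other services that depend on it might temporarily enter a degraded state. If no errors occur, those services return to a stable state.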
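
The checks in steps 2 and 4 can be sketched as a small offline helper. This is an illustrative Python script, not part of NSX Intelligence: it assumes you have copied the text of a get services run into a string, and the function names (parse_services, needs_attention, dependents_to_restart) are hypothetical. It also assumes the dependency order listed above is a simple chain in which every service to the right of a given service depends on it, which matches the nsx-config example but is not stated explicitly. The healthy sample output in step 2 shows some services, such as snmp, as stopped, so treat the flagged list as a starting point for review rather than a verdict.

```python
# Illustrative helper, not an NSX Intelligence tool. It works on saved
# `get services` text and never contacts the appliance.

# Dependency order quoted from the procedure above (assumed to be a chain).
DEPENDENCY_ORDER = [
    "zookeeper", "druid", "kafka", "spark",
    "spark-job-scheduler", "nsx-config", "processing", "pace-server",
]


def parse_services(cli_output):
    """Return {service name: {field: value}} from `get services` text."""
    services = {}
    current = None
    for line in cli_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank lines and CLI prompts contain no "field: value"
        key, value = key.strip(), value.strip()
        if key == "Service name":
            current = services.setdefault(value, {})
        elif current is not None:
            current[key] = value
    return services


def needs_attention(services):
    """List services that are stopped or report any degraded health field."""
    flagged = []
    for name, fields in services.items():
        stopped = fields.get("Service state") == "stopped"
        degraded = any(
            key.lower().endswith("health") and value == "degraded"
            for key, value in fields.items()
        )
        if stopped or degraded:
            flagged.append(name)
    return flagged


def dependents_to_restart(service):
    """Services listed after `service` in the dependency order.

    Relevant only when `service` was cycled with stop/start instead of
    restart. Returns an empty list for services outside the order.
    """
    if service not in DEPENDENCY_ORDER:
        return []
    index = DEPENDENCY_ORDER.index(service)
    return DEPENDENCY_ORDER[index + 1:]
```

For example, dependents_to_restart("nsx-config") returns ["processing", "pace-server"], the same two services named in the nsx-config example above. Each name it returns would then be passed to restart service service-name on the appliance CLI.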