If the NSX Intelligence appliance is not responding, check the status of the NSX Intelligence services.

Problem

The NSX Intelligence appliance is not responding, or you receive an error message stating that the appliance is not working as expected.

Cause

One or more of the underlying NSX Intelligence services might have stopped or might not be in a healthy state.

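Solution

  1. Log in to the NSX Intelligence appliance CLI host using an account with the Enterprise Administrator role.
  2. Check the status of the NSX Intelligence services using the get services command.
    If all the NSX Intelligence services are running properly, you see output similar to the following example.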
    my_nsx-intel> get services
    Service name:                  druid
    Service state:                 running
    Coordinator health:            good
    Broker health:                 good
    Historical health:             good
    Overlord health:               good
    MiddleManager health:          good
    
    Service name:                  http
    Service state:                 running
    Session timeout:               1800
    Connection timeout:            30
    Redirect host:                 (not configured)
    Client API rate limit:         100 requests/sec
    Client API concurrency limit:  40
    Global API concurrency limit:  199
    
    Service name:                  kafka
    Service state:                 running
    Service health:                good
    
    Service name:                  liagent
    Service state:                 stopped
    
    Service name:                  mgmt-plane-bus
    Service state:                 stopped
    
    Service name:                  node-mgmt
    Service state:                 running
    
    Service name:                  nsx-config
    Service state:                 running
    
    Service name:                  nsx-message-bus
    Service state:                 stopped
    
    Service name:                  nsx-upgrade-agent
    Service state:                 running
    
    Service name:                  ntp
    Service state:                 running
    Start on boot:                 True
    
    Service name:                  pace-server
    Service state:                 running
    
    Service name:                  postgres
    Service state:                 running
    Service health:                good
    
    Service name:                  processing
    Service state:                 running
    
    Service name:                  snmp
    Service state:                 stopped
    Start on boot:                 False
    
    Service name:                  spark
    Service state:                 running
    Service health:                good
    
    Service name:                  spark-job-scheduler
    Service state:                 running
    
    Service name:                  ssh
    Service state:                 running
    Start on boot:                 True
    
    Service name:                  syslog
    Service state:                 running
    
    Service name:                  ui-service
    Service state:                 running
    
    Service name:                  zookeeper
    Service state:                 running
    Service health:                good
    
    my_nsx-intel>

    The Service state can be running or stopped. The Service health can be good or degraded.
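
The check in this step can also be scripted against saved get services output. The following Python sketch is illustrative only: the function names and the SAMPLE text are hypothetical, and the EXPECTED_STOPPED set is an assumption taken from the sample output above, where liagent, mgmt-plane-bus, nsx-message-bus, and snmp are stopped on a healthy appliance.

```python
# Services the healthy sample output lists as stopped. Treating them as
# expected-stopped is an assumption drawn from that sample only.
EXPECTED_STOPPED = {"liagent", "mgmt-plane-bus", "nsx-message-bus", "snmp"}

# Hypothetical excerpt in the same layout as the get services output.
SAMPLE = """\
my_nsx-intel> get services
Service name:                  kafka
Service state:                 running
Service health:                good

Service name:                  postgres
Service state:                 running
Service health:                degraded

Service name:                  processing
Service state:                 stopped

Service name:                  snmp
Service state:                 stopped
Start on boot:                 False
"""


def parse_services(output):
    """Split get services output into one dict per service."""
    services = []
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # blank separator lines and CLI prompt lines
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Service name":
            current = {"name": value}
            services.append(current)
        elif current is not None:
            current[key] = value
    return services


def flag_services(output):
    """Return names of services that are unexpectedly stopped or not good."""
    flagged = []
    for svc in parse_services(output):
        state = svc.get("Service state")
        stopped = state != "running" and svc["name"] not in EXPECTED_STOPPED
        # Covers "Service health" as well as per-component lines such as
        # "Broker health" in the druid entry.
        degraded = any(
            key.lower().endswith("health") and value != "good"
            for key, value in svc.items()
        )
        if stopped or degraded:
            flagged.append(svc["name"])
    return flagged


if __name__ == "__main__":
    print(flag_services(SAMPLE))  # ['postgres', 'processing']
```

Adjust EXPECTED_STOPPED to match the services that are intentionally disabled in your environment.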

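  3. You can also view the syslog file and search for the output of the pace-monitor.sh health check script, which logs the health status of the NSX Intelligence services to the syslog file.
    If all services are working as expected, you see output similar to the following example after you run the get log-file syslog | find pace-monitor command.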
    my_nsx-intel> get log-file syslog | find pace-monitor
    <13>1 2019-08-30T03:19:20.409899+00:00 my_nsx-intel pace-monitor.sh - - -    "_self": {
    <13>1 2019-08-30T03:19:20.410253+00:00 my_nsx-intel pace-monitor.sh - - -      "href": "/node/pace/appliance-health",
    <13>1 2019-08-30T03:19:20.410623+00:00 my_nsx-intel pace-monitor.sh - - -      "rel": "self"
    <13>1 2019-08-30T03:19:20.410908+00:00 my_nsx-intel pace-monitor.sh - - -    },
    <13>1 2019-08-30T03:19:20.411162+00:00 my_nsx-intel pace-monitor.sh - - -    "appliance-health": {
    <13>1 2019-08-30T03:19:20.411416+00:00 my_nsx-intel pace-monitor.sh - - -      "status": "Following NSX Intelligence first boot services are either PENDING or FAILED - Token-Registration",
    <13>1 2019-08-30T03:19:20.411668+00:00 my_nsx-intel pace-monitor.sh - - -      "sub-system-status": {
    <13>1 2019-08-30T03:19:20.411923+00:00 my_nsx-intel pace-monitor.sh - - -        "app-services": {
    <13>1 2019-08-30T03:19:20.412280+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [],
    <13>1 2019-08-30T03:19:20.412528+00:00 my_nsx-intel pace-monitor.sh - - -          "status": ""
    <13>1 2019-08-30T03:19:20.412807+00:00 my_nsx-intel pace-monitor.sh - - -        },
    <13>1 2019-08-30T03:19:20.413075+00:00 my_nsx-intel pace-monitor.sh - - -        "base-infra-services": {
    <13>1 2019-08-30T03:19:20.413303+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [
    <13>1 2019-08-30T03:19:20.413613+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.413848+00:00 my_nsx-intel pace-monitor.sh - - -              "druid-health": {
    <13>1 2019-08-30T03:19:20.414146+00:00 my_nsx-intel pace-monitor.sh - - -                "broker": "good",
    <13>1 2019-08-30T03:19:20.414473+00:00 my_nsx-intel pace-monitor.sh - - -                "coordinator": "good",
    <13>1 2019-08-30T03:19:20.414717+00:00 my_nsx-intel pace-monitor.sh - - -                "historical": "good",
    <13>1 2019-08-30T03:19:20.414979+00:00 my_nsx-intel pace-monitor.sh - - -                "middlemanager": "good",
    <13>1 2019-08-30T03:19:20.415295+00:00 my_nsx-intel pace-monitor.sh - - -                "overlord": "good"
    <13>1 2019-08-30T03:19:20.415533+00:00 my_nsx-intel pace-monitor.sh - - -              },
    <13>1 2019-08-30T03:19:20.415762+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "druid"
    <13>1 2019-08-30T03:19:20.415982+00:00 my_nsx-intel pace-monitor.sh - - -            },
    <13>1 2019-08-30T03:19:20.416269+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.416539+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
    <13>1 2019-08-30T03:19:20.416772+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "kafka"
    <13>1 2019-08-30T03:19:20.416991+00:00 my_nsx-intel pace-monitor.sh - - -            },
    <13>1 2019-08-30T03:19:20.417204+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.417510+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
    <13>1 2019-08-30T03:19:20.417745+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "postgres"
    <13>1 2019-08-30T03:19:20.418133+00:00 my_nsx-intel pace-monitor.sh - - -            },
    <13>1 2019-08-30T03:19:20.418389+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.418626+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
    <13>1 2019-08-30T03:19:20.418855+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "spark"
    <13>1 2019-08-30T03:19:20.419157+00:00 my_nsx-intel pace-monitor.sh - - -            },
    <13>1 2019-08-30T03:19:20.419435+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.419684+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "good",
    <13>1 2019-08-30T03:19:20.419928+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "zookeeper"
    <13>1 2019-08-30T03:19:20.420165+00:00 my_nsx-intel pace-monitor.sh - - -            }
    <13>1 2019-08-30T03:19:20.420496+00:00 my_nsx-intel pace-monitor.sh - - -          ],
    <13>1 2019-08-30T03:19:20.420786+00:00 my_nsx-intel pace-monitor.sh - - -          "status": ""
    <13>1 2019-08-30T03:19:20.421022+00:00 my_nsx-intel pace-monitor.sh - - -        },
    <13>1 2019-08-30T03:19:20.421255+00:00 my_nsx-intel pace-monitor.sh - - -        "first-boot-services": {
    <13>1 2019-08-30T03:19:20.421539+00:00 my_nsx-intel pace-monitor.sh - - -          "services": [
    <13>1 2019-08-30T03:19:20.421777+00:00 my_nsx-intel pace-monitor.sh - - -            {
    <13>1 2019-08-30T03:19:20.422010+00:00 my_nsx-intel pace-monitor.sh - - -              "health": "degraded",
    <13>1 2019-08-30T03:19:20.422277+00:00 my_nsx-intel pace-monitor.sh - - -              "service-name": "token-registration"
    <13>1 2019-08-30T03:19:20.422512+00:00 my_nsx-intel pace-monitor.sh - - -            }
    <13>1 2019-08-30T03:19:20.422770+00:00 my_nsx-intel pace-monitor.sh - - -          ],
    <13>1 2019-08-30T03:19:20.423012+00:00 my_nsx-intel pace-monitor.sh - - -          "status": "Following NSX Intelligence first boot, services are either PENDING or FAILED - Token-Registration"
    <13>1 2019-08-30T03:19:20.423354+00:00 my_nsx-intel pace-monitor.sh - - -        }
    <13>1 2019-08-30T03:19:20.423601+00:00 my_nsx-intel pace-monitor.sh - - -      }
    <13>1 2019-08-30T03:19:20.423882+00:00 my_nsx-intel pace-monitor.sh - - -    }
    <13>1 2019-08-30T03:19:20.424339+00:00 my_nsx-intel pace-monitor.sh - - -  }
    <13>1 2019-08-30T03:19:20.972629+00:00 my_nsx-intel pace-monitor.sh - - -  NSX Intelligence health OK.
    <30>1 2019-08-30T03:19:20.973076+00:00 my_nsx-intel pace-monitor 20804 - -  <13>Aug 30 03:19:19 pace-monitor.sh: NSX Intelligence health OK.
    <182>1 2019-08-30T03:23:23.857Z my_nsx-intel NSX 21752 - [nsx@6876 comp="nsx-cli" subcomp="node-mgmt" username="admin" level="INFO"] CMD: get log-file syslog | find pace-monitor
    
    If one of the services has a problem, you might see the following line when you run get log-file syslog | find pace-monitor.
    NSX Intelligence health DEGRADED. Return code not HTTP OK.
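
If the syslog output is saved to a file, the final health verdict can be picked out programmatically. This is a minimal Python sketch, assuming the verdict lines have the two forms shown above ("NSX Intelligence health OK." and "NSX Intelligence health DEGRADED. ..."). The function name and the second entry in LOG, including its timestamp, are hypothetical.

```python
import re

# Matches the verdict that pace-monitor.sh writes, as shown in the examples.
HEALTH_RE = re.compile(r"NSX Intelligence health (OK|DEGRADED)")

LOG = [
    # Taken from the sample output above.
    "<13>1 2019-08-30T03:19:20.972629+00:00 my_nsx-intel pace-monitor.sh - - -  "
    "NSX Intelligence health OK.",
    # Hypothetical later entry, built from the DEGRADED line shown above.
    "<13>1 2019-08-30T03:24:20.000000+00:00 my_nsx-intel pace-monitor.sh - - -  "
    "NSX Intelligence health DEGRADED. Return code not HTTP OK.",
]


def latest_health(lines):
    """Return the most recent verdict ('OK' or 'DEGRADED'), or None."""
    verdict = None
    for line in lines:
        match = HEALTH_RE.search(line)
        if match:
            verdict = match.group(1)  # later lines overwrite earlier ones
    return verdict


if __name__ == "__main__":
    print(latest_health(LOG))  # DEGRADED
```

The sketch assumes the lines are in chronological order, as they are in the syslog file.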
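  4. If you encounter either of the following outputs, restart the affected service using the restart service service-name command.
    • After you run the get services command, one of the services shows Service state: stopped or Service health: degraded.
    • After you run the get log-file syslog | find pace-monitor command, the output contains the message NSX Intelligence health DEGRADED. Return code not HTTP OK.
    For example, if the postgres service shows Service state: stopped, or shows Service state: running but Service health: degraded, run the following command.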
    restart service postgres
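    Important: Use the restart service service-name command to restart NSX Intelligence services. If you decide to use the stop service service-name and start service service-name commands instead, you must also manually restart every service that depends on service-name. The following list shows the dependency order in which the NSX Intelligence services must be restarted.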
    zookeeper > druid > kafka > spark > spark-job-scheduler > nsx-config > processing > pace-server 
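    For example, if you stop the nsx-config service and then start it again using the stop service service-name and start service service-name commands, you must also restart the processing and pace-server services using the restart service service-name command.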

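    In addition, if you use the restart service service-name command to restart any service that appears before spark-job-scheduler in the dependency order list, you must also manually restart the spark-job-scheduler service using the restart service spark-job-scheduler command. Otherwise, the spark-job-scheduler service enters an error state.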
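
The two restart rules above can be expressed as a small lookup. The following Python sketch is illustrative only: the function name and its parameters are hypothetical, and it assumes the dependency order is a simple chain in which every service listed after a given service depends on it, which matches the nsx-config example. Services that are not in the list, such as postgres, return no follow-up restarts.

```python
# Dependency order as given in the list above.
ORDER = [
    "zookeeper",
    "druid",
    "kafka",
    "spark",
    "spark-job-scheduler",
    "nsx-config",
    "processing",
    "pace-server",
]


def follow_up_restarts(service, used_restart):
    """Return the services to restart after handling `service`.

    used_restart=True  -> `restart service <service>` was used.
    used_restart=False -> `stop service` followed by `start service` was used.
    """
    if service not in ORDER:
        return []  # not covered by the dependency order list
    index = ORDER.index(service)
    if used_restart:
        # Only spark-job-scheduler needs a manual follow-up restart, and only
        # when the restarted service comes before it in the list.
        scheduler = ORDER.index("spark-job-scheduler")
        return ["spark-job-scheduler"] if index < scheduler else []
    # Manual stop/start: assume every later service in the chain depends on it.
    return ORDER[index + 1:]


if __name__ == "__main__":
    for name in follow_up_restarts("nsx-config", used_restart=False):
        print(f"restart service {name}")
    # restart service processing
    # restart service pace-server
```

For example, follow_up_restarts("kafka", used_restart=True) returns only spark-job-scheduler, which corresponds to the last rule above.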