線上診斷系統 (ODS) 功能可在執行階段自動偵錯 NCP。ODS 是透過 NCP 中建置的 Runbook 來實作。

Runbook 包含偵錯程序。Runbook 會產生偵錯報告。請注意,您無法修改預先定義的 Runbook。

您可以執行 CLI 命令來執行以下 Runbook 作業:
  • 叫用 Runbook,以起始執行階段偵錯
  • 檢查 Runbook 狀態
  • 取得偵錯報告

從 NCP 4.1.2 開始,會提供 Runbook NCPPendingPod。此 Runbook 可用來對停滯在擱置狀態的網繭進行偵錯。它會檢查 nsx-node-agent/ncp/hyperbus,以利找到此類問題的根本原因。

執行階段的偵錯步驟

步驟 1:在 NCP 網繭中啟動 sha-agent 容器

依預設,NCP 網繭中只有一個 NCP 容器。當需要執行執行階段偵錯時,您需要先在 NCP 網繭中啟動 sha-agent 容器。yaml 檔案中具有啟動 sha-agent 容器所需的密碼和組態對應。sha-agent 容器已新增至 NCP 網繭。套用 yaml 後,請確定這兩個容器 (NCP 和 sha-agent) 都在執行中。

ncp-ods.yaml:
# Yaml template for NCP Deployment
# Proper kubernetes API and NSX API parameters, and NCP Docker image
# must be specified.
# This yaml file is part of NCP 4.1.1.0 release.
  
# This is a FAKE configmap, which only ensures that sha-agent could start up.
apiVersion: v1
data:
  deployment_info.yml: "outer_collector:\n  tsdb:\n    # Access metrics instance via
    servicename.namespace\n    endpoint: metrics-manager.nsx-system.svc.cluster.local:5129\nmanager_namespace:
    nsx-system\nnorthstar_service_type: METRICS\norg_id: \ninstance_id:\nmanager_fqdn:
    metrics-manager.nsx-system.svc.cluster.local:5129\nmanager_type: NTM\n"
kind: ConfigMap
metadata:
  name: sha-agent-config
  namespace: nsx-system
---
# This is a FAKE secret, which only ensures that sha-agent could start up.
apiVersion: v1
data:
  ca.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURlekNDQW1NQ0ZCQlFROC83alhzcUhLMENISnlRTmoya2tDS3pNQTBHQ1NxR1NJYjNEUUVCQ3dVQU1Ib3gKQ3pBSkJnTlZCQVlUQWxWVE1Rc3dDUVlEVlFRSURBSkRRVEVTTUJBR0ExVUVCd3dKVUdGc2J5QkJiSFJ2TVE4dwpEUVlEVlFRS0RBWldUWGRoY21VeERqQU1CZ05WQkFzTUJWWkVUbVYwTVNrd0p3WURWUVFERENCRFRpMXpZekl0Ck1UQXRNVGcxTFRFd01TMHhNVEF0TVRZNE16VXhPREUwTnpBZUZ3MHlNekExTURnd016VTFORGhhRncwek16QTEKTURVd016VTFORGhhTUhveEN6QUpCZ05WQkFZVEFsVlRNUXN3Q1FZRFZRUUlEQUpEUVRFU01CQUdBMVVFQnd3SgpVR0ZzYnlCQmJIUnZNUTh3RFFZRFZRUUtEQVpXVFhkaGNtVXhEakFNQmdOVkJBc01CVlpFVG1WME1Ta3dKd1lEClZRUUREQ0JEVGkxell6SXRNVEF0TVRnMUxURXdNUzB4TVRBdE1UWTRNelV4T0RFME56Q0NBU0l3RFFZSktvWkkKaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFPMzVSTE1hTUpWcnBGREk3WGtIN3QybUN2VzA2SWJWZjF4RQozL2ozTU1TMHZYeURrdFI4Ui9qOTBhVThxUlhmVnNTcnlPcmk4dm5ZT0ZFdVpEbE5Bd2x1VnpQR0RQcFdlbk1CCm80VlFaNjh6eWVpQXY1UEMzc2xhWkUyVm9qaFlyeHUzd3JiWDVKdEpkaWpDbnNIbkYzRDN3VnJFbTVaMUJVSHoKMzZ3bitHWVM1Vzh1WlZzQy9IK2gvSnYrYXY5SVFWdzZHVGYvSjRaK2xTaUJRSU1rL0s1WUdZc25PUFBaYUVNcgpWVkk5RUg0TFpJOFRuWjVnT0owUVdBdThMSGVpd0c4QklnRWtyd0V0NXBhYnN0VjNkTWdRc2p6bjdUZm9aR2h4CmVic05SQXF1bmc4VmpHQzZHeGpsTzdsbXhQdDJmK0ZPMEE4anF5c2xMbDRENWtDY29CVUNBd0VBQVRBTkJna3EKaGtpRzl3MEJBUXNGQUFPQ0FRRUFWOTZBVUtYVGFNK0hrQmVPU3Nmc3htUHZDcG1ac0U5elYxUVlMemhmRW9NcgpDVkY0KzRPc3hsQ2VxZDdIR0pXdHYvdnRXMHIxZk9NNHhpNitHS09yQU4zRjVteUVLS3Rhd01jaStyS2EzbitHCmJpK0Rrdmk2YUNLVG0zaUFoTlEzNkdCdzJiSkxZU1Z4L0FRTDltQ2p4ZURwckV4WlBET1AyblVOTkJjVW02WVEKdXphUGNud1VGTzVXNlpRQ3hFQVdkeU1FbXZYK1pWcHNLTk42MXlhWnBvZHdVRlExMGFjR0QvN3lrc3Y4WTQ0YwpsQmlKSWdLVmdySDlYUUhIQVIxUzZPVGNnYzgyU25RS1dSUCtpTCtCQjQ4eWRLUkhEeDFSdzcvcTU5VXFSa0ltCmVlejdEVHhwaFBoN1NyWk5ZTVZlVTJxaFFRMWpETnVhd2FnUXJNa2lTQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURlekNDQW1NQ0ZCQlFROC83alhzcUhLMENISnlRTmoya2tDS3pNQTBHQ1NxR1NJYjNEUUVCQ3dVQU1Ib3gKQ3pBSkJnTlZCQVlUQWxWVE1Rc3dDUVlEVlFRSURBSkRRVEVTTUJBR0ExVUVCd3dKVUdGc2J5QkJiSFJ2TVE4dwpEUVlEVlFRS0RBWldUWGRoY21VeERqQU1CZ05WQkFzTUJWWkVUbVYwTVNrd0p3WURWUVFERENCRFRpMXpZekl0Ck1UQXRNVGcxTFRFd01TMHhNVEF0TVRZNE16VXhPREUwTnpBZUZ3MHlNekExTURnd016VTFORGhhRncwek16QTEKTURVd016VTFORGhhTUhveEN6QUpCZ05WQkFZVEFsVlRNUXN3Q1FZRFZRUUlEQUpEUVRFU01CQUdBMVVFQnd3SgpVR0ZzYnlCQmJIUnZNUTh3RFFZRFZRUUtEQVpXVFhkaGNtVXhEakFNQmdOVkJBc01CVlpFVG1WME1Ta3dKd1lEClZRUUREQ0JEVGkxell6SXRNVEF0TVRnMUxURXdNUzB4TVRBdE1UWTRNelV4T0RFME56Q0NBU0l3RFFZSktvWkkKaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFPMzVSTE1hTUpWcnBGREk3WGtIN3QybUN2VzA2SWJWZjF4RQozL2ozTU1TMHZYeURrdFI4Ui9qOTBhVThxUlhmVnNTcnlPcmk4dm5ZT0ZFdVpEbE5Bd2x1VnpQR0RQcFdlbk1CCm80VlFaNjh6eWVpQXY1UEMzc2xhWkUyVm9qaFlyeHUzd3JiWDVKdEpkaWpDbnNIbkYzRDN3VnJFbTVaMUJVSHoKMzZ3bitHWVM1Vzh1WlZzQy9IK2gvSnYrYXY5SVFWdzZHVGYvSjRaK2xTaUJRSU1rL0s1WUdZc25PUFBaYUVNcgpWVkk5RUg0TFpJOFRuWjVnT0owUVdBdThMSGVpd0c4QklnRWtyd0V0NXBhYnN0VjNkTWdRc2p6bjdUZm9aR2h4CmVic05SQXF1bmc4VmpHQzZHeGpsTzdsbXhQdDJmK0ZPMEE4anF5c2xMbDRENWtDY29CVUNBd0VBQVRBTkJna3EKaGtpRzl3MEJBUXNGQUFPQ0FRRUFWOTZBVUtYVGFNK0hrQmVPU3Nmc3htUHZDcG1ac0U5elYxUVlMemhmRW9NcgpDVkY0KzRPc3hsQ2VxZDdIR0pXdHYvdnRXMHIxZk9NNHhpNitHS09yQU4zRjVteUVLS3Rhd01jaStyS2EzbitHCmJpK0Rrdmk2YUNLVG0zaUFoTlEzNkdCdzJiSkxZU1Z4L0FRTDltQ2p4ZURwckV4WlBET1AyblVOTkJjVW02WVEKdXphUGNud1VGTzVXNlpRQ3hFQVdkeU1FbXZYK1pWcHNLTk42MXlhWnBvZHdVRlExMGFjR0QvN3lrc3Y4WTQ0YwpsQmlKSWdLVmdySDlYUUhIQVIxUzZPVGNnYzgyU25RS1dSUCtpTCtCQjQ4eWRLUkhEeDFSdzcvcTU5VXFSa0ltCmVlejdEVHhwaFBoN1NyWk5ZTVZlVTJxaFFRMWpETnVhd2FnUXJNa2lTQT09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
  tls.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcGdJQkFBS0NBUUVBN2ZsRXN4b3dsV3VrVU1qdGVRZnUzYVlLOWJUb2h0Vi9YRVRmK1Bjd3hMUzlmSU9TCjFIeEgrUDNScFR5cEZkOVd4S3ZJNnVMeStkZzRVUzVrT1UwRENXNVhNOFlNK2xaNmN3R2poVkJucnpQSjZJQy8KazhMZXlWcGtUWldpT0Zpdkc3ZkN0dGZrbTBsMktNS2V3ZWNYY1BmQldzU2JsblVGUWZQZnJDZjRaaExsYnk1bApXd0w4ZjZIOG0vNXEvMGhCWERvWk4vOG5objZWS0lGQWd5VDhybGdaaXljNDg5bG9ReXRWVWowUWZndGtqeE9kCm5tQTRuUkJZQzd3c2Q2TEFid0VpQVNTdkFTM21scHV5MVhkMHlCQ3lQT2Z0Titoa2FIRjV1dzFFQ3E2ZUR4V00KWUxvYkdPVTd1V2JFKzNaLzRVN1FEeU9yS3lVdVhnUG1RSnlnRlFJREFRQUJBb0lCQVFDWXRSZHZzd093THJYdgpuVEErTldnRDFkUThuYzJGRUtXOHlQbk1vcHNwN3kyVkpEMXBteUw0VmJCZFQxTFZsVTd4djZhYmkrME5oTUdHCjNyVXp6QWFCMjh1YmpxQ3ZXQ1VWZmR5MzVNUFVPdkI3QVh0dVQyTjFaRXJ2T25FeHBUOGhFMGVnMjJONGZxaVQKT1doMDExMUVnY2dTL2cwMWZIeFdPUyswSXFZVW9SalJIcVFCZ0dkR3NqYmJNTmF1Skl2bitQb1BKM1ZjeVJ4cgorWHhPOHlidHlzUlV3dkxYb2dpTmc5UTRsUENEZmtBdkFWNHdqRGFWeFBHUWs3YmFOOVJzeEVtVnZpZFU4aFpmCnFMbGlXUUwrdFdvWk5zcTM3VlhBRk51R203c2dISFptb3dUcEw5b1BjZU9rNkhsa2g0OEVCQWJmZm9zd0l0c0sKMUM1WWFtakJBb0dCQVBibGl5QzZuc3lMQ01EcUR1OEdYTHo2WER5UktHdnhlbmZjRjEzUzZpR2QxenkwUTVRawprdmdoMDlpVktVM21GQ1BLZElRMENDUWIza0lVU2hxL3c1MGw3KzBtcm5MUDI5ZjBOWHRFdDhNYTdCM1pPbmpBCnlzZnNCOU9KUkkzeDFVV0hwUXhnV3N2Q1RES0sxN2xhTnRYbDNiUGtNdGpRUWNwM21MYlJSeUpSQW9HQkFQYS8KZ0U3aCtSdUp4U2ZSVXNNbEVEOVV6ZGliNVphdmlSQ3Zaak56bkMrT1duYWpGMHBpeUZWRGlCYndVQTlkRkR5SApuaHlwRWlZVnRhWGFybm9FbEVkZmlYMGI2ZXJHWFYySWNVMDhWQU9wVGkveUZYQWprdDI5SytKVDVzN3lQVytGCklzcnZPRWJnQ3VFZmIvR1p4NmdETSsxMmlzSW9wTWdYb2dTeFRBeUZBb0dCQU5YUzJubFA1bk9TL2RQRllZV1UKNXdBcmUzSmc3TGIvZldjTXo1Zk1NRVZJNDcySkNQWGw3dnJDb1N2emtzQUtRT3IyVFk2cFdWdWNYeEt2YTdaYQoyZGpob0Rhc3gyeGJwRFFWSmJSS1FUUFJ2eWZpbUFjNFFPWi8vZzh2MUpWeUdaaUw3MThXbTh2WHpCSUJ1TzZuCnVOSHFyK1U1L3VkVEJZZUpxRks4VUhUaEFvR0JBTURpdkl0dGpJMGhhcFNReG5DMEpYcE1jY20xUElsSjJRekoKQUV5U1FITFFoaGtkcnRSQVdqaUUzUHFKaXh3bmQrMUZXcTB1NFhnU0duaDNkVkwvQjJhdjRVdUNxWjRVeU9HWQpDbklGQ2V2K3lwY2lWKzNjY1MrVGRKMnRWczFKZ2dzT2VUOUlONmIzOXFrN0tRZ2xYWFVTWStKcWUxZ0I2NlpiCkN4VTkvNlA5QW9HQkFNUDI0NWRKQ2dsc0dwQ3pjZVR3bFd3MUdmTUdhN0V3QWd4ZXNFQVd3NmE3d2lENVhDcDMKVHo3eE9YdzM4TyswQUNucEZTMXBpNTM4eWZCKzF5aFhtZU8wOUhTdUpsOEVYVlpRR0VNWUR2Q0M5OHVGb0xzWgprcHEwallqQlpzV0JScE9XaXVPdkQrU3pQa0xtNU10cytUemdQWURqWjVEamw2SExHQ3dDNjFKSQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
kind: Secret
metadata:
  name: sha-agent-tls-cert
  namespace: nsx-system
type: kubernetes.io/tls
---
 
apiVersion: apps/v1
kind: Deployment
metadata:
  # VMware NSX Container Plugin
  name: nsx-ncp
  namespace: nsx-system
  labels:
    tier: nsx-networking
    component: nsx-ncp
    version: v1
spec:
  # Active-Standby is supported from NCP 2.4.0 release,
  # so replica can be more than 1 if NCP HA is activated.
  # replica *must be* 1 if NCP HA is deactivated.
  selector:
    matchLabels:
      tier: nsx-networking
      component: nsx-ncp
      version: v1
 
  replicas: 1
 
  template:
    metadata:
      labels:
        tier: nsx-networking
        component: nsx-ncp
        version: v1
 
      # annotations:
      #   prometheus.io/scrape: "true"
      #   prometheus.io/port: "8001"
 
    spec:
      # NCP shares the host management network.
      hostNetwork: true
 
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
 
      # If configured with ServiceAccount, update the ServiceAccount
      # name below.
      serviceAccountName: ncp-svc-account
      # podAntiAffinity could ensure that NCP replicas are not be co-located
      # on a single node
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: component
                  operator: In
                  values:
                  - nsx-ncp
                - key: tier
                  operator: In
                  values:
                  - nsx-networking
              topologyKey: "kubernetes.io/hostname"
 
      containers:
        - name: nsx-ncp
          # Docker image for NCP
 
          image: nsx-ncp
 
          imagePullPolicy: IfNotPresent
 
          securityContext:
            capabilities:
              add:
                - AUDIT_WRITE
 
          env:
            - name: NCP_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: NCP_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - check_pod_liveness nsx-ncp 30
            initialDelaySeconds: 5
            timeoutSeconds: 30
            periodSeconds: 10
            failureThreshold: 5
          volumeMounts:
          - name: projected-volume
            mountPath: /etc/nsx-ujo
            readOnly: true
  
        - name: sha-agent
          image: nsx-ncp
          imagePullPolicy: IfNotPresent
          volumeMounts:
          - mountPath: /etc/nsx-ujo
            name: projected-volume
            readOnly: true
          - mountPath: /var/log/nsx-ujo
            name: host-var-log-ujo
          - mountPath: /cert/
            name: sha-agent-cert
          - mountPath: /etc/sha
            name: config-volume
          command: ["start_sha"]
 
      volumes:
 
        - name: host-var-log-ujo
          hostPath:
            path: /var/log/nsx-ujo
            type: DirectoryOrCreate
        - name: sha-agent-cert
          projected:
            defaultMode: 420
            sources:
            - secret:
                items:
                - key: tls.key
                  path: tls.key
                - key: tls.crt
                  path: tls.crt
                - key: ca.crt
                  path: ca.crt
                name: sha-agent-tls-cert
            - secret:
                items:
                - key: tls.crt
                  path: nsx-cert/tls.crt
                - key: tls.key
                  path: nsx-cert/tls.key
                name: nsx-secret
        - configMap:
            defaultMode: 420
            name: sha-agent-config
          name: config-volume
 
        - name: projected-volume
          projected:
            sources:
              # ConfigMap nsx-ncp-config is expected to supply ncp.ini
              - configMap:
                  name: nsx-ncp-config
                  items:
                    - key: ncp.ini
                      path: ncp.ini
              # To use cert based auth, uncomment and update the secretName,
              # then update ncp.ini with the mounted cert and key file paths
 
              #- secret:
              #    name: nsx-secret
              #    items:
              #      - key: tls.crt
              #        path: nsx-cert/tls.crt
              #      - key: tls.key
              #        path: nsx-cert/tls.key
 
 
              #- secret:
              #    name: lb-secret
              #    items:
              #      - key: tls.crt
              #        path: lb-cert/tls.crt
              #      - key: tls.key
              #        path: lb-cert/tls.key
 
              # To use JWT based auth, uncomment and update the secretName.
              #- secret:
              #    name: wcp-cluster-credentials
              #    items:
              #      - key: username
              #        path: vc/username
              #      - key: password
              #        path: vc/password

步驟 2:啟動 Runbook 叫用,並取得結果

在 NCP 中,我們使用 sha-agent CLI,藉由在 sha-agent 容器中執行 CLI 命令,來執行相關的 Runbook 作業。在 sha-agent 容器中, sha-appctl 負責執行 Runbook 命令。當您遇到網繭停滯在擱置狀態的問題時,需要先取得網繭的名稱和命名空間。然後,使用以下命令來啟動 Runbook 叫用:
kubectl exec NCP_POD -c sha-agent -- /opt/vmware/sha/bin/sha-appctl -c start_invocation --runbook NCPPendingPod --pod_ns POD_NS --pod_name POD_NAME
該命令會輸出一個叫用識別碼,以追蹤叫用狀態。您可以使用該識別碼,透過以下命令來取得叫用結果:
kubectl exec NCP_POD -c sha-agent -- /opt/vmware/sha/bin/sha-appctl -c get_invocation_result --invocation INVOCATION_ID

您可以在命令輸出中取得 json 格式的 Runbook 結果,以查看偵錯結果。