In a VMware Cloud Director appliance deployment with a database HA configuration, the postgres user cannot connect over SSH to its peer database nodes.
Problem
When there is an SSH problem between database nodes, VMware Cloud Director reports localClusterHealth as SSH_PROBLEM. This is a critical problem that must be fixed as soon as possible.
You can view localClusterHealth in the VMware Cloud Director appliance management user interface, or by calling the /nodes VMware Cloud Director appliance API. See the VMware Cloud Director appliance API documentation.
When you run the /nodes API on a peer of the node that has the SSH problem, the API returns localClusterHealth as SSH_PROBLEM and localClusterFailover as INDETERMINATE. The failover mode is INDETERMINATE because the node that runs the /nodes API cannot connect over SSH to one of its peers. For the node that has the SSH problem, "details" in the "failover" section of the response body shows: ssh failed. command: ssh unreachable_standby_host_IP /usr/bin/grep failover=manual /opt/vmware/vpostgres/10/etc/repmgr.conf
If you run GET https://primary_host_IP:5480/api/1.0.0/nodes, the /nodes API might return the following information.
{ "localClusterFailover": "INDETERMINATE", "localClusterHealth": "SSH_PROBLEM", "localClusterState": [ { "connectionString": "host=primary_host_IP user=repmgr dbname=repmgr connect_timeout=2", "failover": { "details": "failover = manual", "mode": "MANUAL", "repmgrd": { "details": "On node primary_node_ID (primary_host_name): repmgrd = not applicable", "status": "NOT APPLICABLE" } }, "id": primary_node_ID, "location": "default", "name": "primary_host_name", "nodeHealth": "HEALTHY", "nodeRole": "PRIMARY", "role": "primary", "status": "* running", "upstream": "" }, { "connectionString": "host=running_standby_host_IP user=repmgr dbname=repmgr connect_timeout=2", "failover": { "details": "failover = manual", "mode": "MANUAL", "repmgrd": { "details": "On node running_standby_node_ID (running_standby_host_name): repmgrd = not applicable", "status": "NOT APPLICABLE" } }, "id": running_standby_node_ID, "location": "default", "name": "running_standby_host_name", "nodeHealth": "HEALTHY", "nodeRole": "STANDBY", "role": "standby", "status": "running", "upstream": "primary_host_name" }, { "connectionString": "host=unreachable_standby_host_IP user=repmgr dbname=repmgr connect_timeout=2", "failover": { "details": "ssh failed. command: ssh unreachable_standby_host_IP /usr/bin/grep failover=manual /opt/vmware/vpostgres/10/etc/repmgr.conf", "mode": "UNKNOWN", "repmgrd": { "details": "On node unreachable_standby_node_ID (unreachable_standby_host_name): repmgrd = not running", "status": "NOT RUNNING" } }, "id": unreachable_standby_node_ID, "location": "default", "name": "unreachable_standby_host_name", "nodeHealth": "HEALTHY", "nodeRole": "STANDBY", "role": "standby", "status": "running", "upstream": "primary_host_name" } ], "warnings": [] }
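The check described above, finding which node reports an SSH failure in its "failover" details, can be sketched in Python. This is a minimal illustration rather than part of the product: it assumes the /nodes response has already been retrieved and parsed into a dictionary, and the function names nodes_with_ssh_failures and summarize are hypothetical helpers introduced here.

```python
def nodes_with_ssh_failures(response):
    """Return the names of nodes whose failover details start with 'ssh failed'."""
    failed = []
    for node in response.get("localClusterState", []):
        details = node.get("failover", {}).get("details", "")
        if details.startswith("ssh failed"):
            failed.append(node.get("name"))
    return failed


def summarize(response):
    """Collect the cluster-level fields discussed in this topic in one place."""
    return {
        "health": response.get("localClusterHealth"),
        "failover": response.get("localClusterFailover"),
        "ssh_failed_nodes": nodes_with_ssh_failures(response),
    }


if __name__ == "__main__":
    # Trimmed, hypothetical response shaped like the sample above.
    sample = {
        "localClusterFailover": "INDETERMINATE",
        "localClusterHealth": "SSH_PROBLEM",
        "localClusterState": [
            {"name": "primary_host_name",
             "failover": {"details": "failover = manual", "mode": "MANUAL"}},
            {"name": "unreachable_standby_host_name",
             "failover": {"details": "ssh failed. command: ssh ...", "mode": "UNKNOWN"}},
        ],
        "warnings": [],
    }
    print(summarize(sample))
```

Matching on the "ssh failed" prefix mirrors the sample output only; if the appliance words the message differently in another version, the prefix would need adjusting.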
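If you run GET https://unreachable_standby_host_IP:5480/api/1.0.0/nodes, the localClusterFailover and localClusterState information might be incorrect because the node is untrusted. The /nodes API returns warning messages stating that unreachable_standby_host_name cannot connect to its peer nodes. The /nodes API might return the following information.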
{ "localClusterFailover": "MANUAL", "localClusterHealth": "SSH_PROBLEM", "localClusterState": [ { "connectionString": "host=primary_host_IP user=repmgr dbname=repmgr connect_timeout=2", "failover": { "details": "ssh failed. command: ssh primary_host_IP /usr/bin/grep failover=manual /opt/vmware/vpostgres/10/etc/repmgr.conf", "mode": "UNKNOWN", "repmgrd": { "details": "On node primary_node_ID (primary_host_name): repmgrd = n/a", "status": "UNKNOWN" } }, "id": primary_node_ID, "location": "default", "name": "primary_host_name", "nodeHealth": "UNHEALTHY", "nodeRole": "PRIMARY", "role": "primary", "status": "? running", "upstream": "" }, { "connectionString": "host=running_standby_host_IP user=repmgr dbname=repmgr connect_timeout=2", "failover": { "details": "ssh failed. command: ssh running_standby_host_IP /usr/bin/grep failover=manual /opt/vmware/vpostgres/10/etc/repmgr.conf", "mode": "UNKNOWN", "repmgrd": { "details": "On node running_standby_node_ID (running_standby_host_name): repmgrd = n/a", "status": "UNKNOWN" } }, "id": running_standby_node_ID, "location": "default", "name": "running_standby_host_name", "nodeHealth": "UNHEALTHY", "nodeRole": "STANDBY", "role": "standby", "status": "? running", "upstream": "primary_host_name" }, { "connectionString": "host=unreachable_standby_host_IP user=repmgr dbname=repmgr connect_timeout=2", "failover": { "details": "failover = manual", "mode": "MANUAL", "repmgrd": { "details": "On node unreachable_standby_node_ID (unreachable_standby_host_name): repmgrd = not applicable", "status": "NOT APPLICABLE" } }, "id": unreachable_standby_node_ID, "location": "default", "name": "unreachable_standby_host_name", "nodeHealth": "HEALTHY", "nodeRole": "STANDBY", "role": "standby", "status": "running", "upstream": "? primary_host_name" } ], "warnings": [ "unable to connect to node \"primary_host_name\" (ID: primary_node_ID)", "unable to connect to node \"running_standby_host_name\" (ID: running_standby_node_ID)", "unable to connect to node \"unreachable_standby_host_name\" (ID: unreachable_standby_node_ID)'s upstream node \"primary_host_name\" (ID: primary_node_ID)", "unable to determine if node \"unreachable_standby_host_name\" (ID: unreachable_standby_node_ID) is attached to its upstream node \"primary_host_name\" (ID: primary_node_ID)" ] }
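Because a response from the untrusted node can be misleading, a script that consumes /nodes output might flag such responses before acting on them. The following Python sketch does that under stated assumptions: the response is already parsed into a dictionary, the warning text begins with "unable to connect to node" as in the sample above, and both function names are hypothetical.

```python
def connection_warnings(response):
    """Return the warnings that report a failed connection to a node."""
    return [
        warning
        for warning in response.get("warnings", [])
        if warning.startswith("unable to connect to node")
    ]


def response_may_be_unreliable(response):
    """True when the health is SSH_PROBLEM and connection warnings are present.

    This combination matches the sample returned by the untrusted node, whose
    localClusterFailover and localClusterState values might be incorrect.
    """
    return (
        response.get("localClusterHealth") == "SSH_PROBLEM"
        and bool(connection_warnings(response))
    )
```

A response that is flagged this way is best cross-checked by running the /nodes API on a peer node, as in the earlier example that queries primary_host_IP.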
Cause
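VMware Cloud Director stores the SSH certificates of the postgres user in the NFS shared transfer server storage. All database nodes must have access to the shared transfer server storage. If a database node becomes untrusted, that is, the SSH certificates of the postgres user are no longer valid or no longer accessible, the node cannot use the SSH client to run commands on its peer nodes. The VMware Cloud Director appliance requires this capability to run correctly in HA mode.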
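To confirm whether a node can still run commands on a peer, the command that appears in the "details" field can be reproduced by hand. The Python sketch below is one hedged way to do so. It assumes it runs as the postgres user on a database node, that the OpenSSH client is available, and that the repmgr.conf path matches the one shown in the sample output. The function names and the BatchMode and ConnectTimeout options are choices made for this illustration, not something the appliance itself is documented to use.

```python
import subprocess

# Path taken from the "details" field in the sample responses.
REPMGR_CONF = "/opt/vmware/vpostgres/10/etc/repmgr.conf"


def build_probe_command(peer_host):
    """Build the ssh command that mirrors the check shown in the API output."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # fail instead of prompting for a password
        "-o", "ConnectTimeout=5",   # do not hang on an unreachable peer
        peer_host,
        "/usr/bin/grep", "failover=manual", REPMGR_CONF,
    ]


def can_run_on_peer(peer_host, run=subprocess.run):
    """Return True if ssh could execute the remote command on the peer."""
    try:
        result = run(
            build_probe_command(peer_host),
            capture_output=True,
            text=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # ssh exits with 255 when the connection or authentication fails;
    # any other code is the exit status of the remote grep command.
    return result.returncode != 255
```

The run parameter exists only so that the check can be exercised without a real SSH connection, for example by passing a stub in a test.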