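VMware Cloud Director maintains synchronous streaming replication between the nodes. If a standby node becomes unreachable, you must determine the cause and resolve the problem.
Problem
The VMware Cloud Director appliance management user interface shows the cluster health as DEGRADED, and the status of one of the standby nodes is ? unreachable.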
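The /nodes API returns information indicating that localClusterHealth is DEGRADED, the node status is ? unreachable, and nodeHealth is UNHEALTHY.
For example, the /nodes API might return the following information for the nodes.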
{
    "localClusterFailover": "MANUAL",
    "localClusterHealth": "DEGRADED",
    "localClusterState": [
        {
            "connectionString": "host=primary_host_IP user=repmgr dbname=repmgr connect_timeout=2",
            "failover": {
                "details": "failover = manual",
                "mode": "MANUAL",
                "repmgrd": {
                    "details": "On node primary_node_ID (primary_host_name): repmgrd = not applicable",
                    "status": "NOT APPLICABLE"
                }
            },
            "id": primary_node_ID,
            "location": "default",
            "name": "primary_host_name",
            "nodeHealth": "HEALTHY",
            "nodeRole": "PRIMARY",
            "role": "primary",
            "status": "* running",
            "upstream": ""
        },
        {
            "connectionString": "host=unreachable_standby_host_IP user=repmgr dbname=repmgr connect_timeout=2",
            "failover": {
                "details": "failover state unknown - unable to ssh to failed or unreachable node",
                "mode": "UNKNOWN",
                "repmgrd": {
                    "details": "On node unreachable_standby_node_ID (unreachable_standby_host_name): repmgrd = n/a",
                    "status": "UNKNOWN"
                }
            },
            "id": unreachable_standby_node_ID,
            "location": "default",
            "name": "unreachable_standby_host_name",
            "nodeHealth": "UNHEALTHY",
            "nodeRole": "STANDBY",
            "role": "standby",
            "status": "? unreachable",
            "upstream": "primary_host_name"
        },
        {
            "connectionString": "host=running_standby_host_IP user=repmgr dbname=repmgr connect_timeout=2",
            "failover": {
                "details": "failover = manual",
                "mode": "MANUAL",
                "repmgrd": {
                    "details": "On node running_standby_node_ID (running_standby_host_IP): repmgrd = not applicable",
                    "status": "NOT APPLICABLE"
                }
            },
            "id": running_standby_node_ID,
            "location": "default",
            "name": "running_standby_host_name",
            "nodeHealth": "HEALTHY",
            "nodeRole": "STANDBY",
            "role": "standby",
            "status": "running",
            "upstream": "primary_host_name"
        }
    ],
    "warnings": [
        "unable to connect to node \"unreachable_standby_host_name\" (ID: unreachable_standby_node_ID)",
        "node \"unreachable_standby_host_name\" (ID: unreachable_standby_node_ID) is registered as an active standby but is unreachable"
    ]
}
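A response like this can be scanned programmatically to find the affected node. The following is a minimal Python sketch, assuming the /nodes response has already been retrieved and parsed into a dictionary. The function name find_unhealthy_nodes and the host names in the sample are hypothetical; only the field names (localClusterState, nodeHealth, nodeRole, name, status) come from the example response.

```python
from typing import Any


def find_unhealthy_nodes(response: dict[str, Any]) -> list[dict[str, str]]:
    """Return name, role, and status for every node not reported as HEALTHY."""
    unhealthy = []
    for node in response.get("localClusterState", []):
        if node.get("nodeHealth") != "HEALTHY":
            unhealthy.append(
                {
                    "name": node.get("name", ""),
                    "nodeRole": node.get("nodeRole", ""),
                    "status": node.get("status", ""),
                }
            )
    return unhealthy


# Hypothetical, trimmed-down response. The host names are made up for
# illustration and a real response carries more fields per node.
sample = {
    "localClusterHealth": "DEGRADED",
    "localClusterState": [
        {"name": "vcd-primary", "nodeHealth": "HEALTHY",
         "nodeRole": "PRIMARY", "status": "* running"},
        {"name": "vcd-standby-1", "nodeHealth": "UNHEALTHY",
         "nodeRole": "STANDBY", "status": "? unreachable"},
        {"name": "vcd-standby-2", "nodeHealth": "HEALTHY",
         "nodeRole": "STANDBY", "status": "running"},
    ],
}

for node in find_unhealthy_nodes(sample):
    print(f"{node['name']} ({node['nodeRole']}): {node['status']}")
```

Filtering on nodeHealth rather than on the status string avoids depending on the exact text that repmgr reports for an unreachable node.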
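Cause
To ensure data integrity, the PostgreSQL database uses Write-Ahead Logging (WAL). The primary node continuously streams the WAL to the active standby nodes for replication and recovery purposes. The standby nodes process the WAL as they receive it. If a standby node is unreachable, it stops receiving the WAL and cannot be a candidate for promotion to the new primary.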
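Solution
- Verify that the virtual machine of the unreachable standby node is running.
- Verify that the network connection to the standby node is working.
- Verify that there are no SSH problems that might prevent the standby node from communicating with the other nodes.
- Verify that the vpostgres service is running on the standby node.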
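The network, SSH, and database checks above can be partly automated from another node or an administration workstation. The following is a minimal Python sketch that tests only whether TCP connections can be opened. It assumes SSH listens on port 22 and PostgreSQL on port 5432, which are the usual defaults but are not stated in this topic. The function names are hypothetical. An open port does not prove that the service behind it is healthy, so a successful result narrows the problem rather than closing it.

```python
import socket
from typing import Optional

# Assumed default ports; adjust them if your environment differs.
DEFAULT_PORTS = {"ssh": 22, "postgresql": 5432}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_standby(host: str, ports: Optional[dict[str, int]] = None) -> dict[str, bool]:
    """Map each service label to whether its port accepts TCP connections."""
    ports = DEFAULT_PORTS if ports is None else ports
    return {label: tcp_port_open(host, port) for label, port in ports.items()}
```

For example, calling check_standby with the IP address of the unreachable standby returns a dictionary such as {"ssh": True, "postgresql": False}, which would suggest that the virtual machine and the network are up but the database is not accepting connections.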