To troubleshoot problems in your database high availability cluster, you must monitor the status of the nodes and the events in the cluster.

Procedure

  1. Log in directly, or connect by using SSH, as root to the OS of any running cell in the cluster.
  2. Change the user to postgres.
    sudo -i -u postgres
  3. Check the status of the cluster.
    /opt/vmware/vpostgres/current/bin/repmgr cluster show
    In the output, the Upstream column of each standby node shows the current primary node.

    The console output displays the cluster information. In the following example, the primary node in the cluster, node 3, is unreachable.

     ID     | Name      | Role    | Status        | Upstream    | Location | Connection string
    --------+-----------+---------+---------------+-------------+----------+------------------------------------------------
     Node 1 | Node name | standby |   running     | Node 3 name | default  | host=host IP address user=repmgr dbname=repmgr
     Node 2 | Node name | standby |   running     | Node 3 name | default  | host=host IP address user=repmgr dbname=repmgr
     Node 3 | Node name | primary | ? unreachable |             | default  | host=host IP address user=repmgr dbname=repmgr

    In the following system output example, node 3 is the primary node in a healthy running cluster.

     ID     | Name      | Role    | Status    | Upstream    | Location | Connection string
    --------+-----------+---------+-----------+-------------+----------+------------------------------------------------
     Node 1 | Node name | standby |   running | Node 3 name | default  | host=host IP address user=repmgr dbname=repmgr
     Node 2 | Node name | standby |   running | Node 3 name | default  | host=host IP address user=repmgr dbname=repmgr
     Node 3 | Node name | primary | * running |             | default  | host=host IP address user=repmgr dbname=repmgr
  4. Check the cluster events log.
    /opt/vmware/vpostgres/current/bin/repmgr -f /opt/vmware/vpostgres/current/etc/repmgr.conf cluster event
    The system output shows creation, cloning, and registration events in the cluster.
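
The status check in step 3 can be scripted when you monitor the cluster regularly. The following is a minimal sketch, not part of the product: check_cluster_status is a hypothetical helper name, and the parsing assumes the column order shown in the examples above (ID, Name, Role, Status). It reads the cluster status table on standard input, prints one line for each node whose status is not running, and returns a nonzero exit code if it finds any.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product): reads the table printed
# by "repmgr cluster show" on stdin and reports nodes that are not running.
# Assumption: columns are pipe-separated in the order ID, Name, Role, Status.
check_cluster_status() {
  awk -F'|' '
    # Strip leading and trailing whitespace from a table cell.
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

    BEGIN { bad = 0 }

    # Skip blank lines and the dashed separator row (no pipe characters).
    NF < 4 { next }

    {
      name   = trim($2)
      role   = trim($3)
      status = trim($4)

      # Skip the header row.
      if (role == "Role") next

      # Drop a leading marker such as "*" or "?" before the status text.
      sub(/^[*?!-] */, "", status)

      if (status != "running") {
        printf "%s (%s): %s\n", name, role, status
        bad = 1
      }
    }

    # Exit code 1 signals that at least one node needs attention.
    END { exit bad }
  '
}

# Example usage, as the postgres user on a running cell:
#   /opt/vmware/vpostgres/current/bin/repmgr cluster show | check_cluster_status
```

A line that names the primary node means that the primary is unreachable or failed. A line that names a standby node points to a node that needs repair.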

What to do next

If the status of the primary node is unreachable or failed, you must promote a standby node.

If the status of a standby node is unreachable or failed, repair the node and start the PostgreSQL service if it is not running.