If the appliance database fails and no database nodes are up and running, or if all replica nodes are out of sync when the master fails, use the following procedure to attempt to recover the database.

About this task

This procedure applies to situations in which no database nodes are operational across a cluster that is running in asynchronous mode. In this scenario, you typically see errors similar to the following on the Virtual Appliance Management Interface page when trying to load or refresh the page:

Error initializing the database service: Could not open JDBC Connection for transaction; nested exception is org.postgresql.util.PSQLException: The connection attempt failed.

Procedure

  1. Try to recover the database using the Virtual Appliance Management Interface from one of the database nodes.
    1. If possible, open the Virtual Appliance Management Interface database page of the node with the most recent state. Typically, this node is the one that was the master node before the database failed.
    2. If the Virtual Appliance Management Interface for the master node fails to open, try to open the management interface for the other replica nodes.
    3. If you find a database node with a working Virtual Appliance Management Interface, try to recover the database by performing a manual failover from that node.
  2. If the procedure in step 1 fails, determine the node with the most recent state from the shell. Start a shell session on each available cluster node and try to start its local database by running the following command: service vpostgres start
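
    If the service starts, a quick optional check that the local database is accepting connections (the exact status output varies by appliance build) is:

    service vpostgres status
    su - postgres -c "psql vcac -c 'select 1;'"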
  3. Use the following procedure for each node that has a running local database to determine the node with the most recent state.
    1. Run the following commands to check whether the node is in recovery.
      su - postgres
      psql vcac
      vcac=# select pg_is_in_recovery();
      • If the query returns f, this node is not in recovery, so it has the most recent state and you can proceed directly to step 5.

      • If the query returns t, run the following command on the node:

      SELECT pg_last_xlog_receive_location() as receive_loc, pg_last_xlog_replay_location() as replay_loc, extract(epoch from pg_last_xact_replay_timestamp()) as replay_timestamp;

      This command should return a result similar to the following.

      vcac=# SELECT pg_last_xlog_receive_location() as receive_loc, pg_last_xlog_replay_location() as replay_loc, extract(epoch from pg_last_xact_replay_timestamp()) as replay_timestamp;
       receive_loc | replay_loc | replay_timestamp
      -------------+------------+------------------
       0/20000000 | 0/203228A0 | 1491577215.68858
      (1 row)
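
      If you prefer to gather both checks in one pass on each node, a single shell one-liner that combines the same queries (run as root; shown here only as a convenience, not a required step) is:

      su - postgres -c "psql vcac -c \"SELECT pg_is_in_recovery(), pg_last_xlog_receive_location() AS receive_loc, pg_last_xlog_replay_location() AS replay_loc, extract(epoch from pg_last_xact_replay_timestamp()) AS replay_timestamp;\""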
      
  4. Compare the results for each node to determine which one has the most recent state.

    Select the node with the greatest value in the receive_loc column. If the values are equal, select the node with the greatest replay_loc value; if those are also equal, select the node with the greatest replay_timestamp value.
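
    The receive_loc and replay_loc values are hexadecimal write-ahead log positions, so compare them as WAL locations rather than as plain strings. If the database is PostgreSQL 9.4 or later (where the pg_lsn type is available), you can let the server do the comparison by casting the values, for example:

    vcac=# SELECT '0/20000000'::pg_lsn < '0/203228A0'::pg_lsn AS second_is_newer;
     second_is_newer
    -----------------
     t
    (1 row)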

  5. Run the following command on the node with the most recent state: vcac-vami psql-promote-master -force
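
    To confirm that the promotion took effect, you can rerun the recovery check from step 3 on this node; after a successful promotion the query should return f:

    su - postgres -c "psql vcac -c 'select pg_is_in_recovery();'"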
  6. Open the /etc/haproxy/conf.d/10-psql.cfg file in a text editor and update the following line.
    server masterserver sc-rdops-vm06-dhcp-170-156.eng.vmware.com:5432 check on-marked-up shutdown-backup-sessions
    

    Update the line to read as follows, replacing current-node-fqdn with the FQDN of the current node:

    server masterserver current-node-fqdn:5432 check on-marked-up shutdown-backup-sessions
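
    If you prefer to make this change from the shell instead of a text editor, a one-line equivalent is shown below. Back up the file first; the hostname in the pattern is the example value from the existing line and should be whatever currently appears in your file, and current-node-fqdn is a placeholder for this node's FQDN:

    sed -i 's/sc-rdops-vm06-dhcp-170-156\.eng\.vmware\.com/current-node-fqdn/' /etc/haproxy/conf.d/10-psql.cfg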
  7. Save the file.
  8. Run the service haproxy restart command.
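
    After the restart, you can optionally confirm that the load balancer service came back up:

    service haproxy status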
  9. Open the Virtual Appliance Management Interface database page for the node with the most recent state.

    This node should appear as the master node with the other nodes as invalid replicas. In addition, the Reset button for the replicas is enabled.

  10. Click Reset and Refresh for each replica in succession until the cluster state is repaired.