Troubleshooting a postgres cluster outage in VMware Identity Manager deployed through vRealize Suite Lifecycle Manager

Problem

The VMware Identity Manager cluster health status is displayed as CRITICAL in the vRealize Suite Lifecycle Manager health notification because of a network loss on a VMware Identity Manager appliance.

Cause

A network loss occurred on the primary node of the postgres cluster.

To check the status of the cluster, run the following command on a VMware Identity Manager node:

/usr/local/bin/pcp_watchdog_info -p 9898 -h localhost -U pgpool

The command prompts for a password. If the /usr/local/etc/pgpool.pwd file is present on the node, it contains the password. Otherwise, use the default password, password.

Command parameters:

-h : The host to connect to. Use localhost, the node on which the command is run.

-p : The port on which pgpool accepts PCP connections, 9898.

-U : The user name, pgpool. This is also the user for the pgpool health check and replication delay check.
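Because a hung command is one of the symptoms of this issue, it can help to run the query with a time limit. The following is a minimal sketch, not part of the documented procedure, that wraps the same command in timeout. It assumes GNU coreutils timeout is available on the appliance; the query_watchdog function, the PCP_CMD and WATCHDOG_TIMEOUT variables, and the 60-second default are hypothetical choices for illustration.

```shell
# Sketch: run the watchdog query with a time limit, so that a hung command
# is reported instead of blocking the shell indefinitely.
# Assumptions (not from this article): GNU coreutils `timeout` is installed,
# and the 60-second default limit is an arbitrary, hypothetical value.
# PCP_CMD and WATCHDOG_TIMEOUT are hypothetical variables for illustration.
PCP_CMD="${PCP_CMD:-/usr/local/bin/pcp_watchdog_info}"

query_watchdog() {
    limit="${WATCHDOG_TIMEOUT:-60}"
    # --foreground keeps the command in the terminal's foreground process
    # group, so it can still read the password prompt from the TTY.
    timeout --foreground "$limit" "$PCP_CMD" -p 9898 -h localhost -U pgpool
    rc=$?
    if [ "$rc" -eq 124 ]; then
        # GNU timeout exits with status 124 when the time limit is reached.
        echo "pcp_watchdog_info did not return within ${limit}s" >&2
    fi
    return "$rc"
}
```

Running query_watchdog on a node prints the same response as the bare command; an exit status of 124 indicates that the command was stuck.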

The expected response is similar to the following:

3 YES <Host1>:9999 Linux <Host1> <Host1>

<Host1>:9999 Linux <Host1> <Host1> 9999 9000 4 MASTER

<Host2>:9999 Linux <Host2> <Host2> 9999 9000 7 STANDBY

<Host3>:9999 Linux <Host3> <Host3> 9999 9000 7 STANDBY

The response must list one MASTER node and two STANDBY nodes. If any node's status is SHUTDOWN or DEAD, or if the command hangs, follow the steps below to resolve the issue.
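The health rule above can be checked mechanically. The following is a minimal sketch, assuming the output layout matches the sample response: the first non-empty line is the cluster summary, and the last field of every other line is the node status. The check_watchdog_output name is hypothetical, and if your pgpool version prints different status labels, the patterns would need adjusting.

```shell
# Sketch: decide whether pcp_watchdog_info output (read from stdin) shows
# exactly one MASTER and two STANDBY nodes, with none SHUTDOWN or DEAD.
# Assumption: layout matches the sample response (summary line first,
# status name in the last field of each node line).
# check_watchdog_output is a hypothetical helper name.
check_watchdog_output() {
    awk '
        NF == 0 { next }                          # ignore blank lines
        !seen_summary { seen_summary = 1; next }  # skip the summary line
        { status[$NF]++ }                         # count nodes per status
        END {
            bad = status["SHUTDOWN"] + status["DEAD"]
            if (status["MASTER"] == 1 && status["STANDBY"] == 2 && bad == 0) {
                print "HEALTHY"
                exit 0
            }
            printf "UNHEALTHY: MASTER=%d STANDBY=%d SHUTDOWN=%d DEAD=%d\n",
                status["MASTER"], status["STANDBY"], status["SHUTDOWN"], status["DEAD"]
            exit 1
        }'
}
```

For example, /usr/local/bin/pcp_watchdog_info -p 9898 -h localhost -U pgpool | check_watchdog_output prints HEALTHY and exits with status 0 only when the response matches the expected state. This check cannot detect a hung command, because the pipeline simply waits in that case.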

Solution

  1. Gracefully stop the services on the VMware Identity Manager nodes. For the required steps, see KB 78815.
  2. Power OFF the VMware Identity Manager appliances in vCenter.
  3. Power ON the VMware Identity Manager nodes through vRealize Suite Lifecycle Manager.