High Availability (HA) failover in PostgreSQL, supported by Tanzu-powered backend infrastructure, plays a pivotal role in database management. It ensures uninterrupted accessibility and data integrity, protecting against hardware failures, network issues, and other unforeseen events. This resilience is achieved through robust replication mechanisms and tools that orchestrate automatic transitions between primary and standby nodes within the Tanzu environment.
The following components have been utilized for the deployment of the PostgreSQL High Availability (HA) failover system:
We leveraged Tanzu Kubernetes Grid (informally known as TKGm) 2.3.0 to create a well-configured, highly available infrastructure for our PostgreSQL deployment. The Tanzu infrastructure plays an important role in optimizing the deployment and management of PostgreSQL, adding further value to the failover capabilities.
Tanzu’s centralized management capabilities simplify the administration of PostgreSQL deployments. The Tanzu portfolio provides a unified control plane for deploying, scaling, and managing applications, including PostgreSQL instances. This centralized approach streamlines operations, reducing the complexity associated with managing distributed databases.
The core of PostgreSQL’s HA strategy is a robust replication mechanism that maintains a standby node mirroring the data of the primary node. Replication can be synchronous or asynchronous, offering flexibility based on performance and reliability requirements. The synchronized standby node facilitates a smooth transition if the primary node fails, ensuring data consistency and availability.
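For context, the choice between synchronous and asynchronous replication is controlled by standard PostgreSQL settings such as the following; in this deployment they are managed by the chart and repmgr rather than edited by hand, and the values shown are illustrative only.
synchronous_commit = on                    # commit waits for the standby to confirm the write
synchronous_standby_names = 'standby1'     # set to '' for asynchronous replication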
As an open-source tool, the Replication Manager (informally known as repmgr) simplifies the administration of PostgreSQL replication and failover. By constantly monitoring the health of the primary node, repmgr triggers the promotion of a standby node to the new primary upon detecting a failure. This automated process minimizes downtime and preserves data consistency.
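The Bitnami containers generate this configuration automatically, but for context, the repmgr.conf settings that drive automatic failover look roughly like the following; node names and file paths here are illustrative, not taken from this deployment.
node_id = 1000
node_name = 'postgresql-0'                # hypothetical node name
conninfo = 'host=postgresql-0 user=repmgr dbname=repmgr connect_timeout=2'
failover = automatic                      # repmgrd promotes a standby when the primary is lost
promote_command = 'repmgr standby promote -f /opt/bitnami/repmgr/conf/repmgr.conf'
follow_command = 'repmgr standby follow -f /opt/bitnami/repmgr/conf/repmgr.conf --upstream-node-id=%n'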
In containerized environments orchestrated by Kubernetes, Helm charts emerge as a convenient solution for deploying and managing PostgreSQL instances.
Tanzu’s integration optimizes resource utilization by ensuring efficient use of underlying infrastructure. The Tanzu infrastructure, in conjunction with PostgreSQL’s HA failover mechanism, creates a symbiotic relationship where resources are allocated dynamically, contributing to cost-effectiveness and performance optimization.
In this example, we’ll walk through the installation and configuration steps for achieving High Availability failover in PostgreSQL using Helm charts, Kubernetes, and Replication Manager within the Tanzu infrastructure. We’ve set up two namespaces, primary and standby, with the goal that if the primary node goes down, the standby node seamlessly takes over.
This deployment supports the following component versions and interoperability matrix:
Add a Helm repository.
# helm repo add bitnami https://charts.bitnami.com/bitnami
"bitnami" has been added to your repositories
Create the following Kubernetes namespaces.
# kubectl create ns primary
# kubectl create ns standby
Deploy the primary node. You can download the sample values-primary.yaml
file from here.
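For reference, a minimal values-primary.yaml might contain entries like the following; the keys are standard Bitnami postgresql-ha chart parameters, so verify them against the chart version you are deploying, and replace the placeholder passwords with your own.
postgresql:
  replicaCount: 2                   # one primary plus one local replica managed by repmgr
  password: <postgres-password>
  repmgrPassword: <repmgr-password>
pgpool:
  adminPassword: <pgpool-admin-password>
persistence:
  size: 8Gi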
# helm install primary-repmgr oci://registry-1.docker.io/bitnamicharts/postgresql-ha -n primary -f values-primary.yaml
NAME: primary-repmgr
LAST DEPLOYED: Thu Jan 4 09:45:02 2024
NAMESPACE: primary
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: postgresql-ha
CHART VERSION: 12.3.7
APP VERSION: 16.1.0
Deploy the standby node. You can download the sample values-standby.yaml
file from here.
# helm install standby-repmgr oci://registry-1.docker.io/bitnamicharts/postgresql-ha -n standby -f values-standby.yaml
NAME: standby-repmgr
LAST DEPLOYED: Thu Jan 4 09:49:00 2024
NAMESPACE: standby
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: postgresql-ha
CHART VERSION: 12.3.7
APP VERSION: 16.1.0
Scale down the nodes.
# kubectl scale sts/primary-repmgr-postgresql-ha-postgresql --replicas=0 -n primary
statefulset.apps/primary-repmgr-postgresql-ha-postgresql scaled
# kubectl scale sts/standby-repmgr-postgresql-ha-postgresql --replicas=0 -n standby
statefulset.apps/standby-repmgr-postgresql-ha-postgresql scaled
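Optionally, confirm that the PostgreSQL pods have terminated before editing the StatefulSets:
# kubectl get pods -n primary
# kubectl get pods -n standby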
Configure the Replication Manager.
# kubectl edit sts/primary-repmgr-postgresql-ha-postgresql -n primary
…
- name: REPMGR_PARTNER_NODES
value: <primary-svc-name>.svc.cluster.local,<standby-svc-name>.svc.cluster.local,
- name: REPMGR_PRIMARY_HOST
value: <primary-svc-name>.svc.cluster.local
…
Note: Update the REPMGR_PARTNER_NODES and REPMGR_PRIMARY_HOST parameters to reflect the cluster’s topology. In the scenario described above, the testing involves interactions between distinct namespaces, so we used svc.cluster.local addresses for the nodes. If you are setting up PostgreSQL nodes in separate clusters, you must update the addressing information with either the Fully Qualified Domain Name (FQDN) or the IP address of each node in the respective clusters.
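As an illustration only, with the release and namespace names used in this example the entries might resolve to values like the following; the headless service names are assumptions based on the chart’s naming convention, so confirm them with kubectl get svc in each namespace.
- name: REPMGR_PARTNER_NODES
  value: primary-repmgr-postgresql-ha-postgresql-headless.primary.svc.cluster.local,standby-repmgr-postgresql-ha-postgresql-headless.standby.svc.cluster.local
- name: REPMGR_PRIMARY_HOST
  value: primary-repmgr-postgresql-ha-postgresql-headless.primary.svc.cluster.local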
# kubectl edit sts/standby-repmgr-postgresql-ha-postgresql -n standby
…
- name: REPMGR_PARTNER_NODES
value: <primary-svc-name>.svc.cluster.local,<standby-svc-name>.svc.cluster.local,
- name: REPMGR_PRIMARY_HOST
value: <primary-svc-name>.svc.cluster.local
…
Update the primary PGPool.
# kubectl edit deployments primary-repmgr-postgresql-ha-pgpool -n primary
…
- name: PGPOOL_BACKEND_NODES
value: 0:primary-repmgr-postgresql-ha-postgresql-0.<primary-svc-name>:5432,1:standby-repmgr-postgresql-ha-postgresql-0.<standby-svc-name>:5432,
…
Note: Update the PGPOOL_BACKEND_NODES parameter to recognize the new primary and standby nodes.
Update the standby PGPool.
# kubectl edit deployments standby-repmgr-postgresql-ha-pgpool -n standby
…
- name: PGPOOL_BACKEND_NODES
value: 0:primary-repmgr-postgresql-ha-postgresql-0.<primary-svc-name>:5432,1:standby-repmgr-postgresql-ha-postgresql-0.<standby-svc-name>:5432,
…
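As with the repmgr parameters, the backend list in both Deployments might look like the following once real names are filled in; the pod and service names here are hypothetical and should be confirmed with kubectl get pods,svc in each namespace.
- name: PGPOOL_BACKEND_NODES
  value: 0:primary-repmgr-postgresql-ha-postgresql-0.primary-repmgr-postgresql-ha-postgresql-headless.primary.svc.cluster.local:5432,1:standby-repmgr-postgresql-ha-postgresql-0.standby-repmgr-postgresql-ha-postgresql-headless.standby.svc.cluster.local:5432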
Configure the pg_hba.conf file by updating the ConfigMap for the standby.
# kubectl edit cm standby-repmgr-postgresql-ha-postgresql-hooks-scripts -n standby
…
data:
pg_hba.conf: |
host replication postgres 0.0.0.0/0 trust
host replication repl_user 0.0.0.0/0 trust
host replication repl_user 0.0.0.0/0 md5
host all all 0.0.0.0/0 md5
host all all ::/0 md5
local all all md5
host all all 127.0.0.1/32 md5
host all all ::1/128 md5
…
Note: Update the pg_hba.conf file to allow the necessary host connections for replication, ensuring secure communication between the nodes.
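For example, if the wide-open 0.0.0.0/0 and trust entries are too permissive for your environment, the replication rules could be scoped to the cluster’s pod network CIDR; the CIDR below is only a placeholder for your cluster’s actual range.
host replication repl_user 10.244.0.0/16 md5
host all         all       10.244.0.0/16 md5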
Initiate Failover.
To initiate the failover, gracefully scale the primary node down to zero replicas, allowing the standby node in the other namespace to take over seamlessly. You can then verify the promotion by checking the kubectl logs.
# kubectl scale sts/primary-repmgr-postgresql-ha-postgresql --replicas=0 -n primary
statefulset.apps/primary-repmgr-postgresql-ha-postgresql scaled
The standby node in the ‘standby’ namespace will take over seamlessly.
Verify Logs.
# kubectl logs standby-repmgr-postgresql-ha-postgresql-0 -n standby
…
DEBUG: begin_transaction()
DEBUG: commit_transaction()
NOTICE: STANDBY PROMOTE successful
DETAIL: server "standby-repmgr-postgresql-ha-postgresql-0" (ID: 1006) was successfully promoted to primary
DEBUG: _create_event(): event is "standby_promote" for node 1006
DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery()
DEBUG: _create_event():
INSERT INTO repmgr.events ( node_id, event, successful, details ) VALUES ($1, $2, $3, $4) RETURNING event_timestamp
DEBUG: _create_event(): Event timestamp is "2024-01-04 14:03:37.316107+00"
DEBUG: _create_event(): command is '/opt/bitnami/repmgr/events/router.sh %n %e %s "%t" "%d"'
INFO: executing notification command for event "standby_promote"
DETAIL: command is:
/opt/bitnami/repmgr/events/router.sh 1006 standby_promote 1 "2024-01-04 14:03:37.316107+00" "server \"standby-repmgr-postgresql-ha-postgresql-0\" (ID: 1006) was successfully promoted to primary"
DETAIL: parsed event notification command was:
/opt/bitnami/repmgr/events/router.sh 1006 standby_promote 1 "2024-01-04 14:03:37.316107+00" "server \"standby-repmgr-postgresql-ha-postgresql-0\" (ID: 1006) was successfully promoted to primary"
…
Check the logs to ensure that the standby node has been successfully promoted to the primary role.
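You can also confirm the new topology from inside the standby pod; the repmgr configuration path below is assumed from the Bitnami image layout, so adjust it if your image differs.
# kubectl exec -it standby-repmgr-postgresql-ha-postgresql-0 -n standby -- repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster show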
By following these steps, you’ve set up a PostgreSQL High Availability (HA) failover system using Helm charts and Kubernetes with Tanzu infrastructure. This ensures continuous database accessibility and data integrity, even in the face of unexpected failures. The automated failover capabilities provided by Replication Manager (repmgr) further enhance the reliability of your PostgreSQL deployment in containerized environments.
The PostgreSQL HA failover mechanism enables benefits that extend beyond simple failover. By integrating Tanzu infrastructure into our PostgreSQL High Availability (HA) failover solution within the dynamic realm of containerized environments, PostgreSQL’s HA mechanisms, orchestrated through Kubernetes and Helm, provide a comprehensive suite of benefits that cater to the demands of modern applications.
In essence, these benefits ensure that PostgreSQL not only withstands node failures but also emerges as a resilient, high-performance, and adaptable database solution, ready to meet the challenges of modern workloads.