Troubleshooting

This topic provides information to help you troubleshoot problems you might encounter when using Postgres for Kubernetes.

Monitor deployment progress

Use watch kubectl get all to monitor the progress of the Postgres Operator deployment. The deployment is complete when the Postgres Operator pod is in the Running state. For example:

watch kubectl get all

NAME                                           READY       STATUS       RESTARTS      AGE
pod/postgres-operator-567dbc67b9-nrq5t         1/1         Running      0             57s
NAME                                           TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes                             ClusterIP   10.96.0.1    <none>        443/TCP   2d4h
NAME                                           READY       UP-TO-DATE   AVAILABLE     AGE
deployment.apps/postgres-operator              1/1         1            1             57s
NAME                                           DESIRED     CURRENT      READY         AGE
replicaset.apps/postgres-operator-567dbc67b9   1           1            1             57s
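
For scripted checks, the same condition can be tested without watching the screen. The following is a minimal sketch, assuming the operator pod name starts with postgres-operator as in the sample output. The function name operator_running is illustrative and not part of the product.

```shell
# Reads `kubectl get pods --no-headers` output on stdin.
# Succeeds if at least one postgres-operator pod reports STATUS Running.
# Assumption: the pod name starts with "postgres-operator", as in the sample output.
operator_running() {
  awk '$1 ~ /^postgres-operator/ && $3 == "Running" { found = 1 }
       END { exit !found }'
}

# Usage against a live cluster (not run here):
#   kubectl get pods --no-headers | operator_running && echo "operator is up"
```

The function only inspects text, so it can be tried against saved output before it is pointed at a cluster.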

View Postgres Operator logs

Check the logs of the operator to ensure that it is running properly.

kubectl logs -l app=postgres-operator

2019-08-05T17:24:16.182Z	INFO	controller-runtime.controller	Starting EventSource{"controller": "postgres", "source": "kind source: /, Kind="}
2019-08-05T17:24:16.182Z	INFO	setup	starting manager
2019-08-05T17:24:16.285Z	INFO	controller-runtime.controller	Starting Controller	{"controller": "postgres"}
2019-08-05T17:24:16.386Z	INFO	controller-runtime.controller	Starting workers	{"controller": "postgres", "worker count": 1}
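
When the log is long, it can help to hide the INFO lines. The following is a small sketch, assuming the log fields are tab-separated (timestamp, level, logger, message) as in the sample above. The function name non_info_lines is hypothetical.

```shell
# Reads operator log lines on stdin and prints those whose level is not INFO.
# Assumption: fields are tab-separated as timestamp, level, logger, message.
non_info_lines() {
  awk -F '\t' 'NF >= 2 && $2 != "INFO"'
}

# Usage against a live cluster (not run here):
#   kubectl logs -l app=postgres-operator | non_info_lines
```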

List all Postgres instances

Postgres instances can be created in different namespaces. To see all Postgres instances in the cluster, add the --all-namespaces option to the kubectl get command.

kubectl get postgres --all-namespaces

NAMESPACE   NAME               STATUS     AGE
default     postgres-sample    Running    19d
default     postgres-sample2   Running    15d
test        my-postgres        Pending    15d
test        my-postgres3       Pending    15d
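
To narrow the list to instances that need attention, the output can be filtered on the STATUS column. This is a minimal sketch that assumes the four-column layout shown above. The function name not_running is illustrative.

```shell
# Reads `kubectl get postgres --all-namespaces` output (with header) on stdin
# and prints "namespace/name status" for every instance that is not Running.
not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 "/" $2, $3 }'
}

# Usage against a live cluster (not run here):
#   kubectl get postgres --all-namespaces | not_running
```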

Review Message field for errors

If a Postgres instance is not running because of a misconfiguration, insufficient resources, or another error, investigate by running kubectl describe postgres <postgres-instance-name> -n <namespace>. The Message field in the Status section contains the error message. For example:

Status:
  Backups:
    Last Created:
    Last Successful:
  Binding:
    Name:         postgres-sample-app-user-db-secret
  Current State:  Created
  Db Version:     14.5
  Message:        PostgresBackupLocation.sql.tanzu.vmware.com "nonexistent-location" not found

If there are multiple errors, they appear in a single list separated by semicolons.
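
A multi-error Message value is easier to read with one error per line. The sketch below splits on semicolons. The function name split_messages is hypothetical, and the JSONPath .status.message in the usage comment is an assumption inferred from the describe output, not a documented path.

```shell
# Splits a semicolon-separated Message value into one error per line,
# trimming surrounding whitespace and dropping empty entries.
split_messages() {
  tr ';' '\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Usage against a live cluster (not run here; the field path is an assumption):
#   kubectl get postgres <postgres-instance-name> -n <namespace> \
#     -o jsonpath='{.status.message}' | split_messages
```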

Find the versions of the deployed Postgres Operator and instances

To find the currently deployed version of the Postgres operator, use the helm command:

helm ls

NAME             	NAMESPACE	REVISION	UPDATED                             	STATUS  	CHART                        APP VERSION
postgres-operator	default  	1       	2022-08-11 13:26:00.769535 -0500 CDT	deployed	postgres-operator-v2.0.0	 v2.0.0

The version is in the chart name and the APP VERSION column.

To find the version of a Postgres instance, list the pods, and then describe the instance's pod (for example, kubectl describe pod postgres-sample-0).

kubectl get pods

NAME                                 READY   STATUS    RESTARTS   AGE
postgres-sample-0                    1/1     Running   0          9s
postgres-operator-85f777b9db-wbj9b   1/1     Running   0          4m15s

Name:           postgres-sample-0
Namespace:      default
Priority:       0
Node:           minikube/192.168.64.32
Start Time:     Mon, 11 Oct 2021 14:10:38 -0500
Labels:         app=postgres
                controller-revision-hash=postgres-sample-5fc8fb8b4b
                headless-service=postgres-sample
                postgres-instance=postgres-sample
                role=read
                statefulset.kubernetes.io/pod-name=postgres-sample-0
                type=data
Annotations:    <none>
Status:         Running
IP:             172.17.0.8
Controlled By:  StatefulSet/my-postgres
Containers:
  pg-container:
    Container ID:  docker://6c651d690a6fdb6d1c0d3644ad8225037d31da1c33fd3f88f1625bdfd45cea3a
    Image:         postgres-instance:v2.0.0
    Image ID:      docker://sha256:00359ca344dd96eb05f2bd430430c97a6d46a40996c395fca44c209cb954a6e7
    Port:          5432/TCP
    Host Port:     0/TCP

The version is the tag of the image name in the pg-container entry (v2.0.0 in this example).
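
The tag can also be extracted in a script. This is a minimal sketch that assumes the image reference carries an explicit tag, as in the sample output. The helper name image_tag is illustrative.

```shell
# Prints the tag portion of an image reference such as "postgres-instance:v2.0.0".
# Assumption: the reference has an explicit tag; digests and untagged
# references behind a registry port are not handled.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Usage against a live cluster (not run here):
#   image_tag "$(kubectl get pod postgres-sample-0 \
#     -o jsonpath='{.spec.containers[?(@.name=="pg-container")].image}')"
```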

Cannot reduce instance data size after deployment

After you deploy an instance with a specific storage size in the instance YAML file, you cannot reduce the instance data storage size later. For example, create an instance with the storage size set to 100M:

kubectl create -f postgres.yaml

Verify the storage size using a command similar to:

kubectl get postgres.sql.tanzu.vmware.com/postgres-sample -o jsonpath='{.spec.storageSize}'

100M

If you later patch the instance to decrease the storage size from 100M to 2M:

kubectl patch postgres.sql.tanzu.vmware.com/postgres-sample --type merge -p '{"spec":{"storageSize": "2M"}}'

the operation returns an error similar to:

Error from server (spec.storageSize: Invalid Value: "2M" spec.storageSize cannot be reduced for an existing instance
spec.storageSize: Invalid Value: "2M" spec.storageSize needs to be at least 250M): admission webhook "vpostgres.kb.io" denied the request: spec.storageSize: Invalid Value: "2M" spec.storageSize cannot be reduced for an existing instance
spec.storageSize: Invalid Value: "2M" spec.storageSize needs to be at least 250M

To reduce the instance data size, create a new instance and migrate the source data over. Ensure that the source data fits in the reduced data size allocation of the newly created instance.
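
The two rules in the webhook error can be checked locally before patching. The sketch below is an approximation, not the operator's validation logic: it handles only whole-number quantities with a few common suffixes, and the function names to_bytes and storage_change_allowed are hypothetical. The 250M minimum is taken from the error message above.

```shell
# Converts a whole-number Kubernetes quantity (for example 100M or 1Gi) to bytes.
# Only a subset of suffixes is handled in this sketch.
to_bytes() {
  num=${1%%[!0-9]*}          # leading digits
  unit=${1#"$num"}           # whatever follows them
  [ -n "$num" ] || { echo "not a quantity: $1" >&2; return 1; }
  case $unit in
    '') mult=1 ;;
    k)  mult=1000 ;;
    M)  mult=1000000 ;;
    G)  mult=1000000000 ;;
    Ki) mult=1024 ;;
    Mi) mult=1048576 ;;
    Gi) mult=1073741824 ;;
    *)  echo "unsupported suffix: $unit" >&2; return 1 ;;
  esac
  echo $((num * mult))
}

# Succeeds only if the new size is neither smaller than the current size
# nor below the 250M minimum quoted in the webhook error.
storage_change_allowed() {   # args: current-size new-size
  cur=$(to_bytes "$1") || return 1
  new=$(to_bytes "$2") || return 1
  min=$(to_bytes 250M)
  [ "$new" -ge "$cur" ] && [ "$new" -ge "$min" ]
}
```

For example, storage_change_allowed 100M 2M fails, matching the rejected patch above, while storage_change_allowed 100M 500M succeeds.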

Address low disk space errors

When the monitor pod or a data pod runs out of disk space, you might see an error similar to:

2022-07-25 20:02:19.028 UTC [248] LOG:  could not close temporary statistics file "pg_stat_tmp/global.tmp": No space left on device
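
To confirm how full the volume is, the output of df can be reduced to a single percentage. This is a sketch under assumptions: it presumes that df is available in the container image and that the data directory is /pgsql/data, the path that appears in the startup log messages elsewhere in this topic. The function name usage_percent is illustrative.

```shell
# Reads `df -P <path>` output on stdin and prints the use percentage
# of the first listed file system as a bare number.
usage_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage against a live cluster (not run here; path and container name are assumptions):
#   kubectl exec <POD-NAME> -n <NAMESPACE-NAME> -c pg-container -- df -P /pgsql/data | usage_percent
```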

Resolve the issue either by increasing the storage size, or by restoring to a new instance with a larger size volume.

Increase storage size

You can expand the Postgres data volume. For information on how to verify that your PVs are expandable, and how to increase their size, see Expanding Storage Volume Size.

Restore to an instance with a larger volume

If your instance is backed up, you can restore it to a new instance that has a larger data volume. For restore details, see Restore to a Different Instance.

Errors during backup of two different instances on the same bucket

These errors occur when two separate Kubernetes clusters have matching instance and namespace names and back up to the same bucket. All of the following conditions must be true:

Each cluster has a matching namespace name; for example cluster 1 has a namespace called my-namespace, and cluster 2 has a namespace called my-namespace.
Each cluster has a Postgres instance with the same name, for example my-instance.
Both clusters share the same S3 bucket for backups.

During backup, the first Postgres instance creates a backup stanza using the format my-instance-my-namespace. That stanza is encrypted with a randomly generated backup cipher. During backup configuration for the second instance, the instance detects that a backup stanza with the same name already exists in the bucket. However, the second instance cannot decrypt the backup information because it uses a different cipher. The error is similar to:

ERROR: [043]: WAL segment to get required 2021-09-02 15:55:35.615 P00 INFO: archive-get command end: aborted with exception [043] command terminated with exit code 43 or FormatError: key/value found outside of section at line 1: ▒▒▒H▒t=֠O@▒Y▒.

Workaround: Use different instance names, or different namespace names, or different buckets for backups.
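
The collision follows directly from the stanza naming format. The sketch below only illustrates the rule described above. The function names stanza_name and stanzas_collide are hypothetical, and the bucket names in the usage example are placeholders.

```shell
# Builds the backup stanza name in the <instance>-<namespace> format described above.
stanza_name() {   # args: instance namespace
  printf '%s-%s\n' "$1" "$2"
}

# Two instances collide when they produce the same stanza name
# and write to the same bucket.
stanzas_collide() {   # args: instance1 namespace1 bucket1 instance2 namespace2 bucket2
  [ "$(stanza_name "$1" "$2")" = "$(stanza_name "$4" "$5")" ] && [ "$3" = "$6" ]
}

# Example:
#   stanzas_collide my-instance my-namespace bucket-a my-instance my-namespace bucket-a \
#     && echo "change the instance name, the namespace, or the bucket"
```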

Troubleshooting PostgreSQL server ConfigMap

PostgreSQL configuration settings not applied

If a setting shows a different value than the one you set in the ConfigMap, check the instance for errors by running
kubectl describe postgres <instance-name> -n <namespace-name>.

An example error could be:
```
Warning  ConfigFileError  24s  postgrescontroller  19:28:11 642 WARN  Postgres logs from "/pgsql/data/startup.log":
19:28:11 642 INFO  2022-12-07 19:28:10.875 GMT [641] LOG:  invalid value for parameter "log_timezone": "PST"
19:28:11 642 INFO  2022-12-07 19:28:10.875 GMT [641] FATAL:  configuration file "/etc/customconfig/postgresql.conf" contains errors
```
Look for ConfigFileError under Events. In this example, the field log_timezone was set to PST, which is not a valid value. Edit the ConfigMap and re-apply it, as described in Updating PostgreSQL parameters. After you fix the error, the pod restarts.

Confirm that you have provided valid field names. For details on PostgreSQL configuration parameter names refer to PostgreSQL Server parameters. A sample error could be:

Warning  ConfigFileError  3s  postgrescontroller  19:36:28 349 WARN  Postgres logs from "/pgsql/data/startup.log":
19:36:28 349 INFO  2022-12-07 19:36:28.231 GMT [331] LOG:  unrecognized configuration parameter "log_timezones" in file "/etc/customconfig/postgresql.conf" line 1
19:36:28 349 INFO  2022-12-07 19:36:28.231 GMT [331] FATAL:  configuration file "/etc/customconfig/postgresql.conf" contains errors

In this example, the field log_timezone was mistakenly entered as log_timezones, with a trailing s, which is not a valid parameter name.

Check whether the field you attempted to change is on the exception list. Certain fields have default values that cannot be overwritten, and your custom values are not applied. For details about the parameter exceptions, refer to Exceptions.
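
Both kinds of ConfigFileError shown in this section name the offending parameter in double quotes, so the names can be pulled out of the event text. This is a minimal sketch that relies only on the message wording in the two samples. The function name bad_parameters is illustrative.

```shell
# Reads event or log text on stdin and prints each parameter named in an
# "invalid value for parameter" or "unrecognized configuration parameter"
# message, one per line, without duplicates.
bad_parameters() {
  sed -n 's/.*parameter "\([^"]*\)".*/\1/p' | sort -u
}

# Usage against a live cluster (not run here):
#   kubectl describe postgres <instance-name> -n <namespace-name> | bad_parameters
```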

Postgres instance remains pending after changing PostgreSQL settings

If your instance appears to be stuck in the Pending state, or a pod has gone into CrashLoopBackOff, first check the events for errors by running:
```
kubectl describe postgres <instance-name> -n <namespace-name>
```
Look for ConfigFileError under Events.

If the events have expired, check the logs in the pg-container by running:

kubectl logs -l postgres-instance=<instance-name>,type=data -n <namespace> -c pg-container

Look for a line that begins with FATAL. The output could be similar to:

19:42:31.751 GMT [1181] FATAL:  configuration file \"/etc/customconfig/postgresql.conf\" contains errors\n"}

Review the steps in PostgreSQL configuration settings not applied to verify that you have used valid field names and values in your ConfigMap.
After you update the ConfigMap with valid field names and values, the affected pod restarts and the updated values are applied, which allows the instance to reach the Running state.
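
Finding the FATAL line in a long log can be scripted. The sketch below prints the first matching line and returns a non-zero status when there is none. The function name first_fatal is hypothetical.

```shell
# Reads log text on stdin, prints the first line containing "FATAL:",
# and exits non-zero if no such line exists.
first_fatal() {
  awk '/FATAL:/ { print; found = 1; exit } END { exit !found }'
}

# Usage against a live cluster (not run here):
#   kubectl logs -l postgres-instance=<instance-name>,type=data -n <namespace> -c pg-container | first_fatal
```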

Rebuilding indexes

There are many scenarios when indexes may need to be rebuilt. Refer to the information in PostgreSQL REINDEX.

Use the reindexdb utility to reindex all databases, a single database, a single schema, a single table, or a single index depending on your use case. A standard index rebuild will allow table reads and prevent table writes, until the action is complete.

Before rebuilding affected indexes, review the reindex documentation that matches the PostgreSQL major version of the affected Postgres instance. For example, for PostgreSQL 15, refer to reindexdb.

Procedure

Determine the primary pod name by running:

kubectl get pod -l postgres-instance=<INSTANCE-NAME>,role=read-write -n <NAMESPACE-NAME>

where:

INSTANCE-NAME is the name of the Postgres instance
NAMESPACE-NAME is the name of the namespace

Sample output:

NAME            READY   STATUS    RESTARTS   AGE
my-postgres-0   5/5     Running   0          25s

Connect to a container shell on the pod by using:

kubectl -n <NAMESPACE-NAME> exec -it <POD-NAME> -- bash

where:

POD-NAME is the name of the pod returned from the previous output
NAMESPACE-NAME is the name of the namespace

To reindex all databases, run:

reindexdb --all

The output should be similar to:

reindexdb: reindexing database "my-postgres"
reindexdb: reindexing database "postgres"
reindexdb: reindexing database "template1"
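
For the narrower scopes mentioned earlier (a single database, schema, table, or index), reindexdb takes the --dbname, --schema, --table, and --index options. The helper below is a sketch that prints the command for a given scope so it can be reviewed before it is run inside the pod. The function name reindex_cmd and the object names in the example are hypothetical, and names that need quoting are not handled.

```shell
# Prints a reindexdb command line for the requested scope.
# Scopes: all | database <db> | schema <db> <schema> | table <db> <table> | index <db> <index>
reindex_cmd() {
  case $1 in
    all)      echo "reindexdb --all" ;;
    database) echo "reindexdb --dbname=$2" ;;
    schema)   echo "reindexdb --dbname=$2 --schema=$3" ;;
    table)    echo "reindexdb --dbname=$2 --table=$3" ;;
    index)    echo "reindexdb --dbname=$2 --index=$3" ;;
    *)        echo "unknown scope: $1" >&2; return 1 ;;
  esac
}

# Example:
#   reindex_cmd table my-postgres orders
```

Check the options against the reindexdb documentation for your PostgreSQL major version before running the printed command.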