This guide describes how to troubleshoot common problems with the RabbitMQ Cluster Kubernetes Operator.
This guide may be helpful for DIY RabbitMQ on Kubernetes deployments but such environments are not its primary focus.
Certain errors have dedicated sections in this guide.
After creating a RabbitMQ instance, it is not available within a few minutes and RabbitMQ pods are not running.
Common reasons for such failure are:

- Incorrect `imagePullSecrets` configuration. This prevents the image from being pulled from a Docker registry.
- Incorrect `storageClassName` configuration.

Potential solutions to resolve this issue:

- Run `kubectl describe pod POD-NAME` to see if there are any warnings (e.g. `0/1 nodes are available: 1 Insufficient memory.`).
- Correct the `imagePullSecrets` and `storageClassName` configurations. See imagePullSecrets, Persistence, and Update a RabbitMQ Instance.
- If deploying to a resource-constrained cluster (e.g. local environments like `kind` or `minikube`), you may need to adjust CPU and/or memory limits of the cluster. Check the resource-limits example to see how to do this.
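CPU and memory can also be overridden per instance through the `RabbitmqCluster` spec, as shown in the resource-limits example. A minimal sketch, assuming an instance named `example` (the sizes shown are illustrative, not recommendations):

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: example      # illustrative instance name
spec:
  replicas: 1
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 500m
      memory: 1Gi
```

Keeping requests equal to limits avoids scheduling the Pod onto a node that cannot actually sustain it.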
An error such as

`pods POD-NAME is forbidden: unable to validate against any pod security policy: []`

may appear as an event of the underlying `ReplicaSet` of the Kubernetes Operator deployment, or as an event of the underlying `StatefulSet` of the `RabbitmqCluster`.
This occurs if pod security policy admission control is enabled for the Kubernetes cluster, but you have not created the necessary PodSecurityPolicy
and corresponding role-based access control (RBAC) resources.
A potential solution is to create the PodSecurityPolicy and RBAC resources by following the procedure in Pod Security Policies.
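As a sketch of what those resources might look like (the names and the permissive settings below are illustrative, not a recommended policy; tighten them for your cluster, and note that `PodSecurityPolicy` was removed in Kubernetes 1.25):

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: rabbitmq-psp          # illustrative name
spec:
  privileged: false
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: rabbitmq-psp-user     # illustrative name
rules:
- apiGroups: ["policy"]
  resources: ["podsecuritypolicies"]
  resourceNames: ["rabbitmq-psp"]
  verbs: ["use"]
```

The ClusterRole must then be bound, via a RoleBinding or ClusterRoleBinding, to the service accounts used by the Operator and by the `RabbitmqCluster` Pods.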
symptom: "After deleting a RabbitmqCluster instance, some Pods are stuck in the terminating state. RabbitMQ is still running in the affected Pods."
cause: "The likely cause is a leftover quorum queue in RabbitMQ."
Potential solution to resolve this issue:
kubectl delete pod --force --grace-period=0 POD-NAME
This example uses a Pod name:
kubectl delete pod --force rabbit-rollout-restart-server-1
# warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
# pod 'rabbit-rollout-restart-server-1' force deleted
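To identify which Pods are stuck before force-deleting them, the `STATUS` column of `kubectl get pods` output can be filtered for `Terminating`. A sketch, run here against a captured sample of that output (an assumption for illustration; in practice substitute `pods="$(kubectl get pods)"`):

```shell
# Sample `kubectl get pods` output, captured for illustration only;
# replace the literal with: pods="$(kubectl get pods)"
pods='NAME                              READY   STATUS        RESTARTS   AGE
rabbit-rollout-restart-server-0   1/1     Running       0          5m
rabbit-rollout-restart-server-1   1/1     Terminating   0          5m'

# Print the names of Pods whose STATUS column reads Terminating.
printf '%s\n' "$pods" | awk '$3 == "Terminating" {print $1}'
```

The printed names can then be fed to the force-delete command above.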
To view the status of an instance, run
kubectl -n NAMESPACE get all
Where NAMESPACE
is the Kubernetes namespace of the instance.
For example:
kubectl -n rmq-instance-1 get all

# NAME                   READY   STATUS    RESTARTS   AGE
# pod/example-server-0   1/1     Running   0          2m27s
#
# NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                        AGE
# service/example-nodes   ClusterIP   None             None          4369/TCP                       2m27s
# service/example         ClusterIP   10.111.202.183   None          5672/TCP,15672/TCP,15692/TCP   2m28s
#
# NAME                              READY   AGE
# statefulset.apps/example-server   1/1     2m28s
After deploying RabbitMQ Cluster Operator, it fails during startup and its pod is restarted.
Common reasons for such failure include the Operator being unable to connect to the Kubernetes API server.

Potential solution to resolve this issue: check the Operator logs by running

kubectl -n rabbitmq-system logs -l app.kubernetes.io/name=rabbitmq-cluster-operator

If the logs contain errors such as

Failed to get API Group-Resources
Get https://ADDRESS:443/api: connect: connection refused

then the Operator cannot connect to the Kubernetes API server (the `kube-apiserver` component).
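That connection-refused signature can be checked for mechanically. A sketch that greps for it, shown here against a hypothetical log line (an assumption for illustration; in practice pipe the `kubectl logs` command above into the same `grep`):

```shell
# Hypothetical operator log line, for illustration only; in practice use:
# logs="$(kubectl -n rabbitmq-system logs -l app.kubernetes.io/name=rabbitmq-cluster-operator)"
logs='Failed to get API Group-Resources: Get https://10.96.0.1:443/api: connect: connection refused'

# Flag the API-server connectivity failure if the signature is present.
if printf '%s\n' "$logs" | grep -q 'connect: connection refused'; then
  echo 'Operator cannot reach the Kubernetes API server'
fi
```

If the signature is absent, the restarts have another cause and the full log output should be inspected.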