After the system experiences a very high rate of events in a short period of time, some of the pods in the TKG Cluster on Supervisor or upstream Kubernetes cluster are stuck in the Terminating status.
Problem
After the system has recovered from a period with a very high rate of events, the NSX Application Platform is in a Degraded status. In addition, some pods in the TKG Cluster on Supervisor or upstream Kubernetes cluster remain in the Terminating status for a few minutes or longer.
The UI displays that the …
Cause
Because of Kubernetes infrastructure issues, some pods cannot be deleted correctly for one of the following reasons:
- A finalizer associated with the stuck pod cannot complete.
- The stuck pod does not respond to termination signals.
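To see which of the two causes is more likely for a particular pod, you can look at its deletion timestamp and finalizers. The following is a minimal sketch, assuming bash and a kubectl context that already points at the affected cluster; `inspect_stuck_pod` is a hypothetical helper name, not part of any NSX tooling.

```shell
#!/usr/bin/env bash
# Sketch: show why a given pod might be stuck in Terminating.
# Assumes kubectl is on PATH and configured for the affected cluster.
# inspect_stuck_pod is a hypothetical helper name.
#
# Prints the pod's deletionTimestamp and its finalizers, tab-separated.
# A deletionTimestamp that is set while finalizers are still listed is
# consistent with the first cause (a finalizer that has not completed).
# An empty finalizer list is more consistent with the second cause
# (the pod not reacting to termination signals).
inspect_stuck_pod() {
  local pod="$1" ns="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\t"}{.metadata.finalizers}{"\n"}'
}
```

For example, run `inspect_stuck_pod <pod-name> <namespace>` for a pod found in the Terminating status.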
Solution
Ask your infrastructure administrator to use the following steps to manually delete the pods that are stuck in the Terminating status.
- Log in to the control node for your TKG Cluster on Supervisor or upstream Kubernetes cluster.
- Use the following command to find all of the pods that are in the Terminating status.
kubectl get pod -A | grep Terminating
- Force delete each pod that is in the Terminating status by using the following command.
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0
- Run the following command again and verify that the stuck pods have been deleted. If necessary, repeat the force-delete step for any pods that remain in the Terminating status.
kubectl get pod -A | grep Terminating
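The find, force-delete, and verify steps can also be scripted. The following is a minimal sketch, assuming bash and a kubectl context that already points at the affected cluster; the function names `list_terminating`, `force_delete_terminating`, and `count_terminating` are hypothetical. It relies on the default column order of `kubectl get pods -A` (NAMESPACE, NAME, READY, STATUS, ...) and matches the STATUS column exactly instead of grepping the whole line, so a pod whose name happens to contain "Terminating" is not picked up by mistake.

```shell
#!/usr/bin/env bash
# Sketch of the manual steps above as reusable functions.
# Assumes kubectl is on PATH and its current context points at the affected
# cluster. The function names are hypothetical, not part of any NSX tooling.

# Read `kubectl get pods -A --no-headers` output on stdin and print
# "<namespace> <pod-name>" for every pod whose STATUS column is Terminating.
# Column order: NAMESPACE NAME READY STATUS RESTARTS AGE
list_terminating() {
  awk '$4 == "Terminating" { print $1, $2 }'
}

# Force-delete each pod reported as Terminating, one kubectl call per pod.
force_delete_terminating() {
  kubectl get pods -A --no-headers \
    | list_terminating \
    | while read -r ns pod; do
        echo "Force deleting ${ns}/${pod}"
        kubectl delete pod "$pod" -n "$ns" --force --grace-period=0
      done
}

# Verification step: print how many pods are still Terminating.
count_terminating() {
  kubectl get pods -A --no-headers | list_terminating | awk 'END { print NR }'
}
```

For example, source the file, run `force_delete_terminating`, and then run `count_terminating`. If it prints a value other than 0, run `force_delete_terminating` again for the remaining pods. The file only defines functions, so sourcing it does not delete anything by itself.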