Multi-Attach error for RWO (Block) volume when Node VM is shut down before Pods are evicted and Volumes are detached from Node VM.
Note: This issue is present in all Kubernetes releases.
Impact: After the Node is shut down, the Pod that was running on that Node does not come up on a new Node. The events on the Pod show a FailedAttachVolume warning. Error Message: Multi-Attach error for volume "pvc-uuid" Volume is already exclusively attached to one node and can't be attached to another.
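To confirm that a Pod is affected, the warning can be looked up in the cluster events. The following is a minimal sketch, assuming kubectl is configured for the affected cluster; the function name find_multi_attach_events is hypothetical and only used for illustration.

```shell
# Sketch only: print FailedAttachVolume warning events that carry the
# Multi-Attach message, across all namespaces.
# find_multi_attach_events is a hypothetical helper name, not part of kubectl.
find_multi_attach_events() {
  kubectl get events --all-namespaces \
    --field-selector reason=FailedAttachVolume,type=Warning \
    | grep -F 'Multi-Attach error'
}
```

Calling find_multi_attach_events prints one line per matching event, including the Pod the event refers to. No output means that no such event is currently recorded.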
Upstream Issue: Kubernetes is being enhanced to fix this issue. For more information, see the Kubernetes Enhancement Proposals (KEP) PR - kubernetes/enhancements#1116.
Workaround
Pods stuck in this state can be recovered with the following steps.
- Find the Node VM in the vCenter Inventory. Make sure the correct VM associated with the Node is used for further instructions.
- Detach all the Persistent Volume disks attached to this Node VM.
  Note: Do not detach the primary disks used by the guest OS.
  - Right-click the Node VM in the inventory and select Edit Settings.
  - Under Virtual Hardware, find all the Hard Disks that back Persistent Volumes and remove them.
    Note: Do not select Delete files from datastore.
  - Click OK to reconfigure the VM. This detaches all the Persistent Volume disks from the shut-down (powered-off) Node VM.
- Run kubectl get volumeattachments and find all volumeattachment objects associated with the shut-down Node VM.
- Edit each volumeattachment object with kubectl edit volumeattachments <volumeattachments-object-name> and remove the finalizers.
- Check if the volumeattachment object is deleted by Kubernetes. If the object remains on the system, you can safely delete it with kubectl delete volumeattachments <volumeattachments-object-name>.
- Wait for some time for the Pod to come up on a new Node.
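The kubectl part of the workaround can be scripted when many volumeattachment objects are involved. The sketch below is one possible approach, not the documented procedure: it assumes the disks have already been detached from the Node VM in vCenter, and it swaps the interactive kubectl edit for a non-interactive kubectl patch that clears the finalizers. The function names are hypothetical, and the node name passed in is whatever kubectl get nodes reports for the shut-down Node.

```shell
# Sketch only: clean up volumeattachment objects left behind by a shut-down Node.
# Assumes the Persistent Volume disks were already detached from the Node VM.
# All function names below are hypothetical helpers, not part of kubectl.

# Read "name nodeName" pairs from stdin and print the names whose node matches $1.
attachments_for_node() {
  awk -v node="$1" '$2 == node { print $1 }'
}

# Print every volumeattachment as "name nodeName", one per line.
list_attachments() {
  kubectl get volumeattachments \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --no-headers
}

# Clear the finalizers (instead of kubectl edit), then delete the object
# if Kubernetes has not already removed it.
# stdin is redirected so the calls cannot consume the caller's input.
cleanup_attachment() {
  kubectl patch volumeattachment "$1" --type=merge \
    -p '{"metadata":{"finalizers":null}}' </dev/null
  kubectl delete volumeattachment "$1" --ignore-not-found </dev/null
}

# Clean up every volumeattachment that references the Node named $1.
cleanup_node() {
  list_attachments | attachments_for_node "$1" | while read -r name; do
    cleanup_attachment "$name"
  done
}

# Run only when a node name is given as the first script argument.
if [ "$#" -gt 0 ]; then
  cleanup_node "$1"
fi
```

Saved as a script, this would be invoked with the node name as its only argument. Running list_attachments on its own first is a reasonable way to review which objects would be touched before removing anything.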