Because NSX-OVS is not supported on the latest kernel versions, you can switch from the NSX-OVS kernel module to the upstream OVS kernel module before upgrading the kernel to the latest version. If NCP does not work with the new kernel after the upgrade, you can roll back (switch back to NSX-OVS and downgrade the kernel).
The first procedure below describes how to switch from the NSX-OVS kernel module to the upstream OVS kernel module when you upgrade the kernel. The second procedure describes how to switch back to the NSX-OVS kernel module when you downgrade the kernel.
Both procedures involve the Kubernetes concepts of taints and tolerations. For more information about these concepts, see https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration.
Switch to the upstream OVS kernel module
1. Modify the tolerations of both daemonset.apps/nsx-ncp-bootstrap and daemonset.apps/nsx-node-agent. Change the following:
   - effect: NoExecute
     operator: Exists
   to:
   - effect: NoExecute
     key: evict-user-pods
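You can make these edits with kubectl, for example (the nsx-system namespace is an assumption; adjust it to your deployment):
kubectl edit daemonset nsx-ncp-bootstrap -n nsx-system
kubectl edit daemonset nsx-node-agent -n nsx-system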
2. Modify the nsx-node-agent configmap. Change use_nsx_ovs_kernel_module to False.
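For example (the ConfigMap name nsx-node-agent-config is an assumption; check the actual name in your deployment):
kubectl edit configmap nsx-node-agent-config -n nsx-system
# set: use_nsx_ovs_kernel_module = False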
3. Taint worker-node1 "evict-user-pods:NoExecute" to evict all user pods on this node to other nodes:
kubectl taint nodes worker-node1 evict-user-pods:NoExecute
4. Taint worker-node1 "evict-ncp-pods:NoExecute" to evict the nsx-node-agent and nsx-ncp-bootstrap pods from this node:
kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute
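To confirm that the pods were evicted, you can list the pods still scheduled on the node:
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=worker-node1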
5. Uninstall the NSX-OVS kernel module and restore the upstream OVS kernel module on worker-node1.
   - Delete the kmod files vport-geneve.ko, vport-gre.ko, vport-lisp.ko, vport-stt.ko, vport-vxlan.ko, and openvswitch.ko from the directory /lib/modules/$(uname -r)/weak-updates/openvswitch.
   - If vport-geneve.ko, vport-gre.ko, vport-lisp.ko, vport-stt.ko, vport-vxlan.ko, or openvswitch.ko files exist in the directory /lib/modules/$(uname -r)/nsx/usr-ovs-kmod-backup, move them to the directory /lib/modules/$(uname -r)/weak-updates/openvswitch.
   - Delete the directory /lib/modules/$(uname -r)/nsx.
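These file operations can be scripted. A minimal sketch, run as root on worker-node1 and assuming the default paths above:
cd /lib/modules/$(uname -r)
# remove the NSX-OVS kmod files
rm -f weak-updates/openvswitch/{vport-geneve,vport-gre,vport-lisp,vport-stt,vport-vxlan,openvswitch}.ko
# restore the backed-up upstream modules, if any
[ -d nsx/usr-ovs-kmod-backup ] && mv nsx/usr-ovs-kmod-backup/*.ko weak-updates/openvswitch/
rm -rf nsx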
6. Upgrade the kernel of worker-node1 to the latest version and reboot it.
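For example, on a RHEL-style system (the package manager and commands depend on your distribution):
yum update kernel
reboot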
Note: Set SELinux to Permissive mode on worker-node1 if containerd and kubelet fail to start.
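For example:
setenforce 0
To keep the setting across reboots, set SELINUX=permissive in /etc/selinux/config.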
7. Restart kubelet.
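Assuming kubelet runs as a systemd service:
systemctl restart kubelet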
8. Remove the taint "evict-ncp-pods:NoExecute" from worker-node1. Verify that the nsx-ncp-bootstrap and nsx-node-agent pods can start.
9. Remove the taint "evict-user-pods:NoExecute" from worker-node1. Verify that all pods on this node are running.
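Taints are removed with kubectl's trailing "-" syntax:
kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute-
kubectl taint nodes worker-node1 evict-user-pods:NoExecute-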
10. Repeat steps 3-9 for the other nodes.
11. Restore the tolerations of the nsx-ncp-bootstrap and nsx-node-agent DaemonSets that you changed in step 1.
Switch back to the NSX-OVS kernel module
1. Modify the tolerations of both daemonset.apps/nsx-ncp-bootstrap and daemonset.apps/nsx-node-agent. Change the following:
   - effect: NoExecute
     operator: Exists
   to:
   - effect: NoExecute
     key: evict-user-pods
2. Modify the nsx-node-agent configmap. Change use_nsx_ovs_kernel_module to True.
3. Taint worker-node1 "evict-user-pods:NoExecute" to evict all user pods on this node to other nodes:
kubectl taint nodes worker-node1 evict-user-pods:NoExecute
4. Taint worker-node1 "evict-ncp-pods:NoExecute" to evict the nsx-node-agent and nsx-ncp-bootstrap pods from this node:
kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute
5. Downgrade the kernel of worker-node1 to a supported version and reboot it.
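For example, on a RHEL-style system you can boot an already-installed supported kernel by making it the default boot entry (the exact mechanism depends on your distribution and boot loader):
grubby --set-default /boot/vmlinuz-<supported-version>
reboot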
Note: Set SELinux to Permissive mode on worker-node1 if containerd and kubelet fail to start.
6. Restart kubelet.
7. Remove the taint "evict-ncp-pods:NoExecute" from worker-node1. Verify that the nsx-ncp-bootstrap and nsx-node-agent pods can start.
8. Remove the taint "evict-user-pods:NoExecute" from worker-node1. Verify that all pods on this node are running.
9. Repeat steps 3-8 for the other nodes.
10. Restore the tolerations of the nsx-ncp-bootstrap and nsx-node-agent DaemonSets that you changed in step 1.