Because the NSX-OVS kernel module is not supported on the latest kernel versions, you can switch from the NSX-OVS kernel module to the upstream OVS kernel module before upgrading the kernel to the latest version. If NCP does not work with the new kernel after the upgrade, you can roll back by switching back to the NSX-OVS kernel module and downgrading the kernel.

The first procedure below describes how to switch the NSX-OVS kernel module to the upstream OVS kernel module when you upgrade the kernel. The second procedure describes how to switch back to the NSX-OVS kernel module when you downgrade the kernel.

Both procedures involve the Kubernetes concepts of taints and tolerations. For more information about these concepts, see https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration.

Switch to the upstream OVS kernel module

  1. Modify the tolerations of both daemonset.apps/nsx-ncp-bootstrap and daemonset.apps/nsx-node-agent. Change the following:
          - effect: NoExecute
            operator: Exists
    to:
          - effect: NoExecute
            key: evict-user-pods
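    For example, you can edit each DaemonSet in place. This sketch assumes the NCP resources run in the nsx-system namespace; adjust the namespace to match your deployment:
      kubectl edit daemonset nsx-ncp-bootstrap -n nsx-system
      kubectl edit daemonset nsx-node-agent -n nsx-system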
  2. Modify the nsx-node-agent configmap. Change use_nsx_ovs_kernel_module to False.
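    For example, assuming the configmap is named nsx-node-agent-config and is in the nsx-system namespace (names vary by deployment), open it with:
      kubectl edit configmap nsx-node-agent-config -n nsx-system
    and set the option in the embedded ncp.ini:
      use_nsx_ovs_kernel_module = False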
  3. Taint worker-node1 "evict-user-pods:NoExecute" to evict all user pods on this node to other nodes:
    kubectl taint nodes worker-node1 evict-user-pods:NoExecute
  4. Taint worker-node1 "evict-ncp-pods:NoExecute" to evict the nsx-node-agent and nsx-ncp-bootstrap pods on this node to other nodes:
    kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute
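    Before proceeding, you can confirm that the nsx-node-agent and nsx-ncp-bootstrap pods have been evicted from the node, for example:
      kubectl get pods --all-namespaces -o wide | grep worker-node1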
  5. Uninstall the NSX-OVS kernel module and restore the upstream OVS kernel module on worker-node1.
    1. Delete the kmod files vport-geneve.ko, vport-gre.ko, vport-lisp.ko, vport-stt.ko, vport-vxlan.ko, and openvswitch.ko in the directory /lib/modules/$(uname -r)/weak-updates/openvswitch.
    2. If the files vport-geneve.ko, vport-gre.ko, vport-lisp.ko, vport-stt.ko, vport-vxlan.ko, and openvswitch.ko exist in the directory /lib/modules/$(uname -r)/nsx/usr-ovs-kmod-backup, move them back to the directory /lib/modules/$(uname -r)/weak-updates/openvswitch.
    3. Delete the directory /lib/modules/$(uname -r)/nsx.
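    The following is a minimal shell sketch of sub-steps 1-3, assuming the default paths shown above; run it as root on worker-node1 (the final depmod -a is an added precaution to refresh module dependencies):
      # Remove the NSX-OVS kmod files from weak-updates
      rm -f /lib/modules/$(uname -r)/weak-updates/openvswitch/vport-*.ko /lib/modules/$(uname -r)/weak-updates/openvswitch/openvswitch.ko
      # Restore the backed-up upstream kmod files (skip if the backup directory is empty)
      mv /lib/modules/$(uname -r)/nsx/usr-ovs-kmod-backup/*.ko /lib/modules/$(uname -r)/weak-updates/openvswitch/
      # Remove the NSX directory and refresh module dependencies
      rm -rf /lib/modules/$(uname -r)/nsx
      depmod -a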
  6. Upgrade the kernel of worker-node1 to the latest version and reboot it.
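    For example, on a RHEL-based system (an assumption; use your distribution's equivalent commands):
      yum update kernel
      reboot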

    Note: Set SELinux to Permissive mode on worker-node1 if containerd and kubelet fail to run.
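
    For example, to switch SELinux to Permissive mode until the next reboot (set SELINUX=permissive in /etc/selinux/config to make the change persistent):
      setenforce 0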

  7. Restart kubelet.
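    For example, assuming kubelet runs as a systemd service:
      systemctl restart kubelet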
  8. Remove the taint "evict-ncp-pods:NoExecute" from worker-node1. Verify that the nsx-ncp-bootstrap and nsx-node-agent pods can start.
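    For example (the trailing hyphen removes the taint):
      kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute-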
  9. Remove the taint "evict-user-pods:NoExecute" from worker-node1. Verify that all pods on this node are running.
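    For example:
      kubectl taint nodes worker-node1 evict-user-pods:NoExecute-
      kubectl get pods --all-namespaces -o wide | grep worker-node1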
  10. Repeat steps 3-9 for the other nodes.
  11. Restore the tolerations of the nsx-ncp-bootstrap and nsx-node-agent DaemonSets that you changed in step 1.

Switch back to the NSX-OVS kernel module

  1. Modify the tolerations of both daemonset.apps/nsx-ncp-bootstrap and daemonset.apps/nsx-node-agent. Change the following:
          - effect: NoExecute
            operator: Exists
    to:
          - effect: NoExecute
            key: evict-user-pods
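    As in the first procedure, you can edit each DaemonSet in place (the nsx-system namespace is an assumption; adjust it to match your deployment):
      kubectl edit daemonset nsx-ncp-bootstrap -n nsx-system
      kubectl edit daemonset nsx-node-agent -n nsx-system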
  2. Modify the nsx-node-agent configmap. Change use_nsx_ovs_kernel_module to True.
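    As in the first procedure, open the configmap (the name and namespace below are assumptions; adjust for your deployment):
      kubectl edit configmap nsx-node-agent-config -n nsx-system
    and set the option in the embedded ncp.ini:
      use_nsx_ovs_kernel_module = True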
  3. Taint worker-node1 "evict-user-pods:NoExecute" to evict all user pods on this node to other nodes:
    kubectl taint nodes worker-node1 evict-user-pods:NoExecute
  4. Taint worker-node1 "evict-ncp-pods:NoExecute" to evict the nsx-node-agent and nsx-ncp-bootstrap pods on this node to other nodes:
    kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute
  5. Downgrade the kernel of worker-node1 to a supported version and reboot it.
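    For example, on a RHEL-based system (an assumption; use your distribution's equivalent), you can install the supported kernel version and make it the default boot entry before rebooting. The version placeholder below is hypothetical:
      yum install kernel-<supported-version>
      grubby --set-default /boot/vmlinuz-<supported-version>
      reboot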

    Note: Set SELinux to Permissive mode on worker-node1 if containerd and kubelet fail to run.
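
    For example, to switch SELinux to Permissive mode until the next reboot (as in the first procedure):
      setenforce 0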

  6. Restart kubelet.
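    For example, assuming kubelet runs as a systemd service:
      systemctl restart kubelet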
  7. Remove the taint "evict-ncp-pods:NoExecute" from worker-node1. Verify that the nsx-ncp-bootstrap and nsx-node-agent pods can start.
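    For example (the trailing hyphen removes the taint):
      kubectl taint nodes worker-node1 evict-ncp-pods:NoExecute-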
  8. Remove the taint "evict-user-pods:NoExecute" from worker-node1. Verify that all pods on this node are running.
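    For example:
      kubectl taint nodes worker-node1 evict-user-pods:NoExecute-
      kubectl get pods --all-namespaces -o wide | grep worker-node1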
  9. Repeat steps 3-8 for the other nodes.
  10. Restore the tolerations of the nsx-ncp-bootstrap and nsx-node-agent DaemonSets that you changed in step 1.