vSphere with Tanzu supports rolling updates for Supervisor Clusters and Tanzu Kubernetes clusters, and for the infrastructure supporting these clusters.

How vSphere with Tanzu Clusters Are Updated

vSphere with Tanzu uses a rolling update model for Supervisor Clusters and Tanzu Kubernetes clusters. The rolling update model ensures that there is minimal downtime for cluster workloads during the update process. Rolling updates include upgrading the Kubernetes software versions and the infrastructure and services supporting the Kubernetes clusters, such as virtual machine configurations and resources, vSphere services and namespaces, and custom resources.

For the update to succeed, your configuration must meet several compatibility requirements. The system enforces pre-check conditions to ensure that clusters are ready for updates, and supports rollback if a cluster upgrade is not successful.
Note: A vSphere with Tanzu update involves more than just an upgrade of the Kubernetes software version. We use the term "update" to describe this process instead of the term "upgrade," which is a limited form of update that increments the software version.

Dependency Between Supervisor Cluster Updates and Tanzu Kubernetes Cluster Updates

In most cases, you can update a Supervisor Cluster independently from updating Tanzu Kubernetes clusters. If you want to update only select Supervisor Clusters, or only select Tanzu Kubernetes clusters, you can.

However, currently there is a dependency between the two types of cluster updates. For the update path, see Supported Update Path.

About Supervisor Cluster Updates

When you initiate an update for a Supervisor Cluster, the system creates a new control plane node and joins it to the existing control plane. During this phase of the update, the vSphere inventory shows four control plane nodes, because the system adds a new updated node before it removes the older out-of-date node. Objects are migrated from one of the old control plane nodes to the new one, and the old control plane node is removed. This process repeats one node at a time until all control plane nodes are updated. Once the control plane is updated, the worker nodes are updated in a similar rolling fashion. The worker nodes are the ESXi hosts, and the spherelet process on each ESXi host is updated one by one.
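The control plane replacement described above can be modeled as a small simulation. This is a minimal sketch, not the actual Supervisor Cluster implementation: the function name, the node names, and the tuple representation are hypothetical, and it assumes a three-node control plane, which is consistent with the inventory showing four nodes during the update.

```python
def rolling_control_plane_update(nodes, target_version):
    """Replace each out-of-date control plane node, one at a time.

    Each node is a (name, version) tuple. Returns the final node list and
    the highest node count seen while the update was in progress.
    """
    nodes = list(nodes)  # work on a copy so the caller's list is untouched
    peak = len(nodes)
    outdated = [n for n in nodes if n[1] != target_version]
    for index, old in enumerate(outdated, start=1):
        new = (f"cp-new-{index}", target_version)  # hypothetical node name
        nodes.append(new)             # new node joins the existing control plane
        peak = max(peak, len(nodes))  # inventory briefly shows one extra node
        nodes.remove(old)             # objects are migrated, then the old node is removed
    return nodes, peak


control_plane = [("cp-1", "1.16.7"), ("cp-2", "1.16.7"), ("cp-3", "1.16.7")]
updated, peak = rolling_control_plane_update(control_plane, "1.17.4")
print(updated)
print("peak node count:", peak)
```

Because a new node is always added before an old one is removed, the control plane never drops below its original size during the update.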

You can choose between two types of Supervisor Cluster updates: a vSphere Namespaces update, or an update of everything, including VMware versions and Kubernetes versions.

You use the vSphere Namespaces update workflow to update the Kubernetes version that the Supervisor Cluster is running, such as from Kubernetes 1.16.7 to Kubernetes 1.17.4, and the infrastructure supporting Kubernetes clusters. This type of update is more frequent and is used to keep pace with the Kubernetes release cadence. This is the vSphere Namespaces update sequence:
  1. Upgrade vCenter Server.
  2. Perform a Supervisor Namespaces update (including Kubernetes upgrade).

To perform a vSphere Namespaces update, see Update the Supervisor Cluster by Performing a vSphere Namespaces Update.

You use the update everything workflow to update all vSphere with Tanzu components. This type of update is required when you are updating major releases, for example, from NSX-T 3.0 to 3.X and from vSphere 7.0 to 7.X. This update workflow is infrequent, because it depends on when new VMware product releases become available. This is the update everything sequence:
  1. Upgrade NSX-T Data Center.
  2. Upgrade vCenter Server.
  3. Upgrade ESXi hosts.
  4. Perform a Supervisor Namespaces update (including Kubernetes upgrade).
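
The two update sequences are strictly ordered, which can be captured in a small lookup. The following is an illustrative sketch only: the dictionary keys and the `next_step` helper are hypothetical names, and the step strings are copied from the lists above. It does not call any VMware API.

```python
UPDATE_SEQUENCES = {
    "vsphere_namespaces": [
        "Upgrade vCenter Server",
        "Perform a Supervisor Namespaces update (including Kubernetes upgrade)",
    ],
    "update_everything": [
        "Upgrade NSX-T Data Center",
        "Upgrade vCenter Server",
        "Upgrade ESXi hosts",
        "Perform a Supervisor Namespaces update (including Kubernetes upgrade)",
    ],
}


def next_step(workflow, completed):
    """Return the next step to perform, or None when the workflow is finished.

    Raises ValueError if the completed steps do not match the documented order.
    """
    steps = UPDATE_SEQUENCES[workflow]
    if list(completed) != steps[:len(completed)]:
        raise ValueError("steps were completed out of order")
    return steps[len(completed)] if len(completed) < len(steps) else None


print(next_step("update_everything", ["Upgrade NSX-T Data Center"]))
```

In both workflows the Supervisor Namespaces update comes last, after the components it depends on have been upgraded.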

About Tanzu Kubernetes Cluster Updates

When you update a Supervisor Cluster, the infrastructure components supporting the Tanzu Kubernetes clusters deployed to that Supervisor Cluster, such as the Tanzu Kubernetes Grid Service, are likewise updated. Each infrastructure update can include updates for services supporting the Tanzu Kubernetes Grid Service (CNI, CSI, CPI), and updated configuration settings for the control plane and worker nodes that can be applied to existing Tanzu Kubernetes clusters. To ensure that your configuration meets compatibility requirements, vSphere with Tanzu performs pre-checks during a rolling update and enforces compliance.

To perform a rolling update of a Tanzu Kubernetes cluster, you typically update the cluster manifest. See Update Tanzu Kubernetes Clusters. Note, however, that when a vSphere Namespaces update is performed, the system immediately propagates updated configurations to all Tanzu Kubernetes clusters. These updates can automatically trigger a rolling update of the Tanzu Kubernetes control plane and worker nodes.

The rolling update process for replacing the cluster nodes is similar to the rolling update of pods in a Kubernetes Deployment. Two distinct controllers are responsible for performing a rolling update of Tanzu Kubernetes clusters: the Add-ons Controller and the TanzuKubernetesCluster controller. Within those two controllers there are three key stages to a rolling update: updating add-ons, updating the control plane, and updating the worker nodes. These stages occur in order, with pre-checks that prevent a stage from beginning until the preceding stage has sufficiently progressed. A stage might be skipped if it is determined to be unnecessary. For example, an update might only affect worker nodes and therefore not require any add-on or control plane updates.
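
The ordering and skipping behavior of the three stages can be sketched as follows. This is a simplified model under stated assumptions, not the controllers' actual logic: the `run_rolling_update` function and the `stage_ok` callback are hypothetical, and the callback stands in for whatever pre-check decides that a stage has progressed far enough for the next one to begin.

```python
STAGES = ["add-ons", "control plane", "worker nodes"]  # fixed order


def run_rolling_update(changed, stage_ok):
    """Run the stages that have changes, in order, and return those completed.

    changed:  set of stage names affected by this update
    stage_ok: callable(stage) -> bool, True when the stage progressed sufficiently
    """
    done = []
    for stage in STAGES:
        if stage not in changed:
            continue  # stage is unnecessary for this update, so it is skipped
        if not stage_ok(stage):
            break     # later stages must not begin until this one progresses
        done.append(stage)
    return done


# An update that only affects worker nodes skips the first two stages.
print(run_rolling_update({"worker nodes"}, lambda stage: True))
```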

During the update process, the system adds a new cluster node, and waits for the node to come online with the target Kubernetes version. The system then marks the old node for deletion, moves to the next node, and repeats the process. The old node is not deleted until all pods are removed. For example, if a pod is defined with PodDisruptionBudgets that prevent a node from being fully drained, the node is cordoned off but is not removed until those pods can be evicted. The system upgrades all control plane nodes first, then worker nodes. During an update, the Tanzu Kubernetes cluster status changes to "updating". After the rolling update process completes, the Tanzu Kubernetes cluster status changes to "running".
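
The effect of a blocked drain on node removal can be illustrated with a short model. This is a sketch under simplifying assumptions, not the system's implementation: the `Node` class, the `replace_node` function, and the `can_evict` callback are hypothetical, the new node is assumed to come online immediately, and evicted pods are simply moved to the new node, whereas in a real cluster they would be recreated by their controllers and placed by the scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    version: str
    pods: list = field(default_factory=list)  # names of pods on this node
    cordoned: bool = False


def replace_node(cluster, old, target_version, can_evict):
    """Add a replacement for `old`, drain it, and remove it if fully drained.

    can_evict: callable(pod_name) -> bool, False when for example a
    PodDisruptionBudget currently prevents eviction of that pod.
    Returns True if the old node was removed, False if it remains cordoned.
    """
    new = Node(f"{old.name}-new", target_version)
    cluster.append(new)   # new node joins with the target Kubernetes version
    old.cordoned = True   # old node is marked for deletion and gets no new pods
    evictable = [p for p in old.pods if can_evict(p)]
    new.pods.extend(evictable)  # simplification: evicted pods land on the new node
    old.pods = [p for p in old.pods if not can_evict(p)]
    if old.pods:
        return False      # pods remain, so the node is cordoned but not removed
    cluster.remove(old)
    return True


worker = Node("worker-1", "1.16.7", pods=["web", "db"])
cluster = [worker]
removed = replace_node(cluster, worker, "1.17.4", can_evict=lambda pod: pod != "db")
print(removed, [n.name for n in cluster], worker.cordoned)
```

In this run the `db` pod cannot be evicted, so `worker-1` stays in the cluster in a cordoned state until the eviction becomes possible.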

Pods running on a Tanzu Kubernetes cluster that are not governed by a replication controller are deleted during a Kubernetes version upgrade as part of the worker node drain during the Tanzu Kubernetes cluster update. This is true whether the cluster update is triggered manually or automatically by a vSphere Namespaces update. Pods not governed by a replication controller include pods that are not created as part of a Deployment or ReplicaSet spec. Refer to the topic Pod Lifecycle: Pod lifetime in the Kubernetes documentation for more information.
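
Before an update, it can be useful to find the pods that would not be recreated after the node drain. The following is a minimal sketch, assuming pod objects are available as dictionaries in the shape of the `items` list from `kubectl get pods -o json`. The `unmanaged_pods` helper is a hypothetical name. It relies on the Kubernetes convention that a pod created by a controller carries an entry in `metadata.ownerReferences` with `controller` set to true.

```python
def unmanaged_pods(pods):
    """Return the names of pods that have no controlling owner reference."""
    names = []
    for pod in pods:
        metadata = pod.get("metadata", {})
        owners = metadata.get("ownerReferences") or []
        # A pod is considered managed if any owner is flagged as its controller.
        if not any(owner.get("controller") for owner in owners):
            names.append(metadata["name"])
    return names


pods = [
    {"metadata": {"name": "web-7d4b9",
                  "ownerReferences": [{"kind": "ReplicaSet", "controller": True}]}},
    {"metadata": {"name": "debug-shell"}},  # created directly, no controller
]
print(unmanaged_pods(pods))
```

Pods reported by such a check are candidates for being wrapped in a Deployment or ReplicaSet before the cluster update is triggered.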