This topic provides conceptual information about upgrading VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) and TKGI-provisioned Kubernetes clusters.

For step-by-step instructions on upgrading TKGI and TKGI-provisioned Kubernetes clusters, see:



Overview

A Tanzu Kubernetes Grid Integrated Edition upgrade modifies the TKGI version, for example, upgrading TKGI from v1.16.x to v1.17.0 or from v1.17.0 to v1.17.1.

There are two ways you can upgrade TKGI:

  • Full Upgrade: By default, TKGI is set to perform a full upgrade, which upgrades both the TKGI control plane and all TKGI-provisioned Kubernetes clusters.

  • Control Plane Only Upgrade: You can choose to upgrade TKGI in two phases by upgrading the TKGI control plane first and then upgrading your TKGI-provisioned Kubernetes clusters later.


Deciding Between Full and Two-Phase Upgrade

When deciding whether to perform the default full upgrade or to upgrade the TKGI control plane and TKGI-provisioned Kubernetes clusters separately, consider your organization’s needs.

You might prefer to upgrade TKGI in two phases because of the advantages it provides:

  • If your organization runs TKGI-provisioned Kubernetes clusters in both development and production environments and you want to upgrade only one environment first, you can achieve this by upgrading the TKGI control plane and TKGI-provisioned Kubernetes clusters separately.

  • Faster Tanzu Kubernetes Grid Integrated Edition tile upgrades. If you have a large number of clusters in your TKGI deployment, performing a full upgrade can significantly increase the amount of time required to upgrade the Tanzu Kubernetes Grid Integrated Edition tile.

  • More granular control over cluster upgrades. In addition to enabling you to upgrade subsets of clusters, the TKGI CLI supports upgrading each cluster individually.

  • Not a monolithic upgrade. This helps isolate the root cause of an error when troubleshooting upgrades. For example, when a cluster-related upgrade error occurs during a full upgrade, the entire Tanzu Kubernetes Grid Integrated Edition tile upgrade might fail.

    Warning: If you deactivate the default full upgrade and upgrade only the TKGI control plane, you must upgrade all your TKGI-provisioned Kubernetes clusters before the next Tanzu Kubernetes Grid Integrated Edition tile upgrade. Deactivating the default full upgrade and upgrading only the TKGI control plane causes the TKGI version tagged in your Kubernetes clusters to fall behind the Tanzu Kubernetes Grid Integrated Edition tile version. If your TKGI-provisioned Kubernetes clusters fall more than one version behind the tile, TKGI cannot upgrade the clusters.
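For example, after a control plane only upgrade, you can confirm which clusters still need upgrading by listing them with the TKGI CLI and comparing each cluster's tagged TKGI version against the tile version. A minimal sketch, assuming a placeholder API address and credentials:

```
# Log in to the TKGI API (address and user are placeholders;
# -k skips SSL validation and is shown for illustration only).
tkgi login -a api.tkgi.example.com -u admin -k

# List all clusters. The output shows the TKGI version each cluster
# is tagged with; any cluster more than one version behind the tile
# can no longer be upgraded by TKGI.
tkgi clusters
```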


Deciding Between Tile or CLI Upgrade

You can use either the Tanzu Kubernetes Grid Integrated Edition tile or the TKGI CLI to perform TKGI upgrades:

  • To perform a full upgrade of the TKGI control plane and TKGI-provisioned Kubernetes clusters, use the Tanzu Kubernetes Grid Integrated Edition tile.
  • To upgrade the TKGI control plane only, use the Tanzu Kubernetes Grid Integrated Edition tile.
  • To upgrade TKGI-provisioned Kubernetes clusters, use either the TKGI CLI or the Tanzu Kubernetes Grid Integrated Edition tile.

| Upgrade Method | Full TKGI Upgrade | TKGI Control Plane Only | Kubernetes Clusters Only |
|----------------|-------------------|-------------------------|--------------------------|
| TKGI Tile      | ✓                 | ✓                       | ✓                        |
| TKGI CLI       |                   |                         | ✓                        |

Typically, if you choose to upgrade TKGI-provisioned Kubernetes clusters only, you will upgrade them through the TKGI CLI.
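
For example, a single cluster can be upgraded with the TKGI CLI as follows (a sketch; the cluster name is a placeholder):

```
# Upgrade one TKGI-provisioned cluster to the TKGI version of the
# control plane; the cluster's nodes are recreated during the upgrade.
tkgi upgrade-cluster my-cluster
```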



What Happens During Full TKGI and TKGI Control Plane Only Upgrades

After you add a new Tanzu Kubernetes Grid Integrated Edition tile version to your staging area on the Ops Manager Installation Dashboard, Ops Manager automatically migrates your configuration settings into the new tile version.

You can perform a full TKGI upgrade or a TKGI control plane only upgrade:

Full TKGI Upgrades

During a full TKGI upgrade, the Tanzu Kubernetes Grid Integrated Edition tile does the following:

  1. Recreates the Control Plane VMs:

    • Upgrades the TKGI control plane, recreating the TKGI API, UAA, and TKGI database VMs.
    • For more information, see What Happens During Control Plane Upgrades below.

  2. Upgrades Clusters:

    • Upgrades all of the TKGI-provisioned Kubernetes clusters.
    • Requires that the Upgrade all clusters errand check box be activated in the Errands pane on the Tanzu Kubernetes Grid Integrated Edition tile.
    • For more information, see What Happens During Cluster Upgrades below.

      Warning: If you have TKGI-provisioned Windows worker clusters, do not activate the Upgrade all clusters errand before upgrading to the TKGI v1.17 tile. You cannot use the Upgrade all clusters errand because you must manually migrate each individual Windows worker cluster to the CSI Driver for vSphere. For more information, see Configure vSphere CSI for Windows in Deploying and Managing Cloud Native Storage (CNS) on vSphere.

TKGI Control Plane Only Upgrades

During a TKGI control plane only upgrade, the Tanzu Kubernetes Grid Integrated Edition tile does the following:

  1. Recreates the Control Plane VMs:

    • Upgrades the TKGI control plane, recreating the TKGI API, UAA, and TKGI database VMs.
    • For more information, see What Happens During Control Plane Upgrades below.

  2. Does Not Upgrade Clusters:

    • Does not automatically upgrade TKGI-provisioned Kubernetes clusters after upgrading the TKGI control plane.
    • Requires that the Upgrade all clusters errand check box be deactivated in the Errands pane on the Tanzu Kubernetes Grid Integrated Edition tile.
    • The TKGI-provisioned Kubernetes clusters remain on the previous TKGI version until you manually upgrade them. For more information, see What Happens During Cluster Upgrades below, and Upgrading Clusters.
    • Some cluster management tasks are not supported for clusters that are running the previous TKGI version. For more information, see Tasks Supported Following a TKGI Control Plane Only Upgrade below.



What Happens During Control Plane Upgrades

Note the following when upgrading the TKGI control plane:

  • Upgrading the TKGI control plane includes upgrading the TKGI API server, UAA server, and the TKGI database.
  • If the TKGI installation is not scaled for high availability (beta), the control plane upgrade causes temporary outages as described in Control Plane Outages below.
  • The control plane upgrade will halt if a control plane canary instance encounters an error. For more information, see Canary Instances below.


Control Plane Outages

When the TKGI control plane is not scaled for high availability (beta), upgrading the control plane temporarily interrupts the following:

  • Logging in to the TKGI CLI and using all tkgi commands.
  • Using the TKGI API to retrieve information about clusters.
  • Using the TKGI API to create and delete clusters.
  • Using the TKGI API to resize clusters.

These outages do not affect the Kubernetes clusters themselves. During a TKGI control plane upgrade, you can still interact with clusters and their workloads using the Kubernetes Command Line Interface, kubectl.
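
For example, while the control plane upgrade runs, routine kubectl checks against a cluster continue to work (a sketch; assumes your kubeconfig already targets the cluster):

```
# The clusters' own Kubernetes API servers stay up during a TKGI
# control plane upgrade, so kubectl operations are unaffected.
kubectl get nodes
kubectl get pods --all-namespaces
```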

For more information about the TKGI control plane and high availability (beta), see TKGI Control Plane Overview in Tanzu Kubernetes Grid Integrated Edition Architecture.


Canary Instances

The Tanzu Kubernetes Grid Integrated Edition tile is a BOSH deployment.

BOSH-deployed products can set a number of canary instances to upgrade first, before the rest of the deployment VMs. BOSH continues the upgrade only if the canary instance upgrade succeeds. If the canary instance encounters an error, the upgrade stops running and other VMs are not affected.

The Tanzu Kubernetes Grid Integrated Edition tile uses one canary instance when deploying or upgrading Tanzu Kubernetes Grid Integrated Edition.
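
For illustration, canary behavior in a BOSH deployment is driven by the manifest's update block; the snippet below is a generic sketch of such a block, not the tile's actual settings:

```
update:
  canaries: 1                      # upgrade one canary instance first
  max_in_flight: 1                 # then update remaining instances one at a time
  canary_watch_time: 30000-300000  # ms BOSH waits for the canary to become healthy
  update_watch_time: 30000-300000  # ms BOSH waits for each subsequent instance
```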


Tasks Supported Following a TKGI Control Plane Only Upgrade

TKGI allows an admin to upgrade the TKGI control plane without upgrading the TKGI-provisioned Kubernetes clusters. These clusters continue running the previous TKGI version.

Although the TKGI CLI generally supports these clusters, some CLI commands are not supported on them.

The following tables summarize which TKGI CLI commands are supported on clusters running the previous TKGI version:

Note: VMware recommends you do not run TKGI CLI cluster management commands on clusters running the previous TKGI version.


TKGI CLI Utility Commands

The following summarizes the TKGI CLI utility commands that are supported for clusters running the previous TKGI version.


Supported Tasks
  • tkgi cancel-task
  • tkgi clusters
  • tkgi compute-profile
  • tkgi compute-profiles
  • tkgi create-compute-profile
  • tkgi create-kubernetes-profile
  • tkgi create-network-profile
  • tkgi delete-compute-profile
  • tkgi delete-kubernetes-profile
  • tkgi delete-network-profile
  • tkgi kubernetes-profile
  • tkgi kubernetes-profiles
  • tkgi login
  • tkgi logout
  • tkgi network-profile
  • tkgi network-profiles
  • tkgi plans
  • tkgi task
  • tkgi tasks


TKGI CLI Cluster Management Commands

The following summarizes the TKGI CLI cluster management commands that are supported for clusters running the previous TKGI version.

Note: VMware recommends you do not run TKGI CLI cluster management commands on clusters running the previous TKGI version.


Supported Tasks
  • tkgi certificates
  • tkgi cluster
  • tkgi create-cluster
  • tkgi delete-cluster
  • tkgi get-credentials
  • tkgi get-kubeconfig
  • tkgi upgrade-cluster
  • tkgi upgrade-clusters

Partially-Supported Tasks
  • tkgi update-cluster

    For details, see the lists of supported and unsupported flags below and the example at the end of this section.

Supported tkgi update-cluster Flags:
  • ‑‑num-nodes INT32 *
  • ‑‑node-pool-instances *


Unsupported tkgi update-cluster Flags:
  • ‑‑compute-profile
  • ‑‑kubelet-drain-timeout
  • ‑‑kubelet-drain-grace-period
  • ‑‑kubelet-drain-force
  • ‑‑kubelet-drain-ignore-daemonsets
  • ‑‑kubelet-drain-delete-local-data
  • ‑‑kubelet-drain-force-node
  • ‑‑kubernetes-profile
  • ‑‑network-profile
  • ‑‑tags []ClusterTag *
  • ‑‑config-file

* Clusters running the previous TKGI version and configured with ‑‑tags do not support any tkgi update-cluster operations.

Unsupported Tasks
  • tkgi rotate-certificates
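
For example, scaling the number of worker nodes remains supported on a cluster running the previous TKGI version (a sketch; the cluster name and node count are placeholders):

```
# --num-nodes is a supported tkgi update-cluster flag on clusters
# running the previous TKGI version, unless the cluster was
# configured with --tags (see the note above).
tkgi update-cluster my-cluster --num-nodes 5
```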



What Happens During Cluster Upgrades

Upgrading a TKGI-provisioned Kubernetes cluster upgrades the cluster to the TKGI version of the TKGI control plane and tags the cluster with the upgrade version.

Upgrading the cluster also upgrades the cluster’s Kubernetes version to the version included with the Tanzu Kubernetes Grid Integrated Edition tile.

During an upgrade of TKGI-provisioned clusters, TKGI recreates your clusters. This includes the following stages for each cluster you upgrade:

  1. Control Plane nodes are recreated.
  2. Worker nodes are recreated.

Depending on your cluster configuration, these recreations might cause Cluster Control Plane Nodes Outage or Worker Nodes Outage as described below.

Note: When the Upgrade all clusters errand is enabled in the Tanzu Kubernetes Grid Integrated Edition tile, updating the tile with a new Linux or Windows stemcell rolls every Linux or Windows VM in each Kubernetes cluster. This automatic rolling ensures that all your VMs are patched. To avoid workload downtime, use the resource configuration recommended in Control Plane Nodes Outage and Worker Nodes Outage below and in Maintaining Workload Uptime.

You can upgrade TKGI-provisioned Kubernetes clusters either through the Tanzu Kubernetes Grid Integrated Edition tile or the TKGI CLI. See the table below.

| This Method | Upgrades |
|-------------|----------|
| The Upgrade all clusters errand in the Tanzu Kubernetes Grid Integrated Edition tile > Errands | All clusters. Clusters are upgraded serially. |
| tkgi upgrade-cluster | One cluster. |
| tkgi upgrade-clusters | Multiple clusters. Clusters are upgraded serially or in parallel. |
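
For example, to upgrade several clusters in parallel with the TKGI CLI (a sketch; the cluster names and concurrency are placeholders):

```
# Upgrade three clusters, at most two at a time, and wait for the
# command to finish before returning control to the shell.
tkgi upgrade-clusters --clusters cluster-1,cluster-2,cluster-3 --max-in-flight 2 --wait
```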


Cluster Control Plane Nodes Outage

When TKGI upgrades a single-control plane node cluster, you cannot interact with your cluster, use kubectl, or push new workloads.

To avoid this loss of functionality, VMware recommends using multi-control plane node clusters.

Worker Nodes Outage

When TKGI upgrades a worker node, the node stops running containers. If your workloads run on a single node, they will experience downtime.

To avoid downtime for stateless workloads, VMware recommends using at least one worker node per availability zone (AZ). For stateful workloads, VMware recommends using a minimum of two worker nodes per AZ.
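
Before upgrading, you can check how a workload's replicas are distributed across worker nodes with kubectl (a sketch; the namespace is a placeholder):

```
# The NODE column shows where each pod runs; replicas that all share
# one node go down together when that node is recreated.
kubectl get pods -n my-namespace -o wide
```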



About Switching from the Flannel CNI to the Antrea CNI

Tanzu Kubernetes Grid Integrated Edition supports Antrea, Flannel, and NSX-T as Container Network Interfaces (CNIs) for TKGI-provisioned clusters.

VMware recommends the Antrea CNI over Flannel. The Antrea CNI provides Kubernetes Network Policy support in non-NSX-T environments. Antrea CNI-configured clusters are supported on AWS, Azure, and vSphere environments without NSX-T.

For more information about Antrea, see Antrea in the Antrea documentation.

Note: Support for the Flannel Container Networking Interface (CNI) is deprecated.

VMware recommends that you configure Antrea as the default TKGI-provisioned cluster CNI, and that you switch your Flannel CNI-configured clusters to the Antrea CNI.


Switch from the Flannel CNI to Antrea

You can configure TKGI to network newly created TKGI-provisioned clusters with the Antrea CNI.

Configure the TKGI default CNI during TKGI installation and upgrade only.

During TKGI installation:

  • Configure the TKGI default CNI as either Antrea or vSphere with NSX-T.

During TKGI upgrades:

  • You can optionally change the TKGI default CNI from the deprecated Flannel CNI to Antrea.
  • Do not change the CNI configuration from Antrea to Flannel.
  • Do not change the CNI configuration from or to NSX-T after your initial TKGI installation.

If you initially configured TKGI to use Flannel as the default CNI and switch to Antrea as the default CNI during a TKGI upgrade:

  • Existing Flannel-configured clusters remain networked using Flannel. Your existing Flannel clusters will not be migrated to Antrea; you can check which CNI a cluster uses as shown below.
  • New clusters created after the upgrade are created using Antrea as their CNI.
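
One quick, non-authoritative way to see whether an existing cluster is networked with Antrea is to look for the Antrea pods in the kube-system namespace (an assumption-based check; Antrea deploys antrea-agent and antrea-controller pods there):

```
# If the cluster uses the Antrea CNI, this lists antrea-agent and
# antrea-controller pods; no matches suggest a different CNI.
kubectl get pods -n kube-system | grep antrea
```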

Warning: Do not change the TKGI default CNI configuration between upgrades.


For information about selecting and configuring a CNI for TKGI, see the Networking section of the installation documentation for your environment.
