Upgrading Clusters

This topic describes how to upgrade Kubernetes clusters provisioned by VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) through the Tanzu Kubernetes Grid Integrated Edition Command Line Interface (TKGI CLI).

For information about how to upgrade TKGI-provisioned clusters through the Tanzu Kubernetes Grid Integrated Edition tile, see Verify Errand Configuration in one of the following topics:

For conceptual information about Tanzu Kubernetes Grid Integrated Edition upgrades, see About Tanzu Kubernetes Grid Integrated Edition Upgrades.

Overview

Upgrading a TKGI-provisioned Kubernetes cluster updates the Tanzu Kubernetes Grid Integrated Edition version and the Kubernetes version of the cluster.

Note: When upgrading TKGI to mitigate the Apache Log4j vulnerability, you must also upgrade all TKGI clusters.

TKGI-provisioned Kubernetes clusters upgrade when:

You upgrade Tanzu Kubernetes Grid Integrated Edition with the Upgrade all clusters errand enabled in the Tanzu Kubernetes Grid Integrated Edition tile > Errands.
You run tkgi upgrade-cluster or tkgi upgrade-clusters as described in Upgrade Clusters below.

For example, running tkgi upgrade-cluster upgrades the cluster you specify to your current version of Tanzu Kubernetes Grid Integrated Edition and to the version of Kubernetes that is included with your current version of Tanzu Kubernetes Grid Integrated Edition.

WARNING: Do not change the number of control plane/etcd nodes for any plan that was used to create currently-running clusters. Tanzu Kubernetes Grid Integrated Edition does not support changing the number of control plane/etcd nodes for plans with existing clusters.

Prerequisites

Before upgrading TKGI-provisioned Kubernetes clusters:

If you have not already done so, install the TKGI CLI for the current TKGI version. For information, see Installing the TKGI CLI.
Verify the cluster you are upgrading supports upgrading. For information, see Verify Your Clusters Support Upgrading in the Upgrade Preparation Checklist for Tanzu Kubernetes Grid Integrated Edition.
Verify that your Kubernetes environment is healthy. For information, see Verifying Deployment Health.
Log in to Tanzu Kubernetes Grid Integrated Edition using tkgi login. For more information, see Logging in to Tanzu Kubernetes Grid Integrated Edition.

(Optional) Customize Cluster Container Runtimes Before Upgrading

Containerd is the default container runtime for newly created clusters.

Note: All Docker container runtime clusters must be switched to use the containerd-runtime prior to upgrading to TKGI v1.15.

By default, the TKGI v1.14 upgrade-cluster errand will switch a cluster’s container runtime from Docker to containerd.

Warning: Cluster workloads will experience downtime while the cluster switches from using the Docker runtime to containerd.

VMware recommends that, before upgrading to TKGI v1.14, you either switch your clusters to the containerd container runtime or “lock” your clusters to the Docker container runtime:

Switch a Cluster to a Different Container Runtime
Lock a Cluster to the Docker Container Runtime

For more information on the containerd-runtime, see Containerd Container Runtime.
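To see which nodes a runtime switch would affect, you can list the runtime each node reports. The following is a minimal sketch, not a TKGI-provided tool: the function names `list_node_runtimes` and `docker_nodes` are hypothetical, and the sketch assumes `kubectl` already targets the cluster you want to inspect. It reads the Kubernetes node field `status.nodeInfo.containerRuntimeVersion`, which reports values that begin with `docker://` or `containerd://`.

```shell
#!/bin/sh
# Hypothetical helper: prints one "NODE-NAME<TAB>RUNTIME-VERSION" line per
# node in the current kubectl context. Only runs when you call it.
list_node_runtimes() {
    kubectl get nodes \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
}

# Hypothetical helper: reads "NODE-NAME<TAB>RUNTIME-VERSION" lines on stdin
# and prints the names of nodes whose runtime version starts with "docker://".
docker_nodes() {
    awk -F '\t' '$2 ~ /^docker:\/\// { print $1 }'
}
```

For example, `list_node_runtimes | docker_nodes` prints only the nodes that still run Docker. An empty result suggests the cluster already uses containerd.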

Switch a Cluster to a Different Container Runtime

You can switch an existing cluster from using a Docker container runtime to a containerd container runtime.

Warning: Cluster workloads will experience downtime while the cluster switches from using the Docker runtime to containerd.

Note: Switching a cluster to a different container runtime is available only in TKGI v1.13.3 and later TKGI v1.13 patches.

To switch an existing cluster to a different container runtime:

To identify which of your existing clusters use a Docker container runtime:
```
kubectl get nodes -o wide
```
Create either a JSON or YAML cluster configuration file containing the following content:
- JSON formatted configuration file:
```
{
    "runtime": "RUNTIME-NAME"
}
```
- YAML formatted configuration file:
```
---
runtime: RUNTIME-NAME
```
Where RUNTIME-NAME specifies either docker or containerd as the container runtime to switch to.
To update your cluster with your configuration settings, run the following command:
```
tkgi update-cluster CLUSTER-NAME --config-file CONFIG-FILE-NAME
```
Where:
- CLUSTER-NAME is the name of your cluster.
- CONFIG-FILE-NAME is the cluster configuration file you created above.
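Verify that your cluster now uses the container runtime you specified, for example by running kubectl get nodes -o wide again and checking the CONTAINER-RUNTIME column.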

Note: You must manage and monitor Docker and containerd runtime clusters differently. For more information, see Breaking Changes in the TKGI v1.12 Release Notes.
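The configuration file step above can be scripted. The following is a minimal sketch under stated assumptions: `write_runtime_config` is a hypothetical helper name, and the file name you pass to it is your choice. It writes the YAML form of the configuration file and rejects any runtime value other than the two named in this topic.

```shell
#!/bin/sh
# Hypothetical helper: writes a YAML cluster configuration file that sets
# the container runtime. $1 is the file to write, $2 is the runtime name.
write_runtime_config() {
    config_file=$1
    runtime=$2
    # Accept only the two runtime values this topic describes.
    case "$runtime" in
        docker|containerd) ;;
        *) echo "runtime must be docker or containerd, got: $runtime" >&2
           return 1 ;;
    esac
    printf '%s\n' '---' "runtime: $runtime" > "$config_file"
}
```

For example, `write_runtime_config runtime.yml containerd` creates a file that you can then pass to `tkgi update-cluster CLUSTER-NAME --config-file runtime.yml`.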

Lock a Cluster to the Docker Container Runtime

If you want an existing cluster to continue using the Docker container runtime after it has been upgraded to TKGI v1.14, you must lock the cluster’s container runtime before upgrading.

Warning: The default value for lock_container_runtime is false. A “locked” cluster will switch to using the containerd runtime during the next TKGI upgrade if, between locking and upgrading, you run tkgi update-cluster without including the lock_container_runtime: true parameter in your configuration.

To lock an existing cluster to its current container runtime:

To identify which of your existing clusters use a Docker container runtime:
```
kubectl get nodes -o wide
```
Create either a JSON or YAML cluster configuration file containing the following content:
- JSON formatted configuration file:
```
{
    "lock_container_runtime": true
}
```
- YAML formatted configuration file:
```
---
lock_container_runtime: true
```
To update your cluster with your configuration settings, run the following command:
```
tkgi update-cluster CLUSTER-NAME --config-file CONFIG-FILE-NAME
```
Where:
- CLUSTER-NAME is the name of your cluster.
- CONFIG-FILE-NAME is the configuration file to use to lock the container runtime.

Note: Locking a cluster container runtime is available only in TKGI v1.13.3 and later TKGI v1.13 patches.
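Because a later tkgi update-cluster run without lock_container_runtime: true removes the lock, it can help to check a configuration file before applying it to a locked cluster. The following is a minimal sketch, assuming a simple text match is good enough for your files: `config_keeps_lock` and `update_locked_cluster` are hypothetical names, and the check does not parse JSON or YAML, so it would also match a commented-out line.

```shell
#!/bin/sh
# Hypothetical pre-check: succeeds only if the given JSON or YAML cluster
# configuration file contains lock_container_runtime set to true.
config_keeps_lock() {
    grep -Eq '"?lock_container_runtime"?[[:space:]]*:[[:space:]]*true' "$1"
}

# Hypothetical wrapper: refuses to run tkgi update-cluster when the
# configuration file would drop the lock. $1 is the cluster name and
# $2 is the configuration file.
update_locked_cluster() {
    cluster=$1
    config_file=$2
    if ! config_keeps_lock "$config_file"; then
        echo "$config_file does not set lock_container_runtime: true" >&2
        return 1
    fi
    tkgi update-cluster "$cluster" --config-file "$config_file"
}
```

Calling `update_locked_cluster CLUSTER-NAME CONFIG-FILE-NAME` in place of the plain command stops with an error instead of silently unlocking the cluster.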

Upgrade Clusters

You can use the TKGI CLI to upgrade an existing cluster to the current version of TKGI.

To upgrade the TKGI version on individual or multiple clusters:

Upgrade a Single Kubernetes Cluster
Upgrade Multiple Kubernetes Clusters

To monitor or stop a cluster upgrade, follow the procedures in Manage Your Kubernetes Cluster Upgrade Job below.

Upgrade a Single Kubernetes Cluster

The Tanzu Kubernetes Grid Integrated Edition CLI provides upgrade-cluster for upgrading an individual Tanzu Kubernetes Grid Integrated Edition-provisioned Kubernetes cluster.

To upgrade an individual Kubernetes cluster:

Run the following command:
```
tkgi upgrade-cluster CLUSTER-NAME
```
Where CLUSTER-NAME is the name of the Kubernetes cluster you want to upgrade.

For more information about the tkgi upgrade-cluster command, see tkgi upgrade-cluster in the TKGI CLI documentation.

Note: The nodes in an upgrading cluster are processed serially.

To upgrade multiple clusters, see Upgrade Multiple Kubernetes Clusters below.

Upgrade Multiple Kubernetes Clusters

The Tanzu Kubernetes Grid Integrated Edition CLI provides upgrade-clusters for upgrading multiple Tanzu Kubernetes Grid Integrated Edition-provisioned Kubernetes clusters. You can upgrade clusters serially, serially with some clusters designated as canary clusters, or entirely in parallel.

To upgrade multiple Kubernetes clusters:

Run the following command:
```
tkgi upgrade-clusters --clusters CLUSTER-NAMES
```
Where CLUSTER-NAMES is a comma-delimited list of the names of the Kubernetes clusters that you want to upgrade.

For more information about the tkgi upgrade-clusters command, see tkgi upgrade-clusters in the TKGI CLI documentation.

Note: The nodes in an upgrading cluster are always processed serially.

To upgrade a single cluster, see Upgrade a Single Kubernetes Cluster above.

Upgrade Clusters in Parallel

To upgrade multiple Kubernetes clusters in parallel:

Run the following command:
```
tkgi upgrade-clusters --clusters CLUSTER-NAMES --max-in-flight CLUSTER-COUNT --wait
```
Where:
- CLUSTER-NAMES is a comma-delimited list of the names of the Kubernetes clusters you want to upgrade.
- CLUSTER-COUNT is the maximum number of clusters to upgrade in parallel within an AZ. The CLUSTER-COUNT must be less than your TKGI tile > TKGI API Service > Worker VM Max in Flight value. For example, if your TKGI tile Worker VM Max in Flight value remains the default of 4, run upgrade-clusters with a --max-in-flight argument value less than 4.

Considerations when running tkgi upgrade-clusters:

If an upgrade for a cluster in the --clusters list fails, the tkgi upgrade-clusters job continues to a subsequent cluster in the list.
Clusters are upgraded serially if --max-in-flight is not set.
If the count of names in the --clusters list is more than the --max-in-flight value, the first set of clusters are upgraded in parallel and subsequent clusters are queued. As the initial cluster upgrades complete, the remaining clusters are pulled from the queue and upgraded in parallel.
To run the cluster upgrade job as a background task, remove the --wait argument.

Note: The nodes in an upgrading cluster are always processed serially.

For example:

$ tkgi upgrade-clusters --clusters k8-cluster-000,k8-cluster-001,k8-cluster-002 --max-in-flight 2  --wait  
  
You are about to upgrade k8-cluster-000, k8-cluster-001 and k8-cluster-002.  
Warning: This operation may be long running and may block further operations on the cluster(s) until complete  
  
Continue? (y/n):y  
Your taskID for the upgrade task is: d772aba0-2670-4fba-b26c-044b19d6ab60  
Started upgrading cluster: k8-cluster-000  
Started upgrading cluster: k8-cluster-001  
Finished upgrading cluster: k8-cluster-000  
Started upgrading cluster: k8-cluster-002  
Finished upgrading cluster: k8-cluster-001  
Finished upgrading cluster: k8-cluster-002  
Upgrade task d772aba0-2670-4fba-b26c-044b19d6ab60 is done.
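When the cluster list comes from a script, the comma-delimited --clusters value and the --max-in-flight limit are easy to get wrong. The following is a minimal sketch: `join_clusters` and `parallel_upgrade_command` are hypothetical helper names, and you supply the tile's Worker VM Max in Flight value yourself because the sketch does not read it from the tile. It prints the command rather than running it, so you can review it first.

```shell
#!/bin/sh
# Hypothetical helper: joins its arguments into the comma-delimited list
# that the --clusters flag expects.
join_clusters() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Hypothetical helper: prints (does not run) a parallel upgrade command.
# $1 is the desired --max-in-flight value, $2 is the tile's
# Worker VM Max in Flight value, remaining arguments are cluster names.
parallel_upgrade_command() {
    max_in_flight=$1
    tile_limit=$2
    shift 2
    # This topic requires --max-in-flight to be less than the tile value.
    if [ "$max_in_flight" -ge "$tile_limit" ]; then
        echo "--max-in-flight must be less than $tile_limit" >&2
        return 1
    fi
    printf 'tkgi upgrade-clusters --clusters %s --max-in-flight %s --wait\n' \
        "$(join_clusters "$@")" "$max_in_flight"
}
```

For example, `parallel_upgrade_command 2 4 k8-cluster-000 k8-cluster-001 k8-cluster-002` prints the same command used in the example above.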

Upgrade Clusters With Canaries

To upgrade multiple clusters and automatically stop upgrading clusters if a cluster upgrade fails, specify your cluster list as canary clusters. You can specify one or more clusters as canary clusters.

To upgrade multiple clusters with one or more canary clusters:

Run the following command:
```
tkgi upgrade-clusters --canaries CANARY-CLUSTER-NAMES --clusters CLUSTER-NAMES --wait
```
Where:
- CANARY-CLUSTER-NAMES is a comma-delimited list of the names of the Kubernetes clusters you want to upgrade as canary clusters.
- CLUSTER-NAMES is a comma-delimited list of Kubernetes clusters to upgrade if all canary clusters successfully upgrade.
Note: The --clusters argument is required.

Considerations when running tkgi upgrade-clusters with a --canaries list:

The clusters specified in the --canaries list are upgraded prior to upgrading the clusters in your --clusters list.
If a canary cluster upgrade fails, the entire tkgi upgrade-clusters job stops.
If a --clusters list cluster upgrade fails, the tkgi upgrade-clusters job continues to a subsequent cluster in the list.
To configure tkgi upgrade-clusters to stop for any cluster upgrade failure, specify only one cluster in your --clusters list and the remaining clusters in your --canaries list.
Canary clusters are always upgraded serially. To upgrade clusters in the --clusters list in parallel, see Upgrade Clusters in Parallel above.
To run the cluster upgrade job as a background task, remove the --wait argument.

Note: The nodes in an upgrading cluster are always processed serially.

For example:

$ tkgi upgrade-clusters --canaries k8-cluster-dev,k8-cluster-000,k8-cluster-001 --clusters k8-cluster-002 --wait

You are about to upgrade k8-cluster-dev, k8-cluster-000, k8-cluster-001 and k8-cluster-002.
Warning: This operation may be long running and may block further operations on the cluster(s) until complete

Continue? (y/n):y
Your taskID for the upgrade task is: ce31a1bb-380a-453f-afa0-835ffa1ce6ac
Started upgrading cluster: k8-cluster-000
Upgrading cluster succeeded: k8-cluster-000
Started upgrading cluster: k8-cluster-001
Upgrading cluster succeeded: k8-cluster-001
Started upgrading cluster: k8-cluster-dev
Upgrading cluster failed: k8-cluster-dev
Upgrade task ce31a1bb-380a-453f-afa0-835ffa1ce6ac is done.
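The stop-on-any-failure arrangement described in the considerations above, with one cluster in the --clusters list and the rest in the --canaries list, can be generated from a plain list of names. The following is a minimal sketch: `stop_on_failure_command` is a hypothetical helper name, and it prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) an upgrade-clusters command
# that stops on any failure, by placing every cluster except the last in
# the --canaries list and the last cluster in the --clusters list.
stop_on_failure_command() {
    if [ "$#" -lt 2 ]; then
        echo "need at least two cluster names" >&2
        return 1
    fi
    canaries=
    # Consume all but the final argument into a comma-delimited list.
    while [ "$#" -gt 1 ]; do
        canaries=${canaries:+$canaries,}$1
        shift
    done
    printf 'tkgi upgrade-clusters --canaries %s --clusters %s --wait\n' \
        "$canaries" "$1"
}
```

Because canary clusters are always upgraded serially, a command built this way upgrades every cluster one at a time, in the order given.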

Manage Your Kubernetes Cluster Upgrade Job

You can use the TKGI CLI to monitor and manage your Tanzu Kubernetes Grid Integrated Edition-provisioned Kubernetes cluster upgrade jobs.

Monitor Your Clusters

To review the status of the actions being performed on your clusters, run the following command:

tkgi clusters

For example:

 $ tkgi clusters 
 Upgrade is available to TKGI Version: 1.9.0-build.1 
 TKGI Version     Name               k8s Version  Plan Name  UUID                                  Status       Action
 1.9.0-build.1    k8-cluster-000     1.18.8       small      9527ebaa-e2fa-422f-a52b-de3c3f0e39a4  succeeded    UPGRADE
 1.9.0-build.1    k8-cluster-001     1.18.8       small      9527ebaa-e2fa-422f-a52b-de3c3f0e39a4  failed       UPGRADE
 1.9.0-build.1    k8-cluster-002     1.18.8       small      9527ebaa-e2fa-422f-a52b-de3c3f0e39a4  in progress  UPGRADE
 1.9.0-build.1    k8-cluster-003     1.18.8       small      9527ebaa-e2fa-422f-a52b-de3c3f0e39a4  queued       UPGRADE
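To pick out only the clusters that need attention, you can filter this output. The following is a rough sketch, not a supported interface: `failed_clusters` is a hypothetical helper name, and it assumes the column order shown in the example (TKGI Version, Name, k8s Version, Plan Name, UUID, Status, Action) and that none of the values before the Status column contain spaces.

```shell
#!/bin/sh
# Hypothetical helper: reads `tkgi clusters` table output on stdin and
# prints the names of clusters whose Status column is "failed".
# Field 2 is the cluster name and field 6 is the status for data rows.
failed_clusters() {
    awk '$6 == "failed" { print $2 }'
}
```

For example, `tkgi clusters | failed_clusters` lists the clusters whose last action failed. Text parsing like this can break if the table layout changes between TKGI versions.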

Monitor Your Cluster Upgrade Job

To review the status of your upgrade-clusters job, run the following command:

tkgi task TASKID

Where TASKID is the ID of the task that was returned when you ran tkgi upgrade-clusters.

For example:

 $ tkgi task ce31a1bb-380a-453f-afa0-835ffa1ce6ac 
 Your upgrade task is: done 
 Name            Status     Start time                     End time                       isCanary
 k8-cluster-000  succeeded  Mon, 14 Oct 2019 12:00:00 PDT  Mon, 14 Oct 2019 12:19:54 PDT  true
 k8-cluster-001  failed     Mon, 14 Oct 2019 12:20:00 PDT  ---                            true
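If you started the upgrade as a background task, you may want to poll it until it finishes. The following is a minimal sketch: `task_state` and `wait_for_task` are hypothetical helper names, and the only state the sketch recognizes is done, as shown in the example output. Other states are not described in this topic, so the loop gives up after a fixed number of checks instead of waiting indefinitely.

```shell
#!/bin/sh
# Hypothetical helper: reads `tkgi task` output on stdin and prints the
# state reported on the "Your upgrade task is: STATE" line.
task_state() {
    sed -n 's/^[[:space:]]*Your upgrade task is: *//p' | sed 's/[[:space:]]*$//'
}

# Hypothetical polling loop: checks the task once a minute until it reports
# "done" or the attempts run out. $1 is the task ID, $2 is the optional
# maximum number of checks (default 60). Only runs when you call it.
wait_for_task() {
    task_id=$1
    attempts=${2:-60}
    while [ "$attempts" -gt 0 ]; do
        [ "$(tkgi task "$task_id" | task_state)" = "done" ] && return 0
        attempts=$((attempts - 1))
        sleep 60
    done
    return 1
}
```

A task that reports done can still contain failed cluster upgrades, as the example shows, so review the per-cluster Status column after the loop returns.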

Stop Your Cluster Upgrade Job

To cancel a running upgrade-clusters job, run the following TKGI CLI command:

tkgi cancel-task TASKID

Where TASKID is the ID of the task that was returned when you ran tkgi upgrade-clusters.

Warning: tkgi cancel-task does not cancel cluster upgrades currently in progress. This command only cancels a job’s pending cluster upgrades.
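Both tkgi task and tkgi cancel-task need the task ID that tkgi upgrade-clusters printed. If you saved that output to a file, the ID can be recovered from it. The following is a minimal sketch: `extract_task_id` is a hypothetical helper name, and the file name upgrade.log used in the note below is also an assumption.

```shell
#!/bin/sh
# Hypothetical helper: reads saved `tkgi upgrade-clusters` output on stdin
# and prints the ID from the "Your taskID for the upgrade task is: ID" line.
extract_task_id() {
    sed -n 's/.*Your taskID for the upgrade task is: *\([0-9a-fA-F-]*\).*/\1/p'
}
```

For example, `tkgi cancel-task "$(extract_task_id < upgrade.log)"` cancels the pending cluster upgrades of the job recorded in that file.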

After Upgrading Clusters

(Optional) Restore Cluster Sizing

If you scaled your cluster up for the upgrade and want to restore its original sizing, you can now scale the cluster back down to its previous configuration. However, VMware recommends that you do not scale down your clusters and instead continue to run them with the recommended configurations, which reduces the chance of a future outage.