Upgrading Clusters

This topic describes how to use the VMware Tanzu Kubernetes Grid Integrated Edition Command Line Interface (TKGI CLI) to upgrade TKGI-provisioned Kubernetes clusters.

For information about how to upgrade TKGI-provisioned clusters through the Tanzu Kubernetes Grid Integrated Edition tile, see Verify Errand Configuration in one of the following topics:

For conceptual information about Tanzu Kubernetes Grid Integrated Edition upgrades, see About Tanzu Kubernetes Grid Integrated Edition Upgrades.

Overview

Upgrading a TKGI-provisioned Kubernetes cluster updates the Tanzu Kubernetes Grid Integrated Edition version and the Kubernetes version of the cluster.

TKGI-provisioned Kubernetes clusters upgrade when:

You upgrade Tanzu Kubernetes Grid Integrated Edition with the Upgrade all clusters errand enabled in the Tanzu Kubernetes Grid Integrated Edition tile > Errands.
You run tkgi upgrade-cluster or tkgi upgrade-clusters as described in Upgrade Clusters below.

For example, running tkgi upgrade-cluster upgrades the cluster you specify to your current version of Tanzu Kubernetes Grid Integrated Edition and to the version of Kubernetes that is included with your current version of Tanzu Kubernetes Grid Integrated Edition.

WARNING: Do not change the number of control plane/etcd nodes for any plan that was used to create currently-running clusters. Tanzu Kubernetes Grid Integrated Edition does not support changing the number of control plane/etcd nodes for plans with existing clusters.

Prerequisites

Before upgrading TKGI-provisioned Kubernetes clusters:

If you have not already done so, install the TKGI CLI for the current TKGI version. For information, see Installing the TKGI CLI.
Verify the cluster you are upgrading supports upgrading. For information, see Verify Your Clusters Support Upgrading in the Upgrade Preparation Checklist for Tanzu Kubernetes Grid Integrated Edition.
Verify that your Kubernetes environment is healthy. For information, see Verifying Deployment Health.
If you are upgrading a cluster that uses a public cloud CSI driver, see Limitations on Using a Public Cloud CSI Driver in Release Notes for additional requirements.
Log in to Tanzu Kubernetes Grid Integrated Edition using tkgi login. For more information, see Logging in to Tanzu Kubernetes Grid Integrated Edition.

Upgrade Clusters

You can upgrade a cluster’s TKGI version to the TKGI version currently running on the TKGI control plane.

To upgrade a cluster’s TKGI version:

Use the TKGI CLI to upgrade the TKGI version on individual or multiple clusters:
- Upgrade a Single Cluster
- Upgrade Multiple Clusters
To monitor or stop a cluster upgrade, follow the procedures in Manage Your Cluster Upgrade Job below.
Complete the steps in After Upgrading Clusters below.

Upgrade a Single Cluster

The Tanzu Kubernetes Grid Integrated Edition CLI provides upgrade-cluster for upgrading an individual Tanzu Kubernetes Grid Integrated Edition-provisioned Kubernetes cluster.

To upgrade an individual Kubernetes cluster:

Run the following command:
```
tkgi upgrade-cluster CLUSTER-NAME --pre-check --nodes-parallel PARALLEL-COUNT
```
Where:
- CLUSTER-NAME is the name of the Kubernetes cluster you want to upgrade.
- (Optional) Include --pre-check to validate the cluster before upgrading it. upgrade-cluster displays a validation report when the pre-check completes and, if the cluster passes, asks you to confirm that the upgrade should proceed. For more information, see Upgrade Cluster Validation below.
- (Optional) Include --nodes-parallel to specify PARALLEL-COUNT, the number of worker nodes to upgrade in parallel. For more information, see Upgrade Cluster Worker Nodes in Parallel below.
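If you script single-cluster upgrades, you can add the optional flags conditionally. The following POSIX shell sketch only prints the resulting command so that you can review it before running it. The function name `tkgi_upgrade_cluster_cmd` and its argument order are hypothetical; the check that PARALLEL-COUNT is 1 or 2 reflects the limits described in Upgrade Cluster Worker Nodes in Parallel below.

```shell
# Hypothetical helper: print a tkgi upgrade-cluster command line without
# running it, so that you can review it first.
# Usage: tkgi_upgrade_cluster_cmd CLUSTER-NAME [yes|no] [PARALLEL-COUNT]
#   yes|no          whether to add --pre-check (default: no)
#   PARALLEL-COUNT  value for --nodes-parallel; omitted if empty
tkgi_upgrade_cluster_cmd() {
  name="$1"
  pre_check="${2:-no}"
  parallel="${3:-}"

  if [ -z "$name" ]; then
    echo "error: CLUSTER-NAME is required" >&2
    return 1
  fi

  cmd="tkgi upgrade-cluster $name"
  if [ "$pre_check" = "yes" ]; then
    cmd="$cmd --pre-check"
  fi

  if [ -n "$parallel" ]; then
    # --nodes-parallel accepts only 1 (the default, no parallelism) or 2.
    case "$parallel" in
      1|2) cmd="$cmd --nodes-parallel $parallel" ;;
      *)
        echo "error: PARALLEL-COUNT must be 1 or 2" >&2
        return 1
        ;;
    esac
  fi

  printf '%s\n' "$cmd"
}
```

For example, `tkgi_upgrade_cluster_cmd example-cluster yes 2` prints `tkgi upgrade-cluster example-cluster --pre-check --nodes-parallel 2`.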

For more information about the tkgi upgrade-cluster command, see tkgi upgrade-cluster in the TKGI CLI documentation.

To upgrade multiple clusters, see Upgrade Multiple Clusters below.

Upgrade Cluster Validation

You can have upgrade-cluster validate the cluster before it starts the upgrade.

When you request pre-upgrade cluster validation, upgrade-cluster validates the cluster first and displays a report when validation completes:

If the cluster fails the validation, upgrade-cluster ends after displaying the validation report.
If the cluster passes the validation, upgrade-cluster prompts you for permission to start the cluster upgrade. The upgrade proceeds only after you confirm.

A cluster fails validation if any of the following conditions are found:

A cluster certificate has expired.
A task on the cluster control plane or worker nodes is not running.
Virtual, persistent, or system storage is at 100% of capacity.

Cluster validation also checks and reports if any cluster certificates will expire within the next 180 days, but this condition does not cause the pre-check to fail.
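The certificate rules above amount to simple date arithmetic: a certificate whose expiry time has passed fails validation, and one that expires within 180 days is only reported. The following sketch illustrates that classification. It is not how TKGI implements the pre-check; the function name `classify_cert_expiry` is hypothetical, and it expects both times as Unix epoch seconds.

```shell
# Hypothetical illustration of the pre-check certificate rules.
# Usage: classify_cert_expiry EXPIRY-EPOCH [NOW-EPOCH]
# Prints "expired" (fails validation), "expiring-soon" (reported only),
# or "ok".
classify_cert_expiry() {
  expiry="$1"
  now="${2:-$(date +%s)}"
  window=$((180 * 24 * 60 * 60))   # 180 days in seconds

  if [ "$expiry" -le "$now" ]; then
    echo "expired"
  elif [ $((expiry - now)) -le "$window" ]; then
    echo "expiring-soon"
  else
    echo "ok"
  fi
}
```

On systems with GNU `date`, `date -d 'YYYY-MM-DD' +%s` converts a calendar date to epoch seconds.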

To upgrade a cluster with pre-upgrade validation:

Run the following command:

```
tkgi upgrade-cluster CLUSTER-NAME --pre-check
```

Where CLUSTER-NAME is the name of the Kubernetes cluster you want to upgrade.

For example:

```
$ tkgi upgrade-cluster api-cluster-v3.5 --pre-check

Validating cluster VM storage.
Error: Storage on the following VMs is full. Use the BOSH CLI to investigate.
master/c3de3f6a-e422-5a14-faf6-3ddd22ae52ac

Error: upgrade pre-check failure: VM storage status
```
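If you save the pre-check output in a script, the final `Error: upgrade pre-check failure:` line in the example above names the check that failed. This sketch reads saved output on standard input and prints that reason, returning a non-zero status if no such line is present. The function name is hypothetical, and it assumes the message format shown in the example, which might differ between TKGI versions.

```shell
# Hypothetical helper: print the REASON from an
# "Error: upgrade pre-check failure: REASON" line read on standard input.
# Returns 1 when no such line is found.
precheck_failure_reason() {
  awk '
    /^Error: upgrade pre-check failure:/ {
      sub(/^Error: upgrade pre-check failure: */, "")
      print
      found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```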

Upgrade Cluster Worker Nodes in Parallel

You can have upgrade-cluster upgrade several of a cluster's worker nodes in parallel.

Parallel upgrading activates only when the following three-worker-node minimums are met:

A cluster must have at least three worker nodes if the cluster is in a single AZ without any node pools.
An AZ must have three or more worker nodes if it is in a cluster distributed across multiple AZs without any node pools.
All node pools in an AZ must have three or more worker nodes if they are in a cluster configured with compute profile node pools.
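The three-worker-node rule above is applied per scope: the whole cluster, each AZ, or each node pool in an AZ, depending on how the cluster is laid out. The following hypothetical helper takes one worker node count per scope, counted however you choose, and reports whether every scope meets the minimum. It assumes that the minimum applies to each AZ and to each node pool individually, and it does not query your cluster.

```shell
# Hypothetical helper: check the three-worker-node minimum for parallel
# worker node upgrades.
# Pass one worker node count per scope: the whole cluster (single AZ, no
# node pools), each AZ (multiple AZs, no node pools), or each node pool in
# each AZ (compute profile node pools).
# Prints "eligible" if every count is 3 or more, else "not-eligible".
parallel_upgrade_eligible() {
  if [ "$#" -eq 0 ]; then
    echo "not-eligible"
    return 1
  fi
  for count in "$@"; do
    if [ "$count" -lt 3 ]; then
      echo "not-eligible"
      return 1
    fi
  done
  echo "eligible"
}
```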

To upgrade worker nodes in parallel:

Run the following command:
```
tkgi upgrade-cluster CLUSTER-NAME --nodes-parallel PARALLEL-COUNT
```
Where:
- CLUSTER-NAME is the name of the Kubernetes cluster you want to upgrade.
- PARALLEL-COUNT is the number of worker nodes to upgrade in parallel. Valid values are 1 and 2. The default, 1, deactivates parallel upgrading.

For example:
```
tkgi upgrade-cluster example-cluster --nodes-parallel 2
```

Upgrade Multiple Clusters

The Tanzu Kubernetes Grid Integrated Edition CLI provides upgrade-clusters for upgrading multiple Tanzu Kubernetes Grid Integrated Edition-provisioned Kubernetes clusters. You can upgrade clusters serially, serially with some clusters designated as canary clusters, or entirely in parallel.

To upgrade multiple Kubernetes clusters:

Run the following command:
```
tkgi upgrade-clusters --clusters CLUSTER-NAMES
```
Where CLUSTER-NAMES is a comma-delimited list of the names of the Kubernetes clusters that you want to upgrade.
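The examples in this topic pass cluster names to tkgi upgrade-clusters as a comma-delimited list. If your script holds cluster names as separate arguments, a small helper such as the hypothetical `join_cluster_names` below can build that list.

```shell
# Hypothetical helper: join cluster names into the comma-delimited list
# used by the --clusters and --canaries arguments.
# Usage: join_cluster_names NAME...
join_cluster_names() {
  # "$*" joins the positional parameters with the first character of IFS.
  # The subshell keeps the IFS change from leaking into the caller.
  (IFS=,; printf '%s\n' "$*")
}
```

For example: `tkgi upgrade-clusters --clusters "$(join_cluster_names k8-cluster-000 k8-cluster-001)" --wait`.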

Note: When using tkgi upgrade-clusters, the worker nodes within an upgrading cluster are upgraded serially.

For more information about the tkgi upgrade-clusters command, see tkgi upgrade-clusters in the TKGI CLI documentation.

To upgrade a single cluster, see Upgrade a Single Cluster above.

Upgrade Clusters in Parallel

To upgrade multiple Kubernetes clusters in parallel:

Run the following command:
```
tkgi upgrade-clusters --clusters CLUSTER-NAMES --max-in-flight CLUSTER-COUNT --wait
```
Where:
- CLUSTER-NAMES is a comma-delimited list of the names of the Kubernetes clusters you want to upgrade.
- CLUSTER-COUNT is the maximum number of clusters to upgrade in parallel within an AZ. CLUSTER-COUNT must be less than the Worker VM Max in Flight value in your TKGI tile > TKGI API Service. For example, if Worker VM Max in Flight remains at its default of 4, run upgrade-clusters with a --max-in-flight value less than 4.
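Because CLUSTER-COUNT must be less than Worker VM Max in Flight, a script can check the value before it starts the job. In this hypothetical sketch you supply the Worker VM Max in Flight value yourself, falling back to the default of 4; the function does not read the value from the TKGI tile.

```shell
# Hypothetical helper: check that a --max-in-flight value is a positive
# integer smaller than the TKGI tile's Worker VM Max in Flight value.
# Usage: check_max_in_flight CLUSTER-COUNT [WORKER-VM-MAX-IN-FLIGHT]
# WORKER-VM-MAX-IN-FLIGHT defaults to 4, the tile default.
check_max_in_flight() {
  count="$1"
  worker_max="${2:-4}"

  case "$count" in
    ''|*[!0-9]*)
      echo "error: CLUSTER-COUNT must be a positive integer" >&2
      return 1
      ;;
  esac

  if [ "$count" -lt 1 ] || [ "$count" -ge "$worker_max" ]; then
    echo "error: CLUSTER-COUNT must be between 1 and $((worker_max - 1))" >&2
    return 1
  fi
}
```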

Considerations when running tkgi upgrade-clusters:

If an upgrade for a cluster in the --clusters list fails, the tkgi upgrade-clusters job continues to a subsequent cluster in the list.
Clusters are upgraded serially if --max-in-flight is not set.
If the --clusters list contains more names than the --max-in-flight value, the first --max-in-flight clusters are upgraded in parallel and the remaining clusters are queued. As each in-progress upgrade completes, the next queued cluster starts upgrading.
To run the cluster upgrade job as a background task, remove the --wait argument.

Note: tkgi upgrade-clusters supports upgrading clusters in parallel. When using tkgi upgrade-clusters, the worker nodes within an upgrading cluster are upgraded serially.

For example:

```
$ tkgi upgrade-clusters --clusters k8-cluster-000,k8-cluster-001,k8-cluster-002 --max-in-flight 2 --wait

You are about to upgrade k8-cluster-000, k8-cluster-001 and k8-cluster-002.
Warning: This operation might be long running and might block further operations on the cluster(s) until complete

Continue? (y/n):y
Your taskID for the upgrade task is: d772aba0-2670-4fba-b26c-044b19d6ab60
Started upgrading cluster: k8-cluster-000
Started upgrading cluster: k8-cluster-001
Finished upgrading cluster: k8-cluster-000
Started upgrading cluster: k8-cluster-002
Finished upgrading cluster: k8-cluster-001
Finished upgrading cluster: k8-cluster-002
Upgrade task d772aba0-2670-4fba-b26c-044b19d6ab60 is done.
```
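If you omit --wait to run the job in the background, you need the task ID to monitor the job with tkgi task or stop it with tkgi cancel-task, as described in Manage Your Cluster Upgrade Job below. This sketch extracts the ID from saved tkgi upgrade-clusters output. The function name is hypothetical, and it assumes that the output contains the `Your taskID for the upgrade task is:` line shown in the example above.

```shell
# Hypothetical helper: read tkgi upgrade-clusters output on standard input
# and print the ID from the "Your taskID for the upgrade task is:" line.
# Trailing whitespace after the ID is ignored.
upgrade_task_id() {
  sed -n 's/^Your taskID for the upgrade task is: *\([^ ]*\).*/\1/p'
}
```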

Upgrade Clusters With Canaries

To upgrade multiple clusters and automatically stop upgrading clusters if a cluster upgrade fails, specify your cluster list as canary clusters. You can specify one or more clusters as canary clusters.

To upgrade multiple clusters with one or more canary clusters:

Run the following command:
```
tkgi upgrade-clusters --canaries CANARY-CLUSTER-NAMES --clusters CLUSTER-NAMES --wait
```
Where:
- CANARY-CLUSTER-NAMES is a comma-delimited list of the names of the Kubernetes clusters you want to upgrade as canary clusters.
- CLUSTER-NAMES is a comma-delimited list of Kubernetes clusters to upgrade if all canary clusters successfully upgrade.
Note: The --clusters argument is required.

Considerations when running tkgi upgrade-clusters with a --canaries list:

The clusters specified in the --canaries list are upgraded prior to upgrading the clusters in your --clusters list.
If a canary cluster upgrade fails, the entire tkgi upgrade-clusters job stops.
If a --clusters list cluster upgrade fails, the tkgi upgrade-clusters job continues to a subsequent cluster in the list.
To configure tkgi upgrade-clusters to stop on any cluster upgrade failure, specify only one cluster in your --clusters list and the remaining clusters in your --canaries list.
Canary clusters are always upgraded serially. To upgrade clusters in the --clusters list in parallel, see Upgrade Clusters in Parallel above.
To run the cluster upgrade job as a background task, remove the --wait argument.

For example:

```
$ tkgi upgrade-clusters --canaries k8-cluster-dev,k8-cluster-000,k8-cluster-001 --clusters k8-cluster-002 --wait

You are about to upgrade k8-cluster-dev k8-cluster-000, k8-cluster-001 and k8-cluster-002.
Warning: This operation might be long running and might block further operations on the cluster(s) until complete

Continue? (y/n):y
Your taskID for the upgrade task is: ce31a1bb-380a-453f-afa0-835ffa1ce6ac
Started upgrading cluster: k8-cluster-000
Upgrading cluster succeeded: k8-cluster-000
Started upgrading cluster: k8-cluster-001
Upgrading cluster succeeded: k8-cluster-001
Started upgrading cluster: k8-cluster-dev
Upgrading cluster failed: k8-cluster-dev
Upgrade task ce31a1bb-380a-453f-afa0-835ffa1ce6ac is done.
```
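Stopping the job on any failure, as described in the considerations above, comes down to a simple list split: every cluster except one goes in --canaries, and the last one goes in the required --clusters list. The following hypothetical helper prints those two arguments from a single ordered list of names. Because canary clusters are upgraded serially, this arrangement also upgrades the clusters one at a time.

```shell
# Hypothetical helper: split an ordered list of cluster names so that the
# upgrade job stops on any failure. All names except the last become
# canaries; the last name becomes the single required --clusters entry.
# Usage: stop_on_any_failure_args NAME...
stop_on_any_failure_args() {
  if [ "$#" -eq 0 ]; then
    echo "error: specify at least one cluster" >&2
    return 1
  fi

  canaries=""
  last=""
  for name in "$@"; do
    if [ -n "$last" ]; then
      # Append the previous name to the comma-delimited canary list.
      canaries="${canaries:+$canaries,}$last"
    fi
    last="$name"
  done

  if [ -n "$canaries" ]; then
    printf '%s\n' "--canaries $canaries --clusters $last"
  else
    printf '%s\n' "--clusters $last"
  fi
}
```

Cluster names like those in the examples contain no spaces, so you can pass the output unquoted, for example `tkgi upgrade-clusters $(stop_on_any_failure_args k8-cluster-dev k8-cluster-000 k8-cluster-001) --wait`.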

Manage Your Cluster Upgrade Job

You can use the TKGI CLI to monitor and manage your Tanzu Kubernetes Grid Integrated Edition-provisioned Kubernetes cluster upgrade jobs:

Monitor Your Clusters
Monitor Your Cluster Upgrade Job
Stop Your Cluster Upgrade Job

Monitor Your Clusters

To review the status of the actions being performed on your clusters, run the following command:

```
tkgi clusters
```

For example:

```
$ tkgi clusters

Upgrade is available to TKGI Version: 1.9.0-build.1

TKGI Version     Name               k8s Version  Plan Name  UUID                                  Status       Action
1.9.0-build.1   k8-cluster-000     1.18.8       small      9527ebaa-e2fa-422f-a52b-de3c3f0e39a4  succeeded    UPGRADE
1.9.0-build.1   k8-cluster-001     1.18.8       small      9527ebaa-e2fa-422f-a52b-de3c3f0e39a4  failed       UPGRADE
1.9.0-build.1   k8-cluster-002     1.18.8       small      9527ebaa-e2fa-422f-a52b-de3c3f0e39a4  in progress  UPGRADE
1.9.0-build.1   k8-cluster-003     1.18.8       small      9527ebaa-e2fa-422f-a52b-de3c3f0e39a4  queued       UPGRADE
```
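To pick out failed clusters from a long tkgi clusters listing, you can filter on the Status column. This sketch assumes the column layout in the example above, where Status is the sixth whitespace-separated field; a value containing spaces in an earlier column, such as a plan name, would shift the fields, so treat it as a convenience rather than a robust parser. The function name is hypothetical.

```shell
# Hypothetical helper: read `tkgi clusters` output on standard input and
# print the Name of each cluster whose Status is "failed".
# Assumes the column order TKGI Version, Name, k8s Version, Plan Name,
# UUID, Status, Action, with no spaces inside the first five values.
failed_clusters() {
  awk '$6 == "failed" { print $2 }'
}
```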

Monitor Your Cluster Upgrade Job

To review the status of your upgrade-clusters job, run the following command:

```
tkgi task TASKID
```

Where TASKID is the ID of the task that was returned when you ran tkgi upgrade-clusters.

For example:

```
$ tkgi task ce31a1bb-380a-453f-afa0-835ffa1ce6ac

Your upgrade task is: done

Name           Status     Start time                     End time                            isCanary
k8-cluster-000 succeeded  Mon, 14 Oct 2019 12:00:00 PDT  Mon, 14 Oct 2019 12:19:54 PDT       true
k8-cluster-001 failed     Mon, 14 Oct 2019 12:20:00 PDT  ---                                 true
```
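When the job runs in the background, you can poll tkgi task until it reports that the task is done. In this sketch the command to run is passed as arguments, for example `tkgi task TASKID`, which also lets you exercise the loop with a stand-in command. The function name, the polling interval, and the attempt limit are hypothetical choices, and the check assumes the `Your upgrade task is: done` line shown in the example above.

```shell
# Hypothetical helper: rerun a status command until its output contains
# "Your upgrade task is: done", checking at most MAX-CHECKS times.
# Usage: wait_for_upgrade_task INTERVAL-SECONDS MAX-CHECKS COMMAND [ARG...]
wait_for_upgrade_task() {
  interval="$1"
  max_checks="$2"
  shift 2

  i=0
  while [ "$i" -lt "$max_checks" ]; do
    if "$@" | grep -q 'Your upgrade task is: done'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done

  echo "error: task not done after $max_checks checks" >&2
  return 1
}
```

For example, `wait_for_upgrade_task 60 30 tkgi task TASKID` checks once a minute, up to 30 times.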

Stop Your Cluster Upgrade Job

To cancel a running upgrade-clusters job, run the following TKGI CLI command:

```
tkgi cancel-task TASKID
```

Where TASKID is the ID of the task that was returned when you ran tkgi upgrade-clusters.

Warning: tkgi cancel-task does not cancel cluster upgrades currently in progress. This command only cancels a job’s pending cluster upgrades.

After Upgrading Clusters

Complete the following procedures after you have upgraded your cluster:

Upgrade Velero
(Optional) Restore Cluster Sizing

Upgrade Velero

TKGI v1.20 uses Velero v1.12.1. You must upgrade Velero to v1.12.1 on all of your existing clusters.

To upgrade Velero:

Download Velero v1.12.1 from Broadcom Support.
Complete the steps in Upgrading to Velero 1.12 in the Velero documentation.

Warning: Ensure you have updated the Velero custom resource definitions on your clusters as described in Instructions in Upgrading to Velero 1.12 in the Velero documentation.
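To confirm that a cluster runs at least the required Velero version, you can compare the version string you observe against v1.12.1. The following hypothetical helper performs only that comparison; how you obtain the version string is up to you. It relies on `sort -V` version sorting, which GNU coreutils provides.

```shell
# Hypothetical helper: succeed if VERSION is at least MINIMUM.
# Accepts an optional leading "v" (for example, v1.12.1).
# Requires a sort that supports -V (version sort), such as GNU sort.
# Usage: version_at_least VERSION MINIMUM
version_at_least() {
  version="${1#v}"
  minimum="${2#v}"
  # The lower of the two versions sorts first; VERSION is new enough
  # when MINIMUM is the lower (or equal) one.
  lowest=$(printf '%s\n%s\n' "$version" "$minimum" | sort -V | head -n 1)
  [ "$lowest" = "$minimum" ]
}
```

For example, `version_at_least v1.9.0 v1.12.1` fails because 1.9.0 sorts before 1.12.1 under version ordering.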

(Optional) Restore Cluster Sizing

If you scaled your cluster up for the upgrade and prefer to restore it to its original sizing, you can now scale the cluster back down to its previous configuration. However, VMware recommends that you do not scale your clusters back down, and instead continue to run them with the recommended configuration to reduce the chance of a future outage.