To back up and restore Tanzu Kubernetes Grid management clusters, you can use Velero, an open-source, community-standard tool for backing up and restoring Kubernetes cluster resources and persistent volumes. Velero supports a variety of storage providers to store its backups.

If a Tanzu Kubernetes Grid management cluster crashes and fails to recover, the infrastructure administrator can use a Velero backup to restore its contents to a new management cluster, including management cluster extensions and internal API objects for the workload clusters.

The following sections explain how to back up and restore a Tanzu Kubernetes Grid management cluster using the Velero CLI.

To back up and restore Tanzu Kubernetes Grid workload clusters, you can use velero backup and velero restore commands similar to those described below.

Prerequisites

These sections describe what you need to back up and restore Tanzu Kubernetes Grid management clusters.

Install the Velero CLI

  1. Go to https://www.vmware.com/go/get-tkg and log in with your My VMware credentials.
  2. Under Product Downloads, click Go to Downloads.
  3. Scroll to the Velero entries and download the Velero CLI .gz file for your workstation OS. Its filename starts with velero-linux-, velero-mac-, or velero-windows64-.
  4. Use the gzip -d command (equivalent to gunzip) or the extraction tool of your choice to unpack the binary:
    gzip -d <RELEASE-TARBALL-NAME>.gz
    
  5. Rename the CLI binary for your platform to velero, make sure that it is executable, and add it to your PATH.

    • Mac OS and Linux platforms:

      1. Move the binary into the /usr/local/bin folder and rename it to velero.
      2. Make the file executable:
      chmod +x /usr/local/bin/velero
      
    • Windows platforms:

      1. Create a new Program Files\velero folder and copy the binary into it.
      2. Rename the binary to velero.exe.
      3. Right-click the velero folder, select Properties > Security, and make sure that your user account has the Full Control permission.
      4. Use Windows Search to search for env.
      5. Select Edit the system environment variables and click the Environment Variables button.
      6. Select the Path row under System variables, and click Edit.
      7. Click New to add a new row and enter the path to the velero binary.
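
On Mac OS and Linux, the unpack, rename, and chmod steps above can be combined into one small shell function. The following is a minimal sketch, not an official installer: install_velero_cli is a hypothetical helper name, and the destination directory is an optional second argument that defaults to /usr/local/bin.

```shell
#!/usr/bin/env bash
# Sketch: unpack a downloaded Velero CLI .gz file and install it as "velero".
# install_velero_cli is a hypothetical helper, not part of Velero or
# Tanzu Kubernetes Grid.
install_velero_cli() {
  local archive="$1"                     # the downloaded velero-...gz file
  local dest_dir="${2:-/usr/local/bin}"  # where the velero binary should live
  local tmp
  tmp="$(mktemp)" || return 1

  # gzip -dc decompresses to stdout, so the downloaded archive is left in place.
  if ! gzip -dc "$archive" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi

  # install copies the file under its new name and sets it executable (0755).
  install -m 0755 "$tmp" "$dest_dir/velero" || { rm -f "$tmp"; return 1; }
  rm -f "$tmp"
  echo "$dest_dir/velero"
}
```

Calling install_velero_cli with only the archive path targets /usr/local/bin, which typically requires elevated privileges. Make sure the destination directory is on your PATH.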

Storage Provider Setup

Velero supports a variety of storage providers. You need a storage provider to store cluster object backups and snapshots of any CSI persistent volumes that the cluster objects use.

VMware recommends dedicating a unique storage bucket to each cluster, so the instructions below include setting up storage for each management cluster backup or restore operation.

How you set up a storage provider depends on your infrastructure:

  • vSphere (On Premises)

    • Requires external object storage, which can be:
      • An S3 bucket on Amazon Web Services (AWS), enabled by the Velero Plugin for AWS
      • A bucket on S3-compatible object storage such as MinIO, also enabled by the AWS plugin. See the example MinIO installation in the Velero documentation.
      • Other storage providers, as listed on the Providers page in the Velero documentation.
    • Storing CSI volume snapshots requires the Velero Plugin for vSphere.
      • Stores cluster objects and volume snapshots to the same S3 bucket.
      • To enable the Velero plugin, you must add the following VirtualMachine permissions to the role you created for the Tanzu Kubernetes Grid account, if you did not already include them when you created the account:
        • Configuration > Toggle disk change tracking
        • Provisioning > Allow read-only disk access
        • Provisioning > Allow virtual machine download
        • Snapshot management > Create snapshot
        • Snapshot management > Remove snapshot
  • AWS and Azure

    • Requires plugins installed with the velero install command, with parameters including:
      • --plugins: the object store and volume snapshotter plugin versions
      • --backup-location-config: object backup location
      • --snapshot-location-config: volume snapshot location

Note: The velero install commands in the following sections use a public Velero image repository. To use the official Tanzu Kubernetes Grid image repository instead, follow the Customize Velero Install instructions in the Velero documentation and set the image and plugin locations, for example:

velero install \
 --image=$TKG_REG/velero:$VELERO_VERSION \
 --plugins=$TKG_REG/velero-plugin-for-aws:$PLUGIN_VERSION \
<....>

vSphere Backup and Restore

These sections describe how to back up and restore Tanzu Kubernetes Grid management clusters on vSphere.

Back Up a Management Cluster on vSphere

  1. Install Velero server on the management cluster. This example uses MinIO as the object storage:

    velero install --provider aws --plugins "velero/velero-plugin-for-aws:v1.1.0" --bucket velero --secret-file ./credentials-velero --backup-location-config "region=minio,s3ForcePathStyle=true,s3Url=minio_server_url" --snapshot-location-config region="default"
    

    See the MinIO server setup instructions in the Velero documentation.

  2. If there are CSI-based volumes to back up, follow the Velero Plugin for vSphere setup instructions to install the Velero plugin for vSphere.

  3. Set Cluster.Spec.Paused to true for all workload clusters:


    kubectl patch cluster workload_cluster_name --type='merge' -p '{"spec":{"paused": true}}'

  4. Back up the management cluster:

    velero backup create your_backup_name --exclude-namespaces=tkg-system
    

    Excluding tkg-system objects avoids creating duplicate management cluster API objects when restoring to a new management cluster.

  5. After the backup completes, set Cluster.Spec.Paused back to false for the workload clusters.
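
The pause, backup, and unpause steps above can be sketched as one script. This is an illustrative outline under stated assumptions, not a supported tool: DRY_RUN, run, set_paused, and backup_management_cluster are hypothetical names, the workload cluster names are passed in explicitly, and the --wait flag is added to velero backup create so that the clusters stay paused until the backup finishes. By default the sketch only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the pause -> backup -> unpause sequence.
# All function and variable names here are hypothetical.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"    # printed without shell quoting, for review only
  else
    "$@"
  fi
}

# set_paused <true|false> <cluster-name>...
set_paused() {
  local value="$1"; shift
  local name
  for name in "$@"; do
    run kubectl patch cluster "$name" --type=merge \
      -p "{\"spec\":{\"paused\": ${value}}}" || return 1
  done
}

# backup_management_cluster <backup-name> <cluster-name>...
backup_management_cluster() {
  local backup="$1"; shift
  local status=0
  if set_paused true "$@"; then
    # --wait (an addition to the documented command) blocks until the
    # backup finishes.
    run velero backup create "$backup" --exclude-namespaces=tkg-system --wait \
      || status=$?
  else
    status=1
  fi
  # Always try to unpause, so a failed backup does not leave clusters paused.
  set_paused false "$@" || status=$?
  return "$status"
}
```

For example, backup_management_cluster your_backup_name workload_cluster_name prints the three commands in order. Setting DRY_RUN=0 executes them against the current kubectl context.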

Restore to a New Management Cluster on vSphere

  1. Install Velero server on the new management cluster. This example uses MinIO as the object storage:

    velero install --provider aws --plugins "velero/velero-plugin-for-aws:v1.1.0" --bucket velero --secret-file ./credentials-velero --backup-location-config "region=minio,s3ForcePathStyle=true,s3Url=minio_server_url"
    

    See the MinIO server setup instructions in the Velero documentation.

  2. If there are CSI-based volumes to restore, follow the Velero Plugin for vSphere setup instructions to install the Velero plugin for vSphere.

  3. Restore the management cluster:

    velero restore create your_restore_name --from-backup your_backup_name
    
  4. Set Cluster.Spec.Paused to false for all workload clusters:


    kubectl patch cluster cluster_name --type='merge' -p '{"spec":{"paused": false}}'
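
The velero install commands in this section read object storage credentials from the file passed to --secret-file. A minimal sketch of generating that file follows, assuming the AWS-style profile format that the Velero Plugin for AWS reads. write_velero_credentials is a hypothetical helper, and the access key and secret key are whatever values your MinIO or S3 account uses.

```shell
#!/usr/bin/env bash
# Sketch: write the credentials file used by `velero install --secret-file`.
# write_velero_credentials is a hypothetical helper name.
write_velero_credentials() {
  local path="$1" access_key="$2" secret_key="$3"

  # Create (or truncate) the file and restrict it to the owner before
  # any secret is written into it.
  : > "$path" || return 1
  chmod 600 "$path" || return 1

  printf '[default]\naws_access_key_id=%s\naws_secret_access_key=%s\n' \
    "$access_key" "$secret_key" >> "$path"
}
```

For example, write_velero_credentials ./credentials-velero followed by your access key and secret key produces the file that the install commands above reference.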

AWS Backup and Restore

These sections describe how to back up and restore Tanzu Kubernetes Grid management clusters on AWS.

Back Up a Management Cluster on AWS

  1. Follow the Velero Plugin for AWS setup instructions to install Velero server on the management cluster.

  2. Set Cluster.Spec.Paused to true for all workload clusters:


    kubectl patch cluster workload_cluster_name --type='merge' -p '{"spec":{"paused": true}}'

  3. Back up the management cluster:

    velero backup create your_backup_name --exclude-namespaces=tkg-system
    

    Excluding tkg-system objects avoids creating duplicate management cluster API objects when restoring to a new management cluster.

  4. After the backup completes, set Cluster.Spec.Paused back to false for the workload clusters.
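
Unless it is run with --wait, velero backup create returns as soon as the backup request is submitted, so it can help to confirm that the backup has finished before unpausing the workload clusters. The following sketch polls the phase of the Velero Backup object. It assumes that Velero is installed in its default velero namespace. backup_phase and wait_for_backup are hypothetical helper names, and the retry count and delay are illustrative defaults.

```shell
#!/usr/bin/env bash
# Sketch: poll a Velero backup until it reaches a terminal phase.
# Assumes Velero runs in the default "velero" namespace.
backup_phase() {
  kubectl -n velero get backups.velero.io "$1" -o jsonpath='{.status.phase}'
}

# wait_for_backup <backup-name> [tries] [delay-seconds]
wait_for_backup() {
  local name="$1" tries="${2:-60}" delay="${3:-10}" phase
  while [ "$tries" -gt 0 ]; do
    phase="$(backup_phase "$name")"
    case "$phase" in
      Completed)
        echo "$phase"; return 0 ;;
      Failed|PartiallyFailed|FailedValidation)
        echo "$phase"; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep "$delay"
  done
  # The backup was still pending or in progress when the attempts ran out.
  echo "timeout"
  return 2
}
```

For example, wait_for_backup your_backup_name 60 10 checks every 10 seconds for up to 60 attempts and returns a nonzero status if the backup fails or does not finish in time.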

Restore to a New Management Cluster on AWS

  1. Follow the Velero Plugin for AWS setup instructions to install Velero server on the new management cluster.

  2. Restore the management cluster:

    velero backup get
    velero restore create your_restore_name --from-backup your_backup_name
    
  3. Set Cluster.Spec.Paused to false for all workload clusters:


    kubectl patch cluster cluster_name --type='merge' -p '{"spec":{"paused": false}}'
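
When many workload clusters were restored, the unpause step can be looped instead of run once per cluster. The following sketch assumes that the Cluster API clusters resource is available in the current kubectl context and that the restored clusters may live in several namespaces. unpause_all_clusters is a hypothetical helper name, and the loop touches every Cluster object it finds, which can include the management cluster's own object.

```shell
#!/usr/bin/env bash
# Sketch: set spec.paused to false on every Cluster object.
# unpause_all_clusters is a hypothetical helper name.
unpause_all_clusters() {
  # Print one "namespace name" pair per line for each Cluster object.
  kubectl get clusters --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
  while read -r ns name; do
    [ -n "$name" ] || continue   # skip blank lines
    # </dev/null keeps kubectl from consuming the list being read by the loop.
    kubectl patch cluster "$name" -n "$ns" --type=merge \
      -p '{"spec":{"paused": false}}' </dev/null
  done
}
```

Run unpause_all_clusters only after the restore has completed and with kubectl pointed at the new management cluster.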

Azure Backup and Restore

These sections describe how to back up and restore Tanzu Kubernetes Grid management clusters on Azure.

Back Up a Management Cluster on Azure

  1. Follow the Velero Plugin for Azure setup instructions to install Velero server on the management cluster.

  2. Set Cluster.Spec.Paused to true for all workload clusters:

    kubectl patch cluster workload_cluster_name --type='merge' -p '{"spec":{"paused": true}}'

  3. Back up the management cluster:

    velero backup create your_backup_name --exclude-namespaces=tkg-system
    

    Excluding tkg-system objects avoids creating duplicate management cluster API objects when restoring to a new management cluster.

  4. If velero backup returns a transport is closing error, try again after increasing the memory limit, as described in Update resource requests and limits after install in the Velero documentation.

  5. After the backup completes, set Cluster.Spec.Paused back to false for the workload clusters.
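
For the transport is closing error, one way to raise the memory limit is to patch the Velero deployment. The following sketch only builds the patch document. velero_memory_patch is a hypothetical helper, the 1Gi value is illustrative rather than a recommendation, and the commented-out command assumes the default deployment name velero in the velero namespace.

```shell
#!/usr/bin/env bash
# Sketch: build a patch that raises the memory limit of the Velero server
# container. velero_memory_patch is a hypothetical helper name.
velero_memory_patch() {
  local limit="$1"   # a Kubernetes quantity such as 512Mi or 1Gi
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"velero","resources":{"limits":{"memory":"%s"}}}]}}}}' "$limit"
}

# Uncomment to apply. Assumes the default namespace and deployment name.
# kubectl patch deployment velero -n velero --patch "$(velero_memory_patch 1Gi)"
```

Patching the deployment restarts the Velero pod, so rerun velero backup create after the new pod is ready.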

Restore to a New Management Cluster on Azure

  1. Follow the Velero Plugin for Azure setup instructions to install Velero server on the new management cluster.

  2. Restore the management cluster:

    velero backup get
    velero restore create your_restore_name --from-backup your_backup_name
    
  3. Set Cluster.Spec.Paused to false for all workload clusters:

    kubectl patch cluster cluster_name --type='merge' -p '{"spec":{"paused": false}}'
