Using VMware Tanzu Mission Control, you can protect the valuable data resources in your Kubernetes clusters using the backup and restore functionality provided by Velero, an open source community standard.

The data protection features of Tanzu Mission Control allow you to create the following types of backups for managed clusters (both attached and provisioned):
  • all resources in a cluster
  • selected or excluded namespaces in a cluster
  • specific or excluded resources in a cluster identified by a given label
You can selectively restore the backups you have created, by specifying the following:
  • the entire backup
  • selected or excluded namespaces from the backup
  • specific or excluded resources from the backup identified by a given label
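Under the hood, these scoping options correspond to fields on the Velero Backup resource that Tanzu Mission Control manages for you. A minimal sketch of a backup limited to selected namespaces and a label (the backup, namespace, and label names are illustrative):

    apiVersion: velero.io/v1
    kind: Backup
    metadata:
      name: scoped-backup          # illustrative name
      namespace: velero
    spec:
      includedNamespaces:          # back up only these namespaces
        - web
        - db
      labelSelector:               # only resources carrying this label
        matchLabels:
          app: shop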

Additionally, you can schedule regular backups and manage the storage of backups and volume snapshots you create by specifying a retention period for each backup and deleting backups that are no longer needed.
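The retention period maps to the TTL of the underlying Velero backup. A scheduled nightly backup with 30-day retention can be sketched as follows (the schedule name and cron expression are illustrative):

    apiVersion: velero.io/v1
    kind: Schedule
    metadata:
      name: nightly-backup         # illustrative name
      namespace: velero
    spec:
      schedule: "0 2 * * *"        # run every night at 02:00
      template:
        ttl: 720h0m0s              # retain each backup for 30 days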

When you perform a backup for a cluster, Tanzu Mission Control uses Velero to create a backup of the specified Kubernetes resources with snapshots of persistent volume data, and then stores the backup in the location that you specify.
Note: The namespaces kube-system, velero, tkg-system, and vmware-system-tmc are not included in backups.

For more information about Velero, visit https://velero.io/docs.

For information on how to use the data protection features in Tanzu Mission Control, see Protecting Data in Using VMware Tanzu Mission Control.

Note: The data protection features of Tanzu Mission Control are not available in Tanzu Mission Control Essentials.

About Backup Storage

For the storage of your backups, you can specify a target location that allows Tanzu Mission Control to manage the storage of backups, provisioning resources as necessary according to your specifications. However, if you prefer to manage your own storage for backups, you can also specify a target location that points to a storage location that you create and maintain in your cloud provider account, such as an AWS S3 or S3-compatible storage location or an Azure Blob storage location. With self-provisioned storage, you can leverage existing storage investments for backups, reducing network and cloud storage costs, and apply existing storage policies, quotas, and encryption. For a list of supported S3-compatible providers, see S3-Compatible object store providers in the Velero documentation.

Before you define a backup for a cluster, you must create a target location and credential that you will use to perform the backup.
  • The data protection credential specifies the access credentials for the account where your backup is stored. This account can be either your AWS account where Tanzu Mission Control manages backup storage, or an account where you manage backups (the account that contains your AWS S3 or S3-compatible storage or the subscription that contains your Azure Blob storage).
  • The data protection target location identifies the place where you want the backup stored, and references the associated data protection credential. You can share the target location across multiple cluster groups and clusters.

About Volume Backup

Tanzu Mission Control leverages Velero to support backing up and restoring Kubernetes volumes. Velero supports two methods for volume backup and restore: the File System Backup (FSB) method and the Container Storage Interface (CSI) snapshot method.

File System Backup

Kubernetes volumes attached to pods can be backed up from the file system of the volumes. This approach is called File System Backup (FSB) or Pod Volume Backup. Tanzu Mission Control uses Velero with Kopia to achieve this.

You can enable FSB when you enable data protection on the cluster, and you can enable or disable it later from the data protection page for the cluster. When FSB is enabled, Kopia is installed on the cluster, and Velero backs up pod volumes using Kopia, copying the volume data to the backup storage location.

FSB offers two approaches for discovering the pod volumes to back up:

  • Opt-in approach: Every pod containing a volume to be backed up using FSB must be annotated with the volume’s name using the backup.velero.io/backup-volumes annotation.

  • Opt-out approach: All pod volumes are backed up using FSB, with the ability to opt out any volumes that should not be backed up by applying the backup.velero.io/backup-volumes-excludes annotation on the pod.
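Both approaches are driven by pod annotations. A minimal sketch of the opt-in case, with the opt-out alternative shown as a comment (the pod, image, and volume names are illustrative):

    apiVersion: v1
    kind: Pod
    metadata:
      name: app-pod                               # illustrative name
      annotations:
        # Opt-in: back up only the "app-data" volume using FSB.
        backup.velero.io/backup-volumes: app-data
        # Opt-out alternative: back up all volumes except "cache".
        # backup.velero.io/backup-volumes-excludes: cache
    spec:
      containers:
        - name: app
          image: nginx                            # illustrative image
          volumeMounts:
            - name: app-data
              mountPath: /data
      volumes:
        - name: app-data
          emptyDir: {}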

For more information about Velero, see Velero Backup.

If FSB is enabled on your cluster, the opt-out approach is the default setting for all backup operations. During a backup operation, all volumes are evaluated against the specified opt-out or opt-in approach to determine which volumes are backed up using FSB.

Benefits of using FSB include:

  • It can back up and restore almost any type of Kubernetes volume. Therefore, if you use a volume type that doesn’t support native snapshots or CSI volume snapshots, FSB might be the best choice.

  • It is not tied to a specific storage platform, so you can save the backup data to a storage platform different from the one backing your Kubernetes volumes, such as durable storage media.

However, you should also be aware that:
  • FSB backs up data from the live file system, so the backup data is less consistent than the CSI volume snapshot approach.

  • It accesses the file system from the mounted hostPath directory, so the pods need to run as the root user, and even in privileged mode in some environments.

Note:

If both FSB and CSI volume snapshot are enabled, Velero first backs up volumes using FSB based on the specified approach (opt-in or opt-out). Volumes not included in the FSB backup are backed up by CSI volume snapshot if they meet the prerequisites given in Requirements for CSI Volume Backup in Using Tanzu Mission Control. It is recommended to enable both FSB and CSI volume snapshots in case the cluster has volume types that do not support CSI volume snapshots; such volumes can be backed up using FSB.

Container Storage Interface (CSI) Volume Snapshot

Velero supports backup and restore of CSI driver backed volumes that meet the prerequisites described in Requirements for CSI Volume Backup in Using Tanzu Mission Control. To create a CSI snapshot, Velero requires a volume snapshot class that tells the Kubernetes engine which CSI driver to use when creating snapshots.
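Velero identifies the volume snapshot class to use for a given CSI driver by a well-known label. A minimal sketch (the class name and driver are illustrative; use the CSI driver deployed in your cluster):

    apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: velero-snapclass                        # illustrative name
      labels:
        velero.io/csi-volumesnapshot-class: "true"  # tells Velero to use this class
    driver: ebs.csi.aws.com                         # illustrative CSI driver
    deletionPolicy: Retain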

CSI snapshot is available only for persistent volumes created by CSI drivers that support volume snapshots. Data in the CSI snapshots can optionally be copied to the backup storage location. Check with your cloud provider about snapshot durability.

When scheduling a backup, if you choose to move CSI snapshot data to the backup storage location, Velero first creates the CSI snapshot, then moves the snapshot data to the backup storage location, and finally removes the CSI snapshot to release the cluster storage occupied by the snapshot.
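In recent Velero versions (1.12 and later), this behavior corresponds to the snapshotMoveData field on the Backup resource; Tanzu Mission Control sets it based on your backup configuration. A minimal sketch (the backup name is illustrative):

    apiVersion: velero.io/v1
    kind: Backup
    metadata:
      name: csi-backup-with-data-movement  # illustrative name
      namespace: velero
    spec:
      snapshotVolumes: true        # take CSI snapshots of eligible volumes
      snapshotMoveData: true       # copy snapshot data to the backup storage location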

CSI Snapshot data movement is useful in the following scenarios:

  1. For on-premises datacenters, the storage usually doesn’t support durable snapshots, and it is not cost efficient to keep many CSI volume snapshots on cluster storage for long durations. Moving a CSI snapshot to a backup storage location lowers the cost of long-term data preservation.
  2. For public cloud deployments, this feature supports a multi-cloud strategy. CSI snapshots are not portable across providers, so moving snapshot data to a backup storage location allows you to restore the data to another cloud provider. If you plan to restore your backups to a cluster on a different cloud provider, consider moving the snapshot data to the backup storage location.

For more information, see CSI Snapshot and CSI Snapshot Data Movement.

Note:

CSI Snapshot is available for Tanzu Kubernetes Grid (TKG) clusters with Kubernetes version v1.26.5 onwards.

For more information about CSI prerequisites, see Requirements for CSI Volume Backup in Using Tanzu Mission Control.

FSB and CSI Usage

Tanzu Mission Control allows you to enable and disable FSB and CSI snapshot independently.

  1. If only FSB is enabled, Velero evaluates the volumes to be backed up using FSB based on the specified approach (opt-in or opt-out). During restore, volumes backed up using FSB are always restored.

  2. If both FSB and CSI volume snapshot are enabled and the backup is configured with CSI snapshot, Velero first evaluates the volumes to be backed up using FSB based on the specified approach (opt-in or opt-out). CSI-driver based volumes not included in the FSB backup are backed up by CSI volume snapshot if they meet the prerequisites stated in Requirements for CSI Volume Backup in Using Tanzu Mission Control.

  3. If both FSB and CSI volume snapshot are enabled but the backup is not configured with CSI snapshot, Velero evaluates the volumes to be backed up using FSB based on the specified approach (opt-in or opt-out). Volumes not included in FSB are not backed up.

  4. If only CSI snapshot is enabled and the backup is configured with CSI snapshot, Velero backs up only CSI-driver based volumes that meet the prerequisites stated in Requirements for CSI Volume Backup. Non-CSI driver based volumes are not backed up.

  5. If neither FSB nor CSI snapshot is enabled, no volumes will be backed up.

It is recommended to enable both FSB and CSI volume snapshot; otherwise, if the cluster has non-CSI driver based volume types, those volumes are not backed up. In addition, you can use the opt-in or opt-out approach to exclude CSI-based volumes from FSB so that they are backed up using CSI snapshot instead.

About Backup Restoration Between Different Clusters

When you create a backup using Tanzu Mission Control, that backup is available for restoration to other clusters in your organization. This feature allows you to create a backup in one cluster and restore it to a different cluster, even a cluster running on a different platform.

When migrating workloads between clusters running different versions of Kubernetes, consider the availability of resources in each version and the compatibility of API groups for each custom resource. If the source and target clusters are running different versions of Kubernetes, keep the following in mind:
  • A Kubernetes version downgrade (restoring to a cluster running a lower version of Kubernetes) can cause incompatibility of core API groups and other issues associated with feature availability. Use this approach judiciously.
  • If a Kubernetes version upgrade (restoring to a cluster running a higher version of Kubernetes) causes incompatibility of core API groups, you must update the impacted custom resources in the source cluster prior to creating the backup.

    For example, IngressClass in networking.k8s.io/v1beta1 API is no longer supported as of Kubernetes version 1.22.
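    To avoid this, update the resource in the source cluster to a supported API version before creating the backup. A sketch (the class name and controller are illustrative):

    # Before (removed in Kubernetes 1.22): apiVersion: networking.k8s.io/v1beta1
    apiVersion: networking.k8s.io/v1
    kind: IngressClass
    metadata:
      name: nginx                        # illustrative name
    spec:
      controller: k8s.io/ingress-nginx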

For more information, see https://velero.io/docs/main/migration-case/ in the Velero documentation.

You cannot restore a backup that contains Kopia volumes to a cluster that does not have Kopia installed. Additionally, you can restore a backup that contains volume snapshots to another cluster only if both clusters share the same cloud provider account.

When migrating workloads between clusters running on different cloud providers, consider the following items:
  • By default, persistent volume claims (PVCs) might fail to bind to volumes because the appropriate storage class from the source cluster doesn't exist in the target cluster. To make sure your volumes bind, use a storage class map as described in https://velero.io/docs/v1.8/restore-reference/#changing-pvpvc-storage-classes in the Velero documentation.
    For example, the following configmap maps the default and managed-premium storage classes from an AKS cluster to the gp2 storage class in an EKS cluster.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: change-storage-class-config
      namespace: velero
      labels:
        velero.io/plugin-config: ""
        velero.io/change-storage-class: RestoreItemAction
    data:
      # Map the "default" and "managed-premium" storage classes backed by AzureDisk on 
      # the source cluster to "gp2", a storage class backed by AWS EBS on the current 
      # (destination) cluster.
      default: gp2
      managed-premium: gp2
  • Custom resources from the source cluster might not exist in the target cluster.

    For example, tiers.crd.antrea.io and tiers.security.antrea.tanzu.vmware.com from a Tanzu Kubernetes cluster are not found in an AKS cluster.

    You can exclude resources during restore to help avoid this issue.

  • Resource differences between the source and target cluster might impact functionality.

    Some packages install webhooks that can cause issues when the source and target are not the same cluster. For example, mutatingwebhookconfiguration.admissionregistration.k8s.io and validatingwebhookconfiguration.admissionregistration.k8s.io resources from an AKS cluster can impact the functionality of an EKS cluster.

    You can exclude resources during restore to help avoid this issue.
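These exclusions correspond to the excludedResources field of the Velero Restore resource. A minimal sketch that skips the provider-specific custom resources and webhook configurations mentioned above (the restore and backup names are illustrative):

    apiVersion: velero.io/v1
    kind: Restore
    metadata:
      name: cross-cluster-restore        # illustrative name
      namespace: velero
    spec:
      backupName: aks-backup             # illustrative backup name
      excludedResources:
        # Skip provider-specific CRDs and webhook configurations.
        - tiers.crd.antrea.io
        - mutatingwebhookconfigurations.admissionregistration.k8s.io
        - validatingwebhookconfigurations.admissionregistration.k8s.io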