This topic describes the backup and restore process for VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) and provides high-level considerations for implementing your backup and restore strategy for TKGI.
TKGI backup and restore comprises several layers and tools. The following table summarizes these layers and tools from the top down, that is, from workload to infrastructure. Refer to the individual sections for details on performing backup and restore for each layer and for additional details on test scenarios.
Layer | Tools | Comments |
---|---|---|
Backup and Restore Kubernetes Workloads | Velero | Load balancer and ingress services depend on NSX-T backup. |
Backup and Restore Kubernetes Clusters | BOSH Backup and Restore (BBR) | Use BBR to back up and restore Kubernetes clusters provisioned by TKGI, including the control plane nodes, etcd database, and worker nodes. |
Backup and Restore TKGI Components | Ops Manager, BBR | Use VMware Tanzu Operations Manager (Ops Manager) to back up and restore the BOSH Director and TKGI tile configurations. Use BBR to back up and restore the TKGI Management Plane virtual machines, including BOSH Director, TKGI Control Plane, and TKGI DB. |
Backup and Restore TKGI Infrastructure | NSX-T Manager, vCenter Server | Use the NSX Manager UI or CLI to back up and restore the NSX Manager DB. Use vCenter Server to back up and restore vCenter objects. |
When planning your backup and restore strategy and implementation for TKGI, keep in mind the following considerations.
This documentation provides guidelines for implementing a robust backup and restore practice for TKGI. You must develop your own plan for implementing these tools and guidelines for your TKGI foundation. In addition, you must test the backup and restore of each layer of your TKGI foundation to ensure that the components and workloads are properly backed up and restored.
You can only restore what is backed up. Back up key resources, such as stateful applications, cluster configurations, and NSX-T objects, on a frequent and regular schedule, such as once every 24 hours.
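For the workload layer, a once-every-24-hours schedule could be expressed with the `velero schedule create` command, which accepts a cron expression through its `--schedule` flag. The following Python helper is a minimal sketch that only builds the command line. The function name, the `-daily` naming scheme, and the 02:00 default are assumptions made for illustration, and the sketch covers Velero only, not BBR, Ops Manager, or NSX-T backups.

```python
from typing import List


def daily_schedule_command(namespace: str, cron: str = "0 2 * * *") -> List[str]:
    """Build a `velero schedule create` command for a single namespace.

    The default cron expression (hypothetical choice) runs once every
    24 hours at 02:00.
    """
    if len(cron.split()) != 5:
        raise ValueError(f"expected a 5-field cron expression, got: {cron!r}")
    return [
        "velero", "schedule", "create", f"{namespace}-daily",
        "--schedule", cron,
        "--include-namespaces", namespace,
    ]


if __name__ == "__main__":
    # "shop" is a placeholder namespace name.
    print(" ".join(daily_schedule_command("shop")))
```

The command is returned as a list of arguments so that it can be reviewed or logged before it is passed to a process runner such as `subprocess.run`.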
Make sure you promptly back up critical components, including:
For optimal performance and assurance, only back up one Kubernetes namespace at a time using Velero. Likewise, restore only one Kubernetes namespace at a time using Velero.
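The one-namespace-at-a-time guideline can be sketched as a small Python helper that builds one Velero command per namespace. This is a minimal sketch, not an official TKGI procedure: the function names, the backup naming scheme, and the example namespaces are assumptions made for illustration. It uses only the `velero backup create` and `velero restore create` subcommands with the `--include-namespaces`, `--from-backup`, and `--wait` flags, and it builds the commands without running them.

```python
from typing import List


def backup_name(namespace: str, stamp: str) -> str:
    """Hypothetical naming scheme: <namespace>-<stamp>."""
    return f"{namespace}-{stamp}"


def backup_commands(namespaces: List[str], stamp: str) -> List[List[str]]:
    """Build one `velero backup create` command per namespace."""
    return [
        ["velero", "backup", "create", backup_name(ns, stamp),
         "--include-namespaces", ns, "--wait"]
        for ns in namespaces
    ]


def restore_commands(namespaces: List[str], stamp: str) -> List[List[str]]:
    """Build one `velero restore create` command per namespace backup."""
    return [
        ["velero", "restore", "create",
         "--from-backup", backup_name(ns, stamp),
         "--include-namespaces", ns, "--wait"]
        for ns in namespaces
    ]


if __name__ == "__main__":
    # Placeholder namespaces and date stamp.
    for cmd in backup_commands(["shop", "billing"], "20240101"):
        print(" ".join(cmd))
```

Because `--wait` blocks until each operation completes, running the generated commands in order processes a single namespace at a time.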
The general approach is to restore what breaks. For example, if NSX-T crashes, you only need to restore NSX-T. If Ops Manager breaks, restore Ops Manager.
The exception is a Kubernetes cluster. If the cluster breaks, you need to restore the cluster using BBR and the applications using Velero. On the NSX-T side, you need to create a new namespace for the restored cluster. Once the cluster is restored, use kubectl to delete the old namespace. This forces the creation of the NSX-T objects in the namespace. Refer to the scenarios for the TKGI cluster backup and restore for more details.
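The "restore what breaks" approach can be summarized as a lookup from the failed component to an ordered list of restore steps. The following Python sketch encodes only the cases named in this topic. The dictionary keys and the function name are assumptions made for illustration, and the namespace handling for a restored cluster is left out.

```python
from typing import Dict, List

# Ordered restore steps per failed component, as described in this topic.
RESTORE_PLAN: Dict[str, List[str]] = {
    "nsx-t": ["Restore NSX-T"],
    "ops-manager": ["Restore Ops Manager"],
    # The exception: a broken cluster needs two tools, in this order.
    "kubernetes-cluster": [
        "Restore the cluster with BBR",
        "Restore the applications with Velero",
    ],
}


def restore_steps(component: str) -> List[str]:
    """Return a copy of the restore steps for the component that broke."""
    try:
        return list(RESTORE_PLAN[component])
    except KeyError:
        raise ValueError(f"no restore plan for component: {component!r}") from None


if __name__ == "__main__":
    for step in restore_steps("kubernetes-cluster"):
        print(step)
```

Keeping the plan as data makes it straightforward to extend with additional layers once your own backup and restore plan is defined and tested.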
Because several layers and tools are involved in the backup and restore of TKGI, it can be unclear what is backed up and restored within each layer.
Velero is used for backing up and restoring Kubernetes workloads. Use Velero restore if something breaks in a workload application, and you need to restore it.
BBR is used to back up and restore Kubernetes clusters provisioned by TKGI. This includes the cluster nodes and etcd database. BBR might also restore stateless applications, depending on when the backup of the cluster was taken, but do not rely on it for such purposes. Use Velero for workload backup and restore.
For BBR to restore a cluster, the NSX-T objects must be present in the NSX-T database. BBR recreates the VM, not the logical switch, so the logical switch for each Kubernetes cluster must already exist.
If NSX-T crashes, you only need to restore NSX-T. Applications will continue to run, but you can’t add anything.