This topic provides an overview of the backup and restore process for TKGI, and provides high-level considerations for implementing your backup and restore strategy for TKGI.
TKGI backup and restore comprises various layers and tools. The table summarizes these layers and tools from the top-down, that is, workload to infrastructure. Refer to the individual sections for details on performing backup and restore for that layer, as well as additional details on test scenarios.
Layer | Tool(s) | Comments |
---|---|---|
Backup and Restore Kubernetes Workloads | Velero, Restic | Use Velero for stateless applications and Velero with Restic for stateful applications. Load balancer and ingress services depend on NSX-T backup. |
Backup and Restore Kubernetes Clusters | BOSH Backup and Restore (BBR) | Use BBR to back up and restore Kubernetes clusters provisioned by TKGI, including the control plane nodes, etcd database, and worker nodes. |
Backup and Restore TKGI Components | Ops Manager, BBR | Use Ops Manager to back up and restore the BOSH Director and TKGI tile configurations. Use BBR to backup and restore the TKGI Management Plane virtual machines, including BOSH Director, TKGI Control Plane, and TKGI DB. |
Backup and Restore TKGI Infrastructure | NSX-T Manager, vCenter Server | Use the NSX Manager UI or CLI to backup and restore the NSX Manager DB. Use vCenter Server to backup and restore vCenter objects. |
When planning your backup and restore strategy and implementation for TKGI, keep in mind the following considerations.
This documentation provides guidelines for implementing a robust backup and restore practice for TKGI. You must develop a plan of your own on how you intend to implement these tools and guidelines for your TKGI foundation. In addition, you must test the backup and restore of each layer of your TKGI foundation to ensure that the components and workloads are properly backed up and restored.
You can only restore what is backed up. Key resources, such as stateful applications, cluster configurations, NSX-T objects, should be backed up on a frequent and regular schedule, such as once every 24 hours.
Make sure you promptly back up critical components, including:
For optimal performance and assurance, only back up one Kubernetes namespace at a time using Velero. Likewise, you should only restore one Kubernetes namespace at a time using Velero.
The general approach is to restore what breaks. For example, if NSX-T crashes, you only need to restore NSX-T. If Ops Manager breaks, restore Ops Manager.
The exception is a Kubernetes cluster. If the cluster breaks, you need to restore the cluster using BBR and the applications using Velero. On the NSX-T side, you need to create a new namespace for the restored cluster. Once the cluster is restored, use kubectl
to delete the old namespace. This will force the creation of the NSX-T objects in the namespace. Refer to the scenarios for the TKGI cluster backup and restore for more details.
Because there are several layers and tools involved in the backup and restore of TKGI, it can be confusing what is being backed up and restored within each layer.
Velero is used for backing up and restoring Kubernetes workloads. Restic is used with Velero for backup and restore of stateful workloads. Use Velero restore if something breaks in a workload application, and you need to restore it.
BBR is used to backup and restore Kubernetes clusters provisioned by TKGI. This includes the cluster nodes and etcd database. BBR may also restore stateless applications, depending on when the backup of the cluster was taken, but you shouldn’t rely on it for such purposes. Use Velero for workload backup and restore.
For BBR to work restoring a cluster, the NSX-T objects need to be in the NSX-T database. BBR will only work if the objects are in NSX-T. BBR recreates the VM, not the logical switch. You need the logical switch to be present for each Kubernetes cluster.
If NSX-T crashes, you only need to restore NSX-T. Applications will continue to run, but you can’t add anything.