This topic describes how to use BOSH Backup and Restore (BBR) to restore the BOSH Director, VMware Tanzu Kubernetes Grid Integrated Edition control plane, and Kubernetes clusters.
In the event of a disaster, you may lose your environment’s VMs, disks, and your IaaS network and load balancer resources as well. You can re-create your environment, configured with your saved Tanzu Kubernetes Grid Integrated Edition Ops Manager Installation settings, using your BBR backup artifacts.
Before restoring using BBR:
Use BBR to restore the following:
The following are the requirements for a backup artifact to be restorable to another environment:
Additional considerations:
Note: This section is for guidance only. You should always validate your backups by using the backup artifacts in a restore.
Before you use BBR to either back up TKGI or restore TKGI from backup, follow these steps to retrieve deployment information and credentials:
Before running BBR, verify that the installed version of BBR is compatible with your deployment’s current Tanzu Kubernetes Grid Integrated Edition release.
For your current Tanzu Kubernetes Grid Integrated Edition release’s minimum version information, see the Tanzu Kubernetes Grid Integrated Edition Release Notes.
To verify the currently installed BBR version, run the following command:
bbr version
If you do not have BBR installed, or your installed version does not meet the minimum version requirement, see Installing BOSH Backup and Restore.
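The minimum-version check described above can be scripted. The following is a minimal sketch, not part of the product tooling: version_at_least is a hypothetical helper, both version numbers are placeholders, and the comparison assumes a sort that supports the -V (version sort) option, such as GNU sort.

```shell
# Hypothetical helper: succeeds when version $1 is at least version $2.
# Assumes `sort -V` (version sort) is available.
version_at_least() {
  lowest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
  [ "$lowest" = "$2" ]
}

installed="1.9.38"  # placeholder; read the real value from `bbr version`
required="1.9.0"    # placeholder; take the real value from the Release Notes

if version_at_least "$installed" "$required"; then
  result="ok"
else
  result="too old"
fi
echo "BBR $installed against minimum $required: $result"
```

Version sort is used rather than a plain string comparison so that a version such as 1.10.0 is treated as newer than 1.9.0.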
There are two ways to retrieve the BBR SSH credentials:
To retrieve your Bbr Ssh Credentials using the Ops Manager Installation Dashboard, perform the following steps:
Click the Credentials tab.
Locate Bbr Ssh Credentials.
Copy the private_key_pem field value.
To retrieve your Bbr Ssh Credentials using the Ops Manager API, perform the following steps:
Retrieve the Bbr Ssh Credentials by running the following command:
curl "https://OPS-MAN-FQDN/api/v0/deployed/director/credentials/bbr_ssh_credentials" \
-X GET \
-H "Authorization: Bearer UAA-ACCESS-TOKEN"
Where:
OPS-MAN-FQDN is the fully-qualified domain name (FQDN) for your Ops Manager deployment.
UAA-ACCESS-TOKEN is your UAA access token.
Copy the value of the private_key_pem field.
To reformat the copied private_key_pem value and save it to a file in the current directory, run the following command:
printf -- "YOUR-PRIVATE-KEY" > PRIVATE-KEY-FILE
Where:
YOUR-PRIVATE-KEY is the text of your private key.
PRIVATE-KEY-FILE is the path to the private key file you are creating.
For example:
$ printf -- "-----BEGIN RSA PRIVATE KEY----- fake key contents -----END RSA PRIVATE KEY-----" > bbr_key.pem
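To illustrate why printf is used here, the following sketch assumes the copied private_key_pem value contains literal \n escape sequences, which printf expands into real line breaks. The key text is fake, the temporary file location is a placeholder, and the chmod step is an added precaution rather than a step from this topic.

```shell
# Fake key text with literal \n sequences, standing in for the copied value.
key='-----BEGIN RSA PRIVATE KEY-----\nfake key contents\n-----END RSA PRIVATE KEY-----\n'

key_file="$(mktemp)"   # placeholder location; use your own path, such as bbr_key.pem

# The leading "--" stops printf from treating the dashes in the key as options.
printf -- "$key" > "$key_file"

# Added precaution (an assumption, not from this topic): restrict access to the key.
chmod 600 "$key_file"

line_count="$(wc -l < "$key_file" | tr -d ' ')"
echo "wrote $line_count lines to $key_file"
```

After the command runs, the file holds the header, the key body, and the footer on separate lines.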
There are two ways to retrieve BOSH Director credentials:
To retrieve your BOSH Director credentials using the Ops Manager Installation Dashboard, perform the following steps:
Click the Credentials tab.
Locate Director Credentials.
Copy and record the value of the password field.
To retrieve your BOSH Director credentials using the Ops Manager API, perform the following steps:
Retrieve the Director Credentials by running the following command:
curl "https://OPS-MAN-FQDN/api/v0/deployed/director/credentials/director_credentials" \
-X GET \
-H "Authorization: Bearer UAA-ACCESS-TOKEN"
Where:
OPS-MAN-FQDN is the fully-qualified domain name (FQDN) for your Ops Manager deployment.
UAA-ACCESS-TOKEN is your UAA access token.
Copy and record the value of the password field.
To obtain BOSH credentials for your BBR operations, perform the following steps:
Record the value for uaa_client_secret.
Record the value for uaa_client_name.
Note: You must use BOSH credentials that limit the scope of BBR activity to your cluster deployments.
You access the BOSH Director using an IP address.
To obtain your BOSH Director’s IP address:
To log in to the BOSH Director using the IP address that you recorded above, run the following command:
bosh -e BOSH-DIRECTOR-IP \
--ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE log-in
Where:
BOSH-DIRECTOR-IP is the BOSH Director IP address recorded above.
PATH-TO-BOSH-SERVER-CERTIFICATE is the path to the root Certificate Authority (CA) certificate as outlined in Download the Root CA Certificate.
For Email, enter director.
For Password, enter the Director Credentials that you obtained in Retrieve the BOSH Director Credentials.
For example:
$ bosh -e 10.0.0.3 \
--ca-cert /var/tempest/workspaces/default/root_ca_certificate log-in
Email (): director
Password (): *******************
Successfully authenticated with UAA
Succeeded
To download the root CA certificate for your Tanzu Kubernetes Grid Integrated Edition deployment, perform the following steps:
To locate and record a cluster deployment name, follow the steps below for each cluster:
On the command line, run the following command to log in:
tkgi login -a TKGI-API -u USERNAME -k
Where:
TKGI-API is the domain name for the TKGI API that you entered in Ops Manager > Tanzu Kubernetes Grid Integrated Edition > TKGI API > API Hostname (FQDN). For example, api.tkgi.example.com.
USERNAME is your user name.
For more information about the tkgi login command, see TKGI CLI.
Note: If your operator has configured Tanzu Kubernetes Grid Integrated Edition to use a SAML identity provider, you must include an additional SSO flag to use the above command. For information about the SSO flags, see the section for the above command in TKGI CLI. For information about configuring SAML, see Connecting Tanzu Kubernetes Grid Integrated Edition to a SAML Identity Provider.
Identify the cluster ID:
tkgi cluster CLUSTER-NAME
Where CLUSTER-NAME is the name of your cluster.
From the output of this command, record the UUID value.
Open the Ops Manager Installation Dashboard.
Click the BOSH Director tile.
Select the Credentials tab.
Navigate to Bosh Commandline Credentials and click Link to Credential.
Copy the credential value.
SSH into your jumpbox. For more information about the jumpbox, see Installing BOSH Backup and Restore.
To retrieve your cluster deployment name, run the following command:
BOSH-CLI-CREDENTIALS deployments | grep UUID
Where:
BOSH-CLI-CREDENTIALS is the full value that you copied from the BOSH Director tile in Retrieve the BOSH Command Line Credentials.
UUID is the cluster UUID that you recorded in the previous step.
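The lookup above can be sketched as a small filter. This is an illustration only: deployment_for_uuid is a hypothetical helper, the UUIDs and the deployment listing are made-up sample data rather than real bosh deployments output, and the sketch assumes the deployment name is the first column of each row.

```shell
# Hypothetical helper: reads a deployment listing on stdin and prints the
# name (first column) of the first row that contains the given UUID.
deployment_for_uuid() {
  grep -F -- "$1" | awk '{ print $1 }' | head -n 1
}

# Made-up UUID and listing, standing in for `bosh deployments` output.
cluster_uuid="00000000-aaaa-bbbb-cccc-000000000001"
sample_deployments='pivotal-container-service-453f2f  release-a/1  stemcell-a/1
service-instance_00000000-aaaa-bbbb-cccc-000000000001  release-b/1  stemcell-a/1
service-instance_00000000-aaaa-bbbb-cccc-000000000002  release-b/1  stemcell-a/1'

deployment_name="$(printf '%s\n' "$sample_deployments" | deployment_for_uuid "$cluster_uuid")"
echo "$deployment_name"
```

grep -F treats the UUID as a literal string, so its characters are never interpreted as a pattern.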
To restore the BOSH Director, the Tanzu Kubernetes Grid Integrated Edition control plane, or a cluster, you must transfer your BBR backup artifacts from your safe storage location to your jumpbox.
To copy an artifact onto a jumpbox, run the following SCP command:
scp -r LOCAL-PATH-TO-BACKUP-ARTIFACT JUMPBOX-USER@JUMPBOX-ADDRESS:
Where:
LOCAL-PATH-TO-BACKUP-ARTIFACT is the path to your BBR backup artifact.
JUMPBOX-USER is the SSH username of the jumpbox.
JUMPBOX-ADDRESS is the IP address, or hostname, of the jumpbox.
(Optional) Decrypt your backup artifact if the artifact is encrypted.
Restoration of Kubernetes clusters provisioned by TKGI is a two-step process: redeploy the clusters, then restore them.
Perform the following steps to redeploy the TKGI-provisioned Kubernetes clusters and restore their state from backup.
Before restoring your TKGI-provisioned clusters, you must redeploy them to BOSH. To redeploy TKGI-provisioned clusters:
To redeploy all clusters:
To redeploy a TKGI-provisioned cluster through the TKGI CLI:
Identify the names of your TKGI-provisioned clusters:
tkgi clusters
For each cluster you want to redeploy, run the following command:
tkgi upgrade-cluster CLUSTER-NAME
Where CLUSTER-NAME is the name of your Kubernetes cluster. For more information, see Upgrade Clusters.
After redeploying your TKGI-provisioned clusters, restore their stateless workloads and cluster state from backup by running the BBR restore command from your jumpbox. Stateless workloads are tracked in the cluster etcd database, which BBR backs up.
Warning: BBR does not back up persistent volumes, load balancers, or other IaaS resources.
Warning: When you restore a cluster, etcd is stopped in the API server. During this process, only currently-deployed clusters function, and you cannot create new workloads.
To restore a cluster:
Move the cluster backup artifact to a folder from which you will run the BBR restore process.
SSH into your jumpbox. For more information about the jumpbox, see Configure Your Jumpbox in Installing BOSH Backup and Restore.
Run the following command:
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
nohup bbr deployment --target BOSH-TARGET \
--username BOSH-CLIENT --deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-CA-CERT \
restore \
--artifact-path PATH-TO-DEPLOYMENT-BACKUP
Where:
BOSH-CLIENT-SECRET is the BOSH_CLIENT_SECRET property. This value is in the BOSH Director tile under Credentials > Bosh Commandline Credentials.
BOSH-TARGET is the BOSH_ENVIRONMENT property. This value is in the BOSH Director tile under Credentials > Bosh Commandline Credentials. You must be able to reach the target address from the workstation where you run bbr commands.
BOSH-CLIENT is the BOSH_CLIENT property. This value is in the BOSH Director tile under Credentials > Bosh Commandline Credentials.
DEPLOYMENT-NAME is the cluster BOSH deployment name that you recorded in Retrieve Your Cluster Deployment Names above.
PATH-TO-BOSH-CA-CERT is the path to the root CA certificate that you downloaded in the Download the Root CA Certificate section above.
PATH-TO-DEPLOYMENT-BACKUP is the path to your deployment backup. Make sure you have transferred your artifact to your jumpbox as described in Transfer Artifacts to Jumpbox above.
For example:
$ BOSH_CLIENT_SECRET=p455w0rd \
nohup bbr deployment \
--target bosh.example.com \
--username admin \
--deployment service-instance_3839394 \
--ca-cert bosh.ca.cert \
restore \
--artifact-path deployment-backup
Note: The BBR restore command can take a long time to complete. The BBR restore command above uses nohup, and the restore process runs within your SSH session. If you instead run the BBR command in a screen or tmux session, the task runs separately from your SSH session and continues to run even if your SSH connection to the jumpbox fails.
To cancel a running bbr restore, see Cancel a Restore below.
If your Tanzu Kubernetes Grid Integrated Edition cluster restore fails, do one or more of the following:
Run the command again, adding the --debug flag to activate debug logs. For more information, see BBR Logging.
Be sure to complete the steps in Clean Up After a Failed Restore below.
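Because a restore can run for a long time, it may help to confirm the inputs to the restore command before starting it. The sketch below is a hypothetical pre-flight helper, not a BBR feature: preflight_restore and the demo paths are illustrative names, and it assumes the backup artifact is a directory, as the scp -r transfer step suggests.

```shell
# Hypothetical pre-flight helper: returns non-zero if a restore input is missing.
# $1: path to the backup artifact directory, $2: path to the root CA certificate.
preflight_restore() {
  if [ ! -d "$1" ]; then
    echo "backup artifact directory not found: $1" >&2
    return 1
  fi
  if [ ! -r "$2" ]; then
    echo "CA certificate not readable: $2" >&2
    return 1
  fi
  return 0
}

# Demo with throwaway paths standing in for the real artifact and certificate.
demo_artifact="$(mktemp -d)"
demo_cert="$(mktemp)"

if preflight_restore "$demo_artifact" "$demo_cert"; then
  preflight_status="ready"
else
  preflight_status="not ready"
fi
echo "restore inputs: $preflight_status"
```

On a jumpbox, the two arguments would be the values used for PATH-TO-DEPLOYMENT-BACKUP and PATH-TO-BOSH-CA-CERT.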
After restoring a Kubernetes cluster, you must register all of the cluster’s worker nodes with their control plane nodes. To register cluster worker nodes, complete the following:
To delete a cluster’s restored nodes:
To determine your cluster’s namespace, run the following command:
kubectl get all --all-namespaces
To retrieve the list of worker nodes in the cluster, run the following command:
kubectl get nodes -o wide
Document the worker node names listed in the NAME column. The worker nodes should all be listed with a status of NotReady.
To delete a node, run the following:
kubectl delete node NODE-NAME
Where NODE-NAME is a node NAME returned by the kubectl get nodes command.
Repeat the preceding kubectl delete node step for each of your cluster’s nodes.
To restart kubelet on your worker node VMs:
To restart kubelet on all of your cluster’s worker node VMs, run the following command:
bosh ssh -d DEPLOYMENT-NAME worker -c 'sudo /var/vcap/bosh/bin/monit restart kubelet'
Where DEPLOYMENT-NAME is the cluster BOSH deployment name that you recorded in Retrieve Your Cluster Deployment Names above.
To confirm all worker nodes in your cluster have been restored to a Ready state, run the following command:
kubectl get nodes -o wide
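The node-listing step in the procedure above can be sketched as a filter that picks out the nodes to delete. This is a minimal illustration: not_ready_nodes is a hypothetical helper, the node listing is made-up sample data, and the filter assumes the STATUS value is the second column and matches NotReady exactly, so a combined status such as NotReady,SchedulingDisabled would not match.

```shell
# Hypothetical helper: reads `kubectl get nodes` style output on stdin and
# prints the NAME of every row whose STATUS column is exactly NotReady.
not_ready_nodes() {
  awk 'NR > 1 && $2 == "NotReady" { print $1 }'
}

# Made-up listing, standing in for real `kubectl get nodes` output.
sample_nodes='NAME       STATUS     ROLES    AGE   VERSION
worker-a   NotReady   <none>   10d   v1.0.0
worker-b   Ready      <none>   10d   v1.0.0
worker-c   NotReady   <none>   10d   v1.0.0'

stale_nodes="$(printf '%s\n' "$sample_nodes" | not_ready_nodes)"
printf '%s\n' "$stale_nodes"

# Against a real cluster, the same filter could feed the delete step:
#   kubectl get nodes | not_ready_nodes | xargs -n 1 kubectl delete node
```

Review the printed names before piping them into a delete command, because deleting a node is not reversed by this procedure.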
To resolve a failing BBR restore command:
If the command fails with the error Directory /var/vcap/store/bbr-backup already exists on instance, run the relevant commands from the Clean Up After a Failed Restore section of this topic.
If you must cancel a restore, perform the following steps:
Terminate the BBR process by pressing Ctrl-C and typing yes to confirm.
If a BBR restore process fails, BBR may not have run the post-restore scripts, potentially leaving the instance in a locked state. Additionally, the BBR restore folder may remain on the target instance, and subsequent restore attempts may also fail.
To resolve issues following a failed BOSH Director restore, run the following BBR command:
nohup bbr director \
--host BOSH-DIRECTOR-IP \
--username bbr \
--private-key-path PRIVATE-KEY-FILE \
restore-cleanup
Where:
BOSH-DIRECTOR-IP is the address of the BOSH Director. If the BOSH Director is public, BOSH-DIRECTOR-IP is a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, BOSH-DIRECTOR-IP is the internal IP address, which you can retrieve as shown in Retrieve the BOSH Director Address above.
PRIVATE-KEY-FILE is the path to the private key file that you can create from Bbr Ssh Credentials as shown in Download the BBR SSH Credentials above.
For example:
$ nohup bbr director \
--host 10.0.0.5 \
--username bbr \
--private-key-path private.pem \
restore-cleanup
To resolve issues following a failed control plane restore, run the following BBR command:
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
bbr deployment \
--target BOSH-TARGET \
--username BOSH-CLIENT \
--deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-CA-CERT \
restore-cleanup
Where:
BOSH-CLIENT-SECRET is the value for BOSH_CLIENT_SECRET retrieved in Download the BOSH Commandline Credentials above.
BOSH-TARGET is the value for BOSH_ENVIRONMENT retrieved in Download the BOSH Commandline Credentials above. You must be able to reach the target address from the workstation where you run bbr commands.
BOSH-CLIENT is the value for BOSH_CLIENT retrieved in Download the BOSH Commandline Credentials above.
DEPLOYMENT-NAME is the name retrieved in Retrieve Your Cluster Deployment Name above.
PATH-TO-BOSH-CA-CERT is the path to the root CA certificate that you downloaded in Download the Root CA Certificate above.
For example:
$ BOSH_CLIENT_SECRET=p455w0rd \
bbr deployment \
--target bosh.example.com \
--username admin \
--deployment pivotal-container-service-453f2f \
--ca-cert bosh.ca.crt \
restore-cleanup
To resolve issues following a failed cluster restore, run the following BBR command:
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
bbr deployment \
--target BOSH-TARGET \
--username BOSH-CLIENT \
--deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-CA-CERT \
restore-cleanup
Where:
BOSH-CLIENT-SECRET is the value for BOSH_CLIENT_SECRET retrieved in Download the BOSH Commandline Credentials.
BOSH-TARGET is the value for BOSH_ENVIRONMENT retrieved in Download the BOSH Commandline Credentials. You must be able to reach the target address from the workstation where you run bbr commands.
BOSH-CLIENT is the value for BOSH_CLIENT retrieved in Download the BOSH Commandline Credentials.
DEPLOYMENT-NAME is the name retrieved in Retrieve Your Cluster Deployment Names above.
PATH-TO-BOSH-CA-CERT is the path to the root CA certificate that you downloaded in Download the Root CA Certificate.
For example:
$ BOSH_CLIENT_SECRET=p455w0rd \
bbr deployment \
--target bosh.example.com \
--username admin \
--deployment pivotal-container-service-453f2f \
--ca-cert bosh.ca.crt \
restore-cleanup
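When several cluster restores have failed, the cleanup command has to be repeated once per deployment. The following dry-run sketch only prints the commands rather than running them. print_cleanup_commands is a hypothetical helper, every value is a placeholder, and the sketch assumes BOSH_CLIENT_SECRET would be set in the environment before the printed commands are run.

```shell
# Placeholder connection values; substitute the values for your environment.
BOSH_TARGET="bosh.example.com"
BOSH_CLIENT_NAME="admin"
CA_CERT="bosh.ca.crt"

# Hypothetical helper: prints one restore-cleanup command per deployment name.
# It does not run bbr; review the output before executing anything.
print_cleanup_commands() {
  for deployment in "$@"; do
    printf 'bbr deployment --target %s --username %s --deployment %s --ca-cert %s restore-cleanup\n' \
      "$BOSH_TARGET" "$BOSH_CLIENT_NAME" "$deployment" "$CA_CERT"
  done
}

# Made-up deployment names for illustration.
cleanup_commands="$(print_cleanup_commands service-instance_aaa service-instance_bbb)"
printf '%s\n' "$cleanup_commands"
```

Printing first keeps the sketch safe to run anywhere and gives a chance to check each deployment name against the ones recorded earlier.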