This topic describes how to use BOSH Backup and Restore (BBR) to restore the BOSH Director and the VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) control plane.
In the event of a disaster, you may lose your environment’s VMs, disks, and your IaaS network and load balancer resources as well. You can re-create your environment, configured with your saved Tanzu Kubernetes Grid Integrated Edition Ops Manager Installation settings, using your BBR backup artifacts.
Before restoring using BBR:
Use BBR to restore the following:
The following are the requirements for a backup artifact to be restorable to another environment:
Additional considerations:
Note: This section is for guidance only. You should always validate your backups by using the backup artifacts in a restore.
Before you use BBR to either back up TKGI or restore TKGI from backup, follow these steps to retrieve deployment information and credentials:
Before running BBR, verify that the installed version of BBR is compatible with your deployment’s current Tanzu Kubernetes Grid Integrated Edition release.
For your current Tanzu Kubernetes Grid Integrated Edition release’s minimum version information, see the Tanzu Kubernetes Grid Integrated Edition Release Notes.
To verify the currently installed BBR version, run the following command:
bbr version
If you do not have BBR installed, or your installed version does not meet the minimum version requirement, see Installing BOSH Backup and Restore.
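The version check can be scripted. The following is a minimal sketch with these assumptions:

- bbr version prints a line that ends in the version number, for example bbr version 1.9.38.
- The jumpbox's sort supports -V (GNU coreutils does).
- The function names and the MIN_BBR_VERSION placeholder are hypothetical. Take the minimum version itself from the Release Notes.

```shell
# Sketch: compare the installed BBR version against a minimum version.

# version_ge A B: succeed if version A >= version B (uses sort -V).
version_ge() {
  # When B sorts first (or A equals B), A is at least B.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# bbr_installed_version: read `bbr version` output on stdin and print the
# last field of the first line, dropping any leading "v".
bbr_installed_version() {
  head -n 1 | awk '{ print $NF }' | sed 's/^v//'
}

# Usage (MIN_BBR_VERSION is a placeholder; take it from the Release Notes):
#   installed=$(bbr version | bbr_installed_version)
#   if ! version_ge "$installed" "$MIN_BBR_VERSION"; then
#     echo "BBR $installed is older than $MIN_BBR_VERSION" >&2
#   fi
```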
There are two ways to retrieve BBR SSH credentials:
To retrieve your Bbr Ssh Credentials using the Ops Manager Installation Dashboard, perform the following steps:
Click the Credentials tab.
Locate Bbr Ssh Credentials.
Copy the private_key_pem field value.
To retrieve your Bbr Ssh Credentials using the Ops Manager API, perform the following steps:
Retrieve the Bbr Ssh Credentials by running the following command:
curl "https://OPS-MAN-FQDN/api/v0/deployed/director/credentials/bbr_ssh_credentials" \
-X GET \
-H "Authorization: Bearer UAA-ACCESS-TOKEN"
Where:
OPS-MAN-FQDN is the fully-qualified domain name (FQDN) for your Ops Manager deployment.
UAA-ACCESS-TOKEN is your UAA access token.
Copy the value of the private_key_pem field.
The copied private_key_pem value contains escaped newline characters (\n). To convert them to line breaks and save the key to a file in the current directory, run the following command:
printf -- "YOUR-PRIVATE-KEY" > PRIVATE-KEY-FILE
Where:
YOUR-PRIVATE-KEY is the text of your private key.
PRIVATE-KEY-FILE is the path to the private key file you are creating.
For example:
$ printf -- "-----BEGIN RSA PRIVATE KEY----- fake key contents -----END RSA PRIVATE KEY-----" > bbr_key.pem
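If you script this step, a slightly safer variant passes the key text to printf '%b' as an argument instead of using it as the format string. That way, any % characters in the input are not interpreted. The variant also creates the file with owner-only permissions. This is a minimal sketch, and write_key_file is a hypothetical helper name.

```shell
# Sketch: save a private_key_pem value (containing literal "\n" escapes)
# to a key file. write_key_file is a hypothetical helper name.
write_key_file() {
  key_text=$1   # private_key_pem value as copied from Ops Manager
  key_file=$2   # path of the key file to create, for example bbr_key.pem
  # %b expands the "\n" escapes in the argument into real line breaks;
  # the subshell's umask gives a newly created file owner-only permissions.
  ( umask 077 && printf '%b' "$key_text" > "$key_file" )
}

# Usage:
#   write_key_file "YOUR-PRIVATE-KEY" bbr_key.pem
```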
There are two ways to retrieve BOSH Director credentials:
To retrieve your BOSH Director credentials using the Ops Manager Installation Dashboard, perform the following steps:
Click the Credentials tab.
Locate Director Credentials.
Copy and record the value of the password field.
To retrieve your BOSH Director credentials using the Ops Manager API, perform the following steps:
Retrieve the Director Credentials by running the following command:
curl "https://OPS-MAN-FQDN/api/v0/deployed/director/credentials/director_credentials" \
-X GET \
-H "Authorization: Bearer UAA-ACCESS-TOKEN"
Where:
OPS-MAN-FQDN is the fully-qualified domain name (FQDN) for your Ops Manager deployment.
UAA-ACCESS-TOKEN is your UAA access token.
Copy and record the value of the password field.
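The Ops Manager API exposes each Director credential at /api/v0/deployed/director/credentials/CREDENTIAL-NAME. For example, bbr_ssh_credentials holds the BBR SSH key and director_credentials holds the Director password.

The following minimal sketch builds either URL and fetches the JSON. It assumes that:

- curl is installed.
- OPS_MAN_FQDN and UAA_ACCESS_TOKEN are variables you set yourself.

Both variable names and both function names are hypothetical.

```shell
# Sketch: fetch a BOSH Director credential from the Ops Manager API.
# OPS_MAN_FQDN and UAA_ACCESS_TOKEN are hypothetical variables you set.

# opsman_credential_url FQDN NAME: print the credential endpoint URL.
opsman_credential_url() {
  printf 'https://%s/api/v0/deployed/director/credentials/%s\n' "$1" "$2"
}

# fetch_director_credential NAME: print the credential JSON (curl defaults to GET).
fetch_director_credential() {
  curl --silent --show-error --fail \
    -H "Authorization: Bearer $UAA_ACCESS_TOKEN" \
    "$(opsman_credential_url "$OPS_MAN_FQDN" "$1")"
}

# Usage:
#   fetch_director_credential bbr_ssh_credentials    # JSON with private_key_pem
#   fetch_director_credential director_credentials   # JSON with password
```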
To obtain BOSH credentials for your BBR operations, perform the following steps:
Record the value of uaa_client_secret.
Record the value of uaa_client_name.
Note: You must use BOSH credentials that limit the scope of BBR activity to your cluster deployments.
You access the BOSH Director using an IP address.
To obtain your BOSH Director’s IP address:
To log in to the BOSH Director using the IP address that you recorded above, run the following command:
bosh -e BOSH-DIRECTOR-IP \
--ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE log-in
Where:
BOSH-DIRECTOR-IP is the BOSH Director IP address recorded above.
PATH-TO-BOSH-SERVER-CERTIFICATE is the path to the root Certificate Authority (CA) certificate as outlined in Download the Root CA Certificate.
When prompted for Email, enter director.
When prompted for Password, enter the Director Credentials password that you obtained in Retrieve the BOSH Director Credentials.
For example:
$ bosh -e 10.0.0.3 \
--ca-cert /var/tempest/workspaces/default/root_ca_certificate log-in
Email (): director
Password (): *******************
Successfully authenticated with UAA
Succeeded
To download the root CA certificate for your Tanzu Kubernetes Grid Integrated Edition deployment, perform the following steps:
To locate and record a cluster deployment name, follow the steps below for each cluster:
On the command line, run the following command to log in:
tkgi login -a TKGI-API -u USERNAME -k
Where:
TKGI-API is the domain name for the TKGI API that you entered in Ops Manager > Tanzu Kubernetes Grid Integrated Edition > TKGI API > API Hostname (FQDN). For example, api.tkgi.example.com.
USERNAME is your user name.
For more information about the tkgi login command, see TKGI CLI.
Note: If your operator has configured Tanzu Kubernetes Grid Integrated Edition to use a SAML identity provider, you must include an additional SSO flag to use the above command. For information about the SSO flags, see the section for the above command in TKGI CLI. For information about configuring SAML, see Connecting Tanzu Kubernetes Grid Integrated Edition to a SAML Identity Provider.
Identify the cluster ID:
tkgi cluster CLUSTER-NAME
Where CLUSTER-NAME
is the name of your cluster.
From the output of this command, record the UUID value.
Open the Ops Manager Installation Dashboard.
Click the BOSH Director tile.
Select the Credentials tab.
Navigate to Bosh Commandline Credentials and click Link to Credential.
Copy the credential value.
SSH into your jumpbox. For more information about the jumpbox, see Installing BOSH Backup and Restore.
To retrieve your cluster deployment name, run the following command:
BOSH-CLI-CREDENTIALS deployments | grep UUID
Where:
BOSH-CLI-CREDENTIALS is the full value that you copied from the BOSH Director tile in Retrieve the BOSH Command Line Credentials.
UUID is the cluster UUID that you recorded in the previous step.
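The steps above can be combined in a small shell sketch. It assumes that:

- tkgi cluster output contains a line such as UUID: 1234abcd-....
- TKGI names each cluster deployment service-instance_UUID. This is why searching the bosh deployments output for the UUID finds the deployment.

The function names are hypothetical.

```shell
# Sketch: derive a cluster's BOSH deployment name from `tkgi cluster` output.

# cluster_uuid: read `tkgi cluster CLUSTER-NAME` output on stdin and print
# the value on the first line that starts with "UUID:".
cluster_uuid() {
  sed -n 's/^[[:space:]]*UUID:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -n 1
}

# cluster_deployment_name UUID: print the assumed BOSH deployment name.
cluster_deployment_name() {
  printf 'service-instance_%s\n' "$1"
}

# Usage:
#   uuid=$(tkgi cluster my-cluster | cluster_uuid)
#   cluster_deployment_name "$uuid"
```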
To restore the BOSH Director, the Tanzu Kubernetes Grid Integrated Edition control plane, or a cluster, you must transfer your BBR backup artifacts from your safe storage location to your jumpbox.
To copy an artifact onto a jumpbox, run the following SCP command:
scp -r LOCAL-PATH-TO-BACKUP-ARTIFACT JUMPBOX-USER@JUMPBOX-ADDRESS:
Where:
LOCAL-PATH-TO-BACKUP-ARTIFACT is the path to your BBR backup artifact.
JUMPBOX-USER is the SSH username of the jumpbox.
JUMPBOX-ADDRESS is the IP address or hostname of the jumpbox.
(Optional) Decrypt your backup artifact if the artifact is encrypted.
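Before restoring, you can sanity-check a transferred artifact on the jumpbox. This minimal sketch assumes that an unencrypted BBR artifact is a directory containing a metadata file and one or more .tar archives. BBR performs its own checksum validation during restore, so this check only catches an incomplete or misplaced copy. check_bbr_artifact is a hypothetical name.

```shell
# Sketch: basic pre-restore check of a BBR artifact directory.
# Assumed layout: DIR/metadata plus one or more DIR/*.tar archives.
check_bbr_artifact() {
  dir=$1
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  [ -f "$dir/metadata" ] || { echo "missing metadata in $dir" >&2; return 1; }
  # If the glob matches nothing it stays literal, so -e fails.
  set -- "$dir"/*.tar
  [ -e "$1" ] || { echo "no .tar archives in $dir" >&2; return 1; }
}

# Usage:
#   check_bbr_artifact ~/10.0.0.5-abcd1234abcd1234 && echo "artifact looks complete"
```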
If you lose your BOSH Director or Ops Manager environment, you must re-create the BOSH Director VM before restoring the BOSH Director.
You can restore your BOSH Director configuration by using Tanzu Kubernetes Grid Integrated Edition Ops Manager to restore the installation settings artifact that you saved when you followed the Export Installation Settings backup procedure.
To redeploy and restore your Ops Manager and BOSH Director, follow the procedures below.
In the event of a disaster, you may lose your IaaS resources. You must recreate your IaaS resources before restoring using your BBR artifacts.
To recreate your IaaS resources, such as networks and load balancers, prepare your environment for Tanzu Kubernetes Grid Integrated Edition by following the installation instructions specific to your IaaS in Installing Tanzu Kubernetes Grid Integrated Edition.
After recreating IaaS resources, you must add those resources to Ops Manager by performing the procedures in the (Optional) Configure Ops Manager for New Resources section.
WARNING: After importing installation settings, do not click Apply Changes in Ops Manager before instructed to in the steps Deploy the BOSH Director or Redeploy the Tanzu Kubernetes Grid Integrated Edition Control Plane.
You can import installation settings in two ways:
Use the Ops Manager UI:
Navigate to YOUR-OPS-MAN-FQDN in a browser.
Note: Some browsers do not display the import progress status and may appear to hang. The import process takes at least 10 minutes and requires additional time for each restored Ops Manager tile.
Use the Ops Manager API:
To use the Ops Manager API to import installation settings, run the following command:
curl "https://OPS-MAN-FQDN/api/v1/installation_asset_collection" \
-X POST \
-H "Authorization: Bearer UAA-ACCESS-TOKEN" \
-F 'installation[file]=@installation.zip' \
-F 'passphrase=DECRYPTION-PASSPHRASE'
Where:
OPS-MAN-FQDN is the fully-qualified domain name (FQDN) for your Ops Manager deployment.
UAA-ACCESS-TOKEN is the UAA access token. For more information about how to retrieve this token, see Using the Ops Manager API.
DECRYPTION-PASSPHRASE is the decryption passphrase that was in use when you exported the installation settings from Ops Manager.
If you re-created IaaS resources such as networks and load balancers by following the steps in the Deploy Ops Manager section above, perform the following steps to update Ops Manager with your new resources:
Activate Ops Manager advanced mode. For more information, see How to Enable Advanced Mode in the Ops Manager in the Knowledge Base.
Note: Ops Manager advanced mode allows you to make changes that are normally deactivated. You may see warning messages when you save changes.
Navigate to the Ops Manager Installation Dashboard and click the BOSH Director tile.
If you are using Google Cloud Platform (GCP), click Google Config and update:
Click Create Networks and update the network names to reflect the network names for the new environment.
If your BOSH Director had an external hostname, you must change it in Director Config > Director Hostname to ensure it does not conflict with the hostname of the backed up Director.
Ensure that there are no outstanding warning messages in the BOSH Director tile, then deactivate Ops Manager advanced mode. For more information, see How to Enable Advanced Mode in the Ops Manager in the Knowledge Base.
Note: A change in VM size or underlying hardware should not affect BBR's ability to restore data, as long as adequate storage space exists for the restored data.
SSH into your Ops Manager VM. For more information, see the Log in to the Ops Manager VM with SSH section of the Advanced Troubleshooting with the BOSH CLI topic.
To delete the /var/tempest/workspaces/default/deployments/bosh-state.json file, run the following command on the Ops Manager VM:
sudo rm /var/tempest/workspaces/default/deployments/bosh-state.json
In a browser, navigate to your Ops Manager’s fully-qualified domain name.
You can deploy the BOSH Director by itself in two ways:
Use the Ops Manager UI:
Use the Ops Manager API:
Restore the BOSH Director by running BBR commands on your jumpbox.
To restore the BOSH Director:
Ensure the Tanzu Kubernetes Grid Integrated Edition BOSH Director backup artifact is in the folder from which you run BBR.
Run the BBR restore command to restore the TKGI BOSH Director:
nohup bbr director --host BOSH-DIRECTOR-IP \
--username bbr --private-key-path PRIVATE-KEY-FILE \
restore \
--artifact-path PATH-TO-DIRECTOR-BACKUP
Where:
BOSH-DIRECTOR-IP is the address of the BOSH Director. If the BOSH Director is public, BOSH-DIRECTOR-IP is a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, BOSH-DIRECTOR-IP is the internal IP address, which you can retrieve as shown in Retrieve the BOSH Director Address.
PRIVATE-KEY-FILE is the path to the private key file that you can create from Bbr Ssh Credentials as shown in Download the BBR SSH Credentials.
PATH-TO-DIRECTOR-BACKUP is the path to the TKGI BOSH Director backup that you want to restore.
For example:
$ nohup bbr director --host 10.0.0.5 \
--username bbr --private-key-path private.pem \
restore \
--artifact-path /home/10.0.0.5-abcd1234abcd1234
Note: The BBR restore command can take a long time to complete. The example command in this section uses nohup, and the restore process runs within your SSH session. If you instead run the BBR command in a screen or tmux session, the task runs separately from your SSH session and continues to run even if your SSH connection to the jumpbox fails.
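One way to follow this advice is to start a tmux session on the jumpbox and run the restore through a small wrapper that keeps the full output in a log file. The log is useful if you later need to rerun the command with --debug. The wrapper below is a hypothetical sketch, not part of BBR.

```shell
# Hypothetical wrapper: run a command, append all of its output to a log
# file, and report a non-zero exit status.
run_logged() {
  log_file=$1; shift
  "$@" >> "$log_file" 2>&1
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed with status $status; see $log_file" >&2
  fi
  return "$status"
}

# Usage, inside a tmux session (tmux new-session -s bbr-restore):
#   run_logged director-restore.log bbr director --host BOSH-DIRECTOR-IP \
#     --username bbr --private-key-path PRIVATE-KEY-FILE \
#     restore --artifact-path PATH-TO-DIRECTOR-BACKUP
```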
If your BOSH Director restore fails, do one or more of the following:
Run the command again with the --debug flag to activate debug logs. For more information, see BBR Logging.
Be sure to complete the steps in Clean Up After a Failed Restore below.
After the BOSH Director has been restored, you must reconcile the BOSH Director's internal state with the state of the IaaS.
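If your environment has several deployments, you can repeat the reconciliation described in this section for each one. This minimal sketch assumes that BOSH_ENVIRONMENT, BOSH_CLIENT, BOSH_CLIENT_SECRET, and BOSH_CA_CERT are already exported, for example from your Bosh Commandline Credentials, so that bosh needs no prefix. The function name reconcile_deployments is hypothetical.

```shell
# Sketch: run the cck reconciliation for each deployment name passed in.
# Assumes the BOSH_* environment variables are already exported.
reconcile_deployments() {
  for deployment in "$@"; do
    echo "Reconciling $deployment" >&2
    bosh -d "$deployment" -n cck \
      --resolution delete_disk_reference \
      --resolution delete_vm_reference || return 1
  done
}

# Usage (replace with the names listed by bosh deployments):
#   reconcile_deployments DEPLOYMENT-NAME-1 DEPLOYMENT-NAME-2
```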
To determine the existing deployments in your environment, run the following command:
BOSH-CLI-CREDENTIALS bosh deployments
Where:
BOSH-CLI-CREDENTIALS is the full Bosh Commandline Credentials value that you copied from the BOSH Director tile in Download the BOSH Commandline Credentials.
To reconcile the BOSH Director's internal state with the state of a single deployment, run the following command:
BOSH-CLI-CREDENTIALS bosh -d DEPLOYMENT-NAME -n cck \
--resolution delete_disk_reference \
--resolution delete_vm_reference
Where:
BOSH-CLI-CREDENTIALS is the full Bosh Commandline Credentials value that you copied from the BOSH Director tile in Download the BOSH Commandline Credentials.
DEPLOYMENT-NAME is a deployment name retrieved in the previous step.
You must redeploy the Tanzu Kubernetes Grid Integrated Edition tile before restoring the Tanzu Kubernetes Grid Integrated Edition control plane. Redeploying the tile creates the VMs that constitute the control plane deployment.
To redeploy the Tanzu Kubernetes Grid Integrated Edition tile, do the following:
Do either of the following procedures to determine the stemcell that TKGI uses:
Review the Stemcell Library:
Review a Stemcell List Using BOSH CLI:
To retrieve the stemcell release using the BOSH CLI, run the following command:
BOSH-CLI-CREDENTIALS bosh deployments
Where:
BOSH-CLI-CREDENTIALS is the full Bosh Commandline Credentials value that you copied from the BOSH Director tile in Download the BOSH Commandline Credentials.
For example:
$ bosh deployments
Using environment '10.0.0.5' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Name Release(s) Stemcell(s) Team(s)
pivotal-container-service-453f2faa3bd2e16f52b7 backup-and-restore-sdk/1.9.0 bosh-google-kvm-ubuntu-xenial-go_agent/170.15 -
…
Note: The TKGI tile can have at most two stemcells: one Linux stemcell and one Windows stemcell.
For more information about stemcells in Ops Manager, see Importing and Managing Stemcells.
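To pull the stemcell names out of the bosh deployments output programmatically, you can use a minimal awk sketch. It assumes that:

- The output uses the default table layout shown in the example above, where each deployment's first row starts in column one and continuation rows are indented.
- Stemcell names end in go_agent/VERSION.
- The TKGI deployment name starts with pivotal-container-service.

The function name is hypothetical.

```shell
# Sketch: print the stemcells used by the TKGI control plane deployment,
# reading `bosh deployments` table output on stdin.
tkgi_stemcells() {
  awk '
    # A row starting in column one begins a new deployment.
    /^[^ \t]/ { in_tkgi = ($1 ~ /^pivotal-container-service/) }
    # Within the TKGI deployment, print fields that look like stemcells.
    in_tkgi { for (i = 1; i <= NF; i++) if ($i ~ /go_agent\/[0-9.]+$/) print $i }
  ' | sort -u
}

# Usage:
#   BOSH-CLI-CREDENTIALS bosh deployments | tkgi_stemcells
```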
To upload the stemcell used by your Tanzu Kubernetes Grid Integrated Edition tile:
Run the following command to upload the stemcell used by TKGI:
BOSH-CLI-CREDENTIALS bosh -d DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
upload-stemcell \
--fix PATH-TO-STEMCELL
Where:
BOSH-CLI-CREDENTIALS is the full Bosh Commandline Credentials value that you copied from the BOSH Director tile in Download the BOSH Commandline Credentials.
PATH-TO-BOSH-SERVER-CERTIFICATE is the path to the root CA certificate that you downloaded in Download the Root CA Certificate.
PATH-TO-STEMCELL is the path to your tile's stemcell.
To ensure that the stemcells for all of your other installed tiles have been uploaded, repeat the last step. Run the bosh upload-stemcell --fix PATH-TO-STEMCELL command for each required stemcell that differs from the TKGI stemcell you already uploaded.
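If you need to upload several stemcells, for example a Linux and a Windows stemcell, a loop like the following sketch can help. It assumes that BOSH_ENVIRONMENT, BOSH_CLIENT, BOSH_CLIENT_SECRET, and BOSH_CA_CERT are already exported, so bosh needs no credentials prefix. upload_stemcells is a hypothetical name.

```shell
# Sketch: upload each stemcell file passed in. --fix replaces a stemcell
# that is already uploaded with the same name and version.
upload_stemcells() {
  for stemcell in "$@"; do
    [ -f "$stemcell" ] || { echo "missing stemcell file: $stemcell" >&2; return 1; }
    bosh -n upload-stemcell --fix "$stemcell" || return 1
  done
}

# Usage:
#   upload_stemcells PATH-TO-LINUX-STEMCELL PATH-TO-WINDOWS-STEMCELL
```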
To redeploy your Tanzu Kubernetes Grid Integrated Edition tile’s control plane:
From the Ops Manager Installation Dashboard, navigate to VMware Tanzu Kubernetes Grid Integrated Edition > Resource Config.
Ensure the Upgrade all clusters errand is Off.
Ensure that all other errands needed by your system are set to run.
Return to the Ops Manager Installation Dashboard.
Click Review Pending Changes.
Review your changes. For more information, see Reviewing Pending Product Changes.
Click Apply Changes to redeploy the control plane.
Restore the Tanzu Kubernetes Grid Integrated Edition control plane by running BBR commands on your jumpbox.
To restore the Tanzu Kubernetes Grid Integrated Edition control plane:
Ensure the Tanzu Kubernetes Grid Integrated Edition deployment backup artifact is in the folder from which you run BBR.
Run the BBR restore command to restore the TKGI control plane:
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
nohup bbr deployment --target BOSH-TARGET \
--username BOSH-CLIENT --deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-CA-CERT \
restore \
--artifact-path PATH-TO-DEPLOYMENT-BACKUP
Where:
BOSH-CLIENT-SECRET is the value for BOSH_CLIENT_SECRET retrieved in Download the BOSH Commandline Credentials.
BOSH-TARGET is the value for BOSH_ENVIRONMENT retrieved in Download the BOSH Commandline Credentials. You must be able to reach the target address from the workstation where you run bbr commands.
BOSH-CLIENT is the value for BOSH_CLIENT retrieved in Download the BOSH Commandline Credentials.
DEPLOYMENT-NAME is the deployment name retrieved in Locate the Tanzu Kubernetes Grid Integrated Edition Deployment Name.
PATH-TO-BOSH-CA-CERT is the path to the root CA certificate that you downloaded in Download the Root CA Certificate.
PATH-TO-DEPLOYMENT-BACKUP is the path to the TKGI control plane backup that you want to restore.
For example:
$ BOSH_CLIENT_SECRET=p455w0rd \
nohup bbr deployment --target bosh.example.com \
--username admin --deployment pivotal-container-0 \
--ca-cert bosh.ca.crt \
restore \
--artifact-path /home/pivotal-container-service_abcd1234abcd1234abcd-abcd1234abcd1234
Note: The BBR restore command can take a long time to complete. The command above uses nohup, and the restore process runs within your SSH session. If you instead run the BBR command in a screen or tmux session, the task runs separately from your SSH session and continues to run even if your SSH connection to the jumpbox fails.
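The values for BOSH-CLIENT-SECRET, BOSH-TARGET, and BOSH-CLIENT all come from the Bosh Commandline Credentials string. The following minimal sketch extracts one value at a time. It assumes the string consists of space-separated NAME=VALUE pairs followed by bosh, and that no value contains a space. bosh_cred_value and CREDS are hypothetical names. The CA certificate path stays a placeholder because the BOSH_CA_CERT path in that string refers to the Ops Manager VM, not your jumpbox.

```shell
# Sketch: extract one variable from the Bosh Commandline Credentials string.
# Assumed layout:
#   BOSH_CLIENT=... BOSH_CLIENT_SECRET=... BOSH_CA_CERT=... BOSH_ENVIRONMENT=... bosh
bosh_cred_value() {
  # $1: full credentials string, $2: variable name, for example BOSH_CLIENT
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n "s/^$2=//p"
}

# Usage (CREDS holds the copied credential string):
#   BOSH_CLIENT_SECRET=$(bosh_cred_value "$CREDS" BOSH_CLIENT_SECRET) \
#   nohup bbr deployment \
#     --target "$(bosh_cred_value "$CREDS" BOSH_ENVIRONMENT)" \
#     --username "$(bosh_cred_value "$CREDS" BOSH_CLIENT)" \
#     --deployment DEPLOYMENT-NAME \
#     --ca-cert PATH-TO-BOSH-CA-CERT \
#     restore \
#     --artifact-path PATH-TO-DEPLOYMENT-BACKUP
```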
If your Tanzu Kubernetes Grid Integrated Edition control plane restore fails, do one or more of the following:
Run the command again with the --debug flag to activate debug logs. For more information, see BBR Logging.
Be sure to complete the steps in Clean Up After a Failed Restore below.
To resolve a failing BBR restore command:
If the error message is Directory /var/vcap/store/bbr-backup already exists on instance, run the relevant commands from the Clean Up After a Failed Restore section of this topic.
If you must cancel a restore, perform the following steps:
Enter yes to confirm.
If a BBR restore process fails, BBR may not have run the post-restore scripts, potentially leaving the instance in a locked state. Additionally, the BBR restore folder may remain on the target instance, and subsequent restore attempts may also fail.
To resolve issues following a failed BOSH Director restore, run the following BBR command:
nohup bbr director \
--host BOSH-DIRECTOR-IP \
--username bbr \
--private-key-path PRIVATE-KEY-FILE \
restore-cleanup
Where:
BOSH-DIRECTOR-IP is the address of the BOSH Director. If the BOSH Director is public, BOSH-DIRECTOR-IP is a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, BOSH-DIRECTOR-IP is the internal IP address, which you can retrieve as shown in Retrieve the BOSH Director Address above.
PRIVATE-KEY-FILE is the path to the private key file that you can create from Bbr Ssh Credentials as shown in Download the BBR SSH Credentials above.
For example:
$ nohup bbr director \
--host 10.0.0.5 \
--username bbr \
--private-key-path private.pem \
restore-cleanup
To resolve issues following a failed control plane restore, run the following BBR command:
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
bbr deployment \
--target BOSH-TARGET \
--username BOSH-CLIENT \
--deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-CA-CERT \
restore-cleanup
Where:
BOSH-CLIENT-SECRET is the value for BOSH_CLIENT_SECRET retrieved in Download the BOSH Commandline Credentials above.
BOSH-TARGET is the value for BOSH_ENVIRONMENT retrieved in Download the BOSH Commandline Credentials above. You must be able to reach the target address from the workstation where you run bbr commands.
BOSH-CLIENT is the value for BOSH_CLIENT retrieved in Download the BOSH Commandline Credentials above.
DEPLOYMENT-NAME is the name retrieved in Locate the Tanzu Kubernetes Grid Integrated Edition Deployment Name above.
PATH-TO-BOSH-CA-CERT is the path to the root CA certificate that you downloaded in Download the Root CA Certificate above.
For example:
$ BOSH_CLIENT_SECRET=p455w0rd \
bbr deployment \
--target bosh.example.com \
--username admin \
--deployment pivotal-container-service-453f2f \
--ca-cert bosh.ca.crt \
restore-cleanup
To resolve issues following a failed cluster restore, run the following BBR command:
BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
bbr deployment \
--target BOSH-TARGET \
--username BOSH-CLIENT \
--deployment DEPLOYMENT-NAME \
--ca-cert PATH-TO-BOSH-CA-CERT \
restore-cleanup
Where:
BOSH-CLIENT-SECRET is the value for BOSH_CLIENT_SECRET retrieved in Download the BOSH Commandline Credentials.
BOSH-TARGET is the value for BOSH_ENVIRONMENT retrieved in Download the BOSH Commandline Credentials. You must be able to reach the target address from the workstation where you run bbr commands.
BOSH-CLIENT is the value for BOSH_CLIENT retrieved in Download the BOSH Commandline Credentials.
DEPLOYMENT-NAME is the name retrieved in Retrieve Your Cluster Deployment Names above.
PATH-TO-BOSH-CA-CERT is the path to the root CA certificate that you downloaded in Download the Root CA Certificate.
For example:
$ BOSH_CLIENT_SECRET=p455w0rd \
bbr deployment \
--target bosh.example.com \
--username admin \
--deployment pivotal-container-service-453f2f \
--ca-cert bosh.ca.crt \
restore-cleanup
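If several cluster restores failed, you can run the cleanup for each cluster deployment in turn. This sketch continues past individual failures and reports each one. It assumes that:

- BOSH_CLIENT_SECRET is exported.
- BOSH_ENVIRONMENT and BOSH_CLIENT hold the values described above.
- BOSH_CA_CERT_PATH holds PATH-TO-BOSH-CA-CERT.

BOSH_CA_CERT_PATH and the function name are hypothetical.

```shell
# Sketch: run restore-cleanup for several cluster deployments, reporting
# every failure instead of stopping at the first one.
cleanup_cluster_restores() {
  failed=0
  for deployment in "$@"; do
    if ! bbr deployment \
        --target "$BOSH_ENVIRONMENT" \
        --username "$BOSH_CLIENT" \
        --deployment "$deployment" \
        --ca-cert "$BOSH_CA_CERT_PATH" \
        restore-cleanup; then
      echo "restore-cleanup failed for $deployment" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   cleanup_cluster_restores service-instance_UUID-1 service-instance_UUID-2
```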