This topic describes how to use BOSH Backup and Restore (BBR) to restore the BOSH Director, VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) control plane, and Kubernetes clusters.

Overview

In the event of a disaster, you might lose your environment’s VMs and disks, as well as your IaaS network and load balancer resources. You can re-create your environment, configured with your saved Tanzu Kubernetes Grid Integrated Edition Ops Manager installation settings, using your BBR backup artifacts.

Before restoring using BBR, complete the steps in Prepare to Restore a Backup below.

Use BBR to restore the following:

  • The BOSH Director
  • The Tanzu Kubernetes Grid Integrated Edition control plane
  • TKGI-provisioned Kubernetes clusters

Compatibility of Restore

The following are the requirements for a backup artifact to be restorable to another environment:

  • Topology: BBR requires the BOSH topology of a deployment to be the same in the restore environment as it was in the backup environment.
  • Naming of instance groups and jobs: For any deployment that implements the backup and restore scripts, the instance groups and jobs must have the same names.
  • Number of instance groups and jobs: For instance groups and jobs that have backup and restore scripts, the same number of instances must exist.

Additional considerations:

  • Limited validation: BBR puts the backed up data into the corresponding instance groups and jobs in the restored environment, but cannot validate the restore beyond that.
  • Same Cluster: Currently, BBR supports the in-place restore of a cluster backup artifact onto the same cluster. Migration from one cluster to another using a BBR backup artifact has not yet been validated.

Note: This section is for guidance only. Always validate your backups by using the backup artifacts in a restore.

Prepare to Restore a Backup

Before you use BBR to either back up TKGI or restore TKGI from backup, follow these steps to retrieve deployment information and credentials:

Verify Your BBR Version

Before running BBR, verify that the installed version of BBR is compatible with the version of Ops Manager your TKGI tile is on:

  1. To determine the Ops Manager BBR version requirements, see the Ops Manager Release Notes for the version of Ops Manager you are using.

  2. To verify the currently installed BBR version, run the following command:

    bbr version  
    
  3. If the installed BBR version does not meet the Ops Manager BBR version requirement, or BBR is not installed, you must upgrade BBR. For more information, see Installing BOSH Backup and Restore.
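
For example, the version check in step 2 produces output similar to the following; the version number shown is illustrative:

    $ bbr version
    bbr version 1.9.45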

Retrieve the BBR SSH Credentials

There are two ways to retrieve BOSH Director credentials:

Ops Manager Installation Dashboard

To retrieve your BBR SSH Credentials using the Ops Manager Installation Dashboard:

  1. Navigate to the Ops Manager Installation Dashboard.
  2. Click the BOSH Director tile.
  3. Click the Credentials tab.

  4. Locate Bbr Ssh Credentials.

  5. Click Link to Credentials next to it.
  6. Copy the private_key_pem field value.

Ops Manager API

To retrieve your BBR SSH Credentials using the Ops Manager API:

  1. Obtain your UAA access token. For more information, see Access the Ops Manager API.
  2. Retrieve the Bbr Ssh Credentials by running the following command:

    curl "https://OPS-MAN-FQDN/api/v0/deployed/director/credentials/bbr_ssh_credentials" \
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
    

    Where:

    • OPS-MAN-FQDN is the fully-qualified domain name (FQDN) for your Ops Manager deployment.
    • UAA-ACCESS-TOKEN is your UAA access token.
  3. Copy the value of the private_key_pem field.

Save the BBR SSH Credentials to File

To save the BBR SSH credentials to a private key file:

  1. To reformat the copied private_key_pem value and save it to a file in the current directory, run the following command:

    printf -- "YOUR-PRIVATE-KEY" > PRIVATE-KEY-FILE
    

    Where:

    • YOUR-PRIVATE-KEY is the text of your private key.
    • PRIVATE-KEY-FILE is the path to the private key file you are creating.

    For example:

    $ printf --  "-----begin rsa private key----- fake key contents ----end rsa private key-----" > bbr_key.pem
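
Alternatively, if you retrieved the credential through the Ops Manager API, you can write the key file directly from the JSON response. The following is a minimal sketch, assuming jq is installed and that the response nests the key under credential.value.private_key_pem:

    curl -s "https://OPS-MAN-FQDN/api/v0/deployed/director/credentials/bbr_ssh_credentials" \
      -H "Authorization: Bearer UAA-ACCESS-TOKEN" \
      | jq -r '.credential.value.private_key_pem' > bbr_key.pem
    # BBR connects over SSH, so restrict the key file permissions
    chmod 600 bbr_key.pem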
    

Retrieve the BOSH Director Credentials

There are two ways to retrieve BOSH Director credentials:

Ops Manager Installation Dashboard

To retrieve your BOSH Director credentials using the Ops Manager Installation Dashboard, perform the following steps:

  1. Navigate to the Ops Manager Installation Dashboard.
  2. Click the BOSH Director tile.
  3. Click the Credentials tab.

  4. Locate Director Credentials.

  5. Click Link to Credentials next to it.
  6. Copy and record the value of the password field.

Ops Manager API

To retrieve your BOSH Director credentials using the Ops Manager API, perform the following steps:

  1. Obtain your UAA access token. For more information, see Access the Ops Manager API.
  2. Retrieve the Director Credentials by running the following command:

    curl "https://OPS-MAN-FQDN/api/v0/deployed/director/credentials/bbr_ssh_credentials" \
    -X GET \
    -H "Authorization: Bearer UAA-ACCESS-TOKEN"
    

    Where:

    • OPS-MAN-FQDN is the fully-qualified domain name (FQDN) for your Ops Manager deployment.
    • UAA-ACCESS-TOKEN is your UAA access token.
  3. Copy and record the value of the password field.

Retrieve the UAA Client Credentials

To obtain BOSH credentials for your BBR operations, perform the following steps:

  1. From the Ops Manager Installation Dashboard, click the Tanzu Kubernetes Grid Integrated Edition tile.
  2. Select the Credentials tab.
  3. Navigate to Credentials > UAA Client Credentials.
  4. Record the value for uaa_client_secret.
  5. Record the value for uaa_client_name.

Note: You must use BOSH credentials that limit the scope of BBR activity to your cluster deployments.

Retrieve the BOSH Director Address

You access the BOSH Director using an IP address.

To obtain your BOSH Director’s IP address:

  1. Open the Ops Manager Installation Dashboard.
  2. Select BOSH Director > Status.
  3. Record the listed Director IP Address.

Log In To BOSH Director

  1. If you are not using the Ops Manager VM as your jump box, install the latest BOSH CLI on your jump box.
  2. To log in to BOSH Director, using the IP address that you recorded above, run the following command line:

    bosh -e BOSH-DIRECTOR-IP \
    --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE log-in
    

    Where:

    • BOSH-DIRECTOR-IP is the BOSH Director IP address recorded above.
    • PATH-TO-BOSH-SERVER-CERTIFICATE is the path to the root Certificate Authority (CA) certificate as outlined in Download the Root CA Certificate.
  3. At the Email prompt, enter director.

  4. At the Password prompt, enter the Director Credentials password that you obtained in Retrieve the BOSH Director Credentials.

    For example:
    $ bosh -e 10.0.0.3 \
    --ca-cert /var/tempest/workspaces/default/root_ca_certificate log-in
    Email (): director
    Password (): *******************
    Successfully authenticated with UAA
    Succeeded
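
Alternatively, you can authenticate non-interactively by exporting the standard BOSH CLI environment variables before running bosh commands. The following is a sketch with illustrative values:

    $ export BOSH_ENVIRONMENT=10.0.0.3
    $ export BOSH_CLIENT=ops_manager
    $ export BOSH_CLIENT_SECRET=EXAMPLE-SECRET
    $ export BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate
    $ bosh env

If the credentials are valid, bosh env prints the Director name, version, and the authenticated user.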
    

Download the Root CA Certificate

To download the root CA certificate for your Tanzu Kubernetes Grid Integrated Edition deployment, perform the following steps:

  1. Open the Ops Manager Installation Dashboard.
  2. In the top right corner, click your user name.
  3. Navigate to Settings > Advanced.
  4. Click Download Root CA Cert.

Retrieve the BOSH Command Line Credentials

  1. Open the Ops Manager Installation Dashboard.
  2. Click the BOSH Director tile.
  3. In the BOSH Director tile, click the Credentials tab.
  4. Navigate to Bosh Commandline Credentials.
  5. Click Link to Credential.
  6. Copy the credential value.
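
The copied value is a single line of environment variable assignments followed by the bosh command, similar to the following illustrative example:

    BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=EXAMPLE-SECRET BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate BOSH_ENVIRONMENT=10.0.0.3 bosh

In the commands below, BOSH-CLI-CREDENTIALS stands for the environment variable assignments in this value; the trailing bosh is the CLI command itself.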

Retrieve Your Cluster Deployment Names

To locate and record a cluster deployment name, follow the steps below for each cluster:

  1. On the command line, run the following command to log in:

    tkgi login -a TKGI-API -u USERNAME -k
    

    Where:

    • TKGI-API is the domain name for the TKGI API that you entered in Ops Manager > Tanzu Kubernetes Grid Integrated Edition > TKGI API > API Hostname (FQDN). For example, api.tkgi.example.com.
    • USERNAME is your user name.

      See Logging in to Tanzu Kubernetes Grid Integrated Edition for more information about the tkgi login command.

      Note: If your operator has configured Tanzu Kubernetes Grid Integrated Edition to use a SAML identity provider, you must include an additional SSO flag to use the above command. For information about the SSO flags, see the section for the above command in TKGI CLI. For information about configuring SAML, see Connecting Tanzu Kubernetes Grid Integrated Edition to a SAML Identity Provider.

  2. Identify the cluster ID:

    tkgi cluster CLUSTER-NAME
    

    Where CLUSTER-NAME is the name of your cluster.

  3. From the output of this command, record the UUID value.

  4. Open the Ops Manager Installation Dashboard.

  5. Click the BOSH Director tile.

  6. Select the Credentials tab.

  7. Navigate to Bosh Commandline Credentials and click Link to Credential.

  8. Copy the credential value.

  9. SSH into your jump box. For more information about the jump box, see Installing BOSH Backup and Restore.

  10. To retrieve your cluster deployment name, run the following command:

    BOSH-CLI-CREDENTIALS bosh deployments | grep UUID
    

    Where:

    • BOSH-CLI-CREDENTIALS is the environment variable assignments from the Bosh Commandline Credentials value that you copied in the previous step (everything before the trailing bosh).
    • UUID is the cluster UUID that you recorded above.
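
    Cluster deployments are named service-instance_UUID. For example, with illustrative credentials and a cluster UUID of ae681cd1-7ff4-4661-b12c-49a5b543f16f:

    $ BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=EXAMPLE-SECRET \
      BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate \
      BOSH_ENVIRONMENT=10.0.0.3 \
      bosh deployments | grep ae681cd1-7ff4-4661-b12c-49a5b543f16f
    service-instance_ae681cd1-7ff4-4661-b12c-49a5b543f16f  ...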

Transfer Artifacts to Your Jump Box

To restore the BOSH Director, the Tanzu Kubernetes Grid Integrated Edition control plane, or a cluster, you must transfer your BBR backup artifacts from your safe storage location to your jump box.

  1. To copy an artifact onto a jump box, run the following SCP command:

    scp -r LOCAL-PATH-TO-BACKUP-ARTIFACT JUMP-BOX-USER@JUMP-BOX-ADDRESS:
    

    Where:

    • LOCAL-PATH-TO-BACKUP-ARTIFACT is the path to your BBR backup artifact.
    • JUMP-BOX-USER is the SSH user name of the jump box.
    • JUMP-BOX-ADDRESS is the IP address, or hostname, of the jump box.
  2. (Optional) Decrypt your backup artifact if the artifact is encrypted.
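
For example, if you encrypted the artifact with a symmetric gpg passphrase before uploading it to storage, a decryption sketch looks like the following; the file names are illustrative:

    $ gpg --output tkgi-backup.tar --decrypt tkgi-backup.tar.gpg
    $ tar -xf tkgi-backup.tar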

Restore the BOSH Director

If you lose your BOSH Director or Ops Manager environment, you must re-create the BOSH Director VM before restoring the BOSH Director.

You can restore your BOSH Director configuration by using Ops Manager to import the installation settings that you exported by following the Export Installation Settings backup procedure.

To redeploy and restore your Ops Manager and BOSH Director follow the procedures below.

Deploy Ops Manager

In the event of a disaster, you might lose your IaaS resources. You must recreate your IaaS resources before restoring using your BBR artifacts.

  1. To recreate your IaaS resources, such as networks and load balancers, prepare your environment for Tanzu Kubernetes Grid Integrated Edition by following the installation instructions specific to your IaaS in Installing Tanzu Kubernetes Grid Integrated Edition.

  2. After recreating IaaS resources, you must add those resources to Ops Manager by performing the procedures in the (Optional) Configure Ops Manager for New Resources section.

Import Installation Settings

WARNING: After importing installation settings, do not click Apply Changes in Ops Manager until you are instructed to do so in Deploy the BOSH Director or Redeploy the Tanzu Kubernetes Grid Integrated Edition Control Plane below.

You can import installation settings in two ways:

  • Use the Ops Manager UI:

    1. Access your new Ops Manager by navigating to YOUR-OPS-MAN-FQDN in a browser.
    2. On the Welcome to Ops Manager page, click Import Existing Installation.
    3. In the import panel, perform the following tasks:
      • Enter the Decryption Passphrase in use when you exported the installation settings from Ops Manager.
      • Click Choose File and browse to the installation zip file that you exported in Back Up Installation Settings.
    4. Click Import.

      Note: Some browsers do not provide the import process progress status, and might appear to hang. The import process takes at least 10 minutes, and requires additional time for each restored Ops Manager tile.

    5. When the import completes, Ops Manager displays the message Successfully imported installation.
  • Use the Ops Manager API:

    1. To use the Ops Manager API to import installation settings, run the following command:

      curl "https://OPS-MAN-FQDN/api/v0/installation_asset_collection" \
      -X POST \
      -H "Authorization: Bearer UAA-ACCESS-TOKEN" \
      -F 'installation[file]=@installation.zip' \
      -F 'passphrase=DECRYPTION-PASSPHRASE'
      

      Where:

      • OPS-MAN-FQDN is the fully-qualified domain name (FQDN) for your Ops Manager deployment.
      • UAA-ACCESS-TOKEN is the UAA access token. For more information about how to retrieve this token, see Using the Ops Manager API.
      • DECRYPTION-PASSPHRASE is the decryption passphrase in use when you exported the installation settings from Ops Manager.

(Optional) Configure Ops Manager for New Resources

If you recreated IaaS resources such as networks and load balancers by following the steps in the Deploy Ops Manager section above, perform the following steps to update Ops Manager with your new resources:

  1. Activate Ops Manager advanced mode. For more information, see How to Enable Advanced Mode in the Ops Manager in the Knowledge Base.

    Note: Ops Manager advanced mode allows you to make changes that are normally deactivated. You might see warning messages when you save changes.

  2. Navigate to the Ops Manager Installation Dashboard and click the BOSH Director tile.

  3. If you are using Google Cloud Platform (GCP), click Google Config and update:

    1. Project ID to reflect the GCP project ID.
    2. Default Deployment Tag to reflect the environment name.
    3. AuthJSON to reflect the service account.
  4. Click Create Networks and update the network names to reflect the network names for the new environment.

  5. If your BOSH Director had an external hostname, you must change it in Director Config > Director Hostname to ensure it does not conflict with the hostname of the backed up Director.

  6. Ensure that there are no outstanding warning messages in the BOSH Director tile, then deactivate Ops Manager advanced mode. For more information, see How to Enable Advanced Mode in the Ops Manager in the Knowledge Base.

Note: A change in VM size or underlying hardware does not affect BBR’s ability to restore data, as long as adequate storage space to restore the data exists.

Remove BOSH State File

  1. SSH into your Ops Manager VM. For more information, see the Log in to the Ops Manager VM with SSH section of the Advanced Troubleshooting with the BOSH CLI topic.

  2. To delete the /var/tempest/workspaces/default/deployments/bosh-state.json file, run the following on the Ops Manager VM:

    sudo rm /var/tempest/workspaces/default/deployments/bosh-state.json
    
  3. In a browser, navigate to your Ops Manager’s fully-qualified domain name.

  4. Log in to Ops Manager.

Deploy the BOSH Director

You can deploy the BOSH Director by itself in two ways:

  • Use the Ops Manager UI:

    1. Open the Ops Manager Installation Dashboard.
    2. Click Review Pending Changes.
    3. On the Review Pending Changes page, click the BOSH Director check box.
    4. Click Apply Changes.
  • Use the Ops Manager API:

    1. Use the Ops Manager API to deploy the BOSH Director.

Restore the BOSH Director

Restore the BOSH Director by running BBR commands on your jump box.

To restore the BOSH Director:

  1. Ensure the Tanzu Kubernetes Grid Integrated Edition BOSH Director backup artifact is in the folder from which you run BBR.

  2. Run the BBR restore command to restore the TKGI BOSH Director:

    nohup bbr director  --host BOSH-DIRECTOR-IP \
    --username bbr  --private-key-path PRIVATE-KEY-FILE \
    restore \
    --artifact-path PATH-TO-DIRECTOR-BACKUP
    

    Where:

    • BOSH-DIRECTOR-IP is the address of the BOSH Director. If the BOSH Director is public, BOSH-DIRECTOR-IP is a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, it is the internal IP address, which you can retrieve as shown in Retrieve the BOSH Director Address above.
    • PRIVATE-KEY-FILE is the path to the private key file that you created from Bbr Ssh Credentials as shown in Save the BBR SSH Credentials to File above.
    • PATH-TO-DIRECTOR-BACKUP is the path to the TKGI BOSH Director backup that you want to restore.

    For example:

    $ nohup bbr director  --host 10.0.0.5 \
      --username bbr  --private-key-path private.pem \
      restore \
      --artifact-path /home/10.0.0.5-abcd1234abcd1234
    

    Note: The BBR restore command can take a long time to complete. The example command above uses nohup, so the restore process runs within your SSH session. If you instead run the BBR command in a screen or tmux session, the task runs separately from your SSH session and continues to run even if your SSH connection to the jump box fails.

  3. If your BOSH Director restore fails, do one or more of the following:

    • Follow the steps in Resolve a Failing BBR Restore Command below.
    • Complete the steps in Clean Up After a Failed Restore below.

Remove All Stale Deployment Cloud IDs

After BOSH Director has been restored, you must reconcile BOSH Director’s internal state with the state of the IaaS.

  1. To determine the existing deployments in your environment, run the following command:

    BOSH-CLI-CREDENTIALS bosh deployments
    

    Where BOSH-CLI-CREDENTIALS is the environment variable assignments from the Bosh Commandline Credentials value that you copied in Retrieve the BOSH Command Line Credentials above.

  2. To reconcile the BOSH Director’s internal state with the state of a single deployment, run the following command:

    BOSH-CLI-CREDENTIALS bosh -d DEPLOYMENT-NAME -n cck \
    --resolution delete_disk_reference \
    --resolution delete_vm_reference
    

    Where:

    • BOSH-CLI-CREDENTIALS is the environment variable assignments from the Bosh Commandline Credentials value that you copied from the BOSH Director tile in Retrieve the BOSH Command Line Credentials above.
    • DEPLOYMENT-NAME is a deployment name retrieved in the previous step.
  3. Repeat the last command for each deployment in the IaaS.
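
Because cloud check must run once per deployment, you can script the loop. The following is a minimal sketch, assuming the Bosh Commandline Credentials environment variables (BOSH_CLIENT, BOSH_CLIENT_SECRET, BOSH_CA_CERT, BOSH_ENVIRONMENT) are exported in your shell and that jq is installed; the JSON field path reflects the bosh CLI's --json table format:

    # List every deployment name, then run cloud-check non-interactively on each
    bosh deployments --json \
      | jq -r '.Tables[0].Rows[].name' \
      | while read -r deployment; do
          bosh -d "$deployment" -n cck \
            --resolution delete_disk_reference \
            --resolution delete_vm_reference
        done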

Restore the Tanzu Kubernetes Grid Integrated Edition Control Plane

You must redeploy the Tanzu Kubernetes Grid Integrated Edition tile before restoring the Tanzu Kubernetes Grid Integrated Edition control plane. By redeploying the Tanzu Kubernetes Grid Integrated Edition tile you create the VMs that constitute the control plane deployment.

To redeploy the Tanzu Kubernetes Grid Integrated Edition tile, do the following:

Determine the Required Stemcell

Do either of the following procedures to determine the stemcell that TKGI uses:

  • Review the Stemcell Library:

    1. Open Ops Manager.
    2. Click Stemcell Library.
    3. Record the TKGI stemcell release number from the Staged column.
  • Review a Stemcell List Using BOSH CLI:

    1. To retrieve the stemcell release using the BOSH CLI, run the following command:

      BOSH-CLI-CREDENTIALS bosh deployments
      

      Where BOSH-CLI-CREDENTIALS is the environment variable assignments from the Bosh Commandline Credentials value that you copied in Retrieve the BOSH Command Line Credentials above.

      For example:

      $ bosh deployments  
      Using environment '10.0.0.5' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)  
      Name                                                   Release(s)                                 Stemcell(s)                                    Team(s)  
      pivotal-container-service-453f2faa3bd2e16f52b7         backup-and-restore-sdk/1.9.0               bosh-google-kvm-ubuntu-jammy-go_agent/1.75  -  
      ...
      

Note: At most, the TKGI tile can have two stemcells, where one stemcell is Linux and the other stemcell is Windows.

For more information about stemcells in Ops Manager, see Importing and Managing Stemcells.

Upload Stemcells

To upload the stemcell used by your Tanzu Kubernetes Grid Integrated Edition tile:

  1. Download the stemcell from Broadcom Support.
  2. Run the following command to upload the stemcell used by TKGI:

    BOSH-CLI-CREDENTIALS  bosh -d DEPLOYMENT-NAME \
    --ca-cert PATH-TO-BOSH-SERVER-CERTIFICATE \
    upload-stemcell \
    --fix PATH-TO-STEMCELL
    

    Where:

    • BOSH-CLI-CREDENTIALS is the environment variable assignments from the Bosh Commandline Credentials value that you copied in Retrieve the BOSH Command Line Credentials above.
    • DEPLOYMENT-NAME is the name of a BOSH deployment that uses the stemcell.
    • PATH-TO-BOSH-SERVER-CERTIFICATE is the path to the root CA certificate, as outlined in Download the Root CA Certificate above.
    • PATH-TO-STEMCELL is the path to the stemcell that you downloaded from Broadcom Support. For an illustrative example, see the sketch after this list.

  3. To ensure the stemcells for all of your other installed tiles have been uploaded, repeat the last step, running the bosh upload-stemcell --fix PATH-TO-STEMCELL command, for each required stemcell that is different from the already uploaded TKGI stemcell.
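
For example, the following is an illustrative upload of the Linux stemcell listed earlier; the tarball file name is hypothetical:

    $ BOSH_CLIENT=ops_manager BOSH_CLIENT_SECRET=EXAMPLE-SECRET \
      BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate \
      BOSH_ENVIRONMENT=10.0.0.3 \
      bosh upload-stemcell \
      --fix light-bosh-stemcell-1.75-google-kvm-ubuntu-jammy-go_agent.tgz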

Redeploy the Tanzu Kubernetes Grid Integrated Edition Control Plane

To redeploy your Tanzu Kubernetes Grid Integrated Edition tile’s control plane:

  1. From the Ops Manager Installation Dashboard, navigate to VMware Tanzu Kubernetes Grid Integrated Edition > Resource Config.

  2. Ensure the Upgrade all clusters errand is Off.

  3. Ensure both Instances > TKGI API and Instances > TKGI Database are configured as they had been when the backup you are restoring was created.

  4. Ensure that all errands needed by your system are set to run.

  5. Return to the Ops Manager Installation Dashboard.

  6. Click Review Pending Changes.

  7. Review your changes. For more information, see Reviewing Pending Product Changes.

  8. Click Apply Changes to redeploy the control plane.

Restore the TKGI Control Plane

Restore the Tanzu Kubernetes Grid Integrated Edition control plane by running BBR commands on your jump box.

To restore the Tanzu Kubernetes Grid Integrated Edition control plane:

  1. Ensure the Tanzu Kubernetes Grid Integrated Edition deployment backup artifact is in the folder from which you run BBR.

  2. Run the BBR restore command to restore the TKGI control plane:

    BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
    nohup bbr deployment  --target BOSH-TARGET \
    --username BOSH-CLIENT  --deployment DEPLOYMENT-NAME \
    --ca-cert PATH-TO-BOSH-SERVER-CERT \
    restore \
    --artifact-path PATH-TO-DEPLOYMENT-BACKUP
    

    Where:

    • BOSH-CLIENT-SECRET is the BOSH_CLIENT_SECRET property. This value is in the BOSH Director tile under Credentials > Bosh Commandline Credentials.
    • BOSH-TARGET is the BOSH_ENVIRONMENT property. This value is in the BOSH Director tile under Credentials > Bosh Commandline Credentials. You must be able to reach the target address from the jump box where you run bbr commands.
    • BOSH-CLIENT is the BOSH_CLIENT property. This value is in the BOSH Director tile under Credentials > Bosh Commandline Credentials.
    • DEPLOYMENT-NAME is the name of the TKGI control plane BOSH deployment.
    • PATH-TO-BOSH-SERVER-CERT is the path to the root CA certificate that you downloaded in Download the Root CA Certificate above.
    • PATH-TO-DEPLOYMENT-BACKUP is the path to the TKGI control plane backup that you want to restore.

    For example:

    $ BOSH_CLIENT_SECRET=p455w0rd \
    nohup bbr deployment  --target bosh.example.com \
    --username admin  --deployment pivotal-container-0 \
    --ca-cert bosh.ca.crt \
    restore \
    --artifact-path /home/pivotal-container-service_abcd1234abcd1234abcd-abcd1234abcd1234
    

    Note: The BBR restore command can take a long time to complete. The command above uses nohup, so the restore process runs within your SSH session. If you instead run the BBR command in a screen or tmux session, the task runs separately from your SSH session and continues to run even if your SSH connection to the jump box fails.

  3. If your Tanzu Kubernetes Grid Integrated Edition control plane restore fails, do one or more of the following:

    • Follow the steps in Resolve a Failing BBR Restore Command below.
    • Complete the steps in Clean Up After a Failed Restore below.

Redeploy and Restore Clusters

After restoring the Tanzu Kubernetes Grid Integrated Edition control plane, perform the following steps to redeploy the TKGI-provisioned Kubernetes clusters and restore their state from backup.

Redeploy Clusters

Before restoring your TKGI-provisioned clusters, you must redeploy them to BOSH. To redeploy TKGI-provisioned clusters, do one of the following:

Redeploy All Clusters

To redeploy all clusters:

  1. In Ops Manager, navigate to the Tanzu Kubernetes Grid Integrated Edition tile.
  2. Click Errands.
  3. Ensure the Upgrade all clusters errand is On. This errand redeploys all your TKGI-provisioned clusters.
  4. Return to the Installation Dashboard.
  5. Click Review Pending Changes, review your changes, and then click Apply Changes. For more information, see Reviewing Pending Product Changes.

Redeploy a Single Cluster

To redeploy a TKGI-provisioned cluster through the TKGI CLI:

  1. Identify the names of your TKGI-provisioned clusters:

    tkgi clusters
    
  2. For each cluster you want to redeploy, run the following command:

    tkgi upgrade-cluster CLUSTER-NAME
    

    Where CLUSTER-NAME is the name of your Kubernetes cluster. For more information, see Upgrade Clusters.

Restore Clusters

After redeploying your TKGI-provisioned clusters, restore their stateless workloads and cluster state from backup by running the BBR restore command from your jump box. Stateless workloads are tracked in the cluster etcd database, which BBR backs up.

Warning: BBR does not back up persistent volumes, load balancers, or other IaaS resources.

Warning: When you restore a cluster, etcd is stopped in the API server. During this process, only currently-deployed clusters function, and you cannot create new workloads.

To restore a cluster:

  1. Move the cluster backup artifact to a folder from which you will run the BBR restore process.

  2. SSH into your jump box. For more information about the jump box, see Configure Your Jump Box in Installing BOSH Backup and Restore.

  3. Run the following command:

    BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
    nohup bbr deployment  --target BOSH-TARGET \
    --username BOSH-CLIENT  --deployment DEPLOYMENT-NAME \
    --ca-cert PATH-TO-BOSH-SERVER-CERT \
    restore \
    --artifact-path PATH-TO-DEPLOYMENT-BACKUP
    

    Where:

    • BOSH-CLIENT-SECRET is the BOSH_CLIENT_SECRET property. This value is in the BOSH Director tile under Credentials > Bosh Commandline Credentials.
    • BOSH-TARGET is the BOSH_ENVIRONMENT property. This value is in the BOSH Director tile under Credentials > Bosh Commandline Credentials. You must be able to reach the target address from the workstation where you run bbr commands.
    • BOSH-CLIENT is the BOSH_CLIENT property. This value is in the BOSH Director tile under Credentials > Bosh Commandline Credentials.
    • DEPLOYMENT-NAME is the cluster BOSH deployment name that you recorded in Retrieve Your Cluster Deployment Names above.
    • PATH-TO-BOSH-SERVER-CERT is the path to the root CA certificate that you downloaded in the Download the Root CA Certificate section above.
    • PATH-TO-DEPLOYMENT-BACKUP is the path to your deployment backup. Make sure you have transferred your artifact to your jump box as described in Transfer Artifacts to Your Jump Box above.

    For example:

    $ BOSH_CLIENT_SECRET=p455w0rd \
    nohup bbr deployment \
    --target bosh.example.com \
    --username admin \
    --deployment service-instance_3839394 \
    --ca-cert bosh.ca.cert \
    restore \
    --artifact-path deployment-backup
    

    Note: The BBR restore command can take a long time to complete. The command above uses nohup, so the restore process runs within your SSH session. If you instead run the BBR command in a screen or tmux session, the task runs separately from your SSH session and continues to run even if your SSH connection to the jump box fails.

  4. To cancel a running bbr restore, see Cancel a Restore below.
  5. After you restore a Kubernetes cluster, you must register its workers with their control plane nodes by following the Register Restored Worker VMs steps below.
  6. If your Tanzu Kubernetes Grid Integrated Edition cluster restore fails, do one or more of the following:

    • Follow the steps in Resolve a Failing BBR Restore Command below.
    • Complete the steps in Clean Up After a Failed Restore below.

Register Restored Worker VMs

After restoring a Kubernetes cluster, you must register all of the cluster’s worker nodes with their control plane nodes. To register cluster worker nodes, complete the following:

  1. Delete Nodes
  2. Restart kubelet

Delete Nodes

To delete a cluster’s restored nodes:

  1. To determine your cluster’s namespace, run the following command:

    kubectl get all --all-namespaces
    
  2. To retrieve the list of worker nodes in the cluster, run the following command:

    kubectl get nodes -o wide
    

    Document the worker node names listed in the NAME column. Verify the worker nodes are all listed with a status of NotReady.

  3. To delete a node, run the following:

    kubectl delete node NODE-NAME
    

    Where NODE-NAME is a node NAME returned by the kubectl get nodes command.

  4. Repeat the preceding kubectl delete node step for each of your cluster’s nodes.
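
If the cluster has many worker nodes, you can delete them all in one pass. The following is a sketch, assuming your kubectl context targets the restored cluster:

    # "kubectl get nodes -o name" prints node/NAME entries that kubectl delete accepts
    kubectl get nodes -o name | while read -r node; do
      kubectl delete "$node"
    done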

Restart kubelet

To restart kubelet on your worker node VMs:

  1. To restart kubelet on all of your cluster’s worker node VMs, run the following command:

    bosh ssh -d DEPLOYMENT-NAME worker -c 'sudo /var/vcap/bosh/bin/monit restart kubelet'
    

    Where DEPLOYMENT-NAME is the cluster BOSH deployment name that you recorded in Retrieve Your Cluster Deployment Names above.

  2. To confirm all worker nodes in your cluster have been restored to a Ready state, run the following command:

    kubectl get nodes -o wide
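
If any node remains NotReady, you can check process state on the worker VMs directly with monit. This is a sketch that uses the same BOSH deployment name as the kubelet restart above:

    bosh ssh -d DEPLOYMENT-NAME worker -c 'sudo /var/vcap/bosh/bin/monit summary'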
    

Resolve a Failing BBR Restore Command

To resolve a failing BBR restore command:

  1. Ensure that you set all the parameters in the command.
  2. Ensure that the BOSH Director credentials are valid.
  3. Ensure that the specified BOSH deployment or Director exists.
  4. Ensure that the jump box can reach the BOSH Director.
  5. Ensure the source backup artifact is compatible with the target BOSH deployment or Director.
  6. If you see the error message Directory /var/vcap/store/bbr-backup already exists on instance, run the relevant commands from the Clean Up After a Failed Restore section of this topic.
  7. See the BBR Logging topic.

Cancel a Restore

If you must cancel a restore, perform the following steps:

  1. Terminate the BBR process by pressing Ctrl-C and typing yes to confirm.
  2. Perform the procedures in the Clean Up After a Failed Restore section to support future restores. Stopping a restore can leave the system in an unusable state and prevent future restores.

Clean Up After a Failed Restore

If a BBR restore process fails, BBR might not have run the post-restore scripts, potentially leaving the instance in a locked state. Additionally, the BBR restore folder might remain on the target instance and subsequent restore attempts might also fail.

  • To resolve issues following a failed BOSH Director restore, run the following BBR command:

    nohup bbr director \
    --host BOSH-DIRECTOR-IP \
    --username bbr \
    --private-key-path PRIVATE-KEY-FILE \
    restore-cleanup
    

    Where:

    • BOSH-DIRECTOR-IP is the address of the BOSH Director. If the BOSH Director is public, BOSH-DIRECTOR-IP is a URL, such as https://my-bosh.xxx.cf-app.com. Otherwise, it is the internal IP address, which you can retrieve as shown in Retrieve the BOSH Director Address above.
    • PRIVATE-KEY-FILE is the path to the private key file that you created from Bbr Ssh Credentials as shown in Save the BBR SSH Credentials to File above.
      For example:
    $ nohup bbr director \
    --host 10.0.0.5 \
    --username bbr \
    --private-key-path private.pem \
    restore-cleanup
    
  • To resolve issues following a failed control plane restore, run the following BBR command:

    BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
    bbr deployment \
    --target BOSH-TARGET \
    --username BOSH-CLIENT \
    --deployment DEPLOYMENT-NAME \
    --ca-cert PATH-TO-BOSH-CA-CERT \
    restore-cleanup
    

    Where the placeholders take the same values as in Restore the TKGI Control Plane above.

    For example:
    $ BOSH_CLIENT_SECRET=p455w0rd \
    bbr deployment \
    --target bosh.example.com \
    --username admin \
    --deployment pivotal-container-service-453f2f \
    --ca-cert bosh.ca.crt \
    restore-cleanup
    
  • To resolve issues following a failed cluster restore, run the following BBR command:

    BOSH_CLIENT_SECRET=BOSH-CLIENT-SECRET \
    bbr deployment \
    --target BOSH-TARGET \
    --username BOSH-CLIENT \
    --deployment DEPLOYMENT-NAME \
    --ca-cert PATH-TO-BOSH-CA-CERT \
    restore-cleanup
    

    Where the placeholders take the same values as in Restore Clusters above; DEPLOYMENT-NAME is the cluster deployment name, such as service-instance_3839394.

    For example:
    $ BOSH_CLIENT_SECRET=p455w0rd \
    bbr deployment \
    --target bosh.example.com \
    --username admin \
    --deployment service-instance_3839394 \
    --ca-cert bosh.ca.crt \
    restore-cleanup
    