Backup and Restore MongoDB Deployments on Kubernetes

Introduction

VMware Tanzu Application Catalog (Tanzu Application Catalog) offers a MongoDB Helm chart that makes it quick and easy to deploy a horizontally scalable MongoDB cluster on Kubernetes with separate primary, secondary and arbiter nodes. This Helm chart is compliant with current best practices and can also be easily upgraded to ensure that you always have the latest fixes and security updates.

However, setting up a scalable MongoDB service is just the beginning; you also need to regularly back up the data stored in the service and have the ability to restore it elsewhere if needed. Common scenarios for such backup/restore operations include disaster recovery, off-site data analysis or application load testing.

This guide walks you through two different approaches you can follow when backing up and restoring MongoDB deployments on Kubernetes:

  • Back up the data from the source deployment and restore it in a new deployment using MongoDB's built-in backup/restore tools.
  • Back up the persistent volumes from the source deployment and attach them to a new deployment using Velero, a Kubernetes backup/restore tool.

Assumptions and prerequisites

This guide makes the following assumptions:

  • You have two separate Kubernetes clusters - a source cluster and a destination cluster - with kubectl and Helm v3 installed. This guide uses Google Kubernetes Engine (GKE) clusters but you can also use any other Kubernetes provider. Learn how to install kubectl and Helm v3.x.

  • You have configured Helm to use the Tanzu Application Catalog chart repository following the instructions for Tanzu Application Catalog or the instructions for VMware Tanzu Application Catalog for Tanzu Advanced.

  • You have previously deployed the Tanzu Application Catalog MongoDB Helm chart with replication on the source cluster and added some data to it. Example command sequences to perform these tasks are shown below, where the PASSWORD placeholder refers to the database administrator password. Replace the REPOSITORY and REGISTRY placeholders with references to your Tanzu Application Catalog chart repository and container registry.

    helm install mongodb REPOSITORY/mongodb \
      --namespace default \
      --set replicaSet.enabled=true \
      --set mongodbRootPassword=PASSWORD
    kubectl run --namespace default mongodb-client --rm --tty -i --restart='Never' --image REGISTRY/mongodb:4.2.5-debian-10-r35 --command -- mongo admin --host mongodb --authenticationDatabase admin -u root -p PASSWORD
    use mydb
    db.accounts.insert({name:"john", total: "1058"})
    db.accounts.insert({name:"jane", total: "6283"})
    db.accounts.insert({name:"james", total: "472"})
    exit
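
    To confirm that the sample records were stored, you can run a quick query against the deployment (this mirrors the verification commands used later in this guide; replace the REGISTRY and PASSWORD placeholders as above):

    kubectl run --namespace default mongodb-client --rm --tty -i --restart='Never' --image REGISTRY/mongodb:4.2.5-debian-10-r35 --command -- mongo mydb --host mongodb --authenticationDatabase admin -u root -p PASSWORD --eval "db.accounts.find()"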
    

Method 1: Backup and restore data using MongoDB's built-in tools

This method involves using MongoDB's mongodump tool to back up the data in the source cluster and MongoDB's mongorestore tool to restore this data on the destination cluster.

Step 1: Backup data with mongodump

The first step is to back up the data in the MongoDB deployment on the source cluster. Follow these steps:

  1. Obtain the MongoDB administrator password:

    export MONGODB_ROOT_PASSWORD=$(kubectl get secret --namespace default mongodb -o jsonpath="{.data.mongodb-root-password}" | base64 --decode)
    
  2. Forward the MongoDB service port and place the process in the background:

    kubectl port-forward --namespace default svc/mongodb 27017:27017 &
    
  3. Create a directory for the backup files and make it the current working directory:

    mkdir mybackup
    chmod o+w mybackup
    cd mybackup
    
  4. Back up the contents of all the databases to the current directory using the mongodump tool. If this tool is not installed on your system, use Tanzu Application Catalog's MongoDB Docker image to perform the backup, as shown below (replace the REGISTRY placeholder with your Tanzu Application Catalog container registry):

    docker run --rm --name mongodb -v $(pwd):/app --net="host" REGISTRY/mongodb:latest mongodump -u root -p $MONGODB_ROOT_PASSWORD -o /app
    

    Here, the --net parameter lets the Docker container use the host's network stack and thereby gain access to the forwarded port. The mongodump command connects to the MongoDB service and creates backup files in the /app directory, which is mapped to the current directory (mybackup/) on the Docker host with the -v parameter. Finally, the --rm parameter deletes the container after the mongodump command completes execution.

  5. Stop the service port forwarding by terminating the corresponding background process.
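
    For example, if the port-forward is the most recent background job in your current shell, you could terminate it as follows (adjust the job specification if you have other background jobs running):

    kill %1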

At the end of this step, the backup directory should contain the data from your running MongoDB deployment.
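
If you are still in the mybackup/ directory, you can verify this by listing its contents; each backed-up database appears as a subdirectory containing BSON data and metadata files:

    ls -lR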

Step 2: Restore data with mongorestore

The next step is to create an empty MongoDB deployment on the destination cluster and restore the data into it. You can also use the procedure shown below with a MongoDB deployment in a separate namespace in the same cluster.

  1. Create a new MongoDB deployment. Replace the PASSWORD placeholder with the database administrator password and the REPOSITORY placeholder with a reference to your Tanzu Application Catalog chart repository.

    helm install mongodb-new REPOSITORY/mongodb \
        --namespace default \
        --set replicaSet.enabled=true \
        --set mongodbRootPassword=PASSWORD
    
  2. Create an environment variable with the password for the new deployment:

    export MONGODB_ROOT_PASSWORD=$(kubectl get secret --namespace default mongodb-new -o jsonpath="{.data.mongodb-root-password}" | base64 --decode)
    
  3. Forward the MongoDB service port for the new deployment and place the process in the background:

    kubectl port-forward --namespace default svc/mongodb-new 27017:27017 &
    
  4. Restore the contents of the backup into the new release using the mongorestore tool. If this tool is not available on your system, mount the directory containing the backup files as a volume in Tanzu Application Catalog's MongoDB Docker container and use the mongorestore client tool in the container image to import the backup into the new cluster, as shown below (replace the REGISTRY placeholder with your Tanzu Application Catalog container registry):

    cd mybackup
    docker run --rm --name mongodb -v $(pwd):/app --net="host" REGISTRY/mongodb:latest mongorestore -u root -p $MONGODB_ROOT_PASSWORD /app
    

    Here, the -v parameter mounts the current directory (containing the backup files) to the container's /app path. Then, the mongorestore client tool is used to connect to the new MongoDB service and restore the data from the original deployment. As before, the --rm parameter destroys the container after the command completes execution.

  5. Stop the service port forwarding by terminating the background process.

  6. Connect to the new deployment and confirm that your data has been successfully restored (replace the REGISTRY placeholder with your Tanzu Application Catalog container registry):

    kubectl run --namespace default mongodb-new-client --rm --tty -i --restart='Never' --image REGISTRY/mongodb:4.2.5-debian-10-r35 --command -- mongo mydb --host mongodb-new --authenticationDatabase admin -u root -p $MONGODB_ROOT_PASSWORD --eval "db.accounts.find()"
    

    Here is an example of what you should see:

    Query results

Method 2: Back up and restore persistent data volumes

This method involves copying the persistent data volume for the primary MongoDB node and reusing it in a new deployment with Velero, an open source Kubernetes backup/restore tool. This method is only suitable when:

  • Your cloud provider is supported by Velero.
  • Both the source and destination clusters are on the same cloud provider, since Velero does not natively support migrating persistent volumes across providers.

NOTE

For persistent volume migration across cloud providers with Velero, you have the option of using Velero's Restic integration. This integration is currently beta quality and is not covered in this guide.

Step 1: Install Velero on the source cluster

Velero is an open source tool that makes it easy to back up and restore Kubernetes resources. It can be used to back up an entire cluster or specific resources such as persistent volumes.

  1. Modify your context to reflect the source cluster (if not already done).

  2. Follow the Velero plugin setup instructions for your cloud provider. For example, if you are using Google Cloud Platform (as this guide does), follow the GCP plugin setup instructions to create a service account and storage bucket and obtain a credentials file.
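
    For reference, creating the storage bucket on GCP could look like the example below (an illustrative sketch assuming the Google Cloud SDK is installed and configured; the plugin instructions also cover creating the service account and credentials file):

    gsutil mb gs://BUCKET-NAME/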

  3. Install Velero on the source cluster by executing the command below. Remember to replace the BUCKET-NAME placeholder with the name of your storage bucket and the SECRET-FILENAME placeholder with the path to your credentials file:

    velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket BUCKET-NAME --secret-file SECRET-FILENAME
    

    You should see output similar to the screenshot below as Velero is installed:

    Velero installation

  4. Confirm that the Velero deployment is successful by checking for a running pod using the command below:

    kubectl get pods -n velero
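
    Alternatively, you can wait for the deployment to report as available (a convenience check, assuming a default installation in the velero namespace):

    kubectl wait --namespace velero --for=condition=Available deployment/velero --timeout=120s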
    

Step 2: Back up the MongoDB deployment on the source cluster

Next, back up the persistent volumes using Velero.

  1. Create a backup of the volumes in the running MongoDB deployment on the source cluster. This backup will contain both the primary and secondary node volumes.

    velero backup create mongo-backup --include-resources pvc,pv --selector release=mongodb
    
  2. Execute the command below to view the contents of the backup and confirm that it contains all the required resources:

    velero backup describe mongo-backup --details
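
    You can also list your backups to check their status; the backup should eventually show a Completed phase:

    velero backup get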
    
  3. To avoid the backup data being overwritten, switch the bucket to read-only access:

    kubectl patch backupstoragelocation default -n velero --type merge --patch '{"spec":{"accessMode":"ReadOnly"}}'
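
    To verify the change, read the access mode back from the backup storage location; it should now report ReadOnly:

    kubectl get backupstoragelocation default -n velero -o jsonpath='{.spec.accessMode}'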
    
  4. Obtain and note the replicaset key from the deployment:

    kubectl get secret --namespace default mongodb -o jsonpath="{.data.mongodb-replica-set-key}" | base64 --decode
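
    If you find it convenient, you can capture the key in an environment variable so that it is ready to use in the helm install command in the next step (an optional convenience, not required by this guide):

    export MONGODB_REPLICA_SET_KEY=$(kubectl get secret --namespace default mongodb -o jsonpath="{.data.mongodb-replica-set-key}" | base64 --decode)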
    

Step 3: Restore the MongoDB deployment on the destination cluster

You can now restore the persistent volumes and integrate them with a new MongoDB deployment on the destination cluster.

  1. Modify your context to reflect the destination cluster.

  2. Install Velero on the destination cluster as described in Step 1. Remember to use the same values for the BUCKET-NAME and SECRET-FILENAME placeholders as you did originally, so that Velero is able to access the previously saved backups.

    velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket BUCKET-NAME --secret-file SECRET-FILENAME
    
  3. Confirm that the Velero deployment is successful by checking for a running pod using the command below:

    kubectl get pods -n velero
    
  4. Use Velero to restore the persistent volumes into the same namespace they occupied on the source cluster.

    velero restore create --from-backup mongo-backup
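
    You can monitor the progress of the restore operation; its status should eventually change to Completed:

    velero restore get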
    
  5. Confirm that the persistent volume claims have been restored and note the name of the claim corresponding to the primary node:

    kubectl get pvc --namespace default
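
    With this chart, the claim for the primary node typically includes "primary" in its name (an assumption based on the chart's default naming, which may vary between chart versions), so you could filter for it as follows:

    kubectl get pvc --namespace default | grep primary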
    
  6. Delete the persistent volume claim corresponding to the secondary node and retain only the claim corresponding to the primary node. If there is more than one secondary claim (depending on how you originally deployed the chart), delete all of them.

    kubectl delete pvc --namespace default SECONDARY-PVC-NAME
    
  7. Create a new MongoDB deployment. Use the same name and namespace as the original deployment and use the chart's persistence.existingClaim parameter to attach the existing volume. Replace the PASSWORD placeholder with the same database administrator password used in the original release, the PRIMARY-PVC-NAME placeholder with the name of the restored primary node volume claim, the REPLICASET-KEY placeholder with the key obtained at the end of the previous step and the REPOSITORY placeholder with a reference to your Tanzu Application Catalog chart repository.

    helm install mongodb REPOSITORY/mongodb \
        --namespace default \
        --set replicaSet.enabled=true \
        --set mongodbRootPassword=PASSWORD \
        --set persistence.existingClaim=PRIMARY-PVC-NAME \
        --set replicaSet.key=REPLICASET-KEY
    

    NOTE: It is important to create the new deployment on the destination cluster using the same namespace, deployment name, credentials and replicaset key as the original deployment on the source cluster.

    This will create a new deployment that uses the original primary node volume (and hence the original data). Note that if replication is enabled, as in the example above, installing the chart will automatically create a new volume for each secondary node.

  8. Connect to the new deployment and confirm that your original data is intact (replace the REGISTRY placeholder with your Tanzu Application Catalog container registry):

    export MONGODB_ROOT_PASSWORD=$(kubectl get secret --namespace default mongodb -o jsonpath="{.data.mongodb-root-password}" | base64 --decode)
    kubectl run --namespace default mongodb-client --rm --tty -i --restart='Never' --image REGISTRY/mongodb:4.2.5-debian-10-r35 --command -- mongo mydb --host mongodb --authenticationDatabase admin -u root -p $MONGODB_ROOT_PASSWORD --eval "db.accounts.find()"
    

    Here is an example of what you should see:

    Query results
