The Apache Cassandra Helm chart from VMware Tanzu Application Catalog (Tanzu Application Catalog) makes it easy to deploy a scalable Apache Cassandra database cluster on Kubernetes. The chart follows current best practices and can be upgraded easily, so that you always have the latest fixes and security updates.
Once the database cluster is deployed and in use, it’s necessary to put a data backup/restore strategy in place. This backup/restore strategy is needed for many operational scenarios, including disaster recovery planning, off-site data analysis or application load testing.
This guide explains how to back up and restore an Apache Cassandra deployment on Kubernetes using Velero, an open-source Kubernetes backup/restore tool.
This guide makes the following assumptions:
You have two Kubernetes clusters, a source cluster and a destination cluster, each with kubectl and Helm v3 installed. This guide uses Google Kubernetes Engine (GKE) clusters, but you can use any other Kubernetes provider. Learn how to install kubectl and Helm v3.x.
You have previously deployed the Tanzu Application Catalog Apache Cassandra Helm chart on the source cluster and added some data to it. Example command sequences to perform these tasks are shown below, where the PASSWORD placeholder refers to the database administrator password and the cluster is deployed with 3 replicas. Replace the REPOSITORY and REGISTRY placeholders with references to your Tanzu Application Catalog chart repository and container registry.
helm install cassandra REPOSITORY/cassandra \
--set replicaCount=3 \
--set cluster.seedCount=2 \
--set dbUser.user=admin \
--set dbUser.password=PASSWORD \
--set cluster.minimumAvailable=2
kubectl run --namespace default cassandra-client --rm --tty -i --restart='Never' --env CASSANDRA_PASSWORD=PASSWORD --image REGISTRY/cassandra:3.11.8-debian-10-r20 -- bash
cqlsh -u admin -p $CASSANDRA_PASSWORD cassandra
CREATE KEYSPACE IF NOT EXISTS test WITH REPLICATION= {'class': 'SimpleStrategy', 'replication_factor': '2' } ;
USE test;
CREATE TABLE items (id UUID PRIMARY KEY, name TEXT);
INSERT INTO items (id, name) VALUES (now(), 'milk');
INSERT INTO items (id, name) VALUES (now(), 'eggs');
exit
The Kubernetes provider is supported by Velero.
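The tooling assumptions above can be checked before you start. The following is a minimal sketch, not part of any of the tools themselves: the helper name check_tools is hypothetical, and it only verifies that the named commands are on the PATH, not that their versions are suitable.

```shell
# check_tools: print each named command that is not found on the PATH.
# Returns 0 when every command is available, 1 otherwise.
# The function name is illustrative; it is not part of kubectl, Helm or Velero.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_tools kubectl helm velero
```

Run it on the workstation you use to manage both clusters, since the same three commands are needed for the source and the destination steps.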
NOTE: For persistent volume migration across cloud providers with Velero, you have the option of using Velero’s Restic integration. This integration is not covered in this guide.
Velero is an open source tool that makes it easy to back up and restore Kubernetes resources. It can be used to back up an entire cluster or specific resources such as persistent volumes.
Then, install Velero on the source cluster by executing the command below, remembering to replace the BUCKET-NAME placeholder with the name of your storage bucket and the SECRET-FILENAME placeholder with the path to your credentials file:
velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket BUCKET-NAME --secret-file SECRET-FILENAME
Velero prints status messages to the console as it creates its resources.
Confirm that the Velero deployment is successful by checking for a running pod using the command below:
kubectl get pods -n velero
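Instead of checking the pod list by hand, you can wait for the deployment to finish rolling out. This is a sketch that assumes the default namespace and deployment name (both "velero") created by velero install; the wrapper name wait_for_velero is hypothetical.

```shell
# wait_for_velero: block until the Velero deployment reports a completed rollout,
# or fail once the timeout (default 120s) expires.
# Assumes the default "velero" namespace and deployment name used by `velero install`.
wait_for_velero() {
  timeout="${1:-120s}"
  kubectl rollout status deployment/velero --namespace velero --timeout="$timeout"
}

# Example: wait_for_velero 300s
```

A non-zero exit status means the deployment did not become available in time, which is usually worth investigating with kubectl describe before continuing.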
The next step involves using Velero to copy the persistent data volumes for the Apache Cassandra pods. These copied data volumes can then be reused in a new deployment.
Create a backup of the volumes in the running Apache Cassandra deployment on the source cluster. Because the selector matches every pod in the release, this backup will contain the data volumes of all the Cassandra nodes.
velero backup create cassandra-backup --include-resources=pvc,pv --selector app.kubernetes.io/instance=cassandra
Execute the command below to view the contents of the backup and confirm that it contains all the required resources:
velero backup describe cassandra-backup --details
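The backup runs asynchronously, so it can help to confirm that it has finished before moving on. The sketch below reads the phase from the Backup resource that Velero stores in its namespace; it assumes the default "velero" namespace, and the helper names backup_phase and backup_completed are hypothetical. Alternatively, velero backup create accepts a --wait flag that blocks until the backup ends.

```shell
# backup_phase: print the phase recorded on a Velero Backup resource.
# Assumes Velero runs in its default "velero" namespace.
backup_phase() {
  kubectl get backups.velero.io "$1" --namespace velero -o jsonpath='{.status.phase}'
}

# backup_completed: succeed only when the named backup has phase "Completed".
backup_completed() {
  phase=$(backup_phase "$1") || return 1
  if [ "$phase" = "Completed" ]; then
    return 0
  fi
  echo "backup $1 is in phase: ${phase:-unknown}" >&2
  return 1
}

# Example: backup_completed cassandra-backup
```

Any phase other than Completed, such as PartiallyFailed, suggests reviewing the output of velero backup describe before relying on the backup.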
To avoid the backup data being overwritten, switch the backup storage location to read-only access:
kubectl patch backupstoragelocation default -n velero --type merge --patch '{"spec":{"accessMode":"ReadOnly"}}'
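The access mode can be switched back later with the same kind of patch, for example if you want to take further backups to this location. The sketch below wraps the patch in two hypothetical helpers (bsl_patch and set_bsl_access_mode); it assumes the storage location is named "default" in the "velero" namespace, as in the command above.

```shell
# bsl_patch: print the JSON merge patch that sets a backup storage location's access mode.
# Velero accepts "ReadOnly" and "ReadWrite"; anything else is rejected here.
bsl_patch() {
  case "$1" in
    ReadOnly|ReadWrite)
      printf '{"spec":{"accessMode":"%s"}}\n' "$1"
      ;;
    *)
      echo "unsupported access mode: $1" >&2
      return 1
      ;;
  esac
}

# set_bsl_access_mode: apply the patch to the "default" location in the velero namespace.
set_bsl_access_mode() {
  patch=$(bsl_patch "$1") || return 1
  kubectl patch backupstoragelocation default --namespace velero --type merge --patch "$patch"
}

# Example: set_bsl_access_mode ReadWrite
```

Validating the mode before calling kubectl keeps a typo such as "readonly" from reaching the cluster.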
You can now restore the persistent volumes and integrate them with a new Apache Cassandra deployment on the destination cluster.
Install Velero on the destination cluster as described in Step 1. Remember to use the same values for the BUCKET-NAME and SECRET-FILENAME placeholders as you did originally, so that Velero is able to access the previously-saved backups.
velero install --provider gcp --plugins velero/velero-plugin-for-gcp:v1.0.0 --bucket BUCKET-NAME --secret-file SECRET-FILENAME
Confirm that the Velero deployment is successful by checking for a running pod using the command below:
kubectl get pods -n velero
Restore the persistent volumes in the same namespace as the source cluster using Velero.
velero restore create --from-backup cassandra-backup
Confirm that the persistent volumes have been restored:
kubectl get pvc
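To check the restored claims without reading the table by eye, you can list each claim's phase and filter out the ones that are already bound. This is a sketch under assumptions: it relies on the restored claims keeping the app.kubernetes.io/instance=cassandra label used when the backup was created, and the helper names list_pvc_phases and unbound_pvcs are hypothetical. Depending on the storage class, a claim may stay Pending until a pod uses it, so treat the result as a hint rather than a failure.

```shell
# list_pvc_phases: print "<name> <phase>" for every PVC carrying the release label.
# The namespace defaults to "default", matching the deployment in this guide.
list_pvc_phases() {
  kubectl get pvc --namespace "${1:-default}" \
    --selector app.kubernetes.io/instance=cassandra \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}'
}

# unbound_pvcs: read "<name> <phase>" lines on stdin and print the names
# of claims whose phase is anything other than Bound.
unbound_pvcs() {
  awk 'NF && $2 != "Bound" { print $1 }'
}

# Example: list_pvc_phases default | unbound_pvcs
```

Empty output means every matching claim is bound to a volume.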
Create a new Apache Cassandra deployment. Use the same name, namespace and cluster topology as the original deployment. Replace the PASSWORD placeholder with the same database administrator password used in the original deployment and the REPOSITORY placeholder with a reference to your Tanzu Application Catalog chart repository.
helm install cassandra REPOSITORY/cassandra \
--set replicaCount=3 \
--set cluster.seedCount=2 \
--set dbUser.user=admin \
--set dbUser.password=PASSWORD \
--set cluster.minimumAvailable=2
NOTE: The deployment command shown above is only an example. It is important to create the new deployment on the destination cluster using the same namespace, deployment name, credentials and cluster topology as the original deployment on the source cluster.
This will create a new deployment that uses the original pod volumes (and hence the original data).
Connect to the new deployment and confirm that your original data is intact using a query like the example shown below. Replace the PASSWORD placeholder with the database administrator password and the REGISTRY placeholder with a reference to your Tanzu Application Catalog container registry.
kubectl run --namespace default cassandra-client --rm --tty -i --restart='Never' --env CASSANDRA_PASSWORD=PASSWORD --image REGISTRY/cassandra:3.11.8-debian-10-r20 -- bash
cqlsh -u admin -p $CASSANDRA_PASSWORD cassandra
USE test;
SELECT * FROM items;
The query should return the two rows added earlier, milk and eggs, which confirms that the original data is intact.
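The same check can be scripted from inside the client pod by running the query non-interactively with the -e option of cqlsh. The sketch below assumes the admin user, the cassandra service name and the CASSANDRA_PASSWORD variable used elsewhere in this guide; the helper names query_item_names and has_expected_items are hypothetical.

```shell
# query_item_names: run the SELECT non-interactively with cqlsh's -e option.
# Assumes CASSANDRA_PASSWORD is set, as in the client pod used in this guide.
query_item_names() {
  cqlsh -u admin -p "$CASSANDRA_PASSWORD" cassandra -e "SELECT name FROM test.items;"
}

# has_expected_items: read query output on stdin and succeed only if both
# rows inserted on the source cluster ("milk" and "eggs") appear in it.
has_expected_items() {
  output=$(cat)
  for name in milk eggs; do
    if ! printf '%s\n' "$output" | grep -qw "$name"; then
      echo "missing row: $name" >&2
      return 1
    fi
  done
  return 0
}

# Example: query_item_names | has_expected_items && echo "data intact"
```

If you added different data on the source cluster, replace the two names in the loop with values from your own rows.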