This topic describes how to back up and restore VMware Tanzu GemFire for Kubernetes.

Prerequisite

A Tanzu GemFire cluster that has data stored in persistent replicate or persistent partition regions.
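
For example, a persistent region can be created from gfsh as shown in the following sketch; the region names are illustrative, and your application's regions may instead be defined through cluster configuration or the API:

    gfsh>create region --name=example-partition-region --type=PARTITION_PERSISTENT
    gfsh>create region --name=example-replicate-region --type=REPLICATE_PERSISTENT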

Process Overview

  1. Back Up Cluster: Use gfsh to run a backup on the Tanzu GemFire cluster to a target directory on a mounted volume.

  2. Copy Data Out: Copy the backup data from the mounted volume to a local directory or other location unaffected by Kubernetes cluster failures.

  3. Create New Volumes: Create volumes that can be mounted by the Tanzu GemFire cluster.

  4. Copy Data In and Mount Volumes: Copy the backup data from the local directory to the new volumes. Mount the volumes so that you can run the restore.sh script on each volume, which restores the data to its state at the time of the backup.

  5. Start New Cluster: Start a Tanzu GemFire cluster that uses the restored volumes attached to its locators and servers.

The examples below work with a TLS-enabled Tanzu GemFire cluster with two locators and three servers, running on Google Kubernetes Engine (GKE).

kubectl get gemfireclusters --all-namespaces
NAMESPACE                                                  NAME                  LOCATORS   SERVERS   CLUSTER IMAGE                                                         OPERATOR VERSION
gemfire-system-test-6260d8f3-5c20-4d9b-b878-48bfaf42cea8   system-test-gemfire   2/2        3/3       registry.tanzu.vmware.com/pivotal-gemfire/vmware-gemfire:9.15.0               2.0.0

Back Up Tanzu GemFire Cluster

Use gfsh to run a backup on the Tanzu GemFire cluster to a target directory on a mounted persistent volume as follows.
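
The commands in this topic assume that the NAMESPACE_NAME environment variable holds the namespace of your Tanzu GemFire cluster and that kubectl targets the correct Kubernetes cluster. For example, using the namespace from the example output above (substitute your own):

    export NAMESPACE_NAME=gemfire-system-test-6260d8f3-5c20-4d9b-b878-48bfaf42cea8
    kubectl get pods -n ${NAMESPACE_NAME}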

  1. Shell into the locator pod:

    kubectl exec -it LOCATOR-POD-NAME -n ${NAMESPACE_NAME} -- sh
    

    Where LOCATOR-POD-NAME is the name of a locator pod, as listed by the kubectl get pods command, and NAMESPACE_NAME is an environment variable whose value is your chosen name for the Tanzu GemFire cluster namespace.

  2. Within the shell on the locator, retrieve the locator’s fully qualified domain name.

    hostname -f
    
  3. Launch gfsh.

    gfsh
    
  4. Connect to the cluster with gfsh:

    gfsh> connect --locator=<LOCATOR-FQDN>[10334] --security-properties-file=/security/gfsecurity.properties
    
  5. Confirm the presence of persistent regions with data using gfsh:

    gfsh>list regions
    List of regions
    ----------------------------------------------------------
    system-test-client-7f4475f66-cxqwr-region-2b6b4ddf141dce44
    system-test-client-7f4475f66-cxqwr-region-6c4ac1be94dfc5da
    system-test-client-7f4475f66-f9knw-region-15984b2673b8251a
    system-test-client-7f4475f66-f9knw-region-1c88a17ccea4a17c
    system-test-client-7f4475f66-ms6q7-region-1aaff046364ca9a2
    system-test-client-7f4475f66-ms6q7-region-a71eeb50f734d120
    system-test-client-7f4475f66-nvgcb-region-8dad70a328d45880
    system-test-client-7f4475f66-nvgcb-region-e98ec4bb28f87aa7
    system-test-client-7f4475f66-pxtcb-region-3afd5eef182258de
    system-test-client-7f4475f66-pxtcb-region-c1f5abdafd699866
    
    gfsh>describe region --name=system-test-client-7f4475f66-cxqwr-region-2b6b4ddf141dce44
    Name            : system-test-client-7f4475f66-cxqwr-region-2b6b4ddf141dce44
    Data Policy     : persistent partition
    Hosting Members : system-test-gemfire-server-1
                      system-test-gemfire-server-2
                      system-test-gemfire-server-0
    
    Non-Default Attributes Shared By Hosting Members
    
      Type    |       Name       | Value
    --------- | ---------------- | --------------------
    Region    | size             | 820
              | data-policy      | PERSISTENT_PARTITION
    Partition | redundant-copies | 1
    
  6. Create backup files for each member of the cluster. By default, each GemFire member mounts a persistent volume at /data, so the command gfsh>backup disk-store --dir=/data/backup creates the backup files under /data/backup. A sketch for verifying the backup files from outside the pods follows this procedure.

    gfsh>backup disk-store --dir=/data/backup
    
    The following disk stores were backed up successfully
    
               Member             |                 UUID                 |                     Directory                     | Host
    ----------------------------- | ------------------------------------ | ------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------
    system-test-gemfire-server-0  | c0a27f6c-8403-43c1-976d-56ef85fa464e | /data/.                                           | system-test-gemfire-server-0.system-test-gemfire-server.gemfire-system-test-6260d8f3-5c20-4d9b-b878-48bfaf42cea8...
                                  | a3c4aa0d-eaae-4773-96f6-fbbc75cbef31 | /data/pdxmetadata                                 | system-test-gemfire-server-0.system-test-gemfire-server.gemfire-system-test-6260d8f3-5c20-4d9b-b878-48bfaf42cea8...
    system-test-gemfire-locator-0 | 1fb7bcfa-a467-497a-8338-2cd92ff59d4c | /data/ConfigDiskDir_system-test-gemfire-locator-0 | system-test-gemfire-locator-0.system-test-gemfire-locator.gemfire-system-test-6260d8f3-5c20-4d9b-b878-48bfaf42cea..
    system-test-gemfire-server-1  | 032b5ed6-6eac-43ee-97ac-6f6d5aca553a | /data/pdxmetadata                                 | system-test-gemfire-server-1.system-test-gemfire-server.gemfire-system-test-6260d8f3-5c20-4d9b-b878-48bfaf42cea8...
                                  | 484d933f-a46b-4e39-acdb-fb7c29beffe9 | /data/.                                           | system-test-gemfire-server-1.system-test-gemfire-server.gemfire-system-test-6260d8f3-5c20-4d9b-b878-48bfaf42cea8...
    system-test-gemfire-locator-1 | 6a9c9ee6-5f6d-4170-91bd-84c943d6c7e0 | /data/ConfigDiskDir_system-test-gemfire-locator-1 | system-test-gemfire-locator-1.system-test-gemfire-locator.gemfire-system-test-6260d8f3-5c20-4d9b-b878-48bfaf42cea..
    system-test-gemfire-server-2  | 7714dfb4-6019-40c0-a585-6f053820b9c3 | /data/.                                           | system-test-gemfire-server-2.system-test-gemfire-server.gemfire-system-test-6260d8f3-5c20-4d9b-b878-48bfaf42cea8...
                                  | b572e497-3b2e-445c-9280-775dceaaa9c8 | /data/pdxmetadata                                 | system-test-gemfire-server-2.system-test-gemfire-server.gemfire-system-test-6260d8f3-5c20-4d9b-b878-48bfaf42cea8...
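
To confirm from outside the pods that the backup created files under /data/backup on each member, you can list the directory with kubectl exec; the pod name shown below is from the example cluster and is illustrative:

    kubectl exec -n ${NAMESPACE_NAME} system-test-gemfire-server-0 -- ls /data/backup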
    

Copy Data Out

Copy the backup data from the mounted volume to a local directory or other location unaffected by Kubernetes cluster failures as follows. The instructions below copy the backup data to the local directory ./mybackupdata.
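
Depending on your kubectl version, kubectl cp may not create missing parent directories on the local side, so you may want to pre-create the target directory:

    mkdir -p ./mybackupdata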

  1. Create a list of locator pod names:

    locatorPodList=$(kubectl get pods -l 'app.kubernetes.io/component=gemfire-locator' --no-headers -o custom-columns=":metadata.name" --all-namespaces)
    
  2. Create a list of server pod names:

    serverPodList=$(kubectl get pods -l 'app.kubernetes.io/component=gemfire-server' --no-headers -o custom-columns=":metadata.name" --all-namespaces)
    
  3. Copy locator backup data to a local directory or other location unaffected by Kubernetes cluster failures. The following command copies locator backup data to a local directory named ./mybackupdata:

    for i in ${locatorPodList[@]}; do kubectl cp $i:/data/backup/ ./mybackupdata/$i -n ${NAMESPACE_NAME}; done
    
  4. Copy server backup data to a local directory or other location unaffected by Kubernetes cluster failures. The following command copies server backup data to a local directory named ./mybackupdata:

    for i in ${serverPodList[@]}; do kubectl cp $i:/data/backup/ ./mybackupdata/$i -n ${NAMESPACE_NAME}; done
    
  5. Verify that local copies of the backup data exist (a further check of the backup contents follows this procedure):

    ls -la mybackupdata/
    total 0
    drwxr-xr-x   4 system-test-gemfire-server-2
    drwxr-xr-x   7 .
    drwxr-xr-x   4 system-test-gemfire-server-1
    drwxr-xr-x   4 system-test-gemfire-server-0
    drwxr-xr-x   4 system-test-gemfire-locator-1
    drwxr-xr-x   4 system-test-gemfire-locator-0
    drwxr-xr-x  44 ..
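
As an additional check of the backup contents, you can confirm that each member's backup includes a restore.sh script:

    find ./mybackupdata -name restore.sh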
    

Create New Volumes

Create volumes that can be mounted on Tanzu GemFire cluster members as follows. Because we must create several PVCs, one for each member of the cluster, we will use a placeholder for the PVC name, then use sed to substitute in the appropriate name.

  1. Create a YAML file named locatorpvc.yaml for the locators:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      labels:
        gemfire.vmware.com/app: system-test-gemfire-locator
      name: data-POD_NAME_PLACEHOLDER
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi
      storageClassName: standard
      volumeMode: Filesystem
    
  2. Create a YAML file named serverpvc.yaml for the servers:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      labels:
        gemfire.vmware.com/app: system-test-gemfire-server
      name: data-POD_NAME_PLACEHOLDER
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 4Gi
      storageClassName: standard
      volumeMode: Filesystem
    

    Note: The label gemfire.vmware.com/app has a different value for locators and servers. This label, together with the PVC name (data- followed by the pod name), associates each PVC with the pod that mounts it at /data under the volume id data. In addition, servers are assigned more storage than locators (4Gi vs. 2Gi).

  3. Create the locator PVCs:

    for i in ${locatorPodList[@]}; do sed -E "s/POD_NAME_PLACEHOLDER/${i}/g" locatorpvc.yaml | kubectl -n ${NAMESPACE_NAME} apply -f -; done
    
  4. Create the server PVCs (a sketch for verifying the new PVCs follows this procedure):

    for i in ${serverPodList[@]}; do sed -E "s/POD_NAME_PLACEHOLDER/${i}/g" serverpvc.yaml | kubectl -n ${NAMESPACE_NAME} apply -f -; done
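
To verify that the new PVCs exist, list them by the labels used in the YAML files above:

    kubectl get pvc -n ${NAMESPACE_NAME} -l gemfire.vmware.com/app=system-test-gemfire-locator
    kubectl get pvc -n ${NAMESPACE_NAME} -l gemfire.vmware.com/app=system-test-gemfire-server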
    

Copy Data In and Mount Volumes

To copy the backup data from the local directory to the new PVCs, we create minimal pods that mount the PVCs, then use kubectl cp to transfer the files.

We define a pod YAML file with placeholders for the pod name and the claimName. These placeholders are substituted during creation. The claimName is the PVC name that corresponds to a given GemFire member. For example, the restore pod for the locator my-gemfire-cluster-locator-0 mounts the PVC named data-my-gemfire-cluster-locator-0.

  1. Create a YAML file named restorepod.yaml.

    apiVersion: v1
    kind: Pod
    metadata:
      name: restore-POD_NAME_PLACEHOLDER
    spec:
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: data-POD_NAME_PLACEHOLDER
      containers:
      - image: alpine:latest
        command:
          - sleep
          - "360000"
        imagePullPolicy: IfNotPresent
        name: restore
        volumeMounts:
        - mountPath: "/data"
          name: data
      restartPolicy: Always
    
  2. Create a restore pod for each GemFire member using the locator and server pod name lists that we created in Copy Data Out.

    for i in ${locatorPodList[@]}; do sed -E "s/POD_NAME_PLACEHOLDER/${i}/g" restorepod.yaml | kubectl -n ${NAMESPACE_NAME} apply -f -; done
    
    for i in ${serverPodList[@]}; do sed -E "s/POD_NAME_PLACEHOLDER/${i}/g" restorepod.yaml | kubectl -n ${NAMESPACE_NAME} apply -f -; done
    
  3. With a restore pod running for each GemFire member (a sketch for waiting until the pods are Ready follows this procedure), copy the backup files from the local directory or other location where you placed the data in Copy Data Out. The following commands copy locator and server backup data from a local directory named ./mybackupdata to each restore pod:

    for i in ${locatorPodList[@]}; do kubectl cp ./mybackupdata/$i restore-$i:/data/backup/ -n ${NAMESPACE_NAME}; done
    
    for i in ${serverPodList[@]}; do kubectl cp ./mybackupdata/$i restore-$i:/data/backup/ -n ${NAMESPACE_NAME}; done
    
  4. Within the running restore pods, confirm the presence of the backup files and restore script. For example:

    ls data/backup/2022-05-05-21-27-37/system_test_gemfire_locator_0_system_test_gemfire_locator_0_1_locator_ec_v0_41000/
    README_FILE.txt  config           diskstores       restore.sh       user
    
    cat data/backup/2022-05-05-21-27-37/system_test_gemfire_locator_0_system_test_gemfire_locator_0_1_locator_ec_v0_41000/restore.sh
    #!/bin/bash -e
    cd `dirname $0`
    
    # Restore a backup of GemFire persistent data to the location it was backed up
    # from. This script will refuse to restore if the original data still exists.
    # This script was automatically generated by the GemFire backup utility.
    
    # Test for existing originals. If they exist, do not restore the backup.
    test -e '/data/ConfigDiskDir_system-test-gemfire-locator-0/BACKUPcluster_config.if' && echo 'Backup not restored. Refusing to overwrite /data/ConfigDiskDir_system-test-gemfire-locator-0/BACKUPcluster_config.if' && exit 1
    
    # Restore data
    mkdir -p '/data/ConfigDiskDir_system-test-gemfire-locator-0'
    cp -rp 'diskstores/cluster_config_1fb7bcfaa467497a-83382cd92ff59d4c/dir0'/* '/data/ConfigDiskDir_system-test-gemfire-locator-0'
    
  5. Install bash on each restore pod. The Alpine image does not include /bin/bash by default:

    for i in ${locatorPodList[@]}; do kubectl exec -it -n ${NAMESPACE_NAME} restore-$i -- sh -c "apk add --no-cache bash"; done
    
    for i in ${serverPodList[@]}; do kubectl exec -it -n ${NAMESPACE_NAME} restore-$i -- sh -c "apk add --no-cache bash"; done
    
  6. Make the restore.sh script on each locator and server pod executable:

    for i in ${locatorPodList[@]}; do kubectl exec -it -n ${NAMESPACE_NAME} restore-$i -- sh -c "chmod +x /data/backup/*/*/restore.sh"; done
    
    for i in ${serverPodList[@]}; do kubectl exec -it -n ${NAMESPACE_NAME} restore-$i -- sh -c "chmod +x /data/backup/*/*/restore.sh"; done
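
The following sketch, referenced in step 3 above, waits until each restore pod is Ready before you copy data into it; it assumes the restore- naming used in step 2:

    for i in ${locatorPodList[@]} ${serverPodList[@]}; do kubectl wait --for=condition=Ready pod/restore-$i -n ${NAMESPACE_NAME} --timeout=120s; done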
    

Start New Cluster

Start a Tanzu GemFire cluster that uses the restored volumes attached to its locators and servers.
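
The new cluster must use the same cluster name, locator count, and server count as the original so that its pods mount the restored PVCs (named data- followed by the pod name). A minimal GemFireCluster manifest might look like the following sketch; the field layout reflects common usage of the Tanzu GemFire operator, the name and image values come from the example cluster above, and TLS and other settings from the original cluster are omitted here, so consult your operator's reference for the authoritative spec:

    apiVersion: gemfire.vmware.com/v1
    kind: GemFireCluster
    metadata:
      name: system-test-gemfire
    spec:
      image: registry.tanzu.vmware.com/pivotal-gemfire/vmware-gemfire:9.15.0
      locators:
        replicas: 2
      servers:
        replicas: 3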

  1. Create empty locator view files for all locators except locator 0; otherwise, the cluster does not start up properly from the backup:

    for i in ${locatorPodList[@]}; do if [ "${i: -1}" -ne 0 ]; then kubectl exec -it -n ${NAMESPACE_NAME} restore-$i -- sh -c "touch /data/locator10334view.dat"; fi; done
    
  2. Run the restore scripts on all locator restore pods. This places the backup files in their expected locations:

    for i in ${locatorPodList[@]}; do kubectl exec -it -n ${NAMESPACE_NAME} restore-$i -- /bin/bash -c "./data/backup/*/*/restore.sh"; done
    
  3. Run the restore scripts on all server restore pods. This places the backup files in their expected locations:

    for i in ${serverPodList[@]}; do kubectl exec -it -n ${NAMESPACE_NAME} restore-$i -- /bin/bash -c "./data/backup/*/*/restore.sh"; done
    
  4. After the restore scripts have finished running, delete the restore pods (a sketch follows this procedure).

  5. Launch a new Tanzu GemFire cluster and inspect the regions:

    1. Shell into the locator pod:

      kubectl exec -it LOCATOR-POD-NAME -n ${NAMESPACE_NAME} -- sh
      

      where LOCATOR-POD-NAME is the name of a locator, as listed in the kubectl get pods command, and NAMESPACE_NAME is an environment variable whose value is your chosen name for the Tanzu GemFire cluster namespace.

    2. Within the shell on the locator, retrieve the locator’s fully qualified domain name.

      hostname -f
      
    3. Launch gfsh.

      gfsh
      
    4. Connect to the cluster with gfsh:

      gfsh> connect --locator=<LOCATOR-FQDN>[10334] --security-properties-file=/security/gfsecurity.properties
      
    5. Confirm the presence of the recovered regions:

      gfsh>list regions
      List of regions
      ----------------------------------------------------------
      system-test-client-7f4475f66-cxqwr-region-2b6b4ddf141dce44
      system-test-client-7f4475f66-cxqwr-region-6c4ac1be94dfc5da
      system-test-client-7f4475f66-f9knw-region-15984b2673b8251a
      system-test-client-7f4475f66-f9knw-region-1c88a17ccea4a17c
      system-test-client-7f4475f66-ms6q7-region-1aaff046364ca9a2
      system-test-client-7f4475f66-ms6q7-region-a71eeb50f734d120
      system-test-client-7f4475f66-nvgcb-region-8dad70a328d45880
      system-test-client-7f4475f66-nvgcb-region-e98ec4bb28f87aa7
      system-test-client-7f4475f66-pxtcb-region-3afd5eef182258de
      system-test-client-7f4475f66-pxtcb-region-c1f5abdafd699866
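
The following sketch, referenced in step 4 above, deletes the restore pods once the restore scripts have completed:

    for i in ${locatorPodList[@]} ${serverPodList[@]}; do kubectl delete pod restore-$i -n ${NAMESPACE_NAME}; done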
      