Backup Replica Nodes with RocksDB Checkpoint on vSphere

Replica node RocksDB checkpoint-based backup can be used for operational recovery, creating a blockchain, disaster recovery, or auditing.

The RocksDB checkpoint-based backup does not require a maintenance window. You can create a backup when the system is running.

Note:

You can schedule either a RocksDB checkpoint-based backup or intermittent backup. You cannot configure both types of backup processes to run simultaneously.

A RocksDB checkpoint-based backup is enabled by default on all the Replica nodes with the following parameter values:

"NUM_DB_SNAPSHOTS": "2",
"DB_SNAPSHOT_INTERVAL_HOURS": "6"

Before deployment, you can configure these parameter values by updating them in the infrastructure_descriptor.json file. For details, see the Advanced Features Parameters section in Configuring the Infrastructure Descriptor Parameters on vSphere.
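As a quick sanity check, the default interval of 6 hours equals 21600 seconds, which is the same number as the db_checkpoint_duration default listed in the procedure. The following shell sketch does the conversion. The function name hours_to_seconds is hypothetical, and the mapping between DB_SNAPSHOT_INTERVAL_HOURS and db_checkpoint_duration is an assumption based on the matching default values.

```shell
#!/bin/sh
# Hypothetical helper (not part of VMware Blockchain):
# convert a DB_SNAPSHOT_INTERVAL_HOURS value to seconds.
hours_to_seconds() {
    echo $(( $1 * 3600 ))
}

# Default interval of 6 hours
hours_to_seconds 6   # prints 21600
```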

You can also create an on-demand RocksDB checkpoint-based backup using the operator container.

Prerequisites

Verify that you have the deployed blockchain ID information.
Familiarize yourself with the backup and restore considerations for your VMware Blockchain nodes. See VMware Blockchain Node Backup and Restore Considerations on vSphere.
Verify that you have access to the latest version of VMware Blockchain.
Verify that you have captured the IP addresses of all the Client and Replica node VMs and have access to them. You can find the information in the VMware Blockchain Orchestrator descriptor file.

Procedure

After deployment, validate that the default or customized values appear in the deployment configuration file.
```
cat /config/concord/config-local/deployment.config | grep FEATURE_db_checkpoints
FEATURE_db_checkpoints: true
```
The Concord container creates database checkpoints when FEATURE_db_checkpoints is set to True. The default configuration values are listed here.
- FEATURE_db_checkpoints - True
- db_checkpoint_duration - 21600
- num_of_rocksdbcheckpoints - 2
- dbcheckpoint_window - 30000
- db_checkpoint_diskspace_threshold - 0.5
- db_checkpoint_path - "rocksdbdata/checkpoints"
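The db_checkpoint_duration parameter sets the minimum time, in seconds, between database checkpoints. With the default values, a database checkpoint is created when more than 21600 seconds have passed since the last database checkpoint and 30000 or more sequence numbers have been executed since the last database checkpoint.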

The db_checkpoint_diskspace_threshold parameter defines the threshold factor used to detect whether RocksDB checkpoint disk usage is within the defined limit. With the default value, the free disk space available to the Concord container must not drop below 0.5 times the RocksDB size.

The db_checkpoint_path parameter defines the path in the file system where database checkpoints are created.
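The conditions described above can be sketched as a small shell function. This is an illustration of the documented defaults only, not the Concord implementation. The function name should_checkpoint and its arguments are hypothetical, and the sketch assumes that the disk space check gates checkpoint creation. The 0.5 threshold is expressed as 50 percent so that the comparison works with integer arithmetic.

```shell
#!/bin/sh
# Illustrative only: mirrors the documented default conditions,
# not the actual Concord code.
DB_CHECKPOINT_DURATION=21600      # seconds (db_checkpoint_duration)
DBCHECKPOINT_WINDOW=30000         # sequence numbers (dbcheckpoint_window)
DISKSPACE_THRESHOLD_PERCENT=50    # db_checkpoint_diskspace_threshold 0.5 as a percentage

# should_checkpoint ELAPSED_SECONDS SEQNUMS_EXECUTED FREE_SPACE ROCKSDB_SIZE
# FREE_SPACE and ROCKSDB_SIZE must use the same unit, for example GB.
# Returns 0 when every condition holds, 1 otherwise.
should_checkpoint() {
    elapsed=$1; seqnums=$2; free=$3; dbsize=$4
    # More than db_checkpoint_duration seconds since the last checkpoint
    [ "$elapsed" -gt "$DB_CHECKPOINT_DURATION" ] || return 1
    # dbcheckpoint_window or more sequence numbers executed since then
    [ "$seqnums" -ge "$DBCHECKPOINT_WINDOW" ] || return 1
    # Free space must not drop below threshold * RocksDB size
    [ $(( free * 100 )) -ge $(( dbsize * DISKSPACE_THRESHOLD_PERCENT )) ] || return 1
    return 0
}

# Example: 22000 s elapsed, 30000 sequence numbers, 60 GB free, 100 GB RocksDB
if should_checkpoint 22000 30000 60 100; then
    echo "checkpoint"
else
    echo "skip"
fi
```

With the example values, all three conditions hold, so the sketch prints checkpoint. Lowering the free space to 40 GB would make it print skip.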

(Optional) Instantiate the operator container to create an on-demand database checkpoint.

You can use only one Client node, which must be part of the original blockchain, to run the operator container.

SSH into a Client node.
Verify that the Client node has an operator container image and identify the operator image ID.
```
sudo docker images | grep "operator"
```
Verify that the operator container configuration file has content.
```
sudo cat /config/daml-ledger-api/concord-operator/operator.config
```
For deployments with unencrypted configuration, copy the private key content into the following location.
```
sudo vi /config/daml-ledger-api/concord-operator/operator_priv.pem
```

For deployments with encrypted configuration, deploy the encrypted operator private key content into the following location.

```
sudo docker run -ti --network=blockchain-fabric --name=operator --entrypoint /operator/install_private_key.py --rm -v /config/daml-ledger-api/concord-operator:/operator/config-local -v /config/daml-ledger-api/concord-operator:/concord/config-public -v /config/clientservice/cert:/config/clientservice/cert -v /config/daml-ledger-api/config-public:/operator/config-public <Operator_IMAGE_ID>
```

Launch the Client node operator container.

```
sudo docker run -d --network=blockchain-fabric --name=operator -v /config/daml-ledger-api/concord-operator:/operator/config-local -v /config/daml-ledger-api/concord-operator:/concord/config-public -v /config/daml-ledger-api/config-local/cert:/config/daml-ledger-api/config-local/cert -v /config/daml-ledger-api/config-public:/operator/config-public <Operator_IMAGE_ID>
```

A sample operator image is vmwaresaas.jfrog.io/vmwblockchain/operator:1.6.0.0.234.

Create the on-demand database checkpoint on the Client node where the operator is instantiated and verify the checkpoint status.

```
# create db checkpoint on the next stable sequence number
sudo docker exec -it operator sh -c './concop dbCheckpoint create'

# get the information of all created dbCheckpoints on each replica
sudo docker exec -it operator sh -c './concop dbCheckpoint status'
```

(Optional) Run the Wedge stop command to create a database checkpoint.
Use this command to include the latest checkpoint when any maintenance activity is performed.
```
# creates a db checkpoint when replica nodes stop
./concop wedge stop
```

Verify that the database checkpoints are successfully created.
1. SSH into a Replica node.
2. Verify the newly created database checkpoints.
```
ls -lrt /mnt/data/rocksdbdata/checkpoints/
```
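To compare the number of checkpoints on disk with the configured retention value, you can count the entries in the checkpoint directory. The following is a minimal sketch, assuming that each database checkpoint is stored as its own subdirectory under the checkpoint path. The function name count_checkpoints and the CHECKPOINT_DIR and EXPECTED variables are hypothetical, and you might need sudo to read the directory on a Replica node.

```shell
#!/bin/sh
# count_checkpoints DIR: print the number of immediate subdirectories in DIR.
# Assumption: each database checkpoint is stored as its own subdirectory.
count_checkpoints() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

# Path from the verification step; override CHECKPOINT_DIR if yours differs.
CHECKPOINT_DIR="${CHECKPOINT_DIR:-/mnt/data/rocksdbdata/checkpoints}"
EXPECTED=2   # num_of_rocksdbcheckpoints default

if [ -d "$CHECKPOINT_DIR" ]; then
    found=$(count_checkpoints "$CHECKPOINT_DIR")
    echo "found $found checkpoint(s), retention value is $EXPECTED"
fi
```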

What to do next

If there is a Replica node failure, you can restore it from the RocksDB checkpoint backup data. See Restore Replica Nodes with RocksDB Checkpoint Data on vSphere.