As a best practice, back up the data on the Client nodes in the Client group and the Replica nodes in the Replica Network before performing operations such as restoring or cloning.
Backing up the Client and Replica node data lets you access the data if an operation fails and you must restore the lost data.
Procedure
- Stop all the applications that invoke connection requests to the Daml Ledger.
- Stop the Client node components.
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# sudo docker ps -a
CONTAINER ID   IMAGE                                                    COMMAND                  CREATED        STATUS        PORTS                      NAMES
218a1bdaddd6   vmwaresaas.jfrog.io/vmwblockchain/operator:1.6.0.0.234   "/operator/operator_…"   18 hours ago   Up 18 hours                              operator
cd476a6b3d6c   vmwaresaas.jfrog.io/vmwblockchain/agent:1.6.0.0.234      "java -jar node-agen…"   18 hours ago   Up 18 hours   127.0.0.1:8546->8546/tcp   agent
vmbc@localhost [ ~ ]#
- Repeat the stop operation on each Client node in the Client group.
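If the Client group contains several nodes, you can script the repeated stop operation. The following is a minimal sketch, assuming SSH access as the vmbc user; the IP addresses are placeholders, so substitute the values for your deployment. The same pattern applies when you stop the Replica nodes later in this procedure.
# Placeholder Client node IP addresses; replace with your deployment's values
for node in 192.168.100.111 192.168.100.112 192.168.100.113; do
  ssh vmbc@"$node" 'curl -X POST 127.0.0.1:8546/api/node/management?action=stop'
done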
- Verify that all the containers except the agent and deployed operator containers are stopped.
sudo docker ps -a
If the sudo docker ps -a command shows that some containers other than the agent and deployed operator containers are still running, rerun the command or use the curl -X POST 127.0.0.1:8546/api/node/management?action=stop command to stop them.
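As a quick check, you can list only the names of the running containers and flag anything unexpected; a minimal sketch:
# List running containers; only agent and operator are expected at this point
sudo docker ps --format '{{.Names}}'
# Print a warning if anything besides agent and operator is still running
sudo docker ps --format '{{.Names}}' | grep -v -e '^agent$' -e '^operator$' && echo "WARNING: unexpected containers still running"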
- On the destination Client node, change ownership.
sudo chown -R vmbc /mnt/data/db
- Back up the data on each of the Client nodes in the group.
sudo tar cvzf <backup_name> /mnt/data/db
# For data greater than 64 GB
cd /mnt/data/
sudo nohup tar cvzf <backup_name> db & tail -f nohup.out
# Wait for the tar to complete
The <backup_name> must end in .tar.gz. For example, db-backup.tar.gz.
If you copy the backup archive with rsync, the command might time out due to SSH inactivity. Incrementally rerun the command.
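A resumable rsync invocation reduces the impact of such timeouts. This is a sketch, not a product requirement: the backup-host destination is a placeholder, and the keepalive setting is an assumption.
# --partial --append-verify resumes an interrupted copy instead of restarting it
# ServerAliveInterval keeps the SSH session alive during long transfers
rsync --partial --append-verify --progress -e 'ssh -o ServerAliveInterval=60' /mnt/data/db-backup.tar.gz vmbc@backup-host:/backups/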
- (Optional) If the operator containers were deployed, pause all the Replica nodes at the same checkpoint from the operator container, and check the status periodically until every Replica node's status is true.
Any blockchain nodes that are in state transfer or are down for other reasons cause the wedge status command to return false. The wedge status command returns true when state transfer completes and all Replica nodes are healthy, which allows all the Replica nodes to stop at the same checkpoint.
The wedge command might take some time to complete. The metrics dashboards indicate which nodes have stopped processing blocks because they are wedged. If you notice a false report in the dashboard, contact VMware Blockchain support to diagnose the Replica nodes experiencing the problem.
sudo docker exec -it operator sh -c './concop wedge stop'
{"succ":true}
sudo docker exec -it operator sh -c './concop wedge status'
{"192.168.100.107":true,"192.168.100.108":true,"192.168.100.109":true,"192.168.100.110":true}
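To check the status periodically, you can poll until no Replica node reports false; a minimal sketch, run on the node hosting the operator container:
# Poll the wedge status every 30 seconds until no node reports false
while sudo docker exec operator sh -c './concop wedge status' | grep -q false; do
  echo "Waiting for all Replica nodes to stop at the checkpoint..."
  sleep 30
done
echo "All Replica nodes are wedged at the same checkpoint."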
- Verify that the following metrics indicate that your blockchain network is operating properly.
Metrics | Description
Blocks per second metrics | All the blockchain nodes must process blocks because new blocks are constantly being added. A node must report a positive number to be considered healthy.
FastPaths | All Replica nodes must report in the fast path, and none in the slow path. When the Blocks per second metrics indicate an unhealthy state, the wedge status remains false until all the nodes have stopped at the same checkpoint.
- Stop the Replica node.
curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3b7135c677cf vmwaresaas.jfrog.io/vmwblockchain/agent:1.6.0.0.234 "java -jar node-agen…" 20 hours ago Up 20 hours 127.0.0.1:8546->8546/tcp agent
- Repeat the stop operation on each Replica node in the Replica Network.
- Verify that all the containers except for the agent are stopped.
sudo docker ps -a
If the sudo docker ps -a command shows that some containers besides the agent are still running, rerun the command or use the sudo docker stop <container_name> command to stop them.
- Check that all the Replica nodes are stopped in the same state.
Verifying the LastReachableBlockID and LastBlockID sequence numbers at which each Replica node stopped helps determine whether any nodes lag.
If there is a lag when you power on the Replica Network, some Replica nodes might have to catch up in state-transfer mode. Otherwise, the lag can result in a failed consensus and require restoring each Replica node from the latest single copy.
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata <image_name> /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata <image_name> /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID
The <image_name> value is the concord-core image name used in the blockchain, for example:
vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.6.0.0.234
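To compare the stopping point across the Replica Network from one place, you can collect the block IDs from every node. The following sketch assumes SSH access as the vmbc user and reuses the placeholder IP addresses from the wedge status output; substitute your own values:
# Placeholder Replica node IP addresses; replace with your deployment's values
IMAGE=vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.6.0.0.234
for node in 192.168.100.107 192.168.100.108 192.168.100.109 192.168.100.110; do
  echo "--- $node ---"
  ssh vmbc@"$node" "sudo docker run -i --rm --entrypoint='' --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $IMAGE /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID"
done
# Repeat with getLastReachableBlockID; both values must match across all nodes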
- On the destination Replica node, change ownership.
sudo chown -R vmbc /mnt/data/db
- Back up the data on each of the Replica nodes.
sudo tar cvzf <backup_name> /mnt/data/db
# For data greater than 64 GB
cd /mnt/data/
sudo nohup tar cvzf <backup_name> db & tail -f nohup.out
# Wait for the tar to complete
The <backup_name> must end in .tar.gz. For example, db-backup.tar.gz.
If you copy the backup archive with rsync, the command might time out due to SSH inactivity. Incrementally rerun the command.
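Before relying on a backup archive, you can verify that it is readable and record a checksum to validate later copies; a minimal sketch using the example archive name:
# List the archive contents without extracting; a failure here indicates corruption
sudo tar tzf db-backup.tar.gz > /dev/null && echo "Archive OK"
# Record a checksum to validate the archive after transferring it
sha256sum db-backup.tar.gz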
- Start all the Replica nodes.
curl -X POST 127.0.0.1:8546/api/node/management?action=start
- From the operator container, unwedge the system.
# Unwedge all the Replica nodes
./concop unwedge
{'succ': True}
# Check the wedge status of the Replica nodes
./concop wedge status
- Start all the Client node components.
curl -X POST 127.0.0.1:8546/api/node/management?action=start
- Start all the applications that invoke connection requests to the Daml Ledger.
What to do next
If there is a failure or you want to clone a deployment, you can restore it from the backup data. See Restore Replica Nodes from the Backup Data on vSphere.