You can restore the intermittent backup data of the Replica nodes onto a parallel system to run analytics and find errors.
Prerequisites
Verify that you have backed up the Replica nodes' data. See Intermittent Back-Up for VMware Blockchain Nodes.
Procedure
- Clone your deployment environment.
- Pause all the Replica nodes at the same checkpoint from the operator container, and check the status periodically until every Replica node reports true.
If any blockchain node is in state transfer or is down for other reasons, the wedge status command returns false. The command returns true when state transfer completes and all Replica nodes are healthy, which allows all Replica nodes to stop at the same checkpoint.
The wedge command might take some time to complete. The metrics dashboards indicate nodes that have stopped processing blocks because they have been wedged. If the dashboard shows a false report, contact VMware Blockchain support to diagnose the Replica nodes experiencing the problem.
docker exec -it operator sh -c './concop wedge stop'
{"succ":true}
docker exec -it operator sh -c './concop wedge status'
{"192.168.100.107":true,"192.168.100.108":true,"192.168.100.109":true,"192.168.100.110":true}
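The periodic status check can be scripted. The following is a minimal sketch, assuming the status command prints a single JSON object that maps each Replica node address to true or false, as in the sample output. The helper names all_replicas_wedged and wait_for_wedge, the WEDGE_INTERVAL and WEDGE_RETRIES variables, and their default values are illustrative and not part of the product. The -it flags are omitted because the output is piped rather than shown in a terminal.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the wedge status JSON read from
# stdin reports at least one Replica node and none of them is false.
all_replicas_wedged() {
    status=$(tr -d '[:space:]')
    case "$status" in
        *:false*) return 1 ;;   # at least one Replica node is not wedged yet
        *:true*)  return 0 ;;   # every reported Replica node is wedged
        *)        return 1 ;;   # empty or unexpected output
    esac
}

# Hypothetical helper: polls the wedge status until all nodes report true.
# WEDGE_INTERVAL (seconds) and WEDGE_RETRIES are assumed tuning knobs.
wait_for_wedge() {
    tries=0
    while [ "$tries" -lt "${WEDGE_RETRIES:-20}" ]; do
        if docker exec operator sh -c './concop wedge status' | all_replicas_wedged; then
            echo "All Replica nodes are wedged."
            return 0
        fi
        tries=$((tries + 1))
        sleep "${WEDGE_INTERVAL:-30}"
    done
    echo "Timed out waiting for the Replica nodes to wedge." >&2
    return 1
}
```

Run wait_for_wedge after the wedge stop command returns. A non-zero exit status means that at least one Replica node never reported true within the retry window.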
- Delete the old database data.
rm -rf /mnt/data/rocksdbdata/*
- Copy the data files from the intermittent backup Replica node directory to the cloned Replica node directory.
Note:
While copying the data files, make sure that the file permissions are not changed.
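One way to keep file modes, ownership, and timestamps intact is to copy in archive mode. The following is a minimal sketch, assuming the backup is available as a local directory. The helper name restore_rocksdb is hypothetical, and the backup location is a placeholder that you must replace with your own path.

```shell
#!/bin/sh
# Hypothetical helper: copies every file from the backup directory ($1) into
# the RocksDB data directory ($2) while preserving permissions.
restore_rocksdb() {
    src=$1
    dst=$2
    [ -d "$src" ] || { echo "Backup directory not found: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # -a (archive) keeps mode, timestamps, and, when run as root, ownership.
    # The trailing "/." copies the directory contents, including dotfiles.
    cp -a "$src"/. "$dst"/
}
```

For example, restore_rocksdb <backup_directory> /mnt/data/rocksdbdata copies the backup into the data directory used in this procedure. Comparing ls -l output for the source and target directories is a quick way to confirm that the permissions match.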
- Sanitize the Replica nodes.
docker run -it --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata <image_name> /concord/kv_blockchain_db_editor /concord/rocksdbdata removeMetadata
# On each concord container where the data is being sanitized
rm /config/concord/config-generated/gen*
<image_name> is the name of the Concord-core image used in the blockchain. For example:
vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.3.0.0.49
- Delete the LOCK file from Replica nodes.
The sanitization process creates a LOCK file under the /mnt/data/rocksdbdata directory.
rm -rf /mnt/data/rocksdbdata/LOCK
- Start all the Replica nodes.
curl -X POST 127.0.0.1:8546/api/node/management?action=start
- (Optional) If the containers are not running, use the following commands to remove all the containers except the agent and then restart the agent.
docker ps -aq | grep -v $(docker ps -aq --filter='name=^/agent') | xargs docker rm -f
docker restart agent
- Verify that all the containers, such as daml_execution_engine and concord, are running.
docker ps -a
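The container check can also be scripted instead of reading the docker ps output by eye. The following is a minimal sketch. The helper name missing_containers is hypothetical, and the expected names used in the usage note are only the two containers mentioned in this step, so extend the list to match your deployment.

```shell
#!/bin/sh
# Hypothetical helper: reads running container names from stdin (one per
# line) and prints each expected name (passed as arguments) that is absent.
# Returns 0 when nothing is missing, 1 otherwise.
missing_containers() {
    running=$(cat)
    rc=0
    for name in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string.
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            echo "$name"
            rc=1
        fi
    done
    return $rc
}
```

A possible invocation is docker ps --format '{{.Names}}' | missing_containers concord daml_execution_engine. Because docker ps without -a lists only running containers, any name that the helper prints is a container that is stopped or absent.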
- (Optional) If the containers must be restarted with resource limits, modify the /config/agent/config.json file on all the nodes before starting the agent.
vi /config/agent/config.json
Modify the resourceThresholds value for the container resource limits.
- Start all the Client nodes.
curl -X POST 127.0.0.1:8546/api/node/management?action=start
- If the deployment includes Full Copy Client nodes, start all of them.
curl -X POST 127.0.0.1:8546/api/node/management?action=start
- Monitor the health of the deployed VMware Blockchain nodes for about five minutes, and check the logs and metrics to confirm that new blocks are added to the DAML Ledger.
docker exec -it telegraf curl -s http://concord:9891/metrics | grep -ia last_block | tail -1
docker exec -it concord sh -c './concord-ctl status get state-transfer' | grep Fetching
docker exec -it concord sh -c './concord-ctl status get replica' | grep -E 'lastStableSeqNum|curView'
docker logs --since 1m -f concord | grep -ia addBlock | cut -d '|' -f 3,10
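The block check can be partly automated by sampling the last block metric twice and confirming that the value grew. The following is a minimal sketch, assuming that the metric line ends with its numeric value as the last whitespace-separated field, which is how Prometheus-style text output is usually laid out. That layout is an assumption, not something this procedure states. The helper names metric_value, blocks_advancing, and sample_last_block are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: extracts the value from a metric line read from stdin.
# Assumes the value is the last whitespace-separated field and strips any
# carriage return left behind by a TTY.
metric_value() {
    tr -d '\r' | awk 'NF { value = $NF } END { print value }'
}

# Hypothetical helper: succeeds when the second sample ($2) is greater than
# the first ($1), meaning new blocks were added between the two samples.
blocks_advancing() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(b + 0 > a + 0) }'
}

# Hypothetical helper: takes one sample with the metrics command from this
# step. The -it flags are omitted because the output is piped.
sample_last_block() {
    docker exec telegraf curl -s http://concord:9891/metrics |
        grep -ia last_block | tail -1 | metric_value
}
```

Call sample_last_block twice, about five minutes apart, and pass the two results to blocks_advancing. A non-zero exit status suggests that no new blocks were added in that interval, in which case the state-transfer and replica status commands are the next place to look.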