You can restore the intermittent backup data of the Replica nodes onto a parallel system to run analytics and find errors.

Prerequisites

Procedure

  1. Clone your deployment environment.
  2. Pause all the Replica nodes at the same checkpoint from the operator container and check the status periodically until all the Replica nodes' status is true.

    Any blockchain node or nodes in state transfer or down for other reasons cause the wedge status command to return false. The wedge status command returns true when state transfer completes and all Replica nodes are healthy, allowing all Replica nodes to stop at the same checkpoint successfully.

    The wedge command might take some time to complete. The metrics dashboards indicate which nodes have stopped processing blocks because they are wedged. If you notice a false report in the dashboard, contact VMware Blockchain support to diagnose the Replica nodes experiencing the problem.

    sudo docker exec -it operator sh -c './concop wedge stop'
    {"succ":true}
    sudo docker exec -it operator sh -c './concop wedge status'
    {"192.168.100.107":true,"192.168.100.108":true,"192.168.100.109":true,"192.168.100.110":true}
  3. Sanitize the Replica nodes.
    sudo docker run -it --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata <image_name> /concord/kv_blockchain_db_editor /concord/rocksdbdata removeMetadata
     
    # On each concord container where the data is being sanitized
    sudo rm /config/concord/config-generated/gen*

    The <image_name> is the Concord-core image name in the blockchain, for example:

    vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.6.0.0.234

  4. Delete the old database data.
    sudo rm -rf /mnt/data/rocksdbdata/*
  5. Copy the data files from the intermittent backup Replica node directory to the cloned Replica node directory.
    Note:

    While copying the data files, make sure that the file permissions are not changed.

  6. Delete the LOCK file from Replica nodes.

    The sanitization process creates a LOCK file under the /mnt/data/rocksdbdata directory.

    sudo rm -rf /mnt/data/rocksdbdata/LOCK

  7. Start all the Replica nodes.
    curl -X POST 127.0.0.1:8546/api/node/management?action=start
  8. (Optional) From the operator container, unwedge the system.
    # Unwedge all replicas
    ./concop unwedge
    {'succ': True}
    # Check the wedge status of the replicas
    ./concop wedge status
  9. (Optional) If the containers are not running, use the following commands to restart the containers.
    sudo docker ps -aq | grep -v $(sudo docker ps -aq --filter='name=^/agent') | xargs sudo docker rm -f
    sudo docker restart agent
  10. Verify that all the containers, such as daml_execution_engine and concord, are running.

    sudo docker ps -a

  11. (Optional) If the containers must be restarted with resource limits, modify the /config/agent/config.json file on all the nodes before starting the agent.
    sudo vi /config/agent/config.json

    Modify the resourceThresholds value for the container resource limits.

  12. Start all the Client node components.
    curl -X POST 127.0.0.1:8546/api/node/management?action=start 
  13. If there are Full Copy Client nodes, start all of the nodes.
    curl -X POST 127.0.0.1:8546/api/node/management?action=start
  14. For about five minutes, monitor the health of the deployed VMware Blockchain nodes and use the logs and metrics to check whether new blocks are being added to the Daml Ledger.
    sudo docker exec -it telegraf curl -s http://concord:9891/metrics | grep -ia last_block | tail -1
    sudo docker exec -it concord sh -c './concord-ctl status get state-transfer' | grep Fetching
    sudo docker exec -it concord sh -c './concord-ctl status get replica' | grep -E 'lastStableSeqNum|curView'
    sudo docker logs --since 1m -f concord | grep -ia addBlock | cut -d '|' -f 3,10
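
The periodic status check in step 2 can be scripted. The following is a minimal sketch, not part of the product tooling. The helper names all_wedged and wait_for_wedge and the 10-second default polling interval are assumptions made for illustration, and the parsing assumes that the wedge status command prints a flat JSON object like the sample output in step 2.

```shell
#!/bin/sh
# Sketch only: helper names and the polling interval are hypothetical.

# Return success (0) when the wedge status output reports true for every
# Replica node, that is, it contains at least one "true" and no "false".
all_wedged() {
    status_json=$1
    case $status_json in
        *false*) return 1 ;;
        *true*)  return 0 ;;
        *)       return 1 ;;
    esac
}

# Poll the operator container until all Replica nodes are wedged.
# Uses the wedge status command from step 2. The -it flags are dropped
# because no terminal is needed when a script captures the output.
wait_for_wedge() {
    interval=${1:-10}
    while :; do
        out=$(sudo docker exec operator sh -c './concop wedge status')
        if all_wedged "$out"; then
            echo "All Replica nodes are wedged: $out"
            return 0
        fi
        echo "Not all Replica nodes are wedged yet: $out"
        sleep "$interval"
    done
}
```

After sourcing the script, wait_for_wedge 30 would poll every 30 seconds. The loop has no timeout, so if the status stays false for a long time, stop it and investigate as described in step 2.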
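
For the copy in step 5, one way to keep file permissions unchanged is cp -a, which preserves mode, ownership, and timestamps. The following is a minimal sketch under that assumption. The helper name copy_backup is hypothetical, and the backup directory is left as a parameter because the procedure does not name it.

```shell
#!/bin/sh
# Sketch only: copy_backup is a hypothetical helper, not a product command.

# Copy every file from the backup directory into the target directory,
# preserving permissions, ownership, and timestamps (cp -a).
copy_backup() {
    src=$1
    dst=$2
    [ -d "$src" ] || { echo "Source directory not found: $src" >&2; return 1; }
    mkdir -p "$dst" || return 1
    # The trailing /. copies the directory contents, including dotfiles.
    cp -a "$src"/. "$dst"/
}
```

A call might look like copy_backup <backup_directory> /mnt/data/rocksdbdata, where <backup_directory> is the intermittent backup location. Ownership is only preserved when the copy runs as root.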
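
The block check in step 14 can be automated by sampling the last_block metric twice and comparing the values. The following is a minimal sketch. All helper names and the 60-second default interval are hypothetical, and the parsing assumes that the metrics line ends with the numeric value, with no trailing timestamp.

```shell
#!/bin/sh
# Sketch only: helper names and the sampling interval are hypothetical.

# Print the value field of a Prometheus-style metrics line, assumed to be
# the last whitespace-separated field (no trailing timestamp).
metric_value() {
    printf '%s\n' "$1" | awk '{ print $NF }'
}

# Succeed when the second sample is numerically greater than the first.
# awk does the comparison so the values are compared as numbers, not strings.
block_advanced() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(b + 0 > a + 0) }'
}

# Sample the last_block metric with the command from step 14.
# The -it flags are dropped so the captured output has no terminal characters.
sample_last_block() {
    line=$(sudo docker exec telegraf curl -s http://concord:9891/metrics \
        | grep -ia last_block | tail -1)
    metric_value "$line"
}

# Take two samples and report whether the block number grew in between.
check_block_progress() {
    interval=${1:-60}
    first=$(sample_last_block)
    sleep "$interval"
    second=$(sample_last_block)
    if block_advanced "$first" "$second"; then
        echo "New blocks added: $first -> $second"
    else
        echo "No new blocks in ${interval}s: $first -> $second"
        return 1
    fi
}
```

Running check_block_progress a few times over the five-minute window gives a quick signal. A single failed check only means that no block was added during that interval, so treat it as a prompt to look at the logs, not as a diagnosis.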