You can optionally restore a failed Replica node from the backup data.

After an EBS volume snapshot is taken, you can use it to restore your data to a new EBS volume. The restored EBS volume loads data in the background so that you can use it immediately. For more information, see the Amazon EBS snapshot documentation.
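As a rough sketch, restoring a snapshot to a new volume can also be scripted with the AWS CLI. The helper names, the DRY_RUN switch, and all IDs shown here are hypothetical and are not part of VMware Blockchain. The sketch assumes that the AWS CLI is installed and configured, and that the new volume is created in the same Availability Zone as the instance it is attached to.

```shell
#!/usr/bin/env bash
# Sketch only: print (or run) the AWS CLI calls that create a volume from a
# snapshot and attach it to an instance. Helper names and IDs are hypothetical.

# DRY_RUN=1 (default) prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# create_volume_from_snapshot <snapshot-id> <availability-zone>
create_volume_from_snapshot() {
  run aws ec2 create-volume --snapshot-id "$1" --availability-zone "$2"
}

# attach_volume <volume-id> <instance-id> <device>
attach_volume() {
  run aws ec2 attach-volume --volume-id "$1" --instance-id "$2" --device "$3"
}

# Example with placeholder IDs; in dry-run mode this only prints the commands.
create_volume_from_snapshot snap-0123456789abcdef0 us-east-1a
attach_volume vol-0123456789abcdef0 i-0123456789abcdef0 /dev/sdf
```

Printing the commands first makes it easy to review the snapshot, volume, and instance IDs before anything is changed in the AWS account.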

Prerequisites

  • Verify that the backup was created from a healthy Replica node. See Back-Up Replica Nodes on AWS.

  • Verify that you have captured the IP addresses of all the Replica and Client node VMs and have access to them. You can find the information in the VMware Blockchain Orchestrator descriptor file.

  • Verify that you have the volume snapshot ID from which the backup must be restored.
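Before you start, a quick format check on the snapshot ID can catch copy-and-paste mistakes. The following is a minimal sketch. The function name is hypothetical, and the pattern assumes that snapshot IDs are snap- followed by 8 or 17 lowercase hexadecimal characters. The check does not confirm that the snapshot exists.

```shell
#!/usr/bin/env bash
# Sketch only: check that a string looks like an EBS snapshot ID.
# Assumption: IDs have the form snap- plus 8 or 17 lowercase hex characters.

# is_snapshot_id <candidate>: returns 0 if the format matches, 1 otherwise.
is_snapshot_id() {
  printf '%s\n' "$1" | grep -Eq '^snap-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Example with a placeholder ID.
if is_snapshot_id "snap-0123456789abcdef0"; then
  echo "snapshot ID format looks valid"
else
  echo "snapshot ID format looks wrong" >&2
fi
```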

Procedure

  1. Stop all the applications that invoke connection requests to the Daml Ledger.
  2. Pause all the Replica nodes at the same checkpoint from the Concord operator container and check the status periodically until the status of every Replica node is true.

    If any blockchain node is in state transfer or is down for other reasons, the wedge status command returns false. The wedge status command returns true when state transfer completes and all Replica nodes are healthy, which allows all Replica nodes to stop at the same checkpoint successfully.

    The wedge command might take some time to complete. The metrics dashboards indicate which nodes have stopped processing blocks because they have been wedged. If you notice a false report in the dashboard, contact VMware Blockchain support to diagnose the Replica nodes experiencing the problem. If the wedge command times out, the system operator must run the wedge command again.

    sudo docker exec -it operator sh -c './concop wedge stop'
    {"succ":true}
    sudo docker exec -it operator sh -c './concop wedge status'
    {"192.168.100.107":true,"192.168.100.108":true,"192.168.100.109":true,"192.168.100.110":true}
  3. Stop the Replica nodes.
    curl -X POST '127.0.0.1:8546/api/node/management?action=stop'
  4. Verify that all the containers except the agent and deployed Concord operator container are stopped.

    sudo docker ps -a

    If the sudo docker ps -a command shows that some containers, other than the agent and the deployed Concord operator container, are still running, rerun the stop command or use the sudo docker stop <container_name> command to stop the containers.

  5. In the EC2 interface, navigate to Snapshot and filter based on the snapshot ID.
  6. Select Actions > Create volume from snapshot > Create Volume.
  7. Click Volume ID in the title menu.
  8. Select EC2 > Volume > Volume ID to validate the current path.
  9. Select Actions > Attach volume to attach the EBS volume to a blockchain node.

    The EC2 instance shows the EBS volumes attached to the blockchain node.

  10. Type the node IP address in the Instance Info section.
  11. SSH into the blockchain node and validate that the EC2 instance has three volumes.
    fdisk -l
  12. (Optional) If you are restoring from one Replica node to another, identify the target Replica node ID.
    sudo docker run -it --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.8.0.0.53 /concord/kv_blockchain_db_editor /concord/rocksdbdata getSTMetadata

    The target Replica node ID is the MyReplicaId value in the command output. You use this ID during the restore process.

  13. Replace the data volume.
    sudo umount -d /dev/xvdb
    sudo mount -o nouuid /dev/xvdf /mnt/data/
  14. (Optional) If you are restoring from one Replica node to another, reset metadata with the target Replica node ID.
    sudo docker run -it --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.8.0.0.53 /concord/kv_blockchain_db_editor /concord/rocksdbdata resetMetadata <node id> 

    The target Replica node ID is the MyReplicaId value that was identified in the output of Step 12.

  15. Start all the Replica nodes.
    curl -X POST '127.0.0.1:8546/api/node/management?action=start'
  16. Verify that all the containers, such as daml_execution_engine and concord, are running.

    sudo docker ps -a

    If the containers are not running, use the following commands to remove all the containers except the agent and then restart the agent.

    sudo docker ps -aq | grep -v $(sudo docker ps -aq --filter='name=^/agent') | xargs sudo docker rm -f
    sudo docker restart agent
  17. From the Concord operator container, unwedge the system.
    # Unwedge all replicas
    ./concop unwedge
    {'succ': True}
    # Check the wedge status of the replicas
    ./concop wedge status
  18. Start all the applications that invoke connection requests to the Daml Ledger.
  19. (Optional) To restore all the Replica nodes, repeat the restore steps on every node, skipping the optional steps for restoring from one Replica node to another. Make sure that each volume created from a snapshot is mounted on the same EC2 instance from which the snapshot was taken.

    For example, Snapshot_1 is created from the data volume of Instance_1, so Volume_1, which is created from Snapshot_1, is mounted on Instance_1.
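The periodic wedge status check in the procedure can be scripted. The following is a minimal sketch, assuming that the status output is the flat JSON object of IP address to true or false that the procedure shows. The function names, the retry count, and the delay are hypothetical. The docker exec call omits -it because the output is captured rather than displayed in a terminal.

```shell
#!/usr/bin/env bash
# Sketch only: poll the wedge status until every Replica node reports true.
# Function names, retry count, and delay are hypothetical defaults.

# all_wedged <json>: succeeds only if the status has at least one true entry
# and no false entries. Whitespace is stripped before matching.
all_wedged() {
  local status
  status=$(printf '%s' "$1" | tr -d '[:space:]')
  case "$status" in
    *:false*) return 1 ;;
    *:true*)  return 0 ;;
    *)        return 1 ;;
  esac
}

# wait_for_wedge [attempts] [delay-seconds]
wait_for_wedge() {
  local attempts="${1:-30}" delay="${2:-10}" i=0 status
  while [ "$i" -lt "$attempts" ]; do
    # No -it here: a TTY is not available inside command substitution.
    status=$(sudo docker exec operator sh -c './concop wedge status')
    if all_wedged "$status"; then
      echo "all replicas wedged: $status"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for all replicas to wedge" >&2
  return 1
}
```

Run wait_for_wedge after the wedge stop command. If it times out, run the wedge command again as described in the procedure, or investigate the nodes that still report false.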