You can restore one or more Replica nodes from RocksDB checkpoint-based backup data.

Prerequisites

  • Verify that there is a RocksDB checkpoint-based backup available. See Backup Replica Nodes with RocksDB Checkpoint on vSphere.

  • Verify that the checkpoint-based backup was created from a healthy Replica node.

  • Verify that you have captured the IP addresses of all the Client and Replica node VMs and have access to them. You can find the information in the VMware Blockchain Orchestrator descriptor file.

  • Verify that you have captured the backup server IP address where you plan to transfer and retain the backups.

Procedure

  1. Restore a Replica node from an existing checkpoint-based backup.

    Database checkpoints are created in the /mnt/data/rocksdbdata/checkpoints/<checkpoint_id> directory. The <checkpoint_id> that names the directory is the last block number in the database checkpoint.

    1. Verify the integrity of the checkpoint-based backup.
      image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord");sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata/checkpoints/<checkpoint_id>/,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata verifyDbCheckpoint [NumberOfBlocksToVerify] [true]

      The $image variable resolves to the Concord core image name with its version tag. NumberOfBlocksToVerify is an optional parameter that specifies the number of blocks to verify; a high value increases the time the command takes to complete.

      Note:

      The Client node triggers the creation of a default database checkpoint shortly after deployment. The database checkpoint is required to provide a verifiable snapshot of the state. Verifying the integrity of this default database checkpoint is expected to fail because the checkpoint window has not yet been reached.

    2. If your database is larger than 64 GB, restore the backup files from the healthy Replica node's RocksDB checkpoint directory to the failed Replica node's RocksDB directory.

      Adhere to the prescribed order of procedures to avoid an error.

          #1. Clean rocksdb folder on failed replica node
           sudo find /mnt/data/rocksdbdata -maxdepth 1 -name '*.sst' -delete && sudo rm -rf /mnt/data/rocksdbdata/*
      
          #2. Create a temp dir on failed replica node
           sudo mkdir -p /mnt/data/temp_chkpt && sudo chown -R vmbc /mnt/data/temp_chkpt
      
          #3. On healthy replica node execute rsync command
           sudo rsync -avh /mnt/data/rocksdbdata/checkpoints/<checkpoint_id> vmbc@<ip_addr>:/mnt/data/temp_chkpt/
      
          #4. On failed replica node, move rocksdb files from temp to rocksdb folder
           sudo find /mnt/data/temp_chkpt/<checkpoint_id> -maxdepth 1  -name '*.*'  -exec sudo mv {} /mnt/data/rocksdbdata/ \; && sudo mv  /mnt/data/temp_chkpt/<checkpoint_id>/* /mnt/data/rocksdbdata/  && sudo rm -rf /mnt/data/temp_chkpt
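The clean (#1) and move (#4) steps above can be tried safely with a local simulation. This sketch uses /tmp paths in place of /mnt/data, and the file names are hypothetical RocksDB artifacts; it also shows why the move is split in two: the `find -name '*.*'` pass only catches dotted file names, and the trailing `mv` picks up the rest.

```shell
# Local simulation of steps #1 and #4, using /tmp stand-ins for /mnt/data.
data=/tmp/rocksdbdata_demo
temp=/tmp/temp_chkpt_demo/12345
mkdir -p "$data" "$temp"
touch "$data/000001.sst" "$data/CURRENT"          # stale files to be cleaned
touch "$temp/000042.sst" "$temp/MANIFEST-000002"  # checkpoint contents

# 1. Clean the rocksdb folder
find "$data" -maxdepth 1 -name '*.sst' -delete && rm -rf "$data"/*

# 4. Move the dotted files first, then the rest, and drop the temp dir
find "$temp" -maxdepth 1 -name '*.*' -exec mv {} "$data"/ \; \
  && mv "$temp"/* "$data"/ && rm -rf /tmp/temp_chkpt_demo

ls "$data"
```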
    3. If your database is smaller than 64 GB, restore the backup files from the healthy Replica node's RocksDB checkpoint directory to the failed Replica node's RocksDB directory.

      Adhere to the prescribed order of procedures to avoid an error.

      #Execute these steps on the source replica from which the db checkpoint needs to be copied
      #1. Create a tar of the db checkpoint backup
      sudo tar -cvzf <BackupName> -C /mnt/data/rocksdbdata/checkpoints/<checkpoint_id> .
      
      #2. rsync the backup file to faulty replica
      sudo rsync -avhP <BackupName> <DESTINATION>
       
      # BackupName : Name of the backup file (eg: dbBckp.tar.gz)
      # DESTINATION : Destination IP of the stopped replica node ( eg: vmbc@10.202.68.131:/vmbc/ )
      ------------------------------------------------------------------------------------------------
       
      # Execute these steps on the faulty replica node to restore the database
      #1. Remove the existing contents of rocksdbdata directory
      sudo rm -rf /mnt/data/rocksdbdata/*   
       
      #2. Extract the contents of the db backup file into the rocksdbdata directory
      sudo tar -zxvf <BackupName> -C /mnt/data/rocksdbdata/     
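The tar-based backup and restore pair above can be exercised end to end with a local round trip. This sketch uses /tmp stand-ins for the checkpoint and rocksdbdata directories; the file names are hypothetical. Note the `-C` flag, which archives the checkpoint contents without their absolute path prefix.

```shell
# Local round trip of the tar-based backup (#1) and restore (#2).
src=/tmp/checkpoints_demo/1000
dst=/tmp/rocksdbdata_demo2
mkdir -p "$src" "$dst"
touch "$src/000042.sst" "$src/CURRENT"   # hypothetical checkpoint contents

# 1. Create the archive from inside the checkpoint directory
tar -czf /tmp/dbBckp.tar.gz -C "$src" .

# 2. Clean the target directory and extract the archive into it
rm -rf "$dst"/*
tar -xzf /tmp/dbBckp.tar.gz -C "$dst"/

ls "$dst"
```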
    4. Run the configuration command on the failed Replica node to retrieve the target node ID.

      The target node ID is the principal_id of the replica entry whose replica_host value is 127.0.0.1, which identifies the local node.

      cat /config/concord/config-local/deployment.config | grep -A 3 " replica:"
       
      Example:
      #Here, target node id is 1
      vmbc@1dadec1a-cfb0-44a5-bac5-bfcb8e632587-22bbb623-ec3b-424f-a6ee-d28 [ /config/concord/config-local ]# cat deployment.config | grep -A 3 " replica:"
          replica:
            - event_port: 50051
              principal_id: 0
              replica_host: 10.72.23.170
      --
          replica:
            - event_port: 50051
              principal_id: 1
              replica_host: 127.0.0.1
      --
          replica:
            - event_port: 50051
              principal_id: 2
              replica_host: 10.72.23.166
      --
          replica:
            - event_port: 50051
              principal_id: 3
              replica_host: 10.72.23.168
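The lookup above can also be scripted. The following sketch defines a hypothetical `find_target_node_id` helper that prints the principal_id of the entry whose replica_host is 127.0.0.1, run here against a sample file mirroring the output shown above rather than a live node's deployment.config.

```shell
# Hypothetical helper: print the principal_id of the replica entry whose
# replica_host is 127.0.0.1 (the local node).
find_target_node_id() {
  awk '
    /principal_id:/ { id = $2 }
    /replica_host: 127\.0\.0\.1/ { print id; exit }
  ' "$1"
}

# Sample file mirroring the deployment.config structure shown above.
cat > /tmp/deployment.config.sample <<'EOF'
    replica:
      - event_port: 50051
        principal_id: 0
        replica_host: 10.72.23.170
    replica:
      - event_port: 50051
        principal_id: 1
        replica_host: 127.0.0.1
EOF

find_target_node_id /tmp/deployment.config.sample   # prints 1
```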
    5. Sanitize the failed Replica node to reset the metadata, remove the lock file, and start the Concord container.
      # Run reset metadata command
      image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord");sudo docker run -it --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata resetMetadata <target_nodeId>
       
      # Remove the LOCK file
      sudo rm -f /mnt/data/rocksdbdata/LOCK
       
      # Start concord container
      sudo docker start concord
       
      # Verify that all containers are up by running
      sudo docker ps -a
  2. Restore all the Replica nodes.

    As a best practice to restore all the Replica nodes, use the latest database checkpoint backup created while wedging Replica nodes.

    1. Wedge the Replica nodes.
      sudo docker exec -it operator sh -c './concop wedge stop'
      {"succ":true}
       
      # optional - verify that new db-checkpoint is created (note: Do not wait for wedge to complete)
      sudo docker exec -it operator sh -c './concop dbCheckpoint status'

      Wedging the Replica nodes creates a database checkpoint with the latest database snapshot.
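Waiting for the new database checkpoint to appear can be scripted as a small poll loop. The helper below is a generic sketch; the exact `concop dbCheckpoint status` output is not shown here, so the grep pattern in the usage comment is a placeholder.

```shell
# Generic poll helper: retry a command until it succeeds or the attempt
# budget is exhausted, sleeping one second between attempts.
poll_until() {
  local tries=$1; shift
  local i
  for i in $(seq "$tries"); do
    "$@" && return 0
    sleep 1
  done
  return 1
}

# Example (on a Replica node; the grep pattern is a placeholder):
# poll_until 30 sh -c \
#   "sudo docker exec operator sh -c './concop dbCheckpoint status' | grep -q <new_checkpoint_id>"
```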

    2. Stop all the Replica nodes.
      curl -X POST 127.0.0.1:8546/api/node/management?action=stop
    3. Repeat the stop operation on each Replica node in the Replica Network.
    4. Restore all the Replica nodes with a database size smaller than 64 GB.
      #Execute steps on source/healthy replica node from which you are planning to restore all replica nodes
      #1. Create a tar of latest db checkpoint backup
      sudo tar -cvzf <BackupName> -C /mnt/data/rocksdbdata/checkpoints/<checkpoint_id> .
       
      #2. rsync the db checkpoint backup file to all other replica node.
      sudo rsync -avhP <BackupName> <DESTINATION2>
      sudo rsync -avhP <BackupName> <DESTINATION3>
      sudo rsync -avhP <BackupName> <DESTINATION4>
      ------------------------------------------------
       
      # On all replicas, execute these steps to restore the DB:
      # Remove the existing contents of rocksdbdata directory
      sudo rm -rf /mnt/data/rocksdbdata/*
       
      # Extract the contents of the db backup file into the rocksdbdata directory
      sudo tar -zxvf <BackupName> -C /mnt/data/rocksdbdata/
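Fanning the archive out to the remaining replicas can be written as a loop instead of repeated rsync commands. This local sketch uses `cp` into /tmp directories so it can be tried anywhere; on real nodes each destination would be an rsync target such as `vmbc@<ip_addr>:/vmbc/`, as shown in the comment.

```shell
# Local sketch of copying the backup to each remaining replica. The /tmp
# destinations stand in for remote rsync targets like vmbc@<ip_addr>:/vmbc/.
backup=/tmp/dbBckp_demo.tar.gz
echo sample > /tmp/payload_demo && tar -czf "$backup" -C /tmp payload_demo

for dest in /tmp/replica2 /tmp/replica3 /tmp/replica4; do
  mkdir -p "$dest"
  cp "$backup" "$dest"/   # on real nodes: sudo rsync -avhP "$backup" vmbc@<ip_addr>:/vmbc/
done

ls /tmp/replica2 /tmp/replica3 /tmp/replica4
```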
    5. Restore all the Replica nodes with a database size greater than 64 GB.

      Adhere to the prescribed order of procedures to avoid an error.

      #Execute below steps if database size greater than 64GB.
      #1. Run this command on all replica nodes
      sudo mkdir -p /mnt/data/temp_chkpt && sudo chown -R vmbc /mnt/data/temp_chkpt
      
      #2. On a healthy replica from where checkpoint folder needs to be copied
      sudo mv /mnt/data/rocksdbdata/checkpoints/<checkpoint_id> /mnt/data/temp_chkpt/
      
      #3. Clean rocksdb folder on all replica nodes
      sudo find /mnt/data/rocksdbdata -maxdepth 1 -name '*.sst' -delete && sudo rm -rf /mnt/data/rocksdbdata/*
      
      #4. On healthy replica node from where checkpoint is copied to other replica nodes
      sudo rsync -avh /mnt/data/temp_chkpt/<checkpoint_id> vmbc@<ip_addr>:/mnt/data/temp_chkpt/
      
      #5. On all replica nodes, move rocksdb files from temp to rocksdb folder
      sudo find /mnt/data/temp_chkpt/<checkpoint_id> -maxdepth 1  -name '*.*'  -exec sudo mv {} /mnt/data/rocksdbdata/ \; && sudo mv  /mnt/data/temp_chkpt/<checkpoint_id>/* /mnt/data/rocksdbdata/  && sudo rm -rf /mnt/data/temp_chkpt 
    6. Sanitize all the failed Replica nodes and remove the lock file.
      # Sanitize Replica data
      image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord");sudo docker run -it --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata removeMetadata
      
      # Remove the gen-sec file
      sudo rm /config/concord/config-generated/gen-sec*
       
      # Remove the LOCK file
      sudo rm -f /mnt/data/rocksdbdata/LOCK
    7. Start all the Replica nodes.
      curl -X POST 127.0.0.1:8546/api/node/management?action=start
  3. Transfer the RocksDB checkpoint-based backup data to a remote server location.
    1. Log in to the Replica node VM from which the checkpoint database data must be synchronized.
    2. Log in to the agent.
      sudo docker exec -it agent bash
    3. Verify that you can access the RocksDB database folder.
      ls /mnt/data/rocksdbdata | wc -l        
      #This command should not print 0 or error
    4. Create an encrypted password.
      echo '<password>' | openssl enc -base64 -e -aes-256-cbc -nosalt -pass file:/config/generic/identifiers.env
      # result: <encrypted_password>
       
      # Ensure that password is in single quotes in the command above
      # This password is the ssh password of the remote storage server where db-checkpoint is copied.
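Before handing the encrypted password to the backup tool, you can confirm the key file and ciphertext line up by decrypting with the same options. This round-trip sketch uses a throwaway key file and password in place of /config/generic/identifiers.env and your real SSH password.

```shell
# Round-trip check of the encryption step, with a throwaway key file
# standing in for /config/generic/identifiers.env.
keyfile=/tmp/identifiers.env.demo
echo "sample-key-material" > "$keyfile"

enc=$(echo 'MySshPassw0rd' | openssl enc -base64 -e -aes-256-cbc -nosalt -pass "file:$keyfile")
dec=$(echo "$enc" | openssl enc -base64 -d -aes-256-cbc -nosalt -pass "file:$keyfile")

echo "$dec"   # prints the original password
```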
    5. Verify that the external server backup location is accessible using an IP address and SSH credentials.

      Adhere to the prescribed order of procedures to avoid an error.

      #1. Run now:
      ./rocksdb_data_backup_tool.sh -t /mnt/data/temp -u root -p "<encrypted_password>" -k "/mnt/data/rocksdbdata/checkpoints" -b "<ip_addr>:/<dest_dir>" > /dev/null &
       
      #2. Schedule an hourly cron job to backup checkpoints
      ./rocksdb_data_backup_tool.sh -t /mnt/data/temp -u root -p "<encrypted_password>" -k "/mnt/data/rocksdbdata/checkpoints" -b "10.202.68.44:/<dest_dir>" -s yes > /dev/null &
       
      # To check if there is an existing cron job deployed for scheduled backup, run the following command:
      crontab -l | grep rocksdb_data_backup_tool.sh
       
      # To stop current scheduled backup:
      ./rocksdb_data_backup_tool.sh -s stop
       
      # Ensure that the remote backup server has the rsync service running
      # Ensure that the remote backup server has enough storage capacity for the db backup
    6. Verify the status of the last backup and monitor the progress of current checkpoint replication on the remote server.
      # Command to check exit code of last run:
         log_path="/var/log/backup_script_logs/" && exit_log=$(ls -lrt $log_path | tail -3 | grep exit | awk '{print $9}') && cat $log_path/$exit_log
         
      # If there is an error, the command above prints a non-zero exit code; otherwise it prints nothing
      # Command to print logs from the last run:
         log_path="/var/log/backup_script_logs/" && info_log=$(ls -lrt $log_path | tail -3 | grep log | awk '{print $9}') && cat $log_path/$info_log
        
      # use the command above to check detailed info/error logs from the last run    
       
      # log_path has default value. if a custom path is set for log files while scheduling the backup, then modify the command's log_path accordingly
      # The script run may still be in progress when you check the status. In that case, use the tail command to follow the logs as shown below:
      
      log_path="/var/log/backup_script_logs/" && exit_log=$(ls -lrt $log_path | tail -3 | grep exit | awk '{print $9}') && tail -f $log_path/$exit_log
      
      log_path="/var/log/backup_script_logs/" && info_log=$(ls -lrt $log_path | tail -3 | grep log | awk '{print $9}') && tail -f $log_path/$info_log
      #use ctrl+c to exit
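The two log-inspection pipelines above differ only in which file they pick. A small helper (a sketch, assuming the default log_path and the exit/log file naming described above) reduces the repetition; pass a custom path as the second argument if you scheduled the backup with one.

```shell
# Sketch helper: print the newest "exit" or "log" file from the backup
# script's log directory. First argument selects the file kind; the
# optional second argument overrides the default log path.
latest_backup_log() {
  local kind=$1
  local log_path=${2:-/var/log/backup_script_logs/}
  local f
  f=$(ls -rt "$log_path" | grep "$kind" | tail -1)
  [ -n "$f" ] && cat "$log_path/$f"
}

# latest_backup_log exit   # exit code of the last run
# latest_backup_log log    # detailed logs of the last run
```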