During a clone-based upgrade, you create a clone of the blockchain to install the latest version of the product. Data from the existing Replica and Client nodes is backed up and migrated to the cloned blockchain.
Deployed Full Copy Client nodes do not need to back up data during the clone-based upgrade process.
During deployment, the cloned blockchain configuration in the deployment descriptor:
- Creates the Replica and Client nodes in the new topology
- Creates the containers but does not start them, except for the agent container
- Downloads all the required images and configurations for each blockchain node
Prerequisites
- Familiarize yourself with the upgrade workflow. See Considerations for Upgrading VMware Blockchain Nodes on vSphere.
- Verify that you have the deployed blockchain ID information.
- Familiarize yourself with the backup and restore consideration for your VMware Blockchain nodes. See VMware Blockchain Node Backup and Restore Considerations on vSphere.
- Verify that you have access to the latest version of VMware Blockchain.
- Verify that you have a backup of the Replica Network and one Client node backup per Client node group. See Back-Up Replica Nodes on vSphere and Back-Up Client Node on vSphere.
- Verify that you captured the IP addresses of all the Replica and Client node VMs, the Client node group name and group ID, and the Daml database password. You can find this information in the VMware Blockchain Orchestrator output file of the original deployment.
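If you prefer not to collect the node IP addresses by hand, they can be pulled out of the descriptor JSON with standard shell tools. A minimal sketch, assuming the `providedIp` fields shown in the sample deployment descriptor later in this procedure; the `extract_node_ips` helper is illustrative, not part of the product:

```shell
# Extract every providedIp value from a descriptor file.
# Assumes the "providedIp" key used in the sample descriptor; adjust if your schema differs.
extract_node_ips() {
  grep -o '"providedIp"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" | sed 's/.*"\([^"]*\)"$/\1/'
}

# Example: a tiny descriptor fragment and its extracted IPs.
cat > /tmp/deployment_sample.json <<'EOF'
{ "populatedReplicas": [
    { "zoneName": "Zone-1", "providedIp": "10.10.10.217" },
    { "zoneName": "Zone-1", "providedIp": "10.10.10.218" } ] }
EOF
extract_node_ips /tmp/deployment_sample.json
```

Run the helper against your real deployment descriptor to cross-check the captured list.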
Procedure
- In the infrastructure descriptor file, configure the following parameters.
Set the new VMware Blockchain version for the upgrade.
Verify that the generateDamlPassword parameter is not set to true.
You can use the deployed blockchain ID or set a new blockchain ID.
Sample infrastructure_upgrade_clone.json file to deploy the cloned blockchain nodes.
{ "organization": { "damlSdk": "2.0.1", "dockerImage": "1.6.0.0.234" }, "zones": [ { "name": "Zone-1", "vCenter": { "url": "https://vcenter.com/", "userName": "admin@vcenter.com", "password": "password", "resourcePool": "Compute-ResourcePool", "storage": "Datastore-1", "folder": "folder-1" }, "network": { "name": "network-1", "gateway": "10.10.10.1", "subnet": 24, "nameServers": [ "10.10.10.252", "10.10.10.253" ] }, "containerRegistry": { "url": "https://vmwaresaas.jfrog.io", "userName": "user-name", "password": "password", "tlsCertificateData": "" }, "wavefront": { "url": "https://vmware.wavefront.com", "token": "token" }, "logManagement": [] } ] } - Configure the deployment descriptor parameters for cloning.
See Configuring the Deployment Descriptor Parameters for Cloning on vSphere.
Sample deployment_upgrade_clone.json file to deploy the cloned blockchain nodes.
{ "blockchain": { "consortiumName": "bc-upgrade", "blockchainType": "DAML", "blockchainId": "d7c6fc22-aec0-4bf4-b233-fcc7bef0a328" }, "populatedReplicas": [ { "zoneName": "Zone-1", "providedIp": "10.10.10.217" }, { "zoneName": "Zone-1", "providedIp": "10.10.10.218" }, { "zoneName": "Zone-1", "providedIp": "10.72.10.219" }, { "zoneName": "Zone-1", "providedIp": "10.72.10.220" } ], "replicaNodeSpec": { "cpuCount": 4, "memoryGb": 32, "diskSizeGb": 64 }, "populatedFullCopyClients": [ { "providedIp": "10.10.10.222", "zoneName": "Zone-1", "accessKey": "user", "bucketName": "bucket-1", "protocol": "HTTP", "secretKey": "<password>", "url": "10.10.10.252:9000" } ], "fullCopyClientNodeSpec": { "cpuCount": 4, "memoryGb": 32, "diskSizeGb": 64 }, "populatedClients": [ { "zoneName": "Zone-1", "providedIp": "10.10.10.221", "clientGroupId" : "p5289815c_6f2b_4447_a058_3409ef0826b6", "damlDbPassword": "<password>", "groupName" : "Group1" } ], "clientNodeSpec": { "cpuCount": 4, "memoryGb": 32, "diskSizeGb": 64 }, "operatorSpecifications": { "operatorPublicKey": "-----BEGIN PUBLIC KEY----- \nMFkwEwYHKoZ\n -----END PUBLIC KEY-----\n" } } - (Optional) Enable vCPU pinning on the new cloned blockchain.
See Advanced Features Parameters, Configuring the Infrastructure Descriptor Parameters on vSphere.
- Perform the following validation tasks.
Verify that the underlying infrastructure can accommodate a second blockchain deployment with the same storage size.
If you enable vCPU pinning during an upgrade, verify that the underlying infrastructure can accommodate 128 vCPUs for each Replica node VM for higher performance.
Check whether the remote backup location has available space to transfer backup data.
Validate that the IP addresses for the second blockchain are available and accurately listed in the deployment descriptor JSON files.
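Part of the IP-address validation can be scripted. A hedged sketch that checks that every address in a candidate list is well-formed IPv4 and that no address is listed twice; it does not prove the addresses are actually free on the network, so still confirm them against your IP management system. The `validate_ips` helper is illustrative:

```shell
# Verify a list of candidate node IPs: IPv4 format and no duplicates.
validate_ips() {
  local ok=0
  # Format check: four dot-separated groups of 1-3 digits (kept deliberately loose).
  for ip in "$@"; do
    echo "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || { echo "bad format: $ip"; ok=1; }
  done
  # Duplicate check.
  dups=$(printf '%s\n' "$@" | sort | uniq -d)
  if [ -n "$dups" ]; then echo "duplicates: $dups"; ok=1; fi
  return $ok
}

# Example with addresses from the sample descriptor:
validate_ips 10.10.10.217 10.10.10.218 10.72.10.219 10.72.10.220 && echo "IP list OK"
```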
- Stop all the applications that invoke connection requests to the Daml Ledger.
- Stop the Client node components.
sudo curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# sudo curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# sudo docker ps -a
CONTAINER ID   IMAGE                                                    COMMAND                  CREATED        STATUS        PORTS                      NAMES
218a1bdaddd6   vmwaresaas.jfrog.io/vmwblockchain/operator:1.6.0.0.234   "/operator/operator_…"   18 hours ago   Up 18 hours                              operator
cd476a6b3d6c   vmwaresaas.jfrog.io/vmwblockchain/agent:1.6.0.0.234      "java -jar node-agen…"   18 hours ago   Up 18 hours   127.0.0.1:8546->8546/tcp   agent
vmbc@localhost [ ~ ]#
- Repeat the stop operation on each Client node in the Client group.
- Verify that all the containers except the agent and deployed operator containers are stopped.
sudo docker ps -a
If the sudo docker ps -a command shows that some containers other than the agent and deployed operator containers are still running, rerun the command or use the sudo docker stop <container_name> command to stop them.
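Rather than stopping stragglers one by one, the name filter can be scripted. A sketch that only demonstrates the filtering logic, using the container names from the sample output above; in practice you would feed it the output of `sudo docker ps --format '{{.Names}}'`:

```shell
# Given container names on stdin, print those that still need to be stopped
# (everything except the agent and operator containers).
containers_to_stop() {
  grep -vE '^(agent|operator)$'
}

# Example with names like those in the sample docker ps output
# (daml_ledger_api and concord are illustrative names):
printf 'operator\nagent\ndaml_ledger_api\nconcord\n' | containers_to_stop

# In practice:
#   sudo docker ps --format '{{.Names}}' | containers_to_stop | xargs -r sudo docker stop
```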
- On the destination Client node, change ownership.
sudo chown -R vmbc <destination_folder>
- Back up the data on each of the Client nodes in the group.
sudo tar cvzf <backup_name> /mnt/data/db

#For data greater than 64 GB
cd /mnt/data/
sudo nohup tar cvzf <backup_name> db &
tail -f nohup.out   #wait for the tar to complete
The rsync command used to copy the backup might time out due to SSH inactivity. If it does, rerun the command; rsync resumes the transfer incrementally.
- (Optional) If the operator containers were deployed, pause all the Replica nodes at the same checkpoint from the operator container and check the status periodically until all the Replica nodes' status is true.
Any blockchain nodes that are in state transfer or are down for other reasons cause the wedge status command to return false. The wedge status command returns true when state transfer completes and all Replica nodes are healthy, allowing all Replica nodes to stop at the same checkpoint successfully.
The wedge command might take some time to complete. The metrics dashboards indicate the nodes that have stopped processing blocks because they are wedged. If you notice a false report in the dashboard, contact VMware Blockchain support to diagnose the Replica nodes experiencing the problem.
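The periodic status check can be wrapped in a small polling loop. A minimal sketch, assuming the JSON map of replica IPs to booleans that `./concop wedge status` prints in this step; the `check_wedged` helper is hypothetical and simply looks for any `false` entry:

```shell
# Return 0 when a wedge-status JSON map contains no "false" entries.
check_wedged() {
  case "$1" in
    *false*) return 1 ;;  # at least one replica is not yet wedged
    *)       return 0 ;;
  esac
}

# In practice, poll until every replica reports true
# (the 30-second interval is an assumption, not a documented value):
#   until check_wedged "$(sudo docker exec operator sh -c './concop wedge status')"; do
#     sleep 30
#   done
```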
sudo docker exec -it operator sh -c './concop wedge stop'
{"succ":true}
sudo docker exec -it operator sh -c './concop wedge status'
{"192.168.100.107":true,"192.168.100.108":true,"192.168.100.109":true,"192.168.100.110":true}

./concop wedge stop     # Stop all replicas on the next checkpoint
{'additional_data': 'set stop flag', 'succ': True} or {'succ': False}
./concop wedge status   # Check the wedge status of the replicas list

Keep trying the status command periodically until all replicas return true.
- Stop the Replica node.
sudo curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# sudo curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# sudo docker ps -a
CONTAINER ID   IMAGE                                                 COMMAND                  CREATED        STATUS        PORTS                      NAMES
3b7135c677cf   vmwaresaas.jfrog.io/vmwblockchain/agent:1.6.0.0.234   "java -jar node-agen…"   20 hours ago   Up 20 hours   127.0.0.1:8546->8546/tcp   agent
- Repeat the stop operation on each Replica node in the Replica Network.
- Stop the Full Copy Client node.
sudo curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
vmbc@localhost [ ~ ]# sudo docker ps -a
CONTAINER ID   IMAGE                                                 COMMAND                  CREATED        STATUS        PORTS                      NAMES
3b7135c677cf   vmwaresaas.jfrog.io/vmwblockchain/agent:1.6.0.0.234   "java -jar node-agen…"   20 hours ago   Up 20 hours   127.0.0.1:8546->8546/tcp   agent
- Repeat the stop operation on all the Full Copy Client nodes.
- Verify that all the containers except for the agent are stopped.
sudo docker ps -a
If the sudo docker ps -a command shows that some containers besides the agent are running, rerun the command or use the sudo docker stop <container_name> command to stop them.
- Check that all the Replica nodes are stopped in the same state.
Verifying the LastReachableBlockID and LastBlockID sequence numbers of each stopped Replica node helps determine whether any nodes lag.
If there is a lag when you power on the Replica Network, some Replica nodes in state-transfer mode might have to catch up. Otherwise, the lag can result in a failed consensus and require restoring each Replica node from the latest single copy.
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata <image_name> /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID
sudo docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata <image_name> /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID
The <image_name> is the Concord-core image name in the blockchain.
vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.6.0.0.234
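Once you have collected getLastBlockID and getLastReachableBlockID from every Replica node, a quick equality check tells you whether any node lags. A sketch over values gathered by hand; the `all_blocks_equal` helper and the sample block IDs are illustrative:

```shell
# Return 0 if all supplied block IDs are identical, 1 otherwise.
all_blocks_equal() {
  [ "$(printf '%s\n' "$@" | sort -u | wc -l)" -eq 1 ]
}

# Example: hypothetical block IDs read from four Replica nodes.
if all_blocks_equal 152340 152340 152340 152340; then
  echo "Replica nodes stopped at the same block"
else
  echo "Lag detected: some Replica nodes must catch up via state transfer"
fi
```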
- On the destination Replica node, change ownership.
sudo chown -R vmbc <destination_folder>
- Back up the data on each of the Replica nodes.
sudo tar cvzf <backup_name> /mnt/data/db

#For data greater than 64 GB
cd /mnt/data/
sudo nohup tar cvzf <backup_name> db &
tail -f nohup.out   #wait for the tar to complete
The rsync command used to copy the backup might time out due to SSH inactivity. If it does, rerun the command; rsync resumes the transfer incrementally.
- Verify that the persephone-provisioning, config-service, and blockchain-content-library-server containers are running version 1.6.0.0.234.
If required, start the blockchain service with the sudo systemctl start blockchain.service command.
$ sudo docker ps
CONTAINER ID   IMAGE                                                                                       COMMAND                  CREATED      STATUS      PORTS                                            NAMES
b41adc6dd0f7   vmwaresaas.jfrog.io/vmwblockchain/persephone-provisioning:cl-nginx-blockchain-1.6.0.0.234   "nginx -g 'daemon of…"   2 days ago   Up 2 days   0.0.0.0:8083->80/tcp                             orchestrator-runtime_blockchain-content-library-server_1
48de4a8d570f   vmwaresaas.jfrog.io/vmwblockchain/persephone-configuration:1.6.0.0.234                      "java -Dspring.confi…"   2 days ago   Up 2 days   0.0.0.0:9003->9003/tcp, 0.0.0.0:8000->9023/tcp   orchestrator-runtime_config-service_1
346a553ee917   vmwaresaas.jfrog.io/vmwblockchain/persephone-provisioning:1.6.0.0.234                       "java -Dspring.confi…"   2 days ago   Up 2 days   8000/tcp, 0.0.0.0:9002->9002/tcp                 orchestrator-runtime_persephone-provisioning_1
- Define the deployment type as CLONE in the ORCHESTRATOR_DEPLOYMENT_TYPE environment parameter and verify that the required parameters are specified correctly in the docker-compose-orchestrator.yml file.
Sample environment parameters used with the docker-compose-orchestrator.yml file.
export ORCHESTRATOR_DEPLOYMENT_TYPE=CLONE
export CONFIG_SERVICE_IP=<config_service_ip_address>
export ORCHESTRATOR_DESCRIPTORS_DIR=/home/blockchain/descriptors
export ORCHESTRATOR_OUTPUT_DIR=/home/blockchain/output
export INFRA_DESC_FILENAME=infrastructure_upgrade_clone.json
export DEPLOY_DESC_FILENAME=deployment_upgrade_clone.json
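Before running the Orchestrator, it is worth confirming that every required environment variable is set and non-empty. A hedged sketch over the variables listed above; the `check_orchestrator_env` helper is illustrative, not part of the product:

```shell
# Fail fast if any Orchestrator environment variable is unset or empty,
# printing the name of each missing variable.
check_orchestrator_env() {
  local missing=0
  for v in ORCHESTRATOR_DEPLOYMENT_TYPE CONFIG_SERVICE_IP \
           ORCHESTRATOR_DESCRIPTORS_DIR ORCHESTRATOR_OUTPUT_DIR \
           INFRA_DESC_FILENAME DEPLOY_DESC_FILENAME; do
    eval "val=\${$v}"
    [ -n "$val" ] || { echo "missing: $v"; missing=1; }
  done
  return $missing
}
```

Run it after the exports; a non-zero exit status lists each missing variable.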
- Run sudo docker ps -a to verify that the persephone-provisioning/config-service containers are running on version 1.6.0.0.234.
If needed, start the blockchain service with the sudo systemctl start blockchain.service command.
descriptors]$ sudo docker ps -a
CONTAINER ID   IMAGE                                                                    COMMAND                  CREATED              STATUS              PORTS                                                               NAMES
7e95342da36c   vmwaresaas.jfrog.io/vmwblockchain/persephone-provisioning:1.6.0.0.234    "java -Dspring.confi…"   About a minute ago   Up About a minute   8000/tcp, 0.0.0.0:9002->9002/tcp, :::9002->9002/tcp                 orchestrator-runtime_persephone-provisioning_1
007a74bb5c76   vmwaresaas.jfrog.io/vmwblockchain/persephone-configuration:1.6.0.0.234   "java -Dspring.confi…"   About a minute ago   Up About a minute   0.0.0.0:9003->9003/tcp, :::9003->9003/tcp, 0.0.0.0:8000->9023/tcp   orchestrator-runtime_config-service_1
- Run the VMware Blockchain Orchestrator deployment script.
sudo docker-compose -f docker-compose-orchestrator.yml up
- Restore the backup data on each Client node.
sudo rm -rf /mnt/data/db

#1: Change the owner of /mnt/data to the vmbc user on the new Client node
sudo chown vmbc:users /mnt/data

#2: Copy the backup data from the old Client node to the new Client node
rsync -avh /mnt/data/<backup_name> <destination>

#Untar the backup database on the new Client node for a small database
sudo tar xvzf <backup_name> --directory /

#Untar the backup database on the new Client node for a large database
cd /mnt/data/
sudo nohup tar xvzf db-backup.tar.gz --directory . &
Use the backup that you created recently. The <backup_name> must end in .tar.gz. For example,
rsync -avh /mnt/data/db-backup.tar.gz vmbc@10.10.10.5:/mnt/data/
- Restore the backup data on each Replica node.
sudo rm -rf /mnt/data/rocksdbdata/

#1: Change the owner of /mnt/data to the vmbc user on the new Replica node
sudo chown vmbc:users /mnt/data

#2: Copy the backup data from the old Replica node to the new Replica node
rsync -avh /mnt/data/<backup_name> <destination>

#Untar the backup database on the new Replica node for a small database
sudo tar xvzf <backup_name> --directory /

#Untar the backup database on the new Replica node for a large database
cd /mnt/data/
sudo nohup tar xvzf db-backup.tar.gz --directory . &
Use the backup that you created recently. The <backup_name> must end in .tar.gz. For example,
rsync -avh /mnt/data/db-backup.tar.gz vmbc@10.10.10.5:/mnt/data/
- (Optional) Sanitize the Replica node RocksDB data.
sudo docker run -it --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata <image_name> /concord/kv_blockchain_db_editor /concord/rocksdbdata removeMetadata
rm /mnt/data/rocksdbdata/LOCK
The <image_name> is the Concord-core image name in the blockchain.
vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.6.0.0.234
- Migrate the data using the RocksDB migration tool.
Adhere to the prescribed order of procedures to avoid a migration error.
#If the number of CPUs is 8, the number of threads is 16. For other CPU counts, calculate the number of threads accordingly.
sudo swapoff -a
sudo docker run -d -v /mnt/data/rocksdbdata:/concord/rocksdbdata <image_name> bash -c 'mkdir /concord/rocksdbdata/temp && /concord/block_merkle_latest_ver_cf_migration_tool --rocksdb-path /concord/rocksdbdata --temp-export-path /concord/rocksdbdata/temp --parallel-block-reads 16'

#Success condition:
# 1) The container must be in the Exited(0) state in the sudo docker container ls -a output.
# 2) Wait for the "Success! Migration executed successfully!" message in the Docker logs for the container.

#After this, remove the containers other than the agent:
sudo docker rm $(sudo docker ps --filter status=exited -q)

sudo docker run -d --name=daml_execution_engine --network=blockchain-fabric blockchain-docker-internal.artifactory.eng.vmware.com/vmwblockchain/daml-execution-engine:X.X.X.X

sudo docker run --network blockchain-fabric -v /mnt/data/rocksdbdata:/concord/rocksdbdata -it blockchain-docker-internal.artifactory.eng.vmware.com/vmwblockchain/concord-core:X.X.X.X bash -c '/concord/acl_migration --point-lookup-threads 16 --rocksdb-path /concord/rocksdbdata --acl-migration-service-addr "daml_execution_engine:55000"'

#Enable swap
sudo swapon -a

#Success condition:
# 1) The container should be in the Exited(0) state in the sudo docker container ls -a output.
# 2) Wait for the "|INFO ||concord.migration.acl||||acl_migration.cpp:108|migrateAcls|Successfully completed ACL Migration !!!" message in the Docker logs for the container.

#After this, remove the containers other than the agent:
sudo docker stop daml_execution_engine
sudo docker rm $(sudo docker ps --filter status=exited -q)
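The 8-CPU/16-thread example in the migration step implies a ratio of two threads per vCPU. That calculation can be scripted instead of hard-coding 16; the `migration_threads` helper and the 2x ratio derived from the example are assumptions, so confirm the ratio for your sizing:

```shell
# Compute --parallel-block-reads as twice the CPU count,
# per the 8 CPU -> 16 threads example in the migration step.
migration_threads() {
  echo $(( $1 * 2 ))
}

# In practice:
#   THREADS=$(migration_threads "$(nproc)")
#   ... --parallel-block-reads "$THREADS" ...
migration_threads 8   # prints 16
```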
- Change the COMPONENT_NO_LAUNCH parameter in the /config/agent/config.json file to False on all the Replica, Full Copy Client, and Client nodes.
sudo sed -i 's/"COMPONENT_NO_LAUNCH": "True"/"COMPONENT_NO_LAUNCH": "False"/g' /config/agent/config.json
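You can verify that the substitution took effect on each node before starting it. A sketch that applies the same sed expression to a sample file and checks the result; the sample path is illustrative, and on the real nodes you would grep /config/agent/config.json instead:

```shell
# Demonstrate the COMPONENT_NO_LAUNCH flip on a sample config file.
cfg=/tmp/agent_config_sample.json
printf '{ "COMPONENT_NO_LAUNCH": "True" }\n' > "$cfg"

# Same sed expression as the documented command, minus sudo.
flip_no_launch() {
  sed -i 's/"COMPONENT_NO_LAUNCH": "True"/"COMPONENT_NO_LAUNCH": "False"/g' "$1"
}
flip_no_launch "$cfg"

grep '"COMPONENT_NO_LAUNCH"' "$cfg"
# On the real nodes:
#   grep '"COMPONENT_NO_LAUNCH"' /config/agent/config.json   # expect "False"
```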
- Start all the Replica, Full Copy Client, and Client nodes by running the same command against the agent on each node.
curl -X POST 127.0.0.1:8546/api/node/management?action=start
- Monitor the deployed VMware Blockchain nodes' health and check whether new blocks are added to the Daml Ledger from the logs and metrics for about five minutes.
sudo docker exec -it telegraf curl -s http://concord:9891/metrics | grep -ia last_block | tail -1
sudo docker exec -it concord sh -c './concord-ctl status get state-transfer' | grep Fetching
sudo docker exec -it concord sh -c './concord-ctl status get replica' | grep -E 'lastStableSeqNum|curView'
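The "new blocks are being added" check reduces to comparing two samples of the last_block metric taken a few minutes apart. A sketch of the comparison; the `blocks_advancing` helper is hypothetical, and the field extraction in the comment assumes a Prometheus-style "name value" metric line:

```shell
# Return 0 when the second last_block sample is strictly greater than the first.
blocks_advancing() {
  [ "$2" -gt "$1" ]
}

# In practice, using the sampling command from this step:
#   first=$(sudo docker exec -it telegraf curl -s http://concord:9891/metrics \
#           | grep -ia last_block | tail -1 | awk '{print $NF}')
#   sleep 300
#   second=$( ...same command... )
#   blocks_advancing "$first" "$second" && echo "ledger is advancing"
```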
- After validating that the newly cloned deployment is healthy and fully functional, shut down the initial blockchain deployment.
Note:
Confirm that the new deployment is functioning properly. You cannot recover a deleted blockchain deployment.
- Delete the initial blockchain deployment to recover the storage resources.