Perform Clone-Based Upgrade

During a clone-based upgrade, you create a clone blockchain to install the latest version of the product. The existing Replica and Client node data is backed up and migrated to the cloned blockchain.

Note:

When you upgrade from VMware Blockchain 1.2 to 1.3.0.1, implement the clone-based upgrade process.

With a clone-based upgrade, you cannot revert to the pre-upgraded version of the product. If there is an upgrade failure, you can use the backup data that was captured before the upgrade to restore your blockchain nodes.

Deployed Full Copy Client nodes do not need to back up data during the clone-based upgrade process.

Prerequisites

Familiarize yourself with the upgrade workflow. See Considerations for Upgrading VMware Blockchain Nodes.
Verify that you have the deployed blockchain ID information.
Familiarize yourself with the backup and restore considerations for your VMware Blockchain nodes. See VMware Blockchain Node Backup and Restore Considerations on vSphere.
Verify that you have access to the latest version of VMware Blockchain.
Verify that you have captured the IP addresses of all the Replica and Client node VMs, the Client node group name and group ID name, and the DAML database password details. You can find the information in the VMware Blockchain Orchestrator output file of the original deployment.
Verify that you have a backup of the Replica Network and one Client node backup per Client node group. See Back-Up VMware Blockchain Nodes.

Procedure

In the infrastructure schema file, define the VMware Blockchain version that you are upgrading to, and make sure that the generateDamlPassword parameter is not set to true.

You can use the deployed Blockchain ID or set a new Blockchain ID.
Configure the deployment schema parameters for cloning.

See Understanding the Deployment Schema Parameters for Cloning.
Perform the following validation tasks.
- Verify that the underlying infrastructure can accommodate a second blockchain deployment with the same storage size. If you enable vCPU pinning during an upgrade, verify that the underlying infrastructure can accommodate 96 vCPUs for each Replica node VM for higher performance.
- Check whether the remote backup location has available space to transfer backup data.
- Validate that the IP addresses for the second blockchain are available and accurately listed in the deployment descriptor JSON files.
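The space check in the validation tasks can be sketched as a small shell helper. This is a minimal sketch, not part of the product tooling: the function name has_enough_space, the 10 percent default margin, and the /mnt/backup mount point in the usage comment are all hypothetical, and it assumes the backup location is mounted locally so that df can report on it.

```shell
# Succeeds when the available space covers the required space plus a margin.
# All sizes are in kilobytes. The margin (default 10 percent) is an assumption,
# not a documented requirement.
has_enough_space() {
    required_kb=$1
    available_kb=$2
    margin_percent=${3:-10}
    needed_kb=$((required_kb + required_kb * margin_percent / 100))
    [ "$available_kb" -ge "$needed_kb" ]
}

# Usage sketch (/mnt/backup is a placeholder for your backup mount point):
#   required=$(du -sk /mnt/data/db | cut -f1)
#   available=$(df -Pk /mnt/backup | awk 'NR == 2 { print $4 }')
#   has_enough_space "$required" "$available" || echo "not enough space" >&2
```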
Stop all the applications that invoke connection requests to the DAML Ledger.

Stop the Client node.

curl -X POST 127.0.0.1:8546/api/node/management?action=stop

root@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
root@localhost [ ~ ]# docker ps -a
CONTAINER ID        IMAGE                                                    COMMAND                  CREATED             STATUS              PORTS                      NAMES
218a1bdaddd6        vmwaresaas.jfrog.io/vmwblockchain/operator:1.2.0.1.91    "/operator/operator_…"   18 hours ago        Up 18 hours                                    operator
cd476a6b3d6c        vmwaresaas.jfrog.io/vmwblockchain/agent:1.2.0.1.91       "java -jar node-agen…"   18 hours ago        Up 18 hours         127.0.0.1:8546->8546/tcp   agent

Repeat the stop operation on each Client node in the Client group.
Verify that all the containers except for the agent and deployed operator container are stopped.

docker ps -a

If the docker ps -a command shows that some containers, with the exception of the agent and deployed operator containers, are still running, rerun the stop command or use the docker stop <container_name> command to stop the containers.
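The container check can be scripted instead of read by eye. The following is a minimal sketch: the helper name unexpected_containers is hypothetical, and it assumes the containers that should stay up are named agent and operator, as in the sample output.

```shell
# Print the names of running containers that are not in the allowed list.
# Reads container names on stdin, one per line; allowed names are arguments.
unexpected_containers() {
    while IFS= read -r name; do
        [ -z "$name" ] && continue
        keep=0
        for allowed in "$@"; do
            [ "$name" = "$allowed" ] && keep=1
        done
        [ "$keep" -eq 0 ] && printf '%s\n' "$name"
    done
    return 0
}

# Usage sketch on a Client node (docker ps without -a lists only running
# containers):
#   docker ps --format '{{.Names}}' | unexpected_containers agent operator
```

An empty result means that only the expected containers are still running.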
Back up the data on each of the Client nodes in the group.
```
tar cvzf <backup_name> /mnt/data/db 
#For data greater than 64GB 
rsync -avh /mnt/data/db <destination_folder> 
```
The rsync command might time out due to SSH inactivity. If it does, rerun the command to continue the transfer incrementally.
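Because rsync skips files that have already been transferred, rerunning it after a timeout is safe to automate. The following is a minimal sketch of a retry wrapper. The function name retry, the attempt count, and the delay are hypothetical choices, not documented values.

```shell
# Rerun a command until it succeeds or the attempt limit is reached.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Usage sketch (5 attempts, 30 seconds apart, are illustrative values):
#   retry 5 30 rsync -avh /mnt/data/db <destination_folder>
```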
(Optional) If the operator containers were deployed, pause all the Replica nodes at the same checkpoint from the operator container and check the status periodically until all the Replica nodes' status is true.
Any blockchain node or nodes in state transfer or down for other reasons cause the wedge status command to return false. The wedge status command returns true when state transfer completes and all Replica nodes are healthy, allowing all Replica nodes to stop at the same checkpoint successfully.

The wedge command might take some time to complete. The metrics dashboards indicate which nodes have stopped processing blocks because they have been wedged. If you notice a false report in the dashboard, contact VMware Blockchain support to diagnose the Replica nodes experiencing the problem.
```
docker exec -it operator sh -c './concop wedge stop'
{"succ":true}
docker exec -it operator sh -c './concop wedge status'
{"192.168.100.107":true,"192.168.100.108":true,"192.168.100.109":true,"192.168.100.110":true}
```
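Checking the wedge status periodically can be done in a loop. The following is a minimal sketch that assumes the status output keeps the flat {"<ip>":true,...} shape shown in the sample. The helper name all_wedged and the 30-second interval are hypothetical.

```shell
# Succeeds only if the status JSON reports true for every Replica node.
# Any "false" entry, or output with no "true" entry at all, counts as not ready.
all_wedged() {
    status=$1
    case $status in
        *false*) return 1 ;;
        *true*)  return 0 ;;
        *)       return 1 ;;
    esac
}

# Polling sketch (-it is omitted because the output is captured, not displayed):
#   until all_wedged "$(docker exec operator sh -c './concop wedge status')"; do
#       sleep 30
#   done
```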

Stop the Replica node.

curl -X POST 127.0.0.1:8546/api/node/management?action=stop

root@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
root@localhost [ ~ ]# docker ps -a
CONTAINER ID        IMAGE                                                                  COMMAND                  CREATED             STATUS              PORTS                      NAMES
3b7135c677cf        vmwaresaas.jfrog.io/vmwblockchain/agent:1.2.0.1.91    "java -jar node-agen…"   20 hours ago        Up 20 hours         127.0.0.1:8546->8546/tcp   agent

Repeat the stop operation on each Replica node in the Replica Network.

Stop the Full Copy Client node.

curl -X POST 127.0.0.1:8546/api/node/management?action=stop

root@localhost [ ~ ]# curl -X POST 127.0.0.1:8546/api/node/management?action=stop
root@localhost [ ~ ]# docker ps -a
CONTAINER ID        IMAGE                                                                  COMMAND                  CREATED             STATUS              PORTS                      NAMES
3b7135c677cf        vmwaresaas.jfrog.io/vmwblockchain/agent:1.2.0.1.91    "java -jar node-agen…"   20 hours ago        Up 20 hours         127.0.0.1:8546->8546/tcp   agent

Repeat the stop operation on all the Full Copy Client nodes.
Verify that all the containers except for the agent are stopped.

docker ps -a

If the docker ps -a command shows that some containers besides the agent are running, rerun the stop command or use the docker stop <container_name> command to stop the containers.
Check that all the Replica nodes are stopped in the same state.
Comparing the LastReachableBlockID and LastBlockID sequence numbers of each stopped Replica node helps determine whether any nodes lag.

If there is a lag when you power on the Replica Network, the lagging Replica nodes might have to catch up in state-transfer mode. If they cannot catch up, consensus can fail, and each Replica node must then be restored from the latest single copy.
```
docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata <image_name> /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID
docker run -it --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata <image_name> /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID
```
The <image_name> is the Concord-core image name in the blockchain.
vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.2.0.1.91
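Once you have collected the block ID from every Replica node, the comparison itself is easy to automate. The following is a minimal sketch that assumes you have recorded the values yourself as one "<node> <block_id>" pair per line. That layout and the helper name same_block_id are hypothetical, and the sketch does not parse the db editor output directly because its format is not shown here.

```shell
# Reads "<node> <block_id>" lines on stdin.
# Prints each node whose ID differs from the first one and fails if any differ
# or if no input was given.
same_block_id() {
    awk '
        NF >= 2 {
            if (n == 0) first = $2
            n++
            if ($2 != first) {
                print "mismatch: " $1 " reports " $2 ", expected " first
                bad = 1
            }
        }
        END { if (bad || n == 0) exit 1 }
    '
}

# Usage sketch (block_ids.txt is a file you create by hand):
#   same_block_id < block_ids.txt && echo "all Replica nodes stopped at the same block"
```

Run the check once for the LastBlockID values and once for the LastReachableBlockID values.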
Back up the data on each of the Replica nodes.
```
tar cvzf <backup_name> /mnt/data/rocksdbdata
#For data greater than 64GB
rsync -avh /mnt/data/rocksdbdata <destination_folder>
```
The rsync command might time out due to SSH inactivity. If it does, rerun the command to continue the transfer incrementally.

Verify that the persephone-provisioning/config-service/blockchain-content-library-server containers are running on version 1.3.0.0.49.

If required, start the blockchain service with the sudo systemctl start blockchain.service command.

$ docker ps
CONTAINER ID        IMAGE                                                                                            COMMAND                  CREATED             STATUS              PORTS                                            NAMES
b41adc6dd0f7        vmwaresaas.jfrog.io/vmwblockchain/persephone-provisioning:cl-nginx-blockchain-1.3   "nginx -g 'daemon of…"   2 days ago          Up 2 days           0.0.0.0:8083->80/tcp                             orchestrator-runtime_blockchain-content-library-server_1
48de4a8d570f        vmwaresaas.jfrog.io/vmwblockchain/persephone-configuration:1.3.0.0.49               "java -Dspring.confi…"   2 days ago          Up 2 days           0.0.0.0:9003->9003/tcp, 0.0.0.0:8000->9023/tcp   orchestrator-runtime_config-service_1
346a553ee917        vmwaresaas.jfrog.io/vmwblockchain/persephone-provisioning:1.3.0.0.49                "java -Dspring.confi…"   2 days ago          Up 2 days           8000/tcp, 0.0.0.0:9002->9002/tcp                 orchestrator-runtime_persephone-provisioning_1

Define the deployment type as CLONE in the ORCHESTRATOR_DEPLOYMENT_TYPE environment parameter, and verify that the required parameters are specified correctly in the docker-compose-orchestrator.yml file.

Sample parameters in the docker-compose-orchestrator.yml file.

export ORCHESTRATOR_DEPLOYMENT_TYPE=CLONE
export CONFIG_SERVICE_IP=<config_service_ip_address>
export ORCHESTRATOR_DESCRIPTORS_DIR=/home/blockchain/descriptors 
export ORCHESTRATOR_OUTPUT_DIR=/home/blockchain/output
export INFRA_DESC_FILENAME=infrastructure_clone.json
export DEPLOY_DESC_FILENAME=deployment_clone.json
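Before you run the Orchestrator, you can confirm that none of these parameters was left unset. The following is a minimal sketch. The helper name missing_vars is hypothetical, and the list of variables in the usage comment is taken from the sample above.

```shell
# Print every named variable that is unset or empty.
# Returns non-zero if at least one is missing.
missing_vars() {
    rc=0
    for var in "$@"; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing: $var"
            rc=1
        fi
    done
    return "$rc"
}

# Usage sketch:
#   missing_vars ORCHESTRATOR_DEPLOYMENT_TYPE CONFIG_SERVICE_IP \
#       ORCHESTRATOR_DESCRIPTORS_DIR ORCHESTRATOR_OUTPUT_DIR \
#       INFRA_DESC_FILENAME DEPLOY_DESC_FILENAME
```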

Run the VMware Blockchain Orchestrator deployment script.

docker-compose -f docker-compose-orchestrator.yml up

Restore the backup data on each Client node.

rm -rf /mnt/data/db
tar xvzf <backup_name> --directory /

Restore the backup data on each Replica node.

rm -rf /mnt/data/rocksdbdata/
tar xvzf <backup_name> --directory /
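Because the restore commands delete the existing data directory before extracting, it can be worth confirming first that the archive is readable and contains the expected path. The following is a minimal sketch. The helper name archive_has_path is hypothetical, and it assumes the archive was created with GNU tar, which stores member names without the leading slash.

```shell
# Succeeds if the archive can be listed in full and contains entries under
# the given path prefix (written without the leading slash).
archive_has_path() {
    listing=$(tar tzf "$1" 2>/dev/null) || return 1
    printf '%s\n' "$listing" | grep -q "^$2"
}

# Usage sketch on a Client node (use the path your archive was created from):
#   archive_has_path <backup_name> mnt/data/db || echo "do not delete the data yet" >&2
```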

(Optional) Sanitize the Replica node RocksDB data.

docker run -it --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata <image_name> /concord/kv_blockchain_db_editor /concord/rocksdbdata removeMetadata

The <image_name> is the Concord-core image name in the blockchain.

vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.3.0.0.49

Change the COMPONENT_NO_LAUNCH parameter in the /config/agent/config.json file to False on all the Replica, Full Copy Client, and Client nodes.
```
sed -i 's/"COMPONENT_NO_LAUNCH": "True"/"COMPONENT_NO_LAUNCH": "False"/g' /config/agent/config.json 
```
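The sed expression matches one exact spelling of the setting, so it changes nothing if the file uses different spacing around the colon. The following is a minimal sketch for confirming the result afterward. The helper name no_launch_is_false is hypothetical.

```shell
# Succeeds only if the file has COMPONENT_NO_LAUNCH set to "False" and no
# remaining "True" entry, tolerating whitespace around the colon.
no_launch_is_false() {
    file=$1
    if grep -Eq '"COMPONENT_NO_LAUNCH"[[:space:]]*:[[:space:]]*"True"' "$file"; then
        return 1
    fi
    grep -Eq '"COMPONENT_NO_LAUNCH"[[:space:]]*:[[:space:]]*"False"' "$file"
}

# Usage sketch:
#   no_launch_is_false /config/agent/config.json || echo "COMPONENT_NO_LAUNCH was not updated" >&2
```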

Start all the Replica, Full Copy Client, and Client nodes using the same command.

curl -X POST 127.0.0.1:8546/api/node/management?action=start
docker restart agent

Monitor the health of the deployed VMware Blockchain nodes for about five minutes, and check the logs and metrics to confirm that new blocks are being added to the DAML Ledger.

docker exec -it telegraf curl -s http://concord:9891/metrics | grep -ia last_block | tail -1
docker exec -it concord sh -c './concord-ctl status get state-transfer' | grep Fetching
docker exec -it concord sh -c './concord-ctl status get replica' | grep -E 'lastStableSeqNum|curView'
docker logs --since 1m -f concord | grep -ia addBlock | cut -d '|' -f 3,10
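To confirm that blocks are being added, you can sample the last-block metric twice and compare the values. The following is a minimal sketch that assumes the metrics endpoint returns Prometheus-style text lines of the form "<name>{labels} <value>". The helper names metric_value and blocks_advanced and the 60-second gap are hypothetical.

```shell
# Print the value (last field) of the final line read from stdin,
# dropping any carriage return left by a terminal session.
metric_value() {
    tail -n 1 | tr -d '\r' | awk '{ print $NF }'
}

# Succeeds when the second sample is strictly greater than the first.
blocks_advanced() {
    awk -v a="$1" -v b="$2" 'BEGIN { if (b + 0 > a + 0) exit 0; exit 1 }'
}

# Usage sketch (-it is omitted because the output is captured):
#   first=$(docker exec telegraf curl -s http://concord:9891/metrics | grep -ia last_block | metric_value)
#   sleep 60
#   second=$(docker exec telegraf curl -s http://concord:9891/metrics | grep -ia last_block | metric_value)
#   blocks_advanced "$first" "$second" && echo "new blocks are being added"
```

If no transactions are submitted during the interval, the value might legitimately stay the same, so treat an unchanged value as a prompt to check the logs rather than as a failure.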

After validating that the newly cloned deployment is healthy, shut down the original blockchain deployment.

Note:
Confirm that the new deployment is functioning properly. You cannot recover a deleted blockchain deployment.
Delete the initial blockchain deployment to recover the storage resources.