During a clone-based upgrade, you create a cloned blockchain to install the latest version of the product. The existing Replica and Client node data is backed up and migrated to the cloned blockchain.
Based on the cloned blockchain configuration in the deployment descriptor, the deployment performs the following tasks:
Creates the Replica and Client nodes in the new topology
Creates the containers but does not start them, except for the agent container
Downloads all the required images and configurations for each blockchain node
Prerequisites
- Familiarize yourself with the upgrade workflow. See Considerations for Upgrading VMware Blockchain Nodes on AWS.
- Verify that you have the deployed blockchain ID information.
- Familiarize yourself with the backup and restore considerations for your VMware Blockchain nodes. See VMware Blockchain Node Backup and Restore Considerations on AWS.
- Verify that you have access to the latest version of VMware Blockchain.
- Verify that you have captured the IP addresses of all the Replica and Client node VMs, the Client node group name and group ID, and the Daml database password details. You can find this information in the VMware Blockchain Orchestrator output file of the original deployment. A sketch for extracting the IP addresses follows this list.
- If the Concord operator containers were deployed, verify that the Concord operator container is running. See Instantiate the operator container.
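For example, a minimal sketch for listing candidate IP addresses, assuming the Orchestrator output file of the original deployment is stored at a hypothetical path such as $HOME/output/deployment-output.json:
# List IPv4-looking values found in the original Orchestrator output file (the path is a placeholder)
grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' $HOME/output/deployment-output.json | sort -u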
Procedure
- In the infrastructure descriptor file, configure the following parameters.
Specify the VMware Blockchain version that you are upgrading to in the blockchainVersion parameter.
Verify that the generatePassword parameter is not set to True.
You can use the deployed blockchain ID or set a new blockchain ID.
Validate that the PERFORM_CONCORD_METADATA_CLEANUP parameter under advancedFeatures is set to True for cloning.
See the Advanced Features Parameters for a sample configuration.
Sample aws_infrastructure_upgrade_clone.json file to deploy the cloned blockchain nodes.
{ "organization": { "generatePassword": false, "damlSdkVersion": "2.4.0", "blockchainVersion": "1.8.0.0.53", "advancedFeatures" : { "PERFORM_CONCORD_METADATA_CLEANUP" : "True", "ENABLE_DAML_OUTGOING_TLS" : "True" "ENABLE_TELEGRAF_PULL_TLS" : "True" } }, "zones": [ { "name": "zone-A", "region": "us-east-1", "credentials": { "accessKeyId": "ASIA2VDCH3NJGVOJZRCD", "secretAccessKey": "<secret_access_key>", "sessionToken": "IQoJb3JpZ2luX2VjEA0aCXVzLWVhc3QtMSJHMEUCIQD8Q1yB3WMvSyT4kAY5VogKi4tqE/6VOdg3zfoLT1UvYAIgMURS0OG/Bk4cckVYVzUnjNrkZavVA22oR0c7vAIk16IqqQIIxf//////////ARABGgw3MzI0OTc5NTk3NjIiDBa7fQhevadaD4uUQir9Af54o7f0ZbXG9BoeIWrHdTQndcJhkeNL2+uyEpK6RE1CXcfH8Z+8tihsiGNS9cVPOxOojSAKgbult3+om1sZ1pI4yzPWza4eNv5ogaBXZAYw6803qgrKS0V5ywaGXQYaponEEOYVnFbzCMv1oYvGrKKZOGwYynjLeHiYN/CY/qZP/NBAAo7BZkq8IjrEgji2w9s5NqzzGE6scUL08KJOPFpCk414KJqmjYosNkwNuGOTPwgGKzZqYvecR1I3lUh1YNDuQnGYcl8bKEY2SIFchLb2CZGrKMGszWHHb+k9fUuD3d56gAkHeBBW1lg5/2IXua7/2XkO77mIvghLzQEwp8umkwY6nQFsQEqzoOdMkJ2vCQViPnmDqstirQygIi7KRdLHEdfp1rtaibOcL6khrQpRbcuE+hJRQ26R/fYMn5di+Ssh0yoFYmMurDDjbNuXOYlsOAi3KWpv0YKdd6Pj3PTVsM3Pa2gy01WwQ8+wCl4LN7WbOuIdKWjePPMxIW6rSRt39HTJsn5tTX6ig4ifsdUJbYfRNt9twgV/3pf34VD1kaTe" }, "network": { "subnetId": "subnet-2542cf14", "securityGroupIds": ["sg-97024493"] }, "containerRegistry": { "url": "https://vmwaresaas.jfrog.io", "userName": "vmbc-ro-token", "password": "<password>" }, "wavefront": { "url": "https://vmware.wavefront.com/api/", "token": "90e3b381-6b69-41ce-9f4f-d08a9d72d68b" }, "logManagement": [ { "type": "AWS_CLOUDWATCH", "cloudwatchLogConfig": { "region": "us-east-1", "logGroupName": "log-group", "logStreamName": "log-stream" } } ] } ] }
- Configure the deployment descriptor parameters for cloning.
See Configuring the Deployment Descriptor Parameters on AWS.
Sample aws_deployment_upgrade_clone.json file to deploy the cloned blockchain nodes.
{ "populatedReplicas": [ { "zoneName": "orchestrator-zone-A", "keyName": "keyname", "snapshotId": "snap-04793f30cc9108473" }, { "zoneName": "orchestrator-zone-A", "keyName": "keyname", "snapshotId": "snap-04793f30cc9108473" }, { "zoneName": "orchestrator-zone-A", "keyName": "keyname", "snapshotId": "snap-04793f30cc9108473" }, { "zoneName": "orchestrator-zone-A", "keyName": "keyname", "snapshotId": "snap-04793f30cc9108473" } ], "populatedClients": [ { "zoneName": "orchestrator-zone-A", "keyName": "keyname", "clientGroupId": "aa7455a5-3d4b-48ea-98b7-59b879fd9174", "groupName": "g1", "damlDbPassword": "fSN91E2w0hd_B-w", "snapshotId": "snap-04793f30cc9108473", "tlsLedgerData": { "crt": "-----BEGIN CERTIFICATE-----\ncrtvalue\n-----END CERTIFICATE-----\n", "pem": "-----BEGIN PRIVATE KEY-----\npemvalue\n-----END PRIVATE KEY-----\n", "cacrt": "-----BEGIN CERTIFICATE-----\ncacrtvalue\n", "clientAuth": "REQUIRE" } } ], "operatorSpecifications": { "operatorPublicKey": "key" }, "blockchain": { "consortiumName": "awsclone", "blockchainType": "DAML", "blockchainId": "dae8cd01-2af6-459d-8aea-6e28cc9873a7" }, "replicaNodeSpec": { "instanceType": "m4.2xlarge", "diskSizeGib": 64 }, "clientNodeSpec": { "instanceType": "m4.2xlarge", "diskSizeGib": 64 }, "tags": { "Name": "awsclone" } }
- Perform the following validation tasks.
Verify that the underlying infrastructure can accommodate a cloned blockchain deployment with the same storage size.
Check whether the remote backup location has enough available space to store the backup data. A sketch for this check follows this list.
Validate that the IP addresses for the cloned blockchain are available and accurately listed in the deployment descriptor JSON files.
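A minimal sketch for the space check, assuming the remote backup location is mounted at a hypothetical path such as /mnt/backup and the node data volume is mounted at /mnt/data:
# On the backup host: confirm the free space available for the backup data
df -h /mnt/backup
# On each blockchain node: estimate the size of the data to back up
sudo du -sh /mnt/data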
- Stop the Client node containers to take snapshots.
sudo curl -X POST 127.0.0.1:8546/api/node/management?action=stop
- Repeat the stop operation on each Client node in the Client group.
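A minimal sketch for repeating the stop call across the Client node group over SSH, assuming placeholder Client node IP addresses captured in the prerequisites and a blockchain user account:
# Replace the placeholder IP addresses with the Client node IPs from the original deployment
for ip in 10.0.0.11 10.0.0.12 10.0.0.13; do
  ssh blockchain@$ip 'sudo curl -X POST 127.0.0.1:8546/api/node/management?action=stop'
done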
- (Optional) If the Concord operator containers were deployed, pause all the Replica nodes at the same checkpoint from the Concord operator container, then check the status periodically and proceed only after the status of all the Replica nodes is true.
Any blockchain nodes that are in state transfer or are down for other reasons cause the wedge status command to return false. The command returns true when state transfer completes and all Replica nodes are healthy, which allows all Replica nodes to stop at the same checkpoint successfully.
The wedge command might take some time to complete. If it times out, the system operator must run it again.
docker exec -it operator sh -c './concop wedge stop'
docker exec -it operator sh -c './concop wedge status'
# Keep running the status command periodically until all replicas return true.
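A sketch for polling the wedge status until no replica reports false, assuming the ./concop wedge status output contains a true or false value for each replica (verify the exact output format in your environment):
# Poll every 30 seconds until all replicas report true
while docker exec operator sh -c './concop wedge status' | grep -q false; do
  sleep 30
done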
- Stop the Replica node containers to take snapshots.
sudo curl -X POST 127.0.0.1:8546/api/node/management?action=stop
- Repeat the stop operation on each Replica node in the Replica Network.
- Check that all the Replica nodes are stopped in the same state.
Comparing the LastReachableBlockID and LastBlockID sequence numbers at which each Replica node stopped helps determine whether any nodes lag.
If there is a lag when you power on the Replica Network, some Replica nodes in state-transfer mode might have to catch up. The lag can also result in a failed consensus and require restoring each Replica node from the latest copy.
# Compare Last Block ID and Last Reachable Block ID to confirm that all nodes are stopped in the same state
image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord"); sudo docker run --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastBlockID
image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord"); sudo docker run --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata getLastReachableBlockID
The $image variable resolves to the concord-core image name in the blockchain, for example:
vmwaresaas.jfrog.io/vmwblockchain/concord-core:1.8.0.0.53
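To compare the values across the Replica Network in one pass, a sketch such as the following can collect both IDs from each Replica node over SSH, assuming placeholder Replica node IP addresses and a blockchain user account:
# Replace the placeholder IP addresses with the Replica node IPs from the original deployment
for ip in 10.0.0.1 10.0.0.2 10.0.0.3 10.0.0.4; do
  echo "Replica $ip:"
  ssh blockchain@$ip 'image=$(sudo docker images --format "{{.Repository}}:{{.Tag}}" | grep "concord"); for cmd in getLastBlockID getLastReachableBlockID; do sudo docker run --rm --entrypoint="" --mount type=bind,source=/mnt/data/rocksdbdata,target=/concord/rocksdbdata $image /concord/kv_blockchain_db_editor /concord/rocksdbdata $cmd; done'
done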
- In the EC2 interface, select the VMware Blockchain node from the Amazon EC2 page and navigate to the Storage tab.
- Select the data volume ID, navigate to the EBS volumes, and select Actions > Create Snapshot.
This step creates a snapshot of the secondary EBS volume, which you can use to restore your data.
- Save the snapshot ID.
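Alternatively, a sketch of the same snapshot workflow with the AWS CLI, using a placeholder data volume ID:
# Create a snapshot of the node's secondary (data) EBS volume; the volume ID is a placeholder
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "VMware Blockchain clone-based upgrade snapshot"
# Record the SnapshotId from the output and wait until the snapshot state is completed
aws ec2 describe-snapshots --snapshot-ids snap-04793f30cc9108473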
- Verify that the persephone-configuration and persephone-provisioning containers are running.
- Verify that the PERFORM_CONCORD_METADATA_CLEANUP parameter in the infrastructure descriptor file is set to True for cloning.
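A quick check for both conditions on the VMware Blockchain Orchestrator VM, assuming the infrastructure descriptor is stored at $HOME/descriptors/aws_infrastructure_upgrade_clone.json:
# Confirm that the Persephone containers are up
docker ps --format "{{.Names}}: {{.Status}}" | grep persephone
# Confirm that the metadata cleanup flag is set in the infrastructure descriptor
grep PERFORM_CONCORD_METADATA_CLEANUP $HOME/descriptors/aws_infrastructure_upgrade_clone.json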
- Encrypt and redirect the infrastructure and the deployment descriptor files for added security.
- Encrypt the infrastructure_descriptor.json file.
$HOME/descriptors > ansible-vault encrypt infrastructure_descriptor.json
New Vault password:
Confirm New Vault password:
Encryption successful
- Encrypt the deployment_descriptor.json file.
$HOME/descriptors > ansible-vault encrypt deployment_descriptor.json
New Vault password:
Confirm New Vault password:
Encryption successful
- Configure the two environment variable values.
ORCHESTRATOR_OUTPUT_DIR - The output directory where the output file is written.
ORCHESTRATOR_DEPLOYMENT_TYPE - Set deployment type to PROVISION.
- Run the secure-orchestrator.sh script from the orchestrator_runtime directory.
ORCHESTRATOR_OUTPUT_DIR=$HOME/output ORCHESTRATOR_DEPLOYMENT_TYPE=PROVISION ./secure-orchestrator.sh
The script creates temporary files.
/dev/shm/orchestrator-awsIGoa0JA/infra_descriptor
/dev/shm/orchestrator-awsIGoa0JA/deployment_descriptor
- Redirect the decrypted infrastructure_descriptor.json to the infrastructure_descriptor file location.
Use the vault password used to encrypt the infrastructure_descriptor.json file.
ansible-vault view $HOME/descriptors/infrastructure_descriptor.json > /dev/shm/orchestrator-awsIGoa0JA/infra_descriptor
- Redirect the decrypted deployment_descriptor.json to the deployment_descriptor file location.
Use the vault password used to encrypt the deployment_descriptor.json file.
ansible-vault view $HOME/descriptors/deployment_descriptor.json > /dev/shm/orchestrator-awsIGoa0JA/deployment_descriptor
After the script completes running, the temporary files are deleted.
- (Optional) If the secure-orchestrator.sh script fails or is terminated, delete the temporary folder under the /dev/shm/orchestrator-* directory.
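For example, a cleanup sketch for the leftover temporary directories (confirm the path before deleting):
# Remove leftover temporary orchestrator directories
rm -rf /dev/shm/orchestrator-*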
- Define the deployment type as CLONE in the ORCHESTRATOR_DEPLOYMENT_TYPE environment parameter and verify that the required parameters are specified correctly in the docker-compose-orchestrator.yml file.
Sample parameters in the docker-compose-orchestrator.yml file.
ORCHESTRATOR_DESCRIPTORS_DIR=/home/blockchain/descriptors
ORCHESTRATOR_DEPLOYMENT_PLATFORM=AWS
CONFIG_SERVICE_IP=<config_service_ip_address>
INFRA_DESC_FILENAME=aws_infrastructure_upgrade_clone.json
DEPLOY_DESC_FILENAME=aws_deployment_upgrade_clone.json
ORCHESTRATOR_OUTPUT_DIR=/home/blockchain/output
ORCHESTRATOR_DEPLOYMENT_TYPE=CLONE
- Run the VMware Blockchain Orchestrator deployment script.
docker-compose -f docker-compose-orchestrator.yml up
- SSH into each deployed blockchain node, verify that all the containers are created and only the agent container is running, and check the logs to confirm that the metadata cleanup is complete.
sudo docker ps -a
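To check the cleanup progress, a hedged example of searching the agent container logs (the exact log message text can vary by release):
# Search the agent container logs for metadata cleanup messages
sudo docker logs agent 2>&1 | grep -i cleanup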
- Remove the LOCK file in the RocksDB folder on the Replica nodes.
sudo rm /mnt/data/rocksdbdata/LOCK
- Change the COMPONENT_NO_LAUNCH parameter in the /config/agent/config.json file to False on all the Replica and Client nodes.
sudo sed -i 's/"COMPONENT_NO_LAUNCH": "True"/"COMPONENT_NO_LAUNCH": "False"/g' /config/agent/config.json
- Start all the Replica and Client nodes by running the same command on each node.
docker restart agent
- After validating that the newly cloned deployment is healthy and fully functional, shut down the original blockchain deployment.
Note:
Confirm that the new deployment is functioning properly. You cannot recover a deleted blockchain deployment.
- Delete the original blockchain deployment to recover the storage resources.