VMware Blockchain 1.6 | 12 MAY 2022 | Build 234
Check for additions and updates to these release notes.
VMware Blockchain is an enterprise-grade blockchain platform that meets the needs of business-critical multi-party workflows. The VMware Blockchain 1.6 release includes the following enhancements:
Security Improvements
Full Copy Client Data Integrity Tool
Users can initiate an integrity check on the data stored on the ObjectStore attached to the Full Copy Client. This integrity check provides proof of origination and tamper-detection in case of a dispute over historical data or suspicion of a data breach.
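As a generic illustration of the tamper-detection idea only (not the VMware integrity tool itself; the class name, file name, and digest handling below are hypothetical), a check of this kind amounts to recomputing a digest over stored data and comparing it with the digest recorded when the data was written:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.HexFormat;

// Hypothetical sketch: recompute a SHA-256 digest over an object read from
// storage and compare it with the digest recorded when the object was written.
public class IntegrityCheckSketch {
    public static void main(String[] args) throws Exception {
        byte[] storedObject = Files.readAllBytes(Path.of("exported-object.bin")); // hypothetical file
        String recordedDigest = args[0];                                          // digest of record

        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        String actualDigest = HexFormat.of().formatHex(sha256.digest(storedObject));

        if (actualDigest.equalsIgnoreCase(recordedDigest)) {
            System.out.println("Digest matches the recorded value");
        } else {
            System.err.println("Digest mismatch: possible tampering detected");
        }
    }
}
```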
AWS Deployment Security Enhancements
In previous releases, default IAM roles were used for Replica and Client nodes deployed on AWS EC2 instances. To give customers finer-grained security controls, specific IAM roles and policies can now be provided in the deployment descriptor file of the VMware Blockchain Orchestrator.
Scalability and Flexibility Enhancements
Adding Client Nodes
Previous versions of VMware Blockchain did not allow new Client nodes to be added in new Client node groups when pruning was enabled on the blockchain. Users can now add new Client nodes to existing and new Client node groups in a trusted, authenticated, and automated manner, regardless of whether pruning is enabled on the blockchain.
Recoverability Enhancements
Tamper Resistant Replica Node Back Up and Restore
Replica node backups are used for several critical purposes, including operational recovery, auditing, and disaster recovery. These backups must be protected against tampering, and operators need the ability to verify the integrity of the backups. Replica node backups are cryptographically signed in this release, allowing operators to ensure backups have not been altered. A signature validation stage precedes the restore process to ensure only valid backups are used for restoration.
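As a minimal, generic sketch of the kind of signature validation that precedes a restore (this is not the VMware restore tooling; the file names, key format, and RSA algorithm choice are assumptions made for illustration):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyFactory;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;

// Hypothetical sketch: verify a detached signature over a backup archive
// before allowing a restore to proceed.
public class BackupSignatureCheck {
    public static void main(String[] args) throws Exception {
        byte[] backup = Files.readAllBytes(Path.of("replica-backup.tar"));       // hypothetical paths
        byte[] signature = Files.readAllBytes(Path.of("replica-backup.sig"));
        byte[] publicKeyBytes = Files.readAllBytes(Path.of("operator-pub.der"));

        PublicKey publicKey = KeyFactory.getInstance("RSA")
                .generatePublic(new X509EncodedKeySpec(publicKeyBytes));

        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publicKey);
        verifier.update(backup);

        if (verifier.verify(signature)) {
            System.out.println("Backup signature valid; safe to restore");
        } else {
            System.err.println("Backup signature invalid; aborting restore");
        }
    }
}
```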
Replica Node Back Up and Restore Redesign
The Replica node backup process has been restructured to leverage RocksDB snapshots to enable the creation of consistent state backups while the blockchain is online and processing transactions. With this redesign, cryptographically signed checkpoints in the BFT state machine replication protocol are used to create state backups.
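For context, RocksDB exposes a checkpoint API that produces a consistent, point-in-time copy of a live database while it continues to serve writes. The sketch below, using the RocksJava binding with hypothetical paths, only illustrates that general mechanism and is not the VMware backup implementation:

```java
import org.rocksdb.Checkpoint;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

// Illustration of RocksDB's checkpoint mechanism: create a consistent
// point-in-time copy of an open database while it keeps processing writes.
public class RocksDbCheckpointSketch {
    public static void main(String[] args) throws Exception {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(false);
             RocksDB db = RocksDB.open(options, "/var/lib/concord/rocksdb");    // hypothetical path
             Checkpoint checkpoint = Checkpoint.create(db)) {
            // Unchanged SST files are hard-linked, so the snapshot is cheap and consistent.
            checkpoint.createCheckpoint("/mnt/backup/rocksdb-snapshot");        // hypothetical path
        }
    }
}
```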
Supportability Enhancements
Logging and Metrics
Logs generated within VMware Blockchain contain details pertinent to operations teams and omit less relevant details to improve troubleshooting and supportability. In addition, the frequency of log and metric generation has been adjusted to avoid excessive data capture without impacting debuggability, thereby avoiding unnecessary disk I/O and capacity consumption.
Component Versions
The supported domain versions include:
Domain | Version |
---|---|
VMware Blockchain Platform | 1.6 |
VMware Blockchain Orchestrator | 1.6 |
DAML SDK | 2.0.1 |
Implement the clone-based upgrade process only when upgrading from VMware Blockchain 1.5 to 1.6. See the Perform Clone-Based Upgrade on vSphere or Perform Clone-Based Upgrade on AWS instructions in the Using and Managing VMware Blockchain Guide.
A staggered restart of Replica nodes might cause some nodes to enter state transfer or, in rare occurrences, to become non-functional
When Replica nodes are restarted one after the other, some Replica nodes might enter state transfer that does not complete. In rare circumstances, the blockchain might become non-functional because the Replica nodes cannot agree on a single view.
When a deployed VM is created, the VM might not have a private static IP address due to a race condition
In a rare occurrence, a deployed VM might not have a private static IP address when it is created. The problem occurs due to a race condition in the underlying Photon OS.
In rare cases, VMware Blockchain deployments on vSphere 6.7 might fail
On-premises VMware Blockchain Replica and Client node deployments on vSphere 6.7 can potentially enter a CPU lock during the initial boot.
Workaround: If you experience this behavior, install and configure vSphere 7.0 for your on-premises environment with blockchain deployments.
Concord container fails after a few days of running because batch requests cannot reach pre-processing consensus
In rare cases, if one of the batch requests has not reached the pre-execution consensus, the entire batch is canceled. When one of the batch requests is in the middle of pre-processing, it cannot be canceled, and users must wait until the processing completes. This missing validation causes the Concord container process to fail.
Workaround: None. The agent restarts the Concord container when this problem occurs to fix the error.
Small form factor for Client node groups in AWS deployment is not supported
With the introduction of client services, a Client node requires a minimum memory allocation of 32 GB. The small form factor uses the m4.xlarge instance type, which provides only 16 GB of memory.
Workaround: Update the clientNodeSpec parameter value to m4.2xlarge in the deployment descriptor for AWS deployments that would otherwise use the small form factor.
State Transfer delay hinders Replica nodes from restarting and catching up with other Replica nodes
The State Transfer process is slow due to a shortage of storage and CPU resources. As a result, a restarted Replica node cannot catch up with the other Replica nodes and fails.
Workaround: Use the backup process to restore the failed Replica node.
Fluctuations in transactions per second (TPS) might destabilize some Client nodes
In some cases, after the blockchain has run for several hours, fluctuations in TPS might be observed on some Client nodes. After that, the load stabilizes and continues with slight drops in TPS.
Workaround: None.
Primary Concord container fails after a few hours or days of running
In rare cases, the information required by the pre-processor thread gets modified by the consensus thread, which causes the Concord container process to fail.
Workaround: None. The agent restarts the Concord container when this problem occurs to fix the error.
Due to a RocksDB size calculation error, the oldest database checkpoints are removed even when adequate disk space is available
A known calculation error in the database checkpoints overestimates the RocksDB size, causing the oldest database checkpoints to be removed so that RocksDB internal operations do not fail due to insufficient disk space. However, this checkpoint removal is unnecessary because adequate disk space is available.
Workaround: Configure your system to allocate more disk space for the Concord container.
For example, if your system is configured to retain two database checkpoints, the total disk space allotted for the Concord container must be greater than six times the RocksDB size. On the other hand, if your configuration retains one database checkpoint, the total disk space allotted for the Concord container must be greater than four times the RocksDB size.
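The two figures above suggest, as an inference from the examples rather than a documented formula, an allotment of roughly 2 × (retained checkpoints + 1) × the RocksDB size. A small back-of-the-envelope sketch of that calculation:

```java
// Back-of-the-envelope sizing inferred from the two examples above
// (two checkpoints -> 6x, one checkpoint -> 4x); not a documented VMware formula.
public class ConcordDiskSizingSketch {
    static long minimumDiskBytes(long rocksDbSizeBytes, int retainedCheckpoints) {
        return 2L * (retainedCheckpoints + 1) * rocksDbSizeBytes;
    }

    public static void main(String[] args) {
        long rocksDbSize = 500L * 1024 * 1024 * 1024; // example: 500 GB database
        System.out.println("Two checkpoints: > " + minimumDiskBytes(rocksDbSize, 2) + " bytes");
        System.out.println("One checkpoint:  > " + minimumDiskBytes(rocksDbSize, 1) + " bytes");
    }
}
```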
Large state snapshot requests to create a checkpoint time out
The Daml Ledger API sends state snapshot requests to the database Checkpoint manager to create a checkpoint. The checkpointing takes 18 seconds or more for large databases, and this delay causes a timeout.
Workaround: Restart the Daml Ledger API after 30 seconds.
Assertion fails on State Transfer source while sending messages to a State Transfer destination
On rare occasions, when prefetch capability is enabled on the source and two destination Replica nodes request blocks with overlapping ranges, an assertion fails on the source.
For example, when destination Replica-node-1 requests blocks between 500 and 750, the source prefetches blocks 751-800. When another destination, Replica-node-2, then requests blocks between 751 and 900, the source considers the prefetch valid, and the assertion fails while sending blocks to destination Replica-node-2.
Workaround: Use the backup and restore operation to avoid requesting and receiving overlapping block ranges for Replica nodes.
Client nodes cannot resynchronize data when the data size is greater than 5GB
Client nodes cannot resynchronize data from the Replica nodes when the data is greater than 5 GB and the data folder has been removed due to data corruption. Therefore, any .dar file uploads cause the Client node Daml Ledger API to fail.
Workaround: Complete the applicable steps involving the /mnt/data/db folder.
Wavefront metrics do not appear after the network interface is deactivated and re-enabled on Replica nodes
This problem was observed while executing a test case that explicitly deactivated the network interface on Replica nodes; it rarely manifests in a production environment.
Workaround: Restart the telegraf container by running the docker restart telegraf command for the metrics to appear in Wavefront.