VMware vSAN 7.0 Update 2 | 9 MAR 2021 | Build 17630552
Check for additions and updates to these release notes.
What's in the Release Notes
The release notes cover the following topics:
vSAN 7.0 Update 2 introduces the following new features and enhancements:
Scale without Compromise
- HCI Mesh. HCI Mesh now enables vSAN clusters to share capacity with vSphere compute-only clusters, or non-HCI based clusters. You can also specify storage rules for recommended data placement, to find a compatible datastore. Scalability for a single remote vSAN datastore has been increased to 128 hosts.
- vSAN File Service enhancements. vSAN File Service now supports stretched cluster deployments and two-node clusters. Scalability is increased to 64 hosts and 100 shares per cluster.
- Stretched cluster enhancements. A stretched cluster now can include up to 20 hosts at each site. DRS awareness of stretched clusters provides more consistent performance during failback situations.
- vSAN over RDMA. vSAN over RDMA delivers increased performance and enables you to obtain better VM consolidation ratios.
- Enhanced platform performance. Improves platform NUMA awareness to deliver increased performance.
Boost Infrastructure and Data Security
- vSphere Native Key Provider. vSAN supports vSphere Native Key Provider for built-in encryption.
- Data-in-transit encryption for vSAN File Service. Security enhancements to File Service include support for data-in-transit encryption, when File Service is enabled along with vSAN data-in-transit encryption.
- Data-in-transit encryption for shared witness. vSAN 7.0 Update 2 supports data-in-transit encryption for shared witness hosts.
- vLCM enhancements. vLCM now supports firmware updates for select Hitachi UCP HC servers, along with existing support for select Dell EMC, HPE and Lenovo servers. vLCM can update vSphere with Tanzu clusters configured with NSX-T networking. In addition, scalability is increased to 400 hosts managed by vLCM within a single vCenter Server.
- vSAN management and monitoring enhancements. Additional tools are available to analyze your environment, and rapidly identify root causes of issues and ways to remediate. Enhancements include proactive capacity management, networking diagnostics, insights into performance top contributors, and health check history.
- Unplanned failure handling. vSAN 7.0 Update 2 includes enhanced data durability to tolerate unplanned host, disk, or network failures by creating additional durability components at the time of failure.
- File Service snapshots. vSAN 7.0 Update 2 simplifies backup of file shares with snapshot support and APIs that allow backup and recovery software vendors to integrate with vSAN File Service.
- vSphere Proactive HA support. vSAN now supports proactive HA, which detects hardware issues and can take proactive steps to place hosts into maintenance mode.
- VMFS6 file system support. A newly created VM on a vSAN datastore has a VMFS6 file system on the VM namespace object if the object format version is 14. You can use SEsparse snapshots with the VM.
- Efficient VMDK moves. When you move a VMDK between two directories on the same vSAN datastore using the vSphere Datastore Browser UI or API (VirtualDiskManager.moveVirtualDisk), only the VMDK descriptor file and object metadata are updated. This operation is faster because the data of the vSAN object backing the VMDK is not copied.
VMware vSAN Community
Use the vSAN Community Web site to provide feedback and request assistance with any problems you find while using vSAN.
Upgrades for This Release
For instructions about upgrading vSAN, see the VMware vSAN 7.0 documentation.
Note: Before performing the upgrade, please review the most recent version of the VMware Compatibility Guide to validate that the latest vSAN version is available for your platform.
vSAN 7.0 Update 2 is a new release that requires a full upgrade to vSphere 7.0 Update 2. Perform the following tasks to complete the upgrade:
1. Upgrade to vCenter Server 7.0 Update 2. For more information, see the VMware vSphere 7.0 Update 2 Release Notes.
2. Upgrade hosts to ESXi 7.0 Update 2. For more information, see the VMware vSphere 7.0 Update 2 Release Notes.
3. Upgrade the vSAN on-disk format to version 14.0. If upgrading from on-disk format version 3.0 or later, no data evacuation is required (metadata update only).
4. Upgrade FSVM to enable new File Service features such as stretched cluster support, snapshot support, and data-in-transit encryption.
Note: vSAN retired disk format version 1.0 in vSAN 7.0 Update 1. Disks running disk format version 1.0 are no longer recognized by vSAN. vSAN will block upgrade through vSphere Update Manager, ISO install, or esxcli to vSAN 7.0 Update 1. To avoid these issues, upgrade disks running disk format version 1.0 to a higher version. If you have disks on version 1, a health check alerts you to upgrade the disk format version.
Disk format version 1.0 does not have performance and snapshot enhancements, and it lacks support for advanced features including checksum, deduplication and compression, and encryption. For more information about vSAN disk format version, see KB 2148493.
Upgrading the On-disk Format for Hosts with Limited Capacity
During an upgrade of the vSAN on-disk format from version 1.0 or 2.0, a disk group evacuation is performed. The disk group is removed and upgraded to on-disk format version 14.0, and the disk group is added back to the cluster. For two-node or three-node clusters, or clusters without enough capacity to evacuate each disk group, select Allow Reduced Redundancy from the vSphere Client. You also can use the following RVC command to upgrade the on-disk format: vsan.ondisk_upgrade --allow-reduced-redundancy
When you allow reduced redundancy, your VMs are unprotected for the duration of the upgrade, because this method does not evacuate data to the other hosts in the cluster. It removes each disk group, upgrades the on-disk format, and adds the disk group back to the cluster. All objects remain available, but with reduced redundancy.
If you enable deduplication and compression during the upgrade to vSAN 7.0 Update 2, you can select Allow Reduced Redundancy from the vSphere Client.
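The upgrade rules above can be summarized in a short sketch. This is an illustration only, not a VMware tool: the function name and the returned labels are hypothetical, and treating any version between 2.0 and 3.0 as requiring evacuation is an assumption, because the notes name only versions 1.0, 2.0, and 3.0 or later.

```python
def ondisk_upgrade_plan(current_version: float, target_version: float = 14.0) -> str:
    """Classify an on-disk format upgrade using the rules stated in these notes.

    Returned labels are hypothetical names chosen for this sketch.
    """
    if current_version >= target_version:
        # Already at (or beyond) the target format; nothing to do.
        return "none"
    if current_version < 2.0:
        # Format 1.0 was retired in vSAN 7.0 Update 1: the disks are no longer
        # recognized, so the disk format must be raised before upgrading vSAN.
        return "blocked"
    if current_version < 3.0:
        # Upgrading from format 2.0 performs a disk group evacuation
        # (or needs Allow Reduced Redundancy on small or full clusters).
        return "evacuation"
    # Format 3.0 or later: metadata update only, no data evacuation.
    return "metadata-only"
```

For example, ondisk_upgrade_plan(2.0) reports that an evacuation (or Allow Reduced Redundancy) is needed, while ondisk_upgrade_plan(11.0) reports a metadata-only upgrade.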
For information about maximum configuration limits for the vSAN 7.0 Update 2 release, see the Configuration Maximums documentation.
The known issues are grouped as follows.
vSAN Issues
- Deleting files in a file share might not be reflected in vSAN capacity view
The allocated blocks might not be returned to vSAN storage immediately after all the files are deleted, so it can take some time before the reclaimed storage capacity is updated in the vSAN capacity view. Also, when new data is written to the same file share, these deleted blocks might be reused before they are returned to vSAN storage.
If unmap is enabled and vSAN deduplication is disabled, the space might not be freed back to vSAN unless 4MB-aligned space is freed in VDFS. If unmap is enabled and vSAN deduplication is enabled, space freed by VDFS is freed back to vSAN with a delay.
Workaround: To release the storage back to vSAN immediately, delete the file shares.
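To illustrate why deleting small or unaligned regions might not return capacity when deduplication is disabled, the following sketch estimates how much of a freed byte range is covered by whole 4MB-aligned chunks. It assumes that "4MB aligned" means chunks that start on a 4 MiB boundary and span a full 4 MiB; the notes do not describe the actual VDFS logic, and the function name is hypothetical.

```python
ALIGNMENT = 4 * 1024 * 1024  # 4 MiB, taken from the alignment requirement above


def aligned_reclaimable_bytes(offset: int, length: int, alignment: int = ALIGNMENT) -> int:
    """Return how many bytes of [offset, offset + length) fall in whole aligned chunks."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    # Round the start of the range up to the next alignment boundary.
    start = -(-offset // alignment) * alignment
    # Round the end of the range down to the previous alignment boundary.
    end = ((offset + length) // alignment) * alignment
    return max(0, end - start)
```

Under this assumption, a freed range of exactly 4 MiB that starts one byte past a boundary contains no whole aligned chunk, so none of it would be returned to vSAN until neighboring space is also freed.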
- vSAN over RDMA might experience lower performance due to network congestion
RDMA requires lossless network infrastructure that is free of congestion. If your network has congestion, certain large I/O workloads might experience lower performance than TCP.
Workaround: Address any network congestion issues following OEM best practices for RDMA.
- vCenter VM crash on stretched cluster with data-in-transit encryption
vCenter VM might crash on a vSAN stretched cluster if the vCenter VM is on vSAN with data-in-transit encryption enabled. When all hosts in one site are down and then power on again, the vCenter VM might crash after the failed site returns to service.
Workaround: Use the following script to resolve this problem: thumbPrintRepair.py
- VM migration from VMFS datastore or vSAN datastore to vSAN datastore fails
When you have Content Based Read Cache (CBRC) enabled, sVmotion or xVmotion might fail to migrate a VM that has one or more snapshots to the vSAN datastore. You might see the following error message: The operation is not supported on the object.
The following messages appear in /var/log/vmware/vpxd/
2021-01-31T17:12:27.477Z error vpxd [Originator@6876 sub=vpxLro opID=65ef3b53-01] [VpxLRO] Unexpected Exception: N5Vmomi5Fault12NotSupported9ExceptionE(Message is: The operation is not supported on the object.,
--> Fault cause: vmodl.fault.NotSupported
--> Fault Messages are:
Workaround: Consolidate snapshots, or delete all snapshots before migration.
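When reviewing vpxd logs for this failure, a small filter can pull out the matching entries. The sketch below is an illustration only: the regular expression is derived from the single sample line shown above and might not match other log layouts, and the function name is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Pattern modeled on the sample vpxd log line in this section (an assumption
# about the general layout, not a documented log format).
NOT_SUPPORTED = re.compile(
    r"^(?P<timestamp>\S+)\s+error\s+vpxd.*opID=(?P<op_id>[^\]\s]+)\].*"
    r"The operation is not supported on the object"
)


def find_not_supported(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, opID) pairs for log lines that report the NotSupported fault."""
    hits = []
    for line in lines:
        match = NOT_SUPPORTED.search(line)
        if match:
            hits.append((match.group("timestamp"), match.group("op_id")))
    return hits
```

The opID returned for each hit can then be used to locate the related migration task in the same log.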
- vSAN allows a VM to be provisioned across local and remote datastores
vSphere does not prevent users from provisioning a VM across local and remote datastores in an HCI Mesh environment. For example, you can provision one VMDK on the local vSAN datastore and one VMDK on remote vSAN datastore. This is not supported because vSphere HA is not supported with this configuration.
Workaround: Do not provision a VM across local and remote datastores.
- The object reformatting task is not progressing
If object reformatting is needed after an upgrade, a health alert is triggered, and vSAN begins reformatting. vSAN performs this task in batches, and it depends on the amount of transient capacity available in the cluster. When the transient capacity exceeds the maximum limit, vSAN waits for the transient capacity to be freed before proceeding with the reformatting. During this phase, the task might appear to be halted. The health alert will clear and the task will progress when transient capacity is available.
Workaround: None. The task is working as expected.
- System VMs cannot be powered-off
With the release of vSphere Cluster Services (vCLS) in vSphere 7.0 Update 1, a set of system VMs might be placed within the vSAN cluster. These system VMs cannot be powered-off by users. This issue can impact some vSAN workflows, which are documented in the following article: https://kb.vmware.com/s/article/80877
Workaround: For more information about this issue, refer to KB 80483.
- vSAN File Service cannot be enabled due to an old vSAN on-disk format version.
vSAN File Service cannot be enabled with the vSAN on-disk format version earlier than 11.0 (this is the on-disk format version in vSAN 7.0).
Workaround: Upgrade the vSAN disk format version before enabling file service.
- Remediate cluster task might fail in large scale cluster due to vSAN health network test issues
In large scale clusters with more than 16 hosts, intermittent ping failures can occur during host upgrade. These failures can interrupt host remediation in vSphere Lifecycle Manager.
Workaround: After the remediation pre-check passes, silence alerts for the following vSAN health tests:
- vSAN: Basic (unicast) connectivity check
- vSAN: MTU check (ping with large packet size)
When the remediation task is complete, restore alerts for the vSAN health tests.
- Host failure in hot-plug scenario when drive is reinserted
During a hot drive removal, VMware native NVMe hot-plug can cause a host failure if the NVMe drive is pulled and reinserted within one minute. This is applicable to both vSphere and vSAN for any new or existing drive reinsertion.
Workaround: After removing a hot drive, wait for one minute before you reinsert the new or existing drive.
- Cannot place last host in a cluster into maintenance mode, or remove a disk or disk group
Operations in Full data migration or Ensure accessibility mode might fail without providing guidance to add a new resource, when there is only one host left in the cluster and that host enters maintenance mode. This can also happen when there is only one disk or disk group left in the cluster and that disk or disk group is to be removed.
Workaround: Before you place the last remaining host in the cluster into maintenance mode with Full data migration or Ensure accessibility mode selected, add another host with the same configuration to the cluster. Before you remove the last remaining disk or disk group in the cluster, add a new disk or disk group with the same configuration and capacity.
- Object reconfiguration workflows might fail due to the lack of capacity if one or more disks or disk groups are almost full
vSAN resyncs get paused when the disks in non-deduplication clusters or disk groups in deduplication clusters reach a configurable resync pause fullness threshold. This is to avoid filling up the disks with resync I/O. If the disks reach this threshold, vSAN stops reconfiguration workflows, such as EMM, repairs, rebalance, and policy change.
Workaround: If space is available elsewhere in the cluster, rebalancing the cluster frees up space on the other disks, so that subsequent reconfiguration attempts succeed.
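As a rough aid for spotting which disks or disk groups are holding up reconfiguration, the following sketch compares reported fullness against the resync pause threshold. The threshold is configurable and its value is not given in these notes, so it is passed in as a parameter; the function name and the input mapping are hypothetical.

```python
from typing import List, Mapping


def units_pausing_resync(fullness: Mapping[str, float], pause_threshold: float) -> List[str]:
    """Return the disks (or disk groups, in deduplication clusters) at or above the threshold.

    fullness maps a unit name to its used fraction (0.0 to 1.0); pause_threshold
    uses the same scale. Both are supplied by the caller.
    """
    return sorted(name for name, used in fullness.items() if used >= pause_threshold)
```

If the returned list is non-empty while other units have free space, rebalancing the cluster as described in the workaround is the likely next step.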
- After recovery from cluster full, VMs can lose HA protection
In a vSAN cluster that has hosts with disks 100% full, the VMs might have a pending question and therefore lose HA protection. Also, the VMs that had a pending question are not HA protected after recovering from a cluster full scenario.
Workaround: After recovering from vSAN cluster full scenario, perform one of the following actions:
- Disable and re-enable HA.
- Reconfigure HA.
- Power off and power on the VMs.
- Power Off VMs fails with Question Pending
If a VM has a pending question, you are not allowed to do any VM-related operations until the question is answered.
Workaround: Try to free the disk space on the relevant volume, and then click Retry.
- When the cluster is full, the IP addresses of VMs either change to IPv6 or become unavailable
When a vSAN cluster is full with one or more disk groups reaching 100%, there can be a VM pending question that requires user action. If the question is not answered and if the cluster full condition is left unattended, the IP addresses of VMs might change to IPv6 or become unavailable. This prevents you from using SSH to access the VMs. It also prevents you from using the VM console, because the console goes blank after you type
- Unable to remove a dedupe enabled disk group after a capacity disk enters PDL state
When a capacity disk in a dedupe-enabled disk group is removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error, it enters Permanent Device Loss (PDL) state. If you try to remove the disk group, you might see an error message informing you that the action cannot be completed.
Workaround: Whenever a capacity disk is removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error, wait for a few minutes before trying to remove the disk group.
- vSAN health indicates non-availability related incompliance with failed pending policy
A policy change request leaves the object health status of vSAN in a non-availability related incompliance state. This is because there might be other scheduled work that is utilizing the requested resources. However, vSAN reschedules this policy request automatically as resources become available.
Workaround: The vSAN periodic scan fixes this issue automatically in most cases. However, other work in progress might use up available resources even after the policy change was accepted but not applied. You can add more capacity if the capacity reporting displays a high value.
- In deduplication clusters, reactive rebalancing might not happen when the disks show more than 80% full
In deduplication clusters, when the disks display more than 80% full on the dashboard, the reactive rebalancing might not start as expected. This is because in deduplication clusters, pending writes and deletes are also considered for calculating the free capacity.
- TRIM/UNMAP commands from Guest OS fail
If the Guest OS attempts to perform space reclamation during online snapshot consolidation, the TRIM/UNMAP commands fail. This failure keeps space from being reclaimed.
Workaround: Try to reclaim the space after the online snapshot operation is complete. If subsequent TRIM/UNMAP operations fail, remount the disk.
- Space reclamation from SCSI TRIM/UNMAP is lost when online snapshot consolidation is performed
Space reclamation achieved from SCSI TRIM/UNMAP commands is lost when you perform online snapshot consolidation. Offline snapshot consolidation does not affect SCSI unmap operation.
Workaround: Reclaim the space after online snapshot consolidation is complete.
- Host failure when converting data host into witness host
When you convert a vSAN cluster into a stretched cluster, you must provide a witness host. You can convert a data host into the witness host, but you must use maintenance mode with Full data migration during the process. If you place the host into maintenance mode with Ensure accessibility option, and then configure it as the witness host, the host might fail with a purple diagnostic screen.
Workaround: Remove the disk group on the witness host and then re-create the disk group.
- Duplicate VM with the same name in vCenter Server when residing host fails during datastore migration
If a VM is undergoing storage vMotion from vSAN to another datastore, such as NFS, and the host on which it resides encounters a failure on the vSAN network, causing HA failover of the VM, the VM might be duplicated in the vCenter Server.
Workaround: Power off the invalid VM and unregister it from the vCenter Server.
- Reconfiguring an existing stretched cluster under a new vCenter Server causes vSAN to issue a health check warning
When rebuilding a current stretched cluster under a new vCenter Server, the vSAN cluster health check is red. The following message appears: vSphere cluster members match vSAN cluster members
Workaround: Use the following procedure to configure the stretched cluster.
1. Use SSH to log in to the witness host.
2. Decommission the disks on witness host. Run the following command: esxcli vsan storage remove -s "SSD UUID"
3. Force the witness host to leave the cluster. Run the following command: esxcli vsan cluster leave
4. Reconfigure the stretched cluster from the new vCenter Server (Configure > vSAN > Fault Domains & Stretched Cluster).
- Disk format upgrade fails while vSAN resynchronizes large objects
If the vSAN cluster contains very large objects, the disk format upgrade might fail while the object is resynchronized. You might see the following error message: Failed to convert object(s) on vSAN
vSAN cannot perform the upgrade until the object is resynchronized. You can check the status of the resynchronization (Monitor > vSAN > Resyncing Components) to verify when the process is complete.
Workaround: Wait until no resynchronization is pending, then retry the disk format upgrade.
- Cluster consistency health check fails during deep rekey operation
The deep rekey operation on an encrypted vSAN cluster can take several hours. During the rekey, the following health check might indicate a failure: Cluster configuration consistency. The cluster consistency check does not detect the deep rekey operation, and there might not be a problem.
Workaround: Retest the vSAN cluster consistency health check after the deep rekey operation is complete.
- vSAN stretched cluster configuration lost after you disable vSAN on a cluster
If you disable vSAN on a stretched cluster, the stretched cluster configuration is not retained. The stretched cluster, witness host, and fault domain configuration is lost.
Workaround: Reconfigure the stretched cluster parameters when you re-enable the vSAN cluster.
- Powered off VMs appear as inaccessible during witness host replacement
When you change a witness host in a stretched cluster, VMs that are powered off appear as inaccessible in the vSphere Web Client for a brief time. After the process is complete, powered off VMs appear as accessible. All running VMs appear as accessible throughout the process.
- Cannot place hosts in maintenance mode if they have faulty boot media
vSAN cannot place hosts with faulty boot media into maintenance mode. The task to enter maintenance mode might fail with an internal vSAN error, due to the inability to save configuration changes. You might see log events similar to the following: Lost Connectivity to the device xxx backing the boot filesystem
Workaround: Remove disk groups manually from each host, using the Full data evacuation option. Then place the host in maintenance mode.
- Health service does not work if vSAN cluster has ESXi hosts with vSphere 6.0 Update 1 or earlier
The vSAN 6.6 and later health service does not work if the cluster has ESXi hosts running vSphere 6.0 Update 1 or earlier releases.
Workaround: Do not add ESXi hosts with vSphere 6.0 Update 1 or earlier software to a vSAN 6.6 or later cluster.
- After stretched cluster failover, VMs on the preferred site register alert: Failed to failover
If the secondary site in a stretched cluster fails, VMs failover to the preferred site. VMs already on the preferred site might register the following alert: Failed to failover. Ignore this alert. It does not impact the behavior of the failover.
- During network partition, components in the active site appear to be absent
During a network partition in a two-node vSAN cluster or stretched cluster, the vSphere Web Client might display a view of the cluster from the perspective of the non-active site. You might see active components in the primary site displayed as absent.
Workaround: Use RVC commands to query the state of objects in the cluster. For example: vsan.vm_object_info
- Some objects are non-compliant after force repair
After you perform a force repair, some objects might not be repaired because the ownership of the objects was transferred to a different node during the process. The force repair might be delayed for those objects.
Workaround: Attempt the force repair operation after all other objects are repaired and resynchronized. You can wait until vSAN repairs the objects.
- When you move a host from one encrypted cluster to another, and then back to the original cluster, the task fails
When you move a host from an encrypted vSAN cluster to another encrypted vSAN cluster, then move the host back to the original encrypted cluster, the task might fail. You might see the following message: A general system error occurred: Invalid fault. This error occurs because vSAN cannot re-encrypt data on the host using the original encryption key. After a short time, vCenter Server restores the original key on the host, and all unmounted disks in the vSAN cluster are mounted.
Workaround: Reboot the host and wait for all disks to get mounted.
- Stretched cluster imbalance after a site recovers
When you recover a failed site in a stretched cluster, sometimes hosts in the failed site are brought back sequentially over a long period of time. vSAN might overuse some hosts when it begins repairing the absent components.
Workaround: Recover all of the hosts in a failed site together within a short time window.
- VM operations fail due to HA issue with stretched clusters
Under certain failure scenarios in stretched clusters, certain VM operations such as vMotion or powering on a VM might be impacted. These failure scenarios include a partial or a complete site failure, or the failure of the high speed network between the sites. This problem is caused by the dependency on VMware HA being available for normal operation of stretched cluster sites.
Workaround: Disable vSphere HA before performing vMotion, VM creation, or powering on VMs. Then re-enable vSphere HA.
- Cannot perform deep rekey if a disk group is unmounted
Before vSAN performs a deep rekey, it performs a shallow rekey. The shallow rekey fails if an unmounted disk group is present. The deep rekey process cannot begin.
Workaround: Remount or remove the unmounted disk group.
- Log entries state that firewall configuration has changed
A new firewall entry appears in the security profile when vSAN encryption is enabled: vsanEncryption. This rule controls how hosts communicate directly to the KMS. When it is triggered, log entries are added to /var/log/vobd.log. You might see the following messages:
Firewall configuration has changed. Operation 'addIP4' for rule set vsanEncryption succeeded.
Firewall configuration has changed. Operation 'removeIP4' for rule set vsanEncryption succeeded.
These messages can be ignored.
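If these entries clutter a review of /var/log/vobd.log, they can be filtered out before reading the rest. The sketch below is an illustration only: the pattern is built from the two messages quoted above, and the function name is hypothetical.

```python
import re
from typing import Iterable, List

# Matches only the two vsanEncryption rule set messages quoted in this section.
BENIGN_FIREWALL = re.compile(
    r"Firewall configuration has changed\. "
    r"Operation '(?:addIP4|removeIP4)' for rule set vsanEncryption succeeded\."
)


def drop_benign_firewall_entries(lines: Iterable[str]) -> List[str]:
    """Return the log lines that are not the ignorable vsanEncryption firewall messages."""
    return [line for line in lines if not BENIGN_FIREWALL.search(line)]
```

Firewall changes for other rule sets, or operations other than addIP4 and removeIP4, are deliberately left in the output so that they still get reviewed.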
- HA failover does not occur after setting Traffic Type option on a vmknic to support witness traffic
If you set the traffic type option on a vmknic to support witness traffic, vSphere HA does not automatically discover the new setting. You must manually disable and then re-enable HA so it can discover the vmknic. If you configure the vmknic and the vSAN cluster first, and then enable HA on the cluster, it does discover the vmknic.
Workaround: Manually disable vSphere HA on the cluster, and then re-enable it.
- iSCSI MCS is not supported
vSAN iSCSI target service does not support Multiple Connections per Session (MCS).
- Any iSCSI initiator can discover iSCSI targets
vSAN iSCSI target service allows any initiator on the network to discover iSCSI targets.
Workaround: You can isolate your ESXi hosts from iSCSI initiators by placing them on separate VLANs.
- After resolving network partition, some VM operations on linked clone VMs might fail
Some VM operations on linked clone VMs that are not producing I/O inside the guest operating system might fail. The operations that might fail include taking snapshots and suspending the VMs. This problem can occur after a network partition is resolved, if the parent base VM's namespace is not yet accessible. When the parent VM's namespace becomes accessible, HA is not notified to power on the VM.
Workaround: Power cycle VMs that are not actively running I/O operations.
- Cannot place a witness host in Maintenance Mode
When you attempt to place a witness host in Maintenance Mode, the host remains in the current state and you see the following notification: A specified parameter was not correct.
Workaround: When placing a witness host in Maintenance Mode, choose the No data migration option.
- Moving the witness host into and then out of a stretched cluster leaves the cluster in a misconfigured state
If you place the witness host in a vSAN-enabled vCenter cluster, an alarm notifies you that the witness host cannot reside in the cluster. But if you move the witness host out of the cluster, the cluster remains in a misconfigured state.
Workaround: Move the witness host out of the vSAN stretched cluster, and reconfigure the stretched cluster. For more information, see Knowledge Base article 2130587.
- When a network partition occurs in a cluster which has an HA heartbeat datastore, VMs are not restarted on the other data site
When the preferred or secondary site in a vSAN cluster loses its network connection to the other sites, VMs running on the site that loses network connectivity are not restarted on the other data site, and the following error might appear: vSphere HA virtual machine HA failover failed.
This is expected behavior for vSAN clusters.
Workaround: Do not select HA heartbeat datastore while configuring vSphere HA on the cluster.
- Unmounted vSAN disks and disk groups displayed as mounted in the vSphere Web Client Operational Status field
After the vSAN disks or disk groups are unmounted, either by running the esxcli vsan storage diskgroup unmount command or by the vSAN Device Monitor service when disks show persistently high latencies, the vSphere Web Client incorrectly displays the Operational Status field as mounted.
Workaround: Use the Health field to verify disk status, instead of the Operational Status field.