VMware vSAN 7.0 Update 1 | 6 OCT 2020 | Build 16850804
Check for additions and updates to these release notes.
vSAN 7.0 Update 1 introduces the following new features and enhancements:
Scale Without Compromise
- HCI Mesh. HCI Mesh is a software-based approach for disaggregation of compute and storage resources in vSAN. HCI Mesh brings together multiple independent vSAN clusters by enabling cross-cluster utilization of remote datastore capacity within vCenter Server. HCI Mesh enables you to efficiently utilize and consume data center resources, which provides simple storage management at scale.
- vSAN File Service enhancements. Native vSAN File Service includes support for SMB file shares. Microsoft Active Directory support, Kerberos authentication, and scalability improvements are also available.
- Compression-only vSAN. You can enable compression independently of deduplication, which provides a storage efficiency option for workloads that cannot take advantage of deduplication. With compression-only vSAN, a failed capacity device only impacts that device and not the entire disk group.
- Increased usable capacity. Because of internal optimizations, vSAN no longer needs 25-30% of free space reserved for internal operations and host failure rebuilds. Instead, the amount of space required is a deterministic value based on deployment variables, such as the size of the cluster and the density of storage devices. These changes provide more usable capacity for workloads.
- Shared witness for two-node clusters. vSAN 7.0 Update 1 enables a single vSAN witness host to manage multiple two-node clusters. A single witness host can support up to 64 clusters, which greatly reduces operational and resource overhead.
- vSAN Data-in-Transit encryption. This feature enables secure over-the-wire encryption of data traffic between nodes in a vSAN cluster. vSAN data-in-transit encryption is a cluster-wide feature, and can be enabled independently or along with vSAN data-at-rest encryption. Traffic encryption uses the same FIPS 140-2 validated cryptographic module as existing encryption features, and does not require a KMS server.
- Enhanced data durability during maintenance mode. This improvement protects the integrity of data when you place a host into maintenance mode with the Ensure Accessibility option. All incremental writes that would have been written to the host in maintenance mode are now redirected to another host, if one is available. This feature benefits VMs that have PFTT=1 configured, and also provides an alternative to using PFTT=2 for ensuring data integrity during maintenance operations.
- vLCM enhancements. vSphere Lifecycle Manager (vLCM) is a solution for unified software and firmware lifecycle management. In this release, vLCM is enhanced with firmware support for Lenovo ReadyNodes, awareness of vSAN stretched cluster and fault domain configurations, additional hardware compatibility pre-checks, and increased scalability for concurrent cluster operations.
- Reserved capacity. You can enable capacity reservations for internal cluster operations and host failure rebuilds. Reservations are soft thresholds designed to prevent user-driven provisioning activity from interfering with internal operations, such as data rebuilds, rebalancing activity, or policy reconfigurations.
- Default gateway override. You can override the default gateway for a VMkernel adapter to provide a different gateway for the vSAN network. This feature simplifies routing configuration for stretched clusters, two-node clusters, and fault domain deployments that previously required manual configuration of static routes, which are no longer necessary.
- Faster vSAN host restarts. The time interval for a planned host restart has been reduced by persisting in-memory metadata to disk before the restart or shutdown. This method reduces the time required for hosts in a vSAN cluster to restart, which decreases the overall cluster downtime during maintenance windows.
- Workload I/O analysis. Analyze VM I/O metrics with IOInsight, a monitoring and troubleshooting tool that is integrated directly into vCenter Server. Gain a detailed view of VM I/O characteristics such as performance, I/O size and type, read/write ratio, and other important data metrics. You can run IOInsight operations against VMs, hosts, or the entire cluster.
- Consolidated I/O performance view. You can select multiple VMs, and display a combined view of storage performance metrics such as IOPS, throughput, and latency. You can compare storage performance characteristics across multiple VMs.
- VM latency monitoring with IOPS limits. This improvement in performance monitoring helps you distinguish the periods of latency that can occur due to enforced IOPS limits. This view can help organizations that set IOPS limits in VM storage policies.
- Secure drive erase. Securely wipe flash storage devices before decommissioning them from a vSAN cluster by using a set of new PowerCLI or API commands. Use these commands to safely erase data in accordance with NIST standards.
- Data migration pre-check for disks. vSAN's data migration pre-check for host maintenance mode now includes support for individual disk devices or entire disk groups. This offers more granular pre-checks for disk or disk group decommissioning.
- VPAT section 508 compliant. vSAN is compliant with the Voluntary Product Accessibility Template (VPAT). VPAT section 508 compliance ensures that vSAN has undergone a thorough audit of accessibility requirements and has instituted product changes for proper compliance.
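The shared witness feature above allows one witness host to serve up to 64 two-node clusters. As a planning sketch only, the following Python groups a list of two-node cluster names into batches that each fit on one shared witness host. The function name and cluster names are hypothetical, and the sketch models only the 64-cluster limit stated here, not any other witness sizing constraint.

```python
from typing import Dict, List, Sequence

# Limit stated in the shared witness feature description.
MAX_CLUSTERS_PER_SHARED_WITNESS = 64


def assign_shared_witnesses(clusters: Sequence[str]) -> Dict[int, List[str]]:
    """Hypothetical helper: map witness index -> two-node clusters it would serve."""
    groups: Dict[int, List[str]] = {}
    for position, name in enumerate(clusters):
        # Every block of 64 consecutive clusters shares one witness host.
        witness = position // MAX_CLUSTERS_PER_SHARED_WITNESS
        groups.setdefault(witness, []).append(name)
    return groups


if __name__ == "__main__":
    plan = assign_shared_witnesses([f"robo-{n:03d}" for n in range(130)])
    for witness, members in plan.items():
        print(f"witness {witness}: {len(members)} clusters")
```

For 130 two-node clusters, this plan uses three witness hosts serving 64, 64, and 2 clusters.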
Note: vSAN 7.0 Update 1 improves CPU performance by standardizing task timers throughout the system. This change addresses issues with timers activating earlier or later than requested, resulting in degraded performance for some workloads.
VMware vSAN Community
Use the vSAN Community Web site to provide feedback and request assistance with any problems you find while using vSAN.
Upgrades for This Release
For instructions about upgrading vSAN, see the VMware vSAN 7.0 documentation.
Note: Before performing the upgrade, please review the most recent version of the VMware Compatibility Guide to validate that the latest vSAN version is available for your platform.
vSAN 7.0 Update 1 is a new release that requires a full upgrade to vSphere 7.0 Update 1. Perform the following tasks to complete the upgrade:
1. Upgrade to vCenter Server 7.0 Update 1. For more information, see the VMware vSphere 7.0 Update 1 Release Notes.
2. Upgrade hosts to ESXi 7.0 Update 1. For more information, see the VMware vSphere 7.0 Update 1 Release Notes.
3. Upgrade the vSAN on-disk format to version 13.0. If upgrading from on-disk format version 3.0 or later, no data evacuation is required (metadata update only).
Note: vSAN retired disk format version 1.0 in vSAN 7.0 Update 1. Disks running disk format version 1.0 are no longer recognized by vSAN, and vSAN blocks upgrades to vSAN 7.0 Update 1 through vSphere Update Manager, ISO install, or esxcli. To avoid these issues, upgrade disks running disk format version 1.0 to a higher version. If you have disks on version 1.0, a health check alerts you to upgrade the disk format version.
Disk format version 1.0 does not have performance and snapshot enhancements, and it lacks support for advanced features including checksum, deduplication and compression, and encryption. For more information about vSAN disk format version, see KB 2145267.
Upgrading the On-disk Format for Hosts with Limited Capacity
During an upgrade of the vSAN on-disk format from version 1.0 or 2.0, a disk group evacuation is performed. The disk group is removed and upgraded to on-disk format version 13.0, and the disk group is added back to the cluster. For two-node or three-node clusters, or clusters without enough capacity to evacuate each disk group, select Allow Reduced Redundancy from the vSphere Client. You can also use the following RVC command to upgrade the on-disk format: vsan.ondisk_upgrade --allow-reduced-redundancy
When you allow reduced redundancy, your VMs are unprotected for the duration of the upgrade, because this method does not evacuate data to the other hosts in the cluster. It removes each disk group, upgrades the on-disk format, and adds the disk group back to the cluster. All objects remain available, but with reduced redundancy.
If you enable deduplication and compression during the upgrade to vSAN 7.0 Update 1, you can select Allow Reduced Redundancy from the vSphere Client.
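The upgrade rules above can be summarized as a small decision function. This Python sketch is illustrative only: the DiskGroupInfo fields are hypothetical inputs, not a vSAN API, and the capacity test (free capacity on other hosts at least equal to the data on the disk group) is an assumed simplification of "enough capacity to evacuate each disk group".

```python
from dataclasses import dataclass

TARGET_ON_DISK_FORMAT = 13.0  # target on-disk format for vSAN 7.0 Update 1


@dataclass
class DiskGroupInfo:
    """Hypothetical inputs for one disk group; not a vSAN API."""
    on_disk_version: float    # current on-disk format version
    data_host_count: int      # hosts contributing storage to the cluster
    used_gb: float            # data stored on this disk group
    free_gb_elsewhere: float  # assumed: free capacity on the other hosts


def plan_on_disk_upgrade(dg: DiskGroupInfo) -> str:
    """Return a rough upgrade path following the rules in these release notes."""
    if dg.on_disk_version >= TARGET_ON_DISK_FORMAT:
        return "already current"
    if dg.on_disk_version < 2.0:
        # Version 1.0 is not recognized by 7.0 Update 1 and blocks the upgrade.
        return "blocked: upgrade version 1.0 disks before moving to 7.0 Update 1"
    if dg.on_disk_version >= 3.0:
        return "metadata-only upgrade, no evacuation"
    # Version 2.0: the disk group is evacuated, reformatted, and added back.
    if dg.data_host_count <= 3 or dg.free_gb_elsewhere < dg.used_gb:
        return "evacuation not possible: select Allow Reduced Redundancy"
    return "evacuate disk group, then reformat"
```

For example, a version 2.0 disk group in a three-node cluster maps to the Allow Reduced Redundancy path, while any disk group on version 3.0 or later needs only a metadata update.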
For information about maximum configuration limits for the vSAN 7.0 Update 1 release, see the Configuration Maximums documentation.
The known issues are grouped as follows.
vSAN Issues
- vSAN allows a VM to be provisioned across local and remote datastores
vSphere does not prevent users from provisioning a VM across local and remote datastores in an HCI Mesh environment. For example, you can provision one VMDK on the local vSAN datastore and another VMDK on a remote vSAN datastore. This configuration is not supported because vSphere HA does not support it.
Workaround: Do not provision a VM across local and remote datastores.
- Host summary storage information does not include vSAN datastores
When you click the Summary view for an ESXi host, the Storage summary for Free, Used, and Capacity values do not include vSAN datastores which are accessible from the host.
Workaround: You can check information about the vSAN datastore in the cluster Summary view.
- The object reformatting task is stuck and is not progressing
When the object reformatting task is running, vSAN reconfigures the format of the objects in the background. These objects are reconfigured in batches, and progress depends on the amount of transient capacity available in the cluster. When the transient capacity exceeds the maximum limit, vSAN waits for the transient capacity to clear before proceeding with the reformatting. During this phase, the task might appear to be stuck.
Workaround: There is no workaround.
- System VMs cannot be powered off
With the release of vSphere Cluster Services (vCLS) in vSphere 7.0 Update 1, a set of system VMs might be placed within the vSAN cluster. Users cannot power off these system VMs. This issue can impact some vSAN workflows, which are documented in the following article: https://kb.vmware.com/s/article/80877
Workaround: For more information about this issue, refer to KB 80483.
- vSAN File Services VM (FSVM) docker internal network may overlap with the customer network without warning or reconfiguration
A known conflict occurs if the specified file service network overlaps with the Docker internal network (172.17.0.0/16). The overlap causes routing problems for traffic to the correct endpoint.
Workaround: Specify a different file service network so that it does not overlap with the Docker internal network (172.17.0.0/16).
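Before you specify the file service network, you can check a candidate range for overlap with the Docker internal network. This Python sketch uses the standard-library ipaddress module; the function name and sample networks are illustrative, and the check covers only the 172.17.0.0/16 range named in this issue.

```python
import ipaddress

# Docker internal network used inside the FSVM, as stated in this known issue.
DOCKER_INTERNAL_NET = ipaddress.ip_network("172.17.0.0/16")


def overlaps_docker_network(file_service_cidr: str) -> bool:
    """Return True if a proposed file service network overlaps 172.17.0.0/16."""
    # strict=False accepts host-style input such as "172.17.4.10/24".
    candidate = ipaddress.ip_network(file_service_cidr, strict=False)
    if candidate.version != DOCKER_INTERNAL_NET.version:
        return False  # an IPv6 network cannot overlap this IPv4 range
    return candidate.overlaps(DOCKER_INTERNAL_NET)


if __name__ == "__main__":
    for cidr in ("172.17.20.0/24", "172.16.0.0/12", "10.10.0.0/24"):
        print(cidr, "overlaps" if overlaps_docker_network(cidr) else "ok")
```

Note that a wider range such as 172.16.0.0/12 also conflicts, because it contains 172.17.0.0/16.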
- Some of the tasks to deploy File Services VM (FSVM) may fail while enabling deduplication or encryption/compression on a vSAN File Service enabled cluster
Enabling deduplication or encryption/compression triggers a Disk Format Change (DFC). DFC brings down the FSVM on each host sequentially. Meanwhile, the ESX Agent Manager (EAM) tries to deploy the FSVM, and these deployments fail.
Workaround: You can safely ignore these failures. After the DFC is complete, the FSVM remediation succeeds.
- Deep Rekey may disrupt the client I/Os
During the Deep Rekey process, I/O operations of NFS and SMB clients might fail or be interrupted.
Workaround: If client failures cannot be tolerated, run the Deep Rekey process during a maintenance window.
- vSAN File Services cannot be enabled due to an old vSAN on-disk format version
vSAN File Services cannot be enabled with the vSAN on-disk format version earlier than 11.0 (this is the on-disk format version in vSAN 7.0).
Workaround: Upgrade the vSAN on-disk format version before enabling File Services.
- Deleting files in a file share might not be reflected in vSAN capacity view
The allocated blocks are not returned to vSAN storage even if all the files are deleted. These allocated blocks are reused when new data is written to the same file share.
Workaround: To release the storage back to vSAN, delete the file shares.
- The Remediate cluster task might fail in large-scale clusters due to vSAN health network test issues
In large-scale clusters with more than 16 hosts, intermittent ping failures can occur during host upgrade. These failures can interrupt host remediation in vSphere Lifecycle Manager.
Workaround: After the remediation pre-check passes, silence alerts for the following vSAN health tests:
- vSAN: Basic (unicast) connectivity check
- vSAN: MTU check (ping with large packet size)
When the remediation task is complete, restore alerts for the vSAN health tests.
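If you automate this workaround, it helps to guarantee that the alerts are restored even when remediation fails. The Python sketch below wraps that pattern in a context manager; silence and restore are hypothetical callables you would supply (for example, wrappers around your own tooling), and the test names are the display names from the workaround, not API identifiers.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator

# Health tests named in the workaround (display names, not API identifiers).
NETWORK_HEALTH_TESTS = (
    "vSAN: Basic (unicast) connectivity check",
    "vSAN: MTU check (ping with large packet size)",
)


@contextmanager
def silenced_health_alerts(
    tests: Iterable[str],
    silence: Callable[[str], None],  # hypothetical: silences one health alert
    restore: Callable[[str], None],  # hypothetical: restores one health alert
) -> Iterator[None]:
    """Silence the given alerts for the duration of the block, then restore them."""
    silenced = []
    try:
        for name in tests:
            silence(name)
            silenced.append(name)
        yield
    finally:
        # Restore even if remediation raised, so no alert is left silenced.
        for name in silenced:
            restore(name)
```

Run the remediation inside a `with silenced_health_alerts(NETWORK_HEALTH_TESTS, silence, restore):` block only after the pre-check has passed; the `finally` clause covers the "restore alerts when complete" step on both success and failure.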
- Host failure in hot-plug scenario when drive is reinserted
During a hot drive removal, VMware native NVMe hot-plug can cause a host failure if the NVMe drive is pulled and reinserted within one minute. This is applicable to both vSphere and vSAN for any new or existing drive reinsertion.
Workaround: After removing a hot drive, wait for one minute before you reinsert the new or existing drive.
- Update Manager displays test ID instead of health check name
When you use Update Manager to remediate hosts in a vSAN cluster, vSAN health checks can identify upgrade issues. When the remediation task fails on a host, you might see an error message with a test ID instead of a health check name. For example:
Before host exits MM, remediation failed because vSAN health check failed. vSAN cluster is not healthy because vSAN health check(s): com.vmware.vsan.health.test.controlleronhcl failed
Each test ID is related to a vSAN health check. To learn about the remediation health checks, refer to the following article: https://kb.vmware.com/s/article/60219
Workaround: If a remediation task fails on a vSAN host, use the health service to identify and resolve the issues. Then perform another remediation task.
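To match a failed remediation to a health check, you first need the test ID from the error text. A minimal Python sketch, assuming test IDs always use the com.vmware.vsan.health.test. prefix shown in the sample message above; other message formats are not covered, and the function name is hypothetical.

```python
import re
from typing import List

# Dotted test IDs; the trailing group cannot end in "." so sentence periods are excluded.
TEST_ID_PATTERN = re.compile(r"com\.vmware\.vsan\.health\.test\.\w+(?:\.\w+)*")


def extract_health_test_ids(message: str) -> List[str]:
    """Return the distinct vSAN health test IDs in an error message, in order."""
    return list(dict.fromkeys(TEST_ID_PATTERN.findall(message)))


if __name__ == "__main__":
    sample = (
        "vSAN cluster is not healthy because vSAN health check(s): "
        "com.vmware.vsan.health.test.controlleronhcl failed"
    )
    print(extract_health_test_ids(sample))
```

Each extracted ID can then be looked up in the article referenced above to find the corresponding health check.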
- Cannot place last host in a cluster into maintenance mode, or remove a disk or disk group
Operations in Full data migration or Ensure accessibility mode might fail without providing guidance to add a new resource, when there is only one host left in the cluster and that host enters maintenance mode. This can also happen when there is only one disk or disk group left in the cluster and that disk or disk group is to be removed.
Workaround: Before you place the last remaining host in the cluster into maintenance mode with Full data migration or Ensure accessibility mode selected, add another host with the same configuration to the cluster. Before you remove the last remaining disk or disk group in the cluster, add a new disk or disk group with the same configuration and capacity.
- Object reconfiguration workflows might fail due to the lack of capacity if one or more disks or disk groups are almost full
vSAN pauses resyncs when the disks in non-deduplication clusters, or the disk groups in deduplication clusters, reach a configurable resync pause fullness threshold. This avoids filling up the disks with resync I/O. If the disks reach this threshold, vSAN stops reconfiguration workflows, such as entering maintenance mode (EMM), repairs, rebalance, and policy change.
Workaround: If space is available elsewhere in the cluster, rebalancing the cluster frees up space on the other disks, so that subsequent reconfiguration attempts succeed.
- After recovery from cluster full, VMs can lose HA protection
In a vSAN cluster that has hosts with disks 100% full, VMs might have a pending question and therefore lose HA protection. VMs that had a pending question remain without HA protection even after the cluster recovers from the cluster-full condition.
Workaround: After recovering from vSAN cluster full scenario, perform one of the following actions:
- Disable and re-enable HA.
- Reconfigure HA.
- Power off and power on the VMs.
- Power Off VMs fails with Question Pending
If a VM has a pending question, you are not allowed to do any VM-related operations until the question is answered.
Workaround: Try to free the disk space on the relevant volume, and then click Retry.
- When the cluster is full, the IP addresses of VMs either change to IPv6 or become unavailable
When a vSAN cluster is full with one or more disk groups reaching 100%, there can be a VM pending question that requires user action. If the question is not answered and the cluster-full condition is left unattended, the IP addresses of VMs might change to IPv6 or become unavailable. This prevents you from using SSH to access the VMs. It also prevents you from using the VM console, because the console goes blank after you type
- Unable to remove a dedupe enabled disk group after a capacity disk enters PDL state
When a capacity disk in a dedupe-enabled disk group is removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error, it enters Permanent Device Loss (PDL) state. If you try to remove the disk group, you might see an error message informing you that the action cannot be completed.
Workaround: Whenever a capacity disk is removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error, wait for a few minutes before trying to remove the disk group.
- vSAN health indicates non-availability related incompliance with failed pending policy
A policy change request leaves the object health status of vSAN in a non-availability related incompliance state. This is because there might be other scheduled work that is utilizing the requested resources. However, vSAN reschedules this policy request automatically as resources become available.
Workaround: The vSAN periodic scan fixes this issue automatically in most cases. However, other work in progress might use up available resources even after the policy change is accepted but not yet applied. You can add more capacity if capacity reporting displays a high value.
- In deduplication clusters, reactive rebalancing might not happen when disks appear more than 80% full
In deduplication clusters, when disks appear more than 80% full on the dashboard, reactive rebalancing might not start as expected. This is because deduplication clusters also account for pending writes and deletes when calculating free capacity.
- TRIM/UNMAP commands from Guest OS fail
If the Guest OS attempts to perform space reclamation during online snapshot consolidation, the TRIM/UNMAP commands fail. This failure keeps space from being reclaimed.
Workaround: Try to reclaim the space after the online snapshot operation is complete. If subsequent TRIM/UNMAP operations fail, remount the disk.
- Space reclamation from SCSI TRIM/UNMAP is lost when online snapshot consolidation is performed
Space reclamation achieved from SCSI TRIM/UNMAP commands is lost when you perform online snapshot consolidation. Offline snapshot consolidation does not affect SCSI unmap operation.
Workaround: Reclaim the space after online snapshot consolidation is complete.
- Host failure when converting data host into witness host
When you convert a vSAN cluster into a stretched cluster, you must provide a witness host. You can convert a data host into the witness host, but you must use maintenance mode with Full data migration during the process. If you place the host into maintenance mode with Ensure accessibility option, and then configure it as the witness host, the host might fail with a purple diagnostic screen.
Workaround: Remove the disk group on the witness host and then re-create the disk group.
- Duplicate VM with the same name in vCenter Server when the host it resides on fails during datastore migration
If a VM is undergoing storage vMotion from vSAN to another datastore, such as NFS, and the host on which it resides encounters a failure on the vSAN network, causing HA failover of the VM, the VM might be duplicated in the vCenter Server.
Workaround: Power off the invalid VM and unregister it from the vCenter Server.
- Reconfiguring an existing stretched cluster under a new vCenter Server causes vSAN to issue a health check warning
When you rebuild an existing stretched cluster under a new vCenter Server, the vSAN cluster health check is red. The following message appears: vSphere cluster members match vSAN cluster members
Workaround: Use the following procedure to configure the stretched cluster.
1. Use SSH to log in to the witness host.
2. Decommission the disks on the witness host. Run the following command: esxcli vsan storage remove -s "SSD UUID"
3. Force the witness host to leave the cluster. Run the following command: esxcli vsan cluster leave
4. Reconfigure the stretched cluster from the new vCenter Server (Configure > vSAN > Fault Domains & Stretched Cluster).
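If you script the witness-side commands, building them as argument lists keeps the order fixed and avoids shell-quoting mistakes. This Python sketch only builds and prints the commands from the second and third steps; it does not run them. The function name and the "naa.EXAMPLE" placeholder are hypothetical, and the last step (reconfiguring from vCenter Server) remains a vSphere Client task.

```python
import shlex
from typing import List


def witness_reset_commands(ssd: str) -> List[List[str]]:
    """Build (but do not run) the esxcli commands from the workaround, in order."""
    return [
        # Decommission the disks on the witness host.
        ["esxcli", "vsan", "storage", "remove", "-s", ssd],
        # Force the witness host to leave the vSAN cluster.
        ["esxcli", "vsan", "cluster", "leave"],
    ]


if __name__ == "__main__":
    # "naa.EXAMPLE" is a hypothetical stand-in for the value the note calls "SSD UUID".
    for argv in witness_reset_commands("naa.EXAMPLE"):
        print(shlex.join(argv))  # review, then run in the witness host's SSH session
```

Each argument list could also be passed to subprocess.run on the host itself; printing them first keeps a human review step in the process.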
- Disk format upgrade fails while vSAN resynchronizes large objects
If the vSAN cluster contains very large objects, the disk format upgrade might fail while the objects are being resynchronized. You might see the following error message: Failed to convert object(s) on vSAN
vSAN cannot perform the upgrade until the object is resynchronized. You can check the status of the resynchronization (Monitor > vSAN > Resyncing Components) to verify when the process is complete.
Workaround: Wait until no resynchronization is pending, then retry the disk format upgrade.
- Cluster consistency health check fails during deep rekey operation
The deep rekey operation on an encrypted vSAN cluster can take several hours. During the rekey, the following health check might indicate a failure: Cluster configuration consistency. The cluster consistency check does not detect the deep rekey operation, and there might not be a problem.
Workaround: Retest the vSAN cluster consistency health check after the deep rekey operation is complete.
- vSAN stretched cluster configuration lost after you disable vSAN on a cluster
If you disable vSAN on a stretched cluster, the stretched cluster configuration is not retained. The stretched cluster, witness host, and fault domain configurations are lost.
Workaround: Reconfigure the stretched cluster parameters when you re-enable the vSAN cluster.
- Powered off VMs appear as inaccessible during witness host replacement
When you change a witness host in a stretched cluster, VMs that are powered off appear as inaccessible in the vSphere Web Client for a brief time. After the process is complete, powered off VMs appear as accessible. All running VMs appear as accessible throughout the process.
- Cannot place hosts in maintenance mode if they have faulty boot media
vSAN cannot place hosts with faulty boot media into maintenance mode. The task to enter maintenance mode might fail with an internal vSAN error, due to the inability to save configuration changes. You might see log events similar to the following: Lost Connectivity to the device xxx backing the boot filesystem
Workaround: Remove disk groups manually from each host, using the Full data evacuation option. Then place the host in maintenance mode.
- Health service does not work if the vSAN cluster has ESXi hosts with vSphere 6.0 Update 1 or earlier
The vSAN 6.6 and later health service does not work if the cluster has ESXi hosts running vSphere 6.0 Update 1 or earlier releases.
Workaround: Do not add ESXi hosts with vSphere 6.0 Update 1 or earlier software to a vSAN 6.6 or later cluster.
- After stretched cluster failover, VMs on the preferred site register alert: Failed to failover
If the secondary site in a stretched cluster fails, VMs fail over to the preferred site. VMs already on the preferred site might register the following alert: Failed to failover. Ignore this alert; it does not impact the behavior of the failover.
- During network partition, components in the active site appear to be absent
During a network partition in a vSAN two-node or stretched cluster, the vSphere Web Client might display a view of the cluster from the perspective of the non-active site. You might see active components in the primary site displayed as absent.
Workaround: Use RVC commands to query the state of objects in the cluster. For example: vsan.vm_object_info
- Some objects are non-compliant after force repair
After you perform a force repair, some objects might not be repaired because the ownership of the objects was transferred to a different node during the process. The force repair might be delayed for those objects.
Workaround: Attempt the force repair operation after all other objects are repaired and resynchronized. Alternatively, wait until vSAN repairs the objects.
- When you move a host from one encrypted cluster to another, and then back to the original cluster, the task fails
When you move a host from an encrypted vSAN cluster to another encrypted vSAN cluster, then move the host back to the original encrypted cluster, the task might fail. You might see the following message: A general system error occurred: Invalid fault. This error occurs because vSAN cannot re-encrypt data on the host using the original encryption key. After a short time, vCenter Server restores the original key on the host, and all unmounted disks in the vSAN cluster are mounted.
Workaround: Reboot the host and wait for all disks to get mounted.
- Stretched cluster imbalance after a site recovers
When you recover a failed site in a stretched cluster, sometimes hosts in the failed site are brought back sequentially over a long period of time. vSAN might overuse some hosts when it begins repairing the absent components.
Workaround: Recover all of the hosts in a failed site together within a short time window.
- VM operations fail due to HA issue with stretched clusters
Under certain failure scenarios in stretched clusters, certain VM operations such as vMotions or powering on a VM might be impacted. These failure scenarios include a partial or a complete site failure, or the failure of the high-speed network between the sites. This problem is caused by the dependency on VMware HA being available for normal operation of stretched cluster sites.
Workaround: Disable vSphere HA before performing vMotion, VM creation, or powering on VMs. Then re-enable vSphere HA.
- Cannot perform deep rekey if a disk group is unmounted
Before vSAN performs a deep rekey, it performs a shallow rekey. The shallow rekey fails if an unmounted disk group is present. The deep rekey process cannot begin.
Workaround: Remount or remove the unmounted disk group.
- Log entries state that firewall configuration has changed
A new firewall entry appears in the security profile when vSAN encryption is enabled: vsanEncryption. This rule controls how hosts communicate directly with the KMS. When it is triggered, log entries are added to /var/log/vobd.log. You might see the following messages:
Firewall configuration has changed. Operation 'addIP4' for rule set vsanEncryption succeeded.
Firewall configuration has changed. Operation 'removeIP4' for rule set vsanEncryption succeeded.
These messages can be ignored.
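If you scan /var/log/vobd.log for other events, you can filter out these two messages first. A Python sketch, assuming each log line contains the message text exactly as quoted above; any timestamp or prefix on the line is tolerated because the pattern is searched rather than anchored. The function name is hypothetical.

```python
import re
from typing import Iterable, List

# Matches the two ignorable vsanEncryption messages quoted above.
IGNORABLE_FIREWALL_MSG = re.compile(
    r"Firewall configuration has changed\. Operation '(?:addIP4|removeIP4)' "
    r"for rule set vsanEncryption succeeded\."
)


def without_vsan_encryption_firewall_noise(lines: Iterable[str]) -> List[str]:
    """Return the log lines that are not the ignorable firewall messages."""
    # re.search (not match) so a leading timestamp or prefix does not matter.
    return [line for line in lines if not IGNORABLE_FIREWALL_MSG.search(line)]
```

Only these two operations on the vsanEncryption rule set are dropped; firewall messages for other rule sets or operations are kept for review.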
- HA failover does not occur after setting the Traffic Type option on a vmknic to support witness traffic
If you set the traffic type option on a vmknic to support witness traffic, vSphere HA does not automatically discover the new setting. You must manually disable and then re-enable HA so it can discover the vmknic. If you configure the vmknic and the vSAN cluster first, and then enable HA on the cluster, it does discover the vmknic.
Workaround: Manually disable vSphere HA on the cluster, and then re-enable it.
- iSCSI MCS is not supported
vSAN iSCSI target service does not support Multiple Connections per Session (MCS).
- Any iSCSI initiator can discover iSCSI targets
vSAN iSCSI target service allows any initiator on the network to discover iSCSI targets.
Workaround: You can isolate your ESXi hosts from iSCSI initiators by placing them on separate VLANs.
- After a network partition is resolved, some VM operations on linked clone VMs might fail
Some VM operations on linked clone VMs that are not producing I/O inside the guest operating system might fail. The operations that might fail include taking snapshots and suspending the VMs. This problem can occur after a network partition is resolved, if the parent base VM's namespace is not yet accessible. When the parent VM's namespace becomes accessible, HA is not notified to power on the VM.
Workaround: Power cycle VMs that are not actively running I/O operations.
- Cannot place a witness host in Maintenance Mode
When you attempt to place a witness host in Maintenance Mode, the host remains in the current state and you see the following notification: A specified parameter was not correct.
Workaround: When placing a witness host in Maintenance Mode, choose the No data migration option.
- Moving the witness host into and then out of a stretched cluster leaves the cluster in a misconfigured state
If you place the witness host in a vSAN-enabled vCenter cluster, an alarm notifies you that the witness host cannot reside in the cluster. But if you move the witness host out of the cluster, the cluster remains in a misconfigured state.
Workaround: Move the witness host out of the vSAN stretched cluster, and reconfigure the stretched cluster. For more information, see Knowledge Base article 2130587.
- When a network partition occurs in a cluster that has an HA heartbeat datastore, VMs are not restarted on the other data site
When the preferred or secondary site in a vSAN cluster loses its network connection to the other sites, VMs running on the site that loses network connectivity are not restarted on the other data site, and the following error might appear: vSphere HA virtual machine HA failover failed.
This is expected behavior for vSAN clusters.
Workaround: Do not select HA heartbeat datastore while configuring vSphere HA on the cluster.
- Unmounted vSAN disks and disk groups displayed as mounted in the vSphere Web Client Operational Status field
After the vSAN disks or disk groups are unmounted, either by running the esxcli vsan storage diskgroup unmount command or by the vSAN Device Monitor service when disks show persistently high latencies, the vSphere Web Client incorrectly displays the Operational Status field as mounted.
Workaround: Use the Health field to verify disk status, instead of the Operational Status field.