Updated on: 23 June 2020
VMware vSAN 7.0 | 24 APR 2020 | Build 15843807
Check for additions and updates to these release notes.
What's in the Release Notes
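The release notes cover the following topics:
- What's New
- VMware vSAN Community
- Upgrades for This Release
- Limitations
- Resolved Issues
- Known Issues
What's New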
vSAN 7.0 introduces the following new features and enhancements:
- vSphere Lifecycle Manager. vSphere Lifecycle Manager enables simplified, consistent lifecycle management for your ESXi hosts. It uses a desired-state model that provides lifecycle management for the hypervisor and the full stack of drivers and firmware. vSphere Lifecycle Manager reduces the effort to monitor compliance for individual components and helps maintain a consistent state for the entire cluster. In vSAN 7.0, this solution supports Dell and HPE ReadyNodes.
With vCenter Server 7.0.0a, vSAN File Services and vSphere Lifecycle Manager can be enabled simultaneously on the same vSAN cluster.
- Integrated File Services. vSAN native File Service delivers the ability to leverage vSAN clusters to create and present NFS v4.1 and v3 file shares. vSAN File Service extends vSAN capabilities to files, including availability, security, storage efficiency, and operations management.
- Native support for NVMe hot plug. This enhancement delivers a consistent way of servicing NVMe devices, and provides operational efficiency for select OEM drives.
- I/O redirect based on capacity imbalance with stretched clusters. vSAN redirects all VM I/O from a capacity-strained site to the other site until capacity is freed up. This feature improves uptime of your VMs.
- Skyline integration with vSphere health and vSAN health. Joining forces under the Skyline brand, Skyline Health for vSphere and vSAN are available in the vSphere Client, enabling a native, in-product experience with consistent proactive analytics.
- Remove EZT for shared disk. vSAN 7.0 eliminates the prerequisite that shared virtual disks using the multi-writer flag must also use the eager zero thick format.
- Support vSAN memory as metric in performance service. vSAN memory usage is now available within the vSphere Client and through the API.
- Visibility of vSphere Replication objects in vSAN capacity view. vSphere Replication objects are visible in the vSAN capacity view. Objects are recognized as vSphere replica type, and space usage is accounted for under the Replication category.
- Support for large capacity drives. Enhancements extend support for 32TB physical capacity drives, and extend the logical capacity to 1PB when deduplication and compression is enabled.
- Immediate repair after new witness is deployed. When vSAN performs a replace witness operation, it immediately invokes a repair object operation after the witness has been added.
- vSphere with Kubernetes integration. CNS is the default storage platform for vSphere with Kubernetes. This integration enables various stateful containerized workloads to be deployed on vSphere with Kubernetes Supervisor and Guest clusters on vSAN, VMFS and NFS datastores.
- File-based persistent volumes. Kubernetes developers can dynamically create shared (Read/Write/Many) persistent volumes for applications. Multiple pods can share data. vSAN native File Services is the foundation that enables this capability.
- vVol support for modern applications. You can deploy modern Kubernetes applications to external storage arrays on vSphere using the CNS support added for vVols. vSphere now enables unified management for Persistent Volumes across vSAN, NFS, VMFS and vVols.
- vSAN VCG notification service. You can subscribe to vSAN HCL components such as vSAN ReadyNode, I/O controller, drives (NVMe, SSD, HDD) and get notified through email about any changes. The changes include firmware, driver, driver type (async/inbox), and so on. You can track the changes over time with new vSAN releases.
- New: Default gateway override. With ESXi 7.0b, vSAN enables you to override the default gateway for the vSAN VMkernel adapter on each host, and configure a gateway address for the vSAN network.
For information on how to install and configure Kubernetes node VMs and to use Cloud Native Storage, see Getting Started with VMware Cloud Native Storage.
For Cloud Native Storage known issues, see the vSphere 7.0 Release Notes.
VMware vSAN Community
Use the vSAN Community Web site to provide feedback and request assistance with any problems you find while using vSAN.
Upgrades for This Release
For instructions about upgrading vSAN, see the VMware vSAN 7.0 documentation.
Note: Before performing the upgrade, please review the most recent version of the VMware Compatibility Guide to validate that the latest vSAN version is available for your platform.
vSAN 7.0 is a new release that requires a full upgrade to vSphere 7.0. Perform the following tasks to complete the upgrade:
1. Upgrade to vCenter Server 7.0. For more information, see the VMware vSphere 7.0 Release Notes.
2. Upgrade hosts to ESXi 7.0. For more information, see the VMware vSphere 7.0 Release Notes.
3. Upgrade the vSAN on-disk format to version 11.0. If upgrading from on-disk format version 3.0 or later, no data evacuation is required (metadata update only).
Note: vSAN will retire disk format version 1.0 in vSAN 7.0 Update 1. Disks running disk format version 1.0 will not be recognized by vSAN. vSAN will block upgrade through vSphere Update Manager, ISO install, or esxcli to vSAN 7.0 Update 1. To avoid these issues, upgrade disks running disk format version 1.0 to a higher version. If you have disks on version 1, a health check alerts you to upgrade the disk format version.
Disk format version 1.0 does not have performance and snapshot enhancements, and it lacks support for advanced features including checksum, deduplication and compression, and encryption. For more information about vSAN disk format versions, see KB 2148493.
Upgrading the On-disk Format for Hosts with Limited Capacity
During an upgrade of the vSAN on-disk format from version 1.0 or 2.0, a disk group evacuation is performed. The disk group is removed and upgraded to on-disk format version 11.0, and the disk group is added back to the cluster. For two-node or three-node clusters, or clusters without enough capacity to evacuate each disk group, select Allow Reduced Redundancy from the vSphere Client. You also can use the following RVC command to upgrade the on-disk format: vsan.ondisk_upgrade --allow-reduced-redundancy
When you allow reduced redundancy, your VMs are unprotected for the duration of the upgrade, because this method does not evacuate data to the other hosts in the cluster. It removes each disk group, upgrades the on-disk format, and adds the disk group back to the cluster. All objects remain available, but with reduced redundancy.
If you enable deduplication and compression during the upgrade to vSAN 7.0, you can select Allow Reduced Redundancy from the vSphere Client.
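The on-disk format rules described above can be summarized in a short sketch. It is illustrative only: `plan_upgrade` and `UpgradePlan` are hypothetical names, not part of any VMware tool, and the format version is assumed to be supplied as a plain number.

```python
from dataclasses import dataclass

TARGET_FORMAT = 11.0  # on-disk format version targeted by vSAN 7.0


@dataclass(frozen=True)
class UpgradePlan:
    needs_upgrade: bool       # current format is older than 11.0
    needs_evacuation: bool    # disk group is removed, upgraded, and re-added
    retired_in_7_0_u1: bool   # format 1.0 is not recognized by 7.0 Update 1


def plan_upgrade(current_format: float) -> UpgradePlan:
    """Summarize what these notes say about upgrading a given format version."""
    if not 1.0 <= current_format <= TARGET_FORMAT:
        raise ValueError(f"unexpected on-disk format version: {current_format}")
    return UpgradePlan(
        needs_upgrade=current_format < TARGET_FORMAT,
        # The notes name 1.0 and 2.0 as requiring disk group evacuation and
        # 3.0 or later as metadata-only. Treating every value below 3.0 like
        # 1.0 and 2.0 is an assumption of this sketch.
        needs_evacuation=current_format < 3.0,
        retired_in_7_0_u1=current_format == 1.0,
    )


if __name__ == "__main__":
    for version in (1.0, 2.0, 3.0, 11.0):
        print(version, plan_upgrade(version))
```

A check like this could help decide, before starting, whether a cluster with limited capacity is likely to need the Allow Reduced Redundancy option.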
Limitations
For information about maximum configuration limits for the vSAN 7.0 release, see the Configuration Maximums documentation.
Resolved Issues
- vSphere Lifecycle Manager and vSAN File Services cannot be simultaneously enabled on a vSAN cluster in vSphere 7.0 GA release
If vSphere Lifecycle Manager is enabled, vSAN File Services cannot be enabled on the same cluster and vice versa.
With vCenter Server 7.0.0a, vSAN File Services and vSphere Lifecycle Manager can be enabled simultaneously on the same vSAN Cluster.
Known Issues
The known issues are grouped as follows.
vSAN Issues
- Error message "vSAN File Service Node (6): vSphere HA virtual machine failover failed" is displayed when a disk group is removed from the host
When a disk group is removed from the host, you might see the error message "vSAN File Service Node (6): vSphere HA virtual machine failover failed". The file service node becomes inaccessible and is removed from the host.
The error message describes the failover activity of the vSAN File Service node when its underlying disk group is removed. However, because the vSAN File Service node is a virtual machine that is pinned to the host on which it is running, the vSphere HA operation to migrate this virtual machine to other hosts fails.
Add the disk group back to the same host and remediate the vSAN File Service for this cluster using the vSAN user interface.
- vSAN File Service remediation task fails when the host is in the process of entering maintenance mode (EMM)
When the EMM tasks are running on the host, the File Service VM (FSVM) is in a powered-off state. The vSAN File Service remediation treats this as an FSVM failure and tries to remediate it. However, vCenter Server prevents the vSAN File Service remediation task from running until all the EMM tasks are completed. This results in error messages.
The vSAN File Service remediation task automatically succeeds after all the EMM tasks are completed on the host.
- If the host that is being added to a vSAN cluster does not have a disk group, then the File Service VM (FSVM) cannot be deployed on that host
File Service enablement on hosts requires that each host has disk groups claimed by vSAN.
Add a disk group to the host and let the disk group be claimed by vSAN.
- File share creation, deletion, and reconfiguration operations might fail if the cluster has hosts with infrastructure issues
If the file share creation, deletion, and reconfiguration operations are dispatched to a host which is experiencing infrastructure issues, then the operations might fail.
Retry the operation for that particular cluster.
- After upgrade, multiple FSVMs are present on some hosts
After the File Service VM (FSVM) upgrade is complete, some hosts might have more than one FSVM, which might be running an older version or be powered off.
1. Verify the current version of each FSVM.
2. If you find that an FSVM is in a powered-off state, remove that FSVM.
3. If you find that an FSVM is running an older version, power it off and remove it from the host.
4. Navigate to vSAN cluster and then click Monitor > vSAN > Skyline Health.
5. In the Skyline Health section, click File Service and then click Retest.
6. Click Infrastructure Health and then Remediate File Service. Wait for the remediation to complete.
7. Repeat steps 1 to 6 until all the FSVMs are powered on and running on the new version.
- When a cluster is in degraded mode, writing to file shares might result in I/O error (EIO)
vSAN File Service automatically creates new vSAN objects for scaling out storage when all the existing space in the share is used. When the cluster is in a state where it cannot create new vSAN objects, writing to the file share will fail. This includes the scenario where there is an insufficient number of fault domains due to disk or node failures in the cluster.
Check the vSAN health service, and remediate the faults in the cluster.
- Deleting files in a file share might not be reflected in vSAN capacity view
The allocated blocks are not returned to the vSAN storage even if all the files are deleted. These allocated blocks are reused when new data is written to the same file share.
To release the storage back to vSAN, delete the file shares.
- The host enter maintenance mode (EMM) task freezes
The host EMM task might freeze because the File Service VM (FSVM) cannot power off. When the file service is enabled and the host goes into maintenance mode, the FSVM placed on this host might still be running. This happens because a background monitoring procedure running on the host conflicts with the VM power-off action. The power-off action is interrupted and is not retried, so the VM continues running. This blocks the host from entering maintenance mode.
Cancel the EMM task, and then retry. If the retry operation is unsuccessful, contact VMware Global Support.
- Remediate cluster task might fail in large scale cluster due to vSAN health network test issues
In large-scale clusters with more than 16 hosts, intermittent ping failures can occur during host upgrade. These failures can interrupt host remediation in vSphere Lifecycle Manager.
After remediation pre-check passes, silence alerts for the following vSAN health tests:
- vSAN: Basic (unicast) connectivity check
- vSAN: MTU check (ping with large packet size)
When the remediation task is complete, restore alerts for the vSAN health tests.
- Host failure in hot-plug scenario when drive is reinserted
During a hot drive removal, VMware native NVMe hot-plug can cause a host failure if the NVMe drive is pulled and reinserted within one minute. This is applicable to both vSphere and vSAN for any new or existing drive reinsertion.
Workaround: After removing a hot drive, wait for one minute before you reinsert the new or existing drive.
- Update Manager displays test ID instead of health check name
When you use Update Manager to remediate hosts in a vSAN cluster, vSAN health checks can identify upgrade issues. When the remediation task fails on a host, you might see an error message with a test ID instead of a health check name. For example:
Before host exits MM, remediation failed because vSAN health check failed. vSAN cluster is not healthy because vSAN health check(s): com.vmware.vsan.health.test.controlleronhcl failed
Each test ID is related to a vSAN health check. To learn about the remediation health checks, refer to the following article: https://kb.vmware.com/s/article/60219
Workaround: If a remediation task fails on a vSAN host, use the health service to identify and resolve the issues. Then perform another remediation task.
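When several hosts fail remediation, it can help to pull the test IDs out of the error text before looking them up in the article referenced above. The following is a minimal sketch, assuming the IDs always follow the `com.vmware.vsan.health.test.` prefix seen in the example message; the function name `extract_test_ids` is hypothetical.

```python
import re
from typing import List

# Test IDs in the example message share this prefix. The character class for
# the remainder is an assumption based on that single example.
TEST_ID = re.compile(r"com\.vmware\.vsan\.health\.test\.[A-Za-z0-9_]+")


def extract_test_ids(message: str) -> List[str]:
    """Return the distinct health test IDs found in a message, in order."""
    return list(dict.fromkeys(TEST_ID.findall(message)))


if __name__ == "__main__":
    sample = (
        "Before host exits MM, remediation failed because vSAN health check "
        "failed. vSAN cluster is not healthy because vSAN health check(s): "
        "com.vmware.vsan.health.test.controlleronhcl failed"
    )
    for test_id in extract_test_ids(sample):
        print(test_id)
```

The sketch only extracts IDs. It does not map them to health check names, because that mapping is documented in the referenced article rather than in these notes.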
- Cannot place last host in a cluster into maintenance mode, or remove a disk or disk group
Operations in Full data migration or Ensure accessibility mode might fail without providing guidance to add a new resource, when there is only one host left in the cluster and that host enters maintenance mode. This can also happen when there is only one disk or disk group left in the cluster and that disk or disk group is to be removed.
Workaround: Before you place the last remaining host in the cluster into maintenance mode with Full data migration or Ensure accessibility mode selected, add another host with the same configuration to the cluster. Before you remove the last remaining disk or disk group in the cluster, add a new disk or disk group with the same configuration and capacity.
- Object reconfiguration workflows might fail due to the lack of capacity if one or more disks or disk groups are almost full
vSAN resyncs get paused when the disks in non-deduplication clusters or disk groups in deduplication clusters reach a configurable resync pause fullness threshold. This is to avoid filling up the disks with resync I/O. If the disks reach this threshold, vSAN stops reconfiguration workflows, such as EMM, repairs, rebalance, and policy change.
Workaround: If space is available elsewhere in the cluster, rebalancing the cluster frees up space on the other disks, so that subsequent reconfiguration attempts succeed.
- After recovery from cluster full, VMs can lose HA protection
In a vSAN cluster that has hosts with disks 100% full, the VMs might have a question pending and hence lose the HA protection. Also, the VMs that had a pending question are not HA protected after recovering from cluster full scenario.
Workaround: After recovering from vSAN cluster full scenario, perform one of the following actions:
- Disable and re-enable HA.
- Reconfigure HA.
- Power off and power on the VMs.
- Power Off VMs fails with Question Pending
If a VM has a pending question, you are not allowed to do any VM-related operations until the question is answered.
Workaround: Try to free the disk space on the relevant volume, and then click Retry.
- When the cluster is full, the IP addresses of VMs either change to IPv6 or become unavailable
When a vSAN cluster is full with one or more disk groups reaching 100%, there can be a VM pending question that requires user action. If the question is not answered and the cluster full condition is left unattended, the IP addresses of the VMs might change to IPv6 or become unavailable. This prevents you from using SSH to access the VMs. It also prevents you from using the VM console, because the console goes blank after you type root.
Workaround: None.
- Unable to remove a dedupe enabled disk group after a capacity disk enters PDL state
When a capacity disk in a dedupe-enabled disk group is removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error, it enters Permanent Device Loss (PDL) state. If you try to remove the disk group, you might see an error message informing you that the action cannot be completed.
Workaround: Whenever a capacity disk is removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error, wait for a few minutes before trying to remove the disk group.
- vSAN health indicates non-availability related incompliance with failed pending policy
A policy change request leaves the object health status of vSAN in a non-availability related incompliance state. This is because there might be other scheduled work that is utilizing the requested resources. However, vSAN reschedules this policy request automatically as resources become available.
Workaround: The vSAN periodic scan fixes this issue automatically in most cases. However, other work in progress might use up available resources even after the policy change was accepted but not applied. You can add more capacity if the capacity reporting displays a high value.
- In deduplication clusters, reactive rebalancing might not happen when the disks show more than 80% full
In deduplication clusters, when the disks display more than 80% full on the dashboard, the reactive rebalancing might not start as expected. This is because in deduplication clusters, pending writes and deletes are also considered for calculating the free capacity.
Workaround: None.
- TRIM/UNMAP commands from Guest OS fail
If the Guest OS attempts to perform space reclamation during online snapshot consolidation, the TRIM/UNMAP commands fail. This failure keeps space from being reclaimed.
Workaround: Try to reclaim the space after the online snapshot operation is complete. If subsequent TRIM/UNMAP operations fail, remount the disk.
- Space reclamation from SCSI TRIM/UNMAP is lost when online snapshot consolidation is performed
Space reclamation achieved from SCSI TRIM/UNMAP commands is lost when you perform online snapshot consolidation. Offline snapshot consolidation does not affect SCSI unmap operation.
Workaround: Reclaim the space after online snapshot consolidation is complete.
- Host failure when converting data host into witness host
When you convert a vSAN cluster into a stretched cluster, you must provide a witness host. You can convert a data host into the witness host, but you must use maintenance mode with Full data migration during the process. If you place the host into maintenance mode with Ensure accessibility option, and then configure it as the witness host, the host might fail with a purple diagnostic screen.
Workaround: Remove the disk group on the witness host and then re-create the disk group.
- Duplicate VM with the same name in vCenter Server when residing host fails during datastore migration
If a VM is undergoing storage vMotion from vSAN to another datastore, such as NFS, and the host on which it resides encounters a failure on the vSAN network, causing HA failover of the VM, the VM might be duplicated in the vCenter Server.
Workaround: Power off the invalid VM and unregister it from the vCenter Server.
- Reconfiguring an existing stretched cluster under a new vCenter Server causes vSAN to issue a health check warning
When rebuilding a current stretched cluster under a new vCenter Server, the vSAN cluster health check is red. The following message appears: vSphere cluster members match vSAN cluster members
Workaround: Use the following procedure to configure the stretched cluster.
1. Use SSH to log in to the witness host.
2. Decommission the disks on the witness host. Run the following command: esxcli vsan storage remove -s "SSD UUID"
3. Force the witness host to leave the cluster. Run the following command: esxcli vsan cluster leave
4. Reconfigure the stretched cluster from the new vCenter Server (Configure > vSAN > Fault Domains & Stretched Cluster).
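The two esxcli commands in the procedure above can be assembled for review before they are run on the witness host. This is a dry-run sketch only: it prints the commands and does not execute them. The helper name `witness_removal_commands` and the placeholder UUID are hypothetical.

```python
import shlex
from typing import List


def witness_removal_commands(ssd_uuids: List[str]) -> List[List[str]]:
    """Build the command lines from the procedure, one removal per SSD UUID."""
    commands = [
        ["esxcli", "vsan", "storage", "remove", "-s", uuid] for uuid in ssd_uuids
    ]
    # Leaving the cluster comes last, after the disks are decommissioned.
    commands.append(["esxcli", "vsan", "cluster", "leave"])
    return commands


if __name__ == "__main__":
    # "<SSD UUID>" is a placeholder; substitute the UUIDs of the witness disks.
    for argv in witness_removal_commands(["<SSD UUID>"]):
        print(shlex.join(argv))
```

Keeping each command as an argument list means a UUID is passed as a single argument if the commands are later handed to a process launcher.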
- Disk format upgrade fails while vSAN resynchronizes large objects
If the vSAN cluster contains very large objects, the disk format upgrade might fail while the object is resynchronized. You might see the following error message: Failed to convert object(s) on vSAN
vSAN cannot perform the upgrade until the object is resynchronized. You can check the status of the resynchronization (Monitor > vSAN > Resyncing Components) to verify when the process is complete.
Workaround: Wait until no resynchronization is pending, then retry the disk format upgrade.
- Cluster consistency health check fails during deep rekey operation
The deep rekey operation on an encrypted vSAN cluster can take several hours. During the rekey, the following health check might indicate a failure: Cluster configuration consistency. The cluster consistency check does not detect the deep rekey operation, and there might not be a problem.
Workaround: Retest the vSAN cluster consistency health check after the deep rekey operation is complete.
- vSAN stretched cluster configuration lost after you disable vSAN on a cluster
If you disable vSAN on a stretched cluster, the stretched cluster configuration is not retained. The stretched cluster, witness host, and fault domain configuration is lost.
Workaround: Reconfigure the stretched cluster parameters when you re-enable the vSAN cluster.
- Powered off VMs appear as inaccessible during witness host replacement
When you change a witness host in a stretched cluster, VMs that are powered off appear as inaccessible in the vSphere Web Client for a brief time. After the process is complete, powered off VMs appear as accessible. All running VMs appear as accessible throughout the process.
Workaround: None.
- Cannot place hosts in maintenance mode if they have faulty boot media
vSAN cannot place hosts with faulty boot media into maintenance mode. The task to enter maintenance mode might fail with an internal vSAN error, due to the inability to save configuration changes. You might see log events similar to the following: Lost Connectivity to the device xxx backing the boot filesystem
Workaround: Remove disk groups manually from each host, using the Full data evacuation option. Then place the host in maintenance mode.
- Health service does not work if vSAN cluster has ESXi hosts with vSphere 6.0 Update 1 or earlier
The vSAN 6.6 and later health service does not work if the cluster has ESXi hosts running vSphere 6.0 Update 1 or earlier releases.
Workaround: Do not add ESXi hosts with vSphere 6.0 Update 1 or earlier software to a vSAN 6.6 or later cluster.
- After stretched cluster failover, VMs on the preferred site register alert: Failed to failover
If the secondary site in a stretched cluster fails, VMs fail over to the preferred site. VMs already on the preferred site might register the following alert: Failed to failover. Ignore this alert. It does not impact the behavior of the failover.
Workaround: None.
- During network partition, components in the active site appear to be absent
During a network partition in a two-host vSAN cluster or a stretched cluster, the vSphere Web Client might display a view of the cluster from the perspective of the non-active site. You might see active components in the primary site displayed as absent.
Workaround: Use RVC commands to query the state of objects in the cluster. For example: vsan.vm_object_info
- Some objects are non-compliant after force repair
After you perform a force repair, some objects might not be repaired because the ownership of the objects was transferred to a different node during the process. The force repair might be delayed for those objects.
Workaround: Attempt the force repair operation after all other objects are repaired and resynchronized. You can wait until vSAN repairs the objects.
- When you move a host from one encrypted cluster to another, and then back to the original cluster, the task fails
When you move a host from an encrypted vSAN cluster to another encrypted vSAN cluster, then move the host back to the original encrypted cluster, the task might fail. You might see the following message: A general system error occurred: Invalid fault. This error occurs because vSAN cannot re-encrypt data on the host using the original encryption key. After a short time, vCenter Server restores the original key on the host, and all unmounted disks in the vSAN cluster are mounted.
Workaround: Reboot the host and wait for all disks to get mounted.
- Stretched cluster imbalance after a site recovers
When you recover a failed site in a stretched cluster, sometimes hosts in the failed site are brought back sequentially over a long period of time. vSAN might overuse some hosts when it begins repairing the absent components.
Workaround: Recover all of the hosts in a failed site together within a short time window.
- VM operations fail due to HA issue with stretched clusters
Under certain failure scenarios in stretched clusters, certain VM operations such as vMotion or powering on a VM might be impacted. These failure scenarios include a partial or complete site failure, or the failure of the high-speed network between the sites. This problem is caused by the dependency on VMware HA being available for normal operation of stretched cluster sites.
Workaround: Disable vSphere HA before performing vMotion, VM creation, or powering on VMs. Then re-enable vSphere HA.
- Cannot perform deep rekey if a disk group is unmounted
Before vSAN performs a deep rekey, it performs a shallow rekey. The shallow rekey fails if an unmounted disk group is present. The deep rekey process cannot begin.
Workaround: Remount or remove the unmounted disk group.
- Log entries state that firewall configuration has changed
A new firewall entry appears in the security profile when vSAN encryption is enabled: vsanEncryption. This rule controls how hosts communicate directly to the KMS. When it is triggered, log entries are added to /var/log/vobd.log. You might see the following messages:
Firewall configuration has changed. Operation 'addIP4' for rule set vsanEncryption succeeded.
Firewall configuration has changed. Operation 'removeIP4' for rule set vsanEncryption succeeded.
These messages can be ignored.
Workaround: None.
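To confirm that firewall change entries in a log excerpt come from the vsanEncryption rule set and match the benign messages above, the text can be filtered with a short script. This is a sketch under stated assumptions: it matches only the message text quoted in these notes, makes no assumption about timestamps or other prefixes on each log line, and the function name `vsan_encryption_events` is hypothetical.

```python
import re
from typing import Iterable, List

# Matches the two messages quoted in these notes and captures the operation.
FIREWALL_EVENT = re.compile(
    r"Firewall configuration has changed\. "
    r"Operation '(addIP4|removeIP4)' for rule set vsanEncryption succeeded\."
)


def vsan_encryption_events(lines: Iterable[str]) -> List[str]:
    """Return the operation name for each matching log line, in order."""
    events = []
    for line in lines:
        match = FIREWALL_EVENT.search(line)
        if match:
            events.append(match.group(1))
    return events


if __name__ == "__main__":
    sample = [
        "Firewall configuration has changed. Operation 'addIP4' "
        "for rule set vsanEncryption succeeded.",
        "Firewall configuration has changed. Operation 'removeIP4' "
        "for rule set vsanEncryption succeeded.",
    ]
    print(vsan_encryption_events(sample))
```

On a host, the same function could be applied to the lines of a copy of /var/log/vobd.log. Lines that mention other rule sets or operations are left out of the result.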
- HA failover does not occur after setting Traffic Type option on a vmknic to support witness traffic
If you set the traffic type option on a vmknic to support witness traffic, vSphere HA does not automatically discover the new setting. You must manually disable and then re-enable HA so it can discover the vmknic. If you configure the vmknic and the vSAN cluster first, and then enable HA on the cluster, it does discover the vmknic.
Workaround: Manually disable vSphere HA on the cluster, and then re-enable it.
- iSCSI MCS is not supported
vSAN iSCSI target service does not support Multiple Connections per Session (MCS).
Workaround: None.
- Any iSCSI initiator can discover iSCSI targets
vSAN iSCSI target service allows any initiator on the network to discover iSCSI targets.
Workaround: You can isolate your ESXi hosts from iSCSI initiators by placing them on separate VLANs.
- After resolving network partition, some VM operations on linked clone VMs might fail
Some VM operations on linked clone VMs that are not producing I/O inside the guest operating system might fail. The operations that might fail include taking snapshots and suspending the VMs. This problem can occur after a network partition is resolved, if the parent base VM's namespace is not yet accessible. When the parent VM's namespace becomes accessible, HA is not notified to power on the VM.
Workaround: Power cycle VMs that are not actively running I/O operations.
- Cannot place a witness host in Maintenance Mode
When you attempt to place a witness host in Maintenance Mode, the host remains in the current state and you see the following notification: A specified parameter was not correct.
Workaround: When placing a witness host in Maintenance Mode, choose the No data migration option.
- Moving the witness host into and then out of a stretched cluster leaves the cluster in a misconfigured state
If you place the witness host in a vSAN-enabled vCenter cluster, an alarm notifies you that the witness host cannot reside in the cluster. But if you move the witness host out of the cluster, the cluster remains in a misconfigured state.
Workaround: Move the witness host out of the vSAN stretched cluster, and reconfigure the stretched cluster. For more information, see Knowledge Base article 2130587.
- When a network partition occurs in a cluster which has an HA heartbeat datastore, VMs are not restarted on the other data site
When the preferred or secondary site in a vSAN cluster loses its network connection to the other sites, VMs running on the site that loses network connectivity are not restarted on the other data site, and the following error might appear: vSphere HA virtual machine HA failover failed.
This is expected behavior for vSAN clusters.
Workaround: Do not select HA heartbeat datastore while configuring vSphere HA on the cluster.
- Unmounted vSAN disks and disk groups displayed as mounted in the vSphere Web Client Operational Status field
After the vSAN disks or disk groups are unmounted, either by running the esxcli vsan storage diskgroup unmount command or by the vSAN Device Monitor service when disks show persistently high latencies, the vSphere Web Client incorrectly displays the Operational Status field as mounted.
Workaround: Use the Health field to verify disk status, instead of the Operational Status field.