VMware vSphere 8.0 | 11 OCT 2022
VMware ESXi 8.0 | 11 OCT 2022 | ISO Build 20513097
Check for additions and updates to these release notes.
These release notes introduce you to new features in VMware vSAN 8.0 and provide information on resolved and known issues.
vSAN 8.0 introduces the following new features and enhancements:
Performance without Tradeoffs
vSAN Express Storage Architecture. vSAN ESA is an alternative architecture that provides the potential for huge boosts in performance with more predictable I/O latencies and optimized space efficiency.
Increased write buffer. vSAN Original Storage Architecture can support more intensive workloads. You can configure vSAN hosts to increase the write buffer from 600 GB to 1.6 TB.
Native snapshots with minimal performance impact. vSAN ESA file system has snapshots built in. These native snapshots cause minimal impact to VM performance, even if the snapshot chain gets deep. The snapshots are fully compatible with existing backup applications using VMware VADP.
Supreme Resource and Space Efficiency
Erasure Coding without compromising performance. The vSAN ESA RAID5/RAID6 capabilities with Erasure Coding provide a highly efficient Erasure Coding code path, so you can have both a high-performance and a space-efficient storage policy.
Improved compression. vSAN ESA has advanced compression capabilities that can bring up to 4x better compression. Compression is performed before data is sent across the vSAN network, providing better bandwidth usage.
Expanded usable storage potential. vSAN ESA consists of a single-tier architecture with all devices contributing to capacity. This flat storage pool removes the need for disk groups with caching devices.
Reduced performance overhead for high VM consolidation. Resource and space efficiency improvements enable you to store more VM data per cluster, potentially increasing VM consolidation ratios.
HCI Mesh support for 10 client clusters. A storage server cluster can be shared with up to 10 client clusters.
Fast, Efficient Data Protection with vSAN ESA Native Snapshots
Negligible performance impact. Even long, deep snapshot chains cause minimal performance impact.
Faster snapshot operations. Applications that suffered from snapshot create or snapshot delete stun times will perform better with vSAN ESA.
Consistent partner backup application experience using VMware VADP. VMware snapshot APIs are unchanged. VMware VADP supports all vSAN ESA native snapshot operations on the vSphere platform.
Availability and Serviceability
Simplified and accelerated servicing per device. vSAN ESA removes the complexity of disk groups, which streamlines the replacement process for failed drives.
Smaller failure domains and reduced data resynchronization. vSAN ESA has no single points of failure in its storage pool design. vSAN data and metadata are protected according to the Failures To Tolerate (FTT) SPBM setting. Neither caching nor compression extends the failure domain beyond a single disk when a disk fails. Resync operations complete faster with vSAN ESA.
Enhanced data availability and improved SLAs. Reduction in disk failure domains and quicker repair times means you can improve the SLAs provided to your customers or business units.
vSAN boot-time optimizations. vSAN boot logic has been further optimized for faster startup.
Enhanced shutdown and startup workflows. The vSAN cluster shutdown and cluster startup process has been enhanced to support vSAN clusters that house vCenter or infrastructure services such as AD, DNS, DHCP, and so on.
Reduced vSAN File Service failover time. vSAN File Service planned failovers have been streamlined.
Intuitive, Agile Operations
Consistent interfaces across all vSAN platforms. vSAN ESA uses the same screens and workflows as vSAN OSA, so the learning curve is small.
Per-VM policies increase flexibility. vSAN ESA is moving cluster-wide settings to the SPBM level. In this release, SPBM compression settings give you granular control down to the VM or even VMDK level, and you can apply them broadly with datastore default policies.
Proactive Insight into compatibility and compliance. This mechanism helps vSAN clusters connected to VMware Analytics Cloud identify software and hardware anomalies. If an OEM partner publishes an advisory about issues for a drive or I/O controller listed in vSAN HCL, you can be notified about the potentially impacted environment.
Additional Features and Enhancements
Enhanced network uplink latency metrics. vSAN defines more meaningful and relevant metrics catered to the environment, whether the latencies are temporary or from an excessive workload.
RDT level checksums. You can set checksums at the RDT layer. These new checksums can aid in debugging and triaging.
vSAN File Service debugging. File Service Day 0 operations have been improved for efficient validation and troubleshooting.
vSAN File Service over IPv6. You can create a file service domain with IPv6 network.
vSAN File Service network reconfiguration. You can change file server IPs including the primary IP to new IPs in the same or different subnet.
vSphere Client Remote Plug-ins. All VMware-owned local plug-ins are transitioning to the new remote plug-in architecture. vSAN local plug-ins have been moved to vSphere Client remote plug-ins. The local vSAN plug-ins are deprecated in this release.
vLCM HCL disk device. Enhancements improve vLCM’s functionality and efficiency for checking compatibility with the desired image. It includes a check for “partNumber” and “vendor” to add coverage for more vendors.
Reduced stop time of vSAN health service. The time needed to stop the vSAN health service as part of a vCenter restart or upgrade has been reduced to 5 seconds.
vSAN health check provides perspective to VCF LCM. This release provides only relevant vSAN health checks to VCF in order to improve LCM resiliency in VCF.
vSAN improves cluster NDU for VMC. New capabilities improve design and operation of a highly secure, reliable, and operationally efficient service.
vSAN encryption key verification. Detects invalid or corrupt keys sent from the KMS server, identifies discrepancies between in-memory and on-disk DEKs, and alerts customers in case of discrepancies.
Better handling of large component deletes. Reclaims the logical space and accounts for the physical space faster, without causing a NO_SPACE error.
Renamed vSAN health "Check" to "Finding." This change makes the term consistent with all VMware products.
Place vSAN in separate sandbox domain. Daemon sandboxing prevents lateral movement and provides defense in depth. Starting with vSAN 8.0, a least-privilege security model is implemented: any daemon that does not define its own custom sandbox domain runs in a deprivileged domain. This enforces least privilege on an ESXi host, with all vSAN daemons running in their own sandbox domains with the least possible privilege.
vSAN Proactive Insights. This mechanism enables vSAN clusters connected to VMware Analytics Cloud to identify software and hardware anomalies proactively.
Management and monitoring of PMEM for SAP HANA. You can manage PMEM devices within the hosts. vSAN provides management capabilities such as health checks, performance monitoring, and space reporting for the PMEM devices. PMEM management capabilities do not require vSAN services to be enabled. vSAN does not use PMEM devices for caching vSAN metadata or for vSAN data services such as encryption, checksum, or dedupe and compression. The PMEM datastore is local to each host, but can be managed from the monitor tab at the cluster level.
Replace MD5, SHA1, and SHA2 in vSAN. SHA1 is no longer considered secure, so VMware is replacing SHA1, MD5, and SHA2 with SHA256 across all VMware products, including vSAN.
IL6 compliance. vSAN 8.0 is IL6 compliant.
Use the vSAN Community Web site to provide feedback and request assistance with any problems you find while using vSAN.
For instructions about upgrading vSAN, see the VMware vSAN 8.0 documentation.
Note: Before performing the upgrade, please review the most recent version of the VMware Compatibility Guide to validate that the latest vSAN version is available for your platform.
Note: vSAN Express Storage Architecture is available only for new deployments. You cannot upgrade a cluster to vSAN ESA.
vSAN 8.0 is a new release that requires a full upgrade to vSphere 8.0. Perform the following tasks to complete the upgrade:
Upgrade to vCenter Server 8.0. For more information, see the VMware vSphere 8.0 Release Notes.
Upgrade hosts to ESXi 8.0. For more information, see the VMware vSphere 8.0 Release Notes.
Upgrade the vSAN on-disk format to version 17.0. If upgrading from on-disk format version 3.0 or later, no data evacuation is required (metadata update only).
Upgrade FSVM to enable new File Service features such as access based enumeration for SMB shares.
Note: vSAN retired disk format version 1.0 in vSAN 7.0 Update 1. Disks running disk format version 1.0 are no longer recognized by vSAN. vSAN blocks upgrades to vSAN 7.0 Update 1 and later through vSphere Update Manager, ISO install, or esxcli when such disks are present. To avoid these issues, upgrade disks running disk format version 1.0 to a higher version. If you have disks on version 1.0, a health check alerts you to upgrade the disk format version.
Disk format version 1.0 does not have performance and snapshot enhancements, and it lacks support for advanced features including checksum, deduplication and compression, and encryption. For more information about vSAN disk format version, see KB 2148493.
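If you want to confirm the on-disk format version of the claimed disks before or after the upgrade, one option is to query them from each host's shell. This is a minimal sketch; the exact field label can vary slightly between ESXi releases, so treat the grep pattern as an assumption.
# Show the on-disk format version reported for each vSAN-claimed disk on this host
esxcli vsan storage list | grep -i "format version"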
Upgrading the On-disk Format for Hosts with Limited Capacity
During an upgrade of the vSAN on-disk format from version 1.0 or 2.0, a disk group evacuation is performed. The disk group is removed and upgraded to on-disk format version 17.0, and the disk group is added back to the cluster. For two-node or three-node clusters, or clusters without enough capacity to evacuate each disk group, select Allow Reduced Redundancy from the vSphere Client. You also can use the following RVC command to upgrade the on-disk format: vsan.ondisk_upgrade --allow-reduced-redundancy
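As a sketch, a full RVC invocation might look like the following. The datacenter and cluster names are placeholders for your own inventory paths; run it from an RVC session connected to the vCenter Server.
# Upgrade the on-disk format while tolerating reduced redundancy (placeholders in angle brackets)
vsan.ondisk_upgrade --allow-reduced-redundancy /localhost/<Datacenter>/computers/<Cluster>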
When you allow reduced redundancy, your VMs are unprotected for the duration of the upgrade, because this method does not evacuate data to the other hosts in the cluster. It removes each disk group, upgrades the on-disk format, and adds the disk group back to the cluster. All objects remain available, but with reduced redundancy.
If you enable deduplication and compression during the upgrade to vSAN 8.0, you can select Allow Reduced Redundancy from the vSphere Client.
For information about maximum configuration limits for the vSAN 8.0 release, see the Configuration Maximums documentation.
vSAN Health cannot find VUM with proxy configured
When a proxy was configured for vSAN, the vsan-health service falsely reported that VMware Update Manager (VUM) was disabled or not installed.
This issue is fixed in this release.
RemoveFileShare task failure may cause vSAN File Services server failover
The RemoveFileShare task for an NFS share may fail on the vCenter Server even though the share is deleted. This happens because the NFS server fails while removing the export. This does not cause any problems in the overall workflow, as the share is successfully deleted.
When the NFS server fails, it triggers a vSAN File Services server failover. Because the NFS and SMB servers fail over together, any SMB shares exported from the same vSAN File Services server experience mount disruptions. SMB mount disruption due to server failover is a known behavior, as vSAN does not support transparent failover for SMB servers.
Workaround: None.
hostAffinity policy option lost during upgrade
When you upgrade from vSAN 6.7 to vSAN 8.0, the vCenter Server hostaffinity option value is changed to false.
Workaround: Set the hostaffinity option back to true to continue using vSAN HostLocal policy for a normal VM.
Cannot upgrade cluster to vSAN Express Storage Architecture
You cannot upgrade or convert a cluster on vSAN Original Storage Architecture to vSAN Express Storage Architecture. vSAN ESA is supported only on new deployments.
Workaround: None.
Encryption deep rekey not supported on vSAN ESA
vSAN Express Storage Architecture does not support encryption deep rekey in this release.
Workaround: None.
vSAN File Service not supported on vSAN ESA
vSAN Express Storage Architecture does not support vSAN File Service in this release.
Workaround: None.
Cannot change encryption settings on vSAN ESA
Encryption can only be configured on vSAN ESA during cluster creation. You cannot change the settings later.
Workaround: None.
vSAN File Service does not support NFSv4 delegations
vSAN File Service does not support NFSv4 delegations in this release.
Workaround: None.
In stretched cluster, file server with no affinity cannot rebalance
In the stretched cluster vSAN File Service environment, a file server with no affinity location configured cannot be rebalanced between Preferred ESXi hosts and Non-preferred ESXi hosts.
Workaround: Set the affinity location of the file server to Preferred or Non-Preferred by editing the file service domain configuration.
Kubernetes pods with CNS volumes cannot be created, deleted, or re-scheduled during vSAN stretched cluster partition
When a vSAN stretched cluster has a network partition between sites, an intermittent timing issue can cause volume information to be lost from the CNS. When volume metadata is not present in the CNS, you cannot create, delete, or re-schedule pods with CNS volumes. vSphere CSI Driver must access volume information from CNS to perform these operations.
When the network partition is fixed, CNS volume metadata is restored, and pods with CNS volumes can be created, deleted, or re-scheduled.
Workaround: None.
Shutdown Cluster wizard displays an error on HCI Mesh compute-only cluster
The vSAN Shutdown Cluster wizard is designed for vSAN clusters that have a vSAN datastore and vSAN services. It does not support HCI Mesh compute-only clusters. If you use the wizard to shut down a compute-only cluster, it displays the following error message:
Cannot retrieve the health service data.
Workaround: None. Do not use the vSAN Shutdown Cluster wizard on an HCI Mesh compute-only cluster.
Remediation of ESXi hosts in a vSphere Lifecycle Manager cluster with vSAN fails if vCenter services are deployed on custom ports
If vCenter Server services are deployed on custom ports in a cluster with vSAN, vSphere DRS, and vSphere HA, remediation of vSphere Lifecycle Manager clusters might fail. This problem is caused by a vSAN resource health check error. ESXi hosts cannot enter maintenance mode, which leads to failing remediation tasks.
Workaround: None.
When vSAN file service is enabled, DFC-related operations such as upgrade, enabling encryption or data-efficiency might fail
When file service is enabled, an agent VM runs on each host. The underlying vSAN object might be placed across multiple disk groups. When the first disk group gets converted, the vSAN object becomes inaccessible and the agent VM is in an invalid state. If you try to delete the VM and redeploy a new VM, the operation fails due to the VM’s invalid state. The VM gets unregistered but the inaccessible object still exists. When the next disk group gets converted, there is a precheck for inaccessible objects in the whole cluster. This check fails the DFC because it finds the inaccessible objects of the old agent VM.
Workaround: Manually remove the inaccessible objects.
When such a failure happens, you can see the failed DFC task.
Identify the inaccessible objects from the failed task's fault information.
To ensure that the objects belong to the agent VM, inspect the hostd log file and confirm that the objects belong to the VM’s object layout.
Log in to the host and use the /usr/lib/vmware/osfs/bin/objtool command to remove the objects manually (a sketch follows the note below).
Note: To prevent this problem, disable file service before performing any DFC-related operation.
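For the removal step, a minimal sketch of an objtool invocation is shown below. The object UUID is a placeholder taken from the failed task's fault information, and the exact options can vary by ESXi build, so check the tool's usage output first.
# Delete an inaccessible object by UUID (run on the host that owns the object)
/usr/lib/vmware/osfs/bin/objtool delete -u <object-UUID> -f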
esxcli vsan cluster leave command fails to disable vSAN on an ESXi host
In some cases, the following command fails to disable vSAN on a member host: esxcli vsan cluster leave
You might see an error message similar to the following:
Failed to unmount default vSAN datastore. Unable to complete Sysinfo operation. Please see the VMKernel log file for more details.
Workaround: Perform the following steps in the vSphere Client to disable vSAN on a single member host:
Place the host into maintenance mode.
Move the host out of the vSAN cluster, and into its parent data center.
vSAN service on the host is disabled automatically during the movement.
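If you prefer to drive the first step from the host shell and verify the result afterwards, a minimal sketch follows. It assumes SSH access to the host and that the default data evacuation behavior for maintenance mode is acceptable; otherwise choose the evacuation mode in the vSphere Client.
# Place the host into maintenance mode from the host shell
esxcli system maintenanceMode set --enable true
# After moving the host out of the vSAN cluster in the vSphere Client,
# confirm that vSAN clustering is no longer enabled on the host
esxcli vsan cluster get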
Cannot extract host profile on a vSAN HCI mesh compute-only host
vSAN host profile plugin does not support vSAN HCI mesh compute-only hosts. If you try to extract the host profile on an HCI mesh compute-only host, the attempt fails.
Workaround: None.
Deleting files in a file share might not be reflected in vSAN capacity view
Allocated blocks might not be returned to vSAN storage immediately after all files are deleted, so it can take some time before the reclaimed storage capacity is reflected in the vSAN capacity view. When new data is written to the same file share, these deleted blocks might be reused before being returned to vSAN storage.
If unmap is enabled and vSAN deduplication is disabled, space might not be freed back to vSAN unless 4 MB-aligned space is freed in VDFS. If unmap is enabled and vSAN deduplication is enabled, space freed by VDFS is freed back to vSAN with a delay.
Workaround: To release the storage back to vSAN immediately, delete the file shares.
vSAN over RDMA might experience lower performance due to network congestion
RDMA requires a lossless network infrastructure that is free of congestion. If your network has congestion, certain large I/O workloads might experience lower performance than they would over TCP.
Workaround: Address any network congestion issues following OEM best practices for RDMA.
vCenter VM crash on stretched cluster with data-in-transit encryption
vCenter VM might crash on a vSAN stretched cluster if the vCenter VM is on vSAN with data-in-transit encryption enabled. When all hosts in one site go down and are then powered on again, the vCenter VM might crash after the failed site returns to service.
Workaround: Use the following script to resolve this problem: thumbPrintRepair.py
VM migration from VMFS datastore or vSAN datastore to vSAN datastore fails
When you have Content Based Read Cache (CBRC) enabled, Storage vMotion or cross vMotion might fail to migrate a VM that has one or more snapshots to the vSAN datastore. You might see the following error message: The operation is not supported on the object.
The following messages appear in /var/log/vmware/vpxd/:
2021-01-31T17:12:27.477Z error vpxd[18588] [Originator@6876 sub=vpxLro opID=65ef3b53-01] [VpxLRO] Unexpected Exception: N5Vmomi5Fault12NotSupported9ExceptionE(Message is: The operation is not supported on the object.,
--> Fault cause: vmodl.fault.NotSupported
--> Fault Messages are:
--> (null)
--> )
-->
Workaround: Consolidate snapshots, or delete all snapshots before migration.
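One hedged way to remove all snapshots from the host shell before the migration is sketched below. The VM ID is a placeholder obtained from the first command.
# Find the VM ID (Vmid column), then remove and consolidate all of its snapshots
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/snapshot.removeall <vmid>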
vSAN allows a VM to be provisioned across local and remote datastores
vSphere does not prevent users from provisioning a VM across local and remote datastores in an HCI Mesh environment. For example, you can provision one VMDK on the local vSAN datastore and one VMDK on remote vSAN datastore. This is not supported because vSphere HA is not supported with this configuration.
Workaround: Do not provision a VM across local and remote datastores.
The object reformatting task is not progressing
If object reformatting is needed after an upgrade, a health alert is triggered, and vSAN begins reformatting. vSAN performs this task in batches, and it depends on the amount of transient capacity available in the cluster. When the transient capacity exceeds the maximum limit, vSAN waits for the transient capacity to be freed before proceeding with the reformatting. During this phase, the task might appear to be halted. The health alert will clear and the task will progress when transient capacity is available.
Workaround: None. The task is working as expected.
System VMs cannot be powered-off
With the release of vSphere Cluster Services (vCLS) in vSphere 7.0 Update 1, a set of system VMs might be placed within the vSAN cluster. These system VMs cannot be powered-off by users. This issue can impact some vSAN workflows, which are documented in the following article: https://kb.vmware.com/s/article/80877
Workaround: For more information about this issue, refer to this KB article: https://kb.vmware.com/s/article/80483.
vSAN File Service cannot be enabled due to an old vSAN on-disk format version
vSAN File Service cannot be enabled with a vSAN on-disk format version earlier than 11.0 (the on-disk format version in vSAN 7.0).
Workaround: Upgrade the vSAN disk format version before enabling File Service.
Remediate cluster task might fail in large scale cluster due to vSAN health network test issues
In large scale clusters with more than 16 hosts, intermittent ping failures can occur during host upgrade. These failures can interrupt host remediation in vSphere Lifecycle Manager.
Workaround: After the remediation pre-check passes, silence alerts for the following vSAN health tests (see the RVC sketch after this list):
vSAN: Basic (unicast) connectivity check
vSAN: MTU check (ping with large packet size)
When the remediation task is complete, restore alerts for the vSAN health tests.
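A sketch of silencing and later restoring these checks from RVC is shown below. The cluster path and check IDs are placeholders; list the current IDs with the status command first, because the IDs for the unicast connectivity and large-packet ping checks can differ between releases.
# List health checks and their silenced state for the cluster
vsan.health.silent_health_check_status /localhost/<Datacenter>/computers/<Cluster>
# Add a check to the silent list before remediation, then remove it afterwards
vsan.health.silent_health_check_configure -a <check-id> /localhost/<Datacenter>/computers/<Cluster>
vsan.health.silent_health_check_configure -r <check-id> /localhost/<Datacenter>/computers/<Cluster>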
Host failure in hot-plug scenario when drive is reinserted
During a hot drive removal, VMware native NVMe hot-plug can cause a host failure if the NVMe drive is pulled and reinserted within one minute. This is applicable to both vSphere and vSAN for any new or existing drive reinsertion.
Workaround: After removing a hot drive, wait for one minute before you reinsert the new or existing drive.
Cannot place last host in a cluster into maintenance mode, or remove a disk or disk group
Operations in Full data migration or Ensure accessibility mode might fail without providing guidance to add a new resource, when there is only one host left in the cluster and that host enters maintenance mode. This can also happen when there is only one disk or disk group left in the cluster and that disk or disk group is to be removed.
Workaround: Before you place the last remaining host in the cluster into maintenance mode with Full data migration or Ensure accessibility mode selected, add another host with the same configuration to the cluster. Before you remove the last remaining disk or disk group in the cluster, add a new disk or disk group with the same configuration and capacity.
Object reconfiguration workflows might fail due to the lack of capacity if one or more disks or disk groups are almost full
vSAN resyncs are paused when the disks in non-deduplication clusters, or disk groups in deduplication clusters, reach a configurable resync pause fullness threshold. This avoids filling up the disks with resync I/O. If the disks reach this threshold, vSAN stops reconfiguration workflows, such as enter maintenance mode (EMM), repairs, rebalance, and policy change.
Workaround: If space is available elsewhere in the cluster, rebalancing the cluster frees up space on the other disks, so that subsequent reconfiguration attempts succeed.
After recovery from cluster full, VMs can lose HA protection
In a vSAN cluster that has hosts with disks 100% full, the VMs might have a pending question and therefore lose HA protection. VMs that had a pending question also remain unprotected by HA after the cluster recovers from the full condition.
Workaround: After recovering from vSAN cluster full scenario, perform one of the following actions:
Disable and re-enable HA.
Reconfigure HA.
Power off and power on the VMs.
Power Off VMs fails with Question Pending
If a VM has a pending question, you are not allowed to do any VM-related operations until the question is answered.
Workaround: Try to free the disk space on the relevant volume, and then click Retry.
When the cluster is full, the IP addresses of VMs either change to IPV6 or become unavailable
When a vSAN cluster is full with one or more disk groups reaching 100%, there can be a VM pending question that requires user action. If the question is not answered and the cluster full condition is left unattended, the IP addresses of VMs might change to IPv6 or become unavailable. This prevents you from using SSH to access the VMs. It also prevents you from using the VM console, because the console goes blank after you type root.
Workaround: None.
Unable to remove a dedupe enabled disk group after a capacity disk enters PDL state
When a capacity disk in a dedupe-enabled disk group is removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error, it enters Permanent Device Loss (PDL) state. If you try to remove the disk group, you might see an error message informing you that the action cannot be completed.
Workaround: Whenever a capacity disk is removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error, wait for a few minutes before trying to remove the disk group.
vSAN health indicates non-availability related incompliance with failed pending policy
A policy change request leaves the object health status of vSAN in a non-availability related incompliance state. This is because there might be other scheduled work that is utilizing the requested resources. However, vSAN reschedules this policy request automatically as resources become available.
Workaround: The vSAN periodic scan fixes this issue automatically in most cases. However, other work in progress might use up available resources even after the policy change was accepted but not applied. You can add more capacity if the capacity reporting displays a high value.
In deduplication clusters, reactive rebalancing might not happen when the disks show more than 80% full
In deduplication clusters, when the disks display more than 80% full on the dashboard, the reactive rebalancing might not start as expected. This is because in deduplication clusters, pending writes and deletes are also considered for calculating the free capacity.
Workaround: None.
TRIM/UNMAP commands from Guest OS fail
If the Guest OS attempts to perform space reclamation during online snapshot consolidation, the TRIM/UNMAP commands fail. This failure keeps space from being reclaimed.
Workaround: Try to reclaim the space after the online snapshot operation is complete. If subsequent TRIM/UNMAP operations fail, remount the disk.
Space reclamation from SCSI TRIM/UNMAP is lost when online snapshot consolidation is performed
Space reclamation achieved from SCSI TRIM/UNMAP commands is lost when you perform online snapshot consolidation. Offline snapshot consolidation does not affect SCSI unmap operation.
Workaround: Reclaim the space after online snapshot consolidation is complete.
Host failure when converting data host into witness host
When you convert a vSAN cluster into a stretched cluster, you must provide a witness host. You can convert a data host into the witness host, but you must use maintenance mode with Full data migration during the process. If you place the host into maintenance mode with Ensure accessibility option, and then configure it as the witness host, the host might fail with a purple diagnostic screen.
Workaround: Remove the disk group on the witness host and then re-create the disk group.
Duplicate VM with the same name in vCenter Server when residing host fails during datastore migration
If a VM is undergoing Storage vMotion from vSAN to another datastore, such as NFS, and the host on which it resides encounters a failure on the vSAN network that causes HA failover of the VM, the VM might be duplicated in vCenter Server.
Workaround: Power off the invalid VM and unregister it from the vCenter Server.
Reconfiguring an existing stretched cluster under a new vCenter Server causes vSAN to issue a health check warning
When rebuilding an existing stretched cluster under a new vCenter Server, the vSAN cluster health check is red. The following message appears: vSphere cluster members match vSAN cluster members
Workaround: Use the following procedure to configure the stretched cluster.
Use SSH to log in to the witness host.
Decommission the disks on witness host. Run the following command: esxcli vsan storage remove -s "SSD UUID"
Force the witness host to leave the cluster. Run the following command: esxcli vsan cluster leave
Reconfigure the stretched cluster from the new vCenter Server (Configure > vSAN > Fault Domains & Stretched Cluster).
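To locate the cache device identifier referenced in the decommission step above, one option is to list the vSAN-claimed devices on the witness host first. This is a sketch; field names can vary slightly by release.
# On the witness host, list vSAN-claimed devices and note the cache-tier SSD entry
esxcli vsan storage list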
Disk format upgrade fails while vSAN resynchronizes large objects
If the vSAN cluster contains very large objects, the disk format upgrade might fail while the object is resynchronized. You might see the following error message: Failed to convert object(s) on vSAN
vSAN cannot perform the upgrade until the object is resynchronized. You can check the status of the resynchronization (Monitor > vSAN > Resyncing Components) to verify when the process is complete.
Workaround: Wait until no resynchronization is pending, then retry the disk format upgrade.
vSAN stretched cluster configuration lost after you disable vSAN on a cluster
If you disable vSAN on a stretched cluster, the stretched cluster configuration is not retained. The stretched cluster, witness host, and fault domain configuration is lost.
Workaround: Reconfigure the stretched cluster parameters when you re-enable the vSAN cluster.
Powered off VMs appear as inaccessible during witness host replacement
When you change a witness host in a stretched cluster, VMs that are powered off appear as inaccessible in the vSphere Web Client for a brief time. After the process is complete, powered off VMs appear as accessible. All running VMs appear as accessible throughout the process.
Workaround: None.
Cannot place hosts in maintenance mode if they have faulty boot media
vSAN cannot place hosts with faulty boot media into maintenance mode. The task to enter maintenance mode might fail with an internal vSAN error, due to the inability to save configuration changes. You might see log events similar to the following: Lost Connectivity to the device xxx backing the boot filesystem
Workaround: Remove disk groups manually from each host, using the Full data evacuation option. Then place the host in maintenance mode.
After stretched cluster failover, VMs on the preferred site register alert: Failed to failover
If the secondary site in a stretched cluster fails, VMs failover to the preferred site. VMs already on the preferred site might register the following alert: Failed to failover.
Workaround: Ignore this alert. It does not impact the behavior of the failover.
During network partition, components in the active site appear to be absent
During a network partition in a vSAN two-host or stretched cluster, the vSphere Web Client might display a view of the cluster from the perspective of the non-active site. You might see active components in the primary site displayed as absent.
Workaround: Use RVC commands to query the state of objects in the cluster. For example: vsan.vm_object_info
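A sketch of the RVC invocation is shown below; the datacenter and VM names are placeholders for your own inventory paths.
# Query object state for a specific VM from an RVC session
vsan.vm_object_info /localhost/<Datacenter>/vms/<VM-name>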
Some objects are non-compliant after force repair
After you perform a force repair, some objects might not be repaired because the ownership of the objects was transferred to a different node during the process. The force repair might be delayed for those objects.
Workaround: Attempt the force repair operation after all other objects are repaired and resynchronized. You can wait until vSAN repairs the objects.
When you move a host from one encrypted cluster to another, and then back to the original cluster, the task fails
When you move a host from an encrypted vSAN cluster to another encrypted vSAN cluster, and then move the host back to the original encrypted cluster, the task might fail. You might see the following message: A general system error occurred: Invalid fault. This error occurs because vSAN cannot re-encrypt data on the host using the original encryption key. After a short time, vCenter Server restores the original key on the host, and all unmounted disks in the vSAN cluster are mounted.
Workaround: Reboot the host and wait for all disks to get mounted.
Stretched cluster imbalance after a site recovers
When you recover a failed site in a stretched cluster, sometimes hosts in the failed site are brought back sequentially over a long period of time. vSAN might overuse some hosts when it begins repairing the absent components.
Workaround: Recover all of the hosts in a failed site together within a short time window.
VM operations fail due to HA issue with stretched clusters
Under certain failure scenarios in stretched clusters, certain VM operations such as vMotions or powering on a VM might be impacted. These failure scenarios include a partial or a complete site failure, or the failure of the high speed network between the sites. This problem is caused by the dependency on VMware HA being available for normal operation of stretched cluster sites.
Workaround: Disable vSphere HA before performing vMotion, VM creation, or powering on VMs. Then re-enable vSphere HA.
Cannot perform deep rekey if a disk group is unmounted
Before vSAN performs a deep rekey, it performs a shallow rekey. The shallow rekey fails if an unmounted disk group is present. The deep rekey process cannot begin.
Workaround: Remount or remove the unmounted disk group.
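If you choose to remount the disk group from the host shell, a minimal sketch follows. The UUID is a placeholder for the disk group (cache device) UUID, and the exact sub-command syntax should be confirmed against the esxcli help on your ESXi build.
# Remount an unmounted disk group by its UUID
esxcli vsan storage diskgroup mount -u <disk-group-UUID>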
Log entries state that firewall configuration has changed
A new firewall entry appears in the security profile when vSAN encryption is enabled: vsanEncryption. This rule controls how hosts communicate directly with the KMS. When it is triggered, log entries are added to /var/log/vobd.log. You might see the following messages:
Firewall configuration has changed. Operation 'addIP4' for rule set vsanEncryption succeeded.
Firewall configuration has changed. Operation 'removeIP4' for rule set vsanEncryption succeeded.
These messages can be ignored.
Workaround: None.
HA failover does not occur after setting Traffic Type option on a vmknic to support witness traffic
If you set the traffic type option on a vmknic to support witness traffic, vSphere HA does not automatically discover the new setting. You must manually disable and then re-enable HA so it can discover the vmknic. If you configure the vmknic and the vSAN cluster first, and then enable HA on the cluster, it does discover the vmknic.
Workaround: Manually disable vSphere HA on the cluster, and then re-enable it.
iSCSI MCS is not supported
vSAN iSCSI target service does not support Multiple Connections per Session (MCS).
Workaround: None.
Any iSCSI initiator can discover iSCSI targets
vSAN iSCSI target service allows any initiator on the network to discover iSCSI targets.
Workaround: You can isolate your ESXi hosts from iSCSI initiators by placing them on separate VLANs.
After resolving network partition, some VM operations on linked clone VMs might fail
Some VM operations on linked clone VMs that are not producing I/O inside the guest operating system might fail. The operations that might fail include taking snapshots and suspending the VMs. This problem can occur after a network partition is resolved, if the parent base VM's namespace is not yet accessible. When the parent VM's namespace becomes accessible, HA is not notified to power on the VM.
Workaround: Power cycle VMs that are not actively running I/O operations.
Cannot place a witness host in Maintenance Mode
When you attempt to place a witness host in Maintenance Mode, the host remains in the current state and you see the following notification: A specified parameter was not correct.
Workaround: When placing a witness host in Maintenance Mode, choose the No data migration option.
Moving the witness host into and then out of a stretched cluster leaves the cluster in a misconfigured state
If you place the witness host in a vSAN-enabled vCenter cluster, an alarm notifies you that the witness host cannot reside in the cluster. But if you move the witness host out of the cluster, the cluster remains in a misconfigured state.
Workaround: Move the witness host out of the vSAN stretched cluster, and reconfigure the stretched cluster. For more information, see this article: https://kb.vmware.com/s/article/2130587.
When a network partition occurs in a cluster which has an HA heartbeat datastore, VMs are not restarted on the other data site
When the preferred or secondary site in a vSAN cluster loses its network connection to the other sites, VMs running on the site that loses network connectivity are not restarted on the other data site, and the following error might appear: vSphere HA virtual machine HA failover failed.
This is expected behavior for vSAN clusters.
Workaround: Do not select HA heartbeat datastore while configuring vSphere HA on the cluster.
Unmounted vSAN disks and disk groups displayed as mounted in the vSphere Web Client Operational Status field
After the vSAN disks or disk groups are unmounted, either by running the esxcli vsan storage diskgroup unmount command or by the vSAN Device Monitor service when disks show persistently high latencies, the vSphere Web Client incorrectly displays the Operational Status field as mounted.
Workaround: Use the Health field to verify disk status, instead of the Operational Status field.