These release notes introduce you to new features in VMware vSAN 8.0 Update 1 and provide information on resolved and known issues.
vSAN 8.0 Update 1 introduces the following new features and enhancements:
Disaggregated Storage
Disaggregation with vSAN Express Storage Architecture. vSAN 8.0 Update 1 provides disaggregation support for vSAN Express Storage Architecture (ESA), similar to the existing support for vSAN Original Storage Architecture (OSA). You can mount remote vSAN datastores that reside in other vSAN ESA server clusters. You also can use an ESA cluster as the external storage resource for a compute-only cluster. All capabilities and limits that apply to disaggregation support for vSAN OSA also apply to vSAN ESA. vSAN ESA client clusters can connect only to a vSAN ESA-based server cluster.
Disaggregation for vSAN stretched clusters (vSAN OSA). This release supports vSAN stretched clusters in disaggregated topology. In addition to supporting several stretched cluster configurations, vSAN can optimize network paths for certain topologies to improve stretched cluster performance.
Disaggregation across clusters using multiple vCenter Servers (vSAN OSA). vSAN 8.0 Update 1 introduces support for vSAN OSA disaggregation across environments using multiple vCenter Servers. This enables clusters managed by one vCenter Server to use storage resources that reside on a vSAN cluster managed by a different vCenter Server.
Optimized Performance, Durability, and Flexibility
Improved performance with new Adaptive Write Path. vSAN ESA introduces a new adaptive write path that dynamically optimizes guest workloads that issue large streaming writes, resulting in higher throughput and lower latency with no additional complexity.
Optimized I/O processing for single VMDK/objects (vSAN ESA). vSAN ESA has optimized the I/O processing that occurs for each object residing on a vSAN datastore, increasing the performance of VMs with a significant amount of virtual hardware storage resources.
Enhanced durability in maintenance mode scenarios. When a vSAN ESA cluster enters maintenance mode (EMM) with Ensure Accessibility (applies to RAID 5/6 Erasure Coding), vSAN can write all incremental updates to another host in addition to the hosts holding the data. This helps ensure the durability of the changed data if additional hosts fail while the original host is still in maintenance mode.
Increased administrative storage capacity on vSAN datastores using customizable namespace objects. You can customize the size of namespace objects that enable administrators to store ISO files, VMware content library, or other infrastructure support files on a vSAN datastore.
Witness appliance certification. In vSAN 8.0 Update 1, the software acceptance level for vSAN witness appliance has changed to Partner Supported. All vSphere Installation Bundles (VIBs) must be certified.
Simplified Management
Auto-policy management for the default storage policy (vSAN ESA). vSAN ESA introduces auto-policy management, an optional feature that creates and assigns a default storage policy designed for the cluster. Based on the size and type of cluster, auto-policy management selects the ideal level of failure to tolerate and data placement scheme. Skyline health monitors whether the default storage policy is optimal for the cluster, alerts you when it is sub-optimal, and guides you to adjust the default policy based on the cluster characteristics. Skyline health actively monitors the cluster as its size changes, and provides new recommendations as needed.
Skyline health intelligent cluster health scoring, diagnostics and remediation. Improve efficiency by using the cluster health status and troubleshooting dashboard that prioritizes identified issues, enabling you to focus and take action on the most important issues.
High resolution performance monitoring in vSAN performance service. vSAN performance service provides real-time monitoring that collects and renders performance metrics every 30 seconds, making monitoring and troubleshooting more meaningful. VMware snapshot APIs are unchanged, and VMware VADP supports all vSAN ESA native snapshot operations on the vSphere platform.
VM I/O trip analyzer task scheduling. VM I/O trip analyzer diagnostics can be scheduled by time of day, duration, and frequency to capture details for repeat-offender VMs. The diagnostic data collected is available for analysis in the VM I/O trip analyzer interface in vCenter.
PowerCLI enhancements. PowerCLI supports the following new capabilities:
vSAN ESA disaggregation
vSAN OSA disaggregation for stretched clusters
vSAN OSA disaggregation across multiple vCenter Servers
vSAN cluster shutdown
Object format updates and custom namespace objects
Cloud Native Storage
Cloud Native Support for TKGs and supervisor clusters (vSAN ESA). Containers powered by vSphere and vSAN can consume persistent storage for developers and administrators, and use the improved performance and efficiency for their cloud native workloads.
Data Persistence platform support using common vSphere switching. vSAN Data Persistence platform (vDPP) allows third-party ISVs to build solutions, such as S3-compatible object stores, that run natively on vSAN. vDPP is now compatible with VMware vSphere Distributed Switches, reducing the cost and complexity of these solutions.
Thick provisioning for persistent volumes using SPBM on VMFS datastores (VMware vSAN Direct Configuration). Persistent volumes can be programmatically provisioned as thick when defined in the storage class that is mapped to a storage policy.
Use the vSAN Community Web site to provide feedback and request assistance with any problems you find while using vSAN.
For instructions about upgrading vSAN, see the VMware vSAN 8.0 Update 1 documentation.
Note: Before performing the upgrade, review the most recent version of the VMware Compatibility Guide to validate that the latest vSAN version is available for your platform.
Note: vSAN Express Storage Architecture is available only for new deployments. You cannot upgrade a cluster to vSAN ESA.
vSAN 8.0 Update 1 is a new release that requires a full upgrade to vSphere 8.0 Update 1. Perform the following tasks to complete the upgrade:
Upgrade to vCenter Server 8.0 Update 1. For more information, see the VMware vSphere 8.0 Update 1 Release Notes.
Upgrade hosts to ESXi 8.0 Update 1. For more information, see the VMware vSphere 8.0 Update 1 Release Notes.
Upgrade the vSAN on-disk format to version 18.0. If upgrading from on-disk format version 3.0 or later, no data evacuation is required (metadata update only).
Upgrade FSVM to enable new File Service features and get all the latest updates.
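The on-disk format step above can be verified from any host in the cluster. A minimal sketch, assuming SSH access to an ESXi host (the exact output field wording may vary slightly by release):

```shell
# Run on an ESXi host after the vCenter Server and ESXi upgrades.
# Each vSAN device reports its current on-disk format version.
esxcli vsan storage list | grep -i "on-disk format version"

# Confirm the host build before upgrading the disk format.
esxcli system version get
```

If every device already reports the target on-disk format version, no further disk format action is needed.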
Note: vSAN retired disk format version 1.0 in vSAN 7.0 Update 1. Disks running disk format version 1.0 are no longer recognized by vSAN. vSAN blocks upgrades to vSAN 7.0 Update 1 through vSphere Update Manager, ISO install, or esxcli while such disks are present. To avoid these issues, upgrade disks running disk format version 1.0 to a higher version. If you have disks on version 1.0, a health check alerts you to upgrade the disk format version.
Disk format version 1.0 does not have performance and snapshot enhancements, and it lacks support for advanced features including checksum, deduplication and compression, and encryption. For more information about vSAN disk format version, see KB 2148493.
Upgrading the On-disk Format for Hosts with Limited Capacity
During an upgrade of the vSAN on-disk format from version 1.0 or 2.0, a disk group evacuation is performed. The disk group is removed and upgraded to on-disk format version 18.0, and the disk group is added back to the cluster. For two-node or three-node clusters, or clusters without enough capacity to evacuate each disk group, select Allow Reduced Redundancy from the vSphere Client. You also can use the following RVC command to upgrade the on-disk format: vsan.ondisk_upgrade --allow-reduced-redundancy
When you allow reduced redundancy, your VMs are unprotected for the duration of the upgrade, because this method does not evacuate data to the other hosts in the cluster. It removes each disk group, upgrades the on-disk format, and adds the disk group back to the cluster. All objects remain available, but with reduced redundancy.
If you enable deduplication and compression during the upgrade, you can select Allow Reduced Redundancy from the vSphere Client.
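The RVC command above runs from a Ruby vSphere Console session on vCenter Server. A minimal sketch, with hypothetical credentials and inventory paths:

```shell
# Connect RVC to vCenter Server (user and FQDN are placeholders).
rvc administrator@vsphere.local@vcenter.example.com

# Inside the RVC shell, point the upgrade at the cluster's inventory path
# (datacenter and cluster names below are placeholders):
# vsan.ondisk_upgrade /vcenter.example.com/Datacenter/computers/vSAN-Cluster --allow-reduced-redundancy
```

Use --allow-reduced-redundancy only when the cluster cannot fully evacuate a disk group, as described above.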
For information about maximum configuration limits for the vSAN 8.0 Update 1 release, see the Configuration Maximums documentation.
Snapshots on vSAN ESA HCI Mesh client cluster not supported
Certain snapshot operations on VMs deployed on an HCI Mesh client cluster backed by a vSAN ESA server cluster might fail under specific conditions. Do not use snapshots on a vSAN ESA client cluster, and do not migrate VMs with snapshots to a vSAN ESA client cluster.
Workaround: None.
Remote datastore on vSAN ESA compute client cluster does not show valid capacity
This issue affects compute-only client clusters that mount from a vSAN ESA server cluster. When you mount a remote datastore, the datastore capacity value shown on the host and the vSphere client does not match the actual value. Aside from the reporting issue, there is no known impact on VM operations.
Workaround: None.
Cannot enable File Service if vCenter Server internet connectivity is disabled
If you disable vCenter Server internet connectivity, the Enable File Service dialog does not display the File service agent section, and you cannot select the OVF.
Workaround: To enable vCenter Server internet connectivity:
Navigate to Cluster > Configure > vSAN > Internet Connectivity.
Click Edit to open Edit Internet Connectivity dialog.
Select Enable Internet access for all vSAN clusters checkbox and click Apply.
KMS connection health checks not available when KMS is offline
This issue affects vSAN health checks for clusters with data-at-rest encryption. When the KMS is offline, the following health check might not be available: VMware vCenter and all hosts are connected to Key Management Servers. If this issue occurs, you cannot see warnings or errors that indicate the offline status of the KMS.
Workaround: None
Mount remote datastore from a stretched server cluster fails with message: Site affinity provided in server cluster configuration are not present
Mounting a remote datastore from a stretched server cluster might fail under the following conditions:
The client vSAN cluster already has another datastore from a different stretched server cluster.
The stretched server clusters have different fault domain names.
The client has asymmetric network topology to both the server clusters.
The following message is displayed: Site affinity provided in server cluster configuration are not present in server cluster fault domains
Workaround: Rename the fault domains on both server clusters to match, and retry the operation.
Sequential workload performance improvements not enabled
Some performance improvements for sequential workloads cannot take effect until the vSAN object moves from the host or the host is rebooted. You must manually abdicate the DOM owner of all vSAN objects to enable performance improvements.
Workaround: After you upgrade from vSAN 8.0 to 8.0 Update 1, use the following command to manually abdicate the DOM owner of all vSAN objects:
vsish -e set /vmkModules/vsan/dom/ownerAbdicateAll 1
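For clusters with many hosts, the command above must run on every upgraded host. A minimal sketch, assuming SSH access with root credentials; the hostnames are placeholders:

```shell
# Abdicate DOM ownership of all vSAN objects on each upgraded host.
# Substitute the ESXi hosts in your cluster for these placeholder names.
for host in esx01.example.com esx02.example.com esx03.example.com; do
  ssh root@"$host" 'vsish -e set /vmkModules/vsan/dom/ownerAbdicateAll 1'
done
```

vSAN elects new DOM owners automatically after abdication.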
Adding a host back to the cluster fails with the following message: A general system error occurred: Too many outstanding requests
The vSAN module unload operation can time out while waiting for control device references. If this happens, an attempt to move the host out of the cluster fails with the following message: Operation timed out
Any further attempts to move the host back to the cluster fail with the following message: A general system error occurred: Too many outstanding requests
Workaround: Reboot the host before adding it back to the cluster.
Virtual machine snapshot fails after extending virtual disk size in vSAN ESA
This issue affects any virtual machine that has CBRC enabled in a vSAN ESA cluster. If you extend the size of the VM's virtual disks, taking a virtual machine snapshot fails.
Workaround: Perform the following steps to take a VM snapshot after you extend the size of a VM's virtual disks.
Power off the virtual machine and disable CBRC to all disks through API.
Take the virtual machine snapshot.
Reenable CBRC and power on the virtual machine.
Linked clone VMs migrated to vSAN ESA create snapshots for linked clone vsanSparse disks
When you migrate VMs from a VMFS/NFS/vSAN OSA datastore to a vSAN ESA datastore, vSAN cannot distinguish between a snapshot vsanSparse disk and a linked clone vsanSparse disk. Because vSAN ESA supports native snapshots, a native snapshot disk is created. If you migrate multiple VMs with the moveAllDiskBackingsAndAllowSharing option, each VM attempts to create a native snapshot of a base disk and run I/O on that object. Only the last VM can run I/O; the other VMs fail.
Workaround: To avoid this issue, do not use the moveAllDiskBackingsAndAllowSharing option when migrating linked clone VMs to a vSAN ESA cluster.
hostAffinity policy option lost during upgrade
When you upgrade from vSAN 6.7 to vSAN 8.0, the vCenter Server hostaffinity option value is changed to false.
Workaround: Set the hostaffinity option back to true to continue using vSAN HostLocal policy for a normal VM.
Cannot upgrade cluster to vSAN Express Storage Architecture
You cannot upgrade or convert a cluster on vSAN Original Storage Architecture to vSAN Express Storage Architecture. vSAN ESA is supported only on new deployments.
Workaround: None.
Encryption deep rekey not supported on vSAN ESA
vSAN Express Storage Architecture does not support encryption deep rekey in this release.
Workaround: None.
vSAN File Service not supported on vSAN ESA
vSAN Express Storage Architecture does not support vSAN File Service in this release.
Workaround: None.
Cannot change encryption settings on vSAN ESA
Encryption can be configured on vSAN ESA only during cluster creation. You cannot change the settings later.
Workaround: None.
vSAN File Service does not support NFSv4 delegations
vSAN File Service does not support NFSv4 delegations in this release.
Workaround: None.
In stretched cluster, file server with no affinity cannot rebalance
In the stretched cluster vSAN File Service environment, a file server with no affinity location configured cannot be rebalanced between Preferred ESXi hosts and Non-preferred ESXi hosts.
Workaround: Set the affinity location of the file server to Preferred or Non-Preferred by editing the file service domain configuration.
Kubernetes pods with CNS volumes cannot be created, deleted, or re-scheduled during vSAN stretched cluster partition
When a vSAN stretched cluster has a network partition between sites, an intermittent timing issue can cause volume information to be lost from the CNS. When volume metadata is not present in the CNS, you cannot create, delete, or re-schedule pods with CNS volumes. vSphere CSI Driver must access volume information from CNS to perform these operations.
When the network partition is fixed, CNS volume metadata is restored, and pods with CNS volumes can be created, deleted, or re-scheduled.
Workaround: None.
Shutdown Cluster wizard displays an error on HCI Mesh compute-only cluster
The vSAN Shutdown Cluster wizard is designed for vSAN clusters that have a vSAN datastore and vSAN services. It does not support HCI Mesh compute-only clusters. If you use the wizard to shut down a compute-only cluster, it displays the following error message:
Cannot retrieve the health service data.
Workaround: None. Do not use the vSAN Shutdown Cluster wizard on an HCI Mesh compute-only cluster.
Remediation of ESXi hosts in a vSphere Lifecycle Manager cluster with vSAN fails if vCenter services are deployed on custom ports
If vCenter Server services are deployed on custom ports in a cluster with vSAN, vSphere DRS, and vSphere HA, remediation of vSphere Lifecycle Manager clusters might fail. This problem is caused by a vSAN resource health check error. ESXi hosts cannot enter maintenance mode, which leads to failing remediation tasks.
Workaround: None.
When vSAN file service is enabled, DFC-related operations such as upgrade, enabling encryption or data-efficiency might fail
When file service is enabled, an agent VM runs on each host. The underlying vSAN object might be placed across multiple disk groups. When the first disk group is converted, the vSAN object becomes inaccessible and the agent VM enters an invalid state. If you try to delete the VM and redeploy a new one, the operation fails due to the VM's invalid state. The VM is unregistered, but the inaccessible object still exists. When the next disk group is converted, a cluster-wide precheck for inaccessible objects fails the DFC operation, because it finds the inaccessible objects of the old agent VM.
Workaround: Manually remove the inaccessible objects.
When such a failure happens, identify the inaccessible objects from the fault information of the failed DFC task.
To ensure that the objects belong to the agent VM, inspect the hostd log file and confirm that the objects appear in the VM's object layout.
Log in to the host and use the /usr/lib/vmware/osfs/bin/objtool command to remove the objects manually.
Note: To prevent this problem, disable file service before performing any DFC-related operation.
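The objtool removal above can be sketched as follows; the object UUID is a placeholder taken from the failed task's fault information, and the exact flags may vary by release (check objtool's built-in help):

```shell
# On the ESXi host, force-delete one inaccessible object by UUID.
# The UUID below is a placeholder, not a real object.
/usr/lib/vmware/osfs/bin/objtool delete -u 5a2b1c3d-0000-0000-0000-000000000000 -f
```

Repeat for each inaccessible object identified in the fault information.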
esxcli vsan cluster leave command fails to disable vSAN on an ESXi host
In some cases, the following command fails to disable vSAN on a member host: esxcli vsan cluster leave
You might see an error message similar to the following:
Failed to unmount default vSAN datastore. Unable to complete Sysinfo operation. Please see the VMKernel log file for more details.
Workaround: Perform the following steps in the vSphere Client to disable vSAN on a single member host:
Place the host into maintenance mode.
Move the host out of the vSAN cluster, and into its parent data center.
The vSAN service on the host is disabled automatically when the host is moved out of the cluster.
Cannot extract host profile on a vSAN HCI mesh compute-only host
vSAN host profile plugin does not support vSAN HCI mesh compute-only hosts. If you try to extract the host profile on an HCI mesh compute-only host, the attempt fails.
Workaround: None.
Deleting files in a file share might not be reflected in vSAN capacity view
The allocated blocks might not be returned to vSAN storage immediately after all files are deleted, so it can take some time before the reclaimed capacity appears in the vSAN capacity view. When new data is written to the same file share, these deleted blocks might be reused before they are returned to vSAN storage.
If unmap is enabled and vSAN deduplication is disabled, space might not be freed back to vSAN unless 4 MB-aligned extents are freed in VDFS. If unmap is enabled and vSAN deduplication is enabled, space freed by VDFS is freed back to vSAN with a delay.
Workaround: To release the storage back to vSAN immediately, delete the file shares.
vSAN over RDMA might experience lower performance due to network congestion
RDMA requires lossless network infrastructure that is free of congestion. If your network has congestion, certain large I/O workloads might experience lower performance than TCP.
Workaround: Address any network congestion issues following OEM best practices for RDMA.
vCenter VM crash on stretched cluster with data-in-transit encryption
vCenter VM might crash on a vSAN stretched cluster if the vCenter VM is on vSAN with data-in-transit encryption enabled. When all hosts in one site are down and then power on again, the vCenter VM might crash after the failed site returns to service.
Workaround: Use the following script to resolve this problem: thumbPrintRepair.py
vSAN allows a VM to be provisioned across local and remote datastores
vSphere does not prevent users from provisioning a VM across local and remote datastores in an HCI Mesh environment. For example, you can provision one VMDK on the local vSAN datastore and one VMDK on remote vSAN datastore. This is not supported because vSphere HA is not supported with this configuration.
Workaround: Do not provision a VM across local and remote datastores.
The object reformatting task is not progressing
If object reformatting is needed after an upgrade, a health alert is triggered, and vSAN begins reformatting. vSAN performs this task in batches, and it depends on the amount of transient capacity available in the cluster. When the transient capacity exceeds the maximum limit, vSAN waits for the transient capacity to be freed before proceeding with the reformatting. During this phase, the task might appear to be halted. The health alert will clear and the task will progress when transient capacity is available.
Workaround: None. The task is working as expected.
System VMs cannot be powered-off
With the release of vSphere Cluster Services (vCLS) in vSphere 7.0 Update 1, a set of system VMs might be placed within the vSAN cluster. These system VMs cannot be powered-off by users. This issue can impact some vSAN workflows, which are documented in the following article: https://kb.vmware.com/s/article/80877
Workaround: For more information about this issue, refer to this KB article: https://kb.vmware.com/s/article/80483.
vSAN File Service cannot be enabled due to an old vSAN on-disk format version
vSAN File Service cannot be enabled with a vSAN on-disk format version earlier than 11.0 (the on-disk format version in vSAN 7.0).
Workaround: Upgrade the vSAN disk format version before enabling File Service.
Remediate cluster task might fail in large scale cluster due to vSAN health network test issues
In large scale clusters with more than 16 hosts, intermittent ping failures can occur during host upgrade. These failures can interrupt host remediation in vSphere Lifecycle Manager.
Workaround: After remediation pre-check passes, silence alerts for the following vSAN health tests:
vSAN: Basic (unicast) connectivity check
vSAN: MTU check (ping with large packet size)
When the remediation task is complete, restore alerts for the vSAN health tests.
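The health tests above can also be silenced from an RVC session connected to vCenter Server; a minimal sketch, with a hypothetical inventory path (check each command's help for the exact options in your release):

```shell
# Inside an RVC shell connected to vCenter Server:
# List health check IDs and their current silent status for the cluster:
# vsan.health.silent_health_check_status /vcenter.example.com/Datacenter/computers/vSAN-Cluster
#
# Add a check to the silent list by its ID:
# vsan.health.silent_health_check_configure /vcenter.example.com/Datacenter/computers/vSAN-Cluster -a <check_id>
```

Restore the checks with the command's corresponding remove option after remediation completes.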
Host failure in hot-plug scenario when drive is reinserted
During a hot drive removal, VMware native NVMe hot-plug can cause a host failure if the NVMe drive is pulled and reinserted within one minute. This is applicable to both vSphere and vSAN for any new or existing drive reinsertion.
Workaround: After removing a hot drive, wait for one minute before you reinsert the new or existing drive.
Cannot place last host in a cluster into maintenance mode, or remove a disk or disk group
Operations in Full data migration or Ensure accessibility mode might fail without providing guidance to add a new resource, when there is only one host left in the cluster and that host enters maintenance mode. This can also happen when there is only one disk or disk group left in the cluster and that disk or disk group is to be removed.
Workaround: Before you place the last remaining host in the cluster into maintenance mode with Full data migration or Ensure accessibility mode selected, add another host with the same configuration to the cluster. Before you remove the last remaining disk or disk group in the cluster, add a new disk or disk group with the same configuration and capacity.
Object reconfiguration workflows might fail due to the lack of capacity if one or more disks or disk groups are almost full
vSAN resyncs get paused when the disks in non-deduplication clusters or disk groups in deduplication clusters reach a configurable resync pause fullness threshold. This is to avoid filling up the disks with resync I/O. If the disks reach this threshold, vSAN stops reconfiguration workflows, such as EMM, repairs, rebalance, and policy change.
Workaround: If space is available elsewhere in the cluster, rebalancing the cluster frees up space on the other disks, so that subsequent reconfiguration attempts succeed.
After recovery from cluster full, VMs can lose HA protection
In a vSAN cluster that has hosts with disks 100% full, VMs might have a pending question and thus lose HA protection. VMs that had a pending question remain unprotected by HA after the cluster recovers from the full scenario.
Workaround: After recovering from vSAN cluster full scenario, perform one of the following actions:
Disable and re-enable HA.
Reconfigure HA.
Power off and power on the VMs.
Power Off VMs fails with Question Pending
If a VM has a pending question, you cannot perform any VM-related operations until the question is answered.
Workaround: Try to free the disk space on the relevant volume, and then click Retry.
When the cluster is full, the IP addresses of VMs either change to IPV6 or become unavailable
When a vSAN cluster is full with one or more disk groups reaching 100%, there can be a VM pending question that requires user action. If the question is not answered and the cluster full condition is left unattended, the IP addresses of VMs might change to IPv6 or become unavailable. This prevents you from using SSH to access the VMs. It also prevents you from using the VM console, because the console goes blank after you type root.
Workaround: None.
Unable to remove a dedupe enabled disk group after a capacity disk enters PDL state
When a capacity disk in a dedupe-enabled disk group is removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error, it enters Permanent Device Loss (PDL) state. If you try to remove the disk group, you might see an error message informing you that the action cannot be completed.
Workaround: Whenever a capacity disk is removed, or its unique ID changes, or when the device experiences an unrecoverable hardware error, wait for a few minutes before trying to remove the disk group.
In deduplication clusters, reactive rebalancing might not happen when the disks show more than 80% full
In deduplication clusters, when the disks display more than 80% full on the dashboard, the reactive rebalancing might not start as expected. This is because in deduplication clusters, pending writes and deletes are also considered for calculating the free capacity.
Workaround: None.
TRIM/UNMAP commands from Guest OS fail
If the Guest OS attempts to perform space reclamation during online snapshot consolidation, the TRIM/UNMAP commands fail. This failure keeps space from being reclaimed.
Workaround: Try to reclaim the space after the online snapshot operation is complete. If subsequent TRIM/UNMAP operations fail, remount the disk.
Space reclamation from SCSI TRIM/UNMAP is lost when online snapshot consolidation is performed
Space reclamation achieved from SCSI TRIM/UNMAP commands is lost when you perform online snapshot consolidation. Offline snapshot consolidation does not affect SCSI unmap operation.
Workaround: Reclaim the space after online snapshot consolidation is complete.
Host failure when converting data host into witness host
When you convert a vSAN cluster into a stretched cluster, you must provide a witness host. You can convert a data host into the witness host, but you must use maintenance mode with Full data migration during the process. If you place the host into maintenance mode with Ensure accessibility option, and then configure it as the witness host, the host might fail with a purple diagnostic screen.
Workaround: Remove the disk group on the witness host and then re-create the disk group.
Duplicate VM with the same name in vCenter Server when residing host fails during datastore migration
If a VM undergoing Storage vMotion from vSAN to another datastore, such as NFS, resides on a host that encounters a failure on the vSAN network, causing an HA failover of the VM, the VM might be duplicated in vCenter Server.
Workaround: Power off the invalid VM and unregister it from the vCenter Server.
Reconfiguring an existing stretched cluster under a new vCenter Server causes vSAN to issue a health check warning
When rebuilding a current stretched cluster under a new vCenter Server, the vSAN cluster health check is red. The following message appears: vSphere cluster members match vSAN cluster members
Workaround: Use the following procedure to configure the stretched cluster.
Use SSH to log in to the witness host.
Decommission the disks on witness host. Run the following command: esxcli vsan storage remove -s "SSD UUID"
Force the witness host to leave the cluster. Run the following command: esxcli vsan cluster leave
Reconfigure the stretched cluster from the new vCenter Server (Configure > vSAN > Fault Domains & Stretched Cluster).
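The decommission and leave steps above can be sketched as follows, assuming an SSH session on the witness host; the device UUID is a placeholder taken from the storage list output:

```shell
# Identify the vSAN UUID of the witness cache (SSD) device.
esxcli vsan storage list

# Decommission the disks backing the witness disk group (UUID is a placeholder).
esxcli vsan storage remove -s "52abcdef-1234-5678-9abc-def012345678"

# Force the witness host to leave the old cluster.
esxcli vsan cluster leave
```

After the witness host leaves the cluster, reconfigure the stretched cluster from the new vCenter Server.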
Disk format upgrade fails while vSAN resynchronizes large objects
If the vSAN cluster contains very large objects, the disk format upgrade might fail while the object is resynchronized. You might see the following error message: Failed to convert object(s) on vSAN
vSAN cannot perform the upgrade until the object is resynchronized. You can check the status of the resynchronization (Monitor > vSAN > Resyncing Components) to verify when the process is complete.
Workaround: Wait until no resynchronization is pending, then retry the disk format upgrade.
vSAN stretched cluster configuration lost after you disable vSAN on a cluster
If you disable vSAN on a stretched cluster, the stretched cluster configuration is not retained. The stretched cluster, witness host, and fault domain configuration is lost.
Workaround: Reconfigure the stretched cluster parameters when you re-enable the vSAN cluster.
Powered off VMs appear as inaccessible during witness host replacement
When you change a witness host in a stretched cluster, VMs that are powered off appear as inaccessible in the vSphere Web Client for a brief time. After the process is complete, powered off VMs appear as accessible. All running VMs appear as accessible throughout the process.
Workaround: None.
Cannot place hosts in maintenance mode if they have faulty boot media
vSAN cannot place hosts with faulty boot media into maintenance mode. The task to enter maintenance mode might fail with an internal vSAN error, due to the inability to save configuration changes. You might see log events similar to the following: Lost Connectivity to the device xxx backing the boot filesystem
Workaround: Remove disk groups manually from each host, using the Full data evacuation option. Then place the host in maintenance mode.
After stretched cluster failover, VMs on the preferred site register alert: Failed to failover
If the secondary site in a stretched cluster fails, VMs failover to the preferred site. VMs already on the preferred site might register the following alert: Failed to failover.
Workaround: Ignore this alert. It does not impact the behavior of the failover.
During network partition, components in the active site appear to be absent
During a network partition in a vSAN two-host or stretched cluster, the vSphere Web Client might display a view of the cluster from the perspective of the non-active site. You might see active components in the primary site displayed as absent.
Workaround: Use RVC commands to query the state of objects in the cluster. For example: vsan.vm_object_info
Some objects are non-compliant after force repair
After you perform a force repair, some objects might not be repaired because the ownership of the objects was transferred to a different node during the process. The force repair might be delayed for those objects.
Workaround: Attempt the force repair operation after all other objects are repaired and resynchronized. You can wait until vSAN repairs the objects.
When you move a host from one encrypted cluster to another, and then back to the original cluster, the task fails
When you move a host from an encrypted vSAN cluster to another encrypted vSAN cluster, and then move the host back to the original encrypted cluster, the task might fail. You might see the following message: A general system error occurred: Invalid fault. This error occurs because vSAN cannot re-encrypt data on the host using the original encryption key. After a short time, vCenter Server restores the original key on the host, and all unmounted disks in the vSAN cluster are mounted.
Workaround: Reboot the host and wait for all disks to get mounted.
Stretched cluster imbalance after a site recovers
When you recover a failed site in a stretched cluster, sometimes hosts in the failed site are brought back sequentially over a long period of time. vSAN might overuse some hosts when it begins repairing the absent components.
Workaround: Recover all of the hosts in a failed site together within a short time window.
VM operations fail due to HA issue with stretched clusters
Under certain failure scenarios in stretched clusters, certain VM operations such as vMotions or powering on a VM might be impacted. These failure scenarios include a partial or a complete site failure, or the failure of the high speed network between the sites. This problem is caused by the dependency on VMware HA being available for normal operation of stretched cluster sites.
Workaround: Disable vSphere HA before performing vMotion, VM creation, or powering on VMs. Then re-enable vSphere HA.
Cannot perform deep rekey if a disk group is unmounted
Before vSAN performs a deep rekey, it performs a shallow rekey. The shallow rekey fails if an unmounted disk group is present. The deep rekey process cannot begin.
Workaround: Remount or remove the unmounted disk group.
Log entries state that firewall configuration has changed
A new firewall entry appears in the security profile when vSAN encryption is enabled: vsanEncryption. This rule controls how hosts communicate directly with the KMS. When it is triggered, log entries are added to /var/log/vobd.log. You might see the following messages:
Firewall configuration has changed. Operation 'addIP4' for rule set vsanEncryption succeeded.
Firewall configuration has changed. Operation 'removeIP4' for rule set vsanEncryption succeeded.
These messages can be ignored.
Workaround: None.
HA failover does not occur after setting Traffic Type option on a vmknic to support witness traffic
If you set the traffic type option on a vmknic to support witness traffic, vSphere HA does not automatically discover the new setting. You must manually disable and then re-enable HA so it can discover the vmknic. If you configure the vmknic and the vSAN cluster first, and then enable HA on the cluster, it does discover the vmknic.
Workaround: Manually disable vSphere HA on the cluster, and then re-enable it.
iSCSI MCS is not supported
vSAN iSCSI target service does not support Multiple Connections per Session (MCS).
Workaround: None.
Any iSCSI initiator can discover iSCSI targets
vSAN iSCSI target service allows any initiator on the network to discover iSCSI targets.
Workaround: You can isolate your ESXi hosts from iSCSI initiators by placing them on separate VLANs.
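As a minimal sketch of that workaround, assuming a standard vSwitch and a hypothetical portgroup name, you could tag the iSCSI target network with its own VLAN:

```shell
# "iSCSI-Targets" and VLAN 150 are example values; substitute your own.
# Tag the iSCSI target portgroup with a dedicated VLAN:
esxcli network vswitch standard portgroup set --portgroup-name "iSCSI-Targets" --vlan-id 150

# Verify the VLAN assignment:
esxcli network vswitch standard portgroup list
```

Initiators that should reach the targets must then be placed on that VLAN; all other initiators on the network remain isolated from discovery.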
After resolving network partition, some VM operations on linked clone VMs might fail
Some VM operations on linked clone VMs that are not producing I/O inside the guest operating system might fail. The operations that might fail include taking snapshots and suspending the VMs. This problem can occur after a network partition is resolved, if the parent base VM's namespace is not yet accessible. When the parent VM's namespace becomes accessible, HA is not notified to power on the VM.
Workaround: Power cycle VMs that are not actively running I/O operations.
Cannot place a witness host in Maintenance Mode
When you attempt to place a witness host in Maintenance Mode, the host remains in the current state and you see the following notification: A specified parameter was not correct.
Workaround: When placing a witness host in Maintenance Mode, choose the No data migration option.
Moving the witness host into and then out of a stretched cluster leaves the cluster in a misconfigured state
If you place the witness host in a vSAN-enabled vCenter cluster, an alarm notifies you that the witness host cannot reside in the cluster. If you then move the witness host out of the cluster, the cluster remains in a misconfigured state.
Workaround: Move the witness host out of the vSAN stretched cluster, and reconfigure the stretched cluster. For more information, see this article: https://kb.vmware.com/s/article/2130587.
When a network partition occurs in a cluster which has an HA heartbeat datastore, VMs are not restarted on the other data site
When the preferred or secondary site in a vSAN cluster loses its network connection to the other sites, VMs running on the site that loses network connectivity are not restarted on the other data site, and the following error might appear: vSphere HA virtual machine HA failover failed.
This is expected behavior for vSAN clusters.
Workaround: Do not select HA heartbeat datastore while configuring vSphere HA on the cluster.
Unmounted vSAN disks and disk groups displayed as mounted in the vSphere Web Client Operational Status field
After vSAN disks or disk groups are unmounted, either by running the esxcli vsan storage diskgroup unmount command or by the vSAN Device Monitor service when disks show persistently high latencies, the vSphere Web Client incorrectly displays the Operational Status field as mounted.
Workaround: Use the Health field to verify disk status, instead of the Operational Status field.