What's in the Release Notes

The release notes cover the following topics:

What's New
VMware vSAN Community
Upgrades for This Release
Limitations
Known Issues

What's New

VMware Virtual SAN (vSAN) 6.6 introduces the following new features and enhancements:

Unicast. In vSAN 6.6 and later releases, multicast is not required on the physical switches that support the vSAN cluster. If some hosts in your vSAN cluster are running earlier versions of software, a multicast network is still required.
Encryption. vSAN supports data-at-rest encryption of the vSAN datastore. When you enable encryption, vSAN performs a rolling reformat of every disk group in the cluster. vSAN encryption requires a trusted connection between vCenter Server and a key management server (KMS). The KMS must support the Key Management Interoperability Protocol (KMIP) 1.1 standard.
Enhanced stretched cluster availability with local fault protection.
- Local fault protection. You can provide local fault protection for virtual machine objects within a single site in a stretched cluster. Define a Primary level of failures to tolerate for the cluster, and a Secondary level of failures to tolerate for objects within a single site. When one site is unavailable, vSAN maintains availability with local redundancy in the available site.
- Change witness host. You can change the witness host for a stretched cluster. On the Fault Domains and Stretched Cluster page, click Change witness host.

Configuration Assist and Updates. You can use the Configuration Assist and Updates pages to check the configuration of your vSAN cluster, and resolve any issues.
- Configuration Assist helps you verify the configuration of cluster components, resolve issues, and troubleshoot problems. Configuration checks are divided into categories, similar to those in the vSAN health service. The configuration checks cover hardware compatibility, network, and vSAN configuration options.
- You can use the Updates page to update storage controller firmware and drivers to meet vSAN requirements.
Resynchronization throttling. You can throttle the IOPS used for cluster resynchronization. Use this control if latencies are rising in the cluster due to resynchronization, or if resynchronization traffic is too high on a host.

Health service enhancements. vSAN 6.6 adds new and enhanced health checks for encryption, cluster membership, time drift, controller firmware, disk groups, physical disks, and disk balance. Online health checks can monitor vSAN cluster health and send the data to the VMware analytics backend system for advanced analysis. You must participate in the Customer Experience Improvement Program to use online health checks.
Updated Host-based vSAN monitoring. You can monitor vSAN health and basic configuration through the ESXi host client. In the host client navigator, click Storage. Select the vSAN datastore, and then click Monitor. Click the tabs to view vSAN information for the host. On the vSAN tab, you can click Edit Settings to correct configuration issues at the host level.
Performance service enhancements. vSAN performance service includes statistics for networking, resynchronization, and iSCSI. You can select saved time ranges in performance views. vSAN saves each selected time range when you run a performance query.
vSAN integration with vCenter Server Appliance. You can create a vSAN cluster as you deploy a vCenter Server Appliance, and host the appliance on that cluster. The vCenter Server Appliance Installer enables you to create a one-host vSAN cluster, with disks claimed from the host. vCenter Server Appliance is deployed on the vSAN cluster.

Maintenance mode enhancements. The Confirm Maintenance Mode dialog box provides information to guide your maintenance activities. You can view the impact of each data evacuation option. For example, you can check whether enough free space is available to complete the selected option.
Rebalancing and repair enhancements. Disk rebalancing operations are more efficient. Manual rebalancing operation provides better progress reporting.
- Rebalancing protocol has been tuned to be more efficient and achieve better cluster balance. Manual rebalance provides more updates and better progress reporting.
- More efficient repair operations require fewer cluster resynchronizations. vSAN can partially repair degraded or absent components to increase the Failures to tolerate even if vSAN cannot make the object compliant.
Disk failure handling. If a disk experiences sustained high latencies or congestion, vSAN considers the device a dying disk and evacuates or rebuilds its data. No user action is required, unless the cluster lacks resources or has inaccessible objects. When vSAN completes evacuation of data, the health status is listed as DyingDiskEmpty. vSAN does not unmount the failed device.

New esxcli commands.
- Display vSAN cluster health: esxcli vsan health
- Display vSAN debug information: esxcli vsan debug

VMware vSAN Community

Use the vSAN Community Web site to provide feedback and request assistance with any problems you find while using vSAN.

Upgrades for This Release

For instructions about upgrading vSAN, see the VMware Virtual SAN 6.6 documentation.

vSAN 6.6 is a major new release that requires a full upgrade. Perform the following tasks to complete the upgrade to vSAN 6.6:

Upgrade the vCenter Server to vSphere 6.5.0d. For more information, see the VMware vCenter Server 6.5.0d Release Notes.
Upgrade the ESXi hosts to vSphere 6.5.0d. For more information, see the VMware ESXi 6.5.0d Release Notes.
Upgrade the vSAN on-disk format to version 5.0.

Note: Direct upgrade from vSphere 6.0 Update 3 to vSphere 6.5.0d and vSAN 6.6 is not supported.

Upgrading the On-disk Format for Hosts with Limited Capacity

During an upgrade of the vSAN on-disk format, a disk group evacuation is performed. The disk group is removed and upgraded to on-disk format version 5.0, and the disk group is added back to the cluster. For two-node or three-node clusters, or clusters without enough capacity to evacuate each disk group, you must use the following RVC command to upgrade the on-disk format: vsan.ondisk_upgrade --allow-reduced-redundancy

When you allow reduced redundancy, your VMs are unprotected for the duration of the upgrade, because this method does not evacuate data to the other hosts in the cluster. It removes each disk group, upgrades the on-disk format, and adds the disk group back to the cluster. All objects remain available, but with reduced redundancy.
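
The choice between a normal upgrade and the reduced-redundancy path can be sketched in Python. This is a minimal illustration, not VMware tooling: the function names, the parameters, and the simple free-space comparison are assumptions made for the example. Only the two-node or three-node rule and the RVC flag come from the text above.

```python
def needs_reduced_redundancy(host_count, free_capacity_gb, largest_disk_group_used_gb):
    """Return True if the on-disk format upgrade needs --allow-reduced-redundancy.

    Assumption: "enough capacity to evacuate each disk group" is modeled as
    free capacity being at least the used space of the largest disk group.
    """
    if host_count in (2, 3):
        # Two-node and three-node clusters always use the reduced-redundancy path.
        return True
    return free_capacity_gb < largest_disk_group_used_gb


def upgrade_command(host_count, free_capacity_gb, largest_disk_group_used_gb):
    """Build the RVC command string (the cluster path argument is left out)."""
    command = "vsan.ondisk_upgrade"
    if needs_reduced_redundancy(host_count, free_capacity_gb, largest_disk_group_used_gb):
        command += " --allow-reduced-redundancy"
    return command


print(upgrade_command(3, 10000, 800))
print(upgrade_command(6, 2000, 800))
```

The sketch only builds the command text. Whether a real cluster has enough capacity depends on policy and layout details that this comparison does not capture.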

If you enable deduplication and compression during the upgrade to vSAN 6.6, you can select Allow Reduced Redundancy from the vSphere Web Client.

Using VMware Update Manager with Stretched Clusters

Using VMware Update Manager to upgrade hosts in parallel might result in the witness host being upgraded in parallel with one of the data hosts in a stretched cluster. To avoid upgrade problems, do not configure VMware Update Manager to upgrade a witness host in parallel with the data hosts in a stretched cluster. Upgrade the witness host after all data hosts have been successfully upgraded and have exited maintenance mode.
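
The ordering rule above (data hosts first, witness host last) can be sketched as a small Python helper. The host names and the (name, role) tuple layout are hypothetical; this does not call VMware Update Manager.

```python
def upgrade_order(hosts):
    """Return host names ordered so that witness hosts are upgraded last.

    `hosts` is a list of (name, role) tuples, where role is "data" or "witness".
    The relative order of the data hosts is preserved.
    """
    data_hosts = [name for name, role in hosts if role == "data"]
    witness_hosts = [name for name, role in hosts if role == "witness"]
    return data_hosts + witness_hosts


# Hypothetical inventory for a stretched cluster.
hosts = [("witness-01", "witness"), ("esx-a1", "data"), ("esx-b1", "data")]
print(upgrade_order(hosts))
```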

Verifying Health Check Failures During Upgrade

During upgrades of the vSAN on-disk format, the Physical Disk Health – Metadata Health check can fail intermittently. These failures can occur if the destaging process is slow, most likely because vSAN must allocate physical blocks on the storage devices. Before you take action, verify the status of this health check after the period of high activity, such as multiple virtual machine deployments, is complete. If the health check is still red, the warning is valid. If the health check is green, you can ignore the previous warning. For more information, see Knowledge Base article 2108690.

Limitations

For information about maximum configuration limits for the vSAN 6.6 release, see the Configuration Maximums documentation.

Known Issues

The following issues are known to occur in vSAN 6.6:

Cluster consistency health check fails during deep rekey operation
The deep rekey operation on an encrypted vSAN cluster can take several hours. During the rekey, the following health check might indicate a failure: Cluster configuration consistency. The cluster consistency check does not detect the deep rekey operation, and there might not be a problem.

Workaround: Retest the vSAN cluster consistency health check after the deep rekey operation is complete.
VM OVF deploy fails if DRS is disabled
If you deploy an OVF template on the vSAN cluster, the operation fails if DRS is disabled on the vSAN cluster. You might see a message similar to the following: The operation is not allowed in the current state.

Workaround: Enable DRS on the vSAN cluster before you deploy an OVF template.
vSAN stretched cluster configuration lost after you disable vSAN on a cluster
If you disable vSAN on a stretched cluster, the stretched cluster configuration is not retained. The stretched cluster, witness host, and fault domain configuration is lost.

Workaround: Reconfigure the stretched cluster parameters when you re-enable the vSAN cluster.
Orphaned or inaccessible VMs after total cluster failure
After total cluster failure, some powered off or suspended VMs might become orphaned or inaccessible, especially when vSAN encryption is enabled.

Workaround: Use the following procedure to re-register orphaned or inaccessible VMs.
1. Use RVC to connect to vCenter Server.
2. Navigate to the name of the cluster where orphaned VMs exist and re-register them. For example, if the name of the cluster is "vsan," run the following command: vsan.check_state -ref /localhost/Datacenter/computers/vsan
  Sample output:
  
  vsan.check_state -ref /localhost/Datacenter/computers/vsan
  2017-03-03 18:54:04 +0000: Step 1: Check for inaccessible vSAN objects
  2017-03-03 18:54:10 +0000: Step 1b: Check for inaccessible vSAN objects, again
  2017-03-03 18:54:11 +0000: Step 2: Check for invalid/inaccessible VMs
  2017-03-03 18:54:11 +0000: Step 2b: Check for invalid/inaccessible VMs again
  2017-03-03 18:54:11 +0000: Step 3: Check for VMs for which VC/hostd/vmx are out of sync
  Did not find VMs for which VC/hostd/vmx are out of sync
On-disk format version for witness host is later than version for data hosts
When you change the witness host during an upgrade to vSAN 6.6, the new witness host receives the latest on-disk format version. The on-disk format version of the witness host might be later than the on-disk format version of the data hosts. In this case, the witness host cannot store components.

Workaround: Use the following procedure to change the on-disk format to an earlier version.
1. Delete the disk group on the new witness host.
2. Set the advanced parameter to enable formatting of disk groups with an earlier on-disk format. For more information, see Knowledge Base article 2146221.
3. Recreate a new disk group on the witness host with a vSAN on-disk format version that matches the data hosts.
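
The version mismatch described in this issue can be expressed as a simple comparison. The sketch below is an illustration only: the function name and the tuple encoding of on-disk format versions are assumptions, and comparing against every data host (rather than a single cluster-wide value) is a simplification for clusters that are partway through an upgrade. Version 3.0 is used purely as an example of an earlier format.

```python
def witness_can_store_components(witness_format, data_host_formats):
    """Return False when the witness on-disk format is later than a data host's.

    Versions are (major, minor) tuples, for example (5, 0) for version 5.0.
    """
    return all(witness_format <= fmt for fmt in data_host_formats)


# New witness already on 5.0 while data hosts are still on an earlier format.
print(witness_can_store_components((5, 0), [(3, 0), (3, 0)]))
# After recreating the witness disk group with the matching format.
print(witness_can_store_components((3, 0), [(3, 0), (3, 0)]))
```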

Powered off VMs appear as inaccessible during witness host replacement
When you change a witness host in a stretched cluster, VMs that are powered off appear as inaccessible in the vSphere Web Client for a brief time. After the process is complete, powered off VMs appear as accessible. All running VMs appear as accessible throughout the process.

Workaround: None
Cannot place hosts in maintenance mode if they have faulty boot media
vSAN cannot place hosts with faulty boot media into maintenance mode. The task to enter maintenance mode might fail with an internal vSAN error, due to the inability to save configuration changes. You might see log events similar to the following: Lost Connectivity to the device xxx backing the boot filesystem

Workaround: Remove disk groups manually from each host, using the Full data evacuation option. Then place the host in maintenance mode.
Health check times out if a host fails
If one host in the cluster fails, the health check might time out. You might see the following message: a back-end task took more than 120 seconds. When the vSAN health service detects that the host has failed, it restarts. The health check automatically resumes after ten minutes.

Workaround: None
Health service does not work if vSAN cluster has ESXi hosts with vSphere 6.0 Update 1 or earlier
The vSAN 6.6 health service does not work if the cluster has ESXi hosts running vSphere 6.0 Update 1 or earlier releases.

Workaround: Do not add ESXi hosts with vSphere 6.0 Update 1 or earlier software to a vSAN 6.6 cluster.
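
Before adding hosts to a vSAN 6.6 cluster, a check along the following lines could flag the hosts that would break the health service. This is a sketch under stated assumptions: the host names are hypothetical, and encoding a release as a (major, minor, update) tuple is a convention chosen for the example, not a VMware format.

```python
# vSphere 6.0 Update 1, encoded as (major, minor, update).
LAST_UNSUPPORTED_RELEASE = (6, 0, 1)


def hosts_blocking_health_service(host_versions):
    """Return the sorted names of hosts running 6.0 Update 1 or earlier.

    `host_versions` maps a host name to its (major, minor, update) tuple.
    """
    return sorted(
        name
        for name, version in host_versions.items()
        if version <= LAST_UNSUPPORTED_RELEASE
    )


# Hypothetical inventory.
inventory = {"esx-01": (6, 5, 0), "esx-02": (6, 0, 1), "esx-03": (5, 5, 3)}
print(hosts_blocking_health_service(inventory))
```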
After stretched cluster failover, VMs on the preferred site register alert: Failed to failover
If the secondary site in a stretched cluster fails, VMs fail over to the preferred site. VMs already on the preferred site might register the following alert: Failed to failover. Ignore this alert. It does not impact the behavior of the failover.

Workaround: None
During network partition, components in the active site appear to be absent
During a network partition in a two-host vSAN cluster or a stretched cluster, the vSphere Web Client might display a view of the cluster from the perspective of the non-active site. You might see active components in the primary site displayed as absent.

Workaround: Use RVC commands to query the state of objects in the cluster. For example: vsan.vm_object_info
vCenter Server Appliance Installer accepts cluster name greater than 80 characters
If you enter a vSAN cluster name that is more than 80 characters, the vCenter Server Appliance Installer accepts the name, but the configuration is invalid. The vCenter Server Appliance fails when it is booted.

Workaround: Enter a vSAN cluster name that is 80 characters or fewer.
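
Because the installer does not enforce the limit, a deployment script could validate the name first. A minimal sketch, assuming the limit counts characters exactly as Python's len() does (the function and constant names are hypothetical):

```python
MAX_CLUSTER_NAME_LENGTH = 80


def is_valid_cluster_name(name):
    """Return True if the cluster name is within the 80-character limit."""
    return len(name) <= MAX_CLUSTER_NAME_LENGTH


print(is_valid_cluster_name("vsan-cluster-01"))
print(is_valid_cluster_name("x" * 81))
```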
vCenter Server Appliance Installer accepts mix of flash and magnetic drives for capacity
The vCenter Server Appliance Installer allows you to select a mix of flash devices and magnetic disks for the capacity tier of a disk group in a new vSAN cluster. The capacity tier of each disk group can support either all-flash or all-magnetic devices.

Workaround: Do not mix flash devices and magnetic disks on the capacity tier of the vSAN cluster.
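
The capacity-tier rule can be checked before deployment with a helper like the one below. This is an illustrative sketch: the "flash" and "magnetic" labels and the function name are assumptions, not values returned by any VMware API.

```python
def capacity_tier_is_valid(device_types):
    """Return True if all capacity devices in a disk group are the same type.

    `device_types` is a list of strings such as "flash" or "magnetic".
    An empty list is treated as invalid, since a disk group needs capacity devices.
    """
    return len(set(device_types)) == 1


print(capacity_tier_is_valid(["flash", "flash", "flash"]))
print(capacity_tier_is_valid(["flash", "magnetic"]))
```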
Temporary Update configuration tasks visible if hosts are disconnected when you change vSAN encryption configurations
When you change the configuration of an encrypted vSAN cluster (such as turning encryption on or off, or changing the KMS key) while some hosts are disconnected, an Update vSAN configuration task runs on each host every 3 seconds until all hosts reconnect or until 5 minutes have passed. These tasks are not harmful and rarely impact performance.

Workaround: None
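
To estimate how many of these tasks might appear in the task list, the interval and time limit quoted above give a rough upper bound. The calculation below is a back-of-the-envelope sketch; it assumes tasks are issued at an exact 3-second cadence for the full 5 minutes, which the source does not guarantee.

```python
RETRY_INTERVAL_SECONDS = 3
MAX_DURATION_SECONDS = 5 * 60


def max_update_tasks_per_host():
    """Approximate upper bound on Update vSAN configuration tasks per host."""
    return MAX_DURATION_SECONDS // RETRY_INTERVAL_SECONDS


def max_update_tasks(host_count):
    """Approximate upper bound across all hosts in the cluster."""
    return host_count * max_update_tasks_per_host()


print(max_update_tasks_per_host())
print(max_update_tasks(8))
```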
Some objects are non-compliant after force repair
After you perform a force repair, some objects might not be repaired because the ownership of the objects was transferred to a different node during the process. The force repair might be delayed for those objects.

Workaround: Attempt the force repair operation after all other objects are repaired and resynchronized. Alternatively, you can wait until vSAN repairs the objects.
When you move a host from one encrypted cluster to another, and then back to the original cluster, the task fails
When you move a host from an encrypted vSAN cluster to another encrypted vSAN cluster, then move the host back to the original encrypted cluster, the task might fail. You might see the following message: A general system error occurred: Invalid fault. This error occurs because vSAN cannot re-encrypt data on the host using the original encryption key. After a short time, vCenter Server restores the original key on the host, and all unmounted disks in the vSAN cluster are mounted.

Workaround: Reboot the host and wait for all disks to get mounted.
Cluster becomes partitioned if vCenter Server and ESXi hosts reboot
If both the vCenter Server and ESXi hosts of a vSAN cluster are rebooted, the cluster can become partitioned.

Workaround: Restart the vSAN health service.
Stretched cluster imbalance after a site recovers
When you recover a failed site in a stretched cluster, sometimes hosts in the failed site are brought back sequentially over a long period of time. vSAN might overuse some hosts when it begins repairing the absent components.

Workaround: Recover all of the hosts in a failed site together within a short time window.
VM operations fail due to HA issue with stretched clusters
Under certain failure scenarios in stretched clusters, certain VM operations such as vMotions or powering on a VM might be impacted. These failure scenarios include a partial or a complete site failure, or the failure of the high speed network between the sites. This problem is caused by the dependency on VMware HA being available for normal operation of stretched cluster sites.

Workaround: Disable vSphere HA before performing vMotion, VM creation, or powering on VMs. Then re-enable vSphere HA.
Updated Restoring or replacing vCenter Server can cause cluster partition
If the vCenter Server is replaced or recovered from backup, the host membership list might become out-of-date. This can cause ESXi hosts to become partitioned from the cluster.

Workaround: Use the following procedure to make sure all hosts are added to the vSAN cluster as the vCenter Server reboots.
1. Before you reboot vCenter Server, configure hosts to ignore cluster member list updates. Run the following command on each host in the vSAN cluster:
  esxcfg-advcfg -s1 /VSAN/IgnoreClusterMemberListUpdates
2. After vCenter Server is running and all hosts are present in the cluster, configure hosts to use cluster member list updates. Run the following command on each host in the cluster:
  esxcfg-advcfg -s0 /VSAN/IgnoreClusterMemberListUpdates
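
For clusters with many hosts, the two steps above can be turned into a per-host command plan. The sketch below only builds the command strings from the procedure; it does not connect to any host, and the host names are hypothetical. How the commands are delivered (SSH, host profiles, or another mechanism) is left open.

```python
SETTING = "/VSAN/IgnoreClusterMemberListUpdates"


def member_list_command(ignore_updates):
    """Return the esxcfg-advcfg command that sets the advanced option to 1 or 0."""
    value = 1 if ignore_updates else 0
    return f"esxcfg-advcfg -s{value} {SETTING}"


def build_plan(hosts):
    """Return (before, after) lists of (host, command) pairs.

    `before` is run on every host prior to rebooting vCenter Server;
    `after` is run once vCenter Server is up and all hosts are in the cluster.
    """
    before = [(host, member_list_command(True)) for host in hosts]
    after = [(host, member_list_command(False)) for host in hosts]
    return before, after


before, after = build_plan(["esx-01", "esx-02"])
for host, command in before + after:
    print(f"{host}: {command}")
```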
Disk decommission or disk unmount task fails
Disk decommission or disk unmount task might fail due to a conflict between the data write commit task and the virtual disk delete task. This problem might occur during upgrades that require a new vSAN on-disk format. You might see the following message in the VMkernel.log:

4724 2017-04-10T18:46:51.309Z cpu30:67232)LSOM: LSOMFreeMDDispatch:3797: Throttled: Waiting for component cleanup

Workaround: Reboot the host to clear the conflict and retry the operation.
vMotion network connectivity test incorrectly reports ping failures
The vMotion network connectivity test (Cluster > Monitor > vSAN > Health > Network) reports ping failures if the vMotion stack is used for vMotion. The vMotion network connectivity (ping) check only supports vmknics that use the default network stack. The check fails for vmknics using the vMotion network stack. These reports do not indicate a connectivity problem.

Workaround: Configure the vmknic to use the default network stack. Alternatively, you can disable the vMotion ping check using RVC commands. For example: vsan.health.silent_health_check_configure -a vmotionpingsmall
Cannot perform deep rekey if a disk group is unmounted
Before vSAN performs a deep rekey, it performs a shallow rekey. The shallow rekey fails if an unmounted disk group is present. The deep rekey process cannot begin.

Workaround: Remount or remove the unmounted disk group.
Log entries state that firewall configuration has changed
A new firewall entry appears in the security profile when vSAN encryption is enabled: vsanEncryption. This rule controls how hosts communicate directly to the KMS. When it is triggered, log entries are added to /var/log/vobd.log. You might see the following messages:

Firewall configuration has changed. Operation 'addIP4' for rule set vsanEncryption succeeded.
Firewall configuration has changed. Operation 'removeIP4' for rule set vsanEncryption succeeded.

These messages can be ignored.

Workaround: None
Updated Limited support for First Class Disks with vSAN datastores
vSAN 6.6 does not fully support First Class Disks in vSAN datastores. You might experience the following problems if you use First Class Disks in a vSAN datastore:
- vSAN health service does not display the health of First Class Disks correctly.
- The Used Capacity Breakdown includes the used capacity for First Class Disks in the following category: Other
- The health status of VMs that use First Class Disks is not calculated correctly.
Workaround: None
HA failover does not occur after setting Traffic Type option on a vmknic to support witness traffic
If you set the traffic type option on a vmknic to support witness traffic, vSphere HA does not automatically discover the new setting. You must manually disable and then re-enable HA so it can discover the vmknic. If you configure the vmknic and the vSAN cluster first, and then enable HA on the cluster, it does discover the vmknic.

Workaround: Manually disable vSphere HA on the cluster, and then re-enable it.
After you disable and delete the iSCSI target service, some iSCSI objects remain in the vSAN datastore
If you use the Web Client to remove all iSCSI targets and LUNs, and disable the iSCSI target service, the iSCSI home object still exists in the vSAN datastore.

Workaround: To delete the iSCSI home object and all metadata associated with the iSCSI target service, run the following command on any host in the cluster: esxcli vsan iscsi homeobject delete
iSCSI I/O operation might be interrupted during iSCSI target failover
During iSCSI target failover, the iSCSI I/O operations might be interrupted. A host failure or a host reboot might trigger an iSCSI target failover.

Workaround: Retry the session from the iSCSI initiator.
iSCSI MCS is not supported
vSAN iSCSI target service does not support Multiple Connections per Session (MCS).

Workaround: None
Any iSCSI initiator can discover iSCSI targets
vSAN iSCSI target service allows any initiator on the network to discover iSCSI targets.

Workaround: You can isolate your ESXi hosts from iSCSI initiators by placing them on separate VLANs.
After resolving network partition, some VM operations on linked clone VMs might fail
Some VM operations on linked clone VMs that are not producing I/O inside the guest operating system might fail. The operations that might fail include taking snapshots and suspending the VMs. This problem can occur after a network partition is resolved, if the parent base VM's namespace is not yet accessible. When the parent VM's namespace becomes accessible, HA is not notified to power on the VM.

Workaround: Power cycle VMs that are not actively running I/O operations.
When you log out of the Web Client after using the Configure vSAN wizard, some configuration tasks might fail
The Configure vSAN wizard might require up to several hours to complete the configuration tasks. You must remain logged in to the Web Client until the wizard completes the configuration. This problem usually occurs in clusters with many hosts and disk groups.

Workaround: If some configuration tasks failed, perform the configuration again.
New policy rules ignored on hosts with older versions of ESXi software
This might occur when you have two or more vSAN clusters, with one cluster running the latest software and another cluster running an older software version. The vSphere Web Client displays policy rules for the latest vSAN software, but those new policies are not supported on the older hosts. For example, RAID-5/6 (Erasure Coding) – Capacity is not supported on hosts running 6.0U1 or earlier software. You can configure the new policy rules and apply them to any VMs and objects, but they are ignored on hosts running the older software version.

Workaround: None
Snapshot memory objects are not displayed in the Used Capacity Breakdown of the vSAN Capacity monitor
For Virtual Machines created with hardware version lower than 10, the snapshot memory is included in the Vmem objects on the Used Capacity Breakdown.

Workaround: To view snapshot memory objects in the Used Capacity Breakdown, create Virtual Machines with hardware version 10 or higher.
Storage Usage reported in VM Summary page might appear larger after upgrading to vSAN 6.5 or later
In previous releases of vSAN, the value reported for VM Storage Usage was the space used by a single copy of the data. For example, if the guest wrote 1 GB to a thin-provisioned object with two mirrors, the Storage Usage was shown as 1 GB. In vSAN 6.5 and later, the Storage Usage field displays the actual space used, including all copies of the data. So if the guest writes 1 GB to a thin-provisioned object with two mirrors, the Storage Usage is shown as 2 GB. The reported storage usage on some VMs might appear larger after upgrading to vSAN 6.5, but the actual space consumed did not increase.

Workaround: None
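
The change in reporting can be reproduced with a one-line calculation. The sketch below models only the mirrored example from the text above; the function name and the (major, minor) version tuples are assumptions, and other layouts such as RAID-5/6 are not covered.

```python
def reported_storage_usage_gb(written_gb, copies, vsan_version):
    """Return the Storage Usage value shown for a mirrored, thin-provisioned object.

    Before vSAN 6.5 the field shows a single copy of the data;
    in 6.5 and later it shows the space used by all copies.
    """
    if vsan_version >= (6, 5):
        return written_gb * copies
    return written_gb


# Guest writes 1 GB to an object with two mirrors.
print(reported_storage_usage_gb(1, 2, (6, 2)))
print(reported_storage_usage_gb(1, 2, (6, 6)))
```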
Cannot place a witness host in Maintenance Mode
When you attempt to place a witness host in Maintenance Mode, the host remains in the current state and you see the following notification: A specified parameter was not correct.

Workaround: When placing a witness host in Maintenance Mode, choose the No data migration option.
Moving the witness host into and then out of a stretched cluster leaves the cluster in a misconfigured state
If you place the witness host in a vSAN-enabled vCenter cluster, an alarm notifies you that the witness host cannot reside in the cluster. But if you move the witness host out of the cluster, the cluster remains in a misconfigured state.

Workaround: Move the witness host out of the vSAN stretched cluster, and reconfigure the stretched cluster. For more information, see Knowledge Base article 2130587.
When a network partition occurs in a cluster which has an HA heartbeat datastore, VMs are not restarted on the other data site
When the preferred or secondary site in a vSAN cluster loses its network connection to the other sites, VMs running on the site that loses network connectivity are not restarted on the other data site, and the following error might appear: vSphere HA virtual machine HA failover failed.

This is expected behavior for vSAN clusters.

Workaround: Do not select HA heartbeat datastore while configuring vSphere HA on the cluster.
Unmounted vSAN disks and disk groups displayed as mounted in the vSphere Web Client Operational Status field
After vSAN disks or disk groups are unmounted, either by running the esxcli vsan storage diskgroup unmount command or by the vSAN Device Monitor service when disks show persistently high latencies, the vSphere Web Client incorrectly displays the Operational Status field as mounted.
Workaround: Use the Health field to verify disk status, instead of the Operational Status field.
On-disk format upgrade displays disks not on vSAN
When you upgrade the disk format, vSAN might incorrectly display disks that were removed from the cluster. The UI also might show the version status as mixed. This display issue usually occurs after one or multiple disks are manually unmounted from the cluster. It does not affect the upgrade process. Only the mounted disks are checked. The unmounted disks are ignored.

Workaround: None

All vSAN clusters share the same external proxy settings
All vSAN clusters share the same external proxy settings, even if you set the proxy at the cluster level. vSAN uses external proxies to connect to Support Assistant, the Customer Experience Improvement Program, and the HCL database, if the cluster does not have direct Internet access.

Workaround: None
VMs in a stretched cluster become inaccessible when preferred site is isolated, then regains connectivity only to the witness host
When the preferred site becomes unavailable or loses its network connection to the secondary site and the witness host, the secondary site forms a cluster with the witness host and continues storage operations. Data on the preferred site might become outdated over time. If the preferred site then reconnects to the witness host but not to the secondary site, the witness host leaves the cluster it is in and forms a cluster with the preferred site, and some VMs might become inaccessible because they do not have access to the most recent data in this cluster.
Workaround: Before you reconnect the preferred site to the cluster, mark the secondary site as the preferred site. After the sites are resynchronized, you can mark the site you want to use as the preferred site.
Storage Consumption Model for VM Storage Policy wizard shows incorrect information
If one or more hosts in a vSAN cluster are not running software version 6.0 Update 2 or later, the Storage Consumption Model for the VM Storage Policy wizard might show incorrect information when you select RAID 5/6 as the failure tolerance method.
Workaround: Upgrade all hosts to the latest software version.