You can deploy a generic Kubernetes cluster and persistent volumes on vSAN stretched clusters.

For information about vSAN stretched clusters, see Extending a Datastore Across Two Sites with Stretched Clusters and the vSAN Stretched Cluster Guide.

Limitations When Using vSAN Stretched Clusters for Kubernetes

When you plan to configure a Kubernetes cluster on a vSAN stretched cluster, consider the following limitations:
  • A generic Kubernetes cluster does not enforce the same storage policy on the node VMs and on the persistent volumes. The vSphere administrator is responsible for configuring and assigning the correct storage policies and for making sure they are used consistently within the Kubernetes clusters.
  • The topology feature cannot be used to provision a volume that belongs to a specific fault domain within the vSAN stretched cluster.
  • vSAN stretched clusters do not support ReadWriteMany (RWX) volumes.
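Because Kubernetes does not enforce storage policy consistency, checking it falls to the vSphere administrator. The following is a minimal sketch of such an audit. It assumes you have already exported the storage policy name of each node VM and each persistent volume, and the access modes of each PVC, into plain Python dictionaries. The function name `audit_stretched_cluster`, the inventory format, and the policy names `stretched-dual-site` and `site-a-only` are hypothetical; the sketch does not query vCenter or the Kubernetes API.

```python
from typing import Dict, List


def audit_stretched_cluster(
    node_vm_policies: Dict[str, str],
    pv_policies: Dict[str, str],
    pvc_access_modes: Dict[str, List[str]],
) -> List[str]:
    """Return problems found in a hypothetical, pre-exported inventory.

    node_vm_policies: node VM name -> storage policy name
    pv_policies: PV name -> storage policy name
    pvc_access_modes: PVC name -> requested Kubernetes access modes
    """
    problems: List[str] = []

    # All node VMs and all PVs should share a single storage policy,
    # because Kubernetes itself does not enforce this.
    all_policies = set(node_vm_policies.values()) | set(pv_policies.values())
    if len(all_policies) > 1:
        problems.append(f"multiple storage policies in use: {sorted(all_policies)}")

    # RWX (ReadWriteMany) volumes are not supported on vSAN stretched clusters.
    for pvc, modes in sorted(pvc_access_modes.items()):
        if "ReadWriteMany" in modes:
            problems.append(f"PVC {pvc} requests ReadWriteMany")

    return problems


if __name__ == "__main__":
    # Hypothetical inventory with one policy mismatch and one RWX request.
    nodes = {"cp-1": "stretched-dual-site", "worker-1": "stretched-dual-site"}
    pvs = {"pv-db": "stretched-dual-site", "pv-logs": "site-a-only"}
    pvcs = {"db-data": ["ReadWriteOnce"], "shared": ["ReadWriteMany"]}
    for problem in audit_stretched_cluster(nodes, pvs, pvcs):
        print(problem)
```

An empty result means every node VM and PV uses the same policy and no PVC asks for RWX access.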

vSAN Stretched Cluster Design Considerations

Consider these guidelines when working with a vSAN stretched cluster that you use for Kubernetes.
  • Configure VM storage policy settings for the stretched cluster.
    • Set Site disaster tolerance. This setting defines whether to use a standard (non-stretched) or a stretched cluster.

      For a stretched cluster, it defines whether data is mirrored at both sites (Dual site mirroring) or constrained to only one of the sites in the vSAN cluster.

    • Set Failures to tolerate. This setting defines the number of failures a storage object can tolerate and the method used to tolerate them.

      For a stretched cluster, it defines the number of disk or host failures a storage object can tolerate within each site. With mirroring, tolerating n failures requires 2n + 1 fault domains, that is, 2n + 1 hosts within each site of the stretched cluster.

      RAID-1 (mirroring) provides better performance. RAID-5 and RAID-6 (erasure coding) achieve failure tolerance using parity blocks, which provides better space efficiency. RAID-5 and RAID-6 are available only on all-flash clusters.

    • Enable Force provisioning.

  • Configure general settings for the stretched cluster.
    • Use a VM storage policy with the same replication and site affinity settings for all storage objects on the Kubernetes cluster. Use the same storage policy for all node VMs, including control plane and worker nodes, and for all PVs.
    • You can deploy multiple Kubernetes clusters with different storage requirements in the same vSAN stretched cluster.
    • Enable DRS on the stretched cluster.
    • Enable vSphere HA on the stretched cluster. Follow the requirements listed in Requirements for Kubernetes Deployments on vSAN Stretched Cluster.

  • Create VM/Host rules to pin Kubernetes nodes to a specific site, primary or secondary, such as Site-A.
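The Failures to tolerate guidance above can be expressed as a small helper that returns the minimum number of hosts each site needs. This is a sketch: `required_fault_domains` is a hypothetical name. The RAID-1 rule (2n + 1) comes from the guidance above. The RAID-5 and RAID-6 minimums (4 and 6 fault domains, for FTT=1 and FTT=2 respectively) and the RAID-1 limit of FTT=3 are assumptions that reflect general vSAN mirroring and erasure-coding rules rather than anything Kubernetes-specific. Confirm them against the vSAN documentation for your release.

```python
def required_fault_domains(ftt: int, method: str = "RAID-1") -> int:
    """Minimum fault domains (hosts within one site of a stretched cluster)
    needed for a storage object to tolerate `ftt` failures within that site.

    Hypothetical helper; the RAID-5/RAID-6 values are assumed vSAN rules.
    """
    if method == "RAID-1":
        # Mirroring: n + 1 replicas plus n witness components = 2n + 1.
        if not 0 <= ftt <= 3:
            raise ValueError("RAID-1 supports FTT 0 to 3")
        return 2 * ftt + 1
    if method == "RAID-5":
        # Single parity (3 data + 1 parity); all-flash clusters only.
        if ftt != 1:
            raise ValueError("RAID-5 tolerates exactly 1 failure")
        return 4
    if method == "RAID-6":
        # Double parity (4 data + 2 parity); all-flash clusters only.
        if ftt != 2:
            raise ValueError("RAID-6 tolerates exactly 2 failures")
        return 6
    raise ValueError(f"unknown failure tolerance method: {method}")


if __name__ == "__main__":
    for ftt in range(1, 4):
        print(f"RAID-1, FTT={ftt}: {required_fault_domains(ftt)} hosts per site")
    print(f"RAID-5, FTT=1: {required_fault_domains(1, 'RAID-5')} hosts per site")
    print(f"RAID-6, FTT=2: {required_fault_domains(2, 'RAID-6')} hosts per site")
```

With Dual site mirroring, each site holds a full copy of the data, so the per-site count applies to both sites: FTT=1 with RAID-1 means at least 3 hosts in each of the two data sites.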

Requirements for Kubernetes Deployments on vSAN Stretched Cluster

The following list pairs each requirement with its deployment setting.
  • Node Placement: In a typical deployment, the control plane and worker nodes are tied to the primary site but can fail over to the other site if the primary site fails. Deploy HAProxy on the primary site.
  • Failures to Tolerate: At least FTT=1.
  • DRS: Enabled.
  • Site Disaster Tolerance: Dual Site Mirroring.
  • Storage Policy Force Provisioning: Enabled.
  • vSphere HA: Enabled.
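If you track these settings in your own tooling, a pre-deployment check against the requirements is straightforward. The following is a minimal sketch under the assumption that you have already read the cluster and storage policy settings into a simple record. The `StretchedClusterConfig` class, its field names, the `check_requirements` function, and the exact string used for Dual site mirroring are all hypothetical and are not part of any vSphere API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StretchedClusterConfig:
    """Hypothetical snapshot of the settings listed in the requirements."""
    failures_to_tolerate: int
    drs_enabled: bool
    site_disaster_tolerance: str  # hypothetical label, e.g. "Dual site mirroring"
    force_provisioning: bool
    ha_enabled: bool


def check_requirements(cfg: StretchedClusterConfig) -> List[str]:
    """Return the requirements that `cfg` does not meet (empty list if all pass)."""
    failures: List[str] = []
    if cfg.failures_to_tolerate < 1:
        failures.append("Failures to tolerate must be at least FTT=1")
    if not cfg.drs_enabled:
        failures.append("DRS must be enabled")
    # Case-insensitive match on the hypothetical policy label.
    if cfg.site_disaster_tolerance.lower() != "dual site mirroring":
        failures.append("Site disaster tolerance must be Dual site mirroring")
    if not cfg.force_provisioning:
        failures.append("Force provisioning must be enabled in the storage policy")
    if not cfg.ha_enabled:
        failures.append("vSphere HA must be enabled")
    return failures


if __name__ == "__main__":
    cfg = StretchedClusterConfig(1, True, "Dual site mirroring", True, False)
    for problem in check_requirements(cfg):
        print(problem)  # only the vSphere HA requirement fails here
```

Keeping the checks as a flat list of independent conditions makes it easy to report every unmet requirement at once instead of stopping at the first one.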

Potential Failure Scenarios

The following list describes potential failure scenarios that might occur when you deploy a generic Kubernetes cluster on a vSAN stretched cluster, and the effect of each.

  • Several ESXi hosts fail on the primary site.
    • Kubernetes node VMs move from the unavailable hosts to the available hosts within the primary site.
    • If a worker node needs to be restarted, the pods running on that node can be rescheduled and re-created on another node.
    • If a control plane node needs to be restarted, the existing application workload is not affected.
  • The entire primary site and all hosts on the site fail.
    • Kubernetes node VMs move from the primary site to the secondary site.
    • The Kubernetes cluster experiences complete downtime until the node VMs restart on the secondary site.
  • Several hosts fail on the secondary site.
    • The failure does not affect the Kubernetes cluster because the entire cluster runs on the primary site.
  • The entire secondary site and all hosts on the site fail.
    • The failure does not affect the Kubernetes cluster because the entire cluster runs on the primary site.
    • Replication of storage objects stops because the secondary site is not available.
  • An intersite network failure occurs.
    • The failure does not affect the Kubernetes cluster because the entire cluster runs on the primary site.
    • Replication of storage objects stops because the secondary site is not reachable.