You can design your vSphere environment so that certain datastores are accessible only from a subset of nodes in the vSphere cluster, based on availability zones. You can segment the cluster into racks, regions, or zones, or use some other type of grouping. When topology is enabled in the cluster, you can use vSphere Container Storage Plug-in to deploy a Kubernetes workload to a specific region or zone defined in the topology.

Note: Volume topology is supported only for block volumes.

In addition, you can use the volumeBindingMode parameter in the StorageClass to specify when the volume should be created and bound to the PVC request. vSphere Container Storage Plug-in supports two volume binding modes that Kubernetes provides.

Immediate
This is the default volume binding mode. In this mode, volume binding and dynamic provisioning occur immediately after the PersistentVolumeClaim is created. To deploy workloads with Immediate binding mode in a topology-aware environment, you can specify zone parameters in the StorageClass.
WaitForFirstConsumer
This mode delays the creation and binding of a persistent volume for a PVC until a pod that uses the PVC is created. When you use this mode, you do not need to specify StorageClass zone parameters because pod policies drive the decision of which zones to use for volume provisioning.
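
To illustrate how the two modes differ in the StorageClass itself, the following Python sketch builds a StorageClass manifest as a dictionary and prints it as JSON, a format that kubectl apply -f also accepts. The helper name build_storage_class and the StorageClass names passed to it are hypothetical. The provisioner name and the topology label keys are the ones used in this document.

```python
import json

REGION_KEY = "topology.csi.vmware.com/k8s-region"
ZONE_KEY = "topology.csi.vmware.com/k8s-zone"


def build_storage_class(name, binding_mode="Immediate", region=None, zones=None):
    """Return a StorageClass manifest as a plain dict (hypothetical helper)."""
    if binding_mode not in ("Immediate", "WaitForFirstConsumer"):
        raise ValueError(f"unsupported volumeBindingMode: {binding_mode}")
    manifest = {
        "kind": "StorageClass",
        "apiVersion": "storage.k8s.io/v1",
        "metadata": {"name": name},
        "provisioner": "csi.vsphere.vmware.com",
        "volumeBindingMode": binding_mode,
    }
    # Zone parameters are optional; add allowedTopologies only when given.
    expressions = []
    if region:
        expressions.append({"key": REGION_KEY, "values": [region]})
    if zones:
        expressions.append({"key": ZONE_KEY, "values": list(zones)})
    if expressions:
        manifest["allowedTopologies"] = [{"matchLabelExpressions": expressions}]
    return manifest


# Immediate mode: zone parameters steer where the volume is provisioned.
immediate_sc = build_storage_class(
    "example-immediate-sc", region="region-1", zones=["zone-a", "zone-b"]
)
# WaitForFirstConsumer mode: no zone parameters, pod scheduling decides.
wffc_sc = build_storage_class(
    "example-wffc-sc", binding_mode="WaitForFirstConsumer"
)

print(json.dumps(immediate_sc, indent=2))
print(json.dumps(wffc_sc, indent=2))
```

The sketch always writes volumeBindingMode explicitly, so the manifest shows the mode even when it matches the Kubernetes default.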

Before deploying workloads with topology, enable topology in the native Kubernetes cluster in your vSphere environment. For more information, see Deploy the vSphere Container Storage Plug-in with Topology.

Deploy Workloads with Immediate Mode in Topology-Aware Environment

vSphere Container Storage Plug-in supports volume topology and availability zones in a vSphere environment. You can deploy a Kubernetes workload to a specific region or zone defined in the topology using the default Immediate volume binding mode. In this mode, volume binding and dynamic provisioning occur immediately after the PersistentVolumeClaim is created.

To deploy workloads to a specific region or zone with Immediate binding mode in a topology-aware environment, specify zone parameters in the StorageClass.

Prerequisites

Enable topology in the native Kubernetes cluster in your vSphere environment. For more information, see Deploy the vSphere Container Storage Plug-in with Topology.

Procedure

  1. Create a StorageClass with Immediate volume binding mode.
    If you do not specify a volume binding mode, Immediate is used by default.
    You can also specify zone parameters. In the following example, the StorageClass can provision volumes on either zone-a or zone-b.
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: example-multi-zones-sc
    provisioner: csi.vsphere.vmware.com
    allowedTopologies:
      - matchLabelExpressions:
          - key: topology.csi.vmware.com/k8s-region
            values:
              - region-1
          - key: topology.csi.vmware.com/k8s-zone
            values:
              - zone-a
              - zone-b
  2. Create a PersistentVolumeClaim to use the StorageClass.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: example-multi-zones-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Mi
      storageClassName: example-multi-zones-sc
  3. Check that the PVC is bound.
    $ kubectl get pvc example-multi-zones-pvc
    NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS             AGE
    example-multi-zones-pvc   Bound    pvc-b489e551-9c76-44ca-9434-76c628836748   100Mi      RWO            example-multi-zones-sc   3s
  4. Verify that the PV node affinity rules include at least one domain within zone-a or zone-b, depending on whether the selected datastore is local or shared across zones.
    $ kubectl describe pv pvc-b489e551-9c76-44ca-9434-76c628836748
    Name:              pvc-b489e551-9c76-44ca-9434-76c628836748
    Labels:            <none>
    Annotations:       pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
    Finalizers:        [kubernetes.io/pv-protection]
    StorageClass:      example-multi-zones-sc
    Status:            Bound
    Claim:             default/example-multi-zones-pvc
    Reclaim Policy:    Delete
    Access Modes:      RWO
    VolumeMode:        Filesystem
    Capacity:          100Mi
    Node Affinity:    
      Required Terms: 
        Term 0:        topology.csi.vmware.com/k8s-zone in [zone-b]
                       topology.csi.vmware.com/k8s-region in [region-1]
    Message:          
    Source:
        Type:              CSI (a Container Storage Interface (CSI) volume source)
        Driver:            csi.vsphere.vmware.com
        FSType:            ext4
        VolumeHandle:      db13a347-0fd5-4b8a-894c-23cf84ab2973
        ReadOnly:          false
        VolumeAttributes:      storage.kubernetes.io/csiProvisionerIdentity=1634758806527-8081-csi.vsphere.vmware.com
                               type=vSphere CNS Block Volume
    Events:                <none>
  5. Create an application to use the PVC.
    apiVersion: v1
    kind: Pod
    metadata:
      name: example-multi-zones-pod
    spec:
      containers:
        - name: test-container
          image: gcr.io/google_containers/busybox:1.24
          command: ["/bin/sh", "-c", "echo 'hello' > /mnt/volume1/index.html  && chmod o+rX /mnt /mnt/volume1/index.html && while true ; do sleep 2 ; done"]
          volumeMounts:
            - name: test-volume
              mountPath: /mnt/volume1
      restartPolicy: Never
      volumes:
        - name: test-volume
          persistentVolumeClaim:
            claimName: example-multi-zones-pvc
    The pod is scheduled into the zone where the volume was provisioned. In this example, that is zone-b.
    $ kubectl get pods -o wide
    NAME                      READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
    example-multi-zones-pod   1/1     Running   0          3m53s   10.244.5.34   k8s-node-2   <none>           <none>
     
    $ kubectl get node k8s-node-2 --show-labels
    NAME         STATUS   ROLES    AGE  VERSION   LABELS
    k8s-node-2   Ready    <none>   2d   v1.21.1   topology.csi.vmware.com/k8s-region=region-1,topology.csi.vmware.com/k8s-zone=zone-b
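
The checks in steps 4 and 5 can be sketched in Python as follows. This is a minimal illustration of how required node affinity terms are evaluated, not the scheduler's implementation: terms are combined with OR, the expressions inside one term are combined with AND, and only the In operator is handled. The function names are hypothetical. The labels for k8s-node-1 assume a second node in zone-a, which does not appear in the output above.

```python
REGION_KEY = "topology.csi.vmware.com/k8s-region"
ZONE_KEY = "topology.csi.vmware.com/k8s-zone"


def term_matches(node_labels, term):
    """All expressions in one term must match (logical AND)."""
    for expr in term["matchExpressions"]:
        if expr["operator"] != "In":
            raise NotImplementedError("this sketch handles only the In operator")
        # A missing label yields None, which is never in the allowed values.
        if node_labels.get(expr["key"]) not in expr["values"]:
            return False
    return True


def node_satisfies_affinity(node_labels, node_selector_terms):
    """A node is eligible if any one term matches (logical OR)."""
    return any(term_matches(node_labels, term) for term in node_selector_terms)


# Node affinity of the PV from step 4 (Term 0).
pv_terms = [
    {
        "matchExpressions": [
            {"key": ZONE_KEY, "operator": "In", "values": ["zone-b"]},
            {"key": REGION_KEY, "operator": "In", "values": ["region-1"]},
        ]
    }
]

nodes = {
    "k8s-node-1": {REGION_KEY: "region-1", ZONE_KEY: "zone-a"},  # assumed node
    "k8s-node-2": {REGION_KEY: "region-1", ZONE_KEY: "zone-b"},
}

eligible = [
    name for name, labels in nodes.items()
    if node_satisfies_affinity(labels, pv_terms)
]
print(eligible)  # ['k8s-node-2']
```

Only k8s-node-2 satisfies the PV node affinity, which matches the node where the pod is running in step 5.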

Example: StorageClass with Additional Restrictive Parameters

You can use additional parameters, such as storagepolicyname, in the StorageClass to further restrict the selection of a datastore for volume provisioning. For example, if you have a shared datastore, datastore-AB, that is accessible to zone-a and zone-b, create a storage policy that points to datastore-AB. Then specify this storage policy as a parameter in the StorageClass.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: example-multi-zones-sc
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "shared datastore zones A and B"
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.csi.vmware.com/k8s-region
        values:
          - region-1
      - key: topology.csi.vmware.com/k8s-zone
        values:
          - zone-a
          - zone-b
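
The effect of combining zone parameters with a storage policy can be pictured with the following Python sketch. It is a simplified model for illustration only and does not reproduce the placement logic of vSphere Container Storage Plug-in. The Datastore class, the candidate_datastores function, and the datastores datastore-A and datastore-B are hypothetical. Only datastore-AB and the policy name come from the example above.

```python
from dataclasses import dataclass

POLICY = "shared datastore zones A and B"


@dataclass(frozen=True)
class Datastore:
    name: str
    zones: frozenset     # zones whose nodes can access the datastore
    policies: frozenset  # storage policies the datastore is compatible with


def candidate_datastores(datastores, allowed_zones, policy_name=None):
    """Keep datastores reachable from an allowed zone, then apply the policy."""
    allowed = set(allowed_zones)
    result = [ds for ds in datastores if ds.zones & allowed]
    if policy_name is not None:
        result = [ds for ds in result if policy_name in ds.policies]
    return [ds.name for ds in result]


inventory = [
    Datastore("datastore-A", frozenset({"zone-a"}), frozenset()),   # assumed
    Datastore("datastore-B", frozenset({"zone-b"}), frozenset()),   # assumed
    Datastore("datastore-AB", frozenset({"zone-a", "zone-b"}), frozenset({POLICY})),
]

zones_only = candidate_datastores(inventory, ["zone-a", "zone-b"])
with_policy = candidate_datastores(inventory, ["zone-a", "zone-b"], POLICY)

print(zones_only)   # ['datastore-A', 'datastore-B', 'datastore-AB']
print(with_policy)  # ['datastore-AB']
```

In this model, the zone parameters alone leave three candidates, and adding the storage policy narrows the choice to the shared datastore.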

Deploy Workloads with WaitForFirstConsumer Mode in Topology-Aware Environment

vSphere Container Storage Plug-in supports topology-aware volume provisioning with WaitForFirstConsumer. In this mode, Kubernetes delays volume provisioning until a pod that uses the PVC is scheduled, so the volume is dynamically provisioned in a zone where the pod can run. In multi-zone clusters, this lets you deploy and scale stateful workloads across failure domains to provide high availability and fault tolerance.

Prerequisites

Enable topology in the native Kubernetes cluster in your vSphere environment. For more information, see Deploy the vSphere Container Storage Plug-in with Topology.

Procedure

  1. Create a StorageClass with the volumeBindingMode parameter set to WaitForFirstConsumer.
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: topology-aware-standard
    provisioner: csi.vsphere.vmware.com
    volumeBindingMode: WaitForFirstConsumer
  2. Create an application to use the StorageClass created previously.

    Instead of creating a volume immediately, the WaitForFirstConsumer setting instructs the volume provisioner to wait until a pod using the associated PVC runs through scheduling. In contrast with the Immediate volume binding mode, the Kubernetes scheduler uses the pod policies, such as the node affinity and pod anti-affinity rules in the following example, to decide which failure domain to use for volume provisioning.

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: web
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: nginx
      serviceName: nginx
      template:
        metadata:
          labels:
            app: nginx
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: topology.csi.vmware.com/k8s-zone
                    operator: In
                    values:
                    - zone-a
                    - zone-b
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                  - key: app
                    operator: In
                    values:
                    - nginx
                topologyKey: topology.csi.vmware.com/k8s-zone
          containers:
            - name: nginx
              image: gcr.io/google_containers/nginx-slim:0.8
              ports:
                - containerPort: 80
                  name: web
              volumeMounts:
                - name: www
                  mountPath: /usr/share/nginx/html
                - name: logs
                  mountPath: /logs
      volumeClaimTemplates:
        - metadata:
            name: www
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: topology-aware-standard
            resources:
              requests:
                storage: 2Gi
        - metadata:
            name: logs
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: topology-aware-standard
            resources:
              requests:
                storage: 1Gi
  3. Verify that all replicas of the StatefulSet are ready and that the pods are evenly distributed between zone-a and zone-b.
    $ kubectl get statefulset
    NAME   READY   AGE
    web    2/2     3m51s
     
    $ kubectl get pods -o wide
    NAME                      READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
    web-0                     1/1     Running   0          4m40s   10.244.3.21   k8s-node-2   <none>           <none>
    web-1                     1/1     Running   0          4m12s   10.244.4.25   k8s-node-1   <none>           <none>
     
    $ kubectl get node k8s-node-1 k8s-node-2 --show-labels
    NAME         STATUS   ROLES    AGE  VERSION   LABELS
    k8s-node-1   Ready    <none>   2d   v1.21.1   topology.csi.vmware.com/k8s-region=region-1,topology.csi.vmware.com/k8s-zone=zone-a
    k8s-node-2   Ready    <none>   2d   v1.21.1   topology.csi.vmware.com/k8s-region=region-1,topology.csi.vmware.com/k8s-zone=zone-b
    
    Notice that the PV node affinity rules include at least one domain within zone-a or zone-b, depending on whether the selected datastore is local or shared across zones.
    $ kubectl get pv -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.claimRef.name}{"\t"}{.spec.nodeAffinity}{"\n"}{end}'
     pvc-2253dc52-a9ed-11e9-b26e-005056a04307    www-web-0    map[required:map[nodeSelectorTerms:[map[matchExpressions:[map[key:topology.csi.vmware.com/k8s-region operator:In values:[region-1]] map[operator:In values:[zone-b] key:topology.csi.vmware.com/k8s-zone]]]]]]
     pvc-22575240-a9ed-11e9-b26e-005056a04307    logs-web-0    map[required:map[nodeSelectorTerms:[map[matchExpressions:[map[key:topology.csi.vmware.com/k8s-zone operator:In values:[zone-b]] map[key:topology.csi.vmware.com/k8s-region operator:In values:[region-1]]]]]]]
     pvc-3c963150-a9ed-11e9-b26e-005056a04307    www-web-1    map[required:map[nodeSelectorTerms:[map[matchExpressions:[map[key:topology.csi.vmware.com/k8s-zone operator:In values:[zone-a]] map[operator:In values:[region-1] key:topology.csi.vmware.com/k8s-region]]]]]]
     pvc-3c98978f-a9ed-11e9-b26e-005056a04307    logs-web-1    map[required:map[nodeSelectorTerms:[map[matchExpressions:[map[key:topology.csi.vmware.com/k8s-zone operator:In values:[zone-a]] map[key:topology.csi.vmware.com/k8s-region operator:In values:[region-1]]]]]]]
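
If the jsonpath output is hard to read, a short script can summarize the same information. The following Python sketch assumes a PV list in the structure that kubectl get pv -o json returns and maps each claim to the zones in its PV node affinity. The function names zones_by_claim and sample_pv are hypothetical, and the sample data is a trimmed-down stand-in for the four PVs in this example rather than real cluster output.

```python
ZONE_KEY = "topology.csi.vmware.com/k8s-zone"
REGION_KEY = "topology.csi.vmware.com/k8s-region"


def zones_by_claim(pv_list):
    """Map each bound claim name to the sorted zones in its PV node affinity."""
    summary = {}
    for pv in pv_list.get("items", []):
        spec = pv["spec"]
        claim = spec.get("claimRef", {}).get("name")
        if claim is None:
            continue  # skip PVs that are not bound to a claim
        terms = (
            spec.get("nodeAffinity", {})
            .get("required", {})
            .get("nodeSelectorTerms", [])
        )
        zones = set()
        for term in terms:
            for expr in term.get("matchExpressions", []):
                if expr["key"] == ZONE_KEY:
                    zones.update(expr["values"])
        summary[claim] = sorted(zones)
    return summary


def sample_pv(claim, zone):
    """Build a trimmed-down PV object for the demo (hypothetical helper)."""
    return {
        "spec": {
            "claimRef": {"name": claim},
            "nodeAffinity": {"required": {"nodeSelectorTerms": [{
                "matchExpressions": [
                    {"key": ZONE_KEY, "operator": "In", "values": [zone]},
                    {"key": REGION_KEY, "operator": "In", "values": ["region-1"]},
                ]
            }]}},
        }
    }


pv_list = {"items": [
    sample_pv("www-web-0", "zone-b"),
    sample_pv("logs-web-0", "zone-b"),
    sample_pv("www-web-1", "zone-a"),
    sample_pv("logs-web-1", "zone-a"),
]}

summary = zones_by_claim(pv_list)
for claim, zones in summary.items():
    print(claim, zones)
```

To run the summary against a live cluster, you could replace the sample data with json.load(sys.stdin) and pipe the output of kubectl get pv -o json into the script. Both volumes of each pod land in the same zone as the pod: zone-b for web-0 and zone-a for web-1.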