This topic describes how you can maintain workload uptime for Kubernetes clusters deployed with VMware Tanzu Kubernetes Grid Integrated Edition (TKGI).
To maintain workload uptime, configure your deployment manifest to run multiple workload replicas and to define an anti-affinity rule that spreads those replicas across worker VMs, as described in the sections below.
To increase uptime, you can also refer to the documentation for the services that run on your clusters, and configure your workload based on the recommendations of the software vendor.
The Tanzu Kubernetes Grid Integrated Edition tile contains an errand that upgrades all Kubernetes clusters. Upgrades run on a single VM at a time. While a worker VM is being upgraded, the workloads on that VM go down. The cluster's remaining worker VMs continue to run replicas of your workload, maintaining uptime.
Note: Ensure that your pods are bound to a ReplicaSet or Deployment. Naked pods are not rescheduled in the event of a node failure. For more information, see Configuration Best Practices in the Kubernetes documentation.
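For example, the following sketch shows what a naked pod looks like. This is an illustration only; `MY-APP`, `MY-IMAGE`, and `APP-NAME` are placeholder names that match the placeholders used in the manifests later in this topic.

```
# Illustration only: a "naked" pod defined directly, without a Deployment
# or ReplicaSet. If the node running this pod fails, Kubernetes does not
# reschedule it on another node.
apiVersion: v1
kind: Pod
metadata:
  name: MY-APP
  labels:
    app: APP-NAME
spec:
  containers:
  - name: MY-APP
    image: MY-IMAGE
    ports:
    - containerPort: 12345
```

The Deployment manifests in the following sections bind the same pod template to a controller that replicates and reschedules its pods.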
Upgrading a cluster with only a single control plane or worker VM results in a workload outage.
To prevent workload downtime during a cluster upgrade, VMware recommends the following:
Set the number of workload replicas so that your workload can continue to handle traffic during rolling upgrades. To replicate your workload on additional worker VMs, deploy the workload using a Deployment or ReplicaSet.
Edit the `spec.replicas` value in your deployment manifest:
```
kind: Deployment
metadata:
  # ...
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: APP-NAME
```
See the following table for more information about this section of the manifest:
Key-Value Pair | Description |
---|---|
`replicas:` | Set this value to at least `3` to have at least three instances of your workload running at any time. |
`app: APP-NAME` | Use this app name when you define the anti-affinity rule later in the spec. |
To distribute your workload across multiple worker VMs, you must use anti-affinity rules. If you do not define an anti-affinity rule, the replicated pods can be assigned to the same worker node. See the Kubernetes documentation for more information about anti-affinity rules.
To define an anti-affinity rule, add the `spec.template.spec.affinity` section to your deployment manifest:
```
kind: Deployment
metadata:
  # ...
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: APP-NAME
    spec:
      containers:
      - name: MY-APP
        image: MY-IMAGE
        ports:
        - containerPort: 12345
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - APP-NAME
            topologyKey: "kubernetes.io/hostname"
```
See the following table for more information:
Key-Value Pair | Description |
---|---|
`podAntiAffinity:` | This rule prevents the replicated pods from being scheduled on the same worker node. |
`matchExpressions:` | This value matches `spec.template.metadata.labels.app`. |
`values:` | This value matches the `APP-NAME` you defined earlier in the spec. |
Kubernetes evenly spreads pods in a replication controller over multiple Availability Zones (AZs). For more granular control over pod scheduling, modify the anti-affinity rule in the deployment spec to operate on AZs instead of hostnames by replacing `"kubernetes.io/hostname"` with `"failure-domain.beta.kubernetes.io/zone"` as the `topologyKey`.
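For example, with that change the anti-affinity rule shown earlier would look like the following sketch; only the `topologyKey` value differs:

```
# The same anti-affinity rule, placed under spec.template.spec, keyed on
# the AZ label instead of the hostname label.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: "app"
          operator: In
          values:
          - APP-NAME
      topologyKey: "failure-domain.beta.kubernetes.io/zone"
```

With this `topologyKey`, the scheduler avoids placing two replicas with the same `app` label in the same AZ, rather than on the same worker VM.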
For more information about scheduling pods, see Advanced Scheduling in Kubernetes on the Kubernetes Blog.
If an AZ goes down, PersistentVolumes (PVs) and their data also go down and cannot be automatically re-attached. To preserve your PV data in the event of an AZ failure, your persistent workload must have a failover mechanism in place.
Depending on the underlying storage type, PVs are either completely free of zonal information or can have multiple AZ labels attached. Both options enable a PV to travel between AZs.
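For example, a zone-aware PV might look like the following sketch. This is an illustration only: the `failure-domain.beta.kubernetes.io/zone` label with a `__`-separated multi-zone value and the `vsphereVolume` source are assumptions about your storage provisioner and Kubernetes version, and `MY-PV`, `ZONE-1`, `ZONE-2`, and the datastore path are placeholders.

```
# Illustration only: a PV labeled with two AZs so that it is not pinned
# to a single zone. The label key and value format depend on your
# provisioner and Kubernetes version.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: MY-PV
  labels:
    failure-domain.beta.kubernetes.io/zone: ZONE-1__ZONE-2
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  vsphereVolume:
    volumePath: "[DATASTORE] kubevols/MY-DISK.vmdk"
    fsType: ext4
```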
To ensure the uptime of your PVs during a cluster upgrade, VMware recommends that you run at least two nodes per AZ. If you configure your workload as suggested above, Kubernetes can reschedule pods onto the other node in the same AZ while BOSH upgrades the first node.
For information about configuring PVs in Tanzu Kubernetes Grid Integrated Edition, see Configuring and Using PersistentVolumes.
For information about the supported storage topologies for Tanzu Kubernetes Grid Integrated Edition on vSphere, see PersistentVolume Storage Options on vSphere.