Restart and Delete Clusters

This topic explains how to shut down and restart workload clusters, and how to delete them.

Shut Down and Restart Workload Clusters

You may need to shut down and restart workload clusters to accommodate planned outages, such as network maintenance or other scheduled network downtime.

Prerequisites

- The DHCP lease and reservation for all cluster nodes on all networks must be longer than your planned outage. This ensures that the nodes keep the same IP addresses after they reboot, which is required for the cluster to come back up.
- jq must be installed locally.
- Back up NSX Manager.
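The tooling prerequisite can be checked before you start. The following is a minimal sketch, not part of the product: the helper name missing_tools is hypothetical, and the tool list in the usage comment assumes the commands used later in this procedure.

```shell
# Print the name of every listed command that is not found on the PATH.
# missing_tools is a hypothetical helper, not part of the Tanzu CLI.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example usage: report any tool this procedure needs that is not installed.
# missing_tools jq kubectl tanzu
```

If the function prints nothing, every listed command was found.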

Procedure

Run the following command to collect information about your etcd database:

kubectl --kubeconfig /etc/kubernetes/admin.conf get pods `kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -A | grep etcd | awk '{print $2}'` -n kube-system -o=jsonpath='{.spec.containers[0].command}' | jq

Example output:

[
  "etcd",
  "--advertise-client-urls=https://192.168.7.154:2379",
  "--cert-file=/etc/kubernetes/pki/etcd/server.crt",
  "--client-cert-auth=true",
  "--data-dir=/var/lib/etcd",
  "--initial-advertise-peer-urls=https://192.168.7.154:2380",
  "--initial-cluster=workload-vsphere-tkg2-control-plane-fk5hw=https://192.168.7.154:2380",
  "--key-file=/etc/kubernetes/pki/etcd/server.key",
  "--listen-client-urls=https://127.0.0.1:2379,https://192.168.7.154:2379",
  "--listen-metrics-urls=http://127.0.0.1:2381",
  "--listen-peer-urls=https://192.168.7.154:2380",
  "--name=workload-vsphere-tkg2-control-plane-fk5hw",
  "--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt",
  "--peer-client-cert-auth=true",
  "--peer-key-file=/etc/kubernetes/pki/etcd/peer.key",
  "--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt",
  "--snapshot-count=10000",
  "--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt"
]
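Several of these flag values are needed again when you create the etcd backup. As a rough sketch, assuming you saved the pretty-printed jq output above to a file (the path /tmp/etcd-command.json and the helper name etcd_flag are hypothetical), a single flag value can be pulled out with sed:

```shell
# Print the value of one etcd flag from the pretty-printed JSON array on stdin.
# $1 is the flag name without the leading dashes, for example cert-file.
# etcd_flag is a hypothetical helper; it assumes one array element per line,
# which is how jq prints the array by default.
etcd_flag() {
  sed -n "s/^ *\"--$1=\(.*\)\",\{0,1\}\$/\1/p"
}

# Example usage, assuming the output was saved to /tmp/etcd-command.json:
# etcd_flag advertise-client-urls < /tmp/etcd-command.json
```

Because the pattern is anchored at the opening quote, cert-file does not also match peer-cert-file.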

For each control plane node:

Run the ssh command to log in to the node.

Run the following command to locate its etcdctl executable.

find / -type f -name "*etcdctl*" -print

Example output:

/run/containerd/io.containerd.runtime.v1.linux/k8s.io/823581f975804b65048f4babe2015a95cfa7ed6f767073796afe47b9d03299fb/rootfs/usr/local/bin/etcdctl
/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/19/fs/usr/local/bin/etcdctl

Create an etcd backup file and verify.

Run the following command:

ETCD-EXE snapshot save LOCAL-BACKUP --endpoints=ENDPOINTS --cacert=CA --cert=CERT --key=KEY

Where:

ETCD-EXE is the local path to the etcdctl executable
LOCAL-BACKUP is the local file to back up to, for example /tmp/etcdBackup1.db
ENDPOINTS, CA, CERT, and KEY are the --advertise-client-urls, --peer-trusted-ca-file, --cert-file, and --key-file values recorded above

For example:

/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/19/fs/usr/local/bin/etcdctl snapshot save /tmp/etcdBackup1.db \
 --endpoints=https://192.168.7.154:2379 \
 --cacert=/etc/kubernetes/pki/etcd/ca.crt \
 --cert=/etc/kubernetes/pki/etcd/server.crt \
 --key=/etc/kubernetes/pki/etcd/server.key

Verify that the backup file was created:
```
ls -l  LOCAL-BACKUP
```
Verify the backup content by displaying the snapshot status of the file:
```
ETCD-EXE --write-out=table snapshot status LOCAL-BACKUP
```
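If you script the backup, the file check can be turned into a guard that stops before any node is shut down. This is only a sketch: backup_is_usable is a hypothetical helper, and it confirms only that the file exists and is not empty, not that the snapshot content is valid.

```shell
# Succeed only if the backup file exists and has a size greater than zero.
# backup_is_usable is a hypothetical helper; it does not validate the
# snapshot content, so the snapshot status command is still worth running.
backup_is_usable() {
  [ -s "$1" ]
}

# Example usage with the backup path from the example above:
# backup_is_usable /tmp/etcdBackup1.db || echo "backup missing or empty" >&2
```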

From your bootstrap machine, run the following sequence of commands to collect cluster information and save it to a file:

  tanzu cluster list -A > CLUSTER-INFO-1
  kubectl config get-contexts >> CLUSTER-INFO-1
  kubectl config use-context tkg-mgmt-vsphere-20211111074850-admin@tkg-mgmt-vsphere-20211111074850 >> CLUSTER-INFO-1
  kubectl get nodes -o wide >> CLUSTER-INFO-1
  kubectl config use-context mycluster1-admin@mycluster1 >> CLUSTER-INFO-1
  kubectl get nodes -o wide >> CLUSTER-INFO-1
  cat CLUSTER-INFO-1

Where CLUSTER-INFO-1 is a local text file to save the information to, for example /tmp/SaveClusterInfo1.txt.
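Because the same sequence is run again after the restart, it can help to wrap it in a function so that both captures use identical commands. A minimal sketch, assuming the function name collect_cluster_info (hypothetical) and that you pass your own context names as arguments:

```shell
# Write cluster and node information to a file.
# $1 is the output file; the remaining arguments are kubectl context names.
# collect_cluster_info is a hypothetical helper that wraps the commands
# listed above; it is not part of the Tanzu CLI.
collect_cluster_info() {
  out_file=$1
  shift
  tanzu cluster list -A > "$out_file"
  kubectl config get-contexts >> "$out_file"
  for ctx in "$@"; do
    kubectl config use-context "$ctx" >> "$out_file"
    kubectl get nodes -o wide >> "$out_file"
  done
}

# Example usage with the context names from the example above:
# collect_cluster_info /tmp/SaveClusterInfo1.txt \
#   tkg-mgmt-vsphere-20211111074850-admin@tkg-mgmt-vsphere-20211111074850 \
#   mycluster1-admin@mycluster1
```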

Drain all applications on the worker nodes.
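One possible way to drain the worker nodes is with kubectl drain. The sketch below rests on assumptions: the function name drain_workers is hypothetical, worker nodes are taken to be the nodes without the node-role.kubernetes.io/control-plane label (older clusters may use a different role label), and the drain options shown may need adjusting for your workloads.

```shell
# Drain every node that does not carry the control-plane role label.
# drain_workers is a hypothetical helper. Run it with kubectl already set
# to the context of the cluster you are about to shut down.
drain_workers() {
  kubectl get nodes --no-headers \
    --selector='!node-role.kubernetes.io/control-plane' \
    -o custom-columns=NAME:.metadata.name |
  while read -r node; do
    # Redirect stdin so the drain command cannot consume the node list.
    kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data </dev/null
  done
}

# Example usage:
# drain_workers
```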
Stop all virtual machines on vCenter in the following order:
1. Shut down management cluster control plane nodes.
2. Shut down management cluster worker nodes.
3. Shut down workload cluster control plane nodes.
4. Shut down workload cluster worker nodes.
Restart all virtual machines on vCenter in the following order:
1. Start workload cluster control plane nodes.
2. Start workload cluster worker nodes.
3. Start management cluster control plane nodes.
4. Start management cluster worker nodes.

Run the following sequence of commands to collect cluster information and save it to a different file:

tanzu cluster list -A > CLUSTER-INFO-2
kubectl config get-contexts >> CLUSTER-INFO-2
kubectl config use-context tkg-mgmt-vsphere-20211111074850-admin@tkg-mgmt-vsphere-20211111074850 >> CLUSTER-INFO-2
kubectl get nodes -o wide >> CLUSTER-INFO-2
kubectl config use-context mycluster1-admin@mycluster1 >> CLUSTER-INFO-2
kubectl get nodes -o wide >> CLUSTER-INFO-2
cat CLUSTER-INFO-2

Where CLUSTER-INFO-2 is a different local text file to save the information to, for example /tmp/SaveClusterInfo2.txt.

Compare the two cluster information files to verify that they have the same cluster information, for example:
```
sdiff /tmp/SaveClusterInfo1.txt /tmp/SaveClusterInfo2.txt
```
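For a scripted check, diff can report only the lines that changed. This is a sketch under assumptions: report_cluster_info_changes is a hypothetical helper, and time-dependent columns such as the node AGE reported by kubectl get nodes are expected to differ between the two captures, so some reported lines may be harmless.

```shell
# Print the lines that differ between two cluster information files.
# Returns 0 when the files are identical and 1 otherwise.
# report_cluster_info_changes is a hypothetical helper.
report_cluster_info_changes() {
  if diff "$1" "$2"; then
    echo "no differences"
  else
    echo "review the differences above" >&2
    return 1
  fi
}

# Example usage with the file names from the example above:
# report_cluster_info_changes /tmp/SaveClusterInfo1.txt /tmp/SaveClusterInfo2.txt
```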

Delete Workload Clusters

To delete a workload cluster, run the tanzu cluster delete command. Depending on the cluster contents and cloud infrastructure, you may need to delete in-cluster volumes and services before you delete the cluster itself.

Important
You must delete workload clusters explicitly; you cannot delete them by deleting their namespace in the management cluster.

List the clusters.

To list all workload clusters within the Tanzu CLI’s current login context, run the tanzu cluster list -A command.
```
tanzu cluster list -A
```
Delete volumes and services. If the cluster you want to delete contains persistent volumes or services such as load balancers and databases, you may need to manually delete them before you delete the cluster itself:
- Load Balancer: See Delete Service type LoadBalancer.
- Persistent Volumes and Persistent Volume Claims: See Delete Persistent Volume Claims and Persistent Volumes.
Delete service type LoadBalancer

To delete Service type LoadBalancer (Service) in a cluster:
1. Set kubectl to the cluster’s context.
```
kubectl config use-context my-cluster@user
```
2. Retrieve the cluster’s list of services.
```
kubectl get service
```
3. Delete each Service type LoadBalancer.
```
kubectl delete service <my-svc>
```
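In a cluster with many services, it can be easier to list only the ones of type LoadBalancer before deleting them. A minimal sketch, assuming kubectl is already set to the cluster's context; the function name list_lb_services is hypothetical.

```shell
# Print namespace/name for every Service of type LoadBalancer.
# list_lb_services is a hypothetical helper; it only lists services and
# does not delete anything.
list_lb_services() {
  kubectl get service --all-namespaces --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,TYPE:.spec.type |
    awk '$3 == "LoadBalancer" { print $1 "/" $2 }'
}

# Example usage:
# list_lb_services
```

Each reported service can then be deleted with kubectl delete service, adding the --namespace option when the service is not in the current namespace.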
Delete persistent volumes and persistent volume claims

To delete Persistent Volume (PV) and Persistent Volume Claim (PVC) objects in a cluster:
1. Run kubectl config use-context my-cluster@user to set kubectl to the cluster’s context.
2. Run kubectl get pvc to retrieve the cluster’s Persistent Volume Claims (PVCs).
3. For each PVC:
  1. Run kubectl describe pvc <my-pvc> to identify the PV it is bound to. The PV is listed in the command output as Volume, after Status: Bound.
  2. Run kubectl describe pv <my-pv> to determine whether the Reclaim Policy of the bound PV is Retain or Delete.
  3. Run kubectl delete pvc <my-pvc> to delete the PVC.
  4. If the PV reclaim policy is Retain, run kubectl delete pv <my-pv>, then log in to your cloud portal and delete the PV object there. For example, delete a vSphere CNS volume from your datastore pane > Monitor > Cloud Native Storage > Container Volumes. For more information about vSphere CNS, see Getting Started with Cloud Native Storage in vSphere.
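To see at a glance which PVs need the manual cleanup described above, the reclaim policy can be read for all PVs at once. A minimal sketch, assuming kubectl is already set to the cluster's context; the function name list_retained_pvs is hypothetical.

```shell
# Print the name of every Persistent Volume whose reclaim policy is Retain.
# list_retained_pvs is a hypothetical helper; it only lists PVs and does
# not delete anything.
list_retained_pvs() {
  kubectl get pv --no-headers \
    -o custom-columns=NAME:.metadata.name,POLICY:.spec.persistentVolumeReclaimPolicy |
    awk '$2 == "Retain" { print $1 }'
}

# Example usage:
# list_retained_pvs
```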
If needed, migrate workloads off the clusters, for example by using Velero as described in Cluster Migration and Resource Filtering in the Velero documentation.
Delete the clusters.

To delete a cluster, run tanzu cluster delete.
```
tanzu cluster delete my-cluster
```
If the cluster is running in a namespace other than the default namespace, you must specify the --namespace option to delete that cluster.
```
tanzu cluster delete my-cluster --namespace=my-namespace
```
To skip the yes/no verification step when you run tanzu cluster delete, specify the --yes option.
```
tanzu cluster delete my-cluster --namespace=my-namespace --yes
```

Important
Do not change context or edit the .kube-tkg/config file while Tanzu Kubernetes Grid operations are running.
