This topic explains how to shut down and restart workload clusters, and how to delete them.
You may need to shut down and restart workload clusters to accommodate planned network maintenance or other scheduled network downtime.
Prerequisite: jq installed locally.
Run the following command to collect information about your etcd database:
kubectl --kubeconfig /etc/kubernetes/admin.conf get pods `kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -A | grep etc | awk '{print $2}'` -n kube-system -o=jsonpath='{.spec.containers[0].command}' | jq
Example output:
[
"etcd",
"--advertise-client-urls=https://192.168.7.154:2379",
"--cert-file=/etc/kubernetes/pki/etcd/server.crt",
"--client-cert-auth=true",
"--data-dir=/var/lib/etcd",
"--initial-advertise-peer-urls=https://192.168.7.154:2380",
"--initial-cluster=workload-vsphere-tkg2-control-plane-fk5hw=https://192.168.7.154:2380",
"--key-file=/etc/kubernetes/pki/etcd/server.key",
"--listen-client-urls=https://127.0.0.1:2379,https://192.168.7.154:2379",
"--listen-metrics-urls=http://127.0.0.1:2381",
"--listen-peer-urls=https://192.168.7.154:2380",
"--name=workload-vsphere-tkg2-control-plane-fk5hw",
"--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt",
"--peer-client-cert-auth=true",
"--peer-key-file=/etc/kubernetes/pki/etcd/peer.key",
"--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt",
"--snapshot-count=10000",
"--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt"
]
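If you only need the flag values used later in the backup command, you can narrow the same output with a jq filter. This is an optional convenience rather than part of the documented procedure, and the filter pattern is an assumption about which flags to keep:
# Same command as above, but print only the flags needed for the etcdctl backup.
kubectl --kubeconfig /etc/kubernetes/admin.conf get pods `kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -A | grep etc | awk '{print $2}'` -n kube-system -o=jsonpath='{.spec.containers[0].command}' | jq -r '.[] | select(test("advertise-client-urls|trusted-ca-file|cert-file|key-file"))'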
For each control plane node:
Run the ssh command to log in to the node.
Run the following command to locate its etcdctl executable:
find / -type f -name "*etcdctl*" -print
Example output:
/run/containerd/io.containerd.runtime.v1.linux/k8s.io/823581f975804b65048f4babe2015a95cfa7ed6f767073796afe47b9d03299fb/rootfs/usr/local/bin/etcdctl
/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/19/fs/usr/local/bin/etcdctl
Create an etcd backup file and verify it.
Run the following command:
ETCD-EXE snapshot save LOCAL-BACKUP --endpoints=ENDPOINTS --cacert=CA --cert=CERT --key=KEY
Where:
ETCD-EXE is the local path to the etcdctl executable
LOCAL-BACKUP is the local file to back up to, for example /tmp/etcdBackup1.db
ENDPOINTS, CA, CERT, and KEY are the --advertise-client-urls, --peer-trusted-ca-file, --cert-file, and --key-file values recorded above
For example:
/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/19/fs/usr/local/bin/etcdctl snapshot save /tmp/etcdBackup1.db \
--endpoints=https://192.168.7.154:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key
Verify that the backup file was created:
ls -l LOCAL-BACKUP
Verify the backup content by checking the status of the snapshot file:
ETCD-EXE --write-out=table snapshot status LOCAL-BACKUP
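For example, using the etcdctl path and backup file from the example above (your paths may differ):
ls -l /tmp/etcdBackup1.db
/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/19/fs/usr/local/bin/etcdctl --write-out=table snapshot status /tmp/etcdBackup1.db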
From your bootstrap machine, run the following sequence of commands to collect cluster information and save it to a file:
tanzu cluster list -A > CLUSTER-INFO-1
kubectl config get-contexts >> CLUSTER-INFO-1
kubectl config use-context tkg-mgmt-vsphere-20211111074850-admin@tkg-mgmt-vsphere-20211111074850 >> CLUSTER-INFO-1
kubectl get nodes -o wide >> CLUSTER-INFO-1
kubectl config use-context mycluster1-admin@mycluster1 >> CLUSTER-INFO-1
kubectl get nodes -o wide >> CLUSTER-INFO-1
cat CLUSTER-INFO-1
Where CLUSTER-INFO-1 is a local text file to save the information to, for example /tmp/SaveClusterInfo1.txt.
Drain all applications from the worker nodes.
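For example, you can drain each worker node with kubectl drain. This is a sketch only; WORKER-NODE-NAME is a placeholder, and the exact flags you need depend on your workloads:
# Evict pods from a worker node; DaemonSet pods are ignored and emptyDir data is discarded.
kubectl drain WORKER-NODE-NAME --ignore-daemonsets --delete-emptydir-data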
Stop all virtual machines on vCenter in the following order:
Restart all virtual machines on vCenter in the following order:
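If you prefer the command line to the vCenter UI for the stop and restart steps, the govc CLI can power VMs off and on. A minimal sketch, assuming govc is installed and configured with GOVC_URL, GOVC_USERNAME, and GOVC_PASSWORD; the VM name is hypothetical:
# Shut down the guest OS of a node VM, then check its power state.
govc vm.power -s=true workload-vsphere-tkg2-control-plane-fk5hw
govc vm.info workload-vsphere-tkg2-control-plane-fk5hw
# Power the VM back on during the restart phase.
govc vm.power -on=true workload-vsphere-tkg2-control-plane-fk5hw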
Run the following sequence of commands to collect cluster information and save it to a different file:
tanzu cluster list -A > CLUSTER-INFO-2
kubectl config get-contexts >> CLUSTER-INFO-2
kubectl config use-context tkg-mgmt-vsphere-20211111074850-admin@tkg-mgmt-vsphere-20211111074850 >> CLUSTER-INFO-2
kubectl get nodes -o wide >> CLUSTER-INFO-2
kubectl config use-context mycluster1-admin@mycluster1 >> CLUSTER-INFO-2
kubectl get nodes -o wide >> CLUSTER-INFO-2
cat CLUSTER-INFO-2
Where CLUSTER-INFO-2 is a different local text file to save the information to, for example /tmp/SaveClusterInfo2.txt.
Compare the two cluster information files to verify that they have the same cluster information, for example:
sdiff /tmp/SaveClusterInfo1.txt /tmp/SaveClusterInfo2.txt
To delete a workload cluster, run the tanzu cluster delete command. Depending on the cluster contents and cloud infrastructure, you may need to delete in-cluster volumes and services before you delete the cluster itself.
Important: You must delete workload clusters explicitly; you cannot delete them by deleting their namespace in the management cluster.
List the clusters.
To list all workload clusters within the Tanzu CLI's current login context, run the tanzu cluster list -A command.
tanzu cluster list -A
Delete volumes and services.
If the cluster you want to delete contains persistent volumes or services such as load balancers and databases, you may need to manually delete them before you delete the cluster itself. What you need to pre-delete depends on your cloud infrastructure:
To delete Services of type LoadBalancer in a cluster:
Set kubectl to the cluster's context.
kubectl config use-context my-cluster@user
Retrieve the cluster’s list of services.
kubectl get service
Delete each Service of type LoadBalancer.
kubectl delete service <my-svc>
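To see only the Services of type LoadBalancer across all namespaces, you can filter the output with a jsonpath expression. This filter is an illustration rather than part of the documented steps:
# Print the namespace and name of every Service of type LoadBalancer.
kubectl get services --all-namespaces -o jsonpath='{range .items[?(@.spec.type=="LoadBalancer")]}{.metadata.namespace}{"\t"}{.metadata.name}{"\n"}{end}'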
To delete Persistent Volume (PV) and Persistent Volume Claim (PVC) objects in a cluster:
Run kubectl config use-context my-cluster@user to set kubectl to the cluster's context.
Run kubectl get pvc to retrieve the cluster's Persistent Volume Claims (PVCs).
For each PVC:
Run kubectl describe pvc <my-pvc> to identify the PV it is bound to. The PV is listed in the command output as Volume, after Status: Bound.
Run kubectl describe pv <my-pv> to determine whether its Reclaim Policy is Retain or Delete.
Run kubectl delete pvc <my-pvc> to delete the PVC.
If the PV reclaim policy is Retain, run kubectl delete pv <my-pv> and then log in to your cloud portal and delete the PV object there. For example, delete a vSphere CNS volume from your datastore pane > Monitor > Cloud Native Storage > Container Volumes. For more information about vSphere CNS, see Getting Started with Cloud Native Storage in vSphere.
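To review all PVCs, their bound PVs, and each PV's reclaim policy in one pass before deleting anything, you can use custom columns. A minimal sketch, assuming kubectl is already set to the workload cluster's context:
# List every PVC with the PV it is bound to.
kubectl get pvc --all-namespaces -o custom-columns='NAMESPACE:.metadata.namespace,PVC:.metadata.name,PV:.spec.volumeName'
# List every PV with its reclaim policy and the claim that binds it.
kubectl get pv -o custom-columns='PV:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,CLAIM:.spec.claimRef.name'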
Delete these resources in the AWS UI or with the kubectl delete command.
Other Services: Any subnet and AWS-backed service in the cluster's VPC, such as an RDS or VPC, and related resources.
Delete these resources in the AWS UI as above or with the aws CLI (see the example sketch after this list).
Persistent Volumes and Persistent Volume Claims: Delete these resources with the kubectl delete command as described in Delete Persistent Volume Claims and Persistent Volumes, below.
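For example, you can use the aws CLI to locate the cluster's VPC before cleaning up the resources inside it. This is a sketch only; the region, cluster name, and the Cluster API Provider AWS tag convention shown here are assumptions to verify against your environment:
# Find the VPC created for the cluster (the tag key shown is an assumption).
aws ec2 describe-vpcs --region us-east-1 --filters "Name=tag-key,Values=sigs.k8s.io/cluster-api-provider-aws/cluster/my-cluster"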
Azure: Deleting a cluster deletes everything that TKG created in the cluster's resource group.
If needed, migrate workloads off of the clusters, for example by using Velero as described in Cluster Migration and Resource Filtering in the Velero documentation.
Delete the clusters.
To delete a cluster, run tanzu cluster delete.
tanzu cluster delete my-cluster
If the cluster is running in a namespace other than the default namespace, you must specify the --namespace option to delete that cluster.
tanzu cluster delete my-cluster --namespace=my-namespace
To skip the yes/no verification step when you run tanzu cluster delete, specify the --yes option.
tanzu cluster delete my-cluster --namespace=my-namespace --yes
To delete a cluster on AWS, the AWS_REGION variable must be set to the region where the cluster is running. You can set AWS_REGION in the local environment or credential profile, as described in Configure AWS Account Credentials. To delete the cluster in a different region, prepend the setting to the tanzu cluster delete command:
AWS_REGION=eu-west-1 tanzu cluster delete my-cluster
Important: Do not change context or edit the .kube-tkg/config file while Tanzu Kubernetes Grid operations are running.