Shut Down and Restart Clusters

This topic explains how to shut down and restart Tanzu Kubernetes Grid clusters.

You may need to follow this procedure to accommodate planned outages, such as network maintenance windows or other scheduled downtime.

Prerequisites

  • The DHCP lease and reservation for all cluster nodes on all networks must be longer than your planned outage.
    • This ensures that the nodes retain the same IP addresses after they reboot, which is required for the clusters to recover.
  • The jq utility installed locally.
  • A recent backup of NSX Manager.

Procedure

  1. Run the following command to collect information about your etcd database:

    kubectl --kubeconfig /etc/kubernetes/admin.conf get pods `kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -A | grep etcd | awk '{print $2}'` -n kube-system -o=jsonpath='{.spec.containers[0].command}' | jq
    

    Example output:

    [
    "etcd",
    "--advertise-client-urls=https://192.168.7.154:2379",
    "--cert-file=/etc/kubernetes/pki/etcd/server.crt",
    "--client-cert-auth=true",
    "--data-dir=/var/lib/etcd",
    "--initial-advertise-peer-urls=https://192.168.7.154:2380",
    "--initial-cluster=workload-vsphere-tkg2-control-plane-fk5hw=https://192.168.7.154:2380",
    "--key-file=/etc/kubernetes/pki/etcd/server.key",
    "--listen-client-urls=https://127.0.0.1:2379,https://192.168.7.154:2379",
    "--listen-metrics-urls=http://127.0.0.1:2381",
    "--listen-peer-urls=https://192.168.7.154:2380",
    "--name=workload-vsphere-tkg2-control-plane-fk5hw",
    "--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt",
    "--peer-client-cert-auth=true",
    "--peer-key-file=/etc/kubernetes/pki/etcd/peer.key",
    "--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt",
    "--snapshot-count=10000",
    "--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt"
    ]
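
    If you only need the values used in the backup step below, one possible shortcut (not part of the original procedure) is to append a jq filter that keeps just those flags:

    kubectl --kubeconfig /etc/kubernetes/admin.conf get pods `kubectl --kubeconfig /etc/kubernetes/admin.conf get pods -A | grep etcd | awk '{print $2}'` -n kube-system -o=jsonpath='{.spec.containers[0].command}' | \
      jq -r '.[] | select(startswith("--advertise-client-urls") or startswith("--trusted-ca-file") or startswith("--cert-file") or startswith("--key-file"))'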
    
  2. For each control plane node:

    1. Use SSH to log in to the node.

    2. Run the following command to locate its etcdctl executable:

      find / -type f -name "*etcdctl*" -print
      

      Example output:

      /run/containerd/io.containerd.runtime.v1.linux/k8s.io/823581f975804b65048f4babe2015a95cfa7ed6f767073796afe47b9d03299fb/rootfs/usr/local/bin/etcdctl
      /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/19/fs/usr/local/bin/etcdctl
      
    3. Create an etcd backup:

      ETCD-EXE snapshot save LOCAL-BACKUP --endpoints=ENDPOINTS --cacert=CA --cert=CERT --key=KEY
      

      Where:

      • ETCD-EXE is the local path to the etcdctl executable
      • LOCAL-BACKUP is the local file to back up to, for example /tmp/etcdBackup1.db
      • ENDPOINTS, CA, CERT, and KEY are the --advertise-client-urls, --trusted-ca-file, --cert-file, and --key-file values recorded above

      For example:

      /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/19/fs/usr/local/bin/etcdctl snapshot save /tmp/etcdBackup1.db \
      --endpoints=https://192.168.7.154:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=/etc/kubernetes/pki/etcd/server.crt \
      --key=/etc/kubernetes/pki/etcd/server.key
      
    4. Verify that the backup file was created:

      ls -l LOCAL-BACKUP
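
      For example, with the backup file created above:

      ls -l /tmp/etcdBackup1.db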
      
    5. Verify the backup content by checking the status of the snapshot file:

      ETCD-EXE --write-out=table snapshot status LOCAL-BACKUP
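
      For example, with the etcdctl path and backup file used above:

      /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/19/fs/usr/local/bin/etcdctl --write-out=table snapshot status /tmp/etcdBackup1.db

      The output is a small table with HASH, REVISION, TOTAL KEYS, and TOTAL SIZE columns; non-zero key and size values indicate a usable snapshot.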
      
  3. From your bootstrap machine, run the following sequence of commands to collect cluster information and save it to a file:

    tanzu cluster list --include-management-cluster > CLUSTER-INFO-1
    kubectl config get-contexts >> CLUSTER-INFO-1
    kubectl config use-context tkg-mgmt-vsphere-20211111074850-admin@tkg-mgmt-vsphere-20211111074850 >> CLUSTER-INFO-1
    kubectl get nodes -o wide >> CLUSTER-INFO-1
    kubectl config use-context mycluster1-admin@mycluster1 >> CLUSTER-INFO-1
    kubectl get nodes -o wide >> CLUSTER-INFO-1
    cat CLUSTER-INFO-1
    

    Where CLUSTER-INFO-1 is a local text file to save the information to, for example /tmp/SaveClusterInfo1.txt. The context names shown are examples; use the admin contexts of your own management and workload clusters.

  4. Drain the worker nodes so that all applications running on them are stopped; one approach is sketched below.
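
    One way to drain the workers, sketched here with placeholder names, is to run kubectl drain against each worker node from the cluster's admin context; adjust the flags for your workloads:

    kubectl config use-context mycluster1-admin@mycluster1
    kubectl drain WORKER-NODE-NAME --ignore-daemonsets --delete-emptydir-data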

  5. Stop all virtual machines on vCenter in the following order:

    1. Shut down management cluster control plane nodes
    2. Shut down management cluster worker nodes
    3. Shut down workload cluster control plane nodes
    4. Shut down workload cluster worker nodes
  6. Restart all virtual machines on vCenter in the following order:

    1. Start workload cluster control plane nodes
    2. Start workload cluster worker nodes
    3. Start management cluster control plane nodes
    4. Start management cluster worker nodes
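
    If you manage VM power state from the command line instead of the vSphere Client, a sketch of steps 5 and 6 using the govc CLI might look like the following. This assumes govc is installed and configured (GOVC_URL and credentials), and the VM names are placeholders for your actual node VMs:

    # Step 5: shut down the guest OS of each node VM, in the order listed above
    govc vm.power -s=true MGMT-CONTROL-PLANE-VM
    govc vm.power -s=true MGMT-WORKER-VM
    govc vm.power -s=true WORKLOAD-CONTROL-PLANE-VM
    govc vm.power -s=true WORKLOAD-WORKER-VM

    # Step 6: power the node VMs back on, in the order listed above
    govc vm.power -on=true WORKLOAD-CONTROL-PLANE-VM
    govc vm.power -on=true WORKLOAD-WORKER-VM
    govc vm.power -on=true MGMT-CONTROL-PLANE-VM
    govc vm.power -on=true MGMT-WORKER-VM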
  7. Run the following sequence of commands to collect cluster information and save it to a different file:

    tanzu cluster list --include-management-cluster > CLUSTER-INFO-2
    kubectl config get-contexts >> CLUSTER-INFO-2
    kubectl config use-context tkg-mgmt-vsphere-20211111074850-admin@tkg-mgmt-vsphere-20211111074850 >> CLUSTER-INFO-2
    kubectl get nodes -o wide >> CLUSTER-INFO-2
    kubectl config use-context mycluster1-admin@mycluster1 >> CLUSTER-INFO-2
    kubectl get nodes -o wide >> CLUSTER-INFO-2
    cat CLUSTER-INFO-2
    

    Where CLUSTER-INFO-2 is a different local text file to save the information to, for example /tmp/SaveClusterInfo2.txt.

  8. Compare the two cluster information files to verify that they have the same cluster information, for example:

    sdiff /tmp/SaveClusterInfo1.txt /tmp/SaveClusterInfo2.txt
    