If the update of a Tanzu Kubernetes cluster fails, you can restart the update job and try the update again.

Problem

The update of a Tanzu Kubernetes cluster fails and results in a cluster status of upgradefailed.

Cause

A cluster update can fail for a number of reasons, such as insufficient storage. To restart a failed update job and try the cluster update again, complete the following procedure.

Solution

  1. Log in to the Supervisor Cluster as an administrator. See Connect to the Supervisor Cluster as a vCenter Single Sign-On User.
  2. Look up the update job name, substituting your values for the cluster_namespace and cluster_name variables.
    kubectl get jobs -n vmware-system-tkg -l "run.tanzu.vmware.com/cluster-namespace=${cluster_namespace},cluster.x-k8s.io/cluster-name=${cluster_name}"
  3. Run kubectl proxy so that curl can be used to issue requests.
    kubectl proxy &
    
    You should see Starting to serve on 127.0.0.1:8001.
    Note: You cannot use kubectl to patch or update the .status of a resource.
  4. Using curl, issue the following patch command to raise the .spec.backoffLimit of the job.
    curl -H "Accept: application/json" -H "Content-Type: application/json-patch+json" \
    --request PATCH --data '[{"op": "replace", "path": "/spec/backoffLimit", "value": 8}]' \
    http://127.0.0.1:8001/apis/batch/v1/namespaces/vmware-system-tkg/jobs/${update_job_name}
  5. Using curl, issue the following patch command to clear the .status.conditions of the job so that the Job controller creates new pods.
    curl -H "Accept: application/json" -H "Content-Type: application/json-patch+json" \
    --request PATCH --data '[{"op": "remove", "path": "/status/conditions"}]' \
    http://127.0.0.1:8001/apis/batch/v1/namespaces/vmware-system-tkg/jobs/${update_job_name}/status
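Taken together, the patch steps above can be sketched as a small shell script. The namespace, cluster name, and job name below are placeholder assumptions; substitute the job name returned by kubectl get jobs in step 2. The curl commands are echoed as a dry run so you can review them first; remove the leading echo to execute them against a running kubectl proxy.

```shell
#!/bin/sh
# Placeholder values -- replace with your own (the job name comes from
# the "kubectl get jobs -n vmware-system-tkg ..." lookup in step 2).
cluster_namespace="my-namespace"
cluster_name="my-cluster"
update_job_name="my-cluster-upgrade"

# Base URL served by "kubectl proxy" (default 127.0.0.1:8001).
api="http://127.0.0.1:8001/apis/batch/v1/namespaces/vmware-system-tkg/jobs"
patch_url="${api}/${update_job_name}"
status_url="${patch_url}/status"

# Step 4: raise .spec.backoffLimit so the Job controller retries the update.
echo curl -H "Accept: application/json" -H "Content-Type: application/json-patch+json" \
  --request PATCH --data '[{"op": "replace", "path": "/spec/backoffLimit", "value": 8}]' \
  "${patch_url}"

# Step 5: clear .status.conditions so new pods are created.
echo curl -H "Accept: application/json" -H "Content-Type: application/json-patch+json" \
  --request PATCH --data '[{"op": "remove", "path": "/status/conditions"}]' \
  "${status_url}"
```

After the patches are applied, you can watch the job's pods in the vmware-system-tkg namespace to confirm that the update is retried.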