This section includes tips to help you troubleshoot common problems that you might encounter when installing Tanzu Kubernetes Grid and deploying Tanzu Kubernetes clusters.

Many of these procedures use the kind CLI on your bootstrap machine. To install kind, see Installation in the kind documentation.

Access Management Cluster Deployment Logs

To monitor and troubleshoot management cluster deployments, review:

  • The log file listed in the terminal output after "Logs of the command execution can also be found at...".

  • The log from your cloud provider module for Cluster API. Retrieve the most recent one as follows:

    1. Search your tkg init output for "Bootstrapper created. Kubeconfig:" and copy the kubeconfig file path listed. The file is in ~/.kube-tkg/tmp/.
    2. Run the following, based on your cloud provider:
      • vSphere: kubectl logs deployment.apps/capv-controller-manager -n capv-system manager --kubeconfig </path/to/kubeconfig>
      • Amazon EC2: kubectl logs deployment.apps/capa-controller-manager -n capa-system manager --kubeconfig </path/to/kubeconfig>
      • Azure: kubectl logs deployment.apps/capz-controller-manager -n capz-system manager --kubeconfig </path/to/kubeconfig>
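If you saved the tkg init output to a file, step 1 above can be scripted. The following is a minimal sketch, not part of the tkg CLI. It assumes that the kubeconfig path follows the "Bootstrapper created. Kubeconfig:" marker on the same line and contains no spaces. The function name bootstrap_kubeconfig_path is hypothetical.

```shell
# Print the kubeconfig path(s) that follow the
# "Bootstrapper created. Kubeconfig:" marker in tkg init output.
# Assumption: the path is on the same line as the marker and has no spaces.
# Reads the tkg init output on stdin.
bootstrap_kubeconfig_path() {
  sed -n 's/.*Bootstrapper created\. Kubeconfig: *\([^ ]*\).*/\1/p'
}
```

For example, if the output was saved to a file named tkg-init.log (an illustrative name), bootstrap_kubeconfig_path < tkg-init.log | tail -n 1 prints the most recent path, which you can pass to the --kubeconfig option of the kubectl logs commands above.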

Monitor Workload Cluster Deployments in Cluster API Logs

After running tkg create cluster, you can monitor the deployment process in the Cluster API logs on the management cluster.

To access these logs, follow the steps below:

  1. Set kubeconfig to your management cluster. For example:

    kubectl config use-context my-management-cluster-admin@my-management-cluster
    
  2. Run the following:

    • capi logs:

      kubectl logs deployments/capi-controller-manager -n capi-system manager
      
    • IaaS-specific logs:

      • vSphere: kubectl logs deployments/capv-controller-manager -n capv-system manager
      • Amazon EC2: kubectl logs deployments/capa-controller-manager -n capa-system manager
      • Azure: kubectl logs deployments/capz-controller-manager -n capz-system manager
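The IaaS-specific commands above differ only in the provider prefix, so they can be generated by one helper. The following is a minimal sketch. The function names and the infrastructure keywords vsphere, aws, and azure are illustrative and are not part of the tkg or kubectl CLIs.

```shell
# Map an infrastructure keyword to its Cluster API provider prefix.
# The prefixes (capv, capa, capz) come from the commands listed above;
# the keywords (vsphere, aws, azure) are this sketch's own choice.
capi_provider_prefix() {
  case "$1" in
    vsphere) echo capv ;;
    aws)     echo capa ;;
    azure)   echo capz ;;
    *)       echo "unknown infrastructure: $1" >&2; return 1 ;;
  esac
}

# Print (without running) the kubectl logs command for one provider.
provider_logs_cmd() {
  prefix=$(capi_provider_prefix "$1") || return 1
  echo "kubectl logs deployments/${prefix}-controller-manager -n ${prefix}-system manager"
}
```

For example, provider_logs_cmd aws prints the Amazon EC2 command shown above, which you can then run against the management cluster context.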

Clean Up After an Unsuccessful Management Cluster Deployment

Problem

An unsuccessful attempt to deploy a Tanzu Kubernetes Grid management cluster leaves orphaned objects in your cloud infrastructure and on your bootstrap machine.

Solution

  1. Monitor your tkg init command output either in the terminal or in the Tanzu Kubernetes Grid installer interface. If the command fails, it prints a help message that includes the following: "Failure while deploying management cluster... To clean up the resources created by the management cluster: tkg delete mc...."
  2. Run tkg delete mc YOUR-CLUSTER-NAME. This command removes the objects that it created in your infrastructure and locally.

You can also use the alternative methods described below:

  • Bootstrap machine cleanup:

    • To remove a kind cluster, use the kind CLI. For example:

      kind get clusters
      kind delete cluster --name tkg-kind-example1234567abcdef
      
    • To remove Docker objects, use the docker CLI. For example, docker rm, docker rmi, and docker system prune.

      CAUTION: If you are running Docker processes that are not related to Tanzu Kubernetes Grid on your system, remove unneeded Docker objects individually.

  • Infrastructure provider cleanup:

    • vSphere: Locate, power off, and delete the VMs and other resources that were created by Tanzu Kubernetes Grid.
    • AWS: Log in to your Amazon EC2 dashboard and delete the resources manually or use an automated solution.
    • Azure: In Resource Groups, open your AZURE_RESOURCE_GROUP. Use checkboxes to select and Delete the resources that were created by Tanzu Kubernetes Grid, which contain a timestamp in their names.

Kind Cluster Remains after Deleting Management Cluster

Problem

Running tkg delete management-cluster removes the management cluster, but fails to delete the local kind cluster from the bootstrap machine.

Solution

  1. List all running kind clusters and remove the one whose name looks like tkg-kind-unique_ID.

    kind get clusters
    kind delete cluster --name tkg-kind-unique_ID
    
  2. List all containers and identify the one that belongs to the kind cluster.

    docker ps -a
    
  3. Copy the container ID of the kind cluster and kill the container.

    docker kill container_ID
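The kind part of this cleanup can be scripted when several leftover clusters exist. The following is a minimal sketch that assumes every leftover bootstrap cluster has a name starting with tkg-kind-, as in the examples in this section. The function names are hypothetical.

```shell
# Keep only names that start with "tkg-kind-" from a list of names on stdin.
# "|| true" keeps the exit status at 0 when nothing matches.
tkg_kind_names() {
  grep '^tkg-kind-' || true
}

# Delete every kind cluster whose name starts with "tkg-kind-".
# Caution: this also deletes a bootstrap cluster that is still in use
# by a tkg init run in progress.
delete_tkg_kind_clusters() {
  kind get clusters | tkg_kind_names | while read -r name; do
    kind delete cluster --name "$name"
  done
}
```

Run kind get clusters | tkg_kind_names first to review the names, and call delete_tkg_kind_clusters only when no management cluster deployment is in progress.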
    

Failed Validation, Credentials Error on Amazon EC2

Problem

Running tkg init fails with an error similar to the following:

Validating the pre-requisites...
Looking for AWS credentials in the default credentials provider chain

Error: : Tkg configuration validation failed: failed to get AWS client: NoCredentialProviders: no valid providers in chain
caused by: EnvAccessKeyNotFound: AWS_ACCESS_KEY_ID or AWS_ACCESS_KEY not found in environment
SharedCredsLoad: failed to load shared credentials file
caused by: FailedRead: unable to open file
caused by: open /root/.aws/credentials: no such file or directory
EC2RoleRequestError: no EC2 instance role found
caused by: EC2MetadataError: failed to make EC2Metadata request

Solution

Tanzu Kubernetes Grid uses the default AWS credentials provider chain. Before creating a management cluster on Amazon EC2, you must set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. For instructions, see Register an SSH Public Key with Your AWS Account and Deploy Management Clusters to Amazon EC2 with the Installer Interface or Deploy Management Clusters to Amazon EC2 with the CLI.
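Before you run tkg init, you can confirm that both variables are present in the environment. The following is a minimal sketch. The function name check_aws_env is hypothetical, and the check covers only the two environment variables named above, not the other sources in the credentials provider chain.

```shell
# Report which of the two required AWS variables are missing from the
# environment. Returns 0 when both are exported and non-empty, 1 otherwise.
check_aws_env() {
  missing=0
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
    # printenv sees only exported variables, which is what child
    # processes such as tkg inherit.
    if [ -z "$(printenv "$var")" ]; then
      echo "$var is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_aws_env && tkg init runs tkg init only when both variables are exported.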

Deploying a Tanzu Kubernetes Cluster Times Out, but the Cluster Is Created

Problem

Running tkg create cluster fails with a timeout error similar to the following:

I0317 11:11:16.658433 clusterclient.go:341] Waiting for resource my-cluster of type *v1alpha3.Cluster to be up and running
E0317 11:26:16.932833 common.go:29]
Error: unable to wait for cluster and get the cluster kubeconfig: error waiting for cluster to be provisioned (this may take a few minutes): cluster control plane is still being initialized
E0317 11:26:16.933251 common.go:33]
Detailed log about the failure can be found at: /var/folders/_9/qrf26vd5629_y5vgxc1vjk440000gp/T/tkg-20200317T111108811762517.log

However, if you run tkg get cluster, the cluster appears to have been created.

+------------+-------------+
| NAME       | STATUS      |
+------------+-------------+
| my-cluster | Provisioned |
+------------+-------------+

Solution

  1. Use the tkg get credentials command to add the cluster credentials to your kubeconfig.

    tkg get credentials my-cluster
    
  2. Set kubectl to the cluster's context.

    kubectl config use-context my-cluster@user
    
  3. Check whether the cluster nodes are all in the ready state.

    kubectl get nodes
    
  4. Check whether all of the pods are up and running.

    kubectl get pods -A
    
  5. If all of the nodes and pods are running correctly, your Tanzu Kubernetes cluster has been created successfully and you can ignore the error.

  6. If the nodes and pods are not running correctly, attempt to delete the cluster.

    tkg delete cluster my-cluster
    
  7. If tkg delete cluster fails, use kubectl to delete the cluster manually.
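The node check in step 3 can be reduced to a list of the nodes that need attention. The following is a minimal sketch that assumes the default column order of kubectl get nodes --no-headers (NAME, STATUS, ROLES, AGE, VERSION). The function name not_ready_nodes is hypothetical.

```shell
# Print the names of nodes whose STATUS column is not exactly "Ready".
# Expects the output of `kubectl get nodes --no-headers` on stdin
# (columns: NAME STATUS ROLES AGE VERSION).
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}
```

For example, kubectl get nodes --no-headers | not_ready_nodes prints nothing when every node is in the Ready state. A cordoned node, whose status reads Ready,SchedulingDisabled, is also reported by this sketch.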

Pods Are Stuck in Pending on Cluster Due to vCenter Connectivity

Problem

When you run kubectl get pods -A on the created cluster, some pods remain in the Pending state.

When you run kubectl describe pod -n pod-namespace pod-name on an affected pod, the output includes the following event:

n node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate

Solution

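Make sure that network connectivity and firewall rules allow communication between the cluster and vCenter. The taint is removed only after the cloud provider has initialized the node, which requires the cluster to reach vCenter.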
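To see which nodes still carry the taint from the event message, you can filter a listing of node taints. The following is a minimal sketch that assumes a two-column listing of node name and comma-separated taint keys. The function name nodes_with_uninitialized_taint is hypothetical.

```shell
# Print the names of nodes that still carry the
# node.cloudprovider.kubernetes.io/uninitialized taint.
# Expects two columns on stdin: node name, then comma-separated taint keys.
nodes_with_uninitialized_taint() {
  awk 'index($2, "node.cloudprovider.kubernetes.io/uninitialized") { print $1 }'
}
```

One way to produce that listing is kubectl get nodes --no-headers -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key', piped into the function. Nodes that keep appearing after connectivity is restored are worth investigating further.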

Tanzu Kubernetes Grid UI Does Not Display Correctly on Windows

Problem

When you run the tkg init --ui command on a Windows system, the UI opens in your default browser, but the graphics and styling are not applied. This happens because the Content Type value of the .css key in the Windows registry is set to application/x-css.

Solution

  1. In Windows search, enter regedit to open the Registry Editor utility.
  2. Expand HKEY_CLASSES_ROOT and select .css.
  3. Right-click Content Type and select Modify.
  4. Set the Value to text/css and click OK.
  5. Run the tkg init --ui command again to relaunch the UI.

Running tkg init on macOS Results in kubectl Version Error

Problem

If you run the tkg init command on macOS with the latest stable version of Docker Desktop, tkg init fails with the error message:

Error: : kubectl prerequisites validation failed: kubectl client version v1.15.5 is less than minimum supported kubectl client version 1.17.0

This happens because Docker Desktop symlinks kubectl 1.15 into the path.

Solution

Place a newer supported version of kubectl in the path before Docker's version.
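To confirm whether a given kubectl client version meets the minimum, a comparison helper such as the following can be used. This is a minimal sketch: version_ge is a hypothetical helper name, and the 1.17.0 minimum is taken from the error message above.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# Compares numeric major.minor.patch fields; a leading "v" is ignored.
version_ge() {
  awk -v a="${1#v}" -v b="${2#v}" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}
```

Run command -v kubectl to see which binary the shell resolves first, and kubectl version --client to read its version. If version_ge reports that the version is below 1.17.0, adjust PATH so that the directory containing the newer kubectl comes before Docker's.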

Connect to Cluster Nodes with SSH

You can use SSH to connect to individual nodes of management clusters or Tanzu Kubernetes clusters. To do so, the SSH key pair that you created when you deployed the management cluster must be available on the machine on which you run the SSH command. Consequently, you must run ssh commands on the machine on which you run tkg commands.

The SSH keys that you register with the management cluster, and consequently that are used by any Tanzu Kubernetes clusters that you deploy from the management cluster, are associated with the following user accounts:

  • vSphere management cluster and Tanzu Kubernetes nodes: capv
  • Amazon EC2 bastion nodes: ubuntu
  • Amazon EC2 management cluster and Tanzu Kubernetes nodes: ec2-user
  • Azure management cluster and Tanzu Kubernetes nodes: capi

To connect to a node by using SSH, run one of the following commands from the machine that you use as the bootstrap machine:

  • vSphere nodes: ssh capv@node_IP_address
  • Amazon EC2 bastion: ssh ubuntu@bastion_IP_address
  • Amazon EC2 nodes: ssh ec2-user@node_IP_address
  • Azure nodes: ssh capi@node_IP_address

Because the SSH key is present on the system on which you are running the ssh command, no password is required.
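The user accounts listed above can be captured in a small lookup helper so that the correct account does not have to be remembered for each platform. The following is a minimal sketch. The function names and the node type keywords vsphere, aws-bastion, aws, and azure are illustrative.

```shell
# Map a node type to the SSH user account listed above.
# The keywords (vsphere, aws-bastion, aws, azure) are this sketch's own choice.
node_ssh_user() {
  case "$1" in
    vsphere)     echo capv ;;
    aws-bastion) echo ubuntu ;;
    aws)         echo ec2-user ;;
    azure)       echo capi ;;
    *)           echo "unknown node type: $1" >&2; return 1 ;;
  esac
}

# Print (without running) the ssh command for a node type and IP address.
node_ssh_cmd() {
  user=$(node_ssh_user "$1") || return 1
  echo "ssh ${user}@$2"
}
```

For example, node_ssh_cmd vsphere 10.0.0.15 prints ssh capv@10.0.0.15, where 10.0.0.15 is a placeholder address.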

Recover Management Cluster Credentials

If you have lost the credentials for a management cluster, for example by inadvertently deleting the .kube-tkg/config file on the system on which you run tkg commands, you can recover the credentials from the management cluster control plane node.

  1. Run tkg get management-cluster to recreate the .kube-tkg/config file.
  2. Obtain the public IP address of the management cluster control plane node, from vSphere, Amazon EC2, or Azure.
  3. Use SSH to log in to the management cluster control plane node.

    • vSphere: ssh capv@node_IP_address
    • Amazon EC2: ssh ec2-user@node_IP_address
    • Azure: ssh capi@node_IP_address
  4. Access the admin.conf file for the management cluster.

    sudo vi /etc/kubernetes/admin.conf
    

    The admin.conf file contains the cluster name, the cluster user name, the cluster context, and the client certificate data.

  5. Copy the cluster name, the cluster user name, the cluster context, and the client certificate data into the .kube-tkg/config file on the system on which you run tkg commands.

Restore ~/.tkg Directory

Problem

The ~/.tkg directory on the bootstrap machine has been accidentally deleted or corrupted. The tkg CLI creates and uses this directory, and cannot function without it.

Solution

To restore the contents of the ~/.tkg directory:

  1. To identify existing Tanzu Kubernetes Grid management clusters, run:

    kubectl --kubeconfig ~/.kube-tkg/config config get-contexts
    

    The command output lists names and contexts of all management clusters created or added by the v1.2 tkg or v1.3 tanzu CLI.

  2. For each management cluster listed in the output:

    1. Run kubectl config use-context MGMT-CLUSTER-NAME to set the current kubectl context to the management cluster.
    2. Run tkg add management-cluster to restore the context to the ~/.tkg directory and CLI.

Disable nfs-utils on Photon OS Nodes

Problem

In Tanzu Kubernetes Grid v1.1.2 and later, nfs-utils is enabled by default. If you do not require nfs-utils, you can remove it from cluster node VMs.

Solution

To disable nfs-utils on clusters that you deploy with Tanzu Kubernetes Grid v1.1.2 or later, use SSH to log in to the cluster node VMs and run the following command:

tdnf erase nfs-utils

For information about using nfs-utils on clusters deployed with Tanzu Kubernetes Grid v1.0 or 1.1.0, see Enable or Disable nfs-utils on Photon OS Nodes in the VMware Tanzu Kubernetes Grid 1.1.x Documentation.
