The following sections describe how to troubleshoot failures to deploy the VMware Tanzu Kubernetes Grid Integrated Edition Management Console, and failures to deploy Tanzu Kubernetes Grid Integrated Edition instances from the management console.
For information about how to deploy the management console and install Tanzu Kubernetes Grid Integrated Edition, see Install on vSphere with the Management Console.
Problem
Tanzu Kubernetes Grid Integrated Edition Management Console VM fails to deploy from the OVA template.
Solution
Use SSH to log in to the management console VM as the root user. Run the following command to obtain the server logs:
journalctl -u pks-mgmt-server > server.log
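For example, to scan the exported log for recent error entries (the filter pattern below is only illustrative):
grep -i error server.log | tail -n 20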
If the logs do not provide the solution, delete the management console VM from vCenter Server and attempt to deploy it again.
Problem
Tanzu Kubernetes Grid Integrated Edition fails to deploy from the management console.
Solution
Problem
In a deployment to a multiple-tier0 topology, Tanzu Kubernetes Grid Integrated Edition Management Console cannot display cluster information when you go to TKG Integrated Edition > Clusters and select a cluster. You see errors of the following type:
Failed to retrieve current K8s Cluster summary. cannot get cluster details: cannot get cluster namespaces: Get https://<address>:8443/api/v1/namespaces: dial tcp <address>:8443: i/o timeout
Failed to retrieve current K8s Cluster Volumes. cannot get namespaces of cluster 0116663b-f27b-4026-87e3-cddd01af41f2: Get https://<address>:8443/api/v1/namespaces: dial tcp <address>:8443: i/o timeout
Cause
In a single tier0 topology, Tanzu Kubernetes Grid Integrated Edition Management Console is deployed to the same infrastructure network as vSphere and NSX-T Data Center. In a multiple-tier0 topology, due to tenant isolation, the infrastructure network is not routable to tenant tier0 uplink networks. In a multiple-tier0 topology, data from the Kubernetes API is exposed by floating IP addresses on tenant tier0 routers. Consequently, the management console cannot retrieve cluster data from the Kubernetes API because it is not on the same network as the tenants.
Solution
Make sure that the Tanzu Kubernetes Grid Integrated Edition Management Console can connect to tenant floating IP addresses.
To do so, use SSH to log in to the management console VM and add a static route to the tenant tier0 uplink networks through a gateway that can reach them:
route add -net <destination_subnet> gw <gateway_address>
Because the gateway can reach both the management console and the tenant floating IP addresses, the management console can reach the tenants and retrieve cluster data from the Kubernetes API.
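For example, assuming a tenant floating IP subnet of 192.168.160.0/24 reachable through a gateway at 10.0.0.1 (both values are placeholders, not taken from your environment), you would run the following on the management console VM and then confirm that the route is present:
route add -net 192.168.160.0 netmask 255.255.255.0 gw 10.0.0.1
route -n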
Problem
If you have an existing TKGI installation on a vSphere N-VDS network and migrate your network to VDS, TKGI MC will no longer be able to upgrade TKGI to a newer version.
Explanation
TKGI MC identifies network resources by their MOIDs, which allows the Management Console to distinguish between network resources with identical names.
When converting a network from N-VDS to VDS, network resources are assigned new MOIDs.
Although your existing TKGI Kubernetes cluster workloads continue to function, the Management Console configuration maintains stale network resource MOIDs and cannot upgrade the TKGI control plane.
Workaround
To upgrade TKGI MC after converting a network from N-VDS to VDS:
To collect your network’s current configuration:
Export the pks-mgmt-server container's IP address as an environment variable:
export MGMT_IP=`docker inspect pks-mgmt-server --format='{{.NetworkSettings.Networks.pks.IPAddress}}'`
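You can confirm that the variable is set before running the inventory queries below, for example:
echo ${MGMT_IP}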
Query and save your data center list as a JSON file named datacenter.json:
curl -u root http://{$MGMT_IP}:8080/api/v1/inventory/vcenter/datacenter -X GET -k -H \
"Content-type:application/json" > datacenter.json
Review the exported datacenter.json file and note the data center MOID.
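If the jq utility is installed on the machine where you ran the export (an assumption, it is not required by this procedure), pretty-printing the file can make the MOID easier to locate:
jq '.' datacenter.json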
Query and save your network list as a JSON file named networks.json:
curl -u root http://{$MGMT_IP}:8080/api/v1/inventory/vcenter/network?dc=DC-MOID -X GET -k -H \
"Content-type:application/json" > networks.json
Where DC-MOID is the data center MOID you noted in the previous step.
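For example, if the data center MOID were datacenter-2 (an illustrative value only), the query would be:
curl -u root "http://${MGMT_IP}:8080/api/v1/inventory/vcenter/network?dc=datacenter-2" -X GET -k -H \
"Content-type:application/json" > networks.json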
To reconfigure TKGI MC with the current network resource MOIDs, do one of the following:
Manually configure TKGI MC using a manifest file:
Manually create a TKGI MC manifest file with the current network resource MOIDs:
Export the TKGI MC manifest as a manifest file named manifest.json:
curl -u root:ADMIN-PASSWORD https://localhost/api/v1/deployment/manifest -X GET -k -H "Content-type:application/json" > manifest.json
Where ADMIN-PASSWORD is the password for the admin account used in the command.
Back up the exported manifest file.
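For example, a simple copy keeps an untouched version alongside the file you are about to edit:
cp manifest.json manifest.json.bak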
Review the networks.json file you exported above and note the moid and type values.
In the manifest file, set the dep_network_moid and dep_network_type values to the moid and type values you collected from the networks.json file, as in the sketch below.
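As a sketch of this edit, assuming jq is available and that dep_network_moid and dep_network_type are top-level keys in your manifest (the MOID and type shown are placeholders; use the values from your networks.json):
jq '.dep_network_moid = "dvportgroup-1001" | .dep_network_type = "DistributedVirtualPortgroup"' manifest.json > manifest-updated.json
mv manifest-updated.json manifest.json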
To restore your TKGI MC’s ability to manage the TKGI network resources:
Update TKGI MC with the revised manifest:
curl -u root:ADMIN-PASSWORD http://{$MGMT_IP}:8080/api/v1/deployment -X POST -d @manifest.json -k -H "Content-type:application/json"
Where ADMIN-PASSWORD is the password for the admin account used in the command.
To validate your TKGI MC’s configuration:
Confirm that each configuration step completes with a status of SUCCESS or SKIPPED.
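As an additional rough check, assuming the manifest export command shown earlier returns the currently applied configuration, you can re-export the manifest and compare it with the file you posted; formatting may differ, so look only at the MOID and type values:
curl -u root:ADMIN-PASSWORD https://localhost/api/v1/deployment/manifest -X GET -k -H "Content-type:application/json" > manifest-current.json
diff manifest.json manifest-current.json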
To upgrade TKGI MC, deploy the TKGI MC OVA for the desired TKGI MC version.
If you enabled integration with VMware vRealize Log Insight, Tanzu Kubernetes Grid Integrated Edition Management Console generates a unique vRealize Log Insight agent ID for the management console VM. You must provide this agent ID to vRealize Log Insight so that it can pull the appropriate logs from the management console.
You obtain the vRealize Log Insight agent ID as follows:
Use SSH to log in to the management console VM as the root user. Run the following command to obtain the ID:
grep LOGINSIGHT_ID /etc/vmware/environment | cut -d= -f2
The resulting ID will be similar to 59debec7-daba-4770-9d21-226ffd743843.
Log in to the vRealize Log Insight Web user interface as administrator and add the agent ID to your list of agents.