The following outlines some basic workflows for troubleshooting a Tanzu Mission Control Self-Managed deployment that you can use in case of failure or loss of functionality.
All the packages for Tanzu Mission Control Self-Managed are deployed in the tmc-local namespace. To verify the installation, you can list all the installed packages with the following command:
kubectl -n tmc-local get packageinstall
All packageinstalls should indicate Reconcile succeeded in the DESCRIPTION column. If you see any other status (for example, Reconcile failed), you can see more details about why it is failing using the following command:
kubectl -n tmc-local describe packageinstall <packageinstall-name>
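When many packages are installed, it can help to list only the ones that have not reconciled. The following is a minimal sketch rather than part of the product tooling: the helper name failed_packageinstalls is hypothetical, and it assumes the default table output of kubectl get packageinstall, in which a healthy row contains the text Reconcile succeeded.

```shell
# Hypothetical helper: reads `kubectl get packageinstall` table output on stdin
# and prints the NAME of every row that does not report "Reconcile succeeded".
failed_packageinstalls() {
  # NR > 1 skips the header row, NF skips blank lines, $1 is the NAME column.
  awk 'NR > 1 && NF && !/Reconcile succeeded/ { print $1 }'
}

# Usage (requires cluster access):
#   kubectl -n tmc-local get packageinstall | failed_packageinstalls
```

Each name printed can then be passed to the describe command shown above.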
The components of Tanzu Mission Control Self-Managed are deployed in the tmc-local namespace. You can list the running pods with the following command:
kubectl -n tmc-local get pods
Observe the STATUS column. All pods should be in the Running or Completed state. If you see any other status (for example, Failed or CrashLoopBackOff), you can see more details about why it is failing using the following command:
kubectl -n tmc-local describe po <pod-name>
For example:
kubectl -n tmc-local describe po wcm-server-545845b58-8d4dp
You can find the error message (if any) in the Events section of the output. If you don’t see anything relevant, you can check the logs from the pods. For example:
kubectl -n tmc-local logs wcm-server-545845b58-8d4dp
# keep streaming new logs
kubectl -n tmc-local logs wcm-server-545845b58-8d4dp -f
# Stream logs from all pods matching label `app=resource-manager`.
# This may be helpful sometimes when you have two or more replicas of a pod running
kubectl -n tmc-local logs -l app=resource-manager -f
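To narrow down which pods need attention before describing them one by one, the status check above can be sketched as a small filter. The helper name unhealthy_pods is hypothetical, and the sketch assumes the default kubectl get pods column order (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: reads `kubectl get pods --no-headers` output on stdin and
# prints the name of each pod whose STATUS is neither Running nor Completed.
unhealthy_pods() {
  # With the default columns, STATUS is field 3 and NAME is field 1.
  awk 'NF && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage (requires cluster access): describe every pod that needs attention.
#   kubectl -n tmc-local get pods --no-headers | unhealthy_pods |
#     while read -r pod; do kubectl -n tmc-local describe po "$pod"; done
```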
Example Error Message | Reason and Fix |
---|---|
transport: Error while dialing: dial tcp 10.20.10.100:443: connect: no route to host | The wrong load balancer IP address was specified or a DNS entry points to the wrong address. Confirm that the DNS entries are correct and that the correct IP address was provided for the load balancer. |
transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority | The certificate authority (CA) certificate for the TMC Self-Managed certificates was not included in the list of trusted CAs. Update the configuration file to include the root CA certificate and then repeat the installation. |
Unable to attach or mount volumes: unmounted volumes=[landing-service-tls] MountVolume.SetUp failed for volume “landing-service-tls” : secret “landing-service-tls” not found (This error may appear multiple times with different values.) | The TMC Self-Managed certificates are missing or cannot be created by cert-manager. If importing the certificates, verify that the proper secrets have been created in the tmc-local namespace by running kubectl -n tmc-local get secrets. If using a ClusterIssuer, here are some potential fixes. |
External traffic flows into Tanzu Mission Control Self-Managed through envoy, a reverse proxy. The envoy proxy can receive traffic from a load balancer in your infrastructure. You can view this traffic by streaming the logs from the envoy pods. For example:
kubectl -n tmc-local logs -c envoy -l app.kubernetes.io/component=envoy -f
Envoy routes this traffic to the appropriate service (pods) based on path prefix matching. You can see those details from the stack HTTPProxy object like this:
kubectl -n tmc-local get httpproxy stack-http-proxy -o yaml
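The YAML output can be long, so a quick way to see only the path prefixes is to filter it. This is a rough sketch with a hypothetical helper name, route_prefixes. It assumes the prefixes appear as prefix: keys, as they do in the Contour HTTPProxy schema, and it does not show which service each prefix routes to.

```shell
# Hypothetical helper: reads HTTPProxy YAML on stdin and prints each distinct
# value found in a `prefix:` key, sorted.
route_prefixes() {
  # Strip leading indentation and list dashes, keep only the value after "prefix:".
  sed -n 's/^[[:space:]-]*prefix:[[:space:]]*//p' | sort -u
}

# Usage (requires cluster access):
#   kubectl -n tmc-local get httpproxy stack-http-proxy -o yaml | route_prefixes
```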
Tanzu Mission Control Self-Managed has a postgres instance deployed as a stateful set in the tmc-local namespace. This is the central datastore for the microservices in Tanzu Mission Control Self-Managed. Pods in the tmc-local namespace sometimes fail to start because of missing database credentials, which are mounted onto the pods from Kubernetes secrets. These secrets are issued by a Kubernetes operator named postgres-endpoint-controller, which runs in the same namespace. For each microservice, there is a custom resource object of type postgresendpoints. The postgres-endpoint-controller operator reads this object and issues the database credentials in the secret specified by the object. Use the following command to gather information about the postgresendpoints objects:
# List all postgresendpoints objects
kubectl -n tmc-local get postgresendpoints
This command returns a lot of useful information. First, verify that all postgresendpoints objects show the Ready status. The output also includes the secret, dbhost, username, and database name for each object.
To regenerate a secret, you can delete it and postgres-endpoint-controller creates a new one. This can be helpful when a pod fails due to a database credentials issue.
You can also check the logs from postgres-endpoint-controller using the following command:
kubectl -n tmc-local logs -l app.kubernetes.io/instance=postgres-endpoint-controller
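The readiness check described above can be sketched as a filter over the command output. The helper name not_ready_endpoints is hypothetical. Because the exact column order of kubectl get postgresendpoints is not documented here, the sketch scans every column after the name for the literal value Ready instead of assuming a position.

```shell
# Hypothetical helper: reads `kubectl get postgresendpoints` table output on
# stdin and prints the NAME of each row that has no column equal to "Ready".
not_ready_endpoints() {
  awk 'NR > 1 && NF {
         ready = 0
         # Exact match, so a value such as "NotReady" does not count as ready.
         for (i = 2; i <= NF; i++) if ($i == "Ready") ready = 1
         if (!ready) print $1
       }'
}

# Usage (requires cluster access):
#   kubectl -n tmc-local get postgresendpoints | not_ready_endpoints
# To regenerate the credentials for an affected object, delete its secret:
#   kubectl -n tmc-local delete secret <secret-name>
```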
After a successful login, you should obtain a valid token. The following example shows the OIDC claims in a Pinniped-issued JWT for Tanzu Mission Control Self-Managed:
{
  "additionalClaims": {
    "email": "[email protected]",
    "name": "testuser01 admin"
  },
  "at_hash": "1IfPC3fM9WnlfiuZxZMFNw",
  "aud": [
    "pinniped-cli"
  ],
  "auth_time": 1682540220,
  "azp": "pinniped-cli",
  "exp": 1682540341,
  "groups": [
    "tmc:member",
    "Everyone",
    "tmc:admin"
  ],
  "iat": 1682540221,
  "iss": "https://pinniped-supervisor.<TMC-SM-FQDN>/provider/pinniped",
  "jti": "7cdfd23a-2f9f-4d56-82a0-84b9995f489c",
  "rat": 1682540220,
  "sub": "<oauth2provider>",
  "username": "[email protected]"
}
Note: Tanzu Mission Control permissions (tmc:admin and tmc:member) are specified in groups. There should be at least one name or
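To compare the claims in a token you have already obtained with the example above, you can decode its payload locally. The following is a minimal sketch: the helper name jwt_payload is hypothetical, it assumes a base64 command that accepts -d (as GNU coreutils does), and it only decodes the payload without verifying the signature.

```shell
# Hypothetical helper: prints the decoded JSON payload (claims) of a JWT.
# It does NOT verify the token signature.
jwt_payload() {
  # The payload is the second dot-separated segment, base64url-encoded.
  # Translate the URL-safe alphabet back to standard base64.
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Usage:
#   jwt_payload "<token>"
```

In the decoded output, check that groups contains tmc:admin or tmc:member as expected for the user.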
Cause: If you omit required settings from values.yaml or provide settings that are syntactically invalid, the installation will fail.
Fix: Follow the steps in Installation Troubleshooting to identify and resolve the issue.
After a successful installation, you may still be unable to log in if the provided settings are incorrect for your environment. You may see the following errors if the AD or OpenLDAP settings are misconfigured.
An internal error occurred. Please contact your administrator
In a browser, after submitting their username and password, the user sees the message: An internal error occurred. Please contact your administrator.
Cause: The error is due to either a misconfigured setting or an inability to reach your AD domain controller or OpenLDAP server. To determine the cause:
1. Check the status of the corresponding Identity Provider object.
   - If using Active Directory, run kubectl get ActiveDirectoryIdentityProvider.
   - If using OpenLDAP, run kubectl get LDAPIdentityProvider.
   The STATUS column of the command output shows Error when there is an issue and Ready when there are no issues.
2. If the status is Error, run kubectl describe with the NAME value from step 1 to get more details.
   - If using Active Directory, run kubectl describe ActiveDirectoryIdentityProvider <NAME>.
   - If using OpenLDAP, run kubectl describe LDAPIdentityProvider <NAME>.
   Replace <NAME> with the value shown in the NAME column in step 1. The NAME value also matches the ldap.domainName in the values.yaml file.
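The two steps above can be chained with a small filter that picks out the identity providers that are not Ready. This is a sketch under stated assumptions: the helper name idp_not_ready is hypothetical, and it assumes the table output has a header row containing a STATUS column and no cells that contain spaces or are empty.

```shell
# Hypothetical helper: reads `kubectl get <identity-provider-kind>` table output
# on stdin, locates the STATUS column from the header row, and prints the NAME
# of each row whose status is not "Ready".
idp_not_ready() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i; next }
       col && NF && $col != "Ready" { print $1 }'
}

# Usage (requires cluster access), shown for Active Directory:
#   kubectl get ActiveDirectoryIdentityProvider | idp_not_ready |
#     while read -r name; do
#       kubectl describe ActiveDirectoryIdentityProvider "$name"
#     done
```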
Fix: The output of the kubectl describe command shows multiple Message values under the Status.Conditions section. The following table describes potential error messages and how to fix them.
Example Error Message | Reason and Fix |
---|---|
error dialing host “192.168.111.195”: LDAP Result Code 200 “Network Error”: dial tcp 192.168.111.195:636: connect: no route to host | This is related to connectivity issues on the destination host/IP or port. Ensure the value for ldap.host is reachable from the environment where TMC SM is installed. |
error dialing host “192.168.111.190”: LDAP Result Code 200 “Network Error”: tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of “x509: invalid signature: parent certificate cannot sign this kind of certificate” while trying to verify candidate authority certificate “dc01.tanzu.io”) | This is related to an incorrect/expired Root CA cert or an expired cert on the targeted server. Ensure the ldap.rootCA value contains a PEM-encoded cert of the Root CA that issued the TLS certificate for the targeted ldap.host. This value will be multiple lines and must include the BEGIN CERTIFICATE and END CERTIFICATE demarcation lines. |
error binding as “CN=TMC SvcAcct,CN=Users,DC=acme,DC=org”: LDAP Result Code 49 “Invalid Credentials” | This is related to the service account in the directory specified by the ldap.username and ldap.password values. Ensure the following:
|
If you are still unsure of the issue, you can verify the settings using an LDAP client, such as LDP.exe, Softerra, or ldapsearch. Update the values.yaml file to use the same exact settings, including host, port, baseDN, service account DN, and password.
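As an illustration of the ldapsearch option, the following sketch wraps a simple bind and search in one function. The helper name verify_ldap_bind and its argument order are hypothetical. It assumes the OpenLDAP ldapsearch client and an LDAPS endpoint (port 636, as in the error messages above), and you would pass the same host, service account DN, and base DN that you set in values.yaml.

```shell
# Hypothetical helper: binds as the service account and searches the base DN.
# Arguments: host, port, service account DN, base DN, search filter.
verify_ldap_bind() {
  # -H: LDAP URI, -x: simple authentication, -D: bind DN,
  # -W: prompt for the bind password, -b: search base.
  ldapsearch -H "ldaps://$1:$2" -x -D "$3" -W -b "$4" "$5"
}

# Usage (placeholders, not real values):
#   verify_ldap_bind <ldap-host> 636 "<service-account-DN>" "<base-DN>" "(cn=<user>)"
```

A bind failure here with the same credentials points to the directory account rather than to the Tanzu Mission Control Self-Managed configuration.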
Incorrect username or password
In a browser, after submitting their username and password, the user sees the message: Incorrect username or password.
Cause: This message is displayed when the user enters an incorrect username or password, or when the user account is not in the LDAP tree.
Fix: To fix the issue: