Troubleshooting your Tanzu Mission Control Self-Managed Deployment

The following outlines some basic workflows for troubleshooting a Tanzu Mission Control Self-Managed deployment that you can use in case of failure or loss of functionality.

Installation Troubleshooting

All the packages for Tanzu Mission Control Self-Managed are deployed in the tmc-local namespace. To verify the installation, list the installed packages with the following command:

kubectl -n tmc-local get packageinstall

Every PackageInstall should show Reconcile succeeded in the DESCRIPTION column. If any shows a different status (for example, Reconcile failed), view the details of the failure with the following command:

kubectl -n tmc-local describe packageinstall <packageinstall-name>
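
When many packages are installed, a short shell filter can pick out the PackageInstalls that need attention. The sketch below assumes the default table output of kubectl, with NAME in the first column and the status text in DESCRIPTION; the function name failed_packageinstalls is hypothetical.

```shell
# Hypothetical helper: read the output of
#   kubectl -n tmc-local get packageinstall
# on stdin and print the NAME of every row whose DESCRIPTION is not
# "Reconcile succeeded". Assumes NAME is the first column (default table output).
failed_packageinstalls() {
  awk 'NR > 1 && !/Reconcile succeeded/ { print $1 }'
}

# Example usage (requires access to the cluster):
#   kubectl -n tmc-local get packageinstall | failed_packageinstalls |
#     while read -r name; do
#       kubectl -n tmc-local describe packageinstall "$name"
#     done
```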

Basic Troubleshooting

The components of Tanzu Mission Control Self-Managed are deployed in the tmc-local namespace. You can list the pods with the following command:

kubectl -n tmc-local get pods

Check the STATUS column. All pods should be in the Running or Completed state. If a pod shows any other status (for example, Failed or CrashLoopBackOff), view the details of the failure with the following command:

kubectl -n tmc-local describe po <pod-name>

For example:

kubectl -n tmc-local describe po wcm-server-545845b58-8d4dp
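
To spot failing pods quickly, you can filter the pod list by its STATUS column. This is a minimal sketch that applies the rule above (Running or Completed counts as healthy); the function name unhealthy_pods is hypothetical, and it does not flag a Running pod whose READY count is below its container count.

```shell
# Hypothetical helper: read the output of
#   kubectl -n tmc-local get pods
# on stdin and print the NAME of every pod whose STATUS is neither
# Running nor Completed. The STATUS column is located from the header row.
unhealthy_pods() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i; next }
       col && $col != "Running" && $col != "Completed" { print $1 }'
}

# Example usage:
#   kubectl -n tmc-local get pods | unhealthy_pods |
#     while read -r pod; do kubectl -n tmc-local describe po "$pod"; done
```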

Any error message appears in the Events section of the output. If nothing relevant appears there, check the pod logs. For example:

kubectl -n tmc-local logs wcm-server-545845b58-8d4dp

# Keep streaming new log lines as they are written
kubectl -n tmc-local logs wcm-server-545845b58-8d4dp -f

# Stream logs from all pods with the label app=resource-manager.
# This is helpful when two or more replicas of a pod are running.
kubectl -n tmc-local logs -l app=resource-manager -f
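
If you need to search logs across many pods, or share them for analysis, it can help to save each pod's logs to a file. The sketch below uses standard kubectl flags (-o name, --all-containers); the function name collect_tmc_logs and the default ./tmc-logs directory are hypothetical choices.

```shell
# Hypothetical helper: save the logs of every pod in tmc-local to one file
# per pod so you can search them offline (for example, with grep).
# Usage: collect_tmc_logs ./tmc-logs
collect_tmc_logs() {
  out_dir="${1:-./tmc-logs}"
  mkdir -p "$out_dir" || return 1
  # "-o name" prints one "pod/<name>" entry per line.
  kubectl -n tmc-local get pods -o name | while read -r pod; do
    name="${pod#pod/}"
    # --all-containers includes every container in the pod, not only the default one.
    kubectl -n tmc-local logs "$pod" --all-containers > "$out_dir/$name.log" 2>&1
  done
}

# For a pod in CrashLoopBackOff, the current container may not have logged
# anything yet; this shows the logs of the last terminated container instead:
#   kubectl -n tmc-local logs <pod-name> --previous
```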

Other errors observed during and after installation

Example Error Message	Reason and Fix
`transport: Error while dialing: dial tcp 10.20.10.100:443: connect: no route to host`	The wrong Load Balancer IP address was specified or a DNS entry points to the wrong address. Confirm the DNS entries are correct and the correct IP address was provided for the Load Balancer.
`transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate signed by unknown authority`	The certificate authority (CA) certificate for TMC Self-Managed certificates was not included in the list of trusted CAs. Update the configuration file to include the root CA certificate and then repeat the installation.
`Unable to attach or mount volumes: unmounted volumes=[landing-service-tls]` `MountVolume.SetUp failed for volume "landing-service-tls" : secret "landing-service-tls" not found` This error may appear multiple times with different values.	The TMC Self-Managed certificates are missing or cannot be created by `cert-manager`. If you are importing the certificates, verify that the required secrets exist in the `tmc-local` namespace by running `kubectl -n tmc-local get secrets`. If you are using a ClusterIssuer, try the following fixes: verify that the configured `ClusterIssuer` resource is available; check `kubectl -n tmc-local get certificates` for entries that are not ready; check `kubectl -n tmc-local get challenges` for entries that cannot complete; and check the `cert-manager` logs for more information with `kubectl -n cert-manager logs -f deploy/cert-manager`.
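
For the ClusterIssuer case, the output of kubectl -n tmc-local get certificates can be narrowed to the entries that are not ready. This is a sketch assuming cert-manager's default table output, where the READY column reads True once a certificate has been issued; the function name not_ready_certificates is hypothetical.

```shell
# Hypothetical helper: read the output of
#   kubectl -n tmc-local get certificates
# on stdin and print the NAME of every Certificate whose READY column is not True.
not_ready_certificates() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "READY") col = i; next }
       col && $col != "True" { print $1 }'
}

# Example usage:
#   kubectl -n tmc-local get certificates | not_ready_certificates |
#     while read -r cert; do kubectl -n tmc-local describe certificate "$cert"; done
```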

Viewing external requests

External traffic flows into Tanzu Mission Control Self-Managed through Envoy, a reverse proxy that can receive traffic from a load balancer in your infrastructure. You can view this traffic by streaming the logs from the Envoy pods. For example:

kubectl -n tmc-local logs -c envoy -l app.kubernetes.io/component=envoy -f
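
When streaming Envoy logs, it can help to keep only failed requests. The filter below is a sketch that assumes Envoy's default access-log format, in which the HTTP response code directly follows the quoted request line (for example, "GET /path HTTP/1.1" 503). The format configured in your deployment may differ, so check a few raw lines first; the function name server_errors is hypothetical.

```shell
# Hypothetical helper: keep only access-log lines whose response code is 5xx.
# Assumes Envoy's default access-log format, where the response code follows
# the quoted request line, e.g.  [...] "GET /v1/example HTTP/1.1" 503 UF ...
server_errors() {
  grep -E '" 5[0-9]{2} '
}

# Example usage:
#   kubectl -n tmc-local logs -c envoy -l app.kubernetes.io/component=envoy -f | server_errors
```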

Envoy routes this traffic to the appropriate service (pods) based on path prefix matching. You can view these routing rules in the stack-http-proxy HTTPProxy object:

kubectl -n tmc-local get httpproxy stack-http-proxy -o yaml

Debugging Postgres

Tanzu Mission Control Self-Managed runs a PostgreSQL instance as a StatefulSet in the tmc-local namespace. This instance is the central datastore for the Tanzu Mission Control Self-Managed microservices. Pods in the tmc-local namespace sometimes fail to start because of missing database credentials, which are mounted onto the pods from Kubernetes secrets.

These secrets are created by a Kubernetes operator named postgres-endpoint-controller, which runs in the same namespace. Each microservice has a postgresendpoints object (custom resource). The postgres-endpoint-controller operator reads this object and issues the database credentials in the secret that the object specifies. Use the following command to gather information about the postgresendpoints objects:

# List all postgresendpoints objects
kubectl -n tmc-local get postgresendpoints

This command returns useful information about each object. First, verify that every postgresendpoints object shows a Ready status. The output also shows details such as the secret, database host, username, and database name.
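
To list only the postgresendpoints objects that are not ready, you can filter the command output. This sketch assumes that a ready object shows the word Ready as a standalone column value; the exact column layout is not documented here, and the function name not_ready_endpoints is hypothetical.

```shell
# Hypothetical helper: read the output of
#   kubectl -n tmc-local get postgresendpoints
# on stdin and print the NAME of every object that does not show Ready.
# Assumption: a ready object has "Ready" as one of its column values.
not_ready_endpoints() {
  awk 'NR > 1 {
         ready = 0
         for (i = 2; i <= NF; i++) if ($i == "Ready") ready = 1
         if (!ready) print $1
       }'
}

# Example usage:
#   kubectl -n tmc-local get postgresendpoints | not_ready_endpoints
```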

To regenerate a secret, delete it; postgres-endpoint-controller then creates a new one. This can help when a pod fails because of a database credentials issue.
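
Deleting the secret and waiting for the controller to recreate it can be scripted as follows. This is a sketch: the function name regenerate_secret, the 2-second polling interval, and the 60-second default timeout are arbitrary choices, and you supply the secret name shown in the postgresendpoints output.

```shell
# Hypothetical helper: delete a database-credential secret in tmc-local and
# wait until postgres-endpoint-controller recreates it, or until a timeout.
# Usage: regenerate_secret <secret-name> [timeout-seconds]
regenerate_secret() {
  secret="$1"
  wait_secs="${2:-60}"
  kubectl -n tmc-local delete secret "$secret" || return 1
  elapsed=0
  while [ "$elapsed" -lt "$wait_secs" ]; do
    # The secret exists again once the controller has reissued the credentials.
    if kubectl -n tmc-local get secret "$secret" >/dev/null 2>&1; then
      echo "secret $secret recreated"
      return 0
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "timed out waiting for secret $secret" >&2
  return 1
}
```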

You can also check the logs from postgres-endpoint-controller using the following command:

kubectl -n tmc-local logs -l app.kubernetes.io/instance=postgres-endpoint-controller

Token Issues

A successful login should produce a valid token. The following example shows the OIDC claims in a Pinniped-issued JWT for Tanzu Mission Control Self-Managed:

{
  "additionalClaims": {
    "email": "<user-email>",
    "name": "testuser01 admin"
  },
  "at_hash": "1IfPC3fM9WnlfiuZxZMFNw",
  "aud": [
    "pinniped-cli"
  ],
  "auth_time": 1682540220,
  "azp": "pinniped-cli",
  "exp": 1682540341,
  "groups": [
    "tmc:member",
    "Everyone",
    "tmc:admin"
  ],
  "iat": 1682540221,
  "iss": "https://pinniped-supervisor.<TMC-SM-FQDN>/provider/pinniped",
  "jti": "7cdfd23a-2f9f-4d56-82a0-84b9995f489c",
  "rat": 1682540220,
  "sub": "<oauth2provider>",
  "username": "<user-email>"
}

Note
Tanzu Mission Control permissions (tmc:admin and tmc:member) are specified in the groups claim. The token should also contain at least a name or an email claim for Tanzu Mission Control to parse it correctly.
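
To compare a token you have obtained with this example, you can decode its claims locally. A JWT consists of three base64url-encoded segments separated by dots, and the claims are in the middle segment. The sketch below decodes that segment and then applies the note above with a plain text search; it does not verify the signature or fully parse the JSON, and the function names jwt_claims and check_tmc_claims are hypothetical.

```shell
# Hypothetical helper: print the JSON claims (payload) of a JWT.
# This only decodes the token; it does NOT verify the signature.
jwt_claims() {
  # The payload is the second of the three dot-separated segments.
  payload=$(printf '%s' "$1" | cut -d. -f2)
  # Translate the base64url alphabet to standard base64 ('_' to '/', '-' to '+').
  payload=$(printf '%s' "$payload" | tr '_-' '/+')
  # JWTs omit '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="$payload==" ;;
    3) payload="$payload=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Hypothetical check based on the note above: the groups claim should contain
# tmc:admin or tmc:member, and the token should carry a name or email claim.
# This is a plain text search of the JSON, not a full parser.
check_tmc_claims() {
  claims=$(jwt_claims "$1") || return 1
  rc=0
  if ! printf '%s' "$claims" | grep -Eq '"tmc:(admin|member)"'; then
    echo "no tmc:admin or tmc:member group found" >&2
    rc=1
  fi
  if ! printf '%s' "$claims" | grep -Eq '"(name|email)"[[:space:]]*:'; then
    echo "no name or email claim found" >&2
    rc=1
  fi
  return "$rc"
}

# Example usage, with a token you have already obtained:
#   jwt_claims "$TOKEN"
#   check_tmc_claims "$TOKEN" && echo "claims look usable"
```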

Troubleshooting AD and OpenLDAP authentication

Installation failed due to an AD or OpenLDAP identity provider error

Cause: If you omit required settings from values.yaml or provide settings that are syntactically invalid, the installation will fail.
Fix: Follow the steps in Installation Troubleshooting to identify and resolve the issue.

UI login errors

After a successful installation, you may still be unable to log in if the provided settings are incorrect for your environment. A misconfiguration of the AD or OpenLDAP settings can produce the following errors.

`An internal error occurred. Please contact your administrator`

After submitting a username and password in the browser, the user sees the message: An internal error occurred. Please contact your administrator.

Cause: The error is due either to a misconfigured setting or to an inability to reach your AD domain controller or OpenLDAP server. To determine the cause:

1. Check the status of the corresponding identity provider object.

   If using Active Directory, run kubectl get ActiveDirectoryIdentityProvider.

   If using OpenLDAP, run kubectl get LDAPIdentityProvider.

   The STATUS column of the output shows Error when there is an issue and Ready when there are no issues.

2. If the status is Error, get more details by running kubectl describe with the NAME value from step 1.

   If using Active Directory, run kubectl describe ActiveDirectoryIdentityProvider <NAME>.

   If using OpenLDAP, run kubectl describe LDAPIdentityProvider <NAME>.

   Replace <NAME> with the value shown in the NAME column in step 1. The NAME value also matches the ldap.domainName setting in the values.yaml file.
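
When several identity provider objects exist, the check in step 1 can be narrowed to the ones reporting Error. This sketch locates the STATUS column from the header row of the kubectl get output; the function name idp_errors is hypothetical.

```shell
# Hypothetical helper: read the output of
#   kubectl get ActiveDirectoryIdentityProvider   (or LDAPIdentityProvider)
# on stdin and print the NAME of every object whose STATUS is Error.
idp_errors() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "STATUS") col = i; next }
       col && $col == "Error" { print $1 }'
}

# Example usage for Active Directory:
#   kubectl get ActiveDirectoryIdentityProvider | idp_errors |
#     while read -r name; do kubectl describe ActiveDirectoryIdentityProvider "$name"; done
```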

Fix: The output of the kubectl describe command shows multiple Message values under the Status.Conditions section. The following table describes potential error messages and how to fix them.

Example Error Message	Reason and Fix
error dialing host "192.168.111.195": LDAP Result Code 200 "Network Error": dial tcp 192.168.111.195:636: connect: no route to host	This indicates a connectivity issue with the destination host/IP or port. Ensure that the ldap.host value is reachable from the environment where TMC Self-Managed is installed.
error dialing host "192.168.111.190": LDAP Result Code 200 "Network Error": tls: failed to verify certificate: x509: certificate signed by unknown authority (possibly because of "x509: invalid signature: parent certificate cannot sign this kind of certificate" while trying to verify candidate authority certificate "dc01.tanzu.io")	This indicates an incorrect or expired root CA certificate, or an expired certificate on the target server. Ensure that the ldap.rootCA value contains the PEM-encoded certificate of the root CA that issued the TLS certificate for the target ldap.host. This value spans multiple lines and must include the BEGIN CERTIFICATE and END CERTIFICATE lines.
error binding as "CN=TMC SvcAcct,CN=Users,DC=acme,DC=org": LDAP Result Code 49 "Invalid Credentials"	This is related to the directory service account specified by the ldap.username and ldap.password values. Ensure that: the account exists; the account is not expired, locked, or disabled; the full distinguishedName is correct and is specified in the ldap.username setting (for example, "CN=TMC SvcAcct,OU=Users…"); the password value is correct and doesn't contain unusual characters (if unsure, try using only alphanumeric characters); and the password is not expired.

If you are still unsure of the cause, verify the settings with an LDAP client such as LDP.exe, Softerra, or ldapsearch. Then update the values.yaml file to use exactly the same settings, including the host, port, base DN, service account DN, and password.
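
With the OpenLDAP ldapsearch client, you can repeat the service-account bind using the same values you put in values.yaml. This is a sketch assuming LDAPS (as in the port 636 example in the table) and a root CA saved to a local PEM file; the function name ldap_bind_test is hypothetical. If the bind fails, ldapsearch typically reports the LDAP result code (for example, 49 for invalid credentials) that also appears in the identity provider status.

```shell
# Hypothetical helper: repeat the service-account bind with OpenLDAP ldapsearch.
#   $1  host or host:port (the ldap.host value, e.g. dc01.example.org:636)
#   $2  bind DN (the ldap.username value)
#   $3  base DN to search
#   $4  file containing the PEM root CA (the ldap.rootCA value)
# -W prompts for the bind password (ldap.password) instead of taking it on the
# command line, so it does not end up in shell history.
# A base-scope search of the base DN succeeds only if the bind and base DN are valid.
ldap_bind_test() {
  LDAPTLS_CACERT="$4" ldapsearch -x -H "ldaps://$1" -D "$2" -W \
    -b "$3" -s base "(objectClass=*)"
}

# Example usage:
#   ldap_bind_test dc01.example.org:636 "CN=TMC SvcAcct,CN=Users,DC=example,DC=org" \
#     "DC=example,DC=org" ./root-ca.pem
```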

`Incorrect username or password`

After submitting a username and password in the browser, the user sees the message: Incorrect username or password.

Cause: This message appears when the user enters an incorrect username or password, or when the user account is not in the LDAP tree.

Fix:

1. Verify and enter the correct username and password.
2. If the username and password are correct, verify the settings with an LDAP client such as LDP.exe, Softerra, or ldapsearch. Update the values.yaml file to use exactly the same settings, including the host, port, base DN, service account DN, and password.
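
To confirm that a user account exists in the LDAP tree, you can search for it while bound as the service account. This is a sketch assuming an LDAPS connection and a local root CA file; the function name ldap_find_user and the LDAP_USER_ATTR variable are hypothetical. The default attribute, sAMAccountName, is common in Active Directory; change it to the attribute your directory uses for login names (for example, uid in many OpenLDAP directories).

```shell
# Hypothetical helper: search for a user entry, binding as the service account.
#   $1 host[:port]  $2 bind DN  $3 base DN  $4 root CA file  $5 login name
# Set LDAP_USER_ATTR (hypothetical) to override the login-name attribute.
# The special attribute list "1.1" requests no attributes, so only the DN of
# each matching entry is printed. Characters such as * ( ) \ in the login
# name must be escaped before use in a filter.
ldap_find_user() {
  attr="${LDAP_USER_ATTR:-sAMAccountName}"
  LDAPTLS_CACERT="$4" ldapsearch -x -H "ldaps://$1" -D "$2" -W \
    -b "$3" -s sub "($attr=$5)" 1.1
}

# Example usage for an OpenLDAP directory:
#   LDAP_USER_ATTR=uid ldap_find_user ldap.example.org:636 \
#     "cn=svc,dc=example,dc=org" "dc=example,dc=org" ./root-ca.pem jdoe
```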