Troubleshooting on-demand instances

This topic describes techniques that app developers can use to begin troubleshooting on-demand instances.

Troubleshoot errors

Start here if you are responding to a specific error or error message.

Common service errors

Errors common to on-demand services are:

No Metrics from Log Cache
When Using Service-Gateway Access, create-service or update-service Fails

No Metrics from Log Cache
Symptom	You receive no metrics when running the `cf tail` command.
Cause	This might happen because the Firehose is deactivated in the TAS for VMs tile.
Solution	Ask your operator to ensure that the V2 Firehose check box is activated, and the Enable Log Cache syslog ingestion check box is deactivated in the TAS for VMs tile. For more information about configuring these check boxes, see Activate syslog forwarding.

When Using Service-Gateway Access, create-service or update-service Fails
Symptom	When you run `cf create-service` or `cf update-service` with `{"enable_external_access": true}`, you receive an error like this:
Service broker error: contact your operator, service configuration issue occurred
Cause	When off-platform access is set up for a foundation, a range of TCP ports is reserved for MySQL traffic. Each service instance for which service-gateway access is enabled requires one port. If all the ports in the range have been assigned to other service instances, then you cannot create or update service instances to use service-gateway access.
Solution	To resolve this error, confirm that the problem is caused by a shortage of ports and, if so, increase the port range:
Review the BOSH logs on the MySQL service broker VM and, in the `broker.stdout.log` file, look for this error message: `Failed to update manifest: There are no free ports in range: […` For information about how to download the service broker logs, see Access broker logs and VMs.
Ask the operator to increase the external TCP port range for off-platform access by editing the Settings pane on the Tanzu SQL for VMs tile. For information about the Settings pane, see Enable Service-Gateway Access in Enabling Service-Gateway Access.
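
As a rough illustration of the log check in the first step, the sketch below searches a local copy of `broker.stdout.log` for the "no free ports" message. The helper name `ports_exhausted` and the sample log contents are assumptions made for this example; only the message text comes from the error quoted above.

```shell
# Sketch: detect the "no free ports" error in a downloaded broker log.
# ports_exhausted is a hypothetical helper, not part of the cf or bosh CLI.
ports_exhausted() {
  # $1: path to a local copy of broker.stdout.log
  grep -q 'Failed to update manifest: There are no free ports in range' "$1"
}

# Demo with a made-up log file; a real broker log will have a different layout.
sample_log=$(mktemp)
printf '%s\n' \
  'broker starting' \
  'Failed to update manifest: There are no free ports in range: […' \
  > "$sample_log"

if ports_exhausted "$sample_log"; then
  echo "Port range exhausted: ask the operator to increase it"
else
  echo "No port exhaustion message found: look for another cause"
fi
rm -f "$sample_log"
```

If the message is absent, the failure probably has a different cause, and the broker log is still the first place to look.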

If instances or database are inaccessible

You might experience the following in a leader-follower or Multi‑Site Replication topology, or during upgrades:

Temporary outages
Apps cannot write to the database
Apps are inoperable
Apps cannot connect to the database
MySQL Connector/J v5.1.41 or earlier
Mutual TLS
Java apps cannot connect after buildpack update

Temporary Outages
Symptom	VMware Tanzu SQL with MySQL for VMs service instances can become temporarily inaccessible during upgrades and VM or network failures.
Solution	For more information, see Service interruptions.

Apps Cannot Write to the Database
Symptom	You have a leader-follower or Multi‑Site Replication topology, and your apps can no longer write to the database.
Cause	If you have a leader-follower topology, the leader VM might be read-only. If your apps also can no longer read from the database, your persistent disk might be full. If you have a Multi‑Site Replication topology, your leader VM might be down.
Solution	If you have a leader-follower topology and the leader VM is read-only, see Both Leader and Follower Instances Are Read-Only. If your apps also can no longer read from the database, your persistent disk might be full. For more information about troubleshooting inoperable apps, see Apps are Inoperable. If you have a Multi‑Site Replication topology and your leader VM is down, you can trigger a failover to the follower VM. For more information, see Triggering multi-site replication failover and switchover.

Apps Are Inoperable
Symptom	Your apps become inoperable. Read, write, and cf CLI operations do not work.
Cause	Your persistent disk might be full.
Solution	Contact your operator to check if your persistent disk is full. For more information about troubleshooting this problem, see Persistent disk is full.

Apps Cannot Connect to the Database
Symptom	Apps can fail to connect to the database.
Cause	Possible causes include an incompatible version of MySQL Connector/J, the use of mutual TLS (mTLS), or a Java buildpack update.
Solution	See MySQL Connector/J v5.1.41 or earlier. See Mutual TLS. See Java apps cannot connect after buildpack update.

MySQL Connector/J v5.1.41 or Earlier
Symptom	Apps cannot connect to the database when TLS is enabled and the apps are using MySQL Connector/J v5.1.41 or earlier.
Cause	You see errors about certificates. For example:
Caused by: javax.net.ssl.SSLHandshakeException: Received fatal alert: bad_certificate
  at sun.security.ssl.Alerts.getSSLException(Alerts.java:192) ~[na:1.8.0_152]
Solution	If you cannot update MySQL Connector/J, apply the workaround in How to deactivate KeyManager and TrustManager in Container Security Provider Framework in the Java buildpack.

Mutual TLS
Symptom	Apps cannot connect to the database when TLS is activated and your apps use mTLS.
Cause	You see network errors in your app logs.
Solution	To resolve this issue, deactivate mTLS in your apps.

Java Apps Cannot Connect after Buildpack Update
Symptom	After updating a Java app to use Java buildpack v4.38 or later, the app cannot connect to the database over TLS. In the app logs, you see errors such as:
javax.net.ssl.SSLHandshakeException: No appropriate protocol (protocol is disabled or cipher suites are inappropriate)
Cause	By default, the new version of Java deactivates TLS v1.0 and v1.1.
Solution	Update the app to use MySQL Connector/J v5.1.49 or later or MySQL Connector/J v8.0.19 or later. This ensures that TLS v1.2 is used.
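
One way to check a connector version against these minimums is sketched below. The helper names `version_ge` and `connector_ok` are hypothetical, and the sketch assumes that every version outside the 5.x line should be compared against 8.0.19. It also assumes a `sort` that supports the `-V` version-sort option, as GNU coreutils does.

```shell
# Sketch: check a Connector/J version string against the minimums named above.
# version_ge and connector_ok are hypothetical helpers for illustration.

# True when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# True when the given Connector/J version meets the documented minimum:
# 5.1.49 for 5.x versions, 8.0.19 otherwise (an assumption for non-5.x versions).
connector_ok() {
  case "$1" in
    5.*) version_ge "$1" 5.1.49 ;;
    *)   version_ge "$1" 8.0.19 ;;
  esac
}

for v in 5.1.41 5.1.49 8.0.18 8.0.33; do
  if connector_ok "$v"; then
    echo "$v: meets the minimum"
  else
    echo "$v: update required"
  fi
done
```

The check only compares version strings. It does not inspect the app or confirm which TLS version is negotiated at run time.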

Failed backup or restore with the adbr plug-in

If you get errors when working with the ApplicationDataBackupRestore (adbr) plug-in for the Cloud Foundry Command Line Interface (cf CLI) tool, see:

“400” error during backup or restore
“500” error during backup or restore
“502” error during backup or restore
“Status: Restore failed” after adbr restore

“400” Error during Backup or Restore
Symptom	When running `cf adbr backup` or `cf adbr restore`, an error occurs. For example:
$ cf adbr backup myDB
  Failed to backup service instance "myDB": failed due to server error, status code: 400
Cause	The broker on the VM is not running or is in an unhealthy state.
Solution	Verify the health of the broker VM and review the logs for the broker.

“500” Error during Backup or Restore
Symptom	When running `cf adbr backup` or `cf adbr restore`, an error occurs. For example:
$ cf adbr backup myDB
  Failed to backup service instance "myDB": failed due to server error, status code: 500
Cause	The service instance agent is not running or is in an unhealthy state.
Solution	Verify the health of the service instance VM and review the logs for the service instance.

“502” Error during Backup or Restore
Symptom	When running `cf adbr backup` or `cf adbr restore`, an error occurs. For example:
$ cf adbr backup myDB
  Failed to backup service instance "myDB": failed due to server error, status code: 502
Cause	The VM is down, stopped, or in an unhealthy state.
Solution	Verify the health of the broker VM and review the logs for the broker.
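
The three status codes above each point to a different first check. The sketch below maps an adbr error message to that check. The helper name `adbr_hint` is hypothetical, and the hint text paraphrases the Cause and Solution rows above.

```shell
# Sketch: map an adbr failure status code to the first thing to check.
# adbr_hint is a hypothetical helper; the hints paraphrase the tables above.
adbr_hint() {
  # $1: the error text printed by cf adbr backup or cf adbr restore
  code=$(printf '%s\n' "$1" | sed -n 's/.*status code: \([0-9][0-9]*\).*/\1/p')
  case "$code" in
    400) echo "400: check the health of the broker VM and review the broker logs" ;;
    500) echo "500: check the health of the service instance VM and review its logs" ;;
    502) echo "502: a VM is down or unhealthy; check the broker VM and review the broker logs" ;;
    *)   echo "unrecognized: no status code 400, 500, or 502 found" ;;
  esac
}

adbr_hint 'Failed to backup service instance "myDB": failed due to server error, status code: 500'
```

App developers usually cannot inspect these VMs directly, so the hint is most useful as the note to pass on to the operator.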

“Status: Restore failed” after adbr Restore
Symptom	When running `cf adbr get-status` after restoring to a service instance, adbr returns `Restore failed`. For example:
$ cf adbr get-status myTargetDB
Getting status of service instance myTargetDB in org my-org / space system as admin...
[Thu Feb 25 22:33:58 UTC 2021] Status: Restore failed
Cause	A possible cause is that the database on the new service instance is not empty. For more information, see The New Database Must Be Empty in Backing Up and Restoring VMware Tanzu SQL with MySQL for VMs.
Solution	To resolve the error and complete the restore:
Determine if the database is empty by reviewing the log `/var/vcap/sys/log/mysql-restore/mysql-restore.stderr.log` on the service instance.
If any GTIDs (global transaction identifiers) are printed in the logs, then the database is not empty. Delete the service instance and create a new service instance to restore the backup to.
If the log does not contain any GTIDs, then the restore failed for another reason. Review other logs on the service instance and, if necessary, contact Support.
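
The GTID check can be approximated with a pattern search, as in the sketch below. The helper name `log_has_gtids` is hypothetical. The pattern assumes the standard MySQL GTID form, a source UUID followed by a colon and a transaction number, and the sample log line is made up because the exact layout of the restore log is not shown here.

```shell
# Sketch: report whether a restore log mentions any GTIDs.
# log_has_gtids is a hypothetical helper. The pattern assumes the standard
# MySQL GTID form: a source UUID, then ":" and a transaction number.
log_has_gtids() {
  # $1: path to a copy of mysql-restore.stderr.log
  grep -Eq '[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}:[0-9]+' "$1"
}

# Demo with a made-up log line; the real log layout may differ.
sample_log=$(mktemp)
echo 'gtid_executed: 3e11fa47-71ca-11e1-9e33-c80aa9429562:1-5' > "$sample_log"

if log_has_gtids "$sample_log"; then
  echo "GTIDs found: the target database is not empty"
else
  echo "No GTIDs found: the restore failed for another reason"
fi
rm -f "$sample_log"
```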

For general information about the adbr plug-in, see Backing up and restoring VMware Tanzu SQL with MySQL for VMs.

Persistent disk usage is increasing

If you have set the optimize_for_short_words parameter to true and you are alerted that persistent disk usage is high, then you might need to optimize the indexed tables.

Persistent Disk Usage Is Increasing
Symptom	You have set the `optimize_for_short_words` optional parameter to `true` and the persistent disk is filling up. For information about the parameter, see Optimize for short words.
Cause	Over time, data has been deleted from your database and the full-text index has become too large.
Solution	Remove full-text entries for deleted or old records by following the instructions in Optimize for short words.

For information about monitoring disk usage, see Monitoring and KPIs.

Techniques for troubleshooting

See the following sections for troubleshooting techniques when using the Cloud Foundry Command-Line Interface (cf CLI) to perform basic operations on a Tanzu SQL for VMs service instance.

Basic cf CLI operations include create, update, bind, unbind, and delete.

Understand a Cloud Foundry error message

Failed operations (create, update, bind, unbind, delete) result in an error message. You can retrieve the error message later by running cf service INSTANCE-NAME. For example:

$ cf service myservice

Service instance: myservice
Service: super-db
Bound apps:
Tags:
Plan: dedicated-vm
Description: Dedicated Instance
Documentation url:
Dashboard:

Last Operation
Status: create failed
Message: Instance provisioning failed: There was a problem completing your request.
     Please contact your operations team providing the following information:
     service: redis-acceptance,
     service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089,
     broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac,
     task-id: 442,
     operation: create
Started: 2017-03-13T10:16:55Z
Updated: 2017-03-13T10:17:58Z

Use the information in the Message field to debug further. Provide this information to Support when filing a ticket.

The task-id field maps to the BOSH task ID. For more information about a failed BOSH task, run bosh task TASK-ID.

The broker-request-id field maps to the portion of the On-Demand Broker log containing the failed step. Access the broker log through your syslog aggregator, or access BOSH logs for the broker by running bosh logs broker 0. If you have more than one broker instance, repeat this process for each instance.
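
To avoid copying these identifiers by hand, the fields can be pulled out of the message text. The following is a minimal sketch, assuming the Message block has the "name: value," layout shown in the example output above. The helper name `message_field` is hypothetical and is not a cf CLI feature.

```shell
# Sketch: pull one field out of the Message block printed by cf service.
# message_field is a hypothetical helper, not a cf CLI feature.
message_field() {
  # $1: field name, for example task-id; the cf service output is read from stdin
  sed -n "s/^[[:space:]]*$1:[[:space:]]*\([^,]*\),\{0,1\}[[:space:]]*\$/\1/p"
}

# Sample lines copied from the example output above.
sample='     service-instance-guid: ae9e232c-0bd5-4684-af27-1b08b0c70089,
     broker-request-id: 63da3a35-24aa-4183-aec6-db8294506bac,
     task-id: 442,
     operation: create'

task_id=$(printf '%s\n' "$sample" | message_field task-id)
request_id=$(printf '%s\n' "$sample" | message_field broker-request-id)

echo "BOSH task to inspect: bosh task $task_id"
echo "Search the broker log for: $request_id"
```

Against a live instance, the same helper could read from the CLI directly, for example `cf service myservice | message_field task-id`.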

Find information about your service instance

You might need to find the name, GUID, or other information about a service instance. To find this information, do the following:

Log in to the space containing the instance or failed instance:
```
$ cf login
```

If you do not know the name of the service instance, run cf services to see a listing of all service instances in the space. The service instances are listed in the name column.

$ cf services
Getting services in org my-org / space my-space as user@example.com...
OK
name          service      plan        bound apps    last operation
my-instance   p.mysql      db-small                  create succeeded

To retrieve more information about a specific instance, run cf service SERVICE-INSTANCE-NAME.
To retrieve the GUID of the instance, run cf service SERVICE-INSTANCE-NAME --guid.

The GUID is useful for debugging.
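
When a space holds many instances, the two commands above can be combined. The sketch below extracts instance names from `cf services` output. The helper name `instance_names` is hypothetical, and the parsing assumes the header row starts with `name`, as in the sample output above. Other cf CLI versions may print additional columns.

```shell
# Sketch: list service instance names from cf services output.
# instance_names is a hypothetical helper; it assumes the column header row
# starts with "name", as in the sample output above.
instance_names() {
  # Reads cf services output from stdin and prints the first column of every
  # non-empty row that follows the header row.
  awk 'found && NF { print $1 } $1 == "name" { found = 1 }'
}

# Sample output copied from the example above.
sample='Getting services in org my-org / space my-space as user@example.com...
OK
name          service      plan        bound apps    last operation
my-instance   p.mysql      db-small                  create succeeded'

printf '%s\n' "$sample" | instance_names

# With the cf CLI logged in, the same helper could feed cf service --guid:
#   cf services | instance_names | while read -r name; do
#     printf '%s %s\n' "$name" "$(cf service "$name" --guid)"
#   done
```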

Use the Knowledge Base Community

Find the answer to your question and browse product discussions and solutions by searching the VMware Tanzu Knowledge Base.

Support

If you need further help, contact Support. Be sure to provide the error message from cf service YOUR-SERVICE-INSTANCE.

To expedite troubleshooting, if possible, provide your service broker logs, service instance logs, and BOSH task output. Your cloud operator can obtain these from your error message.