You can monitor the health of the VMware Tanzu SQL with MySQL for VMs service using logs, metrics, and Key Performance Indicators (KPIs) that are generated by Tanzu SQL for VMs component VMs.
For more information about logging and metrics in VMware Tanzu Application Service for VMs, see Logging and metrics.
Metrics are regularly generated log entries that report measured component states. The default metrics polling interval is 30 seconds for MySQL instances. For the service broker, the default is 60 seconds.
You can configure the MySQL instance interval in Configure Monitoring in the Tanzu SQL for VMs tile. For more information, see Configure monitoring.
Metrics are long, single lines of text with the format:
origin:"p.mysql" eventType:ValueMetric timestamp:1496776477930669688 deployment:"service-instance_2b5a001f-2bf3-460c-aee6-fd2253f9fb0c" job:"mysql" index:"b09df494-b731-4d06-a4b0-c2985ceedf4c" ip:"10.0.8.4" valueMetric:<name:"/p.mysql/performance/open_files" value:24 unit:"file" >
To access MySQL metrics:
You can use Grafana to visually view metrics for Tanzu SQL for VMs service instances.
This section requires Healthwatch v2.x.
View the Grafana dashboard by going to:
https://grafana.YOUR-SYSTEM-DOMAIN
For more information about using Grafana dashboards, see the Healthwatch for VMware Tanzu documentation.
To access metrics for Tanzu SQL for VMs service instances, you can use Loggregator’s Log Cache feature with the Log Cache CLI plug-in. Log Cache is enabled by default.
To access metrics for on-demand service instances:
Install the cf CLI plug-in by running:
cf install-plugin -r CF-Community "log-cache"
To access metrics for a service instance, run:
cf tail SERVICE-INSTANCE-NAME
Where SERVICE-INSTANCE-NAME
is the name of your service instance.
For example:
For more information about the metrics output, seeFor more information about how to enable Log Cache and about the cf tail
command, see Enable Log Cache.
KPIs are metrics for MySQL service instances that you can monitor for two purposes:
To ensure high performance.
To discover emerging issues.
KPIs can be either raw component metrics or derived metrics generated by applying formulas to raw metrics.
VMware provides the following KPIs as general alerting and response guidance for typical Tanzu SQL for VMs installations. VMware recommends that you alter the alert measures by observing historical trends. You can also create your KPIs that are specific to your environment using the available component metrics.
For a list of all the Tanzu SQL for VMs component metrics, see Component Metrics.
/p.mysql/available |
|
---|---|
Description | If the MySQL Server is responding to requests. This indicates if the component is available. Use: If the server does not emit heartbeats, it is offline. Origin: Doppler/Firehose Type: Boolean Frequency: 30 s |
Recommended measurement | Average over last 5 minutes |
Recommended alert thresholds | Yellow warning: N/A Red critical: < 1 |
Recommended response | Check the MySQL Server logs for errors. You can find the instance by targeting your MySQL deployment with BOSH and inspecting logs for the instance. For more information, see Failing Jobs and Unhealthy Instances. If your service plan is an highly available (HA) cluster, you can also run mysql-diag to check logs for errors. |
/p.mysql/system/persistent_disk_used_percent |
|
---|---|
Description | The percentage of disk used on the persistent file system. Use: MySQL cannot function correctly if there is not sufficient free space on the file systems. Use these metrics to ensure that you have disks large enough for your user base. Origin: Doppler/Firehose Type: Percent Frequency: 30 s (default) |
Recommended measurement | Maximum of persistent disk used of all of nodes |
Recommended alert thresholds | Single Node and Leader Follower:
|
Recommended response | Upgrade the service instance to a plan with larger disk capacity. For Tanzu SQL for VMs v2.9 and later, if you set the optimize_for_short_words parameter to true , then see Troubleshooting VMware Tanzu SQL with MySQL for VMs before upgrading the service. |
/p.mysql/system/ephemeral_disk_used_percent |
|
---|---|
Description | The percentage of disk used on the ephemeral file system. Use: MySQL cannot function correctly if there is not sufficient free space on the file systems. Use these metrics to ensure that you have disks large enough for your user base. Origin: Doppler/Firehose Type: Percent Frequency: 30 s (default) |
Recommended measurement | Maximum disk used of all nodes |
Recommended alert thresholds | Yellow warning: > 80% Red critical: > 95% |
Recommended response | Upgrade the service instance to a plan with larger disk capacity. |
/p.mysql/performance/cpu_utilization_percent |
|
---|---|
Description | CPU time being consumed by the MySQL service. Use: A node that experiences context switching or high CPU use becomes unresponsive. This also affects the ability of the node to report metrics. Origin: Doppler/Firehose Type: Percent Frequency: 30 s (default) |
Recommended measurement | Average over last 10 minutes |
Recommended alert thresholds | Yellow warning: > 80 Red critical: > 90 |
Recommended response | Discover what is using so much CPU. If it is from normal processes, update the service instance to use a plan with larger CPU capacity. |
/p.mysql/variables/max_connections /p.mysql/net/max_used_connections |
|
---|---|
Description | The maximum number of connections used over the maximum permitted number of simultaneous client connections. Use: If the number of connections drastically changes or if apps are unable to connect, there might be a network or app issue. Origin: Doppler/Firehose Type: count Frequency: 30 s |
Recommended measurement | max_used_connections / max_connections |
Recommended alert thresholds | Yellow warning: > 80 % Red critical: > 90 % |
Recommended response | If this measurement meets or exceeds 80% with exponential growth, monitor app use to ensure that everything is working. When approaching 100% of maximum connections, apps might not always connect to the database. The connections/second for a service instance vary based on app instances and app use. |
/p.mysql/performance/queries_delta |
|
---|---|
Description | The number of statements executed by the server over the last 30 seconds. Use: The server always processes queries. If the server does not process queries, the server is non-functional. Origin: Doppler/Firehose Type: count Frequency: 30 s |
Recommended measurement | Average over last 2 minutes |
Recommended alert thresholds | Red critical: 0 |
Recommended response | Investigate the MySQL server logs, such as the audit log, to understand why query rate changed and decide on appropriate action. |
/p.mysql/galera/wsrep_ready |
|
---|---|
Description | Shows whether each cluster node can accept queries. Returns only 0 or 1. When this metric is 0, almost all queries to that node fail with the error:ERROR 1047 (08501) Unknown Command Use: Discover when nodes of a cluster were unable to communicate and accept transactions. Origin: Doppler/Firehose Type: Boolean Frequency: 30 s (default) |
Recommended measurement | Average of values of each cluster node, over the last 5 minutes |
Recommended alert thresholds | Yellow warning: < 1 Red critical: 0 (cluster is down) |
Recommended response |
|
/p.mysql/galera/wsrep_cluster_size |
|
---|---|
Description | The number of cluster nodes with which each node is communicating normally. Use: When running in a multi-node configuration, this metric indicates if each member of the cluster is communicating normally with all other nodes. Origin: Doppler/Firehose Type: count Frequency: 30 s (default) |
Recommended measurement | (Average of the values of each node / cluster size), over the last 5 minutes |
Recommended alert thresholds | Yellow warning: < 3 (availability compromised) Red critical: < 1 (cluster unavailable) |
Recommended response | Run mysql-diag and check the MySQL Server logs for errors. |
/p.mysql/galera/wsrep_cluster_status |
|
---|---|
Description | Shows the primary status of the cluster component that the node is in. Values are:
Primary indicates that the node is part of a non-operational component. This occurs in cases of multiple membership changes that cause a loss of quorum. Origin: Doppler/Firehose Type: integer (see above) Frequency: 30 s (default) |
Recommended measurement | Sum of each of the nodes, over the last 5 minutes |
Recommended alert thresholds | Yellow warning: < 3 Red critical: < 1 |
Recommended response |
|
/p.mysql/p.mysql/last_successful_backup |
|
---|---|
Description | Using the configured backup schedule for the service instance as a threshold, this metric shows how many hours have passed since the last successful backup. |
Recommended measurement | Hours elapsed since the last successful backup |
Recommended alert thresholds | Red critical: Metric exceeds your organization's policy for maximum backup interval. |
Recommended response | Check the adbr-agent logs for the service instance to determine why recent automated backups are failing. |
In addition to the KPIs, the MySQL service emits the followings metrics for monitoring and alerting:
The metrics that all Tanzu SQL for VMs service instances emit:
/p.mysql/available | |
---|---|
Description | Indicates if the local database server is available and responding. |
Unit | boolean |
/p.mysql/variables/max_connections | |
---|---|
Description | The maximum permitted number of simultaneous client connections. |
Unit | count |
/p.mysql/variables/open_files_limit | |
---|---|
Description | The number of files that the operating system permits mysqld to open. |
Unit | files |
/p.mysql/variables/read_only | |
---|---|
Description | Whether the server is in read-only mode |
Unit | boolean |
/p.mysql/performance/queries_delta | |
---|---|
Description | The change in the /performance/queries metric since the last time it was emitted. |
Unit | integer greater than or equal to zero |
/p.mysql/innodb/buffer_pool_pages_free | |
---|---|
Description | The amount of free space, measured in pages, in the InnoDB Buffer Pool. |
Unit | count |
/p.mysql/innodb/buffer_pool_pages_total | |
---|---|
Description | The total amount of free space, measured in pages, in the InnoDB Buffer Pool containing data. |
Unit | count |
/p.mysql/innodb/buffer_pool_pages_data | |
---|---|
Description | The number of pages in the InnoDB Buffer Pool containing data. The number includes both dirty and clean pages. |
Unit | count |
/p.mysql/innodb/row_lock_current_waits | |
---|---|
Description | The number of row locks currently being waited for by operations on InnoDB tables. |
Unit | count |
/p.mysql/innodb/data_read | |
---|---|
Description | The amount of data read since the server started. |
Unit | bytes |
/p.mysql/innodb/data_written | |
---|---|
Description | The amount of data written since the server started. |
Unit | bytes |
/p.mysql/innodb/mutex_os_waits | |
---|---|
Description | The number of mutex OS waits. |
Unit | events/second |
/p.mysql/innodb/mutex_spin_rounds | |
---|---|
Description | The number of mutex spin rounds. |
Unit | events/second |
/p.mysql/innodb/mutex_spin_waits | |
---|---|
Description | The number of mutex spin waits. |
Unit | events/second |
/p.mysql/innodb/os_log_fsyncs | |
---|---|
Description | The number of fsync() writes done to the InnoDB redo log files. |
Unit | count |
/p.mysql/innodb/row_lock_time | |
---|---|
Description | Total time spent in acquiring row locks. |
Unit | milliseconds |
/p.mysql/innodb/row_lock_waits | |
---|---|
Description | The number of times a row lock had to be waited for since the server started. |
Unit | count |
/p.mysql/net/connections | |
---|---|
Description | The number of connection attempts to the server, both successful and unsuccessful, to the MySQL server. |
Unit | count |
/p.mysql/net/max_used_connections | |
---|---|
Description | The maximum number of connections that have been in use simultaneously since the server started. |
Unit | count |
/p.mysql/performance/com_delete | |
---|---|
Description | The number of delete commands since the server started or the last FLUSH STATUS . |
Unit | count |
/p.mysql/performance/com_insert | |
---|---|
Description | The number of insert commands since the server started or the last FLUSH STATUS . |
Unit | count |
/p.mysql/performance/com_insert_select | |
---|---|
Description | The number of insert-select commands since the server started or the last FLUSH STATUS . |
Unit | count |
/p.mysql/performance/com_replace_select | |
---|---|
Description | The number of replace-select commands since the server started or the last FLUSH STATUS . |
Unit | count |
/p.mysql/performance/com_update | |
---|---|
Description | The number of update commands since the server started or the last FLUSH STATUS . |
Unit | count |
/p.mysql/performance/created_tmp_disk_tables | |
---|---|
Description | The number of internal on-disk temporary tables created by the server while executing statements. |
Unit | count |
/p.mysql/performance/created_tmp_files | |
---|---|
Description | The number of temporary files created by mysqld. |
Unit | count |
/p.mysql/performance/created_tmp_tables | |
---|---|
Description | The number of internal temporary tables created by the server while executing statements. |
Unit | count |
/p.mysql/performance/cpu_utilization_percent | |
---|---|
Description | The percent of the CPU in use by all processes on the MySQL node. |
Unit | percent |
/p.mysql/performance/open_files | |
---|---|
Description | The number of regular files currently open, which were opened by the server. |
Unit | count |
/p.mysql/performance/open_tables | |
---|---|
Description | The number of tables that are currently open. |
Unit | count |
/p.mysql/performance/opened_tables | |
---|---|
Description | The number of tables that have been opened. |
Unit | count |
/p.mysql/performance/open_table_definitions | |
---|---|
Description | The number of currently cached table definitions or .frm files. |
Unit | count |
/p.mysql/performance/opened_table_definitions | |
---|---|
Description | The number of .frm files that have been cached. |
Unit | count |
/p.mysql/performance/qcache_hits | |
---|---|
Description | The number of query cache hits. The query cache and qcache_hits metric is deprecated as of MySQL v5.7.20. |
Unit | count |
/p.mysql/performance/slow_queries | |
---|---|
Description | The number of queries that have taken more than long_query_time seconds. |
Unit | count |
/p.mysql/performance/table_locks_waited | |
---|---|
Description | The total number of times that a request for a table lock could not be granted immediately and a wait was needed. |
Unit | count |
/p.mysql/performance/threads_connected | |
---|---|
Description | The number of currently open connections. |
Unit | count |
/p.mysql/performance/threads_running | |
---|---|
Description | The number of threads that are not sleeping. |
Unit | count |
/p.mysql/rpl_semi_sync_master_tx_avg_wait_time | |
---|---|
Description | The average time the leader has waited for the follower to accept transactions. |
Unit | microseconds |
/p.mysql/rpl_semi_sync_master_no_tx | |
---|---|
Description | The number of transactions committed without follower acknowledgement. |
Unit | count |
/p.mysql/rpl_semi_sync_master_wait_sessions | |
---|---|
Description | The current number of connections waiting for a sync commit. For more information about sync replication, see About Synchronous Replication. |
Unit | count |
The disk usage metrics that all Tanzu SQL for VMs services emit:
/p.mysql/system/persistent_disk_used_percent | |
---|---|
Description | The percentage of disk used on the persistent file system. |
Unit | percent |
/p.mysql/system/persistent_disk_used | |
---|---|
Description | The amount of space used on the persistent disk. |
Unit | KB |
/p.mysql/system/persistent_disk_free | |
---|---|
Description | The amount of space available on the persistent disk. |
Unit | KB |
/p.mysql/system/persistent_disk_inodes_used_percent | |
---|---|
Description | The percentage of persistent disk inodes used by both the system and user applications. |
Unit | percent |
/p.mysql/system/persistent_disk_inodes_used | |
---|---|
Description | The number of inodes used on the persistent disk. |
Unit | count |
/p.mysql/system/persistent_disk_inodes_free | |
---|---|
Description | The number of inodes available on the persistent disk. |
Unit | count |
/p.mysql/system/ephemeral_disk_used_percent | |
---|---|
Description | The percentage of disk used on the ephemeral file system. |
Unit | percent |
/p.mysql/system/ephemeral_disk_used | |
---|---|
Description | The amount of space used on the ephemeral disk. |
Unit | KB |
/p.mysql/system/ephemeral_disk_free | |
---|---|
Description | The amount of space available on the ephemeral disk. |
Unit | KB |
/p.mysql/system/ephemeral_disk_inodes_used_percent | |
---|---|
Description | The percentage of ephemeral disk inodes used by both the system and user applications. |
Unit | percent |
/p.mysql/system/ephemeral_disk_inodes_used | |
---|---|
Description | The number of inodes used on the ephemeral disk. |
Unit | count |
/p.mysql/system/ephemeral_disk_inodes_free | |
---|---|
Description | The number of inodes available on the ephemeral disk. |
Unit | count |
The metrics that leader-follower VMs emit:
/p.mysql/follower/is_follower | |
---|---|
Description | Shows whether a node is the follower VM. |
Unit | boolean |
/p.mysql/follower/relay_log_space | |
---|---|
Description | The total size of all existing relay log files. |
Unit | bytes |
/p.mysql/follower/slave_io_running | |
---|---|
Description | Shows whether the I/O thread has started and has connected to the leader VM. |
Unit | boolean |
/p.mysql/follower/slave_sql_running | |
---|---|
Description | Shows whether the SQL thread has started. |
Unit | boolean |
The metrics that HA clusters emit:
/p.mysql/galera/wsrep_cluster_size | |
---|---|
Description | The current number of nodes in the HA cluster. |
Unit | count |
/p.mysql/galera/wsrep_local_recv_queue | |
---|---|
Description | The current length of the local receive queue, in messages. |
Unit | count |
/p.mysql/galera/wsrep_local_send_queue | |
---|---|
Description | The current length of the local send queue, in messages. |
Unit | count |
/p.mysql/galera/wsrep_local_index | |
---|---|
Description | This node index in the cluster (base 0). |
Unit | count |
/p.mysql/galera/wsrep_local_state | |
---|---|
Description | The local state of the node. Possible states include:
|
Unit | integer |
/p.mysql/galera/wsrep_ready | |
---|---|
Description | Shows whether the node can accept queries. |
Unit | boolean |