This topic describes how to monitor the health of the VMware Tanzu SQL with MySQL for VMs service using logs, metrics, and Key Performance Indicators (KPIs) generated by Tanzu SQL for VMs component VMs.
For more information about logging and metrics in VMware Tanzu Application Service for VMs, see Overview of Logging and Metrics.
About Metrics
Metrics are regularly-generated log entries that report measured component states. The default metrics polling interval is 30 seconds for MySQL instances. For the service broker, the default is 60 seconds.
You can configure the MySQL instance interval in Configure Monitoring in the Tanzu SQL for VMs tile. For more information, see Configure Monitoring.
Metrics are long, single lines of text with the format:
origin:"p.mysql" eventType:ValueMetric timestamp:1496776477930669688 deployment:"service-instance_2b5a001f-2bf3-460c-aee6-fd2253f9fb0c" job:"mysql" index:"b09df494-b731-4d06-a4b0-c2985ceedf4c" ip:"10.0.8.4" valueMetric:<name:"/p.mysql/performance/open_files" value:24 unit:"file" >
Access MySQL Metrics
To access MySQL metrics:
Use Grafana
You can use Grafana to visually view metrics for Tanzu SQL for VMs service instances.
Note: This section requires Healthwatch v2.x.
- Install the Healthwatch tile in Ops Manager. For information about installing Healthwatch, see the Healthwatch for VMware Tanzu documentation.
-
View the Grafana dashboard by navigating to:
https://grafana.YOUR-SYSTEM-DOMAIN
For more information about using Grafana dashboards, see the Healthwatch for VMware Tanzu documentation.
Use the Indicator Protocol Dashboard
Note: This section requires Healthwatch v1.x.
You can use the Indicator Protocol dashboard to view Key Performance Indicators (KPIs) for Tanzu SQL for VMs service instances. The dashboard includes a graphical representation of the KPIs and information about recommended alert thresholds. For more information about the Indicator Protocol dashboard, see Indicator Protocol Dashboard (Beta). For information about Tanzu SQL for VMs KPIs, see KPIs for MySQL Service Instances below.
To access KPIs using the Indicator Protocol dashboard:
- Install the Healthwatch tile in Ops Manager. See Installing and Configuring Healthwatch.
- Enable the Indicator Protocol add-on in Ops Manager. See Enable the Indicator Protocol Add-on.
-
View the Indicator Protocol Dashboard by navigating to:
https://healthwatch.YOUR-SYSTEM-DOMAIN/indicator-dashboards
The following image shows an example of the Indicator Protocol dashboard for a highly available (HA) cluster service instance:

View a larger version of this image
Use Log Cache
To access metrics for Tanzu SQL for VMs service instances, you can use Loggregator's Log Cache feature with the Log Cache CLI plugin. Log Cache is enabled by default.
Note: To use this feature the V2 Firehose must be activated and Enable Log Cache syslog ingestion must be deactivated in the TAS for VMs tile. For more information about configuring these checkboxes, see Enable Syslog Forwarding.
To access metrics for on-demand service instances:
- Install the cf CLI plugin by running:
cf install-plugin -r CF-Community "log-cache"
- To access metrics for a service instance, run:
cf tail SERVICE-INSTANCE-NAME
Two useful flags to append to this command are:
-
-f
, --follow
: Append metrics logs to stdout as they are generated, rather than returning a fixed number of metrics or metrics over a fixed interval.
-
--json
: Output metrics logs as envelopes in JSON format. For a complete list of cf tail
options, see the Log Cache CLI repository in GitHub.
Where SERVICE-INSTANCE-NAME
is the name of your service instance.
For example:
$ cf tail -f my-instance | egrep 'connected|available|persistent_disk_used_percent'
2019-05-17T11:25:59.48-0700 [my-instance] GAUGE /p.mysql/available:1.000000 boolean
2019-05-17T11:26:29.49-0700 [my-instance] GAUGE /p.mysql/system/persistent_disk_used_percent:17.000000 percentage
2019-05-17T11:26:29.49-0700 [my-instance] GAUGE /p.mysql/performance/threads_connected:**6**.000000 connection
2019-05-17T11:26:59.50-0700 [my-instance] GAUGE /p.mysql/available:1.000000 boolean
2019-05-17T11:27:29.50-0700 [my-instance] GAUGE /p.mysql/system/persistent_disk_used_percent:17.000000 percentage
2019-05-17T11:27:29.50-0700 [my-instance] GAUGE /p.mysql/performance/threads_connected:**7**.000000 connection
For more information about the metrics output, see
the Key Performance Indicators and Component Metrics sections below.
For more information about how to enable Log Cache and about the cf tail
command, see Enable Log Cache.
KPIs for MySQL Service Instances
KPIs are metrics for MySQL service instances that you can monitor for two purposes:
KPIs can be either raw component metrics or derived metrics generated by applying formulas to raw metrics.
VMware provides the following KPIs as general alerting and response guidance for typical Tanzu SQL for VMs installations. VMware recommends that you alter the alert measures by observing historical trends. You can also create your KPIs that are specific to your environment using the available component metrics.
For a list of all the Tanzu SQL for VMs component metrics, see Component Metrics below.
Component Metrics
In addition to the above KPIs, the MySQL service emits the followings metrics for monitoring and alerting:
MySQL Metrics
The metrics that all Tanzu SQL for VMs service instances emit:
/p.mysql/available |
Description |
Indicates if the local database server is available and responding. |
Unit |
boolean |
/p.mysql/variables/max_connections |
Description |
The maximum permitted number of simultaneous client connections. |
Unit |
count |
/p.mysql/variables/open_files_limit |
Description |
The number of files that the operating system permits mysqld to open. |
Unit |
files |
/p.mysql/variables/read_only |
Description |
Whether the server is in read-only mode |
Unit |
boolean |
/p.mysql/performance/questions |
Description |
The number of statements executed by the server since the server started or the last FLUSH STATUS . This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the queries variable. |
Unit |
count |
/p.mysql/performance/queries |
Description |
The number of statements executed by the server, excluding COM_PING and COM_STATISTICS . Differs from questions in that it also counts statements executed within stored programs. Not affected by FLUSH STATUS . |
Unit |
count |
/p.mysql/performance/queries_delta |
Description |
The change in the /performance/queries metric since the last time it was emitted. |
Unit |
integer greater than or equal to zero |
/p.mysql/innodb/buffer_pool_pages_free |
Description |
The amount of free space, measured in pages, in the InnoDB Buffer Pool. |
Unit |
count |
/p.mysql/innodb/buffer_pool_pages_total |
Description |
The total amount of free space, measured in pages, in the InnoDB Buffer Pool containing data. |
Unit |
count |
/p.mysql/innodb/buffer_pool_pages_data |
Description |
The number of pages in the InnoDB Buffer Pool containing data. The number includes both dirty and clean pages. |
Unit |
count |
/p.mysql/innodb/row_lock_current_waits |
Description |
The number of row locks currently being waited for by operations on InnoDB tables. |
Unit |
count |
/p.mysql/innodb/data_read |
Description |
The amount of data read since the server started. |
Unit |
bytes |
/p.mysql/innodb/data_written |
Description |
The amount of data written since the server started. |
Unit |
bytes |
/p.mysql/innodb/mutex_os_waits |
Description |
The number of mutex OS waits. |
Unit |
events/second |
/p.mysql/innodb/mutex_spin_rounds |
Description |
The number of mutex spin rounds. |
Unit |
events/second |
/p.mysql/innodb/mutex_spin_waits |
Description |
The number of mutex spin waits. |
Unit |
events/second |
/p.mysql/innodb/os_log_fsyncs |
Description |
The number of fsync() writes done to the InnoDB redo log files. |
Unit |
count |
/p.mysql/innodb/row_lock_time |
Description |
Total time spent in acquiring row locks. |
Unit |
milliseconds |
/p.mysql/innodb/row_lock_waits |
Description |
The number of times a row lock had to be waited for since the server started. |
Unit |
count |
/p.mysql/net/connections |
Description |
The number of connection attempts to the server, both successful and unsuccessful, to the MySQL server. |
Unit |
count |
/p.mysql/net/max_used_connections |
Description |
The maximum number of connections that have been in use simultaneously since the server started. |
Unit |
count |
/p.mysql/performance/com_delete |
Description |
The number of delete commands since the server started or the last FLUSH STATUS . |
Unit |
count |
/p.mysql/performance/com_delete_multi |
Description |
The number of delete-multi commands since the server started or the last FLUSH STATUS . Applies to DELETE statements that use multiple-table syntax. |
Unit |
count |
/p.mysql/performance/com_insert |
Description |
The number of insert commands since the server started or the last FLUSH STATUS . |
Unit |
count |
/p.mysql/performance/com_insert_select |
Description |
The number of insert-select commands since the server started or the last FLUSH STATUS . |
Unit |
count |
/p.mysql/performance/com_replace_select |
Description |
The number of replace-select commands since the server started or the last FLUSH STATUS . |
Unit |
count |
/p.mysql/performance/com_select |
Description |
The number of select commands since the server started or the last FLUSH STATUS . If a query result is returned from query cache, the server increments the qcache_hits status variable, not com_select . |
Unit |
count |
/p.mysql/performance/com_update |
Description |
The number of update commands since the server started or the last FLUSH STATUS . |
Unit |
count |
/p.mysql/performance/com_update_multi |
Description |
The number of update-multi commands since the server started or the last FLUSH STATUS . Applies to UPDATE statements that use multiple-table syntax. |
Unit |
count |
/p.mysql/performance/created_tmp_disk_tables |
Description |
The number of internal on-disk temporary tables created by the server while executing statements. |
Unit |
count |
/p.mysql/performance/created_tmp_files |
Description |
The number of temporary files created by mysqld. |
Unit |
count |
/p.mysql/performance/created_tmp_tables |
Description |
The number of internal temporary tables created by the server while executing statements. |
Unit |
count |
/p.mysql/performance/cpu_utilization_percent |
Description |
The percent of the CPU in use by all processes on the MySQL node. |
Unit |
percent |
/p.mysql/performance/open_files |
Description |
The number of regular files currently open, which were opened by the server. |
Unit |
count |
/p.mysql/performance/open_tables |
Description |
The number of tables that are currently open. |
Unit |
count |
/p.mysql/performance/opened_tables |
Description |
The number of tables that have been opened. |
Unit |
count |
/p.mysql/performance/open_table_definitions |
Description |
The number of currently cached table definitions or .frm files. |
Unit |
count |
/p.mysql/performance/opened_table_definitions |
Description |
The number of .frm files that have been cached. |
Unit |
count |
/p.mysql/performance/qcache_hits |
Description |
The number of query cache hits. The query cache and qcache_hits metric is deprecated as of MySQL v5.7.20. |
Unit |
count |
/p.mysql/performance/slow_queries |
Description |
The number of queries that have taken more than long_query_time seconds. |
Unit |
count |
/p.mysql/performance/table_locks_waited |
Description |
The total number of times that a request for a table lock could not be granted immediately and a wait was needed. |
Unit |
count |
/p.mysql/performance/threads_connected |
Description |
The number of currently open connections. |
Unit |
count |
/p.mysql/performance/threads_running |
Description |
The number of threads that are not sleeping. |
Unit |
count |
/p.mysql/rpl_semi_sync_master_tx_avg_wait_time |
Description |
The average time the leader has waited for the follower to accept transactions. |
Unit |
microseconds |
/p.mysql/rpl_semi_sync_master_no_tx |
Description |
The number of transactions committed without follower acknowledgement. |
Unit |
count |
/p.mysql/rpl_semi_sync_master_wait_sessions |
Description |
The current number of connections waiting for a sync commit. For more information about sync replication, see About Synchronous Replication. |
Unit |
count |
Disk Metrics
The disk usage metrics that all Tanzu SQL for VMs services emit:
/p.mysql/system/persistent_disk_used_percent |
Description |
The percentage of disk used on the persistent file system. |
Unit |
percent |
/p.mysql/system/persistent_disk_used |
Description |
The amount of space used on the persistent disk. |
Unit |
KB |
/p.mysql/system/persistent_disk_free |
Description |
The amount of space available on the persistent disk. |
Unit |
KB |
/p.mysql/system/persistent_disk_inodes_used_percent |
Description |
The percentage of persistent disk inodes used by both the system and user applications. |
Unit |
percent |
/p.mysql/system/persistent_disk_inodes_used |
Description |
The number of inodes used on the persistent disk. |
Unit |
count |
/p.mysql/system/persistent_disk_inodes_free |
Description |
The number of inodes available on the persistent disk. |
Unit |
count |
/p.mysql/system/ephemeral_disk_used_percent |
Description |
The percentage of disk used on the ephemeral file system. |
Unit |
percent |
/p.mysql/system/ephemeral_disk_used |
Description |
The amount of space used on the ephemeral disk. |
Unit |
KB |
/p.mysql/system/ephemeral_disk_free |
Description |
The amount of space available on the ephemeral disk. |
Unit |
KB |
/p.mysql/system/ephemeral_disk_inodes_used_percent |
Description |
The percentage of ephemeral disk inodes used by both the system and user applications. |
Unit |
percent |
/p.mysql/system/ephemeral_disk_inodes_used |
Description |
The number of inodes used on the ephemeral disk. |
Unit |
count |
/p.mysql/system/ephemeral_disk_inodes_free |
Description |
The number of inodes available on the ephemeral disk. |
Unit |
count |
Leader-Follower Metrics
The metrics that leader-follower VMs emit:
/p.mysql/follower/is_follower |
Description |
Shows whether a node is the follower VM. |
Unit |
boolean |
/p.mysql/follower/seconds_behind_master |
Description |
The number of seconds the follower is behind in applying writes from the leader. For example, a follower might have copied writes from the leader that are timestamped up to 11:00am, but has only applied writes up to 8:00am. Normal values for this metric depend on your apps. |
Unit |
seconds |
/p.mysql/follower/seconds_since_leader_heartbeat |
Description |
The number of seconds that elapse between the leader heartbeat and the replication of the heartbeat in the follower. You can use this metric to determine how far behind the follower is from the leader. Normal values for this metric depends on your app. |
Unit |
seconds |
/p.mysql/follower/relay_log_space |
Description |
The total size of all existing relay log files. |
Unit |
bytes |
/p.mysql/follower/slave_io_running |
Description |
Shows whether the I/O thread has started and has connected to the leader VM. |
Unit |
boolean |
/p.mysql/follower/slave_sql_running |
Description |
Shows whether the SQL thread has started. |
Unit |
boolean |
Highly Available Cluster Metrics
The metrics that HA clusters emit:
/p.mysql/galera/wsrep_cluster_size |
Description |
The current number of nodes in the HA cluster. |
Unit |
count |
/p.mysql/galera/wsrep_local_recv_queue |
Description |
The current length of the local receive queue, in messages. |
Unit |
count |
/p.mysql/galera/wsrep_local_send_queue |
Description |
The current length of the local send queue, in messages. |
Unit |
count |
/p.mysql/galera/wsrep_local_index |
Description |
This node index in the cluster (base 0). |
Unit |
count |
/p.mysql/galera/wsrep_local_state |
Description |
The local state of the node. Possible states include:
- 1 =
JOINING
- 2 =
DONOR/DESYNCED
- 3 =
JOINED
- 4 =
SYNCED
|
Unit |
integer |
/p.mysql/galera/wsrep_ready |
Description |
Shows whether the node can accept queries. |
Unit |
boolean |
/p.mysql/galera/wsrep_cluster_status |
Description |
Shows the primary status of the cluster component that the node is in. Values are:
-
Primary : Node has a quorum.
-
Non-primary : Node has lost a quorum.
-
Disconnected : Node is unable to connect to other nodes.
|
Unit |
Status code |
/p.mysql/galera/wsrep_flow_control_paused |
Description |
Proportion of time, as a unit interval (0 to 1), that replication was paused due to flow control since the server started or last FLUSH STATUS . This metric is a measure of how much replication lag is slowing down the cluster. |
Unit |
float |
/p.mysql/galera/wsrep_flow_control_sent |
Description |
Number of FC_PAUSE or flow control pause events sent by this node. Unlike many status variables, the counter for this metric does not reset every time you run the query. |
Unit |
count |
/p.mysql/galera/wsrep_flow_control_recv |
Description |
Number of FC_PAUSE or flow control pause events received by this node. Unlike many status variables, the counter for this metric does not reset every time you run the query. |
Unit |
count |