About Monitoring Databases

Monitoring is a vital component of maintaining the availability and performance of your databases.

VMware Data Services Manager collects health and metric data for each database. You can view this data and use it to track the resource consumption, performance, and activity of your databases.

Database Status

VMware Data Services Manager uses the database Status to reflect the availability of the database, and in some cases to identify an in-progress operation or a critical operation that failed. The Status of a database is also affected by alerts that VMware Data Services Manager may trigger on the database.

The Status of a database may be one of the following values:

Status	Description
BACKUP_IN_PROGRESS	A backup of the database is in progress.
CONTROL_PLANE_UPDATE_IN_PROGRESS	A control plane software update is in progress.
CONTROL_PLANE_UPDATE_FAILED	A control plane software update failed.
CRITICAL	The database has at least one CRITICAL-level alert and no LOST_CONNECTIVITY or FATAL alerts.
DB_ENGINE_UPDATE_IN_PROGRESS	A Service Engine software update is in progress.
DB_ENGINE_UPDATE_FAILED	A Service Engine software update failed.
DELETED	The database has been deleted.
DELETING	Deletion of the database has been initiated.
ERROR	Creation of the database failed.
FATAL	The database has at least one FATAL-level alert and no LOST_CONNECTIVITY alerts.
INIT	A create operation has been initiated for the database.
LOST_CONNECTIVITY	The database has at least one LOST_CONNECTIVITY alert.
ONLINE	The database was created successfully, is operating, and has no outstanding alerts.
OS_UPDATE_IN_PROGRESS	An operating system software update is in progress.
OS_UPDATE_FAILED	An operating system software update failed.
POWEREDOFF	The database VM is powered off.
POWEREDON	The database VM is running, but awaiting a health check.
WARNING	The database has at least one WARNING-level alert and no LOST_CONNECTIVITY, FATAL, or CRITICAL alerts.

Viewing Database Status

The Status of a database is an indicator of the overall health of the database. You can view database status using the VMware Data Services Manager console or API.

Perform the following procedure to examine the status of a database:

Select Databases from the left navigation pane.

This action displays the Databases view, a table that lists the provisioned databases.
Examine the databases listed in the table, identify the database of interest, and navigate to that table row.
Examine the Status.

Effect of Database Alerts on Database Status

Different alerts on a database VM affect the status of the database VM in the following ways:

If there are one or more alerts in LOST_CONNECTIVITY state, then the database status is LOST_CONNECTIVITY.
If there is at least one alert in FATAL state and none in LOST_CONNECTIVITY state, then the database status is FATAL.
If there is at least one alert in CRITICAL state and none in LOST_CONNECTIVITY or FATAL state, then the database status is CRITICAL.
If there is at least one alert in WARNING state and none in LOST_CONNECTIVITY, FATAL, or CRITICAL state, then the database status is WARNING.
When all the alerts are cleared, the database status becomes ONLINE.
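
The precedence rules above can be sketched as a small shell function. This is only an illustration, not part of VMware Data Services Manager: the function name status_from_alerts is hypothetical, and the sketch covers alert-driven statuses only, not operation statuses such as BACKUP_IN_PROGRESS.

```shell
# Hypothetical helper: derive a database status from a list of alert levels.
# Precedence, highest first: LOST_CONNECTIVITY, FATAL, CRITICAL, WARNING.
status_from_alerts() {
    for level in LOST_CONNECTIVITY FATAL CRITICAL WARNING; do
        for alert in "$@"; do
            if [ "$alert" = "$level" ]; then
                echo "$level"
                return 0
            fi
        done
    done
    # No outstanding alerts: the status is ONLINE.
    echo ONLINE
}

# Example: a WARNING alert and a CRITICAL alert are both outstanding.
status_from_alerts WARNING CRITICAL
```

The outer loop walks the levels from most to least severe, so the first level that matches any outstanding alert determines the status.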

Viewing Database Health

The health of a database reflects the status of the services running in the VM, the status of certain resources that it consumes, and its connectivity to internal and external components. Database health is directly related to alerts that VMware Data Services Manager may have triggered for the database. Refer to Database Alerts for details on the types and severity of alerts that VMware Data Services Manager may trigger for a database.

Perform the following procedure to view the health of a database:

Select Databases from the left navigation pane.

This action displays the Databases view, a table that lists the provisioned databases.
Examine the databases listed in the table, identify the database of interest, and navigate to that table row.
Click the database VM Name.

The database information Details tab displays.
Select the Monitoring tab.

This action displays monitoring data for the database.
View the Health Status information.
Select the drop-down menu in the top right corner to change the time-series aggregation period.

Database Alerts

VMware Data Services Manager triggers an alert on a database when it encounters connectivity or resource issues on the database. You monitor these alerts in the Database Alerts view.

The Database Alerts view displays the following information:

The VM Name column identifies the name of the database VM that has alerts.
The Status column identifies the status of the database.
The Critical column identifies the number of CRITICAL-level alerts associated with the database.
The Warning column identifies the number of WARNING-level alerts associated with the database.
The Environment column identifies the infrastructure on which the database is running.
The Triggered Time column identifies the time at which the most recent alert was triggered.
The Owner column identifies the VMware Data Services Manager user that owns the database.

About the Alert Levels

VMware Data Services Manager triggers alerts of the following levels:

OK
ONLINE
WARNING
CRITICAL
FATAL
LOST_CONNECTIVITY

About the Alert Types

Alerts that VMware Data Services Manager may trigger on a database include the following:

Alert Name	Threshold	→ Status	Alert Level	Triggered When	Impact
LOST_CONNECTIVITY	N/A	Lost Connectivity	CRITICAL	The database is unreachable by VMware Data Services Manager.	VMware Data Services Manager cannot reach the database but the Database client can reach it through the Application network.
CPU_HEALTH_ALERT	90%	Critical	CRITICAL	vCPU utilization has reached 90%.	If sufficient CPU bandwidth is not available, all Database VM and DB engine operations can be affected and may fail.
CPU_HEALTH_ALERT	70%	Warning	WARNING	vCPU utilization has reached 70%.	If sufficient CPU bandwidth is not available, all Database VM and DB engine operations can be affected and may fail.
DATA_DISK_HEALTH_ALERT	90%	Fatal	CRITICAL	The data disk has reached 90% capacity.	If sufficient Database disk bandwidth is not available, all Database VM and DB engine operations can be affected and may fail. Operations with respect to SSL configuration, backup, read replica, clone, binary logs, and certificate refresh of databases are affected.
DATA_DISK_HEALTH_ALERT	70%	Warning	WARNING	The data disk has reached 70% capacity.	If sufficient Database disk bandwidth is not available, all Database VM and DB engine operations can be affected and may fail.
DATABASE_BIN_LOG_ALERT	N/A	Critical	CRITICAL	The transaction logs on the database are not getting copied to local storage.	PITR and Read replica operations are affected.
DATABASE_BIN_LOG_CLOUD_SYNC_ALERT	N/A	Warning	WARNING	The transaction logs on local and cloud storage are not in sync.	Database bin logs are not copied to the cloud storage.
DATABASE_SERVICE_ALERT	N/A	Fatal	CRITICAL	The database engine is down.	Database client cannot reach Database engine and operations with respect to scale configuration, backup, read replica, clone, bin logs, and certificate refresh of databases are affected.
MAX_CONNECTIONS_ALERT	90%	Critical	CRITICAL	The number of open connections to the database has reached 90% of the maximum connections allowed.	No new database connections are allowed from the database client. This alert is only available for MySQL and PostgreSQL databases.
MAX_CONNECTIONS_ALERT	80%	Warning	WARNING	The number of open connections to the database has reached 80% of the maximum connections allowed.	No impact. This alert is only available for MySQL and PostgreSQL databases.
METRICS_ALERT	N/A	Warning	WARNING	VMware Data Services Manager cannot pull metrics from the database.	Database metrics are not tracked on the Provider console.
NTP_SYNC_ALERT	N/A	Critical	CRITICAL	The database NTP service is down or not in sync.	If the NTP Service is down for more than 15 minutes, transaction logs, monitoring metrics, and database updates are affected.
SYSTEM_DISK_HEALTH_ALERT	90%	Critical	CRITICAL	The system disk has reached 90% capacity.	If sufficient system disk bandwidth is not available, all Database VM and DB engine operations can be affected and may fail.
SYSTEM_DISK_HEALTH_ALERT	70%	Warning	WARNING	The system disk has reached 70% capacity.	If sufficient system disk bandwidth is not available, all Database VM and DB engine operations can be affected and may fail.
TELEGRAF_SERVICE_ALERT	N/A	Warning	WARNING	The Telegraf service is not responding.	Database metrics are not tracked on the Provider console.
VM_PASSWORD_EXPIRY_ALERT	<= 15 days	Warning	WARNING	The password of the database VM expires in 15 days or less.	No impact.
VM_PASSWORD_EXPIRY_ALERT	< 0 days	Degraded	CRITICAL	The password of the database VM has expired.	All functions that involve the database VM are impacted.
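
The percentage thresholds in the table can be read as a two-step comparison. The following sketch assumes integer percentages and treats "has reached" as greater than or equal to. The function name classify_usage is hypothetical, and the sketch returns only the alert level, not the resulting database status (for example, the 90% data disk alert maps to a Fatal status).

```shell
# Hypothetical helper: classify a utilization percentage against
# a warning threshold and a critical threshold (integers, in percent).
classify_usage() {
    usage=$1
    warn=$2
    crit=$3
    if [ "$usage" -ge "$crit" ]; then
        echo CRITICAL
    elif [ "$usage" -ge "$warn" ]; then
        echo WARNING
    else
        echo OK
    fi
}

# CPU and disk alerts in the table use 70% and 90%.
classify_usage 75 70 90   # prints WARNING
# MAX_CONNECTIONS_ALERT uses 80% and 90%.
classify_usage 75 80 90   # prints OK
```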

Clearing Alerts

In some cases, you can clear certain alerts by restarting the affected service on the database VM.

Alert Name	Affected Database OS Service Name
NTP_SYNC_ALERT¹	systemd-timesyncd
TELEGRAF_SERVICE_ALERT	telegraf
METRICS_ALERT	telegraf
METRICS_ALERT²	influxdb
DATABASE_SERVICE_ALERT	dbengine

¹ If an NTP_SYNC_ALERT is raised on a database VM and if NTP is not configured for the Agent VM, ensure that DHCP is configured for VMware Data Services Manager to perform the function of an NTP server.
² If a METRICS_ALERT is raised on all databases, restart the influxdb.service on the Agent VM.

To clear an alert:

SSH into the database or Agent VM.

Restart the affected service. For example:

user@servinstvm$ systemctl restart telegraf.service
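
As a rough aid, the alert-to-service table above can be turned into a lookup that prints the restart command without running it. The function names are hypothetical, and appending .service to each service name is an assumption based on the telegraf.service and influxdb.service examples in this section.

```shell
# Hypothetical helper: map an alert name to the service listed in the table.
service_for_alert() {
    case "$1" in
        NTP_SYNC_ALERT)         echo systemd-timesyncd ;;
        TELEGRAF_SERVICE_ALERT) echo telegraf ;;
        # If raised on all databases, restart influxdb on the Agent VM instead.
        METRICS_ALERT)          echo telegraf ;;
        DATABASE_SERVICE_ALERT) echo dbengine ;;
        # Any other alert is not cleared by restarting a service.
        *)                      return 1 ;;
    esac
}

# Print (do not run) the restart command for an alert.
restart_command_for_alert() {
    service=$(service_for_alert "$1") || {
        echo "no restartable service for $1" >&2
        return 1
    }
    echo "systemctl restart ${service}.service"
}

restart_command_for_alert TELEGRAF_SERVICE_ALERT
```

Printing the command instead of running it lets you review it before you SSH into the database or Agent VM.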

Addressing Other Alerts

If the Agent VM triggers a DATABASE_BIN_LOG_ALERT on a database, SSH into the database VM and verify the connection to Local Storage, the data disk space, and the Service Engine status.

Database Metrics

VMware Data Services Manager collects metric data for each database. You can view this data and use it to track the resource consumption, performance, and activity of your databases.

A database is created with NORMAL or ENHANCED monitoring. The default monitoring for a database is ENHANCED monitoring. The metrics for which VMware Data Services Manager collects data in ENHANCED monitoring mode include the NORMAL metrics, plus additional service-specific metrics.

Modifying the Level of Monitoring

After you provision a database, you can change its level of monitoring from NORMAL to ENHANCED, or vice versa.

To modify the level of monitoring of a database:

Navigate to the DB Metrics pane in the Monitoring tab.
Click Edit.

The Monitoring Policy dialog displays.
Select Monitoring Type as either Normal or Enhanced, and then click SAVE.
Click the refresh icon on the top right corner of the pane to verify the change in the level of monitoring.

NORMAL Monitoring for MySQL and PostgreSQL Databases

VMware Data Services Manager displays the following DB Metrics when NORMAL monitoring is enabled for a MySQL or PostgreSQL database:

System Uptime - The time since the service or database VM restarted.
Mysql Uptime (MySQL)
Max Connections - The connection limit to the database.
Active Connections per Second (PostgreSQL)
Thread Resource Utilization (MySQL)
CPU Usage % - The ratio of used to allocated CPU.
Memory Usage % - The ratio of used to allocated memory.
Disk Usage % - The ratio of used to allocated disk.

NORMAL Monitoring for Microsoft SQL Server Databases

VMware Data Services Manager displays the following DB Metrics when NORMAL monitoring is enabled for a Microsoft SQL Server database:

System Uptime - The time since the service or database VM restarted.
Sqlserver Uptime - The time since the service or Sqlserver VM restarted.
Open Connections - The number of connections across frequent intervals of 20 minutes.
CPU Usage % - The ratio of used to allocated CPU.
Memory Usage % - The ratio of used to allocated memory.
Disk Usage % - The ratio of used to allocated disk.

ENHANCED Monitoring for MySQL and PostgreSQL Databases

In addition to the PostgreSQL statistics displayed for NORMAL monitoring, the PostgreSQL statistics displayed when ENHANCED monitoring is in effect include:

Write Throughput
Read Throughput
Commits & Rollbacks
Deadlocks & Conflicts

In addition to the MySQL statistics displayed for NORMAL monitoring, the MySQL statistics displayed when ENHANCED monitoring is in effect include:

Innodb Pool Size
Queries & Questions
Bytes Received & Sent
Slow Queries per Second
InnoDB Buffer Usage %
InnoDB Reads & Writes
Command Reads & Writes

ENHANCED Monitoring for Microsoft SQL Server Databases

In addition to the Microsoft SQL Server statistics displayed for NORMAL monitoring, the Microsoft SQL Server statistics displayed when ENHANCED monitoring is in effect include:

Lock timeouts, Lock Waits & Deadlocks
Reads & Writes
Transactions (per Sec)
Disk Read & Write Latency

Viewing the Metrics

You view metric data for a database in the DB Metrics pane, which is located on the Monitoring tab of the database. You open this tab from the Databases view.

By default, VMware Data Services Manager displays the last 3 hours of aggregated metric data. You can change this time period (calculated from current time) via a drop-down in the upper right corner of the DB Metrics pane.
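
To make "calculated from current time" concrete, the following sketch computes the start and end of such a window as epoch seconds. It is an illustration only: the function name metrics_window is hypothetical, and the console may compute its window differently.

```shell
# Hypothetical helper: print the start and end of a metrics window
# as epoch seconds, given the end time and a window length in hours.
metrics_window() {
    end=$1
    hours=$2
    start=$((end - hours * 3600))
    echo "$start $end"
}

# Default view: the last 3 hours, counted back from the current time.
metrics_window "$(date +%s)" 3
```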