Monitoring is a vital component of maintaining the availability and performance of your databases.
VMware Data Services Manager collects health and metric data for each database. You can view this data and use it to track the resource consumption, performance, and activity of your databases.
VMware Data Services Manager uses the database Status to reflect the availability of the database and, in some cases, to identify an in-progress operation or a critical operation that failed. The Status of a database is also affected by alerts that VMware Data Services Manager may trigger on the database.
The Status of a database may be one of the following values:
Status | Description |
---|---|
BACKUP_IN_PROGRESS | A backup of the database is in progress. |
CONTROL_PLANE_UPDATE_IN_PROGRESS | A control plane software update is in progress. |
CONTROL_PLANE_UPDATE_FAILED | A control plane software update failed. |
CRITICAL | The database has at least one CRITICAL-level alert and no LOST_CONNECTIVITY or FATAL alerts. |
DB_ENGINE_UPDATE_IN_PROGRESS | A Service Engine software update is in progress. |
DB_ENGINE_UPDATE_FAILED | A Service Engine software update failed. |
DELETED | The database has been deleted. |
DELETING | Deletion of the database has been initiated. |
ERROR | Creation of the database failed. |
FATAL | The database has at least one FATAL-level alert and no LOST_CONNECTIVITY alerts. |
INIT | A create operation has been initiated for the database. |
LOST_CONNECTIVITY | The database has at least one LOST_CONNECTIVITY alert. |
ONLINE | The database was created successfully, is operating, and has no outstanding alerts. |
OS_UPDATE_IN_PROGRESS | An operating system software update is in progress. |
OS_UPDATE_FAILED | An operating system software update failed. |
POWEREDOFF | The database VM is powered off. |
POWEREDON | The database VM is running, but awaiting a health check. |
WARNING | The database has at least one WARNING-level alert and no LOST_CONNECTIVITY, FATAL, or CRITICAL alerts. |
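The alert-driven rows of the Status table follow a strict precedence: LOST_CONNECTIVITY outranks FATAL, which outranks CRITICAL, which outranks WARNING, and a database with no outstanding alerts is ONLINE. The following is a minimal sketch of that rule, not VMware Data Services Manager's implementation. The function name and its input (a plain list of alert levels) are hypothetical, and operation statuses such as BACKUP_IN_PROGRESS or POWEREDOFF are deliberately not modeled.

```python
from typing import Iterable

# Precedence taken from the Status table: LOST_CONNECTIVITY outranks FATAL,
# which outranks CRITICAL, which outranks WARNING.
_PRECEDENCE = ["LOST_CONNECTIVITY", "FATAL", "CRITICAL", "WARNING"]


def status_from_alerts(alert_levels: Iterable[str]) -> str:
    """Return the alert-driven Status for a database that is otherwise operating.

    alert_levels is a hypothetical input: the levels of the database's
    outstanding alerts, for example ["WARNING", "CRITICAL"].
    """
    present = {level.upper() for level in alert_levels}
    for level in _PRECEDENCE:
        if level in present:
            # The highest-precedence level present determines the Status.
            return level
    # No outstanding alerts: the database is ONLINE.
    return "ONLINE"
```

For example, a database with one WARNING and one CRITICAL alert reports CRITICAL, because no LOST_CONNECTIVITY or FATAL alert is present.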
The Status of a database is an indicator of the overall health of the database. You can view database status using the VMware Data Services Manager console or API.
Perform the following procedure to examine the status of a database:
Select Databases from the left navigation pane.
This action displays the Databases view, a table that lists the provisioned databases.
Examine the databases listed in the table, identify the database of interest, and navigate to that table row.
Examine the Status.
The health of a database reflects the status of the services running in the VM, the status of certain resources that it consumes, and its connectivity to internal and external components. Database health is directly related to alerts that VMware Data Services Manager may have triggered for the database. Refer to Database Alerts for details on the types and severity of alerts that VMware Data Services Manager may trigger for a database.
Perform the following procedure to view the health of a database:
Select Databases from the left navigation pane.
This action displays the Databases view, a table that lists the provisioned databases.
Examine the databases listed in the table, identify the database of interest, and navigate to that table row.
Click the database VM Name.
The Details tab for the database displays.
Select the Monitoring tab.
This action displays monitoring data for the database.
View the Health Status information.
Use the drop-down menu in the top right corner to change the time-series aggregation period.
VMware Data Services Manager triggers an alert on a database when it encounters connectivity or resource issues on the database. You monitor these alerts in the Database Alerts view.
The Database Alerts view displays the following information:
VMware Data Services Manager triggers alerts of the following levels:
Alerts that VMware Data Services Manager may trigger on a database include the following:
Alert Name | Threshold | Resulting Status | Alert Level | Triggered When | Impact |
---|---|---|---|---|---|
LOST_CONNECTIVITY | N/A | Lost Connectivity | CRITICAL | The database is unreachable by VMware Data Services Manager. | VMware Data Services Manager cannot reach the database, but the database client can still reach it through the Application network. |
CPU_HEALTH_ALERT | 90% | Critical | CRITICAL | vCPU utilization has reached 90%. | If sufficient CPU bandwidth is not available, all database VM and database engine operations may be impacted and may fail. |
CPU_HEALTH_ALERT | 70% | Warning | WARNING | vCPU utilization has reached 70%. | If sufficient CPU bandwidth is not available, all database VM and database engine operations may be impacted and may fail. |
DATA_DISK_HEALTH_ALERT | 90% | Fatal | CRITICAL | The data disk has reached 90% capacity. | If sufficient data disk space is not available, all database VM and database engine operations may be impacted and may fail. Operations related to SSL configuration, backups, read replicas, clones, binary logs, and certificate refresh are affected. |
DATA_DISK_HEALTH_ALERT | 70% | Warning | WARNING | The data disk has reached 70% capacity. | If sufficient data disk space is not available, all database VM and database engine operations may be impacted and may fail. |
DATABASE_BIN_LOG_ALERT | N/A | Critical | CRITICAL | The transaction logs on the database are not being copied to local storage. | Point-in-time recovery (PITR) and read replica operations are affected. |
DATABASE_BIN_LOG_CLOUD_SYNC_ALERT | N/A | Warning | WARNING | The transaction logs on local and cloud storage are not in sync. | Database binary logs are not being copied to cloud storage. |
DATABASE_SERVICE_ALERT | N/A | Fatal | CRITICAL | The database engine is down. | The database client cannot reach the database engine, and operations related to scale configuration, backups, read replicas, clones, binary logs, and certificate refresh are affected. |
MAX_CONNECTIONS_ALERT | 90% | Critical | CRITICAL | The number of open connections to the database has reached 90% of the maximum connections allowed. | No new database connections are allowed from the database client. This alert is available only for MySQL and PostgreSQL databases. |
MAX_CONNECTIONS_ALERT | 80% | Warning | WARNING | The number of open connections to the database has reached 80% of the maximum connections allowed. | No impact. This alert is available only for MySQL and PostgreSQL databases. |
METRICS_ALERT | N/A | Warning | WARNING | VMware Data Services Manager cannot pull metrics from the database. | Database metrics are not tracked on the Provider console. |
NTP_SYNC_ALERT | N/A | Critical | CRITICAL | The database NTP service is down or not in sync. | If the NTP service is down for more than 15 minutes, transaction logs, monitoring metrics, and database updates are affected. |
SYSTEM_DISK_HEALTH_ALERT | 90% | Critical | CRITICAL | The system disk has reached 90% capacity. | If sufficient system disk space is not available, all database VM and database engine operations may be impacted and may fail. |
SYSTEM_DISK_HEALTH_ALERT | 70% | Warning | WARNING | The system disk has reached 70% capacity. | If sufficient system disk space is not available, all database VM and database engine operations may be impacted and may fail. |
TELEGRAF_SERVICE_ALERT | N/A | Warning | WARNING | The Telegraf service is not responding. | Database metrics are not tracked on the Provider console. |
VM_PASSWORD_EXPIRY_ALERT | <= 15 days | Warning | WARNING | The password of the database VM expires in 15 days or fewer. | No impact. |
VM_PASSWORD_EXPIRY_ALERT | < 0 days | Degraded | CRITICAL | The password of the database VM has expired. | All functions that involve the database VM are impacted. |
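The threshold column of the alerts table can be read as a pair of cut-offs per alert. The following minimal sketch shows how a utilization percentage or a password-expiry countdown maps to an alert level, using only the thresholds listed in the table. It assumes that "has reached N%" means greater than or equal to N%; the function names are hypothetical and this is not how VMware Data Services Manager evaluates alerts internally.

```python
from typing import Optional

# (warning_threshold, critical_threshold) in percent, copied from the alerts table.
THRESHOLDS = {
    "CPU_HEALTH_ALERT": (70.0, 90.0),
    "DATA_DISK_HEALTH_ALERT": (70.0, 90.0),
    "SYSTEM_DISK_HEALTH_ALERT": (70.0, 90.0),
    "MAX_CONNECTIONS_ALERT": (80.0, 90.0),
}


def utilization_alert_level(alert_name: str, percent_used: float) -> Optional[str]:
    """Return "CRITICAL", "WARNING", or None for a percentage-based alert.

    Assumes a threshold is met when the value is greater than or equal to it.
    """
    warning, critical = THRESHOLDS[alert_name]
    if percent_used >= critical:
        return "CRITICAL"
    if percent_used >= warning:
        return "WARNING"
    return None


def password_expiry_alert_level(days_until_expiry: int) -> Optional[str]:
    """Return the VM_PASSWORD_EXPIRY_ALERT level for the days remaining."""
    if days_until_expiry < 0:
        return "CRITICAL"  # the password has already expired
    if days_until_expiry <= 15:
        return "WARNING"   # the password expires in 15 days or fewer
    return None
```

Note that MAX_CONNECTIONS_ALERT warns at 80% rather than 70%, so a connection count at 75% of the maximum raises no alert while a CPU at 75% raises a WARNING.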
In some cases, you can clear certain alerts by restarting the affected service on the database VM.
Alert Name | Affected Database OS Service Name |
---|---|
NTP_SYNC_ALERT | systemd-timesyncd |
TELEGRAF_SERVICE_ALERT | telegraf |
METRICS_ALERT | telegraf |
METRICS_ALERT¹ | influxdb |
DATABASE_SERVICE_ALERT | dbengine |
¹ If a METRICS_ALERT is raised on all databases, restart influxdb.service on the Agent VM.
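The service table and its note amount to a small lookup: pick the VM and the service to restart for a given alert. The following is a minimal sketch of that lookup, which only builds the command rather than running it. The function name is hypothetical, and the unit names are assumed to be the service names from the table with a `.service` suffix, matching the `telegraf.service` example in this section.

```python
from typing import List, Tuple

# Alert-to-service mapping copied from the table above.
SERVICE_FOR_ALERT = {
    "NTP_SYNC_ALERT": "systemd-timesyncd",
    "TELEGRAF_SERVICE_ALERT": "telegraf",
    "METRICS_ALERT": "telegraf",
    "DATABASE_SERVICE_ALERT": "dbengine",
}


def restart_plan(alert_name: str,
                 raised_on_all_databases: bool = False) -> Tuple[str, List[str]]:
    """Return (vm, argv): which VM to SSH into and the systemctl command to run.

    Assumes each unit is named "<service>.service".
    """
    if alert_name == "METRICS_ALERT" and raised_on_all_databases:
        # Per the note beneath the table: restart influxdb on the Agent VM.
        return ("agent", ["systemctl", "restart", "influxdb.service"])
    service = SERVICE_FOR_ALERT[alert_name]
    return ("database", ["systemctl", "restart", f"{service}.service"])
```

Returning the argument list instead of executing it keeps the decision separate from the SSH session; once logged in to the indicated VM, you run the resulting command as shown in the procedure below.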
To clear an alert:
SSH into the database or Agent VM.
Restart the affected service. For example:
user@servinstvm$ systemctl restart telegraf.service
If the Agent VM triggers a DATABASE_BIN_LOG_ALERT on a database, verify the connection to local storage, the available data disk space, and the Service Engine status.
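One of the checks above is available data disk space. The following minimal sketch computes the percentage in use for a mount point using Python's standard `shutil.disk_usage` and compares it against the 70% DATA_DISK_HEALTH_ALERT warning threshold from the alerts table. The function names are hypothetical, and this document does not state where the data disk is mounted on the database VM, so you must supply that path yourself.

```python
import shutil


def disk_percent_used(mount_point: str) -> float:
    """Return the percentage of the filesystem at mount_point that is in use."""
    usage = shutil.disk_usage(mount_point)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def data_disk_has_headroom(mount_point: str, warning_percent: float = 70.0) -> bool:
    """Return True if usage is below the DATA_DISK_HEALTH_ALERT warning threshold."""
    return disk_percent_used(mount_point) < warning_percent
```

On Linux, `used` excludes blocks reserved for the root user, so the percentage can be slightly lower than the figure that `df` reports.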
VMware Data Services Manager collects metric data for each database. You can view this data and use it to track the resource consumption, performance, and activity of your databases.
Each database is created with either NORMAL or ENHANCED monitoring; the default is ENHANCED. In ENHANCED mode, VMware Data Services Manager collects all of the NORMAL metrics plus additional service-specific metrics.
After provisioning a database, you can change its monitoring level from NORMAL to ENHANCED, or vice versa.
To modify the level of monitoring of a database:
Navigate to the DB Metrics pane in the Monitoring tab.
Click Edit.
The Monitoring Policy dialog displays.
Set Monitoring Type to either Normal or Enhanced, and then click SAVE.
Click the refresh icon in the top right corner of the pane to verify the change in the monitoring level.
VMware Data Services Manager displays the following DB Metrics when NORMAL monitoring is enabled for a MySQL or PostgreSQL database:
VMware Data Services Manager displays the following DB Metrics when NORMAL monitoring is enabled for a Microsoft SQL Server database:
In addition to the PostgreSQL statistics displayed for NORMAL monitoring, the PostgreSQL statistics displayed when ENHANCED monitoring is in effect include:
In addition to the MySQL statistics displayed for NORMAL monitoring, the MySQL statistics displayed when ENHANCED monitoring is in effect include:
In addition to the Microsoft SQL Server statistics displayed for NORMAL monitoring, the Microsoft SQL Server statistics displayed when ENHANCED monitoring is in effect include:
You view metric data for a database in the DB Metrics pane of the database Monitoring tab, which you open from the Databases view.
By default, VMware Data Services Manager displays the last 3 hours of aggregated metric data. You can change this time period (calculated from current time) via a drop-down in the upper right corner of the DB Metrics pane.
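Because the display period is calculated back from the current time, the window always ends at "now" and starts one period earlier. The following minimal sketch expresses that calculation with Python's standard `datetime` module. The function name is hypothetical, and the drop-down's actual period choices are not listed in this document, so the 24-hour value in the usage example is only an illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

DEFAULT_WINDOW = timedelta(hours=3)  # default display period described above


def metrics_window(period: timedelta = DEFAULT_WINDOW,
                   now: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (start, end) for a metrics window that ends at the current time."""
    end = now if now is not None else datetime.now(timezone.utc)
    return end - period, end
```

For example, `metrics_window(timedelta(hours=24))` returns a window covering the previous 24 hours. Passing `now` explicitly makes the calculation reproducible, which is useful when you compare the console view against a fixed point in time.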