Monitoring is a vital component of maintaining the availability and performance of your Service Instances.

Data Management for VMware Tanzu collects health and metric data for each Service Instance. You can view this data and use it to track the resource consumption, performance, and activity of your instances.

Service Instance Status

Data Management for VMware Tanzu uses the Service Instance Status to reflect the availability of the instance, and in some cases to identify an in-progress operation or a critical operation that failed. The Status of a Service Instance is also affected by alerts that Data Management for VMware Tanzu may trigger on the instance.

The Status of a Service Instance may be one of the following values:

Status | Description
BACKUP_IN_PROGRESS | A backup of the Service Instance is in progress.
CONTROL_PLANE_UPDATE_IN_PROGRESS | A control plane software update is in progress.
CONTROL_PLANE_UPDATE_FAILED | A control plane software update failed.
CRITICAL | The Service Instance has at least one CRITICAL-level alert and no LOST_CONNECTIVITY or FATAL alerts.
DB_ENGINE_UPDATE_IN_PROGRESS | A Service Engine software update is in progress.
DB_ENGINE_UPDATE_FAILED | A Service Engine software update failed.
DELETED | The Service Instance has been deleted.
DELETING | Deletion of the Service Instance has been initiated.
ERROR | Creation of the Service Instance failed.
FATAL | The Service Instance has at least one FATAL-level alert and no LOST_CONNECTIVITY alerts.
INIT | A create operation has been initiated for the Service Instance.
LOST_CONNECTIVITY | The Service Instance has at least one LOST_CONNECTIVITY alert.
ONLINE | The Service Instance was created successfully, is operating, and has no outstanding alerts.
OS_UPDATE_IN_PROGRESS | An operating system software update is in progress.
OS_UPDATE_FAILED | An operating system software update failed.
POWEREDOFF | The Service Instance is powered off.
POWEREDON | The Service Instance VM is running, but awaiting a health check.
WARNING | The Service Instance has at least one WARNING-level alert and no LOST_CONNECTIVITY, FATAL, or CRITICAL alerts.
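
The alert-driven statuses in the table follow a precedence order: LOST_CONNECTIVITY outranks FATAL, which outranks CRITICAL, which outranks WARNING. The following Python sketch illustrates that ordering only. The function name `status_from_alerts` and the constant `PRECEDENCE` are hypothetical, and this is not the product's implementation; it also ignores operation-driven statuses such as BACKUP_IN_PROGRESS.

```python
# Illustrative sketch of the alert precedence described in the status table.
# Names here are hypothetical; this is not how the product computes status.
PRECEDENCE = ["LOST_CONNECTIVITY", "FATAL", "CRITICAL", "WARNING"]


def status_from_alerts(alert_levels):
    """Return the alert-driven status for an operating Service Instance."""
    present = set(alert_levels)
    # The first level found, in precedence order, determines the status.
    for level in PRECEDENCE:
        if level in present:
            return level
    # No outstanding alerts at any of the levels above.
    return "ONLINE"


print(status_from_alerts(["WARNING", "CRITICAL"]))
print(status_from_alerts([]))
```

With a WARNING and a CRITICAL alert outstanding, the sketch reports CRITICAL; with no alerts, it reports ONLINE.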

Viewing Service Instance Status

The Status of a Service Instance is an indicator of the overall health of the instance. You can view instance status using the Data Management for VMware Tanzu console or API.

Perform the following procedure to examine the status of a Service Instance:

  1. Select Databases from the left navigation pane.

    This action displays the Databases view, a table that lists the provisioned database Service Instances.

  2. Examine the databases listed in the table, identify the database of interest, and navigate to that table row.

  3. Examine the Status.

Viewing Service Instance Health

The health of a Service Instance reflects the status of the services running in the VM, the status of certain resources that it consumes, and its connectivity to internal and external components. Service Instance health is directly related to alerts that Data Management for VMware Tanzu may have triggered for the instance. Refer to Service Instance Alerts for details on the types and severity of alerts that Data Management for VMware Tanzu (DMS) may trigger for an instance.

Perform the following procedure to view the health of a Service Instance:

  1. Select Databases from the left navigation pane.

    This action displays the Databases view, a table that lists the provisioned database Service Instances.

  2. Examine the databases listed in the table, identify the database of interest, and navigate to that table row.

  3. Click the database Instance Name.

    The database information Details tab displays.

  4. Select the Monitoring tab.

    This action displays monitoring data for the Service Instance.

  5. View the Health Status information.

  6. Select the drop-down menu in the top right corner to change the time-series aggregation period.

Service Instance Alerts

Data Management for VMware Tanzu triggers an alert on a database Service Instance when it encounters connectivity or resource issues on the instance. You monitor these alerts in the Database Alerts view.

The Database Alerts view displays the following information:

  • The Instance Name identifies the name of the Service Instance that has alerts.
  • The Status column identifies the status of the Service Instance.
  • The Critical column identifies the number of CRITICAL-level alerts associated with the Service Instance.
  • The Warning column identifies the number of WARNING-level alerts associated with the Service Instance.
  • The Environment column identifies the infrastructure on which the instance is running.
  • The Triggered Time identifies the time at which the most recent alert was triggered.
  • The Owner column identifies the Data Management for VMware Tanzu user that owns the Service Instance.

About the Alert Levels

Data Management for VMware Tanzu triggers alerts of the following levels:

  • OK
  • ONLINE
  • WARNING
  • CRITICAL
  • FATAL
  • LOST_CONNECTIVITY

About the Alert Types

Alerts that Data Management for VMware Tanzu may trigger on a Service Instance include the following:

Alert Name | Threshold | Status | Alert Level | Triggered When
LOST_CONNECTIVITY | N/A | Lost Connectivity | CRITICAL | The Service Instance is unreachable by DMS.
CPU_HEALTH_ALERT | 90% | Critical | CRITICAL | vCPU utilization has reached 90%.
CPU_HEALTH_ALERT | 70% | Warning | WARNING | vCPU utilization has reached 70%.
DATA_DISK_HEALTH_ALERT | 90% | Fatal | CRITICAL | The data disk has reached 90% capacity.
DATA_DISK_HEALTH_ALERT | 70% | Warning | WARNING | The data disk has reached 70% capacity.
DATABASE_BIN_LOG_ALERT | N/A | Critical | CRITICAL | The transaction logs on the Service Instance are not getting copied to local storage.
DATABASE_BIN_LOG_CLOUD_SYNC_ALERT | N/A | Warning | WARNING | The transaction logs on local and cloud storage are not in sync.
DATABASE_SERVICE_ALERT | N/A | Fatal | CRITICAL | The Service Instance database engine is down.
MAX_CONNECTIONS_ALERT | 90% | Critical | CRITICAL | The number of open connections to the Service Instance is approaching the maximum.
MAX_CONNECTIONS_ALERT | 80% | Warning | WARNING | The number of open connections to the Service Instance has reached 80% of the maximum.
METRICS_ALERT | N/A | Warning | WARNING | DMS cannot pull metrics from the Service Instance.
NTP_SYNC_ALERT | N/A | Critical | CRITICAL | The Service Instance NTP service is down or not in sync.
SYSTEM_DISK_HEALTH_ALERT | 90% | Critical | CRITICAL | The system disk has reached 90% capacity.
SYSTEM_DISK_HEALTH_ALERT | 70% | Warning | WARNING | The system disk has reached 70% capacity.
TELEGRAF_SERVICE_ALERT | N/A | Warning | WARNING | The Telegraf service is not responding.
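
For the threshold-based alerts listed above, the alert level can be read as a simple comparison of a utilization percentage against a warning and a critical threshold. The following Python sketch shows that reading. The threshold values come from the table; the names `THRESHOLDS` and `alert_level` are hypothetical, and treating "has reached" as greater than or equal to is an assumption.

```python
# Warning and critical thresholds (percent) copied from the alert table.
# The lookup structure and function are hypothetical illustrations.
THRESHOLDS = {
    "CPU_HEALTH_ALERT": (70, 90),
    "DATA_DISK_HEALTH_ALERT": (70, 90),
    "MAX_CONNECTIONS_ALERT": (80, 90),
    "SYSTEM_DISK_HEALTH_ALERT": (70, 90),
}


def alert_level(alert_name, percent):
    """Return "CRITICAL", "WARNING", or None for a utilization percentage."""
    warning, critical = THRESHOLDS[alert_name]
    # Assumption: "has reached N%" means the value is at or above N.
    if percent >= critical:
        return "CRITICAL"
    if percent >= warning:
        return "WARNING"
    return None


print(alert_level("CPU_HEALTH_ALERT", 75))
print(alert_level("MAX_CONNECTIONS_ALERT", 75))
```

Under these assumptions, 75% vCPU utilization maps to a WARNING-level alert, while 75% of the maximum connections is below the 80% warning threshold and maps to no alert.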

Clearing Alerts

In some cases, you can clear certain alerts by restarting the affected service on the Service Instance VM.

Alert Name | Affected VM OS Service Name
NTP_SYNC_ALERT | systemd-timesyncd
TELEGRAF_SERVICE_ALERT | telegraf
METRICS_ALERT | telegraf
METRICS_ALERT [1] | influxdb
DATABASE_SERVICE_ALERT | dbengine

[1] If a METRICS_ALERT is raised on all Service Instances, restart the influxdb.service on the Agent VM.
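
The alert-to-service mapping above can be kept as a small lookup when you script your own runbooks. The following Python sketch only builds the restart command strings; it does not run them. The names `SERVICES` and `restart_commands` are hypothetical, and appending the `.service` suffix to every service name is an assumption based on the `telegraf.service` and `influxdb.service` forms used in this topic.

```python
# Alert name -> affected OS service names, copied from the table above.
# For METRICS_ALERT, influxdb applies on the Agent VM when the alert is
# raised on all Service Instances (see the table footnote).
SERVICES = {
    "NTP_SYNC_ALERT": ["systemd-timesyncd"],
    "TELEGRAF_SERVICE_ALERT": ["telegraf"],
    "METRICS_ALERT": ["telegraf", "influxdb"],
    "DATABASE_SERVICE_ALERT": ["dbengine"],
}


def restart_commands(alert_name):
    """Return candidate restart commands for an alert (empty if none apply)."""
    # Assumption: each service is addressed as <name>.service.
    return [
        f"systemctl restart {service}.service"
        for service in SERVICES.get(alert_name, [])
    ]


for command in restart_commands("METRICS_ALERT"):
    print(command)
```

An alert that is not in the mapping, such as CPU_HEALTH_ALERT, yields an empty list because restarting a service is not listed as a way to clear it.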

To clear an alert:

  1. SSH into the Service Instance or Agent VM.

  2. Restart the affected service. For example:

    user@servinstvm$ systemctl restart telegraf.service
    

Addressing Other Alerts

If the Agent VM triggers a DATABASE_BIN_LOG_ALERT on a Service Instance, verify the connection to Local Storage, the data disk space, and the Service Engine status.

Service Instance Metrics

Data Management for VMware Tanzu collects metric data for each Service Instance. You can view this data and use it to track the resource consumption, performance, and activity of your instances.

A Service Instance is created with NORMAL or ENHANCED monitoring. The metrics that Data Management for VMware Tanzu collects in ENHANCED monitoring mode include the NORMAL metrics, plus additional service-specific metrics.

NORMAL Monitoring

Data Management for VMware Tanzu displays the following DB Metrics when NORMAL monitoring is enabled for a Service Instance:

  • System Uptime - The time since the service or Service Instance VM restarted.
  • Mysql Uptime (MySQL)
  • Max Connections - The connection limit to the Service Instance.
  • Active Connections per Second (PostgreSQL)
  • Thread Resource Utilization (MySQL)
  • CPU Usage % - The ratio of used to allocated CPU.
  • Memory Usage % - The ratio of used to allocated memory.
  • Disk Usage % - The ratio of used to allocated disk.

ENHANCED Monitoring

Additional PostgreSQL statistics displayed when ENHANCED monitoring is in effect include:

  • Write Throughput
  • Read Throughput
  • Commits & Rollbacks
  • Deadlocks & Conflicts

Additional MySQL statistics displayed when ENHANCED monitoring is in effect include:

  • Innodb Pool Size
  • Queries & Questions
  • Bytes Received & Sent
  • Slow Queries per Second
  • InnoDB Buffer Usage %
  • InnoDB Reads & Writes
  • Command Reads & Writes
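
The relationship between the two monitoring modes (ENHANCED is NORMAL plus engine-specific additions) can be sketched as list composition. The following Python example uses the metric names from the lists above. The names `COMMON`, `NORMAL_BY_ENGINE`, `ENHANCED_EXTRA`, and `displayed_metrics` are hypothetical, and treating the unlabeled NORMAL metrics as common to both engines is an assumption.

```python
# Metric names copied from the NORMAL and ENHANCED lists in this topic.
# Assumption: NORMAL metrics without an engine label apply to both engines.
COMMON = ["System Uptime", "Max Connections", "CPU Usage %",
          "Memory Usage %", "Disk Usage %"]
NORMAL_BY_ENGINE = {
    "mysql": ["Mysql Uptime", "Thread Resource Utilization"],
    "postgresql": ["Active Connections per Second"],
}
ENHANCED_EXTRA = {
    "postgresql": ["Write Throughput", "Read Throughput",
                   "Commits & Rollbacks", "Deadlocks & Conflicts"],
    "mysql": ["Innodb Pool Size", "Queries & Questions",
              "Bytes Received & Sent", "Slow Queries per Second",
              "InnoDB Buffer Usage %", "InnoDB Reads & Writes",
              "Command Reads & Writes"],
}


def displayed_metrics(mode, engine):
    """Return the metric names shown for a monitoring mode and engine."""
    if mode not in ("NORMAL", "ENHANCED"):
        raise ValueError(f"unknown monitoring mode: {mode}")
    metrics = COMMON + NORMAL_BY_ENGINE[engine]
    if mode == "ENHANCED":
        # ENHANCED includes everything in NORMAL plus the engine extras.
        metrics = metrics + ENHANCED_EXTRA[engine]
    return metrics


print(len(displayed_metrics("NORMAL", "postgresql")))
print(len(displayed_metrics("ENHANCED", "postgresql")))
```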

Viewing the Metrics

You view metric data for a Service Instance in the Databases view, instance Monitoring tab, DB Metrics pane.

By default, Data Management for VMware Tanzu displays the last 3 hours of aggregated metric data. You can change this time period (calculated from current time) via a drop-down in the upper right corner of the DB Metrics tab.
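
Because the displayed period is calculated back from the current time, the default window can be sketched as a simple subtraction. The following Python example is illustrative only. The function name `metric_window` and the use of UTC are assumptions; only the 3-hour default comes from this topic.

```python
from datetime import datetime, timedelta, timezone


def metric_window(hours=3, now=None):
    """Return (start, end) of a window ending at the current time."""
    # Assumption: timestamps are handled in UTC.
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return start, end


start, end = metric_window()
print(f"Showing metrics from {start.isoformat()} to {end.isoformat()}")
```

Selecting a different period in the drop-down corresponds to passing a different `hours` value in this sketch.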
