Alerts (VMware Tanzu Application Service)

The Management Pack for VMware Tanzu Application Service creates alerts (and in most cases provides recommended actions) based on various symptoms it detects in your VMware Tanzu Application Service environment. The table below lists the alerts available in the management pack.

Note: The alerts below are based on VMware Tanzu Application Service KPI best practices. For more information on VMware Tanzu Application Service KPIs, see the Key Performance Indicators topic in the VMware Tanzu Application Service documentation.

Name	Description	Symptom	Recommendation
Cloud Controller Memory Usage is High	This alert indicates that the percentage of memory used by the Cloud Controller is high.	Percentage of Memory Used is High
Cloud Controller Memory Usage is Very High	This alert indicates that the percentage of memory used by the Cloud Controller is very high.	Percentage of Memory Used is Very High
Firehose Throughput delta is high	This alert indicates that the Firehose log receiver may need to be scaled up.	Total messages received across all Doppler listeners is high	Pivotal recommends that you do not scale down these components on flat or downward delta trends, because unexpected spikes in throughput can cause log loss if the components are not scaled appropriately.
Truncating Buffer Dropped Messages is Greater Than 0	This alert indicates that the nozzle or the Traffic Controller is not keeping up.	Truncating Buffer Dropped Messages is Greater Than 0	To reduce log message loss, try scaling up the nozzle or the Traffic Controller.
Doppler Server has Firehose Dropped Messages Critical	This alert indicates that a Critical alert was raised in PCF.	Firehose Dropped Messages Critical	Scale up the Firehose log receiver and Dopplers.
Doppler Server has Firehose Dropped Messages Warning	This alert indicates that a Warning alert was raised in PCF.	Firehose Dropped Messages Warning	Scale up the Firehose log receiver and Dopplers.
total routes delta is low	This alert indicates that the total routes delta is low.	Total routes delta is Low	1. For capacity needs, scale up or down the Gorouter VMs as necessary. 2. For significant drops in current total routes, see the gorouter.ms_since_last_registry_update metric value for additional context. 3. Check the Gorouter and Route Emitter logs to see if they are experiencing issues when connecting to NATS. 4. Check the BOSH logs to see if the NATS, Gorouter, or Route Emitter VMs are failing. 5. Look broadly at the health of all VMs, particularly Diego-related VMs. 6. If problems persist, pull the Gorouter and Route Emitter logs and contact Pivotal Support.
total routes delta is high	This alert indicates that the total routes delta is high.	Total routes delta is High	1. For capacity needs, scale up or down the Gorouter VMs as necessary. 2. For significant drops in current total routes, see the gorouter.ms_since_last_registry_update metric value for additional context. 3. Check the Gorouter and Route Emitter logs to see if they are experiencing issues when connecting to NATS. 4. Check the BOSH logs to see if the NATS, Gorouter, or Route Emitter VMs are failing. 5. Look broadly at the health of all VMs, particularly Diego-related VMs. 6. If problems persist, pull the Gorouter and Route Emitter logs and contact Pivotal Support.
Server Errors Rate is high	This alert indicates that an app may be crashing.	Server Errors rate is High	1. Look for out-of-memory errors and other app-level errors. 2. As a temporary measure, ensure that the troublesome app is scaled to more than one instance.
Number of 502 Bad Gateways is high	This alert indicates that route tables may be stale.	Number of 502 Bad Gateways is High	1. Check the Gorouter and Route Emitter logs to see if they are experiencing issues when connecting to NATS. 2. Check the BOSH logs to see if the NATS, Gorouter, or Route Emitter VMs are failing. 3. Look broadly at the health of all VMs, particularly Diego-related VMs. 4. If problems persist, pull the Gorouter and Route Emitter logs, contact Pivotal Support, and report an unusual increase in Gorouter bad gateway responses.
router handling latency average is high	This alert indicates that a Gorouter job may be impacting responsiveness.	Router handling latency is High	Extended periods of high latency can point to several factors. The Gorouter latency measure also includes network and app latency. 1. First, inspect the logs for network issues and indications of misbehaving apps. 2. If it appears that the Gorouter needs to scale due to ongoing traffic congestion, do not scale on the latency metric alone. Also look at the CPU utilization of the Gorouter VMs and keep it at or below the 60-70% range. 3. Resolve high utilization by scaling the Gorouter.
backend exhausted connections is high	This alert indicates that PCF may have one or more unresponsive applications.	Backend Exhausted Connections is High	1. If gorouter.backend_exhausted_conns spikes, first look at the Router Throughput metric gorouter.total_requests to determine whether it is high or low relative to the normal bounds for this deployment. 2. If Router Throughput appears within normal bounds, gorouter.backend_exhausted_conns is likely spiking because of an unresponsive application, possibly due to application code issues or issues with the application's dependencies. To identify the problematic application, look in the access logs for repeated calls to one application, then troubleshoot that application accordingly. 3. If Router Throughput also shows unusual spikes, the cause of the gorouter.backend_exhausted_conns spikes is likely external to the platform. Unusual increases in load may be due to expected business events that drive additional traffic to applications, while unexpected increases in load may indicate a DDoS attack risk.
GoRouter has Maximum file descriptors Critical	This alert indicates that a Critical alert was raised in PCF.	Maximum file descriptors Critical	1. Identify which apps are requesting excessive connections and resolve the issues affecting these apps. 2. If the above recommended mitigation steps have not already been taken, do so. 3. Consider adding more Gorouter VM resources to increase the number of available file descriptors.
GoRouter has Maximum file descriptors Warning	This alert indicates that a Warning alert was raised in PCF.	Maximum file descriptors Warning	1. Identify which apps are requesting excessive connections and resolve the issues affecting these apps. 2. If the above recommended mitigation steps have not already been taken, do so. 3. Consider adding more Gorouter VM resources to increase the number of available file descriptors.
GoRouter has Time Since Last Route Register Received Critical	This alert indicates that a Critical alert was raised in PCF.	Time Since Last Route Register Received Critical	1. Search the Gorouter and Route Emitter logs for connection issues to NATS. 2. Check the BOSH logs to see if the NATS, Gorouter, or Route Emitter VMs are failing. 3. Look more broadly at the health of all VMs, particularly Diego-related VMs. 4. If problems persist, pull the Gorouter and Route Emitter logs, contact Pivotal Support, and report consistently long delays in route registration.
Route Emitter Sync Pass Duration Max is High	This alert indicates that Route Emitter sync passes are taking too long. If all or many jobs are showing as impacted, there is likely an issue with Diego. If only one or a few jobs are showing as impacted, there is likely a connectivity issue, and the impacted job should be investigated further.	Route Emitter Sync Pass Duration Max is High	1. Investigate the Route Emitter and Diego BBS logs for errors. 2. Verify that app routes are functional by making a request to an app, pushing an app and pinging it, or, if applicable, checking that your smoke tests have passed.
Disk Usage is High	This alert indicates that the disk usage for this cell is high.	Disk Usage is high
Total Amount of Disk Space Available is Very Low	This alert indicates that the disk usage for this cell is very high.	Disk Usage is Very High
Total Amount of Memory Available is Low	This alert indicates that the amount of memory available for this cell to allocate to containers is low.	Total Amount of Memory Available is Low
Diego Cell has Remaining Disk Available Critical	This alert indicates that a Critical alert was raised in PCF.	Remaining Disk Available Critical	1. Assign more resources to the cells or assign more cells. 2. Scale additional Diego cells via Ops Manager.
Diego Cell has Remaining Disk Available Warning	This alert indicates that a Warning alert was raised in PCF.	Remaining Disk Available Warning	1. Assign more resources to the cells or assign more cells. 2. Scale additional Diego cells via Ops Manager.
Diego Cell has Unhealthy Cells Critical	This alert indicates that a Critical alert was raised in PCF.	Unhealthy Cells Critical	1. Investigate BBS logs for faults and errors. 2. If a particular cell or cells appear problematic, pull the logs for those cells, as well as the BBS logs, before contacting Pivotal Support.
Diego Cell has Overall Remaining Memory Available Critical	This alert indicates that a Critical alert was raised in PCF.	Overall Remaining Memory Available Critical	1. Assign more resources to the cells or assign more cells. 2. Scale additional Diego cells via Ops Manager.
Diego Cell has Overall Remaining Memory Available Warning	This alert indicates that a Warning alert was raised in PCF.	Overall Remaining Memory Available Warning	1. Assign more resources to the cells or assign more cells. 2. Scale additional Diego cells via Ops Manager.
Diego Cell has Cell Rep Time to Sync Critical	This alert indicates that a Critical alert was raised in PCF.	Cell Rep Time to Sync Critical	1. Investigate BBS logs for faults and errors. 2. If a particular cell or cells appear problematic, pull the logs for those cells, as well as the BBS logs, before contacting Pivotal Support.
Diego Cell has Cell Rep Time to Sync Warning	This alert indicates that a Warning alert was raised in PCF.	Cell Rep Time to Sync Warning	1. Investigate BBS logs for faults and errors. 2. If a particular cell or cells appear problematic, pull the logs for those cells, as well as the BBS logs, before contacting Pivotal Support.
Diego Brain has Auctioneer App Instance (AI) Placement Failures Critical	This alert indicates that a Critical alert was raised in PCF.	Auctioneer App Instance (AI) Placement Failures Critical	1. To best determine the root cause, examine the Auctioneer logs. Depending on the specific error and resource constraint, you may also find a failure reason in the Cloud Controller (CC) API. 2. Investigate the health of your Diego cells to determine whether they are the resource type causing the problem. 3. Consider scaling additional cells using Ops Manager. 4. If scaling cells does not solve the problem, pull the Diego brain logs and BBS node logs, contact Pivotal Support, and tell them that LRP auctions are failing.
Diego Brain has Auctioneer App Instance (AI) Placement Failures Warning	This alert indicates that a Warning alert was raised in PCF.	Auctioneer App Instance (AI) Placement Failures Warning	1. To best determine the root cause, examine the Auctioneer logs. Depending on the specific error and resource constraint, you may also find a failure reason in the Cloud Controller (CC) API. 2. Investigate the health of your Diego cells to determine whether they are the resource type causing the problem. 3. Consider scaling additional cells using Ops Manager. 4. If scaling cells does not solve the problem, pull the Diego brain logs and BBS node logs, contact Pivotal Support, and tell them that LRP auctions are failing.
Diego Brain has Auctioneer Task Placement Failures Critical	This alert indicates that a Critical alert was raised in PCF.	Auctioneer Task Placement Failures Critical	1. To best determine the root cause, examine the Auctioneer logs. Depending on the specific error or resource constraint, you may also find a failure reason in the CC API. 2. Investigate the health of Diego cells. 3. Consider scaling additional cells using Ops Manager. 4. If scaling cells does not solve the problem, pull the Diego brain logs and BBS logs, contact Pivotal Support for additional troubleshooting, and tell them that Task auctions are failing.
Diego Brain has Auctioneer Task Placement Failures Warning	This alert indicates that a Warning alert was raised in PCF.	Auctioneer Task Placement Failures Warning	1. To best determine the root cause, examine the Auctioneer logs. Depending on the specific error or resource constraint, you may also find a failure reason in the CC API. 2. Investigate the health of Diego cells. 3. Consider scaling additional cells using Ops Manager. 4. If scaling cells does not solve the problem, pull the Diego brain logs and BBS logs, contact Pivotal Support for additional troubleshooting, and tell them that Task auctions are failing.
Diego Brain has Auctioneer Time to Fetch Cell State Critical	This alert indicates that a Critical alert was raised in PCF.	Auctioneer Time to Fetch Cell State Critical	1. Check the health of the cells by reviewing their logs for errors. 2. Review IaaS console metrics. 3. Pull the Diego brain logs and cell logs, contact Pivotal Support, and tell them that fetching cell states is taking too long.
Diego Brain has Auctioneer Time to Fetch Cell State Warning	This alert indicates that a Warning alert was raised in PCF.	Auctioneer Time to Fetch Cell State Warning	1. Check the health of the cells by reviewing their logs for errors. 2. Review IaaS console metrics. 3. Pull the Diego brain logs and cell logs, contact Pivotal Support, and tell them that fetching cell states is taking too long.
Diego Brain has Auctioneer Lock Lost Critical	This alert indicates that a Critical alert was raised in PCF.	Auctioneer Lock Lost Critical	1. Run monit status on the Diego Database VM to check for failing processes. 2. If there are no failing processes, review the Auctioneer logs. Recent Auctioneer logs should show that all but one of its instances are currently waiting on locks, and the active Auctioneer should show a record of when it last attempted to execute work. This attempt should correspond to app development activity, such as cf push. 3. If you are unable to resolve the issue, pull the logs from the Diego BBS and Auctioneer VMs, which include the Locket service component logs, and contact Pivotal Support.
BBS has Locket Active Presences is high	This alert indicates that the number of Locket active presences reported for the BBS is high.	Locket Active Presences is high	The response depends on the job the metric is associated with. If appropriate, scale the affected jobs out and monitor for improvement.
BBS has VM Memory Used Critical	This alert indicates that the memory used on the BBS VM has reached the Critical threshold.	BBS VM Memory Used Critical	The response depends on the job the metric is associated with. If appropriate, scale the affected jobs out and monitor for improvement.
BBS has VM Memory Used Warning	This alert indicates that the memory used on the BBS VM has reached the Warning threshold.	BBS VM Memory Used Warning	The response depends on the job the metric is associated with. If appropriate, scale the affected jobs out and monitor for improvement.
BBS has VM Ephemeral Disk Used Critical	This alert indicates that the ephemeral disk used on the BBS VM has reached the Critical threshold.	BBS VM Ephemeral Disk Used Critical	1. Run bosh vms --details to view the jobs on affected deployments. 2. Determine the cause of the data consumption and, if appropriate, increase disk space or scale out the affected jobs.
BBS has VM Ephemeral Disk Used Warning	This alert indicates that the ephemeral disk used on the BBS VM has reached the Warning threshold.	BBS VM Ephemeral Disk Used Warning	1. Run bosh vms --details to view the jobs on affected deployments. 2. Determine the cause of the data consumption and, if appropriate, increase disk space or scale out the affected jobs.
BBS has VM Persistent Disk Used Critical	This alert indicates that the persistent disk used on the BBS VM has reached the Critical threshold.	BBS VM Persistent Disk Used Critical	1. Run bosh vms --details to view the jobs on affected deployments. 2. Determine the cause of the data consumption and, if appropriate, increase disk space or scale out the affected jobs.
BBS has VM Persistent Disk Used Warning	This alert indicates that the persistent disk used on the BBS VM has reached the Warning threshold.	BBS VM Persistent Disk Used Warning	1. Run bosh vms --details to view the jobs on affected deployments. 2. Determine the cause of the data consumption and, if appropriate, increase disk space or scale out the affected jobs.
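Many rows in the table come in pairs (Warning and Critical, or High and Very High) that evaluate the same metric against two thresholds. The Python sketch below illustrates that two-level evaluation under stated assumptions: the `Severity` enum, the `classify` function, and all threshold values are hypothetical and are not the management pack's configured symptom definitions. Metrics such as remaining memory or remaining disk available get worse as they fall, so the sketch also supports a lower-is-worse direction.

```python
from enum import Enum


class Severity(Enum):
    OK = "ok"
    WARNING = "warning"
    CRITICAL = "critical"


def classify(value: float, warning: float, critical: float,
             higher_is_worse: bool = True) -> Severity:
    """Map one metric sample to a severity using two hypothetical thresholds.

    For metrics such as percentage of memory used, higher values are worse.
    For metrics such as remaining disk available, pass higher_is_worse=False
    and give a critical threshold that is lower than the warning threshold.
    """
    if higher_is_worse:
        if value >= critical:
            return Severity.CRITICAL
        if value >= warning:
            return Severity.WARNING
    else:
        if value <= critical:
            return Severity.CRITICAL
        if value <= warning:
            return Severity.WARNING
    return Severity.OK


if __name__ == "__main__":
    # Hypothetical thresholds: warn at 70% memory used, critical at 85%.
    print(classify(72.0, warning=70.0, critical=85.0).value)  # warning
    # Hypothetical remaining-disk thresholds in MiB: warn at 6144, critical at 3072.
    print(classify(2048, warning=6144, critical=3072, higher_is_worse=False).value)  # critical
```

A real symptom definition may also require the condition to hold for several collection cycles before the alert fires; this sketch evaluates a single sample only.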
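The Firehose Throughput delta and total routes delta alerts look at how a metric changes between collections rather than at its absolute value. The following minimal Python sketch shows that idea, assuming you already have a series of samples (for example, total messages received across all Doppler listeners, or the current total routes). The function names and the `low` and `high` bounds are hypothetical; derive real bounds from the normal behavior of your own deployment.

```python
from typing import List, Sequence


def deltas(samples: Sequence[float]) -> List[float]:
    """Return the change between each pair of consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def latest_delta_status(samples: Sequence[float], low: float, high: float) -> str:
    """Classify the most recent delta as 'low', 'high', or 'normal'.

    low and high are hypothetical bounds on the per-interval change.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are needed to compute a delta")
    latest = deltas(samples)[-1]
    if latest < low:
        return "low"
    if latest > high:
        return "high"
    return "normal"


if __name__ == "__main__":
    # Hypothetical total routes samples: the last collection dropped by 255 routes.
    routes = [1200, 1210, 1205, 950]
    print(deltas(routes))  # [10, -5, -255]
    print(latest_delta_status(routes, low=-100, high=100))  # low
```

In line with the Firehose recommendation, a low or flat result for throughput is not by itself a reason to scale down the logging components.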
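The Router handling latency recommendation warns against scaling the Gorouter on latency alone. The sketch below encodes that decision. The function name and the use of the average CPU utilization across Gorouter VMs are assumptions (checking the busiest VM instead is an equally reasonable choice), and the 70% default ceiling is the top of the 60-70% range given in the table.

```python
from statistics import mean
from typing import Sequence


def gorouter_latency_action(latency_high: bool,
                            vm_cpu_percents: Sequence[float],
                            cpu_ceiling: float = 70.0) -> str:
    """Suggest an action for a high Gorouter handling latency alert.

    Latency alone is not treated as a scaling signal, because the Gorouter
    latency measure also includes network and app latency. Scaling is
    suggested only when the average CPU utilization of the Gorouter VMs is
    above cpu_ceiling.
    """
    if not latency_high:
        return "no action"
    if not vm_cpu_percents:
        raise ValueError("CPU samples for at least one Gorouter VM are required")
    if mean(vm_cpu_percents) > cpu_ceiling:
        return "scale gorouter"
    # CPU is within bounds, so look elsewhere for the source of the latency.
    return "inspect logs for network issues and misbehaving apps"
```

For example, `gorouter_latency_action(True, [80.0, 75.0])` suggests scaling, while the same alert with CPU samples of 40% and 50% points back to log inspection.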
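The Backend Exhausted Connections recommendation is a small decision tree keyed on Router Throughput (gorouter.total_requests). The sketch below captures it. The function name and the `normal_low` and `normal_high` bounds are hypothetical, and because the recommendation does not say what below-normal throughput means, the sketch reports that case as unclassified rather than guessing.

```python
def triage_backend_exhausted(total_requests: float,
                             normal_low: float,
                             normal_high: float) -> str:
    """Suggest a likely cause for a gorouter.backend_exhausted_conns spike.

    total_requests is the current Router Throughput (gorouter.total_requests).
    normal_low and normal_high are hypothetical bounds for normal throughput
    in this deployment.
    """
    if total_requests > normal_high:
        # Throughput is spiking too: the cause is likely external load, either
        # an expected business event or a possible DDoS attack.
        return "external load"
    if total_requests >= normal_low:
        # Throughput is normal: an unresponsive app is the likely cause, so
        # look in the access logs for repeated calls to one app.
        return "unresponsive app"
    # Below-normal throughput is not covered by the recommendation.
    return "unclassified"
```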
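To find the application receiving repeated calls, as step 2 of the Backend Exhausted Connections recommendation suggests, you can tally access-log lines per request host. This sketch assumes that the request host is the first whitespace-separated field of each line, as in the Gorouter access log format; verify that against your own logs before relying on it. The function name `top_hosts` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_hosts(lines: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Count access-log lines per request host and return the n most frequent.

    Assumes the request host is the first whitespace-separated field of each
    line. Blank lines are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if fields:
            counts[fields[0]] += 1
    return counts.most_common(n)
```

Because a file object yields one line at a time, you can pass an open log file directly, for example `top_hosts(open("access.log"), n=3)`, where `access.log` is a hypothetical path to a pulled Gorouter access log.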
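The Auctioneer Lock Lost recommendation states that all but one Auctioneer instance should be waiting on locks. If you have already extracted each instance's state from its recent logs, the check reduces to counting active instances, as in this sketch. The `states` mapping, the instance names, and the "active" and "waiting" labels are hypothetical; they are not an actual Auctioneer log format.

```python
from typing import Mapping


def check_auctioneer_locks(states: Mapping[str, str]) -> str:
    """Check that exactly one Auctioneer instance is active.

    states maps a hypothetical instance name to "active" or "waiting",
    as read by hand from each instance's recent logs.
    """
    active = [name for name, state in states.items() if state == "active"]
    if len(active) == 1:
        return f"ok: {active[0]} holds the lock"
    if not active:
        return "no instance holds the lock"
    return "multiple instances report active: " + ", ".join(sorted(active))
```

Either failure result is a reason to move on to step 3 of the recommendation and pull the Diego BBS and Auctioneer VM logs.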