Using debug logs for App Autoscaler

You can monitor App Autoscaler and its supporting VMware Tanzu Application Service for VMs (TAS for VMs) components through debug logging.

Debug logging in Autoscaler

Autoscaler does not record debug logs by default. When you configure Autoscaler to record debug logs, it records them alongside its normal logs. Debug logs include DEBUG in their log entries.

You can use the debug logs that Autoscaler generates to:

See whether developers of your apps are using Autoscaler to scale them, and measure the efficacy of apps that use or do not use Autoscaler. For more information, see Reviewing Autoscaler use.
Verify whether TAS for VMs components are failing to sufficiently support Autoscaler. For more information, see Failure modes.

To configure Autoscaler to record debug logs, see Configure debug logging for Autoscaler.

To review the debug logs for Autoscaler, see Review Autoscaler debug logs.

Configure debug logging for Autoscaler

To configure Autoscaler to record debug logs:

In a terminal window, run:
```
cf target -o system -s autoscaling
```
Set the logging level for Autoscaler to debug by running:
```
cf set-env autoscale LOG_LEVEL debug
```
Restart Autoscaler by running:
```
cf restart autoscale
```
Note: Restarting Autoscaler briefly interrupts any processes that it is performing.
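
To confirm that the variable is set, you can inspect the environment of the autoscale app with cf env. The following is a minimal sketch rather than part of the documented procedure. The helper name log_level_from_env is hypothetical, and the sketch assumes that cf env prints user-provided variables in the form NAME: value.

```shell
# Hypothetical helper: read `cf env` output on stdin and print the value of LOG_LEVEL.
# Assumes user-provided variables are printed as "LOG_LEVEL: value".
log_level_from_env() {
  sed -n 's/^LOG_LEVEL:[[:space:]]*//p' | head -n 1
}

# Usage (requires a cf CLI session that targets the system org and autoscaling space):
#   cf env autoscale | log_level_from_env
```

If the helper prints nothing, LOG_LEVEL is probably not set for the app.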

Review Autoscaler debug logs

To review the debug logs for Autoscaler:

In a terminal window, run:
```
cf target -o system -s autoscaling
```
Run:
```
cf logs autoscale
```
In the terminal output, all logs for Autoscaler appear, including debug logs. Debug logs include DEBUG in their log entries.
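
Because debug entries are mixed in with normal logs, it can help to filter the stream. The following sketch keeps only lines that contain DEBUG. The helper name only_debug is hypothetical, and the sketch relies only on the statement above that debug entries include DEBUG.

```shell
# Hypothetical helper: keep only the log lines on stdin that contain DEBUG.
# "|| true" prevents a non-zero exit status when no line matches.
only_debug() {
  grep -F 'DEBUG' || true
}

# Usage: stream live logs, or dump the recent log buffer with --recent.
#   cf logs autoscale | only_debug
#   cf logs autoscale --recent | only_debug
```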

Reviewing Autoscaler use

As an operator, you can use debug logs to see whether developers of your apps are using Autoscaler to scale them, and you can measure the efficacy of apps that use Autoscaler.

If you configured Autoscaler to record debug logs, it logs the following metrics:

autoscale.scale_up: The number of times Autoscaler has scaled an app up.
autoscale.scale_down: The number of times Autoscaler has scaled an app down.
autoscale.enabled_bindings: The number of Autoscaler service bindings created.

Note
These metrics appear as log entries and are not emitted to Loggregator.

To view these metrics, see Review Autoscaler debug logs.
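
Because these metrics appear only as log entries, one rough way to see whether they are being recorded is to count the log lines that mention each metric name. The following sketch assumes only that the metric name appears somewhere in the log line. It does not parse metric values, because this topic does not specify the exact entry format. The helper name count_usage_metrics is hypothetical.

```shell
# Hypothetical helper: count the log lines on stdin that mention each usage metric.
# This counts mentions of the metric name, not the value that the metric reports.
count_usage_metrics() {
  awk '
    /autoscale\.scale_up/         { up++ }
    /autoscale\.scale_down/       { down++ }
    /autoscale\.enabled_bindings/ { bindings++ }
    END {
      printf "scale_up=%d scale_down=%d enabled_bindings=%d\n", up, down, bindings
    }'
}

# Usage:
#   cf logs autoscale --recent | count_usage_metrics
```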

You can use one of the following methods to identify which apps are using Autoscaler:

You can review the overall number of running app instances that Diego reports for each Diego Cell to measure how many app instances are running at any given time. For more information, see Running App Instances, Rate of Change in Key Performance Indicators.
You can use cf curl to retrieve information about Autoscaler service instances from the Cloud Foundry API (CAPI). To identify apps that are using Autoscaler through CAPI, see Retrieve Autoscaler service instances from CAPI endpoints.

Because each team of developers might use different autoscaling rules for their apps, it might be difficult to accurately verify how effectively Autoscaler scales each app. However, you can gain a generalized understanding of the effect that Autoscaler has on the apps it scales by comparing the number of running app instances with other scaling metrics that the developers of your apps might be using. For more information about various scaling metrics you can use in your comparison review, see [Key Performance Indicators](../../operating/monitoring/kpi.html).

Retrieve Autoscaler service instances from CAPI endpoints

To identify apps that are using Autoscaler through CAPI:

In a terminal window, retrieve the global unique identifier (GUID) of the service plan for Autoscaler by running:
```
cf curl '/v3/service_plans?service_offering_names=app-autoscaler'
```
In the output for this command, the value of guid is the GUID of the service plan for Autoscaler. For more information, see the CAPI documentation.
Record the value of guid that you retrieved in the previous step.
Retrieve information about apps that are using Autoscaler using one of the following methods:
- Retrieve a list of the names of Autoscaler service instances by running:
```
cf curl '/v3/service_instances?service_plan_guids=SERVICE-PLAN-GUID' | jq -r '.resources[] | .name'
```
  Where SERVICE-PLAN-GUID is the GUID of the service plan for Autoscaler that you recorded earlier in this procedure. For more information, see the CAPI documentation.
  
  To run this command, you must have jq installed. To install jq, see the jq website.
- Retrieve a list of the GUIDs of Autoscaler service instances by running:
```
cf curl '/v3/service_instances?service_plan_guids=SERVICE-PLAN-GUID'
```
  Where SERVICE-PLAN-GUID is the GUID of the service plan for Autoscaler that you recorded earlier. For more information, see the CAPI documentation.
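
The steps in this procedure can be combined into one script. The following is a minimal sketch under stated assumptions, not a supported tool. The function names service_instances_path and list_autoscaler_instances are hypothetical. The sketch assumes that the service offering is named app-autoscaler, that jq is installed, and that all results fit on the first page that CAPI returns.

```shell
# Hypothetical helper: build the CAPI path that lists service instances for
# one or more service plan GUIDs. Multiple GUIDs are joined with commas.
service_instances_path() {
  guids=""
  for g in "$@"; do
    guids="${guids:+$guids,}$g"
  done
  printf '/v3/service_instances?service_plan_guids=%s\n' "$guids"
}

# Hypothetical wrapper: print "GUID NAME" for each Autoscaler service instance.
# Requires a logged-in cf CLI and jq. Only the first page of results is read.
list_autoscaler_instances() {
  plan_guids=$(cf curl '/v3/service_plans?service_offering_names=app-autoscaler' \
    | jq -r '.resources[].guid')
  # Word splitting on $plan_guids is intentional: one argument per GUID.
  cf curl "$(service_instances_path $plan_guids)" \
    | jq -r '.resources[] | "\(.guid) \(.name)"'
}

# Usage:
#   list_autoscaler_instances
```

Keeping the path construction in its own function makes it possible to check the query string without calling the API.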

Failure modes

Rather than always maintaining enough app instances to handle the maximum amount of traffic that you expect for your apps, you can use Autoscaler to provision only the number of instances that your apps require to handle the amount of traffic that you expect at any given time. However, if Autoscaler or the TAS for VMs components that support Autoscaler fail, Autoscaler cannot scale the number of app instances up enough to meet the needs of your apps.

If you are not using Autoscaler, TAS for VMs component failures might have little impact on your apps. Because Autoscaler depends on TAS for VMs components such as Log Cache, the Cloud Controller, and MySQL to scale apps, using Autoscaler greatly increases the impact of TAS for VMs component failures on those apps.

This section describes some of the ways in which the TAS for VMs components that support Autoscaler might fail, with examples of the debug log metrics, autoscaling events, and errors in Autoscaler debug logs that might appear when these TAS for VMs components fail.

Log Cache

Autoscaler depends heavily on Log Cache, which provides the majority of the metrics that Autoscaler uses to determine when to scale an app. If Log Cache fails or is insufficiently scaled, Autoscaler might not retrieve metrics. This means that it cannot determine when to scale an app.

If you configured Autoscaler to record debug logs, it logs the following metrics:

autoscale.logcache.read_errors: Errors that Autoscaler encounters when reading from Log Cache.
autoscale.logcache_metric_fetch_duration: The time Autoscaler takes to retrieve metrics from Log Cache.

These metrics appear as log entries and are not emitted to Loggregator.

To view these metrics, see Review Autoscaler debug logs.

For more information about Log Cache, see Logging and metrics architecture.

Log Cache gateway is unavailable

When Autoscaler cannot connect to the Log Cache gateway, you see an autoscaling event similar to the following example:

Autoscaler did not receive any metrics for memory during the scaling window. Scaling down is deferred until these metrics are available.

You also see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/2] OUT logcache walker: unexpected status code 502

To view autoscaling events for an app, see View Autoscaling events in Using the App Autoscaler CLI. To review Autoscaler debug logs, see Review Autoscaler debug logs.

Individual Log Cache instance fails

Log Cache holds the log and metric envelopes for an app on a single Log Cache instance. When that Log Cache instance fails, Autoscaler cannot retrieve those envelopes, so it cannot determine when to scale an app.

When the Log Cache instance that holds the envelopes for an app fails, you see an autoscaling event similar to the following example:

Autoscaler did not receive any metrics for memory during the scaling window. Scaling down is deferred until these metrics are available.

You see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/1] OUT logcache walker: unexpected status code 503

To view autoscaling events for an app, see View Autoscaling events in Using the App Autoscaler CLI. To review Autoscaler debug logs, see Review Autoscaler debug logs.

Log Cache instances are unresponsive

When Log Cache instances do not respond to requests from Autoscaler, you see an autoscaling event similar to the following example:

Autoscaler did not receive any metrics for memory during the scaling window. Scaling down is deferred until these metrics are available.

You also see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/2] OUT logcache walker: Get "https://log-cache.sys.example.com/api/v1/read/85a6aff3-c739-4364-8df3-44f1aa50662b": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

To view autoscaling events for an app, see View event history in Scaling an app using App Autoscaler or View Autoscaling events in Using the App Autoscaler CLI. To review Autoscaler debug logs, see Review Autoscaler debug logs.
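
The three Log Cache failure modes above produce the same autoscaling event, so the debug log error is what distinguishes them. The following sketch labels logcache walker lines by the pattern each one matches. The function name and the labels are hypothetical, and the mapping is based only on the example errors in this section. Other causes might produce the same status codes.

```shell
# Hypothetical helper: label each "logcache walker" line on stdin with the
# failure mode it most resembles. Lines that match no pattern are skipped.
classify_logcache_errors() {
  while IFS= read -r line; do
    case "$line" in
      *"logcache walker"*"status code 502"*)           echo "gateway-unavailable" ;;
      *"logcache walker"*"status code 503"*)           echo "instance-failed" ;;
      *"logcache walker"*"context deadline exceeded"*) echo "unresponsive" ;;
    esac
  done
}

# Usage:
#   cf logs autoscale --recent | classify_logcache_errors | sort | uniq -c
```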

Cloud Controller

Autoscaler reads the app metadata that the Cloud Controller stores to discover how many instances an app has running. When Autoscaler scales the number of app instances up or down, it sends a request to the Cloud Controller. Then, the Cloud Controller directs the Diego Cell that runs the app to stage the number of app instances that Autoscaler has requested.

Autoscaler also uses the Cloud Controller to retrieve the credentials stored in RabbitMQ service bindings. Autoscaler uses these credentials to read the metadata for RabbitMQ queues.

The Cloud Controller also authorizes Log Cache to provide log and metric envelopes to Autoscaler.

If you have configured Autoscaler to record debug logs, Autoscaler logs the following metrics:

autoscale.cloud_controller.calls: The number of requests that Autoscaler has made to the Cloud Controller.
autoscale.cloud_controller.request_errors: The number of requests that Autoscaler has made to the Cloud Controller that resulted in an error.
autoscale.cloud_controller.scale_errors: The number of requests to scale an app that have resulted in an error.

These metrics appear as log entries and are not emitted to Loggregator.

To view these metrics, see Review Autoscaler debug logs.

For more information about the Cloud Controller, see Cloud Controller.

Quota has been reached

When scaling causes the app to exceed a quota, you see an autoscaling event similar to the following example:

Unable to scale. App instance quota has been reached

Cloud Controller API is stopped

When the Cloud Controller API is stopped, you see an autoscaling event similar to the following example:

Unable to scale due to Cloud Controller error.

To discover the cause of this error, see Autoscaler Error "Unable to scale due to Cloud Controller error."

You also see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/0] OUT level=error msg="[Cloud Controller - app summary] CloudController - route not found"

To view autoscaling events for an app, see View Autoscaling events in Using the App Autoscaler CLI. To review Autoscaler debug logs, see Review Autoscaler debug logs.

Cloud Controller is unresponsive

When the Cloud Controller does not respond to requests from Autoscaler or Log Cache, you see an autoscaling event similar to the following example:

Unable to scale due to Cloud Controller error.

To discover the cause of this error, see Autoscaler Error "Unable to scale due to Cloud Controller error."

You also see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/0] OUT level=error msg="[Cloud Controller - app summary] failure connecting to Cloud Controller. Error: Get \"https://api.sys.example.com/v2/apps/85a6aff3-c739-4364-8df3-44f1aa50662b/summary\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
[APP/PROC/WEB/1] OUT level=error msg="[Cloud Controller - app summary] Cloud Controller- Failure"

To view autoscaling events for an app, see View Autoscaling events in Using the App Autoscaler CLI. To review Autoscaler debug logs, see Review Autoscaler debug logs.
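
To isolate Cloud Controller errors from the rest of the Autoscaler logs, you can filter for error-level lines that carry the Cloud Controller prefix. The following sketch assumes that such lines contain level=error and the text [Cloud Controller, as in the examples above. The helper name cloud_controller_errors is hypothetical.

```shell
# Hypothetical helper: print only error-level lines on stdin that mention
# the "[Cloud Controller" prefix. "|| true" avoids a failing exit status
# when nothing matches.
cloud_controller_errors() {
  grep -F 'level=error' | grep -F '[Cloud Controller' || true
}

# Usage:
#   cf logs autoscale --recent | cloud_controller_errors
```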

MySQL

Autoscaler uses a MySQL database to store persistent information about Autoscaler service bindings and the autoscaling rules you configure. If Autoscaler fails to select from or update the MySQL database, Autoscaler cannot scale your apps.

You see an error similar to the following example in the debug logs for Autoscaler:

[APP/PROC/WEB/0] OUT level=error msg="Unable to find any enabled service bindings by instance id" error="driver: bad connection" instance_id=0
[APP/PROC/WEB/2] ERR [mysql] packets.go:37: unexpected EOF

To review Autoscaler debug logs, see Review Autoscaler debug logs.
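
As a first triage step across all of the failure modes in this topic, you can count how many log lines point at each supporting component. The following is a rough sketch: the patterns come from the example errors above, the function name triage_summary is hypothetical, and real logs might contain other error formats that the sketch does not count.

```shell
# Hypothetical helper: summarize, for the log lines on stdin, how many
# resemble the Log Cache, Cloud Controller, and MySQL examples in this topic.
triage_summary() {
  awk '
    /logcache walker/                  { logcache++ }
    /\[Cloud Controller/               { cc++ }
    /\[mysql\]|driver: bad connection/ { mysql++ }
    END {
      printf "log_cache=%d cloud_controller=%d mysql=%d\n", logcache, cc, mysql
    }'
}

# Usage:
#   cf logs autoscale --recent | triage_summary
```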