Using App Autoscaler in production

When using Autoscaler to scale an app in a production environment, you must consider your goals for the app.

You might want your app to be capable of processing a certain number of business transactions per second. Each of those transactions might be comprised of multiple HTTP requests to the app itself and to services and other apps in your TAS for VMs deployment. With this number of HTTP requests, you can determine how many instances of each app are necessary to process your wanted number of business transactions per second. You can use this information to create effective autoscaling rules with Autoscaler.

Autoscaling limits

The autoscaling limits for an app are the upper and lower limits for the number of instances Autoscaler can create for that app. The lower scaling limit is the minimum number of instances to which Autoscaler can scale an app down, and the upper scaling limit is the maximum number of instances to which Autoscaler can scale an app up. Setting appropriate upper and lower scaling limits ensures that Autoscaler does not create too few or too many app instances.

Apps often have dependencies on an external app or service that can constrain Autoscaler’s ability to scale app instances effectively. Because of this, scaling instances up past a certain number might not address problems you encounter until you address issues within those dependencies. For example, if all of your app instances use a single shared MySQL service instance, scaling the number of instances up might cause them to reach the maximum number of connections allowed by MySQL. You must address this constraint before you can scale your app further.

VMware recommends load-testing your app to confirm that you have set appropriate scaling limits before pushing the app to a production environment. For more information, see Load-testing your app.

Configure autoscaling limits

To configure the upper and lower scaling limits for an app:

In a terminal window, run:
```
cf update-autoscaling-limits APP-NAME LOWER-SCALING-LIMIT UPPER-SCALING-LIMIT
```
Where:
- APP-NAME is the name of the app for which you want to configure autoscaling limits.
- LOWER-SCALING-LIMIT is the minimum number of instances you want Autoscaler to create for the app.
- UPPER-SCALING-LIMIT is the maximum number of instances you want Autoscaler to create for the app.
To set the lower scaling limit for example-app to 20 and the upper scaling limit to 100, run:
```
cf update-autoscaling-limits example-app 20 100
```

Resource quotas

Each space and org in a TAS for VMs deployment has a quota plan that defines memory, service, and instance usage quotas for all apps and services deployed in them. For example, a quota plan for a space can set up to 10 services, 10 routes, and 2 GB of RAM, while a quota plan for an org might allow up to 100 services, 100 routes,and 10 GB of RAM.

When setting autoscaling limits for Autoscaler, you must ensure that the resource quotas allocated to the app are sufficient for Autoscaler to scale the app within those limits. If the resource quotas that are available for the app to use are lower than the resources required to meet the upper scaling limit you set, Autoscaler cannot scale up to that upper scaling limit.

You can create quota plans for spaces and orgs. If you do not have permissions to create quota plans or you need an existing resource quota limit increased, contact the operator for your TAS for VMs deployment.

When defining resource quotas for a space or org, you must consider not only the scaling needs of an individual app, but the potential scaling needs of other apps in the same space or org.

For more information about creating and editing quota plans for spaces and orgs, see Creating and modifying quota plans.

Reviewing space and org quota plans

For Autoscaler to scale the number of app instances up to the upper scaling limit you configure, the quota plans for both the space and org must have enough of the following resources available for the app to use:

App Instances: The number of instances alloted per app.
Total Memory: The total amount of memory alloted for all app instances in the space or org.
Instance Memory: The amount of memory alloted per app instance.

If your app is configured with a memory limit of 1 G per app instance, and the upper scaling limit you configured is 20 app instances, 20 G of memory must be available in your space or org quota plan. Otherwise, Autoscaler cannot scale the app up to the upper scaling limit.

When an app reaches a resource quota limit for the space or org in which it is deployed, Autoscaler does not record an autoscaling event. You can review the quota plans for the space or org to find out if an app has reached a resource quota limits.

For more information about reviewing space and org quota plans, see Space or org quota is reached in Troubleshooting App Autoscaler.

Setting per-app limits

You can adjust the resource limits for individual apps. If an app is allocated significantly more RAM than it requires per app instance, Autoscaler can create more app instances using the same number of resources.

Important In a TAS for VMs deployment, the amount of memory allocated to an app instance determines the number of CPU shares available to it. Reducing the amount of memory allocated to an app instance also reduces its CPU shares.

To set resource limits for individual apps, you can deploy your apps with app manifests or vertically scale them using the Cloud Foundry Command-Line Interface (cf CLI). For more information about app manifests, see App manifest attribute reference. For more information about vertically scaling an app, see Scaling vertically in Scaling an App Using cf scale.

Scaling metrics

You can define rules for Autoscaler to scale your app instances up or down using a particular scaling metric.

Initially, VMware recommends that you define a single autoscaling rule using a scaling metric such as HTTP Request Latency or RabbitMQ Queue Depth. At a later time, it might be appropriate to define more than one autoscaling rule for the same app.

These are two of the default scaling metrics you can select when you define an autoscaling rule for an app:

HTTP Request Latency: Use this scaling metric if your app receives HTTP requests directly from the Gorouter, and you want to scale the number of app instances up as the average latency of the requests increases. Before choosing this metric as your scaling metric, consider whether scaling your app might increase HTTP Request Latency. If additional app instances must use the same external resource, such as a MySQL service instance or another app, this might increase HTTP Request Latency. If Autoscaler scales the number of app instances up, and HTTP Request Latency is still over the maximum metric threshold you configured because all app instances rely on the same external resource, Autoscaler continues to create more app instances to no effect.
RabbitMQ Queue Depth: Use this scaling metric if your app reads messages from a RabbitMQ queue, and you want to scale the number of app instances up to read messages from the RabbitMQ queue more quickly.

You can create custom autoscaling rules using other metrics.

If your app deployment is comprised of multiple TAS for VMs apps, consider using the same scaling metric for each app. Using the same scaling metric across all of the apps that are part of your app deployment can help you to understand the scaling behavior and needs of your app deployment.

Note If the autoscaling metric thresholds you configured are too low, scaling the number of app instances up might not be enough to bring the scaling metric back below the maximum metric threshold.

To define autoscaling rules for Autoscaler, see:

Add or delete scaling rules in About App Autoscaler
Create an autoscaling rule in Using the App Autoscaler CLI.

For more information about the scaling metrics you can use when defining autoscaling rules for Autoscaler, see:

Default metrics for scaling rules
Custom App Metrics for scaling rules in About App Autoscaler.

Autoscaler and C2C networking

Autoscaler can only use HTTP Request Latency and HTTP Request Throughput as a scaling metric for apps that receive requests directly through the Gorouter. Autoscaler does not support using these two metrics for apps that receive requests from other apps through container-to-container (C2C) networking or TCP routers.

If your app relies on back-end HTTP services that apps in your TAS for VMs deployment must access through C2C networking, the Gorouter does not generate HTTP events for those requests. As a result, Autoscaler cannot scale those HTTP services. For Autoscaler to scale them, you must either use a different default scaling metric or create a custom scaling metric for them.

Scaling factors

You can define scale-up and scale-down factors to control how quickly Autoscaler scales app instances in each direction. For example, when you configure an app with a scale-up factor of 20 and a scale-down factor of 10, Autoscaler scales the number of app instances up by 20 instances at a time and down by 10 instances at a time.

When you define a larger scale-up factor for an app, Autoscaler scales the number of app instances up more quickly in response to the scaling metric you defined reaching its maximum metric threshold. Because of this, using a particular scaling metric for the app might cause or exacerbate performance issues for the app in some circumstances. If networking issues or an unresponsive downstream service causes greater HTTP Request Latency, configuring Autoscaler to scale app instances up by a larger number might cause HTTP Request Latency to worsen even more.

In many production deployments, the scale-down factor for an app is lower than the scale-up factor. This helps an app to scale up, then return to an earlier baseline number of instances more slowly by stepping down in smaller increments. You might want to configure a lower scale-down factor if you do not want Autoscaler to scale the number of app instances down too quickly, and you are less concerned about freeing up resources.

Configure scaling factors

To configure scaling factors for an app:

In a terminal window, run:
```
cf update-autoscaling-factors APP-NAME SCALE-UP-FACTOR SCALE-DOWN-FACTOR
```
Where:
- APP-NAME is the name of the app for which you want to configure scaling factors.
- SCALE-UP-FACTOR is the number of app instances by which you want Autoscaler to scale up at a time.
- SCALE-DOWN-FACTOR is the number of app instances by which you want Autoscaler to scale down at a time.
To configure Autoscaler to scale example-app up by 20 instances at a time and down by 10 instances at a time, run:
```
cf update-autoscaling-factors example-app 20 10
```

Load-testing your app

To ensure that you have configured effective autoscaling for an app, VMware recommends that you load-test the app. Load testing demonstrates that Autoscaler scales the number of app instances as expected when the scaling metric you define for the app increases or decreases.

To load test an app effectively, you must do so in an environment that is as similar to your production deployment as possible.

Before you begin load-testing an app, you can plan your load test with the operator or team that operates your TAS for VMs deployment. This ensures that they are accepting of and can plan for the load you intend to generate. Additionally, VMware recommends working with the operator or team that operates your TAS for VMs deployment during your load test, because load testing can sometimes reveal performance issues within your TAS for VMs deployment.

The next sections provide more information about load testing to confirm the efficacy of Autoscaler for an app.

Generating load and monitoring scaling events

There are a variety of tools you can use for load-testing. The method that you use for generating load varies based on both the load-testing tool you use and the scaling metric that you define for the app, because you must generate load that causes the scaling metric to fluctuate. For example, you might generate HTTP requests for apps that use HTTP Request Latency as their scaling metric, or you might generate RabbitMQ messages for apps that use RabbitMQ queue depth as their scaling metric.

VMware recommends incrementally increasing the load you generate for an app. With each increase, you can see whether Autoscaler is scaling the number of app instances as expected by monitoring the autoscaling events that Autoscaler records.

To monitor the autoscaling events that Autoscaler records as it scales an app:

In a terminal window, run:
```
cf autoscaling-events
```

Confirming metric values from Autoscaler

For some scaling metrics, you can verify whether Autoscaler is using the correct metric values by comparing the load that your load-testing tool generates to the metric values that Autoscaler records. If these two values are not the same, such as if Autoscaler reports a lower number of requests per second than your load-testing tool generated, this might indicate that you must further scale TAS for VMs components on which Autoscaler relies.

For more information about TAS for VMs components you can scale to better support Autoscaler, see Operating Autoscaler.

Confirming scaling efficacy

To test each autoscaling step, verify that:

Autoscaler records when a scaling metric has crossed the maximum metric threshold
Verify that Autoscaler scaled the number of app instances up
Creating more app instances brings the scaling metric back to the maximum metric threshold

If Autoscaler continues to scale app instances up, but the increased number of app instances does not bring the scaling metric back to below the maximum metric threshold, another part of your TAS for VMs deployment might be experiencing issues that are causing the value of the scaling metric to remain above the maximum metric threshold. You can identify and troubleshoot the issue by monitoring the app or working with the operator or team that operates your TAS for VMs deployment.