How do you use App Autoscaler to scale your apps in a production environment? This article tells you what you need to know for your TAS for VMs deployment.
When using Autoscaler to scale an app in a production environment, you should consider your goals for the app.
For example, you may want your app to be capable of processing a certain number of business transactions per second. Each of those transactions may consist of multiple HTTP requests to the app itself, as well as to services and other apps in your TAS for VMs deployment. From this number of HTTP requests, you can determine how many instances of each app are necessary for your app to process your desired number of business transactions per second. You can then use this information to create effective autoscaling rules with Autoscaler.
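As a hypothetical worked example, suppose your goal is 100 business transactions per second, and each transaction results in an average of 5 HTTP requests to a given app. That app must then handle 500 HTTP requests per second. If load-testing shows that one instance of the app can comfortably serve 50 requests per second, you need at least 10 instances, plus some headroom for traffic spikes, to meet your goal. Substitute measurements from your own deployment for these example numbers.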
For information about how to create effective autoscaling rules with Autoscaler, see the sections below:
The autoscaling limits for an app are the upper and lower limits for the number of instances Autoscaler can create for that app. The lower scaling limit is the minimum number of instances to which Autoscaler can scale an app down, and the upper scaling limit is the maximum number of instances to which Autoscaler can scale an app up. Setting appropriate upper and lower scaling limits ensures that Autoscaler does not create too few or too many app instances.
Apps often have dependencies on an external app or service that can constrain Autoscaler’s ability to scale app instances effectively. Because of this, scaling instances up past a certain number may not address problems you encounter until you address issues within those dependencies. For example, if all of your app instances use a single shared MySQL service instance, scaling the number of instances up might cause them to reach the maximum number of connections allowed by MySQL. You must address this constraint before you can scale your app further.
VMware recommends load-testing your app to confirm that you have set appropriate scaling limits before pushing the app to a production environment. For more information, see Load-Testing Your App below.
To configure the upper and lower scaling limits for an app:
In a terminal window, run:
cf update-autoscaling-limits APP-NAME LOWER-SCALING-LIMIT UPPER-SCALING-LIMIT
Where:

APP-NAME is the name of the app for which you want to configure autoscaling limits.

LOWER-SCALING-LIMIT is the minimum number of instances you want Autoscaler to create for the app.

UPPER-SCALING-LIMIT is the maximum number of instances you want Autoscaler to create for the app.

For example, running the following command sets the lower scaling limit for example-app to 20 and the upper scaling limit to 100:
cf update-autoscaling-limits example-app 20 100
Each space and org in a TAS for VMs deployment has a quota plan that defines memory, service, and instance usage quotas for all apps and services deployed in them. For example, a quota plan for a space might allow up to 10 services, 10 routes, and 2 GB of RAM, while a quota plan for an org might allow up to 100 services, 100 routes, and 10 GB of RAM.
When setting autoscaling limits for Autoscaler, you must ensure that the resource quotas allocated to the app are sufficient to allow Autoscaler to scale the app within those limits. If the resource quotas that are available for the app to use are lower than the resources required to meet the upper scaling limit you set, Autoscaler cannot scale up to that upper scaling limit.
You can create quota plans for spaces and orgs. If you do not have permissions to create quota plans or you need an existing resource quota limit increased, contact the operator for your TAS for VMs deployment.
When defining resource quotas for a space or org, you must consider not only the scaling needs of an individual app, but the potential scaling needs of other apps in the same space or org.
For more information about creating and modifying quota plans for spaces and orgs, see Creating and Modifying Quota Plans.
In order for Autoscaler to scale the number of app instances up to the upper scaling limit you configure, the quota plans for both the space and org must have enough of the following resources available for the app to use:
App Instances: The number of instances allowed per app.
Total Memory: The total amount of memory allowed for all app instances in the space or org.
Instance Memory: The amount of memory allowed per app instance.
For example, if your app is configured with a memory limit of 1 GB per app instance, and the upper scaling limit you configured is 20 app instances, 20 GB of memory must be available in your space or org quota plan. Otherwise, Autoscaler cannot scale the app up to the upper scaling limit.
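As an illustration, the following hedged example uses the cf CLI to create a space quota plan sized for that scenario and apply it to a space. The quota plan and space names are hypothetical; verify the flags with cf create-space-quota --help for your cf CLI version:

cf create-space-quota autoscaled-space-quota -a 20 -i 1G -m 20G
cf set-space-quota example-space autoscaled-space-quota

In this example, -a sets the maximum number of app instances, -i sets the maximum memory per app instance, and -m sets the total memory for all app instances in the space.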
When an app reaches a resource quota limit for the space or org in which it is deployed, Autoscaler does not record an autoscaling event. You can review the quota plans for the space or org to determine whether an app has reached a resource quota limit.
For more information about reviewing space and org quota plans, see Space or Org Quota is Reached in Troubleshooting App Autoscaler.
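For example, assuming cf CLI v7 or later, you can list the space quota plans in your org and review the limits of a specific plan by running:

cf space-quotas
cf space-quota QUOTA-NAME

Where QUOTA-NAME is the name of the quota plan you want to review.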
You can adjust the resource limits for individual apps. For example, if an app is allocated significantly more RAM than it needs, you can reduce the maximum amount of RAM allowed per app instance. This allows Autoscaler to create more app instances using the same number of resources.
Note In a TAS for VMs deployment, the amount of memory allocated to an app instance determines the number of CPU shares available to it. Reducing the amount of memory allocated to an app instance also reduces its CPU shares.
To set resource limits for individual apps, you can deploy your apps with app manifests or vertically scale them using the Cloud Foundry Command-Line Interface (cf CLI). For more information about app manifests, see App Manifest Attribute Reference. For more information about vertically scaling an app, see Scaling Vertically in Scaling an App Using cf scale.
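For example, the following command, shown here as a hedged sketch for a hypothetical app named example-app, uses the cf CLI to reduce the memory limit for each app instance to 512 MB. The -f flag skips the confirmation prompt, and changing the memory limit restarts the app:

cf scale example-app -m 512M -f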
You can define rules for Autoscaler to scale your app instances up or down using a particular scaling metric.
Initially, VMware recommends that you define a single autoscaling rule using a scaling metric such as HTTP Latency or RabbitMQ Queue Depth. At a later time, it may be appropriate to define more than one autoscaling rule for the same app.
The following list describes two of the default scaling metrics you can select when you define an autoscaling rule for an app:
HTTP Latency: Use this scaling metric if your app receives HTTP requests directly from the Gorouter, and you want to scale the number of app instances up as the average latency of the requests increases. Before choosing this metric as your scaling metric, consider whether scaling your app might increase HTTP latency. For example, if additional app instances must use the same external resource, such as a MySQL service instance or another app, this may increase HTTP latency. If Autoscaler scales the number of app instances up, and HTTP latency is still above the maximum metric threshold you configured because all app instances rely on the same external resource, Autoscaler continues to create more app instances to no effect.
RabbitMQ Queue Depth: Use this scaling metric if your app reads messages from a RabbitMQ queue, and you want to scale the number of app instances up to read messages from the RabbitMQ queue more quickly.
You can also create custom autoscaling rules using other metrics.
If your app deployment consists of multiple TAS for VMs apps, consider using the same scaling metric for each app. Using the same scaling metric across all of the apps in your deployment can make it easier to understand its scaling behavior and needs.
Note If the autoscaling metric thresholds you configured are too low, scaling the number of app instances up may not be enough to bring the scaling metric back below the maximum metric threshold.
To define autoscaling rules for Autoscaler, see Add or Delete Scaling Rules in About App Autoscaler and Create an Autoscaling Rule in Using the App Autoscaler CLI. For more information about the scaling metrics you can use when defining autoscaling rules for Autoscaler, see Default Metrics for Scaling Rules and Custom Metrics for Scaling Rules in About App Autoscaler.
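As a hedged illustration, the following commands use the App Autoscaler CLI plugin to create one rule for each of the scaling metrics described above for a hypothetical app named example-app. The threshold values and queue name are placeholders; confirm the exact syntax with cf create-autoscaling-rule --help:

cf create-autoscaling-rule example-app http_latency 50 100 --subtype avg_99th
cf create-autoscaling-rule example-app rabbitmq 10 50 --queue example-queue

In the first rule, the thresholds are average request latencies in milliseconds; in the second, they are the number of messages in the named queue.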
Autoscaler can use HTTP Latency and HTTP Throughput as scaling metrics only for apps that receive requests directly through the Gorouter. Autoscaler does not currently support using these two metrics for apps that receive requests from other apps through container-to-container (C2C) networking or TCP routers.
If your app relies on back-end HTTP services that apps in your TAS for VMs deployment must access through C2C networking, the Gorouter does not generate HTTP events for those requests. As a result, Autoscaler cannot scale those HTTP services. In order for Autoscaler to scale them, you must either use a different default scaling metric or create a custom scaling metric for them.
You can define scale-up and scale-down factors to control how quickly Autoscaler scales app instances in each direction. For example, when you configure an app with a scale-up factor of 20 and a scale-down factor of 10, Autoscaler scales the number of app instances up by 20 instances at a time and down by 10 instances at a time.
When you define a larger scale-up factor for an app, Autoscaler scales the number of app instances up more quickly in response to the scaling metric you defined reaching its maximum metric threshold. Because of this, using a particular scaling metric for the app may cause or exacerbate performance issues for the app in some circumstances. For example, if networking issues or an unresponsive downstream service causes greater HTTP latency, configuring Autoscaler to scale app instances up by a larger number may cause HTTP latency to worsen even more.
In many production deployments, the scale-down factor for an app is lower than the scale-up factor. This allows an app to scale up faster, then return to an earlier baseline number of instances more slowly by stepping down in smaller increments. You may want to configure a lower scale-down factor if you do not want Autoscaler to scale the number of app instances down too quickly, and you are less concerned about freeing up resources.
To configure scaling factors for an app:
In a terminal window, run:
cf update-autoscaling-factors APP-NAME SCALE-UP-FACTOR SCALE-DOWN-FACTOR
Where:

APP-NAME is the name of the app for which you want to configure scaling factors.

SCALE-UP-FACTOR is the number of app instances by which you want Autoscaler to scale up at a time.

SCALE-DOWN-FACTOR is the number of app instances by which you want Autoscaler to scale down at a time.

For example, running the following command configures Autoscaler to scale example-app up by 20 instances at a time and down by 10 instances at a time:
cf update-autoscaling-factors example-app 20 10
To ensure that you have configured effective autoscaling for an app, VMware recommends that you load-test the app. Load-testing demonstrates that Autoscaler scales the number of app instances as expected when the scaling metric you define for the app increases or decreases.
To load-test an app effectively, you should do so in an environment that is as similar to your production deployment as possible.
Before you begin load-testing an app, plan your load test with the operator or team that runs your TAS for VMs deployment. This ensures that they approve of and can plan for the load you intend to generate. Additionally, VMware recommends working with them during the load test, because load-testing can sometimes reveal performance issues within your TAS for VMs deployment.
For more information about load-testing to confirm the efficacy of Autoscaler for an app, see the sections below:
There are a variety of tools you can use for load-testing. The method that you use for generating load varies based on both the load-testing tool you use and the scaling metric that you define for the app, because you must generate load that causes the scaling metric to fluctuate. For example, you might generate HTTP requests for apps that use HTTP latency as their scaling metric, or you might generate RabbitMQ messages for apps that use RabbitMQ queue depth as their scaling metric.
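For example, if your app scales on HTTP Latency or HTTP Throughput, one hypothetical approach is to drive traffic with an open-source HTTP load-generation tool such as hey. The URL, request rate, concurrency, and duration below are placeholders:

hey -z 10m -c 50 -q 10 https://example-app.example.com/

This command sends requests from 50 concurrent workers, each limited to 10 requests per second, for 10 minutes.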
VMware recommends incrementally increasing the load you generate for an app. With each increase, you can see whether Autoscaler is scaling the number of app instances as expected by monitoring the autoscaling events that Autoscaler records.
To monitor the autoscaling events that Autoscaler records as it scales an app:
In a terminal window, run:
cf autoscaling-events APP-NAME

Where APP-NAME is the name of the app whose autoscaling events you want to monitor.
For some scaling metrics, you can determine whether Autoscaler is using the correct metric values by comparing the load that your load-testing tool generates to the metric values that Autoscaler records. If these two values are not the same, such as if Autoscaler reports a lower number of requests per second than your load-testing tool generated, this may indicate that you need to further scale TAS for VMs components on which Autoscaler relies.
For more information about TAS for VMs components you can scale to better support Autoscaler, see Operating Autoscaler.
For each autoscaling step that you test, you should confirm the following, which you can verify using the commands shown after this list:
Autoscaler records when a scaling metric has crossed the maximum metric threshold
Autoscaler successfully scales the number of app instances up
Creating more app instances brings the scaling metric back to below the maximum metric threshold
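One hedged way to check these conditions during a load test, assuming a hypothetical app named example-app, is to compare the app's current instance count against the scaling actions and metric values that Autoscaler reports:

cf app example-app
cf autoscaling-events example-app

The first command shows the current number of running app instances; the second shows recent autoscaling events, including threshold crossings and scaling actions.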
If Autoscaler continues to scale app instances up, but the increased number of app instances does not bring the scaling metric back to below the maximum metric threshold, another part of your TAS for VMs deployment may be experiencing issues that are causing the value of the scaling metric to remain above the maximum metric threshold. You might be able to identify and troubleshoot the issue by monitoring the app or working with the operator or team that operates your TAS for VMs deployment.