Find answers to some frequently asked questions about Tanzu Service Mesh Autoscaler.

Does the Tanzu Service Mesh Service Autoscaler use Kubernetes Horizontal Pod Autoscaler (HPA) to help with autoscaling?

No. The Tanzu Service Mesh Service Autoscaler uses its own algorithm, which collects data from the Tanzu Service Mesh data model and acts on the workload controllers. The algorithm is based on experience that our site reliability engineers (SREs) have developed over the years, and it offers many features, for example, multiple modes (efficiency or performance), scaling grace periods, and min/max guardrails. Additionally, with Tanzu Service Mesh, you can set up SLOs to track scaling behavior, and various graphical charts in the Tanzu Service Mesh user interface help with the performance troubleshooting activities an SRE may have.

What is the general workflow for how Tanzu Service Mesh Service Autoscaler does the autoscaling?

See the workflow diagram in Appendix B.

What metric should I use to determine what should be scaled up or scaled down?

Use CPU or memory utilization. These are good signals because they directly measure finite resources and their saturation. Other measures, such as latency and requests per second, are side effects that become more apparent as resources are further consumed.
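For reference, a minimal sketch of an autoscaling definition (ASD) that triggers on CPU usage follows. Only spec.scaleRule.enabled, spec.scaleRule.instances.min/max, the scaleUp/scaleDown thresholds, and the modes are named in this FAQ; the apiVersion, kind, metric name, and the scaleTargetRef and trigger layout are assumptions, so verify them against the CRD reference for your Tanzu Service Mesh version.

```yaml
# Sketch of an ASD that scales a Deployment on CPU usage.
# Field names not mentioned in this FAQ are assumptions; check your CRD reference.
apiVersion: autoscaling.tsm.tanzu.vmware.com/v1alpha1   # assumed API group/version
kind: Definition
metadata:
  name: checkout-asd          # hypothetical name
  namespace: checkout         # hypothetical namespace
spec:
  scaleTargetRef:             # assumed layout: the workload controller to scale
    kubernetes:
      Deployment:
        name: checkout
  scaleRule:
    enabled: true
    mode: EFFICIENCY          # scales up and down (PERFORMANCE scales up only)
    instances:
      min: 2                  # guardrail: never fewer than 2 replicas
      max: 10                 # guardrail: never more than 10 replicas
    trigger:
      metric:
        name: CPUUsageMillicores   # assumed metric name
        scaleUp: 600               # scale up when average CPU exceeds 600 millicores
        scaleDown: 300             # scale down when average CPU falls below 300 millicores
```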

Can I change the percent of instances affected during scaling up or down?

Currently, there is no way to adjust the proportion at which scaling occurs. The scaling proportion is determined by how many of the resources are being consumed compared to the scaleUp or scaleDown targets. You can influence the scaling algorithm by adjusting the scaleUp or scaleDown values.
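As a purely illustrative calculation (the autoscaler's exact internal formula is not documented here): if 4 replicas each average 900 CPU millicores against a scaleUp target of 600 millicores, a proportional algorithm would aim for roughly 4 × 900 / 600 = 6 replicas, still subject to the instances.min and instances.max guardrails.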

Can I control the number of instances that are added when a scale-up occurs?

By default, the number of instances is scaled proportionately based on the current metrics for the service. If you prefer that a specific number of instances be added during a scale-up, set the stepsUp property in the autoscaling definition (see the sketch after the next answer). Any time scaling up is required, the autoscaler increases the number of replicas by the stepsUp amount (or fewer if instances.max has been reached). If stepsUp is set, you must also set stepsDown.

Can I control the number of instances that are removed when a scale-down occurs?

By default, the number of instances is scaled proportionately based on the current metrics for the service. If you prefer that a specific number of instances be removed during a scale-down, set the stepsDown property in the autoscaling definition. Any time scaling down is required, the autoscaler decreases the number of replicas by the stepsDown amount (or fewer if instances.min has been reached). If stepsDown is set, you must also set stepsUp.
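The following fragment sketches the fixed-step behavior described in the two answers above. Only stepsUp, stepsDown, and instances.min/max come from this FAQ; the surrounding layout is assumed.

```yaml
# Fragment of an ASD scaleRule that uses fixed scaling steps.
scaleRule:
  instances:
    min: 2
    max: 12
    stepsUp: 2     # each scale-up adds 2 replicas (or fewer if max is reached)
    stepsDown: 1   # each scale-down removes 1 replica (or fewer if min is reached)
```

With this fragment, a workload running 6 replicas would move to 8 on a scale-up and to 5 on a scale-down, rather than to a proportionally computed count.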

How do I get scaling with latencies or requests per time to work?

Although basing scaling on CPU or memory usage is recommended, you can make latency-based or request-based scaling work. Because these metrics can change drastically from moment to moment, and an accurate scaling proportion is difficult to calculate from them, using stepsUp and stepsDown might work better. Also, make sure that the scaleUp and scaleDown threshold values are far enough apart to clearly distinguish the conditions that require scaling up from those that require scaling down.
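A hedged sketch of a rule that follows this advice is shown below. The metric name p99Latency and its millisecond units are hypothetical placeholders, not a documented metric; check your CRD reference for the metrics your version supports.

```yaml
# Fragment: latency-based scaling with fixed steps and widely separated thresholds.
scaleRule:
  instances:
    min: 2
    max: 10
    stepsUp: 2         # fixed steps avoid proportions computed from a noisy metric
    stepsDown: 1
  trigger:
    metric:
      name: p99Latency   # hypothetical metric name
      scaleUp: 500       # scale up above 500 ms
      scaleDown: 100     # scale down below 100 ms (kept far from scaleUp)
```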

Why won’t the number of pods scale any more (or less) than X number of instances?

Two required settings in the autoscaling definition (ASD) are the service instance min and max properties. The number of service instances (pods) cannot scale above the max or below the min. If scaling appears stuck, check whether the values set for spec.scaleRule.instances.min and spec.scaleRule.instances.max are limiting it.
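To verify which guardrails are in effect, you can read them back from the ASD, for example as follows (my-asd and my-namespace are hypothetical placeholders; the asd resource name and the spec.scaleRule.instances path come from this FAQ):

```sh
kubectl get asd my-asd -n my-namespace \
  -o jsonpath='{.spec.scaleRule.instances}'
```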

What is the difference between PERFORMANCE and EFFICIENCY modes?

Performance mode scales up when more resources (set by the scaleUp threshold for a specific metric) are needed, but it does not scale down if there is an excess of resources. Efficiency mode does what performance mode does and also scales down when there is an excess of resources (determined by the scaleDown threshold for the metric).
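In configuration terms, the difference comes down to a single field in the scale rule (fragment; surrounding layout assumed):

```yaml
scaleRule:
  # PERFORMANCE: acts only when the scaleUp threshold is crossed.
  # EFFICIENCY: also scales down when usage falls below the scaleDown threshold.
  mode: EFFICIENCY
```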

I don’t want scaling to happen, but I want to see what it would do if it was enabled. How do I do a dry run or trial run so that I can see what scaling would do but not actually do the scaling?

There is a field in the Autoscaling Configuration, spec.scaleRule.enabled, that you can set to false. After you set this field to false, you can display the result of the scaling calculations with kubectl get asd -n {namespace} without the scaling being applied.

To see the result of autoscaling without activating an autoscaling policy, you can also enable simulation mode in the Tanzu Service Mesh Console UI. For more information, see Use Case 6: Tanzu Service Mesh Autoscaler Simulation Mode.
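On the CLI side, a minimal sketch of that flow follows, assuming an ASD named my-asd in namespace my-namespace (hypothetical names; the asd resource name and the spec.scaleRule.enabled path come from this FAQ):

```sh
# Turn off actual scaling while keeping the scaling calculations visible.
kubectl patch asd my-asd -n my-namespace --type merge \
  -p '{"spec":{"scaleRule":{"enabled":false}}}'

# Display what the autoscaler would do, without it being applied.
kubectl get asd -n my-namespace
```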

Can we trigger scaling by a combination of metrics, for example, latency and memory usage?

Currently, that feature is not available.

Can we trigger scaling on custom metrics, for example, queue depth or database connections?

Currently, that feature is not available.

What do we do if we don’t know the service instance number max?

You can use Tanzu Service Mesh Service Autoscaler in a test environment to experiment with different max values and see how the system responds; integrating it with automated performance testing is a good way to get started. Determining the service instance max requires knowing the capacity of the cluster the application runs in, what other applications run on the cluster, the application's dependencies both inside and outside the cluster, and the traffic or usage profile expected of the application.

Benchmark the apps on the cluster: apply load at various request rates and note the amount of CPU and memory used at each load level. Repeat with the other apps on the cluster and verify that the system can still function smoothly if every app reaches its maximum number of service instances. Keep in mind that dependencies, both internal and external, can be bottlenecks, and a high max for a service whose downstream dependency is the bottleneck only leads to wasted service instances. Ensure that the service instances can meet expected traffic or demand during peak usage periods. Consider leaving buffer room for more service instances, but be aware that doing so can prevent other services from being provisioned. Coordinate with your dependencies when determining the max number of service instances.

What is a good minimum for the number of service instances?

The CRD enforces a minimum of 1. A setting of 0 would mean the service is turned off. To turn off a service, turn off the autoscaler first (set spec.scaleRule.enabled to false in the Definition custom resource), and then manually scale the workload to 0 replicas with the kubectl scale command. If service availability is a concern, at least 2 service instances are recommended.
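A sketch of that sequence, assuming a Deployment named my-service and an ASD named my-asd (hypothetical names):

```sh
# 1. Disable the autoscaler so it does not fight the manual change.
kubectl patch asd my-asd -n my-namespace --type merge \
  -p '{"spec":{"scaleRule":{"enabled":false}}}'

# 2. Manually scale the workload to zero replicas.
kubectl scale deployment my-service -n my-namespace --replicas=0
```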

What is the point of the grace period?

The grace period applies only before a scale-down event, which happens when Tanzu Service Mesh Service Autoscaler is in efficiency mode. The grace period is optional (you can deactivate it by setting it to 0), but it is recommended. In general, enable it to ensure smooth operation by giving previous scaling actions time to take effect before the next scale-down. If nothing is set, the default of 300 seconds is used.
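In configuration terms, this is a single trigger setting. The gracePeriodSeconds field name and its placement under trigger are assumptions based on a typical ASD, so verify them against your CRD reference; only the default of 300 seconds and the 0-to-deactivate behavior come from this FAQ.

```yaml
scaleRule:
  trigger:
    gracePeriodSeconds: 300   # assumed field name; wait 5 minutes before a scale-down
                              # (0 deactivates the grace period; 300 is the default)
```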

What will happen if I change the number of replicas manually while Tanzu Service Mesh Service Autoscaler is running?

Bypassing Tanzu Service Mesh Service Autoscaler to set service instance counts can lead to incorrect behavior, for example, wrong status information, incorrect counts, and unexpected scaling actions. After a manual scaling action, the autoscaler's attempts to enforce its limits and its mode will likely have unintended consequences. Do not manually set service instance replica counts while Tanzu Service Mesh Service Autoscaler is enabled. If an operator needs to make temporary changes or set the replica count manually, first set spec.scaleRule.enabled to false in the ASD custom resource.

Do you need additional plugins, such as the Kubernetes metrics server, for Tanzu Service Mesh Service Autoscaler to get metrics?

No other metrics plugins or code changes are needed to enable Tanzu Service Mesh Service Autoscaler. The out-of-the-box Tanzu Service Mesh Service Autoscaler in Tanzu Service Mesh can see the status and resource usage of each service instance, such as requests per time period, memory usage, CPU usage, and latencies.