The two key benefits of rightsizing are infrastructure optimization and cost reduction. During a rightsizing analysis of their infrastructure, organizations discover assets that can be downsized or terminated to save money or upgraded to improve performance.
An asset is underutilized and a strong candidate for downsizing if it exhibits low utilization (less than 20%, for example) for core performance metrics. In this case, the best practice is to downgrade the asset to a smaller footprint.
Your cloud infrastructure will likely contain assets that are running but not being used. These assets are called zombies, and they are good candidates for termination. Zombies result when someone forgets to turn the assets off after use or when the asset fails due to script errors. Regardless of the cause, cloud providers continue to charge for these unused assets because they are in a running state. You can reduce costs by proactively identifying and terminating these assets.
By downgrading underutilized assets and terminating unused ones, you can optimize for performance as well as reduce costs. Upgrading assets, on the other hand, results in increased spend. However, by upgrading you can ensure that your assets are able to meet surges in demand.
The platform retrieves the following metrics for Data Center machines to determine how system resources are being utilized.
Metrics are retrieved via the Tanzu CloudHealth Agent for non-vSphere accounts and via the VMware Aggregator for vSphere accounts.
The threshold percentages you specify in an Instance Rightsizing Policy are only used to score each resource. Tanzu CloudHealth rightsizing recommendations do not consider your threshold percentages; instead, they consider the resource’s metrics.
When you configure the Instance Rightsizing Policy, you must specify for each metric a threshold measure of Maximum, Minimum, or Average utilization for both the Severely underutilized when and Moderately underutilized when sections.
The Tanzu CloudHealth Rightsizing report uses the threshold measure specified in the Severely Underutilized when section for each metric to determine whether to use the metric’s Maximum, Minimum, or Average data when calculating a rightsizing recommendation.
By default, the Instance Rightsizing Policy uses Average metrics for the Severely underutilized when threshold for CPU, Memory, and Disk.
The Machine Rightsizing report collects metrics for the current month. Consequently, the report cannot make recommendations for the first two days of each calendar month due to insufficient metrics data.
Tanzu CloudHealth calculates the ideal configuration needed to fulfill the CPU, Memory, and Disk requirements of the source machine. A custom candidate is then built for the calculated configuration.
Machine metrics scale as follows:
CPU and Memory follow scale, except when the source machine’s current metric already meets the machine’s needs. For example, if a source machine currently has 6 CPU cores and requires 5 CPU cores, then the custom candidate has 6 CPU cores. However, if a source machine currently has 9 CPU cores and requires 5 cores, then the custom candidate follows scale and has 8 CPU cores.
Disk follows scale, regardless of the source machine’s current metric. If a source machine currently has 1.6 GB of Disk storage and requires 1.4 GB, the custom candidate follows scale and has 2 GB of Disk storage.
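The scaling rules above can be sketched in Python. This section does not spell out the scale itself, so the sketch assumes a powers-of-two scale (1, 2, 4, 8, ...). That assumption is consistent with the examples above but is not confirmed by the documentation. The function names are hypothetical.

```python
def next_on_scale(required):
    """Return the smallest power of two >= required.

    ASSUMPTION: the scale is 1, 2, 4, 8, ... (not stated in the source).
    """
    if required <= 0:
        raise ValueError("required must be positive")
    size = 1
    while size < required:
        size *= 2
    return size


def cpu_or_memory_candidate(current, required):
    """CPU/Memory: follow scale unless the current size already fits.

    'Already fits' is interpreted here as: the current size covers the
    requirement and is no larger than the scaled size.
    """
    scaled = next_on_scale(required)
    if required <= current <= scaled:
        return current
    return scaled


def disk_candidate(required_gb):
    """Disk: always follow scale, regardless of the current size."""
    return next_on_scale(required_gb)


print(cpu_or_memory_candidate(6, 5))  # 6 cores: current size already fits
print(cpu_or_memory_candidate(9, 5))  # 8 cores: follows scale
print(disk_candidate(1.4))            # 2 GB: follows scale
```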
The recommendation is blank if there is less than two days’ worth of metric data available.
The recommendation is No data if the source machine is not running or not active.
The recommendation result is No recommended change if the calculated ideal configuration matches the existing configuration of the source machine.
The rightsized recommendation is the ideal machine configuration that most closely matches the source machine’s metrics requirements.
For example, consider that the source machine has this configuration: 2 Cores, 16 GB Memory, 50 GB Disk.
If only CPU of this machine is underutilized, Tanzu CloudHealth recommends that you downgrade to a machine with 1 Core, 16 GB Memory, and 50 GB Disk.
If both CPU and Memory are underutilized, Tanzu CloudHealth recommends that you downgrade to a machine with 1 Core, and depending on memory usage, to 8 GB Memory.
The recommendation result is Terminate instance if all these conditions are true for the candidate machine:
The Machine Rightsizing report provides the following insights:
To understand machine performance and utilization at both granular and macro levels, Tanzu CloudHealth ingests data on CPU, Memory, and Disk. These metrics are gathered through the Tanzu CloudHealth Agent and VMware Aggregator.
Tanzu CloudHealth gathers CPU, Memory, and Disk metrics for each machine in your infrastructure. Each metric is then assigned a numeric value called a Score. The individual metric scores are used to compute a Total Score for each instance.
While the individual metric scores indicate the utilization of each metric, the Total Score represents how well each machine is being utilized.
Tanzu CloudHealth bases metric scores and total scores on utilization thresholds that you specify in the Instance Rightsizing Policy.
Use the Severely underutilized when and Moderately underutilized when categories to specify the thresholds that reflect your internal business standards for a metric. When the utilization for a metric lies within a specific range, a numeric score is assigned to the metric.
There are three threshold categories:
Category | Score Range for Metric |
---|---|
Severely underutilized | 0 to 33 |
Moderately underutilized | 34 to 67 |
Well utilized | 68 to 100 |
The thresholds you specify are only used to score each resource. Tanzu CloudHealth rightsizing recommendations, however, do not consider your thresholds; instead, they consider the source machine’s metrics.
Let’s consider that you define the threshold for Severely underutilized as <40%. If the average usage for a metric is measured as 20%, the usage is halfway through the threshold range (0%–40%).
Consequently, the corresponding score for the usage is halfway through the score range for the Severely underutilized category and is calculated as 50% of 33, which after rounding is 17.
In addition to specifying the thresholds for each category, you can also assign a weight to each metric, including lowering a weight to 0. Tanzu CloudHealth uses the weights you assign to calculate the Total Score for each instance as follows:
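Total Score = f(CPU, Memory, Disk)
Here, f is the weighted average of the individual metric scores. The Total Score is the sum of the weighted scores, where the weighted score for each metric is:
Weighted Score = (Weight / Sum of weights) * Score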
If you assign CPU a weight of 2, and assign Memory and Disk each a weight of 1, the weighted score for each metric is calculated as follows:
Weighted CPU Score = (2/4) * Score
Weighted Memory Score = (1/4) * Score
Weighted Disk Score = (1/4) * Score
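As a minimal sketch of the weighted average described above, the following Python function combines per-metric scores into a Total Score. The function name, the dictionary layout, and the sample scores are illustrative assumptions, not part of the Tanzu CloudHealth platform.

```python
def total_score(scores, weights):
    """Weighted average of metric scores.

    scores  -- mapping of metric name to its score (0 to 100)
    weights -- mapping of metric name to its weight (0 or greater)
    """
    total_weight = sum(weights[metric] for metric in scores)
    if total_weight == 0:
        raise ValueError("at least one metric needs a non-zero weight")
    # Each metric contributes (Weight / Sum of weights) * Score.
    return sum(
        (weights[metric] / total_weight) * score
        for metric, score in scores.items()
    )


# Hypothetical scores, with CPU weighted 2 and Memory and Disk weighted 1.
scores = {"cpu": 80, "memory": 40, "disk": 60}
weights = {"cpu": 2, "memory": 1, "disk": 1}
print(total_score(scores, weights))  # (2/4)*80 + (1/4)*40 + (1/4)*60 = 65.0
```

Setting a weight to 0 removes that metric from the Total Score, because its weighted score is always 0.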
Let’s consider these threshold specifications for the CPU utilization of a machine. Based on the settings in this policy, the usage thresholds are defined as follows:
Category | CPU Usage | Score |
---|---|---|
Severely underutilized | <20% | 0 to 33 |
Moderately underutilized | 20%–49% | 34 to 67 |
Well utilized | >=50% | 68 to 100 |
Here is how the score for CPU usage changes.
CPU Usage | Score |
---|---|
50% | 68 |
75% | 68 + [(100 - 68)/2] = 84 |
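The scoring examples in this section suggest that a score is found by locating the usage within its threshold range and taking the same relative position within the category's score range. The Python sketch below illustrates that reading. The linear interpolation in the Moderately underutilized band and the round-half-up rule are assumptions inferred from the examples, and the function and parameter names are hypothetical.

```python
import math

# Score ranges per category, taken from the table of threshold categories.
SEVERE_RANGE = (0, 33)
MODERATE_RANGE = (34, 67)
WELL_RANGE = (68, 100)


def _interpolate(usage, low, high, score_range):
    """Map usage in [low, high] to the same relative spot in score_range."""
    fraction = (usage - low) / (high - low)
    score_low, score_high = score_range
    raw = score_low + fraction * (score_high - score_low)
    return math.floor(raw + 0.5)  # round half up, so 16.5 becomes 17


def metric_score(usage_pct, severe_below, well_from):
    """Score a utilization percentage.

    severe_below -- usage below this is Severely underutilized
    well_from    -- usage at or above this is Well utilized
    """
    if not 0 < severe_below < well_from < 100:
        raise ValueError("expected 0 < severe_below < well_from < 100")
    if not 0 <= usage_pct <= 100:
        raise ValueError("usage_pct must be between 0 and 100")
    if usage_pct < severe_below:
        return _interpolate(usage_pct, 0, severe_below, SEVERE_RANGE)
    if usage_pct < well_from:
        return _interpolate(usage_pct, severe_below, well_from, MODERATE_RANGE)
    return _interpolate(usage_pct, well_from, 100, WELL_RANGE)


print(metric_score(20, severe_below=40, well_from=70))  # 17 (50% of 33, rounded)
print(metric_score(50, severe_below=20, well_from=50))  # 68 (start of Well utilized)
```

The `well_from=70` value in the first call is a placeholder; it does not affect the result because the usage falls in the Severely underutilized band.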
Find the Virtual Machine Rightsizing Report at Recommendations > Rightsizing > Machine Rightsizing.
This report highlights underutilized machines.
Scores for individual metrics and the Total Score are visually represented as battery meters with colored zones. Lower scores are indicated by fewer bars colored red through orange. Higher scores are indicated by more bars colored orange through green.
Hover over the battery meter for a metric to get more details. Individual metric scores are calculated using average performance over the current month or previous month. The hover text shows the minimum and maximum performance measured during that period.
Click the battery meter for a metric for deeper trend analysis.
Efficiency is not one of the default columns in this report. To display it, click Edit Columns and add it.
The Efficiency of a machine ranges from 0 to 100, and it is based on the Total Score for that machine and the cost of running it.
For example, a severely underutilized machine that is very expensive has a very low Efficiency, while a severely underutilized machine that is very inexpensive has a higher Efficiency.
Use Efficiency to prioritize which machines to rightsize first so that you maximize savings.
Evaluate a recommendation against internal business knowledge of your infrastructure.
A recommendation is the course of action Tanzu CloudHealth computes after analyzing machine utilization.
Projected Total Cost is calculated as follows:
(machine Usage Hours over analysis period) x (Hourly machine price you provide in the Tanzu CloudHealth platform)
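The calculation above is a single multiplication, sketched here in Python. The function name and the sample figures are hypothetical.

```python
def projected_total_cost(usage_hours, hourly_price):
    """Usage hours over the analysis period times the hourly machine price."""
    if usage_hours < 0 or hourly_price < 0:
        raise ValueError("usage_hours and hourly_price must be non-negative")
    return usage_hours * hourly_price


# Hypothetical machine: 720 usage hours at an hourly price of 0.25.
print(projected_total_cost(720, 0.25))  # 180.0
```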