The two key benefits of rightsizing are infrastructure optimization and cost reduction. During a rightsizing analysis of their infrastructure, organizations discover assets that can be downsized or terminated to save money or upgraded to improve performance.
An asset is underutilized and a strong candidate for downsizing if it exhibits low utilization (less than 20%, for example) for core performance metrics. In this case, the best practice is to downgrade the asset to a smaller footprint.
For example, in AWS, if you are running a workload on an r3.2xlarge instance but determine via rightsizing analysis that you could downgrade the instance type to an r3.xlarge instance without negatively impacting the workload, you can cut your operating costs by 50%.
Your cloud infrastructure will likely contain assets that are running but not being used. These assets are called zombies, and they are good candidates for termination. Zombies result when someone forgets to turn the assets off after use or when the asset fails due to script errors. Regardless of the cause, cloud providers continue to charge for these unused assets because they are in a running state. You can reduce costs by proactively identifying and terminating these assets.
The platform retrieves the following metrics for instances to determine how system resources are being utilized:
Source instances must be active for at least 48 hours in the Platform to begin generating rightsizing recommendations.
In order to make a rightsizing recommendation, Tanzu CloudHealth considers the thresholds for Maximum or Average utilization that you specify in the Severely Underutilized when section of a Rightsizing Policy.
A list of potential Instance Type candidates is built based on the following criteria:
For each candidate in the list, the platform performs the following actions:
If Memory or Disk metrics for the source instance are unavailable, Tanzu CloudHealth ensures that the target instance has at least as much or more memory and disk capacity as the source instance.
Tanzu CloudHealth provides cross-family recommendations for AWS instances. Downgrade recommendations help you identify the smallest-sized instance, and consequently the least costly candidate instance, that is capable of handling the usage for the source instance. Recommendations for downgrades can span across different families. For example, if you have an m4.xlarge instance with very low CPU and Memory utilization, Tanzu CloudHealth may recommend that you downgrade to the lower cost c4.large instead of downgrading to an m4.large, which is more expensive than the c4.large and has memory capacity that the instance may not need.
Consider the following example: You have an m5.xlarge instance, which has a capacity of 4.0 CPU and 16,000 MB of memory. You have configured your Rightsizing policy to measure the CPU threshold for average utilization and the memory threshold for maximum utilization. Your instance uses an average of 1.45 CPU and a maximum of 15,776 MB of memory. The memory capacity is correct for your instance’s memory usage, but the m5.xlarge provides more than twice the CPU that the instance requires. Therefore, the Rightsizing report searches for the least expensive instance type that has the same memory as the m5.xlarge but half the CPU. The Rightsizing report selects the r5.large as the least expensive option that matches those requirements.
Instance | Average CPU | Max Memory |
---|---|---|
Available with m5.xlarge | 4.0 | 16,000 |
Used by instance | 1.45 | 15,776 |
Rightsized recommendation for r5.large | 2.0 | 16,000 |
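The selection in this example can be sketched as a filter followed by a minimum-price pick. This is a minimal illustration, not the platform's implementation: the `InstanceType` class, the `cheapest_fit` function, the hourly prices, and the m5.large memory figure are all hypothetical placeholders, and only CPU and memory are compared.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: float
    memory_mb: float
    hourly_price: float  # hypothetical placeholder, not a real AWS price


def cheapest_fit(
    candidates: List[InstanceType], cpu_needed: float, memory_needed: float
) -> Optional[InstanceType]:
    """Return the least expensive candidate that covers the measured usage."""
    fits = [
        c for c in candidates
        if c.vcpu >= cpu_needed and c.memory_mb >= memory_needed
    ]
    return min(fits, key=lambda c: c.hourly_price, default=None)


# Capacities for m5.xlarge and r5.large follow the table above;
# the m5.large row and every price are assumptions for illustration.
CATALOG = [
    InstanceType("m5.large", 2.0, 8_000, 0.10),
    InstanceType("r5.large", 2.0, 16_000, 0.13),
    InstanceType("m5.xlarge", 4.0, 16_000, 0.19),
]

# Average CPU of 1.45 and maximum memory of 15,776 MB, as in the example.
recommended = cheapest_fit(CATALOG, cpu_needed=1.45, memory_needed=15_776)
print(recommended.name)  # r5.large
```

The cheaper m5.large is rejected because it cannot hold the measured maximum memory, which is why the pick crosses into a different family.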
The recommendation is No data if any of these conditions are true.
The recommendation result is No recommended change if any of these conditions are true.
Depending on whether you specify a Maximum or Average utilization measure in the Severely Underutilized when section of the Rightsizing Policy, Tanzu CloudHealth considers the maximum or average CPU or memory for the source instance.
A candidate is recommended when the following condition is true:
The candidate can handle the CPU, memory, disk I/O, and network I/O requirements of the source instance.
Depending on whether you specify a Maximum or Average utilization measure in the Severely Underutilized when section of the Rightsizing Policy, Tanzu CloudHealth considers the maximum or average CPU, memory, disk I/O, and network I/O requirements for the source instance.
The recommendation result is Terminate instance if all these conditions are true for the candidate instance.
Depending on whether you specify a Maximum or Average utilization measure in the Severely Underutilized when section of the Rightsizing Policy, Tanzu CloudHealth considers the maximum or average CPU, memory, disk I/O, and network I/O requirements for the candidate instance.
The EC2 Instance Rightsizing reports provide the following insights:
In order to understand EC2 instance performance and utilization on both granular and macro levels, Tanzu CloudHealth ingests data on CPU, Memory, Disk, Network I/O, and Disk I/O. These metrics are gathered through one or more of these sources:
Tanzu CloudHealth gathers CPU, Memory, Disk (including both internal store and attached EBS storage), Network I/O, and Disk I/O metrics for each EC2 instance in your infrastructure. Each metric is then assigned a numeric value called a Score. The individual metric scores are used to compute a Total Score for each instance.
While the individual metric scores indicate the utilization of each metric, the Total Score represents how well each EC2 instance is being utilized.
Tanzu CloudHealth bases metric scores and total scores on utilization thresholds that you specify in the Instance Rightsizing Policy.
When creating a new Instance Rightsizing policy, the initial values are set to the Tanzu CloudHealth default. You can modify the score value and the weight of each topic score.
Use the Severely underutilized when and Moderately underutilized when categories to specify the thresholds that reflect your internal business standards for a metric. When the utilization for a metric lies within a specific range, a numeric score is assigned to the metric.
There are three threshold categories:
Category | Score Range for Metric |
---|---|
Severely underutilized | 0 to 33 |
Moderately underutilized | 34 to 67 |
Well utilized | 68 to 100 |
In order to make a rightsizing recommendation, Tanzu CloudHealth considers the thresholds for Maximum or Average utilization that you specify in the Severely Underutilized when section of a Rightsizing Policy.
Let’s consider that you define the threshold for Severely underutilized as < 40%. If the average usage for a metric is measured as 20%, the usage is halfway through the threshold range (0%–40%).
Consequently, the corresponding score for the usage is halfway through the score range for the Severely underutilized category and is calculated as 50% of 33, which after rounding is 17.
In addition to specifying the thresholds for each category, you can also assign a weight to each metric, including lowering a weight to 0. Tanzu CloudHealth uses the weights you assign to calculate the Total Score for each instance as follows:
Total Score = f (CPU, Memory, Disk, Network In, Network Out)
Here, f is the weighted average
Weighted Score = (Weight/Sum of weights) * Score
If you assign CPU a weight of 2, and assign Memory, Disk, Network In, and Network Out each a weight of 1, the weighted score for each metric is calculated as follows:
Weighted CPU Score = (2/6) * Score
Weighted Memory Score = (1/6) * Score
Weighted Disk Score = (1/6) * Score
Weighted Network In Score = (1/6) * Score
Weighted Network Out Score = (1/6) * Score
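The weighted average above can be sketched in a few lines. The function name `total_score`, the metric keys, and the sample scores are illustrative assumptions; whether the platform rounds the Total Score is not stated, so the rounding in the final line is only for display.

```python
from typing import Mapping


def total_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average: sum of (weight / sum of weights) * score per metric."""
    weight_sum = sum(weights[metric] for metric in scores)
    if weight_sum == 0:
        raise ValueError("at least one metric needs a non-zero weight")
    return sum(weights[metric] / weight_sum * scores[metric] for metric in scores)


# Weights from the example: CPU counts double, everything else once.
weights = {"cpu": 2, "memory": 1, "disk": 1, "network_in": 1, "network_out": 1}
# Hypothetical metric scores for one instance.
scores = {"cpu": 17, "memory": 50, "disk": 80, "network_in": 40, "network_out": 40}

print(round(total_score(scores, weights)))  # 41
```

A metric with a weight of 0 drops out of both the numerator and the sum of weights, so it has no effect on the result.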
Let’s consider these threshold specifications for the CPU utilization of an EC2 Instance. Based on the settings in a policy, the usage thresholds are defined as follows:
Category | CPU Usage | Score |
---|---|---|
Severely underutilized | <20% | 0 to 33 |
Moderately underutilized | 20%–49% | 34 to 67 |
Well utilized | >=50% | 68 to 100 |
Here is how the score for CPU usage changes.
CPU Usage | Score |
---|---|
50% | 68 |
75% | 68 + [(100 - 68)/2] = 84 |
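The mapping from usage to score can be sketched as linear interpolation inside each category. This is an assumption-laden illustration rather than the platform's exact algorithm: the function name `metric_score`, its parameters, and the round-half-up rule are hypothetical, chosen so that a usage halfway through the Severely underutilized range scores 17.

```python
import math


def metric_score(usage_pct: float, severe_below: float = 20.0,
                 well_from: float = 50.0) -> int:
    """Map utilization (0-100 %) to a 0-100 score within its threshold category."""
    if not 0 < severe_below < well_from < 100:
        raise ValueError("thresholds must satisfy 0 < severe_below < well_from < 100")
    if not 0 <= usage_pct <= 100:
        raise ValueError("usage_pct must be between 0 and 100")

    if usage_pct < severe_below:      # Severely underutilized: 0 to 33
        lo_u, hi_u, lo_s, hi_s = 0.0, severe_below, 0, 33
    elif usage_pct < well_from:       # Moderately underutilized: 34 to 67
        lo_u, hi_u, lo_s, hi_s = severe_below, well_from, 34, 67
    else:                             # Well utilized: 68 to 100
        lo_u, hi_u, lo_s, hi_s = well_from, 100.0, 68, 100

    fraction = (usage_pct - lo_u) / (hi_u - lo_u)
    # Round half up (assumed); Python's built-in round() rounds halves to even.
    return math.floor(lo_s + fraction * (hi_s - lo_s) + 0.5)


print(metric_score(10))  # 17
print(metric_score(50))  # 68
print(metric_score(75))  # 84
```

Under these assumptions, 50% sits at the start of the Well utilized range and 75% at its midpoint, 68 + (100 - 68)/2.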
Find the EC2 Rightsizing Report at Recommendations > Rightsizing (Old) > EC2 Rightsizing.
This report highlights underutilized instances.
Scores for individual metrics and the Total Score are visually represented as battery meters with colored zones. Lower scores are indicated by fewer bars colored red through orange. Larger scores are indicated by more bars colored orange through green.
Hover over the battery meter for a metric to get more details. Individual metric scores are calculated using average performance over the current month or previous month. The hover indicates the minimum and maximum performance measured during that period.
Click the battery meter for a metric for deeper trend analysis.
For more information, see How Tanzu CloudHealth Makes Instance Rightsizing Recommendations.
Efficiency is not one of the default columns in this report. To display it, click Edit Columns and add it.
The Efficiency of an EC2 Instance ranges from 0 to 100, and it is based on the Total Score for that instance and the cost of running it.
For example, a severely underutilized EC2 Instance that is very expensive has a very low Efficiency, while a severely underutilized EC2 Instance that is very inexpensive has a higher Efficiency.
Use Efficiency to prioritize which EC2 Instances to rightsize first so that you maximize savings.
Recommendation Savings are the estimated monthly savings that you gain by implementing the rightsizing recommendation for an EC2 Instance. It is calculated as follows:
((Actual Compute Cost of EC2 Instance for Month / Hours Instance was Active in Month) - (Hourly On-demand Price of Recommended EC2 Instance)) * 730
This calculation assumes that the number of instance hours remains unchanged before and after you implement the rightsizing recommendation.
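As a worked illustration of the savings formula, the following sketch derives the current effective hourly rate and compares it with the recommended instance's on-demand price over 730 hours. The function name and the input figures are hypothetical.

```python
HOURS_PER_MONTH = 730  # constant used in the formula above


def recommendation_savings(actual_compute_cost: float, active_hours: float,
                           recommended_hourly_price: float) -> float:
    """Estimated monthly savings from switching to the recommended instance."""
    if active_hours <= 0:
        raise ValueError("active_hours must be positive")
    current_hourly = actual_compute_cost / active_hours
    return (current_hourly - recommended_hourly_price) * HOURS_PER_MONTH


# Hypothetical inputs: $96.00 spent over 500 active hours,
# recommended instance priced at $0.126 per hour on demand.
print(round(recommendation_savings(96.0, 500, 0.126), 2))  # 48.18
```

Because the current rate comes from actual cost while the target uses the on-demand price, an instance already covered by discounts can show small or negative savings.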
Evaluate a recommendation against internal business knowledge of your infrastructure.
Recommendation is the course of action Tanzu CloudHealth computes after analyzing EC2 Instance utilization. For more information, see How Tanzu CloudHealth Makes Instance Rightsizing Recommendations.
Projected Cost is calculated through linear extrapolation:
(Actual MTD cost) + [(Average daily cost based on the last 31 days) x (Remaining days in the month)]
Use the Projected Cost as a guideline that becomes more accurate as the month progresses and more actual MTD data is available.
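The extrapolation can be sketched as follows. The function name and sample figures are hypothetical, and the sketch assumes that "remaining days" excludes the current day, which the text does not specify.

```python
import calendar
from datetime import date


def projected_cost(mtd_cost: float, avg_daily_cost: float, today: date) -> float:
    """Actual MTD cost plus the average daily cost for the rest of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    remaining_days = days_in_month - today.day  # assumption: today is not counted
    return mtd_cost + avg_daily_cost * remaining_days


# Hypothetical: $300 spent by June 10, averaging $30 per day over the last 31 days.
print(projected_cost(300.0, 30.0, date(2024, 6, 10)))  # 900.0
```

On the last day of the month no days remain, so the projection equals the actual MTD cost, which matches the note that accuracy improves as the month progresses.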
Use the Actions button associated with each EC2 Instance to initiate an operation through the Platform. You can start, stop, reboot, or delete the instance. You can also run your own Lambda functions or take custom actions.
Find the EC2 Perspective Rightsizing Report at Recommendations > Rightsizing (Old) > EC2 Perspective Rightsizing.
This report groups the average scores for your EC2 Instances by Perspectives that you define in the Platform. Essentially, this report is the EC2 Rightsizing report filtered by Perspectives.
Scores for individual metrics and the Total Score are visually represented as battery meters with colored zones. Lower scores are indicated by fewer bars colored red through orange. Larger scores are indicated by more bars colored orange through green.
Hover over the battery meter for a metric to get more details. Individual metric scores are calculated using average performance over the current month or previous month. The hover indicates the minimum and maximum scores measured during that period.
Recommendation Savings are the estimated monthly savings that you gain by implementing the rightsizing recommendation for EC2 Instances that belong to a specific Perspective. It is calculated as follows:
((Actual Compute Cost of EC2 Instance for Month / Hours Instance was Active in Month) - (Hourly On-demand Price of Recommended EC2 Instance)) * 730
This calculation assumes that the number of instance hours remains unchanged before and after you implement the rightsizing recommendation.
Click the View icon next to each Perspective name to dive deeper and see recommendations for individual EC2 Instances in that Perspective.
The EBS Volume Rightsizing reports provide the following insights:
In order to understand EBS volume performance and utilization on both granular and macro levels, Tanzu CloudHealth ingests data on Read/Write Bytes, Read/Write IOPS, Read/Write Time, and Throughput. These metrics are gathered through one or more of these sources:
Find the EBS Rightsizing Report at Recommendations > Rightsizing (Old) > EBS Rightsizing.
This report shows the performance of the storage volumes in your cloud infrastructure from an underutilization point of view. For each EBS Volume, Tanzu CloudHealth calculates a Usage Score, Read Score, and Write Score.
The default policy enabled for rightsizing volumes is as follows:
Usage
Read Throughput
Write Throughput
Wasted Cost = (1 - Total Score/ 100) * Cost
If Wasted Cost can be computed, it is the portion of the Projected Cost that the Total Score does not cover. Example: A Total Score of 5 results in 1 – (5/100), or 95%, of the Projected Monthly Cost being counted as wasted. Wasted Cost depends on a Recommendation, so only volumes of type standard have this field populated, and the only recommendation that can be made for them is a move to SSD.
In that case, the recommendation Switch to SSD & save $<savings_amount> is given.
The savings amount appears in the /reports/usage/EBS volume-hours report, under the SSD Savings column. If it is zero, then no recommendation is given.
Projected Cost = (Total Cost) + [(Average daily cost based on the last 31 days) x (Remaining days in the month)]
The rightsizing report data is generated daily for active customers who were created at least a month ago.
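The Wasted Cost formula can be sketched directly. The function name and the $200 cost figure are hypothetical; the sketch assumes that Cost in the formula is the projected monthly cost of the volume.

```python
def wasted_cost(total_score: float, cost: float) -> float:
    """Wasted Cost = (1 - Total Score / 100) * Cost."""
    if not 0 <= total_score <= 100:
        raise ValueError("total_score must be between 0 and 100")
    return (1 - total_score / 100) * cost


# A Total Score of 5 leaves 95% of a hypothetical $200 projected cost as wasted.
print(round(wasted_cost(5, 200.0), 2))  # 190.0
```

A fully utilized volume (Total Score of 100) has no wasted cost, and a volume scoring 0 counts its entire cost as wasted.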
Find the EBS Perspective Rightsizing Report at Recommendations > Rightsizing (Old) > EBS Perspective Rightsizing.
This view gives your aggregate average score across your instances categorized into business groups of the Perspective selected.
You can click any group listed under the Group Name column to drill down into more detailed information, or click the elements within a row:
Wasted Cost = (1 - Total Score/ 100) * Cost
/reports/usage/EBS volume-hours report SSD Savings column. If zero, then there are no savings, and no recommendation will be given.