Classic AWS Rightsizing

Benefits of AWS Rightsizing

The two key benefits of rightsizing are infrastructure optimization and cost reduction. During a rightsizing analysis of their infrastructure, organizations discover assets that can be downsized or terminated to save money or upgraded to improve performance.

Downsizing

An asset is underutilized and a strong candidate for downsizing if it exhibits low utilization (less than 20%, for example) for core performance metrics. In this case, the best practice is to downgrade the asset to a smaller footprint.

For example, in AWS, if you are running a workload on an r3.2xlarge but determine via rightsizing analysis that you could downgrade the instance type to an r3.xlarge instance without negatively impacting the workload, you can cut your operating costs by 50%.

Terminating

Your cloud infrastructure will likely contain assets that are running but not being used. These assets are called zombies, and they are good candidates for termination. Zombies result when someone forgets to turn the assets off after use or when the asset fails due to script errors. Regardless of the cause, cloud providers continue to charge for these unused assets because they are in a running state. You can reduce costs by proactively identifying and terminating these assets.

How Instance Rightsizing Recommendations Are Made

Step 1: Metrics for Source Instance

The platform retrieves the following metrics for instances to determine how system resources are being utilized:

CPU: CPU Utilization scaled to VCPUs Utilized
Memory: Memory Utilized (MB)
Network: Combined I/O GB/s
Local Attached Storage: Type and Size of Attached Storage

Source instances must be active for at least 48 hours in the Platform to begin generating rightsizing recommendations.

In order to make a rightsizing recommendation, Tanzu CloudHealth considers the thresholds for Maximum or Average utilization that you specify in the Severely Underutilized when section of a Rightsizing Policy.

Step 2: Candidate List

A list of potential Instance Type candidates is built based on the following criteria:

The candidates are sorted from least expensive to most expensive. If a candidate is covered by a reservation or a savings plan, the effective cost due to the reservation or savings plan is determined.
Candidates are removed in the following scenarios:
- If the source instance and candidate do not have matching AMD/Intel/ARM processors, the candidate is removed.
- If the source instance and candidate have different storage types, the candidate is removed.
Recommendations are not generated for the following source instance types:
- cc1.4xlarge
- cc2.8xlarge
- cg1.4xlarge
- bare metal instances
- Accelerated Computing Family instances. Tanzu CloudHealth cannot make rightsizing recommendations for these source instances because of the difficulty in collecting accurate GPU-related metrics.
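
The filtering and sorting rules above can be sketched roughly as follows. This is an illustration only, not the platform's implementation: the `InstanceType` record, its field names, and `build_candidate_list` are hypothetical, and the bare metal and Accelerated Computing exclusions are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    processor: str                # "intel", "amd", or "arm"
    storage_type: str             # e.g. "ebs" or "instance-store"
    effective_hourly_cost: float  # cost after any reservation or savings plan


# Source types for which no recommendation is generated (from the list above).
# Bare metal and Accelerated Computing instances are also excluded in the
# product, but they are not modeled here.
EXCLUDED_SOURCE_TYPES = {"cc1.4xlarge", "cc2.8xlarge", "cg1.4xlarge"}


def build_candidate_list(source, candidates):
    """Return the candidates that survive filtering, cheapest first."""
    if source.name in EXCLUDED_SOURCE_TYPES:
        return []
    kept = [
        c for c in candidates
        if c.processor == source.processor
        and c.storage_type == source.storage_type
    ]
    return sorted(kept, key=lambda c: c.effective_hourly_cost)
```

Sorting on the effective cost rather than the list price means that a candidate covered by a reservation or savings plan is ranked by what it would actually cost.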

Step 3: Metrics for Candidates

For each candidate in the list, the platform performs the following actions:

The metrics retrieved for the source instance are transposed onto the candidate, with the following adjustments:
- CPU utilization is recast as a scaled CPU utilization based on VCPU utilization.
- Disk and Disk I/O metrics are dropped when the candidate does not belong to the same family as the source instance.
- Network utilization is capped at 100%.
Individual and total scores are computed based on the adjusted metrics.
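
As a rough illustration of these adjustments, the sketch below recasts the source metrics for a candidate. The exact scaling the platform uses is not documented here, so the formula (vCPUs used divided by candidate vCPUs) is an assumption, as are the metric key names and the helper functions.

```python
def family(instance_type: str) -> str:
    """Instance family is the part before the dot, e.g. 'm4' in 'm4.xlarge'."""
    return instance_type.split(".", 1)[0]


def transpose_metrics(metrics, source_type, source_vcpus,
                      candidate_type, candidate_vcpus):
    """Return a copy of the source metrics adjusted for the candidate.

    `metrics` is a dict with hypothetical keys: cpu_pct, network_pct,
    disk_pct, disk_io_pct (all percentages).
    """
    adjusted = dict(metrics)

    # Recast CPU utilization: convert to vCPUs used, then express that
    # as a percentage of the candidate's vCPUs (assumed scaling).
    vcpus_used = metrics["cpu_pct"] / 100.0 * source_vcpus
    adjusted["cpu_pct"] = vcpus_used / candidate_vcpus * 100.0

    # Network utilization is capped at 100%.
    adjusted["network_pct"] = min(metrics["network_pct"], 100.0)

    # Disk and Disk I/O metrics are dropped for cross-family candidates.
    if family(candidate_type) != family(source_type):
        adjusted.pop("disk_pct", None)
        adjusted.pop("disk_io_pct", None)
    return adjusted
```

For instance, a source running at 25% CPU on 8 vCPUs uses 2 vCPUs, which this sketch recasts as 50% on a 4-vCPU candidate.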

Step 4: Comparison and Recommendation

If Memory or Disk metrics for the source instance are unavailable, Tanzu CloudHealth ensures that the target instance has at least as much memory and disk capacity as the source instance.

Tanzu CloudHealth provides cross-family recommendations for AWS instances. Downgrade recommendations help you identify the smallest-sized instance, and consequently the least costly candidate instance, that is capable of handling the usage for the source instance. Recommendations for downgrades can span across different families. For example, if you have an m4.xlarge instance with very low CPU and Memory utilization, Tanzu CloudHealth may recommend that you downgrade to the lower cost c4.large instead of downgrading to an m4.large, which is more expensive than the c4.large and has memory capacity that the instance may not need.

Consider the following example: You have an m5.xlarge instance, which has a capacity of 4.0 CPU and 16,000 MB of memory. You have configured your Rightsizing policy to measure the CPU threshold for average utilization and the memory threshold for maximum utilization. Your instance uses an average of 1.45 CPU and a maximum of 15,776 MB of memory. The memory capacity is correct for your instance’s memory usage, but the m5.xlarge provides more than twice the CPU that the instance requires. Therefore, the Rightsizing report searches for the least expensive instance type that has the same memory as the m5.xlarge but half the CPU. The Rightsizing report selects the r5.large as the least expensive option that matches those requirements.

Instance	Average CPU (vCPUs)	Max Memory (MB)
Available with `m5.xlarge`	4.0	16,000
Used by instance	1.45	15,776
Rightsized recommendation for `r5.large`	2.0	16,000
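
The selection in this example can be sketched as a search for the cheapest candidate that covers the measured usage. The candidate list, the rounded memory sizes, and especially the hourly prices below are placeholders chosen for illustration, not actual AWS pricing, and `cheapest_fit` is a hypothetical helper.

```python
# (name, vCPUs, memory in MB, placeholder hourly price)
# Prices are illustrative only; memory sizes are rounded as in the table.
CANDIDATES = [
    ("c5.large", 2.0, 4_000, 0.09),
    ("m5.large", 2.0, 8_000, 0.10),
    ("r5.large", 2.0, 16_000, 0.13),
    ("m5.xlarge", 4.0, 16_000, 0.19),
]


def cheapest_fit(avg_cpu_used, max_memory_used_mb, candidates):
    """Return the name of the cheapest candidate that covers the usage."""
    fitting = [
        c for c in candidates
        if c[1] >= avg_cpu_used and c[2] >= max_memory_used_mb
    ]
    if not fitting:
        return None
    return min(fitting, key=lambda c: c[3])[0]


# Usage from the example: 1.45 vCPUs on average, 15,776 MB at peak.
recommended = cheapest_fit(1.45, 15_776, CANDIDATES)
```

With these placeholder values, the two cheaper types are rejected for insufficient memory and `recommended` is `"r5.large"`.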

No data

The recommendation is No data if any of these conditions are true.

The source machine is not active.
Tanzu CloudHealth is unable to retrieve metrics from the source instance.
Tanzu CloudHealth is unable to retrieve cost data from the bill.
Cost data from the bill is pending.

No recommended change

The recommendation result is No recommended change if any of these conditions are true.

The source instance was launched after the start of the current reporting interval.
No pricing information is available for the source or candidate instance.
The source instance belongs to an accelerated computing family.
The on-demand compute cost per hour (based on an average month) of each candidate instance is greater than the actual compute cost/hour of the source instance. This difference can result when the source is using reserved instances (RIs) and the compute cost is discounted below the on-demand cost of the candidates.
The CPU or memory usage for the source instance does not match that for the candidate instance.

Depending on whether you specify a Maximum or Average utilization measure in the Severely Underutilized when section of the Rightsizing Policy, Tanzu CloudHealth considers the maximum or average CPU or memory for the source instance.

Rightsized recommendation

A candidate is recommended when the following condition is true:

The candidate can handle the CPU, memory, disk I/O, and network I/O requirements of the source instance.
- For non-burstable candidate instances, the source workload is under 100% of the CPU Baseline.
- For burstable candidate instances, the source workload is under 85% of the Burstable CPU Baseline.
Depending on whether you specify a Maximum or Average utilization measure in the Severely Underutilized when section of the Rightsizing Policy, Tanzu CloudHealth considers the maximum or average CPU, memory, disk I/O, and network I/O requirements for the source instance.
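
The CPU part of this check can be sketched as below. It assumes the baseline is expressed as a vCPU capacity supplied by the caller; how the platform derives the baseline for burstable types is not described here, and `cpu_fits` is a hypothetical name.

```python
# Burstable candidates must keep the workload under 85% of their baseline.
BURSTABLE_LIMIT = 0.85


def cpu_fits(required_vcpus: float, baseline_vcpus: float, burstable: bool) -> bool:
    """Return True if the source workload stays under the candidate's CPU limit."""
    factor = BURSTABLE_LIMIT if burstable else 1.0
    return required_vcpus < baseline_vcpus * factor
```

A workload needing 1.8 vCPUs would fit a non-burstable candidate with a 2.0 vCPU baseline, but not a burstable one with the same baseline.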

Termination recommendation

The recommendation result is Terminate instance if all these conditions are true for the candidate instance.

CPU utilization is less than 1%.
Memory utilization is 0% or no data is available.
Disk usage is 0%.
Disk I/O utilization (both bytes/sec and IOPS) is 0%.
Network I/O utilization is 0%.

Depending on whether you specify a Maximum or Average utilization measure in the Severely Underutilized when section of the Rightsizing Policy, Tanzu CloudHealth considers the maximum or average CPU, memory, disk I/O, and network I/O requirements for the candidate instance.
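
Because every condition must hold, the termination rule reduces to a single conjunction. The sketch below assumes the inputs are already the maximum or average percentages selected by the policy; the function and parameter names are hypothetical, and `None` stands for missing memory data.

```python
from typing import Optional


def should_terminate(cpu_pct: float,
                     memory_pct: Optional[float],
                     disk_usage_pct: float,
                     disk_bytes_pct: float,
                     disk_iops_pct: float,
                     network_pct: float) -> bool:
    """Return True only if all termination conditions are met."""
    return (
        cpu_pct < 1.0
        and (memory_pct is None or memory_pct == 0.0)  # 0% or no data
        and disk_usage_pct == 0.0
        and disk_bytes_pct == 0.0   # disk I/O, bytes/sec
        and disk_iops_pct == 0.0    # disk I/O, IOPS
        and network_pct == 0.0
    )
```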

EC2 Instance Rightsizing Reports

Insights These Reports Provide

The EC2 Instance Rightsizing reports provide the following insights:

How well your EC2 instances are being utilized in terms of the workloads you are running on them.
How various metrics such as CPU, Memory, Disk, Network I/O, and Disk I/O contribute to instance utilization.
Opportunities for rightsizing EC2 instances, thereby saving costs and optimizing workloads.

Sources of EC2 Instance Metrics

In order to understand EC2 instance performance and utilization on both granular and macro levels, Tanzu CloudHealth ingests data on CPU, Memory, Disk, Network I/O, and Disk I/O. These metrics are gathered through one or more of these sources:

New Relic
Datadog
AWS CloudWatch
Tanzu CloudHealth Metrics API
Tanzu CloudHealth Agent
Wavefront

How EC2 Instance Utilization Is Interpreted

Tanzu CloudHealth gathers CPU, Memory, Disk (including both internal store and attached EBS storage), Network I/O, and Disk I/O metrics for each EC2 instance in your infrastructure. Each metric is then assigned a numeric value called a Score. The individual metric scores are used to compute a Total Score for each instance.

While the individual metric scores indicate the utilization of each metric, the Total Score represents how well each EC2 instance is being utilized.

Scoring Mechanism

Tanzu CloudHealth bases metric scores and total scores on utilization thresholds that you specify in the Instance Rightsizing Policy.

When creating a new Instance Rightsizing policy, the initial values are set to the Tanzu CloudHealth default. You can modify the score value and the weight of each topic score.

Use the Severely underutilized when and Moderately underutilized when categories to specify the thresholds that reflect your internal business standards for a metric. When the utilization for a metric lies within a specific range, a numeric score is assigned to the metric.

There are three threshold categories:

Category	Score Range for Metric
Severely underutilized	0 to 33
Moderately underutilized	34 to 67
Well utilized	68 to 100

In order to make a rightsizing recommendation, Tanzu CloudHealth considers the thresholds for Maximum or Average utilization that you specify in the Severely Underutilized when section of a Rightsizing Policy.

Let’s consider that you define the threshold for Severely underutilized as < 40%. If the average usage for a metric is measured as 20%, the usage is halfway through the threshold range (0%–40%).

Consequently, the corresponding score for the usage is halfway through the score range for the Severely underutilized category and is calculated as 50% of 33, which after rounding is 17.
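
The interpolation described above can be sketched as follows. It assumes scores scale linearly within each category and that halves round up (which matches 16.5 becoming 17); the platform's exact rounding and boundary handling may differ, and `metric_score` is a hypothetical name.

```python
import math


def metric_score(usage_pct: float, severe_below: float, moderate_below: float) -> int:
    """Map a utilization percentage to a 0-100 score.

    severe_below:   usage under this is Severely underutilized (score 0-33)
    moderate_below: usage under this is Moderately underutilized (score 34-67)
    Anything else is Well utilized (score 68-100).
    """
    usage_pct = max(0.0, min(usage_pct, 100.0))
    if usage_pct < severe_below:
        lo, hi, start, end = 0, 33, 0.0, severe_below
    elif usage_pct < moderate_below:
        lo, hi, start, end = 34, 67, severe_below, moderate_below
    else:
        lo, hi, start, end = 68, 100, moderate_below, 100.0
    if end <= start:  # degenerate range; avoid dividing by zero
        return lo
    fraction = (usage_pct - start) / (end - start)
    # Round half up, so 16.5 becomes 17 as in the example above.
    return math.floor(lo + fraction * (hi - lo) + 0.5)
```

With a Severely underutilized threshold of 40%, `metric_score(20, 40, 70)` returns 17, which agrees with the calculation above. The value 70 for the second threshold is arbitrary here because it does not affect this case.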

In addition to specifying the thresholds for each category, you can also assign a weight to each metric, including lowering a weight to 0. Tanzu CloudHealth uses the weights you assign to calculate the Total Score for each instance as follows:

Total Score = f (CPU, Memory, Disk, Network In, Network Out)
Here, f is the weighted average

Weighted Score = (Weight/Sum of weights) * Score

If you assign CPU a weight of 2, and assign Memory, Disk, Network In, and Network Out each a weight of 1, the weighted score for each metric is calculated as follows:

Weighted CPU Score = (2/6) * Score
Weighted Memory Score = (1/6) * Score
Weighted Disk Score = (1/6) * Score
Weighted Network In Score = (1/6) * Score
Weighted Network Out Score = (1/6) * Score
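
A minimal sketch of the weighted average, assuming scores and weights are supplied as dictionaries keyed by metric name (the function names and keys are illustrative):

```python
def weighted_scores(scores: dict, weights: dict) -> dict:
    """Weighted Score = (Weight / Sum of weights) * Score, per metric."""
    total_weight = sum(weights[m] for m in scores)
    if total_weight == 0:
        raise ValueError("at least one metric needs a non-zero weight")
    return {m: weights[m] / total_weight * s for m, s in scores.items()}


def total_score(scores: dict, weights: dict) -> float:
    """Total Score is the sum of the weighted metric scores."""
    return sum(weighted_scores(scores, weights).values())


weights = {"cpu": 2, "memory": 1, "disk": 1, "network_in": 1, "network_out": 1}
scores = {"cpu": 60, "memory": 30, "disk": 30, "network_in": 30, "network_out": 30}
result = total_score(scores, weights)  # (2*60 + 4*30) / 6, about 40
```

Setting a weight to 0 removes that metric from the Total Score without changing the formula.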

Example: EC2 Instance Scoring

Let’s consider these threshold specifications for the CPU utilization of an EC2 Instance. Based on the settings in a policy, the usage thresholds are defined as follows:

Category	CPU Usage	Score
Severely underutilized	<20%	0 to 33
Moderately underutilized	20%–49%	34 to 67
Well utilized	>=50%	68 to 100

Here is how the score for CPU usage changes.

CPU Usage	Score
50%	`68`
75%	`68 + [(100 - 68)/2] = 84`

How to Interpret EC2 Rightsizing Report

Find the EC2 Rightsizing Report at Recommendations > Rightsizing (Old) > EC2 Rightsizing.

This report highlights underutilized instances.

How to Analyze Scores

Scores for individual metrics and the Total Score are visually represented as battery meters with colored zones. Lower scores are indicated by fewer bars colored red through orange. Larger scores are indicated by more bars colored orange through green.

Hover over the battery meter for a metric to get more details. Individual metric scores are calculated using average performance over the current month or previous month. The hover indicates the minimum and maximum performance measured during that period.

Click the battery meter for a metric for deeper trend analysis.

For more information, see How Tanzu CloudHealth Makes Instance Rightsizing Recommendations.

How to Use Efficiency

Efficiency is not one of the default columns in this report. To display it, click Edit Columns and add it.

The Efficiency of an EC2 Instance ranges from 0 to 100, and it is based on the Total Score for that instance and the cost of running it.

For example, a severely underutilized EC2 Instance that is very expensive has a very low Efficiency, while a severely underutilized EC2 Instance that is very inexpensive has a higher Efficiency.

Use Efficiency to prioritize which EC2 Instances to rightsize in order to maximize savings.

How to Use Recommendation Savings

Recommendation Savings are the estimated monthly savings that you gain by implementing the rightsizing recommendation for an EC2 Instance. It is calculated as follows:

((Actual Compute Cost of EC2 Instance for Month / Hours Instance was Active in Month) - (Hourly On-demand Price of Recommended EC2 Instance)) * 730

This calculation assumes that the number of instance hours remains unchanged before and after you implement the rightsizing recommendation.
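
The formula translates directly into code. The sketch below uses hypothetical parameter names and illustrative input values; 730 is the hours-per-month constant from the formula.

```python
HOURS_PER_MONTH = 730  # constant used in the formula above


def recommendation_savings(actual_compute_cost: float,
                           active_hours: float,
                           recommended_hourly_price: float) -> float:
    """Estimated monthly savings from moving to the recommended instance type."""
    if active_hours <= 0:
        raise ValueError("instance must have been active during the month")
    current_hourly_cost = actual_compute_cost / active_hours
    return (current_hourly_cost - recommended_hourly_price) * HOURS_PER_MONTH


# Illustrative values: $100 of compute over 500 active hours ($0.20/hour),
# with a recommended type priced at $0.10/hour on demand.
savings = recommendation_savings(100.0, 500.0, 0.10)  # about 73 per month
```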

How to Interpret the Recommendation

Evaluate a recommendation against internal business knowledge of your infrastructure.

Recommendation is the course of action Tanzu CloudHealth computes after analyzing EC2 Instance utilization. For more information, see How Tanzu CloudHealth Makes Instance Rightsizing Recommendations.

How to Interpret Projected Cost

Projected Cost is calculated through linear extrapolation:

(Actual MTD cost) + [(Average daily cost based on the last 31 days) x (Remaining days in the month)]

Use the Projected Cost as a guideline that becomes more accurate as the month progresses and more actual MTD data is available.
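
As a sketch of the extrapolation, assuming the daily costs for the trailing window are available as a list (the function name and inputs are hypothetical):

```python
def projected_cost(mtd_cost: float,
                   daily_costs_last_31_days: list,
                   remaining_days: int) -> float:
    """Actual MTD cost plus average daily cost times the remaining days."""
    if not daily_costs_last_31_days:
        return mtd_cost
    average_daily = sum(daily_costs_last_31_days) / len(daily_costs_last_31_days)
    return mtd_cost + average_daily * remaining_days


# Illustrative values: $100 spent so far, $10/day on average, 5 days left.
estimate = projected_cost(100.0, [10.0] * 31, 5)  # 150.0
```

As `remaining_days` shrinks, the extrapolated term contributes less and the estimate converges on the actual cost.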

Initiate Action from Report

Use the Actions button associated with each EC2 Instance to initiate an operation through the Platform. You can start, stop, reboot, or delete the instance. You can also run your own Lambda functions or take custom actions.

How to Interpret EC2 Perspective Rightsizing Report

Find the EC2 Perspective Rightsizing Report at Recommendations > Rightsizing (Old) > EC2 Perspective Rightsizing.

This report groups the average scores for your EC2 Instances by Perspectives that you define in the Platform. Essentially, this report is the EC2 Rightsizing report filtered by Perspectives.

How to Analyze Scores

How to Use Recommendation Savings

Recommendation Savings are the estimated monthly savings that you gain by implementing the rightsizing recommendation for EC2 Instances that belong to a specific Perspective. It is calculated as follows:

((Actual Compute Cost of EC2 Instance for Month / Hours Instance was Active in Month) - (Hourly On-demand Price of Recommended EC2 Instance)) * 730

This calculation assumes that the number of instance hours remains unchanged before and after you implement the rightsizing recommendation.

Click the View icon next to each Perspective name to dive deeper and see recommendations for individual EC2 Instances in that Perspective.

EBS Volume Rightsizing Reports

Insights These Reports Provide

The EBS Volume Rightsizing reports provide the following insights:

How well your EBS volumes are being utilized in terms of storage and throughput.
How various metrics such as Read/Write Bytes, Read/Write IOPS, Read/Write Time, and Throughput contribute to volume utilization.
Opportunities for rightsizing EBS volumes, thereby saving costs and optimizing storage.

Sources of EBS Volume Metrics

In order to understand EBS volume performance and utilization on both granular and macro levels, Tanzu CloudHealth ingests data on Read/Write Bytes, Read/Write IOPS, Read/Write Time, and Throughput. These metrics are gathered through one or more of these sources:

AWS CloudWatch
Chef
Tanzu CloudHealth Agent

How to Interpret EBS Rightsizing Report

Find the EBS Rightsizing Report at Recommendations > Rightsizing (Old) > EBS Rightsizing.

This report shows the performance of the storage volumes in your cloud infrastructure from an underutilization point of view. For each EBS Volume, Tanzu CloudHealth calculates a Usage Score, Read Score, and Write Score.

The default policy enabled for rightsizing volumes is as follows:

Usage

Severely Underutilized: Average Used % < 35%
Moderately Underutilized: Average Used % >= 35% and Average Used % < 50%

Read Throughput

Severely Underutilized: Average Read Ops % < 20%
Moderately Underutilized: Average Read Ops % >= 20% and Average Read Ops % < 50%

Write Throughput

Severely Underutilized: Average Write Ops % < 20%
Moderately Underutilized: Average Write Ops % >= 20% and Average Write Ops % < 50%
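
The default thresholds can be expressed as a small lookup plus a classifier. This sketch assumes that anything at or above the upper threshold counts as well utilized, mirroring the EC2 categories; the dictionary keys and function name are hypothetical.

```python
# metric -> (severely underutilized below, moderately underutilized below)
DEFAULT_VOLUME_POLICY = {
    "usage": (35.0, 50.0),
    "read": (20.0, 50.0),
    "write": (20.0, 50.0),
}


def classify(metric: str, average_pct: float, policy=DEFAULT_VOLUME_POLICY) -> str:
    """Return the utilization category for one volume metric."""
    severe_below, moderate_below = policy[metric]
    if average_pct < severe_below:
        return "Severely Underutilized"
    if average_pct < moderate_below:
        return "Moderately Underutilized"
    return "Well Utilized"  # assumed label for the remaining range
```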

Total Score

A number representing how well this volume is being used in general.
It is made up of equally weighted usage and read/write throughput scores by default, though you can specify your own thresholds in a Volume Rightsizing Policy.
Total Score is used only in pro-rating Wasted Cost and is not used in Recommendation.
Scores are calculated for all processed volume types.

Usage Score

A number representing how well this volume is being used with respect to its capacity. If there is usage data, then the usage score will be displayed as a number between 0 and 100. Otherwise, it will show no data.

Read/Write Score

A number representing how well this volume is being used with respect to its throughput.
If there is usage data, then the score will be displayed as a number between 0 and 100. Otherwise, it will show no data.

Volume Name

The name of the volume.

Size (GB)

The amount of space provisioned for the volume.

Status

The current status of the volume.

Type

The type of the volume: standard for magnetic storage, gp2 for General Purpose (SSD), io1 for Provisioned IOPS, st1 for Throughput Optimized HDD, or sc1 for Cold HDD.

Zone name

The availability zone the volume is provisioned in.

Instance name

The name of the instance to which the volume is attached.

Total Cost

The MTD cost of each component of storage, IO, and PIOPS.

Storage Cost

The MTD cost of the storage component of the volume.

Wasted Cost

Wasted Cost is only computed and displayed if a Recommendation was made AND if there was a Projected Monthly cost calculated AND if a Total Score was calculated. Otherwise, it will always be empty.
No data will be seen if there is no Total Score available.
The portion of the cost that is being wasted due to underutilization. It is calculated as Wasted Cost = (1 - Total Score / 100) * Cost, so it equals the share of the Projected Monthly cost that the Total Score does not account for. Example: A Total Score of 5 results in 1 - (5/100) = 0.95, so 95% of the Projected Monthly cost is reported as wasted. Because Wasted Cost depends on a Recommendation, only volumes of type standard have this field populated, and the only recommendation that can be made is to move to SSD.
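
A minimal sketch of the Wasted Cost formula, with `None` standing in for the empty value shown when no Total Score or cost is available (the function name and the use of `None` are assumptions):

```python
from typing import Optional


def wasted_cost(total_score: Optional[float], cost: Optional[float]) -> Optional[float]:
    """Wasted Cost = (1 - Total Score / 100) * Cost, or None if inputs are missing."""
    if total_score is None or cost is None:
        return None
    return (1 - total_score / 100) * cost


# Illustrative values: a Total Score of 5 on a $200 projected monthly cost.
wasted = wasted_cost(5, 200.0)  # about 190, i.e. 95% of the cost
```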

Recommendations

Recommendations are only processed for Volume type standard. No recommendations are provided for the other volume types.
The Volume must be active and it should have cost attached to it. Otherwise, no recommendation will be given.
We calculate the cost savings of the current volume compared with an SSD volume of the same size. If savings can be realized, the recommendation Switch to SSD & save $<savings_amount> is given.
You can visualize the Savings amount in the /reports/usage/EBS volume-hours report, under the SSD Savings column. If it is zero, then no recommendation will be given.

Projected Monthly

This can be added as a column in the Rightsizing Report and is only computed if Total Cost is available for this asset.
Its value is either the Total Cost (for a full month) or the same as the EC2 Projected Cost, that is, (Total Cost) + [(Average daily cost based on the last 31 days) x (Remaining days in the month)].

The rightsizing report data is generated daily for the active customers who were created at least a month ago.

How to Interpret EBS Perspective Rightsizing Report

Find the EBS Perspective Rightsizing Report at Recommendations > Rightsizing (Old) > EBS Perspective Rightsizing.

This view shows the aggregate average score across your volumes, categorized into the business groups of the selected Perspective.

You can click any group listed under the Group Name column to drill down into more detailed information, or click the elements within a row:

View button or ## of Volumes (the actual number, not the column header) - a report list of the instances for this group name and their individual scores
Group Name - the Perspective of which the group is a part
Total Score, Usage Score, Read Score, Write Score - a utilization line bar graph for that topic, filtered by the particular Group
Total Cost - The MTD cost of each component of storage, IO, and PIOPS
Storage Cost - The MTD cost of the storage component of the volume
Wasted Cost - The percentage of the cost that is being wasted due to underutilization. It is calculated as Wasted Cost = (1 - Total Score/ 100) * Cost

Troubleshooting Tips

The Rightsizing report does not provide any recommendation

Check if the volume type is standard. If not, then no recommendation is given in this version.
Check the /reports/usage/EBS volume-hours report SSD Savings column. If zero, then there are no savings, and no recommendation will be given.

Wasted Cost column displays zero cost

Check if the volume type is standard. If not, then no recommendation is given, and no Wasted Cost will be shown.
Check for a Recommendation. If none, then no Wasted Cost will be displayed.
Check the Projected Monthly cost in the Rightsizing report. If none, then no Wasted Cost will be displayed.
Check the Total Score in the Rightsizing report. If none, then no Wasted Cost will be displayed.