Data Center Rightsizing

Benefits of Data Center Rightsizing

The two key benefits of rightsizing are infrastructure optimization and cost reduction. During a rightsizing analysis of their infrastructure, organizations discover assets that can be downsized or terminated to save money or upgraded to improve performance.

Downsizing

An asset is underutilized and a strong candidate for downsizing if it exhibits low utilization (less than 20%, for example) for core performance metrics. In this case, the best practice is to downgrade the asset to a smaller footprint.

Terminating

Your cloud infrastructure will likely contain assets that are running but not being used. These assets are called zombies, and they are good candidates for termination. Zombies result when someone forgets to turn the assets off after use or when the asset fails due to script errors. Regardless of the cause, cloud providers continue to charge for these unused assets because they are in a running state. You can reduce costs by proactively identifying and terminating these assets.

Upgrading

By downgrading underutilized assets and terminating unused ones, you can reduce costs while still optimizing for performance. Upgrading assets, on the other hand, increases spend, but it ensures that your assets can meet surges in demand.

How Tanzu CloudHealth Makes Data Center Machine Rightsizing Recommendations

Step 1: Metrics for Source Machine

The platform retrieves the following metrics for Data Center machines to determine how system resources are being utilized.

CPU
Memory
Disk

Metrics are retrieved via the Tanzu CloudHealth Agent for non-vSphere accounts and via the VMware Aggregator for vSphere accounts.

The threshold percentages you specify in an Instance Rightsizing Policy are only used to score each resource. Tanzu CloudHealth rightsizing recommendations do not consider your threshold percentages; instead, they consider the resource’s metrics.

Effect of Severe Underutilization Threshold on Recommendations

When you configure the Instance Rightsizing Policy, you must specify for each metric a threshold measure of Maximum, Minimum, or Average utilization for both the Severely underutilized when and Moderately underutilized when sections.

The Tanzu CloudHealth Rightsizing report uses the threshold measure specified in the Severely underutilized when section for each metric to determine whether to use the metric’s Maximum, Minimum, or Average data when calculating a rightsizing recommendation.

If you set the Severely underutilized when threshold to Maximum for CPU, Memory, or Disk, Tanzu CloudHealth computes rightsizing recommendations based on maximum data for that metric.
If you set the Severely underutilized when threshold to Minimum for CPU, Memory, or Disk, Tanzu CloudHealth computes rightsizing recommendations based on minimum data for that metric.
If you set the Severely underutilized when threshold to Average for CPU, Memory, or Disk, Tanzu CloudHealth computes rightsizing recommendations based on average data for that metric.

By default, the Instance Rightsizing Policy uses Average metrics for the Severely underutilized when threshold for CPU, Memory, and Disk.
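
The effect of the threshold measure can be pictured with a minimal sketch. It assumes the metric is available as a plain list of utilization samples; the function name `summarize` and the sample values are hypothetical and are not part of the Tanzu CloudHealth platform.

```python
from statistics import mean

def summarize(samples, measure="Average"):
    """Reduce utilization samples to the statistic named by the threshold measure.

    "Average" is the default, mirroring the policy default described above.
    """
    reducers = {"Maximum": max, "Minimum": min, "Average": mean}
    return reducers[measure](samples)

# Hypothetical CPU utilization samples, in percent.
cpu_samples = [12.0, 35.0, 8.0, 21.0]
print(summarize(cpu_samples, "Maximum"))  # 35.0
print(summarize(cpu_samples, "Minimum"))  # 8.0
print(summarize(cpu_samples))             # 19.0
```

A recommendation based on Maximum data is the most conservative of the three, because it sizes the machine for the highest utilization observed.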

The Machine Rightsizing report collects metrics for the current month. Consequently, the report cannot make recommendations for the first two days of each calendar month due to insufficient metrics data.

Step 2: Custom Candidate Configuration

Tanzu CloudHealth calculates the ideal configuration needed to fulfill the CPU, Memory, and Disk requirements of the source machine. A custom candidate is then built for the calculated configuration.

Machine metrics scale as follows:

CPU: 1, 2, 4, 8, 16, etc. CPU cores
Memory: 0.0625, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, etc. GB
Disk: 1, 2, 3, 4, 5, etc. GB

CPU and Memory follow the scale, except when the source machine’s current value already meets the machine’s needs and is smaller than the next step on the scale. For example, if a source machine currently has 6 CPU cores and requires 5 CPU cores, the next step on the scale is 8 cores, so the custom candidate keeps 6 CPU cores. However, if a source machine currently has 9 CPU cores and requires 5 cores, the custom candidate follows the scale and has 8 CPU cores.

Disk follows scale, regardless of the source machine’s current metric. If a source machine currently has 1.6 GB of Disk storage and requires 1.4 GB, the custom candidate follows scale and has 2 GB of Disk storage.
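
The scaling rules above can be sketched as follows. This is one interpretation inferred from the examples in this section, not the platform's actual implementation: the function names are hypothetical, and the rule "keep the current size when it meets the requirement and does not exceed the next scale step" is an assumption drawn from the 6-core and 9-core examples.

```python
import math

def next_power_of_two(required):
    """Return the smallest power of two (fractions included) that is >= required."""
    if required <= 0:
        raise ValueError("required must be positive")
    return 2.0 ** math.ceil(math.log2(required))

def scale_cpu_or_memory(current, required, smallest_step):
    """Pick a CPU core count or memory size (GB) for the custom candidate.

    smallest_step is the first value on the scale: 1 for CPU, 0.0625 for Memory.
    """
    scaled = max(smallest_step, next_power_of_two(required))
    # Assumption: keep the current size if it covers the requirement
    # and is no larger than the next step on the scale.
    if required <= current <= scaled:
        return current
    return scaled

def scale_disk(required_gb):
    """Disk always rounds up to the next whole GB, regardless of current size."""
    return max(1, math.ceil(required_gb))

print(scale_cpu_or_memory(current=6, required=5, smallest_step=1))  # 6
print(scale_cpu_or_memory(current=9, required=5, smallest_step=1))  # 8.0
print(scale_disk(1.4))                                              # 2
```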

Step 3: Comparison and Recommendation

Blank recommendation

The recommendation is blank if there is less than two days’ worth of metric data available.

No data

The recommendation is No data if the source machine is not running or not active.

No recommended change

The recommendation result is No recommended change if the calculated ideal configuration matches the existing configuration of the source machine.

Rightsized recommendation

The rightsized recommendation is the ideal machine configuration that most closely matches the source machine’s metrics requirements.

For example, consider that the source machine has this configuration: 2 Cores, 16 GB Memory, 50 GB Disk.

If only CPU of this machine is underutilized, Tanzu CloudHealth recommends that you downgrade to a machine with 1 Core, 16 GB Memory, and 50 GB Disk.

If both CPU and Memory are underutilized, Tanzu CloudHealth recommends that you downgrade to a machine with 1 Core and, depending on memory usage, less Memory (for example, 8 GB).

Termination recommendation

The recommendation result is Terminate instance if all these conditions are true for the candidate machine:

Average CPU utilization is less than 1%.
Average memory utilization is 0%.
Average disk usage is 0%.
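
As a minimal sketch, the three conditions combine with a logical AND. The function and parameter names are hypothetical; all values are average utilization percentages over the analysis period.

```python
def should_terminate(avg_cpu_pct, avg_memory_pct, avg_disk_pct):
    """Return True only when all three termination conditions hold."""
    return avg_cpu_pct < 1.0 and avg_memory_pct == 0.0 and avg_disk_pct == 0.0

print(should_terminate(0.4, 0.0, 0.0))  # True
print(should_terminate(0.4, 2.5, 0.0))  # False
```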

Machine Rightsizing Report

Insights This Report Provides

The Machine Rightsizing report provides the following insights:

How well your machines are being utilized in terms of the workloads you are running on them.
How various metrics such as CPU, Memory, and Disk contribute to machine utilization.
Opportunities for rightsizing machines, thereby improving performance and optimizing workloads.

Sources of Machine Metrics

In order to understand machine performance and utilization on both granular and macro levels, Tanzu CloudHealth ingests data on CPU, Memory, and Disk. These metrics are gathered through the Tanzu CloudHealth Agent and VMware Aggregator.

How Tanzu CloudHealth Interprets Machine Utilization

Tanzu CloudHealth gathers CPU, Memory, and Disk metrics for each machine in your infrastructure. Each metric is then assigned a numeric value called a Score. The individual metric scores are used to compute a Total Score for each machine.

While the individual metric scores indicate the utilization of each metric, the Total Score represents how well each machine is being utilized.

Scoring Mechanism

Tanzu CloudHealth bases metric scores and total scores on utilization thresholds that you specify in the Instance Rightsizing Policy.

Use the Severely underutilized when and Moderately underutilized when categories to specify the thresholds that reflect your internal business standards for a metric. When the utilization for a metric lies within a specific range, a numeric score is assigned to the metric.

There are three threshold categories:

Category	Score Range for Metric
Severely underutilized	0 to 33
Moderately underutilized	34 to 67
Well utilized	68 to 100

The thresholds you specify are only used to score each resource. Tanzu CloudHealth rightsizing recommendations, however, do not consider your thresholds; instead, they consider the source machine’s metrics.

Suppose you define the threshold for Severely underutilized as <40%. If the average usage for a metric is measured as 20%, the usage is halfway through the threshold range (0%–40%).

Consequently, the corresponding score is halfway through the score range for the Severely underutilized category (0 to 33). It is calculated as 50% of 33, which is 16.5 and rounds to 17.

In addition to specifying the thresholds for each category, you can also assign a weight to each metric, including lowering a weight to 0. Tanzu CloudHealth uses the weights you assign to calculate the Total Score for each instance as follows:

Total Score = f(CPU Score, Memory Score, Disk Score)
Here, f is the weighted average, that is, the sum of the weighted scores of the individual metrics.

Weighted Score = (Weight/Sum of weights) * Score

If you assign CPU a weight of 2, and assign Memory and Disk each a weight of 1, the weighted score for each metric is calculated as follows:

Weighted CPU Score = (2/4) * Score
Weighted Memory Score = (1/4) * Score
Weighted Disk Score = (1/4) * Score
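
The weighted average can be sketched in a few lines. The function name, the dictionary layout, and the example scores are hypothetical; only the weighting formula comes from the text above.

```python
def total_score(scores, weights):
    """Weighted average of metric scores: sum of (weight / sum of weights) * score."""
    total_weight = sum(weights[metric] for metric in scores)
    if total_weight == 0:
        raise ValueError("at least one metric needs a non-zero weight")
    return sum(weights[metric] / total_weight * scores[metric] for metric in scores)

# CPU weighted 2, Memory and Disk weighted 1 each; scores are made-up values.
scores = {"CPU": 84, "Memory": 40, "Disk": 20}
weights = {"CPU": 2, "Memory": 1, "Disk": 1}
print(total_score(scores, weights))  # 57.0
```

Here the CPU score contributes (2/4) * 84 = 42, and Memory and Disk contribute (1/4) * 40 = 10 and (1/4) * 20 = 5. Setting a weight to 0 removes that metric from the Total Score.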

Example: Machine Scoring

Let’s consider these threshold specifications for the CPU utilization of a machine. Based on the settings in this policy, the usage thresholds are defined as follows:

Category	CPU Usage	Score
Severely underutilized	<20%	0 to 33
Moderately underutilized	20%–49%	34 to 67
Well utilized	>=50%	68 to 100

Here is how the score for CPU usage changes.

CPU Usage	Score
50%	`68`
75%	`68 + [(100 - 68)/2] = 84`
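
The scoring examples in this section are consistent with linear interpolation inside each category. The sketch below assumes that interpolation applies to all three categories (the text only shows it for Severely underutilized and Well utilized) and that halves round up; the function name and parameters are hypothetical.

```python
import math

def metric_score(usage_pct, severe_below, moderate_below):
    """Map a utilization percentage to a 0-100 score by linear interpolation.

    severe_below:   usage under this value is Severely underutilized (0 to 33).
    moderate_below: usage under this value is Moderately underutilized (34 to 67);
                    anything at or above it is Well utilized (68 to 100).
    """
    usage = min(max(usage_pct, 0.0), 100.0)
    if usage < severe_below:
        low, high, start, end = 0, 33, 0.0, severe_below
    elif usage < moderate_below:
        low, high, start, end = 34, 67, severe_below, moderate_below
    else:
        low, high, start, end = 68, 100, moderate_below, 100.0
    fraction = (usage - start) / (end - start) if end > start else 0.0
    # Round half up, so that 16.5 becomes 17 as in the earlier example.
    return math.floor(low + fraction * (high - low) + 0.5)

print(metric_score(50, severe_below=20, moderate_below=50))  # 68
print(metric_score(75, severe_below=20, moderate_below=50))  # 84
# Earlier example: Severely underutilized below 40%; the 70% value is a placeholder.
print(metric_score(20, severe_below=40, moderate_below=70))  # 17
```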

How to Interpret the Machine Rightsizing Report

Find the Machine Rightsizing report at Recommendations > Rightsizing > Machine Rightsizing.

This report highlights underutilized machines.

How to Analyze Scores

Scores for individual metrics and the Total Score are visually represented as battery meters with colored zones. Lower scores are indicated by fewer bars colored red through orange. Higher scores are indicated by more bars colored orange through green.

Hover over the battery meter for a metric to get more details. Individual metric scores are calculated using average performance over the current month or previous month. The hover text shows the minimum and maximum performance measured during that period.

Click the battery meter for a metric for deeper trend analysis.

How to Use Efficiency

Efficiency is not one of the default columns in this report. To display it, click Edit Columns and add it.

The Efficiency of a machine ranges from 0 to 100, and it is based on the Total Score for that machine and the cost of running it.

For example, a severely underutilized machine that is very expensive has a very low Efficiency, while a severely underutilized machine that is very inexpensive has a higher Efficiency.

Use Efficiency to prioritize which machines to rightsize first in order to maximize savings.

How to Interpret the Recommendation

A recommendation is the course of action Tanzu CloudHealth computes after analyzing machine utilization.

Evaluate each recommendation against internal business knowledge of your infrastructure before acting on it.

How to Interpret Projected Cost

Projected Total Cost is calculated as follows:

Projected Total Cost = (Machine usage hours over the analysis period) × (Hourly machine price you provide in the Tanzu CloudHealth platform)
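
As a quick illustration of the calculation, with a hypothetical function name and made-up figures:

```python
def projected_total_cost(usage_hours, hourly_price):
    """Usage hours over the analysis period multiplied by the hourly machine price."""
    return usage_hours * hourly_price

# A machine that ran 720 hours at an hourly price of 0.25 (in your currency).
print(projected_total_cost(720, 0.25))  # 180.0
```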