This section provides guidance on CPU considerations for VMware Cloud on AWS hosts.
CPU virtualization adds varying amounts of overhead depending on the percentage of the virtual machine’s workload that can be run on the physical processor as is and the cost of virtualizing the remainder of the workload:
For many workloads, CPU virtualization adds only a very small amount of overhead, resulting in performance essentially comparable to native.
Many workloads to which CPU virtualization does add overhead are not CPU-bound—that is, most of their time is spent waiting for external events such as user interaction, device input, or data retrieval, rather than running instructions. Because in this case otherwise-unused CPU cycles are available to absorb the virtualization overhead, these workloads will typically have throughput similar to native, but potentially with a slight increase in latency.
For a small percentage of workloads, for which CPU virtualization adds overhead and which are CPU-bound, there might be a noticeable degradation in both throughput and latency.
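The three cases above can be illustrated with a deliberately simplified model. This model is an assumption made for illustration, not a VMware formula. It treats a request as native CPU time plus waiting time. A fraction `direct_fraction` of the CPU work runs on the physical processor as is, and the remainder runs `virt_cost` times slower. Waiting on external events is assumed to be unaffected. All function names, parameter names, and sample values are hypothetical.

```python
def virtualized_latency(cpu_time: float, wait_time: float,
                        direct_fraction: float, virt_cost: float) -> float:
    """Estimate per-request latency under a simplified, hypothetical overhead model.

    cpu_time:        native CPU time per request (seconds)
    wait_time:       time spent waiting on user input, devices, or data (seconds)
    direct_fraction: share of the CPU work that runs on the physical CPU as is (0..1)
    virt_cost:       slowdown multiplier (>= 1) applied to the remaining share
    """
    if not 0.0 <= direct_fraction <= 1.0:
        raise ValueError("direct_fraction must be between 0 and 1")
    if virt_cost < 1.0:
        raise ValueError("virt_cost must be at least 1")
    # Only the CPU portion is inflated; waiting time is assumed unchanged.
    effective_cpu = cpu_time * (direct_fraction + (1.0 - direct_fraction) * virt_cost)
    return wait_time + effective_cpu


def relative_slowdown(cpu_time: float, wait_time: float,
                      direct_fraction: float, virt_cost: float) -> float:
    """Fractional latency increase versus native (0.05 means 5% slower)."""
    native = cpu_time + wait_time
    return virtualized_latency(cpu_time, wait_time, direct_fraction, virt_cost) / native - 1.0


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to contrast the three cases above.
    cases = {
        "mostly runs as is": (1.00, 0.00, 0.99, 1.4),
        "not CPU-bound":     (0.01, 0.99, 0.50, 1.4),
        "CPU-bound, heavy":  (1.00, 0.00, 0.50, 1.4),
    }
    for name, args in cases.items():
        print(f"{name:20s} {relative_slowdown(*args):6.1%}")
```

With these made-up inputs, the first two cases stay below 1% (0.4% and 0.2%). The CPU-bound case with a large virtualized share shows a 20% increase. This matches the qualitative pattern described above, although real overheads depend on the workload and hardware.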
The rest of this section lists practices and configurations recommended by VMware for optimal CPU performance.
In most environments VMware Cloud on AWS allows significant levels of CPU overcommitment (that is, running more vCPUs on a host than the total number of physical processor cores in that host) without impacting virtual machine performance.
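The overcommitment ratio defined above is simple arithmetic. You add up the vCPUs configured across the virtual machines on a host, then divide by the host's physical cores. The sketch below assumes you have already collected those counts. The function name, VM sizes, and core count are hypothetical and do not describe any particular VMware Cloud on AWS host type.

```python
from typing import Iterable


def cpu_overcommit_ratio(vcpus_per_vm: Iterable[int], physical_cores: int) -> float:
    """Return total configured vCPUs divided by physical cores on one host.

    A value above 1.0 means the host is CPU-overcommitted in the sense used above.
    """
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    total_vcpus = sum(vcpus_per_vm)
    return total_vcpus / physical_cores


# Hypothetical 16-core host running VMs with these vCPU counts.
vm_sizes = [8, 8, 4, 4, 4, 2, 2]
ratio = cpu_overcommit_ratio(vm_sizes, physical_cores=16)
print(f"{sum(vm_sizes)} vCPUs on 16 cores -> {ratio:.2f}:1")  # 32 vCPUs on 16 cores -> 2.00:1
```

A ratio above 1 is not a problem in itself. Whether the host is saturated depends on actual CPU demand, and host CPU monitoring is what reveals that.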
If a host becomes CPU saturated (that is, the virtual machines and other loads on the host demand all the CPU resources the host has), latency-sensitive workloads might not perform well. In this case Distributed Resource Scheduler (DRS) will attempt to migrate one or more virtual machines to another host if there's one available with sufficient resources. You might also want to reduce the CPU load (by powering off some virtual machines, for example).
It is a good idea to periodically monitor the CPU usage of the host. You can do this through the vSphere Client, with the VMware vRealize® Operations™ management suite, or with resxtop. The following describes how to interpret resxtop data:
A load average of 1 or greater on the first line of the resxtop CPU panel indicates that the system is overloaded.
The usage percentage for the physical CPUs on the PCPU line can be another indication of a possibly overloaded condition. In general, 80% usage is a reasonable ceiling, and 90% should be a warning that the CPUs are approaching an overloaded condition. However, organizations will have varying standards regarding the desired load percentage.
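The two rules of thumb above can be expressed as a small check. This sketch assumes you read the load average and the PCPU usage percentage from resxtop yourself. It does not parse resxtop output, and the function name and status labels are hypothetical. The 80% and 90% thresholds are parameters because, as noted, organizations set different limits.

```python
def assess_host_cpu(load_average: float, pcpu_usage_pct: float,
                    ceiling_pct: float = 80.0, warning_pct: float = 90.0) -> str:
    """Classify host CPU load using the resxtop rules of thumb (hypothetical labels).

    load_average:   load average from the first line of the resxtop CPU panel
    pcpu_usage_pct: physical CPU usage percentage from the PCPU line
    """
    if ceiling_pct > warning_pct:
        raise ValueError("ceiling_pct should not exceed warning_pct")
    if load_average >= 1.0:
        return "overloaded"                      # load average of 1 or more
    if pcpu_usage_pct >= warning_pct:
        return "warning: approaching overload"   # default: 90% or more
    if pcpu_usage_pct >= ceiling_pct:
        return "at or above recommended ceiling" # default: 80% or more
    return "ok"


print(assess_host_cpu(0.45, 62.0))  # ok
print(assess_host_cpu(0.90, 85.0))  # at or above recommended ceiling
print(assess_host_cpu(1.20, 99.0))  # overloaded
```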
Although esxtop can’t be used in VMware Cloud on AWS, resxtop can. See the blog post ESXTOP and VMware Cloud on AWS for guidance. For information about using resxtop, see Performance Monitoring Utilities: resxtop and esxtop in vSphere Monitoring and Performance.
Configuring a virtual machine with more virtual CPUs (vCPUs) than its workload can use might cause slightly increased resource usage, potentially impacting performance on very heavily loaded systems. Common examples of this include a single-threaded workload running in a multiple-vCPU virtual machine or a multi-threaded workload in a virtual machine with more vCPUs than the workload can effectively use.
Even if the guest operating system doesn’t use some of its vCPUs, configuring virtual machines with those vCPUs still imposes some small resource requirements that translate to real CPU consumption on the host. For example:
Maintaining a consistent memory view among multiple vCPUs can consume additional resources, both in the guest operating system and in the host.
Most guest operating systems run an idle loop during periods of inactivity. Within this loop, most of these guest operating systems halt by running the HLT or MWAIT instructions. Some very old guest operating systems, however, use busy-waiting within their idle loops. This results in the consumption of resources that might otherwise be available for other uses (other virtual machines, the VMkernel, and so on).
VMware Cloud on AWS automatically detects these loops and de-schedules the idle vCPU. Though this reduces the CPU overhead, it can also reduce the performance of some I/O-heavy workloads. For additional information see VMware KB articles 1077 and 2231.
The guest operating system’s scheduler might migrate a single-threaded workload amongst multiple vCPUs, thereby losing cache locality.
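Configuring more vCPUs than a workload can use has the costs listed above. One way to avoid this is to size a virtual machine from its observed peak CPU demand rather than from a guess. The following is a hypothetical heuristic, not a VMware recommendation. It assumes you have measured peak demand in vCPU-equivalents; for example, a workload that peaks at 130% of one vCPU has a peak demand of 1.3. The heuristic picks enough vCPUs to keep each one below a target utilization. It never exceeds the number of threads the workload can actually run in parallel, which covers the single-threaded case.

```python
import math


def suggest_vcpu_count(peak_demand_vcpus: float, max_parallel_threads: int,
                       target_utilization: float = 0.8) -> int:
    """Suggest a vCPU count for a VM (hypothetical right-sizing heuristic).

    peak_demand_vcpus:    observed peak CPU demand in vCPU-equivalents
    max_parallel_threads: how many threads the workload can run at once
    target_utilization:   desired peak utilization per vCPU (0 < x <= 1)
    """
    if peak_demand_vcpus < 0:
        raise ValueError("peak_demand_vcpus must be non-negative")
    if max_parallel_threads < 1:
        raise ValueError("max_parallel_threads must be at least 1")
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    # Enough vCPUs to keep each one at or below the target at peak...
    needed = math.ceil(peak_demand_vcpus / target_utilization)
    # ...but vCPUs beyond the workload's parallelism only add overhead.
    return max(1, min(needed, max_parallel_threads))


print(suggest_vcpu_count(0.95, max_parallel_threads=1))  # 1 (single-threaded)
print(suggest_vcpu_count(1.3, max_parallel_threads=4))   # 2
```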
Some workloads can easily be split across multiple virtual machines. In some cases, for the same total number of vCPUs, a larger number of smaller virtual machines (sometimes called “scaling out”) will provide better performance than fewer, larger virtual machines (sometimes called “scaling up”). In other cases the opposite is true, and fewer, larger virtual machines will perform better. These differences can stem from a number of factors, including NUMA node sizes, CPU cache locality, and workload implementation details. The best choice is determined by experimenting with your specific workload in your environment.
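Before running such experiments, it can help to list the scale-out and scale-up layouts that keep the total vCPU count fixed. The sketch below enumerates every way to split a vCPU budget evenly across identical virtual machines. It flags layouts whose per-VM vCPU count exceeds a hypothetical `cores_per_numa_node` value, since NUMA node size is one of the factors mentioned. Comparing vCPUs directly against physical cores per node is a simplifying assumption. The names `Layout` and `candidate_layouts` are illustrative only. The list only narrows the candidates; which layout performs best still has to be measured.

```python
from typing import List, NamedTuple


class Layout(NamedTuple):
    vm_count: int          # number of identical virtual machines
    vcpus_per_vm: int      # vCPUs configured on each virtual machine
    fits_numa_node: bool   # True if one VM's vCPUs fit within one NUMA node


def candidate_layouts(total_vcpus: int, cores_per_numa_node: int) -> List[Layout]:
    """List even splits of total_vcpus, from most scaled-out to most scaled-up."""
    if total_vcpus < 1 or cores_per_numa_node < 1:
        raise ValueError("arguments must be positive")
    layouts = []
    for per_vm in range(1, total_vcpus + 1):
        if total_vcpus % per_vm == 0:  # only splits with identical VM sizes
            layouts.append(Layout(total_vcpus // per_vm, per_vm,
                                  per_vm <= cores_per_numa_node))
    return layouts


# Hypothetical budget of 16 vCPUs on hosts with 12 cores per NUMA node.
for layout in candidate_layouts(16, cores_per_numa_node=12):
    print(layout)
```

Under these assumptions, every layout except the single 16-vCPU virtual machine fits within one NUMA node. Those layouts are natural starting points for the comparison.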