This topic explains how to improve VMware Tanzu GemFire performance on vSphere.
Operating System Guidelines
Use the latest supported version of the guest operating system, and enable Java large page support.
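The following is a minimal sketch of enabling large pages for a Tanzu GemFire server JVM on Linux. The huge-page count, heap sizes, and server name are illustrative assumptions; size them for your own virtual machine.
# Reserve 2 MB huge pages in the guest OS (roughly 17 GB here; adjust for your heap size)
sysctl -w vm.nr_hugepages=8704
# Start the server with large pages enabled in the HotSpot JVM
gfsh start server --name=server1 --initial-heap=16g --max-heap=16g --J=-XX:+UseLargePages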
NUMA, CPU, and BIOS Settings
This section provides VMware-recommended NUMA, CPU, and BIOS settings for your hardware and virtual machines.
- Always enable hyper-threading, and do not overcommit CPU.
- For most production VMware Tanzu GemFire servers, always use virtual machines with at least two vCPUs.
- Preserve non-uniform memory access (NUMA) locality by sizing each virtual machine to fit within a single NUMA node (see the NUMA layout example following this list).
- VMware recommends the following BIOS settings:
- BIOS Power Management Mode: Maximum Performance.
- CPU Power and Performance Management Mode: Maximum Performance.
- Processor Settings: Turbo Mode activated.
- Processor Settings: C States deactivated.
Note: Settings may vary slightly depending on your hardware make and model. Use the settings above or equivalents as needed.
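As a rough sketch, you can inspect a host's NUMA layout from the ESXi shell before sizing virtual machines; output fields vary by ESXi version:
# Show total physical memory and the NUMA node count for the host
esxcli hardware memory get
# List logical CPUs with their package (socket) IDs
esxcli hardware cpu list
Dividing the host's physical memory by its NUMA node count gives a rough upper bound on the memory a single virtual machine can use while staying within one NUMA node.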
Physical and Virtual NIC Settings
These guidelines help you reduce latency.
- Physical NIC: VMware recommends that you deactivate interrupt coalescing on the physical NIC of your ESXi host by using the following command:
ethtool -C vmnicX rx-usecs 0 rx-frames 1 rx-usecs-irq 0 rx-frames-irq 0
where vmnicX is the physical NIC, as reported by the ESXi command:
esxcli network nic list
You can verify that your settings have taken effect by issuing the command:
ethtool -c vmnicX
If you restart the ESXi host, the above configuration must be reapplied.
Note: Deactivating interrupt coalescing can reduce latency in virtual machines; however, it can impact performance and cause higher CPU utilization. It can also defeat the benefits of large receive offload (LRO), because some physical NICs (such as Intel 10GbE NICs) automatically deactivate LRO when interrupt coalescing is deactivated. This type of tuning benefits latency-sensitive Tanzu GemFire workloads, but it can hurt other workloads that are throughput-bound rather than latency-sensitive. For more information, see Poor TCP performance might occur in Linux virtual machines with LRO enabled.
- Virtual NIC: Use the following guidelines when configuring your virtual NICs:
- Use VMXNET3 virtual NICs for your latency-sensitive or otherwise performance-critical virtual machines. For details about selecting the appropriate type of virtual NIC for your virtual machine, see Choosing a network adapter for your virtual machine.
- VMXNET3 supports adaptive interrupt coalescing that can help drive high throughput to virtual machines that have multiple vCPUs with parallelized workloads (multiple threads), while minimizing latency of virtual interrupt delivery. However, if your workload is extremely sensitive to latency, VMware recommends that you deactivate virtual interrupt coalescing for your virtual NICs. You can do this programmatically via API or by editing your virtual machine’s .vmx configuration file. Refer to your vSphere API Reference or VMware ESXi documentation for specific instructions.
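For example, one way to deactivate virtual interrupt coalescing for a VMXNET3 adapter is an advanced parameter in the virtual machine's .vmx file. The adapter index ethernet0 below is an assumption; verify the parameter name against your vSphere documentation and apply it while the virtual machine is powered off:
ethernet0.coalescingScheme = "disabled"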
VMware vSphere vMotion and DRS Cluster Usage
This section discusses limitations on the use of vSphere vMotion, including its use with DRS.
When vMotion migrations occur, there is an expected temporary drop in the performance of both read-operation and write-operation workloads. These workloads resume their normal rate of operation once the vMotion migration of the servers is completed.
VMware recommends that all vMotion migration activity of VMware Tanzu GemFire members occur over 10GbE, during periods of low activity and scheduled maintenance windows. Test vMotion migrations in your own environment to assess differences in workload, networking, and scale.
If you wish to prevent automatic VMware vSphere vMotion® operations that can affect response times, place VMware vSphere Distributed Resource Scheduler™ (DRS) in manual mode when you first commission the data management system.
Placement and Organization of Virtual Machines
This section provides guidelines on JVM instances and placement of redundant copies of cached data.
- Have one JVM instance per virtual machine.
- Increasing the heap space to service the demand for more data is better than installing a second JVM instance on a single virtual machine. If increasing the JVM heap size is not an option, consider placing the second JVM on a separate, newly created virtual machine, which promotes more effective horizontal scalability. As you increase the number of VMware Tanzu GemFire servers, also increase the number of virtual machines to maintain a 1:1:1 ratio among VMware Tanzu GemFire servers, JVMs, and virtual machines.
- Size virtual machines with a minimum of four vCPUs, with one VMware Tanzu GemFire server running in one JVM instance per virtual machine. This leaves ample CPU cycles for the garbage collector and the remainder for user transactions (see the startup example following this list).
- Because VMware Tanzu GemFire can place redundant copies of cached data on any virtual machine, it is possible to inadvertently place two redundant data copies on the same ESX/ESXi host. This is not optimal if a host fails. To create a more robust configuration, use VM1-to-VM2 anti-affinity rules to indicate to vSphere that VM1 and VM2 must never be placed on the same host because they hold redundant data copies.
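The following is an illustrative sketch of the 1:1:1 layout: each virtual machine runs a single VMware Tanzu GemFire server in its own JVM, with the heap sized to the virtual machine's reserved memory. The member name, locator address, and heap sizes are assumptions for illustration only.
# One Tanzu GemFire server per virtual machine, heap sized to the VM's reserved memory
gfsh start server --name=server1 --locators=locator-host[10334] --initial-heap=8g --max-heap=8g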
Virtual Machine Memory Reservation
This section provides guidelines for sizing and setting memory.
- Set memory reservation at the virtual machine level so that ESXi provides and locks down the needed physical memory upon virtual machine startup. Once allocated, ESXi does not allow the memory to be taken away.
- Do not overcommit memory for Tanzu GemFire hosts.
- When sizing memory for a Tanzu GemFire server within one JVM on one virtual machine, the total reserved memory for the virtual machine should not exceed what is available within one NUMA node for optimal performance.
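Memory reservations are normally set in the vSphere Client (Edit Settings > Memory > Reservation, with the option to reserve all guest memory). As a hedged sketch, equivalent advanced parameters can also appear in the virtual machine's .vmx file; the 32 GB value (expressed in MB) and the parameter names below are assumptions to verify against your vSphere documentation:
sched.mem.min = "32768"
sched.mem.pin = "TRUE"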
vSphere High Availability and VMware Tanzu GemFire
On VMware Tanzu GemFire virtual machines, deactivate vSphere High Availability (HA).
If you are using a dedicated VMware Tanzu GemFire DRS cluster, then you can deactivate HA across the cluster. However, if you are using a shared cluster, exclude Tanzu GemFire virtual machines from vSphere HA.
Additionally, to support high availability, you can set up anti-affinity rules between the VMware Tanzu GemFire virtual machines to prevent two VMware Tanzu GemFire servers from running on the same ESXi host within the same DRS cluster.
Storage Guidelines
This section provides storage guidelines for persistence files, binaries, logs, and more.
- Use the PVSCSI driver for I/O intensive VMware Tanzu GemFire workloads.
- Align disk partitions at the VMFS and guest operating system levels.
- Provision VMDK files as eagerzeroedthick to avoid lazy zeroing for VMware Tanzu GemFire members.
- Use separate VMDKs for VMware Tanzu GemFire persistence files, binaries, and logs.
- Map a dedicated LUN to each VMDK.
- For Linux virtual machines, use NOOP scheduling as the I/O scheduler instead of Completely Fair Queuing (CFQ). Starting with the 2.6 Linux kernel, CFQ is the default I/O scheduler in many Linux distributions. For more information, see Tuning options for disk I/O performance in Linux 2.6 kernel-based virtual machines.
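The following is a minimal sketch of switching a data disk to the NOOP scheduler inside a Linux guest. The device name sda is an assumption; on newer kernels that use blk-mq, the equivalent choice is the none scheduler.
# Show the available schedulers; the active one appears in brackets
cat /sys/block/sda/queue/scheduler
# Switch to NOOP for the current boot; use the elevator=noop kernel boot parameter to persist it
echo noop > /sys/block/sda/queue/scheduler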
Additional Resources
These older VMware publications provide additional resources on optimizing for vSphere.