Intel Optane Persistent Memory (PMem) can be configured in Memory Mode, in which the hardware uses DRAM as a hidden cache and exposes PMem as the system memory. Although PMem is cheaper than DRAM, it has higher access latency, which can lead to performance degradation.
Problem
- If the active memory exceeds some percentage of the available DRAM, VM performance can degrade because memory accesses may have to go to PMem.
- Because of the hardware implementation, any two VMs can experience a high level of page collisions, which degrades VM performance even if the available DRAM is fully utilized.
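The first condition above amounts to a simple ratio check. The following is a minimal sketch of that check only, not of how vMMR evaluates it: the function names and the 75 percent threshold are hypothetical, chosen for illustration, since the text specifies only "some percentage".

```python
def active_dram_pct(active_memory_gb: float, dram_gb: float) -> float:
    """Return active memory as a percentage of the available DRAM."""
    if dram_gb <= 0:
        raise ValueError("dram_gb must be positive")
    return 100.0 * active_memory_gb / dram_gb


def exceeds_dram_budget(active_memory_gb: float, dram_gb: float,
                        threshold_pct: float = 75.0) -> bool:
    """True when active memory exceeds threshold_pct of DRAM.

    threshold_pct is a hypothetical value, not a documented default.
    """
    return active_dram_pct(active_memory_gb, dram_gb) > threshold_pct


if __name__ == "__main__":
    # Hypothetical host: 96 GB of DRAM acting as cache, 80 GB of active memory.
    print(exceeds_dram_budget(80, 96))
```

A host that crosses the threshold is only a candidate for further analysis; the page collision case in the second bullet cannot be detected from this ratio alone.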
Solution
vSphere performs real-time monitoring using vSphere Memory Monitoring and Remediation (vMMR). vMMR collects both host-level and VM-level memory statistics, such as DRAM and PMem bandwidth, latency, and miss rate, which provide additional insight. These statistics help you determine whether the host is experiencing issues because it runs in Memory Mode and whether the workload needs to be redistributed. If the analysis indicates that some workloads suffer performance degradation because they run on systems configured in Memory Mode, you can migrate the VMs from the current host to other hosts to balance the load.
- Two preconfigured default alarms are based on the newly collected statistics: one at the host level (Host Memory Mode High Active DRAM Usage) and one at the VM level (Virtual Machine High PMem Bandwidth Usage). When an alarm condition is met, an event is published to trigger the corresponding alarm. A triggered alarm indicates that there may be an issue with Memory Mode on this system. You can use the performance charts to analyze whether it is a real problem.
- You can also create custom alarms based on the new performance metrics at the cluster, host, or VM level. For example, you can create an alarm that triggers when the PMem bandwidth is observed to be higher than some value. vMMR alarms work only on systems configured in Memory Mode. For more information on how to create a custom alarm, see the Create or Edit Alarms section.
- If the host is experiencing a performance issue, you can narrow it down to a CPU, memory, disk, or network issue by looking at the existing performance charts.
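The custom alarm example above (PMem bandwidth higher than some value) can be illustrated as a threshold check over a series of samples. This is a sketch of the idea only and does not use or represent the vSphere alarm API: the function name, the sample values, the threshold, and the requirement of several consecutive samples are all assumptions made for illustration.

```python
from typing import Iterable


def sustained_breach(samples: Iterable[float], threshold: float,
                     min_consecutive: int = 3) -> bool:
    """True if `samples` stays above `threshold` for at least
    `min_consecutive` samples in a row.

    Requiring consecutive samples is an assumption added here to
    avoid reacting to a single short spike.
    """
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical PMem bandwidth samples in MB/s.
    pmem_bw = [100, 900, 950, 990, 200]
    print(sustained_breach(pmem_bw, threshold=800))
```

As the text notes, a breach only suggests a possible Memory Mode issue; the performance charts are still needed to confirm it.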
In the vSphere Client, a new Memory pane is added under the Performance tab for both hosts and VMs. The host-level performance chart displays the read and write bandwidth and the miss rate for the different memory types (DRAM and PMem). The VM-level performance chart displays the DRAM and PMem read bandwidth of the VM. These performance charts help you analyze the statistics and determine whether your application workload has regressed because of Memory Mode. For example, significantly higher PMem bandwidth indicates issues originating from Memory Mode that can be investigated further.
- You can also plot custom performance charts at the host and VM level by using the Advanced option and selecting Memory Mode related metrics.
- From the VMs tab of an ESXi host, you can view a list containing performance information about all virtual machines that reside on the host. To display information about the Memory Mode impact on a virtual machine, click the view columns icon and select the newly added Active Memory, DRAM Read Bandwidth, and PMem Read Bandwidth metrics. This is helpful in identifying the most impacted VMs.
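Once the three columns are visible, one way to compare VMs is to rank them by the share of read bandwidth served from PMem. The sketch below assumes the values have been copied out of the VMs tab by hand; the `VmMemoryStats` type, its field names, the sample numbers, and the ranking rule itself are hypothetical and are not part of vSphere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmMemoryStats:
    """Hypothetical record for one row of the VMs tab."""
    name: str
    active_memory_mb: float
    dram_read_bw_mbps: float
    pmem_read_bw_mbps: float


def pmem_read_share(vm: VmMemoryStats) -> float:
    """Fraction of the VM's read bandwidth that is served from PMem."""
    total = vm.dram_read_bw_mbps + vm.pmem_read_bw_mbps
    return vm.pmem_read_bw_mbps / total if total > 0 else 0.0


def most_impacted(vms: List[VmMemoryStats], top_n: int = 3) -> List[VmMemoryStats]:
    """Return up to top_n VMs, highest PMem read share first."""
    return sorted(vms, key=pmem_read_share, reverse=True)[:top_n]


if __name__ == "__main__":
    # Illustrative values only.
    vms = [
        VmMemoryStats("vm-a", 4096, 900.0, 100.0),
        VmMemoryStats("vm-b", 8192, 300.0, 700.0),
        VmMemoryStats("vm-c", 2048, 500.0, 500.0),
    ]
    for vm in most_impacted(vms, top_n=2):
        print(vm.name, round(pmem_read_share(vm), 2))
```

Ranking by share rather than by absolute PMem bandwidth is a design choice made here so that small but heavily affected VMs are not hidden behind large ones; sorting by the raw PMem Read Bandwidth column is an equally reasonable first pass.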
For more information on vMMR, see the vSphere Memory Monitoring and Remediation documentation.