
I/O Performance with FIO

FIO is a widely used I/O microbenchmark. We used it to quantify the virtualization overhead of PMEM and to measure raw bandwidth and latency.

Highlights

  • The virtualization overhead of PMEM is less than 3%.
  • The vPMEM-aware configuration can give up to 8x the bandwidth of an NVMe SSD.
  • The latency with vPMEM configurations is less than 1 microsecond.
  • The vPMEM-aware configuration can achieve bandwidth close to the device (memory) bandwidth.

Configuration

Table 4 gives the configuration of the FIO VM.

OS           CentOS 7.4
CPU          4 vCPU
vRAM         16 GB
NVMe SSD     21 GB
vPMEMDisk    21 GB
vPMEM        21 GB

Table 4: FIO VM configuration

Table 5 gives the FIO parameters used.

I/O engines             libaio (default) and libpmem (vPMEM-aware configuration)
Test cases              random read, random read-write (50-50), random write
Threads                 4 for throughput runs; 1 for latency runs
File size               5 GB per thread
Outstanding I/Os (OIOs) 16 for NVMe SSD; 4 for vPMEMDisk; 1 for vPMEM

Table 5: FIO workload configuration
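To make the workload configuration concrete, the sketch below assembles fio command lines from the parameters in Table 5. It is a minimal illustration, not the exact script used in the study: the target device/file paths and the runtime are placeholders, and the helper names are hypothetical; only standard fio options (--ioengine, --rw, --rwmixread, --bs, --numjobs, --size, --iodepth) are used.

```python
# Minimal sketch: build fio command lines from the Table 5 parameters.
# The target paths and runtime below are placeholders, not the values
# used in the actual study.

TESTS = {
    "randread":  "--rw=randread",
    "randrw":    "--rw=randrw --rwmixread=50",  # 50-50 random read-write mix
    "randwrite": "--rw=randwrite",
}

# (ioengine, outstanding I/Os per thread, target) for each configuration
CONFIGS = {
    "nvme-ssd":  ("libaio",  16, "/dev/nvme0n1"),      # placeholder device
    "vpmemdisk": ("libaio",   4, "/dev/sdb"),           # placeholder device
    "vpmem":     ("libpmem",  1, "/mnt/pmem/fio.dat"),  # placeholder DAX-mounted file
}

def fio_cmd(config: str, test: str, threads: int = 4, block_size: str = "4k") -> str:
    """Return a fio command line for one configuration/test combination."""
    engine, iodepth, target = CONFIGS[config]
    return (
        f"fio --name={config}-{test} --filename={target} "
        f"--ioengine={engine} {TESTS[test]} --bs={block_size} "
        f"--numjobs={threads} --size=5g --iodepth={iodepth} "   # --size is per thread
        f"--direct=1 --time_based --runtime=120 --group_reporting"
    )

if __name__ == "__main__":
    for config in CONFIGS:
        for test in TESTS:
            print(fio_cmd(config, test))
```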

Virtualization Overhead

To quantify the virtualization overhead of PMEM, we compared FIO throughput on a bare-metal installation of CentOS with FIO throughput in a CentOS VM running on ESXi. Figure 6 shows the virtual to native ratio.

In all the scenarios, we measured less than 3% overhead. We selected FIO to show the virtualization overhead because microbenchmarks typically stress the system the most and are expected to show the maximum overhead when virtualized.

Figure 6: Virtual to native ratio (FIO 4 KB)
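As a simple illustration of how the ratio in Figure 6 is derived, the snippet below divides VM throughput by bare-metal throughput for one test case. The throughput numbers in the example are placeholders, not measured results.

```python
# Hypothetical helper: virtual-to-native ratio for one FIO test case.
# The example throughput values are placeholders, not measured data.

def virtual_to_native(vm_throughput_mbps: float, bare_metal_throughput_mbps: float) -> float:
    """Ratio of VM throughput to bare-metal throughput (1.0 means no overhead)."""
    return vm_throughput_mbps / bare_metal_throughput_mbps

ratio = virtual_to_native(vm_throughput_mbps=975.0, bare_metal_throughput_mbps=1000.0)
print(f"virtual/native = {ratio:.2f} ({(1 - ratio) * 100:.1f}% overhead)")
```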

Bandwidth and Latency

Figure 7 shows the bandwidth measured in megabytes per second (MBPS) for the different configurations with a 4 KB I/O size. Note that the vPMEM cases are run with 1 thread for a fair comparison with the vNVMe-based configurations, in which the I/O is performed by a single vNVMe world (thread). In the random read case, the vPMEM-aware configuration yields approximately 5x the bandwidth of an NVMe SSD.

In the random write test, vPMEMDisk throughput is slightly lower than that of the NVMe SSD. This is caused by the inefficient implementation of cache flush instructions in current processors; we expect this to improve in next-generation processors.

Figure 7: FIO 4 KB throughput

Figure 8 shows the bandwidth measured with a 512 KB I/O size. In the random read test, the vPMEM-aware case achieved more than 11 gigabytes per second of bandwidth using one thread, around 8x the bandwidth of the NVMe SSD.

Figure 8: FIO 512 KB throughput

Figure 9 shows the raw latency in microseconds with different configurations. Both vPMEM configurations yielded sub-microsecond latencies.

Figure 9: FIO 4 KB latency
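For the latency runs, a single thread with one outstanding I/O is used, as listed in Table 5. The sketch below shows one way such a run could be launched and its mean completion latency extracted from fio's JSON output; the target path is a placeholder, and the JSON field names ("clat_ns") assume a fio 3.x output format.

```python
# Hedged sketch: 4 KB random read latency run with 1 thread and 1 outstanding
# I/O, reading the mean completion latency from fio's JSON output.
# The file path is a placeholder; field names assume fio 3.x JSON output.

import json
import subprocess

cmd = [
    "fio", "--name=latency", "--filename=/mnt/pmem/fio.dat",  # placeholder path
    "--ioengine=libpmem", "--rw=randread", "--bs=4k",
    "--numjobs=1", "--iodepth=1", "--size=5g",
    "--time_based", "--runtime=60", "--output-format=json",
]

result = json.loads(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)
mean_clat_ns = result["jobs"][0]["read"]["clat_ns"]["mean"]
print(f"mean completion latency: {mean_clat_ns / 1000:.2f} us")
```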

For scenarios that demand more I/O bandwidth, we used more FIO threads and observed that the bandwidth scales linearly. To measure the peak memory bandwidth of this system, we used STREAM, which reported 67 gigabytes per second with 8 threads. With 8 FIO threads, the vPMEM-aware configuration achieved 66 gigabytes per second, close to the device (memory) bandwidth.
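A post-processing step along the following lines can sum the per-job read bandwidth from a multi-threaded fio run and compare it against the STREAM number. This is a sketch under assumptions: the result file name is a placeholder, and the "bw_bytes" field assumes fio 3.x JSON output.

```python
# Hedged sketch: sum per-job read bandwidth from a fio JSON result
# (for example, an 8-thread 512 KB random read run) and compare it with
# the peak memory bandwidth reported by STREAM.
# The result file name is a placeholder; field names assume fio 3.x JSON output.

import json

STREAM_PEAK_GBPS = 67.0  # peak memory bandwidth reported by STREAM (8 threads)

with open("fio-8threads-512k.json") as f:  # placeholder file name
    result = json.load(f)

total_bw_gbps = sum(job["read"]["bw_bytes"] for job in result["jobs"]) / 1e9
print(f"aggregate FIO bandwidth: {total_bw_gbps:.1f} GB/s "
      f"({total_bw_gbps / STREAM_PEAK_GBPS:.0%} of STREAM peak)")
```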