This section discusses the storage design for Private AI Ready Infrastructure for VMware Cloud Foundation.

Storage design for Private AI Ready Infrastructure for VMware Cloud Foundation includes the vSAN design for principal and supplemental storage.

For design information on supported storage types in VMware Cloud Foundation, see the VMware Cloud Foundation Design Guide.

vSAN ESA for AI Workloads

vSAN ESA (Express Storage Architecture) is optimized for performance through several mechanisms and architectural enhancements, making it suitable for high-performance workloads, such as AI, when deployed with 100 Gbps or faster networking. vSAN ESA also supports RDMA over Converged Ethernet (RoCE v2). When configured on compatible switches and NICs, RoCE v2 can significantly reduce host CPU utilization and enhance performance. vSAN over RoCE can efficiently handle high-throughput, low-latency data transfers, making it well-suited for demanding AI applications.

Storage platforms must provide high capacity, high performance, substantial bandwidth, and minimal latency to efficiently support the stages of generative AI workflows, including data ingestion, preparation, fine-tuning, and inference. Fine-tuning generative AI models, particularly LLMs with billions of parameters and numerous intermediate outputs, requires considerable performance and storage capacity. Additionally, the storage requirements during AI model inference fluctuate according to the specific needs of the AI model and the deployment environment's characteristics. vSAN ESA provides the essential operating system and container storage capabilities to manage both GPU-resident models and data stored outside GPU memory with the required performance.

Vector databases, a popular inference use case, are an important component of retrieval-augmented generation (RAG) systems. As RAG systems accumulate more data over time, the size of the vector database grows. This growth is influenced by factors such as the frequency of user queries, the diversity of content being indexed, and the rate of new data ingestion. You can store such vector databases on vSAN clusters in a VI workload domain, mitigating noisy-neighbor events and resource contention with production inference workloads.

Vector databases stored on vSAN ESA clusters also benefit from the advanced compression of vSAN ESA, which is enabled by default. Because data is compressed at the top of the storage stack, all data written to other hosts in the cluster is transmitted across the network in a compressed state, optimizing both storage efficiency and network bandwidth utilization.

Providing Read/Write Shared Persistent Volumes for Containerized Workloads with vSAN ESA

Tanzu Kubernetes Grid clusters can use ReadWriteMany persistent volumes backed by vSAN File Service for model repositories, model versioning and management, model ensembles, and the storage and archiving of inference data. vSphere with Tanzu uses cloud native storage (CNS) file volumes backed by vSAN file shares for ReadWriteMany persistent volumes.

To use vSAN file shares, set up vSAN File Service on the vSAN datastore and activate file volume support on the Supervisor.
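As an illustrative sketch, a workload on a Tanzu Kubernetes Grid cluster might request such a file-backed volume through a ReadWriteMany persistent volume claim. The claim and storage class names below are hypothetical; the storage class must correspond to a vSphere storage policy published to your namespace with file volume support active on the Supervisor.

```yaml
# Hypothetical PVC for a shared model repository backed by a vSAN file share.
# "vsan-file-policy" is an assumed storage class name; use a class that the
# Supervisor publishes to your namespace.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: model-repository
spec:
  accessModes:
    - ReadWriteMany          # provisioned as a CNS file volume on vSAN File Service
  storageClassName: vsan-file-policy
  resources:
    requests:
      storage: 500Gi
```

Multiple pods, for example parallel inference replicas, can then mount the same volume for shared read/write access to model artifacts.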

When using vSAN File Service for ReadWriteMany volumes as shared data repositories, consider the following limits:

  • A maximum of 100 file shares per vSAN cluster.

  • The maximum size of a file share is equal to the maximum available capacity of the vSAN cluster.

For more considerations regarding vSAN File Services, see Limitations and Considerations of vSAN File Service in the Administering VMware vSAN documentation.

Note:

External storage solutions can be used as supplemental storage for the VI workload domain. You can use S3-compatible object storage and NFS exports, provided directly to containers and virtual machines alongside vSAN, for shared datasets, model repositories, and archival purposes. This approach is particularly beneficial for highly scalable architectures. For detailed guidance and best practices, see the documentation from your storage vendor.

Design Decisions on Storage Design for Private AI Infrastructure

Table 1. Design Decisions on Storage for Private AI Ready Infrastructure for VMware Cloud Foundation

Decision ID: AIR-STORAGE-001

Design Decision: Use vSAN ESA with 100 Gbps networking and, if possible, RDMA.

Design Justification: Provides high performance and efficiency. Although the minimum bandwidth for vSAN ESA is 25 Gbps, networking at 100 Gbps and faster provides the best bandwidth and latency for all AI use cases.

Design Implications:

  • The cost of the solution is increased.

  • RDMA increases the design complexity.

  • The choice of vSAN ReadyNodes is limited to nodes that are approved for use with vSAN ESA.

Decision ID: AIR-STORAGE-002

Design Decision: Use vSAN ESA RAID 5 or RAID 6 erasure coding.

Design Justification: Provides performance comparable to RAID 1 mirroring with greater space efficiency.

Design Implication: None.

Decision ID: AIR-STORAGE-003

Design Decision: Leave data compression enabled for vSAN ESA.

Design Justification: Transmits data between hosts in the cluster in a compressed state. Data compression in vSAN ESA can be controlled per workload by using storage policies.

Design Implication: None.
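As a sketch of how the erasure-coding decision might surface to containerized workloads: on a Supervisor, Kubernetes storage classes are generated from the vSphere storage policies assigned to a namespace, while clusters using the vSphere CSI driver directly can reference a policy by name. The class and policy names below are assumptions for illustration.

```yaml
# Hypothetical StorageClass mapping to a vSAN ESA erasure-coding policy.
# "RAID-5 Erasure Coding" is an assumed policy name; it must match a storage
# policy defined in vCenter for the vSAN ESA datastore.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsan-esa-raid5
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "RAID-5 Erasure Coding"
```

Persistent volume claims that reference this class then inherit the RAID 5 data placement and the compression behavior defined by the policy.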