Do all platform VMs have to be on the same L2/L3 segment?

No. However, it is best to keep all platform nodes on a common network with low latencies between nodes. This is because many of the distributed components replicate data among the nodes and high latencies can cause system performance and stability issues.
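
To see whether latency between nodes is actually low, one rough approach is to time TCP connections from one platform VM to another. The sketch below is not a product tool. The host name in the usage note is hypothetical, and using port 443 for the probe is an assumption.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host: str, port: int, samples: int = 5,
                           timeout: float = 2.0) -> list[float]:
    """Return the time taken by `samples` TCP connects, in milliseconds."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection completes the TCP handshake, so the elapsed
        # time approximates one network round trip plus local overhead.
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.perf_counter() - start
        results.append(elapsed * 1000.0)
    return results


def summarize(latencies_ms: list[float]) -> dict[str, float]:
    """Reduce a list of samples to min, median, and max."""
    return {
        "min": min(latencies_ms),
        "median": statistics.median(latencies_ms),
        "max": max(latencies_ms),
    }

# Example call (hypothetical host name, assumed port):
#   summarize(tcp_connect_latency_ms("platform2.example.internal", 443))
```

Connect time is only a proxy for replication latency, so treat the numbers as a coarse comparison between candidate networks.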

Can a cluster be upgraded using the in-product upgrade feature?

Online upgrades are not supported for clusters up to and including release 3.7. From release 3.8 onward, a cluster can be upgraded using the online upgrade method.

What happens if there is a failure during the cluster creation process?

It is a best practice to take a backup of the primary platform and the proxies (collector VMs) before starting the cluster creation process. If there is a failure, delete the secondary platform nodes and recover the primary platform and collector VMs from the backup.
Note:
  • If you are unable to take a backup using EMC Avamar or VMware VDP, take snapshots of the platform and collector VMs while the VMs are switched off.
  • Snapshots are not recommended in production environments, and VMware recommends against running VMs with snapshots for more than 3 days.
  • Snapshots are not a substitute for a backup.

What happens to the existing data and configuration when I expand a single-node deployment to a cluster?

All data and configuration are maintained without any change. The data remains accessible after cluster creation.

Can platform VMs be in different regions?

No. The platform nodes must be co-located in the same site. The collector VMs can be geo-distributed.

Can the platform be hosted on vSAN stretch clusters (2 data centers …)?

Yes. vSAN clusters within the same data center or across data centers still ensure I/O performance comparable to local storage.

Can we host cluster nodes on different vSAN clusters?

Yes. Different nodes of a platform cluster can be hosted on different underlying datastores.

Do you need to back up platform nodes?

Yes. Backups must be taken using VMware-recommended snapshot or backup technologies.

How do I estimate the bandwidth between a collector VM in one region and the platform VM cluster in another region?

In some large deployments, we have seen this number range from 1 Mbps to 20 Mbps. A considerable amount of deduplication and compression happens on the collector VM before data is sent to the platform VM.
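
For WAN planning, it can help to convert a sustained rate into a daily transfer volume. The sketch below does that arithmetic for the two ends of the observed range. It assumes decimal units (1 Mbps = 1,000,000 bits per second, 1 GB = 1,000,000,000 bytes) and a constant rate over the whole day, which real traffic will not match exactly.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_gb(rate_mbps: float) -> float:
    """Data moved in one day at a sustained rate, in decimal gigabytes."""
    bits_per_day = rate_mbps * 1_000_000 * SECONDS_PER_DAY
    return bits_per_day / 8 / 1_000_000_000


for rate in (1, 20):
    print(f"{rate} Mbps sustained ~ {daily_volume_gb(rate):.1f} GB/day")
# prints:
#   1 Mbps sustained ~ 10.8 GB/day
#   20 Mbps sustained ~ 216.0 GB/day
```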

How much network traffic is there between cluster nodes?

Traffic usually depends on the size of the cluster and the type of data center environment.

For installations with 30-50k VMs:
  • Between cluster nodes: 50-400 Mbps approx.
  • Between collector and platform: 100 Kbps-15 Mbps approx.

What is the maximum admissible latency between nodes in a cluster?

The platform nodes must be co-located in the same site, so the latency between them is minimal. If the platform nodes are hosted on vSAN stretch clusters (two data centers), the vSAN clusters within or across the data centers still ensure I/O performance comparable to local storage, and applications running in those data centers, such as VMware Aria Operations for Networks, work fine. You can host different nodes of a platform cluster on different underlying datastores, but all the platform VMs in a cluster must be co-located within the same site.

What is the maximum admissible latency between the collector VMs in one region and the platform VM cluster in another region?

You can have geo-distributed proxies (collector VMs) in your setup. The collector VM connects to the platform VM over HTTPS, so the connection can tolerate high latencies, on the order of a few seconds. VMware Aria Operations for Networks supports a maximum of 10 nodes in a cluster (30,000 VMs with flows or 50,000 VMs without flows).
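
The cluster limits quoted above can be turned into a quick check on a planned deployment. This is a minimal sketch that only encodes the three numbers from this answer. The function name and its parameters are hypothetical and not part of any product API.

```python
# Limits taken from the answer above.
MAX_CLUSTER_NODES = 10
MAX_VMS_WITH_FLOWS = 30_000
MAX_VMS_WITHOUT_FLOWS = 50_000


def check_cluster_plan(nodes: int, vms: int, flows_enabled: bool) -> list[str]:
    """Return a list of limit violations; an empty list means the plan fits."""
    problems = []
    if nodes > MAX_CLUSTER_NODES:
        problems.append(
            f"{nodes} nodes exceeds the maximum of {MAX_CLUSTER_NODES}"
        )
    # The VM ceiling depends on whether flows are collected.
    limit = MAX_VMS_WITH_FLOWS if flows_enabled else MAX_VMS_WITHOUT_FLOWS
    if vms > limit:
        problems.append(f"{vms} VMs exceeds the maximum of {limit}")
    return problems


print(check_cluster_plan(nodes=10, vms=40_000, flows_enabled=True))
```

In this example the node count is within the limit, but 40,000 VMs with flows enabled exceeds the 30,000 ceiling, so the list contains one violation.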

What size should the collector and platform VMs be?

Use the large brick configuration. Refer to the installation guide for details.