vCenter Server supports a set of cluster configuration options, including virtual machine and application monitoring, virtual machine fault tolerance, distributed resource scheduling, and Enhanced vMotion Compatibility.

VM and Application Monitoring Service

When enabled, the Virtual Machine and Application Monitoring service, which uses VMware Tools, evaluates whether each virtual machine in the cluster is running. The service checks for regular heartbeats and I/O activity from the VMware Tools process that is running on the guest OS. If the service receives no heartbeats and detects no I/O activity, it is likely that the guest operating system has failed or that VMware Tools is not being allocated time to complete heartbeats or I/O activity. In this case, the service determines that the virtual machine has failed and reboots it.

Enable VM Monitoring to restart a failed virtual machine automatically. The application or service running on the virtual machine must be capable of restarting successfully after a reboot; otherwise, restarting the virtual machine alone is not sufficient to recover the workload.
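VM Monitoring can also be enabled programmatically. The following is a minimal sketch using pyVmomi (the vSphere API Python bindings); the vCenter Server address, credentials, and the cluster name are placeholders, and error handling is omitted.

```python
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

# Connect to vCenter Server (placeholder address and credentials).
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()

# Locate the cluster by name (placeholder name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "sfo01-mgmt01")
view.DestroyView()

# Enable vSphere HA with VM Monitoring on the cluster.
das = vim.cluster.DasConfigInfo()
das.enabled = True                      # vSphere HA must be on for VM Monitoring
das.vmMonitoring = "vmMonitoringOnly"   # "vmAndAppMonitoring" adds application monitoring

spec = vim.cluster.ConfigSpecEx(dasConfig=das)
WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))

Disconnect(si)
```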

Table 1. Design Decisions on Monitoring and Startup Order Configuration for Virtual Machines

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| SDDC-VI-VC-037 | Enable VM Monitoring for each cluster. | VM Monitoring provides in-guest protection for most VM workloads. | None. |
| SDDC-VI-VC-038 | Create virtual machine groups for use in startup rules in the management and shared edge and compute clusters. | By creating virtual machine groups, you can use rules to configure the startup order of the SDDC management components. | Creating the groups is a manual task and adds administrative overhead. |
| SDDC-VI-VC-039 | Create virtual machine rules to specify the startup order of the SDDC management components. | Rules enforce the startup order of virtual machine groups, and hence the startup order of the SDDC management components. | Creating the rules is a manual task and adds administrative overhead. |
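The VM groups and startup-order rules from decisions SDDC-VI-VC-038 and SDDC-VI-VC-039 can also be scripted. The pyVmomi sketch below creates two hypothetical VM groups and a VM-to-VM dependency rule between them; it assumes `cluster` is the vim.ClusterComputeResource object obtained as in the earlier VM Monitoring example, and that `vcsa_vms` and `nsx_vms` are lists of vim.VirtualMachine objects you have already looked up. The group and rule names are illustrative only.

```python
from pyVmomi import vim
from pyVim.task import WaitForTask

# Assumes: cluster (vim.ClusterComputeResource), vcsa_vms and nsx_vms
# (lists of vim.VirtualMachine) have already been retrieved.

# Two VM groups, one per tier of management components (names are illustrative).
group_specs = [
    vim.cluster.GroupSpec(
        operation="add",
        info=vim.cluster.VmGroup(name="vcenter-vms", vm=vcsa_vms)),
    vim.cluster.GroupSpec(
        operation="add",
        info=vim.cluster.VmGroup(name="nsx-vms", vm=nsx_vms)),
]

# A VM-to-VM dependency rule: vSphere HA restarts the NSX group only after
# the vCenter Server group is up, which enforces the startup order.
rule_spec = vim.cluster.RuleSpec(
    operation="add",
    info=vim.cluster.DependencyRuleInfo(
        name="restart-nsx-after-vcenter",
        enabled=True,
        vmGroup="nsx-vms",
        dependsOnVmGroup="vcenter-vms"))

spec = vim.cluster.ConfigSpecEx(groupSpec=group_specs, rulesSpec=[rule_spec])
WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))
```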

VMware vSphere Distributed Resource Scheduling (DRS)

vSphere Distributed Resource Scheduling provides load balancing of a cluster by migrating workloads from heavily loaded ESXi hosts to less utilized ESXi hosts in the cluster. vSphere DRS supports manual and automatic modes.

Manual

Recommendations are made but an administrator needs to confirm the changes.

Automatic

Automatic management can be set to five different levels. At the lowest setting, workloads are placed automatically at power-on and only migrated to fulfill certain criteria, such as entering maintenance mode. At the highest level, any migration that would provide a slight improvement in balancing is performed.
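The automation level and migration threshold map directly to the cluster DRS configuration in the vSphere API. Below is a minimal pyVmomi sketch that enables DRS in fully automated mode with the default migration threshold; it assumes `cluster` is a vim.ClusterComputeResource obtained as in the earlier VM Monitoring example.

```python
from pyVmomi import vim
from pyVim.task import WaitForTask

# Assumes: cluster (vim.ClusterComputeResource) has already been retrieved.
drs = vim.cluster.DrsConfigInfo()
drs.enabled = True
drs.defaultVmBehavior = "fullyAutomated"  # placement and migrations without confirmation
drs.vmotionRate = 3                       # 3 is the default (medium) migration threshold

spec = vim.cluster.ConfigSpecEx(drsConfig=drs)
WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))
```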

When using two availability zones, enable vSphere DRS and create VM/Host group affinity rules for the initial placement of VMs and to preserve read locality. In this way, you avoid unnecessary vSphere vMotion migrations of VMs between availability zones. Because the stretched cluster is still a single cluster, vSphere DRS is unaware that it stretches across different physical locations. As a result, it might decide to move virtual machines between them. By using VM/Host group affinity rules, you can pin virtual machines to an availability zone. If a virtual machine, VM1, that resides in Availability Zone 1 moves freely across availability zones, VM1 might end up in Availability Zone 2. Because vSAN stretched clusters implement read locality, the cache in Availability Zone 1 is warm whereas the cache in Availability Zone 2 is cold. This can impact the performance of VM1 until the cache in Availability Zone 2 has warmed up.
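The following pyVmomi sketch creates a hypothetical host group, a VM group, and a "should run on" VM/Host affinity rule that pins the group to Availability Zone 1; it assumes `cluster` is the vim.ClusterComputeResource obtained as in the earlier example, and that `az1_hosts` and `az1_vms` are lists of vim.HostSystem and vim.VirtualMachine objects for that zone. The group and rule names are placeholders.

```python
from pyVmomi import vim
from pyVim.task import WaitForTask

# Assumes: cluster (vim.ClusterComputeResource), az1_hosts (list of
# vim.HostSystem), and az1_vms (list of vim.VirtualMachine) already retrieved.
group_specs = [
    vim.cluster.GroupSpec(
        operation="add",
        info=vim.cluster.HostGroup(name="az1-hosts", host=az1_hosts)),
    vim.cluster.GroupSpec(
        operation="add",
        info=vim.cluster.VmGroup(name="az1-vms", vm=az1_vms)),
]

# A non-mandatory rule ("should run on") keeps the VMs in Availability Zone 1
# during normal operations but still lets vSphere HA restart them in the other
# zone if the whole zone fails.
rule_spec = vim.cluster.RuleSpec(
    operation="add",
    info=vim.cluster.VmHostRuleInfo(
        name="az1-vms-should-run-in-az1",
        enabled=True,
        mandatory=False,
        vmGroupName="az1-vms",
        affineHostGroupName="az1-hosts"))

spec = vim.cluster.ConfigSpecEx(groupSpec=group_specs, rulesSpec=[rule_spec])
WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))
```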

Table 2. Design Decisions on vSphere DRS

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| SDDC-VI-VC-040 | Enable vSphere DRS on all clusters and set it to Fully Automated, with the default setting (medium). | Provides the best trade-off between load balancing and excessive migration with vSphere vMotion events. | If a vCenter Server outage occurs, mapping from virtual machines to ESXi hosts might be more difficult to determine. |
| SDDC-VI-VC-041 | When using two availability zones, create a host group and add the ESXi hosts in Region A - Availability Zone 1 to it. | Makes it easier to manage which virtual machines should run in which availability zone. | You must create and maintain VM/Host DRS group rules. |
| SDDC-VI-VC-042 | When using two availability zones, create a host group and add the ESXi hosts in Region A - Availability Zone 2 to it. | Makes it easier to manage which virtual machines should run in which availability zone. | You must create and maintain VM/Host DRS group rules. |
| SDDC-VI-VC-043 | When using two availability zones, create a virtual machine group and add the virtual machines in Region A - Availability Zone 1 to it. | Ensures that virtual machines are located only in the assigned availability zone. | You must add VMs to the correct group manually to ensure they are not powered on in, or migrated to, the wrong availability zone. |
| SDDC-VI-VC-044 | When using two availability zones, create a virtual machine group and add the virtual machines in Region A - Availability Zone 2 to it. | Ensures that virtual machines are located only in the assigned availability zone. | You must add VMs to the correct group manually to ensure they are not powered on in, or migrated to, the wrong availability zone. |

Enhanced vMotion Compatibility (EVC)

EVC works by masking certain features of newer CPUs so that virtual machines can be migrated between ESXi hosts that contain older CPUs. EVC works only with CPUs from the same manufacturer, and there are limits to the generational gap between the CPU families in a cluster.

If you set EVC during cluster creation, you can add ESXi hosts with newer CPUs at a later date without disruption. You can use EVC for a rolling upgrade of all hardware with zero downtime.

Set the cluster EVC mode to the highest available baseline that is supported for the lowest CPU architecture on the hosts in the cluster.
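The EVC baseline can be set from the cluster's EVC manager. Below is a minimal pyVmomi sketch; the baseline key "intel-broadwell" is only an example, and the keys that the cluster actually supports can be read from the EVC manager state. It assumes `cluster` is a vim.ClusterComputeResource obtained as in the earlier example.

```python
from pyVim.task import WaitForTask

# Assumes: cluster (vim.ClusterComputeResource) has already been retrieved.
evc_mgr = cluster.EvcManager()

# List the EVC baselines the cluster currently supports.
for mode in evc_mgr.evcState.supportedEVCMode:
    print(mode.key)

# Apply a baseline (example key only; pick the highest baseline supported
# by the oldest CPU generation in the cluster).
WaitForTask(evc_mgr.ConfigureEvcMode_Task("intel-broadwell"))
```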

Table 3. Design Decisions on VMware Enhanced vMotion Compatibility

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| SDDC-VI-VC-045 | Enable Enhanced vMotion Compatibility (EVC) on all clusters. Set the cluster EVC mode to the highest available baseline that is supported for the lowest CPU architecture on the hosts in the cluster. | Supports cluster upgrades without virtual machine downtime. | You can enable EVC only if clusters contain hosts with CPUs from the same vendor. |