These security controls provide a baseline set of vSphere system design best practices.
Eliminate vCenter Server Third-Party Plug-ins
Reduce or eliminate third-party vCenter Server plug-ins.
Installation of plug-ins and other third-party cross-connections between systems can erode boundaries between different infrastructure systems, offering opportunities for attackers who have compromised one system to move laterally to another. Tight coupling of other systems to vSphere also often creates impediments to timely patching and upgrades. Ensure that any third-party plug-ins or add-ons to vSphere components create value. If you choose to use plug-ins rather than individual management consoles, be sure that the benefit of their use outweighs the risks that they create.
Use Caution with Infrastructure Management Interfaces
Use caution when connecting infrastructure management interfaces to general-purpose authentication and authorization sources.
Centralized enterprise directories are targets for attackers because of their role in authorization across an enterprise. An attacker can move freely inside an organization once that directory is compromised. Connecting IT infrastructure to centralized directories has proven to be a considerable risk for ransomware and other attacks. Isolate the authentication and authorization of all infrastructure systems.
For ESXi:
- Conduct all host management through vCenter Server
- Deactivate the ESXi Shell
- Place ESXi in normal lockdown mode
- Set the ESXi root password to a complex password
Activate vSphere Distributed Resource Scheduler
Activate vSphere Distributed Resource Scheduler (DRS) in Fully automated mode.
vSphere DRS uses vMotion to move workloads between physical hosts to ensure performance and availability. In Fully automated mode, DRS can migrate workloads off a host without operator intervention, which allows vSphere Lifecycle Manager to work with DRS to carry out patching and update operations.
If specific VM-to-host mappings are needed, use DRS rules. Where possible, use "should" rules instead of "must" rules so that the rule can be suspended temporarily during patching and high availability recovery.
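One way to keep track of this guidance is to list the existing VM-to-host rules and flag the mandatory ones for review. The following is a minimal sketch in Python. The `DrsRule` record, its fields, and the sample rule names are hypothetical and stand in for whatever inventory export you use. The sketch does not call any vSphere API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrsRule:
    """Hypothetical record for one VM-to-host rule from an inventory export."""
    name: str
    cluster: str
    mandatory: bool  # True for a "must" rule, False for a "should" rule


def find_must_rules(rules):
    """Return the rules that are configured as "must" rules."""
    return [rule for rule in rules if rule.mandatory]


# Sample data, invented for illustration.
rules = [
    DrsRule("licensing-db-on-hosts-1-2", "Cluster-A", mandatory=True),
    DrsRule("backup-proxy-near-storage", "Cluster-A", mandatory=False),
]

for rule in find_must_rules(rules):
    print(f"Review {rule.cluster}/{rule.name}: consider a 'should' rule")
```

Each flagged rule is a candidate for conversion to a "should" rule, unless a licensing or compliance requirement makes the "must" rule necessary.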
Activate vSphere High Availability
vSphere High Availability (HA) restarts workloads on other ESXi hosts in a cluster if an ESXi host fails suddenly. Ensure that the settings for HA are configured correctly for your environment.
Activate Enhanced vMotion Compatibility
vSphere Enhanced vMotion Compatibility (EVC) ensures that workloads can be live-migrated by using vMotion between ESXi hosts in a cluster that are running different CPU generations. EVC also assists in situations with CPU vulnerabilities, where microcode updates might introduce new instructions to some CPUs, making the hosts temporarily incompatible with one another.
Protect Systems from Tampering
Ensure that ESXi hosts and related storage and networking components are protected from tampering, unauthorized access, and unauthorized removal. Also, protect hosts from damage from environmental factors such as flooding, extreme temperatures (low or high), and dust and debris.
Use of security features, such as vSphere Native Key Provider and ESXi Key Persistence, might cause secure material to be stored locally on ESXi hosts, enabling attackers who obtain a host to boot and unlock otherwise protected clusters. Account for physical security and relevant threats, such as theft, in your design.
Beyond theft, being security-minded also means asking yourself and your organization questions such as the following:
- What could go wrong?
- How would I know if something went wrong?
These questions take on added importance when dealing with unstaffed data center locations and colocation facilities. With respect to data centers and rack configurations, ask the following questions:
- Do the doors to the data center automatically close and lock properly on their own?
- If the doors were left ajar, would there be a proactive alert?
- If your rack doors are locked, is it still possible to reach into the rack from the side or top and disconnect a cable? Can an unauthorized person connect a cable to a network switch?
- Is it possible to remove a device, like a storage device or even an entire server? What would happen in such a scenario?
Other questions to ask include:
- Could someone glean information about your environment or your business from information displays on the servers, such as LCD panels or consoles?
- If those information displays are inactive, could they be triggered from outside the rack, for example, with the use of a stiff metal wire?
- Are there other buttons, such as the power button, that could be pushed to create a service disruption to your company?
Finally, ask yourself whether there are other physical threats, such as flooding, freezing or high heat, or dust and debris from the environment, that would impact availability.
Name vSphere Objects Descriptively
Ensure that you name vSphere objects descriptively, changing the default names of objects to ensure accuracy and reduce confusion.
Use good naming practices for vSphere objects, changing default names such as "Datacenter," "vSAN Datastore," "DSwitch," "VM Network," and so on, to include additional information. This helps improve accuracy and reduce errors when developing, implementing, and auditing security policies and operational processes.
Port groups using 802.1Q VLAN tagging could include the VLAN number. Data center and cluster names could reflect locations and purposes. Datastore and virtual distributed switch names could reflect the names of the data center and cluster to which they are attached. Key provider names are particularly important, especially when protecting encrypted virtual machines with replication to alternate sites. Work to avoid potential "name collisions" with objects present in other data centers and clusters.
Some organizations do not name systems with physical location identifiers such as street addresses, preferring to obscure the physical location of data centers through the use of terms like "Site A," "Site B," and so on. This also helps if sites are relocated, preventing the need to rename everything or endure inaccurate information.
When deciding on a naming scheme, keep in mind that many objects can have similar properties. For example, two port groups could both have the same VLAN assigned, but have different traffic filtering and marking rules. Incorporating a project name or short description in the name might be helpful for disambiguating objects of this type.
Lastly, consider automation when developing a naming scheme. Names that can be derived programmatically are often helpful when scripting and automating tasks.
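As an illustration of a name that can be derived programmatically, the following sketch builds a port group name from a site label, cluster label, VLAN number, and short purpose, and then parses it back into its fields. The `Site-Cluster-VLANnnnn-Purpose` layout and the function names are assumptions made for this example, not a VMware convention.

```python
import re


def port_group_name(site, cluster, vlan_id, purpose):
    """Build a port group name such as 'SiteA-Cl01-VLAN0120-AppWeb'."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    return f"{site}-{cluster}-VLAN{vlan_id:04d}-{purpose}"


# Pattern that recovers the fields from a name built by port_group_name().
NAME_PATTERN = re.compile(
    r"^(?P<site>[^-]+)-(?P<cluster>[^-]+)-VLAN(?P<vlan>\d{4})-(?P<purpose>.+)$"
)


def parse_port_group_name(name):
    """Return the name's fields as a dict, or None if it does not conform."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    fields = match.groupdict()
    fields["vlan"] = int(fields["vlan"])
    return fields


name = port_group_name("SiteA", "Cl01", 120, "AppWeb")
print(name)
print(parse_port_group_name(name))
print(parse_port_group_name("VM Network"))  # default name does not conform
```

Because the hyphen is the field separator in this layout, site and cluster labels must not contain hyphens. A parser of this kind also makes it straightforward to find objects that still carry default names.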
Isolate Infrastructure Management Interfaces
Ensure that IT infrastructure management interfaces are isolated on their own network segment or as part of an isolated management network.
Ensure that all management interfaces configured for virtualization components are on a network segment (VLAN, and so on) that is dedicated only to virtualization management, free of workloads and unrelated systems. Ensure that management interfaces are controlled with perimeter security controls such that only authorized vSphere administrators can access those interfaces from authorized workstations.
Some system designs put vCenter Server and other management tools on their own network segments, isolated from ESXi, because that separation offers better monitoring of those systems. Other designs put vCenter Server in with ESXi management because of the relationship between the two products, and the possibility of firewall configuration errors or outages disrupting service. Whichever design you choose, weigh these tradeoffs deliberately.
Use vMotion Properly
Ensure that vMotion uses data-in-transit encryption (set to "Required" for virtual machines), or that VMkernel network interfaces used for vMotion are isolated on their own network segments that have perimeter controls.
vMotion and Storage vMotion copy virtual machine memory and storage data, respectively, across the network. Encrypting the data in transit protects its confidentiality. Isolation to a dedicated network segment with appropriate perimeter controls can add defense-in-depth and also allow for network traffic management.
Like all forms of encryption, vMotion encryption introduces some performance overhead, but that overhead affects the background vMotion process and does not impact virtual machine operation.
Use vSAN Properly
Ensure that vSAN uses data-in-transit encryption or that VMkernel network interfaces used for vSAN are isolated on their own network segments that have perimeter controls.
vSAN features data-in-transit encryption that can help maintain confidentiality as vSAN nodes communicate. As with many security controls, there is a tradeoff with performance. Monitor storage latency and performance as data-in-transit encryption is activated. Organizations that do not or cannot activate vSAN data-in-transit encryption should isolate the network traffic to a dedicated network segment with appropriate perimeter controls.
Activate Network I/O Control
Ensure that you have resilience to network denial-of-service by activating Network I/O Control (NIOC).
vSphere Network I/O Control (NIOC) is a traffic management technology that offers quality of service at the hypervisor level, enhancing network performance by prioritizing resources in multi-tenant cloud and shared workload environments. Incorporated into the vSphere Distributed Switch (vDS), NIOC partitions network adapter bandwidth into "network resource pools" that correspond to different traffic types, such as vMotion and management traffic. Use of NIOC allows users to allocate shares, limits, and reservations to these pools.
NIOC preserves network availability for essential services and prevents congestion by limiting less critical traffic. This is achieved by enabling the creation of network control policies per business requirements, ensuring traffic type isolation, and allowing dynamic resource reallocation based on priority and usage.
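To show how shares behave when an uplink is congested, the following sketch divides link bandwidth among the traffic types that are actively competing, in proportion to their shares. It is a simplified model that ignores reservations and limits. The share values, traffic type names, and link speed are hypothetical.

```python
def bandwidth_under_contention(link_gbps, shares, active):
    """Split link bandwidth among active traffic types in proportion to shares.

    Simplified model: reservations and limits are ignored.
    """
    total = sum(shares[name] for name in active)
    if total == 0:
        raise ValueError("active traffic types must hold at least one share")
    return {name: link_gbps * shares[name] / total for name in active}


# Hypothetical share values for three traffic types on a 10 Gbps uplink.
shares = {"management": 50, "vmotion": 50, "virtual_machine": 100}

# All three traffic types are competing for the uplink.
print(bandwidth_under_contention(
    10, shares, ["management", "vmotion", "virtual_machine"]))

# vMotion is idle, so its portion is redistributed to the other two.
print(bandwidth_under_contention(10, shares, ["management", "virtual_machine"]))
```

In this model, management traffic is entitled to 2.5 Gbps even while a large vMotion operation and heavy virtual machine traffic saturate the uplink, which is the availability property that the control aims for.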
Do Not Configure Vendor-Reserved VLANs
Ensure that the physical switch uplinks from ESXi hosts are not configured with vendor-reserved VLANs.
Some network vendors reserve particular VLAN IDs for internal or specific use. Ensure that your vSphere network configurations do not include these values.
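A check of this kind can be scripted against a list of the VLAN IDs in use. The following sketch is a minimal example. The reserved ranges shown are placeholders chosen for illustration and are not a statement about any particular vendor, so take the actual values from your switch vendor's documentation.

```python
def reserved_vlans_in_use(configured_vlans, reserved_ranges):
    """Return the configured VLAN IDs that fall inside any reserved range."""
    return sorted(
        vlan
        for vlan in set(configured_vlans)
        if any(low <= vlan <= high for low, high in reserved_ranges)
    )


# Example ranges only; confirm the real values in your vendor's documentation.
EXAMPLE_RESERVED_RANGES = [(1002, 1005), (3968, 4047)]

# Hypothetical list of VLAN IDs collected from port group definitions.
configured = [120, 130, 1003, 4000, 2500]
print(reserved_vlans_in_use(configured, EXAMPLE_RESERVED_RANGES))
```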
Configure ESXi Uplinks as Access Ports
Ensure that the physical switch uplinks from ESXi hosts are configured as "access ports" assigned to a single VLAN, or as tagged 802.1Q VLAN trunks with no native VLAN. Ensure that vSphere port groups do not allow access to VLAN 1 or untagged native VLANs.
Network connections that have a "native" VLAN configured to accept untagged traffic, or that have access to VLAN 1, might offer opportunities for attackers to craft specialized packets that defeat network security controls. VLAN 1 is the default VLAN on many switches, is often used for network management and communications, and should be isolated from workloads. Ensure that port groups are not configured for access to native VLANs. Ensure that VLAN trunk ports are configured with specific definitions of VLANs (not "all"). Finally, ensure that port groups are configured appropriately so that attackers cannot use a virtualized environment to circumvent network security controls.
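The port group side of these checks can be expressed as a small audit routine. The following sketch assumes that each port group's VLAN configuration has been exported as a list of `(low, high)` ranges, with a single VLAN written as `(n, n)`. It also assumes that VLAN ID 0 means untagged traffic and that VLAN ID 4095 passes all VLANs through to the guest. The port group names and the export format are hypothetical.

```python
ALL_VLANS_ID = 4095  # assumption: this ID passes every VLAN through to the guest


def audit_port_group(name, vlan_ranges):
    """Return a list of findings for one port group.

    vlan_ranges is a list of (low, high) tuples; a single VLAN is (n, n).
    """
    findings = []
    for low, high in vlan_ranges:
        if low <= 0 <= high:
            findings.append(f"{name}: allows untagged traffic (VLAN 0)")
        if low <= 1 <= high:
            findings.append(f"{name}: allows VLAN 1")
        if low <= ALL_VLANS_ID <= high or (low <= 1 and high >= 4094):
            findings.append(f"{name}: trunks all VLANs instead of a specific list")
    return findings


# Hypothetical export of port group VLAN settings.
port_groups = {
    "SiteA-Cl01-VLAN0120-AppWeb": [(120, 120)],
    "VM Network": [(0, 0)],
    "Trunk-All": [(0, 4094)],
    "Trunk-App": [(120, 129), (200, 200)],
}

for pg_name, ranges in port_groups.items():
    for finding in audit_port_group(pg_name, ranges):
        print(finding)
```

A routine like this covers only the vSphere side. The physical switch ports still need to be reviewed for native VLAN settings and for trunk definitions that allow all VLANs.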
Configure Storage Fabric Connections Properly
Ensure that the storage fabric connections use data-in-transit encryption or are isolated on their own network segments or SANs that have perimeter controls.
Protecting storage data while in transit helps ensure the confidentiality of the data. Encryption is not an option for many storage technologies, often because of availability or performance concerns. In those cases, isolation to a dedicated network segment with the appropriate perimeter controls can be an effective compensating control and can add defense-in-depth.
Use LUN Masking on Storage Systems
Ensure that the storage systems employ LUN masking, zoning, and other storage-side security techniques so that storage allocations are visible only to the vSphere cluster in which they are to be used.
LUN masking on the storage controller and SAN zoning help to ensure that storage traffic is not visible to unauthorized hosts and that unauthorized hosts cannot mount the datastores, bypassing other security controls.
Limit Connections to Authorized Systems
Consider the use of the vCenter Server Appliance firewall to limit connections to authorized systems and administrators.
The vCenter Server Appliance contains a basic firewall that you can use to limit the incoming connections to vCenter Server. This can be an effective layer of defense-in-depth in conjunction with perimeter security controls.
As always, before adding rules to block connections, ensure that rules are in place to allow access from administrative workstations.
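Before applying a proposed rule set, you can test it offline to confirm that administrative workstations keep their access. The following sketch models a rule list as ordered `(cidr, action)` pairs and assumes top-down, first-match evaluation, with unmatched sources allowed. Both the evaluation model and the addresses are assumptions for illustration and do not describe the appliance's actual rule engine.

```python
import ipaddress


def is_allowed(rules, source_ip):
    """Evaluate rules top-down and return True if the first match accepts.

    Each rule is a (cidr, action) tuple with action "accept" or "reject".
    Sources that match no rule are treated as allowed, which is an assumption.
    """
    address = ipaddress.ip_address(source_ip)
    for cidr, action in rules:
        if address in ipaddress.ip_network(cidr):
            return action == "accept"
    return True


def safe_to_apply(rules, admin_workstations):
    """Return True only if every admin workstation keeps access."""
    return all(is_allowed(rules, ip) for ip in admin_workstations)


admins = ["10.20.30.11", "10.20.30.12"]  # hypothetical workstation addresses

# A block-everything rule on its own would lock the administrators out.
block_only = [("0.0.0.0/0", "reject")]
print(safe_to_apply(block_only, admins))

# With an allow rule for the admin subnet placed first, access is preserved.
allow_then_block = [("10.20.30.0/28", "accept"), ("0.0.0.0/0", "reject")]
print(safe_to_apply(allow_then_block, admins))
```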
Do Not Store Encryption Keys on ESXi Hosts Without Securing Physical Access
The environment must not store encryption keys on ESXi hosts without also securing physical access to the hosts.
To prevent dependency loops, the vSphere Native Key Provider stores decryption keys directly on the ESXi hosts, either in a Trusted Platform Module (TPM) or as part of the encrypted ESXi configuration. However, if you do not physically secure a host, and an attacker steals the host, the attacker possesses the means to unlock and execute encrypted workloads. Therefore, it is crucial to ensure physical security (see Protect Systems from Tampering), or to opt for using a Standard Key Provider (see What Is a Standard Key Provider) that includes additional network security controls.
Use Adequately Sized Persistent, Non-SD, Non-USB Devices for ESXi Boot Volumes
The environment must use adequately sized persistent, non-SD, non-USB devices for ESXi boot volumes.
Flash memory is a component that wears out over time, with each data write shortening its lifespan. SSDs and NVMe devices have built-in features to reduce this wear, making them more reliable. However, SD cards and most USB flash drives do not have these features and can develop reliability issues, such as bad sectors, often without any obvious signs.
To lessen wear and make SD and USB devices last longer, when you install ESXi on these devices, you can save audit and system logs to a RAM disk instead of constantly writing to the device. Because the contents of a RAM disk do not survive a reboot, you must set up new, long-term storage locations for these logs and change the log output to go to these new locations.
Choosing a reliable boot device removes these extra steps and helps ESXi automatically pass security audits.
Properly Configure the vSAN iSCSI Target
Ensure that the vSAN iSCSI Target uses its own VMkernel network interfaces, isolated on its own network segment and employing separate perimeter controls using Distributed Port Group Traffic Filtering and Marking, NSX, or external network security controls.
Because the iSCSI Target clients are external to the cluster, isolate them on their own network interfaces. In this way, you can separately restrict other, internal-only network communications. Isolation of this type also helps diagnose and manage performance.