This section focuses on the hosts or servers required to provide compute, storage, and networking. It also discusses sizing and scaling, and unique power industry considerations for hardware classification.

Physical Scale

First, the size of the edge environment must be examined. Power stations at the edge (for example, substations and switchyards) are typically represented by single-line diagrams. These drawings show the electrical connections of the high-voltage equipment and the supporting secondary devices, accompanied by descriptions of their functionality. The design personnel or architects of a substation solution use these drawings to scope the vPAC workloads required at a given site.

Figure 1. Substation Switching Diagram


A further simplified version of this diagram, often called a switching diagram, contains only the high-voltage equipment. Utility personnel who are familiar with standard company practices might have the experience necessary to derive the total required protection, automation, and control functionality from this diagram alone.

Based on the native capabilities of the vPAC application to be deployed and the level of redundancy desired, you can determine how much compute is required. These decisions are summarized in the following table.

Table 1. Design Decisions on ESXi Sizing for vPAC Ready Infrastructure

| Decision ID | Design Decision | Design Justification | Design Implication |
|---|---|---|---|
| VPAC-PHY-001 | Number of power system elements to be protected and controlled. | The core purpose of vPAC; each system element (bus, line, transformer, feeder) requires specific functionality, as defined by the end user. | Certain functions (such as bus differential) consume more processing power and might dictate the required number of VM instances. |
| VPAC-PHY-002 | Redundancy requirements for each hardware and software component. | Typically, no single point of failure is permitted for a protection function. However, overlapping zones of protection might offer functionality outside of the host environment. | Active-active hosts and virtual protection appliances for each host might be required. |
| VPAC-PHY-003 | Workload deployment division based on departmental responsibility. | Segregation between station and process buses, and between real-time and non-real-time applications, might be desired. | Disaggregation can result in a less efficient overall design, but can be simpler for certain end-user organizations to administer. |
| VPAC-PHY-004 | Availability levels in VMware architecture (see Host Configurations and Resulting Infrastructure Availability Levels). | Certain VMware architectures provide benefits in redundancy mechanisms and failure tolerance. | A three-node local cluster can provide the highest flexibility and failure tolerance. |

To guarantee performance for the real-time workloads running in the VMs, the underlying physical CPUs must be reserved and passed through to each VM to ensure resource availability and to eliminate congestion on the cores. Therefore, CPU reservation dictates the initial sizing of the environment. The remaining cores (physical or virtual) are available for non-real-time workloads.
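A minimal sketch of how such a reservation is commonly expressed, assuming an ESXi host on a vSphere release that supports the latency-sensitivity feature (the option names below are illustrative advanced VM settings and should be verified against VMware documentation for the deployed version and against the vPAC vendor's guidance):

```
sched.cpu.latencySensitivity = "high"
sched.cpu.min = "8400"
sched.mem.min = "4096"
sched.mem.pin = "TRUE"
numa.nodeAffinity = "0"
```

Here `sched.cpu.latencySensitivity = "high"` requires a full CPU reservation (`sched.cpu.min`, in MHz; 8400 assumes four cores at 2.1 GHz) and a full memory reservation (`sched.mem.min`, in MB) with pinned memory, while `numa.nodeAffinity` keeps the VM's vCPUs and memory on a single NUMA node. In practice, these reservations are typically configured through the vSphere Client rather than edited by hand.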

An example of the sizing of a virtual environment capable of serving the substation in the Substation Switching Diagram (Figure 1) is shown in the following table.

| Design Decision ID | Quantity | Bus Zones | Xfmr Zones | Witness Location |
|---|---|---|---|---|
| VPAC-PHY-001 (quantity of power system elements) | 34 (differential logical elements, bus or transformer, limited to four per VM, based on the vPAC appliance utilized) | 6 | 4 | |
| VPAC-PHY-002 (redundancy) | 2 | | | |
| VPAC-PHY-003 (workload segregation) | 1 | | | |
| VPAC-PHY-004 (failures to tolerate / onsite or offsite vSAN witness) | 1 | | | Offsite |

Table 2. Example Site Compute Resource Calculations

| Host Service | Quantity | Physical CPU Cores/VM | Threaded CPU Cores/VM | RAM (GB) | Storage (GB) |
|---|---|---|---|---|---|
| ESXi | 1 | 2 | | 8 | 32 |
| ABB SSC600 SW VM | 2 | 4 | | 4 | 30 |
| Kalkitech vPR VM | 0 | | 1 | 2 | 30 |
| vSAN (FTT=1, RAID 1; use vSAN Node Sizer to calculate) | 1 | | 1 | 28 | Use sizing tool to estimate |
| Cyber Protection VM | 1 | | 1 | 4 | 200 |
| Intrusion Detection System VM | 1 | | 2 | 4 | 200 |
| Patching Test Machine VM | 1 | | 2 | 8 | 200 |
| Windows Server Gateway VM | 1 | | 2 | 4 | 200 |
| Windows Server Update Services VM | 1 | | 1 | 4 | 200 |
| SUB-TOTALS | | 10 | 9 | 68 | 1092 |
| OVERHEAD (%) | | 30 | 30 | 30 | 30 |
| GROWTH (%) | | 20 | 20 | 20 | 20 |
| TOTALS | | 15 | 14 | 102 | 1638 |

Based on the values provided in the table, this site requires at least a two-node cluster with a remote vSAN witness. Each node (or host) must run two protective relaying VMs, based on their logical limits, to cover the example site. With the redundancy requirement of two, each server must provide at least the compute resources calculated in the TOTALS row: a CPU with at least 22 cores, of which seven or more must feature hyperthreading (15 reserved physical cores, plus seven hyperthreaded cores to supply the 14 threaded cores). These totals already include the percentages for overhead and growth. Overcommitment of CPU and RAM is not considered here, but might be feasible for less critical applications, along with thin provisioning of disks. With proper monitoring, these methods can allow for safe and highly efficient usage of compute resources.
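The arithmetic behind the TOTALS row can be sketched as follows. The per-VM figures are taken from Table 2 (vSAN storage is excluded, as it is sized separately with the vSAN sizing tool), and the 30% overhead and 20% growth factors are applied additively:

```python
import math

# (name, quantity, physical cores/VM, threaded cores/VM, RAM GB, storage GB)
services = [
    ("ESXi",                              1, 2, 0,  8,  32),
    ("ABB SSC600 SW VM",                  2, 4, 0,  4,  30),
    ("Kalkitech vPR VM",                  0, 0, 1,  2,  30),
    ("vSAN",                              1, 0, 1, 28,   0),  # storage sized separately
    ("Cyber Protection VM",               1, 0, 1,  4, 200),
    ("Intrusion Detection System VM",     1, 0, 2,  4, 200),
    ("Patching Test Machine VM",          1, 0, 2,  8, 200),
    ("Windows Server Gateway VM",         1, 0, 2,  4, 200),
    ("Windows Server Update Services VM", 1, 0, 1,  4, 200),
]

# Sub-totals: per-VM figures multiplied by quantity.
phys    = sum(q * p for _, q, p, t, r, s in services)   # 10
threads = sum(q * t for _, q, p, t, r, s in services)   # 9
ram     = sum(q * r for _, q, p, t, r, s in services)   # 68
storage = sum(q * s for _, q, p, t, r, s in services)   # 1092

# Overhead (30%) and growth (20%) applied additively, then rounded up.
factor = 1 + 0.30 + 0.20
totals = [math.ceil(v * factor) for v in (phys, threads, ram, storage)]
print(totals)       # [15, 14, 102, 1638]

# Minimum CPU core count: reserved physical cores, plus the physical
# cores needed to supply the threaded cores via hyperthreading.
min_cores = totals[0] + math.ceil(totals[1] / 2)
print(min_cores)    # 22
```

This reproduces the SUB-TOTALS and TOTALS rows of Table 2 and the 22-core minimum derived above.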

In the preceding example, the VMware infrastructure layers (ESXi hypervisor and vSAN) are accounted for alongside all the production VMs, for one site and one server. VMs running in an active-active configuration are installed on more than one host, while active-standby machines are distributed evenly among the hosts, with compute resources reserved for failover mechanisms such as High Availability (HA).

Hardware Considerations

Hardware standards for the power industry have been developed to ensure reliable operation within the harsh environment of a substation. These standards include IEC 61850-3 and IEEE 1613, which specify extreme conditions that the equipment must survive while maintaining consistent operation without adverse reaction. The specified parameters include temperature, humidity, dielectric and surge withstand, electromagnetic compatibility, electrostatic discharge protection, and vibration and shock resistance.

Depending on end-user organizational standards and the ultimate installation location for the host hardware, there can be requirements prohibiting the use of fans in field devices. However, hardened, standards-certified servers are available that feature (n-1) failure tolerance and hot-swappable fans, along with monitoring services at both the hardware and infrastructure (VMware) software levels, so that adverse conditions raise alerts and can be mitigated before failures escalate. Passively cooled servers are less capable and therefore might require larger cluster sizing to accommodate all edge workloads.

The server hardware used to test the VVS reference architecture is noted in the Virtual Machine Configuration for Performance section, listed by manufacturer and model numbers.

Pay close attention to the mounting requirements (such as rackmount unit sizing, any rails required, and weight), power requirements (such as acceptable voltage levels and peak power draw), and the airflow design of each server type. When adding components, such as NICs, ensure they are the correct type in terms of bandwidth capabilities and form factor, and are procured with any necessary ancillary parts, such as compatible small form-factor pluggable (SFP) modules.