Utility Substation Virtual Protection, Automation, and Control (vPAC) Ready Infrastructure

The Utility Substation vPAC Ready Infrastructure validated solution guide provides design, implementation, and operational guidance for a workload domain that runs vSphere and is configured for maximum performance when working with real-time systems in a power substation.

The technical implementation is constructed and tested by VMware and its partners to help power service providers address common business use cases. VMware validated solutions are operational, performant, reliable, and secure. Each solution contains detailed design, implementation, and operational guidance, along with interoperability information.

Support Matrix

vPAC Ready Infrastructure is compatible with certain versions of the VMware products that are used for implementing the solution.

Table 1. Software Components found in vPAC Ready Infrastructure
Product Name	Product Version	Release Notes
VMware ESXi	8.0 Update 2	See VMware vSphere 8.0 Update 2 Release Notes
VMware vCenter Server	8.0 Update 2	See VMware vSphere 8.0 Update 2 Release Notes

Additional supporting products, such as VMware vSAN, VMware Tanzu, and VeloCloud SD-WAN, are discussed in this guide and its appendices in terms of how they can be activated and integrated to enhance the base components listed in Table 1.

Intended Audience

This guide is intended for information technology (IT) and operational technology (OT) systems architects and administrators who are already generally familiar with VMware software, and who use it to deploy and manage a software-defined substation application architecture running on virtual workloads within a power facility or substation.

This guide covers capacity, scalability, backup and restoration, and extensibility for disaster recovery support. It also assumes that the power system protection, automation, control, and telecommunications professionals involved in implementing vPAC Ready Infrastructure are already generally familiar with networking and the IEC 61850/61869 standards.

For the virtualization environment, Appendix C: OT Personnel Training recommends several levels of educational materials for users who are new to the technology.

vPAC Ready Infrastructure Overview

Virtual Protection, Automation, and Control (vPAC) is architected to improve upon traditional grid devices such as microprocessor, solid state, and electromechanical appliances, which feature mainly fixed functionality. As this new technology becomes prominent, the key benefits are expected to be:

Flexibility in grid operations, including integration of high penetrations of distributed generation.
Significant increases in data collection and analysis.
Simplified asset management.
Reduction in the quantity of devices to own and maintain.
Safer physical working environments in field locations.
Decreased labor costs in both capital and operations expenditures.
Improvements in standardization and interoperability.

There are three main components considered in the composition of this infrastructure:

Rugged, high-powered computing hardware.
Software-defined protection, automation, and control applications.
Virtualization environment with real-time capabilities.

This guide focuses primarily on the virtualization environment with real-time capabilities, providing guidance to attain the highest levels of persistent performance and availability, coupled with the lowest achievable latency.

Glossary of Terms

The following terminology and product names or features are used throughout this guide.

Table 2. VMware Terminology
Terminology	Definition
Aria Operations for Applications	Provides a centralized management platform for consistently operating and securing Kubernetes infrastructure and modern applications across multiple teams and clouds.
Container	A container encapsulates an application in a form that is portable and easy to deploy. Containers can run without changes on the VMware platform with VMware Tanzu Kubernetes Grid (TKG). They consume resources efficiently, enabling high density within a virtual environment. Although containers can be used with almost any application, they are frequently associated with microservices, in which multiple containers run separate application components or services. The containers that make up microservices are typically coordinated and managed using a container orchestration platform, such as Kubernetes.
Edge Compute Stack	Edge Compute Stack (ECS) is a portfolio of VMware products tailored to build, run, manage, connect, and protect edge-native applications at the near edge (larger, primary sites) and the far edge (smaller, secondary sites).
ESXi	A bare-metal hypervisor that installs directly onto a physical server. With direct access to, and control of, underlying resources, VMware ESXi effectively partitions hardware to consolidate applications and cut costs. It is the industry leader for efficient architecture, setting the standard for reliability, performance, and support.
Harbor Image Registry	Provides a centralized location to push, pull, store, and scan container images used in Kubernetes workloads. It supports storing artifacts such as Helm Charts and includes enterprise-grade features such as Role-Based Access Control (RBAC), retention policies, automated garbage cleanup, and Docker Hub proxying. A Helm Chart is a collection of files that describe a related set of Kubernetes resources. A single chart can be used to deploy something simple (for example, a memcached pod) or something complex (such as a full web application stack). RBAC is a method of restricting access based on the roles or privileges of individuals.
Tanzu Kubernetes Grid (TKG)	Enables the creation and lifecycle management of Kubernetes clusters. A Kubernetes cluster is a set of nodes running containerized applications.
Tanzu Mission Control (TMC)	Provides a global view of Kubernetes clusters and allows for centralized policy management across all deployed and attached clusters.
Tanzu Service Mesh	Provides consistent control and security for microservices, end users, and data, across all clusters and clouds.
Virtual Machine (VM)	A VM is a compute resource that uses software instead of a physical computer to run programs and deploy applications. One or more virtual machines can run on a physical (VMware ESXi) server or cluster of servers. Each virtual machine runs its own operating system and functions separately from other VMs, even when running on the same physical host.
vCenter	Advanced server management software that provides a centralized platform for controlling vSphere environments, with visibility across hybrid clouds (from data center to edge).
vSAN	Shared storage for VMs that works in conjunction with vSphere High Availability (HA) and Distributed Resource Scheduler (DRS).
vSphere	VMware’s virtualization platform, which aggregates compute infrastructure (CPU, storage, and networking) resources and manages them within a unified operating environment. vSphere encompasses several distinct products and technologies that work together to provide a complete infrastructure for virtualization.

Table 3. Power Industry Terminology
Terminology	Definition
Generic Object Oriented Substation Event (GOOSE)	GOOSE is a controlled model mechanism in which any format of data (status, value) is grouped into a data set and transmitted within a period of four milliseconds. GOOSE is a communications protocol defined by the IEC 61850 standard, which was originally intended for LAN-restricted traffic in layer 2. A routable version of the protocol (known as R-GOOSE) is defined within IEC 61850-90-5.
High-availability Seamless Redundancy (HSR)	HSR is a communications protocol that achieves 0 ms recovery time for network device failures. All participating devices are connected together in a ring topology. Devices not participating in HSR must not be connected to the same network. If the virtualized environment does not participate in the HSR protocol, it requires a special Dual Attached Node (DAN) Network Interface Card (NIC) with two ports specific to the HSR ring traffic. This functionality is commonly referred to as a RedBox or redundancy box feature. The external-facing NIC ports are connected to separate managed Ethernet switches and generate TCP/IP traffic across both NIC ports simultaneously, one in each direction on the ring. Similarly, when two HSR networks are connected, a QuadBox function is required, which is usually applied redundantly to prevent any single point of failure. HSR is defined in IEC 62439-3, Clause 5.
Intelligent Electronic Device (IED)	IED is how traditional microprocessor protection, automation, and control devices are referred to, having integrated, multi-function capabilities.
Merging Unit (MU)	An MU, or Process Interface Unit (PIU), converts analog signals (typically currents and voltages) from the instrument transformers, then merges them and sends them to the protective devices in a standards-based digital output format.
Manufacturing Message Specification (MMS)	MMS is a client-server protocol used for information exchange between protection, automation, and control devices or applications and higher-level systems (for example, Supervisory Control And Data Acquisition or SCADA) over the Ethernet. The MMS protocol is mapped on TCP/IP and enables TCP/IP communications between networked devices to read or write data, read configurations, and exchange files. MMS resides on the station bus and is an ISO 9506 and IEC 61850-8-1 standard.
Protection, Automation, and Control (PAC)	PAC refers to orchestrated, intelligent, logic systems within a power grid. These systems might be made up of analog, electromechanical, solid-state, or microprocessor devices, or virtual applications. Protection typically refers to dedicated devices (often referred to as relays) or applications used to provide selective high-speed isolation of a power system fault from all sources of generation. These devices operate high voltage apparatus to segment the grid from an undesirable condition that is detected internally within its designated zone of protection. Original protection algorithm implementations are described as ANSI number functions (for example, 50, 51, and so on). Special requirements are high levels of determinism, real-time or low latency networking, high availability, and redundancy. vPR is then a software-defined version of a protection relay, provided as an application that operates as part of a virtual machine or within a container-based format. Automation typically refers to devices or applications used to automate power system functionality using a collection of components that monitors and controls high voltage apparatus. For example, Fault Location, Isolation, and Service Restoration (FLISR). Control typically refers to devices or applications used to provide local and remote operability and collect and logically provide indication and annunciation through monitoring power system assets. For example, a Remote Telemetry Unit (RTU) or a Human Machine Interface (HMI). vAC is then a software-defined version of either automation or control applications, which is intended to include any non-protection function used to operate the power grid.
Parallel Redundancy Protocol (PRP)	PRP is a communications protocol that achieves 0 ms recovery time for network device failures. Each participating device is attached to two separate parallel networks as a Dual Attached Node (DAN). Devices can be attached to either network with a single connection as a Singly Attached Node (SAN), but they do not benefit from the PRP redundancy. If the virtualized environment does not participate in the PRP protocol, it requires a special DAN NIC with two ports specific to the PRP traffic. This functionality is commonly referred to as a RedBox or redundancy box feature. The external-facing NIC ports are connected to separate Ethernet managed switches and generate TCP/IP traffic across both NIC ports simultaneously. PRP is defined in IEC 62439-3, Clause 4.
Precision Time Protocol (PTP)	PTP provides a method to precisely coordinate timestamps throughout a network. Time synchronization is achieved through packets that are transmitted and received in a session between the GNSS-synchronized originating signal at the grandmaster clock and all the subsequent participating devices (ordinary, transparent, and boundary clocks). PTP networks can achieve nanosecond-level synchronization, compared to Network Time Protocol (NTP), which can only achieve millisecond-level synchronization. PTP is defined in the IEEE 1588 standard. The Power Profile, or the IEC 61850-9-3 and IEEE C37.238 applications of the standard, is typically used in the power industry because of the hard-coded requirements used to meet the highest synchronization needs found in PAC systems. It is common to find PTP messages on the process bus, but they may also be carried on the station bus, or on both. An Ordinary Clock (OC) is typically an end device that accepts PTP packets from the grandmaster clock or the nearest boundary clock and reports on network characteristics. A Transparent Clock (TC) participates only in correcting and forwarding PTP packets for delay calculations elsewhere. These devices add the least latency to the PTP network. A Boundary Clock (BC) accepts a PTP packet from a grandmaster and adjusts it for network path delays before redistributing it as a master clock signal to nearby ordinary clocks or secondary devices. A Grandmaster Clock (GMC) is a singular source of network time, derived from an originating GNSS signal. Networks can contain more than one clock capable of becoming a grandmaster in the event of a failure. The Best Master Clock Algorithm (BMCA) is a protocol feature operating exclusively within an individual PTP domain that listens for participating clocks, compares a hierarchy of attributes, and elects the best-qualified master.
Process Bus	Process bus typically refers to the digital transmission of analog measurements or binary signals over the Ethernet between the power station apparatus and low-level sensors, and the bay-level protection, automation, and control devices or applications. Process bus is often restricted to Layer 2 network protocols (SV, GOOSE, or PTP).
Sampled Values	Sampled Values (SV) or Sampled Measured Values (SMV) are current and voltage signals from instrument transformers that are digitized and then communicated using an Ethernet-based Local Area Network (LAN). Sampled Values are transmitted as high-speed streams of data set samples encoded in multicast Ethernet frames. The protocol uses a publisher/subscriber model, in which a publisher transmits unacknowledged data to subscribers. SV/SMV is a Layer 2 protocol and typically resides on the process bus. SV/SMV is defined in the IEC 61850-9-2 standard. Typical standards used for publishing SVs include IEC 61850-9-2LE (Light Edition) and IEC 61869-9. IEC 61850-9-2LE has a protection-class sample rate of 4,800 messages (packets) per second (US – 60 Hz @ 80 samples per electrical cycle) and 4,000 messages (packets) per second (EMEA – 50 Hz @ 80 samples per electrical cycle), or a metering-class sample rate of 14.4 kHz (US – 60 Hz) or 12.8 kHz (EMEA – 50 Hz). IEC 61869-9 is backward compatible with 61850-9-2LE but standardizes packet messages and associated sample rates, while allowing a mixed number of currents and voltages to be contained (up to a total of 24). Examples of the 61850-9-2LE packet designations within the parameters of 61869-9 are F4800S1I4U4 and F14400S6I4U4, which indicate a sample rate of 4800 Hz or 14400 Hz, a quantity of 1 or 6 samples per message, and 4 currents and 4 voltages in each. This standard provides much more flexibility and scalability in deploying analog-to-digital conversion devices. Bandwidth usage is high, at approximately 5.3 Mbps for 4.8 kHz and 13.5 Mbps for 14.4 kHz sample rates. PRP, VLANs, and QoS are used to help ensure reliable transmission of the packets. Much like GOOSE, SV is another communications protocol defined by the IEC 61850 standard, originally intended for LAN-restricted traffic in Layer 2. A routable version of the protocol (known as R-SV) has been defined within IEC 61850-90-5.
Station Bus	Station bus typically refers to the digital transmission of analog or binary data over the Ethernet between the bay-level protection, automation, and control devices or applications and the power station-level supervisory or management systems and applications. Station bus often includes up to Layer 3 network protocols (including MMS).
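
The PTP entry in Table 3 describes the BMCA as comparing a hierarchy of attributes to elect the best-qualified master. The following Python sketch illustrates only that comparison step. It assumes the IEEE 1588 default data set ordering (priority1, clockClass, clockAccuracy, offsetScaledLogVariance, priority2, clockIdentity), where the lower value wins at each level. The class name, function name, and sample clock values are hypothetical and are not taken from this guide. The full algorithm also considers topology (steps removed) and port state decisions, which are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClockDataset:
    """Attributes a clock advertises in PTP Announce messages (simplified)."""
    name: str                        # hypothetical label, not a PTP field
    priority1: int                   # operator-set, compared first
    clock_class: int                 # e.g. 6 = locked to primary reference, 7 = holdover
    clock_accuracy: int              # enumerated accuracy code, lower is better
    offset_scaled_log_variance: int  # stability estimate, lower is better
    priority2: int                   # operator-set tie-breaker
    clock_identity: bytes            # unique 8-byte identity, final tie-breaker

    def rank(self) -> tuple:
        # Tuples compare element by element, which mirrors the hierarchy:
        # a later field only matters when all earlier fields are equal.
        return (
            self.priority1,
            self.clock_class,
            self.clock_accuracy,
            self.offset_scaled_log_variance,
            self.priority2,
            self.clock_identity,
        )


def elect_grandmaster(candidates):
    """Return the best-qualified clock among the candidates (lowest rank)."""
    return min(candidates, key=lambda clock: clock.rank())


# Hypothetical clocks in one PTP domain.
clocks = [
    ClockDataset("gm-b", 128, 7, 0x21, 0x4E5D, 128, bytes.fromhex("0011223344550002")),
    ClockDataset("gm-a", 128, 6, 0x21, 0x4E5D, 128, bytes.fromhex("0011223344550001")),
    ClockDataset("relay", 128, 255, 0xFE, 0xFFFF, 128, bytes.fromhex("0011223344550003")),
]

print(elect_grandmaster(clocks).name)  # gm-a (better clockClass than gm-b)

# If gm-a fails and stops announcing, the election falls to the next best clock.
survivors = [c for c in clocks if c.name != "gm-a"]
print(elect_grandmaster(survivors).name)  # gm-b
```

Using a tuple as the sort key keeps the precedence order explicit in one place, so a change to the hierarchy is a change to a single line.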
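
The Sampled Values entry in Table 3 relates sample rates to system frequency and describes packet designations such as F4800S1I4U4. The following Python sketch works through that arithmetic. It assumes the designation fields mean F = sample rate in Hz, S = samples per message, I = number of currents, and U = number of voltages, as the entry describes. The function and class names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SvStream:
    sample_rate_hz: int       # F field
    samples_per_message: int  # S field
    currents: int             # I field
    voltages: int             # U field

    @property
    def messages_per_second(self) -> float:
        # Several samples can share one Ethernet frame, so the frame
        # rate is the sample rate divided by samples per message.
        return self.sample_rate_hz / self.samples_per_message


_DESIGNATION = re.compile(r"F(\d+)S(\d+)I(\d+)U(\d+)")


def parse_designation(code: str) -> SvStream:
    """Split a designation such as 'F4800S1I4U4' into its four fields."""
    match = _DESIGNATION.fullmatch(code)
    if match is None:
        raise ValueError(f"not an F/S/I/U designation: {code!r}")
    rate, samples, currents, voltages = (int(g) for g in match.groups())
    return SvStream(rate, samples, currents, voltages)


def sample_rate_hz(system_frequency_hz: int, samples_per_cycle: int) -> int:
    """Sample rate = system frequency x samples per electrical cycle."""
    return system_frequency_hz * samples_per_cycle


print(sample_rate_hz(60, 80))  # 4800 (US protection class)
print(sample_rate_hz(50, 80))  # 4000 (EMEA protection class)

for code in ("F4800S1I4U4", "F14400S6I4U4"):
    stream = parse_designation(code)
    print(code, stream.messages_per_second)
# F4800S1I4U4 4800.0
# F14400S6I4U4 2400.0
```

Under these assumptions, the 14.4 kHz stream produces fewer frames per second than the 4.8 kHz stream because each frame carries six samples instead of one.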

Acronyms and Definitions

This section lists the acronyms used frequently in this reference architecture guide.


Acronym	Definition
ECS	Edge Compute Stack
LCM	Lifecycle Management
TKG	Tanzu Kubernetes Grid
TMC	Tanzu Mission Control
SDDC	Software Defined Data Center
VVS	VMware Validated Solution
vPR	Virtual Protection Relay
vAC	Virtual Automation and Control
GOOSE	Generic Object Oriented Substation Event
HSR	High-availability Seamless Redundancy
HMI	Human Machine Interface
MU	Merging Unit
PIU	Process Interface Unit
MMS	Manufacturing Message Specification
PAC	Protection, Automation, and Control
PRP	Parallel Redundancy Protocol
PTP	Precision Time Protocol
CSP	Common Substation Platform
PCR	Platform Configuration Registers
AK	Attestation Key
TPM	Trusted Platform Module
CRB	Command Response Buffer
BES	Bulk Electric System
NIST	National Institute of Standards and Technology
KEK	Key Encryption Key
DEK	Data Encryption Key
EACMS	Electronic Access Control or Monitoring Systems
PACS	Physical Access Control System
SCI	Shared Cyber Infrastructure
VCA	Virtual Cyber Asset
SIEM	Security Information and Event Management
ATP	Advanced Threat Prevention