The Utility Substation vPAC Ready Infrastructure validated solution guide provides design, implementation, and operational guidance for a workload domain that runs vSphere and is configured for maximum performance when working with real-time systems in a power substation.

The technical implementation is constructed and tested by VMware and its partners to help power service providers resolve their common business use cases. VMware validated solutions are operational, performant, reliable, and secure. Each solution contains a detailed design, implementation and operational guidance, and interoperability information.

Support Matrix

vPAC Ready Infrastructure is compatible with certain versions of the VMware products that are used for implementing the solution.

Table 1. Software Components found in vPAC Ready Infrastructure

Product Name            Product Version   Release Notes
VMware ESXi             8.0 Update 2      See VMware vSphere 8.0 Update 2 Release Notes
VMware vCenter Server   8.0 Update 2      See VMware vSphere 8.0 Update 2 Release Notes

This guide and its appendices also discuss how additional supporting products, such as VMware vSAN, VMware Tanzu, and VeloCloud SD-WAN, can be activated and integrated to enhance the base components listed in Table 1.

Intended Audience

This guide is intended for information technology (IT) and operational technology (OT) systems architects and administrators who are already generally familiar with VMware software and use it to deploy and manage software-defined substation application architecture running on virtual workloads within a power facility or substation.

This guide provides guidance for capacity, scalability, backup and restoration, and extensibility for disaster recovery support. It is also assumed that the power system protection, automation, control, and telecommunications professionals who are involved in the implementation of vPAC Ready Infrastructure are already generally familiar with networking and IEC 61850/61869 standards.

For the virtualization environment, Appendix C: OT Personnel Training recommends several levels of educational materials for users who are new to the technology.

vPAC Ready Infrastructure Overview

Virtual Protection, Automation, and Control (vPAC) is architected to improve upon traditional grid devices such as microprocessor, solid state, and electromechanical appliances, which feature mainly fixed functionality. As this new technology becomes prominent, the key benefits are expected to be:

  • Flexibility in grid operations, including the integration of high penetrations of distributed generation.

  • Significant increases in data collection and analysis.

  • Simplified asset management.

  • Reduction in the quantity of devices to own and maintain.

  • Safer physical working environments in field locations.

  • Decreased labor costs in both capital and operations expenditures.

  • Improvements in standardization and interoperability.

There are three main components considered in the composition of this infrastructure:

  • Rugged, high-powered computing hardware.

  • Software-defined protection, automation, and control applications.

  • Virtualization environment with real-time capabilities.

This guide focuses primarily on the virtualization environment with real-time capabilities, providing guidance to attain the highest levels of persistent performance and availability, coupled with the lowest levels of latency achievable.

Glossary of Terms

The following terminology and product names or features are used throughout this guide.

Table 2. VMware Terminology

Terminology

Definition

Aria Operations for Applications

Provides a centralized management platform for consistently operating and securing Kubernetes infrastructure and modern applications across multiple teams and clouds.

Container

A container encapsulates an application in a form that is portable and easy to deploy. Containers can run without changes on the VMware platform with VMware Tanzu Kubernetes Grid (TKG). They consume resources efficiently, enabling high density within a virtual environment. Although containers can be used with almost any application, they are frequently associated with microservices, in which multiple containers run separate application components or services. The containers that make up microservices are typically coordinated and managed using a container orchestration platform, such as Kubernetes.

Edge Compute Stack

Edge Compute Stack (ECS) is a portfolio of VMware products tailored to build, run, manage, connect, and protect edge-native applications at the near edge (larger, primary sites) and the far edge (smaller, secondary sites).

ESXi

A bare-metal hypervisor that installs directly onto a physical server. With direct access to, and control of, underlying resources, VMware ESXi effectively partitions hardware to consolidate applications and cut costs. It is the industry leader for efficient architecture, setting the standard for reliability, performance, and support.

Harbor Image Registry

Provides a centralized location to push, pull, store, and scan container images used in Kubernetes workloads. It supports storing artifacts such as Helm Charts and includes enterprise-grade features such as Role-Based Access Control (RBAC), retention policies, automated garbage cleanup, and Docker Hub proxying.

  • A Helm Chart is a collection of files that describe a related set of Kubernetes resources. A single chart can be used to deploy something simple (for example, a memcached pod) or something complex (such as a full web application stack).

  • RBAC is a method of restricting access based on roles or privileges of individuals.

Tanzu Kubernetes Grid (TKG)

Enables the creation and lifecycle management of Kubernetes clusters. A Kubernetes cluster is a set of nodes running containerized applications.

Tanzu Mission Control (TMC)

Provides a global view of Kubernetes clusters and allows for centralized policy management across all deployed and attached clusters.

Tanzu Service Mesh

Provides consistent control and security for microservices, end users, and data, across all clusters and clouds.

Virtual Machine (VM)

A VM is a compute resource that uses software instead of a physical computer to run programs and deploy applications. One or more virtual machines can run on a physical (VMware ESXi) server or cluster of servers. Each virtual machine runs its own operating system and functions separately from other VMs, even when running on the same physical host.

vCenter

Advanced server management software that provides a centralized platform for controlling vSphere environments, with visibility across hybrid clouds (from data center to edge).

vSAN

Shared storage for VMs that works in conjunction with vSphere High Availability (HA) and Distributed Resource Scheduler (DRS).

vSphere

VMware’s virtualization platform, which aggregates infrastructure resources (CPU, storage, and networking) and manages them within a unified operating environment. vSphere encompasses several distinct products and technologies that work together to provide a complete infrastructure for virtualization.

Table 3. Power Industry Terminology

Terminology

Definition

Generic Object Oriented Substation Event (GOOSE)

GOOSE is a controlled model mechanism in which any format of data (status, value) is grouped into a data set and transmitted within a period of four milliseconds.

GOOSE is a communications protocol defined by the IEC 61850 standard, which was originally intended for LAN-restricted traffic in Layer 2. A routable version of the protocol (known as R-GOOSE) is defined within IEC 61850-90-5.

High-availability Seamless Redundancy (HSR)

HSR is a communications protocol that achieves 0 ms recovery time for network device failures. Participating devices are connected together in a ring topology. Devices not participating in HSR must not be connected to the same network.

If the virtualized environment does not participate in the HSR protocol, it requires a special Dual Attached Node (DAN) Network Interface Card (NIC) with two ports specific to the HSR ring traffic. This functionality is commonly referred to as a RedBox or redundancy box feature. The external-facing NIC ports are connected to separate Ethernet managed switches and generate TCP/IP traffic across both NIC ports simultaneously, one in each direction on the ring.

Similarly, when two HSR networks are connected, a QuadBox function is required, which is usually applied redundantly to prevent any single points of failure. HSR is defined in IEC 62439-3 Clause 5.
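The ring behaviour described above can be illustrated with a short sketch. It is a simplified model rather than an implementation of IEC 62439-3: nodes are simply numbered around the ring, the function name `hsr_paths` and the link representation are assumptions made for this example, and frame duplication and discard are not modelled.

```python
def hsr_paths(num_nodes, src, dst, broken_links=()):
    """Report which ring direction still reaches dst from src.

    Nodes are numbered 0..num_nodes-1 around the ring. Each entry in
    broken_links is a pair of adjacent node numbers whose link has failed.
    """
    if not (0 <= src < num_nodes and 0 <= dst < num_nodes):
        raise ValueError("src and dst must be nodes on the ring")
    broken = {frozenset(link) for link in broken_links}

    def reaches(step):
        node = src
        while node != dst:
            nxt = (node + step) % num_nodes
            if frozenset((node, nxt)) in broken:
                return False  # the frame cannot cross a failed link
            node = nxt
        return True

    # The sender transmits one copy in each direction around the ring.
    return {"clockwise": reaches(+1), "counterclockwise": reaches(-1)}


if __name__ == "__main__":
    print(hsr_paths(5, src=0, dst=2))
    print(hsr_paths(5, src=0, dst=2, broken_links=[(1, 2)]))
```

With a single failed link, one of the two copies still arrives, so the receiver sees no interruption. Only a second, simultaneous failure on the other path isolates the destination.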

Intelligent Electronic Device (IED)

IED refers to traditional microprocessor-based protection, automation, and control devices that have integrated, multi-function capabilities.

Merging Unit (MU)

An MU, or Process Interface Unit (PIU), converts analog signals (typically currents and voltages) from the instrument transformers, then merges them and sends them to the protective devices in a standards-based digital output format.

Manufacturing Message Specification (MMS)

MMS is a client-server protocol used for information exchange between protection, automation, and control devices or applications and higher-level systems (for example, Supervisory Control And Data Acquisition or SCADA) over Ethernet.

The MMS protocol is mapped on TCP/IP and enables TCP/IP communications between networked devices to read or write data, read configurations, and exchange files. MMS resides on the station bus and is an ISO 9506 and IEC 61850-8-1 standard.

Protection, Automation, and Control (PAC)

PAC refers to orchestrated, intelligent, logic systems within a power grid. These systems might be made up of analog, electromechanical, solid-state, or microprocessor devices, or virtual applications.

Protection typically refers to dedicated devices (often referred to as relays) or applications used to provide selective high-speed isolation of a power system fault from all sources of generation. These devices operate high voltage apparatus to segment the grid from an undesirable condition that is detected internally within its designated zone of protection.

Original protection algorithm implementations are described as ANSI number functions (for example, 50, 51, and so on). Special requirements are high levels of determinism, real-time or low latency networking, high availability, and redundancy.

A virtual protection relay (vPR) is a software-defined version of a protection relay, provided as an application that runs in a virtual machine or in a container.

Automation typically refers to devices or applications used to automate power system functionality using a collection of components that monitors and controls high voltage apparatus. An example is Fault Location, Isolation, and Service Restoration (FLISR).

Control typically refers to devices or applications that provide local and remote operability, and that collect and logically present indication and annunciation by monitoring power system assets. Examples include a Remote Telemetry Unit (RTU) or a Human Machine Interface (HMI). vAC is a software-defined version of automation or control applications, and is intended to include any non-protection function used to operate the power grid.

Parallel Redundancy Protocol (PRP)

PRP is a communications protocol that achieves 0 ms recovery time for network device failures. Each participating device is attached to two separate parallel networks as a Dual Attached Node (DAN). Devices can be attached to either network with a single connection as a Singly Attached Node (SAN), but they do not benefit from the PRP redundancy.

If the virtualized environment does not participate in the PRP protocol, it requires a special DAN NIC with two ports specific to the PRP traffic.

This functionality is commonly referred to as a RedBox or redundancy box feature. The external-facing NIC ports are connected to separate Ethernet managed switches and generate TCP/IP traffic across both NIC ports simultaneously. PRP is defined in IEC 62439-3, Clause 4.
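The way duplicated traffic yields zero recovery time can be sketched as follows. The example assumes that each copy of a frame carries a sequence number, which PRP provides in its redundancy control trailer, so that the receiver can keep the first copy and drop the second. The class and function names, the window size, and the source label `mu1` are hypothetical and chosen only for illustration.

```python
from collections import OrderedDict


class DuplicateDiscard:
    """Accept the first copy of each frame and drop the later one."""

    def __init__(self, window=1024):
        self._seen = OrderedDict()  # recent (source, sequence number) pairs
        self._window = window

    def accept(self, source, seq_nr):
        key = (source, seq_nr)
        if key in self._seen:
            return False  # second copy, already delivered via the other LAN
        self._seen[key] = True
        if len(self._seen) > self._window:
            self._seen.popitem(last=False)  # forget the oldest entry
        return True


def receive(arrivals):
    """arrivals: iterable of (lan, source, seq_nr) tuples in arrival order."""
    rx = DuplicateDiscard()
    return [seq for _lan, src, seq in arrivals if rx.accept(src, seq)]


if __name__ == "__main__":
    # LAN A loses frame 2; the copy sent on LAN B still arrives.
    arrivals = [("A", "mu1", 0), ("B", "mu1", 0),
                ("A", "mu1", 1), ("B", "mu1", 1),
                ("B", "mu1", 2),
                ("B", "mu1", 3), ("A", "mu1", 3)]
    print(receive(arrivals))
```

Because both copies are always in flight, the loss on one network needs no failover step: the application receives every sequence number exactly once.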

Precision Time Protocol (PTP)

PTP provides a method to precisely coordinate timestamps throughout a network. Time synchronization is achieved through packets that are transmitted and received in a session between the GNSS-synchronized originating signal at the grandmaster clock and all the subsequent participating devices (ordinary, transparent, and boundary clocks). PTP networks can achieve nanosecond-level synchronization, compared with Network Time Protocol (NTP), which can only achieve millisecond-level synchronization.
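The timestamp exchange can be made concrete with the delay request-response calculation from IEEE 1588. The sketch below assumes a symmetric path delay in both directions and uses illustrative timestamp values in nanoseconds; the function name is hypothetical.

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """Compute clock offset and mean path delay from four timestamps.

    t1: master clock time when the Sync message is sent
    t2: receiving clock time when the Sync message arrives
    t3: receiving clock time when the Delay_Req message is sent
    t4: master clock time when the Delay_Req message arrives
    """
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    offset_from_master = (t2 - t1) - mean_path_delay
    return offset_from_master, mean_path_delay


if __name__ == "__main__":
    # Illustrative case: receiving clock runs 500 ns ahead, path delay is 200 ns.
    offset, delay = ptp_offset_and_delay(t1=1000, t2=1700, t3=2000, t4=1700)
    print(offset, delay)
```

The receiving clock subtracts the computed offset from its own time to align with the master. Any asymmetry between the two path directions appears directly as an offset error, which is one reason transparent and boundary clocks are used to account for delays inside the network.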

PTP is defined in the IEEE 1588 standard. The power profiles of the standard, IEC 61850-9-3 and IEEE C37.238, are typically used in the power industry because of the fixed requirements they impose to meet the highest synchronization needs found in PAC systems.

PTP messages are commonly found on the process bus, but they may also be carried on the station bus, or on both.

  • An Ordinary Clock (OC) is typically an end-device accepting PTP packets from the grandmaster clock or the nearest boundary clock, and reporting on network characteristics.

  • A Transparent Clock (TC) participates only in correcting and forwarding PTP packets for delay calculations elsewhere. These devices add the least latency to the PTP network.

  • A Boundary Clock (BC) accepts a PTP packet from a grandmaster and adjusts it for network path delays before re-distributing it as a master clock signal to nearby ordinary clocks or secondary devices.

  • A Grandmaster Clock (GMC) is a singular source of network time, derived from an originating GNSS signal. Networks can contain more than one clock capable of becoming a grandmaster in the event of a failure. The Best Master Clock Algorithm (BMCA) is a protocol feature operating exclusively within an individual PTP domain that listens for participating clocks, compares a hierarchy of attributes, and elects the best qualified master.
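The attribute comparison performed by the BMCA can be sketched as an ordered tuple comparison. The field order below follows the dataset comparison in IEEE 1588 (priority1, clockClass, clockAccuracy, offsetScaledLogVariance, priority2, then clockIdentity as the tie-breaker), with lower values winning. This is a simplified sketch: it omits the topology-related steps of the full algorithm, and the class name, function name, and all clock values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnouncedClock:
    """Attributes a grandmaster-capable clock advertises to its PTP domain."""
    priority1: int                   # administrator preference
    clock_class: int                 # traceability of the time source
    clock_accuracy: int              # accuracy code
    offset_scaled_log_variance: int  # stability
    priority2: int                   # secondary administrator preference
    clock_identity: str              # unique identifier, final tie-breaker

    def rank(self):
        # Tuples compare element by element, matching the attribute hierarchy.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.offset_scaled_log_variance, self.priority2,
                self.clock_identity)


def elect_grandmaster(clocks):
    """Return the best qualified clock; lower values win at every step."""
    return min(clocks, key=AnnouncedClock.rank)


if __name__ == "__main__":
    # Illustrative values: class 6 locked to GNSS, 7 holdover, 248 default.
    locked = AnnouncedClock(128, 6, 0x21, 0x4E5D, 128, "001b19fffe000001")
    holdover = AnnouncedClock(128, 7, 0x21, 0x4E5D, 128, "001b19fffe000002")
    default = AnnouncedClock(128, 248, 0xFE, 0xFFFF, 128, "001b19fffe000003")
    print(elect_grandmaster([default, holdover, locked]).clock_identity)
    # If the GNSS-locked clock fails, the election falls to the next best.
    print(elect_grandmaster([default, holdover]).clock_identity)
```

Re-running the election over the remaining candidates is what allows a backup clock to take over when the active grandmaster fails.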

Process Bus

Process bus typically refers to the digital transmission of analog measurements or binary signals over Ethernet between the power station apparatus and low-level sensors, and the bay-level protection, automation, and control devices or applications. Process bus is often restricted to Layer 2 network protocols (SV, GOOSE, or PTP).

Sampled Values

Sampled Values (SV) or Sampled Message Values (SMV) are current and voltage signals from instrument transformers that are digitized and then communicated using an Ethernet-based Local Area Network (LAN).

Sampled Values are transmitted as high-speed streams of data set samples encoded in multicast Ethernet frames. The protocol uses a publisher/subscriber model, in which a publisher transmits unacknowledged data to subscribers.

SV/SMV is a Layer 2 protocol and typically resides on the process bus. SV/SMV is defined in the IEC 61850-9-2 standard. Typical standards used for publishing SVs include:

  • IEC 61850-9-2LE (Light Edition), which has a protection-class sample rate of 4,800 messages (packets) per second (US – 60 Hz @ 80 samples per electrical cycle) and 4,000 messages (packets) per second (EMEA – 50 Hz @ 80 samples per electrical cycle) or a metering-class sample rate of 14.4 kHz (US – 60 Hz) or 12.8 kHz (EMEA – 50 Hz).

  • IEC 61869-9, which is backward compatible with IEC 61850-9-2LE but standardizes packet messages and associated sample rates, while allowing a mixed number of currents and voltages to be contained (up to a total of 24). Examples of the 61850-9-2LE packet designations within the parameters of 61869-9 are F4800S1I4U4 and F14400S6I4U4, which indicate a sample rate of 4800 Hz or 14400 Hz, 1 or 6 samples per message, and 4 currents and 4 voltages in each. This standard provides much more flexibility and scalability in deploying analog-to-digital conversion devices.
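The packet designations above follow a regular pattern that can be decoded mechanically. The sketch below parses a designation of the form F<rate>S<samples>I<currents>U<voltages> and derives the message rate and samples per electrical cycle. The names `SvVariant` and `parse_sv_variant` are hypothetical, and the message rate is derived here on the assumption that it equals the sample rate divided by the samples per message.

```python
import re
from typing import NamedTuple


class SvVariant(NamedTuple):
    sample_rate_hz: int
    samples_per_message: int
    currents: int
    voltages: int

    @property
    def messages_per_second(self):
        # Several samples may be packed into one Ethernet frame.
        return self.sample_rate_hz / self.samples_per_message

    def samples_per_cycle(self, system_hz):
        # system_hz is the nominal power system frequency (50 or 60 Hz).
        return self.sample_rate_hz / system_hz


_PATTERN = re.compile(r"F(\d+)S(\d+)I(\d+)U(\d+)")


def parse_sv_variant(code):
    """Decode a designation such as 'F4800S1I4U4'."""
    match = _PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid designation: {code!r}")
    return SvVariant(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    for code in ("F4800S1I4U4", "F14400S6I4U4"):
        v = parse_sv_variant(code)
        print(code, v.messages_per_second, v.samples_per_cycle(60))
```

For F4800S1I4U4 on a 60 Hz system this gives 80 samples per cycle, matching the protection-class rate listed above. For F14400S6I4U4, packing six samples per message reduces the frame rate to 2400 messages per second.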

Bandwidth usage is high at approximately 5.3 Mbps for 4.8 kHz and 13.5 Mbps for 14.4 kHz sample rates. PRP, VLANs, and QoS are used to help ensure reliable transmission of the packets.
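The relationship between message rate, frame size, and bandwidth is simple arithmetic, sketched below. The frame size is not stated in this guide, so the example works backwards from the 5.3 Mbps figure, assuming one sample per message at 4.8 kHz (4,800 messages per second). The function names are hypothetical, and the result should be read as an implied average size on the wire rather than a specified value.

```python
def stream_mbps(frame_bytes, messages_per_second):
    """Bandwidth of one SV stream in megabits per second."""
    return frame_bytes * 8 * messages_per_second / 1_000_000


def implied_frame_bytes(mbps, messages_per_second):
    """Average bytes per message implied by a quoted bandwidth figure."""
    return mbps * 1_000_000 / (8 * messages_per_second)


if __name__ == "__main__":
    size = implied_frame_bytes(5.3, 4800)
    print(round(size))             # bytes per message implied by 5.3 Mbps
    print(stream_mbps(138, 4800))  # bandwidth for a 138-byte message
```

The same calculation is not applied to the 14.4 kHz figure here, because the message rate at that sample rate depends on how many samples are packed into each message. Multiplying the per-stream figure by the number of merging units gives a first estimate of process bus load.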

Much like GOOSE, SVs are another communications protocol defined by the IEC 61850 standard, originally intended for LAN-restricted traffic in Layer 2. A routable version of the protocol (known as R-SV) has been defined within IEC 61850-90-5.

Station Bus

Station bus typically refers to the digital transmission of analog or binary data over Ethernet between the bay-level protection, automation, and control devices or applications and the power station-level supervisory or management systems and applications. Station bus often includes up to Layer 3 network protocols (including MMS).

Acronyms and Definitions

This section lists the acronyms used frequently in this reference architecture guide.

Acronym   Definition
ECS       Edge Compute Stack
LCM       Lifecycle Management
TKG       Tanzu Kubernetes Grid
TMC       Tanzu Mission Control
SDDC      Software-Defined Data Center
VVS       VMware Validated Solution
vPR       Virtual Protection Relay
vAC       Virtual Automation and Control
GOOSE     Generic Object Oriented Substation Event
HSR       High-availability Seamless Redundancy
HMI       Human Machine Interface
MU        Merging Unit
PIU       Process Interface Unit
MMS       Manufacturing Message Specification
PAC       Protection, Automation, and Control
PRP       Parallel Redundancy Protocol
PTP       Precision Time Protocol
CSP       Common Substation Platform
PCR       Platform Configuration Registers
AK        Attestation Key
TPM       Trusted Platform Module
CRB       Command Response Buffer
BES       Bulk Electric System
NIST      National Institute of Standards and Technology
KEK       Key Encryption Key
DEK       Data Encryption Key
EACMS     Electronic Access Control or Monitoring Systems
PACS      Physical Access Control System
SCI       Shared Cyber Infrastructure
VCA       Virtual Cyber Asset
SIEM      Security Information and Event Management
ATP       Advanced Threat Prevention