Security

You can apply security controls consistently across environments by using several pre-packaged components in Tanzu Kubernetes Grid (TKG). These pre-packaged components provide greater security for workload clusters and their underlying environment.

With every release, VMware remains committed to improving TKG security, with a focus on making security an intrinsic part of the product while maintaining a frictionless developer experience. Consider the recommendations in this document in light of your organization's security posture and risk appetite. This document uses a shared responsibility model for securing environments that run Tanzu Kubernetes Grid provisioned clusters, across all layers of the cloud native stack: Code, Container, Cluster, and Cloud.

Security details differ between TKG deployed with a vSphere with Tanzu Supervisor and TKG deployed with a standalone management cluster. Security for Supervisor-based deployments is covered in the vSphere with Tanzu documentation linked below. The rest of this topic describes TKG deployments with standalone management clusters.

vSphere with Tanzu Supervisor Security

Tanzu Kubernetes Grid v2.0 on vSphere 8 with Supervisor is an add-on module to vSphere. It leverages many vSphere features, including vSphere security and vCenter SSO. For more information about Tanzu Kubernetes Grid v2.0 security, see vSphere with Tanzu Security in the vSphere 8 documentation.

This document pertains only to the Tanzu Kubernetes Grid multi-cloud offering. Its official product documentation and its differences from other offerings are described here.

For security guidance on other VMware Tanzu offerings, see:

Standalone Management Cluster Security

The following sections describe security for TKG deployed with a standalone management cluster. This includes the security controls that are built into the product and best practices to implement complementary security controls that protect the environments in which Tanzu Kubernetes Grid clusters are deployed.

Code

Tanzu Kubernetes Grid runs code written by application developers, deployed as Kubernetes pods. Tanzu Kubernetes Grid is made up of different components, many of which are open source and some of which are proprietary to VMware. Securing the code of all of these applications and components improves the security posture of the environment running Tanzu Kubernetes Grid provisioned clusters.

Tanzu Kubernetes Grid is developed in compliance with the VMware Security Development Lifecycle Process. Specifically, the following best practices are implemented to ensure Tanzu Kubernetes Grid product security:

  • Revise the threat model for every major design change in the product.

  • Prioritize fixes for the findings that come out of the threat modeling exercise.

  • Use automated builds to compile all core Tanzu Kubernetes Grid components from source code.

  • Participate in upstream for security fixes, release management, and triage of vulnerabilities.

  • For VMware proprietary code, before merging to the main branch:

    • Implement peer code reviews to ensure a second pair of eyes on every change.

    • Execute automated static code scanning using tools such as golint, gosec, and govet.

  • Sign binaries such as kubectl and tanzu-cli with VMware signing keys.

To secure the code of the containerized apps running on Tanzu Kubernetes Grid, the following resources serve as a helpful reference:

Containers

Containers are instantiated as Linux namespace-isolated processes from pre-packaged images, which are essentially tarballs of the application binary and all of its runtime dependencies. Tanzu Kubernetes Grid runs these containers as part of Kubernetes pods. Many Tanzu Kubernetes Grid components are also packaged as container images and are configured to run as pods (sometimes as Kubernetes daemonsets or static pods). The following best practices are implemented to secure the containers of Tanzu Kubernetes Grid components:

  • Scan all container images with a vulnerability scanner for Common Vulnerabilities and Exposures (CVEs) during the push to the staging container registry.

  • Limit push access to the external-facing container registry to the Tanzu Kubernetes Grid release team, following the principle of least privilege.

  • Use a centrally managed (LDAP) service account, or robot account, that automates the push of container images from staging to production after release criteria are met and appropriate testing is complete.

  • Perform an internal impact assessment documenting any critical unfixed vulnerabilities in the container images.

  • Fix vulnerabilities with major product impact [1][2] without waiting for the next minor release.

  • Regularly update Tanzu Kubernetes Grid components to newer base images in order to obtain fixes for newly identified vulnerabilities.

  • Prefer and drive the move toward minimal images for all Tanzu Kubernetes Grid components when possible, and limit base image distributions to a small number, to reduce the patching surface area for all images.

To build, run, and consume container images securely, the following resources are a useful guide:

OS Updates and Node Image Versions

VMware packages versioned base machine OS images in Tanzu Kubernetes releases (TKrs), along with compatible versions of Kubernetes and supporting components. Tanzu Kubernetes Grid then uses these packaged OS, Kubernetes, and component versions to create cluster control plane and worker nodes. See Tanzu Kubernetes Releases and Custom Node Images for more information.

Each published TKr uses the latest stable and generally-available update of the OS version that it packages, containing all current CVE and USN fixes as of the day that the image is built. VMware rebuilds these node images (vSphere OVAs, AWS AMIs, and Azure VM images) with each release of Tanzu Kubernetes Grid, and possibly more frequently. The image files are signed by VMware and have filenames that contain a unique hashcode identifier.

When a critical or high-priority CVE is reported, VMware collaborates on a fix and, when the fix is published, rebuilds all affected node images and container base images to include the update.

FIPS-Enabled Versions

You can install and run FIPS-enabled versions of Tanzu Kubernetes Grid v2.1.0 and v2.1.1, in which core components use cryptographic primitives provided by a FIPS-enabled library based on the BoringCrypto / BoringSSL module. These core components include components of Kubernetes, Containerd and CRI, CNI plugins, CoreDNS, and etcd.

For information about how to install FIPS-enabled Tanzu Kubernetes Grid, see FIPS-Enabled Versions in VMware Tanzu Compliance.

Clusters

A Kubernetes cluster is made up of several components that act as the control plane of the cluster, plus a set of supporting components and worker nodes that actually run deployed workloads. There are two types of clusters in a Tanzu Kubernetes Grid setup: management clusters and workload clusters. The Tanzu Kubernetes Grid management cluster hosts all the Tanzu Kubernetes Grid components used to manage workload clusters. Workload clusters, which are spun up by Tanzu Kubernetes Grid admins, are then used to actually run the containerized applications. Cluster security is a shared responsibility among the Tanzu Kubernetes Grid cluster admins, developers, and operators who run apps on Tanzu Kubernetes Grid provisioned clusters. This section enumerates the components included with Tanzu Kubernetes Grid by default that help implement security best practices for both management and workload clusters.

Identity and Access Management

Tanzu Kubernetes Grid has a Pinniped package that enables secure access to Kubernetes clusters, as described in Identity and Access Management.

Tanzu Kubernetes Grid operators are still responsible for granting access to cluster resources to other users of Kubernetes through built-in role-based access control. Recommended best practices for managing identities in Tanzu Kubernetes Grid provisioned clusters are as follows:

  • Limit access to cluster resources following the least privilege principle; a minimal RBAC sketch follows this list.

  • Limit access to management clusters to the appropriate set of users. For example, provide access only to users who are responsible for managing infrastructure and cloud resources but not to application developers. This is especially important because access to the management cluster inherently provides access to all workload clusters.

  • Limit cluster admin access for workload clusters to the appropriate set of users. For example, provide access to users who are responsible for managing infrastructure and platform resources in your organization, but not to application developers.

  • With Pinniped, connect to a centralized identity provider to manage the user identities that are allowed to access cluster resources, instead of relying on admin-generated kubeconfig files.
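For illustration, the following is a minimal sketch of namespace-scoped RBAC that follows the least privilege principle. The namespace, role, and group names are hypothetical; the group is assumed to be asserted by the identity provider that Pinniped federates to:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: app-team-read               # hypothetical role name
      namespace: team-a                 # hypothetical namespace
    rules:
    # Read-only access to core workload resources in this namespace.
    - apiGroups: [""]
      resources: ["pods", "services", "configmaps"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["apps"]
      resources: ["deployments", "replicasets"]
      verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: app-team-read-binding
      namespace: team-a
    subjects:
    - kind: Group
      name: team-a-developers           # hypothetical group from the identity provider
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: app-team-read
      apiGroup: rbac.authorization.k8s.io

Because the binding targets a group rather than individual users, access can be granted and revoked centrally in the identity provider instead of by editing cluster resources.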

Multi-Tenancy

One of the core benefits of Tanzu Kubernetes Grid is the ability to manage the complete lifecycle of multiple clusters through a single management plane. This is important, because from a multi-tenancy point of view, the highest form of isolation between untrusted workloads is possible when they run in separate Kubernetes clusters. These are some of the defaults configured to support multi-tenant workloads in Tanzu Kubernetes Grid:

  • Nodes are not shared between clusters.

  • Nodes are configured to host only container workloads.

  • The management plane runs in its own dedicated cluster to enable separation of concerns with workload clusters.

  • Kubernetes management components such as the API server, scheduler, and controller-manager run on dedicated nodes. Additionally, consider applying an audit rule to detect the deployment of any workload pods to control plane nodes.

  • Application pod scheduling on the dedicated management component nodes (mentioned above) is deactivated through node taints and affinity rules; see the sketch after this list.
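The following sketch shows how the taint-based separation works, assuming the standard upstream control-plane taint (the exact taint key can vary by Kubernetes and TKG version). Control plane nodes carry a NoSchedule taint, and only system pods declare a matching toleration; application pods lack it, so the scheduler keeps them off those nodes:

    # Control plane nodes carry a taint such as:
    #   node-role.kubernetes.io/control-plane:NoSchedule
    # A system pod that must run there declares a matching toleration.
    apiVersion: v1
    kind: Pod
    metadata:
      name: system-component            # hypothetical system pod
      namespace: kube-system
    spec:
      containers:
      - name: component
        image: registry.example.com/component:1.0   # hypothetical image
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule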

To improve security in an AWS multi-tenant environment, deploy the workload clusters to an AWS account that is different from the one used to deploy the management cluster. To deploy workload clusters across multiple AWS accounts, see Clusters on Different AWS Accounts.

For more in depth information on the design considerations when deploying multi-tenant environments, see Workload Tenancy.

Workload Isolation

Workload isolation requirements are unique to each customer. Therefore, reasonably isolating workloads from each other within an acceptable risk tolerance requires additional effort in line with the shared responsibility model. This includes limiting the containers that need to run with higher privileges to a handful of namespaces, and implementing defense-in-depth mechanisms such as AppArmor and SELinux at the runtime, pod, and node level. In Tanzu Kubernetes Grid 1.6 and later, AppArmor is enabled by default in Ubuntu 20.04 images.

These configurations can be centrally enforced on pods through Pod Security Policies, with an eye on migration to their replacement: Pod Security Admission control.
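For example, the following is a minimal sketch of enabling Pod Security Admission on a namespace using the standard upstream labels; the namespace name is hypothetical, and restricted is the strictest of the Pod Security Standards levels:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a                      # hypothetical namespace
      labels:
        # Reject pods that violate the "restricted" Pod Security Standard.
        pod-security.kubernetes.io/enforce: restricted
        # Also surface violations as warnings and audit log annotations.
        pod-security.kubernetes.io/warn: restricted
        pod-security.kubernetes.io/audit: restricted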

For advanced use cases and custom policy management in general, the following resources serve as a good starting point: OPA, Admission Control, and Pod Security Standards.

Protecting Inter-Service Communication

One of the fundamental aspects of a microservices architecture is building services that do one thing only. This enables separation of concerns and allows teams to move faster. However, this also increases the need to communicate with several different microservices that are often running in the same cluster in their own pods. Therefore, the following best practices should be considered for securing these communications at runtime:

  • Least privilege network policies: Antrea is the default CNI plugin enabled in Tanzu Kubernetes Grid. To learn how to use it to implement network policies appropriate to your risk posture, refer to the official Antrea documentation; a default-deny sketch follows this list. To use a different CNI plugin of your choice, follow this guide: Pod and Container Networking

  • Mutual TLS by default: Implementing this is the responsibility of Tanzu Kubernetes Grid customers. It can be implemented as part of the application manifest or by using a service mesh that injects a sidecar container to handle TLS communication for the app container.

  • Protect Secrets: There are several different options to choose from when managing secrets in a Kubernetes cluster. For a quick run-down of options, see Secrets Management.
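As an illustration of the first item, the following is a minimal sketch of least privilege network policies using the standard Kubernetes NetworkPolicy API, which Antrea enforces; the namespace, labels, and port are hypothetical. Ingress is denied by default, and only frontend-to-backend traffic is allowed:

    # Deny all ingress to every pod in the namespace by default.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-ingress
      namespace: team-a                 # hypothetical namespace
    spec:
      podSelector: {}
      policyTypes:
      - Ingress
    ---
    # Then allow only frontend pods to reach backend pods on port 8080.
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-frontend-to-backend
      namespace: team-a
    spec:
      podSelector:
        matchLabels:
          app: backend
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: frontend
        ports:
        - protocol: TCP
          port: 8080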

Auditing, Logging, and Monitoring

To ensure observability and non-repudiation of cluster resources, including application pods, it is important to enable auditing and monitoring of Tanzu Kubernetes Grid provisioned clusters. Tanzu Kubernetes Grid is packaged with a set of extensions that allow administrators to enable this natively. The following guides explain this in depth:

  1. API server and system audit logging: How to enable API server audit logging as well as system-level (node) auditing to prevent repudiation of cluster usage. Tanzu Kubernetes Grid includes a default policy for API server auditing; an example policy is sketched after this list. It is recommended to set an appropriate policy for the node-level audit daemon so that tampering with container runtime binaries and configuration can be detected.

  2. Log Forwarding with Fluent Bit: How to enable centralized log collection, which prevents the loss of audit evidence due to local tampering with logs.

  3. Monitoring with Prometheus and Grafana: How to enable observability of cluster and system metrics for alerting and visualization, which can detect sudden spikes in resource consumption due to denial-of-service attacks.
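As an illustration of the first guide above, the following is a minimal sketch of a Kubernetes API server audit policy. It is not the TKG default policy; the rule choices are assumptions for illustration:

    apiVersion: audit.k8s.io/v1
    kind: Policy
    rules:
    # Log access to secrets and configmaps at Metadata level only, so
    # sensitive payloads never land in the audit log.
    - level: Metadata
      resources:
      - group: ""
        resources: ["secrets", "configmaps"]
    # Log all other write operations together with their request bodies.
    - level: Request
      verbs: ["create", "update", "patch", "delete"]
    # Log everything else at Metadata level.
    - level: Metadata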

Depending on the relevant threats, any or all of the above controls can be applied to a Tanzu Kubernetes Grid cluster.

Cloud Providers

Cloud providers act as the underlay resource for all Tanzu Kubernetes Grid provisioned Kubernetes clusters, regardless of whether the deployment is on-premises (for example, vSphere) or in a public cloud (for example, AWS, Azure, or Google Cloud). Securing the underlying infrastructure is generally a shared responsibility between Tanzu Kubernetes Grid customers and the cloud providers. The following recommendations improve the security of the cloud underlying Tanzu Kubernetes Grid provisioned clusters:

  • Rotate or update your cloud credentials regularly using this guide (vSphere only): Tanzu Cluster Secrets. If you automate rotation, test it in non-production environments first to observe and plan for any disruption it may cause.

  • Apply least-privilege permissions for cloud credentials as described in the documentation for the AWS, Azure, and vSphere providers. Whenever possible, run management and workload clusters in separate VPCs and firewall zones. This is the default setting for Tanzu Kubernetes Grid provisioned clusters.

  • Restrict SSH node access, especially to control plane nodes, to a small set of users who act as infrastructure admins.

  • Use SSH access rarely, mainly as a break-glass procedure, for example after a loss of management cluster credentials.

  • Validate that cluster resources are not accessible to unauthenticated users on the internet. Customers with low risk tolerance should deploy clusters with a load balancer configuration that does not expose the API server port to the internet.

  • Isolate the Tanzu Kubernetes Grid environment (management and workload clusters) in dedicated VPCs, or behind a firewall, away from other non-Tanzu cloud workloads, to limit lateral movement and reduce the attack surface in case of a compromised cluster.

  • Apply, test, and validate disaster recovery scenarios for redundancy and multi-region availability across clusters.

  • Implement a plan to recover from loss of data caused by data corruption, ransomware attacks, or natural catastrophes that result in physical hardware damage.

  • Consider using native backup and restore of cluster resources with Velero to help with disaster recovery planning and data-loss recovery scenarios; a scheduled backup sketch follows this list.
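For example, the following is a minimal sketch of a scheduled Velero backup; the schedule name, application namespace, and retention period are hypothetical, and Velero is assumed to be installed in the velero namespace:

    apiVersion: velero.io/v1
    kind: Schedule
    metadata:
      name: nightly-workloads           # hypothetical schedule name
      namespace: velero                 # Velero's installation namespace
    spec:
      schedule: "0 2 * * *"             # run every night at 02:00
      template:
        includedNamespaces:
        - team-a                        # hypothetical application namespace
        ttl: 720h0m0s                   # retain each backup for 30 days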

These recommendations are in addition to general cloud security guidance; for that, refer to the relevant cloud provider's official security documentation.

In conclusion, this document provides a broad picture of the current state of the art and of the recommended security controls that can be applied to Tanzu Kubernetes Grid. We are committed to shipping a more intrinsically secure Tanzu Kubernetes Grid with every release, while keeping in mind the desire for a frictionless developer experience.

If you have feedback on the document or have any feature requests related to security, please contact your VMware representative.

Authority to Operate

You can harden Tanzu Kubernetes Grid (TKG) to achieve an Authority to Operate (ATO). TKG releases are continuously validated against the Defense Information Systems Agency Security Technical Implementation Guides (DISA STIGs), the Cybersecurity and Infrastructure Security Agency (CISA) and National Security Agency (NSA) framework, and National Institute of Standards and Technology (NIST) guidelines. For more information, see Tanzu Kubernetes Grid 2.1 Compliance and Hardening.

References

Upstream Community Led Resources

These are some upstream (CNCF/Kubernetes) community-driven, security-centric resources:

Third Party Standards and Guidelines

The following is a list, in no particular order of preference, of documents published by government and standards bodies:
