This topic provides a high-level descriptive overview of Tanzu Kubernetes Grid security.
Tanzu Kubernetes Grid (TKG) enables a consistent, Kubernetes-native experience across any cloud. Security controls can therefore be applied consistently across environments by using several pre-packaged components in TKG that help secure workload clusters and their underlying environment. A shared responsibility model also applies to securing environments that run Tanzu Kubernetes Grid provisioned clusters, across all layers of the cloud native stack: Code, Container, Cluster, and Cloud.
This document is an attempt to share the current state of the art of TKG security. With every release, VMware remains committed to improving TKG security with a focus on making security intrinsically part of the product while maintaining a frictionless developer experience. All the recommendations in this document should be taken into consideration based on the security posture and risk appetite of the organization.
Security details differ between TKG deployed with a vSphere with Tanzu Supervisor and TKG deployed with a standalone management cluster. The security story for Supervisor-based deployments is covered in the vSphere with Tanzu documentation linked below. The rest of this topic describes TKG deployments with standalone management clusters.
Tanzu Kubernetes Grid v2.0 on vSphere 8 with Supervisor is an add-on module to vSphere. It leverages many vSphere features, including vSphere security and vCenter SSO. For more information about Tanzu Kubernetes Grid v2.0 security, see vSphere with Tanzu Security in the vSphere 8 documentation.
This document pertains only to the Tanzu Kubernetes Grid multi-cloud offering. Its official product documentation and differences with other offerings are described here.
For security guidance on other VMware Tanzu offerings, see:
Tanzu Kubernetes Grid Integrated Edition: See Tanzu Kubernetes Grid Integrated Edition Security in the TKGI documentation.
Tanzu Mission Control: See Security Measures in VMware Tanzu Mission Control.
VMware policies and practices for secure software development: See VMware Product Security.
The sections below describe security in and around TKG deployed with a standalone management cluster. This includes the security controls available for use built into the product, and best practices to implement complementary security controls that protect the environments in which Tanzu Kubernetes Grid clusters are deployed.
Tanzu Kubernetes Grid runs code written by application developers, deployed as Kubernetes pods. Tanzu Kubernetes Grid is made up of different components, many of which are open source and some proprietary to VMware. If the code of all of these applications and components is secure, it improves the security posture of the environment running Tanzu Kubernetes Grid provisioned clusters.
Tanzu Kubernetes Grid is developed in compliance with the VMware Security Development Lifecycle Process. Specifically, the following best practices are implemented to ensure Tanzu Kubernetes Grid product security:
Revise threat model for every major design change in the product.
Work on high priority fixes for the findings coming out of the threat modelling exercise.
Automated builds for all core Tanzu Kubernetes Grid components to compile them from source code.
Participate in upstream for security fixes, release management, and triage of vulnerabilities.
For VMware proprietary code before merging to
Implement peer code reviews so that every change gets a second pair of eyes.
Execute automated static code scanning using tools like
Sign binaries like tanzu-cli with VMware signing keys.
To secure code of the containerized apps running on Tanzu Kubernetes Grid, the following resources can serve as a helpful reference:
Containers are instantiated as Linux namespace isolated processes, using pre-packaged images that are essentially tarballs of all the runtime dependencies and app binary to run the containerized application. Tanzu Kubernetes Grid runs these containers as part of Kubernetes pods. Many Tanzu Kubernetes Grid components are also packaged as container images and are configured to run as pods (sometimes as Kubernetes daemonsets or static pods). The following best practices are implemented to secure containers of Tanzu Kubernetes Grid components:
Scan all container images with a vulnerability scanner for Common Vulnerabilities and Exposures (CVEs) during push to the staging container registry.
Limit push access to the external-facing container registry to the Tanzu Kubernetes Grid release team, following the principle of least privilege.
Use a centrally (LDAP) managed service, or robot account, that automates push of container images from staging to production after release criteria are met and appropriate testing is complete.
Perform an internal impact assessment documenting any critical unfixed vulnerabilities in the container images.
Fix vulnerabilities with major product impact without waiting for the next minor release.
Regularly update Tanzu Kubernetes Grid components to newer base images in order to obtain fixes for newly identified vulnerabilities.
Prefer and drive the move towards minimal images for all Tanzu Kubernetes Grid components, when possible, and limit base image distributions to a small number, to reduce the patching surface area for all images.
To build, run, and consume container images securely in general, the following resources are a useful guide:
VMware packages versioned base machine OS images in Tanzu Kubernetes releases (TKrs), along with compatible versions of Kubernetes and supporting components. Tanzu Kubernetes Grid then uses these packaged OS, Kubernetes, and component versions to create cluster and control plane nodes. See Tanzu Kubernetes Releases and Custom Node Images for more information.
Each published TKr uses the latest stable and generally-available update of the OS version that it packages, containing all current CVE and USN fixes, as of the day that the image is built. VMware rebuilds these node images (vSphere OVAs, AWS AMIs, and Azure VM images) with each release of Tanzu Kubernetes Grid, and possibly more frequently. The image files are signed by VMware and have filenames that contain a unique hashcode identifier.
When a critical or high-priority CVE is reported, VMware collaborates on a fix, and when the fix is published, rebuilds all affected node images, and container base images, to include the update.
The Ubuntu 20.04 machine images for cluster nodes for vSphere, AWS and Azure are hardened to Center for Internet Security (CIS) standards by default, with AppArmor enabled. Photon OS 3 machine images are hardened to Security Technical Implementation Guides (STIG) standards by default.
You can install and run a FIPS-capable version of Tanzu Kubernetes Grid v1.6, in which core components use cryptographic primitives provided by a FIPS-compliant library based on the BoringCrypto / Boring SSL module. These core components include components of Kubernetes, Containerd and CRI, CNI plugins, CoreDNS, and etcd.
For information about how to install FIPS-capable Tanzu Kubernetes Grid, see the FIPS-Capable Version section of Prepare to Deploy Management Clusters in the TKG v1.6 documentation.
A Kubernetes cluster is made up of several components that act as the control plane of the cluster, plus a set of supporting components and worker nodes that run the deployed workloads. There are two types of clusters in a Tanzu Kubernetes Grid setup: management clusters and workload clusters. The Tanzu Kubernetes Grid management cluster hosts all the Tanzu Kubernetes Grid components used to manage workload clusters. Workload clusters that are spun up by Tanzu Kubernetes Grid admins are then used to run the containerized applications. Cluster security is a shared responsibility between Tanzu Kubernetes Grid cluster admins, developers, and operators who run apps on Tanzu Kubernetes Grid provisioned clusters. This section enumerates the components included with Tanzu Kubernetes Grid by default that can help implement security best practices for both management and workload clusters.
Tanzu Kubernetes Grid has a Pinniped package that enables secure access to Kubernetes clusters, as described in Identity and Access Management.
Tanzu Kubernetes Grid operators are still responsible for granting access to cluster resources to other users of Kubernetes through built-in role-based access control. Recommended best practices for managing identities in Tanzu Kubernetes Grid provisioned clusters are as follows:
Limit access to cluster resources following the principle of least privilege.
Limit access to management clusters to the appropriate set of users. For example, provide access only to users who are responsible for managing infrastructure and cloud resources but not to application developers. This is especially important because access to the management cluster inherently provides access to all workload clusters.
Limit cluster admin access for workload clusters to the appropriate set of users. For example, provide access to users who are responsible for managing infrastructure and platform resources in your organization but not to application developers.
With Pinniped, connect to a centralized identity provider to manage user identities allowed to access cluster resources instead of relying on admin generated
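As an illustration of the least-privilege recommendations above, the following minimal sketch builds a namespaced, read-only Kubernetes Role and binds it to a group supplied by the identity provider. The role name, namespace, group name, and the chosen resources are hypothetical and not part of Tanzu Kubernetes Grid; only the standard Kubernetes RBAC API fields are used. The output is JSON, which kubectl accepts in place of YAML.

```python
import json


def read_only_role(namespace, group):
    """Return a read-only Role and a RoleBinding for an identity-provider group.

    The names "app-viewer" and "app-viewer-binding" are illustrative only.
    """
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "app-viewer", "namespace": namespace},
        "rules": [
            {
                # "" is the core API group (pods, services).
                "apiGroups": ["", "apps"],
                "resources": ["pods", "services", "deployments"],
                # Read-only verbs; no create, update, or delete.
                "verbs": ["get", "list", "watch"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "app-viewer-binding", "namespace": namespace},
        "subjects": [
            {
                "kind": "Group",
                "name": group,
                "apiGroup": "rbac.authorization.k8s.io",
            }
        ],
        "roleRef": {
            "kind": "Role",
            "name": "app-viewer",
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    return [role, binding]


if __name__ == "__main__":
    # Hypothetical namespace and group names.
    for manifest in read_only_role("team-a", "team-a-developers"):
        print(json.dumps(manifest, indent=2))
```

Binding to a group rather than to individual users keeps membership management in the identity provider, so access changes do not require edits to cluster resources.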
One of the core benefits of Tanzu Kubernetes Grid is the ability to manage the complete lifecycle of multiple clusters through a single management plane. This is important, because from a multi-tenancy point of view, the highest form of isolation between untrusted workloads is possible when they run in separate Kubernetes clusters. These are some of the defaults configured to support multi-tenant workloads in Tanzu Kubernetes Grid:
Nodes are not shared between clusters.
Nodes are configured to host only container workloads.
The management plane runs in its own dedicated cluster to enable separation of concerns with workload clusters.
Kubernetes management components such as controller-manager, etc., run on dedicated nodes. Additionally, consider applying an audit rule to detect deployment of any workload pods to control plane nodes.
Application pod scheduling on dedicated nodes for management components (mentioned above) is deactivated through node taints and affinity rules.
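The check for workload pods on control plane nodes mentioned above could be approximated offline as follows. This is a sketch under stated assumptions: the input lists are the "items" arrays from `kubectl get nodes -o json` and `kubectl get pods -A -o json`, control plane nodes carry the standard `node-role.kubernetes.io/control-plane` label, and the set of namespaces treated as system namespaces is a hypothetical example that should be adjusted to the environment.

```python
# Assumption: namespaces whose pods are expected on control plane nodes.
SYSTEM_NAMESPACES = frozenset({"kube-system", "tkg-system"})

CONTROL_PLANE_LABEL = "node-role.kubernetes.io/control-plane"


def pods_on_control_plane(nodes, pods, system_namespaces=SYSTEM_NAMESPACES):
    """Return "namespace/name" for non-system pods scheduled on control plane nodes."""
    control_plane = {
        node["metadata"]["name"]
        for node in nodes
        if CONTROL_PLANE_LABEL in node["metadata"].get("labels", {})
    }
    findings = []
    for pod in pods:
        namespace = pod["metadata"]["namespace"]
        node_name = pod.get("spec", {}).get("nodeName")
        if node_name in control_plane and namespace not in system_namespaces:
            findings.append(f"{namespace}/{pod['metadata']['name']}")
    return findings


if __name__ == "__main__":
    # Hand-written sample data in the shape of kubectl JSON output.
    sample_nodes = [
        {"metadata": {"name": "cp-0", "labels": {CONTROL_PLANE_LABEL: ""}}},
        {"metadata": {"name": "worker-0", "labels": {}}},
    ]
    sample_pods = [
        {"metadata": {"namespace": "kube-system", "name": "etcd-cp-0"},
         "spec": {"nodeName": "cp-0"}},
        {"metadata": {"namespace": "shop", "name": "cart"},
         "spec": {"nodeName": "cp-0"}},
    ]
    print(pods_on_control_plane(sample_nodes, sample_pods))
```

A non-empty result suggests that a toleration or a missing taint is letting application pods land on control plane nodes and is worth investigating.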
To improve security in an AWS multi-tenant environment, deploy the workload clusters to an AWS account that is different from the one used to deploy the management cluster. To deploy workload clusters across multiple AWS accounts, see Clusters on Different AWS Accounts.
For more in-depth information on the design considerations when deploying multi-tenant environments, see Workload Tenancy.
Workload isolation requirements are unique for each customer. Therefore, reasonably isolating workloads from each other with acceptable risk tolerance requires additional effort in line with the shared responsibility model. This includes limiting the number of containers that need to run with higher privileges to a handful of namespaces and implementing defense-in-depth mechanisms such as AppArmor and SELinux at the runtime, pod, and node level. In Tanzu Kubernetes Grid 1.6 and later, AppArmor is enabled by default in Ubuntu 20.04 images.
These configurations can be centrally enforced on pods through Pod Security Policies with an eye on migration to its replacement: Pod Security Admission Control.
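For clusters whose Kubernetes version supports Pod Security Admission, per-namespace enforcement is configured through namespace labels. The sketch below builds such a Namespace manifest. The label keys and the three level names come from upstream Kubernetes; the namespace name and the chosen defaults are hypothetical, and whether Pod Security Admission is available depends on the Kubernetes version packaged in the TKr in use.

```python
import json

# Levels defined by the upstream Pod Security Standards.
PSA_LEVELS = ("privileged", "baseline", "restricted")


def namespace_with_psa(name, enforce="baseline", warn="restricted"):
    """Return a Namespace manifest labeled for Pod Security Admission.

    enforce: level at which violating pods are rejected.
    warn:    level at which violations produce warnings and audit annotations.
    """
    for level in (enforce, warn):
        if level not in PSA_LEVELS:
            raise ValueError(f"unknown Pod Security level: {level!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                "pod-security.kubernetes.io/enforce": enforce,
                "pod-security.kubernetes.io/warn": warn,
                "pod-security.kubernetes.io/audit": warn,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace name.
    print(json.dumps(namespace_with_psa("team-a"), indent=2))
```

Enforcing a looser level while warning and auditing at a stricter one is one way to observe what a stricter policy would reject before turning it on.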
For advanced use cases and custom policy management, the following resources are a good starting point: OPA, Admission Control, and Pod Security Standards.
One of the fundamental aspects of a microservices architecture is building services that do one thing only. This enables separation of concerns and allows teams to move faster. However, this also increases the need to communicate with several different microservices that are often running in the same cluster in their own pods. Therefore, the following best practices should be considered for securing these communications at runtime:
Least privilege network policies: Antrea is the default CNI plugin that is enabled in Tanzu Kubernetes Grid. To learn more about how to use it to implement network policies that can be applied depending on the risk posture, refer to the official docs for Antrea. To use a different CNI plugin of choice, follow this guide: Pod and Container Networking.
Mutual TLS by default: Implementing this is a responsibility of the customers of Tanzu Kubernetes Grid. This can be implemented as part of application manifest or by using a service mesh that enables a sidecar container to handle TLS communication for the app container.
Protect Secrets: There are several different options to choose from when managing secrets in a Kubernetes cluster. For a quick run-down of options, see Secrets Management.
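To make the least-privilege network policy recommendation above concrete, the following sketch generates two standard Kubernetes NetworkPolicy manifests: a default-deny policy for a namespace and a narrow allow rule for one client. It uses only the upstream `networking.k8s.io/v1` API, which CNI plugins with network policy support enforce; the namespace, labels, policy names, and port are hypothetical. Antrea also offers its own policy resources, which are not shown here.

```python
import json


def default_deny(namespace):
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector selects all pods in the namespace.
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_ingress(namespace, app, client_app, port):
    """Allow TCP ingress to pods labeled app=<app> only from app=<client_app>."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-{client_app}-to-{app}", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": {"app": client_app}}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace, labels, and port.
    for manifest in (default_deny("shop"), allow_ingress("shop", "orders", "cart", 8443)):
        print(json.dumps(manifest, indent=2))
```

Starting from default deny and adding explicit allow rules means a newly deployed service is unreachable until someone decides who may talk to it.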
To ensure observability and non-repudiation of actions on cluster resources, including application pods, it is important to enable auditing and monitoring of Tanzu Kubernetes Grid provisioned clusters. Tanzu Kubernetes Grid is packaged with a set of extensions that allow administrators to enable this natively. The following guides explain this in depth:
API server and System audit logging: How to enable API server audit logging as well as system-level (node) auditing to prevent repudiation of cluster usage. Tanzu Kubernetes Grid includes a default policy for API server auditing. It is recommended to set an appropriate policy for the node-level audit daemon so that tampering with container runtime binaries and configuration can be detected.
Log Forwarding with Fluent Bit: How to enable centralized log collection that can prevent loss of non-repudiation due to local tampering of logs.
Monitoring with Prometheus and Grafana: How to enable observability of cluster and system metrics for alerting and visualization that can detect sudden spikes in resource consumption due to denial of service attacks.
Depending on the relevant threat outlined, any or all of the above controls can be applied to a Tanzu Kubernetes Grid cluster.
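As one possible shape for the node-level audit policy recommended above, the sketch below generates Linux auditd watch rules that record writes and attribute changes to container runtime files. The `-w`, `-p`, and `-k` options are standard auditd rule syntax; the file paths and the key name are assumptions that must be checked against the actual node image, and this is not the policy shipped with Tanzu Kubernetes Grid.

```python
# Assumption: typical containerd locations; verify on the actual node image.
RUNTIME_PATHS = [
    "/usr/local/bin/containerd",
    "/etc/containerd/config.toml",
]


def auditd_watch_rules(paths, key="container_runtime"):
    """Return one auditd watch rule per path.

    -w <path>  watch the file or directory
    -p wa      log writes (w) and attribute changes (a)
    -k <key>   tag matching events so they can be searched by key
    """
    return [f"-w {path} -p wa -k {key}" for path in paths]


if __name__ == "__main__":
    # The resulting lines could be placed in a file under /etc/audit/rules.d/.
    print("\n".join(auditd_watch_rules(RUNTIME_PATHS)))
```

Forwarding the resulting audit events off the node, for example through the log forwarding described above, keeps them available even if the node itself is compromised.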
Cloud providers act as an underlay resource for all the Tanzu Kubernetes Grid provisioned Kubernetes Clusters, regardless of whether it is an on-premises (e.g. vSphere) or a public cloud (e.g. AWS, Azure, or Google Cloud) deployment. Securing the underlying infrastructure is generally a shared responsibility between customers of Tanzu Kubernetes Grid and the cloud providers. These are some recommendations to improve security of the cloud underlying the Tanzu Kubernetes Grid provisioned clusters:
Rotate or update your cloud credentials regularly using this guide (vSphere only): Tanzu Cluster Secrets (If automating rotation, please consider testing it in non-production environments to observe and plan for any disruption it may cause).
Apply least-privilege permissions for cloud credentials as described in the documentation for AWS, Azure, and vSphere providers. Whenever possible, run management and workload clusters in separate VPCs and firewall zones. This is the default setting for Tanzu Kubernetes Grid provisioned clusters.
SSH node access, especially to control plane nodes, should be restricted to a small set of users who play the role of infrastructure admin.
SSH access should be rarely used, mainly as a break glass procedure such as loss of management cluster credentials.
Validate that cluster resources are not accessible to unauthenticated users on the internet. Customers with low risk tolerance should use an appropriate load balancer configuration to deploy clusters without exposing the API server port to the internet.
Isolate Tanzu Kubernetes Grid environment (management and workload clusters) in dedicated VPCs, or behind a firewall from other non-Tanzu cloud workloads, to limit lateral movement and to reduce the attack surface area in case of a compromised cluster.
Apply, test, and validate disaster recovery scenarios for redundancy and multi-region availability across clusters.
Implement a plan to recover from loss of data caused by data corruption, ransomware attacks, or natural catastrophes that result in physical hardware damage.
Consider using native backup and restore of cluster resources with Velero to help in disaster recovery planning and data loss recovery scenarios.
These recommendations are in addition to the general guidance on security for any cloud provider. For general guidance on cloud security, please refer to the relevant cloud provider’s official security documentation.
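For the Velero recommendation above, a recurring backup can be declared with a Velero Schedule resource. The following sketch builds one; it assumes Velero is already installed in its default `velero` namespace, and the schedule name, namespaces, cron expression, and retention period are hypothetical values to adapt to the recovery objectives of the organization.

```python
import json


def velero_schedule(name, namespaces, cron="0 2 * * *", ttl="720h0m0s"):
    """Return a Velero Schedule manifest that backs up the given namespaces.

    cron: standard five-field cron expression (default: daily at 02:00).
    ttl:  how long each backup is retained (default: 30 days).
    """
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        # Assumption: Velero runs in its default "velero" namespace.
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,
            "template": {
                "includedNamespaces": list(namespaces),
                "ttl": ttl,
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical schedule name and namespaces.
    print(json.dumps(velero_schedule("daily-apps", ["shop", "billing"]), indent=2))
```

A schedule only helps if restores are exercised, so it pairs naturally with the recommendation to test and validate disaster recovery scenarios.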
In conclusion, this document provides a broad picture of the current state of the art and the recommended security controls that can be applied to Tanzu Kubernetes Grid. We are committed to shipping a more intrinsically secure Tanzu Kubernetes Grid with every release, while maintaining a frictionless developer experience.
If you have feedback on the document or have any feature requests related to security, please contact your VMware representative.
These are some upstream (CNCF/Kubernetes) community-driven security-centric resources:
Kubernetes SIG Security 2020 Annual Report: Update in progress for 2021.
Cloud native security whitepaper (2020): Update in progress for 2021.
Cloud Native Security for your Clusters: Kubernetes-specific take on the Cloud native security whitepaper.
4 Cs model for Kubernetes Security: This page borrows its high level structure from here.
The following documents, listed in no particular order of preference, are published by government and standards bodies:
NSA/CISA Kubernetes Hardening Guide: Published in Aug 2022, this is a prescriptive document that covers many areas related to Kubernetes security.
NIST Application Container Security Guide: Published in 2017.
NIST Kubernetes STIG Checklist: Published in April 2021, provides a prescriptive list of technical requirements for securing a basic Kubernetes platform.
CIS Kubernetes Benchmark: Widely used as a secure configuration guide, last updated in June 2021.
Container Platform Security Requirements Guide: U.S. Department of Defense published this guide to secure a basic Kubernetes platform in Dec 2020.
AWS Well-Architected - Security Pillar: High-level document that describes designing cloud architectures with security in mind for AWS.
Azure Well-Architected - Security Pillar: High-level document that describes designing cloud architectures with security in mind for Azure.
Google Architecture Framework - Security, privacy, and compliance: High-level document that describes designing cloud architectures with security in mind for Google Cloud.
Hardening and Compliance for vSphere: High-level overview of the security and compliance areas that require attention when planning a security and compliance strategy.