NSA/CISA hardening for TKG clusters

To comply with the NSA/CISA Kubernetes Hardening Guidance, you harden Tanzu Kubernetes Grid (TKG) clusters by tightening the cluster network policies and the pod security controls. This TKG compliance document uses Antrea CNI for network hardening and Open Policy Agent (OPA) for pod security policies.

We use Antrea CNI because it provides fine-grained control of network policies using tiers and the ability to apply a cluster-wide security policy using ClusterNetworkPolicy. For information about tiers, see Tier and Antrea ClusterNetworkPolicy in Antrea product documentation.

For pod security, we use Open Policy Agent (OPA) instead of pod security policies because pod security policies were deprecated in Kubernetes 1.21. For information about Open Policy Agent, see Overview & Architecture in the Open Policy Agent product documentation.

The ytt overlays, network policies, and OPA policies are designed so that cluster administrators can easily opt out of hardening controls for specific workloads. We recommend that you do not opt out of the hardening practices completely. Instead, isolate those workloads in namespaces where the hardening controls are not applied. Whether to opt out of a hardening control also depends on the risk appetite of the TKG deployment.

Antrea object specifications

To harden plan-based TKG clusters for NSA/CISA, VMware provides the following Antrea object specifications:

  • NSA/CISA hardening: Antrea ClusterNetworkPolicies
  • NSA/CISA hardening: Antrea network policy

You can get the default policies and Antrea object specifications needed for NSA/CISA compliance from the dod-compliance-and-automation GitHub repository. This information applies to TKG v2.1 and v2.2.

Antrea ClusterNetworkPolicies

The following Antrea ClusterNetworkPolicy specification sets a default policy that denies all ingress and egress traffic for all Pods, which ensures that any Pods not selected by another policy are isolated.

apiVersion: security.antrea.tanzu.vmware.com/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: default-deny
spec:
  priority: 150
  tier: baseline
  appliedTo:
    - namespaceSelector: {}
  ingress:
    - action: Drop              # For all Pods in every namespace, drop and log all ingress traffic from anywhere
      name: drop-all-ingress
      enableLogging: true
  egress:
    - action: Drop              # For all Pods in every namespace, drop and log all egress traffic towards anywhere
      name: drop-all-egress
      enableLogging: true
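
Because this default-deny policy sits in the baseline tier, which Antrea evaluates after the other tiers and after standard Kubernetes NetworkPolicies, allow rules in a higher-precedence tier take effect before it. The following is an illustrative sketch that is not taken from the repository: it allows all Pods to send DNS queries and allows the cluster DNS Pods to receive them. The policy names, the priority values, and the `k8s-app: kube-dns` label are assumptions; check the labels on the DNS Pods in your cluster before using anything like it.

```yaml
# Sketch only: names, priorities, and the kube-dns label are assumptions.
apiVersion: security.antrea.tanzu.vmware.com/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: allow-dns-egress            # hypothetical name
spec:
  priority: 10                      # assumed value
  tier: securityops                 # evaluated before the baseline tier
  appliedTo:
    - namespaceSelector: {}         # all Pods in every namespace
  egress:
    - action: Allow
      name: allow-dns-queries
      to:
        - podSelector:
            matchLabels:
              k8s-app: kube-dns     # assumed label on the cluster DNS Pods
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
---
apiVersion: security.antrea.tanzu.vmware.com/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: allow-dns-ingress           # hypothetical name
spec:
  priority: 11                      # assumed value
  tier: securityops
  appliedTo:
    - podSelector:
        matchLabels:
          k8s-app: kube-dns         # assumed label on the cluster DNS Pods
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
  ingress:
    - action: Allow
      name: allow-dns-from-pods
      from:
        - namespaceSelector: {}     # any Pod in any namespace
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
```

Both directions are covered because the default-deny policy drops ingress as well as egress, so an egress rule on the client Pods alone would not be enough.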

Antrea network policy

The following Antrea network policy allows tanzu-capabilities-manager egress to kube-apiserver ports 443 and 6443.

apiVersion: security.antrea.tanzu.vmware.com/v1alpha1
kind: NetworkPolicy
metadata:
  name: tanzu-cm-apiserver
  namespace: tkg-system
spec:
  priority: 5
  tier: securityops
  appliedTo:
    - podSelector:
        matchLabels:
          app: tanzu-capabilities-manager
  egress:
    - action: Allow
      to:
      - podSelector:
          matchLabels:
            component: kube-apiserver
        namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
      ports:
      - port: 443
        protocol: TCP
      - port: 6443
        protocol: TCP
      name: AllowToKubeAPI
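
Egress rules can also target IP ranges instead of Pods. The hardening results in this document list an "IaaS metadata endpoint blocked" control that a ClusterNetworkPolicy can implement. A minimal sketch of such a policy follows. The policy name, the priority, and the address `169.254.169.254/32` (the link-local metadata address used by several public clouds) are assumptions; confirm the endpoint for your infrastructure provider.

```yaml
# Sketch only: name, priority, and CIDR are assumptions.
apiVersion: security.antrea.tanzu.vmware.com/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: block-iaas-metadata         # hypothetical name
spec:
  priority: 1                       # assumed value
  tier: securityops
  appliedTo:
    - namespaceSelector: {}         # all Pods in every namespace
  egress:
    - action: Drop
      name: drop-metadata-endpoint
      enableLogging: true
      to:
        - ipBlock:
            cidr: 169.254.169.254/32   # assumed metadata endpoint address
```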

Pod security

To harden plan-based clusters for NSA/CISA, this guide uses OPA Gatekeeper.

The VMware dod-compliance-and-automation GitHub repository provides the necessary OPA Gatekeeper policies for NSA/CISA compliance. This information applies to TKG v2.1 and v2.2.

OPA constraints and restrictions

The following example uses OPA Gatekeeper to restrict the allowed image repositories.

  • OPA template:

    apiVersion: templates.gatekeeper.sh/v1beta1
    kind: ConstraintTemplate
    metadata:
      name: k8sallowedrepos
      annotations:
        description: Requires container images to begin with a repo string from a specified list.
    spec:
      crd:
        spec:
          names:
            kind: K8sAllowedRepos
          validation:
            # Schema for the `parameters` field
            openAPIV3Schema:
              type: object
              properties:
                repos:
                  type: array
                  items:
                    type: string
      targets:
        - target: admission.k8s.gatekeeper.sh
          rego: |
            package k8sallowedrepos

            # Reject any container whose image does not start with an allowed repo
            violation[{"msg": msg}] {
              container := input.review.object.spec.containers[_]
              satisfied := [good | repo = input.parameters.repos[_]; good = startswith(container.image, repo)]
              not any(satisfied)
              msg := sprintf("container <%v> has an invalid image repo <%v>, allowed repos are %v", [container.name, container.image, input.parameters.repos])
            }

            # Apply the same check to init containers
            violation[{"msg": msg}] {
              container := input.review.object.spec.initContainers[_]
              satisfied := [good | repo = input.parameters.repos[_]; good = startswith(container.image, repo)]
              not any(satisfied)
              msg := sprintf("container <%v> has an invalid image repo <%v>, allowed repos are %v", [container.name, container.image, input.parameters.repos])
            }
    
  • OPA constraints:

    apiVersion: constraints.gatekeeper.sh/v1beta1
    kind: K8sAllowedRepos
    metadata:
      name: repo-is-openpolicyagent
    spec:
      match:
        kinds:
          - apiGroups: [""]
            kinds: ["Pod"]
      parameters:
        repos:
          - "<ALLOWED_IMAGE_REPO>"
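
A constraint can be scoped so that specific namespaces are exempt, which is one way to isolate workloads that cannot meet a hardening control. As a hedged sketch, the variant below excludes one namespace and reports violations without rejecting requests. The `<UNHARDENED_NAMESPACE>` placeholder is hypothetical, and the choice of `dryrun` is an assumption for a trial rollout, not a recommendation from the repository.

```yaml
# Sketch only: the excluded namespace is a hypothetical placeholder.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: repo-is-openpolicyagent
spec:
  enforcementAction: dryrun          # record violations without denying requests
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - "<UNHARDENED_NAMESPACE>"     # hypothetical namespace that opts out of this control
  parameters:
    repos:
      - "<ALLOWED_IMAGE_REPO>"
```

Removing the `enforcementAction` line returns the constraint to its default behavior of denying violating requests.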
    

OPA mutations

The following example uses an OPA Gatekeeper mutation to set allowPrivilegeEscalation to false if it is missing from the Pod spec.

apiVersion: mutations.gatekeeper.sh/v1alpha1
kind: Assign
metadata:
  name: allow-privilege-escalation
spec:
  match:
    scope: Namespaced
    kinds:
      - apiGroups: ["*"]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
  applyTo:
    - groups: [""]
      kinds: ["Pod"]
      versions: ["v1"]
  location: "spec.containers[name:*].securityContext.allowPrivilegeEscalation"
  parameters:
    pathTests:
      - subPath: "spec.containers[name:*].securityContext.allowPrivilegeEscalation"
        condition: MustNotExist
    assign:
      value: false
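
The same Assign pattern can set other defaults. The hardening results in this document mention a mutation that sets a seccomp profile for all Pods. The following is a minimal sketch of what such a mutation might look like, not the policy shipped in the repository. The mutation name and the choice of the `RuntimeDefault` profile are assumptions.

```yaml
# Sketch only: name and profile type are assumptions.
apiVersion: mutations.gatekeeper.sh/v1alpha1
kind: Assign
metadata:
  name: default-seccomp-profile      # hypothetical name
spec:
  match:
    scope: Namespaced
    kinds:
      - apiGroups: ["*"]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
  applyTo:
    - groups: [""]
      kinds: ["Pod"]
      versions: ["v1"]
  location: "spec.securityContext.seccompProfile.type"
  parameters:
    pathTests:
      # Only mutate Pods that do not already declare a Pod-level seccomp profile
      - subPath: "spec.securityContext.seccompProfile"
        condition: MustNotExist
    assign:
      value: RuntimeDefault          # assumed default profile
```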

TKG NSA/CISA hardening results

The NSA/CISA hardening process changes the security scan results for TKG v2.2 cluster nodes. The following screen captures show the scan results for out-of-box TKG cluster nodes and the scan results after hardening.

Scan results, hardened TKG v2.2 cluster nodes:

[Screen capture: Hardened NSA/CISA]

NSA/CISA Kubernetes Hardening Guidance

| Title | Compliant by default? | Can be resolved? | Explanation/Exception |
|---|---|---|---|
| Allow privilege escalation | No | Yes | Resolved with OPA Gatekeeper policy and mutations. |
| Non-root containers | No | Yes | Resolved with OPA Gatekeeper policy and mutations. Exception: Some pods such as contour/envoy need root to function. Tanzu System Ingress needs to interact with the network. |
| Automatic mapping of service account | No | Yes | Resolved with OPA Gatekeeper mutation. Exception: Gatekeeper needs access to the API server, so its service accounts are automounted. |
| Applications credentials in configuration files | No | No | Exception: All of the detected credentials in configuration files were false positives because they were public keys. |
| Linux hardening | No | Yes | Resolved with OPA Gatekeeper constraint and mutation to drop all capabilities. Exception: Some pods such as contour/envoy need advanced privileges to function. Tanzu System Ingress needs to interact with the network. |
| Seccomp Enabled | No | Yes | Resolved with OPA Gatekeeper mutation to set a seccomp profile for all pods. |
| Host PID/IPC privileges | No | Yes | A Gatekeeper constraint has been added to prohibit all pods from running with PID/IPC. |
| Dangerous capabilities | No | Yes | A Gatekeeper constraint has been added to prohibit dangerous capabilities, and a mutation has been added to set a default. Exception: Some pods such as contour/envoy need advanced privileges to function. Tanzu System Ingress needs to interact with the network. |
| Exec into container | No | No | Kubernetes ships with accounts that have exec access to pods. Administrators likely need this access, so a customer-facing solution is advised, such as removing exec in RBAC for normal end users. |
| Allowed hostPath | No | Yes | A Gatekeeper constraint has been added to prevent the host path from being mounted. |
| hostNetwork access | No | Yes | A Gatekeeper constraint has been added to prevent the host network from being used. Exception: The kapp controller needs access to the host for Tanzu to function and is the only pod outside the control plane allowed host network access. |
| Exposed dashboard | Yes | | |
| Cluster-admin binding | No | No | A cluster-admin binding is needed for k8s to start and should be the only one in the cluster. |
| Resource policies | No | Yes | Fixed by setting a default for all pods through a Gatekeeper mutation. |
| Control plane hardening | Yes | | |
| Insecure capabilities | No | Yes | A Gatekeeper constraint has been added to prohibit dangerous capabilities, and a mutation has been added to set a default. Exception: Some pods such as contour/envoy need advanced privileges to function. Tanzu System Ingress needs to interact with the network. |
| Immutable container filesystem | No | Yes | A Gatekeeper constraint has been added to prevent readOnlyRootFilesystems from being deactivated. Exception: Pods created by contour/envoy, Fluentd, the kapp controller, telemetry agents, and all other data services that need to run on k8s. Caution: This mutation can cause issues within the cluster and may not be the wisest to implement. |
| Privileged container | No | Yes | By default, all pods have privileged set to false, but a constraint has been added to enforce that a user does not enable it. |
| Ingress and Egress blocked | No | Yes | A default-deny cluster network policy can be implemented in Antrea. |
| Container hostPort | No | Yes | A Gatekeeper constraint has been added to ensure users do not use hostPorts. Exception: The kapp controller needs access to the host for Tanzu to function and is the only pod outside the control plane allowed host network access. |
| Network policies | No | Yes | A suite of network policies can be installed to ensure all namespaces have a network policy. |
| Fluent Bit Forwarding to SIEM | No | Yes | Fluent Bit needs to be installed and pointed at a valid output location. |
| Fluent Bit Retry Enabled | No | Yes | Fluent Bit needs to be installed by the user with retries enabled in the configuration. |
| IaaS metadata endpoint blocked | No | Yes | A cluster network policy can be implemented to restrict all pods from reaching the endpoint. |
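
For the "Exec into container" item, the table suggests removing exec in RBAC for normal end users. One way to approach this, shown here only as a sketch, is to bind end users to a Role that grants read access to Pods and their logs but does not list the `pods/exec` subresource. The Role name and the `<WORKLOAD_NAMESPACE>` placeholder are hypothetical.

```yaml
# Sketch only: name and namespace are hypothetical placeholders.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-viewer-no-exec           # hypothetical name
  namespace: "<WORKLOAD_NAMESPACE>"
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  # pods/exec is intentionally not listed, so this Role does not grant exec access
```

RBAC permissions are additive, so this only helps if no other Role or ClusterRole bound to the same users grants `pods/exec`.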