Pod and Container Networking

This topic describes how to customize pod and container networking for workload clusters, including using a container network interface (CNI) other than the default Antrea, and supporting publicly-routable, no-NAT IP addresses for workload clusters on vSphere with VMware NSX networking.

For how to configure group Managed Service Accounts (gMSAs) for pods and containers on Windows, see Configure GMSA for Windows Pods and containers in the Kubernetes documentation.

Create a Cluster with a Non-Default CNI

When you use the Tanzu CLI to deploy a workload cluster, the Antrea container network interface (CNI) is automatically enabled in the cluster. Alternatively, you can enable the Calico CNI or your own CNI provider.

Because auto-managed packages are managed by Tanzu Kubernetes Grid, you typically do not need to update their configurations. However, you may want to create a workload cluster that uses a custom CNI, such as Calico. The following sections provide the steps for doing so.

Custom CNI for Standalone Management Cluster-Deployed Clusters

Workload clusters deployed by a standalone management cluster with a version of Tanzu Kubernetes Grid earlier than 1.2.x and then upgraded to v1.3 continue to use Calico as the CNI provider. You cannot change the CNI provider for these clusters.

You can change the default CNI for a workload cluster that you are deploying from a standalone management cluster by specifying the CNI variable in the configuration file. The CNI variable supports the following options:

  • (Default) antrea: Enables Antrea.
  • calico: Enables Calico. See Calico CNI. This option is not supported on Windows.
  • none: Allows you to enable a custom CNI provider. See Custom CNI.

If you do not set the CNI variable, Antrea is enabled by default.

Calico CNI

To enable Calico in a workload cluster, specify the following in the configuration file:

CNI: calico
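
After you set the variable, create the cluster by passing the configuration file to the tanzu cluster create command. The cluster and file names in this sketch are illustrative:

tanzu cluster create my-calico-cluster -f my-calico-cluster-config.yaml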

After the cluster creation process completes, you can examine the cluster as described in Connect to and Examine Workload Clusters.

Custom CNI

To enable a custom CNI provider other than Calico in a workload cluster, follow the steps below:

  1. Specify CNI: none in the configuration file when you create the cluster. For example:

    CNI: none
    

    The cluster creation process does not succeed until you apply a CNI to the cluster. You can monitor the cluster creation process in the Cluster API logs on the management cluster. For instructions on how to access the Cluster API logs, see Logs and Monitoring; a command sketch also follows this procedure.

  2. After the cluster has been initialized, apply your CNI provider to the cluster:

    1. Get the admin credentials of the cluster. For example:

      tanzu cluster kubeconfig get my-cluster --admin
      
    2. Set the context of kubectl to the cluster. For example:

      kubectl config use-context my-cluster-admin@my-cluster
      
    3. Apply the CNI provider to the cluster:

      kubectl apply -f PATH-TO-YOUR-CNI-CONFIGURATION/example.yaml
      
  3. Monitor the status of the cluster by using the tanzu cluster list command. When the cluster creation completes, the cluster status changes from creating to running. For more information about how to examine your cluster, see Connect to and Examine Workload Clusters.
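
To monitor cluster creation as mentioned in step 1, you can follow the Cluster API controller logs on the management cluster. The following is a sketch that assumes the default Cluster API names used on Tanzu Kubernetes Grid management clusters (namespace capi-system, deployment capi-controller-manager, container manager):

# Run against the management cluster context; names are the Cluster API defaults.
kubectl logs -n capi-system deployment/capi-controller-manager -c manager --follow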

Calico CNI for Supervisor or Single-Node Class-Based Workload Clusters

To install Calico instead of Antrea on a class-based cluster that is deployed by a Supervisor, or deployed as a single-node workload cluster by a standalone management cluster, you first need to customize the cluster’s ClusterBootstrap object as follows:

  1. Create a YAML file that contains the following Kubernetes objects:

    apiVersion: cni.tanzu.vmware.com/v1alpha1
    kind: CalicoConfig
    metadata:
      name: CLUSTER-NAME
      namespace: CLUSTER-NAMESPACE
    spec:
      calico:
        config:
          vethMTU: 0
    ---
    apiVersion: run.tanzu.vmware.com/v1alpha3
    kind: ClusterBootstrap
    metadata:
      annotations:
        tkg.tanzu.vmware.com/add-missing-fields-from-tkr: TKR-VERSION
      name: CLUSTER-NAME
      namespace: CLUSTER-NAMESPACE
    spec:
      additionalPackages: # Customize additional packages
      - refName: metrics-server*
      - refName: secretgen-controller*
      - refName: pinniped*
      cni:
        refName: calico*
        valuesFrom:
          providerRef:
            apiGroup: cni.tanzu.vmware.com
            kind: CalicoConfig
            name: CLUSTER-NAME
    

    Where:

    • CLUSTER-NAME is the name of the workload cluster that you intend to create.
    • CLUSTER-NAMESPACE is the namespace of the workload cluster.
    • TKR-VERSION is the version of the Tanzu Kubernetes release (TKr) that you intend to use for the workload cluster. For example, v1.23.5+vmware.1-tkg.1.

  2. For single-node clusters, delete the spec.additionalPackages block from the ClusterBootstrap definition. Single-node clusters do not have the additional metrics-server, secretgen-controller, and pinniped packages.

  3. Apply the file by running the kubectl apply -f command against the management cluster, whether it is a Supervisor or a standalone management cluster. A command sketch follows this procedure.

  4. Create a YAML file for the Cluster object that contains the following configuration:

    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      name: CLUSTER-NAME
      namespace: CLUSTER-NAMESPACE
    spec:
      clusterNetwork:
        services:
          cidrBlocks: ["SERVICES-CIDR"]
        pods:
          cidrBlocks: ["PODS-CIDR"]
        serviceDomain: "SERVICE-DOMAIN"
      topology:
        class: tanzukubernetescluster
        version: TKR-VERSION
        controlPlane:
          replicas: 1
        workers:
          machineDeployments:
            - class: node-pool
              name: NODE-POOL-NAME
              replicas: 1
        variables:
          - name: vmClass
            value: VM-CLASS
          # Default storageClass for control plane and node pool
          - name: storageClass
            value: STORAGE-CLASS-NAME
    

    Where:

    • CLUSTER-NAME is the name of the workload cluster that you intend to create.
    • CLUSTER-NAMESPACE is the namespace of the workload cluster.
    • SERVICES-CIDR is the CIDR block for services. For example, 198.51.100.0/12.
    • PODS-CIDR is the CIDR block for pods. For example, 192.0.2.0/16.
    • SERVICE-DOMAIN is the service domain name. For example, cluster.local.
    • TKR-VERSION is the version of the TKr that you intend to use for the workload cluster. For example, v1.23.5+vmware.1-tkg.1.
    • NODE-POOL-NAME is the name of the node pool for machineDeployments.
    • VM-CLASS is the name of the VM class that you want to use for your cluster. For example, best-effort-small.
    • STORAGE-CLASS-NAME is the name of the storage class that you want to use for your cluster. For example, wcpglobal-storage-profile.

    For example:

    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      name: my-workload-cluster
      namespace: my-workload-cluster-namespace
    spec:
      clusterNetwork:
        services:
          cidrBlocks: ["198.51.100.0/12"]
        pods:
          cidrBlocks: ["192.0.2.0/16"]
        serviceDomain: "cluster.local"
      topology:
        class: tanzukubernetescluster
        version: v1.23.5+vmware.1-tkg.1
        controlPlane:
          replicas: 1
        workers:
          machineDeployments:
            - class: node-pool
              name: my-node-pool
              replicas: 1
        variables:
          - name: vmClass
            value: best-effort-small
          # Default storageClass for control plane and node pool
          - name: storageClass
            value: wcpglobal-storage-profile
    
  5. Create the workload cluster by passing the Cluster object definition file that you created in the step above to the -f option of the tanzu cluster create command.
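
For reference, the commands for steps 3 and 5 might look like the following; the kubectl context name and file names are illustrative:

# Step 3: apply the CalicoConfig and ClusterBootstrap definitions to the management cluster.
kubectl config use-context my-mgmt-cluster-admin@my-mgmt-cluster
kubectl apply -f calico-clusterbootstrap.yaml

# Step 5: create the workload cluster from the Cluster object definition file.
tanzu cluster create -f my-workload-cluster.yaml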

Calico CNI for Supervisor-Deployed TKC-Based Clusters

To install Calico instead of Antrea on a Supervisor-deployed workload cluster of type TanzuKubernetesCluster, set the CNI configuration variable in the cluster configuration file that you plan to use to create your workload cluster, and then pass the file to the -f option of the tanzu cluster create command. For example, CNI: calico.
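
For example, a fragment of a cluster configuration file might look like the following; the name and namespace values are illustrative, and other variables that your environment requires are omitted:

CLUSTER_NAME: my-tkc-cluster
NAMESPACE: my-tkc-namespace
CNI: calico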

Enable Multiple CNI Providers

To enable multiple CNI providers on a workload cluster, such as macvlan, ipvlan, SR-IOV, or DPDK, install the Multus package on a cluster that is already running the Antrea or Calico CNI, and create additional NetworkAttachmentDefinition resources for the CNIs. You can then create new pods in the cluster that use different network interfaces for different address ranges.

For directions, see Deploy Multus on Workload Clusters.
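
As an illustration of the pattern, a NetworkAttachmentDefinition for an additional macvlan interface might look like the following; the object name, parent interface, and subnet are placeholders to adapt to your environment:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: macvlan-conf        # Hypothetical name, referenced from pod annotations
  namespace: default
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "eth0",
    "mode": "bridge",
    "ipam": {
      "type": "host-local",
      "subnet": "192.168.10.0/24"
    }
  }'

A pod can then request the additional interface by adding the annotation k8s.v1.cni.cncf.io/networks: macvlan-conf to its metadata.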

Deploy Pods with Routable, No-NAT IP Addresses (NSX)

On vSphere with NSX networking and the Antrea container network interface (CNI), you can configure a workload cluster with routable IP addresses for its worker pods, bypassing network address translation (NAT) for external requests from and to the pods.

Routable IP addresses on pods let you:

  • Trace outgoing requests to common shared services, because their source IP address is the routable pod IP address, not a NAT address.
  • Support authenticated incoming requests from the external internet directly to pods, bypassing NAT.

Configure NSX for Routable-IP Pods

To configure NSX to support routable IP addresses for worker pods:

  1. Browse to your NSX server and open the Networking tab.

  2. Under Connectivity > Tier-1 Gateways, click Add Tier-1 Gateway and configure a new Tier-1 gateway dedicated to routable-IP pods:

    • Name: Make up a name for your routable pods T1 gateway.
    • Linked Tier-0 Gateway: Select the Tier-0 gateway that your other Tier-1 gateways for Tanzu Kubernetes Grid use.
    • Edge Cluster: Select an existing edge cluster.
    • Route Advertisement: Enable All Static Routes, All NAT IP’s, and All Connected Segments & Service Ports.

    Click Save to save the gateway.

  3. Under Connectivity > Segments, click Add Segment and configure a new NSX segment, a logical switch, for the workload cluster nodes containing the routable pods:

    • Name: Make up a name for the network segment for the workload cluster nodes.
    • Connectivity: Select the Tier-1 gateway that you just created.
    • Transport Zone: Select an overlay transport zone, such as tz-overlay.
    • Subnets: Choose an IP address range for cluster nodes, such as 195.115.4.1/24. This range should not overlap with DHCP profile Server IP Address values.
    • Route Advertisement: Enable All Static Routes, All NAT IP’s, and All Connected Segments & Service Ports.

    Click Save to save the segment.

Supervisor-Deployed TKC Pods with Routable IP Addresses

For how to deploy a TKC cluster with worker pods that have routable, no-NAT IP addresses, see v1beta1 Example: Cluster with Routable Pods Network.

Standalone Management Cluster-Deployed Pods with Routable IP Addresses

To use a standalone management cluster to deploy a workload cluster with worker pods that have routable, no-NAT IP addresses, do the following. The cluster’s CLUSTER_CIDR setting configures the range of its publicly-routable IP addresses.

  1. Create a workload cluster configuration file as described in Create a Workload Cluster Configuration File and as follows; a sketch of the resulting settings appears after this procedure:

    • To set the block of routable IP addresses assigned to worker pods, you can either:
      • Set CLUSTER_CIDR in the workload cluster configuration file, or
      • Prepend your tanzu cluster create command with a CLUSTER_CIDR= setting, as shown in the following step.
    • Set NSXT_POD_ROUTING_ENABLED to "true".
    • Set NSXT_MANAGER_HOST to your NSX manager IP address.
    • Set NSXT_ROUTER_PATH to the inventory path of the newly-added Tier-1 gateway for routable IPs. Obtain this from NSX manager > Connectivity > Tier-1 Gateways by clicking the menu icon (vertical ellipsis) to the left of the gateway name and clicking Copy Path to Clipboard. The path starts with /infra/tier-1s/.
    • Set other NSXT_ string variables for accessing NSX by following the NSX Pod Routing table in the Configuration File Variable Reference. Pods can authenticate with NSX in one of four ways, with the least secure listed last:
      • Certificate: Set NSXT_CLIENT_CERT_KEY_DATA and NSXT_CLIENT_CERT_DATA, and for a CA-issued certificate, NSXT_ROOT_CA_DATA_B64.
      • VMware Identity Manager token on VMware Cloud (VMC): Set NSXT_VMC_AUTH_HOST and NSXT_VMC_ACCESS_TOKEN.
      • Username/password stored in a Kubernetes secret: Set NSXT_SECRET_NAMESPACE, NSXT_SECRET_NAME, NSXT_USERNAME, and NSXT_PASSWORD.
      • Username/password as plaintext in configuration file: Set NSXT_USERNAME and NSXT_PASSWORD.
  2. Run tanzu cluster create as described in Create Workload Clusters. For example:

    $ CLUSTER_CIDR=100.96.0.0/11 tanzu cluster create my-routable-work-cluster -f my-routable-work-cluster-config.yaml
    Validating configuration...
    Creating workload cluster 'my-routable-work-cluster'...
    Waiting for cluster to be initialized...
    Waiting for cluster nodes to be available...
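
For reference, the settings from step 1 might look like the following in the cluster configuration file, assuming username/password authentication; all values are illustrative:

CLUSTER_CIDR: 100.96.0.0/11
NSXT_POD_ROUTING_ENABLED: "true"
NSXT_MANAGER_HOST: 192.168.1.10
NSXT_ROUTER_PATH: /infra/tier-1s/my-routable-pods-t1
NSXT_USERNAME: nsx-admin
NSXT_PASSWORD: "example-password"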
    

Validate Routable IPs

To test routable IP addresses for your workload pods:

  1. Deploy a webserver to the routable workload cluster.

  2. Run kubectl get pods -o wide to retrieve the IP addresses of your routable pods, and verify that they are within the routable CLUSTER_CIDR range.

  3. Run kubectl get nodes -o wide to retrieve the NAME, INTERNAL-IP, and EXTERNAL-IP values for the workload cluster nodes, which host the routable-IP pods.

  4. Log in to a different workload cluster’s control plane node:

    1. Run kubectl config use-context CLUSTER-CONTEXT to change context to the different cluster.
    2. Run kubectl get nodes to retrieve the IP address of the current cluster’s control plane node.
    3. Run ssh capv@CONTROLPLANE-IP using the IP address you just retrieved.
    4. ping and send curl requests to the routable IP address where you deployed the webserver, and confirm its responses.
      • ping output should list the webserver’s routable pod IP as the source address.
  5. From a browser, log in to NSX and navigate to the Tier-1 gateway that you created for routable-IP pods.

  6. Click Static Routes and confirm that the following routes were created within the routable CLUSTER_CIDR range:

    1. A route for pods in the workload cluster’s control plane node, with Next Hops shown as the address of the control plane node itself.
    2. A route for pods in the workload cluster’s worker nodes, with Next Hops shown as the addresses of the worker nodes themselves.

Delete Routable IPs

After you delete a workload cluster that contains routable-IP pods, you may need to free the routable IP addresses by deleting them from the T1 router:

  1. In NSX manager, go to Connectivity > Tier-1 Gateways and select your routable-IP gateway.

  2. Under Static Routes click the number of routes to open the list.

  3. Search for routes that include the deleted cluster name, and delete each one from the menu icon (vertical ellipsis) to the left of the route name.

    1. If a permissions error prevents you from deleting the route from the menu, which may happen if the route was created with a certificate, delete the route via the API:
      1. From the menu next to the route name, select Copy Path to Clipboard.
      2. Run curl -i -k -u 'NSXT_USERNAME:NSXT_PASSWORD' -H 'Content-Type: application/json' -H 'X-Allow-Overwrite: true' -X DELETE https://NSXT_MANAGER_HOST/policy/api/v1/STATIC-ROUTE-PATH where:
        • NSXT_MANAGER_HOST, NSXT_USERNAME, and NSXT_PASSWORD are your NSX manager IP address and credentials.
        • STATIC-ROUTE-PATH is the path that you just copied to the clipboard. The path starts with /infra/tier-1s/ and includes /static-routes/.

Set Network Policies for the CNIs

To restrict a workload cluster from accessing the VMware vCenter Server management interface, set appropriate network policies on the Antrea and the Calico CNIs. When you configure these policies, only the traffic originating from the container network is filtered. The policies block the traffic originating from all pods except those from the container storage interface (CSI) and Cloud Provider Interface (CPI) pods.

Set Cluster Network Policies for Antrea

Set the cluster network policies for Antrea through the antrea-policy-csi-cpi.yaml file in the workload cluster. To do this:

  1. In the Tanzu CLI, switch to the workload cluster context:

    kubectl config use-context WORKLOAD-CLUSTER-CONTEXT
    
  2. Create the antrea-policy-csi-cpi.yaml file, as shown in the following example:

    apiVersion: crd.antrea.tanzu.vmware.com/v1alpha1
    kind: TierEntitlement
    metadata:
      name: edit-system-tiers
    spec:
      permission: edit
      tiers:
      - emergency
      - securityops
      - networkops
      - platform
    # application and baseline Tiers are not restricted
    ---
    apiVersion: crd.antrea.tanzu.vmware.com/v1alpha1
    kind: TierEntitlementBinding
    metadata:
      name: admin-edit-system-tiers
    spec:
      # Allow only admin to attach Antrea ClusterNetworkPolicy and NetworkPolicy to system Tiers
      subjects:
      - kind: User
        name: admin
      tierEntitlement: edit-system-tiers
    ---
    apiVersion: crd.antrea.io/v1alpha3
    kind: ClusterGroup
    metadata:
      name: vc-ip
    spec:
      ipBlocks:
      - cidr: VC_IP_CIDR # Enter the IP CIDR of vCenter Server, for example 192.168.1.1/32.
    ---
    apiVersion: crd.antrea.io/v1alpha3
    kind: ClusterGroup
    metadata:
      name: csi-cpi-pods
    spec:
      namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchExpressions:
        - key: k8s-app
          operator: In
          values: [vsphere-cloud-controller-manager]
    ---
    apiVersion: crd.antrea.io/v1alpha1
    kind: ClusterNetworkPolicy
    metadata:
      name: allow-csi-cpi-egress-vc
    spec:
      priority: 5
      tier: emergency
      appliedTo:
      - group: csi-cpi-pods
      egress:
      - action: Pass
        to:
        - group: vc-ip
    ---
    apiVersion: crd.antrea.io/v1alpha1
    kind: ClusterNetworkPolicy
    metadata:
      name: drop-egress-vc
    spec:
      priority: 10
      tier: emergency
      appliedTo:
      - namespaceSelector: {}  # Selects all Namespaces in the cluster
      egress:
      - action: Drop
        to:
        - group: vc-ip 
    
    Note

    In the cidr: field, enter the IP CIDR of vCenter Server, for example 192.168.1.1/32.

  3. Apply the file:

    kubectl apply -f antrea-policy-csi-cpi.yaml
    

Set Network Policies for Calico

Set the cluster network policies for Calico through the gnp.yaml file in the workload cluster. To do this:

  1. Download the calicoctl utility binary for your operating system from the Calico releases page on GitHub.

  2. Install the utility on your system. For example, to download and install the utility on a Linux system:

    wget https://github.com/projectcalico/calico/releases/download/CALICO-VERSION/calicoctl-linux-amd64
    mv calicoctl-linux-amd64 calicoctl
    chmod +x calicoctl
    
  3. In the Tanzu CLI, switch to the workload cluster context:

    kubectl config use-context WORKLOAD-CLUSTER-CONTEXT
    
  4. Create the gnp.yaml file, as shown in the following example:

    apiVersion: projectcalico.org/v3
    kind: GlobalNetworkPolicy
    metadata:
      name: vcenter-egress-deny-all
    spec:
      order: 1000
      types:
        - Egress
      egress:
        - action: Allow
          destination:
            notNets:
            - VC_IP_CIDR # Enter the IP CIDR of vCenter Server, for example 192.168.1.1/32.
    ---
    apiVersion: projectcalico.org/v3
    kind: GlobalNetworkPolicy
    metadata:
      name: vcenter-egress-allow-csi-cpi
    spec:
      order: 0
      types:
        - Egress
      egress:
        - action: Allow
          source:
            selector: app == 'vsphere-csi-node' || app == 'vsphere-csi-controller' || k8s-app == 'vsphere-cloud-controller-manager'
          destination:
            nets:
            - VC_IP_CIDR # Enter the IP CIDR of vCenter Server, for example 192.168.1.1/32.
    
    Note

    Under the notNets: and the nets: fields, enter the IP CIDR of vCenter Server, for example 192.168.1.1/32.

  5. Apply the file:

    ./calicoctl apply -f gnp.yaml
    
    

For more information about the selector options in Calico, see EntityRule in the Calico documentation.

Pod Security Admission Controller (Technical Preview)

For namespaces within clusters running Kubernetes v1.23 and above, TKG supports applying pod security policies of type privileged, baseline, or restricted via the Pod Security Admission (PSA) controller, as described in Pod Security Standards in the Kubernetes documentation.
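
For example, to apply the restricted policy in enforce mode to a namespace, you can label the namespace directly; the namespace name here is illustrative:

kubectl label --overwrite ns my-namespace pod-security.kubernetes.io/enforce=restricted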

Note

This feature is in the unsupported Technical Preview state; see TKG Feature States.

Pod Security Policies (PSPs) for nodes are deprecated in TKG v2.1, to reflect their deprecation in Kubernetes; for how to migrate pods from PSPs to the PSA controller, see Migrate from PodSecurityPolicy to the Built-In PodSecurity Admission Controller.

By default, Kubernetes v1.24 cluster namespaces have their pod security warn and audit modes set to baseline, which is a no-enforce setting. This means that migrating to the PSA controller may generate warnings about pods violating policy, but the pods will continue to run.

It is a known issue that some TKG packages and components do not comply with the baseline mode; for these, the fastest workaround is to set the labels audit=privileged and warn=privileged in the affected namespaces to suppress violation audit messages and warnings:

kubectl label --overwrite ns NAMESPACE pod-security.kubernetes.io/audit=privileged
kubectl label --overwrite ns NAMESPACE pod-security.kubernetes.io/warn=privileged

Where NAMESPACE is the namespace for any installed package or other components listed below:

Namespace                  Package or component
avi-system                 AKO (load-balancer-and-ingress-service)
cert-manager               Cert Manager
pinniped-concierge         Pinniped
sonobuoy                   Sonobuoy
tanzu-auth                 Auth service (for standalone management cluster)
tanzu-system-ingress       Contour, Envoy
tanzu-system-logging       Fluent Bit
tanzu-system-monitoring    Prometheus
velero                     Restic, Velero
vmware-system-auth         Auth service (for Supervisor)

These namespaces contain the package components, as distinct from the user-chosen or default namespaces where the packages themselves are installed.
