This topic describes how to install and configure VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) on vSphere with NSX integration as a VMware Tanzu Operations Manager (Ops Manager) tile.
Before you begin this procedure, ensure that you have successfully completed all preceding steps for installing Tanzu Kubernetes Grid Integrated Edition on vSphere with NSX, including:
To install and configure TKGI:
To install Tanzu Kubernetes Grid Integrated Edition, do the following:
Navigate to `https://YOUR-OPS-MANAGER-FQDN/` in a browser to log in to the Ops Manager Installation Dashboard.
To configure TKGI:
Click the orange Tanzu Kubernetes Grid Integrated Edition tile to start the configuration process.
Note: Configuration of NSX-T or Flannel cannot be changed after initial installation and configuration of Tanzu Kubernetes Grid Integrated Edition.
WARNING: When you configure the Tanzu Kubernetes Grid Integrated Edition tile, do not use spaces in any field entries. This includes spaces between characters as well as leading and trailing spaces. If you use a space in any field entry, the deployment of Tanzu Kubernetes Grid Integrated Edition fails.
To configure the availability zones (AZs) and networks used by the Tanzu Kubernetes Grid Integrated Edition control plane:
Click Assign AZs and Networks.
Under Place singleton jobs in, select the availability zone (AZ) where you want to deploy the TKGI API and TKGI Database VMs.
Under Balance other jobs in, select the AZ for balancing other Tanzu Kubernetes Grid Integrated Edition control plane jobs.
Note: You must specify the Balance other jobs in AZ, but the selection has no effect in the current version of Tanzu Kubernetes Grid Integrated Edition.
Under Network, select the `ls-tkgi-mgmt` NSX-T logical switch you created in the Create Networks Page step of Configuring BOSH Director with NSX-T for Tanzu Kubernetes Grid Integrated Edition. This network provides placement for Tanzu Kubernetes Grid Integrated Edition component VMs, such as the TKGI API and TKGI Database VMs.
Under Service Network, select the `ls-tkgi-service` NSX-T logical switch that Tanzu Kubernetes Grid Integrated Edition created for you during installation. The service network provides network placement for existing on-demand Kubernetes cluster service instances that were created by the Tanzu Kubernetes Grid Integrated Edition broker.
Perform the following steps:
Click TKGI API.
Under Certificate to secure the TKGI API, provide a certificate and private key pair.
The certificate that you supply must cover the specific subdomain that routes to the TKGI API VM with TLS termination on the ingress. If you use UAA as your OIDC provider, this certificate must be a proper certificate chain and have a SAN field.
Warning: TLS certificates generated for wildcard DNS records only work for a single domain level. For example, a certificate generated for `*.tkgi.EXAMPLE.com` does not permit communication to `*.api.tkgi.EXAMPLE.com`. If the certificate does not contain the correct FQDN for the TKGI API, calls to the API will fail.
Under API Hostname (FQDN), enter the FQDN that you registered to point to the TKGI API load balancer, for example `api.tkgi.example.com`. To retrieve the public IP address or FQDN of the TKGI API load balancer, log in to your IaaS console.
Note: The FQDN for the TKGI API must not contain uppercase letters or trailing whitespace.
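Before pasting a certificate into the tile, you can confirm that its SAN field covers the TKGI API FQDN. A sketch using `openssl` (the file paths and the `api.tkgi.example.com` FQDN are placeholders; the first command only generates a throwaway self-signed certificate so the example is self-contained):

```shell
# Generate a throwaway certificate with a SAN, purely so this example runs end to end.
# In practice, point the inspection command at the certificate you plan to use.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/tkgi-api-key.pem -out /tmp/tkgi-api-cert.pem \
  -subj "/CN=api.tkgi.example.com" \
  -addext "subjectAltName=DNS:api.tkgi.example.com"

# Confirm the Subject Alternative Name lists the exact API FQDN:
openssl x509 -in /tmp/tkgi-api-cert.pem -noout -ext subjectAltName
```

If the SAN output does not list the exact FQDN that routes to the TKGI API, replace the certificate before deploying.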
Enter the `max_in_flight` variable value. The `max_in_flight` setting limits the number of component instances the TKGI CLI creates or starts simultaneously when running `tkgi update-cluster`. By default, `max_in_flight` is set to `4`, limiting the TKGI CLI to creating or starting a maximum of four component instances in parallel.
A plan defines a set of resource types used for deploying a cluster.
You must first activate and configure Plan 1,
and afterwards you can activate up to twelve additional, optional plans.
To activate and configure a plan, perform the following steps:
Note: Plans 11, 12, and 13 support Windows worker-based Kubernetes clusters on vSphere with NSX-T, and are a beta feature on vSphere with Flannel. To configure a Windows worker plan, see Plans in Configuring Windows Worker-Based Kubernetes Clusters.
Note: If you deploy a cluster with multiple control plane/etcd node VMs, confirm that you have sufficient hardware to handle the increased load on disk write and network traffic. For more information, see Hardware recommendations in the etcd documentation.
In addition to meeting the hardware requirements for a multi-control plane node cluster, we recommend configuring monitoring for etcd to monitor disk latency, network latency, and other indicators for the health of the cluster. For more information, see Configuring Telegraf in TKGI.
WARNING: To change the number of control plane/etcd nodes for a plan, you must ensure that no existing clusters use the plan. Tanzu Kubernetes Grid Integrated Edition does not support changing the number of control plane/etcd nodes for plans with existing clusters.
Under Master/ETCD VM Type, select the type of VM to use for Kubernetes control plane/etcd nodes. For more information, including control plane node VM customization options, see the Control Plane Node VM Size section of VM Sizing for Tanzu Kubernetes Grid Integrated Edition Clusters.
Under Master Persistent Disk Type, select the size of the persistent disk for the Kubernetes control plane node VM.
Under Master/ETCD Availability Zones, select one or more AZs for the Kubernetes clusters deployed by Tanzu Kubernetes Grid Integrated Edition. If you select more than one AZ, Tanzu Kubernetes Grid Integrated Edition deploys the control plane VM in the first AZ and the worker VMs across the remaining AZs. If you are using multiple control plane nodes, Tanzu Kubernetes Grid Integrated Edition deploys the control plane and worker VMs across the AZs in round-robin fashion.
Note: Tanzu Kubernetes Grid Integrated Edition does not support changing the AZs of existing control plane nodes.
Note: Changing a plan’s Worker Node Instances setting does not alter the number of worker nodes on existing clusters. For information about scaling an existing cluster, see Scale Horizontally by Changing the Number of Worker Nodes Using the TKGI CLI in Scaling Existing Clusters.
Note: Tanzu Kubernetes Grid Integrated Edition requires a Worker VM Type with an ephemeral disk size of 32 GB or more.
Under Worker Persistent Disk Type, select the size of the persistent disk for the Kubernetes worker node VMs.
Under Worker Availability Zones, select one or more AZs for the Kubernetes worker nodes. Tanzu Kubernetes Grid Integrated Edition deploys worker nodes equally across the AZs you select.
Under Kubelet customization - system-reserved, enter resource values that Kubelet can use to reserve resources for system daemons. For example, `memory=250Mi, cpu=150m`. For more information about system-reserved values, see the Kubernetes documentation.
Under Kubelet customization - eviction-hard, enter threshold limits that Kubelet can use to evict pods when they exceed the limit. Enter limits in the format `EVICTION-SIGNAL=QUANTITY`. For example, `memory.available=100Mi, nodefs.available=10%, nodefs.inodesFree=5%`. For more information about eviction thresholds, see the Kubernetes documentation.
WARNING: Use the Kubelet customization fields with caution. If you enter values that are invalid or that exceed the limits the system supports, Kubelet might fail to start. If Kubelet fails to start, you cannot create clusters.
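As an illustration of what the two Kubelet customization fields above express, here is the equivalent fragment of the upstream `KubeletConfiguration` API (illustrative only; TKGI manages the actual kubelet configuration for you):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Resources reserved for system daemons (system-reserved field):
systemReserved:
  cpu: "150m"
  memory: "250Mi"
# Hard eviction thresholds (eviction-hard field):
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
```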
Use `---` as a separator. For more information, see Adding Custom Linux Workloads.
Note: The SecurityContextDeny admission controller has been deprecated, and the Kubernetes community recommends the controller not be used. TKGI support for SecurityContextDeny will be removed in TKGI v1.18. Pod security admission (PSA) is the preferred method for providing a more secure Kubernetes environment. For more information about PSA, see Pod Security Admission in TKGI.
(Optional) Under Node Drain Timeout (mins), enter a timeout in minutes for the node to drain pods. If you set this value to `0`, the node drain does not terminate.
(Optional) Under Pod Shutdown Grace Period (seconds), enter a timeout in seconds for the node to wait before it forces the pod to terminate. If you set this value to `-1`, the default timeout is set to the one specified by the pod.
(Optional) To configure when the node drains, activate the following:
Warning: If you select Force node to drain even if pods are still running after timeout, the node halts all running workloads on pods. Before enabling this configuration, set Node Drain Timeout to a value greater than `0`.
For more information about configuring default node drain behavior, see Worker Node Hangs Indefinitely in Troubleshooting.
To deactivate a plan, perform the following steps:
In the procedure below, you enter credentials for the vCenter service account used by the Kubernetes control plane (master) node VMs. You must have provisioned the service account with the correct permissions. For more information, see Create the Master Node Service Account in Preparing vSphere Before Deploying Tanzu Kubernetes Grid Integrated Edition.
To configure your Kubernetes cloud provider settings, follow the procedure below:
Under vCenter Master Credentials, enter the username and password of the master node service account. Active Directory users must use the format `user@domainname`, for example: `firstname.lastname@example.org`. For more information about the master node service account, see Preparing vSphere Before Deploying Tanzu Kubernetes Grid Integrated Edition.
Warning: The vSphere Container Storage Plug-in will not function if you do not specify the domain name for Active Directory users.
Note: The FQDN for the vCenter Server cannot contain uppercase letters.
Under Datastore Name, enter the name of the Persistent Datastore configured in your BOSH Director tile under vCenter Config > Persistent Datastore Names, for example `example-ds`.
Note: The vSphere datastore type must be Datastore. Tanzu Kubernetes Grid Integrated Edition does not support the use of vSphere Datastore Clusters with or without Storage DRS. For more information, see Datastores and Datastore Clusters in the vSphere documentation.
Note: The Datastore Name is the default datastore used if the Kubernetes cluster `StorageClass` does not define a `StoragePolicy`. Do not enter a datastore that is a list of BOSH Job/VMDK datastores. For more information, see PersistentVolume Storage Options on vSphere.
Note: For multi-AZ and multi-cluster environments, your Datastore Name must be a shared Persistent datastore available to each vSphere cluster. Do not enter a datastore that is local to a single cluster. For more information, see PersistentVolume Storage Options on vSphere.
To configure networking, do the following:
Note: The NSX Manager CA Cert field and the Disable SSL certificate verification option are intended to be mutually exclusive. If you deactivate SSL certificate verification, leave the CA certificate field blank. If you enter a certificate in the NSX Manager CA Cert field, do not deactivate SSL certificate verification. If you populate the certificate field and deactivate certificate validation, insecure mode takes precedence.
Configure the NSX-T networking objects, including the Pods IP Block ID, Nodes IP Block ID, T0 Router ID, Floating IP Pool ID, Nodes DNS, vSphere Cluster Names, and Kubernetes Service Network CIDR Range. Each of these fields is described in more detail beneath the example screenshots. If you are using the NSX-T Policy API, you must have created the Pods IP Block, Nodes IP Block, T0 Router, and Floating IP Pool objects using the NSX-T Policy API. See Create NSX-T Objects for Kubernetes Clusters Using the Policy Interface.
Under T0 Router ID, enter the `t0-tkgi` T0 router UUID. Locate this value in the NSX-T UI router overview.
Under Floating IP Pool ID, enter the `ip-pool-vips` ID that you created for load balancer VIPs. For more information, see Plan Network CIDRs in Network Planning for Installing Tanzu Kubernetes Grid Integrated Edition with NSX-T. Tanzu Kubernetes Grid Integrated Edition uses the floating IP pool to allocate IP addresses to the load balancers created for each of the clusters. The load balancer routes the API requests to the control plane nodes and the data plane.
Under Kubernetes Service Network CIDR Range, enter a CIDR range for Kubernetes services, for example `10.100.200.0/24`. The IP addresses used here are internal to the cluster. A `/24` subnet provides 256 IPs. If you have a cluster that requires more than 256 IPs, define a larger subnet.
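The subnet sizing described above is simple powers-of-two arithmetic; for example:

```shell
# A /24 subnet provides 2^(32-24) addresses:
echo $((1 << (32 - 24)))    # prints 256
# A /22 subnet would provide four times as many:
echo $((1 << (32 - 22)))    # prints 1024
```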
Under TKGI Operation Timeout, enter the timeout for TKGI API operations in milliseconds. Increase the timeout if you experience timeouts during cluster deletion in large-scale NSX environments. The default TKGI Operation Timeout value is `120000` milliseconds (120 seconds). To determine the optimal Operation Timeout setting, see Cluster Deletion Fails in General Troubleshooting.
Note: If you use the TKGI MC, the `nsx_feign_client_read_timeout` setting in the TKGI MC configuration YAML overrides the TKGI tile TKGI Operation Timeout setting. For more information about configuring the Operation Timeout setting in TKGI MC, see Generate Configuration File and Deploy Tanzu Kubernetes Grid Integrated Edition in Deploy Tanzu Kubernetes Grid Integrated Edition by Using the Configuration Wizard.
(Optional) Configure a global proxy for all outgoing HTTP and HTTPS traffic from your Kubernetes clusters and the TKGI API server. See Using Proxies with Tanzu Kubernetes Grid Integrated Edition on NSX-T for instructions on how to enable a proxy.
To configure the UAA server:
Under TKGI API Access Token Lifetime, enter a time in seconds for the TKGI API access token lifetime. This field defaults to `600` (10 minutes).
Under TKGI API Refresh Token Lifetime, enter a time in seconds for the TKGI API refresh token lifetime. This field defaults to `21600` (6 hours).
Note: VMware recommends using the default UAA token timeout values. By default, access tokens expire after ten minutes and refresh tokens expire after six hours.
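For reference, the default lifetimes described in the note convert to seconds as follows:

```shell
# Default UAA token lifetimes, expressed in seconds:
echo $((10 * 60))      # access token: prints 600
echo $((6 * 60 * 60))  # refresh token: prints 21600
```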
(Optional) Under UAA OIDC Groups Prefix, enter a prefix for your group names. For example, if you enter `oidc:`, UAA creates a group name like `oidc:developers`.
Under UAA OIDC Username Claim, enter your username claim. The default value is `user_name`. Depending on your provider, you can enter claims besides `user_name`, such as `email` or `name`.
(Optional) Under UAA OIDC Username Prefix, enter a prefix for your usernames. For example, if you enter `oidc:`, UAA creates a user name like `oidc:admin`.
Warning: VMware recommends adding OIDC prefixes to prevent users and groups from gaining unintended cluster privileges. If you change the above values for a pre-existing Tanzu Kubernetes Grid Integrated Edition installation, you must change any existing role bindings that bind to a user name or group. If you do not change your role bindings, developers cannot access Kubernetes clusters. For instructions, see Managing Cluster Access and Permissions.
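For example, with a username prefix of `oidc:`, an existing role binding must reference the prefixed name. A minimal sketch using standard Kubernetes RBAC (the binding name and user name are hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-binding        # hypothetical binding name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: User
  apiGroup: rbac.authorization.k8s.io
  name: "oidc:admin"         # the prefixed username, not plain "admin"
```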
(Optional) Under TKGI cluster client redirect URIs, enter one or more UAA `redirect_uri` URIs for your clusters. UAA redirect URIs configured in the TKGI cluster client redirect URIs field persist through cluster updates and TKGI upgrades.
In Host Monitoring, you can configure monitoring of nodes and VMs using Syslog, VMware vRealize Log Insight (vRLI) Integration, or Telegraf.
You can configure one or more of the following:
For more information about these components, see Monitoring TKGI and TKGI-Provisioned Clusters.
To configure Syslog for all BOSH-deployed VMs in Tanzu Kubernetes Grid Integrated Edition:
Note: Logs might contain sensitive information, such as cloud provider credentials. VMware recommends that you enable TLS encryption for log forwarding.
Note: You do not need to provide a new certificate if the TLS certificate for the destination syslog endpoint is signed by a Certificate Authority (CA) in your BOSH certificate store.
Note: Before you configure the vRLI integration, you must have a vRLI license and vRLI must be installed, running, and available in your environment. You need to provide the live instance address during configuration. For instructions and additional information, see the vRealize Log Insight documentation.
By default, vRLI logging is deactivated. To configure vRLI logging:
Note: Deactivating certificate validation is not recommended for production environments.
Under Rate limiting, enter a time in milliseconds. A value of `0` means that the rate is not limited, which suffices for many deployments.
Note: If your deployment is generating a high volume of logs, you can increase this value to limit network traffic. Consider starting with a lower value, such as `10`, then tuning to optimize for your deployment. A large number might result in dropping too many log entries.
Note: The Tanzu Kubernetes Grid Integrated Edition tile does not validate your vRLI configuration settings. To verify your setup, look for log entries in vRLI.
In In-Cluster Monitoring, you can configure one or more observability components and integrations that run in Kubernetes clusters and capture logs and metrics about your workloads. For more information, see Monitoring Workers and Workloads.
To configure in-cluster monitoring:
To configure sink resources, see:
You can enable both log and metric sink resources or only one of them.
You can monitor Kubernetes clusters and pods metrics externally using the integration with Wavefront by VMware.
Note: Before you configure Wavefront integration, you must have an active Wavefront account and access to a Wavefront instance. You provide your Wavefront access token during configuration. For additional information, see the Wavefront documentation.
To use Wavefront with Windows worker-based clusters, developers must manually install Wavefront in their clusters using Helm.
To enable and configure Wavefront monitoring:
The Tanzu Kubernetes Grid Integrated Edition tile does not validate your Wavefront configuration settings. To verify your setup, look for cluster and pod metrics in Wavefront.
You can monitor Tanzu Kubernetes Grid Integrated Edition Kubernetes clusters with VMware vRealize Operations Management Pack for Container Monitoring.
To integrate Tanzu Kubernetes Grid Integrated Edition with VMware vRealize Operations Management Pack for Container Monitoring, you must deploy a container running cAdvisor in your TKGI deployment.
cAdvisor is an open source tool that provides monitoring and statistics for Kubernetes clusters.
To deploy a cAdvisor container:
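As an orientation for what such a deployment involves, here is a minimal cAdvisor `DaemonSet` sketch (the image tag and volume mounts are assumptions; consult the Management Pack documentation for the supported manifest):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: cadvisor
  template:
    metadata:
      labels:
        app: cadvisor
    spec:
      containers:
      - name: cadvisor
        image: gcr.io/cadvisor/cadvisor:v0.47.2   # assumed tag; pin per your registry
        volumeMounts:
        - name: rootfs
          mountPath: /rootfs
          readOnly: true
      volumes:
      - name: rootfs
        hostPath:
          path: /
```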
For more information about integrating this type of monitoring with TKGI, see the VMware vRealize Operations Management Pack for Container Monitoring User Guide and Release Notes in the VMware documentation.
You can configure TKGI-provisioned clusters to send Kubernetes node metrics and pod metrics to metric sinks. For more information about metric sink resources and what to do after you enable them in the tile, see Sink Resources in Monitoring Workers and Workloads.
To enable clusters to send Kubernetes node metrics and pod metrics to metric sinks:
Select Enable Metric Sink Resources. If you enable this check box, Tanzu Kubernetes Grid Integrated Edition deploys Telegraf as a `DaemonSet`, a pod that runs on each worker node in all your Kubernetes clusters.
(Optional) To enable Node Exporter to send worker node metrics to metric sinks of kind `ClusterMetricSink`, select Enable node exporter on workers. If you enable this check box, Tanzu Kubernetes Grid Integrated Edition deploys Node Exporter as a `DaemonSet`, a pod that runs on each worker node in all your Kubernetes clusters.
For instructions on how to create a metric sink of kind `ClusterMetricSink` for Node Exporter metrics, see Create a ClusterMetricSink Resource for Node Exporter Metrics in Creating and Managing Sink Resources.
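For orientation, a `ClusterMetricSink` that scrapes Node Exporter might look like the following sketch (the `pksapi.io/v1beta1` API group is the TKGI sink API; the output plugin and its settings here are placeholders, so follow the linked instructions for a supported configuration):

```yaml
apiVersion: pksapi.io/v1beta1
kind: ClusterMetricSink
metadata:
  name: node-exporter-sink
spec:
  inputs:
  - type: prometheus
    urls:
    - "http://localhost:9100/metrics"   # Node Exporter's default scrape endpoint
  outputs:
  - type: file       # placeholder output; replace with your real destination
    files:
    - stdout
```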
You can configure TKGI-provisioned clusters to send Kubernetes API events and pod logs to log sinks. For more information about log sink resources and what to do after you enable them in the tile, see Sink Resources in Monitoring Workers and Workloads.
To enable clusters to send Kubernetes API events and pod logs to log sinks:
Select Enable Log Sink Resources. If you enable this check box, Tanzu Kubernetes Grid Integrated Edition deploys Fluent Bit as a `DaemonSet`, a pod that runs on each worker node in all your Kubernetes clusters.
(Optional) To increase the Fluent Bit pod memory limit, enter a value greater than 100 in the Fluent-bit container memory limit (Mi) field.
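As a sketch of the kind of log sink resource these settings enable, a syslog `ClusterLogSink` might look like the following (the `pksapi.io/v1beta1` API group is the TKGI sink API; the host and port are placeholders):

```yaml
apiVersion: pksapi.io/v1beta1
kind: ClusterLogSink
metadata:
  name: example-syslog-sink
spec:
  type: syslog
  host: logs.example.com   # placeholder syslog destination
  port: 514
  enable_tls: true         # recommended, since logs can contain sensitive data
```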
Tanzu Mission Control integration lets you monitor and manage Tanzu Kubernetes Grid Integrated Edition clusters from the Tanzu Mission Control console, which makes the Tanzu Mission Control console a single point of control for all Kubernetes clusters. For more information about Tanzu Mission Control, see the VMware Tanzu Mission Control home page.
To integrate Tanzu Kubernetes Grid Integrated Edition with Tanzu Mission Control:
Confirm that the TKGI API VM has internet access and can connect to `cna.tmc.cloud.vmware.com` and the other outbound URLs listed in the What Happens When You Attach a Cluster section of the Tanzu Mission Control Product documentation.
Navigate to the Tanzu Kubernetes Grid Integrated Edition tile > the Tanzu Mission Control pane and select Yes under Tanzu Mission Control Integration.
Configure the fields below:
Tanzu Mission Control URL: Enter the URL of your Tanzu Mission Control instance, without a trailing slash (`/`).
Tanzu Mission Control Cluster Group: Enter the name of a Tanzu Mission Control cluster group.
The name can be `default` or another value, depending on your role and access policy:
Org Member users in VMware cloud services have a `service.admin` role in Tanzu Mission Control. These users can create clusters only in cluster groups for which an `organization.admin` user grants them the `clustergroup.edit` role.
Org Owner users in VMware cloud services have `organization.admin` permissions in Tanzu Mission Control. These users can create clusters in any cluster group, and can grant cluster group access to `service.admin` users through the Tanzu Mission Control Access Policy view.
For more information about role and access policy, see Access Control in the VMware Tanzu Mission Control Product documentation.
Warning: After the Tanzu Kubernetes Grid Integrated Edition tile is deployed with a configured cluster group, the cluster group cannot be updated.
Note: When you upgrade your Kubernetes clusters and have Tanzu Mission Control integration enabled, existing clusters will be attached to Tanzu Mission Control.
Tanzu Kubernetes Grid Integrated Edition-provisioned clusters send usage data to the TKGI control plane for storage. The VMware Customer Experience Improvement Program (CEIP) provides the option to also send the cluster usage data to VMware to improve customer experience.
To configure Tanzu Kubernetes Grid Integrated Edition CEIP Program settings:
In Storage Configurations, you can configure vSphere CNS settings.
To configure vSphere CNS:
(Optional) To enable automatic installation of the vSphere CSI driver on all clusters, select Yes.
Warning: If you have existing clusters with a manually deployed vSphere CSI driver, you must remove the manually deployed driver after enabling this feature. For more information, see Deploying Cloud Native Storage (CNS) on vSphere.
Errands are scripts that run at designated points during an installation.
To configure which post-deploy and pre-delete errands run for Tanzu Kubernetes Grid Integrated Edition:
Note: VMware recommends that you use the default settings for all errands except the NSX-T validation and Run smoke tests errands.
(Optional) Set the NSX-T validation errand to On.
This errand verifies the NSX-T objects.
(Optional) Set the Run smoke tests errand to On.
The Run smoke tests errand verifies a TKGI installation or upgrade by creating and deleting a test Kubernetes cluster. If test cluster creation or deletion fails, the errand fails, and the installation of the TKGI tile halts.
The errand uses the TKGI CLI to create the test cluster, configured using either the default configuration settings on the TKGI tile or a network profile.
(Optional) To configure the Smoke Test errand to use a network profile instead of the default configuration settings on the TKGI tile:
(Optional) To ensure that all of your cluster VMs are patched, set the Upgrade all clusters errand to On.
Warning: If you have TKGI-provisioned Windows worker clusters, do not activate the Upgrade all clusters errand before upgrading to the TKGI v1.17 tile. You cannot use the Upgrade all clusters errand because you must manually migrate each individual Windows worker cluster to the CSI Driver for vSphere. For more information, see Configure vSphere CSI for Windows in Deploying and Managing Cloud Native Storage (CNS) on vSphere.
Note: VMware recommends that you review the VMware Tanzu Network metadata and confirm stemcell version compatibility before using the VMware Tanzu Network APIs to update the stemcells in your automated pipeline. For more information, see the API reference.
To modify the resource configuration of Tanzu Kubernetes Grid Integrated Edition, follow the steps below:
Select Resource Config.
For each job, review the Automatic values in the following fields:
Warning: High availability mode is a beta feature. Do not scale your TKGI API or TKGI Database to more than one instance in production environments.
Note: On vSphere with NSX-T, you must manually deploy an NSX-T load balancer so that you can select it as part of the resource configuration. For more information, see Provisioning an NSX-T Load Balancer for the TKGI API Server.
Note: The Automatic VM TYPE values match the recommended resource configuration for the TKGI API and TKGI Database jobs.
Under each job, leave NSX-T CONFIGURATION and NSX-V CONFIGURATION blank.
Warning: To avoid workload downtime, use the resource configuration recommended in About Tanzu Kubernetes Grid Integrated Edition Upgrades and Maintaining Workload Uptime.
After configuring the Tanzu Kubernetes Grid Integrated Edition tile, follow the steps below to deploy the tile:
The TKGI CLI and the Kubernetes CLI help you interact with your Tanzu Kubernetes Grid Integrated Edition-provisioned Kubernetes clusters and Kubernetes workloads. To install the CLIs, follow the instructions below:
If you are using NAT mode, verify that you have created the required NAT rules for the Tanzu Kubernetes Grid Integrated Edition Management Plane. See Create Management Plane in Installing and Configuring NSX-T Data Center v3.0 for TKGI for details.
In addition, for NAT and No-NAT modes, verify that you created the required NAT rule for Kubernetes control plane nodes to access NSX-T Manager. For details, see Create IP Blocks and Pool for Compute Plane in Installing and Configuring NSX-T Data Center v3.0 for TKGI.
If you want your developers to be able to access the TKGI CLI from their external workstations, create a DNAT rule that maps a routable IP address to the TKGI API VM. This must be done after Tanzu Kubernetes Grid Integrated Edition is successfully deployed and it has an IP address. See Create Management Plane in Installing and Configuring NSX-T Data Center v3.0 for TKGI for details.
Follow the procedures in Setting Up Tanzu Kubernetes Grid Integrated Edition Admin Users on vSphere in Installing Tanzu Kubernetes Grid Integrated Edition > vSphere.
After installing Tanzu Kubernetes Grid Integrated Edition on vSphere with NSX-T integration, complete the following tasks:
Integrate VMware Harbor with Tanzu Kubernetes Grid Integrated Edition to store and manage container images. For more information, see Integrating VMware Harbor Registry with Tanzu Kubernetes Grid Integrated Edition.