This topic describes how to install and configure VMware Tanzu Kubernetes Grid Integrated Edition (TKGI) on Azure as a VMware Tanzu Operations Manager (Ops Manager) tile.
Before performing the procedures in this topic, you must have deployed and configured Ops Manager. For more information, see Azure Prerequisites and Resource Requirements.
If you use an instance of Ops Manager that you configured previously to install other runtimes, perform the following steps before you install Tanzu Kubernetes Grid Integrated Edition:
To install and configure TKGI:
To install Tanzu Kubernetes Grid Integrated Edition, do the following:
Navigate to https://YOUR-OPS-MANAGER-FQDN/ in a browser to log in to the Ops Manager Installation Dashboard.
To configure TKGI:
Click the orange Tanzu Kubernetes Grid Integrated Edition tile to start the configuration process.
WARNING: When you configure the Tanzu Kubernetes Grid Integrated Edition tile, do not use spaces in any field entries. This includes spaces between characters as well as leading and trailing spaces. If you use a space in any field entry, the deployment of Tanzu Kubernetes Grid Integrated Edition fails.
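If you prepare tile values ahead of time, for example in a notes file or an automation script, a quick pre-check can catch stray spaces before they cause a failed deployment. The following Python function is a hypothetical helper, not part of TKGI or Ops Manager, and the field labels in the example are illustrative.

```python
def fields_with_spaces(entries: dict[str, str]) -> list[str]:
    """Return the names of fields whose values contain any whitespace.

    This flags leading, trailing, and embedded spaces, as well as tabs
    and newlines picked up when copying values from other documents.
    """
    return [name for name, value in entries.items()
            if any(ch.isspace() for ch in value)]


if __name__ == "__main__":
    # Hypothetical field labels and values, for illustration only.
    entries = {
        "API Hostname": "api.tkgi.example.com ",  # trailing space
        "Resource Group": "tkgi-rg",
        "Default Security Group": "tkgi master sg",  # embedded spaces
    }
    print(fields_with_spaces(entries))  # ['API Hostname', 'Default Security Group']
```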
To configure the networks used by the Tanzu Kubernetes Grid Integrated Edition control plane:
Click Assign Networks.
Under Network, select the infrastructure subnet that you created for Tanzu Kubernetes Grid Integrated Edition component VMs, such as the TKGI API and TKGI Database VMs. For example, infrastructure.
services.
Perform the following steps:
Click TKGI API.
Under Certificate to secure the TKGI API, provide a certificate and private key pair.
The certificate that you supply must cover the specific subdomain that routes to the TKGI API VM with TLS termination on the ingress. If you use UAA as your OIDC provider, this certificate must be a proper certificate chain and have a SAN field.
Warning: TLS certificates generated for wildcard DNS records only work for a single domain level. For example, a certificate generated for *.tkgi.EXAMPLE.com does not permit communication to *.api.tkgi.EXAMPLE.com. If the certificate does not contain the correct FQDN for the TKGI API, calls to the API will fail.
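The single-level rule follows from how TLS clients match wildcard names: a leading * stands for exactly one DNS label. The Python function below is a simplified sketch of that rule, in the spirit of RFC 6125 and handling only a whole-label wildcard in the leftmost position. It is not the matching code used by any particular TLS library.

```python
def wildcard_covers(cert_name: str, hostname: str) -> bool:
    """Return True if a certificate name (possibly a wildcard) covers hostname.

    A leading "*" label matches exactly one DNS label, so it never spans
    multiple levels and never matches the bare parent domain.
    Comparison is case-insensitive, as DNS names are.
    """
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False
    if cert_labels[0] == "*":
        # The first host label may be anything non-empty;
        # every remaining label must match exactly.
        return host_labels[0] != "" and cert_labels[1:] == host_labels[1:]
    return cert_labels == host_labels


if __name__ == "__main__":
    print(wildcard_covers("*.tkgi.EXAMPLE.com", "api.tkgi.example.com"))    # True
    print(wildcard_covers("*.tkgi.EXAMPLE.com", "x.api.tkgi.example.com"))  # False
    print(wildcard_covers("*.tkgi.EXAMPLE.com", "tkgi.example.com"))        # False
```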
For example, api.tkgi.example.com. To retrieve the public IP address or FQDN of the TKGI API load balancer, log in to your IaaS console.
Note: The FQDN for the TKGI API must not contain uppercase letters or trailing whitespace.
max_in_flight variable value. The max_in_flight setting limits the number of component instances the TKGI CLI creates or starts simultaneously when running tkgi create-cluster or tkgi update-cluster. By default, max_in_flight is set to 4, limiting the TKGI CLI to creating or starting a maximum of four component instances in parallel.
tkgi update-cluster retry the cluster update process up to three times if it fails.
--private-registries option of the tkgi create-cluster and tkgi update-cluster commands described in Configuring Cluster Access to Private Registries. By default, the ability to configure clusters to use private registries is enabled.
A plan defines a set of resource types used for deploying a cluster.
You must first activate and configure Plan 1, and afterwards you can optionally activate Plan 2 through Plan 10.
To activate and configure a plan, perform the following steps:
Note: Plans 11, 12, and 13 support Windows worker-based Kubernetes clusters on vSphere with NSX, and are a beta feature on vSphere with Antrea.
1, 3, or 5.
Note: If you deploy a cluster with multiple control plane/etcd node VMs, confirm that you have sufficient hardware to handle the increased load on disk write and network traffic. For more information, see Hardware recommendations in the etcd documentation.
In addition to meeting the hardware requirements for a multi-control plane node cluster, we recommend configuring monitoring for etcd to monitor disk latency, network latency, and other indicators for the health of the cluster. For more information, see Configuring Telegraf in TKGI.
WARNING: To change the number of control plane/etcd nodes for a plan, you must ensure that no existing clusters use the plan. Tanzu Kubernetes Grid Integrated Edition does not support changing the number of control plane/etcd nodes for plans with existing clusters.
Under Master/ETCD VM Type, select the type of VM to use for Kubernetes control plane/etcd nodes. For more information, including control plane node VM customization options, see the Control Plane Node VM Size section of VM Sizing for Tanzu Kubernetes Grid Integrated Edition Clusters.
Under Master Persistent Disk Type, select the size of the persistent disk for the Kubernetes control plane node VM.
Under Master/ETCD Availability Zones, select one or more AZs for the Kubernetes clusters deployed by Tanzu Kubernetes Grid Integrated Edition. If you select more than one AZ, Tanzu Kubernetes Grid Integrated Edition deploys the control plane VM in the first AZ and the worker VMs across the remaining AZs. If you are using multiple control plane nodes, Tanzu Kubernetes Grid Integrated Edition deploys the control plane and worker VMs across the AZs in round-robin fashion.
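To picture round-robin placement across AZs, the following Python sketch assigns VMs to AZs by cycling through the selected AZs in order. It is a conceptual illustration only: BOSH performs the actual placement and may differ in detail, and the VM and AZ names are hypothetical.

```python
from itertools import cycle


def place_round_robin(vm_names: list[str], azs: list[str]) -> dict[str, str]:
    """Assign each VM to an AZ by cycling through the AZ list in order."""
    if not azs:
        raise ValueError("at least one AZ is required")
    # zip stops when vm_names is exhausted; cycle repeats the AZs as needed.
    return dict(zip(vm_names, cycle(azs)))


if __name__ == "__main__":
    # Hypothetical control plane VM names and AZ names.
    control_plane = ["master/0", "master/1", "master/2"]
    print(place_round_robin(control_plane, ["az1", "az2"]))
    # {'master/0': 'az1', 'master/1': 'az2', 'master/2': 'az1'}
```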
Note: Tanzu Kubernetes Grid Integrated Edition does not support changing the AZs of existing control plane nodes.
Note: Changing a plan’s Worker Node Instances setting does not alter the number of worker nodes on existing clusters. For information about scaling an existing cluster, see Scale Horizontally by Changing the Number of Worker Nodes Using the TKGI CLI in Scaling Existing Clusters.
Note: Tanzu Kubernetes Grid Integrated Edition requires a Worker VM Type with an ephemeral disk size of 32 GB or more.
Under Worker Persistent Disk Type, select the size of the persistent disk for the Kubernetes worker node VMs.
Under Worker Availability Zones, select one or more AZs for the Kubernetes worker nodes. Tanzu Kubernetes Grid Integrated Edition deploys worker nodes equally across the AZs you select.
Under Kubelet customization - system-reserved, enter resource values that Kubelet can use to reserve resources for system daemons. For example, memory=250Mi, cpu=150m. For more information about system-reserved values, see the Kubernetes documentation.
EVICTION-SIGNAL=QUANTITY. For example, memory.available=100Mi, nodefs.available=10%, nodefs.inodesFree=5%. For more information about eviction thresholds, see the Kubernetes documentation.
WARNING: Use the Kubelet customization fields with caution. If you enter values that are invalid or that exceed the limits the system supports, Kubelet might fail to start. If Kubelet fails to start, you cannot create clusters.
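Because invalid Kubelet values can stop Kubelet from starting, you might want to sanity-check the KEY=QUANTITY format locally before saving the plan. The following Python sketch is an assumption-laden pre-check, not the validation that TKGI or Kubelet performs: it accepts only a subset of Kubernetes quantity suffixes, resource names, and eviction signals, and it does not check values against node capacity. The function and constant names are hypothetical.

```python
import re

# Simplified Kubernetes quantity: a number with an optional binary (Ki, Mi, ...)
# or decimal (k, M, G, ..., m for milli) suffix, or a percentage such as 10%.
QUANTITY = re.compile(r"^[0-9]+(\.[0-9]+)?(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E|m|%)?$")

# Subset of resource names accepted in kubelet system-reserved settings.
SYSTEM_RESERVED_KEYS = {"cpu", "memory", "ephemeral-storage", "pid"}

# Subset of Linux eviction signals documented for kubelet eviction thresholds.
EVICTION_SIGNALS = {
    "memory.available", "nodefs.available", "nodefs.inodesFree",
    "imagefs.available", "imagefs.inodesFree", "pid.available",
}


def parse_pairs(text: str, allowed_keys: set[str]) -> dict[str, str]:
    """Parse 'KEY=QUANTITY, KEY=QUANTITY' into a dict.

    Raises ValueError on a missing '=', an unknown key, or a malformed quantity.
    """
    result: dict[str, str] = {}
    for item in text.split(","):
        key, sep, value = item.strip().partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected KEY=QUANTITY, got {item.strip()!r}")
        if key not in allowed_keys:
            raise ValueError(f"unknown key {key!r}")
        if not QUANTITY.match(value):
            raise ValueError(f"malformed quantity {value!r} for {key!r}")
        result[key] = value
    return result


if __name__ == "__main__":
    print(parse_pairs("memory=250Mi, cpu=150m", SYSTEM_RESERVED_KEYS))
    print(parse_pairs(
        "memory.available=100Mi, nodefs.available=10%, nodefs.inodesFree=5%",
        EVICTION_SIGNALS,
    ))
```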
--- as a separator. For more information, see Adding Custom Linux Workloads.
If you set Node Drain Timeout to 0, the node drain does not terminate.
(Optional) Under Pod Shutdown Grace Period (seconds), enter a timeout in seconds for the node to wait before it forces the pod to terminate. If you set this value to -1, the node uses the timeout specified by the pod.
(Optional) To configure when the node drains, activate the following:
Warning: If you select Force node to drain even if pods are still running after timeout, the node halts all running workloads on pods. Before enabling this configuration, set Node Drain Timeout to a value greater than 0.
For more information about configuring default node drain behavior, see Worker Node Hangs Indefinitely in Troubleshooting.
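The kubectl drain command has two flags with the same semantics as the Node Drain Timeout and Pod Shutdown Grace Period fields: --timeout, where zero means wait indefinitely, and --grace-period, where a negative value means use the grace period defined by the pod. The Python sketch below builds an equivalent command line to make those semantics concrete. It assumes, rather than documents, that TKGI's drain behaves this way, and the helper name is hypothetical. The Force option is deliberately not modeled, because kubectl's --force flag has a different meaning (it allows deleting pods that are not managed by a controller).

```python
def drain_command(node: str, drain_timeout_mins: int,
                  pod_grace_period_secs: int) -> list[str]:
    """Build a kubectl drain command line that mirrors the tile's drain fields.

    drain_timeout_mins: 0 means wait indefinitely (kubectl --timeout=0s).
    pod_grace_period_secs: -1 means use each pod's own grace period
    (kubectl treats any negative --grace-period that way).
    """
    if drain_timeout_mins < 0:
        raise ValueError("drain timeout cannot be negative")
    timeout = f"{drain_timeout_mins}m" if drain_timeout_mins > 0 else "0s"
    return [
        "kubectl", "drain", node,
        # Assumed: without this flag, kubectl drain refuses to proceed
        # when DaemonSet-managed pods are present on the node.
        "--ignore-daemonsets",
        f"--timeout={timeout}",
        f"--grace-period={pod_grace_period_secs}",
    ]


if __name__ == "__main__":
    # Hypothetical node name.
    print(" ".join(drain_command("worker-0", 0, -1)))
```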
Click Save.
To deactivate a plan, perform the following steps:
To configure your Kubernetes cloud provider settings, follow the procedures below:
Click Kubernetes Cloud Provider.
Under Choose your IaaS, select Azure.
Under Azure Cloud Name, select the identifier of your Azure environment.
Enter Subscription ID. This is the ID of the Azure subscription that the cluster is deployed in.
Enter Tenant ID. This is the Azure Active Directory (AAD) tenant ID for the subscription that the cluster is deployed in.
Enter Location. This is the location of the resource group that the cluster is deployed in.
If you do not already know the valid location value for your resource group, determine it:
The valid value is the location's name property, not its display name. For example, for the Central US location, the location name property value is centralus.
To determine the valid location value for your resource group location, list the valid locations:
az account list-locations
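If you want to script this lookup, the following Python sketch runs az account list-locations and maps a display name such as Central US to its name property. It assumes the Azure CLI is installed and that you have already run az login; the helper names are hypothetical.

```python
import json
import subprocess


def location_name(display_name: str, locations: list[dict]) -> str:
    """Return the name (e.g. 'centralus') for a display name (e.g. 'Central US')."""
    for loc in locations:
        if loc.get("displayName") == display_name:
            return loc["name"]
    raise KeyError(f"no Azure location with display name {display_name!r}")


def list_locations() -> list[dict]:
    """Run 'az account list-locations' and parse its JSON output.

    Requires the Azure CLI to be installed and logged in.
    """
    completed = subprocess.run(
        ["az", "account", "list-locations", "--output", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)
```

In a logged-in shell, location_name("Central US", list_locations()) returns centralus. The Azure CLI can also perform the lookup directly with a JMESPath query: az account list-locations --query "[?displayName=='Central US'].name" --output tsv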
Enter Location. Enter the valid location value for your resource group location into the Location field.
Enter Resource Group. This is the name of the resource group that the cluster is deployed in.
Enter Virtual Network. This is the name of the virtual network that the cluster is deployed in.
Enter Virtual Network Resource Group. This is the name of the resource group that the virtual network is deployed in.
Enter Default Security Group. This is the name of the security group attached to the cluster’s subnet.
Note: Tanzu Kubernetes Grid Integrated Edition automatically assigns the default security group to each VM when you create a Kubernetes cluster.
However, on Azure this automatic assignment might not occur. For more information, see Azure Default Security Group Is Not Automatically Assigned to Cluster VMs in Tanzu Kubernetes Grid Integrated Edition Release Notes.
Enter Primary Availability Set. This is the name of the availability set that will be used as the load balancer back end. Locate the name of the availability set within the Azure console.
For Master Managed Identity, enter tkgi-master. You created the managed identity for the control plane nodes in Create the Control Plane Nodes Managed Identity in Creating Managed Identities in Azure for Tanzu Kubernetes Grid Integrated Edition.
For Worker Managed Identity, enter tkgi-worker. You created the managed identity for the worker nodes in Create the Worker Nodes Managed Identity in Creating Managed Identities in Azure for Tanzu Kubernetes Grid Integrated Edition.
Select Disable Outbound SNAT to deactivate the default outbound SNAT rule for Azure.
Click Save.
To configure networking, do the following:
Under Allow outbound internet access from Kubernetes cluster vms (IaaS-dependent), leave the Enable outbound internet access check box unselected. You must leave this check box unselected due to an incompatibility between the public dynamic IPs provided by BOSH and load balancers on Azure.
Click Save.
To configure the UAA server:
Under TKGI API Access Token Lifetime, enter a time in seconds for the TKGI API access token lifetime. This field defaults to 600.
Under TKGI API Refresh Token Lifetime, enter a time in seconds for the TKGI API refresh token lifetime. This field defaults to 21600.
600.
21600.
Note: VMware recommends using the default UAA token timeout values. By default, access tokens expire after ten minutes and refresh tokens expire after six hours.
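The lifetime fields take seconds, so dividing by 60 or 3,600 shows what a value means: the defaults of 600 and 21600 correspond to the ten minutes and six hours stated in the note. The trivial Python helper below, with a hypothetical name, is for illustration only.

```python
def describe_lifetime(seconds: int) -> str:
    """Express a lifetime in the largest whole unit: hours, minutes, or seconds."""
    for unit, size in (("h", 3600), ("m", 60)):
        if seconds >= size and seconds % size == 0:
            return f"{seconds // size}{unit}"
    return f"{seconds}s"


if __name__ == "__main__":
    print(describe_lifetime(600))    # 10m
    print(describe_lifetime(21600))  # 6h
```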
roles.
For example, if you enter the prefix oidc:, UAA creates a group name like oidc:developers. The default value is oidc:.
user_name. Depending on your provider, you can enter claims besides user_name, like email or name.
For example, if you enter the prefix oidc:, UAA creates a user name like oidc:admin. The default value is oidc:.
Warning: VMware recommends adding OIDC prefixes to prevent users and groups from gaining unintended cluster privileges. If you change the above values for a pre-existing Tanzu Kubernetes Grid Integrated Edition installation, you must change any existing role bindings that bind to a user name or group. If you do not change your role bindings, developers cannot access Kubernetes clusters. For instructions, see Managing Cluster Access and Permissions.
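With prefixes of oidc:, a group developers and a user admin reach Kubernetes as oidc:developers and oidc:admin, so role bindings must name the prefixed values. The following Python sketch builds a standard Kubernetes ClusterRoleBinding (rbac.authorization.k8s.io/v1) for such subjects. The binding name, the choice of the built-in view ClusterRole, and the sample group and user are illustrative assumptions, and the helper name is hypothetical.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def cluster_role_binding(name: str, cluster_role: str, groups: list[str],
                         users: list[str], group_prefix: str = "oidc:",
                         user_prefix: str = "oidc:") -> dict:
    """Build a ClusterRoleBinding whose subjects carry the UAA OIDC prefixes."""
    subjects = [{"kind": "Group", "apiGroup": RBAC_GROUP,
                 "name": group_prefix + g} for g in groups]
    subjects += [{"kind": "User", "apiGroup": RBAC_GROUP,
                  "name": user_prefix + u} for u in users]
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {"apiGroup": RBAC_GROUP, "kind": "ClusterRole",
                    "name": cluster_role},
        "subjects": subjects,
    }


if __name__ == "__main__":
    # Hypothetical binding, group, and user names; "view" is a built-in ClusterRole.
    binding = cluster_role_binding("developers-view", "view",
                                   groups=["developers"], users=["admin"])
    print(json.dumps(binding, indent=2))
```

Because kubectl accepts JSON manifests, you could pipe the output of this script to kubectl apply -f - against a cluster you have credentials for.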
cluster_client redirect_uri URIs to your clusters. UAA redirect URIs configured in the TKGI cluster client redirect URIs field persist through cluster updates and TKGI upgrades.
In Host Monitoring, you can configure monitoring of nodes and VMs using Syslog or Telegraf.
You can configure one or more of the following:
For more information about these components, see Monitoring TKGI and TKGI-Provisioned Clusters.
To configure Syslog for all BOSH-deployed VMs in Tanzu Kubernetes Grid Integrated Edition:
Note: Logs might contain sensitive information, such as cloud provider credentials. VMware recommends that you enable TLS encryption for log forwarding.
*.YOUR-LOGGING-SYSTEM.com.
Note: You do not need to provide a new certificate if the TLS certificate for the destination syslog endpoint is signed by a Certificate Authority (CA) in your BOSH certificate store.
In In-Cluster Monitoring, you can configure one or more observability components and integrations that run in Kubernetes clusters and capture logs and metrics about your workloads. For more information, see Monitoring Workers and Workloads.
To configure in-cluster monitoring:
To configure sink resources, see:
You can enable both log and metric sink resources or only one of them.
You can monitor Kubernetes clusters and pods metrics externally using the integration with Wavefront by VMware.
Note: Wavefront integration in TKGI has been deprecated.
Prerequisites
Before you configure Wavefront integration, you must have an active Wavefront account and access to a Wavefront instance. You provide your Wavefront access token during configuration. For additional information, see the Wavefront documentation.
To use Wavefront with Windows worker-based clusters, developers must install Wavefront to their clusters manually, using Helm.
Procedure
To enable and configure Wavefront monitoring:
https://try.wavefront.com/api
The Tanzu Kubernetes Grid Integrated Edition tile does not validate your Wavefront configuration settings. To verify your setup, look for cluster and pod metrics in Wavefront.
cAdvisor is an open source tool for monitoring, analyzing, and exposing Kubernetes container resource usage and performance statistics.
To deploy a cAdvisor container:
Note: For information about configuring cAdvisor to monitor your running Kubernetes containers, see cAdvisor in the cAdvisor GitHub repository. For general information about Kubernetes cluster monitoring, see Tools for Monitoring Resources in the Kubernetes documentation.
You can configure TKGI-provisioned clusters to send Kubernetes node metrics and pod metrics to metric sinks. For more information about metric sink resources and what to do after you enable them in the tile, see Sink Resources in Monitoring Workers and Workloads.
To enable clusters to send Kubernetes node metrics and pod metrics to metric sinks:
DaemonSet, which runs a pod on each worker node in all your Kubernetes clusters.
(Optional) To enable Node Exporter to send worker node metrics to metric sinks of kind ClusterMetricSink, select Enable node exporter on workers. If you enable this check box, Tanzu Kubernetes Grid Integrated Edition deploys Node Exporter as a DaemonSet, which runs a pod on each worker node in all your Kubernetes clusters.
For instructions on how to create a metric sink of kind ClusterMetricSink for Node Exporter metrics, see Create a ClusterMetricSink Resource for Node Exporter Metrics in Creating and Managing Sink Resources.
You can configure TKGI-provisioned clusters to send Kubernetes API events and pod logs to log sinks. For more information about log sink resources and what to do after you enable them in the tile, see Sink Resources in Monitoring Workers and Workloads.
To enable clusters to send Kubernetes API events and pod logs to log sinks:
DaemonSet, which runs a pod on each worker node in all your Kubernetes clusters.
(Optional) To increase the Fluent Bit Pod memory limit, enter a value greater than 100 in the Fluent-bit container memory limit(Mi) field.
Click Save.
Tanzu Mission Control integration lets you monitor and manage Tanzu Kubernetes Grid Integrated Edition clusters from the Tanzu Mission Control console, which makes the Tanzu Mission Control console a single point of control for all Kubernetes clusters. For more information about Tanzu Mission Control, see the VMware Tanzu Mission Control home page.
To integrate Tanzu Kubernetes Grid Integrated Edition with Tanzu Mission Control:
Confirm that the TKGI API VM has internet access and can connect to cna.tmc.cloud.vmware.com and the other outbound URLs listed in the What Happens When You Attach a Cluster section of the Tanzu Mission Control Product documentation.
Navigate to the Tanzu Kubernetes Grid Integrated Edition tile > the Tanzu Mission Control pane and select Yes under Tanzu Mission Control Integration.
Configure the fields below:
/). For example, YOUR-ORG.tmc.cloud.vmware.com.
Tanzu Mission Control Cluster Group: Enter the name of a Tanzu Mission Control cluster group.
The name can be default or another value, depending on your role and access policy:
Org Member users in VMware cloud services have a service.admin role in Tanzu Mission Control. These users:
default cluster group.
organization.admin user grants them the clustergroup.admin or clustergroup.edit role for those groups.
Org Owner users in VMware cloud services have organization.admin permissions in Tanzu Mission Control. These users:
clustergroup roles to service.admin users through the Tanzu Mission Control Access Policy view.
For more information about role and access policy, see Access Control in the VMware Tanzu Mission Control Product documentation.
Warning: After the Tanzu Kubernetes Grid Integrated Edition tile is deployed with a configured cluster group, the cluster group cannot be updated.
Note: When you upgrade your Kubernetes clusters and have Tanzu Mission Control integration enabled, existing clusters will be attached to Tanzu Mission Control.
Tanzu Kubernetes Grid Integrated Edition-provisioned clusters send usage data to the TKGI control plane for storage. The VMware Customer Experience Improvement Program (CEIP) provides the option to also send the cluster usage data to VMware to improve customer experience.
To configure Tanzu Kubernetes Grid Integrated Edition CEIP Program settings:
Errands are scripts that run at designated points during an installation.
To configure which post-deploy and pre-delete errands run for Tanzu Kubernetes Grid Integrated Edition:
Note: We recommend that you use the default settings for all errands except for the Run smoke tests errand.
(Optional) Set the Run smoke tests errand to On.
The Smoke Test errand smoke tests the TKGI upgrade by creating and deleting a test Kubernetes cluster. If test cluster creation or deletion fails, the errand fails, and the installation of the TKGI tile halts.
The errand uses the TKGI CLI to create the test cluster configured using the configuration settings on the TKGI tile.
(Optional) To ensure that all of your cluster VMs are patched, set the Upgrade all clusters errand to On.
Updating the Tanzu Kubernetes Grid Integrated Edition tile with a new Linux stemcell and the Upgrade all clusters errand enabled triggers the rolling of every Linux VM in each Kubernetes cluster. Similarly, updating the Tanzu Kubernetes Grid Integrated Edition tile with a new Windows stemcell triggers the rolling of every Windows VM in your Kubernetes clusters.
Note: VMware recommends that you review the Broadcom Support metadata and confirm stemcell version compatibility before using the Broadcom Support APIs to update the stemcells in your automated pipeline. For more information, see the API reference.
To modify the resource configuration of Tanzu Kubernetes Grid Integrated Edition and specify your TKGI API load balancer, follow the steps below:
Select Resource Config.
For each job, review the Automatic values in the following fields:
3.
2 or more.
Warning: High availability mode is a beta feature. Do not scale your TKGI API or TKGI Database to more than one instance in production environments.
Note: On Azure, you must reconfigure your TKGI API load balancer backend pool whenever you modify your TKGI API VM group. For more information about configuring your TKGI API load balancer backend pool, see Create a Load Balancer in Configuring an Azure Load Balancer for the TKGI API.
Note: The Automatic VM TYPE values match the recommended resource configuration for the TKGI API and TKGI Database jobs.
For the TKGI Database job:
For the TKGI API job:
Enter the name of your TKGI API load balancer in the LOAD BALANCERS field. For more information on the TKGI API load balancer, see Configuring an Azure Load Balancer for the TKGI API.
Note: After you click Apply Changes for the first time, BOSH assigns the TKGI API VM an IP address. BOSH uses the name you provide in the LOAD BALANCERS field to locate your load balancer and then connect the load balancer to the TKGI API VM using its new IP address.
(Optional) If you do not use a NAT instance, select INTERNET CONNECTED. This allows component instances direct access to the internet.
Warning: To avoid workload downtime, use the resource configuration recommended in About Tanzu Kubernetes Grid Integrated Edition Upgrades and Maintaining Workload Uptime.
You need to retrieve the TKGI API endpoint to allow your organization to use the API to create, update, and delete Kubernetes clusters.
To retrieve the TKGI API endpoint, do the following:
Follow the procedures in Configuring an Azure Load Balancer for the TKGI API to configure an Azure load balancer for the TKGI API.
The TKGI CLI and the Kubernetes CLI help you interact with your Tanzu Kubernetes Grid Integrated Edition-provisioned Kubernetes clusters and Kubernetes workloads. To install the CLIs, follow the instructions below:
Follow the procedures in Setting Up Tanzu Kubernetes Grid Integrated Edition Admin Users on Azure.
After installing Tanzu Kubernetes Grid Integrated Edition on Azure, you might want to do one or more of the following: