VMware Enterprise PKS | 20 SEPT 2019

Check this site regularly for additions and updates to these release notes.

VMware Enterprise PKS is used to create and manage on-demand Kubernetes clusters using the PKS CLI.

Versions:

v1.5.0

Release Date: August 20, 2019

Features

New features and changes in this release:

  • Cluster administrators and managers can use the PKS CLI command pks cluster CLUSTER-NAME --details to view details about the named cluster, including Kubernetes nodes and NSX-T network details. See Viewing Cluster Details.
  • Enterprise PKS v1.5.0 adds the following network profiles:
    • Cluster administrators can define a network profile to use a single, shared Tier-1 router per Kubernetes cluster. For more information, see Defining Network Profiles for Shared Tier-1 Router. (NOTE: This feature requires NSX-T Data Center v2.5.)
    • Cluster administrators can define a network profile to use a third-party load balancer for Kubernetes services of type LoadBalancer. See Load Balancer Configuration for details.
    • Cluster administrators can define a network profile to use a third-party ingress controller for Pod ingress traffic. See Ingress Controller Configuration for details.
    • Cluster administrators can define a network profile to configure section markers for explicit distributed firewall rule placement. See DFW Section Marking for details.
    • Cluster administrators can define a network profile to configure NCP logging. See Configure NCP Logging for details.
    • Cluster administrators can define a network profile to configure DNS lookup of the IP addresses for the Kubernetes API load balancer and the ingress controller. See Configure DNS lookup Kubernetes API and Ingress Controllers for details.
  • Cluster administrators can provision a Windows worker-based Kubernetes cluster on vSphere with Flannel. Windows worker-based clusters in Enterprise PKS 1.5 currently do not support NSX-T integration. For more information, see Configuring Windows Worker-Based Clusters (Beta) and Deploying and Exposing Windows Workloads (Beta).
  • Operators can set the lifetime for the refresh and access tokens for Kubernetes clusters. You can configure the token lifetimes to meet your organization’s security and compliance needs. For instructions about configuring the access and refresh token for your Kubernetes clusters, see the UAA section in the Installing topic for your IaaS.
  • Operators can configure prefixes for OpenID Connect (OIDC) users and groups to avoid name conflicts with existing Kubernetes system users. Pivotal recommends adding prefixes to ensure OIDC users and groups do not gain unintended privileges on clusters. For instructions about configuring OIDC prefixes, see the Configure OpenID Connect section in the Installing topic for your IaaS.
  • Operators can configure an external SAML identity provider for user authentication and authorization. For instructions about configuring an external SAML identity provider, see the Configure SAML as an Identity Provider section in the Installing topic for your IaaS.
  • Operators can upgrade Kubernetes clusters separately from the Enterprise PKS tile. For instructions on upgrading Kubernetes clusters, see Upgrading Clusters.
  • Operators can configure the Telegraf agent to send master/etcd node metrics to a third-party monitoring service. For more information, see Monitoring Master/etcd Node VMs.
  • Operators can configure the default node drain behavior. You can use this feature to resolve hanging or failed cluster upgrades. For more information about configuring node drain behavior, see Worker Node Hangs Indefinitely in Troubleshooting and Configure Node Drain Behavior in Upgrade Preparation Checklist for Enterprise PKS v1.5.
  • App developers can create metric sinks for namespaces within a Kubernetes cluster. For more information, see Creating Sink Resources.
  • VMware’s Customer Experience Improvement Program (CEIP) and the Pivotal Telemetry Program (Telemetry) are now enabled in Enterprise PKS by default. This includes both new installations and upgrades. For information about configuring CEIP and Telemetry in the Enterprise PKS tile, see CEIP and Telemetry in the Installing topic for your IaaS.
  • Adds a beta release of VMware Enterprise PKS Management Console, which provides a graphical interface for deploying and managing Enterprise PKS on vSphere. For more information, see Using the Enterprise PKS Management Console.

Product Snapshot

Component                           Details
PKS version                         v1.5.0
Release Date                        August 20, 2019
Compatible Ops Manager versions *   v2.5.12 or later, or v2.6.6 or later
Xenial Stemcell version             v315.81
Windows Stemcell version            v2019.7
Kubernetes version                  v1.14.5
On-Demand Broker version            v0.29.0
NSX-T versions                      v2.4.0.1, v2.4.1, v2.4.2**, v2.5.0**
NCP version                         v2.5.0
Docker version                      v18.09.8 CFCR
Backup and Restore SDK version      v1.17.0

* If you want to use Windows workers in Enterprise PKS v1.5, you must install Ops Manager v2.6.6 or later. Enterprise PKS does not support this feature on Ops Manager v2.5. For more information about Ops Manager v2.6.6 or later, see PCF Ops Manager v2.6 Release Notes.

** See the Breaking Changes section below for more information on PKS support for these releases of NSX-T.

VMware Enterprise PKS Management Console Product Snapshot

NOTE: The Management Console BETA provides an opinionated installation of Enterprise PKS. The supported versions list may differ from or be more limited than what is generally supported by Enterprise PKS.

Element Details
Version v0.9 - This feature is a beta component and is intended for evaluation and test purposes only.
Release date August 22, 2019
Installed Enterprise PKS version v1.5.0
Installed Ops Manager version v2.6.5
Installed Kubernetes version v1.14.5
Supported NSX-T versions v2.4.1, v2.4.2 (see below)
Installed Harbor Registry version v1.8.1

vSphere Version Requirements

For Enterprise PKS installations on vSphere or on vSphere with NSX-T Data Center, refer to the VMware Product Interoperability Matrices.

Upgrade Path

PKS v1.5.0 supports upgrades from PKS v1.4.0 and later. Exception: If you are running PKS v1.4.0 with NSX-T v2.3.x, follow these steps:
1. Upgrade to PKS v1.4.1.
2. Upgrade to NSX-T v2.4.1.
3. Upgrade to PKS v1.5.0.

For detailed instructions, see Upgrading Enterprise PKS and Upgrading Enterprise PKS with NSX-T.

Breaking Changes

Enterprise PKS v1.5.0 has the following breaking changes:

Announcing Support for NSX-T v2.5.0 with Known Issue and KB Article

Enterprise PKS v1.5 supports NSX-T v2.5. Before upgrading to NSX-T v2.5, review the associated known issue and KB article.

Announcing Support for NSX-T v2.4.2 with Known Issue and Workaround

Enterprise PKS v1.5 supports NSX-T v2.4.2. However, there is a known issue with NSX-T v2.4.2 that can affect new and upgraded installations of Enterprise PKS v1.5 that use a NAT topology.

For NSX-T v2.4.2, the PKS Management Plane must be deployed on a Tier-1 distributed router (DR). If the PKS Management Plane is deployed on a Tier-1 service router (SR), the router needs to be converted. To convert an SR to a DR, refer to the following KB article: East-West traffic between workloads behind different T1 is impacted, when NAT is configured on T0 (71363).

This issue will be addressed in a subsequent release of NSX-T, after which it will not matter whether the Tier-1 router is a DR or an SR.

New OIDC Prefixes Break Existing Cluster Role Bindings

In Enterprise PKS v1.5, operators can configure prefixes for OIDC usernames and groups. If you add OIDC prefixes, you must manually change any existing role bindings that bind to a username or group. If you do not update your role bindings, developers cannot access Kubernetes clusters. For instructions about creating a role binding, see Managing Cluster Access and Permissions.
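
For example, the following is a minimal sketch of an updated role binding. It assumes a hypothetical prefix of oidc: and uses placeholder user, namespace, and role names; substitute the prefix and names configured in your environment.

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: developer-binding          # placeholder name
      namespace: my-namespace          # placeholder namespace
    subjects:
    - kind: User
      name: "oidc:alana@example.com"   # previously "alana@example.com"; "oidc:" is an assumed prefix
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: developer                  # placeholder role
      apiGroup: rbac.authorization.k8s.io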

New API Group Name for Sink Resources

The apps.pivotal.io API group name for sink resources is no longer supported. The new API group name is pksapi.io.

When creating a sink resource, your sink resource YAML definition must start with apiVersion: pksapi.io/v1beta1. All existing sinks are migrated automatically.
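
As a minimal sketch, the change amounts to updating the apiVersion line at the top of each sink definition. The resource name below is a placeholder, and the kind value reflects the renamed resources described in Log Sink Changes below:

    apiVersion: pksapi.io/v1beta1    # previously apps.pivotal.io/v1beta1
    kind: ClusterLogSink
    metadata:
      name: my-cluster-log-sink      # placeholder name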

For more information about defining and managing sink resources, see Creating Sink Resources.

Log Sink Changes

Enterprise PKS v1.5.0 adds the following log sink changes:

  • The ClusterSink log sink resource has been renamed to ClusterLogSink and the Sink log sink resource has been renamed to LogSink.

    • When you create a log sink resource with YAML, you must use one of the new names in your sink resource YAML definition. For example, specify kind: ClusterLogSink to define a cluster log sink. All existing sinks are migrated automatically.
    • When managing your log sink resources through kubectl, you must use the new log sink resource names. For example, if you want to delete a cluster log sink, run kubectl delete clusterlogsink instead of kubectl delete clustersink.
  • Log transport now requires a secure connection. When creating a ClusterLogSink or LogSink resource, you must include enable_tls: true in your sink resource YAML definition. All existing sinks are migrated automatically.
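
The following sketch combines these changes in a minimal ClusterLogSink definition. The name, host, and port are placeholders, and the spec layout is an assumption based on a typical syslog-style sink; see Creating Sink Resources for the authoritative format.

    apiVersion: pksapi.io/v1beta1    # new API group name
    kind: ClusterLogSink             # formerly ClusterSink
    metadata:
      name: my-cluster-log-sink      # placeholder name
    spec:
      host: logs.example.com         # placeholder destination
      port: 514                      # placeholder port
      enable_tls: true               # required; log transport must use a secure connection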

For more information about defining and managing sink resources, see Creating Sink Resources.

Deprecation of Sink Commands in the PKS CLI

The following Enterprise PKS Command Line Interface (PKS CLI) commands are deprecated and will be removed in a future release:

  • pks create-sink
  • pks sinks
  • pks delete-sink

You can use the following Kubernetes CLI commands instead:

  • kubectl apply -f MY-SINK.yml
  • kubectl get clusterlogsinks
  • kubectl delete clusterlogsink YOUR-SINK

For more information about defining and managing sink resources, see Creating Sink Resources.

Known Issues

Enterprise PKS v1.5.0 has the following known issues:

Azure Default Security Group Is Not Automatically Assigned to Cluster VMs

Symptom

You experience issues when configuring a load balancer for a multi-master Kubernetes cluster or creating a service of type LoadBalancer. Additionally, in the Azure portal, the VM > Networking page does not display any inbound and outbound traffic rules for your cluster VMs.

Explanation

As part of configuring the Enterprise PKS tile for Azure, you enter Default Security Group in the Kubernetes Cloud Provider pane. When you create a Kubernetes cluster, Enterprise PKS automatically assigns this security group to each VM in the cluster. However, on Azure the automatic assignment may not occur.

As a result, your inbound and outbound traffic rules defined in the security group are not applied to the cluster VMs.

Workaround

If you experience this issue, manually assign the default security group to each VM NIC in your cluster.
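
If you prefer the Azure CLI to the Azure portal, the following is a hedged sketch of one possible approach. The resource group, NIC, and security group names are placeholders, and it assumes the az network nic update command, which attaches a network security group to a NIC. Repeat the command for each VM NIC in the cluster.

    az network nic update \
      --resource-group MY-RESOURCE-GROUP \
      --name MY-CLUSTER-VM-NIC \
      --network-security-group MY-DEFAULT-SECURITY-GROUP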

Cluster Creation Fails When First AZ Runs out of Resources

Symptom

If the first availability zone (AZ) used by a plan with multiple AZs runs out of resources, cluster creation fails with an error like the following:

    Error: CPI error 'Bosh::Clouds::CloudError' with message 'No valid placement found for requested memory: 4096

Explanation

BOSH creates VMs for your Enterprise PKS deployment using a round-robin algorithm, creating the first VM in the first AZ that your plan uses. If the AZ runs out of resources, cluster creation fails because BOSH cannot create the cluster VM.

For example, if your three AZs each have enough resources for ten VMs, and you create two clusters with four worker VMs each, BOSH creates VMs in the following AZs:

            AZ1           AZ2           AZ3
Cluster 1   Worker VM 1   Worker VM 2   Worker VM 3
            Worker VM 4
Cluster 2   Worker VM 1   Worker VM 2   Worker VM 3
            Worker VM 4

In this scenario, AZ1 has twice as many VMs as AZ2 or AZ3.

Azure Worker Node Communication Fails after Upgrade

Symptom

Outbound communication from a worker node VM fails after upgrading Enterprise PKS.

Explanation

Enterprise PKS uses Azure Availability Sets to improve the uptime of workloads and worker nodes in the event of Azure platform failures. Worker node VMs are distributed evenly across Availability Sets.

Azure Standard SKU Load Balancers are recommended for the Kubernetes control plane and Kubernetes ingress and egress. This load balancer type provides an IP address for outbound communication using SNAT.

During an upgrade, when BOSH rebuilds a given worker instance in an Availability Set, Azure can time out while re-attaching the worker node network interface to the back-end pool of the Standard SKU Load Balancer.

For more information, see Outbound connections in Azure in the Azure documentation.

Workaround

You can manually re-attach the worker instance to the back-end pool of the Azure Standard SKU Load Balancer in your Azure console.
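
If you prefer the Azure CLI to the Azure console, the following is a hedged sketch of one possible approach. All names are placeholders, and it assumes the az network nic ip-config address-pool add command, which adds a NIC IP configuration to a load balancer back-end pool.

    az network nic ip-config address-pool add \
      --resource-group MY-RESOURCE-GROUP \
      --nic-name MY-WORKER-VM-NIC \
      --ip-config-name MY-IP-CONFIG \
      --lb-name MY-STANDARD-LB \
      --address-pool MY-BACKEND-POOL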

Passwords Not Supported for Ops Manager VM on vSphere

Starting in Ops Manager v2.6, you can only SSH onto the Ops Manager VM in a vSphere deployment with a private SSH key. You cannot SSH onto the Ops Manager VM with a password.

To avoid upgrade failures and authentication errors, add a public key to the Customize Template screen of the OVF template for the Ops Manager VM. Then, use the corresponding private key to SSH onto the Ops Manager VM.
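
For example, a minimal sketch of the SSH command; the key path and FQDN are placeholders, and it assumes the default ubuntu user on the Ops Manager VM:

    ssh -i ~/.ssh/ops-manager-key ubuntu@OPS-MANAGER-FQDN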

Warning: You cannot upgrade to Ops Manager v2.6 successfully without adding a public key. If you do not add a key, Ops Manager shuts down automatically because it cannot find a key and may enter a reboot loop.

For more information about adding a public key to the OVF template, see Deploy Ops Manager in Deploying Ops Manager on vSphere.

Error During Individual Cluster Upgrades

Symptom

While submitting a large number of cluster upgrade requests using the pks upgrade-cluster command, some of your Kubernetes clusters are marked as failed.

Explanation

BOSH upgrades Kubernetes clusters in parallel with a limit of up to four concurrent cluster upgrades by default. If you schedule more than four cluster upgrades, Enterprise PKS queues the upgrades and waits for BOSH to finish the last upgrade. When BOSH finishes the last upgrade, it starts working on the next upgrade request.

If you submit too many cluster upgrades to BOSH, some of the clusters may be marked as FAILED because BOSH could not start the upgrade within the specified timeout. The timeout is set to 168 hours. However, BOSH does not remove a task from the queue or stop working on an upgrade after it has been picked up.

Solution

If you expect that upgrading all of your Kubernetes clusters takes more than 168 hours, do not use a script that submits upgrade requests for all of your clusters at once. For information about upgrading Kubernetes clusters provisioned by Enterprise PKS, see Upgrading Clusters.

Kubectl CLI Commands Do Not Work after Changing an Existing Plan to a Different AZ

Symptom

After you reconfigure the AZ of an existing plan, kubectl CLI commands do not work in the plan’s existing clusters.

Explanation

This issue occurs in IaaS environments which either limit or prevent attaching a disk across multiple AZs.

BOSH supports creating new VMs and attaching existing disks to VMs. BOSH cannot “move” VMs.

If the plan for an existing cluster is changed to a different AZ, the cluster’s new “intended” state is to be hosted within the new AZ. To migrate the cluster from its original state to its intended state, BOSH creates new VMs for the cluster within the designated AZ and removes the cluster’s original VMs from the original AZ.

On an IaaS where attaching VM disks across AZs is not supported, the disks attached to the newly created VMs will not have the original disks’ content.

Workaround

If you have reconfigured the AZ of an existing cluster and can no longer run kubectl CLI commands, contact Support for assistance.

HTTP 500 Internal Server Error When Saving Telemetry Preferences

Symptom

You receive an HTTP 500 Internal Server Error when saving the Telemetry preferences form.

Explanation

When using Ops Manager v2.5, you may receive an HTTP 500 Internal Server Error if you attempt to save Telemetry preferences without configuring all of the form’s required settings.

Solution

Use your browser’s Back function to return to the Telemetry preference configuration form. Configure all of the form’s required settings. To submit your Telemetry preferences, click Save.

One Plan ID Is Longer Than Other Plan IDs

Symptom

One of your Plan IDs is one character longer than your other Plan IDs.

Explanation

Each Plan has a unique Plan ID. A Plan ID is normally a UUID consisting of 32 alphanumeric characters and 4 hyphens. The Plan 4 Plan ID is instead a UUID consisting of 33 alphanumeric characters and 4 hyphens.

You can safely configure and use Plan 4. The length of the Plan 4 Plan ID does not affect the functionality of Plan 4 clusters.

If you require all Plan IDs to have identical length, do not activate or use Plan 4.

Metric Sinks Fail to Send to Secure Connections

Symptom

If you attempt to use a MetricSink or ClusterMetricSink over a secure connection, Telegraf rejects the TLS handshake.

Explanation

This is due to missing CA certificates in the Telegraf container images included in this version of the tile.

Workaround

A patch is in progress. Until the patch is published, metric sinks cannot send metrics over secure connections.

Enterprise PKS Management Console Known Issues

The following additional known issues are specific to the Enterprise PKS Management Console v0.9.0 appliance and user interface.

Enterprise PKS Management Console Notifications Persist

Symptom

In the Enterprise PKS view of Enterprise PKS Management Console, error notifications sometimes persist in memory on the Clusters and Nodes pages after you clear those notifications.

Explanation

After you click the X button to clear a notification, the notification is removed. However, when you navigate back to those pages, it might appear again.

Workaround

Use shift+refresh to reload the page.

Cannot Delete Enterprise PKS Deployment from Management Console

Symptom

In the Enterprise PKS view of Enterprise PKS Management Console, you cannot use the Delete Enterprise PKS Deployment option even after you have removed all clusters.

Explanation

The option to delete the deployment is only activated in the management console a short period after the clusters are deleted.

Workaround

After removing clusters, wait for a few minutes before attempting to use the Delete Enterprise PKS Deployment option again.

Configuring Enterprise PKS Management Console Integration with VMware vRealize Log Insight

Symptom

The Enterprise PKS Management Console appliance sends logs to VMware vRealize Log Insight over HTTP, not HTTPS.

Explanation

When you deploy the Enterprise PKS Management Console appliance from the OVA, if you require log forwarding to vRealize Log Insight, you must provide the port on the vRealize Log Insight server on which it listens for HTTP traffic. Do not provide the HTTPS port.

Workaround

Set the vRealize Log Insight port to the HTTP port. This is typically 9000.

Deploying Enterprise PKS to an Unprepared NSX-T Data Center Environment Results in Flannel Error

Symptom

When using the management console to deploy Enterprise PKS in NSX-T Data Center (Not prepared for PKS) mode, if an error occurs during the network configuration, the message Unable to set flannel environment is displayed on the deployment progress page.

Explanation

The network configuration has failed, but the error message is incorrect.

Workaround

To see the correct reason for the failure, see the server logs. For instructions about how to obtain the server logs, see Troubleshooting Enterprise PKS Management Console.

Using BOSH CLI from Operations Manager VM

Symptom

The BOSH CLI client bash command that you obtain from the Deployment Metadata view does not work when logged in to the Operations Manager VM.

Explanation

The BOSH CLI client bash command from the Deployment Metadata view is intended to be used from within the Enterprise PKS Management Console appliance.

Workaround

To use the BOSH CLI from within the Operations Manager VM, see Connect to Operations Manager.

From the Ops Manager VM, use the BOSH CLI client bash command from the Deployment Metadata page, with the following modifications:

  • Remove the clause BOSH_ALL_PROXY=xxx
  • Replace the BOSH_CA_CERT section with BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate
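
For example, a modified command might look like the following hedged sketch. The director address and client credentials are placeholders that you copy from your own Deployment Metadata view, and bosh deployments stands in for whatever BOSH command you want to run.

    # BOSH_ALL_PROXY is intentionally omitted, and BOSH_CA_CERT points to the
    # root CA certificate path on the Ops Manager VM.
    BOSH_ENVIRONMENT=BOSH-DIRECTOR-IP \
    BOSH_CLIENT=CLIENT-NAME \
    BOSH_CLIENT_SECRET=CLIENT-SECRET \
    BOSH_CA_CERT=/var/tempest/workspaces/default/root_ca_certificate \
    bosh deployments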

Run pks Commands against the PKS API Server

Explanation

The PKS CLI is available in the Enterprise PKS Management Console appliance.

Workaround

To run pks commands against the PKS API Server, you must first log in to PKS using the following command syntax: pks login -a fqdn_of_pks ….

To do this, ensure one of the following:

  • The FQDN configured for the PKS Server is resolvable by the DNS server configured for the Enterprise PKS Management Console appliance, or
  • An entry that maps the Floating IP address assigned to the PKS Server to the FQDN exists in /etc/hosts on the appliance. For example: 192.168.160.102 api.pks.local.
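
For example, a hedged sketch of the login command, using the example FQDN above with placeholder credentials; the -u, -p, and --ca-cert options are standard PKS CLI flags:

    pks login -a api.pks.local -u USER-NAME -p PASSWORD --ca-cert CERT-PATH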