VMware NSX Container Plugin 2.5 | 24 September 2019 | Build 14628220

Check regularly for additions and updates to this document.
What's in the Release Notes
The release notes cover the following topics:
- What's New
- Compatibility Requirements
- Resolved Issues
- Known Issues
What's New
- Support for using policy API to configure NSX-T resources.
- Support for additional topologies. Users now have the option of a shared tier-1 router per Kubernetes cluster, shared between all Namespaces in the cluster. In this topology, stateful services such as SNAT rules, if any, are plumbed on the tier-1 router, giving users the option to have active-active tier-0 topologies. All features available in other topologies are also available in the shared tier-1 topology.
- A new DaemonSet, nsx-ncp-bootstrap, that automates the initialization of Kubernetes nodes.
- A new container, nsx-ovs, inside the nsx-node-agent DaemonSet, that runs OVS in the node VM.
- A single YAML file to simplify NCP installation. The YAML file contains all necessary Kubernetes resource definitions for NCP and container host components.
- Support for using Kubernetes CustomResourceDefinitions for NCP master election.
- Support for multiple CNI plugins in NSX BOSH release.
Compatibility Requirements
Product | Version |
---|---|
NCP / NSX-T Tile for PAS | 2.5 |
NSX-T | 2.4.1, 2.4.2, 2.5 |
Kubernetes | 1.13, 1.14 |
OpenShift | 3.10, 3.11 |
Kubernetes Host VM OS | Ubuntu 16.04, Ubuntu 18.04, CentOS 7.6 |
OpenShift Host VM OS | RHEL 7.5, RHEL 7.6 |
OpenShift BMC | RHEL 7.5, RHEL 7.6 |
PAS (PCF) | Ops Manager 2.8 + PAS 2.8; Ops Manager 2.7 + PAS 2.7; Ops Manager 2.6 + PAS 2.6; Ops Manager 2.5 + PAS 2.5. Note: PAS 2.7.0 + NCP 2.5.0 is not supported. |
Note: This release of NCP does not support Linux kernel version 4.15.0-59-generic or later.
If the RHEL nodes have a kernel version lower than 3.10.0-957.27.2, the OpenShift installation prerequisite check will fail. Upgrading the kernel version is not recommended on a bare-metal container node because OVS will fail to run. To deploy OpenShift 3.11 with a lower kernel version, the openshift-ansible repository should use commit e0499023ea91741ab4afd29391e420a26b8859b5 as the top commit.
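For example, assuming you deploy from a local clone of the openshift-ansible repository, pinning it to that commit might look like the following sketch (the clone location and workflow are illustrative, not prescribed by this document):

```
# Clone the installer and check out the commit referenced above so that it is
# the top commit of the working tree.
git clone https://github.com/openshift/openshift-ansible.git
cd openshift-ansible
git checkout e0499023ea91741ab4afd29391e420a26b8859b5
```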
Support for upgrading to this release:
- All NCP 2.4.x releases
Resolved Issues
- Issue 2389094: When NCP deletes a load balancer server, the corresponding tier-1 router is not deleted
With automatic scaling enabled, if you create multiple services of type LoadBalancer, NCP creates the required number of load balancer virtual servers. If you then decrease the number of services, and as a result NCP deletes a load balancer virtual server, the corresponding tier-1 router is not deleted.
- Issue 2118515: In a large-scale setup, NCP takes a long time to create firewalls on NSX-T
In a large-scale setup (for example, 250 Kubernetes nodes, 5000 pods, 2500 network policies), it can take NCP a few minutes to create the firewall sections and rules in NSX-T.
- Issue 2125755: A StatefulSet could lose network connectivity when performing canary updates and phased rolling updates
If a StatefulSet was created before NCP was upgraded to the current release, the StatefulSet could lose network connectivity when performing canary updates and phased rolling updates.
- Issue 2193901: Multiple PodSelectors or multiple NsSelectors for a single Kubernetes network policy rule is not supported
Applying multiple selectors allows incoming traffic only from specific pods.
- Issue 2194646: Updating network policies when NCP is down is not supported
If you update a network policy when NCP is down, the destination IPset for the network policy will be incorrect when NCP comes back up.
- Issue 2199504: The display name of NSX-T resources created by NCP is limited to 80 characters
When NCP creates an NSX-T resource for a resource in the container environment, it generates the display name of the NSX-T resource by combining the cluster name, the namespace or project name, and the name of the resource in the container environment. If the display name is longer than 80 characters, it is truncated to 80 characters.
- Issue 2199778: With NSX-T 2.2, Ingress, Service and Secrets with names longer than 65 characters are not supported
With NSX-T 2.2, when use_native_loadbalancer is set to True, the names of Ingresses, Secrets and Services referenced by the Ingress, and Services of type LoadBalancer, must be 65 characters or less. Otherwise, the Ingress or Service will not work properly.
- Issue 2065750: Installing the NSX-T CNI package fails with a file conflict
In a RHEL environment with Kubernetes installed, if you install the NSX-T CNI package using yum localinstall or rpm -i, you get an error indicating a conflict with a file from the kubernetes-cni package.
- Issue 2317608: Multiple CNI plugins not supported
Kubernetes expects a CNI configuration file of type .conflist containing a list of plugin configurations. The kubelet will call the plugins defined in this conflist file one by one in the order defined. Currently, the nsx-cf-cni bosh release only supports a single CNI plugin configuration. Any additional CNI plugin will overwrite the existing CNI configuration file 10-nsx.conf in the specified cni_config_dir.
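For illustration, the sketch below writes a .conflist of the general shape Kubernetes expects; the file name, chain name, and plugin entries are placeholders (the actual NSX entry is generated by the NSX-T components), and the point is only that the listed plugins are invoked in order:

```
# Hypothetical chained CNI configuration; names and plugins are placeholders,
# not the configuration produced by NCP or the BOSH release.
cat <<'EOF' > /etc/cni/net.d/10-example.conflist
{
  "cniVersion": "0.3.1",
  "name": "example-chain",
  "plugins": [
    { "type": "bridge", "bridge": "cni0", "ipam": { "type": "host-local", "subnet": "10.244.0.0/16" } },
    { "type": "portmap", "capabilities": { "portMappings": true } }
  ]
}
EOF
```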
Known Issues
- Issue 2131494: NGINX Kubernetes Ingress still works after changing the Ingress class from nginx to nsx
When you create an NGINX Kubernetes Ingress, NGINX creates traffic forwarding rules. If you change the Ingress class to any other value, NGINX does not delete the rules and continues to apply them, even if you delete the Kubernetes Ingress after changing the class. This is a limitation of NGINX.
Workaround: To delete the rules created by NGINX, delete the Kubernetes Ingress while the class value is still nginx. Then re-create the Kubernetes Ingress.
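A minimal sketch of that sequence, assuming the Ingress is named my-ingress and defined in my-ingress.yaml (both hypothetical names) and that the class is set through the kubernetes.io/ingress.class annotation:

```
# Delete the Ingress while its class is still "nginx" so NGINX removes its rules.
kubectl delete ingress my-ingress
# Edit my-ingress.yaml so the annotation reads kubernetes.io/ingress.class: "nsx",
# then re-create the Ingress.
kubectl apply -f my-ingress.yaml
```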
- For a Kubernetes service of type ClusterIP, Client-IP based session affinity is not supported
NCP does not support Client-IP based session affinity for a Kubernetes service of type ClusterIP.
Workaround: None
- For a Kubernetes service of type ClusterIP, the hairpin-mode flag is not supported
NCP does not support the hairpin-mode flag for a Kubernetes service of type ClusterIP.
Workaround: None
- Issue 2192489: After disabling 'BOSH DNS server' in the PAS director config, the BOSH DNS server (169.254.0.2) still appears in the container's resolv.conf file.
In a PAS environment running PAS 2.2, after you disable 'BOSH DNS server' in the PAS director config, the BOSH DNS server (169.254.0.2) still appears in the container's resolv.conf file. This causes a ping command with a fully qualified domain name to take a long time. This issue does not exist with PAS 2.1.
Workaround: None. This is a PAS issue.
- Issue 2224218: After a service or app is deleted, it takes 2 minutes to release the SNAT IP back to the IP pool
If you delete a service or app and recreate it within 2 minutes, it will get a new SNAT IP from the IP pool.
Workaround: After deleting a service or app, wait 2 minutes before recreating it if you want to reuse the same IP.
- Issue 2330811: When creating Kubernetes services of type LoadBalancer while NCP is down, the services might not get created when NCP is restarted
When NSX-T resources are exhausted for Kubernetes services of type LoadBalancer, you can create new services after deleting some of the existing services. However, if you delete and create the services while NCP is down, NCP will fail to create the new services.
Workaround: When NSX-T resources are exhausted for Kubernetes services of type LoadBalancer, do not perform both the delete and the create operations while NCP is down.
- Issue 2370137: The nsx-ovs and nsx-node-agent containers fail to run because the OVS database files are not in /etc/openvswitch
When the nsx-ovs and nsx-node-agent containers start, they look for the OVS database files in /etc/openvswitch. If there are symlinks in the directory that link to the actual OVS files (for example, conf.db), the nsx-ovs and nsx-node-agent containers will not run.
Workaround: Move the OVS database files to /etc/openvswitch and remove the symlinks.
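A minimal sketch of the workaround, assuming conf.db in /etc/openvswitch is a symlink to the real database file (file names are illustrative):

```
# Resolve the symlink target, remove the symlink, and move the real file into place.
real_db=$(readlink -f /etc/openvswitch/conf.db)
rm /etc/openvswitch/conf.db
mv "$real_db" /etc/openvswitch/conf.db
```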
- Issue 2397438: After a restart, NCP reports MultipleObjects error
Before the restart, NCP failed to create distributed firewall sections because of a ServerOverload error and retried until the maximum number of attempts was reached. However, the firewall sections were still created. When NCP was restarted, it found duplicate firewall sections and reported the MultipleObjects error.
Workaround: Manually delete the stale and duplicate distributed firewall sections and then restart NCP.
- Issue 2397684: NCP found the correct transport zone but then failed with the error "Default transport-zone is not configured"
When you create Kubernetes namespaces with policy API-based NCP, creation of the infra segment might fail because multiple overlay transport zones exist in NSX-T. This issue occurs if none of the overlay transport zones is marked as the default.
Workaround: Update the overlay transport zone configured in the NCP ConfigMap and set its "is_default" field to "True".
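One possible way to do this directly against the NSX Manager REST API (the endpoint and workflow are assumed from the NSX-T transport zone API; verify them against the documentation for your NSX-T version):

```
# Fetch the transport zone, set "is_default" to true in the returned JSON
# (keeping the "_revision" value), then PUT the modified body back.
# <nsx-manager>, <tz-id>, and the credentials are placeholders.
curl -k -u 'admin:<password>' https://<nsx-manager>/api/v1/transport-zones/<tz-id> > tz.json
# Edit tz.json so that "is_default" is true, then:
curl -k -u 'admin:<password>' -X PUT -H 'Content-Type: application/json' \
  -d @tz.json https://<nsx-manager>/api/v1/transport-zones/<tz-id>
```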
- Issue 2404302: If multiple load balancer application profiles for the same resource type (for example, HTTP) exist on NSX-T, NCP will choose any one of them to attach to the Virtual Servers.
If multiple HTTP load balancer application profiles exist on NSX-T, NCP will choose any one of them with the appropriate x_forwarded_for configuration to attach to the HTTP and HTTPS Virtual Server. If multiple FastTCP and UDP application profiles exist on NSX-T, NCP will choose any one of them to attach to the TCP and UDP Virtual Servers, respectively. The load balancer application profiles might have been created by different applications with different settings. If NCP chooses to attach one of these load balancer application profiles to the NCP-created Virtual Servers, it might break the workflow of other applications.
Workaround: None
- Issue 2397621: OpenShift installation fails
OpenShift installation expects a node's status to be ready, which is possible only after the CNI plugin is installed. In this release there is no separate CNI plugin file, causing the OpenShift installation to fail.
Workaround: Create the /etc/cni/net.d directory on each node before starting the installation.
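For example, on each node (the directory path is the one stated above):

```
# Create the CNI configuration directory expected by the installer.
mkdir -p /etc/cni/net.d
```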
- Issue 2398430: Connectivity to a node is lost after a restart
If OVS is configured to start on a node at boot, and the node is restarted while the NSX node agent DaemonSet is running and the IP address is persisted on the ovs_uplink_port, connectivity to the node will be lost.
Workaround: Prevent OVS from starting on the host at node startup by removing its service. For example, on Ubuntu, you can run:
    update-rc.d -f openvswitch-switch remove
- Issue 2408100: In a large Kubernetes cluster with multiple NCP instances in active-standby mode or with the liveness probe enabled, NCP frequently restarts
In a large Kubernetes cluster (about 25,000 pods, 2,500 namespaces and 2,500 network policies), if multiple NCP instances are running in active-standby mode, or if the liveness probe is enabled, NCP processes might be killed and restarted frequently due to an "Acquiring lock conflicted" error or a liveness probe failure.
Workaround: Perform the following steps:
- Set replicas of the NCP deployment to 1, or increase the configuration option ha.master_timeout in ncp.ini from the default value 18 to 30.
- Increase the liveness probe arguments as follows:

      containers:
        - name: nsx-ncp
          livenessProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - timeout 20 check_pod_liveness nsx-ncp
            initialDelaySeconds: 20
            timeoutSeconds: 20
            periodSeconds: 20
            failureThreshold: 5
- Issue 2412421: Docker fails to restart a container
If (1) a ConfigMap is updated, (2) a container uses subPath to mount the ConfigMap, and (3) the container is restarted, then Docker fails to start the container.
Workaround: Delete the Pod so that the DaemonSet will re-create the Pod.
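For example, assuming the affected Pod belongs to the nsx-node-agent DaemonSet in the nsx-system namespace (the namespace and Pod name are placeholders):

```
# Delete the affected Pod; the DaemonSet controller re-creates it automatically.
kubectl -n nsx-system delete pod nsx-node-agent-abc12
```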
- Issue 2413383: OpenShift upgrade fails because not all nodes are ready
By default the NCP bootstrap pod is not scheduled on the master node. As a result, the master node status is always Not Ready.
Workaround: Assign the master node the role "compute" to allow the nsx-ncp-bootstrap and nsx-node-agent DaemonSets to create pods on it. The node status will change to "Ready" once nsx-ncp-bootstrap installs the NSX-CNI.
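One way to add the compute role to a master node in OpenShift 3.11, assuming the DaemonSets select nodes by the compute node-role label (the node name is a placeholder):

```
# Label the master node so it also carries the compute role.
oc label node master-0.example.com node-role.kubernetes.io/compute=true
```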
- Issue 2410909: In a large-scale environment (especially one with many network policies), NCP may take a long time, around half an hour, to initialize its cache after a restart before it can process new pods and resources
After a restart, NCP can take a long time to come up. The processing of resources such as pods, namespaces and network policies might take additional time depending on the number of resources involved.
Workaround: None
- Issue 2423240: The nsx-ncp-bootstrap container fails if any IP route has a link-down status
The nsx-ncp-bootstrap container assumes that all IP routes have a link status of "up" and will fail if that is not the case.
Workaround: Remove the routes whose link status is not "up" and add them again after bootstrap is finished.
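A minimal sketch, assuming the affected routes show the linkdown flag in the routing table (the route shown is illustrative only):

```
# Identify routes whose link is down, remove them before running the bootstrap
# container, and add them back once bootstrap has finished.
ip route show | grep linkdown
ip route del 192.168.50.0/24 dev eth1
# ... run the nsx-ncp-bootstrap container ...
ip route add 192.168.50.0/24 dev eth1
```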
- Issue 2425050: The nsx-ncp-bootstrap container fails to compile the OVS package on Linux kernel version 4.15.0-59-generic or later
The compilation fails because of a missing header file during the compilation of the OVS kernel module.
Workaround: None. Note that NCP 2.5.0 does not support Linux kernel version 4.15.0-59-generic or later.