VMware NSX Container Plugin 3.0.0 | 7 April, 2020 | Build 15841604
Check regularly for additions and updates to this document.
What's in the Release Notes
The release notes cover the following topics:
What's New
- Support for relaxing the scale limit on Load Balancer Service. For Policy API, support for pool member limit per Load Balancer Service for Kubernetes LoadBalancer services.
- Support for source IP whitelisting for Kubernetes LoadBalancer services and Ingress for Policy API.
Compatibility Requirements
Product | Version |
---|---|
NSX-T | 3.0.0 |
vSphere with Kubernetes | 7.0 |
Support for upgrading to this release:
- All NCP 2.5.x releases
Resolved Issues
- Issue 2370137: The nsx-ovs and nsx-node-agent containers fail to run because the OVS database files are not in /etc/openvswitch
When the nsx-ovs and nsx-node-agent containers start, they look for the OVS database files in /etc/openvswitch. If there are symlinks in the directory that link to the actual OVS files (for example, conf.db), the nsx-ovs and nsx-node-agent containers will not run.
- Issue 2451442: After repeatedly restarting NCP and recreating a namespace, NCP might fail to allocate IP addresses to Pods
If you repeatedly delete and recreate the same namespace while restarting NCP, NCP might fail to allocate IP addresses to Pods in that namespace.
Workaround: Delete all stale NSX resources (logical routers, logical switches, and logical ports) associated with the namespace, and then recreate them.
Known Issues
- Issue 2131494: NGINX Kubernetes Ingress still works after changing the Ingress class from nginx to nsx
When you create an NGINX Kubernetes Ingress, NGINX creates traffic forwarding rules. If you change the Ingress class to any other value, NGINX does not delete the rules and continues to apply them, even if you delete the Kubernetes Ingress after changing the class. This is a limitation of NGINX.
Workaround: To delete the rules created by NGINX, delete the Kubernetes Ingress while the class value is still nginx. Then re-create the Kubernetes Ingress.
- For a Kubernetes service of type ClusterIP, Client-IP based session affinity is not supported
NCP does not support Client-IP based session affinity for a Kubernetes service of type ClusterIP.
Workaround: None
- For a Kubernetes service of type ClusterIP, the hairpin-mode flag is not supported
NCP does not support the hairpin-mode flag for a Kubernetes service of type ClusterIP.
Workaround: None
- Issue 2192489: After disabling 'BOSH DNS server' in PAS director config, the BOSH DNS server (169.254.0.2) still appears in the container's resolv.conf file
In a PAS environment running PAS 2.2, after you disable 'BOSH DNS server' in PAS director config, the BOSH DNS server (169.254.0.2) still appears in the container's resolv.conf file. This causes a ping command with a fully qualified domain name to take a long time. This issue does not exist with PAS 2.1.
Workaround: None. This is a PAS issue.
- Issue 2224218: After a service or app is deleted, it takes 2 minutes to release the SNAT IP back to the IP pool
If you delete a service or app and recreate it within 2 minutes, it will get a new SNAT IP from the IP pool.
Workaround: After deleting a service or app, wait 2 minutes before recreating it if you want to reuse the same IP.
- Issue 2330811: When creating Kubernetes services of type LoadBalancer while NCP is down, the services might not get created when NCP is restarted
When NSX-T resources are exhausted for Kubernetes services of type LoadBalancer, you can create new services after deleting some of the existing services. However, if you delete and create the services while NCP is down, NCP will fail to create the new services.
Workaround: When NSX-T resources are exhausted for Kubernetes services of type LoadBalancer, do not perform both the delete and the create operations while NCP is down.
- Issue 2397438: After a restart, NCP reports MultipleObjects error
Before the restart, NCP failed to create distributed firewall sections because of a ServerOverload error. NCP retried until the maximum number of attempts was reached. However, the firewall sections were still created. When NCP restarts, it receives duplicate firewall sections and reports the MultipleObjects error.
Workaround: Manually delete the stale and duplicate distributed firewall sections and then restart NCP.
- Issue 2397684: NCP found the correct transport zone but then failed with the error "Default transport-zone is not configured"
When you create Kubernetes namespaces with policy API-based NCP, the infra segment creation might fail if multiple overlay transport zones are present in NSX-T. This issue occurs if none of the overlay transport zones is marked as default.
Workaround: Update the overlay transport zone, configured in NCP ConfigMap, and set the "is_default" field to "True".
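The condition behind this issue can be checked with a short script before starting NCP. The following is a minimal sketch, not part of NCP. It assumes transport zones are available as dictionaries with "transport_type" and "is_default" keys, and the sample data and function name are hypothetical.

```python
from typing import Dict, List


def default_overlay_zones(zones: List[Dict]) -> List[Dict]:
    """Return the overlay transport zones that are marked as default."""
    return [
        zone for zone in zones
        if zone.get("transport_type") == "OVERLAY" and zone.get("is_default") is True
    ]


# Hypothetical sample data: two overlay zones, neither marked as default.
zones = [
    {"display_name": "tz-overlay-a", "transport_type": "OVERLAY", "is_default": False},
    {"display_name": "tz-overlay-b", "transport_type": "OVERLAY", "is_default": False},
    {"display_name": "tz-vlan", "transport_type": "VLAN", "is_default": True},
]

if not default_overlay_zones(zones):
    print("No overlay transport zone is marked as default")
```

With the sample data above, the check reports the problem because the only default zone is a VLAN zone.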
- Issue 2404302: If multiple load balancer application profiles for the same resource type (for example, HTTP) exist on NSX-T, NCP will choose any one of them to attach to the Virtual Servers.
If multiple HTTP load balancer application profiles exist on NSX-T, NCP will choose any one of them with the appropriate x_forwarded_for configuration to attach to the HTTP and HTTPS Virtual Server. If multiple FastTCP and UDP application profiles exist on NSX-T, NCP will choose any one of them to attach to the TCP and UDP Virtual Servers, respectively. The load balancer application profiles might have been created by different applications with different settings. If NCP chooses to attach one of these load balancer application profiles to the NCP-created Virtual Servers, it might break the workflow of other applications.
Workaround: None
- Issue 2397621: OpenShift installation fails
OpenShift installation expects each node's status to be Ready, which is possible only after the CNI plugin is installed. In this release there is no separate CNI plugin file, causing OpenShift installation to fail.
Workaround: Create the /etc/cni/net.d directory on each node before starting the installation.
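If the nodes are prepared by a script, the directory creation might look like the following sketch. The function name and the 0o755 mode are assumptions. Only the path comes from the workaround above.

```python
import os


def ensure_cni_conf_dir(path: str = "/etc/cni/net.d") -> str:
    """Create the CNI configuration directory if it does not already exist."""
    # exist_ok=True makes the call safe to repeat on nodes that already have the directory.
    os.makedirs(path, mode=0o755, exist_ok=True)
    return path
```

Calling ensure_cni_conf_dir() with no arguments on each node (with sufficient privileges) creates the default path. Passing a different path is useful for a dry run.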
- Issue 2408100: In a large Kubernetes cluster with multiple NCP instances in active-standby mode or liveness probe enabled, NCP frequently restarts
In a large Kubernetes cluster (about 25,000 pods, 2,500 namespaces and 2,500 network policies), if multiple NCP instances are running in active-standby mode, or if liveness probe is enabled, NCP processes might be killed and restarted frequently due to "Acquiring lock conflicted" or liveness probe failure.
Workaround: Perform the following steps:
- Set replicas of the NCP deployment to 1, or increase the configuration option ha.master_timeout in ncp.ini from the default value 18 to 30.
- Increase the liveness probe arguments as follows:
    containers:
      - name: nsx-ncp
        livenessProbe:
          exec:
            command:
              - /bin/sh
              - -c
              - timeout 20 check_pod_liveness nsx-ncp
          initialDelaySeconds: 20
          timeoutSeconds: 20
          periodSeconds: 20
          failureThreshold: 5
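The ha.master_timeout change from the workaround above could be scripted roughly as follows. This is a sketch that assumes ha.master_timeout refers to the master_timeout option in the [ha] section of ncp.ini. The function name is hypothetical, and the sample text stands in for the real file.

```python
import configparser
import io


def raise_master_timeout(ini_text: str, new_timeout: int = 30) -> str:
    """Return ini_text with master_timeout in the [ha] section set to new_timeout."""
    # interpolation=None keeps any '%' characters in other options untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("ha"):
        parser.add_section("ha")
    parser.set("ha", "master_timeout", str(new_timeout))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Stand-in for the relevant part of ncp.ini, showing the default value of 18.
sample = "[ha]\nmaster_timeout = 18\n"
print(raise_master_timeout(sample))
```

Note that configparser discards comments when it writes the file back, so for a heavily commented ncp.ini a manual edit may be preferable.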
- Issue 2412421: Docker fails to restart a container
If (1) ConfigMap is updated, (2) the container uses subPath to mount the ConfigMap, and (3) the container is restarted, then Docker fails to start the container.
Workaround: Delete the Pod so that the DaemonSet will re-create the Pod.
- Issue 2413383: OpenShift upgrade fails because not all nodes are ready
By default the NCP bootstrap pod is not scheduled on the master node. As a result, the master node status is always Not Ready.
Workaround: Assign the role "compute" to the master node so that the nsx-ncp-bootstrap and nsx-node-agent DaemonSets can create pods on it. The node status will change to "Ready" once nsx-ncp-bootstrap installs the NSX-CNI.
- Issue 2410909: After a restart, NCP may take a long time to initialize its cache in a large-scale environment (especially if there are many network policies), and can take around half an hour to come up and process new pods and resources
After a restart, NCP can take a long time to come up. The processing of resources such as pods, namespaces and network policies might take an additional amount of time depending on the quantity of resources involved.
Workaround: None
- Issue 2447127: When upgrading NCP from 2.4.1 to 2.5.0 or 2.5.1, it might take NCP extra time to be up and running
During the upgrade of NCP from 2.4.1 to 2.5.x, NSX-T 2.4.1 might respond slowly when NCP calls the switching profile API for leader election. This causes NCP to take several extra minutes to be up and running.
Workaround: None.
- Issue 2460219: HTTP redirect does not work without a default server pool
If the HTTP virtual server is not bound to a server pool, HTTP redirect fails. This issue occurs in NSX-T 2.5.0 and earlier releases.
Workaround: Create a default server pool or upgrade to NSX-T 2.5.1.
- Issue 2518111: NCP fails to delete NSX-T resources that have been updated from NSX-T
NCP creates NSX-T resources based on the configurations that you specify. If you make any updates to those NSX-T resources through NSX Manager or the NSX-T API, NCP might fail to delete those resources and re-create them when it is necessary to do so.
Workaround: Do not update NSX-T resources created by NCP through NSX Manager or the NSX-T API.
- Issue 2518312: NCP bootstrap container fails to install nsx-ovs kernel module on Ubuntu 18.04.4, kernel 4.15.0-88
The NCP bootstrap container (nsx-ncp-bootstrap) fails to install nsx-ovs kernel module on Ubuntu 18.04.4, kernel 4.15.0-88.
Workaround: Set use_nsx_ovs_kernel_module = False in nsx-node-agent-config so that NSX-OVS is not installed on this kernel, and use the upstream OVS kernel module on the host instead (Ubuntu includes an OVS kernel module by default). If there is no OVS kernel module on the host, either install the OVS kernel module manually and set use_nsx_ovs_kernel_module = False in nsx-node-agent-config, or downgrade the kernel version to 4.15.0-76 so that NSX-OVS can be installed.
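When node configuration is generated automatically, the choice of value for use_nsx_ovs_kernel_module can be derived from the running kernel. The following sketch flags only the kernel named in this issue. The function name is hypothetical, and other kernels are not evaluated here.

```python
import platform

# Kernel release named in this issue.
AFFECTED_KERNEL = "4.15.0-88"


def use_nsx_ovs_kernel_module(kernel_release: str) -> bool:
    """Return False for the affected kernel release, True for any other release."""
    # Releases usually carry a suffix such as "-generic", so match on the prefix plus "-".
    affected = (
        kernel_release == AFFECTED_KERNEL
        or kernel_release.startswith(AFFECTED_KERNEL + "-")
    )
    return not affected


# Print the setting for the host this script runs on.
print(f"use_nsx_ovs_kernel_module = {use_nsx_ovs_kernel_module(platform.release())}")
```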
- Issue 2524778: NSX Manager shows NCP as down or unhealthy after the NCP master node is deleted
After an NCP master node is deleted, for example, after a successful switch-over to a backup node, NSX Manager still reports the health status of NCP as down when it should be up.
Workaround: Use the Manager API DELETE /api/v1/systemhealth/container-cluster/<cluster-id>/ncp/status to clear the stale status manually.
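The DELETE call in the workaround above can be prepared with the Python standard library. This is a sketch only: the manager host name and cluster ID are placeholders, the function name is hypothetical, and authentication and certificate handling are omitted.

```python
import urllib.request
from urllib.parse import quote


def build_clear_status_request(manager: str, cluster_id: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that clears the stale NCP status."""
    url = (
        f"https://{manager}/api/v1/systemhealth/container-cluster/"
        f"{quote(cluster_id, safe='')}/ncp/status"
    )
    return urllib.request.Request(url, method="DELETE")


# Placeholder values: substitute the real NSX Manager address and cluster ID.
req = build_clear_status_request("nsx-manager.example.com", "my-cluster-id")
print(req.get_method(), req.full_url)
```

To send the request, pass it to urllib.request.urlopen after adding the credentials that your NSX Manager requires.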
- Issue 2517201: Unable to create a pod on an ESXi host
After removing an ESXi host from a vSphere cluster and adding it back to the cluster, creating a pod on the host fails.
Workaround: Reboot NCP.
- Issue 3033821: After manager-to-policy migration, distributed firewall rules not enforced correctly
After a manager-to-policy migration, newly created network policy-related distributed firewall (DFW) rules will have higher priority than the migrated DFW rules.
Workaround: Use the policy API to change the sequence of DFW rules as needed.