Refer to this section to troubleshoot NSX Advanced Load Balancer issues that you might encounter.
NSX Advanced Load Balancer Configuration Is Not Applied
When you deploy the Supervisor, the deployment does not complete and the NSX Advanced Load Balancer configuration is not applied.
Problem
The configuration of the NSX Advanced Load Balancer does not get applied if you provide a private Certificate Authority (CA) signed certificate.
You might see an Unable to find certificate chain error message in the log files of one of the NCP pods running on the Supervisor.
- Log in to the Supervisor VM.
- List all pods with the kubectl get pods -A command.
- Get the logs from all the NCP pods on the Supervisor.
kubectl -n vmware-system-nsx logs nsx-ncp-<id> | grep -i alb
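For example, from the Supervisor VM you can locate the NCP pods and filter their logs for the certificate error. This is a minimal sketch; the pod name is a placeholder that you replace with the actual pod name from your deployment.
# List all pods and locate the NCP pods
kubectl get pods -A | grep nsx-ncp
# Search the NCP logs for the certificate error
kubectl -n vmware-system-nsx logs nsx-ncp-<id> | grep -i "certificate chain"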
Cause
The Java SDK is used to establish communication between NCP and the NSX Advanced Load Balancer Controller. This error occurs when the NSX trust store is not synchronized with the Java certificate trust store.
Solution
ESXi Host Cannot Enter Maintenance Mode
You place an ESXi host in maintenance mode when you want to perform an upgrade.
Problem
The ESXi host cannot enter maintenance mode, which can impact ESXi and NSX upgrades.
Cause
This can occur if there is a Service Engine in a powered-on state on the ESXi host.
Solution
- Power off the Service Engine so that the ESXi host can enter maintenance mode.
Troubleshooting IP Address Issues
Follow these troubleshooting tips if you encounter external IP assignment issues.
- Kubernetes resources, such as gateways and ingresses, do not get an external IP from AKO.
- External IPs that are assigned to Kubernetes resources are not reachable.
- External IPs are incorrectly assigned.
Kubernetes resources do not get an external IP from AKO
This error occurs when AKO cannot create the corresponding virtual service in the NSX Advanced Load Balancer Controller.
Check if the AKO pod is running. If the pod is running, check the AKO container logs for the error.
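For example, commands similar to the following check whether the AKO pod is running and show its logs. The vmware-system-ako namespace and the pod name are assumptions; verify the actual namespace and pod name in your environment.
# Confirm that the AKO pod is in the Running state
kubectl get pods -A | grep ako
# Inspect the AKO container logs for errors about virtual service creation
kubectl -n vmware-system-ako logs <ako_pod_name>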
External IPs assigned to Kubernetes resources are not reachable
- The external IP is not available immediately but starts accepting traffic within a few minutes of creation. This occurs when a new service engine creation is triggered for virtual service placement.
- The external IP is not available because the corresponding virtual service shows an error.
A virtual service could indicate an error or appear red if there are no servers in the pool. This could occur if the Kubernetes gateway or ingress resource does not point to an endpoint object.
To see the endpoints, run the kubectl get endpoints -n <service_namespace> command and fix any selector label issues.
The pool could appear in a state of error when the Health Monitor shows the health of the pool servers as red.
- Verify that the pool servers or Kubernetes pods are listening on the configured port.
- Verify that there are no drop rules in the NSX distributed firewall (DFW) that block ingress or egress traffic on the Service Engines.
- Ensure that there are no network policies in the Kubernetes environment that block ingress or egress traffic on the Service Engines. A sketch of kubectl commands for these checks follows this list.
- Creation of Service Engines fails.
Creation of Service Engines can fail for the following reasons:
- A license with insufficient resources is used in the NSX Advanced Load Balancer Controller.
- The number of Service Engines created in a Service Engine Group reached the maximum limit.
- The Service Engine Data NIC failed to acquire an IP address.
- Service Engine creation fails with an Insufficient licensable resources available error message. This error occurs if a license with insufficient resources was used to create the Service Engine.
Get a license with a larger quota of resources and assign it to the NSX Advanced Load Balancer Controller.
- Service Engine creation fails with a Reached configuration maximum limit error message. This error occurs if the number of Service Engines created in a Service Engine Group has reached the maximum limit.
To resolve this error, perform the following steps:
- In the NSX Advanced Load Balancer Controller dashboard, navigate to the Service Engine Group settings.
- Find the Service Engine group with the same name as the Supervisor in which the IP traffic failure is occurring and click the Edit icon.
- Configure a higher value for Number of Service Engines.
- The Service Engine Data NIC fails to acquire an IP address.
This error might occur if the DHCP IP pool has been exhausted for one of the following reasons:
- Too many Service Engines have been created for a large-scale deployment.
- A Service Engine was deleted directly from the NSX Advanced Load Balancer UI or the vSphere Client. Such a deletion does not release the DHCP address from the DHCP pool and leads to a LEASE Allocation Failure.
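The following kubectl commands are a minimal sketch of the endpoint, port, and network policy checks described earlier in this list. The service name, namespace, pod name, and the availability of netstat inside the container image are assumptions; substitute values from your environment.
# Confirm that the gateway or ingress resource has backing endpoints
kubectl get endpoints -n <service_namespace> <service_name>
# Compare the service selector with the pod labels to spot selector mismatches
kubectl describe service <service_name> -n <service_namespace>
kubectl get pods -n <service_namespace> --show-labels
# Confirm that the pod listens on the configured port (requires netstat in the image)
kubectl exec -n <service_namespace> <pod_name> -- netstat -tln
# List network policies that might block traffic to or from the Service Engines
kubectl get networkpolicy -A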
External IPs are incorrectly assigned
This error occurs when two ingresses in different namespaces share the same hostname. Check your configuration and verify that the same hostname is not assigned to ingresses in different namespaces.
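For example, the following command lists the hostnames configured on ingresses across all namespaces so that you can spot duplicates. The custom column layout is illustrative and not required.
kubectl get ingress -A -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTS:.spec.rules[*].host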
Troubleshooting Traffic Failure Issues
After you configure the NSX Advanced Load Balancer, traffic failures occur.
Problem
Traffic failures might occur when the endpoint for the service of type LoadBalancer is in a different namespace.
Cause
In vSphere IaaS control plane environments configured with NSX Advanced Load Balancer, namespaces have a dedicated tier-1 gateway and each tier-1 gateway has a Service Engine segment with the same CIDR. Traffic failures might occur if the NSX Advanced Load Balancer service is in one namespace and the endpoints are in a different namespace. In this case, the NSX Advanced Load Balancer assigns an external IP to the service, but traffic sent to that external IP does not reach the endpoints.
Solution
- To allow north-south traffic, create a distributed firewall rule that allows ingress from the SNAT IP of the NSX Advanced Load Balancer service namespace.
Troubleshooting Issues Caused by NSX Backup and Restore
NSX backup and restore can lead to traffic failure for all the external IPs provided by the NSX Advanced Load Balancer.
Problem
Performing a backup and restore of NSX can lead to traffic failure.
Cause
This failure occurs because the Service Engine NICs do not come back up after a restore, and as a result, the IP pool shows as down.
Solution
Stale Tier-1 Segments after NSX Backup and Restore
NSX backup and restore can restore stale tier-1 segments.
Problem
After an NSX backup and restore procedure, stale tier-1 segments that have Service Engine NICs do not get cleaned up.
Cause
When a namespace is deleted after an NSX backup, the restore operation restores stale tier-1 segments that are associated with the NSX Advanced Load Balancer Controller Service Engine NICs.
Solution
- Log in to the NSX Manager.
- Select Networking > Segments.
- Find the stale segments that are associated with the deleted namespace.
- Delete the stale Service Engine NICs from the Ports/Interfaces section.