VMware Cloud on AWS supports alarms for a subset of NSX events.
The following tables describe events that trigger NSX alarms in VMware Cloud on AWS, including alarm messages and recommended actions to resolve them. Any event with a severity greater than LOW triggers an alarm. For more information, see Working with Events and Alarms in the NSX Administration Guide. Some of the events, alarms, and related features supported by NSX are not available in VMware Cloud on AWS.
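The rule "any event with a severity greater than LOW triggers an alarm" is an ordered comparison. The minimal Python sketch below expresses it with an ordered enum. Critical and Medium are the only severities used in the tables on this page; the Low and High members and their numeric ranks are assumptions for illustration, and `Severity` and `triggers_alarm` are hypothetical names, not part of any NSX API.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Ordered lowest to highest so that comparisons follow severity rank.
    # CRITICAL and MEDIUM appear in the tables on this page; LOW and HIGH
    # are assumed to sit at the positions shown here.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def triggers_alarm(severity: Severity) -> bool:
    """An event raises an alarm when its severity is greater than LOW."""
    return severity > Severity.LOW


# Print the alarm decision for every severity level.
for level in Severity:
    print(f"{level.name}: alarm={triggers_alarm(level)}")
```

With this ordering, only LOW events are excluded; MEDIUM, HIGH, and CRITICAL events all raise alarms.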
Distributed Firewall Events
Event Name | Severity | Node Type | Alert Message | Recommended Action |
---|---|---|---|---|
DFW CPU Usage Very High | Critical | esx | DFW CPU usage is very high. When event detected: "The DFW CPU usage on Transport node {entity_id} has reached {system_resource_usage}% which is at or above the very high threshold value of {system_usage_threshold}%." When event resolved: "The DFW CPU usage on Transport node {entity_id} has reached {system_resource_usage}% which is below the very high threshold value of {system_usage_threshold}%." | Consider re-balancing the VM workloads on this host to other hosts. Review the security design for optimization. For example, use the apply-to configuration if the rules are not applicable to the entire datacenter. |
DFW VMotion Failure | Critical | esx | DFW vMotion failed, port disconnected. When event detected: "The DFW vMotion for DFW filter {entity_id} on destination host {transport_node_name} has failed and the port for the entity has been disconnected." When event resolved: "The DFW configuration for DFW filter {entity_id} on the destination host {transport_node_name} has succeeded and error caused by DFW vMotion failure cleared." | Check the VMs on the host in NSX Manager and manually repush the DFW configuration through the NSX Manager UI. The DFW policy to repush can be traced by the DFW filter {entity_id}. Also consider finding the VM to which the DFW filter is attached and restarting it. |
DFW Session Count High | Critical | esx | DFW session count is high. When event detected: "The DFW session count is high on Transport node {entity_id}, it has reached {system_resource_usage}% which is at or above the threshold value of {system_usage_threshold}%." When event resolved: "The DFW session count on Transport node {entity_id} has reached {system_resource_usage}% which is below the threshold value of {system_usage_threshold}%." | Review the network traffic load level of the workloads on the host. Consider re-balancing the workloads on this host to other hosts. |
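The DFW CPU and session-count events both compare a usage percentage with a threshold: a value at or above the threshold produces the "detected" message, and a value below it produces the "resolved" message. The Python sketch below shows that comparison and fills the `{entity_id}`, `{system_resource_usage}`, and `{system_usage_threshold}` placeholders from the DFW CPU row with `str.format`. The `evaluate` helper and the sample numbers are hypothetical; the sketch does not model how NSX samples usage, what the actual threshold values are, or whether an event was previously active.

```python
from typing import Tuple

# Message templates copied from the "DFW CPU Usage Very High" row.
DETECTED = ("The DFW CPU usage on Transport node {entity_id} has reached "
            "{system_resource_usage}% which is at or above the very high "
            "threshold value of {system_usage_threshold}%.")
RESOLVED = ("The DFW CPU usage on Transport node {entity_id} has reached "
            "{system_resource_usage}% which is below the very high "
            "threshold value of {system_usage_threshold}%.")


def evaluate(entity_id: str, usage: int, threshold: int) -> Tuple[str, str]:
    """Return ("detected", msg) when usage >= threshold, else ("resolved", msg).

    Simplification: this does not track whether the event was already
    active, so any sample below the threshold maps to "resolved".
    """
    fields = {
        "entity_id": entity_id,
        "system_resource_usage": usage,
        "system_usage_threshold": threshold,
    }
    if usage >= threshold:
        return "detected", DETECTED.format(**fields)
    return "resolved", RESOLVED.format(**fields)


# Illustrative values only; these are not NSX default thresholds.
state, message = evaluate("transport-node-1", 95, 90)
print(state)
print(message)
```

Because the comparison is "at or above", a sample exactly equal to the threshold already counts as detected.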
IDS IPS Events
Event Name | Severity | Node Type | Alert Message | Recommended Action |
---|---|---|---|---|
IDPS Signature Bundle Download Failure | Medium | manager | Unable to download IDPS signature bundle from NTICS. When event detected: "Unable to download IDPS signature bundle from NTICS." When event resolved: "IDPS signature bundle download from NTICS was successful." | Check whether there is internet connectivity from NSX Manager to NTICS. |
Distributed IDS IPS Events
Event Name | Severity | Node Type | Alert Message | Recommended Action |
---|---|---|---|---|
NSX IDPS Engine Memory Usage High | Medium | esx | NSX-IDPS engine memory usage reaches 75% or above. When event detected: "NSX-IDPS engine memory usage has reached {system_resource_usage}%, which is at or above the high threshold value of 75%." When event resolved: "NSX-IDPS engine memory usage has reached {system_resource_usage}%, which is below the high threshold value of 75%." | Consider re-balancing the VM workloads on this host to other hosts. |
IPAM Events
Event Name | Severity | Node Type | Alert Message | Recommended Action |
---|---|---|---|---|
IP Block Usage Very High | Medium | manager | IP block usage is very high. When event detected: "IP block usage of {intent_path} is very high. IP block nearing its total capacity, creation of subnet using IP block might fail." When event resolved: "IP block usage of {intent_path} is below threshold level." | Review IP block usage. Use a new IP block for resource creation, or delete unused IP subnets from the IP block. To check which subnets are being used for the IP block, navigate in the NSX UI to Networking \| IP Address pools \| IP Address pools. Select the IP pools where the IP block is being used and check the Subnets and Allocated IPs columns. If no allocation has been used for an IP pool and it is not going to be used in the future, delete the subnet or IP pool. To check through the API whether the IP block is being used by an IP pool and whether any IPs have been allocated, use the following calls (replace ip-pool with the IP pool ID). To get the configured subnets of an IP pool, invoke the NSX API GET /policy/api/v1/infra/ip-pools/ip-pool/ip-subnets. To get IP allocations, invoke the NSX API GET /policy/api/v1/infra/ip-pools/ip-pool/ip-allocations. Note: Delete an IP pool or subnet only if it does not have any allocated IPs and it is not going to be used in the future. |
IP Pool Usage Very High | Medium | manager | IP pool usage is very high. When event detected: "IP pool usage of {intent_path} is very high. IP pool nearing its total capacity. Creation of entity/service depends on IP being allocated from IP pool might fail." When event resolved: "IP pool usage of {intent_path} is normal now." | Review IP pool usage. Release unused IP allocations from the IP pool, or create a new IP pool and use it. In the NSX UI, navigate to Networking \| IP Address pools \| IP Address pools. Select IP pools and check the Allocated IPs column, which shows the IPs allocated from the IP pool. If any of these IPs are not being used, release them. To release unused IP allocations, invoke the NSX API DELETE /policy/api/v1/infra/ip-pools/ip-pool/ip-allocations/ip-allocation (replace ip-pool and ip-allocation with the IP pool ID and the allocation ID). |
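The IPAM recommended actions name three Policy API calls: list an IP pool's subnets, list its allocations, and delete one allocation. The Python sketch below builds those paths, checks a list response for remaining allocations, and constructs (without sending) the DELETE request. Several parts are assumptions: the helper names, the pool and allocation IDs, and the `base_url` are hypothetical; the allocation list is assumed to return its items under a `results` key; and authentication is left to the caller because the headers required depend on how you reach NSX Manager.

```python
import json
import urllib.request
from typing import Dict, Optional
from urllib.parse import quote

# Policy API base path used in the IPAM recommended actions.
BASE = "/policy/api/v1/infra/ip-pools"


def subnets_path(pool_id: str) -> str:
    """GET path for the subnets configured on an IP pool."""
    return f"{BASE}/{quote(pool_id, safe='')}/ip-subnets"


def allocations_path(pool_id: str) -> str:
    """GET path for the IP allocations made from an IP pool."""
    return f"{BASE}/{quote(pool_id, safe='')}/ip-allocations"


def has_no_allocations(allocations_json: str) -> bool:
    """True when the list response reports no allocations.

    Assumes items are listed under a "results" key. This covers only the
    "no allocated IPs" condition; whether the pool will be needed in the
    future is a decision the code cannot make.
    """
    body = json.loads(allocations_json)
    return len(body.get("results", [])) == 0


def release_request(base_url: str, pool_id: str, allocation_id: str,
                    headers: Optional[Dict[str, str]] = None
                    ) -> urllib.request.Request:
    """Build, but do not send, the DELETE request for one allocation.

    Any authentication headers must be supplied by the caller.
    """
    url = (base_url.rstrip("/") + allocations_path(pool_id)
           + "/" + quote(allocation_id, safe=""))
    return urllib.request.Request(url, method="DELETE", headers=headers or {})
```

Passing the returned request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes it easy to review the exact URL before releasing anything.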
Groups Events
Event Name | Severity | Node Type | Alert Message | Recommended Action |
---|---|---|---|---|
Active Directory Groups Modified | Medium | manager | Active Directory Groups are modified on AD server. When event detected: "Group When event resolved: "Group | In the NSX UI, navigate to the \| tab to update the group definition of the applicable group with the new base distinguished name. Make sure the group has valid identity group members. |
Identity Firewall Events
Event Name | Severity | Node Type | Alert Message | Recommended Action |
---|---|---|---|---|
Connectivity to LDAP Service Lost | Critical | manager | Connectivity to LDAP server is lost. When event detected: "The connectivity to LDAP server {ldap_server} is lost." When event resolved: "The connectivity to LDAP server {ldap_server} is restored." | Check |