From the NSX Manager UI, you can view NSX Application Platform point-in-time and time-series metrics.

To view the metrics, log in to the NSX Manager UI from a browser and navigate to System > NSX Application Platform. Click the Metrics tab. The following metrics are displayed:

Metric Description
Infra Classifier

A mix of monitor-based metrics and custom gRPC metrics.

The monitor-based metrics are reported once daily and capture any otherwise uncaught generic failure status or Spark operator status. The possible status values are:
  • COMPLETED
  • FAILED
  • SUBMISSION_FAILED
  • FAILING
  • INVALIDATING
  • PENDING_RERUN
  • RUNNING
  • SUBMITTED
  • SUCCEEDING
  • UNKNOWN
  • NOT_INITIATED
The gRPC metrics show the reason for a graceful shutdown or a failure to run a task or service. The possible reasons are:
  • INSUFFICIENT_MEMORY
  • INSUFFICIENT_FLOWS
  • INSUFFICIENT_DAYS
  • FAILED
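
If these values are handled outside the UI, for example in a script that inspects exported metric data, the two value sets above can be modeled as enumerations. The following Python sketch is illustrative only. The class names and the `parse_status` helper are hypothetical and are not part of any NSX Application Platform API. Falling back to UNKNOWN for unrecognized input is also an assumption. Only the member names are taken from the lists above.

```python
from enum import Enum


class MonitorStatus(Enum):
    """Status values listed for the monitor-based Infra Classifier metrics."""
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    SUBMISSION_FAILED = "SUBMISSION_FAILED"
    FAILING = "FAILING"
    INVALIDATING = "INVALIDATING"
    PENDING_RERUN = "PENDING_RERUN"
    RUNNING = "RUNNING"
    SUBMITTED = "SUBMITTED"
    SUCCEEDING = "SUCCEEDING"
    UNKNOWN = "UNKNOWN"
    NOT_INITIATED = "NOT_INITIATED"


class GrpcReason(Enum):
    """Reasons listed for the gRPC Infra Classifier metrics."""
    INSUFFICIENT_MEMORY = "INSUFFICIENT_MEMORY"
    INSUFFICIENT_FLOWS = "INSUFFICIENT_FLOWS"
    INSUFFICIENT_DAYS = "INSUFFICIENT_DAYS"
    FAILED = "FAILED"


def parse_status(raw: str) -> MonitorStatus:
    """Hypothetical helper: map a raw string to a MonitorStatus.

    Assumption: values that are not in the documented list are
    treated as UNKNOWN instead of raising an error.
    """
    try:
        return MonitorStatus(raw.strip().upper())
    except ValueError:
        return MonitorStatus.UNKNOWN
```

For example, `parse_status("running")` returns `MonitorStatus.RUNNING`, and a string that is not in the list returns `MonitorStatus.UNKNOWN`.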

Recommendation Monitoring Job

A Spark job that runs hourly and monitors the READY_TO_PUBLISH jobs. It reports changes in the recommendation to run a job and, if necessary, suggests a rerun of the job.
The possible status values are:
  • COMPLETED
  • FAILED
  • SUBMISSION_FAILED
  • FAILING
  • INVALIDATING
  • NOT_AVAILABLE
  • PENDING_RERUN
  • RUNNING
  • SUBMITTED
  • SUCCEEDING
  • UNKNOWN
  • NOT_INITIATED

Flow Clustering Job

The status of the flow clustering job, which runs every hour. The possible status values are:
  • RUNNING - The clustering job is currently running.
  • SUCCEEDED - The last running job completed successfully.
  • FAILED - The last running job failed.

Flow Ingestion

Indicates whether flow ingestion is enabled or paused, depending on disk usage. The possible status values are:
  • DISABLED - Ingestion is disabled.
  • ENABLED - Ingestion is enabled.

Suspicious Traffic Detectors

After every run, each Security Intelligence detector has one of the following statuses. The status is displayed only for enabled detectors, not for detectors that are in the NOT_STARTED state.
  • NOT_STARTED - The NTA detector is not enabled and has never been run on the onboarded site.
  • SUCCESS - The detector successfully completed execution.
  • NOT_ENOUGH_BASELINE - The baseline detectors (VERTICAL_PORT_SCAN, LLMNR_NBTNS, REMOTE_SERVICES and UNCOMMONPORT) finished execution successfully but could not report events because the baseline size was insufficient for event detection.
  • FAILURE - The detector failed to execute.
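
The display rule above (a status is shown only for detectors that are not in the NOT_STARTED state) can be sketched as a simple filter. This Python example is a minimal illustration, not the product's implementation. The `DetectorStatus` class, the `displayed_statuses` function, and the sample input are hypothetical. The status names and detector names are taken from the list above.

```python
from enum import Enum


class DetectorStatus(Enum):
    """Detector statuses listed for the Suspicious Traffic Detectors metric."""
    NOT_STARTED = "NOT_STARTED"
    SUCCESS = "SUCCESS"
    NOT_ENOUGH_BASELINE = "NOT_ENOUGH_BASELINE"
    FAILURE = "FAILURE"


def displayed_statuses(
    statuses: dict[str, DetectorStatus],
) -> dict[str, DetectorStatus]:
    """Keep only the detectors whose status would be displayed.

    Detectors in the NOT_STARTED state are dropped, mirroring the
    rule that a status is shown only for enabled detectors.
    """
    return {
        name: status
        for name, status in statuses.items()
        if status is not DetectorStatus.NOT_STARTED
    }


# Hypothetical sample input: detector name -> status after the latest run.
sample = {
    "VERTICAL_PORT_SCAN": DetectorStatus.SUCCESS,
    "LLMNR_NBTNS": DetectorStatus.NOT_STARTED,
    "REMOTE_SERVICES": DetectorStatus.NOT_ENOUGH_BASELINE,
}
shown = displayed_statuses(sample)
print(sorted(shown))  # ['REMOTE_SERVICES', 'VERTICAL_PORT_SCAN']
```

In this sample, LLMNR_NBTNS is omitted because it has never been run, while REMOTE_SERVICES is still shown even though it could not report events.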

Kafka Message Lag

The average message delay for each Kafka topic.

Druid Task Failures

The number of failed Druid tasks. A task can be a reindex task, a Kafka ingestion task, or a compaction task on the flow table or the configuration table.

Intelligence Configuration Updates

The number of new Security Intelligence configurations per config-type, identified hourly.

Average CPU Usage (%) on Node

The average CPU usage across all NSX Application Platform Kubernetes nodes.

Druid Average Retention Days

The number of days that Druid retains data for the correlated_flow_viz table. The default is 30 days.

Total Flows and Unique Flows

The total number of flows and the number of unique flows in the entire Druid database at the time of the query. One month of data is available. This job runs once a day.

Kafka Average Message Input Rate

The average incoming message rate across all Kafka topics.