The VMware Telco Cloud Service Assurance alarming service is a configurable facility that generates alarm events when metric values cross configured thresholds. For example, an alarm can be raised when the CPU utilization of a particular device exceeds a threshold over a period, or when the available disk space on a device falls below a threshold over a period, so that manual or automated remedial action can be taken.
The alarming service works with the streaming service, described in the Streaming section of this guide, and with the Service Assurance Manager (SAM), described elsewhere. Users define thresholds on raw metrics in the streaming service and can then define an alarm, which associates a name and severity with a threshold crossing. Generated alarms can trigger notifications, such as emails or SMS messages, and can otherwise be managed in SAM.
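The relationship between a threshold and an alarm can be illustrated with a small sketch. It assumes that "over a period" means the condition must hold continuously for the whole period, and the class names, field names, and metric names (`Threshold`, `AlarmDefinition`, `cpu.utilization`, `period_s`) are hypothetical; this is not the product's actual data model or evaluation logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Threshold:
    """Hypothetical threshold on a raw metric, sustained over a period."""
    metric: str
    limit: float
    direction: str   # "above" (e.g. CPU utilization) or "below" (e.g. free disk space)
    period_s: int    # how long the condition must hold continuously, in seconds


@dataclass
class AlarmDefinition:
    """Hypothetical alarm: associates a name and severity with a threshold crossing."""
    name: str
    severity: str
    threshold: Threshold


def crossed(t: Threshold, value: float) -> bool:
    """True if a single sample is on the wrong side of the threshold."""
    return value > t.limit if t.direction == "above" else value < t.limit


def evaluate(alarm: AlarmDefinition, samples: List[Tuple[int, float]]) -> Optional[dict]:
    """Return an alarm event if the threshold stays crossed for at least
    period_s seconds. Samples are (timestamp_s, value) pairs sorted by time."""
    start = None
    for ts, value in samples:
        if crossed(alarm.threshold, value):
            if start is None:
                start = ts  # first sample of a new crossing
            if ts - start >= alarm.threshold.period_s:
                return {"name": alarm.name, "severity": alarm.severity,
                        "metric": alarm.threshold.metric, "since": start, "at": ts}
        else:
            start = None  # condition broken: the period starts over
    return None


cpu_alarm = AlarmDefinition("HighCPU", "Critical",
                            Threshold("cpu.utilization", 90.0, "above", 300))
print(evaluate(cpu_alarm, [(0, 95.0), (120, 97.0), (240, 93.0), (300, 96.0)]))
```

Requiring the condition to hold for the whole period, rather than on a single sample, keeps short spikes from raising alarms; a single sample back inside the threshold resets the period.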
Alarms can be defined and edited in the Alarming tab in the VMware Telco Cloud Service Assurance user interface; refer to the following sections for details.
The following diagram provides a high-level overview of the alarming component placed in the context of the entire system.
The alarming components are the Alarming Graphical User Interface, the Alarming REST API, and the Alarming Flink jobs.
Alarming Graphical User Interface Component
- Access the User Interface to create new alarm definitions.
- Perform the following operations on an existing alarm:
  - Deploy alarm monitoring as a Flink job.
  - Stop alarm monitoring by stopping the Flink job.
  - Check the alarm deployment status.
  - Change the alarm definition.
  - Delete the alarm definition from the system.
- Create User Interface widgets that simplify alarm creation and management tasks.
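The operations listed above form a simple lifecycle for each alarm definition. The following in-memory sketch models that lifecycle; `AlarmRegistry` and `JobStatus` are hypothetical names, `deploy` and `stop` only stand in for submitting and stopping a Flink job, and the rule that a running alarm must be stopped before it is changed or deleted is an assumption, not documented product behavior.

```python
from enum import Enum
from typing import Any, Dict


class JobStatus(Enum):
    NOT_DEPLOYED = "not deployed"
    RUNNING = "running"
    STOPPED = "stopped"


class AlarmRegistry:
    """Hypothetical in-memory model of the alarm operations exposed by the UI."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Dict[str, Any]] = {}  # alarm name -> definition
        self._status: Dict[str, JobStatus] = {}            # alarm name -> job status

    def create(self, name: str, definition: Dict[str, Any]) -> None:
        if name in self._definitions:
            raise ValueError(f"alarm {name!r} already exists")
        self._definitions[name] = dict(definition)
        self._status[name] = JobStatus.NOT_DEPLOYED

    def definition(self, name: str) -> Dict[str, Any]:
        self._require(name)
        return dict(self._definitions[name])  # copy, so callers cannot mutate state

    def deploy(self, name: str) -> None:
        # Stand-in for submitting the alarm's Flink job.
        self._require(name)
        self._status[name] = JobStatus.RUNNING

    def stop(self, name: str) -> None:
        # Stand-in for stopping the alarm's Flink job.
        if self.status(name) is not JobStatus.RUNNING:
            raise RuntimeError(f"alarm {name!r} is not running")
        self._status[name] = JobStatus.STOPPED

    def status(self, name: str) -> JobStatus:
        self._require(name)
        return self._status[name]

    def update(self, name: str, **changes: Any) -> None:
        self._require_not_running(name)
        self._definitions[name].update(changes)

    def delete(self, name: str) -> None:
        self._require_not_running(name)
        del self._definitions[name]
        del self._status[name]

    def _require(self, name: str) -> None:
        if name not in self._definitions:
            raise KeyError(name)

    def _require_not_running(self, name: str) -> None:
        # Assumed rule: stop a running job before changing or deleting its alarm.
        if self.status(name) is JobStatus.RUNNING:
            raise RuntimeError(f"stop alarm {name!r} before changing it")
```

For example, `create`, then `deploy`, `stop`, `update`, and finally `delete` walks one alarm through every operation in the list.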
Alarming REST API Component
- Provide a REST API that the User Interface uses to create and manage alarm definitions.
- Provide a utility tool that converts alarm definitions into Alarm Template Configurations.
- Provide a utility tool that interacts with the Flink platform frontend to submit and manage Flink jobs.
- Provide a utility tool that accesses and interacts with other VMware Telco Cloud Service Assurance services, such as the Metric Catalog service and the Topology service.
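The conversion of an alarm definition into a template configuration can be pictured as a mapping from the fields collected by the User Interface to the variables a rule template expects. The sketch below is purely illustrative: the function name `to_rule_configuration`, the template identifier, the default topic name `metrics`, and every JSON field name are hypothetical and do not describe the product's actual configuration format.

```python
import json
from typing import Any, Dict

SUPPORTED_OPERATORS = (">", "<", ">=", "<=")


def to_rule_configuration(definition: Dict[str, Any], template_id: str) -> str:
    """Hypothetical conversion of a UI alarm definition into a JSON rule
    configuration for an alarm rule template. All field names are illustrative."""
    required = ("name", "severity", "metric", "operator", "threshold", "period_s")
    missing = [key for key in required if key not in definition]
    if missing:
        raise ValueError(f"alarm definition is missing fields: {missing}")
    if definition["operator"] not in SUPPORTED_OPERATORS:
        raise ValueError(f"unsupported operator {definition['operator']!r}")

    config = {
        "template": template_id,
        # Where the job reads metrics from, and which metrics it keeps.
        "source": {
            "topic": definition.get("topic", "metrics"),
            "filter": {"metric": definition["metric"], **definition.get("filter", {})},
        },
        # Variables substituted into the rule template.
        "variables": {
            "operator": definition["operator"],
            "threshold": float(definition["threshold"]),
            "periodSeconds": int(definition["period_s"]),
        },
        # Alarm identity plus user-defined tags used to enrich notifications.
        "alarm": {
            "name": definition["name"],
            "severity": definition["severity"],
            "tags": definition.get("tags", {}),
        },
    }
    return json.dumps(config, indent=2, sort_keys=True)


print(to_rule_configuration(
    {"name": "HighCPU", "severity": "Critical", "metric": "cpu.utilization",
     "operator": ">", "threshold": 90, "period_s": 300,
     "filter": {"deviceType": "Router"}, "tags": {"team": "noc"}},
    template_id="sustained-threshold"))
```

Validating the definition before building the configuration means that malformed input is rejected when the alarm is saved rather than when the job is deployed.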
Alarming Flink Job Component
- Create Alarm Rule Templates that meet the alarm monitoring needs of the alarm use cases supported by the User Interface mockup.
- Define Rule Template Configurations that specify the Rule Template variables collected through the User Interface.
- Load Alarm Rule Templates into the Flink platform.
- Instantiate Alarm Rule Templates for alarm jobs, based on the Rule Configuration files submitted when a job is deployed.
- Listen to the Kafka topics specified in a job's Rule Configuration file, and filter metrics as defined in the same file.
- Evaluate alarm conditions against the filtered metrics.
- Generate alarm active and inactive notifications, and manage alarm status transitions, including alarm timeouts.
- Construct alarm notifications and enrich them with user-defined information specified through alarm tags.
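The per-job steps above (filtering, condition evaluation, active and inactive transitions, timeouts, and tag enrichment) can be sketched in plain Python, independent of Flink and Kafka. This is a minimal illustration under stated assumptions: `RuleConfig`, `AlarmEvaluator`, and the metric record fields are hypothetical, alarm state is kept per device, and "timeout" is assumed to mean that an active alarm is cleared when no new crossing arrives within `timeout_s` seconds of event time.

```python
import operator
from dataclasses import dataclass, field
from typing import Dict, List, Optional

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


@dataclass
class RuleConfig:
    """Hypothetical per-job rule configuration (field names are illustrative)."""
    alarm_name: str
    severity: str
    metric: str
    op: str
    threshold: float
    timeout_s: int
    tags: Dict[str, str] = field(default_factory=dict)


class AlarmEvaluator:
    """Sketch of one alarm job: filter, evaluate, transition, enrich."""

    def __init__(self, rule: RuleConfig) -> None:
        self.rule = rule
        self.active: Dict[str, int] = {}  # device -> timestamp of latest crossing

    def _notify(self, device: str, state: str, ts: int,
                value: Optional[float] = None) -> dict:
        event = {"alarm": self.rule.alarm_name, "severity": self.rule.severity,
                 "device": device, "state": state, "timestamp": ts,
                 "tags": dict(self.rule.tags)}  # enrichment with user-defined tags
        if value is not None:
            event["value"] = value
        return event

    def expire(self, now: int) -> List[dict]:
        """Clear active alarms whose last crossing is older than timeout_s (assumed semantics)."""
        stale = [d for d, last in self.active.items() if now - last > self.rule.timeout_s]
        events = []
        for device in stale:
            del self.active[device]
            events.append(self._notify(device, "INACTIVE", now))
        return events

    def process(self, metric: dict) -> List[dict]:
        """Handle one record {'name', 'device', 'value', 'timestamp'} and
        return any notifications it causes."""
        events = self.expire(metric["timestamp"])
        if metric["name"] != self.rule.metric:  # metric filtering
            return events
        device, ts, value = metric["device"], metric["timestamp"], metric["value"]
        if OPS[self.rule.op](value, self.rule.threshold):
            if device not in self.active:  # transition: inactive -> active
                events.append(self._notify(device, "ACTIVE", ts, value))
            self.active[device] = ts
        elif device in self.active:        # transition: active -> inactive
            del self.active[device]
            events.append(self._notify(device, "INACTIVE", ts, value))
        return events
```

Only state changes produce notifications: repeated crossings on an already active device refresh its timestamp silently, which keeps duplicate alarms from flooding SAM. A real Flink job would keep this state per key and drive timeouts with timers rather than with incoming records.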