Data enrichment streams pull in data from external sources and use it to enrich metrics or events. All metrics, events, and external data objects are assumed to arrive on a Kafka topic.

Here is a simple diagram showing how events and external data interact.

The data enricher works with any type of object, but by default it assumes that the objects are:
  • External Data: Objects with a key and a data field. The key is a unique identifier string, while the data field is a key-value map with all the enrichment data.
  • Events: The enrichment stream can handle Metric, Topology, or Event types.
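The default object shapes above can be illustrated with a minimal Python sketch. It is not the product's implementation: the sample values, the "metricType" field, and the `enrich` helper are hypothetical, and letting enrichment data overwrite an existing tag of the same name is an assumption. Only the "key", "data", "Instance", and "tags" field names come from the defaults described in this section.

```python
from copy import deepcopy


def enrich(event, external_data, event_key="Instance", event_map="tags"):
    """Return a copy of `event` with the matching enrichment data merged in.

    `external_data` maps each data object's "key" to the data object itself.
    Events with no matching data object are returned unchanged.
    """
    enriched = deepcopy(event)
    record = external_data.get(event.get(event_key))
    if record is not None:
        # Assumption: enrichment values overwrite existing tags of the same name.
        enriched.setdefault(event_map, {}).update(record["data"])
    return enriched


# External data objects, indexed by their "key" field (sample values are made up).
external_data = {
    "router-1": {"key": "router-1", "data": {"site": "Boston", "owner": "netops"}},
}

# A metric-like event; "Instance" is the default event key for Metrics.
metric = {"Instance": "router-1", "metricType": "cpu", "tags": {"env": "prod"}}

enriched = enrich(metric, external_data)
print(enriched["tags"])
```

The original event is left untouched, and an event whose key has no matching data object passes through without enrichment.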

The enricher has two modes of operation: Replicated and Partitioned. In Replicated mode, the external data objects are replicated across the cluster so that a copy is present on each node. In Partitioned mode, the external data objects are spread out across the cluster and must be partitioned in the same way as the events, so that each event is enriched on the node that holds its data object. Partitioned mode scales better when the amount of external data is large.
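The difference between the two modes can be sketched as follows. This is an illustration only: the three-node cluster, the `node_for` helper, and the CRC32-based hash are assumptions standing in for whatever key partitioning the underlying platform actually uses.

```python
import zlib

NODES = 3  # hypothetical cluster size


def node_for(partition_key, nodes=NODES):
    """Deterministically map a partition key to a node index (stand-in hash)."""
    return zlib.crc32(partition_key.encode("utf-8")) % nodes


def place_replicated(data_objects, nodes=NODES):
    """Replicated mode: every node holds a copy of every data object."""
    return {n: {d["key"]: d for d in data_objects} for n in range(nodes)}


def place_partitioned(data_objects, nodes=NODES):
    """Partitioned mode: each data object lives on exactly one node."""
    placement = {n: {} for n in range(nodes)}
    for d in data_objects:
        placement[node_for(d["key"], nodes)][d["key"]] = d
    return placement


data_objects = [{"key": f"router-{i}", "data": {"rack": i}} for i in range(6)]
replicated = place_replicated(data_objects)
partitioned = place_partitioned(data_objects)

# An event is routed with the same function and the same key value,
# so it arrives on the node that holds its data object.
event = {"Instance": "router-4"}
target = node_for(event["Instance"])
print("router-4" in partitioned[target])
```

In the replicated placement every node stores all six objects, while in the partitioned placement the six objects are stored once in total. The co-location check only succeeds because events and data objects are hashed on the same key value, which is why the two partition keys must match.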

Data Persistence

An enrichment stream can be configured to persist the external data to disk so that it can be recovered after a restart. It uses Flink checkpoints to achieve this. On each checkpoint, the external data held in memory is written to disk on a Kubernetes persistent volume. The checkpoint interval can be customized based on performance needs.
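The snapshot-and-restore idea behind this can be sketched in a few lines of Python. This is a simplified stand-in, not Flink's checkpoint mechanism: the `EnrichmentState` class, the JSON file format, and the temporary directory (used here in place of a persistent volume) are all assumptions made for illustration.

```python
import json
import os
import tempfile


class EnrichmentState:
    """In-memory external data with a simple snapshot and restore cycle."""

    def __init__(self, checkpoint_path):
        self.checkpoint_path = checkpoint_path
        self.data = {}

    def put(self, key, data):
        self.data[key] = data

    def checkpoint(self):
        # Write to a temporary file first, then rename it into place,
        # so a crash mid-write never leaves a half-written snapshot.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.checkpoint_path)

    def restore(self):
        if os.path.exists(self.checkpoint_path):
            with open(self.checkpoint_path, encoding="utf-8") as f:
                self.data = json.load(f)


workdir = tempfile.mkdtemp()  # stands in for the persistent volume
path = os.path.join(workdir, "enrichment-state.json")

state = EnrichmentState(path)
state.put("router-1", {"site": "Boston"})
state.checkpoint()
state.put("router-2", {"site": "Austin"})  # arrives after the last snapshot

# Simulate a restart: a fresh process only sees what the snapshot contains.
restarted = EnrichmentState(path)
restarted.restore()
print(sorted(restarted.data))
```

The sketch shows why the interval matters: anything that arrived after the last snapshot is not in the snapshot and has to be read again after a restart. A shorter interval reduces that amount at the cost of more frequent disk writes.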

The data enricher has various attributes that can be set to customize its behavior:
  • data-type: How the data enricher stores data. In Replicated mode, the data is copied to every node in the cluster. In Partitioned mode, the data is spread out over the cluster. Default: Replicated.
  • event-key: The key for the event objects. This key is used to match each event with its corresponding data object. Default: "Instance" for Metrics, "InstanceName" for Events, "ID" for Topology.
  • data-key: The key for the data objects. Default: the "key" field.
  • event-map: The location of the map in the event object where the enrichment data is added. Default: the "tags" field.
  • data-map: The location of the map in the data object from which the enrichment data is read. Default: the "data" field.
  • event-partition-key: Only used in Partitioned mode. Determines how the events are partitioned. Must match the data object's partition key to guarantee that an event and its data object are on the same node during enrichment. Default: same as event-key.
  • data-partition-key: Only used in Partitioned mode. Determines how the data objects are partitioned. Default: same as data-key.
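How these defaults fall back on one another can be sketched with a small Python dataclass. This is a hypothetical illustration, not the product's configuration format: the `EnricherConfig` class, the `object_type` helper field, and the "deviceId" value are assumptions. The attribute names and default values follow the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Default event key per object type, as listed for the event-key attribute.
DEFAULT_EVENT_KEYS = {"Metric": "Instance", "Event": "InstanceName", "Topology": "ID"}


@dataclass
class EnricherConfig:
    object_type: str = "Metric"      # hypothetical helper, not a product attribute
    data_type: str = "Replicated"    # data-type
    event_key: Optional[str] = None  # event-key
    data_key: str = "key"            # data-key
    event_map: str = "tags"          # event-map
    data_map: str = "data"           # data-map
    event_partition_key: Optional[str] = None  # event-partition-key
    data_partition_key: Optional[str] = None   # data-partition-key

    def __post_init__(self):
        if self.data_type not in ("Replicated", "Partitioned"):
            raise ValueError(f"unknown data-type: {self.data_type}")
        if self.event_key is None:
            self.event_key = DEFAULT_EVENT_KEYS[self.object_type]
        # The partition keys default to the matching keys.
        if self.event_partition_key is None:
            self.event_partition_key = self.event_key
        if self.data_partition_key is None:
            self.data_partition_key = self.data_key


default_cfg = EnricherConfig()
topology_cfg = EnricherConfig(
    object_type="Topology", data_type="Partitioned", data_key="deviceId"
)
print(default_cfg.event_key, topology_cfg.event_partition_key,
      topology_cfg.data_partition_key)
```

Because the partition keys fall back to event-key and data-key, they only need to be set explicitly when events and data objects should be partitioned on a field other than the one used for matching.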

Changing the Existing Stream

To change an existing stream, use the following sequence:
  1. Undeploy the default stream.
  2. Update the key as properties [dataSource].
  3. Save and deploy the enrichment stream.
  4. Add the enrichment input data to the dataInput topic of edge-kafka.
  5. Create a collector (for example, Viptela) corresponding to the dataSource given in the dataInput topic.
  6. Check the reports.