How does the data processing pipeline behave in boundary conditions, such as when communication between the platform and the collector breaks?
- What is the default retention period?
30 days. It can be increased from the UI with an Enterprise license. Note: when increasing the retention period, make sure to follow the disk guidelines.
- How is data handled on the collector?
All data on the collector, including flow data, is converted to SDMs (Self Describing Messages) before it is sent to the platform. This covers all config, inventory, and metric data from every data source. If the platform is not reachable, or the SDM upload to the Kafka queue fails, the SDMs are written to disk on the collector VM (under /var/BLOB_STORE).
- When will data start to be purged on the collector?
For non-flow data: 10 GB of disk space is allocated to store SDMs (BLOB_STORE). When this store fills up, the collector deletes older SDMs to make room for new ones. How quickly the limit is reached depends on the volume of data gathered from all data sources.
For flow data: 15 GB of disk space is allocated to store raw flows (under /var/flows/vds/nfcapd). As soon as this space is consumed, the flow processor starts deleting older flow files. At an incoming raw flow rate of ~2M/min, it takes ~10 hrs before rotation starts to occur.
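For a rough feel of how the flow figure changes with load, the quoted reference point (~2M flows/min giving ~10 hrs) can be scaled. This is only a back-of-the-envelope sketch: it assumes the time to fill the flow store is inversely proportional to the incoming flow rate, which the product does not guarantee, and the function name is hypothetical.

```python
def hours_until_rotation(flows_per_min, ref_flows_per_min=2_000_000, ref_hours=10.0):
    """Estimate hours before flow-file rotation starts.

    Assumption: fill time scales inversely with the flow rate, anchored at the
    quoted reference point (~2M flows/min -> ~10 hrs). Treat the result as a
    rough estimate, not a guarantee.
    """
    if flows_per_min <= 0:
        raise ValueError("flows_per_min must be positive")
    return ref_hours * ref_flows_per_min / flows_per_min


print(hours_until_rotation(4_000_000))  # double the reference rate
print(hours_until_rotation(1_000_000))  # half the reference rate
```

Under this assumption, doubling the flow rate halves the time before older flow files start to be deleted.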
- What is the purge logic?
Oldest SDMs get deleted first.
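The oldest-first purge can be pictured with a small in-memory model. This is an illustrative sketch only, not the collector's actual code: the class and method names are hypothetical, and a deque stands in for the on-disk store.

```python
from collections import deque


class BoundedSdmStore:
    """Toy model of a size-capped store that purges the oldest entries first."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self._queue = deque()  # (name, size) pairs, oldest on the left

    def add(self, name, size_bytes):
        """Store a new SDM, evicting the oldest ones until it fits.

        Returns the names of the SDMs that were purged.
        """
        if size_bytes > self.capacity_bytes:
            raise ValueError("SDM is larger than the whole store")
        purged = []
        while self.used_bytes + size_bytes > self.capacity_bytes:
            old_name, old_size = self._queue.popleft()
            self.used_bytes -= old_size
            purged.append(old_name)
        self._queue.append((name, size_bytes))
        self.used_bytes += size_bytes
        return purged

    def names(self):
        """Names of the stored SDMs, oldest first."""
        return [name for name, _ in self._queue]
```

New SDMs are always accepted; the cost of a full store is paid by the oldest entries, which is why a long disconnect loses the earliest data first.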
- When will new data stop being processed on the collector?
Never, as long as services are running properly.
- Assuming a disconnect between the platform and the collector, and no purge condition is met, will all data be reconciled on the platform on reconnect?
All data stored on disk will be sent to the platform. It should be reconciled completely unless data loss conditions exist on the platform (more info below).
- Under what conditions can data loss occur on the platform?
The platform starts to drop SDMs that have been on the Kafka queue for more than 6 hrs (18 hrs for a 3-node cluster). Data loss is also possible if the queue is saturated, which can happen when lag has built up in the system and the incoming data rate is high.
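The age-based drop rule can be expressed as a simple check. This is a minimal sketch with hypothetical names; only the 6 hr and 18 hr thresholds come from the answer above, and how the platform actually tracks SDM age is not described here.

```python
from datetime import datetime, timedelta

# Thresholds quoted above; the constant names are illustrative.
RETENTION_DEFAULT = timedelta(hours=6)
RETENTION_THREE_NODE = timedelta(hours=18)


def is_past_retention(enqueued_at, now, three_node_cluster=False):
    """Return True if an SDM has waited on the queue longer than the drop threshold."""
    limit = RETENTION_THREE_NODE if three_node_cluster else RETENTION_DEFAULT
    return (now - enqueued_at) > limit


enqueued = datetime(2020, 1, 1, 0, 0)
seven_hours_later = enqueued + timedelta(hours=7)
print(is_past_retention(enqueued, seven_hours_later))                           # default threshold
print(is_past_retention(enqueued, seven_hours_later, three_node_cluster=True))  # 3-node threshold
```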
- Are the latest SDMs published first, or the earliest?
The oldest SDMs are sent first. There is one known issue until v3.9 that results in some data loss. Contact GSS for more information.
- Is data stored on disk on the collector and then pushed to the platform even when there is no communication problem?
If there is no communication issue, SDMs are not stored on disk; they are sent to the platform directly from memory. An SDM is written to disk only when the collector detects a problem sending it.
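The send-from-memory path with a disk fallback, followed by an oldest-first replay, might look like the sketch below. Everything here is hypothetical: `send` is a caller-supplied function that is assumed to raise `ConnectionError` on failure, and a deque stands in for the on-disk store.

```python
from collections import deque


class SdmForwarder:
    """Toy model: send SDMs from memory, spool them only when sending fails."""

    def __init__(self, send):
        self._send = send       # hypothetical callable; raises ConnectionError on failure
        self.spooled = deque()  # stands in for the on-disk store, oldest on the left

    def publish(self, sdm):
        """Try to send directly; spool on failure. Returns True if sent."""
        try:
            self._send(sdm)
            return True
        except ConnectionError:
            self.spooled.append(sdm)
            return False

    def replay(self):
        """Resend spooled SDMs oldest first, stopping at the first failure.

        Returns the number of SDMs successfully resent.
        """
        sent = 0
        while self.spooled:
            try:
                self._send(self.spooled[0])
            except ConnectionError:
                break
            self.spooled.popleft()
            sent += 1
        return sent
```

An SDM is removed from the spool only after a successful send, so a connection that drops again during replay does not lose the entry being sent.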
- In the event of an issue, how does the collector know which flow file was processed last?
The flow processor maintains a bookmark in the DB that records the last processed nfcapd file.
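A bookmark of this kind makes resuming straightforward: anything that sorts after the bookmark is still pending. The sketch below is illustrative only. It assumes the file names carry a sortable timestamp suffix (the example names are made up), and the function name is hypothetical.

```python
def files_to_process(file_names, bookmark):
    """Return the files that still need processing, oldest first.

    Assumption: names sort chronologically, so every name greater than the
    bookmark is unprocessed. A bookmark of None means nothing was processed yet.
    """
    ordered = sorted(file_names)
    if bookmark is None:
        return ordered
    return [name for name in ordered if name > bookmark]


# Hypothetical file names for illustration.
files = ["nfcapd.202001011210", "nfcapd.202001011200", "nfcapd.202001011205"]
print(files_to_process(files, "nfcapd.202001011200"))
```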
- What is the maximum SDM size that can be processed without issues? How can a user learn that this limit was exceeded?
There is a 15 MB limit on SDM size. Starting with v3.9, an event is raised whenever the platform drops a large SDM.