The regex parser extracts fields from collected log data by using regular expressions.
VMware Aria Operations for Logs agents use the C++ Boost library regex, which uses Perl syntax. You define the regex parser by specifying a regular expression pattern that contains named capture groups. For example:
(?<field_1>\d{4})[-](?<field_2>\d{4})[-](?<field_3>\d{4})[-](?<field_4>\d{4})
- Names specified in the regular expression pattern must be valid field names for VMware Aria Operations for Logs.
- The names can contain only alphanumeric characters and the underscore (_) character.
- The name cannot start with a digit.
If invalid names are provided, configuration fails.
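The naming rules above can be checked before a configuration is deployed. The following Python sketch is illustrative only and is not part of the agent: the helper name invalid_field_names is hypothetical, and restricting names to ASCII letters and digits is an assumption about what "alphanumeric" means here.

```python
import re

# Assumed rule, taken from the list above: letters, digits, and underscores only,
# and the first character must not be a digit. ASCII-only is an assumption.
FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Finds Perl-style named groups such as (?<field_1>...) in a format string.
# The first character class skips lookbehinds, which start with (?<= or (?<!.
NAMED_GROUP = re.compile(r"\(\?<([^>=!][^>]*)>")


def invalid_field_names(format_pattern):
    """Return the capture-group names in format_pattern that break the naming rules."""
    names = NAMED_GROUP.findall(format_pattern)
    return [name for name in names if not FIELD_NAME.fullmatch(name)]


print(invalid_field_names(r"(?<1st>\d+) (?<ok_name>\w+) (?<bad-name>\w+)"))
```

An empty list means every group name in the pattern satisfies the rules as interpreted here.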
Regex Parser Options
The only required option for the regex parser is the format option.
The debug option can be used when additional debugging information is needed.
Configuration
To create a regex parser, use regex as a base_parser and provide the format option.
Regex Configuration Examples
The following example can be used to analyze 1234-5678-9123-4567:
[parser|regex_parser]
base_parser=regex
format=(?<tag1>\d{4})[-](?<tag2>\d{4})[-](?<tag3>\d{4})[-](?<tag4>\d{4})

[filelog|some_info]
directory=D:\Logs
include=*.txt
parser=regex_parser
The results show:
tag1=1234
tag2=5678
tag3=9123
tag4=4567
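To try a format value outside the agent, the pattern can be run through Python's re module. This is a rough approximation, not the agent's behavior: the agent uses Boost regex, and Python requires named groups in the (?P<name>...) form, so the hypothetical helper to_python_pattern rewrites them first.

```python
import re


def to_python_pattern(boost_pattern):
    """Rewrite Perl-style (?<name>...) groups as Python's (?P<name>...) form."""
    # The negative lookahead leaves lookbehinds such as (?<= and (?<! untouched.
    return re.sub(r"\(\?<(?![=!])", "(?P<", boost_pattern)


format_option = r"(?<tag1>\d{4})[-](?<tag2>\d{4})[-](?<tag3>\d{4})[-](?<tag4>\d{4})"
match = re.search(to_python_pattern(format_option), "1234-5678-9123-4567")

# groupdict() maps each named capture group to the text it matched.
fields = match.groupdict() if match else {}
print(fields)
```

The resulting dictionary holds the same four tag values that the parser reports for this input.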
To parse Apache logs with the regex parser, provide the specific regex format for Apache logs:
[parser|regex_parser]
base_parser=regex
format=(?<remote_host>.*) (?<remote_log_name>.*) (?<remote_auth_user>.*) \[(?<log_timestamp>.*)\] "(?<request>.*)" (?<status_code>.*) (?<response_size>.*)
For the log entry:
127.0.0.1 - admin [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
the results show:
remote_host=127.0.0.1
remote_log_name=-
remote_auth_user=admin
log_timestamp=10/Oct/2000:13:55:36 -0700
request=GET /apache_pb.gif HTTP/1.0
status_code=200
response_size=2326
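The field split for this Apache entry can be reproduced with a short Python sketch. It assumes that Python's re module handles this pattern the same way as Boost regex, which holds for the greedy .* groups used here but is not guaranteed for every pattern. The named groups use Python's (?P<name>...) spelling.

```python
import re

# Same structure as the format option above, with Python-style named groups.
APACHE_FORMAT = re.compile(
    r'(?P<remote_host>.*) (?P<remote_log_name>.*) (?P<remote_auth_user>.*) '
    r'\[(?P<log_timestamp>.*)\] "(?P<request>.*)" '
    r'(?P<status_code>.*) (?P<response_size>.*)'
)

line = '127.0.0.1 - admin [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
match = APACHE_FORMAT.match(line)
apache_fields = match.groupdict() if match else {}

# Print one field per line, in the order the groups appear in the pattern.
for name, value in apache_fields.items():
    print(f"{name}={value}")
```

Because every group is a greedy .*, the literal spaces, brackets, and quotation marks in the pattern are what determine where each field ends.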
Capture groups can also be nested, so that one field contains another:
[parser|regex_parser]
base_parser=regex
format=(?<remote_host>.* (?<remote_log_name>.*)) (?<remote_auth_user>.*) \[(?<log_timestamp>.*)\] "(?<request>.* (?<resource>.*) (?<protocol>.*))" (?<status_code>.*) (?<response_size>.*)
For the log entry:
127.0.0.1 unknown - [17/Nov/2015:15:17:54 +0400] "GET /index.php HTTP/1.1" 200 4868
the results show:
remote_host=127.0.0.1 unknown
remote_log_name=unknown
remote_auth_user=-
log_timestamp=17/Nov/2015:15:17:54 +0400
request=GET /index.php HTTP/1.1
resource=/index.php
protocol=HTTP/1.1
status_code=200
response_size=4868
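The nested groups in the last configuration mean that an outer field includes the text of its inner fields. The Python sketch below illustrates this on the same log entry. It is an approximation using Python's re module and (?P<name>...) syntax, not the agent's Boost regex engine.

```python
import re

# remote_host wraps remote_log_name, and request wraps resource and protocol.
NESTED_FORMAT = re.compile(
    r'(?P<remote_host>.* (?P<remote_log_name>.*)) (?P<remote_auth_user>.*) '
    r'\[(?P<log_timestamp>.*)\] '
    r'"(?P<request>.* (?P<resource>.*) (?P<protocol>.*))" '
    r'(?P<status_code>.*) (?P<response_size>.*)'
)

line = '127.0.0.1 unknown - [17/Nov/2015:15:17:54 +0400] "GET /index.php HTTP/1.1" 200 4868'
match = NESTED_FORMAT.match(line)
nested_fields = match.groupdict() if match else {}

# The outer group keeps the full span; the inner groups are sub-spans of it.
print(nested_fields["request"])   # outer field
print(nested_fields["resource"])  # inner field
print(nested_fields["protocol"])  # inner field
```

An outer group such as request therefore repeats the text of resource and protocol, which is why remote_host in the results includes the log name.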
Performance Considerations
The regex parser consumes more resources than other parsers, such as the CLF parser. If you can parse logs with other parsers, consider using those parsers instead of the regex parser to achieve better performance.
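If the regex parser is required, a more specific format can reduce the matching work. For example, the following format restricts the host, status code, and response size to digits instead of matching them with .*:
(?<remote_host>\d+\.\d+\.\d+\.\d+) (?<remote_log_name>.*) (?<remote_auth_user>.*) \[(?<log_timestamp>.*)\] "(?<request>.*)" (?<status_code>\d+) (?<response_size>\d+)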
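A side effect of a more specific format is that lines that do not fit the expected shape are rejected rather than split into misleading fields. The Python sketch below shows this under two assumptions: the dots in the host address are escaped so that they match only a literal period, and Python's re module stands in for Boost regex. The helper parse_line and the sample lines are hypothetical, and no timing claim is made.

```python
import re

# Host must be a dotted numeric address; status and size must be digits.
SPECIFIC_FORMAT = re.compile(
    r'(?P<remote_host>\d+\.\d+\.\d+\.\d+) (?P<remote_log_name>.*) (?P<remote_auth_user>.*) '
    r'\[(?P<log_timestamp>.*)\] "(?P<request>.*)" '
    r'(?P<status_code>\d+) (?P<response_size>\d+)'
)


def parse_line(line):
    """Return the extracted fields, or None when the line does not match."""
    match = SPECIFIC_FORMAT.match(line)
    return match.groupdict() if match else None


good = '127.0.0.1 - admin [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
bad = 'localhost - admin [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" OK 2326'

print(parse_line(good))
print(parse_line(bad))
```

The second line yields no fields because neither the host name nor the status value consists of digits.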