The regex parser extracts fields from collected log data by applying a regular expression that you define.

VMware Aria Operations for Logs agents use the regex component of the C++ Boost library, which follows Perl syntax. To define a regex parser, specify a regular expression pattern that contains named capture groups. For example:

(?<field_1>\d{4})[-](?<field_2>\d{4})[-](?<field_3>\d{4})[-](?<field_4>\d{4})

The names specified in the groups (for example: field_1, field_2, field_3, and field_4) become names of the corresponding extracted fields. Names have the following requirements:
  • Names specified in the regular expression pattern must be valid field names for VMware Aria Operations for Logs.
  • The names can contain only alphanumeric characters and the underscore (_) character.
  • The names cannot start with a digit.

If invalid names are provided, configuration fails.
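
The character rules above can be checked before a configuration is deployed. The following Python sketch is illustrative only: the helper name invalid_group_names is hypothetical, the check assumes ASCII letters and digits, and it covers only the two character rules, not any other field-name restrictions that the product might apply.

```python
import re

# Assumed rule set, taken from the character requirements listed above:
# letters, digits, and underscores only, and no leading digit.
_FIELD_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Matches the opening of a Boost/Perl-style named group, (?<name>,
# but not a lookbehind such as (?<= or (?<!.
_NAMED_GROUP = re.compile(r"\(\?<([^=!>][^>]*)>")


def invalid_group_names(pattern: str) -> list[str]:
    """Return the named-group names in `pattern` that break the naming rules."""
    names = _NAMED_GROUP.findall(pattern)
    return [name for name in names if not _FIELD_NAME.fullmatch(name)]


if __name__ == "__main__":
    good = r"(?<field_1>\d{4})[-](?<field_2>\d{4})"
    bad = r"(?<1st>\d{4})[-](?<field-2>\d{4})"
    print(invalid_group_names(good))  # []
    print(invalid_group_names(bad))   # ['1st', 'field-2']
```

An empty list means that every group name passes the character rules. It does not guarantee that the agent accepts the configuration.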

Regex Parser Options

The only required option for the regex parser is the format option.

The debug option can be used when additional debugging information is needed.

Configuration

To create a regex parser, use regex as a base_parser and provide the format option.

Regex Configuration Examples

The following example parses log entries such as 1234-5678-9123-4567:

[parser|regex_parser]
base_parser=regex
format=(?<tag1>\d{4})[-](?<tag2>\d{4})[-](?<tag3>\d{4})[-](?<tag4>\d{4})
[filelog|some_info]
directory=D:\Logs
include=*.txt
parser=regex_parser

The results show:

tag1=1234
tag2=5678
tag3=9123
tag4=4567
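
To try a pattern outside the agent, the same extraction can be approximated in Python. This is a rough sketch and not the agent's implementation: Python's re module requires (?P<name>...) instead of the Boost form (?<name>...), and whether the agent searches within a line or matches the whole line is an assumption here (the sketch uses a search). The function name extract_fields is hypothetical.

```python
import re

# Pattern from the configuration above, with Boost's (?<name>...) groups
# rewritten as (?P<name>...) because Python's re module needs that form.
FORMAT = r"(?P<tag1>\d{4})[-](?P<tag2>\d{4})[-](?P<tag3>\d{4})[-](?P<tag4>\d{4})"


def extract_fields(line: str) -> dict[str, str]:
    """Return the named groups matched in `line`, or an empty dict."""
    match = re.search(FORMAT, line)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    # Prints one name=value pair per named group.
    for name, value in extract_fields("1234-5678-9123-4567").items():
        print(f"{name}={value}")
```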

To parse Apache logs with the regex parser, provide the specific regex format for Apache logs:

[parser|regex_parser]
base_parser=regex
format=(?<remote_host>.*) (?<remote_log_name>.*) (?<remote_auth_user>.*) \[(?<log_timestamp>.*)\] "(?<request>.*)" (?<status_code>.*) (?<response_size>.*)

For the following log entry:

127.0.0.1 - admin [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

The results show:

remote_host=127.0.0.1
remote_log_name=-
remote_auth_user=admin
log_timestamp=10/Oct/2000:13:55:36 -0700
request=GET /apache_pb.gif HTTP/1.0
status_code=200
response_size=2326

The following example also parses Apache logs. It uses nested capture groups, so that one extracted field can contain another:

[parser|regex_parser]
base_parser=regex
format=(?<remote_host>.* (?<remote_log_name>.*)) (?<remote_auth_user>.*) \[(?<log_timestamp>.*)\] "(?<request>.* (?<resource>.*) (?<protocol>.*))" (?<status_code>.*) (?<response_size>.*)

For the following log entry:

127.0.0.1 unknown - [17/Nov/2015:15:17:54 +0400] "GET /index.php HTTP/1.1" 200 4868

The results show:

remote_host=127.0.0.1 unknown
remote_log_name=unknown
remote_auth_user=-
log_timestamp=17/Nov/2015:15:17:54 +0400
request=GET /index.php HTTP/1.1
resource=/index.php
protocol=HTTP/1.1
status_code=200
response_size=4868
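
The nested groups in this format can be explored with a short Python sketch. It is an approximation under stated assumptions: the groups are rewritten as (?P<name>...) for Python's re module, the pattern is applied as a search, and the names APACHE_FORMAT, LINE, and parse are hypothetical. Python's backtracking engine may differ in detail from Boost regex, but for this line the field boundaries are fixed by the literal spaces, brackets, and quotes.

```python
import re

# The format above, split across lines for readability. An inner group
# (for example remote_log_name) captures a part of its enclosing group.
APACHE_FORMAT = re.compile(
    r'(?P<remote_host>.* (?P<remote_log_name>.*)) (?P<remote_auth_user>.*) '
    r'\[(?P<log_timestamp>.*)\] '
    r'"(?P<request>.* (?P<resource>.*) (?P<protocol>.*))" '
    r'(?P<status_code>.*) (?P<response_size>.*)'
)

LINE = (
    '127.0.0.1 unknown - [17/Nov/2015:15:17:54 +0400] '
    '"GET /index.php HTTP/1.1" 200 4868'
)


def parse(line: str) -> dict[str, str]:
    """Return all named groups, outer and nested, or an empty dict."""
    match = APACHE_FORMAT.search(line)
    return match.groupdict() if match else {}


if __name__ == "__main__":
    for name, value in parse(LINE).items():
        print(f"{name}={value}")
```

Both the outer field (request) and the fields nested inside it (resource, protocol) are returned, which mirrors the results listed above.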

Performance Considerations

The regex parser consumes more resources than other parsers, such as the CLF parser. If you can parse logs with other parsers, consider using those parsers instead of the regex parser to achieve better performance.

If no other parser fits your logs and you use the regex parser, define the format as precisely as possible. The following example provides better performance because it restricts the fields that hold numeric values to digits:

(?<remote_host>\d+\.\d+\.\d+\.\d+) (?<remote_log_name>.*) (?<remote_auth_user>.*) \[(?<log_timestamp>.*)\] "(?<request>.*)" (?<status_code>\d+) (?<response_size>\d+)
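
One way to compare a generic format with a more specific one is to time both against a sample line. The following Python sketch only illustrates the idea. It uses Python's re module and not the Boost engine that the agent uses, so the timings do not carry over to the agent. The groups are rewritten as (?P<name>...), the dots in the IP address are escaped so that they match literally, and the names GENERIC, SPECIFIC, and fields are hypothetical.

```python
import re
import timeit

# Generic format: every field is matched with ".*".
GENERIC = re.compile(
    r'(?P<remote_host>.*) (?P<remote_log_name>.*) (?P<remote_auth_user>.*) '
    r'\[(?P<log_timestamp>.*)\] "(?P<request>.*)" '
    r'(?P<status_code>.*) (?P<response_size>.*)'
)

# Specific format: numeric fields are restricted to digits.
SPECIFIC = re.compile(
    r'(?P<remote_host>\d+\.\d+\.\d+\.\d+) (?P<remote_log_name>.*) '
    r'(?P<remote_auth_user>.*) '
    r'\[(?P<log_timestamp>.*)\] "(?P<request>.*)" '
    r'(?P<status_code>\d+) (?P<response_size>\d+)'
)

LINE = (
    '127.0.0.1 - admin [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)


def fields(pattern, line):
    """Return the named groups for `line`, or None if it does not match."""
    match = pattern.search(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Both formats extract the same fields from a well-formed line.
    assert fields(GENERIC, LINE) == fields(SPECIFIC, LINE)
    for name, pattern in (("generic", GENERIC), ("specific", SPECIFIC)):
        seconds = timeit.timeit(lambda: pattern.search(LINE), number=10_000)
        print(f"{name}: {seconds:.4f}s for 10,000 matches")
```

A stricter format also stops matching lines that deviate from it, such as an entry that has a host name in place of an IP address, so test the format against representative log samples.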