rex command: Overview, syntax, and usage

Use the SPL2 rex command to either extract fields using regular expression named groups, or replace or substitute characters in a field using sed expressions.

The rex command matches the value of the specified field against the unanchored regular expression and extracts the named groups into fields of the corresponding names.
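
As a minimal sketch of the extraction described above, assume events whose _raw text contains a line such as From: Susan To: Bob. The event text is a hypothetical illustration; the pattern is borrowed from the examples at the end of this topic.

```spl2
// Each named group becomes a field: "from" and "to".
// The sample event text "From: Susan To: Bob" is an assumption for illustration.
... | rex field=_raw "From: (?<from>.*) To: (?<to>.*)"
```

For that sample event, the search would be expected to create the fields from=Susan and to=Bob.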

When mode=sed, the sed expression is applied to the value of the specified field to replace strings or substitute characters. The same sed syntax is also used to mask sensitive data at index time.

Note: If a field is not specified, the regular expression or sed expression is applied to the _raw field. Running the rex command against the _raw field might have a performance impact.

Use the rex command for search-time field extraction or string replacement and character substitution.

Syntax

Arguments in square brackets are optional. The rex keyword and one of the two expression forms are required.

rex

[field=<field>] [max_match=<int>] [offset_field=<string>]

( <regex-expression> | mode=sed <sed-expression> )

Required arguments

You must specify either <regex-expression> or mode=sed <sed-expression> when you use the rex command.

regex-expression

Syntax: <string>

Description: A regular expression in Perl-compatible regular expressions (PCRE) format that defines the information to match and extract from the specified field. Quotation marks are required.

Note: Starting March 5, 2025, all new pipelines will use PCRE2 syntax by default, with no option to use RE2. All existing pipelines can continue using RE2. Starting June 5, 2025, RE2 support ends completely. All pipelines (new and existing) must use PCRE2 syntax. RE2 and PCRE accept different syntax for named capture groups.
See Regular expression syntax for Edge Processor pipelines in Use Edge Processors.
See Regular expression syntax for Ingest Processor pipelines in Use Ingest Processors.

mode

Syntax: mode=sed

Description: Specify mode=sed to indicate that the expression that follows is a sed (UNIX stream editor) expression.

sed-expression

Syntax: <string>

Description: When mode=sed, specifies whether to replace strings (s) or substitute characters (y) in the text that the regular expression matches. No other sed commands are implemented. Quotation marks are required. Sed mode supports the following flags: global (g) and Nth occurrence (N), where N is the number of the match to replace.

Optional arguments

field

Syntax: field=<field>

Description: The field that you want to extract information from.

Default: _raw

max_match

Syntax: max_match=<int>

Description: Controls the number of times the regular expression is matched. If greater than 1, the resulting fields are multivalued fields. You can use 0 for unlimited matches.

Default: 1
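
A hedged sketch of max_match, assuming events that contain several runs of digits. The field name digits and the sample text are hypothetical.

```spl2
// max_match=0 asks for every match, so "digits" becomes a multivalued field.
// "digits" is a hypothetical field name.
... | rex field=_raw max_match=0 "(?<digits>[0-9]+)"
```

For an event such as a1 b22 c333, the digits field would be expected to hold the values 1, 22, and 333. With the default max_match=1, only the first value would be extracted.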

offset_field

Syntax: offset_field=<string>

Description: If provided, a field is created with the name specified by <string>. The value of this field holds the start and end positions of each match, counted in characters from zero within the field that was matched. For example, if the rex expression is (?<tenchars>.{10}), it matches the first ten characters of the field, and the offset_field contents are 0-9.

Default: None
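
The offset_field example in the description can be written out as a search. This is a sketch only; the field name match_offsets is a hypothetical choice.

```spl2
// Extract the first ten characters and record where the match occurred.
// "match_offsets" is a hypothetical name for the offset field.
... | rex field=_raw offset_field=match_offsets "(?<tenchars>.{10})"
```

Per the description above, match_offsets would record the range 0-9 for this match.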

Usage

SPL2 uses Perl-compatible regular expressions (PCRE).

Pipe characters

A pipe character ( | ) is used in regular expressions to specify an OR condition. For example, A or B is expressed as A | B.

Because pipe characters are used to separate commands in SPL2, you must enclose a regular expression that uses the pipe character in double quotation marks. For example, the quoted expression "expression | with pipe" is interpreted by SPL2 as a search for the text "expression" OR "with pipe", not as two commands separated by a pipe.
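
As a hedged sketch of alternation inside a quoted expression, the following search extracts a severity keyword. The field name level and the keywords ERROR and WARN are illustrative assumptions, not taken from this topic.

```spl2
// The pipe inside the quotation marks is regex alternation, not a command separator.
// "level" is a hypothetical field name.
... | rex field=_raw "(?<level>ERROR|WARN)"
```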

Escaping characters with backslashes

The backslash ( \ ) character is used to ignore, or escape, most special characters in regular expressions.

Character classes and string expressions

Regular expressions that include a character class, such as \d or \w, can be specified using one of two methods:

  • Enclose the string expression in quotation marks and escape the backslash character in the character class.
  • Enclose the string expression in forward slashes ( / ). You don't need to escape the backslash character in the character class.
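
The two methods can be sketched as follows. Both searches are meant to be equivalent. The field names clientip and first_octet are hypothetical.

```spl2
// Method 1: quotation marks, with each backslash escaped.
... | rex field=clientip "(?<first_octet>\\d+)\\."
// Method 2: forward slashes, with no extra escaping.
... | rex field=clientip /(?<first_octet>\d+)\./
```

In both forms the pattern matches one or more digits followed by a literal period, so a value such as 192.0.2.10 would be expected to yield first_octet=192.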

Period characters

The period ( . ) character is used in a regular expression to match any character, except a line break character. If you want to match a period character, you must escape the period character by specifying \. in your regular expression.

Asterisk characters

The asterisk ( * ) character is a reserved character in SPL2 and can't be escaped. SPL2 uses the asterisk as a wildcard character.

Double backslash characters

When a search includes a regular expression that contains a double backslash, for example to represent a file path like c:\\temp, the search interprets the first backslash as an escape character. The file path is interpreted as c:\temp. One of the backslashes is removed.

You must escape both backslash characters in a file path by specifying 4 consecutive backslashes for the root portion of the file path. For example: c:\\\\temp. For a longer file path, such as c:\\temp\example, you would specify c:\\\\temp\\example in your regular expression.

Sed expression

When using the rex command in sed mode, you have two options: replace (s) or character substitution (y).

The syntax for using sed to replace (s) text in your data is: s/<regex>/<replacement>/<flags>

  • <regex> is a PCRE regular expression in searches and in pipelines, which can include capturing groups.
  • <replacement> is a string to replace the regex match. Use \n for back references, where n is a single digit.
  • <flags> can be either: g to replace all matches, or a number to replace a specified match.

The syntax for using sed to substitute characters is: y/<string1>/<string2>/

  • This substitutes the characters that match <string1> with the characters in <string2>.
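
A minimal sketch of both sed forms, assuming a hypothetical ccnumber field that holds values such as 1234-5678-9012-3456 and a hypothetical code field. The patterns use [0-9] rather than \d so that no backslash escaping is involved.

```spl2
// Replace (s): mask the first three groups of digits in a hypothetical card number.
... | rex field=ccnumber mode=sed "s/([0-9]{4}-){3}/XXXX-XXXX-XXXX-/g"
// Substitute (y): map the characters a, b, c to 1, 2, 3 respectively.
... | rex field=code mode=sed "y/abc/123/"
```

For the sample value, the first search would be expected to produce XXXX-XXXX-XXXX-3456. The second would turn a value such as cab into 312.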

Differences between SPL and SPL2

The differences between the SPL and SPL2 rex command are described in these sections.

Support for raw string literals

SPL2 supports raw string literals.

Options must be specified before the expressions

The field option must be specified before the <regex-expression> or <sed-expression> argument.

Version Example
SPL ...rex "From: (?<from>.*) To: (?<to>.*)" field=myfield
SPL2 ...rex field=myfield "From: (?<from>.*) To: (?<to>.*)"

The max_match and offset_field options must be specified before the <regex-expression> argument.

Version Example
SPL ...rex "From: (?<from>.*) To: (?<to>.*)" max_match=10 offset_field=newofield
SPL2 ...rex max_match=10 offset_field=newofield "From: (?<from>.*) To: (?<to>.*)"
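
Putting the ordering rules together, a sketch of an SPL2 search that uses all three options places them before the expression. The names myfield and newofield are the placeholders used in the tables above.

```spl2
// All options come first; the quoted regular expression comes last.
... | rex field=myfield max_match=10 offset_field=newofield "From: (?<from>.*) To: (?<to>.*)"
```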