union command: Overview, syntax, and usage

The SPL2 union command merges the results from two or more datasets into one larger dataset. One of the datasets can be the incoming search results that are then piped into the union command and merged with a second dataset.

Use these links to quickly navigate to the main sections in this topic:

How the SPL2 union command works

You can use the SPL2 union command to merge datasets that have identical or different field names. The following sections illustrate how to use the union command in both of these situations.

Datasets with identical field names

Consider the following two datasets:

products-amer

productID product_name supplierID supplier_name categoryID
BS-AG-G09 Benign Space Debris A51G-USA Area 51 Games ARCADE
SF-BVS-G01 Grand Theft Scooter IP-PAN Isthmus Pastimes ARCADE

products-apac

productID product_name supplierID supplier_name categoryID
DC-SG-G02 Dream Crusher PMG-KOR Play More Games STRATEGY
PZ-SG-G05 Puppies vs. Zombies TF-JAP Tiger Fun STRATEGY
SC-MG-G10 SIM Cubicle PMG-KOR Play More Games SIMULATION

You can use the SPL2 union command to bring these dataset together. For example:

The results look something like this:

Datasets with different field names

Consider the following events from two datasets:

products-apac

productID product_name supplierID supplier_name categoryID
DC-SG-G02 Dream Crusher PMG-KOR Play More Games STRATEGY

suppliers_apac

supplierId supplier_name contact_name email address
PMG-KOR Play More Games Vanya Patel vanya@sample.com 234 Sejong-daero ... Seoul South Korea

Notice that both events have a field called supplier_name and fields for the supplier ID, but with different capitalization: supplierID and supplierId.

You can use the union command to bring these dataset together. For example:

When the datasets are unioned, the fields from both datasets added to the output. The NULL value is added to fields that were not in the original event.

The results look something like this:

Both events have a field called supplier_name that appears in the output with the same value. However, because the supplier ID fields have different capitalization, both fields appear in the output, even though the fields have the same value.

Syntax

The required syntax is in bold.

union

<dataset> ["," <dataset>...]

Required arguments

dataset

Syntax: [ <dataset-kind>"."]<dataset-name>

Description: The dataset that you want to perform the union on. Because dataset names must be unique, you only need to specify the dataset kind for built-in datasets that include the kind. The dataset can be the incoming set of search results, a dataset that has been defined in the Metadata Catalog, or a literal dataset that you type in. To perform a union with the incoming search results, you only need to specify one dataset. See the Usage section.

Usage

The union command is a generating command. Generating commands fetch information from the datasets, without any transformations.

You can use the union command at the beginning of your search to combine two datasets or later in your search where you can combine the incoming search results with a dataset.

Specifying a dataset

You can declare, or specify, a dataset several different ways. Here are some examples:

Type of declaration Description Example
Dataset references Specifying an existing dataset. The datasets in this example are indexes.
Transient Specifying a SPL subsearch as the dataset. Subsearches are enclosed in square brackets.
Fluent The search results that are piped into the union command are referred to as a fluent dataset. This type of declaration has a union command that contains one or more subsearches.
Literal Using literal values that you type in as subsearches. Each subsearch is a dataset. This example shows three separate literal dataset declarations.
Mixed Specifying a mixture of the types of declarations.

Semantics

If all of the datasets that are unioned together are streamable time-series, the union command attempts to interleave the data from all datasets into one globally sorted list of events or metrics. The list is based on the _time field in descending order. Otherwise, the union command returns all the rows from the first dataset, followed by all the rows from the second dataset, and so on.

Interleaving results

When two datasets are retrieved from disk in time descending order, which is the default sort order, the union command interleaves the results. The interleave is based on the _time field. For example, suppose you have the following datasets:

dataset_A

_time Host Bytes
4 mailsrv1 2412
1 dns15 231

dataset_B

_time Host Bytes
3 router1 23
2 dns12 22o

Both datasets are descending order by _time. When | union dataset_A, dataset_B is run, the following dataset is the result.

_time Host Bytes
4 mailsrv1 2412
3 router1 23
2 dns12 22o
1 dns15 231