union command: Overview, syntax, and usage

How the SPL2 union command works

You can use the SPL2 union command to merge datasets that have identical or different field names. The following sections illustrate how to use the union command in both of these situations.

Datasets with identical field names

Consider the following two datasets:

products-amer


productID	product_name	supplierID	supplier_name	categoryID
BS-AG-G09	Benign Space Debris	A51G-USA	Area 51 Games	ARCADE
SF-BVS-G01	Grand Theft Scooter	IP-PAN	Isthmus Pastimes	ARCADE

products-apac


productID	product_name	supplierID	supplier_name	categoryID
DC-SG-G02	Dream Crusher	PMG-KOR	Play More Games	STRATEGY
PZ-SG-G05	Puppies vs. Zombies	TF-JAP	Tiger Fun	STRATEGY
SC-MG-G10	SIM Cubicle	PMG-KOR	Play More Games	SIMULATION

You can use the SPL2 union command to bring these datasets together. For example:

$products = union products-amer, products-apac

The results look something like this:


productID	product_name	supplierID	supplier_name	categoryID
BS-AG-G09	Benign Space Debris	A51G-USA	Area 51 Games	ARCADE
DC-SG-G02	Dream Crusher	PMG-KOR	Play More Games	STRATEGY
PZ-SG-G05	Puppies vs. Zombies	TF-JAP	Tiger Fun	STRATEGY
SC-MG-G10	SIM Cubicle	PMG-KOR	Play More Games	SIMULATION
SF-BVS-G01	Grand Theft Scooter	IP-PAN	Isthmus Pastimes	ARCADE

Datasets with different field names

Consider the following events from two datasets:

products-apac


productID	product_name	supplierID	supplier_name	categoryID
DC-SG-G02	Dream Crusher	PMG-KOR	Play More Games	STRATEGY

suppliers_apac


supplierId	supplier_name	contact_name	email	address
PMG-KOR	Play More Games	Vanya Patel	vanya@sample.com	234 Sejong-daero ... Seoul South Korea

Notice that both events have a field called supplier_name and fields for the supplier ID, but with different capitalization: supplierID and supplierId.

You can use the union command to bring these datasets together. For example:

$products = union products-apac, suppliers_apac

When the datasets are unioned, the fields from both datasets are added to the output. A field that was not in the original event is set to NULL in that event's row.

The results look something like this:


address	categoryID	contact_name	email	productID	product_name	supplierID	supplierId	supplier_name
NULL	STRATEGY	NULL	NULL	DC-SG-G02	Dream Crusher	PMG-KOR	NULL	Play More Games
234 Sejong-daero ... Seoul South Korea	NULL	Vanya Patel	vanya@sample.com	NULL	NULL	NULL	PMG-KOR	Play More Games

Both events have a field called supplier_name that appears in the output with the same value. However, because the supplier ID fields have different capitalization, both fields appear in the output, even though the fields have the same value.
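The NULL-filling behavior described in this section can be modeled outside of Splunk with a short Python sketch. This illustrates the semantics only and is not how the union command is implemented. The union_rows function is a hypothetical helper, rows are represented as dictionaries, and None stands in for NULL. Sorting the field names is also an assumption, chosen because it happens to reproduce the column order in the results table in this section.

```python
# Illustrative model only: rows are dicts, and None stands in for NULL.
products_apac = [
    {"productID": "DC-SG-G02", "product_name": "Dream Crusher",
     "supplierID": "PMG-KOR", "supplier_name": "Play More Games",
     "categoryID": "STRATEGY"},
]
suppliers_apac = [
    {"supplierId": "PMG-KOR", "supplier_name": "Play More Games",
     "contact_name": "Vanya Patel", "email": "vanya@sample.com",
     "address": "234 Sejong-daero ... Seoul South Korea"},
]


def union_rows(*datasets):
    """Append the rows of every dataset and fill absent fields with None."""
    # Field names are compared exactly, so supplierID and supplierId
    # are treated as two different fields.
    all_fields = sorted(
        {field for rows in datasets for row in rows for field in row}
    )
    return [
        {field: row.get(field) for field in all_fields}
        for rows in datasets
        for row in rows
    ]


result = union_rows(products_apac, suppliers_apac)
for row in result:
    print(row)
```

In this model, the first row has a value for supplierID and None for supplierId, and the second row has the reverse, while supplier_name is populated in both rows.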

Syntax

The required syntax is in bold.

union dataset ["," dataset...]

Required arguments

dataset

Syntax: [ dataset-kind"."]dataset-name

Description: The dataset that you want to perform the union on. Because dataset names must be unique, you only need to specify the dataset kind for built-in datasets that include the kind. The dataset can be the incoming set of search results, a dataset that has been defined in the Metadata Catalog, or a literal dataset that you type in. To perform a union with the incoming search results, you only need to specify one dataset. See the Usage section.

Usage

The union command is a generating command. Generating commands fetch information from the datasets, without any transformations.

You can use the union command at the beginning of your search to combine two or more datasets, or later in your search to combine the incoming search results with a dataset.

Specifying a dataset

You can declare, or specify, a dataset several different ways. Here are some examples:


Type of declaration	Description	Example
Dataset references	Specifying an existing dataset. The datasets in this example are indexes.	...| union main, customers, purchases
Transient	Specifying an SPL subsearch as the dataset. Subsearches are enclosed in square brackets.	...| union [search main | stats count() by host ], [from customers | stats count() by host]
Fluent	The search results that are piped into the union command are referred to as a fluent dataset. This type of declaration has a union command that contains one or more subsearches.	... <some search criteria> | union [<subsearch1>], [<subsearch2>]
Literal	Using literal values that you type in as subsearches. Each subsearch is a dataset. This example shows three separate literal dataset declarations.	from [{state:"Washington", population:39557045}] | union [{state:"California", population:753591}, {state:"Oregon", population:4190713}]
Mixed	Specifying a mixture of the types of declarations.	... | union ds1, [ <subsearch1> ], [ { "state": "Washington", "population": 39557045 } ]

Semantics

If all of the datasets that are unioned together are streamable time-series, the union command attempts to interleave the data from all datasets into one globally sorted list of events or metrics. The list is based on the _time field in descending order. Otherwise, the union command returns all the rows from the first dataset, followed by all the rows from the second dataset, and so on.

Interleaving results

When two datasets are retrieved from disk in time descending order, which is the default sort order, the union command interleaves the results. The interleave is based on the _time field. For example, suppose you have the following datasets:

dataset_A


_time	Host	Bytes
4	mailsrv1	2412
1	dns15	231

dataset_B


_time	Host	Bytes
3	router1	23
2	dns12	220

Both datasets are in descending order by _time. When | union dataset_A, dataset_B is run, the result is the following dataset:


_time	Host	Bytes
4	mailsrv1	2412
3	router1	23
2	dns12	220
1	dns15	231
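The two ordering behaviors described under Semantics can be modeled with a short Python sketch. This is an illustration under stated assumptions, not the union command's implementation. The union_ordered function and its time_series flag are hypothetical: the flag stands in for however the platform determines that every dataset is a streamable time-series. The Bytes field is omitted for brevity.

```python
import heapq
from itertools import chain

# Each dataset is already in descending order by _time.
dataset_a = [{"_time": 4, "Host": "mailsrv1"}, {"_time": 1, "Host": "dns15"}]
dataset_b = [{"_time": 3, "Host": "router1"}, {"_time": 2, "Host": "dns12"}]


def union_ordered(*datasets, time_series=True):
    """Model the two orderings: interleave by _time, or append in order."""
    if time_series:
        # heapq.merge assumes every input is already sorted in the
        # requested order; reverse=True means largest _time first.
        return list(
            heapq.merge(*datasets, key=lambda row: row["_time"], reverse=True)
        )
    # Fallback: all rows of the first dataset, then the second, and so on.
    return list(chain.from_iterable(datasets))


interleaved = union_ordered(dataset_a, dataset_b)
appended = union_ordered(dataset_a, dataset_b, time_series=False)

print([row["_time"] for row in interleaved])
print([row["_time"] for row in appended])
```

The merge step reads one row at a time from each input rather than sorting the combined rows, which is why it depends on each dataset arriving in descending _time order.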
