Remove duplicate fields from Edge Processor pipelines

Remove duplicate fields from pipelines using the dedup command.

The SPL2 dedup command removes events that contain an identical combination of values for the fields that you specify.

This lets you specify the number of duplicate events to keep for each value of a single field, or for each combination of values among several fields.
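
To make these semantics concrete, the following Python sketch models what it means to keep a fixed number of events per combination of field values. It is an illustration only and not how the dedup command is implemented. The `dedup_events` helper and its `keep` parameter are hypothetical names, and events are assumed to be plain dictionaries.

```python
from collections import Counter

def dedup_events(events, fields, keep=1):
    """Keep at most `keep` events per combination of values in `fields`.

    Hypothetical helper that models dedup semantics; not Splunk code.
    A missing field is treated as None here, which is an assumption.
    """
    seen = Counter()
    kept = []
    for event in events:
        # The key is the combination of values across all listed fields.
        key = tuple(event.get(f) for f in fields)
        if seen[key] < keep:
            seen[key] += 1
            kept.append(event)
    return kept

events = [
    {"host": "web-1", "status": 200},
    {"host": "web-1", "status": 200},
    {"host": "web-1", "status": 500},
    {"host": "web-2", "status": 200},
]

by_host = dedup_events(events, ["host"])
by_host_status = dedup_events(events, ["host", "status"])
print(len(by_host), len(by_host_status))
```

Deduplicating on a single field keeps one event per host, while adding a second field keeps one event per distinct host and status pair.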

Overview

Removing duplicate fields from your pipeline requires the following tasks:

Identify your data source: Determine the pipeline input and the fields causing duplication.
Select deduplication strategy: Choose between a visual UI configuration or custom SPL2 code.
Define scope: Specify the fields for the dedup command.
Configure time constraints: Set the span and TTL to define how long the processor should "remember" an event.
Validate: Run the pipeline in "Preview" mode to verify the reduction in event volume.

How duplicates are identified

Whether an event is identified as a duplicate depends on how far back in time, and across how many events, the processor can remember previously seen events. Deduplication effectiveness therefore depends on the runtime context (batch vs. instance vs. inter-batch) and on the configured memory and TTL constraints.
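
The effect of a time limit on what counts as a duplicate can be sketched in Python as follows. This is a simplified model under stated assumptions, not the processor's actual algorithm: `dedup_with_ttl` and `ttl_seconds` are hypothetical names, events are dictionaries with a numeric `_time` in seconds, and events are assumed to arrive in time order.

```python
def dedup_with_ttl(events, fields, ttl_seconds):
    """Drop an event if the same key was kept within the last `ttl_seconds`.

    Hypothetical model of "remembering" events for a limited time.
    """
    last_kept = {}  # key -> _time of the most recently kept event
    kept = []
    for event in events:
        key = tuple(event.get(f) for f in fields)
        now = event["_time"]
        previous = last_kept.get(key)
        # Once the remembered event is older than the TTL, it is forgotten
        # and the new event is treated as the first of its kind.
        if previous is None or now - previous > ttl_seconds:
            last_kept[key] = now
            kept.append(event)
    return kept

events = [
    {"host": "web-1", "_time": 0},
    {"host": "web-1", "_time": 30},
    {"host": "web-1", "_time": 90},
]

short_memory = dedup_with_ttl(events, ["host"], ttl_seconds=60)
long_memory = dedup_with_ttl(events, ["host"], ttl_seconds=120)
print(len(short_memory), len(long_memory))
```

With the shorter limit, the third event arrives after the first has been forgotten and is kept. With the longer limit, it is still recognized as a duplicate.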

Steps

Configure deduplication either in the Data Management UI or with custom SPL2 code.

Configure pipeline to remove duplicate events using the Data Management UI

Complete the following steps to remove duplicate fields from your pipeline.

On the Pipelines page of your Data Management instance, navigate to the pipeline that you want to deduplicate, and then select Edit.
Select the plus icon next to Actions.
Select Remove duplicates for.
On the Remove duplicate field values page, set your desired deduplication parameters, and then select Apply.
Select Next to confirm the deduplicated data.
Run a preview of your pipeline to verify your changes.
Select Done to confirm changes.

Configure pipeline to remove duplicate events using custom SPL2 code

Complete the following steps to remove duplicate fields from your pipeline.

On the Pipelines page of your Data Management instance, navigate to the pipeline that you want to deduplicate, and then select Edit.
In the pipeline editor menu, navigate to the fields that you want to deduplicate.
Create your desired SPL2 code.
Select Preview to review your changes.
Save your changes.

Examples of deduplication searches

The following are examples of SPL2 searches that use the dedup command.

Deduplicate by host within a batch.

from $source | dedup host, batch_id()
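
As a rough mental model of batch-scoped deduplication, the Python sketch below adds a batch identifier to the deduplication key, so that the same host is kept once per batch rather than once overall. The `dedup_per_batch` helper is hypothetical, and using the list index as the batch identifier is an assumption made for illustration; it does not describe how batch_id() assigns values.

```python
def dedup_per_batch(batches, fields):
    """Keep the first event per field-value combination within each batch.

    Hypothetical model: `batches` is a list of lists of event dicts, and
    the list index stands in for the batch identifier.
    """
    seen = set()
    kept = []
    for batch_id, batch in enumerate(batches):
        for event in batch:
            # Including the batch identifier in the key scopes the
            # comparison to events from the same batch.
            key = (batch_id,) + tuple(event.get(f) for f in fields)
            if key not in seen:
                seen.add(key)
                kept.append(event)
    return kept

batches = [
    [{"host": "web-1"}, {"host": "web-1"}, {"host": "web-2"}],
    [{"host": "web-1"}],
]

kept = dedup_per_batch(batches, ["host"])
print([event["host"] for event in kept])
```

In this model, web-1 appears in the output twice because its second occurrence belongs to a different batch.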

Time-Interval Deduplication: Using spans to manage high-volume event streams.

from $source | eval field_with_batch_id = batch_id() | dedup host, field_with_batch_id, span(_time, 5m)
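
The idea behind deduplicating over a time span can be sketched in Python by rounding each timestamp down to the start of its interval and adding that value to the key. This is an illustrative model only: `time_bucket` and `dedup_by_span` are hypothetical names, timestamps are assumed to be whole seconds, and aligning intervals to multiples of the span length is an assumption rather than documented behavior.

```python
def time_bucket(epoch_seconds, span_seconds=300):
    """Round a timestamp down to the start of its span (default 5 minutes)."""
    return epoch_seconds - (epoch_seconds % span_seconds)

def dedup_by_span(events, fields, span_seconds=300):
    """Keep the first event per field-value combination in each time span.

    Hypothetical model; events are dicts with an integer `_time` in seconds.
    """
    seen = set()
    kept = []
    for event in events:
        bucket = time_bucket(event["_time"], span_seconds)
        # Events with equal field values are duplicates only when they
        # also fall into the same time bucket.
        key = tuple(event.get(f) for f in fields) + (bucket,)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept

events = [
    {"host": "web-1", "_time": 0},
    {"host": "web-1", "_time": 120},
    {"host": "web-1", "_time": 299},
    {"host": "web-1", "_time": 300},
    {"host": "web-1", "_time": 725},
]

kept = dedup_by_span(events, ["host"])
print([event["_time"] for event in kept])
```

Under this model, a steady stream from one host is reduced to one event per five-minute interval instead of one event in total.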

Memory-Constrained Environments: Using @maxmem as a runtime hint.

from $source | @maxmem('1GB') dedup host, batch_id()
