Aggregate event data using Edge Processor

Learn how to aggregate event data using Edge Processor to optimize data flow and reduce log volume by processing partial aggregations.

You can create a pipeline that aggregates your incoming event data to reduce the volume of raw logs being sent to your destination.

For example, assume that your Edge Processor receives 20 network flow logs that are emitted by 5 servers in your network, and that each log includes the number of bytes sent out from a given server. You can aggregate the data so that each log shows the sum of the bytes sent out by each server, potentially reducing the number of logs sent to your data destination from 20 to 5.
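The reduction in the example above can be sketched in Python. This is purely illustrative (the log data and server names are made up); it shows how summing per server collapses 20 logs into 5 aggregated records:

```python
from collections import defaultdict

# Hypothetical flow logs: (server, bytes_out). 20 logs from 5 servers.
logs = [(f"server-{i % 5}", 100) for i in range(20)]

# Aggregate: one record per server with the summed byte count.
totals = defaultdict(int)
for server, bytes_out in logs:
    totals[server] += bytes_out

print(len(logs), "->", len(totals))  # 20 -> 5
```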

To aggregate data, you can use the stats SPL2 command in your pipeline. However, be aware that the stats command works differently in Edge Processor pipelines compared to when you use it in Splunk platform searches.

For comprehensive reference information about the stats command, see stats command overview in the SPL2 Search Reference.

How Edge Processors aggregate streaming data

Edge Processors aggregate continuously streaming data by holding and aggregating the incoming events within a given state window.

Edge Processors work with incoming data that is continuously streaming through the applied pipelines. To aggregate this data, each Edge Processor instance holds the incoming events and aggregates the collected events until specific conditions are met, at which point the instance emits the result to the next processing action in the pipeline. This process is repeated as more events flow through the pipeline.

The interval between when the Edge Processor instance starts holding events and when it performs the aggregation is known as a "state window". The state window determines the scope of the data that an Edge Processor instance includes in each aggregation, as well as the rate at which the instance emits aggregated data.

The number of incoming events that each state window contains is determined by several factors. The following table summarizes these determining factors and the related settings that you can configure, if applicable:

Determining factor Configuration For more information
The maximum length of time for which the Edge Processor instance is permitted to hold and aggregate events before emitting the result. The @maxdelay setting in the stats command in the Edge Processor pipeline. See Create a pipeline to aggregate your data.
The maximum amount of disk space that the Edge Processor instance is permitted to use to hold and aggregate events before emitting the result. The @maxdisk setting in the stats command in the Edge Processor pipeline. See Create a pipeline to aggregate your data.

Aggregations reduce the number of events that your Edge Processors send to your data destinations. You can improve the efficiency of these aggregations and maximize event reduction by including more events in each aggregation. For example, you can increase the @maxdelay and @maxdisk settings in your Edge Processor pipeline in order to increase the size of the state window and include more events per aggregation. However, be aware that increasing those values can cause the Edge Processor to take longer to emit each aggregation.
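The state-window behavior described above can be modeled with a small Python sketch. This is a toy illustration only, not how Edge Processor is actually implemented: events are buffered and partial sums are emitted once either the time limit (analogous to @maxdelay) or the size budget (analogous to @maxdisk) is reached.

```python
import json
from collections import defaultdict

class StateWindow:
    """Toy model of a state window: hold incoming events and emit a
    partial aggregation when either the time or size limit is hit."""

    def __init__(self, max_delay_s=30, max_bytes=1_000_000):
        self.max_delay_s = max_delay_s
        self.max_bytes = max_bytes
        self.reset(start_time=0)

    def reset(self, start_time):
        self.start_time = start_time
        self.buffered_bytes = 0
        self.sums = defaultdict(int)

    def add(self, event, now):
        emitted = None
        # Flush if the window has been open longer than max_delay_s.
        if now - self.start_time >= self.max_delay_s and self.sums:
            emitted = dict(self.sums)
            self.reset(start_time=now)
        self.sums[event["server_name"]] += event["bytes_out"]
        self.buffered_bytes += len(json.dumps(event))
        # Flush if the buffered events exceed the size budget.
        if self.buffered_bytes >= self.max_bytes:
            emitted = dict(self.sums)
            self.reset(start_time=now)
        return emitted  # partial aggregation, or None if still buffering

# Feed three events; the third arrives after the 30-second delay and
# triggers emission of the partial sums buffered so far.
w = StateWindow(max_delay_s=30)
w.add({"server_name": "web-01", "bytes_out": 500}, now=0)
w.add({"server_name": "web-01", "bytes_out": 500}, now=10)
partial = w.add({"server_name": "web-02", "bytes_out": 100}, now=35)
```

Raising `max_delay_s` or `max_bytes` in this sketch widens the window, which mirrors how increasing @maxdelay and @maxdisk includes more events per aggregation at the cost of slower emission.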

Differences between ingest-time aggregations and search-time aggregations

Learn how ingest-time aggregations in Edge Processor pipelines differ from search-time aggregations in the Splunk platform, including their data scope and supported stats command options.

Ingest-time aggregations apply to a different scope of data compared to search-time aggregations. Additionally, the stats SPL2 command supports different configuration options depending on whether you are using it in a pipeline or a search.

Scope of the data being aggregated

Due to the continuous nature of streaming data, each aggregation that the Edge Processor performs is scoped to the events that are processed by a single instance within a specific state window. Because these aggregations are based on a subset of all the incoming events, they are partial aggregations. By contrast, the aggregations performed through searches in the Splunk platform are finalized aggregations that are based on a complete set of indexed events.

If you want to produce a finalized aggregation of the data that is handled by your Edge Processors, you need to aggregate your data again at the data destination. For example, you would start by configuring an Edge Processor pipeline to perform partial aggregations and send the results to the Splunk platform. Then, you would run a search using the stats command in the Splunk platform to return an aggregated result that is based on all the indexed events.

For more information about finalizing your aggregations through searches, see Processing aggregations at search-time.
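The two-stage process above can be sketched in Python. The partial records below are hypothetical, shaped like the summary-mode output shown later on this page; the second pass plays the role of the search-time aggregation:

```python
from collections import defaultdict

# Partial aggregations emitted by two hypothetical state windows.
partials = [
    {"server_name": "web-01", "bytes_out": 500},   # window 1
    {"server_name": "web-01", "bytes_out": 8500},  # window 2
    {"server_name": "web-02", "bytes_out": 1200},  # window 2
]

# "Search-time" pass: summing the partial sums yields the finalized sum,
# the same result a single aggregation over all raw events would give.
final = defaultdict(int)
for p in partials:
    final[p["server_name"]] += p["bytes_out"]

assert final == {"web-01": 9000, "web-02": 1200}
```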

Supported configuration options for the stats command

The stats command supports different configuration options depending on whether you are using it in an Edge Processor pipeline or in a Splunk platform search:

  • In pipelines, the stats command supports the @maxdelay and @maxdisk annotations. You can configure these annotations to adjust the size of the state window, which determines the scope of the data included in each aggregation.

  • Pipelines and searches support different subsets of the optional arguments for the stats command.

The following table summarizes which optional arguments are available for use in pipelines as opposed to searches:

Optional argument Can be used in an Edge Processor pipeline Can be used in Splunk platform search
all-num   Yes
by-clause Yes Yes
delim   Yes
mode Yes  
partitions   Yes
prestats Yes  
span Yes Yes

Additionally, pipelines and searches support different statistical functions. For more information on the statistical functions that can be used with the stats command in a pipeline, see SPL2 statistical functions for Edge Processor Pipelines.

Note: While the avg function is not supported in Edge Processor pipelines, you can still calculate an aggregation with an average of values by using the sum and count functions in your pipeline, and then dividing the sum by the count at search time. See Processing aggregations at search-time for a full pipeline and search statement example.
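The sum-and-count workaround can be verified with a short Python sketch (the partial records are hypothetical): carrying partial sums and counts through the pipeline and dividing at the end yields the true average.

```python
from collections import defaultdict

# Partial (sum, count) pairs per server, as a pipeline using the
# sum and count functions would emit them across state windows.
partials = [
    {"server": "web-01", "bytes_out": 500, "event_count": 1},
    {"server": "web-01", "bytes_out": 8500, "event_count": 1},
    {"server": "web-02", "bytes_out": 1200, "event_count": 1},
]

totals = defaultdict(lambda: [0, 0])
for p in partials:
    totals[p["server"]][0] += p["bytes_out"]
    totals[p["server"]][1] += p["event_count"]

# Dividing the finalized sum by the finalized count gives the average.
averages = {s: b / n for s, (b, n) in totals.items()}
```

Note that averaging the partial averages directly would be wrong whenever windows hold different numbers of events, which is why the sum and count must be carried separately.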

Create a pipeline to aggregate your data

Create a data pipeline to aggregate and process event data using the pipeline editor.

To create a pipeline that aggregates the data being processed by each Edge Processor instance, use the Summarize data action in the pipeline editor to specify the fields you want to summarize.
  1. Navigate to the Pipelines page, select New pipeline, and then select Edge Processor pipeline.
  2. Follow the on-screen instructions to define a partition, optionally enter sample data, and select a data destination. For detailed instructions, see Create pipelines for Edge Processors.

    Be aware of the following when creating a pipeline for aggregations:

    • Your sample data must contain accurate examples of the values that you want to summarize into aggregations. For example, the following sample events represent requested files from a server and contain aggregable data:

      server_name server_ip file_requested bytes_out sourcetype
      web-01 10.1.2.3 /index.html 500 web
      web-01 10.1.2.3 /images/logo.png 8500 web
      web-02 10.1.2.4 /css/main.css 1200 web
    • On the Select a destination page, if you select a Splunk platform destination, you can configure index routing. As a best practice, avoid sending both non-aggregated and aggregated data to the same index.

    After you complete the on-screen instructions, the pipeline editor displays the SPL2 statement for your pipeline.
  3. Select the plus icon in the Actions section and then select Summarize data.
    The Summarize data dialog box opens. By default, the Aggregations section shows an unconfigured aggregation.
  4. (Optional) Set the Maximum delay option to the maximum length of time for which the Edge Processor can hold and aggregate incoming events before it emits the result.
  5. (Optional) Set the Maximum disk usage option to the maximum amount of disk space that the Edge Processor can use to hold and aggregate incoming events before it emits the result.
  6. (Optional) If you plan to group your aggregations by the host, source, sourcetype, or index metadata fields, then set Output mode to one of the following:
    Output mode name Description
    Summary mode The Edge Processor adds the prefix orig_ to the names of those metadata fields, so that you can use them to differentiate aggregated data from non-aggregated data.
    Passthrough mode The Edge Processor does not change the names of those metadata fields.
  7. In the Aggregations section, do the following:
    1. Set the Field option to the name of the event field that you want to aggregate.
    2. Set the Calculation option to the type of calculation that you want to perform on the specified field.
    3. (Optional) To specify the name of the event field that stores the aggregated data, select the Edit alias icon and then enter the desired field name.
    4. (Optional) You can configure multiple aggregations by selecting the Add aggregations icon and then repeating steps 7a-c.
    For example, using the sample data shown in step 2, you can calculate the sum of the values in the bytes_out field by setting the Field option to bytes_out and the Calculation option to Sum. To keep the name of the aggregated field as bytes_out instead of changing it to sum(bytes_out), set the alias of the aggregation to bytes_out.
  8. (Optional) In the Group by section, you can specify one or more fields to group the aggregations by. Do the following:
    1. Select the Add group by icon.
    2. In the new entry that appears in the Group by section, set the Field option to the name of the event field that you want to group the aggregations by.
    3. (Optional) Repeat steps 8a-b as needed to define more groupings for the aggregations.
    Continuing the example from step 7, you can group the sum of bytes_out by the server names given in the server_name field. To do this, set the Field option to server_name.
  9. Select Apply.

    The pipeline editor adds a stats command to your pipeline. If you added sample data to your pipeline, the preview results panel shows the results of the aggregation.

    For example, if you used the sample data and example configurations described in the preceding steps, the pipeline editor shows the following SPL2 statement:
    SPL2
    $pipeline = | from $source
    | @maxdisk("1MB") @maxdelay("30seconds") stats mode=summary sum(bytes_out) as bytes_out by server_name
    | into $destination;
    The aggregated results look like this:
    server_name bytes_out _raw
    web-01 9000 {"server_name": "web-01", "bytes_out": 9000}
    web-02 1200 {"server_name": "web-02", "bytes_out": 1200}
    Note: Your pipeline keeps the event fields specified in the Aggregations and Group by sections and drops the unspecified fields. In the example above, the pipeline drops the server_ip and file_requested fields because they are not included in the aggregation.
  10. (Optional) If you plan to work with the aggregated data in the Splunk platform using operations that require the data to be in prestats format, then add the prestats argument to the stats command. In the SPL2 editor, do one of the following:
    • To store prestats data values in the _raw field, type prestats=raw inside the stats command, in the space before the aggregation expression. For example:
      SPL2
      $pipeline = | from $source
      | stats prestats=raw mode=summary sum(bytes_out) as bytes_out by server_name 
      | into $destination;
    • To store prestats data values in top-level event fields, type prestats=fields inside the stats command, in the space before the aggregation expression. For example:
      SPL2
      $pipeline = | from $source
      | stats prestats=fields mode=summary sum(bytes_out) as bytes_out by server_name 
      | into $destination;
  11. Save your pipeline, and then apply it to your Edge Processors as needed. For more information, see Apply pipelines to Edge Processors.
You now have a pipeline that calculates aggregations from your event data.

Managing aggregations

Learn more about common aggregation patterns and how to finalize the partial aggregations from Edge Processor pipelines in the following sections:

Aggregation patterns

Use the stats command to aggregate data with either a summary pattern that reduces both events and fields or a passthrough pattern that reduces events while preserving the original schema.

When configuring an aggregation using the stats command, you specify which fields from the original events to retain in the aggregated results. The following are two common aggregation patterns that are based on the number of fields retained:
Summary pattern
Reduce both the number of events and the number of fields in the events. This aggregation pattern reduces overall data volume as well as the size of the resulting events, but also changes the event schema. When aggregating data using the summary pattern, make sure to assign a different sourcetype value to the aggregated events so that events with different schemas are not categorized under the same source type.
Passthrough pattern
Reduce the number of events while retaining all the original event fields. This aggregation pattern reduces overall data volume without changing the event schema. When aggregating data using the passthrough pattern, there is no need to change the sourcetype value of the aggregated events.

The following examples demonstrate how to configure an Edge Processor pipeline to aggregate data using the summary pattern or the passthrough pattern:

Both examples use this sample data as input:

server_name server_ip file_requested bytes_out sourcetype
web-01 10.1.2.3 /index.html 500 web
web-01 10.1.2.3 /index.html 500 web
web-02 10.1.2.4 /css/main.css 1200 web

Example: Summary pattern

The following is the SPL2 syntax for a stats command in a pipeline that aggregates data using the summary pattern:
CODE
| stats mode=summary sum(bytes_out) AS bytes_out BY server_name
Compared to the original input data, the aggregated results contain fewer events and fields.
server_name bytes_out _raw
web-01 1000 {"server_name": "web-01", "bytes_out": 1000}
web-02 1200 {"server_name": "web-02", "bytes_out": 1200}

Example: Passthrough pattern

The following is the SPL2 syntax for a stats command in a pipeline that aggregates data using the passthrough pattern:

CODE
| stats mode=passthrough sum(bytes_out) AS bytes_out BY server_name, server_ip, file_requested, sourcetype
Compared to the original input data, the aggregated results contain fewer events but the schema of the data remains unchanged.
_raw server_name server_ip file_requested bytes_out sourcetype
{"server_name": "web-01", "server_ip": "10.1.2.3", "file_requested": "/index.html", "bytes_out": 1000, "sourcetype": "web"} web-01 10.1.2.3 /index.html 1000 web
{"server_name": "web-02", "server_ip": "10.1.2.4", "file_requested": "/css/main.css", "bytes_out": 1200, "sourcetype": "web"} web-02 10.1.2.4 /css/main.css 1200 web
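The difference between the two patterns comes down to the group-by keys. The following Python sketch (illustrative only, using the sample data above) shows that grouping by server_name alone drops the other fields, while grouping by all original fields preserves the schema:

```python
from collections import defaultdict

events = [
    {"server_name": "web-01", "server_ip": "10.1.2.3",
     "file_requested": "/index.html", "bytes_out": 500, "sourcetype": "web"},
    {"server_name": "web-01", "server_ip": "10.1.2.3",
     "file_requested": "/index.html", "bytes_out": 500, "sourcetype": "web"},
    {"server_name": "web-02", "server_ip": "10.1.2.4",
     "file_requested": "/css/main.css", "bytes_out": 1200, "sourcetype": "web"},
]

def aggregate(events, group_keys):
    """Sum bytes_out over events that share the same group_keys values."""
    sums = defaultdict(int)
    for e in events:
        sums[tuple(e[k] for k in group_keys)] += e["bytes_out"]
    return [dict(zip(group_keys, k), bytes_out=v) for k, v in sums.items()]

# Summary pattern: group by server_name only; other fields are dropped.
summary = aggregate(events, ["server_name"])

# Passthrough pattern: group by all original fields; schema is preserved.
passthrough = aggregate(
    events, ["server_name", "server_ip", "file_requested", "sourcetype"])
```

Both results contain two events instead of three, but only the passthrough result still carries server_ip, file_requested, and sourcetype.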

Processing aggregations at search-time

Use the Search & Reporting app to finalize partial aggregations from Edge Processor pipelines.

If you configure your pipeline to send the partially aggregated data to your Splunk platform deployment, you can finalize those aggregations by running a search in the Search & Reporting app using the stats command.

The following examples demonstrate how to write SPL searches that finalize aggregations from Edge Processor pipelines:

All examples use the following sample data as input:

_time server_name server_ip file_requested bytes_out sourcetype
2025-01-01 12:00:05 web-01 10.1.2.3 /index.html 500 web
2025-01-01 12:01:22 web-01 10.1.2.3 /images/logo.png 8500 web
2025-01-01 12:00:28 web-02 10.1.2.4 /css/main.css 1200 web

Example: Sum

The SPL2 statement of the Edge Processor pipeline is as follows:
SPL2
$pipeline = | from $source 
| stats mode=summary sum(bytes_out) as bytes_out BY server_name
| eval sourcetype="web:summary"
The aggregated results look like this:
server_name bytes_out _raw sourcetype
web-01 9000 {"server_name": "web-01", "bytes_out":9000} web:summary
web-02 1200 {"server_name": "web-02", "bytes_out":1200} web:summary
To finalize the aggregation, run the following SPL search, where my_index is the name of an index:
CODE
index=my_index sourcetype="web:summary"
| stats sum(bytes_out) AS bytes_out BY server_name

Example: Count

The SPL2 statement of the Edge Processor pipeline is as follows:
SPL2
$pipeline = | from $source 
| stats mode=summary count() as event_count BY server_name
| eval sourcetype="web:summary"
The aggregated results look like this:
server_name event_count _raw sourcetype
web-01 2 {"server_name": "web-01", "event_count": 2} web:summary
web-02 1 {"server_name": "web-02", "event_count":1} web:summary
To finalize the aggregation, run the following SPL search, where my_index is the name of an index:
CODE
index=my_index sourcetype="web:summary"
| stats sum(event_count) AS event_count BY server_name

Example: Min

The SPL2 statement of the Edge Processor pipeline is as follows:
SPL2
$pipeline = | from $source 
| stats mode=summary min(bytes_out) as min_bytes_out BY server_name
| eval sourcetype="web:summary"
The aggregated results look like this:
server_name min_bytes_out _raw sourcetype
web-01 500 {"server_name": "web-01", "min_bytes_out":500} web:summary
web-02 1200 {"server_name": "web-02", "min_bytes_out":1200} web:summary
To finalize the aggregation, run the following SPL search, where my_index is the name of an index:
CODE
index=my_index sourcetype="web:summary"
| stats min(min_bytes_out) AS min_bytes_out BY server_name

Example: Max

The SPL2 statement of the Edge Processor pipeline is as follows:
SPL2
$pipeline = | from $source 
| stats mode=summary max(bytes_out) as max_bytes_out BY server_name
| eval sourcetype="web:summary"
The aggregated results look like this:
server_name max_bytes_out _raw sourcetype
web-01 8500 {"server_name": "web-01", "max_bytes_out":8500} web:summary
web-02 1200 {"server_name": "web-02", "max_bytes_out":1200} web:summary
To finalize the aggregation, run the following SPL search, where my_index is the name of an index:
CODE
index=my_index sourcetype="web:summary"
| stats max(max_bytes_out) AS max_bytes_out BY server_name

Example: Avg

The avg statistical function is not supported in Edge Processor pipelines. To calculate an aggregation with an average of values, start by using the sum and count functions in your pipeline. Then, use a search to divide the finalized sum by the finalized count.

The SPL2 statement of the Edge Processor pipeline is as follows:
SPL2
$pipeline = | from $source 
| stats mode=summary sum(bytes_out) as bytes_out, count(bytes_out) as event_count BY server_name
| eval sourcetype="web:summary"
The aggregated results look like this:
server_name bytes_out event_count _raw sourcetype
web-01 9000 2 {"server_name": "web-01", "bytes_out":9000, "event_count": 2} web:summary
web-02 1200 1 {"server_name":"web-02", "bytes_out":1200, "event_count": 1} web:summary
To finalize the aggregation, run the following SPL search, where my_index is the name of an index:
CODE
index=my_index sourcetype="web:summary"
| stats sum(bytes_out) AS bytes_out, sum(event_count) AS event_count BY server_name
| eval avg_bytes_out=bytes_out/event_count

Example: Span

You can use the span function to group aggregations by time span. For example, you can group aggregations into 1-minute time spans based on the event timestamps stored in the _time field.

The SPL2 statement of the Edge Processor pipeline is as follows:

SPL2
$pipeline = | from $source 
| stats mode=summary sum(bytes_out) as bytes_out BY span(_time, 1m), server_name 
| eval sourcetype="web:summary"

The aggregated results look like this:

_time server_name bytes_out _raw sourcetype
2025-01-01 12:00:00 web-01 500 {"server_name": "web-01", "bytes_out": 500, "_time":1735732800} web:summary
2025-01-01 12:01:00 web-01 8500 {"server_name": "web-01", "bytes_out": 8500, "_time":1735732860} web:summary
2025-01-01 12:00:00 web-02 1200 {"server_name": "web-02", "bytes_out": 1200, "_time":1735732800} web:summary

To finalize the aggregation, run the following SPL search, where my_index is the name of an index:

CODE
index=my_index sourcetype="web:summary"
| stats sum(bytes_out) AS bytes_out BY _time, server_name
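The bucketing that span(_time, 1m) performs can be sketched in Python (illustrative only; timestamps match the sample data above): each epoch timestamp is floored to its 1-minute boundary before grouping.

```python
from collections import defaultdict

# Events with epoch timestamps matching the span example's sample data.
events = [
    {"_time": 1735732805, "server_name": "web-01", "bytes_out": 500},   # 12:00:05
    {"_time": 1735732882, "server_name": "web-01", "bytes_out": 8500},  # 12:01:22
    {"_time": 1735732828, "server_name": "web-02", "bytes_out": 1200},  # 12:00:28
]

# Floor each timestamp to its 1-minute bucket, then sum per bucket and server.
sums = defaultdict(int)
for e in events:
    bucket = e["_time"] - e["_time"] % 60
    sums[(bucket, e["server_name"])] += e["bytes_out"]
```

This produces three buckets rather than two, because web-01's two events fall into different minutes.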