eventstats command: Overview, syntax, and usage
The SPL2 eventstats command generates summary statistics from fields in your events and saves those statistics into a new field. The eventstats command places the generated statistics in new field that is added to the original raw events.
Use these links to quickly navigate to the main sections in this topic:
How the SPL2 eventstats command works
It's much easier to see what the SPL2 eventstats command does by showing you examples, using a set of simple events.
These examples use the from command to create a set of events. The streamstats and eval commands are used to create additional fields in the events.
Creating a set of events
Let's start by creating a set of four events by using a dataset literal.
| from [{"age":25, "city": "San Francisco"}, {"age": 39, "city": "Seattle"}, {"age":31, "city": "San Francisco"}, {"city": "Seattle"}]
| eval _time = now()
| streamstats count()
- The
fromcommand is used to create four results, which contain the timestamp when the results where created. The dataset literal specifies fields and values for four events. The fields are "age" and "city". - The last event does not contain the
agefield. - The
streamstatscommand is used to create thecountfield. Thestreamstatscommand calculates a cumulative count for each event, at the time the event is processed.
The results of the search look like this:
| _time | age | city | count |
|---|---|---|---|
| 02 May 2024 18:32:07 | 25 | San Francisco | 1 |
| 02 May 2024 18:32:07 | 39 | Seattle | 2 |
| 02 May 2024 18:32:07 | 31 | San Francisco | 3 |
| 02 May 2024 18:32:07 | Seattle | 4 |
Using eventstats with a BY clause
The BY clause in the eventstats command is optional, but is used frequently with this command. The BY clause groups the generated statistics by the values in a field. You can use any of the statistical functions with the eventstats command to generate the statistics. See the Quick Reference for SPL2 Stats and Charting Functions.
In this example, the eventstats command generates the average age for each city. The generated averages are placed into a new field called avg(age).
The following search is the same as the previous search, with the eventstats command added at the end:
| from [{"age":25, "city": "San Francisco"}, {"age": 39, "city": "Seattle"}, {"age":31, "city": "San Francisco"}, {"city": "Seattle"}]
| eval _time = now()
| streamstats count()
| eventstats avg(age) BY city
- For
San Francisco, the average age is 28 = (25 + 31) / 2. - For
Seattle, there is only one event with a value. The average is 39 = 39 / 1. Theeventstatscommand places that average in every event for Seattle, including events that did not contain a value forage.
The results of the search look like this:
| _time | age | avg(age) | city | count |
|---|---|---|---|---|
| 02 May 2024 18:32:07 | 25 | 28 | San Francisco | 1 |
| 02 May 2024 18:32:07 | 39 | 39 | Seattle | 2 |
| 02 May 2024 18:32:07 | 31 | 28 | San Francisco | 3 |
| 02 May 2024 18:32:07 | 39 | Seattle | 4 |
Renaming the new field
By default, the name of the new field that is generated is the name of the statistical calculation. In these examples, that name is avg(age). You can rename the new field using the AS keyword.
In the following search, the eventstats command has been adjusted to rename the new field to average age by city.
| from [{"age":25, "city": "San Francisco"}, {"age": 39, "city": "Seattle"}, {"age":31, "city": "San Francisco"}, {"city": "Seattle"}]
| eval _time = now()
| streamstats count()
| eventstats avg(age) as 'average age by city' BY city
The results of the search look like this:
| _time | age | average age by city | city | count |
|---|---|---|---|---|
| 02 May 2024 18:32:07 | 25 | 28 | San Francisco | 1 |
| 02 May 2024 18:32:07 | 39 | 39 | Seattle | 2 |
| 02 May 2024 18:32:07 | 31 | 28 | San Francisco | 3 |
| 02 May 2024 18:32:07 | 39 | Seattle | 4 |
Events with text values
The previous examples show how an event is processed that does not contain a value in the age field. Let's see how events are processed that contain an alphabetic character value in the field that you want to use to generate statistics.
The following search includes the word test as a value in the age field.
| from [{"age":25, "city": "San Francisco"}, {"age": 39, "city": "Seattle"}, {"age":31, "city": "San Francisco"}, {"age":"test", "city": "Seattle"}]
| eval _time = now()
| streamstats count()
| _time | age | city | count |
|---|---|---|---|
| 02 May 2024 18:32:07 | 25 | San Francisco | 1 |
| 02 May 2024 18:32:07 | 39 | Seattle | 2 |
| 02 May 2024 18:32:07 | 31 | San Francisco | 3 |
| 02 May 2024 18:32:07 | test | Seattle | 4 |
eventstats command to the search.
| from [{"age":25, "city": "San Francisco"}, {"age": 39, "city": "Seattle"}, {"age":31, "city": "San Francisco"}, {"age":"test", "city": "Seattle"}]
| eval _time = now()
| streamstats count()
| eventstats avg(age) BY city
| _time | age | avg(age) | city | count |
|---|---|---|---|---|
| 02 May 2024 18:32:07 | 25 | 28 | San Francisco | 1 |
| 02 May 2024 18:32:07 | 39 | 39 | Seattle | 2 |
| 02 May 2024 18:32:07 | 31 | 28 | San Francisco | 3 |
| 02 May 2024 18:32:07 | test | 39 | Seattle | 4 |
Using the allnum argument
But suppose you don't want statistics generated when there are alphabetic characters in the field or the field is empty?
The allnum argument controls how the eventstats command processes field values. The default setting for the allnum argument is FALSE. Which means that the field used to generate the statistics does not need to contain all numeric values. Fields with empty values or alphabetic character values are ignored. You've seen this in the earlier examples.
You can force the eventstats command to generate statistics only when the fields contain all numeric values. To accomplish this, you can set the allnum argument to TRUE.
| from [{"age":25, "city": "San Francisco"}, {"age": 39, "city": "Seattle"}, {"age":31, "city": "San Francisco"}, {"age":"test", "city": "Seattle"}]
| eval _time = now()
| streamstats count()
| eventstats allnum=true avg(age) BY city
The results of the search look like this:
| _time | age | avg(age) | city | count |
|---|---|---|---|---|
| 02 May 2024 18:32:07 | 25 | 28 | San Francisco | 1 |
| 02 May 2024 18:32:07 | 39 | Seattle | 2 | |
| 02 May 2024 18:32:07 | 31 | 28 | San Francisco | 3 |
| 02 May 2024 18:32:07 | test | Seattle | 4 |
Because the age field contains values for Seattle that are not all numbers, the entire set of values for Seattle are ignored. No average is calculated.
The allnum=true argument applies to empty values as well as alphabetic character values.
Syntax
The required syntax is in bold.
eventstats
[allnum=<bool>]
<aggregation> ...
[<by-clause>]
Required arguments
aggregation
Syntax: <aggregate-function> "("<field>")" [as <field>] ["," <aggregate-function> "("<field> ")" [AS <field>] ]...
Description: A statistical aggregation function. The function can be applied to an eval expression, or to a field or set of fields. You can specify multiple aggregation functions. Separate each aggregation function with a comma.
By default, the name of the field added to the output is the same as your function. For example, if your search is ... | eventstats avg(bytes) the field name in the output is avg(bytes). Use the as clause to place the generated result into a new field with a name that you specify, for example ... | eventstats avg(bytes) as 'avg of bytes'.
The syntax for the <aggregate-function> depends on the function that you use. See Quick Reference for SPL2 Stats and Charting Functions for information about the statistical functions.
Optional arguments
allnum
Syntax: allnum=<bool>
Description: If set to true, computes numerical statistics on each field, if and only if, all of the values in that field are numerical. If you have a BY clause, the allnum argument applies to each group independently.
Default: false
by-clause
Syntax: BY <field> [span=<timespan>] ["," <field> [span=<timespan>] ]...
Description: The name of one or more fields to group the results by. You can specify a time span to apply to the grouping. The <by-clause> returns one row for each distinct value in the <by-clause> fields. You cannot use the wildcard character to specify multiple fields with similar names. You must specify each field separately.
Default: If no <by-clause> is specified, the eventstats command returns only one row, which is the aggregation over the entire incoming result set.
Usage
The following sections contain information to help you understand and use the eventstats command.
Differences between eventstats and stats
The SPL2 eventstats command is similar to the SPL2 stats command. You can use both commands to generate aggregations like average, sum, and maximum.
The differences between these commands are described in the following table:
| stats command | eventstats command |
|---|---|
| Events are transformed into a table of aggregated search results | Aggregations are placed into a new field that is added to each of the events in your output |
| You can only use the fields in your aggregated results in subsequent commands in the search | You can use the fields in your events in subsequent commands in your search, because the events have not been transformed |
How eventstats generates aggregations
The eventstats command looks for events that contain the field that you want to use to generate the aggregation. The command creates a new field in every event and places the aggregation in that field. The aggregation is added to every event, even events that were not used to generate the aggregation.
For example, you have 4 events and 3 of the events have the field you want to aggregate on, the eventstats command generates the aggregation based on the data in the 3 events. A new field is added all 4events and the aggregation is added to that field in every event. See eventstats command overview.
Limitations and optimizations
There are several default search limitations that might impact using the eventstats command:
- There is default limit to the amount of memory that the
eventstatscommand can use to keep track of information when processing a search. If theeventstatscommand reaches this limit, the command stops adding the requested fields to the search results. - There is default limit for the number of results returned from a search. The default value is 50,000.
You can avoid reaching these limit by filtering out events before you use the eventstats command in your search.
Functions and memory usage
Some functions are inherently more expensive, from a memory standpoint, than other functions. For example:
- The
distinct_countfunction requires far more memory than thecountfunction. - The
valuesandlistfunctions also can consume a lot of memory.
You can avoid running into memory issues by filtering out events before you use the eventstats command in your search.
When to use the estimated distinct count function
If you are using the distinct_count function without a BY clause field or with a low-cardinality field in the BY clause, consider replacing the distinct_count function with the estdc function (estimated distinct count). The estdc function can result in significantly lower memory usage and run times.
Event order functions
When you use the stats and eventstats commands to order events based on time, use the earliest and latest functions.
- To locate the first value based on time order, use the
earliestfunction. - To locate the last value based on time order, use the
latestfunction.
When searching events based on time, the first and last functions do not produce accurate results,
For more information about these functions, see Time functions.
See also
eventstats command