sdselect command syntax details

Syntax

The required syntax is in bold.

| sdselect

[reuse_search_results=<boolean>]

( <field-list> | <statistical-function> | <evaluation-function>), ...

FROM <from-dataset> [AS <alias>]

[JOIN <join-dataset> AS <alias> USING <field-list>]

[WHERE <evaluation-expression>]

[GROUPBY ((<field-list> | <evaluation-function>) [AS <alias>]... [span=[<unsigned_integer>]<timescale>])]

[ORDERBY (<field-list> | <evaluation-function>)...]

[LIMIT <unsigned_integer>]

Required arguments

<field-list>

Syntax: <field> [AS <alias>], …

Description: List 1 or more fields to select and group results. You have the option of renaming these fields with an AS operator.

Do not place quotation marks around nested fields. Place single quotation marks around the names of flattened fields that begin with numeric characters or contain special characters or spaces. See Special handling for sdselect syntax elements.

If your sdselect search includes a JOIN clause, field names in this field list must be prefixed with the alias of the federated index or subsearch to which they belong, except for fields listed in the USING clause. See JOIN clause restrictions for sdselect searches.

<statistical-function>

Syntax: count | count(<field>) | <statistical-function>(<field>) | <aggregate-statistical-function>(<evaluation-function>) [AS <alias>]

Description: Run a count of all events, a basic count of a field, a statistical function on a field, or an aggregate statistical function on a non-Boolean evaluation function. You have the option of using an AS operator to rename the results of the statistical function as a new field.

Note: The JOIN clause does not support statistical functions or evaluation functions outside of FROM or JOIN clause subsearches. See JOIN clause restrictions for sdselect searches.

The following table lists the supported statistical functions for sdselect by function type. Use the links in the table to see descriptions and examples for each function. For an overview about using functions with commands, see Statistical and charting functions in the Splunk Cloud Platform Search Reference.


Type of function	Supported functions and syntax
Aggregate functions	`avg()` `count()` `distinct_count()`, `dc()` `estdc()` `max()` `median()` `min()` `mode()` `perc<int>()` `range()` `stdev()` `stdevp()` `sum()` `var()` `varp()`
Multivalue stats and chart functions	`values()`
Time functions	`earliest()` `earliest_time()` `latest()` `latest_time()`

The sdselect command does not support usage of c(), the abbreviated form of count().

The sdselect command does not support usage of the distinct_count() aggregate function or the values() multivalue function in conjunction with any of the time functions.

When you apply a statistical function that expects fields with numeric values to a field with string values, sdselect converts the string field values into numeric field values. For example, say you have a search that starts like this: | sdselect avg(bytes) FROM... If bytes is a field with string values, sdselect converts those string values into numeric values before it applies the values to the avg() function.

If you use either the earliest_time() function or the latest_time() function in a search, you must apply all functions in the search to the same field.

For example, the following search is valid. It uses the earliest_time() function and applies the same field, end_time, to both functions in the search.

| sdselect avg(end_time), earliest_time(end_time) from my_csv_data

The following search is invalid. It uses the earliest_time() function, but it applies different fields, end_time and nap_time, to the functions in the search.

| sdselect avg(end_time), earliest_time(nap_time) from my_csv_data

You can apply an aggregate statistical function directly to any evaluation function supported by sdselect, with the following exceptions.

Boolean evaluation functions such as like() and match().
Date and time evaluation functions.
Non-function evaluation expressions that use mathematical, concatenation, or Boolean operators like a > 5 or x + 3.

For more information about sdselect evaluation function support, see Apply evaluation functions to your sdselect searches.

The <evaluation-function> to which you apply the aggregate stats function can contain a single evaluation function or a chain of evaluation functions, as long as its output is not a Boolean value of true or false.

If an aggregate statistical function requires a numeric value but your evaluation function does not produce a numeric value, Splunk software automatically applies the tonumber() evaluation function to transform the output of the other evaluation function into a numeric value. Here is an example of a search where tonumber() is used for this purpose.

| sdselect sum(tonumber(json_extract(req, "payload.bytes"))) from ...

<evaluation-function>

Syntax: <evaluation-function>(<field>) [AS <alias>]

Description: Apply evaluation functions to fields that you are selecting with sdselect. You have the option of using an AS operator to rename the results of the evaluation function as a new field.

You can apply any evaluation function supported by sdselect to a selected field, with the following exceptions.

Boolean evaluation functions such as like() and match().
Date and time evaluation functions.
Non-function evaluation expressions that use mathematical, concatenation, or Boolean operators like a > 5 or x + 3.

For more information about sdselect evaluation function support, see Apply evaluation functions to your sdselect searches.

Note: The JOIN clause does not support evaluation functions outside of subsearches. See JOIN clause restrictions for sdselect searches.

<from-clause>

Syntax: FROM <from-dataset> [AS <alias>]

Description: Use the FROM clause to specify the remote dataset that you want to search. The FROM clause is required.

<from-dataset>

Syntax: (federated:<federated-index-name> | <federated-index-name> | <sdselect-subsearch>)

Description: Specify the remote dataset that you want to search. You can specify the FROM clause dataset as a federated index or an sdselect subsearch.

If you want to specify a federated index as your FROM clause dataset, identify a federated index that maps to an AWS Glue Data Catalog table dataset that you want to search. You can optionally prefix federated index names with federated:.

If you want to to define your FROM clause dataset with a subsearch, provide a subsearch. FROM clause subsearches have the following characteristics.

The subsearch must begin with the sdselect command, and it cannot contain any other SPL commands. Do not include a leading | character.
The subsearch must be enclosed within parenthesis instead of square brackets.
The subsearch can include all available sdselect clauses, as well as all statistical functions and evaluation functions supported by sdselect.
The subsearch can include 1 or more time filters. However, time filter selections from the time range picker in Splunk Web are not applied to the subsearch if it has a time filter. See Applying time range picker selections to sdselect searches.

In Splunk Web, to see a list of federated indexes that you have defined for your deployment, navigate to Settings > Federated Search > Federated Indexes.

For more information, see Map a federated index to a customer-created AWS Glue table.

AS <alias>

Syntax: AS <alias>

Description: Use an AS operator to apply an alias to the federated index name or subsearch that has been specified as a dataset for the FROM clause.

Using the AS operator to provide an alias for the FROM clause dataset is optional as long as there is no JOIN clause present in the sdselect search and your search does not select one or more fields from the FROM clause dataset.

If your sdselect search includes a JOIN clause and it selects one or more fields from the FROM clause dataset, you must do the following things.

Use the AS operator to create an alias for the FROM clause dataset.
Apply the FROM clause dataset alias as a prefix to the fields that belong to that dataset in the sdselect field list. You do this to identify which selected fields come from the FROM clause dataset.

For example, the following search selects and groups results on an action field that belongs to the sf_csv_data federated index. To clarify that action comes from the sf_csv_data index, we give the index an alias of csvsf, and then we prefix that alias to the field name in the selection list: csvsf.action.

| sdselect sid, userid, csvsf.action, csvny.scan_count 
    FROM sf_csv_data AS csvsf
    JOIN ny_csv_data AS csvny
    USING sid, userid

For more information about the JOIN clause, see JOIN clause arguments.

Optional arguments

reuse_search_results

Syntax: reuse_search_results=<boolean>

Description: Specifies whether an sdselect search can reuse the result set from the last successful run of the same sdselect search, as long as that previous search run took place within the previous 24 hours. sdselect searches that reuse search results can benefit from improved search performance and a reduction in data scan unit consumption. The reuse_search_results argument defaults to true, which means that sdselect searches reuse results whenever it is possible for them to do so.

Reuse of sdselect search results is useful for sdselect searches that do not return different result sets within a given time frame. For example, sdselect searches with absolute or fixed date-to-date time ranges are good candidates for search result reuse. If you run such searches with a frequency below 24 hours, you might see an increase in their performance and a reduction in their data scan consumption.

When you run an sdselect search that reuses search results, an informational message appears in the Job Inspector that states that the results for the last successful run of the search have been reused for the current search job. The only exception to this rule are sdselect searches with an all-time time range. When you run an sdselect with an all-time time range, Splunk software displays a warning message under the search bar stating that the results from the last successful run of the all-time search have been reused.

Turn search result reuse off for an sdselect search by adding reuse_search_results=false to the search string. You might turn search result reuse off for sdselect searches that return different result sets each time they run, when you run them on a frequent basis. For example, you might turn off search result reuse for sdselect searches with relative time ranges, such as the last week to date, or the hour before the current hour, when you run those searches more than once in a 24 hour period.

You do not need to turn off search result reuse for an sdselect search with a relative time range when more than 24 hours elapses between runs of the search.

Note: Search result reuse is available only for Federated Search for Amazon S3 sdselect searches. Federated Analytics does not support the reuse_search_results argument in sdselect searches of remote Amazon Security Lake datasets.

Default: true

JOIN clause arguments

Use JOIN clause arguments to join 1 or more datasets with the FROM clause dataset in an sdselect search.

Use JOIN clauses in an sdselect search to join rows from the FROM clause dataset with rows from 1 or more additional datasets. You can use multiple JOIN clauses in a single search. Each JOIN clause represents a specific dataset.

The JOIN clause restricts sdselect searches in a variety of ways. For more information, see JOIN clause restrictions for sdselect searches.

Note: The sdselect JOIN clause does not have the default subsearch limitations of the join command. JOIN clause subsearches can join against more than 50,000 rows, and they can run for longer than 60 seconds.

<join-clause>

Syntax: JOIN <join-dataset> [AS <alias>] USING (<field-list>)

Description: Use the JOIN clause to join multiple datasets in a single search and run other search commands over the combined dataset.

You can specify the JOIN clause dataset as a federated index or an sdselect subsearch.

An sdselect search can support multiple JOIN clauses, where each JOIN clause joins a different dataset to the search.

| sdselect sid, userid, csvsf.action, csvny.scan_count, csvla.bytesout 
   FROM sf_csv_data AS csvsf
   JOIN ny_csv_data AS csvny USING sid, userid  
   JOIN ( sdselect...FROM la_csv_data) AS csvla USING sid, userid

You cannot use the JOIN clause to join datasets in different AWS accounts. All datasets joined in an sdselect search must belong to the same AWS account and region. However, the joined datasets can be in different AWS Glue catalogs or databases within that AWS account and region.

When you use a JOIN clause in an sdselect search, the following things are not supported outside of FROM clause and JOIN clause subsearches:

Statistical functions, statistical aggregations, and evaluation functions
WHERE, GROUPBY, ORDERBY, and SPAN clauses

In addition, when a JOIN clause is present, fields selected for the sdselect search must be prefixed with the alias of the dataset they belong to, unless they are join fields specified by the USING operator.

Note: The sdselect JOIN clause supports only equality joins with the USING operator. The sdselect command does not support operators like ON, CROSS JOIN, and UNNEST and the more complex kinds of JOIN operations that are facilitated by those operators.

The sdselect JOIN clause supports only inner joins. Other join types, such as outer joins, left joins, right joins, and combinations of those join types, are not supported.

See JOIN clause restrictions for sdselect searches.

<join-dataset>

Syntax: (federated:<federated-index-name> | <federated-index-name> | <sdselect-subsearch>)

Description: A JOIN clause must join on a dataset. You can specify this dataset as a federated index or an sdselect subsearch.

If you want to specify a federated index as your JOIN clause dataset, identify a federated index that maps to an AWS Glue Data Catalog table dataset that you want to search. You can optionally prefix federated index names with federated:.

If you want to to define your JOIN clause dataset with a subsearch, provide a subsearch. JOIN clause subsearches have the following characteristics.

The subsearch must begin with the sdselect command, and it cannot contain any other SPL commands. Do not include a leading | character.
The subsearch must be enclosed within parenthesis instead of square brackets.
The subsearch can support all available sdselect clauses, as well as all statistical functions and evaluation functions supported by sdselect.
The subsearch can include 1 or more time filters. However, time filter selections from the time range picker in Splunk Web are not applied to the subsearch if it has a time filter. See Applying time range picker selections to sdselect searches.

AS <alias>

Syntax: AS <alias>

Description: An alias, or secondary name, for the JOIN clause dataset. Required for each JOIN clause in an sdselect search.

When you include a JOIN clause in an sdselect search, you must define an alias for the JOIN clause dataset. Apply this JOIN clause dataset alias as a prefix to the fields that belong to that dataset in the sdselect field list. You do this to identify which of the selected fields come from the JOIN clause dataset.

For example, the following search selects and groups results on a scan_count field that belongs to the ny_csv_data federated index. To clarify that action comes from the ny_csv_data index, we give the index an alias of csvny, and then we prefix that alias to the field name in the selection list: csvny.scan_count.

| sdselect sid, userid, csvsf.action, csvny.scan_count, csvla.bytesout 
    FROM sf_csv_data AS csvsf
    JOIN ny_csv_data AS csvny USING sid, userid  
    JOIN ( sdselect...FROM la_csv_data) AS csvla USING sid, userid

USING (<field-list>)

Syntax: USING (<field>,...)

Description: Combine the USING operator with a field list to specify the fields that the JOIN clause joins other datasets upon. Required for each JOIN clause in an sdselect search.

The parentheses around the USING clause <field-list> are optional. Use them to help your sdselect search strings to be more human-readable.

If your sdselect search has multiple JOIN clauses, each USING field list for those JOIN clauses must contain the same fields.

The sdselect selected fields list can optionally include fields from the USING field list. When you do use fields from the USING field list in the selected fields list, the USING fields do not need to be prefixed with dataset aliases.

WHERE clause arguments

Use the WHERE clause to filter results. The WHERE clause is optional.

The sdselect command uses an application of the WHERE clause that is similar to that of the where command. See where in the Splunk Cloud Platform Search Reference.

Note: In sdselect searches that use the JOIN clause, the WHERE clause is supported only in FROM and JOIN clause subsearches. See JOIN clause restrictions for sdselect searches.

For additional information about using the WHERE clause, see sdselect command WHERE clause operations.

<where-clause>

Syntax: WHERE <evaluation-expression>

Description: Use the WHERE clause to filter results.

The WHERE clause must precede the GROUPBY, ORDERBY, and LIMIT clauses if you use those clauses in your sdselect search.

<evaluation-expression>

Syntax: <boolean-evaluation-function> | <boolean-nonfunction-evaluation-expression>

Description: A combination of values, variables, operators, and functions that represent the value of your destination field.

When you use the WHERE clause in an sdselect search, you must include an <evaluation-expression>. The output of this <evaluation-expression> must be a Boolean value of either true or false. This is true whether the <evaluation-expression> contains a single evaluation function or a chain of evaluation functions.

The WHERE clause returns only the results for which the <evaluation-expression> returns true.

The WHERE clause can support all evaluation functions generally supported by sdselect. See Apply evaluation functions to your sdselect searches.

The WHERE clause also supports the following date, time, and Boolean evaluation functions.


Type of function	Supported functions and syntax
Comparison and conditional functions	`like(<string>,<pattern>)` `match(<string>,<regular_expression>)`
Date and time functions	`now()` `relative_time(<time>,<specifier>)` `strftime(<time>,<format>)` `strptime(<string>,<format>)` `timestamp_from_unixtime<time>` (unique to `sdselect`) `timestamp_to_unixtime<time>` (unique to `sdselect`)

The match() evaluation function requires Java regular expression syntax when you use it in conjunction with sdselect. match() supports perl-compatible regular expression (PCRE) syntax when you use it with other SPL commands such as eval, fieldformat, and where.

The timestamp_from_unixtime() and timestamp_to_unixtime() evaluation functions are unique to sdselect and cannot be used with other SPL commands. See Evaluation functions specific to sdselect.

Note: If you use Federated Analytics, you must use the timestamp_from_unixtime() and timestamp_to_unixtime() evaluation functions in conjunction with the WHERE clause to filter Amazon Security Lake dataset partitions on values of the time_dt field. The values of time_dt have the SQL timestamp data type, which the sdselect command does not natively support.

There are some restrictions to the usage of date and time evaluation functions in the sdselect WHERE clause. See Apply date and time evaluation functions to fields in the WHERE clause.

The WHERE clause additionally supports Boolean non-function evaluation expressions that use the following operators: <, >, <=, >=, !=, =, and ==.

The WHERE clause <evaluation-expression> does not support usage of plus ( + ) and dot ( . ) characters to concatenate field names. You can use plus and dot characters to concatenate literal strings that are enclosed within double quotes.

Note: If your WHERE clause includes time range filters and you also want to use the time range picker from Splunk Web to apply a time range to the overall sdselect search, your time range picker selection won't be applied to the search if the time range filter in the WHERE clause uses as its time field the declared Time field, Unix time field, or Time partition field from the definition of the federated index referenced in the search. See Applying time range picker selections to sdselect searches.

GROUPBY clause arguments

You can use the GROUPBY clause to organize search results by field values or time spans. The GROUPBY clause is optional.

Note: In sdselect searches that use the JOIN clause, the GROUPBY clause is supported only in FROM and JOIN clause subsearches. See JOIN clause restrictions for sdselect searches.

<groupby-clause>

Syntax: GROUPBY ((<field-list> | <evaluation-function>) [AS <alias>]... [span=[<unsigned_integer>]<timescale>])

Description: Group search results together according to field values and time ranges.

If you use GROUPBY you must specify a field-list or eval-func.

You can optionally specify a span for a GROUPBY clause. If you specify a span, the <field-list> must include the name of the Unix time field for the federated index invoked in the sdselect search. See Map a federated index to an AWS Glue Data Catalog table dataset for more information about the Unix time field.

The GROUPBY clause must follow the WHERE clause if you are using both clauses in conjunction with sdselect.

The GROUPBY clause must precede the ORDERBY and LIMIT clauses, if you use either of those clauses.

If you use the GROUPBY clause in a sdselect search without an ORDERBY clause, sdselect sorts the search results by the fields according to the following sequence:

By the order that you have listed the fields in the GROUPBY clause.
By the values of the fields in the GROUPBY clause in ascending alphanumeric order.

For information about how ORDERBY clause sort operations interact with the GROUPBY clause, see GROUPBY and ORDERBY event sort interoperation.

<field-list>

Syntax: <field> [AS <alias>], ...

Description: Specify 1 or more fields by which to group results. If you specify a time field in the <field-list>, you must also specify a <span> argument. Separate each field name in the field list with a comma. You have the option of using an AS operator to rename each listed field.

Nested fields, certain kinds of other field names, and literal strings require special handling in the GROUPBY clause. See Special handling for sdselect syntax elements.

<evaluation-function>

Syntax: <evaluation-function> [AS <alias>], ...

Description: Group by the output of evaluation functions. You have the option of using an AS operator to rename each evaluation function as a new field.

The GROUPBY clause can support all evaluation functions supported by sdselect, with the following exceptions.

Boolean evaluation functions such as like() and match().
Date and time evaluation functions.
Non-function evaluation expressions that use mathematical, concatenation, or Boolean operators like a > 5 or x + 3.

An <evaluation-function> for a GROUPBY clause can contain a single evaluation function or a chain of evaluation functions, as long as its output is not a Boolean value of true or false.

For more information about sdselect evaluation function support, see Apply evaluation functions to your sdselect searches.

<span>

Syntax: span=[<unsigned_integer>]<timescale>

Description: The <span> of each time bin. If you use the GROUPBY clause to group by a time field, use the <span> argument to group the time buckets. You can specify time spans such as GROUPBY <Unix time field> span=1h or GROUPBY <Unix time field> span=5d.

Note: The sdselect command does not support auto as a value for the span argument.

Syntax: <sec> | <min> | <hr> | <day> | <month> | <year>

Description: Time scale units. The sdselect command does not support subseconds.

Default: 1 second

The following table describes the different kinds of time scale units that span supports and the valid values for each type of time scale unit.


Time scale	Syntax	Description
<sec>	s \| sec \| secs \| second \| seconds	Time scale in seconds.
<min>	m \| min \| mins \| minute \| minutes	Time scale in minutes.
<hr>	h \| hr \| hrs \| hour \| hours	Time scale in hours.
<day>	d \| day \| days	Time scale in days.
<month>	mon \| month \| months	Time scale in months.
<year>	y \| yr \| year \| years	Time scale in years.

ORDERBY clause arguments

You can use the ORDERBY clause to sort the search results. The ORDERBY clause is optional.

Note: In sdselect searches that use the JOIN clause, the ORDERBY clause is supported only in FROM and JOIN clause subsearches. See JOIN clause restrictions for sdselect searches.

<orderby-clause>

Syntax: ORDERBY (<field-list> | <evaluation-function>)...

Description: Sort search results by the values of the field or fields specified for the clause.

The ORDERBY clause must follow the WHERE and GROUPBY clauses if you use either clause in your sdselect search.

The ORDERBY clause must precede the LIMIT clause.

<field-list>

Syntax: <field> [ASC | DESC], ...

Description: You must specify at least 1 <field> for an ORDERBY clause. You can optionally use the ASC or DESC modifiers to indicate whether sdselect sorts events by the field values in ascending or descending order. If you do not specify ASC or DESC for a field, by default sdselect sorts the field in ascending order.

If you specify multiple fields for an ORDERBY clause, separate the fields and their ASC or DESC modifiers by commas. When you specify multiple ORDERBY fields or evaluation functions, sdselect sorts search results by the fields and functions in the order that you list them.

For example, say your search has the following ORDERBY clause: ORDERBY id name city. In this case, the ORDERBY clause sorts the search results first by id. Then, the clause sorts rows that have matching id values by name. Finally, the clause sorts rows with matching id and name values by city.

Nested fields, certain kinds of field names, and literal strings require special handling in the ORDERBY clause. See Special handling for sdselect syntax elements.

If your search includes a GROUPBY clause, the fields you specify in the ORDERBY clause must be aggregated fields, or fields that appear in the GROUPBY clause. If your search does not include a GROUPBY clause, for the ORDERBY clause you can specify any field that exists in the dataset you are searching.

If you want to use a renamed aggregated field, the ORDERBY clause must refer to the field by its rename. The following ORDERBY clause example renames an aggregated field from scan_count to scan_avg:

| sdselect avg(scan_count) AS scan_avg FROM my_csv_data GROUPBY sid,action ORDERBY scan_avg DESC

<evaluation-function>

Syntax: <evaluation-function> [ASC | DESC], ...

Description: Order events by the output of evaluation functions.

The ORDERBY clause can support all evaluation functions supported by sdselect, with these exceptions:

Boolean evaluation functions such as like() and match().
Date and time evaluation functions.
Non-function evaluation expressions that use mathematical, concatenation, or Boolean operators like a > 5 or x + 3.

An <evaluation-function> for an ORDERBY clause can contain a single evaluation function or a chain of evaluation functions, as long as its output is not a Boolean value of true or false. For more information about sdselect evaluation function support, see Apply evaluation functions to your sdselect searches.

You can optionally use the ASC or DESC modifiers to indicate whether sdselect sorts the output values in ascending or descending order. If you do not specify ASC or DESC for an evaluation function, by default sdselect sorts the output values in ascending order.

If you specify multiple evaluation functions for an ORDERBY clause, separate the evaluation functions and their ASC or DESC modifiers by commas. When you specify multiple ORDERBY fields or evaluation functions, sdselect sorts search results by the fields and evaluation functions in the order that you list them.

For more information about how ORDERBY clause sort operations interact with the GROUPBY clause, see GROUPBY and ORDERBY event sort interoperation.

LIMIT clause arguments

You can use the LIMIT clause to specify the maximum number of search results to return. The LIMIT clause is optional.

<limit-clause>

Syntax: LIMIT <unsigned_integer>

Description: Set the maximum number of search results that an sdselect search can return.

Default: 100,000 results

You must place the LIMIT clause after the WHERE, GROUPBY, and ORDERBY clauses, if you use any of them in the search.

For information about changing the default number of results returned by an sdselect search without a LIMIT clause, see Control the maximum number of returned results if a LIMIT clause is not present.

sdselect command syntax details

Syntax

Required arguments

Optional arguments

JOIN clause arguments

WHERE clause arguments

GROUPBY clause arguments

ORDERBY clause arguments

LIMIT clause arguments

See also