Write and run federated searches over remote datasets with SPL2

Use SPL2 to write and run federated searches over remote datasets.

Use SPL2 to write and run federated searches over remote datasets in Amazon S3, Azure Blob Storage and Azure Data Lake Storage containers, Azure Databricks delta shares, and more.

SPL2 is fully compatible and can operate in parallel with SPL.

If you are new to SPL2, see the SPL2 Overview.

For an introduction to the SPL2 user interfaces, see Searching data using SPL2 in the SPL2 Search Manual.

For detailed information about any of the SPL2 command syntax shown in these examples, refer to the SPL2 Search Reference.

Getting started with SPL2

SPL2 is an intuitive language that supports both SPL and SQL syntax patterns.

  • If you prefer SPL-like syntax, then use the search command.
  • If you prefer SQL-like syntax, then use the from command. This command supports a flexible syntax, so you can choose to start your search using the FROM clause or the SELECT clause.
SPL2 federated searches that use the from command are often more efficient than SPL2 federated searches that use the search command.
Note: In the search examples you'll find in this topic, from command clause names are shown in uppercase for readability and alignment with SQL syntax. You can specify clause names in uppercase or lowercase. However, you cannot specify clause names in mixed case, such as Group By.

The following table provides examples of valid searches that use SPL-like or SQL-like syntax.

Scenario SPL-like syntax SQL-like syntax
Search a dataset named mydataset.
CODE
index=mydataset
CODE
SELECT * FROM mydataset
Search a dataset named mydataset. Filter its data for events where the value of the status field is 200, the value of the host field is www2. In the search results, include only the fields named _raw, _time, productId and categoryId.
CODE
index=mydataset 
status=200 host=www2 
| fields _time, productId, categoryId
CODE
SELECT _raw, _time, productId, categoryId FROM main WHERE status=200 AND host="www2"
Search a dataset named mydataset and return the total value of the bytes field for each distinct clientip value in the data.
CODE
index=mydataset | stats sum(bytes) by clientip
CODE
SELECT clientip, sum(bytes) FROM mydataset GROUP BY clientip

When you use SPL-like syntax for an SPL2 search in the Search bar, if the first expression is anything other than index=<index_or_dataset_name>, you must include the search command at the beginning of the search. For example, search 404 index=mydataset is a valid search, but 404 index=mydataset is not.

You cannot mix syntax styles in the same search. For example, this search returns a syntax error: SELECT * index=dataset status=200 host=www2

Note: You cannot use SPL to search datasets that you create with the Data Management app. For example, the SPL search index=mydataset returns 0 results. But the SPL2 search index=mydataset does return results.

For more information about SPL2 limitations for federated searches, see Federated search limitations.

Converting SPL syntax to SPL2 syntax

Use the Convert to SPL2 button on the Search page to try to convert your search from SPL to SPL2. Conversions are supported for most, but not all SPL commands and search formats.
Note: If you are not new to federated search and rely on a number of SPL searches that use the sdselect command, consider converting them to SPL2, as sdselect is deprecated and will be removed in an upcoming release.

You see the Convert to SPL2 button on the Search page when the language picker is set to SPL. When you select Convert to SPL2 and the conversion succeeds, the contents in the Search bar are updated into SPL2 and the language picker setting changes to SPL2.

For example, say you have this SPL search:

CODE
| sdselect count(scan_count), sum(scan_count) FROM my_csv_data GROUPBY host LIMIT 5

If you select Convert to SPL2, that SPL search changes to this SPL2 search:

CODE
FROM my_csv_data GROUP BY host SELECT count(scan_count), sum(scan_count) LIMIT 5

Search and filter data

You can specify field-value pairs in the search or from command to filter for events that contain those combinations of fields and values. You can also specify a time range to filter for events that occurred during that time range.

For information about the supported time range syntax, see Time modifiers in the SPL2 Search Manual.

For example, the following searches return events from an Amazon S3 dataset named my_s3_bucket that pass all of the following filtering conditions:

  • The value in the status field is 200.
  • The value in the host field is www4.
  • The value in the action field is purchase.
  • The timestamp of the event is stored in the _time field and indicates that the event happened within the past hour.
Syntax Example
Using the search command
CODE
search index=my_s3_bucket status=200 host=www4 action=purchase earliest=-1h
Using the from command, starting with the FROM clause
CODE
FROM my_s3_bucket WHERE status=200 AND host="www4" AND action="purchase" AND earliest=-1h
Using the from command, starting with the SELECT clause
CODE
SELECT * FROM my_s3_bucket WHERE status=200 AND host="www4" AND action="purchase" AND earliest=-1h

Join two Amazon S3 datasets

You can use the JOIN clause supported in the from command to join two Amazon S3 datasets.

For example, the following searches create an inner join between the customer_id field from the s3_bucket_1 dataset and the cust_id field from the s3_bucket_2 dataset.

Syntax Example
Using the from command, starting with the FROM clause
CODE
FROM s3_bucket_1 AS t1 JOIN s3_bucket_2 AS t2 ON t1.customer_id=t2.cust_id
Using the from command, starting with the SELECT clause
CODE
SELECT * FROM s3_bucket_1 AS t1 JOIN s3_bucket_2 AS t2 ON t1.customer_id=t2.cust_id

Aggregate data and update event fields

You can use the stats command or the SELECT and GROUP BY clauses supported in the from command to calculate aggregation statistics. Then you can use the eval command to add a new field to your events.

Syntax Example
Using the search command
CODE
search index=my_s3_bucket_1
| stats sum(sales) AS total_sales by customer_id
| eval quarter = "Q1"
Using the from command, starting with the FROM clause
CODE
FROM my_s3_bucket GROUP BY customer_id SELECT sum(sales) AS total_sales, customer_id
| eval quarter = "Q1"
Using the from command, starting with the SELECT clause
CODE
SELECT sum(sales) AS total_sales, customer_id FROM my_s3_bucket GROUP BY customer_id
| eval quarter = "Q1"