About Federated Search for Microsoft Azure
Federated Search for Microsoft Azure lets you run federated searches from your Splunk platform deployment over datasets located in Microsoft Azure Data Lake Storage (ADLS) and Azure Blob Storage (ABS) containers. When you run these federated searches, you use familiar SPL2 search commands and syntax.
Connections and datasets
- Connection: A Microsoft Azure connection defines how Splunk software securely authenticates a link between your Splunk Cloud Platform deployment and a federated dataset in an ADLS or ABS container. Connections are reusable and can be associated with multiple Microsoft Azure datasets. Microsoft Azure connections do not specify what data is searchable.
- Dataset: A Microsoft Azure dataset is a searchable data object that is associated with a single Microsoft Azure connection. Each Microsoft Azure dataset is defined by an ADLS or ABS container URL.
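The relationship between connections and datasets can be sketched as a small data model. The following Python is illustrative only: the `Connection` and `Dataset` classes and the `parse_container_url` helper are hypothetical names, not Splunk API objects, and the URL check assumes the public Azure endpoint pattern `https://<account>.<blob|dfs>.core.windows.net/<container>`. The URL form that Splunk software expects might differ.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


# Hypothetical model for illustration; these are not Splunk API objects.
@dataclass(frozen=True)
class Connection:
    name: str  # carries authentication details only, never data locations


@dataclass(frozen=True)
class Dataset:
    name: str
    container_url: str
    connection: Connection  # each dataset is tied to exactly one connection


def parse_container_url(url: str) -> dict:
    """Split an Azure container URL into service, account, and container.

    Assumes the public Azure endpoint pattern
    https://<account>.<blob|dfs>.core.windows.net/<container>.
    """
    parts = urlparse(url)
    labels = (parts.hostname or "").split(".")
    recognized = (
        parts.scheme == "https"
        and len(labels) == 5
        and labels[1] in ("blob", "dfs")
        and labels[2:] == ["core", "windows", "net"]
    )
    if not recognized:
        raise ValueError(f"not a recognized ABS or ADLS container URL: {url}")
    container = parts.path.strip("/").split("/")[0]
    if not container:
        raise ValueError(f"URL has no container segment: {url}")
    service = "ABS" if labels[1] == "blob" else "ADLS"
    return {"service": service, "account": labels[0], "container": container}


# One reusable connection shared by two datasets (placeholder names and URLs).
conn = Connection("azure-prod")
logs = Dataset("fw_logs", "https://exampleacct.blob.core.windows.net/firewall", conn)
metrics = Dataset("metrics", "https://exampleacct.dfs.core.windows.net/metrics", conn)
```

The connection is passed into each dataset instead of the reverse, which mirrors the one-connection-to-many-datasets relationship described above.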
Two core workflows
Microsoft Azure datasets and connections support two workflows. One workflow facilitates a combination of data routing and federated search. The other workflow is only for federated search.
- Data routing to federated search workflow: Configure a Data routing and federated search dataset in an ADLS or ABS container to use as a pipeline destination for Edge Processor data, Ingest Processor data, or a combination of both. You can optionally configure the dataset to support federated searches, so that you can write data to and read data from the same ADLS or ABS container.
- Federated search only workflow: Create a Federated search only dataset that is stored in an ADLS or ABS container and is backed by a Splunk data catalog. Select this option when you want to focus on searching data that you store in Microsoft Azure and do not require a data routing solution.
Splunk-native data catalog generation
Federated Search for Microsoft Azure searches apply filtering and statistical functions to data catalogs that contain column, schema, and partition definitions for datasets in your ADLS or ABS containers. This means that a data catalog must be associated with each Microsoft Azure dataset you intend to search.
Federated Search for Microsoft Azure builds a Splunk-native data catalog for each dataset you define. You can let Splunk software automatically infer the dataset schema and partitions with a crawler, or you can manually configure the dataset schema and partitions yourself.
You can arrange to keep this catalog in sync with your dataset as your dataset changes over time.
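To make the idea of inferred partitions concrete, here is a minimal Python sketch of how partition keys could be derived from object paths. It assumes Hive-style `key=value` directory segments, which is a common data lake convention but an assumption here. The `infer_partition_keys` function and the sample paths are hypothetical, and this sketch does not describe how the Splunk crawler actually works.

```python
def infer_partition_keys(blob_paths):
    """Return the partition keys, in path order, shared by every blob path.

    Assumes Hive-style `key=value` directory segments. This is an
    illustrative convention, not a description of the Splunk crawler.
    """
    common = None
    for path in blob_paths:
        segments = path.strip("/").split("/")[:-1]  # drop the file name
        keys = [seg.split("=", 1)[0] for seg in segments if "=" in seg]
        if common is None:
            common = keys
        else:
            # Keep only the keys that also appear in this path.
            common = [key for key in common if key in keys]
    return common or []


# Placeholder object paths inside a container.
sample_paths = [
    "logs/year=2024/month=01/events-001",
    "logs/year=2024/month=02/events-002",
]
partition_keys = infer_partition_keys(sample_paths)
```

With the sample paths above, `partition_keys` holds `year` and `month`. A manually configured dataset would state those same keys explicitly instead of deriving them.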
What you need to get started
To get started with federated search of data you store in Microsoft Azure, you must have the following things:
- You must have a Splunk Cloud Platform (SCP) deployment.
- Your user account on the SCP deployment must have a role with the `edit_connections` and `edit_datasets` capabilities. See Define roles on the Splunk platform with capabilities in the Splunk Cloud Platform Manage Users and Security manual.
- You must have a Microsoft Azure account with data in ADLS or ABS containers that conforms to supported file and compression types.
- (Optional) The Azure storage account that contains the Microsoft Azure dataset you want to access might have network-level access restrictions that prevent you from performing read or write operations on that dataset. To get around these restrictions, set up an IP address allow list for the storage account that corresponds to the Cloud region of your Splunk Cloud Platform deployment.
  - For instructions, see Set the default public network access rule for an Azure Storage account in Azure Blob Storage documentation.
  - To get an IP address list that corresponds to the Cloud region of your Splunk platform deployment, see IP address lists for Cloud regions.
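As a rough illustration of the optional allow-list step, the following Python sketch turns a list of address ranges into allow rules shaped like the `ipRules` entries of an Azure storage account network rule set. The `build_ip_rules` helper is hypothetical, the addresses are reserved documentation ranges and not real Splunk Cloud Platform addresses, and the rule shape and the handling of `/31` and `/32` ranges are assumptions to verify against the Azure Storage documentation.

```python
import ipaddress


def build_ip_rules(cidrs):
    """Convert CIDR strings into allow rules for a storage account.

    Assumption: rules take the form {"value": ..., "action": "Allow"}, and
    very small ranges (/31, /32) must be listed as individual addresses.
    """
    rules = []
    for cidr in cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        if network.version == 4 and network.prefixlen >= 31:
            # Expand small IPv4 ranges into single-address rules.
            values = [str(address) for address in network]
        else:
            values = [str(network)]
        rules.extend({"value": value, "action": "Allow"} for value in values)
    return rules


# Placeholder ranges standing in for the published list for your Cloud region.
region_ranges = ["192.0.2.0/24", "198.51.100.7/32"]
ip_rules = build_ip_rules(region_ranges)
```

You would replace `region_ranges` with the addresses published for the Cloud region of your deployment, then apply the resulting rules through the Azure portal, CLI, or a template.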
Checklist of tasks to set up Federated Search for Microsoft Azure
The following checklist guides you through the cross-account setup of Federated Search for Microsoft Azure.
| Step | Task | Description |
|---|---|---|
| 1 | Create a Microsoft Azure connection | A connection contains the authentication details that Splunk software needs to run federated searches over Microsoft Azure datasets from your Splunk platform deployment. Connections can also support sending data from Edge Processor or Ingest Processor to an Azure dataset. |
| 2 | Define a Microsoft Azure dataset | Provide baseline information for your dataset, including its name and the URL of its ADLS or ABS container. Link it to a connection. Determine whether the dataset is used for Data Routing and Federated Search or just Federated Search. |
| 3 | Configure Microsoft Azure dataset details | Define a Federated Search dataset that is in an Azure storage container and is backed by a Splunk Catalog. You can define the dataset's schema and partition keys yourself, or you can let Splunk software use a crawler to automatically infer the schema and partition keys. |
| 4 | Give your users role-based access control of federated datasets | After you have successfully created a Microsoft Azure dataset, give your users role-based access to it. |
| 5 | Write and run federated searches over federated datasets with SPL2 | Run federated searches over your new Microsoft Azure dataset with SPL2. |