About Federated Search for Microsoft Azure
Federated Search for Microsoft Azure lets you run federated searches from your Splunk platform deployment over datasets located in Microsoft Azure Data Lake Storage and Azure Blob Storage containers. When you run these federated searches, you use familiar SPL2 search commands and syntax.
Connections and datasets
- Connection
- A Microsoft Azure connection defines how Splunk software securely authenticates a link between your Splunk Cloud Platform deployment and a remote dataset in Microsoft Azure. Connections are reusable and can be associated with multiple Microsoft Azure datasets. Microsoft Azure connections do not specify what data is searchable.
- Dataset
- A Microsoft Azure dataset is a searchable data object that is associated with a single Microsoft Azure connection. Each Microsoft Azure dataset is defined by an Azure Data Lake Storage or Azure Blob Storage container URL.
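Because each dataset is defined by a container URL, it can help to check which storage service and container a URL refers to before you enter it. The following is a minimal sketch, not part of any Splunk tooling. It assumes the standard Azure public-cloud endpoints (`<account>.blob.core.windows.net` for Blob Storage and `<account>.dfs.core.windows.net` for Data Lake Storage). The function name `describe_container_url` and the sample account and container names are hypothetical, and the exact URL forms that Federated Search for Microsoft Azure accepts are not confirmed here.

```python
from urllib.parse import urlparse

# Assumption: standard Azure public-cloud endpoint suffixes.
# Sovereign clouds and custom domains use different hosts.
_SERVICE_BY_SUFFIX = {
    ".blob.core.windows.net": "Azure Blob Storage",
    ".dfs.core.windows.net": "Azure Data Lake Storage",
}


def describe_container_url(url: str) -> dict[str, str]:
    """Split a container URL into service, storage account, and container."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        raise ValueError(f"expected an https URL, got: {url!r}")

    host = (parsed.hostname or "").lower()
    for suffix, service in _SERVICE_BY_SUFFIX.items():
        if host.endswith(suffix):
            account = host[: -len(suffix)]
            # The first non-empty path segment is the container name.
            segments = [part for part in parsed.path.split("/") if part]
            if not account or not segments:
                raise ValueError(f"missing account or container in: {url!r}")
            return {
                "service": service,
                "account": account,
                "container": segments[0],
            }
    raise ValueError(f"not a recognized Azure storage endpoint: {url!r}")


# Hypothetical account and container names, for illustration only.
print(describe_container_url(
    "https://examplelogs.blob.core.windows.net/security-events"
))
```

A check like this only confirms that the URL has the expected shape. It does not verify that the container exists or that the connection can authenticate to it.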
Two core workflows
Microsoft Azure datasets and connections support two workflows. One workflow facilitates a combination of data routing and federated search. The other workflow is only for federated search.
- Data routing to federated search workflow
- Configure a Data routing and federated search dataset that sends Edge Processor data, Ingest Processor data, or a combination of both to a dataset in an Azure Data Lake Storage or Azure Blob Storage container that can then be used as a pipeline destination. You can optionally configure the dataset to support federated searches, so you can use the same dataset to write data to and read data from an Azure Data Lake Storage or Azure Blob Storage container.
- Federated search only workflow
- Create a Federated search only dataset that is stored in an Azure Data Lake Storage or Azure Blob Storage container and is backed by a Splunk data catalog. Select this option when you want to focus on search of data that you are storing in Microsoft Azure and do not require a data routing solution.
Splunk-native data catalog generation
Federated Search for Microsoft Azure searches apply filtering and statistical functions to data catalogs that contain column, schema, and partition definitions for datasets in your Azure Blob Storage and Azure Data Lake Storage containers. This means that a data catalog must be associated with each Microsoft Azure dataset you intend to search.
Federated Search for Microsoft Azure builds a Splunk-native data catalog for each dataset you define. You can let Splunk software automatically infer the dataset schema and partitions with a crawler, or you can manually configure the dataset schema and partitions yourself.
You can arrange to keep this catalog in sync with your dataset as your dataset changes over time.
What you need to get started
To get started with federated search of data you store in Microsoft Azure, you must have the following things:
- You must have a Splunk Cloud Platform (SCP) deployment.
- Your user account on the SCP deployment must have a role with the `edit_datasets` and `edit_federated_providers` capabilities. See Define roles on the Splunk platform with capabilities in the Splunk Cloud Platform Manage Users and Security manual.
- You must have a Microsoft Azure account with data in Azure Blob Storage or Azure Data Lake Storage containers that conforms to supported file and compression types.
Activate Federated Search for Microsoft Azure
To activate Federated Search for Microsoft Azure for your Splunk Cloud Platform deployment, contact your Splunk sales representative. As part of this activation, you acquire a data scan entitlement that is based on the amount of remote Microsoft Azure data, in terabytes, that you are projected to search over the upcoming year. Data scan entitlements are made up of Data Scan Units (DSUs). Each DSU is equivalent to 10 TB of data scanning capabilities.
You have one pool of DSUs that you share between the federated search products you use. For example, if you use both Federated Search for Microsoft Azure and Federated Search for Amazon S3, you will share one pool of DSUs among the searches you run for both products.
For more information about DSUs, see Splunk Offerings Purchase Capacity and Limitations.
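The entitlement arithmetic described above can be sketched as follows. This is an illustrative estimate only. It uses the stated ratio of 10 TB per DSU and the single shared pool across federated search products. The assumption that entitlements are rounded up to whole DSUs, the function name `dsus_required`, and the projected volumes are all hypothetical; your Splunk sales representative determines the actual entitlement.

```python
import math

TB_PER_DSU = 10  # From the text: each DSU covers 10 TB of data scanning.


def dsus_required(projected_scan_tb: dict[str, float]) -> int:
    """Estimate DSUs for the combined yearly scan volume of all products.

    Assumption: DSUs are counted in whole units, so the total rounds up.
    """
    if any(tb < 0 for tb in projected_scan_tb.values()):
        raise ValueError("projected scan volumes must be non-negative")
    # All federated search products draw from one shared pool.
    total_tb = sum(projected_scan_tb.values())
    return math.ceil(total_tb / TB_PER_DSU)


# Hypothetical yearly projections, in terabytes scanned.
projected = {
    "Federated Search for Microsoft Azure": 34.0,
    "Federated Search for Amazon S3": 12.5,
}
print(dsus_required(projected))  # 46.5 TB total, rounded up to 5 DSUs
```

Summing before rounding reflects the shared pool: rounding each product separately would give 4 + 2 = 6 DSUs in this example rather than 5.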
Checklist of tasks to set up Federated Search for Microsoft Azure
The following checklist guides you through the cross-account setup of Federated Search for Microsoft Azure.
| Step | Task | Description |
|---|---|---|
| 1 | Create a Microsoft Azure connection | A connection defines how Splunk software authenticates federated searches over Microsoft Azure datasets from your Splunk platform deployment. Connections can also support the sending of data from Edge Processor or Ingest Processor to an Azure dataset. |
| 2 | Define a Microsoft Azure dataset | Provide baseline information for your dataset, including its name and the URL of its Azure Data Lake Storage or Azure Blob Storage container. Link it to a connection. Determine whether the dataset is used for Data Routing and Federated Search or just Federated Search. |
| 3 | Configure Microsoft Azure dataset details | Define a Federated Search dataset that is in an Azure storage container and is backed by a Splunk Catalog. You can define the dataset's schema and partition keys yourself, or you can let Splunk software use a crawler to automatically infer the schema and partition keys. |
| 4 | Give your users role-based access control of remote datasets | After you have successfully created a Microsoft Azure dataset, give your users role-based access to it. |
| 5 | Write and run federated searches over remote datasets with SPL2 | Run federated searches over your new Microsoft Azure dataset with SPL2. |