About Federated Search for Microsoft Azure

Federated Search for Microsoft Azure lets you run federated searches from your Splunk Cloud Platform deployment over datasets in Microsoft Azure Data Lake Storage and Azure Blob Storage containers.

Note: In the Controlled Availability release stage, Splunk products may have limitations on customer access, features, maturity, and regional availability. For additional information on Controlled Availability, contact your Splunk representative.

When you run these federated searches, you use familiar SPL2 search commands and syntax.

Note: If you want to search Azure Databricks tables stored remotely in Unity Catalog, see About Federated Search for Azure Databricks.

Connections and datasets

Federated Search for Microsoft Azure is part of the Data Management app, where you set up federated search by defining connections and datasets.

Connection: A Microsoft Azure connection defines how Splunk software securely authenticates a link between your Splunk Cloud Platform deployment and a remote dataset in a Microsoft Azure Data Lake Storage or Azure Blob Storage container. Connections are reusable and can be associated with multiple Microsoft Azure datasets. Microsoft Azure connections do not specify what data is searchable.
Dataset: A Microsoft Azure dataset is a searchable data object that is associated with a single Microsoft Azure connection. Each Microsoft Azure dataset is defined by an Azure Data Lake Storage or Azure Blob Storage container URL.
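
The relationship between connections and datasets can be pictured with a small data model. The following is only an illustrative sketch: the class names, field names, and example URLs are hypothetical and do not correspond to any Splunk API. It shows one reusable connection shared by two datasets, each of which points at exactly one container URL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AzureConnection:
    """Hypothetical stand-in for a connection: authentication only, no data scope."""
    name: str


@dataclass(frozen=True)
class AzureDataset:
    """Hypothetical stand-in for a dataset: one container URL, exactly one connection."""
    name: str
    container_url: str
    connection: AzureConnection


def datasets_for(connection, datasets):
    """Return the names of all datasets that reuse the given connection."""
    return [d.name for d in datasets if d.connection == connection]


# Placeholder names and URLs, for illustration only.
shared = AzureConnection(name="azure_prod")
datasets = [
    AzureDataset("firewall_logs", "https://example.blob.core.windows.net/firewall", shared),
    AzureDataset("audit_logs", "https://example.dfs.core.windows.net/audit", shared),
]
print(datasets_for(shared, datasets))
```

Note that the connection object carries no container URL: what is searchable is decided entirely by the datasets that reference it.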

Two core workflows

Microsoft Azure datasets and connections support two workflows. One workflow facilitates a combination of data routing and federated search. The other workflow is only for federated search.

Data routing to federated search workflow: Configure a Data routing and federated search dataset in an Azure Data Lake Storage or Azure Blob Storage container to serve as a pipeline destination for Edge Processor data, Ingest Processor data, or a combination of both. You can optionally configure the dataset to support federated searches, so that you can use the same dataset to write data to, and read data from, the container.
Federated search only workflow: Create a Federated search only dataset that is stored in an Azure Data Lake Storage or Azure Blob Storage container and is backed by a Splunk data catalog. Select this option when you only want to search data that you store in Microsoft Azure and do not require a data routing solution.

Splunk-native data catalog generation

Federated Search for Microsoft Azure searches apply filtering and statistical functions to data catalogs that contain column, schema, and partition definitions for datasets in your Azure Blob Storage and Azure Data Lake Storage containers. This means that a data catalog must be associated with each Microsoft Azure dataset you intend to search.

Federated Search for Microsoft Azure builds a Splunk-native data catalog for each dataset you define. You can let Splunk software automatically infer the dataset schema and partitions with a crawler, or you can manually configure the dataset schema and partitions yourself.

You can arrange to keep this catalog in sync with your dataset as your dataset changes over time.
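
To make the idea of inferring partitions more concrete, here is a minimal sketch of how partition keys could be read from object paths. It assumes a Hive-style `key=value` directory layout, which is an assumption made for illustration: this page does not describe how the Splunk crawler works or which path layouts it recognizes. The function name and the example paths are hypothetical.

```python
def infer_partition_keys(blob_paths):
    """Return the partition keys found in every path, in first-seen order.

    Assumes Hive-style "key=value" directory segments (an illustrative
    assumption, not documented crawler behavior).
    """
    common = None
    for path in blob_paths:
        directories = path.split("/")[:-1]  # drop the trailing file name
        keys = [seg.split("=", 1)[0] for seg in directories if "=" in seg]
        if common is None:
            common = keys
        else:
            # Keep only keys that also appear in this path.
            common = [k for k in common if k in keys]
    return common or []


paths = [
    "logs/year=2024/month=01/events-0001",
    "logs/year=2024/month=02/events-0002",
]
print(infer_partition_keys(paths))
```

If a key is missing from some paths, the sketch drops it, on the reasoning that a partition column should be defined for every object in the dataset.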

What you need to get started

To get started with federated search of data you store in Microsoft Azure, you must have the following things:

You must have a Splunk Cloud Platform (SCP) deployment.
Your user account on the SCP deployment must have a role with the edit_datasets and edit_federated_providers capabilities. See Define roles on the Splunk platform with capabilities in the Splunk Cloud Platform Manage Users and Security manual.
You must have a Microsoft Azure account with data in Azure Blob Storage or Azure Data Lake Storage containers that conforms to supported file and compression types.
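
The capability requirement above amounts to a simple set check. The sketch below is illustrative only: the function name is hypothetical, and the list of capabilities held by a role is supplied by hand rather than read from a Splunk deployment. Only the two required capability names come from this page.

```python
# The two capabilities named in the prerequisites above.
REQUIRED_CAPABILITIES = frozenset({"edit_datasets", "edit_federated_providers"})


def missing_capabilities(role_capabilities):
    """Return the required capabilities that the given role does not have."""
    return sorted(REQUIRED_CAPABILITIES - set(role_capabilities))


# Example role that holds only one of the two required capabilities.
print(missing_capabilities(["search", "edit_datasets"]))
```

An empty result means the role satisfies the capability prerequisite.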

Activate Federated Search for Microsoft Azure

To activate Federated Search for Microsoft Azure for your Splunk Cloud Platform deployment, contact your Splunk sales representative. As part of this activation, you acquire a data scan entitlement that is based on the amount of remote Microsoft Azure data, in terabytes, that you are projected to search over the upcoming year. Data scan entitlements are made up of Data Scan Units (DSUs). Each DSU is equivalent to 10 TB of data scanning capabilities.

Note: If you are an existing user of Federated Search for Amazon S3 or Federated Analytics, apply for access to Federated Search for Microsoft Azure through the VOC portal.

You have one pool of DSUs that you share between the federated search products you use. For example, if you use both Federated Search for Microsoft Azure and Federated Search for Amazon S3, you will share one pool of DSUs among the searches you run for both products.
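
As a rough worked example of the arithmetic, the sketch below estimates how many DSUs a shared pool would need for a projected yearly scan volume across products. The 10 TB per DSU figure comes from this page. Rounding up to whole DSUs is an assumption made here, and the product labels and terabyte figures are made up for illustration. Confirm actual entitlement sizing with your Splunk representative.

```python
import math

TB_PER_DSU = 10  # from the text: each DSU covers 10 TB of data scanning


def dsus_needed(projected_tb_by_product):
    """Estimate DSUs for one shared pool, rounding up to whole DSUs (assumption)."""
    total_tb = sum(projected_tb_by_product.values())
    return math.ceil(total_tb / TB_PER_DSU)


# Hypothetical yearly projections, in terabytes, for two products sharing one pool.
projected = {"azure": 35, "amazon_s3": 12}
print(dsus_needed(projected))  # 47 TB in total -> 5
```

Because the pool is shared, the projections are summed before dividing, rather than rounding each product up separately.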

For more information about DSUs, see Splunk Offerings Purchase Capacity and Limitations.

Checklist of tasks to set up Federated Search for Microsoft Azure

The following checklist guides you through the cross-account setup of Federated Search for Microsoft Azure.


Step	Task	Description
1	Create a Microsoft Azure connection	A connection contains what you need to authenticate federated searches over Microsoft Azure datasets from your Splunk platform deployment. Connections can also support the sending of data from Edge Processor or Ingest Processor to an Azure dataset.
2	Define a Microsoft Azure dataset	Provide baseline information for your dataset, including its name and the URL of its Azure Data Lake Storage or Azure Blob Storage container. Link it to a connection. Determine whether the dataset is used for Data Routing and Federated Search or just Federated Search.
3	Configure Microsoft Azure dataset details	Define a Federated Search dataset that is in an Azure storage container and is backed by a Splunk Catalog. You can define the dataset's schema and partition keys yourself, or you can let Splunk software use a crawler to automatically infer the schema and partition keys.
4	Give your users role-based access control of remote datasets	After you have successfully created a Microsoft Azure dataset, give your users role-based access to it.
5	Write and run federated searches over remote datasets with SPL2	Run federated searches over your new Microsoft Azure dataset with SPL2.
