About Federated Search for Azure Databricks

Run federated searches from your Splunk platform deployment over Azure Databricks tables stored remotely in Unity Catalog using SPL2 search commands and syntax.

Note: In the Controlled Availability release stage, Splunk products may have limitations on customer access, features, maturity, and regional availability. For additional information on Controlled Availability, contact your Splunk representative.

Federated Search for Azure Databricks lets you run federated searches from your Splunk platform deployment over Azure Databricks tables stored remotely in Unity Catalog. When you run these federated searches, you'll use familiar SPL2 search commands and syntax.

Note: If you want to search datasets located in Microsoft Azure Data Lake Storage and Azure Blob Storage containers, see About Federated Search for Microsoft Azure.

Connections and datasets

Federated Search for Azure Databricks is part of the Data Management app, where you set up your federated search experience by defining connections and datasets.
Connection
An Azure Databricks connection defines how Splunk software securely authenticates a link between your Splunk Cloud Platform deployment and one or more Unity Catalog datasets from an Azure Databricks workspace. This authentication is facilitated through Databricks Delta Sharing. Azure Databricks connections are reusable and can be associated with multiple Azure Databricks datasets. Azure Databricks connections do not specify what data is searchable.
Dataset
An Azure Databricks dataset is defined by the three-level namespace (Share, Schema, and Table) that identifies a Unity Catalog data object that has been shared through Azure Databricks Delta Sharing. When you invoke a dataset name in a federated search, the search runs over the Unity Catalog data object specified by the dataset definition. Each dataset must be associated with a single connection. A connection can be associated with multiple datasets.
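The relationship between connections and datasets can be sketched in a few lines of Python. This is only an illustration of the model described above, not part of any Splunk or Databricks API: the DatasetRef class, the dotted "share.schema.table" notation, the group_by_connection helper, and all example names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical holder for the three-level namespace of a shared table."""
    share: str
    schema: str
    table: str

    @classmethod
    def parse(cls, qualified_name):
        # Assumption: the three levels are written as "share.schema.table".
        parts = [part.strip() for part in qualified_name.split(".")]
        if len(parts) != 3 or not all(parts):
            raise ValueError(
                f"expected 'share.schema.table', got {qualified_name!r}"
            )
        return cls(*parts)


def group_by_connection(dataset_to_connection):
    """Invert a dataset -> connection mapping into connection -> datasets.

    Keying the input by dataset name means each dataset can point at only
    one connection, while one connection may collect several datasets.
    """
    grouped = {}
    for dataset, connection in dataset_to_connection.items():
        grouped.setdefault(connection, []).append(dataset)
    return grouped


# Example with made-up names: two datasets reuse the same connection.
ref = DatasetRef.parse("sales_share.retail.orders")
links = group_by_connection({
    "orders_ds": "databricks_conn",
    "returns_ds": "databricks_conn",
})
```

Keeping the dataset name as the dictionary key mirrors the rule that a dataset belongs to exactly one connection.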

What you need to get started

To get started with federated search of Azure Databricks data, you need the following:
  • A Splunk Cloud Platform (SCP) deployment.

  • A user account on the SCP deployment that has a role with the edit_datasets and edit_federated_providers capabilities. See Define roles on the Splunk platform with capabilities in the Splunk Cloud Platform Manage Users and Security manual.

  • Access to an Azure Databricks workspace with a runtime of 11.3 LTS or higher that contains the data you want to share and that is assigned to a Unity Catalog metastore. See Enable Unity Catalog for a workspace.

Activate Federated Search for Azure Databricks

To activate Federated Search for Azure Databricks for your Splunk Cloud Platform deployment, contact your Splunk sales representative. As part of this activation, you acquire a data scan entitlement that is based on the amount of remote data, in terabytes, that you are projected to search over the upcoming year. Data scan entitlements are made up of Data Scan Units (DSUs). Each DSU is equivalent to 10 TB of data scanning capacity.

Note: If you are an existing user of Federated Search for Amazon S3 or Federated Analytics, apply for access to Federated Search for Azure Databricks through the VOC portal.

You have one pool of DSUs that you share between the federated search products you use. For example, if you use both Federated Search for Azure Databricks and Federated Search for Amazon S3, you will share one pool of DSUs among the searches you run for both products.
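As a rough planning aid, the arithmetic behind a shared DSU pool can be sketched as follows. The 10 TB per DSU figure comes from the text above. The dsus_needed helper, the product keys, the projected volumes, and the choice to round up to whole DSUs are assumptions made for illustration, so confirm actual sizing with your Splunk representative.

```python
import math

TB_PER_DSU = 10  # from the text: each DSU covers 10 TB of data scanning


def dsus_needed(projected_tb_by_product):
    """Estimate DSUs for one shared pool across federated search products.

    Assumption: DSUs are counted in whole units, so the total projected
    scan volume is rounded up to the next multiple of 10 TB.
    """
    if any(tb < 0 for tb in projected_tb_by_product.values()):
        raise ValueError("projected scan volumes must not be negative")
    total_tb = sum(projected_tb_by_product.values())
    return math.ceil(total_tb / TB_PER_DSU)


# Made-up yearly projections, in terabytes, for two products that
# draw from the same pool: 35 TB in total, so 4 DSUs when rounding up.
estimate = dsus_needed({"azure_databricks": 12, "amazon_s3": 23})
```

Summing before rounding reflects the single shared pool: rounding each product separately would give 2 + 3 = 5 DSUs for the same projections.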

For more information about DSUs, see Splunk Offerings Purchase Capacity and Limitations.

Checklist of tasks to set up Federated Search for Azure Databricks

Use this checklist to guide you through the setup of Federated Search for Azure Databricks.

Step | Task | Description
1 | Create an Azure Databricks connection | You upload a Delta Sharing credentials file to a connection to give it the ability to authenticate federated searches over Azure Databricks datasets from your Splunk platform deployment.
2 | Define an Azure Databricks dataset | When combined with a connection, a dataset provides the ability to run searches over a specific Unity Catalog table in your Azure Databricks workspace.
3 | Give your users role-based access control of remote datasets | After you have successfully created an Azure Databricks dataset, give your users role-based access to it.
4 | Write and run federated searches over remote datasets with SPL2 | Run federated searches over your new Azure Databricks dataset with SPL2.
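
Before step 1, it can help to sanity-check the Delta Sharing credentials file locally. The sketch below assumes the bearer-token profile format from the open Delta Sharing protocol, a JSON object with shareCredentialsVersion, endpoint, and bearerToken fields plus an optional expirationTime. The check_credentials_text helper and the sample values are hypothetical, and the check is not something the Splunk platform requires or performs in this form.

```python
import json
from datetime import datetime, timezone

# Fields of a bearer-token Delta Sharing profile (assumed format).
REQUIRED_FIELDS = ("shareCredentialsVersion", "endpoint", "bearerToken")


def check_credentials_text(text, now=None):
    """Return a list of problems found in a profile; empty if none."""
    try:
        profile = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(profile, dict):
        return ["top-level JSON value must be an object"]

    problems = [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS
        if name not in profile
    ]

    expiry = profile.get("expirationTime")
    if expiry:
        now = now or datetime.now(timezone.utc)
        try:
            # Older Python versions do not accept a trailing "Z".
            expires = datetime.fromisoformat(str(expiry).replace("Z", "+00:00"))
        except ValueError:
            problems.append("expirationTime could not be parsed")
        else:
            if expires.tzinfo is None:
                expires = expires.replace(tzinfo=timezone.utc)
            if expires <= now:
                problems.append("token has expired")
    return problems


# Placeholder values only; a real file comes from your Databricks workspace.
sample = json.dumps({
    "shareCredentialsVersion": 1,
    "endpoint": "https://example.invalid/delta-sharing/",
    "bearerToken": "<token>",
    "expirationTime": "2030-01-01T00:00:00.000Z",
})
fixed_now = datetime(2025, 1, 1, tzinfo=timezone.utc)
issues = check_credentials_text(sample, now=fixed_now)
```

Returning a list of problems rather than raising on the first one lets you see every issue with the file in a single pass.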