Define an Azure Databricks dataset

Define an Azure Databricks dataset in the Data Management app to provide federated search access to a specific Unity Catalog table or view.

Note: In the Controlled Availability release stage, Splunk products may have limitations on customer access, features, maturity, and regional availability. For additional information on Controlled Availability, contact your Splunk representative.

After you define an Azure Databricks connection, you define Azure Databricks datasets for use in federated searches. Each Azure Databricks dataset you define lets you run federated searches over a specific Unity Catalog table or view.

Prerequisites

  • Your Splunk Cloud Platform deployment user account must have a role with the edit_datasets and edit_federated_providers capabilities. See Define roles on the Splunk platform with capabilities in Securing Splunk Cloud Platform.
  • You must have an Azure Databricks connection with an uploaded credential file that provides access to a Delta Lake share containing one or more Unity Catalog tables or views. See .

Steps

  1. In the Data Management app, on the Datasets page, select Create dataset.
  2. On the Select data store page, choose Azure Databricks, then select Next.
  3. On the Configure connection page, do one of the following things:
    • If a suitable connection already exists for this dataset, select it from the Associated connection drop-down list and select Next.
    • If a suitable connection does not already exist, select Create connection. You are prompted to navigate away from the current screen to create a new connection. See Create an Azure Databricks connection. When you have successfully created a new connection, select Next.
  4. On the Define dataset page, provide values for the following fields and select Next:
    • Dataset name: Supply a unique name for your dataset. The dataset name can contain only alphanumeric characters, underscores, and hyphens.
    • Dataset description: (Optional) Provide a description for your dataset.
  5. On the Configure dataset page, specify the Unity Catalog table that you want to run federated searches over with this dataset. This table must be included in the Delta Lake share that is associated with the connection for this dataset. To correctly identify the table, provide the following values:
    1. Share name: Provide the exact name of the Delta Lake share that the table belongs to, as specified in Azure Databricks.
    2. Schema name: Provide the exact name of the Unity Catalog schema that contains the table that you want to search. The schema must be included in the Delta Lake share.
    3. Table name: Provide the name of the Unity Catalog table that you want to run federated searches against. The table must be contained in the schema.
  6. (Optional) Select Define the time field if your dataset contains time-series data and you intend to use time-based fields and functions when you run searches over it.

    If you select Define the time field, you must identify the Time field, Time format, and Unix time field. These settings identify the time field in your dataset, provide its time format, and indicate the Unix time field alias you want to use in your federated searches. Splunk software cannot identify the time field in your remote dataset without your assistance.

    For more information, see Identify the time field in an Azure Databricks dataset.

  7. Select Next.
  8. On the Review page, review your dataset definition. If the details appear correct, select Create Dataset to create your dataset.
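
It can help to sanity-check the values for steps 4 through 6 before you enter them in the app. The following Python sketch is illustrative only and is not part of any Splunk or Azure Databricks API. The DatasetDraft class and the validate and to_unix_time functions are hypothetical names. The name check assumes that "alphanumeric" means ASCII letters and digits. The time conversion uses Python strptime directives only to show how a time format relates a timestamp string to Unix time; the Time format syntax that Splunk accepts may differ, so consult Identify the time field in an Azure Databricks dataset for the supported syntax.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumption: "alphanumeric" is read as ASCII letters and digits.
_DATASET_NAME = re.compile(r"[A-Za-z0-9_-]+")


@dataclass(frozen=True)
class DatasetDraft:
    """Hypothetical container for the values entered in steps 4 and 5."""
    dataset_name: str
    share_name: str
    schema_name: str
    table_name: str


def validate(draft: DatasetDraft) -> list[str]:
    """Return a list of problems found in the draft; an empty list means none."""
    problems = []
    # Step 4: only alphanumeric characters, underscores, and hyphens.
    if not _DATASET_NAME.fullmatch(draft.dataset_name):
        problems.append("dataset name has characters other than letters, digits, '_' or '-'")
    # Step 5: share, schema, and table must be exact names, so flag
    # empty values and stray leading or trailing whitespace.
    for label, value in (
        ("share name", draft.share_name),
        ("schema name", draft.schema_name),
        ("table name", draft.table_name),
    ):
        if not value or value != value.strip():
            problems.append(f"{label} is empty or has surrounding whitespace")
    return problems


def to_unix_time(value: str, time_format: str) -> int:
    """Parse a timestamp string and return Unix seconds.

    Assumption: the timestamp carries no zone information and is treated as UTC.
    """
    parsed = datetime.strptime(value, time_format).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


if __name__ == "__main__":
    draft = DatasetDraft("sales_2024-eu", "my_share", "my_schema", "my_table")
    print(validate(draft))
    print(to_unix_time("2024-01-01 00:00:00", "%Y-%m-%d %H:%M:%S"))
```

A check like this only catches formatting mistakes. It cannot confirm that the share, schema, or table actually exists in Azure Databricks, so the Review page in step 8 remains the authoritative check.
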
After you create your Azure Databricks dataset, there are two things you should do: