Define a Microsoft Azure dataset

After you define a Microsoft Azure connection, define Microsoft Azure datasets for use in federated searches.

Note: In the Controlled Availability release stage, Splunk products may have limitations on customer access, features, maturity, and regional availability. For additional information on Controlled Availability, contact your Splunk representative.

This task guides you through the preliminary definition steps for a Microsoft Azure dataset. These steps apply to all Microsoft Azure datasets that you define with the Data Management app. In these steps, you identify the Microsoft Azure connection that the dataset is associated with, provide an Azure storage container URL that points to the data that you want to search with the dataset, and indicate whether the dataset can be used for both data routing and federated search, or only for federated search.

Data routing and federated search datasets support the "send data from Ingest Processor" and "send data from Edge Processor" abilities. When Ingest Processor or Edge Processor is configured to send data to a Microsoft Azure dataset, that dataset can also support federated searches.

Federated search datasets support only the "run federated search" ability.

Prerequisites

  • Your Splunk Cloud Platform deployment user account must have a role with the edit_datasets and edit_federated_providers capabilities. See Define roles on the Splunk platform with capabilities in the Splunk Cloud Platform Manage Users and Security manual.
  • You must have a Microsoft Azure storage account that is of the standard Storage V2 (General Purpose V2) storage account type. Other types of Azure storage accounts are not supported.
  • You must have a Microsoft Azure dataset within that storage account, located in an Azure Data Lake Storage or Azure Blob Storage container.

Steps

  1. In the Data Management app, on the Datasets page, select Create dataset.
  2. On the Select data store page, choose Microsoft Azure, then select Next.
  3. On the Configure connection page, do one of the following things:
    • If a suitable Microsoft Azure connection already exists for this dataset, select it from the Associated connection drop-down list and select Next.
    • If a suitable Microsoft Azure connection does not already exist for this dataset, select Create connection. You are prompted to navigate away from the current screen to create a new connection. See Create a Microsoft Azure connection.
  4. On the Define dataset page, provide values for the following fields.
    • Dataset name: Supply a unique name for your dataset. The dataset name can contain only alphanumeric characters, underscores, and hyphens.
    • Dataset description: (Optional) Provide a description for your dataset.
    • Azure container URL: Provide the Azure storage container URL that you want to search. This URL is a path to a directory in the container. This path cannot end with a file name.

    Note: A valid Azure container URL follows this format: https://storage_account_name.blob.core.windows.net/container_name/path_to_directory
  5. Is your storage account hierarchical or flat?: Specify whether the dataset you want to search is in an Azure Data Lake Storage account, which uses hierarchical namespace storage, or in an Azure Blob Storage account, which uses flat namespace storage.
  6. Usage: Select Federated search.

    The Data routing and federated search option applies only to datasets that collect data sent to Microsoft Azure from your Splunk Cloud Platform deployment using Edge Processor, Ingest Processor, or a combination of both services. Once you set up the data routing details for this dataset, you can optionally enable federated search for it, which means that you can use the same dataset to write data to and read data from an Azure Data Lake Storage or Azure Blob Storage account.

    Note: Data routing and federated search datasets currently do not support federated search when their Output schema is set to Splunk HTTP Event Collector (HEC).
  7. Select Next to move on to the Configure dataset step.
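
Before you enter values on the Define dataset page, it can help to check them against the rules described in the steps above. The following Python sketch is illustrative only and is not part of the Data Management app. The function names are hypothetical. It assumes that "alphanumeric" means ASCII letters and digits, that the container URL must use https and the blob.core.windows.net host shown in the documented format, and that a final path segment with a short extension (such as .json) is a file name. That last check is a heuristic, because the documentation does not define how a file name is detected.

```python
import re
from typing import List
from urllib.parse import urlparse

# Assumption: "alphanumeric" is read as ASCII letters and digits.
DATASET_NAME_RE = re.compile(r"[A-Za-z0-9_-]+")

# Host suffix taken from the documented container URL format.
BLOB_HOST_SUFFIX = ".blob.core.windows.net"

# Heuristic (assumption): a trailing ".ext" of 1-5 characters looks like a file name.
FILE_LIKE_RE = re.compile(r"\.[A-Za-z0-9]{1,5}$")


def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name uses only letters, digits, underscores, and hyphens."""
    return DATASET_NAME_RE.fullmatch(name) is not None


def check_container_url(url: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    parsed = urlparse(url)

    if parsed.scheme != "https":
        problems.append("URL should start with https://")

    # The storage account name is whatever precedes the documented host suffix.
    host = parsed.hostname or ""
    account = host[: -len(BLOB_HOST_SUFFIX)] if host.endswith(BLOB_HOST_SUFFIX) else ""
    if not account:
        problems.append("host should be storage_account_name" + BLOB_HOST_SUFFIX)

    # The first path segment is the container; the rest is the directory path.
    segments = [s for s in parsed.path.split("/") if s]
    if not segments:
        problems.append("URL should include a container name")
    elif len(segments) > 1 and FILE_LIKE_RE.search(segments[-1]):
        problems.append("path appears to end with a file name")

    return problems
```

For example, check_container_url("https://myaccount.blob.core.windows.net/mycontainer/logs") returns an empty list, while a URL that ends in events.json is flagged. The account, container, and path names here are placeholders. A URL that passes these checks can still be rejected by the product.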
Next, configure the details of your dataset. See Configure Microsoft Azure dataset details.