Create a Microsoft Azure dataset for Ingest Processor pipelines

Create a Microsoft Azure dataset in the Data Management app to define the Azure storage container that your pipelines send data to.

To send data from Ingest Processor to an Azure Blob Storage container or an Azure Data Lake Storage container, you must create a Microsoft Azure dataset in the Data Management app on Splunk Cloud Platform. You can then use the dataset as a pipeline destination.

You can optionally configure the dataset to also support federated searches, so that you can use the same dataset to write data to and read data from Microsoft Azure.

The dataset uses a Microsoft Azure connection for authentication. You can create multiple datasets that use the same connection.

Before you create the dataset, confirm that you meet the following prerequisites:
- Your Splunk Cloud Platform deployment must be on version 10.4.2604 or higher.
- Your user account on the Splunk Cloud Platform deployment must have the edit_datasets and admin_all_objects capabilities. For more information, see the following pages:
  - Manage users for the Ingest Processor solution
  - Define roles on the Splunk platform with capabilities in the Splunk Cloud Platform Manage Users and Security manual
- You must have a Microsoft Azure connection that authenticates to the Azure storage container that you want the dataset to represent. For more information, see Create a Microsoft Azure connection for Ingest Processor pipelines.

In Splunk Cloud Platform, select Data Management from the Apps panel.
Navigate to the Datasets page, and then select Create dataset.
On the Select data store page, select Microsoft Azure, and then select Next.
On the Configure connection page, do one of the following:
- If you have already created the necessary Microsoft Azure connection, select it from the Associated connection drop-down list and then select Next.
- If you have not created the connection yet, select Create connection. You are prompted to navigate away from the current screen to create the connection. See Create a Microsoft Azure connection for Ingest Processor pipelines for more information.

On the Define dataset page, configure the following options, and then select Next:


Option name	Configuration instructions
Dataset name	Enter a unique name for your dataset.
Dataset description	(Optional) Enter a description for your dataset.
Azure container URL	Enter the URL of the Azure storage container that you want to send data to. The URL must include the path to a directory in the container and cannot end in a file name. Use the following format: `https://storage_account_name.blob.core.windows.net/container_name/path_to_directory`
Is your storage account hierarchical or flat?	Specify whether you are using this dataset to send data to Azure Data Lake Storage (a storage account with a hierarchical namespace) or to Azure Blob Storage (a flat namespace).
Usage	This option is available only if your Splunk Cloud Platform deployment has access to Federated Search. Select Data routing and federated search. Note: If you set Usage to Federated search, the dataset can be used only in federated searches and cannot be used as a pipeline destination.
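If you want to sanity-check an Azure container URL before entering it, the following Python sketch checks a value against the format described in the Azure container URL row. It is a minimal sketch, not part of Splunk Cloud Platform: the function name `check_container_url` is hypothetical, the account and container naming rules are taken from the Azure Storage naming conventions, and treating a final path segment that contains a period as a file name is only a heuristic.

```python
import re
from urllib.parse import urlparse

# Azure Storage naming rules: account names are 3-24 lowercase letters or digits;
# container names are 3-63 lowercase letters, digits, or hyphens, start and end
# with a letter or digit, and never contain consecutive hyphens.
ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
CONTAINER_RE = re.compile(r"(?=.{3,63}\Z)[a-z0-9]+(?:-[a-z0-9]+)*")
BLOB_SUFFIX = ".blob.core.windows.net"


def check_container_url(url: str) -> list:
    """Hypothetical helper: return a list of problems (empty means the URL looks valid)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("scheme must be https")

    host = parsed.hostname or ""  # urlparse lowercases the host name
    if not host.endswith(BLOB_SUFFIX):
        problems.append(f"host must end with {BLOB_SUFFIX}")
    elif not ACCOUNT_RE.fullmatch(host[: -len(BLOB_SUFFIX)]):
        problems.append("invalid storage account name")

    # Split the path into the container name plus directory segments, ignoring
    # empty segments produced by leading or trailing slashes.
    segments = [s for s in parsed.path.split("/") if s]
    if not segments or not CONTAINER_RE.fullmatch(segments[0]):
        problems.append("missing or invalid container name")
    if len(segments) < 2:
        problems.append("URL must include a directory path inside the container")
    elif "." in segments[-1]:
        # Heuristic only: a final segment with a period is treated as a file name.
        problems.append("URL appears to end in a file name")
    return problems


print(check_container_url(
    "https://mystorageacct.blob.core.windows.net/ingest-data/pipelines/firewall"))
# prints []
```

A URL that stops at the container, such as `https://mystorageacct.blob.core.windows.net/ingest-data`, is reported as missing a directory path, which matches the requirement that the URL include a path to a directory.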

On the Configure dataset page, configure the following Data routing options:


Option name	Configuration instructions
Output schema	In Splunk Cloud Platform version 10.5.2605 and higher, this option is hardcoded to Pipeline output and cannot be changed. If you are using Splunk Cloud Platform version 10.4.2604, then set this option to Pipeline output. Note: Avoid selecting Splunk HTTP Event Collector (HEC), especially if you intend to run federated searches on this dataset. Schema inference does not always work as expected on events that use the HEC schema.
Output format	Select the file format that you want to use to store your data in the Azure container. If you select Parquet, be aware of the following limitations: For best results, process and route your data using a pipeline that's created from a template instead of a custom-configured pipeline, because pipeline templates can ensure that the schema of the resulting Parquet output is compatible with federated searches. If you use a custom-configured pipeline that changes the schema of the events, the dataset formats the events according to the HEC event schema in order to produce Parquet output that is at least partially compatible with federated searches. The resulting events contain the top-level fields described in Event metadata in the Splunk Cloud Platform Get Data In manual.
Compression type	Select the compression format for your data.
File name prefix	(Optional) Enter a prefix for the name of the file that contains your output data in Azure. By default, file names are UUIDs (universally unique identifiers) of 32 hexadecimal digits that the system autogenerates. To make the file names more human-readable, add a prefix. For example, if the autogenerated UUID is `2f0ff66a-e87a-4af5-befb-18dcafa6012f`, then entering `financial-report` in the File name prefix field changes the resulting file name to `financial-report-2f0ff66a-e87a-4af5-befb-18dcafa6012f`.
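The naming pattern in the File name prefix row can be sketched in Python as follows. This is an illustration under stated assumptions only: `output_file_name` is a hypothetical helper, `uuid.uuid4()` stands in for whatever UUID the system actually autogenerates, and the sketch does not model any file extension or directory path that the system may add to the stored object.

```python
import uuid
from typing import Optional


def output_file_name(prefix: Optional[str] = None, file_id: Optional[str] = None) -> str:
    """Hypothetical helper that models the documented prefix-plus-UUID naming pattern."""
    # Assumption: uuid4() stands in for the UUID that the system autogenerates.
    file_id = file_id or str(uuid.uuid4())
    # When a prefix is set, it is joined to the UUID with a hyphen.
    return f"{prefix}-{file_id}" if prefix else file_id


print(output_file_name("financial-report", "2f0ff66a-e87a-4af5-befb-18dcafa6012f"))
# prints financial-report-2f0ff66a-e87a-4af5-befb-18dcafa6012f
```

Without a prefix, the helper returns the bare UUID, which corresponds to the default behavior described above.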

(Optional) To adjust the maximum number of events that this dataset can send in each batch of output data, expand Advanced settings and enter your desired maximum number of events in the Batch size field.

Note: In most cases, the default Batch size value is sufficient. The actual size of each batch can vary depending on the rate at which Ingest Processor sends data.

If your Splunk Cloud Platform deployment has access to Federated Search, then the Configure dataset page also includes a Federated search section. Configure its options as follows:


Option name	Configuration instructions
Activate	Toggle this switch to either allow or disallow federated searches for this dataset.
Maintain catalog and dataset consistency	(Optional) If federated searches are allowed for this dataset, you can have the Splunk-managed data catalog updated automatically whenever data is added to or removed from the dataset. To configure automated catalog updates, you must first create a queue in Azure Queue Storage to receive notifications, and an Azure Event Grid system topic that forwards blob lifecycle events from your Azure storage account to the queue. Then, set this option to Yes and enter the URL of the queue in the Queue URL field. For detailed instructions, see Ensure the Microsoft Azure dataset and its data catalog stay in sync in the Splunk Cloud Platform Federated Search manual. If you don't want to configure automated catalog updates, set this option to No.
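The Queue URL field expects the URL of a queue in Azure Queue Storage. As a hedged sketch, the following Python function checks that a value has the standard Azure Queue Storage endpoint form, `https://<account>.queue.core.windows.net/<queue>`, rather than, for example, the container URL used elsewhere on this page. The function name `parse_queue_url` is hypothetical, the naming rules come from Azure Queue Storage conventions, and the sketch does not confirm that the queue exists or that Event Grid delivers events to it.

```python
import re
from typing import Tuple
from urllib.parse import urlparse

# Azure naming rules: account names are 3-24 lowercase letters or digits; queue
# names are 3-63 lowercase letters, digits, or single hyphens, starting and
# ending with a letter or digit.
ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
QUEUE_RE = re.compile(r"(?=.{3,63}\Z)[a-z0-9]+(?:-[a-z0-9]+)*")
QUEUE_SUFFIX = ".queue.core.windows.net"


def parse_queue_url(url: str) -> Tuple[str, str]:
    """Hypothetical helper: return (account, queue) or raise ValueError."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not host.endswith(QUEUE_SUFFIX):
        # Rejects other endpoints, such as a blob container URL.
        raise ValueError(f"expected https://<account>{QUEUE_SUFFIX}/<queue>")
    account = host[: -len(QUEUE_SUFFIX)]
    if not ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    # The path must hold exactly one segment: the queue name.
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) != 1 or not QUEUE_RE.fullmatch(segments[0]):
        raise ValueError("path must contain exactly one valid queue name")
    return account, segments[0]


print(parse_queue_url("https://mystorageacct.queue.core.windows.net/splunk-catalog-events"))
# prints ('mystorageacct', 'splunk-catalog-events')
```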

Select Next.
On the Review page, ensure that all the entered information is correct, and then select Create Dataset to create your dataset.

You now have a Microsoft Azure dataset that can access the data in your Azure container.

To send data from Ingest Processor to your Azure container, create a pipeline that uses the Microsoft Azure dataset as a destination. Then, apply the pipeline to Ingest Processor. For more information, see the following pages:

For information about running federated searches on Microsoft Azure datasets, see Run federated searches over Microsoft Azure datasets in the Splunk Cloud Platform Federated Search manual.
