Ensure the Microsoft Azure dataset and its data catalog stay in sync

Arrange for Splunk software to automatically update your data catalog whenever data is added to or removed from the Microsoft Azure dataset it represents.

Note: In the Controlled Availability release stage, Splunk products may have limitations on customer access, features, maturity, and regional availability. For additional information on Controlled Availability, contact your Splunk representative.

Federated searches must run over a data catalog that represents the data you want to search in Microsoft Azure. When you create a dataset definition, Splunk software creates this data catalog for you. The data catalog is based on the contents of the Azure container URL that you indicate on the Define dataset page.

You can optionally arrange for Splunk software to automatically update this data catalog whenever data is added to or removed from the dataset that the catalog represents. Federated searches that use data catalogs that are out of sync with their Microsoft Azure datasets return incorrect results.
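
To illustrate why an out-of-sync catalog returns incorrect results, the following minimal Python sketch models a data catalog as a set of blob URLs and applies blob lifecycle events to it. It is an illustration only and does not represent how Splunk software implements its data catalog. The apply_event function, the example storage account, and the sample URLs are hypothetical. The eventType values and the data.url field follow the Azure Event Grid schema for Blob Storage events.

```python
# Illustrative model only: a "catalog" here is just a set of blob URLs.
BLOB_CREATED = "Microsoft.Storage.BlobCreated"
BLOB_DELETED = "Microsoft.Storage.BlobDeleted"


def apply_event(catalog, event):
    """Update the catalog set in place from one blob lifecycle event."""
    url = event["data"]["url"]
    if event["eventType"] == BLOB_CREATED:
        catalog.add(url)
    elif event["eventType"] == BLOB_DELETED:
        catalog.discard(url)
    # Any other event type is ignored in this sketch.


# Hypothetical container contents at the time the catalog was created.
base = "https://examplestorage.blob.core.windows.net/logs/"
catalog = {base + "a.json"}
stale_catalog = set(catalog)  # a copy that never receives updates

# Hypothetical events: one blob is added, one is removed.
events = [
    {"eventType": BLOB_CREATED, "data": {"url": base + "b.json"}},
    {"eventType": BLOB_DELETED, "data": {"url": base + "a.json"}},
]
for event in events:
    apply_event(catalog, event)

print(sorted(catalog))        # catalog that followed the events
print(sorted(stale_catalog))  # catalog that did not
```

In this sketch, the stale copy still lists a blob that was deleted and misses the blob that was added, which is the kind of mismatch that leads a search to return incorrect results.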

Note: If the contents of your Microsoft Azure dataset are static and you do not expect them to change, you do not need to keep the data catalog in sync with future changes to the dataset. You can skip this topic and return to configuring your dataset. See Configure Microsoft Azure dataset details.

Prerequisites

  • You must have access to the Microsoft Azure Portal. Keep the Portal open in a separate browser tab throughout this process, so you can easily retrieve the Queue URL and paste it into the Configure dataset page for your Microsoft Azure dataset.
  • You must have an existing Azure storage account that is Storage V2/General Purpose V2 and uses locally redundant storage (LRS).
  • You must have sufficient permissions to make changes to the Azure storage account and the resource group to which the storage account belongs. You must have a Storage Account Contributor role assignment on the storage account. You must have an Event Grid Contributor role assignment on the resource group.
  • You must have the name of the app registration that was created for the connection that your dataset is associated with. If you do not know the app registration name for the connection, follow these steps:
    1. Go to the Connections listing page in the Data Manager app and select the connection your dataset is using.
    2. Copy the Tenant ID for the connection.
    3. In another browser tab, open the Microsoft Azure Portal and search for "app registrations" to get a link to the App registrations page.
    4. Put the Tenant ID for your connection in the filter to bring up the app registration for your connection. Make note of its name.

Tasks

To set up this automatic update capability for the Splunk-native data catalog that backs your Microsoft Azure dataset, you need to go to the Microsoft Azure Portal and perform the following tasks:

  1. Create an Azure Storage Queue to receive blob event messages from Event Grid. When you do this, copy the queue URL so that you can paste it into the Queue URL field on the Configure dataset page of your dataset definition.

  2. Create an Event Grid system topic that is scoped to the Azure storage account that contains the data you want to search.

  3. Create an event subscription for the Event Grid system topic. This subscription will forward blob lifecycle events (such as blobs created or blobs deleted) to the Storage Queue.

  4. Grant Event Grid permission to write to the Storage Queue.

  5. Grant a storage queue role to the app registration for the dataset, so that it can read, write, and delete queue messages.
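
As a quick check on the queue URL that you copy in task 1, the following Python sketch shows the general shape of an Azure Storage Queue URL and a simple sanity check for it. It assumes the public Azure cloud endpoint suffix queue.core.windows.net; other Azure clouds use different suffixes. The function names, storage account name, and queue name are hypothetical and are not part of any Splunk or Azure API.

```python
from urllib.parse import urlparse

# Assumption: public Azure cloud. Sovereign clouds use other suffixes.
QUEUE_ENDPOINT_SUFFIX = ".queue.core.windows.net"


def build_queue_url(storage_account, queue_name):
    """Return the URL of a queue in the given storage account."""
    return f"https://{storage_account}{QUEUE_ENDPOINT_SUFFIX}/{queue_name}"


def looks_like_queue_url(url):
    """Return True if the URL has the expected scheme, host, and one path segment."""
    parsed = urlparse(url)
    path = parsed.path.strip("/")
    return (
        parsed.scheme == "https"
        and parsed.hostname is not None
        and parsed.hostname.endswith(QUEUE_ENDPOINT_SUFFIX)
        and path != ""
        and "/" not in path
    )


# Hypothetical storage account and queue names.
queue_url = build_queue_url("examplestorage", "catalog-events")
print(queue_url)
print(looks_like_queue_url(queue_url))
# A blob container URL is not a queue URL.
print(looks_like_queue_url("https://examplestorage.blob.core.windows.net/logs"))
```

A check like this can catch a common mix-up, such as pasting the blob container URL instead of the queue URL into the Queue URL field.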