Getting Microsoft Azure data into the Splunk platform

Introduction

Splunk offers many ways of getting Microsoft Azure resource data into Splunk Cloud. The trade-offs between ingestion types and paths involve scaling, support, security, performance, management, and cost.

When choosing an ingestion option for a resource in your organization, weigh these trade-offs, because some Azure resource types can be ingested in more than one way.

As a general rule, Data Manager is the recommended method of data ingestion for Splunk Cloud customers for supported data sources where available. Data Manager greatly reduces the time to configure cloud data sources from hours to minutes, while providing a centralized data ingestion management, monitoring and troubleshooting experience.

Throughout this document we will discuss the different architectures to help you choose the best solution for your use case.

General overview

An overview of getting data in from Azure sources.

Overview of Azure GDI

There are two common approaches to ingesting Microsoft Azure data: push and pull. For the purpose of this document, push refers to sending the data from a Microsoft Azure account to a Splunk endpoint, and pull refers to a Splunk component querying the data from Azure using APIs. Each approach has its own strengths and weaknesses, which are outlined below.

Push approach

The push approach uses Azure Functions to send data.

The push approach uses Azure Functions to send data to a Splunk HTTP Event Collector (HEC) endpoint. Many Azure services support Azure Storage and/or Event Hubs as a logging destination. Once a service has written data to Azure Storage or an Event Hub, an Azure Function reads the data and pushes it to a Splunk HEC endpoint.
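The final hop of this pipeline, posting an event to HEC, can be sketched as follows. This is a minimal illustration rather than the code an Azure Function would necessarily run: the host name, token, and sourcetype in the usage note are placeholders, and the helper names `build_hec_request` and `send_to_hec` are hypothetical. It relies only on the standard HEC event endpoint (`/services/collector/event`) and the `Authorization: Splunk <token>` header.

```python
import json
import urllib.request


def build_hec_request(hec_url, hec_token, event, sourcetype, event_time=None):
    """Build (but do not send) a POST request for the HEC event endpoint."""
    payload = {"event": event, "sourcetype": sourcetype}
    if event_time is not None:
        payload["time"] = event_time  # epoch seconds
    return urllib.request.Request(
        url=hec_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + hec_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_to_hec(request, timeout=10):
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, `send_to_hec(build_hec_request("https://hec.example.com:8088", token, {"message": "hello"}, "azure:sketch"))` would post one event, assuming the placeholder endpoint is reachable from Azure and presents a certificate the client trusts.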

Benefits

  • Highly scalable
  • Lower latency data ingest than the pull approach

Limitations

  • Requires a publicly accessible HEC endpoint
  • Customer-managed HEC endpoints must use an SSL certificate issued by a public certificate authority
  • Not all data sources support push based ingestion

Pull approach

Pulling data from Azure sources involves scripts that request data.

The pull approach involves running scripts that query Microsoft services for data using an API. The pull approach can be implemented with Technology Add-ons (TAs) or Data Manager. Considerations for those implementations are described in the Architecture considerations section.

Benefits

  • Simplest configuration
  • Pull architectures are typically less expensive to operate
  • Most Microsoft Azure data sources are supported by pull-based ingestion

Limitations

  • Technology Add-ons (TAs) pull data on a schedule, which can introduce ingestion latency
  • API throttling is possible with more frequent pulls or in high-volume environments
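The throttling limitation can be illustrated with a small retry loop. This sketch assumes the queried API signals throttling in a way the caller can detect (many Microsoft APIs respond with HTTP 429 and a `Retry-After` header); the `Throttled` exception and the `fetch_page` callable are hypothetical stand-ins for a real API client, not part of any Splunk add-on.

```python
import time


class Throttled(Exception):
    """Raised by the (hypothetical) API client when a request is throttled."""

    def __init__(self, retry_after=None):
        super().__init__("API throttled the request")
        self.retry_after = retry_after  # seconds suggested by the API, if any


def pull_with_backoff(fetch_page, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch_page(), waiting and retrying whenever it is throttled."""
    for attempt in range(max_attempts):
        try:
            return fetch_page()
        except Throttled as exc:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            if exc.retry_after is not None:
                delay = exc.retry_after  # honor the API's hint
            else:
                delay = base_delay * 2 ** attempt  # exponential backoff
            sleep(delay)
```

Each wait adds to the latency already introduced by the polling schedule, which is one reason high-volume sources tend to favor push-based ingestion.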

Architecture considerations

Overall Azure getting-data-in architecture considerations.

Data Manager

Data Manager configures data ingestion from a wide range of sources.

Data Manager allows administrators to automatically configure best practice data ingestion from a wide range of sources across multiple platforms. For Azure specifically, it allows customers to configure multiple data inputs at once and then automatically generates the ARM templates for efficient and consistent deployments in their Azure accounts. It is currently only available on Splunk Cloud running on AWS and is supported in all commercial regions.

Data Manager uses the best-practice approach for data inputs. Customers only need to complete some simple configuration and Data Manager will automatically generate ARM templates and provide instructions to deploy everything needed to create the data pipeline.

Data Manager allows you to create and monitor your inputs via the UI in Splunk Web. The status and health metrics for each input are automatically captured and displayed both in the list of inputs on the Data Manager home page as well as with the pre-configured dashboards.

Data Manager is the recommended ingestion method for high-volume data sources (>1TB/day), such as Event Hubs, that support pull-based ingestion.

Benefits

  • The Data Manager service offers highly scalable (autoscaling), highly available, and low-latency ingestion of data from pull-based data sources
  • The Data Manager service does not consume compute (SVC) resources for data pull operations and index-time transformations
  • Guided configuration reduces the level of effort to configure data ingestion
  • Applies Splunk best-practice (push/pull) ingestion methods
  • Built-in dashboards provide insights into ingestion input health

Limitations

  • Only available in Splunk Cloud environments on AWS in supported regions
  • Not all data sources are supported by Data Manager

Technology add-on enabled pull architectures

Splunk provides multiple add-ons for Azure and Microsoft services.

Splunk provides multiple add-ons for Azure and Microsoft services that enable data collection from Microsoft cloud data sources as well as providing CIM-compatible knowledge objects for use with any data collection method. The main add-ons for Microsoft cloud data are:

Choosing an app is primarily based on which service you are trying to ingest data from; each app has a page in its documentation that lists the relevant sourcetypes and services the app supports.

Pull collection through supported TAs can be configured on Splunk Enterprise installations, on the Inputs Data Manager (IDM) in Splunk Cloud environments deployed on the Classic Experience, and on search heads or search head clusters in Splunk Cloud environments deployed on the Victoria Experience.

Single node pull collection

A single node Splunk Enterprise, IDM, or single search head installation is suitable for low volume environments.

A single node Splunk Enterprise installation, IDM installation (Classic Experience), or single node search head installation is a validated collection architecture suitable for low volume environments.

Benefits

  • Simplified configuration and management

Limitations

  • Requires a client ID and client secret for authentication
  • Can be CPU/resource intensive when pulling large volumes of data or with many inputs
  • Use of an Inputs Data Manager or standalone ad hoc search head can create a single point of failure
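To make the authentication requirement concrete, the sketch below shows how a client ID and client secret are typically exchanged for an access token using the OAuth 2.0 client credentials flow of the Microsoft identity platform. Whether a given add-on uses exactly this request is an assumption; the function name `build_token_request` and the default scope (the Azure Resource Manager scope) are illustrative choices, not taken from the add-on documentation.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://management.azure.com/.default"):
    """Build (but do not send) a client credentials token request."""
    token_url = "https://login.microsoftonline.com/{}/oauth2/v2.0/token".format(
        urllib.parse.quote(tenant_id, safe="")
    )
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,  # assumed scope; depends on the API being queried
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The tenant ID, client ID, and client secret come from an application registration in the Azure tenant. The secret should be stored and rotated like any other credential.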

Multiple node pull collection

A multiple node Splunk Enterprise or clustered search head installation is suitable for low and high volume environments.

A multiple node Splunk Enterprise installation or clustered search head installation in Splunk Cloud Victoria Experience is a validated collection architecture suitable for low and high volume environments.

Benefits

  • Supports higher volume data ingestion than single node ingestion. For volumes in excess of 1TB/day, Data Manager pull-based ingestion (where supported) or customer-managed push-based ingestion is recommended.

Limitations

  • Input management across multiple nodes may be more complex in Splunk Enterprise deployments
  • Requires a client ID and client secret for authentication
  • Pulling large volumes, or configuring an add-on with many inputs, can be CPU/resource intensive

Customer-managed push architectures

Customer-implemented push architectures (that is, not Data Manager) allow for a more scalable and real-time collection of data from Azure.

Customer-implemented push architectures (that is, not Data Manager) allow for a more scalable and real-time collection of data from Azure, but require user-provided code in Azure Functions. The Splunk GitHub repository contains sample code for Azure Functions to integrate Microsoft data with Splunk, but this sample code is not supported.

The three sample Azure functions included in the repository support collection from:

  • Microsoft Teams
  • Azure Event Hubs
  • Azure Storage

Benefits

  • More scalable than the pull approach
  • Lower latency data ingest than the pull approach
  • Utilizes managed services, requiring less management and maintenance overhead

Limitations

  • Requires deployment of user-provided code within Azure Functions
  • Can cost more than pull-based methods for smaller data environments (generally <1TB per day)
  • Customization requires developer knowledge in one of the supported programming languages (C#, Java, JavaScript, PowerShell, or Python)
  • Not all data sources support push-based ingestion
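As an illustration of the kind of user-provided code involved, the sketch below converts one Event Hub message into a batched HEC request body. It assumes the message follows the common Azure diagnostic log shape, a JSON object with a `records` array, and falls back to treating the whole message as a single event otherwise. The function name and sourcetype are hypothetical, and this is not the sample code from the Splunk GitHub repository.

```python
import json


def records_to_hec_batch(message_body, sourcetype):
    """Turn one Event Hub message into a batched HEC request body.

    HEC accepts several event objects concatenated in a single request,
    so each record becomes its own JSON object on its own line.
    """
    parsed = json.loads(message_body)
    if isinstance(parsed, dict) and isinstance(parsed.get("records"), list):
        records = parsed["records"]  # diagnostic-log style envelope
    else:
        records = [parsed]  # assumption: anything else is one event
    lines = [
        json.dumps({"event": record, "sourcetype": sourcetype}, sort_keys=True)
        for record in records
    ]
    return "\n".join(lines).encode("utf-8")
```

Inside an Azure Function, the returned bytes would be posted to the HEC event endpoint in one request, which keeps the number of outbound calls proportional to the number of messages rather than the number of records.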

Splunk Common Information Model

Technology Add-ons are available on Splunkbase for Microsoft Azure services.

Technology Add-ons (TAs) are available on Splunkbase for Microsoft Azure services that provide data processing to correctly parse the incoming data and ensure it is compliant with Splunk's Common Information Model (CIM).

Implementation recommendations

Implementation recommendations.

Please see the Service Recommendations Lantern article for more information on methods for each service and links to documentation, blogs, and videos.
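The guidance in this document can be condensed into a rough decision sketch. The names below (`AzureSource`, `recommend_ingestion`) are hypothetical, the 1TB/day threshold is approximated as 1000 GB, and the ordering of the checks is one reading of the recommendations above rather than an official algorithm.

```python
from dataclasses import dataclass

HIGH_VOLUME_GB_PER_DAY = 1000  # approximates the 1TB/day threshold (assumption)


@dataclass
class AzureSource:
    daily_volume_gb: float
    data_manager_supported: bool
    push_supported: bool


def recommend_ingestion(source, splunk_cloud_on_aws):
    """Return a suggested ingestion method for one Azure data source."""
    # Data Manager is preferred wherever it is available and supports the source.
    if splunk_cloud_on_aws and source.data_manager_supported:
        return "Data Manager"
    high_volume = source.daily_volume_gb > HIGH_VOLUME_GB_PER_DAY
    # Above roughly 1TB/day, customer-managed push is the suggested alternative.
    if high_volume and source.push_supported:
        return "customer-managed push"
    if high_volume:
        return "multiple node add-on pull"
    return "add-on pull (single or multiple node)"
```

For example, a 50 GB/day source that Data Manager does not support would land on add-on pull, while the same source on a Splunk Cloud stack on AWS with Data Manager support would land on Data Manager.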