Getting Microsoft Azure data into the Splunk platform

Introduction

Getting Azure data sources into Splunk.

This Splunk Validated Architecture (SVA) presents validated approaches for getting data from Microsoft Azure resources into Splunk. The topics covered in this SVA apply to Splunk Cloud Platform and Splunk Enterprise products. Where applicable, limitations of platform availability are noted.

Microsoft Azure has a broad portfolio of services. The validated ingestion approaches described in this SVA address the most common services, but may not be exhaustive.

This document presents multiple options for ingesting data sources and, in many cases, a data source may be eligible for ingestion by more than one method. Choosing the most appropriate method requires consideration of customer architecture, data sources, volume, and velocity.

General overview

An overview of getting data in from Azure sources.

Overview of Azure GDI

There are two common approaches to ingesting Microsoft Azure data: push and pull. For the purpose of this document, push refers to sending the data from a Microsoft Azure account to a Splunk endpoint, and pull refers to a Splunk component querying the data from Azure using APIs. Each approach has its own strengths and weaknesses, which are outlined below.

Push approach

The push approach uses Azure Functions to send data.

The push approach uses Azure Functions to send data to a Splunk HTTP Event Collector (HEC) endpoint. Many Azure services support Azure Storage and/or Event Hubs as a logging destination. Once a service has written data to Azure Storage or an Event Hub, an Azure Function reads the data and pushes it to a Splunk HEC endpoint.
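
The final hop of this flow, handing records to HEC, can be sketched in Python as follows. This is a minimal illustration rather than the code an Azure Function would actually run: the function name `build_hec_request`, the host, the token, and the `azure:example` sourcetype are placeholders assumed for the example. The `/services/collector/event` path and the `Authorization: Splunk <token>` header are the standard HEC conventions. The sketch only builds the request so it can be inspected without a reachable endpoint.

```python
import json
import urllib.request


def build_hec_request(hec_url, hec_token, records, sourcetype):
    """Build (but do not send) an HTTP POST carrying a batch of events to HEC."""
    # HEC accepts several events in one request body as concatenated JSON objects.
    body = "\n".join(
        json.dumps({"event": record, "sourcetype": sourcetype}) for record in records
    )
    return urllib.request.Request(
        url=hec_url.rstrip("/") + "/services/collector/event",
        data=body.encode("utf-8"),
        headers={"Authorization": "Splunk " + hec_token},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint, token, record, and sourcetype for illustration only.
    request = build_hec_request(
        "https://hec.example.com:8088",
        "00000000-0000-0000-0000-000000000000",
        [{"operationName": "example", "resultType": "Success"}],
        "azure:example",
    )
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request would be a call to `urllib.request.urlopen(request)`. It is left out here because it needs a publicly reachable HEC endpoint with a trusted certificate, as noted in the limitations for this approach.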

Benefits

Highly scalable
Lower-latency data ingestion than the pull approach

Limitations

Requires a publicly accessible HEC endpoint
Customer-managed HEC endpoints must use an SSL certificate issued by a public certificate authority
Not all data sources support push-based ingestion

Pull approach

Pulling data from Azure sources involves scripts that request data.

The pull approach involves running scripts that query Microsoft services for data using an API. The pull approach can be implemented with Technology Add-ons (TAs) or Data Manager. Considerations for those implementations are described in the Architecture considerations section.
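
As a rough illustration of what such a script does on each scheduled run, the Python sketch below queries a time window and then advances a checkpoint. It is not taken from any add-on: `pull_once`, `fake_fetch`, and the one-record-per-minute data are hypothetical stand-ins for a real Microsoft API query.

```python
from datetime import datetime, timedelta, timezone


def pull_once(fetch_records, checkpoint, now):
    """Run one scheduled pull and return (records, new_checkpoint).

    fetch_records(start, end) is a hypothetical stand-in for an API query that
    returns the records with timestamps in the half-open window [start, end).
    """
    records = fetch_records(checkpoint, now)
    # The checkpoint advances only after the fetch returns, so a failed run
    # is retried over the same window on the next scheduled run.
    return records, now


def fake_fetch(start, end):
    """Stand-in data source: one record per minute in [start, end)."""
    records = []
    cursor = start
    while cursor < end:
        records.append({"time": cursor.isoformat()})
        cursor += timedelta(minutes=1)
    return records


if __name__ == "__main__":
    checkpoint = datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)
    for minute in (5, 10):
        now = datetime(2024, 1, 1, 0, minute, tzinfo=timezone.utc)
        records, checkpoint = pull_once(fake_fetch, checkpoint, now)
        print(len(records), "records up to", checkpoint.isoformat())
```

Because data is only collected when a run fires, an event can wait up to one schedule interval before it is ingested.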

Benefits

Simplest configuration
Pull architectures are typically less expensive to operate
Most Microsoft Azure data sources are supported by pull-based ingestion

Limitations

TAs pull data on a schedule, which can introduce ingestion latency
API throttling is possible with more frequent pulls or in high-volume environments
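
One common way for a pull script to cope with throttling is to retry with increasing delays. The Python sketch below assumes the service signals throttling with an HTTP 429 response and an optional Retry-After value, a convention used by many HTTP APIs. The `Throttled` exception and the `call_with_backoff` helper are hypothetical names for illustration, not part of any Splunk add-on.

```python
import time


class Throttled(Exception):
    """Raised by a hypothetical API call when the service answers HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("throttled")
        self.retry_after = retry_after


def call_with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry call() on Throttled, using the Retry-After hint when one is given."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Throttled as exc:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the throttling error.
            # Prefer the server's hint; otherwise double the wait each attempt.
            if exc.retry_after is not None:
                delay = exc.retry_after
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

The `sleep` parameter is injectable so the retry schedule can be exercised without actually waiting.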

Architecture considerations

Overall Azure getting-data-in architecture considerations.

Data Manager

Data Manager configures data ingestion from a wide range of sources.

Data Manager allows administrators to automatically configure best practice data ingestion from a wide range of sources across multiple platforms. For Azure specifically, it allows customers to configure multiple data inputs at once and then automatically generates the ARM templates for efficient and consistent deployments in their Azure accounts. Data Manager is currently available only on Splunk Cloud Platform running on AWS, where it is supported in all commercial regions.

Data Manager uses the best-practice approach for data inputs. Customers only need to complete some simple configuration and Data Manager will automatically generate ARM templates and provide instructions to deploy everything needed to create the data pipeline.

Data Manager allows you to create and monitor your inputs via the UI in Splunk Web. The status and health metrics for each input are automatically captured and displayed both in the list of inputs on the Data Manager home page and in the preconfigured dashboards.

Data Manager is the recommended ingestion method for high-volume data sources (>1TB/day), such as Event Hubs, that support pull-based ingestion.

Benefits

Data Manager offers highly scalable (autoscaling), highly available, low-latency ingestion of data from pull-based data sources
Data Manager does not consume compute (SVC) resources for data pull operations and index-time transformations
Guided configuration reduces the level of effort needed to configure data ingestion
Applies Splunk best practice (push/pull) ingestion methods
Built-in dashboards provide insight into input health

Limitations

Only available in Splunk Cloud environments on AWS in supported regions
Not all data sources are supported by Data Manager

Technology add-on enabled pull architectures

Splunk provides multiple add-ons for Azure and Microsoft services.

Splunk provides multiple add-ons for Azure and Microsoft services that enable data collection from Microsoft cloud data sources as well as providing CIM-compatible knowledge objects for use with any data collection method. The main add-ons for Microsoft cloud data are:

Choosing an app is primarily based on which service you are trying to ingest data from; each app has a page in its documentation that lists the relevant sourcetypes and services the app supports.

Pull collection through supported TAs may be configured on Splunk Enterprise installations, on the Inputs Data Manager (IDM) in Splunk Cloud environments deployed on the Classic Experience, and on search heads or search head clusters in Splunk Cloud environments deployed on the Victoria Experience.

Single node pull collection

A single node Splunk Enterprise, IDM, or single search head installation is suitable for low volume environments.

A single node Splunk Enterprise installation, IDM installation (Classic Experience), or single node search head installation is a validated collection architecture suitable for low volume environments.

Benefits

Simplified configuration and management

Limitations

Requires a client ID and client secret for authentication
Can be CPU/resource intensive when pulling large volumes of data or with many inputs
Use of an Inputs Data Manager or standalone ad hoc search head can create a single point of failure

Multiple node pull collection

A multiple node Splunk Enterprise or clustered search head installation is suitable for low and high volume environments.

A multiple node Splunk Enterprise installation or clustered search head installation in Splunk Cloud Victoria Experience is a validated collection architecture suitable for low and high volume environments.

Benefits

Supports higher-volume data ingestion than single node collection. For volumes in excess of 1TB/day, Data Manager pull-based ingestion (where supported) or customer-managed push-based ingestion is recommended.

Limitations

Requires a client ID and client secret for authentication
Pulling large volumes of data, or configuring an add-on with many inputs, can be CPU/resource intensive
Input management across multiple nodes may be more complex in Splunk Enterprise deployments

Customer-managed push architectures

Customer-implemented push architectures (that is, not Data Manager) allow for a more scalable and real-time collection of data from Azure.

Customer-implemented push architectures (that is, not Data Manager) allow for a more scalable and real-time collection of data from Azure, but require user-provided code in Azure Functions. The Splunk GitHub repository contains sample code for Azure Functions that integrate Microsoft data with Splunk, but this sample code is not supported.

The three sample Azure Functions included in the repository support collection from:

Microsoft Teams
Azure Event Hubs
Azure Storage

Benefits

More scalable than the pull approach
Lower latency data ingest than the pull approach
Utilizes managed services, requiring less management and maintenance overhead

Limitations

Requires deployment of user-provided code within Azure Functions
Can cost more than pull-based methods for smaller data environments (generally <1TB per day)
Customization requires developer knowledge in one of the supported programming languages (C#, Java, JavaScript, PowerShell, or Python)
Not all data sources support push-based ingestion
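
The volume and platform guidance in this section can be condensed into a small decision helper. The Python sketch below is a deliberately simplified reading of that guidance and not an official selection rule. The function name, its parameters, and the returned labels are assumptions made for illustration, and real decisions also depend on cost, latency needs, and the specific data source.

```python
def recommend_ingestion(volume_tb_per_day, cloud_on_aws,
                        data_manager_supports_source, source_supports_push):
    """Return a suggested ingestion architecture as a short label (illustrative only)."""
    # 1TB/day is the threshold this document uses for "high volume".
    high_volume = volume_tb_per_day > 1.0
    if not high_volume:
        # Pull through an add-on is the simplest and usually cheapest option.
        return "add-on pull collection"
    if cloud_on_aws and data_manager_supports_source:
        return "Data Manager"
    if source_supports_push:
        return "customer-managed push"
    # Fallback when neither recommended high-volume option is available.
    return "multiple node pull collection"


if __name__ == "__main__":
    print(recommend_ingestion(0.2, False, False, True))
    print(recommend_ingestion(3.0, True, True, True))
    print(recommend_ingestion(3.0, False, False, True))
```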

Splunk Common Information Model

Technology Add-ons are available on Splunkbase for Microsoft Azure services.

Technology Add-ons (TAs) are available on Splunkbase for Microsoft Azure services that provide data processing to correctly parse the incoming data and ensure it is compliant with Splunk's Common Information Model (CIM).

Implementation recommendations

Implementation recommendations.

See the article "Following best practices for ingesting data from your Azure environment", available on the Splunk Lantern Customer Success Center, for data source-specific best practice guidance on getting Azure data into the Splunk platform.
