Getting AWS data into the Splunk platform

Introduction

This Splunk Validated Architecture (SVA) presents validated approaches for getting data from Amazon Web Services (AWS) resources into Splunk. The topics covered in this SVA apply to Splunk Cloud Platform and Splunk Enterprise products. Where applicable, limitations of platform availability are noted.

AWS has a broad portfolio of services. The validated ingestion approaches described in this SVA address the most common services, but may not be exhaustive. This document presents multiple options for ingesting data from AWS sources and, in many cases, a data source is eligible for ingestion through more than one method. Choosing the most appropriate method requires consideration of customer architecture, data sources, volume, and velocity.

General overview

An overview of the approaches for getting AWS data into the Splunk platform.

There are two common approaches to ingesting data from AWS: push and pull. For the purpose of this document, push refers to sending the data from an AWS account to a Splunk endpoint, and pull refers to a Splunk component querying the data from AWS using APIs. Each approach has its own strengths and weaknesses, which are outlined below.

Push approach

The push approach typically uses Amazon Data Firehose to stream data to a Splunk HTTP Event Collector (HEC) endpoint. Many AWS services integrate either directly with Amazon Data Firehose or with Amazon CloudWatch Logs, which can in turn forward logs through Data Firehose by means of subscriptions. In cases where Data Firehose integration is not possible, an alternative approach is to deploy a script, typically as a Lambda function, to pull the required data via the AWS API and then push the data to a Splunk HEC endpoint.
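As an illustration of the script-based alternative, the following minimal Python sketch builds a batched request for the HEC event endpoint. The function name, the example host, and the sourcetype value passed by the caller are assumptions made for illustration; the `/services/collector/event` path and the `Authorization: Splunk <token>` header are the standard HEC conventions. The sketch only constructs the request, so retries, TLS settings, and error handling are left out.

```python
import json
import urllib.request


def build_hec_request(hec_url, token, events, sourcetype, index=None):
    """Build (but do not send) one batched POST request for the HEC event endpoint.

    hec_url, token, sourcetype and index are supplied by the caller; none of
    them are values taken from this document.
    """
    lines = []
    for event in events:
        # Each event is wrapped in a HEC envelope with its metadata.
        envelope = {"event": event, "sourcetype": sourcetype}
        if index is not None:
            envelope["index"] = index
        lines.append(json.dumps(envelope))
    # HEC accepts several event envelopes concatenated in one request body.
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(
        url=hec_url.rstrip("/") + "/services/collector/event",
        data=body,
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

A Lambda handler could collect records from an AWS API call, pass them to `build_hec_request`, and send the result with `urllib.request.urlopen(request, timeout=10)`.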

Benefits

Highly scalable
Tight integration between AWS services
Lower latency data ingest than the pull approach
Supports HEC indexer acknowledgment (HEC ACK) on Splunk Cloud Platform
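
For readers unfamiliar with HEC acknowledgment, the sketch below shows roughly what an acknowledgment check looks like from a custom sender. It assumes the standard HEC conventions of a per-client channel GUID sent in the `X-Splunk-Request-Channel` header and an `/services/collector/ack` endpoint that reports the status of previously returned ack IDs. The function names and example values are hypothetical, and Data Firehose performs this exchange itself when it delivers to Splunk.

```python
import json
import urllib.request


def build_ack_query(hec_url, token, channel, ack_ids):
    """Build (but do not send) a request asking HEC for the status of ack IDs."""
    return urllib.request.Request(
        url=hec_url.rstrip("/") + "/services/collector/ack",
        data=json.dumps({"acks": list(ack_ids)}).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            # The same channel GUID must accompany the original event requests.
            "X-Splunk-Request-Channel": channel,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def unacknowledged(ack_response_body, ack_ids):
    """Return the ack IDs that the response does not yet report as indexed."""
    statuses = json.loads(ack_response_body).get("acks", {})
    return [i for i in ack_ids if not statuses.get(str(i), False)]
```

A sender would typically resend any batch whose ack ID is still reported by `unacknowledged` after a timeout.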

Considerations/Limitations

Requires a publicly accessible HEC endpoint
Customer-managed HEC endpoints must use an SSL certificate issued by a public certificate authority
Not all data sources support push-based ingestion

Pull approach

The pull approach involves running scripts that query AWS for data using the AWS API. These scripts are run either on an Inputs Data Manager (IDM) (Splunk Cloud - Classic Experience), search head (Splunk Cloud - Victoria Experience) or on one or more customer deployed heavy forwarders. For incremental data pulls, the best approach is to integrate the AWS service with SQS when available. The script can then consume messages from the SQS queue and run optimized queries via the API. For some services, a common use case is to send the logs to Amazon S3 and configure S3 to publish events to SQS. Other alternative approaches are to poll the API at specified time intervals or manually configure the query parameters for an ad-hoc data pull.
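
The S3-to-SQS pattern described above can be sketched as follows. This is a minimal Python example, not the Splunk Add-on for AWS implementation: it only parses the body of one SQS message that carries an S3 event notification and returns the objects a collector would then download. The function name is hypothetical, and the handling of an SNS wrapper assumes the notification was routed through an SNS topic without raw message delivery.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs_body(body):
    """Return (bucket, key) pairs from one SQS message body holding an S3 event notification."""
    message = json.loads(body)
    # When S3 publishes through SNS to SQS, the notification is nested as a
    # JSON string in the "Message" field of the SNS envelope.
    if "Records" not in message and "Message" in message:
        message = json.loads(message["Message"])
    objects = []
    for record in message.get("Records", []):
        if record.get("eventSource") != "aws:s3":
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 notifications are URL-encoded (spaces arrive as "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects
```

Messages without a `Records` list, such as the test event S3 sends when a notification is first configured, yield an empty list and can simply be deleted from the queue.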

As with the push approach, Splunk provides two solutions that assist with the pull approach. The Splunk Add-on for AWS provides data processing for CIM compliance as well as easily configurable scripts that pull data from the most common AWS sources. Data Manager also provides some of these configurable scripts, to support use cases where pull is the preferred approach or the push approach is not available. More information on both solutions can be found in the following sections of this document.

Benefits

Capable of utilizing private networks or VPC endpoints to retrieve data
Supports bulk/replay scenarios in addition to incremental pulls

Considerations/Limitations

Uses scheduled data retrievals, which introduce ingestion latency
Can generate an uncontrolled influx of events, which may overwhelm Splunk hosts and degrade data ingestion and indexing performance, causing indexing pipelines and queues to block
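
One way a custom polling script can limit both effects is to track a checkpoint and split any backlog into bounded time windows, so that a long outage does not turn into a single very large query. The sketch below is a minimal Python illustration under that assumption; the function name and the choice of window size are hypothetical and are not taken from the Splunk Add-on for AWS.

```python
from datetime import datetime, timedelta, timezone


def poll_windows(checkpoint, now, max_window):
    """Split the interval from checkpoint to now into consecutive windows of at most max_window."""
    if max_window <= timedelta(0):
        raise ValueError("max_window must be positive")
    windows = []
    start = checkpoint
    while start < now:
        # The last window is shortened so it never extends past "now".
        end = min(start + max_window, now)
        windows.append((start, end))
        start = end
    return windows


# Example: a 25-minute backlog polled in windows of at most 10 minutes.
last_checkpoint = datetime(2024, 1, 1, tzinfo=timezone.utc)
for window_start, window_end in poll_windows(
    last_checkpoint, last_checkpoint + timedelta(minutes=25), timedelta(minutes=10)
):
    print(window_start.isoformat(), "->", window_end.isoformat())
```

After each window is retrieved and forwarded, the script would persist the window end as the new checkpoint before moving on to the next one.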

Architecture considerations

Figure: Architecture considerations for getting AWS data into the Splunk platform.

Data Manager

The Data Manager configures data ingestion from a wide range of sources.

Data Manager allows administrators to automatically configure best-practice data ingestion from a wide range of sources across multiple platforms. For AWS specifically, it allows customers to configure multiple data inputs at once and then automatically generates the CloudFormation templates for efficient and consistent deployments in their AWS accounts. It is currently available only on Splunk Cloud running on AWS and is supported in all commercial regions.

Data Manager uses the best-practice approach for data inputs. Customers only need to complete some simple configuration and Data Manager will automatically generate CloudFormation templates and provide instructions to deploy everything needed to create the data pipeline.

You can configure the inputs for one or more accounts at the same time. Data Manager also supports deploying to Organizational Units (OUs) in AWS Organizations. This provides an efficient way to configure data ingestion across an entire organization and can save customers significant time and effort.

Data Manager allows you to create and monitor your inputs via the UI in Splunk Web. The status and health metrics for each input are automatically captured and displayed both in the list of inputs on the Data Manager home page and in the pre-configured dashboards.

Benefits

Data Manager offers highly scalable, highly available, low-latency ingestion of data from pull-based data sources
Data Manager does not consume compute (SVC) resources for data pull operations (not inclusive of parsing/transformation)
Guided configuration reduces the level of effort needed to configure data ingestion
Applies Splunk best-practice (push/pull) ingestion methods
Built-in dashboards provide insights into ingestion input health

Limitations

Only available in Splunk Cloud environments on AWS in supported regions
Not all data sources are supported by Data Manager

Technology add-on enabled pull architectures

The Splunk Add-on for AWS provides a way to collect data from AWS.

The Splunk Add-on for AWS is a technology add-on (TA) that provides a way to collect data from many AWS services as well as CIM-compatible knowledge to use with other Splunk apps. It is available for download on Splunkbase and is supported for both Splunk Cloud and Splunk Enterprise.

The Add-on simplifies the management of resources for the pull approach. It provides configurable inputs for many common AWS services via the UI in Splunk Web. This eliminates the need for customers to develop, deploy and maintain these scripts themselves. Many of these inputs have options for both incremental pulls via SQS/S3 as well as polling over defined intervals.

Customers using the push approach can also benefit from the add-on, because the CIM-compatible knowledge it provides applies to both approaches. This feature removes the need for customers to develop parsing logic to properly and consistently format AWS data.

For customers on Splunk Cloud, the inputs can be configured on either their IDM (Classic Experience) or on their search head (Victoria Experience). For customers using Splunk Enterprise, the add-on is typically installed and configured on a heavy forwarder.

The Splunk Add-on for AWS also provides an API for managing account and input configuration. This provides opportunities for automation.

Benefits

Capable of utilizing private networks or VPC endpoints to retrieve data
Supports bulk/replay scenarios in addition to incremental pulls

Considerations/Limitations

Not recommended for data volumes greater than 1 TB/day
Can be resource intensive when running many scripts/inputs
Scaling the Splunk Add-on for AWS requires additional infrastructure and can become complicated to maintain in customer-managed environments
When used with Splunk Cloud, requires management of AWS security keys on a Splunk Cloud Platform hosted IDM or search head

Single node pull collection

A single node Splunk Enterprise, IDM, or single node search head installation is suitable for low volume environments.

A single node Splunk Enterprise installation, IDM installation (Classic Experience), or single node search head installation is a validated collection architecture suitable for low volume environments.

Benefits

Simple configuration and management

Limitations

Can be CPU/resource intensive when pulling large volumes of data or running many inputs, causing indexing pipelines and queues to block
Use of an Inputs Data Manager or standalone ad-hoc search head can create a single point of failure

Multiple node pull collection

A multiple node Splunk Enterprise or clustered search head installation in Splunk Cloud Victoria Experience is suitable for low and high volume environments.

A multiple node Splunk Enterprise installation or clustered search head installation in Splunk Cloud Victoria Experience is a validated collection architecture suitable for low and high volume environments.

Benefits

Supports higher volume data ingestion. Where supported, Data Manager (pull or push) or push-based ingestion is recommended for volumes in excess of 1 TB/day. A performance benchmark is available in the Splunk Add-on for Amazon Web Services documentation.

Limitations

Input management across multiple nodes may be more complex in Splunk Enterprise deployments
Pulling large volumes of data or running an add-on configuration with many inputs can be CPU/resource intensive, causing indexing pipelines and queues to block

Customer-managed push architectures

A push architecture in which the customer configures and manages all components.

The push approach typically uses Amazon Data Firehose to stream data to a Splunk HTTP Event Collector (HEC) endpoint. Many AWS services integrate either directly with Amazon Data Firehose or with Amazon CloudWatch Logs, which can in turn forward logs through Data Firehose by means of subscriptions. The key difference between this customer-managed approach and Data Manager is that, in the customer-managed push architecture, an AWS administrator must configure and manage all components manually (or build the automation themselves) without the aid of the CloudFormation templates provided by Data Manager. Similar data ingestion can be achieved, but it requires manual effort.

In cases where Data Firehose integration is not possible, an alternative approach is to deploy a script, typically as a Lambda function, to pull the required data via the AWS API and then push the data to a Splunk HEC endpoint.
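
When CloudWatch Logs feeds Data Firehose through a subscription, each record normally arrives as a base64-encoded, gzip-compressed JSON document containing a batch of log events. The following minimal Python sketch shows how a customer-managed transformation step might unpack one such record into HEC-style event envelopes. The function name and the way `source` is composed are assumptions for illustration; a real Firehose transformation Lambda would also have to return records in the response format Firehose expects, which is omitted here.

```python
import base64
import gzip
import json


def decode_cloudwatch_record(data_b64):
    """Turn one CloudWatch Logs subscription record into a list of HEC-style envelopes."""
    payload = json.loads(gzip.decompress(base64.b64decode(data_b64)))
    # CONTROL_MESSAGE records are connectivity checks and carry no log data.
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return [
        {
            # CloudWatch timestamps are epoch milliseconds; HEC expects seconds.
            "time": log_event["timestamp"] / 1000.0,
            # Assumption: log group and stream are joined to form the source.
            "source": payload["logGroup"] + ":" + payload["logStream"],
            "event": log_event["message"],
        }
        for log_event in payload["logEvents"]
    ]
```

Because one incoming record can contain many log events, a transformation step like this usually emits several Splunk events per Firehose record.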

Benefits

More scalable than the pull approach; primarily constrained by the ingestion throughput capacity of the indexing infrastructure
Lower latency data ingest than the pull approach
Utilizes managed services, requiring less management and maintenance overhead

Limitations

Can cost more than pull-based methods for smaller data environments (generally less than 1 TB per day)
Customization requires developer knowledge of one of the supported programming languages
Data sources may require custom parsing for proper formatting and consistency when not using the Splunk Add-on for AWS
Not all data sources support push-based ingestion

Implementation recommendations

More information is found in the Service Recommendations Lantern article.

For data source-specific best practice guidance on getting AWS data into the Splunk platform, see the article "Following best practices for ingesting data from your AWS environment", available on the Splunk Lantern Customer Success Center.
