Create an S3 destination

To write events to a remote storage volume, select a preconfigured S3 destination when you configure the "Route to Destination" rule. You can write to multiple S3 destinations. The "Immediately send to" field has a typeahead capability that displays all preconfigured S3 destinations.

Note: You must configure an S3 remote storage destination before using the destination in a "Route to Destination" rule.

You configure and validate S3 destinations through the Destinations tab on the Data Ingest page. Select New Destination and fill out the fields, following the examples provided there. You can create multiple S3 destinations.

Note: The bucket you designate as the S3 remote storage destination must be used only by ingest actions. Do not share buckets with other tools, such as SmartStore or edge processors.

You can create a maximum of eight S3 destinations. If a ruleset routes to a destination that is invalid or does not exist, the Splunk platform instance blocks all queues and pipelines rather than drop data.
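
If data stops flowing after a ruleset change, one way to confirm that queues are blocked is to check metrics.log in the _internal index. The following search is a sketch, and the exact field values can vary by version:

  index=_internal source=*metrics.log* group=queue blocked=true
  | stats count by name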

Note: In the case of heavy forwarders managed through a deployment server, S3 destinations must be configured on each heavy forwarder individually, not on the deployment server.

Partition events

When creating an S3 destination, you can define a partitioning schema for events based on timestamp and optionally source type. The events then flow into a directory structure based on the schema.

Go to the "Partitioning" section of the New Destination configuration. You can choose a partitioning schema through the drop-down menu. The choices are:

  • Day (YYYY/MM/DD)
  • Month (YYYY/MM)
  • Year (YYYY)
  • Legacy
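
For example, with a hypothetical destination path of s3://mybucket/ia, events with a timestamp of March 15, 2024 land under the following paths:

  s3://mybucket/ia/2024/03/15/   (Day)
  s3://mybucket/ia/2024/03/      (Month)
  s3://mybucket/ia/2024/         (Year)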

The legacy setting is for use with pre-9.1 destinations only. With legacy, for each batch (2MB by default), the latest event timestamp in the batch determines the folder, using the format "YYYY/MM/DD". However, unlike the true partitioning options such as "day", the folder can also contain events with other timestamps, because a batch is not split by timestamp.

For destinations created before 9.1, "legacy" is the default. For destinations created in 9.1 or higher, "day" is the default.

You can also set source type as a secondary key. However, if you are using federated search for Amazon S3 with the AWS Glue Data Catalog integration, you need to make sure that your Glue Data Catalog tables do not include a duplicate entry for the sourcetype column.

For details on the partitioning methods and examples of the resulting paths, see the partitionBy setting in outputs.conf.

Use KMS encryption (Splunk Cloud Platform only)

You can employ SSE-KMS encryption when using ingest actions to write data to customer-owned S3 buckets. This capability is enabled through the configuration of AWS cross-account IAM roles.

CAUTION: Take note of the following critical points:
* You are assuming ownership and full responsibility for the integrity and ongoing availability of your AWS KMS key.
* The KMS key is required for encrypting Splunk data in real time.
* Loss of access to the KMS key can result in service interruption and/or permanent loss of data access by all parties (AWS, Splunk, and you).
* Unauthorized access to the KMS key can result in accidental or deliberate key operations (such as key deactivation or deletion) that could lead to service disruption or permanent loss of data access by all parties (AWS, Splunk, and you).
* You must maintain privileged access to the KMS key for Splunk through Splunk-mandated key policy definitions.
* Keys must be in the same region as their Splunk Cloud stack. Multi-region keys are not supported.
* Key aliases are not supported.

To enable KMS encryption, create the SplunkIngestActions IAM role in your AWS account:

  1. Go to the IAM roles section in the AWS console.
  2. Create a role with the exact name "SplunkIngestActions".
  3. Edit the permissions section for that role by adding an inline policy and overwriting the existing JSON with the JSON created through the Generate Permission Policy button in the Splunk ingest actions UI. You can edit that JSON text as needed for your organization.
  4. Edit the trust relationship section by overwriting the existing JSON with the JSON created through the Generate Trust Policy button in the Splunk ingest actions UI. You can edit this JSON text as needed for your organization.
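
The JSON generated by those buttons is authoritative for your stack. As a rough illustration only, the trust policy typically follows the standard AWS cross-account shape shown below, where the account ID and external ID are placeholders:

  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": { "AWS": "arn:aws:iam::<splunk-cloud-account-id>:root" },
        "Action": "sts:AssumeRole",
        "Condition": {
          "StringEquals": { "sts:ExternalId": "<external-id>" }
        }
      }
    ]
  }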

Perform advanced configurations with outputs.conf

While the Destinations tab on the Data Ingest page can handle most common S3 configuration needs, some advanced configurations require you to directly edit outputs.conf, using the rfs stanza.

For a complete list of rfs settings, see Remote File System (RFS) Output. The remote file system settings and options for S3 are similar to the SmartStore S3 configuration.
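
For example, a minimal rfs stanza might look like the following sketch. The destination name, bucket, and prefix are hypothetical, and batchSizeThresholdKB is shown at its 2MB default; confirm setting names and defaults against the outputs.conf reference:

  [rfs:my_s3_destination]
  path = s3://mybucket/ingest-actions
  remote.s3.endpoint = https://s3.us-east-1.amazonaws.com
  partitionBy = day
  batchSizeThresholdKB = 2048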

Troubleshoot

To troubleshoot the S3 remote file system, search the _internal index for events from the RfsOutputProcessor and S3Client components. For example, a search along the following lines (a sketch; add filters and narrow the time range as needed) returns errors from both components:
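
  index=_internal (component=RfsOutputProcessor OR component=S3Client) log_level=ERROR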

Key provisos

Note the following:

  • You can configure and use multiple S3 remote storage locations, up to a maximum of eight destinations.
  • In the case of a Splunk Cloud Platform deployment, buckets must be in the same region as the deployment.
  • In the case of an indexer cluster, each remote storage configuration must be identical across the indexer cluster peers.
  • AWS has an upload limit of 5 GB for single objects. An attempt to upload an object greater than 5 GB will result in data loss. You will only encounter this limit if you set batchSizeThresholdKB in outputs.conf to a value greater than 5 GB (that is, greater than 5242880 KB).
  • The remote file system creates buckets similar to index buckets on the remote storage location. The bucket names include the peer GUID and date.
  • Remember to set the correct lifecycle policies for your S3 buckets and their paths. By default, this data lives forever unless removed. See the example lifecycle configuration after this list.
  • For information on S3 authentication requirements, see SmartStore on S3 security strategies in Managing Indexers and Clusters of Indexers. Ingest actions requirements are similar.
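
As an illustration only, an S3 lifecycle configuration that expires routed events after 90 days might look like the following; the "ingest-actions/" prefix and the retention period are hypothetical values:

  {
    "Rules": [
      {
        "ID": "ExpireIngestActionsData",
        "Filter": { "Prefix": "ingest-actions/" },
        "Status": "Enabled",
        "Expiration": { "Days": 90 }
      }
    ]
  }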