Create an S3 destination
To write events to a remote storage volume, select a preconfigured S3 destination when you configure the "Route to Destination" rule. You can write to multiple S3 destinations. The "Immediately send to" field has a typeahead capability that displays all preconfigured S3 destinations.
You configure and validate S3 destinations through the Destinations tab on the Data Ingest page. Select New Destination and fill out the fields, following the examples provided there. You can create a maximum of eight S3 destinations.
When rulesets route to a destination that is invalid or does not exist, the Splunk platform instance blocks all queues and pipelines and does not drop data.
Partition events
When creating an S3 destination, you can define a partitioning schema for events based on timestamp and optionally source type. The events then flow into a directory structure based on the schema.
Go to the "Partitioning" section of the New Destination configuration. You can choose a partitioning schema through the drop-down menu. The choices are:
- Day (YYYY/MM/DD)
- Month (YYYY/MM)
- Year (YYYY)
- Legacy
The legacy setting is for use with pre-9.1 destinations only. With legacy, for each 2MB (by default) batch, the latest event timestamp in the batch determines the folder, using the format "YYYY/MM/DD". However, unlike the true partitioning options such as "day", the folder can also contain events with other timestamps, if its batch contains other timestamps.
For destinations created before 9.1, "legacy" is the default. For destinations created in 9.1 and higher, "day" is the default.
You can also set source type as a secondary key. However, if you are using federated search for Amazon S3 with the AWS Glue Data Catalog integration, you need to make sure that your Glue Data Catalog tables do not include a duplicate entry for the sourcetype column.
For details on the partitioning methods and examples of the resulting paths, see the partitionBy setting in outputs.conf.
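As a sketch of how this looks in configuration, a destination stanza might set day-based partitioning with source type as a secondary key. The stanza name, bucket, and path below are placeholders, and you should confirm the exact value format against the partitionBy entry in the outputs.conf reference:

```
[rfs:my_s3_destination]
path = s3://my-bucket/ingest-actions
partitionBy = day,sourcetype
```

With a schema like this, events flow into paths such as my-bucket/ingest-actions/2024/06/15/<sourcetype>/ rather than a single flat prefix.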
Use KMS encryption (Splunk Cloud Platform only)
You can employ SSE-KMS encryption when using ingest actions to write data to customer-owned S3 buckets. This capability is enabled through the configuration of AWS cross-account IAM roles.
To enable KMS encryption, create the SplunkIngestActions IAM role in your AWS account:
- Go to the IAM roles section in the AWS configuration UI.
- Create a role named exactly "SplunkIngestActions".
- Edit the permissions section for that role by adding an inline policy and overwriting the existing JSON with the JSON generated through the Generate Permission Policy button in the Splunk ingest actions UI. You can edit that JSON text as needed for your organization.
- Edit the trust relationship section by overwriting the existing JSON with the JSON generated through the Generate Trust Policy button in the Splunk ingest actions UI. You can edit this JSON text as needed for your organization.
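For orientation, a cross-account trust policy for a role of this kind typically resembles the following sketch. The account ID here is a placeholder, and the generated policy may include additional conditions, so always start from the JSON produced by the Generate Trust Policy button rather than this example:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```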
Perform advanced configurations with outputs.conf
While the Destinations tab on the Data Ingest page can handle most common S3 configuration needs, some advanced configurations require directly editing outputs.conf, using the rfs stanza.
For a complete list of rfs settings, see Remote File System (RFS) Output. The remote filesystem settings and options for S3 are similar to the SmartStore S3 configuration.
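As the text notes, the rfs settings mirror the SmartStore S3 settings, so a minimal hand-edited stanza might look like the following sketch. The destination name, bucket, endpoint region, and encryption choice are assumptions for illustration; check each setting against the Remote File System (RFS) Output section of the outputs.conf reference before use:

```
[rfs:my_s3_destination]
path = s3://my-bucket/ingest-actions
remote.s3.endpoint = https://s3.us-east-1.amazonaws.com
remote.s3.encryption = sse-s3
```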
Troubleshoot
To troubleshoot the S3 remote file system, search the _internal index for events from the RfsOutputProcessor and S3Client components. For example:
index="_internal" sourcetype="splunkd" (ERROR OR WARN) RfsOutputProcessor OR S3Client
Key provisos
Note the following:
- You can configure and use multiple S3 remote storage locations, up to a maximum of eight destinations.
- For Splunk Cloud Platform deployments, buckets must be in the same region as the deployment.
- For indexer clusters, each remote storage configuration must be identical across the peer nodes.
- AWS has an upload limit of 5 GB for single objects. An attempt to upload an object greater than 5 GB results in data loss. You encounter this limit only if you set batchSizeThresholdKB in outputs.conf to a value greater than 5 GB.
- The remote file system creates buckets similar to index buckets on the remote storage location. The bucket names include the peer GUID and date.
- Remember to set the correct lifecycle policies for your S3 buckets and their paths. This data lives forever by default unless removed.
- For information on S3 authentication requirements, see SmartStore on S3 security strategies in Managing Indexers and Clusters of Indexers. Ingest actions requirements are similar.
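Because uploaded data is never aged out by the Splunk platform itself, a bucket lifecycle rule on the AWS side is the usual way to expire it. A sketch of an S3 lifecycle configuration that deletes objects under an assumed ingest-actions prefix after an assumed 90-day retention period (adjust both to your organization's requirements):

```
{
  "Rules": [
    {
      "ID": "expire-ingest-actions-data",
      "Filter": { "Prefix": "ingest-actions/" },
      "Status": "Enabled",
      "Expiration": { "Days": 90 }
    }
  ]
}
```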
Output optimizations for federated search
Splunk Enterprise 9.1 and Splunk Cloud Platform 9.0.2303 introduce several output optimizations. The changes affect only new destinations.
These behaviors are:
- Events are delimited with a new line.
- Index-time fields are output automatically.
- Compression type is set to "gzip".
- Batch Size is set to 128 MB (131072 KB).
These settings are turned on by default but can be turned off in the UI.
In addition:
- A raw option is now available for JSON output. It gives you full flexibility to output events in whatever form you want.
- The ingest actions feature outputs a new default field "index". It only outputs the field if you explicitly set an index with the Set Index rule.
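With these defaults, each event in an uploaded (gzip-compressed) object is one JSON document on its own line, with index-time fields included. A hypothetical two-event excerpt might look like the following; the field names and values are illustrative, the exact shape depends on your output format settings, and the "index" field appears only if a Set Index rule was applied:

```
{"time": "1693526400", "host": "web01", "sourcetype": "app_logs", "index": "main", "event": "GET /health 200"}
{"time": "1693526401", "host": "web01", "sourcetype": "app_logs", "index": "main", "event": "GET /login 302"}
```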