Send data from Ingest Processor to Amazon S3
To send data from Ingest Processor to an Amazon S3 bucket, you must first add an Amazon S3 destination to the Data Management service.
You can then create a pipeline that uses that destination. When you apply that pipeline, the Ingest Processor starts sending data that it receives to your Amazon S3 bucket.
In Amazon S3, the data from your Ingest Processor is identified by an object key name that is constructed using auto-generated values from the Ingest Processor and some of the values that you specify in the destination configuration.
Amazon S3 destinations configured through the Data Management service can be used by both the Ingest Processor and Edge Processor solutions.
Supported authentication modes
Ingest Processor supports the following modes of authentication for exporting data to Amazon S3.
- Access key pairs: an access key ID (for example, AKIAIOSFODNN7EXAMPLE) and a secret access key (for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY)
- IAM role-based authorization
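Ingest Processor handles authentication for you once the destination is configured, but if it helps to see how the two modes differ on the AWS side, here is a minimal boto3 sketch for illustration only. The role ARN, session name, and account ID are hypothetical placeholders, and the IAM-role path is shown as a generic STS role assumption, not as the exact mechanism used by the product.

```python
# Illustration only: Ingest Processor performs this authentication itself once the
# destination is configured. The role ARN and account ID below are placeholders.
import boto3

# Access key pair: a long-lived access key ID and secret access key.
access_key_session = boto3.Session(
    aws_access_key_id="AKIAIOSFODNN7EXAMPLE",  # example key ID from this topic
    aws_secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)

# IAM role-based authorization: short-lived credentials obtained by assuming a role.
sts = boto3.client("sts")
assumed = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/SplunkDataProcessor-S3-demo",  # placeholder ARN
    RoleSessionName="ingest-processor-example",
)
creds = assumed["Credentials"]
role_session = boto3.Session(
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```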
Supported file compression types and data formats
Ingest Processor supports the following file types and data formats for exporting data to Amazon S3.
- Newline-delimited JSON (required by AWS Glue)
- Parquet (version 2.5.0 or higher)
How the Ingest Processor constructs object key names
When you send data from Ingest Processor to an Amazon S3 bucket, that data is identified using an object key name with the following format: <bucket_name>/<folder_name>/<year>/<month>/<day>/<instance_ID>/<file_prefix>-<UUID>.<extension>
When you create your Amazon S3 destination, you specify the bucket name, folder name, file prefix, and file extension to be used in this object key name. The instance ID is data-processor, and the Ingest Processor automatically generates the date partitions and the UUID (universally unique identifier).
For example, if you send data to Amazon S3 on October 31, 2022 using a destination that has the following configurations:
- Bucket name: IngestProcessor
- Folder name: FromUniversalForwarder
- File prefix: TestData
- Output data format: JSON (Splunk HTTP Event Collector schema)
- Compression type: Gzip
Your data in Amazon S3 would be associated with an object key name similar to the following example: IngestProcessor/FromUniversalForwarder/year=2022/month=10/day=31/instanceId=data-processor/TestData-3ac12345-3b6f-12ed-78d6-0242ec110002.json.gz
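If it helps to see the naming scheme assembled programmatically, the following Python sketch mirrors the example configuration above. It is illustrative only: the real object key, date partitions, and UUID are generated by the Ingest Processor at write time.

```python
# Illustrative sketch of the object key naming scheme described above; the real key
# is generated by the Ingest Processor. Values mirror the example configuration.
import uuid
from datetime import datetime, timezone

bucket_name = "IngestProcessor"
folder_name = "FromUniversalForwarder"
file_prefix = "TestData"
extension = "json.gz"           # JSON output with gzip compression
instance_id = "data-processor"  # instance ID used by the Ingest Processor

now = datetime.now(timezone.utc)
object_key = (
    f"{bucket_name}/{folder_name}/"
    f"year={now.year}/month={now.month:02d}/day={now.day:02d}/"
    f"instanceId={instance_id}/"
    f"{file_prefix}-{uuid.uuid4()}.{extension}"
)
print(object_key)
# For example: IngestProcessor/FromUniversalForwarder/year=.../month=.../day=.../instanceId=data-processor/TestData-<uuid>.json.gz
```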
AWS prerequisites for Ingest Processor
The Amazon S3 bucket that you want to send data to must have Object Lock turned off. For information about the Object Lock feature, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html in the Amazon Simple Storage Service (S3) User Guide.
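If you want to confirm the Object Lock status of a bucket before configuring the destination, one way is to query its Object Lock configuration with boto3. This is a sketch under the assumption that your credentials can read the bucket's Object Lock configuration; the bucket name is a placeholder.

```python
# Sketch: check whether Object Lock is enabled on a bucket. The bucket name is a
# placeholder, and your credentials must be allowed to read the configuration.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
try:
    config = s3.get_object_lock_configuration(Bucket="IngestProcessor")
    enabled = config.get("ObjectLockConfiguration", {}).get("ObjectLockEnabled")
    print(f"Object Lock status for this bucket: {enabled!r}")
except ClientError as err:
    if err.response["Error"]["Code"] == "ObjectLockConfigurationNotFoundError":
        print("Object Lock is not configured on this bucket.")
    else:
        raise
```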
Use an IAM role to delegate access to your organization's S3 bucket
Create a role in your AWS account that lets the Splunk software access an S3 bucket.
You can copy the necessary policy strings from the Add an Amazon S3 destination dataset page in your Ingest Processor tenant.
Find the custom trust policy provided in your Ingest Processor tenant so that you can copy and use it to create an IAM role. Do the following:
In a browser, log in to your Ingest Processor tenant.
Navigate to the Destinations page, then select New destination and then Amazon S3.
On the Add an Amazon S3 destination dataset page, in the Amazon S3 general settings area, select Authenticate using IAM role.
In the Configuration setup instructions area of the page that appears, select Set up for use with Ingest Processor.
Configuration instructions appear after the Set up for use with Ingest Processor option.
Copy the policy string from step 2 of those configuration instructions.
Note: Keep this browser tab open on the Add an Amazon S3 destination dataset page. You will need to refer to the Configuration setup instructions on this page again during a later step.
In a new browser tab, log in to the AWS Console and then navigate to AWS Identity and Access Management (IAM).
Select Create role.
On the Select trusted entity page, in the Trusted entity type section, choose Custom trust policy.
In the Custom trust policy field, paste the policy string that you copied from the configuration instructions in your Ingest Processor tenant. Then, select Next.
On the Add permissions page, select Next without making any changes.
On the Name, review, and create page, do the following:
Name the role using the following format: SplunkDataProcessor-S3-${rolename}. For example, SplunkDataProcessor-S3-demo.
Review the Trust policy to make sure that it matches your deployment's information.
Select Create role.
Next, create an inline policy. Do the following:
On the IAM Roles page, navigate to your recently created SplunkDataProcessor-S3-demo role and then select it.
On the SplunkDataProcessor-S3-demo page, in the Permissions policies section, select Add permissions and then Create inline policy.
On the Specify permissions page, in the Policy editor section, select JSON.
Copy the inline policy from your Ingest Processor tenant. Do the following:
Return to the browser tab where the Add an Amazon S3 destination dataset page is open.
In the Configuration setup instructions area of the page, copy the policy string from step 4.
Return to the AWS Console in your other browser tab, and then paste the policy string into the JSON policy editor. Then, select Next.
On the Review and create page, do the following:
In the Policy details section, in the Policy name field, name your policy. For example, SplunkIngestProcessorS3WriteOnlyPermission1.
Select Create policy.
On the SplunkDataProcessor-S3-demo page, in the Summary section, copy the ARN. You'll need to provide this IAM role ARN when configuring your Amazon S3 destination in the Ingest Processor tenant.
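The steps above use the AWS Console, but you can create the same role programmatically if you prefer. The following boto3 sketch is a rough equivalent, under the assumption that you have already copied the trust policy and inline policy strings from your Ingest Processor tenant; the policy variables and role name are placeholders, not the real policies.

```python
# Rough programmatic equivalent of the console steps above. The two policy documents
# must be the exact strings copied from your Ingest Processor tenant; the values here
# are placeholders.
import boto3

trust_policy_json = "..."   # policy string from step 2 of the tenant's configuration instructions
inline_policy_json = "..."  # policy string from step 4 of the tenant's configuration instructions

iam = boto3.client("iam")

role = iam.create_role(
    RoleName="SplunkDataProcessor-S3-demo",
    AssumeRolePolicyDocument=trust_policy_json,
)

iam.put_role_policy(
    RoleName="SplunkDataProcessor-S3-demo",
    PolicyName="SplunkIngestProcessorS3WriteOnlyPermission1",
    PolicyDocument=inline_policy_json,
)

# This ARN is the value you paste into the Customer role ARN field of the destination.
print(role["Role"]["Arn"])
```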
Next, complete the steps in the next section to configure your Amazon S3 destination.
Destination configuration steps
- In the Data Management service, select Destinations.
- On the Destinations page, select New destination > Amazon S3.
- Provide a name and description for your destination:
  - Name: A unique name for your destination.
  - Description: (Optional) A description of your destination.
- Specify the object key name that you want to use to identify your data in the Amazon S3 bucket. See How the Ingest Processor constructs object key names for more information.
  - Bucket name: The name of the bucket that you want to send your data to. The Ingest Processor uses this bucket name as a prefix in the object key name.
  - Folder name: (Optional) The name or path of a folder in the bucket where you want to store your data. You can specify a single folder name, such as folder_1, or specify multiple folder levels by entering a path such as folder_1/folder_2/folder_3. In the object key name, the Ingest Processor includes this folder name or path after the bucket name and before a set of auto-generated timestamp partitions.
  - File prefix: (Optional) The file name that you want to use to identify your data. In the object key name, the Ingest Processor includes this file prefix after the auto-generated timestamp partitions and before an auto-generated UUID value.
  - Output data format: The format and file type that you want to use to store your data in the Amazon S3 bucket. Select one of the following options:
    - JSON (Splunk HTTP Event Collector schema): Store your data as .json files. The contents of these .json files are formatted into the event schema that's supported by the Splunk HTTP Event Collector. See Event metadata in the Splunk Cloud Platform Getting Data In manual.
    - Parquet: Store your data as .parquet files.
  - Compression type: The compression format for your data. Select one of the following options:
    - Uncompressed: Your data is not compressed.
    - Gzip: Compress your data using gzip. For JSON format, the Ingest Processor uses file-level compression and changes the resulting file extension to .json.gz. For Parquet format, the Ingest Processor uses in-file compression and the file extension remains .parquet. For a sketch of reading this compressed output back out of the bucket, see the example at the end of this topic.
- Specify the AWS region and authentication method to allow this destination to connect with your Amazon S3 bucket.
  - Region: The AWS region that your bucket is associated with.
  - Authentication: The method for authenticating the connection between your Ingest Processor and your Amazon S3 bucket.
  - AWS access key ID: The access key ID for your IAM user. This field is available only when Authentication is set to Authenticate using access key ID and secret access key.
  - AWS secret access key: The secret access key for your IAM user. This field is available only when Authentication is set to Authenticate using access key ID and secret access key.
  - Customer role ARN: The ARN of the IAM role that you created to delegate access. This field is available only when Authentication is set to Authenticate using IAM role. For information about how to create this role and verify the ARN, see Use an IAM role to delegate access to your organization's S3 bucket.
- (Optional) To adjust the maximum number of records that this destination sends in each batch of output data, expand Advanced settings and enter your desired maximum number of records in the Batch size field. Note: In most cases, the default Batch size value is sufficient. Be aware that the actual size of each batch can vary depending on the rate at which the Ingest Processor is sending out data.
- To finish adding the destination, select Add.
You now have a destination that you can use to send data from Ingest Processor to an Amazon S3 bucket.
To start sending data from Ingest Processor to the Amazon S3 bucket specified in the destination, create a pipeline that uses the destination you just added and then apply that pipeline to your Ingest Processor.
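Once the pipeline is running, you can spot-check the output in your bucket. The following sketch assumes the example destination configuration used throughout this topic (JSON in the HTTP Event Collector schema, gzip compression); the bucket name, folder prefix, and event fields are illustrative.

```python
# Sketch: list and read back gzip-compressed, newline-delimited JSON objects written by
# the example destination above. Bucket and prefix values are illustrative.
import gzip
import json
import boto3

s3 = boto3.client("s3")
bucket = "IngestProcessor"
prefix = "FromUniversalForwarder/"

listing = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=10)
for obj in listing.get("Contents", []):
    body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
    # Each .json.gz object holds one JSON event per line (HTTP Event Collector schema).
    for line in gzip.decompress(body).decode("utf-8").splitlines():
        if line:
            event = json.loads(line)
            print(obj["Key"], event.get("sourcetype"), event.get("event"))
```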