S3 path format
What you can define when adding a dataset
When you configure a dataset, use the options in the Edit dataset window to specify the location of your AWS S3 data.
Begin by selecting the appropriate source type from the drop-down menu. This informs Data Manager about the type of the data being ingested.
- Directly enter S3 paths: This method is suitable when you have a small number of S3 paths. You manually enter the specific S3 locations where your data is stored. For example, an S3 path might look like this:
  s3://your-bucket-name/optional-prefix/AWSLogs/123456789012/CloudTrail/us-east-1/2023/01/01
  To learn more about S3 paths, enter "Accessing an Amazon S3" in the search field of the AWS documentation.
- Use the S3 path builder: If you need to define many S3 paths, this method is more convenient. It functions as a dynamic template that automatically generates multiple S3 paths based on token values that you define.
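To make the anatomy of a directly entered S3 path concrete, the following minimal Python sketch splits the example path into its bucket name and key prefix. The helper name split_s3_path is hypothetical and is not part of Data Manager; it only illustrates how the segments of the path are laid out.

```python
from urllib.parse import urlsplit


def split_s3_path(path: str) -> tuple[str, str]:
    """Split an s3:// path into (bucket name, key prefix). Hypothetical helper."""
    parts = urlsplit(path)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an S3 path: {path!r}")
    # The bucket is the URI authority; everything after it is the key prefix.
    return parts.netloc, parts.path.lstrip("/")


bucket, prefix = split_s3_path(
    "s3://your-bucket-name/optional-prefix/AWSLogs/123456789012/CloudTrail/us-east-1/2023/01/01"
)
print(bucket)             # bucket name segment
print(prefix.split("/"))  # remaining segments: prefix, account ID, region, date parts
```

Each segment after the bucket name corresponds to one of the variable parts that the S3 path builder expresses as a partition token.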
How to use the S3 path builder
In the S3 path format field, you define the structure of your S3 paths. Data Manager provides templates based on your selected source type, which you can customize to match how your data is organized.
For example, a template might look like this:
s3://{bucket_name}/{optional_prefix}/AWSLogs/{account_id}/CloudTrail/{region}/{year}/{month}/{day}
This template illustrates the use of partition tokens. Partition tokens are enclosed in curly braces and act as placeholders for the variable segments of your S3 paths. They are crucial for dynamically identifying and accessing data distributed across various S3 buckets, AWS accounts, regions, or time periods.
- String tokens include bucket_name, account_id, and region. For these partition tokens, you enter the specific values directly in the Partition token values field. You can enter as many values as you need. Data Manager uses these values to construct the paths.
- Time tokens include year, month, and day. For time tokens, Data Manager automatically generates the actual S3 paths based on the time range you configure after saving your dataset.
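The way string tokens turn one template into many paths can be sketched in Python as follows. This is an illustration only, not Data Manager's implementation: the function name expand_string_tokens and the sample values are hypothetical, and it assumes that every combination of the entered values produces one path and that optional_prefix is filled in like a string token.

```python
from itertools import product

TEMPLATE = (
    "s3://{bucket_name}/{optional_prefix}/AWSLogs/{account_id}"
    "/CloudTrail/{region}/{year}/{month}/{day}"
)


def expand_string_tokens(template: str, values: dict[str, list[str]]) -> list[str]:
    """Fill string tokens with every combination of values (an assumption)."""
    names = list(values)
    paths = []
    for combo in product(*(values[name] for name in names)):
        path = template
        for name, value in zip(names, combo):
            # Replace only the named token; time tokens stay as placeholders.
            path = path.replace("{" + name + "}", value)
        paths.append(path)
    return paths


# Hypothetical sample values, as they might be entered in Partition token values.
paths = expand_string_tokens(
    TEMPLATE,
    {
        "bucket_name": ["logs-a"],
        "optional_prefix": ["prod"],
        "account_id": ["123456789012", "210987654321"],
        "region": ["us-east-1", "us-west-2"],
    },
)
for p in paths:
    print(p)
```

With one bucket, two account IDs, and two regions, the sketch yields four paths, each still containing the {year}, {month}, and {day} placeholders that are resolved later from the time range.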
| Partition token name | Partition token type | Values |
|---|---|---|
| bucket_name | string | Enter the name of your AWS S3 bucket. You can enter as many comma-separated values as you need. |
| account_id | string | Enter the AWS account ID. You can enter as many comma-separated values as you need. |
| region | string | Enter the supported regions for data ingestion from AWS. For the list of supported regions, see Supported regions for data ingestion. |
| year | time: %Y | After confirming the path format, define the start time and end time in the Select time range section. Splunk applies the time range only to the time tokens you use. |
| month | time: %m | After confirming the path format, define the start time and end time in the Select time range section. Splunk applies the time range only to the time tokens you use. |
| day | time: %d | After confirming the path format, define the start time and end time in the Select time range section. Splunk applies the time range only to the time tokens you use. |
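The time token formats in the table (%Y, %m, %d) are standard strftime directives. The following Python sketch shows how a time range could be turned into concrete paths. It is a hedged illustration, not Data Manager's actual logic: the function name expand_time_tokens and the sample path are hypothetical, and it assumes the end date is inclusive and that paths are generated per day.

```python
from datetime import date, timedelta

# strftime directives taken from the partition token table.
TIME_TOKENS = {"year": "%Y", "month": "%m", "day": "%d"}


def expand_time_tokens(template: str, start: date, end: date) -> list[str]:
    """Return each distinct path for the dates from start to end (inclusive, assumed)."""
    paths = []
    current = start
    while current <= end:
        path = template
        for name, directive in TIME_TOKENS.items():
            # Tokens that do not appear in the template are simply not applied.
            path = path.replace("{" + name + "}", current.strftime(directive))
        if path not in paths:  # a template without {day} repeats across days
            paths.append(path)
        current += timedelta(days=1)
    return paths


daily = expand_time_tokens(
    "s3://logs-a/AWSLogs/123456789012/CloudTrail/us-east-1/{year}/{month}/{day}",
    date(2022, 12, 30),
    date(2023, 1, 2),
)
monthly = expand_time_tokens(
    "s3://logs-a/AWSLogs/123456789012/CloudTrail/us-east-1/{year}/{month}",
    date(2022, 12, 30),
    date(2023, 1, 2),
)
print(daily)
print(monthly)
```

The second call illustrates why the time range affects only the tokens present in the template: without a {day} token, the same four-day range collapses to one path per month.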