S3 path format

What you can define when adding a dataset

When you configure a dataset, use the options in the Edit dataset window to specify the location of your AWS S3 data.

Begin by selecting the appropriate source type from the drop-down menu. This informs Data Manager about the type of the data being ingested.

Then select how you want to provide the S3 path. You can use one of the following methods:

Directly enter S3 paths: This method is suitable when you have a small number of S3 paths. You manually enter the specific S3 locations where your data is stored. For example, an S3 path might look like this:
```
s3://your-bucket-name/optional-prefix/AWSLogs/123456789012/CloudTrail/us-east-1/2023/01/01
```
To learn more about S3 paths, search for "Accessing an Amazon S3" in the AWS documentation.
Use the S3 path builder: If you need to define many S3 paths, this method is more convenient. It works as a dynamic template that automatically generates multiple S3 paths from the token values that you define.
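
To make the structure of a directly entered path concrete, the following minimal sketch splits the example path into its bucket name and key prefix. The helper name `split_s3_path` is hypothetical and is not part of Data Manager; it only illustrates how an `s3://` path is composed.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split an s3:// path into (bucket, key prefix). Hypothetical helper."""
    parts = urlsplit(path)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an S3 path: {path!r}")
    # The bucket is the authority part; the rest is the key prefix.
    return parts.netloc, parts.path.lstrip("/")


bucket, prefix = split_s3_path(
    "s3://your-bucket-name/optional-prefix/AWSLogs/123456789012/"
    "CloudTrail/us-east-1/2023/01/01"
)
print(bucket)
print(prefix)
```

Here `bucket` holds `your-bucket-name` and `prefix` holds everything after the bucket name.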

How to use the S3 path builder

In the S3 path format field, you can define the structure of your S3 paths. Data Manager provides templates based on your selected source type, and you can customize them to match how your data is organized.

A common template can look like this:

s3://{bucket_name}/{optional_prefix}/AWSLogs/{account_id}/CloudTrail/{region}/{year}/{month}/{day}

This template illustrates the use of partition tokens. Partition tokens are enclosed in curly braces and act as placeholders for the variable segments of your S3 paths. They let Data Manager identify and access data that is distributed across multiple S3 buckets, AWS accounts, regions, or time periods.
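
As an informal illustration of what "enclosed in curly braces" means in practice, the sketch below lists the partition tokens found in the template shown earlier. It uses Python's standard `string.Formatter` to read the placeholders; the function name `partition_tokens` is hypothetical, and this is not how Data Manager itself parses the field.

```python
from string import Formatter

TEMPLATE = (
    "s3://{bucket_name}/{optional_prefix}/AWSLogs/{account_id}"
    "/CloudTrail/{region}/{year}/{month}/{day}"
)


def partition_tokens(template: str) -> list:
    """Return the placeholder names in the order they appear. Hypothetical helper."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    return [field for _, field, _, _ in Formatter().parse(template) if field]


tokens = partition_tokens(TEMPLATE)
print(tokens)
```

The resulting list contains `bucket_name`, `optional_prefix`, `account_id`, `region`, `year`, `month`, and `day`, in that order.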

There are two types of partition tokens:

String tokens: These include bucket_name, account_id, and region. For these partition tokens, you enter the specific values directly in the Partition token values field. You can enter as many values as you need. Data Manager uses these values to construct the paths.
Time tokens: These include year, month, and day. For time tokens, Data Manager automatically generates the actual S3 paths based on the time range you configure after saving your dataset.
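
The sketch below shows one way that several string token values could turn a single template into multiple paths. It assumes that every combination of the entered values is generated (a Cartesian product); the source does not state how Data Manager combines values, so treat this as an assumption. The names `expand_string_tokens` and `_KeepMissing`, and the bucket name `logs-a`, are hypothetical. Time tokens are left in place for a later step.

```python
from itertools import product


class _KeepMissing(dict):
    """Leave tokens without a value (for example, time tokens) untouched."""

    def __missing__(self, key):
        return "{" + key + "}"


def expand_string_tokens(template, values):
    """Fill string tokens with every combination of the given values.

    Assumption: all combinations are generated. Hypothetical helper.
    """
    names = list(values)
    paths = []
    for combo in product(*(values[name] for name in names)):
        paths.append(template.format_map(_KeepMissing(zip(names, combo))))
    return paths


template = "s3://{bucket_name}/AWSLogs/{account_id}/CloudTrail/{region}/{year}/{month}/{day}"
paths = expand_string_tokens(
    template,
    {
        "bucket_name": ["logs-a"],
        "account_id": ["111111111111", "222222222222"],
        "region": ["us-east-1", "us-west-2"],
    },
)
for path in paths:
    print(path)
```

With one bucket, two account IDs, and two regions, this sketch produces four paths, each still ending in `{year}/{month}/{day}`.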

After defining the template in S3 path format, you can specify the type and values for each defined partition token. The following are examples of partition tokens and their values:


Partition token name	Partition token type	Values
bucket_name	string	Enter the name of your AWS S3 bucket. You can enter as many comma-separated values as you need.
account_id	string	Enter your AWS account ID. You can enter as many comma-separated values as you need.
region	string	Enter the AWS regions to ingest data from. For the list of supported regions, see Supported regions for data ingestion.
year	time: %Y	After confirming the path format, define the start time and end time in the Select time range section. Splunk applies the time range only to the time tokens you use.
month	time: %m	After confirming the path format, define the start time and end time in the Select time range section. Splunk applies the time range only to the time tokens you use.
day	time: %d	After confirming the path format, define the start time and end time in the Select time range section. Splunk applies the time range only to the time tokens you use.
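
To show how the `%Y`, `%m`, and `%d` formats in the table map to path segments, the following sketch generates one path per day in a time range. It assumes daily granularity and an inclusive end date, neither of which is confirmed by the source. The function name `expand_time_tokens` and the bucket name `example-bucket` are hypothetical.

```python
from datetime import date, timedelta

# Token formats taken from the table above.
TIME_FORMATS = {"year": "%Y", "month": "%m", "day": "%d"}


def expand_time_tokens(template, start, end):
    """Generate one path per day from start to end.

    Assumptions: daily steps, inclusive end date. Hypothetical helper.
    """
    paths = []
    current = start
    while current <= end:
        values = {name: current.strftime(fmt) for name, fmt in TIME_FORMATS.items()}
        paths.append(template.format(**values))
        current += timedelta(days=1)
    return paths


daily_paths = expand_time_tokens(
    "s3://example-bucket/AWSLogs/123456789012/CloudTrail/us-east-1/{year}/{month}/{day}",
    date(2022, 12, 30),
    date(2023, 1, 2),
)
for path in daily_paths:
    print(path)
```

The four generated paths end in `2022/12/30`, `2022/12/31`, `2023/01/01`, and `2023/01/02`. Month and day are zero-padded because `%m` and `%d` produce two-digit values.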

If you choose to enter a custom path format, you can set the format of each partition token that you define. By default, each token is set to the string format. To change the format, use the drop-down lists to select the desired option.