Define an Amazon S3 dataset
Define the preliminary settings for an Amazon S3 dataset, including its name, connection, location, and dataset type.
After you define an Amazon S3 connection, you define Amazon S3 datasets for use in federated searches.
Each dataset you define is backed by a data catalog. The data catalog enables efficient federated searches of the Amazon S3 dataset that it represents. Depending on the type of dataset you create, your dataset can be backed by a catalog you own and operate, such as an AWS Glue or Apache Iceberg REST catalog, or it can be backed by a Splunk catalog that is maintained by Splunk software.
This task guides you through preliminary definition steps for an Amazon S3 dataset, which include deciding whether the dataset will facilitate data routing and federated search, or just federated search.
- Your Splunk Cloud Platform deployment must be on version 10.4.2604 or higher.
- Your user account on the Splunk Cloud Platform deployment must have a role with the edit_datasets and edit_federated_providers capabilities. See Define roles on the Splunk platform with capabilities in the Splunk Cloud Platform Manage Users and Security manual.
- You must have an Amazon Web Services (AWS) account and an AWS IAM role with permissions that let you attach and modify custom trust policies and permissions policies for IAM roles. Contact your AWS administrator for assistance with AWS permissions. See IAM role creation in the AWS Identity and Access Management User Guide.
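The first two prerequisites can be checked mechanically if you already know your deployment version and the capabilities your role grants. The following is a minimal sketch, not a Splunk API: the function names are hypothetical, and it assumes the version is a dot-separated string of integers and that you have collected the capability names yourself.

```python
# Hypothetical helper: nothing here calls Splunk. It only compares values
# that you supply against the prerequisites stated in this topic.
REQUIRED_VERSION = (10, 4, 2604)
REQUIRED_CAPABILITIES = {"edit_datasets", "edit_federated_providers"}


def parse_version(version: str) -> tuple:
    """Turn '10.4.2604' into (10, 4, 2604). Assumes dot-separated integers."""
    return tuple(int(part) for part in version.strip().split("."))


def missing_prerequisites(version: str, capabilities) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if parse_version(version) < REQUIRED_VERSION:
        problems.append(
            "version " + version + " is lower than "
            + ".".join(str(n) for n in REQUIRED_VERSION)
        )
    missing = sorted(REQUIRED_CAPABILITIES - set(capabilities))
    if missing:
        problems.append("missing capabilities: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    # Example values only; substitute what your deployment reports.
    for problem in missing_prerequisites("10.3.2500", {"edit_datasets"}):
        print(problem)
```

The AWS prerequisite is not covered by this sketch, because IAM permissions depend on how your AWS administrator has set up your account.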
Proceed to the next step that matches the dataset type you selected. If you are creating a dataset that supports only federated search, the next step also depends on the type of catalog you are using:
- For Data Routing and Federated Search datasets, see Create an Amazon S3 dataset for data routing and federated search.
- For Federated Search datasets referenced by an AWS Glue catalog, see Create an Amazon S3 dataset for federated search that is backed by an AWS Glue catalog table.
- For Federated Search datasets referenced by an Apache Iceberg REST catalog, see Create an Amazon S3 dataset for federated search that is backed by an Iceberg REST catalog.
- For Federated Search datasets referenced by a Splunk-native catalog, see Create an Amazon S3 dataset for federated search that is backed by a Splunk-native data catalog.
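The choice among these four topics can be summarized as a small lookup. This is an illustrative sketch only: the key strings such as "federated_search" and "aws_glue" are hypothetical labels invented for this example and are not Splunk setting values. The topic titles are the ones listed above.

```python
# Hypothetical labels for dataset types and catalog types, used only here.
DATA_ROUTING = "data_routing_and_federated_search"
FEDERATED_ONLY = "federated_search"

NEXT_TOPIC = {
    (DATA_ROUTING, None):
        "Create an Amazon S3 dataset for data routing and federated search",
    (FEDERATED_ONLY, "aws_glue"):
        "Create an Amazon S3 dataset for federated search that is backed by "
        "an AWS Glue catalog table",
    (FEDERATED_ONLY, "iceberg_rest"):
        "Create an Amazon S3 dataset for federated search that is backed by "
        "an Iceberg REST catalog",
    (FEDERATED_ONLY, "splunk_native"):
        "Create an Amazon S3 dataset for federated search that is backed by "
        "a Splunk-native data catalog",
}


def next_topic(dataset_type, catalog=None):
    """Return the title of the topic to read next."""
    # The catalog type only affects the choice for federated-search-only datasets.
    if dataset_type == DATA_ROUTING:
        catalog = None
    try:
        return NEXT_TOPIC[(dataset_type, catalog)]
    except KeyError:
        raise ValueError(
            "no topic for dataset type %r with catalog %r" % (dataset_type, catalog)
        )


if __name__ == "__main__":
    print(next_topic(FEDERATED_ONLY, "aws_glue"))
```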