Define a federated search dataset that is referenced by a table in an AWS Glue catalog that you maintain.
- Your Splunk Cloud Platform deployment must be on version 10.4.2604 or higher.
- Your user account on the Splunk Cloud Platform deployment must have a role with the edit_datasets and edit_federated_providers capabilities. See Define roles on the Splunk platform with capabilities in the Splunk Cloud Platform Manage Users and Security manual.
- You must have an AWS Glue table that refers to the dataset at the Amazon S3 location you supplied in the Define dataset step. See Define an Amazon S3 dataset.
Note: The AWS Glue Apache Iceberg REST catalog interface is not supported. If you want to use AWS Glue in conjunction with Apache Iceberg, select AWS Glue as the type of catalog you want to represent your dataset and set Iceberg as the catalog's table format.
- On your Splunk Cloud Platform deployment, in the Data Management app, at the Configure dataset step of the Create dataset workflow, select AWS Glue as the type of catalog you use to represent your dataset.
- Use the following table to specify the settings for the AWS Glue table that references your dataset.
- Identify the format of your AWS Glue table.
- Select Iceberg if your Glue table supports the Apache Iceberg format.
- Select Delta if your Glue table supports the Delta Lake format.
- Select Non-table format if your Glue table is a traditional external table over files such as Parquet or JSON.
AWS Glue tables that support the Apache Iceberg and Delta Lake formats can support automated updates and ACID transactions.
AWS Glue tables that use non-table format map directly to a directory structure on Amazon S3, such as s3://bucket/table/year=2026/month=01/. When files are added to the dataset, you must run the AWS Glue crawler to update the catalog.
- (Optional) Select Define the time field if the AWS Glue table refers to a dataset that contains time-series data and you plan to use time-based filtering or SPL2 time functions when you run federated searches over it.
- If you select Define the time field, fill out the Time settings: Time field, Time format, and Unix time field.
- If your dataset is partitioned into data subsets by time, identify Time partition fields to improve search performance and reduce search cost.
- Select Next to move on to the Update policies step.
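As an illustration of the non-table format layout and the time partition fields described in the steps above, the following minimal Python sketch reads the key=value partition segments from an Amazon S3 path and checks whether a partition falls inside a requested month range. The function names parse_partitions and in_month_range are hypothetical and are not part of any Splunk or AWS API. The sketch only shows why identifying time partition fields can let a search skip whole directories. It does not describe how the Splunk platform actually implements partition pruning.

```python
from urllib.parse import urlparse


def parse_partitions(s3_uri: str) -> dict:
    """Return the key=value partition segments found in an S3 URI path.

    Hypothetical helper for illustration only.
    """
    path = urlparse(s3_uri).path
    partitions = {}
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        # Keep only segments that look like key=value, such as year=2026.
        if sep and key and value:
            partitions[key] = value
    return partitions


def in_month_range(partitions: dict, start: tuple, end: tuple) -> bool:
    """Return True if the (year, month) partition lies within [start, end].

    Assumes the partition fields are named "year" and "month", as in the
    example path. Real datasets might use different field names.
    """
    year_month = (int(partitions["year"]), int(partitions["month"]))
    return start <= year_month <= end


# The example path from the non-table format description.
parts = parse_partitions("s3://bucket/table/year=2026/month=01/")
print(parts)

# A search limited to December 2025 through March 2026 needs this partition.
print(in_month_range(parts, (2025, 12), (2026, 3)))
```

In this sketch, a search restricted to a time range only needs to read the directories whose partition values fall inside that range, which is the intuition behind using time partition fields to reduce search cost.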