Create an Amazon S3 dataset for federated search that is backed by an AWS Glue catalog table

Define a federated search dataset that is referenced by a table in an AWS Glue catalog that you maintain.

Before you begin, make sure that you meet the following prerequisites:
  • Your Splunk Cloud Platform deployment must be on version 10.4.2604 or higher.
  • Your user account on the Splunk Cloud Platform deployment must have a role with the edit_datasets and edit_federated_providers capabilities. See Define roles on the Splunk platform with capabilities in the Splunk Cloud Platform Manage Users and Security manual.
  • You must have an AWS Glue table that refers to the dataset at the Amazon S3 location you supplied in the Define dataset step. See Define an Amazon S3 dataset.
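If you script your onboarding, the first two prerequisites above can be expressed as a simple pre-flight check. The following Python sketch is hypothetical: `check_prerequisites` and `PrereqReport` are illustrative names, and the sketch assumes that you already have the deployment version and your role's capabilities as plain values. It does not query any Splunk API, it does not check the AWS Glue table prerequisite, and it assumes a purely numeric dotted version string such as 10.4.2604.

```python
from dataclasses import dataclass

# Requirements restated from the prerequisites list.
MIN_VERSION = (10, 4, 2604)
REQUIRED_CAPABILITIES = frozenset({"edit_datasets", "edit_federated_providers"})


def parse_version(version: str) -> tuple:
    """Convert a numeric dotted version such as '10.4.2604' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


@dataclass
class PrereqReport:
    version_ok: bool
    missing_capabilities: set

    @property
    def ok(self) -> bool:
        # Both conditions must hold before you start the Create dataset workflow.
        return self.version_ok and not self.missing_capabilities


def check_prerequisites(version: str, capabilities: set) -> PrereqReport:
    """Hypothetical pre-flight check against the documented minimums."""
    # Tuples compare element by element, so (10, 4, 2605) >= (10, 4, 2604).
    version_ok = parse_version(version) >= MIN_VERSION
    missing = set(REQUIRED_CAPABILITIES - set(capabilities))
    return PrereqReport(version_ok=version_ok, missing_capabilities=missing)


report = check_prerequisites("10.4.2604", {"edit_datasets"})
print(report.version_ok, sorted(report.missing_capabilities))
# prints: True ['edit_federated_providers']
```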
Note: The AWS Glue Apache Iceberg REST catalog interface is not supported. If you want to use AWS Glue with Apache Iceberg, select AWS Glue as the catalog type for your dataset and select Iceberg as the table format.
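Before you select a table format, you might want to confirm how your AWS Glue table records its format. The following Python sketch assumes that you have already retrieved the table definition as a dictionary, for example the "Table" object in the response of the boto3 Glue client's `get_table` call. It also assumes the `table_type` table parameter convention that Amazon Athena uses for Iceberg (ICEBERG) and Delta Lake (DELTA) tables; tables created by other tools might record their format differently. `classify_glue_table` is a hypothetical helper, and its result is a hint, not a guarantee.

```python
# Table format options from the Configure dataset step.
ICEBERG = "Iceberg"
DELTA = "Delta"
NON_TABLE = "Non-table format"


def classify_glue_table(table: dict) -> str:
    """Suggest a table format option for an AWS Glue table definition.

    Assumes `table` is shaped like the "Table" object of an AWS Glue
    GetTable response and that the format is recorded in the
    `table_type` table parameter (an assumption, see the lead-in).
    """
    parameters = table.get("Parameters") or {}
    table_type = str(parameters.get("table_type", "")).upper()
    if table_type == "ICEBERG":
        return ICEBERG
    if table_type == "DELTA":
        return DELTA
    # No recognized table format marker: treat it as an external table over files.
    return NON_TABLE


example = {"Name": "web_logs", "Parameters": {"table_type": "ICEBERG"}}
print(classify_glue_table(example))  # prints: Iceberg
```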
  1. On your Splunk Cloud Platform deployment, in the Data Management app, at the Configure dataset step of the Create dataset workflow, select AWS Glue as the type of catalog you use to represent your dataset.
  2. Specify the following settings for the AWS Glue table that references your dataset.
    • AWS Glue catalog ID: The ID of the AWS Glue Data Catalog that contains the AWS Glue table. By default, this is the 12-digit AWS account ID of the AWS account where your dataset is hosted.
    • AWS Glue database: Enter the name of the AWS Glue database that contains the AWS Glue table. The name can contain only lowercase letters, numbers, and underscores, and can have no more than 255 characters.
    • AWS Glue table: Enter the name of the AWS Glue table that references your dataset. The name can contain only lowercase letters, numbers, and underscores, and can have no more than 255 characters.

    The AWS Glue table must belong to the AWS Glue database that you specify, and it must reference the path that you supplied for Amazon S3 location in the Define dataset step.

  3. Identify the format of your AWS Glue table.
    • Select Iceberg if your Glue table uses the Apache Iceberg table format.
    • Select Delta if your Glue table uses the Delta Lake table format.
    • Select Non-table format if your Glue table is a traditional external table over data files, such as Parquet or JSON files.
    AWS Glue tables that use the Apache Iceberg or Delta Lake format support ACID transactions and can be updated automatically as data changes.
    AWS Glue tables that use the non-table format map directly to a directory structure on Amazon S3, such as s3://bucket/table/year=2026/month=01/. When files are added to the dataset, you must run an AWS Glue crawler to update the catalog.
  4. (Optional) Select Define the time field if the AWS Glue table refers to a dataset that contains time-series data and you plan to use time-based filtering or SPL2 time functions when you run federated searches over it.
  5. If you select Define the time field, fill out the Time settings: Time field, Time format, and Unix time field.
    For more information, see Identify the dataset time field.
  6. If your dataset is partitioned into data subsets by time, identify Time partition fields to improve search performance and reduce search cost.
    For more information, see Identify time partitions.
  7. Select Next to move on to the Update policies step.
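The naming and location rules in step 2 can be checked before you fill in the form. The following Python sketch is a hypothetical helper, `validate_glue_settings`, that restates those rules. It assumes that the catalog ID is the default 12-digit AWS account ID, that `table_location` is the Amazon S3 path recorded on the Glue table (StorageDescriptor.Location in a GetTable response), and that paths which differ only by a trailing slash refer to the same location. It cannot confirm that the table actually belongs to the database; only AWS Glue can tell you that.

```python
import re

# Rules restated from step 2. The catalog ID pattern assumes the default
# value, a 12-digit AWS account ID.
CATALOG_ID_PATTERN = re.compile(r"[0-9]{12}")
GLUE_NAME_PATTERN = re.compile(r"[a-z0-9_]{1,255}")


def _normalize_s3_path(path: str) -> str:
    # Treat "s3://bucket/prefix" and "s3://bucket/prefix/" as the same location.
    return path.rstrip("/")


def validate_glue_settings(catalog_id, database, table, table_location, s3_location):
    """Return a list of problems with the AWS Glue settings (empty if none)."""
    problems = []
    if not CATALOG_ID_PATTERN.fullmatch(catalog_id):
        problems.append("AWS Glue catalog ID is not a 12-digit AWS account ID")
    for label, name in (("AWS Glue database", database), ("AWS Glue table", table)):
        if not GLUE_NAME_PATTERN.fullmatch(name):
            problems.append(
                f"{label} name must be 1 to 255 lowercase letters, numbers, or underscores"
            )
    if _normalize_s3_path(table_location) != _normalize_s3_path(s3_location):
        problems.append("AWS Glue table does not reference the Amazon S3 location")
    return problems


print(validate_glue_settings(
    "123456789012", "sales_db", "orders",
    "s3://example-bucket/orders/", "s3://example-bucket/orders",
))  # prints: []
```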
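Time partition fields help because a search over a time window only needs the partitions whose time range overlaps that window. The following Python sketch illustrates that idea for Hive-style paths such as s3://bucket/table/year=2026/month=01/. It assumes year= and month= partition keys and UTC times, and the function names are hypothetical. It shows the general technique of partition pruning, not how the Splunk platform implements it.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def partition_values(s3_uri: str) -> dict:
    """Collect key=value segments from a Hive-style Amazon S3 path."""
    values = {}
    for segment in urlparse(s3_uri).path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep:  # only segments that contain "=" are partition segments
            values[key] = value
    return values


def month_bounds(values: dict):
    """Return the [start, end) range, in UTC, of the month named by year= and month=."""
    start = datetime(int(values["year"]), int(values["month"]), 1, tzinfo=timezone.utc)
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end


def prune_partitions(paths, search_start: datetime, search_end: datetime):
    """Keep only the partitions whose month overlaps [search_start, search_end)."""
    kept = []
    for path in paths:
        start, end = month_bounds(partition_values(path))
        if start < search_end and end > search_start:
            kept.append(path)
    return kept


paths = [
    "s3://bucket/table/year=2025/month=12/",
    "s3://bucket/table/year=2026/month=01/",
    "s3://bucket/table/year=2026/month=02/",
]
window = (datetime(2026, 1, 15, tzinfo=timezone.utc), datetime(2026, 1, 20, tzinfo=timezone.utc))
print(prune_partitions(paths, *window))
# prints: ['s3://bucket/table/year=2026/month=01/']
```

Only one of the three partitions is read for this window, which is where the performance and cost savings come from.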
Go to Apply the dataset resource access policy to an AWS IAM role to carry out the final steps of Amazon S3 dataset creation.