Create an Amazon S3 dataset for federated search that is backed by an Iceberg REST catalog

Define a federated search dataset that is referenced by a table in an Apache Iceberg REST catalog that you maintain.

  • Your Splunk Cloud Platform deployment must be on version 10.4.2604 or higher.
  • Your user account on the Splunk Cloud Platform deployment must have a role with the edit_datasets and edit_federated_providers capabilities. See Define roles on the Splunk platform with capabilities in the Splunk Cloud Platform Manage Users and Security manual.
  • You must have an Apache Iceberg REST catalog that refers to the dataset at the Amazon S3 location you supplied in the Define dataset step. See Define an Amazon S3 dataset.
  • The dataset must have an Amazon S3 connection definition that includes the Catalog URL of your Apache Iceberg REST catalog. See Create an Amazon S3 connection.
  1. On your Splunk Cloud Platform deployment, in the Data Management app, at the Configure dataset step of the Create dataset workflow, select Apache Iceberg REST as the type of catalog you use to represent your dataset.
  2. Specify the following settings for the Apache Iceberg REST catalog table that references your dataset.
    Setting | Description
    Catalog name | A unique, user-defined name for the specific Iceberg catalog instance you are connecting to.
    Table namespace | A logical grouping that organizes a set of tables within the Iceberg REST catalog, similar to databases or schemas in traditional SQL systems. Table namespaces are hierarchical. A table namespace can be a top-level namespace, such as sales, or a lower-level namespace whose levels are separated by dot characters, such as finops.billing.west_region.
    Table name | The specific name of the Iceberg catalog table that you want to run federated searches over. The Table name points to the dataset stored at the Amazon S3 location and must belong to the Table namespace.
  3. (Optional) Select Define the time field if this dataset contains time-series data and you plan to use time-based filtering or SPL2 time functions when you run federated searches over it.
  4. If you select Define the time field, fill out the Time settings: Time field, Time format, and Unix time field.
    For more information, see Identify the dataset time field.
  5. If your dataset is partitioned into data subsets by time, identify Time partition fields to improve search performance and reduce search cost.
    For more information, see Identify time partitions.
  6. Select Next to move on to the Update policies step.
Go to Apply the dataset resource access policy to an AWS IAM role to carry out the final steps of Amazon S3 dataset creation.
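
The Table namespace and Table name settings described above correspond to how an Iceberg REST catalog addresses a table. The following Python sketch is illustrative only and is not part of the Splunk workflow. It splits a dot-separated namespace into its levels and builds the table path defined by the Apache Iceberg REST catalog specification, where multi-level namespaces are joined with the unit separator character (0x1F). The function names, the optional prefix parameter, and the invoices table name are hypothetical. You might use a path like this to confirm that the table exists in your catalog before you create the dataset.

```python
from urllib.parse import quote

# The Iceberg REST specification joins the levels of a multi-level
# namespace with the unit separator character (0x1F) in URL paths.
UNIT_SEPARATOR = "\x1f"


def split_namespace(namespace: str) -> list[str]:
    """Split a dot-separated table namespace into its hierarchy levels."""
    levels = namespace.split(".")
    if any(level == "" for level in levels):
        raise ValueError(f"Empty level in table namespace: {namespace!r}")
    return levels


def table_path(namespace: str, table: str, prefix: str = "") -> str:
    """Build the REST path for a table (hypothetical helper).

    `prefix` is an optional path segment that some catalogs return
    from their configuration endpoint; leave it empty if unused.
    """
    encoded_namespace = quote(
        UNIT_SEPARATOR.join(split_namespace(namespace)), safe=""
    )
    parts = ["v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", encoded_namespace, "tables", quote(table, safe="")]
    return "/" + "/".join(parts)


if __name__ == "__main__":
    # Top-level namespace from the settings table.
    print(table_path("sales", "invoices"))
    # Lower-level namespace from the settings table.
    print(table_path("finops.billing.west_region", "invoices"))
```

Append the resulting path to the Catalog URL from your Amazon S3 connection definition to form the full request URL. Authentication details depend on your catalog and are not covered in this sketch.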