Map an Amazon S3 federated index to a customer-created AWS Glue table
After you define an Amazon S3 federated provider for your Splunk Cloud Platform deployment, you create federated indexes for use in federated searches. Each federated index you create maps to a specific AWS Glue table, which in turn references an Amazon S3 dataset. You invoke federated indexes in your federated searches to tell Splunk software which Amazon S3 dataset you intend to search.
The Splunk platform creates federated indexes on the search head of your Splunk Cloud Platform deployment.
This task guides you through the process of creating a federated index that maps to a customer-created AWS Glue table. This is an AWS Glue table that you have already created and identified in the federated provider definition.
In this task, you do these things:
- Provide the name of the federated index.
- Select the AWS Glue table (customer created) dataset type.
- Identify the AWS Glue table that the federated index maps to.
- Optionally provide time series settings if your data includes time series information and you want to make use of time functions in your searches.
- Identify time partition settings, if you have partitioned your dataset into time-based subsets.
You can map a federated index to only one remote dataset at a time. If a federated provider lists several AWS Glue tables over which you want to run federated searches, define a separate federated index for each AWS Glue table.
Prerequisites
- A role on your Splunk Cloud Platform deployment that has the admin_all_objects capability.
- You must have an AWS Glue table that refers to data you store in Amazon S3. See Create an AWS Glue table.
- You must define an Amazon S3 federated provider that supports AWS Glue tables. See Define Amazon S3 federated provider details.
Steps
- On your Splunk Cloud Platform deployment, in Splunk Web, at the Set up federated index step of the Add a new Amazon S3 provider workflow, use the following table to specify the settings for your federated index.
Note: You might also come to this collection of new federated index settings when you edit a federated provider or select Add federated index on the Federated indexes list page.
Setting: Description
Federated index name: Enter a unique name for the federated index. Federated index names have the following restrictions:
  - They can contain only letters, numbers, underscores, and hyphens.
  - They must begin with a letter or number.
  - They cannot be more than 2,048 characters in length.
  - They cannot be named kvstore. You can use this string in a longer name, like abc_kvstore.
Dataset type: Select AWS Glue table (Customer created).
Dataset name: Select the name of the AWS Glue table to which you want the federated index to map. If the federated provider that the federated index is associated with has specific AWS Glue tables listed in AWS Glue tables, a drop-down list with those tables appears here, and you can select a table from it.
  If you select a Dataset name value that uses a wildcard to capture multiple table names, a Specify the dataset field appears. Enter the full name of the AWS Glue table that you want to base the federated index upon into the Specify the dataset field. Make sure that the table name is correct and that it is covered by the Glue Data Catalog resource policy you generated when you created the federated provider associated with the federated index.
Time settings not required: (Optional) Select Time settings not required if the AWS Glue table does not contain time-series data and you do not intend to use time-based fields and functions when you search it.
Time field: If your federated index definition requires time settings, enter the name of the field that acts as an event timestamp in the selected AWS Glue table. The time field can contain only lowercase letters, numbers, underscores, and dot characters ( . ). Surround time fields that contain dot characters but which are not nested fields with single quote characters. See "Special handling for sdselect syntax elements" in sdselect command usage.
Time format: If your federated index definition requires time settings, provide a time format variable or custom time format variable string that matches the Time field. You can set the following values for Time format:
  - Set %s when you have UNIX time values with the string data type.
  - Set %UT when you have UNIX time values with the numeric data type.
  - Set %ST when you have values with the SQL timestamp data type.
  - Set a custom string of time format variables when you have values that follow a specific string time format, such as 04-29-2023 11:45:22 PM. For more information and examples of time format strings, see Date and time format variables in the Splunk Cloud Platform Search Reference.
  Note: %UT and %ST are not among the standard set of Splunk platform time format variables. Use them only in the context of Federated Search for Amazon S3.
  You can optionally append the %Q time format variable to time format variables to capture subsecond timestamps, such as milliseconds (%3Q), microseconds (%6Q), and nanoseconds (%9Q). For example, for a time field in numeric-typed UNIX time format with a nanosecond component, use %UT.%9Q, or %UT%9Q if you do not need to separate the subsecond component from the UNIX time value with a dot character ( . ).
  The sdselect command does not support the following time format variables: %c, %+, %Ez, %k, %X, and %x.
Unix time field: If your federated index definition requires time settings, Unix time field provides an alias for the Time field that Splunk software converts into numeric UNIX time format at search time. Insert the Unix time field into federated searches that require numeric UNIX time field values, or when you want to see your time field in numeric UNIX time format in the search results. Unix time field defaults to _time. In Splunk Web, the values of _time always display in human-readable format, unless you are aggregating on the _time field. For example, avg(_time) returns values in numeric UNIX time format.
  Note: If _time already exists as a field name in your Glue table, give the Unix time field a value other than _time.
  For more information and examples that show how usage of the Unix time field in an sdselect search changes depending on the format of the Time field, see Use time fields in sdselect searches.
Time partition settings: Improve search performance and reduce search cost by identifying time partition fields in the federated index definition. Do this only if the federated index maps to an AWS Glue table that you have partitioned into data subsets by time, such as by year, month, and day. For more information about these settings, see Optimize searches of Amazon S3 datasets by identifying time partition fields.
- Select Save to save the federated index configuration.
- (Optional) Give your users access to the federated index. To run searches over the remote dataset to which the federated index maps, your users must have access permissions for the federated index. See Give your users role-based access control of federated indexes.
Splunk software creates the federated index on the search head of your Splunk Cloud Platform deployment.
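The Federated index name restrictions listed in the settings table can be checked before you type a name into Splunk Web. The following Python sketch is only an illustration: the function name is hypothetical, it assumes that "letters" means ASCII letters, and it assumes that the kvstore restriction is an exact, case-sensitive match. Splunk software performs its own validation, which might differ in these details.

```python
import re
from typing import List

MAX_NAME_LENGTH = 2048          # "cannot be more than 2,048 characters"
RESERVED_NAMES = {"kvstore"}    # assumption: exact, case-sensitive match
ALLOWED_CHARS = re.compile(r"[A-Za-z0-9_-]+")   # assumption: ASCII letters only
VALID_FIRST_CHAR = re.compile(r"[A-Za-z0-9]")


def federated_index_name_problems(name: str) -> List[str]:
    """Return the documented restrictions that the candidate name violates."""
    problems = []
    if not ALLOWED_CHARS.fullmatch(name):
        problems.append("use only letters, numbers, underscores, and hyphens")
    if not VALID_FIRST_CHAR.match(name):
        problems.append("begin with a letter or number")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("use no more than 2,048 characters")
    if name in RESERVED_NAMES:
        problems.append("do not use the name kvstore")
    return problems


if __name__ == "__main__":
    for candidate in ("abc_kvstore", "kvstore", "_logs", "my index"):
        print(candidate, federated_index_name_problems(candidate))
```

An empty list means that the name passes all four documented restrictions.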
In Splunk Web, you can view the federated indexes that you create for your deployment by selecting Settings, then Federated search, and then the Federated indexes tab.
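To make the Time format choices in the settings table more concrete, the following Python sketch assembles a Time format value from the three preset cases and the optional %Q subsecond suffix, and then checks a custom format string against the sample value 04-29-2023 11:45:22 PM. The function name and the preset labels are hypothetical. The custom string %m-%d-%Y %I:%M:%S %p is one reasonable reading of the sample value, and the check relies on the assumption that these particular variables behave the same way in Python's strptime as in Splunk software. Python does not understand %UT, %ST, or %Q, so the sketch only builds those strings and never parses with them.

```python
from datetime import datetime
from typing import Optional

# Hypothetical labels for the three preset cases described in the table.
PRESETS = {
    "unix_string": "%s",      # UNIX time stored with the string data type
    "unix_numeric": "%UT",    # UNIX time stored with the numeric data type
    "sql_timestamp": "%ST",   # values with the SQL timestamp data type
}


def time_format_for(kind: str,
                    subsecond_digits: Optional[int] = None,
                    separator: str = ".") -> str:
    """Build a Time format value, optionally with a %Q subsecond suffix."""
    fmt = PRESETS[kind]
    if subsecond_digits is not None:
        # This sketch covers only the three widths named in the table.
        if subsecond_digits not in (3, 6, 9):
            raise ValueError("expected 3, 6, or 9 subsecond digits")
        fmt = f"{fmt}{separator}%{subsecond_digits}Q"
    return fmt


# Assumed custom format for values such as 04-29-2023 11:45:22 PM.
CUSTOM_FORMAT = "%m-%d-%Y %I:%M:%S %p"

if __name__ == "__main__":
    print(time_format_for("unix_numeric", 9))        # %UT.%9Q
    print(time_format_for("unix_numeric", 9, ""))    # %UT%9Q
    parsed = datetime.strptime("04-29-2023 11:45:22 PM", CUSTOM_FORMAT)
    print(parsed.isoformat())                        # 2023-04-29T23:45:22
```

Parsing a few sample values from your own table in this way is a quick check that a custom format string matches the data before you save the federated index.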
Optimize searches of Amazon S3 datasets by identifying time partition fields
Partitioning is an organization strategy for large datasets that makes it possible for you to search them efficiently. When you partition your data, you organize it into a hierarchical directory structure based on the distinct values of one or more fields in the data.
For example, you might partition your application logs in Amazon S3 by date, breaking them down by year, month, and day. Then you can place files corresponding to a single day's worth of data in an Amazon S3 path like s3://my_bucket/logs/year=2022/month=08/day=23/.
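As a minimal sketch of that layout, the following Python function derives the partitioned path for a given day. The function name is hypothetical, and the bucket and prefix come from the example path above. The zero-padded month and day values follow the same example.

```python
from datetime import date


def partition_prefix(base: str, day: date) -> str:
    """Return the year/month/day partition path for one day's worth of data."""
    return "{}/year={:04d}/month={:02d}/day={:02d}/".format(
        base.rstrip("/"), day.year, day.month, day.day
    )


if __name__ == "__main__":
    # prints s3://my_bucket/logs/year=2022/month=08/day=23/
    print(partition_prefix("s3://my_bucket/logs", date(2022, 8, 23)))
```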
If you generate an AWS Glue table that references a partitioned dataset in Amazon S3, you can map a federated index definition to that dataset and then identify the time fields that determine the hierarchical structure of the data partitions. When you identify the time partition fields in the federated index definition, your searches of that dataset become more efficient and cost effective.
When you define time partition filters for a federated index, you begin by identifying the first level field in the time field hierarchy. Then you identify the second level field, and so on. For example, if your federated index maps to a dataset that you have partitioned by year, month, and day, you identify year as the time partition field for the Level 1 filter, month as the time partition field for the Level 2 filter, and day as the time partition field for the Level 3 filter.
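The efficiency gain comes from the general technique of partition pruning: when the partition keys are known, a search over a time range needs to read only the partitions that fall inside that range. The following Python sketch illustrates the idea for a year, month, and day hierarchy. It is not a description of how Splunk software implements federated searches, and the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import List


def partitions_in_range(start: date, end: date) -> List[str]:
    """List the year/month/day partitions that cover start through end, inclusive."""
    day_count = (end - start).days + 1
    days = (start + timedelta(days=offset) for offset in range(day_count))
    return [
        "year={:04d}/month={:02d}/day={:02d}".format(d.year, d.month, d.day)
        for d in days
    ]


if __name__ == "__main__":
    # A three-day search window touches three partitions and skips the rest.
    for partition in partitions_in_range(date(2022, 8, 30), date(2022, 9, 1)):
        print(partition)
```

In this example, a search restricted to August 30 through September 1, 2022 reads three daily partitions rather than the whole dataset.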
Steps
- In your federated index definition, under Time partition settings, select the Time zone that applies to your time partition fields. You must choose a Time zone if you define one or more time partition filter levels.
- Select Add filter.
- Identify the Level 1 time field by which you have partitioned your data. This is the highest level of partitioning you use. Specify values for the following fields:
Time partition setting: Description
Time partition field: Provide the name of the time field that is the partition key for the indicated partition filter level. Values for the Time partition field can contain only lowercase letters, numbers, and underscores.
Time format: Provide a time format string for the indicated Time partition field. Compose this time format string out of Splunk-supported time format variables. For more information and examples see Date and time format variables in the Search Reference. The following time format variables are not supported: %c, %+, %Ez, %k, %X, and %x.
Data type: Select the data type of the Time partition field. Your options are String, Integer, and Date.
- If you have another partition key in your AWS Glue table, you can create another partition filter level based on it. Select Add filter and identify the filter's Time partition field, Time format, and Data type. Repeat this step until you have defined a partition filter level for each partition key in your AWS Glue table.
- Select Save to save the federated index configuration.
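Before you enter the partition filter levels in Splunk Web, it can help to write them down and check them against the documented constraints. The following Python sketch does that for a year, month, and day hierarchy. The PartitionFilter type, the function name, and the sample time formats and data types are all hypothetical. The unsupported-variable check is a simple substring test, so treat it as a rough aid and not as the validation that Splunk software performs.

```python
import re
from typing import List, NamedTuple

FIELD_PATTERN = re.compile(r"[a-z0-9_]+")   # lowercase letters, numbers, underscores
UNSUPPORTED_VARIABLES = ("%c", "%+", "%Ez", "%k", "%X", "%x")
DATA_TYPES = ("String", "Integer", "Date")


class PartitionFilter(NamedTuple):
    """One partition filter level (hypothetical helper type)."""
    field: str
    time_format: str
    data_type: str


def partition_filter_problems(filters: List[PartitionFilter]) -> List[str]:
    """Describe every documented constraint that a filter level violates."""
    problems = []
    for level, item in enumerate(filters, start=1):
        if not FIELD_PATTERN.fullmatch(item.field):
            problems.append(f"Level {level}: invalid field name {item.field!r}")
        for variable in UNSUPPORTED_VARIABLES:
            if variable in item.time_format:
                problems.append(f"Level {level}: unsupported variable {variable}")
        if item.data_type not in DATA_TYPES:
            problems.append(f"Level {level}: unknown data type {item.data_type!r}")
    return problems


if __name__ == "__main__":
    # Level 1 is the highest level of partitioning, so year comes first.
    levels = [
        PartitionFilter("year", "%Y", "Integer"),
        PartitionFilter("month", "%m", "String"),
        PartitionFilter("day", "%d", "String"),
    ]
    print(partition_filter_problems(levels))   # prints []
```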
Search your Amazon S3 datasets
After you set up federated indexes that map to AWS Glue tables, you can use the sdselect command to search the datasets those AWS Glue tables reference. See sdselect command overview.
Delete a federated index
You can delete a federated index that maps to an AWS Glue table that you no longer need to search. You can also delete federated indexes when your data scanning entitlements are depleted, to prevent unintentional usage.
Prerequisites
- A role on your Splunk Cloud Platform deployment that has the admin_all_objects capability.
- A federated index for Federated Search for Amazon S3 that you want to delete.
Steps
- On your Splunk Cloud Platform deployment, in Splunk Web, select Settings, then Federation.
- On the Federated indexes tab, identify a federated index that you want to delete.
- Select Delete for the index you want to delete.