Map an Amazon S3 federated index to a Splunk-managed AWS Glue table for an AWS CloudTrail log dataset

Note: This topic covers the Set up federated index step of the workflow for adding a new Amazon S3 federated provider. You cannot follow this step until you complete the steps that precede it in the workflow. See the checklist of tasks to set up Federated Search for Amazon S3.
This topic shows you how to create an Amazon S3 federated index that maps to a Splunk-managed AWS Glue table for an AWS CloudTrail log dataset, so you can run federated searches over that data.
If you want to search Amazon S3 datasets composed of default format VPC flow log data, see Map an Amazon S3 federated index to a Splunk-managed AWS Glue table for a default format VPC flow log dataset.
If you have manually created AWS Glue tables for your Amazon S3 datasets, see Map a federated index to a customer-created AWS Glue table.

After you define an Amazon S3 federated provider for your Splunk Cloud Platform deployment, you create federated indexes for use in federated searches. Each federated index you create maps to a specific AWS Glue table, which in turn references an Amazon S3 dataset. You invoke federated indexes in your federated searches to tell Splunk software which Amazon S3 dataset you intend to search.

The Splunk platform creates federated indexes on the search head of your Splunk Cloud Platform deployment.

This task guides you through the process of creating a federated index that maps to a Splunk-managed AWS Glue table for an AWS CloudTrail log dataset. Splunk software creates an AWS Glue table based on the information you provide in this task, and manages it thereafter.

In this task, you do these things:

  • Provide the name of the federated index.
  • Select the AWS Glue table (Splunk managed: CloudTrail) dataset type.
  • Supply an Amazon S3 location path that points to the AWS CloudTrail log dataset to which this federated index will be mapped.
  • Provide the maximum relative time range for searches of the dataset.
  • List the AWS Account ID and AWS Region values that can be used as partition keys for your searches of the AWS CloudTrail log dataset.

You can map a federated index to only one AWS CloudTrail log dataset at a time. If a federated provider has Amazon S3 locations for several AWS CloudTrail log datasets over which you want to run federated searches, define a separate federated index for each AWS CloudTrail log dataset.

Prerequisites

  • A role on your Splunk Cloud Platform deployment that has the admin_all_objects capability.
  • Datasets in your Amazon S3 buckets that are composed entirely of AWS CloudTrail data.
  • An Amazon S3 federated provider that is set up for the creation of Splunk-managed AWS Glue tables. See Define an Amazon S3 federated provider.

Steps

  1. On your Splunk Cloud Platform deployment, in Splunk Web, at the Set up federated index step of the Add a new Amazon S3 provider workflow, use the following table to specify the settings for your federated index.

    Note: You might also come to this collection of new federated index settings when you edit a federated provider or select Add federated index on the Federated indexes list page.
    Setting: Description
    Federated index name: Enter a unique name for the federated index.

    Federated index names have the following restrictions:

    • They can contain only letters, numbers, underscores, and hyphens.
    • They must begin with a letter or number.
    • They cannot be more than 2,048 characters in length.
    • They cannot be named kvstore. You can use this string in a longer name, like abc_kvstore.
    Dataset type: Select AWS Glue table (Splunk managed: CloudTrail).
    Amazon S3 location: Select the Amazon S3 location path for the AWS CloudTrail log dataset that you will search with this federated index. Splunk software creates an AWS Glue table that represents this dataset, and the federated index maps to that AWS Glue table.

    The Amazon S3 location setting lists the same Amazon S3 location paths that appear in the definition of the federated provider with which the federated index is associated. For AWS CloudTrail log datasets, the location path must end at the AWSLogs/ folder.

    If the location path you seek does not end at the AWSLogs/ folder, you might need to fix it in the definition for the federated provider with which this index is associated. See Supply correctly-formatted Amazon S3 locations for Splunk-managed AWS Glue table generation.

    Time settings: The time settings define the time field for the dataset to which the federated index maps. Because AWS CloudTrail log datasets have a stable schema, the time settings have default values that you cannot change.

    The default event Time field is eventtime.

    Time partitions: The time partition settings determine the fields that partition the dataset by time. Because AWS CloudTrail log datasets have a stable schema, Splunk software provides one default partition key that represents the three time partition keys that CloudTrail log datasets support: year, month, and day.

    The default Time partition field is pk_timestamp.

    Max search time range: Specify the maximum relative time range within which searches of the AWS CloudTrail log dataset return results.

    Max search time range applies to the time partitions in your data. For example, if a search specifies a time range of the last 3 years and Max search time range is set to 1 year, the search returns results only from time partitions that fall within the last year.

    Federated searches with time ranges of 2 years or more might suffer from reduced search performance. If you occasionally need to run searches over data that is older than the Max search time range, consider setting up additional federated indexes with larger Max search time range values.
    For example, you might run most of your searches over a federated index with a Max search time range of 1 year. But you very occasionally have to run searches over data that is between 1 and 2 years of age, and for those searches you can set up a second federated index with a Max search time range of 2 years.

    AWS account IDs: Provide the 12-digit AWS account IDs that partition the AWS CloudTrail log dataset to which this federated index maps. You must provide at least 1 AWS account ID.

    Alternatively, you can provide a wildcard symbol (*) to partition the dataset by all available AWS account IDs.

    Note: When you use a wildcard symbol for AWS account IDs in a federated index definition, you must include a WHERE clause that filters results by pk_account_id when you invoke that federated index in an sdselect search.
    See sdselect command WHERE clause operations.

    For more information about obtaining AWS account IDs by which AWS CloudTrail log datasets are partitioned, see Identify partitions to optimize searches of AWS CloudTrail log datasets.

    AWS regions: Provide the AWS regions that partition the AWS CloudTrail log dataset to which this federated index maps. You must provide at least 1 AWS region.

    Alternatively, you can provide a wildcard symbol (*) to partition the dataset by all available AWS regions.

    Note: When you use a wildcard symbol for AWS regions in a federated index definition, you must include a WHERE clause that filters results by pk_region when you invoke that federated index in an sdselect search.
    See sdselect command WHERE clause operations.

    For more information about obtaining AWS regions by which AWS CloudTrail log datasets are partitioned, see Identify partitions to optimize searches of AWS CloudTrail log datasets.

  2. Select Save to save the federated index configuration.
  3. (Optional) Give your users access to the federated index. To run searches over the remote dataset to which the federated index maps, your users must have access permissions for the federated index. See Give your users role-based access control of federated indexes.
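
The naming and formatting rules in the preceding settings table can be checked before you enter values in Splunk Web. The following Python sketch is illustrative only and is not part of the Splunk platform. The function names are hypothetical, and it assumes that "letters" means ASCII letters, that the kvstore restriction is an exact, case-sensitive match, and that a valid location path ends with a trailing slash after AWSLogs.

```python
import re

MAX_NAME_LENGTH = 2048  # limit stated in the settings table


def index_name_problems(name: str) -> list[str]:
    """Return the naming rules that a proposed federated index name breaks."""
    problems = []
    # Assumption: "letters" and "numbers" mean ASCII characters only.
    if not re.fullmatch(r"[A-Za-z0-9_-]*", name):
        problems.append("contains characters other than letters, numbers, underscores, and hyphens")
    if not re.match(r"[A-Za-z0-9]", name):
        problems.append("does not begin with a letter or number")
    if len(name) > MAX_NAME_LENGTH:
        problems.append("is longer than 2,048 characters")
    # Assumption: only the exact string is reserved; abc_kvstore is allowed.
    if name == "kvstore":
        problems.append("is the reserved name kvstore")
    return problems


def location_ends_at_awslogs(path: str) -> bool:
    """Check that an Amazon S3 location path ends at the AWSLogs/ folder."""
    return path.endswith("/AWSLogs/")


def account_ids_valid(values: list[str]) -> bool:
    """Require at least 1 value, each a 12-digit ID or the wildcard symbol."""
    return bool(values) and all(
        v == "*" or re.fullmatch(r"[0-9]{12}", v) is not None for v in values
    )
```

For example, index_name_problems("abc_kvstore") returns an empty list, while index_name_problems("kvstore") reports the reserved name.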

Identify partitions to optimize searches of AWS CloudTrail log datasets

Partitioning is an organization strategy for large datasets that makes it possible for you to search them efficiently. When you partition your data, you organize it into a hierarchical directory structure based on the distinct values of 1 or more fields in the data. Files in AWS CloudTrail log datasets are partitioned by time, meaning they are organized into folders by year, month, and day. As a result, a search can efficiently locate all of the files associated with a specific date.

Because AWS CloudTrail log datasets have a stable schema, definitions for federated indexes that map to Splunk-managed AWS Glue tables come with default partition time field values that you cannot change.

However, all AWS CloudTrail log datasets are also partitioned by two other fields (or "keys"): AWS account ID and AWS region. Splunk software cannot predict the values for these partition keys, so it is up to you to supply them. When you identify the partition keys in the federated index definition, you can run efficient and cost-effective sdselect searches of the AWS CloudTrail log dataset to which the federated index maps.

Get partition key values for an AWS CloudTrail log dataset

All AWS CloudTrail log datasets are partitioned by at least 1 AWS account ID and 1 AWS region. This means that when you set up a federated index that maps to a Splunk-managed AWS Glue table, you must provide at least 1 value for the AWS account IDs and AWS regions partition key fields. Splunk software cannot know in advance which AWS account IDs and AWS regions a specific AWS CloudTrail log dataset is partitioned by, so these fields do not have default values.

To get values for the AWS account IDs and AWS regions fields, go to the Amazon S3 console and inspect the full Amazon S3 location path for the dataset. The <AWS-account-ID> and <AWS-region> folders in the following AWS CloudTrail location path syntax example show you where these values can be found:

s3://<bucket-name>/<additional-prefix-folders>/AWSLogs/<AWS-account-ID>/CloudTrail/<AWS-region>/<year>/<month>/<day>/<filename>

For example, in the Amazon S3 console, when you open the AWSLogs folder for an AWS CloudTrail log dataset, you'll see the AWS account IDs the dataset is associated with. Similarly, when you open the CloudTrail folder for an AWS CloudTrail log dataset, you'll see the AWS regions the dataset is associated with.
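
If you already have a listing of full object paths for the dataset, you can also extract the partition key values programmatically instead of browsing folders. The following Python sketch is one possible approach, not a Splunk or AWS utility. The function names and the sample paths are hypothetical, and the pattern assumes that the year, month, and day folders are zero-padded numbers (4, 2, and 2 digits).

```python
import re

# Mirrors the location path syntax shown above. The prefix folders are optional.
CLOUDTRAIL_PATH = re.compile(
    r"^s3://(?P<bucket>[^/]+)/"
    r"(?:(?P<prefix>.+)/)?"
    r"AWSLogs/(?P<account_id>[0-9]{12})/CloudTrail/(?P<region>[^/]+)/"
    r"(?P<year>[0-9]{4})/(?P<month>[0-9]{2})/(?P<day>[0-9]{2})/"
    r"(?P<filename>[^/]+)$"
)


def partition_keys(path: str):
    """Return the account ID and region for one object path, or None if no match."""
    match = CLOUDTRAIL_PATH.match(path)
    if match is None:
        return None
    return {"pk_account_id": match["account_id"], "pk_region": match["region"]}


def collect_partition_values(paths):
    """Return sorted, de-duplicated account IDs and regions found in the paths."""
    accounts, regions = set(), set()
    for path in paths:
        keys = partition_keys(path)
        if keys is not None:
            accounts.add(keys["pk_account_id"])
            regions.add(keys["pk_region"])
    return sorted(accounts), sorted(regions)


# Hypothetical sample paths, for illustration only.
sample_paths = [
    "s3://my-bucket/org/logs/AWSLogs/123456789012/CloudTrail/us-east-1/2024/03/15/a.json.gz",
    "s3://my-bucket/org/logs/AWSLogs/210987654321/CloudTrail/eu-west-1/2024/03/16/b.json.gz",
]
account_ids, region_names = collect_partition_values(sample_paths)
```

The two resulting lists correspond to the values you enter for the AWS account IDs and AWS regions settings.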

Optionally identify all possible partition keys with a wildcard

If an AWS CloudTrail log dataset is associated with a large number of AWS account IDs or AWS regions and you do not want to take the time to enter every key value into those fields, you can save time by entering wildcard symbols (*) into the fields instead. The wildcard symbol indicates that all possible key values for the field are applied to the federated index definition.

Note: When you use a wildcard symbol for either AWS account IDs or AWS regions in a federated index definition, you must include a WHERE clause that filters results by pk_account_id or pk_region when you invoke that federated index in an sdselect search. See sdselect command WHERE clause operations.
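
The wildcard requirement can be summarized as a simple rule: each field that uses a wildcard adds one mandatory partition key filter to your searches. The following Python sketch expresses that rule. The function name is hypothetical, and the sketch only models the rule described in this topic. It does not inspect or run any search.

```python
def required_where_fields(account_ids, regions):
    """List the partition keys that a search must filter on in its WHERE clause.

    A wildcard in AWS account IDs requires a pk_account_id filter.
    A wildcard in AWS regions requires a pk_region filter.
    """
    required = []
    if "*" in account_ids:
        required.append("pk_account_id")
    if "*" in regions:
        required.append("pk_region")
    return required
```

For example, a federated index defined with a wildcard for AWS account IDs and the single region us-east-1 requires only a pk_account_id filter.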

Search your AWS CloudTrail log datasets

After you set up federated indexes that map to AWS Glue tables for AWS CloudTrail log datasets, you can use the sdselect command to search those datasets. See sdselect command overview.

Delete a federated index

You can delete a federated index that maps to an AWS Glue table that you no longer need to search. You can also delete federated indexes when your data scanning entitlements are depleted, to prevent unintentional usage.

Prerequisites

  • A role on your Splunk Cloud Platform deployment that has the admin_all_objects capability.
  • A federated index for Federated Search for Amazon S3 that you want to delete.

Steps

  1. On your Splunk Cloud Platform deployment, in Splunk Web, select Settings, then Federation.
  2. On the Federated index tab, identify a federated index that you want to delete.
  3. Select Delete for the index you want to delete.