Ingest historical data

Ingesting historical data allows you to access and analyze important logs from the past, helping you respond to security, investigation, or compliance needs that require information not currently in Splunk.

By bringing in this data from sources like AWS S3, you can gain insights and ensure that your analysis is complete and accurate.

Before you begin, make sure that you have met the prerequisites. See Prerequisites.
  1. Log into Splunk Cloud and select Data Manager > New Data Input.
  2. On the Add a new data input window, select Promote.
  3. Select Amazon Web Services as the source from which you onboard data, then select Next.
  4. On the prerequisites window, confirm that you have fulfilled all prerequisites, then select Next.
    If you cannot fulfill the prerequisites yourself, contact your AWS admin.
    Important:

    Make sure that you define the rules for how Ingest Processor and ingest actions handle your promote data before you create the promote input or schedule its execution time.

  5. On the Input AWS - S3 data source (Promote) page, enter basic information about the data input:
    1. Data input name - Enter a unique name starting with a letter. Names can contain only letters, underscores, or the at sign (@), and can be up to 128 characters long.
    2. Description - (Optional) Enter a description of this data input.
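The naming rules in the Data input name field can be expressed as a quick validation check. The following is a sketch based only on the rules stated above; the helper name is illustrative and not part of Data Manager:

```python
import re

# Rules from the Data input name field: starts with a letter, then only
# letters, underscores, or the at sign (@), up to 128 characters total.
INPUT_NAME = re.compile(r"^[A-Za-z][A-Za-z_@]{0,127}$")

def is_valid_input_name(name: str) -> bool:
    """Return True if the name satisfies the documented naming rules."""
    return bool(INPUT_NAME.match(name))
```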
  6. Decide if you want to create a new dataset from scratch or apply the dataset from an existing promote input.
  7. To create a new dataset:
    1. In the Define dataset section, select Create a new dataset > Add dataset.
    2. On the Add dataset window, select a source type from the drop-down list.
    3. Select the way in which you want to provide the S3 paths:
      • If you need only a few paths, select Enter S3 paths and enter the paths manually in the S3 paths field. You can enter as many S3 paths as you need; separate them with commas. Use the following S3 path format: s3://your-bucket-name/account_id/region.

      • If you want to use many paths, it's more convenient to define an S3 path format and specify partition tokens to generate multiple S3 paths. In this case, select Use S3 path builder. In the S3 path format field, select the path format that you want to use. You can modify the format, for example by adding or removing partition tokens. Start the path with s3://, separate the partition tokens with a slash (/), and enclose the partition token names in curly brackets ({}). For each partition token, enter one or more values, separated by commas. For more information about how to use the S3 path builder, see S3 path format.

    4. Select Save.
    5. If your S3 path format template includes time tokens, define the start time and end time in the Select time range section. Splunk applies the time range only to the time tokens that you use.
      The start time is the earliest point in time from which data from the defined dataset is included in this promote input. Confirm the selection by selecting Apply.
      The end time is the last point in time from which data is included in this promote input. Confirm the selection by selecting Apply.
      Note: You must enter values in the HH:MM field to indicate hours and minutes even if your S3 path format template doesn't include hours and minutes; you can simply enter 00:00. Otherwise, you can't create the input.
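As a rough illustration of how a path format, partition tokens, and a time range resolve into concrete S3 paths, consider the following sketch. The bucket name, token values, and helper function are hypothetical, and the actual S3 path builder may resolve paths differently:

```python
from datetime import datetime, timedelta
from itertools import product

def expand_s3_paths(path_format, tokens, start, end):
    """Expand a path format like s3://bucket/{account_id}/{region}/{year}/{month}/{day}
    into one concrete path per token combination and per day in the range."""
    days, current = [], start
    while current <= end:
        days.append(current)
        current += timedelta(days=1)
    names = list(tokens)
    paths = []
    for values in product(*(tokens[n] for n in names)):
        for day in days:
            mapping = dict(zip(names, values))
            # Time tokens are filled from the selected time range.
            mapping.update(year=f"{day.year:04d}", month=f"{day.month:02d}", day=f"{day.day:02d}")
            paths.append(path_format.format(**mapping))
    return paths

paths = expand_s3_paths(
    "s3://my-logs/{account_id}/{region}/{year}/{month}/{day}",
    {"account_id": ["111111111111"], "region": ["us-east-1", "eu-west-1"]},
    datetime(2024, 1, 1),
    datetime(2024, 1, 2),
)
```

With two region values and a two-day time range, this yields four paths, such as s3://my-logs/111111111111/us-east-1/2024/01/01.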
  8. Enter the information that is necessary to create a connection with Amazon S3.
    1. From the drop-down list, select IAM roles region.
    2. In S3 bucket ARNs, enter a prefix for your S3 bucket or provide the S3 bucket ARN. This ensures that Splunk Cloud Platform has read access to the bucket. Example: arn:aws:s3:::bucket-name.
      The S3 bucket ARNs field specifies the bucket ARNs that correspond to the S3 paths or S3 path format defined for the dataset. In most cases, Data Manager auto-detects and pre-fills these ARNs. If the bucket ARNs aren't detected, add them manually. This ensures that Splunk Cloud Platform is granted read-only permissions to access the data stored in those S3 buckets.
    3. In AWS account ID, enter the ID of the AWS account from which you want to ingest historical data.
      This field is automatically filled in from the Define dataset step when the account ID is set as a partition token in the S3 path format and a promote input with the same account ID already exists. In these cases, Data Manager uses this token to identify the correct AWS account. If these conditions aren't met, enter your AWS account ID manually.
    4. In AWS KMS keys field, enter the AWS KMS key that you created in AWS Key Management Service (AWS KMS) to encrypt your data. Use the Amazon Resource Name (ARN) to identify your AWS KMS key. Example: arn:aws:kms:region:acct-id:key/key-id.
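If you assemble these ARNs outside the UI, a quick sanity check of their shape can catch typos such as arn:aws.s3 instead of arn:aws:s3. The patterns below are a rough sketch and don't cover every legal ARN variant:

```python
import re

# Rough patterns for the ARNs used in this step. Real ARN grammar allows
# more variants (partitions like aws-us-gov, key aliases, and so on).
S3_BUCKET_ARN = re.compile(r"^arn:aws:s3:::[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
KMS_KEY_ARN = re.compile(r"^arn:aws:kms:[a-z0-9-]+:\d{12}:key/[0-9a-f-]+$")

def looks_like_s3_bucket_arn(arn: str) -> bool:
    return bool(S3_BUCKET_ARN.match(arn))

def looks_like_kms_key_arn(arn: str) -> bool:
    return bool(KMS_KEY_ARN.match(arn))
```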
  9. In Splunk index name, select an existing index already created for a promote input, or create a new one for this promote input.

    To use an index with the promote input, both the search and archive retention periods must be set to infinity (that is 100 years or 36,500 days, the maximum retention allowed by Splunk). This ensures that promote data remains searchable and retained in the Splunk platform for that duration. For indexes not used for promote data, you can set shorter retention periods.

    You can select any existing promote index or create a new index by entering a new index name in the Splunk index name field. If you create a new index for promote input, the infinite retention period will be set by default by Data Manager when the new index and input are created in the next steps.
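In Splunk Cloud Platform, Data Manager applies the infinite retention period for you when it creates a new promote index. For context, on a self-managed Splunk Enterprise deployment the equivalent retention setting in indexes.conf would look roughly like the following sketch (the stanza name is illustrative):

```ini
[promote_historical]
# 36,500 days expressed in seconds: 36,500 * 86,400 = 3,153,600,000
frozenTimePeriodInSecs = 3153600000
```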
  10. Decide when you want to start the promote:
    • If you want the promote to start immediately after its creation, select Yes.
    • If you want to schedule the promote for a specific time, select No.

      In the Promote start time field, enter the time you want to start the promote.

  11. Select Review Data Input.
    On the Review Data Input page, check that the values you entered are accurate. At this point you can still edit the information: select Close and make edits in any fields you need. If you are satisfied with the information, select Next.
The Promote inputs tab opens. The status of the promote is displayed in a table. Select the promote input name to see its details. In the Data source promote details section, you can see the total number of files that are stored in the S3 bucket, along with the number of scanned and completed files.
  • If a promote is paused, you can resume or cancel it, but you can't edit its configuration.

  • If a promote fails or is cancelled, you can reuse or duplicate the dataset to try again.

For more information about the possible promote statuses, see Promote status.
For examples of S3 path formats and how they are resolved based on different token values and time ranges, see Examples of S3 path format.
Select an input name and then select the Open in Search button to open the Search tab in Splunk Cloud Platform and further analyze the promote data. For more information about the search options, see Exploring the Search views.