Remove indexes and indexed data

You can remove indexed data or even entire indexes from the indexer. These are the main options:

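  • Delete events from subsequent searches.
  • Remove all data from one or all indexes.
  • Remove or disable an entire index.
  • Move events from one index to another, with the split-buckets command on a standalone indexer or with Bulk Data Move on an indexer cluster.
  • Remove older data, based on a retirement policy.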
CAUTION: Removing data is irreversible. To get your data back after removing it with any of the techniques described in this topic, you must re-index the applicable data sources.

Delete events from subsequent searches

The Splunk search language provides the delete command, which hides event data from subsequent searches.

The delete command is available only with events indexes. You cannot use it with metrics indexes.

You cannot run the delete command in a real-time search. If you try to use delete during a real-time search, Splunk Enterprise displays an error.

CAUTION: The delete command only deletes the events from subsequent searches. The data itself remains in the index.

Who can delete?

The delete command can only be run by a user with the "delete_by_keyword" capability. By default, Splunk Enterprise ships with a special role, "can_delete", that has this capability and no others. The admin role does not have this capability by default. As a safeguard, create a dedicated user that holds this role, and log in as that user only when you intend to delete indexed data.

For more information, refer to Add and edit roles in Securing Splunk Enterprise.

How to delete

First run a search that returns the events you want deleted. Make sure that this search returns only the events you want to delete, and no other events. Once you're certain of that, you can pipe the results of the search to the delete command.

For example, if you want to remove the events you've indexed from a source called /fflanda/incoming/cheese.log so that they no longer appear in searches, do the following:

1. Disable or remove that source so that it no longer gets indexed.

2. Search for events from that source in your index:

source="/fflanda/incoming/cheese.log"

3. Look at the results to confirm that this is the data you want to delete.

4. Once you've confirmed that this is the data you want to delete, pipe the search to delete:

source="/fflanda/incoming/cheese.log" | delete

See the page about the delete command in the Search Reference Manual for more examples.
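Steps 2 through 4 can be scripted so that the delete never runs without a preview. The following POSIX shell sketch assumes the splunk CLI's search command and a user whose role includes delete_by_keyword; the helper names base_search, preview_delete, and run_delete and the CONFIRM variable are hypothetical, chosen for this example.

```sh
# Sketch: preview, then delete, the events from one source.
# Assumptions: SPLUNK points at the splunk CLI (default: splunk on PATH), and the
# user running it has a role with delete_by_keyword, such as can_delete.
SPLUNK="${SPLUNK:-splunk}"

# Step 2: the base search for one source path.
base_search() {
  printf 'source="%s"' "$1"
}

# Step 3: summarize what the search matches so you can review it.
preview_delete() {
  "$SPLUNK" search "$(base_search "$1") | stats count by host, sourcetype"
}

# Step 4: pipe the same search to delete, only when CONFIRM=yes is set.
run_delete() {
  if [ "${CONFIRM:-}" != "yes" ]; then
    echo "Refusing to delete: review the preview, then set CONFIRM=yes." >&2
    return 1
  fi
  "$SPLUNK" search "$(base_search "$1") | delete"
}
```

For example, source the functions, run preview_delete /fflanda/incoming/cheese.log, and only after checking the counts run CONFIRM=yes run_delete /fflanda/incoming/cheese.log. Step 1 (disabling or removing the input) still has to be done separately.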

Note: When running Splunk on Windows, substitute the forward slashes (/) in the examples with backslashes (\).

Piping a search to the delete command marks all the events returned by that search so that subsequent searches do not return them. No user (even with admin permissions) will be able to see this data when searching.

Note: Piping to delete does not reclaim disk space. The data is not actually removed from the index; it is just invisible to searches.

The delete command does not update the metadata of the events, so metadata searches still include the events even though the events themselves are no longer searchable. For example, the main All indexed data dashboard still shows event counts for the deleted sources, hosts, or sourcetypes.

The delete operation and indexer clusters

In the normal course of index replication, the effects of a delete operation propagate quickly across all bucket copies in the cluster, typically within a few seconds or minutes, depending on the cluster load and on the amount of data and number of buckets affected by the delete operation. During this propagation interval, a search can return results that have already been deleted.

Also, if a peer that had primary bucket copies at the time of the delete operation goes down before all the results have been propagated, some of the deletes will be lost. In that case, you must rerun the operation after the primary copies from the downed peer have been reassigned.

Remove all data from one or all indexes

To delete indexed data permanently from your disk, use the CLI clean command. This command completely deletes the data in one or all indexes, depending on whether you provide an <index_name> argument. Typically, you run clean before re-indexing all your data.

Note: The clean command does not work on clustered indexes.

How to use the clean command

Here are the main ways to use the clean command:

  • To access the help page for clean, type:
    splunk help clean
  • To permanently remove data from all indexes, type:
    splunk clean eventdata
  • To permanently remove data from a single index, type:
    splunk clean eventdata -index <index_name>

    where <index_name> is the name of the targeted index.

  • Add the -f parameter to force clean to skip its confirmation prompts.

Important: You must stop the indexer before you run the clean command.

Examples

This example removes data from all indexes:

splunk stop
splunk clean eventdata

This example removes data from the _internal index and forces Splunk to skip the confirmation prompt:

splunk stop
splunk clean eventdata -index _internal -f

Remove an index entirely

To remove an index entirely (and not just the data contained in it) from a non-clustered indexer, you can use Splunk Web or the CLI. You can also edit indexes.conf directly.

Before removing an index, look through all inputs.conf files on your indexer and on any forwarders sending data to the indexer, and make sure that none of the stanzas direct data to the index you plan to delete. For example, if you want to delete an index called "nogood", make sure the following attribute/value pair does not appear in any of your input stanzas: index=nogood. After the index has been deleted, the indexer discards any data still being sent to that index.
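To check for leftover references before removing an index, you can search the configuration tree with standard tools. This is a sketch, assuming configuration lives under $SPLUNK_HOME/etc (default /opt/splunk/etc) and that inputs set the index with a literal index = <name> line; find_index_refs is a hypothetical helper. Run it on the indexer and on every forwarder that sends to it.

```sh
# Sketch: list inputs.conf lines that still send data to a given index.
# Assumption: configuration lives under $SPLUNK_HOME/etc (default /opt/splunk/etc).
# Prints file:line:text for each match; returns 1 if nothing matches.
find_index_refs() {  # usage: find_index_refs <index> [config-root]
  local index="$1" root="${2:-${SPLUNK_HOME:-/opt/splunk}/etc}" out
  # Matches "index=<name>" and "index = <name>", but not longer names.
  out=$(find "$root" -type f -name inputs.conf -exec grep -nHE \
    "^[[:space:]]*index[[:space:]]*=[[:space:]]*${index}[[:space:]]*\$" {} +)
  [ -n "$out" ] && printf '%s\n' "$out"
}
```

For example, find_index_refs nogood prints each offending file and line number, so you can edit those stanzas before you remove the index.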

To remove an index in Splunk Web, navigate to Settings > Indexes and click Delete to the right of the index you want to remove. This action deletes the index's data directories and removes the index's stanza from indexes.conf.

To remove an index through the CLI, run the splunk remove index command:

splunk remove index <index_name>

This command deletes the index's data directories and removes the index's stanza from indexes.conf.

You can run splunk remove index while the indexer is running. You do not need to restart the indexer after the command completes.

The index deletion process is ordinarily fast, but the duration depends on several factors:

  • The amount of data being deleted.
  • Whether you are currently performing heavy writes to other indexes on the same disk.
  • Whether you have a large number of small .tsidx files in the index you're deleting.

You can also remove an index by editing indexes.conf directly and deleting the index's stanza. Restart the indexer and then remove the index's directories.

To remove an index from an indexer cluster, you must edit indexes.conf and delete the index's stanza. You cannot use Splunk Web or the CLI. As with all such changes on an indexer cluster, you first edit the file on the manager node and then apply the changes to the peer nodes. See Configure the peer indexes in an indexer cluster. Once you've applied the indexes.conf changes and the peer nodes have restarted, remove the index's directories from each peer node.

Split indexed data

The split-buckets command is a command-line tool used to reorganize indexes by moving or permanently deleting events based on a search filter. It allows you to select events from a source index using search criteria and move them to a destination index, effectively splitting the original buckets. This is useful for reclaiming storage or isolating sensitive data.

Always use the --dryrun parameter to preview your changes on the console before running a split operation in production.

Prerequisites

  • The Splunk Enterprise service (splunkd) must be stopped before using the split-buckets command.
  • The command is available in Splunk Enterprise version 10.0 and later.
  • This functionality is supported on standalone indexers only. It does not support clustered indexers or Splunk Cloud Platform.
  • The indexes.conf file must have bucketMerging = true set in the global stanza.
  • The user running the command must have permissions to access and modify the index data directories.
  • This release supports moving event data only. Summary and metric data types are not supported.
  • In addition to console messages, the procedure is logged in $SPLUNK_HOME/var/log/splunk/splunkd-utility.log.
Note: This command permanently moves data from the source index. When the operation is complete, the original events that match the filter will no longer exist in the source index. To save the original source buckets, you must use the --backup-to parameter.

Syntax

splunk split-buckets [parameters...]

Required parameters

Parameter Description
--search-filter=<spl_query_string> Specifies the search criteria for the events you want to move. Filtering is limited to host, source, and sourcetype.
--source-index-name=<index_name> The name of the index containing the data to be moved.
--dest-index-name=<index_name> The name of the index where the filtered data will be moved.

Optional parameters

Parameter Description
--backup-to=<path to destination folder> Creates a backup archive of the original source buckets in the specified folder before they are modified.
--dryrun Previews the command's actions without moving any data. Use this to test your search filter and parameters.
--enddate=<yyyy/mm/dd> or <unix-time> Limits the split to buckets whose earliest event time is before the specified date/time.
--json-out Formats the command's output as JSON, which is useful for automation and parsing.
--max-total-runtime=<seconds> Sets a maximum runtime in seconds for the entire split process.
--startdate=<yyyy/mm/dd> or <unix-time> Limits the split to buckets whose earliest event time is after the specified date/time.
--verbose / -v Increases the verbosity of the output for detailed logging. Use -vv for even more detail.

Examples

Example 1: Move events of a specific sourcetype to an archive index

You want to move all events with sourcetype=cisco:asa_0 from the ciscoasa index to a new dest-idx index.

  1. Stop Splunk Enterprise services.
  2. On the command line, run the command with the --dryrun parameter to preview the operation:
     ./splunk split-buckets --source-index-name=ciscoasa --dest-index-name=dest-idx --search-filter="sourcetype=cisco:asa_0" --dryrun --verbose
  3. Review the process summary on the console to ensure the correct buckets and events are selected.
  4. Once you are satisfied, run the command again without the --dryrun parameter to perform the data move. The --json-out parameter provides a detailed, machine-readable summary of the results.
     ./splunk split-buckets --source-index-name=ciscoasa --dest-index-name=dest-idx --search-filter="sourcetype=cisco:asa_0" --json-out
  5. Start Splunk Enterprise services.
  6. Verify the move by searching both indexes. Events with sourcetype=cisco:asa_0 should now be in the dest-idx index and removed from the ciscoasa index.

Example 2: Archive old data and create a backup

You want to move all events with sourcetype=cisco:asa_0 from the ciscoasa index to the archive_cisco index. You also want to create a backup of the original buckets before they are modified.

  1. Stop Splunk Enterprise services.
  2. On the command line, run:
     ./splunk split-buckets --source-index-name=ciscoasa --dest-index-name=archive_cisco --search-filter="sourcetype=cisco:asa_0" --backup-to=/mnt/backup/ciscoasa_archive
  3. This command moves all matching events from the source to the destination index.
  4. A complete backup of the original source buckets involved in the operation is created in the /mnt/backup/ciscoasa_archive directory.
  5. Start Splunk Enterprise services.
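The preview-then-run sequence in these examples can be wrapped so that the real split happens only after you confirm the dry-run summary. A POSIX shell sketch, assuming it runs from $SPLUNK_HOME/bin (SPLUNK defaults to ./splunk) with splunkd already stopped; split_with_preview is a hypothetical helper, and it passes only the parameters documented above.

```sh
# Sketch: dry run, confirm, then the real split-buckets run.
# Assumptions: run from $SPLUNK_HOME/bin with splunkd stopped.
SPLUNK="${SPLUNK:-./splunk}"

split_with_preview() {  # usage: split_with_preview <source> <dest> <filter> [extra args...]
  local src="$1" dest="$2" filter="$3" answer
  shift 3
  # Preview only: --dryrun moves no data. Extra args go to both runs.
  "$SPLUNK" split-buckets --source-index-name="$src" --dest-index-name="$dest" \
    --search-filter="$filter" "$@" --dryrun --verbose || return 1
  # Stop here unless you confirm that the summary looks right.
  printf 'Proceed with the split? [y/N] ' >&2
  read -r answer
  case "$answer" in
    y|Y) ;;
    *) echo "Aborted; no data was moved." >&2; return 1 ;;
  esac
  # The real move, with a machine-readable summary.
  "$SPLUNK" split-buckets --source-index-name="$src" --dest-index-name="$dest" \
    --search-filter="$filter" "$@" --json-out
}
```

For Example 2, pass the backup location as an extra argument: split_with_preview ciscoasa archive_cisco "sourcetype=cisco:asa_0" --backup-to=/mnt/backup/ciscoasa_archive.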

Disable an index without removing it

Once an index is disabled, the indexer no longer accepts data targeted at it. However, disabling an index does not delete index data, and the operation is reversible.

You can disable an index in Splunk Web. To do this, navigate to Settings > Indexes and click Disable to the right of the index you want to disable. To re-enable the index, click Enable to the right of the index.

You can also disable an index with the CLI command splunk disable index:

splunk disable index <index_name>

To re-enable the index, use the splunk enable index command.

To disable an index for an indexer cluster, you must edit indexes.conf and set disabled=true in the index's stanza. You cannot use Splunk Web or the CLI. As with all such changes on an indexer cluster, you first edit the file on the manager node and then apply the changes to the peer nodes. See Configure the peer indexes in an indexer cluster.

Remove older data based on retirement policy

When a bucket in an index reaches a specified age or when the index grows to a specified size, the bucket rolls to the "frozen" state, at which point the indexer removes it from the index. Just before removing the bucket, the indexer can save it to an archive, depending on how you configure your retirement policy.

For more information, see Set a retirement and archiving policy.
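A retirement policy is expressed in indexes.conf. The sketch below prints a stanza using frozenTimePeriodInSecs (age), maxTotalDataSizeMB (size), and the optional coldToFrozenDir (archive location). These are standard indexes.conf settings, but check Set a retirement and archiving policy for their exact semantics and defaults before relying on this; retirement_stanza is a hypothetical helper.

```sh
# Sketch: print an indexes.conf stanza for an age- and size-based retirement policy.
# Assumption: the standard settings frozenTimePeriodInSecs, maxTotalDataSizeMB,
# and coldToFrozenDir; review the output before adding it to indexes.conf.
retirement_stanza() {  # usage: retirement_stanza <index> <days> <max-mb> [archive-dir]
  local index="$1" days="$2" max_mb="$3" archive="${4:-}"
  printf '[%s]\n' "$index"
  # A bucket rolls to frozen once its newest event is older than this age.
  printf 'frozenTimePeriodInSecs = %d\n' "$(( days * 86400 ))"
  # When the index grows past this size, the oldest buckets roll to frozen.
  printf 'maxTotalDataSizeMB = %d\n' "$max_mb"
  # Without an archive directory (or script), frozen buckets are deleted.
  if [ -n "$archive" ]; then printf 'coldToFrozenDir = %s\n' "$archive"; fi
}
```

For example, retirement_stanza nogood 90 500000 prints frozenTimePeriodInSecs = 7776000, because 90 days times 86,400 seconds per day is 7,776,000 seconds.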

Bulk Data Move for indexer clusters

Bulk Data Move is a self-service toolset for reorganizing indexes and removing sensitive data to reclaim storage and handle compliance requests. You can move events from a source index to a destination index based on filter criteria within clustered environments, including those using SmartStore.

You can interact with the Bulk Data Move toolset through the following interfaces:

  • Splunk CLI: Run commands directly on the Cluster Manager (CM).
  • REST API: Use the Cluster Manager REST API for automation and troubleshooting tasks.

To ensure system stability and data integrity, follow these best practices when performing operations in your production environment:

  • Always perform a dry run: Use the dryrun parameter to simulate the operation, then search the splunkd.log file to verify the destination buckets.
  • Review the SmartStore process: The move includes a localization phase (downloading source buckets to local cache) followed by a stable state phase (uploading and verifying new buckets in remote storage).
Warning: This command permanently moves data from the source index. To save the original source buckets, you must use the backup-to parameter. Search correctness is not guaranteed while a data move is in progress, so schedule these jobs during periods of low search activity.

Limitations

Before running a Bulk Data Move operation, consider the following restrictions:

  • Source and destination must use the same storage mode. Mixed-mode moves are not supported.

  • Cluster-to-cluster moves are not supported.

  • Metrics indexes are not supported.

  • Hot buckets are not supported, as they are actively receiving data and cannot be safely moved while in an open state.

Prerequisites

  • For SmartStore-backed clustered indexes, Splunk Enterprise version 10.4 or later is required.

  • The Cluster Manager and all peer indexers must run the identical version of Splunk Enterprise.

  • This feature is for clustered deployments, including SmartStore-backed clustered indexes.

  • Activate the Bulk Data Move REST endpoints, which are deactivated by default. See Activate the Bulk Data Move REST endpoints.

  • The create_bulk_data_move capability must be assigned to the user role that will be making the API call. By default, no roles (including admin) have this capability.

  • Verify that the destination index has the appropriate role-based access controls (RBAC) implemented. Moved events inherit RBAC permissions of the destination index.

    Warning: If the RBAC permissions of the destination index are less restrictive than RBAC permissions of the source index, data may become visible to unauthorized roles.
  • In indexes.conf on the cluster peers, you must set allowBulkDataMove=true in the stanza for both source and destination indexes. This is set to false by default.

  • In indexes.conf on the cluster peers, ensure that enableDataIntegrityControl=false is set in the stanza for source and destination indexes. This is set to false by default.

  • Ensure source and destination indexes have consistent settings.
  • Source and destination indexes must use the same storage mode (both local or both SmartStore-backed).

  • The destination index must contain at least one event.

  • Ensure the local disk and the backup path have enough free space to accommodate the split operation and the original source buckets.

  • Ensure that your destination storage and the local cache on peer indexers have enough free disk space to hold a full copy of the data you plan to move.

    Warning: If the Bulk Data Move operation is interrupted (for example, due to a crash or network timeout), duplicate data might remain in the source and destination indexes, which can increase disk space usage. If available disk space drops below the minFreeSpace setting in server.conf, Splunk Enterprise automatically stops indexing data to prevent metadata corruption.
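Several of these prerequisites are per-stanza indexes.conf settings, so you can check them on each peer before starting a job. A POSIX shell and awk sketch, assuming you pass the single indexes.conf file that defines both stanzas; it does not resolve [default] inheritance or configuration layering, and it compares values literally against true. conf_value and check_bulk_move_settings are hypothetical helpers.

```sh
# Sketch: read a stanza setting from one indexes.conf file (last value wins).
# Assumption: no [default] inheritance or layering is resolved.
conf_value() {  # usage: conf_value <file> <stanza> <key>
  awk -v stanza="[$2]" -v key="$3" '
    /^[ \t]*\[/ { name = $0; gsub(/[ \t]/, "", name); in_stanza = (name == stanza); next }
    in_stanza && index($0, "=") {
      k = $0; sub(/=.*/, "", k); gsub(/[ \t]/, "", k)
      if (k == key) { v = $0; sub(/^[^=]*=/, "", v); gsub(/^[ \t]+|[ \t]+$/, "", v); val = v }
    }
    END { print val }
  ' "$1"
}

# Check the Bulk Data Move settings for each named index; returns 1 on any problem.
check_bulk_move_settings() {  # usage: check_bulk_move_settings <indexes.conf> <index>...
  local conf="$1" idx ok=0
  shift
  for idx in "$@"; do
    if [ "$(conf_value "$conf" "$idx" allowBulkDataMove)" != "true" ]; then
      echo "$idx: allowBulkDataMove is not set to true" >&2; ok=1
    fi
    # An unset value means the default (false), which is what the move needs.
    if [ "$(conf_value "$conf" "$idx" enableDataIntegrityControl)" = "true" ]; then
      echo "$idx: enableDataIntegrityControl must be false" >&2; ok=1
    fi
  done
  return "$ok"
}
```

For example, check_bulk_move_settings /path/to/indexes.conf src_index main returns 0 only if both stanzas set allowBulkDataMove = true and neither sets enableDataIntegrityControl = true.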

Activate the Bulk Data Move REST endpoints

By default, the Bulk Data Move REST endpoints are deactivated. To activate the endpoints, follow these steps on the Cluster Manager (CM) and peers:
  1. On the Cluster Manager (CM), add the following stanza and setting to restmap.conf.

    [admin:bulk-data-move]
    disabled = false

    The value of false means that the endpoints are activated.

  2. Restart the CM.

  3. On each cluster peer, add the following stanza and setting to restmap.conf.

    [admin:peer-bulk-data-move]
    disabled = false

    The value of false means that the endpoints are activated.

  4. Restart the cluster peers.
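The activation steps can be made safe to rerun by appending each stanza only when it is missing. A sketch, assuming you edit a local restmap.conf; the topic does not name the directory, so the $SPLUNK_HOME/etc/system/local path in the usage note is an assumption. enable_rest_stanza is a hypothetical helper, and it leaves an existing stanza untouched rather than rewriting its disabled value.

```sh
# Sketch: append "[<stanza>] / disabled = false" to restmap.conf if not present.
# Assumption: the target file is a local restmap.conf you are allowed to edit.
enable_rest_stanza() {  # usage: enable_rest_stanza <restmap.conf> <stanza>
  local file="$1" stanza="$2"
  if [ -f "$file" ] && grep -qxF "[$stanza]" "$file"; then
    echo "[$stanza] already present in $file; check its disabled setting by hand." >&2
    return 0
  fi
  printf '\n[%s]\ndisabled = false\n' "$stanza" >> "$file"
}
```

On the CM, run enable_rest_stanza "$SPLUNK_HOME/etc/system/local/restmap.conf" admin:bulk-data-move and restart; on each peer, use admin:peer-bulk-data-move and restart the peers.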

Syntax

To perform a Bulk Data Move operation, run the following CLI command on the CM:

splunk bulk-data-move

Required parameters

Parameter Description
name Required by the API, but its value is ignored and not used for tracking or logging. Example: move-job-01.
backup-to The absolute path on the cluster peers where the original source buckets will be backed up before data is moved.
source-index-name The name of the index you are moving data from.
dest-index-and-filter A combined parameter specifying the destination index and the search filter to select which events to move. (For example: main:sourcetype=my_app.) The format is index:filter. Filtering is limited to host, source, and sourcetype.
dryrun Set to true to test the command and validate parameters without moving data. Set to false to perform the move. If not explicitly specified in the request, dryrun defaults to true.

Optional parameters

Parameter Description
start Unix epoch time marking the beginning of the time range. Together with end, selects buckets for processing based on their time range. These parameters filter whole buckets, not individual events: they select the buckets with events falling into the range, and then all events in those buckets are split based on the search filter, regardless of the event timestamp.
end Unix epoch time marking the end of the time range. See start.
max-total-runtime The maximum runtime for the job, in seconds.
token Authentication token for the REST API call.
auth Username and password for the REST API call.

Example: Move events of a specific sourcetype to an archive index

You want to move all events with sourcetype=my_app from the src_index index to the main index.

  1. Ensure all prerequisites are met.

    Confirm that the user role has the create_bulk_data_move capability and that allowBulkDataMove=true is set for both source and destination indexes in indexes.conf on the cluster peers.

  2. Perform a dry run to simulate the Bulk Data Move operation.

    Run the following command with the dryrun=true parameter. This will validate your parameters and show you what the job would do without actually moving any data.

    splunk bulk-data-move -source-index-name src_index -backup-to /mnt/splunk_archives/job_123 -dest-index-and-filter "main:sourcetype=my_app" -dryrun true -token <jwt>
  3. Review the output of the dryrun command to ensure the destination indexes and filters are correct.

  4. Run the data move.

    Once you are satisfied with the dry run, run the command again with the dryrun=false parameter to perform the data move.

    splunk bulk-data-move -source-index-name src_index -backup-to /mnt/splunk_archives/job_123 -dest-index-and-filter "main:sourcetype=my_app" -dryrun false -token <jwt>
  5. Verify the move.

    After the job completes, verify the move by searching both indexes. Events with sourcetype=my_app should now be in the main index and removed from the src_index.
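Steps 2 through 4 can be combined into one helper that always runs the dry run first. A POSIX shell sketch, assuming splunk is on PATH on the CM and the JWT is in a BDM_TOKEN environment variable; bulk_move, bulk_move_with_preview, and BDM_TOKEN are hypothetical names, and the CLI parameters are exactly those in the example above.

```sh
# Sketch: run bulk-data-move on the CM with the parameters from the example.
# Assumptions: splunk on PATH (override with SPLUNK=...), JWT in BDM_TOKEN.
SPLUNK="${SPLUNK:-splunk}"

bulk_move() {  # usage: bulk_move <source-index> <backup-path> <dest:filter> <true|false>
  if [ -z "${BDM_TOKEN:-}" ]; then
    echo "Set BDM_TOKEN to a valid JWT first." >&2
    return 1
  fi
  case "$4" in
    true|false) ;;
    *) echo "dryrun must be true or false" >&2; return 1 ;;
  esac
  "$SPLUNK" bulk-data-move -source-index-name "$1" -backup-to "$2" \
    -dest-index-and-filter "$3" -dryrun "$4" -token "$BDM_TOKEN"
}

# Dry run first; the real move runs only after you answer y.
bulk_move_with_preview() {
  local answer
  bulk_move "$1" "$2" "$3" true || return 1
  printf 'Dry run finished. Run the real move? [y/N] ' >&2
  read -r answer
  case "$answer" in
    y|Y) bulk_move "$1" "$2" "$3" false ;;
    *) echo "Aborted; no data was moved." >&2; return 1 ;;
  esac
}
```

For this example: bulk_move_with_preview src_index /mnt/splunk_archives/job_123 "main:sourcetype=my_app".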

Stop a Bulk Data Move

To stop a Bulk Data Move job before it completes, run the following stop command. Before terminating, the process finishes the current bucket operation but does not begin processing any new buckets.

splunk bulk-data-move stop <txn-id>

The txn-id parameter is the transaction ID that was returned when you created the job.

Example

splunk bulk-data-move stop <your_txn_id>

Monitoring and troubleshooting

To troubleshoot failed and interrupted jobs, search splunkd.log for the transaction ID (txn-id) that was returned when you created the job.
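A quick way to collect every log line for one job is a fixed-string search across the current and rotated splunkd.log files. A sketch, assuming the default log directory $SPLUNK_HOME/var/log/splunk and rotated files named splunkd.log.1, splunkd.log.2, and so on; find_txn_logs is a hypothetical helper. Run it on each host whose splunkd.log you want to check.

```sh
# Sketch: print every splunkd.log line that mentions one transaction ID.
# Assumption: default log directory $SPLUNK_HOME/var/log/splunk, with rotated
# files named splunkd.log.1, splunkd.log.2, and so on.
find_txn_logs() {  # usage: find_txn_logs <txn-id> [log-dir]
  local txn="$1" dir="${2:-${SPLUNK_HOME:-/opt/splunk}/var/log/splunk}"
  # -F treats the ID as a fixed string; -H prefixes each line with its file name.
  grep -HF -- "$txn" "$dir"/splunkd.log*
}
```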

Cluster Manager REST API

In addition to the CLI, you can use the Cluster Manager REST API to automate bulk data move operations or to troubleshoot job status.

To perform the Bulk Data Move operation, send a POST request to the following REST endpoint on your Cluster Manager:

POST /services/cluster/manager/bulk-data-move

To stop a job, send the following DELETE request:

DELETE /services/cluster/manager/bulk-data-move/<txn-id>
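For automation, the same endpoints can be called with curl. This is a sketch under several assumptions not stated in this topic: the CM listens on the default management port 8089, the POST body uses form fields named after the parameters in the tables above, and the JWT is sent in an Authorization: Bearer header. CM_URL, BDM_TOKEN, bdm_start, and bdm_stop are hypothetical names; confirm the parameter encoding against the REST API reference before use.

```sh
# Sketch: start and stop Bulk Data Move jobs through the CM REST API.
# Assumptions: port 8089, form-encoded parameters, Bearer JWT in BDM_TOKEN.
CURL="${CURL:-curl}"
CM_URL="${CM_URL:-https://cm.example.com:8089}"

bdm_start() {  # usage: bdm_start <source-index> <backup-path> <dest:filter> <true|false>
  # "name" is required by the API but its value is ignored.
  "$CURL" -sS -X POST "$CM_URL/services/cluster/manager/bulk-data-move" \
    -H "Authorization: Bearer $BDM_TOKEN" \
    --data-urlencode "name=move-job-01" \
    --data-urlencode "source-index-name=$1" \
    --data-urlencode "backup-to=$2" \
    --data-urlencode "dest-index-and-filter=$3" \
    --data-urlencode "dryrun=$4"
}

bdm_stop() {  # usage: bdm_stop <txn-id>
  "$CURL" -sS -X DELETE "$CM_URL/services/cluster/manager/bulk-data-move/$1" \
    -H "Authorization: Bearer $BDM_TOKEN"
}
```

If the CM uses a self-signed certificate, curl needs --cacert pointing at the CA file (or -k, for testing only).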