Use field filters in searches on accelerated data models
READ THIS FIRST: Should you deploy field filters in your organization?
Field filters are a powerful tool that can help many organizations protect their sensitive fields from prying eyes, but they might not be a good fit for everyone.
If your organization uses downstream configurations, such as accelerated data models, Splunk Enterprise Security (ES) detections using those data models, and user-level search-time field extractions, make sure that you plan around the implications of field filters on those configurations before deploying field filters in your environment. See READ THIS: Downstream impact of field filters.
If your organization runs Splunk Enterprise Security or if your users rely heavily on commands that field filters restrict by default (mpreview and mstats), do not use field filters in production until you have thoroughly planned how you will work around these restricted commands. See READ THIS: Restricted commands do not work in searches on indexes that have field filters.
How field filters work in accelerated data models
Field filters can protect search-time fields that are defined in accelerated data models.
You can use field filters to control access to specific sensitive fields in search results, including fields extracted from indexed events and fields stored in the summaries that accelerated data models use to speed up searches. When data model acceleration (DMA) is in use, field filters are applied in the summarization searches, so the summaries written to disk are protected, as long as no role used by the internal system account that runs those searches is exempt from the field filters. Field filters redact or obfuscate the field values in search results, including results retrieved from accelerated data model summaries, without modifying the underlying raw data in the original .tsidx files that are used to generate the summary .tsidx files. As a result, unauthorized users can't view protected fields and their values, regardless of whether a search accesses raw events or accelerated data model summaries.
For example, say you create an accelerated data model on an index called privacy_logs that extracts the following indexed fields:
- action
- ip
- ssn
- user
The summarization search matches ssn=*73*. When the summarization search is run without the field filter, the value for the ssn field for the user Rebecca is 607-73-0445.
To protect the ssn field, you create a _raw field filter in Splunk Web called ssn_fieldfilter on the privacy_logs index with Regex value match set to ssn=([^ ]+) and Replacement set to ssn=Redacted.
After the field filter is applied and the summary is rebuilt, the next time the summarization search runs, the ssn field value is redacted and only the string Redacted is displayed.
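The effect of the ssn_fieldfilter definition on an event's raw text can be approximated outside of Splunk. The following is a minimal sketch, not the product's implementation: it uses Python's re module as a stand-in for the regular expression engine that Splunk software uses, and the sample event (including the action and ip values) is hypothetical. Only the regex, the replacement string, the user name, and the ssn value come from the example above.

```python
import re

# Hypothetical raw event; only user=Rebecca and the ssn value come from the example.
raw_event = "user=Rebecca action=login ip=10.0.0.1 ssn=607-73-0445"

# Values entered for ssn_fieldfilter in the example.
FILTER_REGEX = r"ssn=([^ ]+)"
FILTER_REPLACEMENT = "ssn=Redacted"


def apply_raw_filter(raw):
    """Replace every match of the filter regex in the raw event text."""
    return re.sub(FILTER_REGEX, FILTER_REPLACEMENT, raw)


filtered_event = apply_raw_filter(raw_event)
print(filtered_event)
```

Because the regex stops at the first space after ssn=, only the ssn value is rewritten and the other key-value pairs in the event are left untouched.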
Protect sensitive fields in DMA-summarized data
When working with sensitive data stored in Data Model Acceleration (DMA) summaries on disk in .tsidx files on an indexer, it is critical that field filters are correctly applied during summarization searches and that sensitive data is filtered before it is ever written to disk. Follow these best practices to ensure your sensitive data stored in a data model summary is protected.
Ensure field filters are applied during summarization
Make sure that none of the roles used by the internal system account for background DMA summarization generation are exempt from field filters, so that field filters are applied during summarization searches and sensitive data isn't written into the .tsidx files that store DMA summaries on disk. If any of the internal system account's roles is exempt from a field filter, the field filter will not be applied to DMA summary generation searches.
In a default environment, the roles associated with the internal system account are user, power, admin, and splunk-system-role. To determine which roles these are in your environment, run the following search:
index=_audit action=search info=completed id="DM_search*" search_id='scheduler* | head 1 | table roles | eval clean_str=replace(roles,"'","") | eval list_roles=split(clean_str,"+") | table list_roles
The results of this search in a default environment look like this:
| list_roles |
|---|
| admin |
| power |
| splunk-system-role |
| user |
Now you know that, in order to protect your sensitive data during summarization searches, you need to ensure that the admin, power, splunk-system-role, and user roles are not listed as exempt roles on any of your field filters.
See Exempt certain roles from field filters using Splunk Web.
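If you manage many field filters, the role check described above can be scripted. The following is a minimal sketch under stated assumptions: the format of the roles string is inferred from the eval steps in the search (single quotes removed, values split on +), and the filter names and exemption lists other than ssn_fieldfilter are hypothetical values that you would copy from your own field filter definitions.

```python
def parse_roles(roles_field):
    """Mirror the eval steps in the search: strip single quotes, split on '+'."""
    clean_str = roles_field.replace("'", "")
    return clean_str.split("+")


def find_exempt_system_roles(system_roles, filter_exemptions):
    """Return, per field filter, the exempt roles that the system account also holds."""
    system = set(system_roles)
    conflicts = {}
    for filter_name, exempt_roles in filter_exemptions.items():
        overlap = sorted(system & set(exempt_roles))
        if overlap:
            conflicts[filter_name] = overlap
    return conflicts


# Assumed shape of the roles value in the audit event (default environment).
system_roles = parse_roles("'admin+power+splunk-system-role+user'")

# Hypothetical exemption lists for two field filters.
filter_exemptions = {
    "ssn_fieldfilter": ["ssn_auditor"],
    "ip_fieldfilter": ["power", "ip_analyst"],
}

conflicts = find_exempt_system_roles(system_roles, filter_exemptions)
print(conflicts)
```

Any filter that appears in the result exempts a role that the internal system account holds, which means that filter would not be applied during DMA summary generation.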
Control access to non-summarized sensitive data
If you need a role to access sensitive fields in non-summarized data, create a new role, which can inherit from a predefined role such as admin, power, or user. Then exempt that new role from the field filter. Because the internal system account does not hold the new role, specific users get controlled visibility while summarized data stays protected.
Use role-based access control (RBAC) for sensitive data on disk
If there is sensitive data in a data model's summary on disk in .tsidx files that hasn’t been filtered by your field filter for some reason, use RBAC instead of field filters to secure the data model. This ensures only authorized users can access summaries containing sensitive information.
For example, if your field filter wasn’t correctly applied during your summarization search or if the sensitive data is already on disk, the field and its value that your field filter is supposed to protect will be exposed on disk. In these cases, field filters are ineffective in protecting your sensitive data and you should use RBAC instead.
Test your use cases before rollout
Test your use cases thoroughly with DMA summarized data and field filters before deploying field filters to production environments to make sure that none of your sensitive data is unintentionally exposed.
Search-time order and DMA
Field filters process field values before all other search-time operations and change the values of fields extracted in data model searches. Any operation that comes later in the search pipeline and depends on the value of a filtered field is therefore affected, so DMA searches might return results that you don't expect. For example, say you have a data model called DM_search_xintestdm10 that uses the following summarization search to create summaries to detect malicious users:
| summarize tstats=t override=partial manual_rebuilds=f max_time=3600 poll_buckets_until_maxtime=f id=DM_search_xintestdm10 [ search (index=* OR index=_*) (index=xintest2) | eval nodename = "xintest"| eval malicious_user=if(searchmatch("val=86"),"true","false") | rename Code AS xintest.Code malicious_user AS xintest.malicious_user | fields nodename, _time, host, source, sourcetype, xintest.Code, xintest.malicious_user ]
When this summarization search runs without a field filter, the malicious_user field evaluates to true, alerting you to a possible security breach.
Then you create a field filter on the sensitive field called val. When the summarization search for the DM_search_xintestdm10 data model runs with that field filter, the search command returns the redacted value for val, which is no longer 86. The next command in the pipeline, eval, evaluates val against the redacted value, so the malicious_user field is now false. If you have alerts based on tstats searches on the accelerated data model that trigger when malicious_user is true, those alerts might not fire, and you might miss potential security breaches.
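The ordering problem in this example can be reproduced with a small simulation. This is a rough sketch, not Splunk behavior verbatim: the sample event, the filter regex, and the replacement string Redacted are hypothetical, and the malicious_user function is only a stand-in for the eval expression if(searchmatch("val=86"),"true","false").

```python
import re


def extract_val(raw):
    """Extract the value of the val field from the raw event text, if present."""
    match = re.search(r"\bval=(\S+)", raw)
    return match.group(1) if match else None


def malicious_user(raw):
    """Stand-in for eval malicious_user=if(searchmatch("val=86"),"true","false")."""
    return "true" if extract_val(raw) == "86" else "false"


def apply_val_filter(raw):
    """Hypothetical field filter on val: replace its value with a fixed string."""
    return re.sub(r"\bval=\S+", "val=Redacted", raw)


# Hypothetical event from the xintest2 index.
raw_event = "Code=500 val=86"

# Without the field filter, eval sees the original value.
before = malicious_user(raw_event)

# With the field filter, the value is replaced before eval runs.
after = malicious_user(apply_val_filter(raw_event))

print(before, after)
```

The filter runs first, so the eval step never sees the value 86, and the summary stores false for an event that would otherwise have been flagged.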
Filtered and original field values in summary indexes
If you’re using field filters to protect sensitive fields in DMA searches, note that the original, unfiltered values of any sensitive fields that were summarized before the field filters took effect might still exist in the data model summary index and the corresponding .tsidx files, even after field filters replace those values in search results.
To remove these unfiltered values, you can either wait for Splunk software to automatically rebuild the summary index for the accelerated data model according to the configured schedule, or manually trigger a rebuild of the summary index. Note that manually rebuilding the summary index can be resource-intensive and might affect the performance of other running processes. For instructions about rebuilding an accelerated data model summary index, see Manage data model acceleration.