Ingest Actions for Splunk platform
This Splunk validated architecture (SVA) applies to Splunk Cloud Platform and Splunk Enterprise products.
Initial publication: March 23, 2023
Architecture diagram
Benefits
Ingest actions is a set of capabilities for pre-index event processing and data routing. You can use ingest actions directly at the Splunk platform indexing layer and at the intermediate forwarding layer when heavy forwarders are used.
This document describes three primary features of ingest actions: rulesets, routing to S3, and the ingest actions user interface. Each feature has its own advantages, but the overall benefit is to give users more control and flexibility over their data before it is permanently stored. Customers might want to exercise this additional control for the following reasons:
- Mask, redact, remove, or otherwise change raw data before indexing
- Tag, add, lookup, or otherwise augment raw data before indexing
- Filter entire events from being indexed
- Have tighter control over which indexes data is sent to
- Send some or all data to third-party storage either independently or concurrently with data going into the Splunk platform
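The first few reasons in the list above (masking, filtering, and index routing) can be pictured with a small sketch. This is only an illustration of the logic, not how the Splunk platform implements it: the event dictionaries, the `mask`, `keep`, and `route` helpers, the regular expressions, and the index names are all hypothetical.

```python
import re

# Hypothetical event shape for illustration; not a Splunk data structure.

def mask(event, pattern, replacement):
    """Return a copy of the event with matches in the raw text replaced."""
    return {**event, "raw": re.sub(pattern, replacement, event["raw"])}

def keep(event, drop_pattern):
    """Return False when the whole event should be filtered out."""
    return re.search(drop_pattern, event["raw"]) is None

def route(event, pattern, index):
    """Return a copy of the event redirected to another index on a match."""
    if re.search(pattern, event["raw"]):
        return {**event, "index": index}
    return event

events = [
    {"raw": "level=DEBUG msg=heartbeat", "index": "main"},
    {"raw": "level=ERROR card=4111111111111111", "index": "main"},
]

processed = []
for e in events:
    if not keep(e, r"level=DEBUG"):
        continue  # filtered: this event is never indexed
    # Keep only the last four digits of the (made-up) card number.
    e = mask(e, r"card=\d{12}(\d{4})", r"card=XXXXXXXXXXXX\1")
    e = route(e, r"level=ERROR", "errors")
    processed.append(e)

print(processed)
```

Only the second event survives, with its card number masked and its index changed, which is the kind of outcome the reasons above are aiming for.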
While the use of pre-index event processing has implementation and search time considerations, the data routing topologies documented throughout the validated architectures remain largely the same with or without the use of ingest actions. This includes high availability and scale decisions. Caveats and considerations to these topologies are described in the following sections.
The configuration of ingest actions uses the same props.conf and transforms.conf described in the Splunk product documentation and in other validated architectures. However, the distribution of these ingest actions can manifest in several different ways depending on the specific implementation. Those distribution methodologies are described in the following sections.
Limitations
- Specific restrictions and requirements apply when ingest actions routes data to S3, and they differ depending on whether the ingest actions run in Splunk Cloud Platform or in a customer-managed environment.
- The addition of event processing can add significant overhead to the processing pipelines on both heavy forwarders and indexers. Take care to implement efficient and concise rules, especially with high-volume data.
- Modifying data before indexing can break add-ons and other knowledge objects that rely on data matching specific shapes or patterns. Evaluate search-time knowledge for any sourcetypes that you intend to modify with ingest actions to ensure consistency. This might require modifying search-time knowledge to comply with the new data formats that result from ingest actions.
- Management, maintenance, and distribution of ingest actions depend specifically on where in your topology those actions are implemented.
- Ingest actions rulesets are processed differently from traditional transforms, and you need to understand the difference for rules to execute as intended.
Each of the main functions of ingest actions has specific benefits, limitations, and concepts worth exploring and understanding. Those details are provided in the following sections.
Rulesets
Ingest actions adds RULESET processing to the Splunk heavy forwarder and indexer processing pipelines. Rulesets add a layer to the pipeline so that data that has already been parsed can be processed again. TRANSFORMS can affect only unparsed data and are processed at most once per pipeline, whereas RULESETS can be used at each stage of processing. In most other ways, RULESETS function like TRANSFORMS. Both are used to apply business logic to data before indexing.
It's important to understand how TRANSFORMS and RULESETS work together and their order of operations: TRANSFORMS are processed at most one time per event, and RULESETS can process data at any point along the path.
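The difference can be modeled with a short sketch of an event passing through two tiers, a heavy forwarder and then an indexer. This is a simplified model under stated assumptions, not the platform's actual pipeline: the `run_instance` and `tag` helpers and the `parsed` flag are hypothetical, and the sketch assumes that rulesets run after transforms on the same instance.

```python
def run_instance(event, transforms, rulesets):
    """Process one event on a single instance (heavy forwarder or indexer)."""
    if not event["parsed"]:
        # TRANSFORMS only see unparsed data, so they run at most once.
        for transform in transforms:
            event = transform(event)
        event = {**event, "parsed": True}
    # RULESETS run on every instance, whether or not the data is parsed.
    # Assumption: they run after any transforms on the same instance.
    for ruleset in rulesets:
        event = ruleset(event)
    return event

def tag(name):
    """Build a step that records its own name on the event."""
    def step(event):
        return {**event, "applied": event["applied"] + [name]}
    return step

event = {"raw": "example", "parsed": False, "applied": []}
# Tier 1: heavy forwarder. The data arrives unparsed.
event = run_instance(event, [tag("hf_transform")], [tag("hf_ruleset")])
# Tier 2: indexer. The data is already parsed, so transforms are skipped.
event = run_instance(event, [tag("idx_transform")], [tag("idx_ruleset")])

print(event["applied"])  # ['hf_transform', 'hf_ruleset', 'idx_ruleset']
```

The indexer's transform never runs because the event arrived already parsed, while the ruleset on each tier does.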
Benefits
- Support for event processing throughout the data routing pipeline. This allows for different processing of events to be applied at multiple or various stages of data routing. The ability to repeatedly modify data can be very useful when tagging data or protecting data prior to leaving specific data domains.
- Can execute event processing on parsed data. Transforms are limited to processing on unparsed data only. Rulesets allow for processing both unparsed and parsed data.
- Rulesets are defined in props and transforms and are compatible with existing configuration file management and distribution.
- Gives Splunk administrators more control over data before it is indexed. Rulesets can act as a traffic cop, allowing or denying specific or arbitrary data from being sent to specific indexes.
Limitations
- Understanding transforms and ruleset processing order and precedence is critical to ensuring data is modified as intended and in the right order.
- Before modifying data using either rulesets or transforms, you must ensure that search-time artifacts are compatible with the format of data resulting from pre-index transformations. Modifying source data without evaluating search-time artifacts can result in broken dashboards, reports, alerts, and searches. This is also true for Splunkbase add-ons and apps.
- The scoping/application of Ingest Actions rulesets depends on the original sourcetype. If a transform has modified the original sourcetype, then rulesets must be scoped using the original sourcetype and not the resultant sourcetype from the transform.
Understanding the original sourcetype limitation
The order of operations noted previously is important for understanding how the use of a transform can affect the implementation of a ruleset. It's possible that a transform can modify a sourcetype such that when previewing data for ruleset authoring, the resultant events are marked with a sourcetype that's different from the original data. This can cause confusion when authoring rulesets in the UI because the ruleset needs to be authored using the original sourcetype, but the preview will have been done using the resultant sourcetype.
In reality, the ruleset that should have been crafted in the example in the preceding image would use the original "foo" sourcetype. The UI is previewing data that has already been affected by the prior transform rather than the original data, which is the actual target data.
This situation can be mitigated in several different ways:
- Use live preview to capture real-time original events. Live previewed events will show event data before any transforms have been applied. This requires that data is actively being received by the Ingest Actions host.
- Preview and author the rulesets using the resultant sourcetype, but modify the configuration produced by ingest actions to use the original sourcetype.
- Upload a file containing the sample events and explicitly select the original sourcetype, rather than using search-based event sampling.
- Implement the ruleset on a different processing tier. The sourcetype restriction is only applicable if the transforms and rulesets are executed on the same Splunk platform instance. For example, an intermediate tier could change the original sourcetype and then send it to another tier, where a final ruleset is crafted targeting the sourcetype produced by the intermediate transform.
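The scoping behavior behind this limitation can be sketched as follows. The model is illustrative only: the `Event` class, the `rename_sourcetype`, `apply_rulesets`, and `redact` helpers, and the resultant sourcetype name "bar" are hypothetical. Only the "foo" sourcetype comes from the example above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Event:
    raw: str
    sourcetype: str           # current, possibly rewritten, sourcetype
    original_sourcetype: str  # sourcetype the event arrived with

def rename_sourcetype(event, new_sourcetype):
    """Model a transform that rewrites the sourcetype."""
    return replace(event, sourcetype=new_sourcetype)

def apply_rulesets(event, rulesets):
    """Apply the rulesets scoped to the event's ORIGINAL sourcetype."""
    for rule in rulesets.get(event.original_sourcetype, []):
        event = rule(event)
    return event

def redact(event):
    """Example ruleset step: hide a made-up secret value."""
    return replace(event, raw=event.raw.replace("secret", "****"))

arrived = Event(raw="user=a token=secret", sourcetype="foo",
                original_sourcetype="foo")
after_transform = rename_sourcetype(arrived, "bar")

# A ruleset scoped to the resultant sourcetype never fires on this instance.
scoped_to_result = apply_rulesets(after_transform, {"bar": [redact]})
# A ruleset scoped to the original sourcetype does.
scoped_to_original = apply_rulesets(after_transform, {"foo": [redact]})

print(scoped_to_result.raw)
print(scoped_to_original.raw)
```

In this model, the ruleset authored against "bar" leaves the event untouched, which mirrors the confusion that can arise when the preview shows the resultant sourcetype.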
Routing data to S3
Routing data to S3 is similar in nature to routing data to syslog or TCP. Whereas those methods are fairly straightforward, the S3 connection involves more configuration considerations:
- Distance from S3 or increased latency limits overall throughput, which can cause the output queue to fill.
- In the case of a Splunk Cloud Platform deployment, S3 buckets must be in the same region as the deployment.
- You must choose between the IAM and Secret Key methods.
- Either raw JSON or compressed formats can be used for the data written to S3.
- Both size-based and time-based batching can be used. Optimal settings depend on the shape of the included data.
The default settings attempt to find a middle ground between performance and risk. Batches that are too big or too small can result in poor performance. Data that is held in the queue for too long is at risk of being lost.
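The trade-off between size-based and time-based batching can be illustrated with a minimal sketch. It is not the platform's batching code: the `Batcher` class, its thresholds, and the explicit `now` argument are hypothetical and exist only to make the flush conditions visible.

```python
class Batcher:
    """Collect events and flush on a size or age threshold, whichever is first."""

    def __init__(self, max_bytes, max_age_seconds):
        self.max_bytes = max_bytes
        self.max_age_seconds = max_age_seconds
        self.events = []
        self.size = 0
        self.opened_at = None
        self.flushed = []  # stands in for objects written to remote storage

    def add(self, event, now):
        # Time-based flush: the open batch has waited long enough.
        if self.events and now - self.opened_at >= self.max_age_seconds:
            self.flush()
        if not self.events:
            self.opened_at = now
        self.events.append(event)
        self.size += len(event)
        # Size-based flush: the open batch is big enough.
        if self.size >= self.max_bytes:
            self.flush()

    def flush(self):
        if self.events:
            self.flushed.append(b"\n".join(self.events))
            self.events, self.size, self.opened_at = [], 0, None

batcher = Batcher(max_bytes=10, max_age_seconds=30)
batcher.add(b"aaaa", now=0)    # 4 bytes held
batcher.add(b"bbbbbb", now=1)  # 10 bytes reached: flushed by size
batcher.add(b"cc", now=2)      # new batch opened
batcher.add(b"dd", now=40)     # old batch is 38 s old: flushed by age first

print(batcher.flushed)  # [b'aaaa\nbbbbbb', b'cc']
print(batcher.events)   # [b'dd']
```

A real output would also flush on a timer rather than only when a new event arrives. The sketch checks age on `add` purely to keep the example deterministic. The last event is still held in memory, which is the data-at-risk window described above.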
Sizing considerations
In general, each S3 RFS output thread can see write speeds of 90 MBps (roughly 7.5 TB per day), depending on optimization and network performance. To increase throughput, use multiple RFS output threads, which are controlled by rfs.provider.max_workers in limits.conf. By default, there are 4 threads per pipeline.
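The per-thread figure can be turned into a rough capacity estimate. The sketch below is back-of-the-envelope arithmetic only. It assumes decimal units (1 TB = 1,000,000 MB) and a sustained 90 MBps per thread, and the `daily_capacity_tb` and `threads_needed` helper names are hypothetical.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def daily_capacity_tb(mb_per_second):
    """Convert a sustained write speed in MB/s to TB per day (decimal units)."""
    return mb_per_second * SECONDS_PER_DAY / 1_000_000

def threads_needed(daily_volume_tb, mb_per_second_per_thread=90):
    """Estimate how many output threads a daily volume would require."""
    per_thread = daily_capacity_tb(mb_per_second_per_thread)
    return math.ceil(daily_volume_tb / per_thread)

per_thread_tb = daily_capacity_tb(90)
print(round(per_thread_tb, 3))  # 7.776
print(threads_needed(20))       # 3
```

At 90 MBps, one thread moves about 7.8 TB per day in decimal units, in line with the rough figure quoted above. Actual throughput depends on network performance and optimization, so treat the result as an upper bound rather than a sizing guarantee.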
User interface and management topology
A major component of ingest actions is a user interface for crafting event processing rules. The interface provides a mechanism for filtering, masking, and routing data. While ingest actions rulesets provide a distinct processing pipeline element, the ingest actions interface is an assistive feature to help administrators author common, but potentially complex configurations.
Benefits
- Enables users to implement logic using a UI rather than only configuration files
- Live preview of ruleset logic and effect on data, leading to faster iteration and implementation time
- Easier to interpret data transformation workflow
- Provides copy/paste configuration
- Compatible with all Splunk platform topologies
Limitations
- Data preview and ruleset authoring are limited to sourcetype-based configuration
- Data preview and authoring tools may provide an incongruent view between live data and indexed data (See Understanding the Original Sourcetype limitation.)
- Only a subset of capabilities is represented in the user interface: routing, masking, and filtering. Configuration files can be used for more complex event processing.
- The UI is not case sensitive when previewing data, but the resultant configuration files are case sensitive. It's critical to author ingest actions rules in the UI using the proper case.
- All rulesets created by the UI live in the same Splunk app. Any modification of rulesets within add-ons or other apps must be done manually by configuration file.
Management techniques
Because ingest actions functionality exists within the ecosystem of Splunk architecture, common best practices for managing configuration continue to apply to rulesets. Rulesets are defined within the props and transforms constructs. If you are using third-party configuration management tools, rulesets are carried in those same files. You may need to re-evaluate whether restarting the Splunk platform is required for changes to take effect.
The following image shows the most common deployment topologies and the effect on management and use:
Dedicated deployment server
When you use the deployment server to manage rulesets on heavy forwarders, you must use dedicated deployment servers for those heavy forwarders rather than shared infrastructure. The Ingest Actions page on the deployment server automatically creates the IngestAction_AutoGenerated server class and assigns that class to the forwarders. This is not usually desirable on deployment servers that are servicing other server classes.
Decoupled ruleset authoring and management
It is possible to decouple ruleset authoring from ruleset deployment to some extent. The Ingest Actions UI has the capability to display the props and transforms configuration settings resulting from a ruleset preview session. The configurations produced can be integrated into your existing knowledge object and configuration and change management process. In this case, a server would be dedicated to authoring rulesets but would not be used in the management of those rulesets.
Evaluation of pillars
| Design Principles / Best Practices | ||||||
|---|---|---|---|---|---|---|
| Availability | Performance | Scalability | Security | Management | ||
| #1 | Meet your event-level business requirements by using rulesets. Filtering, masking, and routing are key elements in optimizing your data ingestion. Rulesets address those use cases directly. | X | X | |||
| #2 | Ensure adequate resources when routing data to S3 to avoid blocking queues. Sending data to S3 happens on a single thread per forwarder. Horizontal scale may be needed. | X | X | |||
| #3 | Choose a management technique compatible with your deployment. Management options differ from topology to topology. | X | X | |||
| #4 | Understand which sourcetypes your rulesets are targeting and which TAs are also modifying those sourcetypes. Ensure that your rulesets are applied and that downstream results are compatible with other knowledge objects. | X | X | |||
| #5 | Use proper case in your sourcetypes when building rulesets. This is good practice to reduce troubleshooting. | X | ||||
| #6 | Minimize latency to S3. The faster that data can make its way to S3, the faster queues empty, which keeps overall event delivery latency low to all destinations. | X | X | |||
| #7 | Use the UI to accelerate ruleset creation and understand the impact of changes. The interface offers many advantages over creating rulesets manually, including reducing errors and troubleshooting. | X | X | |||