Migrate anomaly detection to adaptive thresholding in ITSI

As of ITSI 4.20, the anomaly detection feature in Splunk's IT Service Intelligence (ITSI) has been deprecated and will no longer receive updates, with plans for removal in future versions. Users are encouraged to transition to more robust and flexible alternatives offered within ITSI, such as adaptive thresholding with outlier detection, to ensure continued accurate monitoring and alerting.

Understand anomalies in ITSI

An anomaly is a data point or pattern that deviates significantly from what is expected based on historical trends or group behavior. Anomalies are often unexpected KPI (key performance indicator) values, such as unusual spikes or dips, that could indicate potential issues like system failures, performance degradation, or security breaches.

For example:

  • Entity-specific anomaly: A single server shows high CPU utilization compared to its historical baseline

  • Entity-level anomaly: A subset of database servers in a cluster deviates from the typical behavior of the group

  • KPI aggregate anomaly: The average API response time spikes suddenly

Anomalies can be detected across various use cases, including application performance monitoring, entity management, and infrastructure health. The goal is to identify such deviations early to trigger appropriate alerts and remediate potential issues promptly.
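
To make the group-deviation case concrete, the following is a minimal Python sketch and not ITSI's algorithm. It flags entities whose latest value sits far from the group median, using a modified z-score based on the median absolute deviation. The function name, the server names, the CPU values, and the 3.5 cutoff are all illustrative assumptions.

```python
from statistics import median

def entities_deviating_from_group(latest_values, cutoff=3.5):
    """Return entity names whose latest value is far from the group median.

    Illustrative only: scores each entity with a modified z-score built on
    the median absolute deviation (MAD) of the group.
    """
    values = list(latest_values.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the group reports the same value; flag anything that differs.
        return [name for name, v in latest_values.items() if v != center]
    return [
        name
        for name, v in latest_values.items()
        if 0.6745 * abs(v - center) / mad > cutoff
    ]

# Hypothetical CPU utilization (%) for a cluster of database servers.
cpu_by_server = {"db-01": 41.0, "db-02": 39.5, "db-03": 42.2, "db-04": 40.1, "db-05": 93.7}
print(entities_deviating_from_group(cpu_by_server))  # ['db-05']
```

The median and MAD are used here instead of the mean and standard deviation so that the deviating entity does not distort the baseline it is compared against.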

Both anomaly detection and adaptive thresholding with outlier detection are designed to identify deviations in KPI behavior. While the underlying methods differ, adaptive thresholding is fully capable of identifying the same types of anomalies that were previously captured by anomaly detection algorithms.

How adaptive thresholding works

Adaptive thresholding in IT Service Intelligence (ITSI) uses machine learning techniques to analyze your data and automatically adjust threshold values based on historical data and current conditions of a KPI. It is particularly beneficial for identifying outliers and changes in KPI performance, reducing false positives and improving the reliability of alerts. In ITSI 4.21, adaptive thresholding has been enhanced to support entity-level data, providing users with greater granularity and precision in monitoring KPIs.

Adaptive thresholding is ideal for monitoring KPIs with dynamic, seasonal, or unpredictable behavior where static thresholds may lead to false positives or missed anomalies.

Key features of adaptive thresholding include:

  • Automatic Adjustments: It dynamically adjusts thresholds based on historical data, ensuring accurate detection even in complex environments.

  • Improved Precision: Support for entity-level granularity provides greater accuracy in detecting anomalies across individual entities.

  • Reduced Missed Alerts: Outlier detection ensures anomalies are excluded from adaptive thresholding training data, improving the system's ability to detect issues and minimizing the likelihood of missed alerts.

This ensures that users transitioning to adaptive thresholding maintain, and often improve, their ability to detect abnormal KPI behaviors.
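
The general idea of deriving thresholds from history after excluding outliers can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the implementation ITSI uses: the function names, the choice of Tukey's fences for outlier exclusion, the 95th and 99th percentile cut points, and the sample data are all hypothetical.

```python
from statistics import quantiles

def exclude_outliers(history, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(history, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in history if low <= v <= high]

def fit_thresholds(history):
    """Derive upper severity thresholds from the cleaned history."""
    cleaned = exclude_outliers(history)
    pct = quantiles(cleaned, n=100)  # pct[i] is the (i + 1)th percentile
    return {"high": pct[94], "critical": pct[98]}  # 95th and 99th percentiles

def severity(value, thresholds):
    """Map a KPI value to a severity label using the fitted thresholds."""
    if value >= thresholds["critical"]:
        return "critical"
    if value >= thresholds["high"]:
        return "high"
    return "normal"

# Hypothetical response times (ms): a stable range plus two past incidents.
history = [100 + (i * 7) % 50 for i in range(200)] + [900, 950]
thresholds = fit_thresholds(history)
print(thresholds)
print(severity(104, thresholds), severity(950, thresholds))
```

Without the exclusion step, the two incident values would pull the upper percentiles toward 900 ms, and a later spike of similar size could go undetected. This is the effect that excluding outliers from training data is meant to avoid.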

How drift detection works

Drift detection helps identify gradual or rapid changes in KPI behavior over extended periods, enabling proactive issue remediation and preventing potential system failures.

Key features of drift detection include:

  • Gradual Drift: Identifies slow, incremental changes in KPI behavior over time.

  • Rapid Drift: Detects sustained, sudden changes in KPI behavior over a short period, signaling the need for immediate review.

  • Data Resolution: Defines the time frame over which data is collected and summarized.

  • Statistical Functions: Supports methods like Max, Average, Min, and Sum to analyze aggregated data.

  • Look Back Period: Specifies the historical time frame used to evaluate trends and patterns.

  • Drift Tolerance: Sets the percentage deviation from the baseline that is considered normal.
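
One way to picture how these settings interact is the short Python sketch below. It is an assumption-laden illustration and not ITSI's drift detection logic: the function and parameter names are hypothetical, the baseline is taken as the mean of the older buckets in the look back period, and the baseline is assumed to be non-zero.

```python
from statistics import mean

# Statistical functions used to summarize each bucket.
AGGREGATORS = {"max": max, "average": mean, "min": min, "sum": sum}

def summarize(values, points_per_bucket, function="average"):
    """Collapse raw KPI values into one value per bucket (the data resolution)."""
    agg = AGGREGATORS[function]
    return [
        agg(values[i:i + points_per_bucket])
        for i in range(0, len(values) - points_per_bucket + 1, points_per_bucket)
    ]

def drift_percent(buckets, recent_buckets):
    """Percent change of the most recent buckets versus the earlier baseline."""
    baseline = mean(buckets[:-recent_buckets])  # assumed non-zero
    recent = mean(buckets[-recent_buckets:])
    return 100.0 * (recent - baseline) / baseline

def has_drifted(values, points_per_bucket, look_back_buckets,
                recent_buckets, tolerance_percent, function="average"):
    """True when the deviation from the baseline exceeds the drift tolerance."""
    buckets = summarize(values, points_per_bucket, function)[-look_back_buckets:]
    return abs(drift_percent(buckets, recent_buckets)) > tolerance_percent

# Hypothetical hourly KPI values: 21 days at 200, then 7 days at 230.
hourly = [200.0] * (21 * 24) + [230.0] * (7 * 24)
print(has_drifted(hourly, 24, 28, 7, tolerance_percent=10))  # True (15% change)
print(has_drifted(hourly, 24, 28, 7, tolerance_percent=20))  # False
```

In this sketch, a smaller recent window reacts to sudden sustained shifts, while a larger one only responds to slower changes. That is a simplification of the gradual and rapid drift distinction described above.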

Migrate to adaptive thresholding

To migrate from anomaly detection to KPI-based monitoring, configure your KPIs with adaptive thresholding and outlier detection. When adaptive thresholding is properly set up, anomalous KPI values are automatically assigned high or critical severities, so significant deviations are flagged appropriately. Notable events already generated by anomaly detection remain in place. To generate new alerts for KPIs and entities, however, you must create or enable the necessary correlation searches.

To effectively alert on high or critical severities, it is recommended to use the KPI/Entity Degraded correlation searches available in the ITSI Monitoring and Alerting content pack. These correlation searches are specifically designed to identify and alert on performance degradations, making them a reliable option for monitoring critical KPI anomalies. By configuring these searches, you can ensure a smooth transition from anomaly detection to a KPI monitoring approach that leverages adaptive thresholding for accurate and actionable insights.

Set up adaptive thresholding for KPIs

  • Install Python for Scientific Computing 4.20 or later.
  • You must have the write_itsi_kpi_threshold_template capability to apply adaptive thresholds to a KPI.
  • Because adaptive thresholding looks for historic patterns in data, ensure KPIs have established baselines of data points and show a pattern or trend over time. If the historic data is noisy, a pattern will be difficult to detect.
Configure your KPIs with adaptive thresholding and drift detection in order to identify anomalies in KPI data. Follow these steps to apply adaptive thresholding to your KPIs.
  1. Select Configuration, then Service Monitoring, then Service and KPI management.
  2. Select the service that includes the KPI with anomaly detection configured.
  3. Select the Anomaly Detection tab.
  4. Select No for the Enable Trending AD Algorithm setting.
    Note: Notable events generated by anomaly detection are not removed. To generate new alerts for KPIs and entities that were configured with anomaly detection, you must create or turn on the corresponding correlation searches. For more information, see Set up KPI alerting.
  5. Select the KPI thresholds tab.
  6. Select AI thresholding or Threshold template for the Threshold type.
  7. (Optional) If the applied threshold template does not have outlier exclusion enabled:
    1. Turn on Outlier exclusion in the adaptive thresholding settings.
    2. Select an algorithm for outlier detection.
    3. Adjust the trigger outlier threshold until the correct number of outliers are identified in the preview chart.
  8. Select Save.
For more information about adaptive thresholding in ITSI, see Create adaptive KPI thresholds in ITSI.

Set up KPI drift detection

Use drift detection to identify changes in KPI behavior that occur slowly over longer periods of time, and prevent issues before they arise. Normal KPIs can display an incorrect severity (high or critical) due to user configuration changes in code, data, workload, or infrastructure.

To set up KPI drift detection, see Configure drift detection.

When drift is detected on KPIs, drift indicators appear next to KPIs on the Service Analyzer page. Indicators also appear next to episodes on the Episode Review page.

Set up adaptive thresholding for entities

  • Install Python for Scientific Computing version 3.0.0 or later.
  • Because adaptive thresholding looks for historic patterns in data, ensure entities have established baselines of data points and show a pattern or trend over time.
If your entities share similar characteristics and data patterns, follow these steps to apply the same KPI threshold configuration across all entities.
  1. Select Configuration, then Service Monitoring, then Service and KPI management.
  2. Select the service that includes the KPI with anomaly detection configured.
  3. Select the Anomaly Detection tab.
  4. Select No for the Enable Entity Cohesion AD Algorithm setting.
  5. Select the Entity thresholds tab.
  6. Select the AI thresholding tab, and preview the recommendations.
  7. (Optional) To apply this to multiple entities, follow these steps:
    1. Select Default settings from the side panel.
    2. Select the Threshold levels tab.
    3. Select Copy once to copy the KPI aggregate threshold settings.
    4. Check the Always copy time policies after they are updated by adaptive thresholding setting.
  8. Select Save.
To set up KPI and entity alerts, see Set up KPI alerting.

Set up KPI alerting

Install the ITSI Monitoring and Alerting content pack. To learn more, see Set up KPI alerting.

Use the correlation searches available in the ITSI Monitoring and Alerting content pack to set up alerts when KPIs have high or critical severities. Use these searches to monitor KPIs configured with adaptive thresholding and receive actionable insights. Follow these steps to set up KPI alerting.

  1. Select Configuration, then Data integrations, then Content library, then ITSI Monitoring and Alerting.
  2. Install the Service Monitoring - KPI Degraded correlation search.
  3. Select Configuration, then Data integrations, then Correlation searches.
  4. Turn on the Service Monitoring - KPI Degraded search.

To group the alerts by service, install and turn on the Episodes by ITSI Service notable event aggregation policy.

To set up entity alerting, install and turn on the following correlation searches:

  • Service Monitoring - Entity Degraded

  • Service Monitoring - Sustained Entity Degradation

Use the ITSI Configuration Assistant to continuously optimize your adaptive thresholding and drift detection settings by identifying KPIs with sub-optimal threshold configurations. Use the Configuration Assistant to adjust the KPI configuration and improve monitoring accuracy.