Monitor your overall AI application and agent environment with Splunk APM

Monitor the overall performance, quality, estimated cost, and security risk of your AI applications and agents with the AI overview page.

Attention:

Alpha features described in this document are provided by Splunk to you "as is" without any warranties, maintenance and support, or service-level commitments. Splunk makes this alpha feature available in its sole discretion and may discontinue it at any time. These documents are not yet publicly available and we ask that you keep such information confidential. Use of alpha features is subject to the Splunk Pre-Release Agreement for Hosted Services.

The AI overview page can help you answer questions such as:

  • How is my overall application environment performing, in terms of total errors, latency, and quality issues?

  • What's driving estimated costs and token usage among my applications?

  • Which models and providers are driving errors, latency, and quality issues?

Prerequisites

To monitor AI applications and agents, you must meet the following requirements.

Monitor all AI applications and agents

To monitor all AI applications and agents, use the Splunk Observability Cloud main menu to select APM > AI overview. The following screenshot displays an example of the page.

The AI overview page in Splunk APM.

On the AI overview page, the Requests, Errors, Tokens, and Estimated cost sections of the header display the aggregate metrics across all of your AI applications and agents.

To monitor your AI agents in greater detail with the AI agents page, select View all AI agents. For more information on using this page, see Monitor AI agents with Splunk APM.

Analyze AI applications and agents using overview charts

On the AI overview page, the charts display all metric values in the selected time period for the AI applications and agents in your environment. Use the filters above each chart to update the chart view by model, provider, or other attributes associated with the chart's metric.

Select any chart in this view to show example traces that match the parameters of the chart.

The following list describes each available chart, the metric associated with it, and when to use it.
Requests

Metric: count(agents)

Determine the total number of requests/calls based on spans with chat operations.

This metric indicates the total traffic faced by your AI applications and agents.
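As a rough illustration of the count behind this chart, the following Python sketch counts chat-operation spans. The span records and attribute values are hypothetical; the attribute names follow the gen_ai.* conventions used elsewhere on this page:

```python
# Hypothetical span records; attribute names follow the gen_ai.* conventions
# referenced by this page, but the values are made up for illustration.
spans = [
    {"gen_ai.operation.name": "chat", "gen_ai.provider.name": "provider-x"},
    {"gen_ai.operation.name": "chat", "gen_ai.provider.name": "provider-y"},
    {"gen_ai.operation.name": "embeddings", "gen_ai.provider.name": "provider-x"},
]

# The Requests chart counts spans whose operation is a chat call.
request_count = sum(
    1 for span in spans if span.get("gen_ai.operation.name") == "chat"
)
print(request_count)  # 2
```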

Errors

Metric: count(agents) where sf_error=true

Determine the total number of errors based on spans with chat operations.

This metric is a leading indicator for technical issues faced by your system.

Error rates

Metric: count(agents) where sf_error=true, divided by count(agents)

Determine how many errors occurred among your total calls/requests.

A high error rate indicates that a high number of your users are facing issues.
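The division behind this chart can be sketched in Python. The span records and sf_error flags below are hypothetical, made up for illustration:

```python
# Hypothetical chat spans, each flagged with sf_error as on this chart.
spans = [
    {"gen_ai.operation.name": "chat", "sf_error": True},
    {"gen_ai.operation.name": "chat", "sf_error": False},
    {"gen_ai.operation.name": "chat", "sf_error": False},
    {"gen_ai.operation.name": "chat", "sf_error": True},
]

errors = sum(1 for span in spans if span["sf_error"])  # count where sf_error=true
error_rate = errors / len(spans)                       # divided by total count
print(f"{error_rate:.0%}")  # 50%
```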

Latency per LLM generation

Metric: percentile[50,90,99](agents)

Determine latency of GenAI spans.

A high latency indicates that your users are facing long wait times for responses.
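The p50, p90, and p99 values on this chart are percentiles over span durations. The following nearest-rank sketch in Python uses hypothetical latency values; Splunk's exact percentile estimation may differ:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value covering at least p% of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical GenAI span durations in milliseconds.
latencies_ms = [120, 250, 340, 410, 560, 900, 1500, 2100, 4800, 7300]

p50, p90, p99 = (percentile(latencies_ms, p) for p in (50, 90, 99))
print(p50, p90, p99)  # 560 4800 7300
```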

Latency per provider

Metric: percentile[50,90,99](agents), grouped by gen_ai.provider.name

Determine latency of GenAI spans by model provider.

Use this metric to determine if any model provider is currently producing slow responses.

Latency per operation

Metric: percentile[50,90,99](agents), grouped by gen_ai.operation.name

Determine latency of GenAI spans by operation type.

This metric indicates which operations are currently performing slowly and helps guide troubleshooting.

Token usage

Metric: gen_ai.client.token.usage, grouped by gen_ai.request.model and gen_ai.provider.name

Track token usage by model or request.

A model or request using a high number of tokens could be experiencing increased traffic or could be wasting resources.
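Grouping token usage by gen_ai.request.model can be sketched as follows; the model names, provider names, and token counts are hypothetical:

```python
from collections import defaultdict

# Hypothetical gen_ai.client.token.usage data points with their dimensions.
usage_points = [
    {"gen_ai.request.model": "model-a", "gen_ai.provider.name": "provider-x", "tokens": 1200},
    {"gen_ai.request.model": "model-a", "gen_ai.provider.name": "provider-x", "tokens": 800},
    {"gen_ai.request.model": "model-b", "gen_ai.provider.name": "provider-y", "tokens": 950},
]

# Sum usage per model, mirroring the chart's group-by.
tokens_by_model = defaultdict(int)
for point in usage_points:
    tokens_by_model[point["gen_ai.request.model"]] += point["tokens"]

print(dict(tokens_by_model))  # {'model-a': 2000, 'model-b': 950}
```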

Estimated cost

Metric: gen_ai.cost.input and gen_ai.cost.output, grouped by gen_ai.request.model and gen_ai.provider.name

Track estimated costs by model or request.

High estimated costs for a model or request may indicate high traffic, or an opportunity to reduce costs by redistributing requests across models.
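The gen_ai.cost.input and gen_ai.cost.output values feed a sum like the following sketch; the per-call cost figures are hypothetical, not real provider pricing:

```python
# Hypothetical calls, each carrying the two cost metrics tracked by this chart.
calls = [
    {"gen_ai.request.model": "model-a", "gen_ai.cost.input": 0.0010, "gen_ai.cost.output": 0.0008},
    {"gen_ai.request.model": "model-a", "gen_ai.cost.input": 0.0005, "gen_ai.cost.output": 0.0015},
]

# Estimated cost is the sum of input-side and output-side costs.
total_cost = sum(c["gen_ai.cost.input"] + c["gen_ai.cost.output"] for c in calls)
print(round(total_cost, 4))  # 0.0038
```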

Quality issues

Metric: gen_ai.evaluation.score, with filter('gen_ai.evaluation.name', ...) applied for toxicity, relevance, hallucination, sentiment, and bias; grouped by gen_ai.request.model and gen_ai.provider.name

Track negative-scoring evaluations by model and correlate semantic issues with models.

A high number of issues correlated with a specific model may call for mitigation, such as shifting traffic to other models. For more information about quality scores, see About AI agent quality scores.
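Filtering evaluation scores to one evaluation name, as this chart's filters do, can be sketched like this. The records, the scores, and the assumption that a higher score means a worse result are all hypothetical; see About AI agent quality scores for the actual scoring semantics:

```python
# Hypothetical evaluation records shaped like the chart's metric dimensions.
evaluations = [
    {"gen_ai.evaluation.name": "hallucination", "gen_ai.evaluation.score": 0.8,
     "gen_ai.request.model": "model-a"},
    {"gen_ai.evaluation.name": "relevance", "gen_ai.evaluation.score": 0.9,
     "gen_ai.request.model": "model-a"},
    {"gen_ai.evaluation.name": "hallucination", "gen_ai.evaluation.score": 0.1,
     "gen_ai.request.model": "model-b"},
]

# Mirror filter('gen_ai.evaluation.name', 'hallucination'), then group by model.
hallucination_by_model = {
    ev["gen_ai.request.model"]: ev["gen_ai.evaluation.score"]
    for ev in evaluations
    if ev["gen_ai.evaluation.name"] == "hallucination"
}
print(hallucination_by_model)  # {'model-a': 0.8, 'model-b': 0.1}
```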

Risks

Metric: gen_ai.security, grouped by gen_ai.request.model and gen_ai.provider.name

Track security risks by model or provider and correlate security issues with models.

A high number of security issues correlated with a specific model may call for mitigation, such as shifting traffic to other models. For more information about monitoring security risks, see Monitor security risks on the AI overview page.

Create a detector to generate alerts from a chart

To create a detector that generates alerts for a chart, select the actions menu in the chart and select New detector from chart. For more information on detectors and alerts, see Create detectors to trigger alerts.