Monitor your overall AI application and agent environment with Splunk APM

Monitor the overall performance, quality, estimated cost, and security risk of your AI applications and agents with the AI overview page.

Attention:

Alpha features described in this document are provided by Splunk to you "as is" without any warranties, maintenance and support, or service-level commitments. Splunk makes this alpha feature available in its sole discretion and may discontinue it at any time. These documents are not yet publicly available and we ask that you keep such information confidential. Use of alpha features is subject to the Splunk Pre-Release Agreement for Hosted Services.

The AI overview page can help you answer questions such as:

  • How is my overall application environment performing, in terms of total errors, latency, and quality issues?

  • What's driving estimated costs and token usage among my applications?

  • Which models and providers are driving errors, latency, and quality issues?

Prerequisites

To monitor AI applications and agents, you must meet the following requirements.

Monitor all AI applications and agents

To monitor all AI applications and agents, use the Splunk Observability Cloud main menu to select APM > AI overview. The following screenshot displays an example of the page.

The AI overview page in Splunk APM.

On the AI overview page, the Requests, Errors, Tokens, and Estimated cost sections of the header display the aggregate metrics across all of your AI applications and agents.

To monitor your AI agents in greater detail with the AI agents page, select View all AI agents. For more information on using this page, see Monitor AI agents with Splunk APM.

Analyze AI applications and agents using overview charts

On the AI overview page, the charts display all metric values in the selected time period for the AI applications and agents in your environment. Use the filters above each chart to update the chart view by model, provider, or other attributes associated with the chart's metric.

Select any chart in this view to show example traces that match the parameters of the chart.

The following list describes each available chart, the metric associated with it, and when to use it.
Requests

Metric: count(agents)

Determine the total number of requests/calls based on spans with chat operations.

This metric indicates the total traffic faced by your AI applications and agents.
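As a rough illustration of the count behind this chart, the following Python sketch counts chat-operation spans. The span records and attribute values are hypothetical; the attribute names follow the gen_ai.* conventions used elsewhere on this page:

```python
# Hypothetical span records; attribute names follow the gen_ai.* conventions
# referenced by this page, but the values are made up for illustration.
spans = [
    {"gen_ai.operation.name": "chat", "gen_ai.provider.name": "provider-x"},
    {"gen_ai.operation.name": "chat", "gen_ai.provider.name": "provider-y"},
    {"gen_ai.operation.name": "embeddings", "gen_ai.provider.name": "provider-x"},
]

# The Requests chart counts spans whose operation is a chat call.
request_count = sum(
    1 for span in spans if span.get("gen_ai.operation.name") == "chat"
)
print(request_count)  # 2
```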

Errors

Metric: count(agents) where sf_error=true

Determine the total number of errors based on spans with chat operations.

This metric is a leading indicator for technical issues faced by your system.

Error rates

Metric: count(agents) where sf_error=true, divided by count(agents)

Determine how many errors occurred among your total calls/requests.

A high error rate indicates that a high number of your users are facing issues.
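The division behind this chart can be sketched in Python. The span records and sf_error flags below are hypothetical, made up for illustration:

```python
# Hypothetical chat spans, each flagged with sf_error as on this chart.
spans = [
    {"gen_ai.operation.name": "chat", "sf_error": True},
    {"gen_ai.operation.name": "chat", "sf_error": False},
    {"gen_ai.operation.name": "chat", "sf_error": False},
    {"gen_ai.operation.name": "chat", "sf_error": True},
]

errors = sum(1 for span in spans if span["sf_error"])  # count where sf_error=true
error_rate = errors / len(spans)                       # divided by total count
print(f"{error_rate:.0%}")  # 50%
```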

Latency per LLM generation

Metric: percentile[50,90,99](agents)

Determine latency of GenAI spans.

A high latency indicates that your users are facing long wait times for responses.
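The p50, p90, and p99 values on this chart are percentiles over span durations. The following nearest-rank sketch in Python uses hypothetical latency values; Splunk's exact percentile estimation may differ:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value covering at least p% of samples."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical GenAI span durations in milliseconds.
latencies_ms = [120, 250, 340, 410, 560, 900, 1500, 2100, 4800, 7300]

p50, p90, p99 = (percentile(latencies_ms, p) for p in (50, 90, 99))
print(p50, p90, p99)  # 560 4800 7300
```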

Latency per provider

Metric: percentile[50,90,99](agents), grouped by gen_ai.provider.name

Determine latency of GenAI spans by model provider.

Use this metric to determine if any model provider is currently producing slow responses.

Latency per operation

Metric: percentile[50,90,99](agents), grouped by gen_ai.operation.name

Determine latency of GenAI spans by operation type.

This metric indicates which operations are currently performing slowly and helps guide troubleshooting.

Token usage

Metric: gen_ai.client.token.usage, grouped by gen_ai.request.model and gen_ai.provider.name

Track token usage by model or request.

A model or request using a high number of tokens could be experiencing increased traffic or could be wasting resources.
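Grouping token usage by gen_ai.request.model can be sketched as follows; the model names, provider names, and token counts are hypothetical:

```python
from collections import defaultdict

# Hypothetical gen_ai.client.token.usage data points with their dimensions.
usage_points = [
    {"gen_ai.request.model": "model-a", "gen_ai.provider.name": "provider-x", "tokens": 1200},
    {"gen_ai.request.model": "model-a", "gen_ai.provider.name": "provider-x", "tokens": 800},
    {"gen_ai.request.model": "model-b", "gen_ai.provider.name": "provider-y", "tokens": 950},
]

# Sum usage per model, mirroring the chart's group-by.
tokens_by_model = defaultdict(int)
for point in usage_points:
    tokens_by_model[point["gen_ai.request.model"]] += point["tokens"]

print(dict(tokens_by_model))  # {'model-a': 2000, 'model-b': 950}
```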

Estimated cost

Metric: gen_ai.cost.input and gen_ai.cost.output, grouped by gen_ai.request.model and gen_ai.provider.name

Track estimated costs by model or request.

High estimated costs for a model or request may indicate high traffic, or an opportunity to reduce costs by redistributing requests across models.
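The gen_ai.cost.input and gen_ai.cost.output values feed a sum like the following sketch; the per-call cost figures are hypothetical, not real provider pricing:

```python
# Hypothetical calls, each carrying the two cost metrics tracked by this chart.
calls = [
    {"gen_ai.request.model": "model-a", "gen_ai.cost.input": 0.0010, "gen_ai.cost.output": 0.0008},
    {"gen_ai.request.model": "model-a", "gen_ai.cost.input": 0.0005, "gen_ai.cost.output": 0.0015},
]

# Estimated cost is the sum of input-side and output-side costs.
total_cost = sum(c["gen_ai.cost.input"] + c["gen_ai.cost.output"] for c in calls)
print(round(total_cost, 4))  # 0.0038
```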

Quality issues

Metric: gen_ai.evaluation.score, with filter('gen_ai.evaluation.name', ...) applied for toxicity, relevance, hallucination, sentiment, and bias; grouped by gen_ai.request.model and gen_ai.provider.name

Track negative-scoring evaluations by model and correlate semantic issues with models.

A high number of issues correlated with a specific model may call for mitigation, such as shifting traffic to other models. For more information about quality scores, see About AI agent quality scores.
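Filtering evaluation scores to one evaluation name, as this chart's filters do, can be sketched like this. The records, the scores, and the assumption that a higher score means a worse result are all hypothetical; see About AI agent quality scores for the actual scoring semantics:

```python
# Hypothetical evaluation records shaped like the chart's metric dimensions.
evaluations = [
    {"gen_ai.evaluation.name": "hallucination", "gen_ai.evaluation.score": 0.8,
     "gen_ai.request.model": "model-a"},
    {"gen_ai.evaluation.name": "relevance", "gen_ai.evaluation.score": 0.9,
     "gen_ai.request.model": "model-a"},
    {"gen_ai.evaluation.name": "hallucination", "gen_ai.evaluation.score": 0.1,
     "gen_ai.request.model": "model-b"},
]

# Mirror filter('gen_ai.evaluation.name', 'hallucination'), then group by model.
hallucination_by_model = {
    ev["gen_ai.request.model"]: ev["gen_ai.evaluation.score"]
    for ev in evaluations
    if ev["gen_ai.evaluation.name"] == "hallucination"
}
print(hallucination_by_model)  # {'model-a': 0.8, 'model-b': 0.1}
```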

Risks

Metric: gen_ai.security, grouped by gen_ai.request.model and gen_ai.provider.name

Track security risks by model or provider and correlate security issues with models.

A high number of security issues correlated with a specific model may call for mitigation, such as shifting traffic to other models. For more information about monitoring security risks, see Monitor security risks on the AI overview page.

Create a detector to generate alerts from a chart

To create a detector that generates alerts for a chart, select the actions menu in the chart and select New detector from chart. For more information on detectors and alerts, see Create detectors to trigger alerts.