Monitor AI agents with Splunk APM

Monitor and troubleshoot the performance, quality, estimated cost, and security risk of your AI agents with the AI agents page.

Attention:

Alpha features described in this document are provided by Splunk to you "as is" without any warranties, maintenance and support, or service-level commitments. Splunk makes this alpha feature available in its sole discretion and may discontinue it at any time. These documents are not yet publicly available and we ask that you keep such information confidential. Use of alpha features is subject to the Splunk Pre-Release Agreement for Hosted Services.

The AI agents page can help you answer questions such as:

  • Which of my AI agents are currently degraded in performance?

  • What AI agents are using the most tokens?

  • What quality issues are currently affecting my AI agents?

  • What types of quality issues are most prevalent?

Prerequisites

To monitor AI agents, you must meet the following requirements:

Monitor all AI agents

To monitor all AI agents, select APM > AI agents from the Splunk Observability Cloud main menu. The following screenshot displays an example of the AI agents page.

The AI agents list view in Splunk APM.

On the AI agents page, the panels above the table display the aggregate metrics across all your agents. The table displays a list of the instrumented agents in your environment and their individual metrics. For more information on monitoring security risks, see Monitor security risks on the AI agents page.

Drill down into the detail view of an AI agent

In the table of agents on the AI agents page, select an agent name to navigate to the detail view. The detail view for an agent displays charts for the metrics shown in the table of agents.

Select any chart in this view to show example traces that match the parameters of the chart.

The following screenshot displays an example of the detail view for an agent.

The detail view for an AI agent.

Use the agent detail view to answer questions such as:

  • When did my agent start experiencing errors or issues?

  • Is my agent consuming a high number of tokens?

  • What quality issues is my agent facing?

On the AI agents page, you can use the following methods to view related traces:

  • Select View related traces to navigate to the AI trace data page. This page displays a table of the traces associated with your AI agents and the performance, estimated cost, quality, and risk metrics for each trace.

  • In the table of agents, select the actions () menu in the row for an agent. Select View related AI trace data to navigate to the AI trace data page with a filter set for the agent.

  • In the detail view for an agent, select View related traces to navigate to the AI trace data page with a filter set for the agent.

For more information on using the AI trace data page, see Monitor AI traces and spans with Splunk APM.
Note:

To view AI agent conversation details, you must have a role with the read_apm_ai_conversation capability. This capability is included in the admin and ai_monitoring roles. If you're an admin and want to grant the ai_monitoring role to a user, see Assign roles to users in Splunk Observability Cloud.

Create a detector to generate alerts for an AI agent

To create a detector to trigger alerts for an AI agent from the AI agents page, select the actions () menu in the row for an agent and select Create Detector. For more information about detectors and alerts, see Create detectors to trigger alerts.

About AI agent quality scores

A quality score is the percentage of evaluations that passed for a metric. Splunk Observability Cloud reports quality scores for the following metrics:

  • Bias: Whether responses are fair toward certain groups, ideas, or outcomes.

  • Hallucination: Whether responses are factually correct.

  • Relevance: Whether responses are on topic, helpful, and match the user's question or task.

  • Sentiment: Whether the tone of responses is positive, negative, or neutral.

  • Toxicity: How harmful or offensive responses are.

For example, a quality score of 60% means that 60% of evaluations passed for that metric. When fewer than 80% of evaluations pass for a metric, the agent is flagged as having a quality issue.
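The score and threshold described above can be sketched as follows. This is an illustrative example, not the Splunk implementation; the function names and the sample pass/fail results are hypothetical.

```python
# Illustrative sketch of a quality score: the percentage of
# evaluations that passed for a metric, with an 80% threshold
# below which an agent is flagged as having a quality issue.

QUALITY_ISSUE_THRESHOLD = 80.0  # percent, as described above


def quality_score(evaluations: list[bool]) -> float:
    """Return the percentage of evaluations that passed."""
    if not evaluations:
        return 100.0  # no sampled evaluations; nothing to flag
    return 100.0 * sum(evaluations) / len(evaluations)


def has_quality_issue(evaluations: list[bool]) -> bool:
    """Flag the agent when fewer than 80% of evaluations pass."""
    return quality_score(evaluations) < QUALITY_ISSUE_THRESHOLD


# Hypothetical pass/fail results for one metric: 3 of 5 passed.
results = [True, True, True, False, False]
print(quality_score(results))      # 60.0
print(has_quality_issue(results))  # True
```

Note that exactly 80% is not flagged: the documentation says "fewer than 80%", so a score of 80.0 passes.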

Splunk Observability Cloud samples spans to calculate quality scores. The sampling rate is defined in the LLM Providers data integration, which is set up when you enable platform-side evaluations. To update the sampling rate, use the Splunk Observability Cloud main menu to select Data Management > Deployed integrations and edit the LLM Providers integration.