Monitor AI agents with Splunk APM

Monitor the performance, quality, token usage, and cost of your AI agents with Splunk APM.

The Agents page can help you answer questions such as:

  • Which of my AI agents are currently experiencing degraded performance?

  • What AI agents are using the most tokens?

  • What quality issues are currently affecting my AI agents?

  • What types of quality issues are most prevalent?

Prerequisites

To monitor AI agents with Splunk APM, set up AI Agent Monitoring.

Note: Ensure that your Log Observer Connect index is set to the index that contains your AI trace data. For instructions, see step 4 of Set up AI Agent Monitoring.

Monitor all AI agents

To monitor all AI agents, select APM > Agents from the Splunk Observability Cloud main menu. The following screenshot displays an example of the Agents page.

The Agents page in Splunk APM.

On the Agents page, the panels above the table display the aggregate metrics across all your agents. The table displays a list of the instrumented agents in your environment and their individual metrics.

In the table of agents on the Agents page, select the icon in the Related logs column to navigate to the Logs page. This page displays a table of related logs.

Select a log from the table to view additional details about the AI agent calls. You can select the Trace ID or Span ID to display an option to navigate to the related trace or span.

Drill down into the detail view of an AI agent

In the table of agents on the Agents page, select an agent name to navigate to the detail view. The detail view for an agent displays charts for the metrics shown in the table of agents.

The following screenshot displays an example of the detail view for an agent.

Detail view for an AI agent in Splunk APM.

Use the agent detail view to answer questions such as:

  • When did my agent start experiencing errors or issues?

  • Is my agent consuming a high number of tokens?

  • What quality issues is my agent facing?

About AI agent quality scores

Splunk Observability Cloud evaluates interactions using quality scores on a scale of 0% to 100% that measure the toxicity, sentiment, bias, hallucinations, and relevance of AI agent responses. Quality scores below 50% count toward the Quality Issues field in the Agents table.

Quality scores are calculated from instrumentation-side evaluations. The instrumentation frameworks for your AI applications trigger evaluations performed by DeepEval, an open-source evaluation framework for LLMs, and the Splunk Distribution of the OpenTelemetry Collector sends the evaluation results to Splunk Observability Cloud. Splunk Observability Cloud receives and displays evaluation results, but does not have visibility into your interaction inputs or outputs.

By default, Splunk Observability Cloud samples all collected spans to calculate quality scores. To control the sample rate, you can configure the OTEL_INSTRUMENTATION_GENAI_EVALUATION_SAMPLE_RATE setting when you instrument your application with the Splunk Distribution of the OpenTelemetry Collector. For more information on this setting, see Configure the Python agent for AI applications.
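As an illustration, the sample rate can be set as an environment variable before starting the instrumented application. The value 0.5 and the application name below are placeholders, not values from this document; check the configuration reference linked above for the exact accepted range and default.

```shell
# Evaluate quality scores for roughly half of collected spans.
# The 0.5 value is illustrative; by default all collected spans are sampled.
export OTEL_INSTRUMENTATION_GENAI_EVALUATION_SAMPLE_RATE=0.5

# Then start the instrumented application, for example (hypothetical app name):
#   splunk-py-trace python my_agent_app.py
```

Setting the variable in the application's environment means the sampling decision is made at instrumentation time, before evaluation results are sent to Splunk Observability Cloud.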