AI Agent Monitoring
The AI Agent Monitoring dashboard offers a comprehensive view of the performance and usage of large language model (LLM)-based applications. It enables end-to-end monitoring across the entire LLM stack, including applications, infrastructure, databases, and GenAI workload metrics. The dashboard provides key insights into token consumption, request volume, and latency, helping you understand the operational efficiency and resource utilization of your AI models. With this dashboard, you can:
- Gain deep visibility with enhanced flowmaps, transaction snapshots (traces), monitoring of vector databases, and GPU or compute resource layers.
- Track resource usage and performance at both the model and application levels.
- Correlate LLM activity with business transactions and key performance indicators.
- Detect anomalies and identify performance issues in LLM-based applications.
Prerequisites
The AI Agent Monitoring dashboard displays the relevant data only when the following conditions are met:
- Events Service version is 25.10.0 or later.
- Python Agent version is 25.10.0 or later.
- The Python Agent is configured to capture and publish LLM and Vector Database events.
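The exact steps depend on your agent version, so the following is only a minimal sketch of enabling the agent with a hypothetical GenAI capture setting. The standard `APPDYNAMICS_*` environment variables and the `pyagent run` launcher are part of the Python Agent; the `APPDYNAMICS_GENAI_EVENTS_ENABLED` toggle is an illustrative placeholder, not a documented option name, so check the Python Agent configuration reference for the actual settings.

```python
import os
import subprocess

# Standard AppDynamics Python Agent environment variables.
os.environ["APPDYNAMICS_AGENT_APPLICATION_NAME"] = "my-genai-app"
os.environ["APPDYNAMICS_AGENT_TIER_NAME"] = "llm-service"
os.environ["APPDYNAMICS_AGENT_NODE_NAME"] = "node-1"
os.environ["APPDYNAMICS_CONTROLLER_HOST_NAME"] = "mycontroller.saas.appdynamics.com"

# Hypothetical toggle for LLM / Vector Database event capture -- the real
# option name may differ; consult the agent configuration reference.
os.environ["APPDYNAMICS_GENAI_EVENTS_ENABLED"] = "true"

# Launch the instrumented application through the agent runner; the
# environment variables above are inherited by the child process.
subprocess.run(["pyagent", "run", "--", "python", "app.py"])
```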
Dashboard Overview
The dashboard provides an at-a-glance view of critical AI agent statistics over a specified time range. You can filter the data by application, LLM provider, or model.
Clicking any metric opens the Analytics search page with the corresponding ADQL query and filters applied. From there, you can double-click an event to view its details.
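If you want to run the same kind of query outside the UI, you can post ADQL to the Events Service query endpoint. The sketch below follows the Analytics Events API conventions, but the event type `llm_events` and field names such as `inputTokens` are assumptions for illustration; the query the dashboard generates shows the actual type and schema in your environment.

```python
import requests

# Placeholder Events Service URL and credentials.
EVENTS_SERVICE = "https://analytics.example.com"
HEADERS = {
    "X-Events-API-AccountName": "<global-account-name>",
    "X-Events-API-Key": "<analytics-api-key>",
    "Content-Type": "application/vnd.appd.events+json;v=2",
    "Accept": "application/vnd.appd.events+json;v=2",
}

# Hypothetical event type and fields -- copy the real ones from the
# ADQL query the dashboard opens in Analytics search.
adql = "SELECT count(*), sum(inputTokens), sum(outputTokens) FROM llm_events"

resp = requests.post(f"{EVENTS_SERVICE}/events/query", headers=HEADERS, data=adql)
resp.raise_for_status()
print(resp.json())
```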
Key Performance Indicators (KPIs)
The following key metrics offer valuable insights into the performance and usage of your LLM-based applications:
| Metric | Description |
|---|---|
| Total Tokens | The cumulative sum of input and output tokens processed by the LLMs. |
| Input Tokens | The total number of tokens sent to the LLMs. |
| Output Tokens | The total number of tokens received from the LLMs. |
| Total Requests | The total number of requests made to the LLMs. |
| Successful Requests | The number of LLM requests that completed successfully. |
| LLM Latency (s) | The average time taken for LLM responses in seconds. |
| LLM Usage by Volume | A visual representation (pie chart) of the distribution of different LLM models used, indicating their relative invocation volume. |
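To make these definitions concrete, here is a small self-contained sketch that derives the KPIs from a list of per-request records. The field names are made up for illustration and do not reflect the actual event schema.

```python
# Hypothetical per-request records; field names are illustrative only.
records = [
    {"input_tokens": 120, "output_tokens": 450, "latency_s": 1.8, "success": True},
    {"input_tokens": 80,  "output_tokens": 300, "latency_s": 2.4, "success": True},
    {"input_tokens": 95,  "output_tokens": 0,   "latency_s": 0.6, "success": False},
]

input_tokens = sum(r["input_tokens"] for r in records)    # Input Tokens
output_tokens = sum(r["output_tokens"] for r in records)  # Output Tokens
total_tokens = input_tokens + output_tokens               # Total Tokens
total_requests = len(records)                             # Total Requests
successful_requests = sum(r["success"] for r in records)  # Successful Requests
avg_latency_s = sum(r["latency_s"] for r in records) / total_requests  # LLM Latency (s)

print(total_tokens, total_requests, successful_requests, f"{avg_latency_s:.2f}s")
```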
Time Series Analysis
The time series charts provide a historical perspective on various metrics, allowing you to identify trends and anomalies:
- Token Usage Time Series: Displays the trend of total, input, and output tokens over time, helping you identify consumption spikes.
- LLM Request Trend Analysis: Displays the trend of successful and failed LLM requests over time, helping you detect anomalies in request patterns.
- Time to First Token (TTFT): Displays the time taken to receive the first token of a response from an LLM, indicating initial response speed.
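The agent records TTFT for you; the following is only a client-side illustration of what the metric measures, using the OpenAI streaming API as an example (the model name and prompt are arbitrary).

```python
import time
from openai import OpenAI  # official openai package, v1+ client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
first_token_at = None

# Stream the response so the arrival of the first token is observable.
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter()
        break

if first_token_at is not None:
    print(f"TTFT: {first_token_at - start:.3f}s")
```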
Top LLMs and Business Transactions
The bar charts highlight the most active components of your AI applications:
- Top 10 LLMs by Invocation Volume: Displays the ten LLM models with the highest invocation volume, along with their success and failure rates. This helps identify heavily utilized or problematic models.
- Top 10 Business Transactions by Invocation Volume: Displays the ten business transactions that are making the most LLM calls, along with their success and failure rates. This is vital for understanding which parts of your application are most reliant on AI and where performance bottlenecks might occur.
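As a rough illustration of how such a ranking is computed, the sketch below aggregates hypothetical per-call records into a top-N list with success rates; the dashboard does the equivalent with ADQL over the captured events.

```python
from collections import Counter, defaultdict

# Hypothetical per-call records mirroring what the dashboard aggregates.
calls = [
    {"model": "gpt-4o", "bt": "Checkout", "success": True},
    {"model": "gpt-4o", "bt": "Checkout", "success": False},
    {"model": "claude-3-haiku", "bt": "Search", "success": True},
]

volume = Counter(c["model"] for c in calls)
outcomes = defaultdict(Counter)
for c in calls:
    outcomes[c["model"]]["success" if c["success"] else "failure"] += 1

# Top 10 models by invocation volume, with success rates.
for model, count in volume.most_common(10):
    rate = outcomes[model]["success"] / count
    print(f"{model}: {count} calls, {rate:.0%} success")
```

Swapping `c["model"]` for `c["bt"]` produces the equivalent ranking of business transactions.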
Vector Database Metrics
This section focuses on the performance and efficiency of your Vector Databases:
- Vector DB Avg Documents Retrieved: Displays the average number of documents retrieved from Vector Databases over time.
- Vector DB Similarity Score: Displays the trend of similarity scores from Vector Database queries, indicating the relevance of retrieved information.
- Vector DB Latency: Displays the latency of operations performed on the Vector Databases, helping to ensure quick data retrieval.
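To clarify what each chart measures, here is a toy brute-force similarity search that produces all three quantities: documents retrieved, similarity scores, and operation latency. Real Vector Databases (and the agent) report these per operation; the embeddings below are random placeholders.

```python
import time
import numpy as np

# Toy corpus of 1,000 documents with unit-normalized 4-dim embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 4))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

query = np.array([0.1, 0.9, 0.2, 0.4])
query /= np.linalg.norm(query)

start = time.perf_counter()
scores = docs @ query                  # cosine similarity (unit vectors)
top_k = np.argsort(scores)[-5:][::-1]  # retrieve the 5 most similar docs
latency = time.perf_counter() - start

print(f"documents retrieved: {len(top_k)}")
print(f"avg similarity score: {scores[top_k].mean():.3f}")
print(f"operation latency: {latency * 1000:.2f} ms")
```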
Create Custom Dashboards
Create custom dashboards to monitor OpenAI, LangChain, and AWS Bedrock.