Key concepts in Splunk AI Agent Monitoring
This glossary describes key concepts and terminology in Splunk AI Agent Monitoring.
AI agent
An advanced software system that uses artificial intelligence to autonomously reason, plan, complete tasks, and make decisions by setting goals, communicating with other agents, or processing information.
The following sections describe concepts related to AI agents.
Tool call
The mechanism by which an AI agent uses an external tool, such as a third-party API, to perform an action or access data beyond its LLM's training data.
Tool calling enables an AI agent to recognize when to use an external tool, interact with the external tool, and incorporate the result into its response. For example, an AI agent can use a tool call to query a database or perform calculations on data.
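The three steps above (recognize, interact, incorporate) can be sketched in a few lines of Python. This is an illustrative sketch only, not how any Splunk or LLM vendor library works: `fake_model`, `TOOLS`, `add_numbers`, and `run_agent` are hypothetical names, and the model's decision is hard-coded where a real LLM would produce it.

```python
import json

def add_numbers(a, b):
    """Hypothetical external tool: a calculation the model should not guess."""
    return a + b

# Hypothetical tool registry: tool name -> callable.
TOOLS = {"add_numbers": add_numbers}

def fake_model(prompt):
    """Stand-in for an LLM. A real model would decide whether a tool is needed
    and generate the arguments; here the decision is hard-coded."""
    return {"tool": "add_numbers", "arguments": json.dumps({"a": 2, "b": 40})}

def run_agent(prompt):
    decision = fake_model(prompt)            # 1. recognize that a tool is needed
    tool = TOOLS[decision["tool"]]
    args = json.loads(decision["arguments"])
    result = tool(**args)                    # 2. interact with the external tool
    return f"The result is {result}."        # 3. incorporate the result

print(run_agent("What is 2 + 40?"))
```

In an instrumented application, step 2 is the part that would typically be recorded as its own span.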
Evaluation
The process of testing a large language model (LLM) to assess the quality, accuracy, and safety of its outputs. Observability for AI supports performing instrumentation-side evaluations.
The following sections describe concepts related to evaluations.
Instrumentation-side evaluation
A type of evaluation built into the instrumentation framework for an AI application.
Splunk instrumentation frameworks trigger evaluations performed by DeepEval, an open-source evaluation framework for LLMs. The Splunk Distribution of the OpenTelemetry Collector sends the evaluation results to Splunk Observability Cloud. Splunk Observability Cloud receives and displays evaluation results as quality scores, but does not have visibility into your interaction inputs or outputs.
Splunk Observability Cloud ingests evaluations as events. To ingest evaluations, you must set up an HTTP Event Collector (HEC) token in Splunk Enterprise, set up Log Observer Connect, and configure the Splunk HEC exporter in the Splunk Distribution of the OpenTelemetry Collector configuration file. For instructions, see Set up AI Agent Monitoring.
Quality score
A percentage from 0% to 100% that measures the bias, hallucination, relevance, sentiment, or toxicity of AI agent responses. Quality scores are calculated from evaluations and are displayed on the Agents page.
Splunk Observability Cloud displays the quality scores for the following categories:
- Bias: Whether responses are fair toward certain groups, ideas, or outcomes.
- Hallucination: Whether responses are factually correct or incorrect.
- Relevance: Whether responses are on-topic, helpful, and match the user's question or task.
- Sentiment: Whether the tone of responses is positive, negative, or neutral.
- Toxicity: How harmful or offensive responses are.
Instrumentation
The process of adding code or configuration to an application to collect observability data (traces, metrics, logs) for monitoring. The following sections describe concepts related to instrumentation.
Zero-code instrumentation
A type of instrumentation that exports telemetry data without modifying application source files. Zero-code instrumentation typically involves adding a configuration to your application and installing a language-specific instrumentation agent.
Code-based instrumentation
A type of instrumentation that requires modifying application code to export telemetry data. The modified application sends telemetry data to a locally running instance of the OpenTelemetry Collector, which then processes and forwards the data to Splunk Observability Cloud.
Translator
A component that converts telemetry data from one format to another. Splunk Observability Cloud supports using translators to convert telemetry data from AI applications instrumented with supported third-party instrumentation libraries. The translators send the converted telemetry data to Splunk Observability Cloud.
Service
A service is a small, flexible, and autonomous unit of software that connects to other services to make up a complete application. A service typically represents a collection of API endpoints and operations that work together with other services’ endpoints in a distributed and dynamic architecture to deliver the full functionality of an application.
The following sections describe concepts related to services.
Endpoint
In a service API, an endpoint is an access point for a resource or action. For example, an e-commerce service could use the endpoint /ecommerce/users to access user profiles and the endpoint /ecommerce/checkout to perform a checkout action.
Endpoint names are often URLs, but can also be other types of network addresses or communication interfaces. The endpoint name is derived from the name of the first span for each service invoked as part of a trace. In other words, an endpoint is generated when the span.kind of the first span on a service is SERVER or CONSUMER.
Endpoints provide information about how a service is called in a trace. A service typically has one or more endpoints associated with it.
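The derivation rule described above can be sketched as follows. This is a simplified illustration, not the Splunk APM implementation: spans are modeled as plain dictionaries, and the field names `service`, `name`, `kind`, and `start` as well as the function `derive_endpoints` are assumptions made for the example.

```python
def derive_endpoints(spans):
    """Return {service: endpoint name} for one trace.

    For each service, take its first span by start time. The span name
    becomes the endpoint only if that span's kind is SERVER or CONSUMER.
    """
    first_span = {}
    for span in sorted(spans, key=lambda s: s["start"]):
        # setdefault keeps only the earliest span seen for each service.
        first_span.setdefault(span["service"], span)
    return {
        service: span["name"]
        for service, span in first_span.items()
        if span["kind"] in ("SERVER", "CONSUMER")
    }

# Hypothetical trace data for illustration.
trace = [
    {"service": "checkout", "name": "/ecommerce/checkout", "kind": "SERVER", "start": 0},
    {"service": "checkout", "name": "convertPrice", "kind": "INTERNAL", "start": 1},
    {"service": "users", "name": "/ecommerce/users", "kind": "SERVER", "start": 2},
    {"service": "worker", "name": "process_order", "kind": "INTERNAL", "start": 3},
]
print(derive_endpoints(trace))
```

In this sample, the `worker` service produces no endpoint because its first span is neither SERVER nor CONSUMER.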
Operation
The actions that a service performs to respond to a request. Each operation in an instrumented service is represented by a span. Operations are derived from span names and describe what the service is doing at any point during a request.
Examples of operations include the following:
- Database query: SQL Select
- Cache operation: cache.get
- Internal function: convertPrice
- Batch or background task: process_order
- GET /checkout
- POST /orders
Inferred service
A remote service that is not instrumented in Splunk Observability Cloud, but can be identified by Splunk Observability Cloud based on information in spans that make calls to the remote service. Inferred services often include external service providers, pub/subs, Remote Procedure Calls (RPCs), and databases. To learn more, see Inferred services in Splunk APM.
Instrumented service
A service instrumented by Splunk Observability Cloud, either with the Splunk Distribution of the OpenTelemetry Collector or the Splunk API. To learn more, see Instrument back-end applications to send spans to Splunk APM.
Traces and spans
Spans and traces form the backbone of application monitoring in Splunk APM. The following image illustrates the relationship between traces and spans:
The following sections describe concepts related to traces and spans.
Span
A single operation within a system of applications and services. A group of related spans makes up a trace.
Span attribute
A key-value pair attached to a span that provides metadata about the operation a span represents, such as the operation's location and duration.
Both the keys and values are strings, and span attribute keys for a single span must be unique. Examples of span attribute keys include service.name and http.operation. You can add span attributes to spans during instrumentation or with the Splunk Distribution of the OpenTelemetry Collector.
Span kind
A property that defines the role of a span in an operation and its relationship in the trace hierarchy. Each span kind is identified by an OpenTelemetry span attribute with the syntax gen_ai.operation.name=<value>. Span kinds help you understand what happened in an AI workflow.
Observability for AI supports the following OpenTelemetry span kinds. For more information, see Semantic conventions for generative client AI spans in the OpenTelemetry GitHub repository.
| Span kind | OpenTelemetry span attribute | Represents | Example |
|---|---|---|---|
| Inference | gen_ai.operation.name=chat | A call to an LLM. | A call to OpenAI GPT-4 for prompt completion. |
| Embeddings | gen_ai.operation.name=embeddings | A call to a model or function that generates vector embeddings from input. | A call to text-embedding-ada-002 to get text embeddings. |
| Execute tool | gen_ai.operation.name=execute_tool | A call to a program or service where the call arguments are generated by an LLM. | A call to a web search API or calculator. |
| Retrievals | gen_ai.operation.name=retrieval | A query to an external knowledge base or vector store to retrieve relevant data. | A call to the vector database Pinecone to fetch documents. |
| Invoke agent | gen_ai.operation.name=invoke_agent | A specific invocation of an agent. | When a TravelPlanner agent is called, it generates an invoke_agent span. |
| Create agent | gen_ai.operation.name=create_agent | The creation of an autonomous AI agent, which usually consists of nested workflows, LLMs, tools, and task calls. Identified by the first span where the gen_ai.agent.name attribute either first appears or changes from the parent span. | A chatbot that answers a create_agent span. |
| Workflow | gen_ai.operation.name=invoke_workflow | The workflow span can represent: | A service that takes a URL and returns a summary of the page, requiring a tool call to fetch the page, some text processing tasks, and an LLM summary. |
| Step | gen_ai.operation.name=step | A standalone step that does not involve a call to an external service. | A data pre-processing or decision step. |
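The mapping in the table above can be expressed as a simple lookup on the `gen_ai.operation.name` attribute. This is a minimal sketch for illustration: the dictionary `SPAN_KINDS` and the function `span_kind` are hypothetical names, and span attributes are modeled as a plain dictionary of strings.

```python
from typing import Dict, Optional

# Values of gen_ai.operation.name mapped to the span kinds listed in the table.
SPAN_KINDS = {
    "chat": "Inference",
    "embeddings": "Embeddings",
    "execute_tool": "Execute tool",
    "retrieval": "Retrievals",
    "invoke_agent": "Invoke agent",
    "create_agent": "Create agent",
    "invoke_workflow": "Workflow",
    "step": "Step",
}

def span_kind(attributes: Dict[str, str]) -> Optional[str]:
    """Return the span kind for a span's attributes, or None if the
    attribute is missing or its value is not in the table."""
    return SPAN_KINDS.get(attributes.get("gen_ai.operation.name"))

print(span_kind({"gen_ai.operation.name": "execute_tool"}))
print(span_kind({"http.method": "GET"}))
```

Returning `None` for unknown values is a design choice for this sketch so that spans outside the AI workflow can be skipped rather than misclassified.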
Trace
A collection of related operations, known as spans, that represents a unique transaction an application handles.
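To make the relationship between spans and traces concrete, the following sketch groups a flat list of spans into traces by a shared identifier. It is an assumption-laden illustration: the `trace_id` and `name` field names and the `group_into_traces` function are hypothetical, and real spans carry many more fields.

```python
from collections import defaultdict

def group_into_traces(spans):
    """Group span names by trace ID: one trace per unique transaction."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span["name"])
    return dict(traces)

# Hypothetical spans from two separate transactions.
spans = [
    {"trace_id": "t1", "name": "POST /orders"},
    {"trace_id": "t1", "name": "SQL Select"},
    {"trace_id": "t2", "name": "GET /checkout"},
]
print(group_into_traces(spans))
```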
User interface pages
The following sections describe the user interface pages that you can use to monitor AI agents, AI applications, and LLM services.
Agents page
A user interface page that displays your AI agents and metrics related to their performance, quality, and cost. You can use this page to drill down into the detail view of an agent and navigate to related traces and logs.
For more information on using this page, see Monitor AI agents with Splunk APM.
Service map
A user interface page that visualizes your instrumented and inferred services and their relationships. The service map is dynamically generated based on your selections in the time range, environment, business transaction, service, and tag filters.
For more information on monitoring LLM services, see View LLM services on the service map.
Trace view
A user interface page that displays a span waterfall chart for a specific trace. You can use this view to search for spans within a trace.
Trace Analyzer
A user interface page that displays the traces generated by your applications. You can use Trace Analyzer to explore trace data and search traces to find the precise source of a particular issue.