Key concepts in Splunk AI Agent Monitoring

This glossary describes key concepts and terminology in Splunk AI Agent Monitoring.

AI agent

An advanced software system that uses artificial intelligence to autonomously reason, plan, complete tasks, and make decisions by setting goals, communicating with other agents, or processing information.

The following sections describe concepts related to AI agents.

Tool call

The mechanism by which an AI agent uses an external tool, such as a third-party API, to perform an action or access data beyond its LLM's training data.

Tool calling enables an AI agent to recognize when to use an external tool, interact with the external tool, and incorporate the result into its response. For example, an AI agent can use a tool call to query a database or perform calculations on data.
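The dispatch step of tool calling can be sketched in a few lines. This is a minimal conceptual sketch, not a real Splunk or LLM vendor API: the tool names, the request shape, and the dispatcher are all hypothetical, standing in for whatever structured tool request an LLM emits.

```python
def query_database(sql: str) -> list:
    """Illustrative external tool: pretend to run a SQL query."""
    return [{"order_id": 1, "total": 42.0}]

def calculate(expression: str) -> float:
    """Illustrative external tool: evaluate a simple arithmetic expression."""
    # eval with empty builtins is for demonstration only, not production use.
    return eval(expression, {"__builtins__": {}})

# Registry of external tools the agent can recognize and call.
TOOLS = {"query_database": query_database, "calculate": calculate}

def handle_tool_call(request: dict):
    """Dispatch a structured tool request emitted by the LLM and return the
    result so the agent can incorporate it into its response."""
    tool = TOOLS[request["name"]]
    return tool(**request["arguments"])

result = handle_tool_call({"name": "calculate", "arguments": {"expression": "2 * 21"}})
print(result)  # 42
```

In an instrumented application, each such dispatch would surface as an execute_tool span.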

Evaluation

The process of testing a large language model (LLM) to assess the quality, accuracy, and safety of its outputs. Observability for AI supports performing instrumentation-side evaluations.

The following sections describe concepts related to evaluations.

Instrumentation-side evaluation

A type of evaluation built into the instrumentation framework for an AI application.

Splunk instrumentation frameworks trigger evaluations performed by DeepEval, an open-source evaluation framework for LLMs. The Splunk Distribution of the OpenTelemetry Collector sends the evaluation results to Splunk Observability Cloud. Splunk Observability Cloud receives and displays evaluation results as quality scores, but does not have visibility into your interaction inputs or outputs.

Splunk Observability Cloud ingests evaluations as events. To ingest evaluations, you must set up an HTTP Event Collector (HEC) token in Splunk Enterprise, set up Log Observer Connect, and configure the Splunk HEC exporter in the Splunk Distribution of the OpenTelemetry Collector configuration file. For instructions, see Set up AI Agent Monitoring.
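The exporter portion of that Collector configuration might look like the following fragment. The token, endpoint, source type, and index values are placeholders to replace with your own; receivers and processors are omitted here for brevity.

```yaml
exporters:
  splunk_hec:
    # Placeholder values: substitute your own HEC token and Splunk endpoint.
    token: "${SPLUNK_HEC_TOKEN}"
    endpoint: "https://splunk.example.com:8088/services/collector"
    sourcetype: "otel"
    index: "main"

service:
  pipelines:
    logs:
      exporters: [splunk_hec]
```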

Quality score

A percentage from 0 to 100 that measures the bias, hallucination, relevance, sentiment, or toxicity of AI agent responses. Quality scores are calculated from evaluations and are displayed on the Agents page.

Splunk Observability Cloud displays the quality scores for the following categories:

  • Bias: Whether responses are fair toward certain groups, ideas, or outcomes.

  • Hallucination: Whether responses are factually correct or incorrect.

  • Relevance: Whether responses are on-topic, helpful, and match the user's question or task.

  • Sentiment: Whether the tone of responses is positive, negative, or neutral.

  • Toxicity: How harmful or offensive responses are.
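Conceptually, a quality score is the share of evaluated responses that pass a given check. The sketch below illustrates that idea only; the event shape is hypothetical, and Splunk Observability Cloud computes these scores internally from DeepEval results.

```python
def quality_score(evaluations, category):
    """Return the percentage (0-100) of evaluation events that passed for one
    category such as "hallucination" or "toxicity". The event dictionaries
    are illustrative, not a real Splunk event format."""
    relevant = [e for e in evaluations if e["category"] == category]
    if not relevant:
        return None  # no evaluations for this category
    passed = sum(1 for e in relevant if e["passed"])
    return round(100 * passed / len(relevant), 1)

events = [
    {"category": "hallucination", "passed": True},
    {"category": "hallucination", "passed": True},
    {"category": "hallucination", "passed": False},
    {"category": "toxicity", "passed": True},
]
print(quality_score(events, "hallucination"))  # 66.7
```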

Instrumentation

The process of adding code or configuration to an application to collect observability data (traces, metrics, logs) for monitoring.

The following sections describe concepts related to instrumentation.

Zero-code instrumentation

A type of instrumentation that exports telemetry data without modifying application source files. Zero-code instrumentation typically involves adding a configuration to your application and installing a language-specific instrumentation agent.

Code-based instrumentation

A type of instrumentation that requires modifying application code to export telemetry data. Modifying the application's source code allows it to send telemetry data to a local running instance of the OpenTelemetry Collector, which then processes and forwards the data to Splunk Observability Cloud.

Translator

A component that converts telemetry data from one format to another. Splunk Observability Cloud supports using translators to convert telemetry data from AI applications instrumented with supported third-party instrumentation libraries. The translators send the converted telemetry data to Splunk Observability Cloud.

Service

A service is a small, flexible, and autonomous unit of software that connects to other services to make up a complete application. A service typically represents a collection of API endpoints and operations that work together with other services’ endpoints in a distributed and dynamic architecture to deliver the full functionality of an application.

The following sections describe concepts related to services.

Endpoint

In a service API, an endpoint is an access point for a resource or action. For example, an e-commerce service could use the endpoint /ecommerce/users to access user profiles and the endpoint /ecommerce/checkout to perform a checkout action.

Endpoint names are often URLs, but can also be other types of network addresses or communication interfaces. The endpoint name is derived from the name of the first span for each service invoked as part of a trace. In other words, an endpoint is generated when the span.kind of the first span on a service is SERVER or CONSUMER.
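The derivation rule above can be sketched as follows. The span dictionaries are illustrative stand-ins for real span data, not an actual wire format.

```python
def derive_endpoints(spans):
    """For each service, use the name of its first SERVER or CONSUMER span in
    the trace as the endpoint name. Spans are assumed ordered by start time."""
    endpoints = {}
    for span in spans:
        service = span["service"]
        if service not in endpoints and span["kind"] in ("SERVER", "CONSUMER"):
            endpoints[service] = span["name"]
    return endpoints

trace = [
    {"service": "ecommerce", "name": "GET /ecommerce/users", "kind": "SERVER"},
    {"service": "ecommerce", "name": "SQL Select", "kind": "CLIENT"},
    {"service": "orders", "name": "process_order", "kind": "CONSUMER"},
]
print(derive_endpoints(trace))
# {'ecommerce': 'GET /ecommerce/users', 'orders': 'process_order'}
```

Note that the CLIENT span (the database query) contributes an operation, but not an endpoint.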

Endpoints provide information about how a service is called in a trace. A service typically has one or more endpoints associated with it.

Operation

The actions that a service performs to respond to a request. Each operation in an instrumented service is represented by a span. Operations are derived from span names and describe what the service is doing at any point during a request.

In the context of an e-commerce application, examples of operations that are not endpoints include:
  • Database query: SQL Select

  • Cache operation: cache.get

  • Internal function: convertPrice

  • Batch or background task: process_order

Examples of operations that are also endpoints include:
  • GET /checkout

  • POST /orders

Inferred service

A remote service that is not instrumented in Splunk Observability Cloud, but can be identified by Splunk Observability Cloud based on information in spans that make calls to the remote service. Inferred services often include external service providers, pub/subs, Remote Procedure Calls (RPCs), and databases. To learn more, see Inferred services in Splunk APM.

Instrumented service

A service instrumented by Splunk Observability Cloud, either with the Splunk Distribution of the OpenTelemetry Collector or the Splunk API. To learn more, see Instrument back-end applications to send spans to Splunk APM.

Traces and spans

Spans and traces form the backbone of application monitoring in Splunk APM. The following image illustrates the relationship between traces and spans: a trace is represented by a series of bars labeled A, B, C, D, and E, where each bar is a single span, span A is the parent span, and the subsequent spans are its children.

The following sections describe concepts related to traces and spans.

Span

A single operation within a system of applications and services. A group of related spans makes up a trace.

Span attribute

A key-value pair attached to a span that provides metadata about the operation a span represents, such as the operation's location and duration.

Both the keys and values are strings, and span attribute keys for a single span must be unique. Examples of span attribute keys include service.name and http.operation. You can add span attributes to spans during instrumentation or with the Splunk Distribution of the OpenTelemetry Collector.

Span kind

A property that defines the role of a span in an operation and its relationship in the trace hierarchy. Each span kind is identified by an OpenTelemetry span attribute with the syntax gen_ai.operation.name=<value>. Span kinds help you understand what happened in an AI workflow.

Observability for AI supports the following OpenTelemetry span kinds. For more information, see Semantic conventions for generative client AI spans in the OpenTelemetry GitHub repository.

  • Inference (gen_ai.operation.name=chat): A call to an LLM. Example: a call to OpenAI GPT-4 for prompt completion.

  • Embeddings (gen_ai.operation.name=embeddings): A call to a model or function that generates vector embeddings from input. Example: a call to text-embedding-ada-002 to get text embeddings.

  • Execute tool (gen_ai.operation.name=execute_tool): A call to a program or service where the call arguments are generated by an LLM. Example: a call to a web search API or calculator.

  • Retrievals (gen_ai.operation.name=retrieval): A query to an external knowledge base or vector store to retrieve relevant data. Example: a call to the vector database Pinecone to fetch documents.

  • Invoke agent (gen_ai.operation.name=invoke_agent): A specific invocation of an agent. Example: when a TravelPlanner agent is called, it generates an invoke_agent span.

  • Create agent (gen_ai.operation.name=create_agent): The creation of an autonomous AI agent, which usually consists of nested workflows, LLMs, tools, and task calls. The span is identified as the first span where the gen_ai.agent.name attribute either first appears or changes from the parent span. Example: creating a chatbot agent that answers questions generates a create_agent span.

  • Workflow (gen_ai.operation.name=invoke_workflow): The workflow span can represent either the root GenAI span, which identifies an AI application with a predetermined sequence of operations including LLM calls, agent or sub-workflow invocations, and any surrounding contextual operations; or a sub-workflow, which is a root span with a gen_ai.workflow.name attribute that differs from the equivalent attribute on the parent span. Example: a service that takes a URL and returns a summary of the page, requiring a tool call to fetch the page, some text processing tasks, and an LLM summary.

  • Step (gen_ai.operation.name=step): A standalone step that does not involve a call to an external service. Example: a data pre-processing or decision step.
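The mapping from the gen_ai.operation.name attribute value to the span kind it identifies can be expressed as a simple lookup. This is purely illustrative classification code, not part of any Splunk or OpenTelemetry library.

```python
# Span kinds keyed by the gen_ai.operation.name attribute value, as listed above.
GEN_AI_SPAN_KINDS = {
    "chat": "Inference",
    "embeddings": "Embeddings",
    "execute_tool": "Execute tool",
    "retrieval": "Retrievals",
    "invoke_agent": "Invoke agent",
    "create_agent": "Create agent",
    "invoke_workflow": "Workflow",
    "step": "Step",
}

def span_kind(attributes: dict) -> str:
    """Classify a span from its attributes; "Unknown" if it is not a GenAI span."""
    return GEN_AI_SPAN_KINDS.get(attributes.get("gen_ai.operation.name"), "Unknown")

print(span_kind({"gen_ai.operation.name": "execute_tool"}))  # Execute tool
print(span_kind({"http.method": "GET"}))                     # Unknown
```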

Trace

A collection of related operations, known as spans, that represents a unique transaction an application handles.

User interface pages

The following sections describe the user interface pages that you can use to monitor AI agents, AI applications, and LLM services.

Agents page

A user interface page that displays your AI agents and metrics related to their performance, quality, and cost. You can use this page to drill down into the detail view of an agent and navigate to related traces and logs.

For more information on using this page, see Monitor AI agents with Splunk APM.

Service map

A user interface page that visualizes your instrumented and inferred services and their relationships. The service map is dynamically generated based on your selections in the time range, environment, business transaction, service, and tag filters.

For more information on monitoring LLM services, see View LLM services on the service map.

Trace view

A user interface page that displays a span waterfall chart for a specific trace. You can use this view to search for spans within a trace.

For more information, see the documentation on monitoring AI traces.

Trace Analyzer

A user interface page that displays the traces generated by your applications. You can use Trace Analyzer to explore trace data and search traces to find the precise source of a particular issue.