Configure the Python agent for AI applications (0.1.14 and higher)

Configure the Python agent from the Splunk Distribution of OpenTelemetry Python to meet your AI application instrumentation and evaluation needs.

Note:

This topic describes the settings for the OpenTelemetry GenAI utility 0.1.14 and higher.

For settings in 0.1.13 and lower, see Configure the Python agent for AI applications (0.1.13 and lower).

For more information about the Python agent, see About the Splunk Distribution of OpenTelemetry Python.

Configuration methods

You can change the agent settings by setting environment variables. For example:
CODE
export OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=DELTA

Instrumentation configuration settings

The following settings control instrumentation for AI applications.
Configuration setting Description Required?
OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE

Determines if the OTLP metric exporter reports cumulative totals, deltas, or low-memory-friendly temporality for emitted metrics.

Accepted values:
  • DELTA
  • CUMULATIVE
  • LOWMEMORY

Yes
OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED

Enriches the Python logger to include trace and span correlation fields. Defaults to true.

Accepted values: true, false

No
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT

Determines whether input, output, and system messages are captured, and which telemetry carries them: spans (as attributes), log events (as bodies), or both.

Accepted values:

  • NO_CONTENT (default)

  • SPAN_AND_EVENT (for instrumentation-side evaluations)

  • SPAN_ONLY (for platform-side evaluations)

  • EVENT_ONLY

This setting uses the values defined by the OpenTelemetry semantic conventions. This list of values is provided as a frame of reference. For the latest values, see OpenTelemetry OpenAI Agents Instrumentation in the opentelemetry-python-contrib GitHub repository.

No
OTEL_INSTRUMENTATION_GENAI_EMITTERS

Controls what telemetry data is generated and emitted during GenAI operations, such as LLM calls and agent invocations. Defaults to span.

Accepted values:

  • span

  • span_metric (for platform-side evaluations)

  • span_metric_event (for instrumentation-side evaluations)

  • span_metric_event,splunk

    Note: To use the span_metric_event,splunk value, you must first install the required package by running pip install splunk-otel-genai-emitters-splunk.

No
OTEL_INSTRUMENTATION_GENAI_DEBUG

Enables opt-in debug logging for GenAI telemetry operations. Helps troubleshoot instrumentation issues by logging internal events without dumping full message content.

Accepted values: true, false

No
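
As a rough illustration, the instrumentation settings above could be combined as follows for instrumentation-side evaluations. This is a sketch rather than a required configuration: every value comes from the accepted values listed in this section, and the choice to leave debug logging off is an assumption.

```shell
# Required: temporality preference for the OTLP metric exporter.
export OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=DELTA

# Capture message content on both spans and events
# (the value listed for instrumentation-side evaluations).
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=SPAN_AND_EVENT

# Emit spans, metrics, and events
# (the value listed for instrumentation-side evaluations).
export OTEL_INSTRUMENTATION_GENAI_EMITTERS=span_metric_event

# Assumption: debug logging stays off unless you are troubleshooting.
export OTEL_INSTRUMENTATION_GENAI_DEBUG=false
```

For platform-side evaluations, the same sketch would use SPAN_ONLY and span_metric instead.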

Evaluation configuration settings

The following settings control evaluations for AI applications.
Configuration setting Description Required?
OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS

Determines the metric types that are run by the evaluator for AgentInvocation and LLMInvocation. If this variable isn't set, all registered evaluators are enabled with their default metric sets and default thresholds.

Examples of accepted values:

  • Deepeval: The default value. Runs the default DeepEval bundle, which measures all of the metrics for every metric type (bias, toxicity, relevance, hallucination, and sentiment).

  • Deepeval(LLMInvocation(bias,toxicity),AgentInvocation(hallucination)): Runs DeepEval evaluations only for bias and toxicity for LLM invocations, and only for hallucination for agent invocations.

  • Deepeval(LLMInvocation(bias(threshold=0.75)),AgentInvocation(bias(threshold=0.5))): Overrides the bias threshold. Treats a bias score above 0.75 as positive for LLM invocations and a bias score above 0.5 as positive for agent invocations. Does not run the remaining metrics.

  • none: Deactivates the evaluator.

No
OTEL_INSTRUMENTATION_GENAI_EVALS_RESULTS_AGGREGATION

Condenses evaluations into a single event. Defaults to false.

Accepted values: true, false

No
OTEL_INSTRUMENTATION_GENAI_EVALUATION_SAMPLE_RATE

Determines the sampling rate for evaluations. Sampling uses a trace-ID ratio, where the rate is the probability (between 0.0 and 1.0) that a span is sampled for evaluation.

If this setting isn't configured, it defaults to 1.0 and all spans are sampled for evaluation.

Accepted values: Values between 0.0 and 1.0.

No
OTEL_INSTRUMENTATION_GENAI_EVALS_SEPARATE_PROCESS

Determines whether the instrumentation framework runs evaluations in a separate process. Use true to run evaluations in a child process with the OpenTelemetry SDK deactivated and prevent evaluator LLM calls from polluting application telemetry.

Defaults to false, meaning that the instrumentation frameworks run evaluations in the same process as your application. LLM calls from evaluators such as DeepEval are instrumented alongside application telemetry.

Accepted values: true, false

  • Required for OpenAI instrumentation when evaluations are enabled

  • Optional for all other instrumentation frameworks

OTEL_INSTRUMENTATION_GENAI_EMITTERS_EVALUATION

Customizes which emitters handle evaluation results.

Accepted values: replace-category:SplunkEvaluationResults

No
OTEL_INSTRUMENTATION_GENAI_CAPTURE_TOOL_DEFINITIONS

Enables capture of gen_ai.tool.definitions on LLM invocation spans.

Requires OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT to also be enabled. When both variables are set to truthy values, the JSON-serialized tool/function schemas passed to the model are recorded as a span attribute.

Defaults to false because the payload can be large.

Accepted values: Any truthy or falsy value

No
OTEL_INSTRUMENTATION_GENAI_CONTEXT_PROPAGATION

Activates or deactivates automatic propagation of GenAI context (conversation_id and association properties) to child spans.

Defaults to true. Set to false to prevent context attributes from being automatically copied to nested GenAI invocations. When deactivated, only values explicitly set on each invocation object are emitted.

No
OTEL_INSTRUMENTATION_GENAI_CONTEXT_INCLUDE_IN_METRICS

Comma-separated list of GenAI context attribute keys to include as metric dimensions. Default value is empty; context attributes aren't included on metrics.

Accepted values: Any association property key, gen_ai.conversation.id, or all. Examples of accepted values:
CODE
gen_ai.conversation.id
user.id,customer.id
all

The all value includes gen_ai.conversation.id plus all association properties set on the invocation. Including gen_ai.conversation.id in metrics may cause high-cardinality issues. Use selective property keys for lower cardinality.

No
OTEL_INSTRUMENTATION_GENAI_ROOT_SPAN_AS_WORKFLOW

When set to a truthy value, the root GenAI span created by instrumentation frameworks uses the Workflow type instead of the default AgentInvocation type. Set this to a truthy value if you need an explicit workflow root.

Defaults to false, meaning the root span uses the AgentInvocation type. This type aligns with OpenTelemetry GenAI semantic conventions.

Accepted values: Any truthy or falsy value

No
OTEL_INSTRUMENTATION_GENAI_EMIT_EVENT

Controls whether GenAI content log events (such as gen_ai.client.inference.operation.details) are emitted. Accepted values: true or false.

When unset, this setting defaults to the OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT mode:

  • true for EVENT_ONLY and SPAN_AND_EVENT

  • false for NO_CONTENT and SPAN_ONLY

No
OTEL_INSTRUMENTATION_GENAI_EVALS_INTERVAL

Polling interval in seconds for the evaluation worker loop. Defaults to 5.0 seconds.

No
OTEL_INSTRUMENTATION_GENAI_EVALS_QUEUE_SIZE

Maximum size of the evaluation queue. When set to a positive integer, the queue is bounded and applies backpressure when full: new items are dropped with a warning.

When unset or set to 0, the queue is unbounded. This is the default behavior.

Recommended values: 100 to 1000, depending on memory constraints and evaluation throughput requirements.

No
OTEL_INSTRUMENTATION_GENAI_EVALS_CONCURRENT

Enables concurrent evaluation processing. When set to a truthy value, evaluations are processed concurrently using multiple worker threads and asynchronous LLM calls.

When unset or set to a falsy value, evaluations are processed sequentially. Concurrent mode significantly improves throughput for LLM-as-a-judge evaluations.

No
OTEL_GENAI_EVAL_DEBUG_SKIPS

Determines if logs are created when measurements are skipped.

Accepted values: true, false

No
OTEL_GENAI_EVAL_DEBUG_EACH

Determines if a log is created for each evaluation result.

Accepted values: true, false

No
DEEPEVAL_FILE_SYSTEM

Determines if DeepEval can write temporary artifacts to the filesystem.

Use READ_ONLY in locked-down environments. Defaults to READ_WRITE.

Accepted values: READ_WRITE, READ_ONLY

No
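
To make the evaluation settings above more concrete, the following sketch shows one way they might be combined. The sample rate of 0.25 and the queue size of 500 are illustrative assumptions, not recommendations from this topic. The evaluator string is one of the example values listed for OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS.

```shell
# Run only bias and toxicity for LLM invocations and only hallucination
# for agent invocations. Quote the value: unquoted parentheses are shell syntax.
export OTEL_INSTRUMENTATION_GENAI_EVALS_EVALUATORS="Deepeval(LLMInvocation(bias,toxicity),AgentInvocation(hallucination))"

# Illustrative assumption: evaluate roughly a quarter of spans.
export OTEL_INSTRUMENTATION_GENAI_EVALUATION_SAMPLE_RATE=0.25

# Run evaluations in a child process so evaluator LLM calls
# do not appear in application telemetry.
export OTEL_INSTRUMENTATION_GENAI_EVALS_SEPARATE_PROCESS=true

# Illustrative assumption: bound the queue within the recommended 100 to 1000 range.
export OTEL_INSTRUMENTATION_GENAI_EVALS_QUEUE_SIZE=500

# Process evaluations concurrently.
export OTEL_INSTRUMENTATION_GENAI_EVALS_CONCURRENT=true

# Prevent DeepEval from writing temporary artifacts to the filesystem.
export DEEPEVAL_FILE_SYSTEM=READ_ONLY
```

Lowering the sample rate and bounding the queue both trade evaluation coverage for lower cost and memory use, so suitable values depend on your traffic.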

DeepEval custom LLM provider settings

The default LLM provider for DeepEval evaluations is OpenAI. The following settings can be used to configure a custom LLM provider for DeepEval evaluations. These settings do not affect OpenAI behavior.
Configuration setting Description Required?
DEEPEVAL_LLM_BASE_URL

The custom LLM endpoint URL. Required if you want to use a custom LLM provider for DeepEval evaluations instead of OpenAI.

This setting creates a LiteLLMModel configured for the custom endpoint. It supports both static API keys and OAuth2 token-based authentication.

No
DEEPEVAL_LLM_MODEL

The LLM model name. Defaults to gpt-4o-mini.

No
DEEPEVAL_LLM_PROVIDER

The LLM provider identifier for the model prefix. Defaults to openai.

No
DEEPEVAL_LLM_API_KEY

The static API key. Only used for providers that do not require OAuth2 token-based authentication. Use this setting or DEEPEVAL_LLM_TOKEN_URL (which enables OAuth2), not both.

No
DEEPEVAL_LLM_EXTRA_HEADERS

A JSON-formatted string containing key-value pairs that will be added as HTTP headers to all LLM API requests. Use this setting if your API gateway requires custom headers for authentication or tracking.

Example value: '{"system-code": "APP-123", "x-tenant-id": "tenant-abc"}'.

LiteLLM does not natively support setting extra_headers with environment variables and requires passing this parameter programmatically. This setting is provided for DeepEval users who need to configure custom headers without code changes. For more details on headers in LiteLLM, see Optional Fields in the LiteLLM documentation.

No
DEEPEVAL_LLM_CLIENT_APP_NAME

The application key and name.

No
DEEPEVAL_LLM_TOKEN_URL

The OAuth2 token endpoint. Used for providers that require OAuth2 token-based authentication. This setting enables OAuth2 mode for DeepEval.

Example value: https://identity.example.com/oauth2/default/v1/token

No
DEEPEVAL_LLM_CLIENT_ID

The OAuth2 client ID.

Used for providers that require OAuth2 token-based authentication. Requires DEEPEVAL_LLM_TOKEN_URL to be set.

No
DEEPEVAL_LLM_CLIENT_SECRET

The OAuth2 client secret. Used for providers that require OAuth2 token-based authentication. Requires DEEPEVAL_LLM_TOKEN_URL to be set.

No
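
The settings above might be combined as follows for a custom provider that uses OAuth2 token-based authentication. This is a sketch under stated assumptions: the base URL, client ID, and client secret are hypothetical placeholders. The token URL and header values reuse the example values from this section. DEEPEVAL_LLM_API_KEY is deliberately left unset because it is an alternative to DEEPEVAL_LLM_TOKEN_URL.

```shell
# Hypothetical custom LLM endpoint; replace with your own gateway URL.
export DEEPEVAL_LLM_BASE_URL="https://llm-gateway.example.com/v1"

# Documented defaults, set explicitly here for clarity.
export DEEPEVAL_LLM_MODEL="gpt-4o-mini"
export DEEPEVAL_LLM_PROVIDER="openai"

# Setting the token URL enables OAuth2 mode.
export DEEPEVAL_LLM_TOKEN_URL="https://identity.example.com/oauth2/default/v1/token"

# Hypothetical placeholder credentials; both require DEEPEVAL_LLM_TOKEN_URL.
export DEEPEVAL_LLM_CLIENT_ID="my-client-id"
export DEEPEVAL_LLM_CLIENT_SECRET="replace-with-your-client-secret"

# Extra HTTP headers as a JSON string; single quotes keep the inner double quotes intact.
export DEEPEVAL_LLM_EXTRA_HEADERS='{"system-code": "APP-123", "x-tenant-id": "tenant-abc"}'
```

For a provider that accepts a static key, the sketch would instead set DEEPEVAL_LLM_API_KEY and omit the token URL, client ID, and client secret.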