Configure the Prometheus receiver to collect llm-d metrics

Send llm-d metrics to Splunk Observability Cloud.

You can monitor the performance of your llm-d stack by configuring the Splunk Distribution of the OpenTelemetry Collector to send llm-d metrics to Splunk Observability Cloud.

This solution uses the Prometheus receiver to collect metrics from llm-d, which exposes the Prometheus-compatible /metrics endpoint.

  1. Deploy the Splunk Distribution of the OpenTelemetry Collector to your host or container platform.
  2. To manually activate the Prometheus receiver for llm-d, make the following changes to your Collector values.yaml configuration file.
    1. Add prometheus/llm-d to the receivers section. For example:
      YAML
      agent:
        config:
          receivers:
            prometheus/llm-d:
              config:
                scrape_configs:
                - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
                  job_name: llm-d-epp
                  metrics_path: /metrics
                  scrape_interval: 30s
                  static_configs:
                  - labels:
                      job: llm-d-epp
                      namespace: llm-d # replace with your llm-d namespace
                      workload: llm-d
                    targets:
                    - '{host}:{port}'
                - job_name: llm-d-decode
                  metrics_path: /metrics
                  scrape_interval: 30s
                  static_configs:
                  - labels:
                      job: llm-d-decode
                      namespace: llm-d
                      workload: llm-d
                    targets:
                    - '{host}:{port}'
    2. Add prometheus/llm-d to the metrics pipeline of the service section. For example:
      YAML
      service:
        pipelines:
          metrics:
            receivers:
              - prometheus/llm-d
  3. Restart the Splunk Distribution of the OpenTelemetry Collector.
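Putting the receiver and pipeline changes together, the relevant portion of values.yaml might look like the following sketch. This assumes the Helm chart layout shown above; the {host}:{port} target is a placeholder for your llm-d service endpoint, and only the llm-d-epp job is shown for brevity.

```yaml
agent:
  config:
    receivers:
      prometheus/llm-d:
        config:
          scrape_configs:
          - job_name: llm-d-epp
            bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
            metrics_path: /metrics
            scrape_interval: 30s
            static_configs:
            - labels:
                job: llm-d-epp
                namespace: llm-d # replace with your llm-d namespace
                workload: llm-d
              targets:
              - '{host}:{port}' # placeholder for your llm-d endpoint
    service:
      pipelines:
        metrics:
          receivers:
            - prometheus/llm-d
```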

Configuration settings

To view the configuration options for the Prometheus receiver, see Settings.
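The Prometheus receiver accepts standard Prometheus scrape_config fields. For example, if your llm-d endpoints serve HTTPS, you can set the scheme and TLS options. This is a sketch using standard Prometheus scrape_config fields; whether your llm-d deployment serves TLS depends on your setup.

```yaml
receivers:
  prometheus/llm-d:
    config:
      scrape_configs:
      - job_name: llm-d-epp
        scheme: https               # standard Prometheus scrape_config field
        tls_config:
          insecure_skip_verify: true # for self-signed certificates only; prefer ca_file
        metrics_path: /metrics
        scrape_interval: 30s
```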

Metrics

The following metrics are available for llm-d. For more information, see Prometheus metrics in the llm-d-inference-sim GitHub repository.

These metrics are considered custom metrics in Splunk Observability Cloud.

Metric name                                         Metric type  Description
inference_objective_request_total                   counter      Total inference model requests.
inference_extension_scheduler_e2e_duration_seconds  histogram    End-to-end scheduling latency.
inference_objective_request_error_total             counter      Total inference model request errors.
inference_objective_input_tokens                    histogram    Input token count distribution.
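Because these metrics are scraped in Prometheus exposition format, a counter appears as a single series, while a histogram expands into _bucket, _sum, and _count series. The following sketch shows what the scraped text might look like and how the metric types can be read from the # TYPE comment lines; the label values and numbers are illustrative only, not real llm-d output.

```python
# Hypothetical sample of Prometheus exposition text, as llm-d might expose it
# at /metrics. The model_name label and all numbers are illustrative.
SAMPLE = """\
# HELP inference_objective_request_total Total inference model requests.
# TYPE inference_objective_request_total counter
inference_objective_request_total{model_name="example-model"} 42
# HELP inference_objective_input_tokens Input token count distribution.
# TYPE inference_objective_input_tokens histogram
inference_objective_input_tokens_bucket{model_name="example-model",le="64"} 10
inference_objective_input_tokens_bucket{model_name="example-model",le="+Inf"} 15
inference_objective_input_tokens_sum{model_name="example-model"} 1200
inference_objective_input_tokens_count{model_name="example-model"} 15
"""

def metric_types(text):
    """Map metric name -> type, read from '# TYPE <name> <type>' comment lines."""
    types = {}
    for line in text.splitlines():
        if line.startswith("# TYPE "):
            _, _, name, kind = line.split()
            types[name] = kind
    return types

print(metric_types(SAMPLE))
# → {'inference_objective_request_total': 'counter',
#    'inference_objective_input_tokens': 'histogram'}
```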

Attributes

The following resource attributes are available for llm-d.
Attribute name       Description
model_name           The specific AI model being served.
service.instance.id  Unique identifier for a specific instance of the service.
k8s.cluster.name     Name of the Kubernetes cluster where the inference workload is running.
namespace            Kubernetes namespace used for logical isolation.
host.name            Physical or virtual machine name.
k8s.node.name        Kubernetes node name.