Configure the Prometheus receiver to collect vLLM metrics

Send vLLM metrics to Splunk Observability Cloud.

You can monitor the performance of your vLLM engine by configuring the Splunk Distribution of the OpenTelemetry Collector to send vLLM metrics to Splunk Observability Cloud.

This solution uses the Prometheus receiver to collect metrics from vLLM, which exposes a Prometheus-compatible /metrics endpoint.
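To see what the receiver will scrape, you can fetch the /metrics endpoint directly. The endpoint returns Prometheus text exposition format, which the following sketch parses. The sample text and the model_name label value are illustrative, not real server output:

```python
# Illustrative sample of Prometheus exposition text, in the style of
# a vLLM /metrics endpoint. The values and labels are made up.
sample = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="example-model"} 2.0
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model_name="example-model"} 13456.0
"""

def parse_metrics(text):
    """Return {metric_name: value} for non-comment exposition lines."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]  # drop the label set
        metrics[name] = float(value)
    return metrics

print(parse_metrics(sample))
```

In a live deployment you would fetch the text from the server (for example with `curl http://<vllm-host>:8000/metrics`, where the host and port depend on how you started vLLM) rather than using a hardcoded sample.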

  1. Deploy the Splunk Distribution of the OpenTelemetry Collector to your host or container platform.
  2. To manually activate the Prometheus receiver for vLLM, make the following changes to your Collector values.yaml configuration file.
    1. Add prometheus/vllm to the receivers section. For example:
      YAML
      agent:
        config:
          receivers:
            prometheus/vllm:
              config:
                scrape_configs:
                - job_name: vllm-worker
                  kubernetes_sd_configs:
                  - namespaces:
                      names:
                      - <namespace>
                    role: pod
                  metrics_path: /metrics
    2. Add prometheus/vllm to the metrics pipeline of the service section. For example:
      YAML
      service:
        pipelines:
          metrics:
            receivers:
              - prometheus/vllm
  3. Restart the Splunk Distribution of the OpenTelemetry Collector.
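Taken together, the two changes above form a single values.yaml fragment along these lines. This is a sketch: `<namespace>` is a placeholder for the namespace where vLLM runs, and the exact nesting of the `service` section can vary by chart version, so verify it against your deployed configuration:

```yaml
agent:
  config:
    receivers:
      prometheus/vllm:
        config:
          scrape_configs:
            - job_name: vllm-worker
              metrics_path: /metrics
              kubernetes_sd_configs:
                - role: pod
                  namespaces:
                    names:
                      - <namespace>   # replace with the namespace running vLLM
    service:
      pipelines:
        metrics:
          receivers:
            - prometheus/vllm
```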

Configuration settings

To view the configuration options for the Prometheus receiver, see Settings.

Metrics

The following metrics are available for vLLM. For more information, see Metrics in the vLLM documentation.

These metrics are considered custom metrics in Splunk Observability Cloud.

Metric name                       Metric type   Description
vllm:num_requests_running         gauge         Number of requests currently running.
vllm:prompt_tokens_total          counter       Total number of prompt tokens processed.
vllm:request_success_total        counter       Number of finished requests.
vllm:request_generation_tokens    histogram     Histogram of generation token counts.
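Because these count as custom metrics, you might want to limit what the Collector sends. A hedged sketch using the OpenTelemetry filter processor is shown below; the processor name `filter/vllm` is an arbitrary choice, and you should verify the filter syntax against your Collector version. Note that an `include` rule keeps only matching metrics in any pipeline it is attached to, so apply it to a dedicated pipeline if that pipeline also carries metrics from other receivers:

```yaml
agent:
  config:
    processors:
      filter/vllm:
        metrics:
          include:
            match_type: regexp
            metric_names:
              - vllm:.*
```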

Attributes

The following resource attributes are available for vLLM.
Attribute name        Description
k8s.cluster.name      Logical name of the Kubernetes cluster.
model_name            AI model being served.
namespace             Kubernetes namespace (logical partition) of the workload.
service.instance.id   Unique ID for a single pod or process.
k8s.node.name         Name of the Kubernetes worker node.
host.name             Hostname of the physical or virtual machine running the workload.