Configure the Prometheus receiver to collect vLLM metrics

Send vLLM metrics to Splunk Observability Cloud.

You can monitor the performance of your vLLM engine by configuring the Splunk Distribution of the OpenTelemetry Collector to send vLLM metrics to Splunk Observability Cloud.

This solution uses the Prometheus receiver to collect metrics from vLLM, which exposes a Prometheus-compatible /metrics endpoint.
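To see what the receiver will scrape, you can fetch the /metrics endpoint directly. The endpoint returns Prometheus text exposition format, which the following sketch parses. The sample text and the model_name label value are illustrative, not real server output:

```python
# Illustrative sample of Prometheus exposition text, in the style of
# a vLLM /metrics endpoint. The values and labels are made up.
sample = """\
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="example-model"} 2.0
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model_name="example-model"} 13456.0
"""

def parse_metrics(text):
    """Return {metric_name: value} for non-comment exposition lines."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]  # drop the label set
        metrics[name] = float(value)
    return metrics

print(parse_metrics(sample))
```

In a live deployment you would fetch the text from the server (for example with `curl http://<vllm-host>:8000/metrics`, where the host and port depend on how you started vLLM) rather than using a hardcoded sample.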

  1. Deploy the Splunk Distribution of the OpenTelemetry Collector to your host or container platform.
  2. To manually activate the Prometheus receiver for vLLM, make the following changes to your Collector values.yaml configuration file.
    1. Add prometheus/vllm to the receivers section. For example:
      YAML
      agent:
        config:
          receivers:
            prometheus/vllm:
              config:
                scrape_configs:
                - job_name: vllm-worker
                  kubernetes_sd_configs:
                  - namespaces:
                      names:
                      - <namespace>
                    role: pod
                  metrics_path: /metrics
    2. Add prometheus/vllm to the metrics pipeline of the service section. For example:
      YAML
      service:
        pipelines:
          metrics:
            receivers:
              - prometheus/vllm
  3. Restart the Splunk Distribution of the OpenTelemetry Collector.
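Taken together, the two changes above form a single values.yaml fragment along these lines. This is a sketch: `<namespace>` is a placeholder for the namespace where vLLM runs, and the exact nesting of the `service` section can vary by chart version, so verify it against your deployed configuration:

```yaml
agent:
  config:
    receivers:
      prometheus/vllm:
        config:
          scrape_configs:
            - job_name: vllm-worker
              metrics_path: /metrics
              kubernetes_sd_configs:
                - role: pod
                  namespaces:
                    names:
                      - <namespace>   # replace with the namespace running vLLM
    service:
      pipelines:
        metrics:
          receivers:
            - prometheus/vllm
```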

Configuration settings

To view the configuration options for the Prometheus receiver, see Settings.

Metrics

The following metrics are available for vLLM. For more information, see Metrics in the vLLM documentation.

These metrics are considered custom metrics in Splunk Observability Cloud.

Metric name                       Metric type   Description
vllm:num_requests_running         gauge         Number of requests currently running.
vllm:prompt_tokens_total          counter       Total number of prompt tokens processed.
vllm:request_success_total        counter       Number of finished requests.
vllm:request_generation_tokens    histogram     Histogram of generation token counts.
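Because these count as custom metrics, you might want to limit what the Collector sends. A hedged sketch using the OpenTelemetry filter processor is shown below; the processor name `filter/vllm` is an arbitrary choice, and you should verify the filter syntax against your Collector version. Note that an `include` rule keeps only matching metrics in any pipeline it is attached to, so apply it to a dedicated pipeline if that pipeline also carries metrics from other receivers:

```yaml
agent:
  config:
    processors:
      filter/vllm:
        metrics:
          include:
            match_type: regexp
            metric_names:
              - vllm:.*
```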

Attributes

The following resource attributes are available for vLLM.
Attribute name        Description
k8s.cluster.name      Logical name of the Kubernetes cluster.
model_name            AI model being served.
namespace             Kubernetes namespace (logical partition) of the workload.
service.instance.id   Unique ID for a single pod or process.
k8s.node.name         Name of the Kubernetes worker node.
host.name             Hostname of the physical or virtual machine running the workload.