Configure the Prometheus receiver to collect vLLM metrics
Send vLLM metrics to Splunk Observability Cloud.
You can monitor the performance of your vLLM engine by configuring the Splunk Distribution of the OpenTelemetry Collector to send vLLM metrics to Splunk Observability Cloud.
This solution uses the Prometheus receiver to collect metrics from vLLM, which exposes a Prometheus-compatible /metrics endpoint.
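For example, querying the /metrics endpoint of a running vLLM server returns plain-text Prometheus exposition output similar to the following. This is an illustrative sketch: the HELP text and label values depend on your vLLM version and the model you serve.

```
# HELP vllm:num_requests_running Number of requests currently running.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="meta-llama/Llama-2-7b-hf"} 2.0
# HELP vllm:prompt_tokens_total Total number of prompt tokens processed.
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model_name="meta-llama/Llama-2-7b-hf"} 13784.0
```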
- Deploy the Splunk Distribution of the OpenTelemetry Collector to your host or container platform.
- To manually activate the Prometheus receiver for vLLM, make the following changes to your Collector values.yaml configuration file, as shown in the sketch after this list.
- Restart the Splunk Distribution of the OpenTelemetry Collector.
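The exact configuration depends on your deployment, but a minimal sketch for a Collector deployed with the Helm chart looks like the following. It assumes vLLM serves its /metrics endpoint on port 8000 (the default for vLLM's OpenAI-compatible server); the target host and scrape interval shown here are placeholders to replace with your own values.

```yaml
agent:
  config:
    receivers:
      # Scrape the Prometheus-compatible /metrics endpoint that vLLM exposes.
      prometheus/vllm:
        config:
          scrape_configs:
            - job_name: vllm
              scrape_interval: 10s
              static_configs:
                - targets:
                    # Replace with the host and port of your vLLM server.
                    - "<vllm-host>:8000"
    service:
      pipelines:
        metrics:
          receivers:
            # Add the receiver to the metrics pipeline so the scraped
            # metrics are forwarded to Splunk Observability Cloud.
            - prometheus/vllm
```

If you run the Collector outside Kubernetes, add the same prometheus/vllm receiver and pipeline entry to your agent configuration file instead of values.yaml.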
Configuration settings
To view the configuration options for the Prometheus receiver, see Settings.
Metrics
The following metrics are available for vLLM. For more information, see Metrics in the vLLM documentation.
These metrics are considered custom metrics in Splunk Observability Cloud.
| Metric name | Metric type | Description |
|---|---|---|
| `vllm:num_requests_running` | gauge | Number of requests currently running. |
| `vllm:prompt_tokens_total` | counter | Total number of prompt tokens processed. |
| `vllm:request_success_total` | counter | Number of finished requests. |
| `vllm:request_generation_tokens` | histogram | Histogram of generation token counts. |
Attributes
| Attribute name | Description |
|---|---|
| `k8s.cluster.name` | Logical name of the Kubernetes cluster. |
| `model_name` | AI model being served. |
| `namespace` | Kubernetes logical partition. |
| `service.instance.id` | Unique ID for a single pod or process. |
| `k8s.node.name` | Name of the specific Kubernetes worker node. |
| `host.name` | Hostname of the underlying physical or virtual machine running the workload. |
Next steps
After you set up data collection, the data populates built-in dashboards that you can use to monitor and troubleshoot your instances.
For more information about using built-in dashboards in Splunk Observability Cloud, see the built-in dashboards documentation.