Configure the Prometheus receiver to collect NVIDIA Dynamo metrics

Send NVIDIA Dynamo metrics to Splunk Observability Cloud.

You can monitor the performance of NVIDIA Dynamo by configuring the Splunk Distribution of the OpenTelemetry Collector to send NVIDIA Dynamo metrics to Splunk Observability Cloud.

This solution uses the Prometheus receiver to collect metrics from NVIDIA Dynamo, which exposes a Prometheus-compatible /metrics endpoint.

  1. Deploy the Splunk Distribution of the OpenTelemetry Collector to your host or container platform.
  2. To manually activate the Prometheus receiver for NVIDIA Dynamo, make the following changes to your Collector values.yaml configuration file.
    1. Add prometheus/nvidia-dynamo to the receivers section. For example:
      YAML
      agent:
        config:
          receivers:
            prometheus/nvidia-dynamo:
              config:
                scrape_configs:
                - job_name: dynamo-frontend
                  kubernetes_sd_configs:
                  - namespaces:
                      names:
                      - dynamo
                    role: pod
                  metrics_path: /metrics
                  relabel_configs:
                  - action: keep
                    regex: \Q${env:K8S_NODE_NAME}\E
                    source_labels:
                    - __meta_kubernetes_pod_node_name
                  - action: keep
                    regex: Frontend
                    source_labels:
                    - __meta_kubernetes_pod_label_nvidia_com_dynamo_component
                  - replacement: $1:8000
                    source_labels:
                    - __meta_kubernetes_pod_ip
                    target_label: __address__
                  scrape_interval: 30s
                - job_name: dynamo-vllm-worker
                  kubernetes_sd_configs:
                  - namespaces:
                      names:
                      - dynamo
                    role: pod
                  metrics_path: /metrics
                  relabel_configs:
                  - action: keep
                    regex: \Q${env:K8S_NODE_NAME}\E
                    source_labels:
                    - __meta_kubernetes_pod_node_name
                  - action: keep
                    regex: VllmDecodeWorker
                    source_labels:
                    - __meta_kubernetes_pod_label_nvidia_com_dynamo_component
                  - replacement: $1:9090
                    source_labels:
                    - __meta_kubernetes_pod_ip
                    target_label: __address__
                  scrape_interval: 30s
                - job_name: dynamo-prefill
                  kubernetes_sd_configs:
                  - namespaces:
                      names:
                      - dynamo
                    role: pod
                  metrics_path: /metrics
                  relabel_configs:
                  - action: keep
                    regex: \Q${env:K8S_NODE_NAME}\E
                    source_labels:
                    - __meta_kubernetes_pod_node_name
                  - action: keep
                    regex: PrefillWorker
                    source_labels:
                    - __meta_kubernetes_pod_label_nvidia_com_dynamo_component
                  - replacement: $1:8081
                    source_labels:
                    - __meta_kubernetes_pod_ip
                    target_label: __address__
                  scrape_interval: 30s
                - job_name: dynamo-decode
                  kubernetes_sd_configs:
                  - namespaces:
                      names:
                      - dynamo
                    role: pod
                  metrics_path: /metrics
                  relabel_configs:
                  - action: keep
                    regex: \Q${env:K8S_NODE_NAME}\E
                    source_labels:
                    - __meta_kubernetes_pod_node_name
                  - action: keep
                    regex: DecodeWorker
                    source_labels:
                    - __meta_kubernetes_pod_label_nvidia_com_dynamo_component
                  - replacement: $1:8081
                    source_labels:
                    - __meta_kubernetes_pod_ip
                    target_label: __address__
                  scrape_interval: 30s
    2. Add prometheus/nvidia-dynamo to the metrics pipeline of the service section. For example:
      YAML
      service:
        pipelines:
          metrics:
            receivers:
              - prometheus/nvidia-dynamo
  3. Restart the Splunk Distribution of the OpenTelemetry Collector.
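
The two changes in step 2 can be combined into one values.yaml override. The following is a minimal sketch, not a definitive configuration. It assumes a Helm deployment in which the service section nests under agent.config next to receivers, and it shows only the frontend job for brevity. The namespace, component label value, and port are the ones used in the example above.

```yaml
agent:
  config:
    receivers:
      prometheus/nvidia-dynamo:
        config:
          scrape_configs:
          # One job shown for brevity; add the worker jobs the same way.
          - job_name: dynamo-frontend
            kubernetes_sd_configs:
            - role: pod
              namespaces:
                names:
                - dynamo
            metrics_path: /metrics
            scrape_interval: 30s
            relabel_configs:
            # Keep only pods scheduled on the same node as this agent.
            - action: keep
              source_labels:
              - __meta_kubernetes_pod_node_name
              regex: \Q${env:K8S_NODE_NAME}\E
            # Keep only pods labeled as the Dynamo frontend component.
            - action: keep
              source_labels:
              - __meta_kubernetes_pod_label_nvidia_com_dynamo_component
              regex: Frontend
            # Scrape each remaining pod on its IP address, port 8000.
            - source_labels:
              - __meta_kubernetes_pod_ip
              target_label: __address__
              replacement: $1:8000
    service:
      pipelines:
        metrics:
          receivers:
          # Assumption: Helm replaces lists instead of merging them, so
          # list every receiver the pipeline should keep, not only this one.
          - prometheus/nvidia-dynamo
```

Because each agent keeps only the pods on its own node, every pod is scraped once, by the agent on its node, even though an agent runs on every node.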

Configuration settings

To view the configuration options for the Prometheus receiver, see Settings.

Metrics

The following metrics are available for NVIDIA Dynamo. For more information, see Metrics in the NVIDIA Dynamo documentation.

These metrics are considered custom metrics in Splunk Observability Cloud.

| Metric name | Description |
| --- | --- |
| dynamo_frontend_* | Frontend metrics that measure request handling, token processing, and latency. |
| dynamo_component_* | Component metrics that measure request counts, processing times, byte transfers, and system uptime. |
| dynamo_preprocessor_* | Metrics specific to preprocessor components. |
| vllm:* (vLLM), sglang:* (SGLang), trtllm_* (TensorRT-LLM) | Metrics related to backend engines. Backend engines expose their own metrics. |
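
Because these metrics count as custom metrics, you might want to forward only the metric families you need. The following is a minimal sketch that assumes the standard Prometheus metric_relabel_configs option, which the Prometheus receiver accepts as part of each scrape job. The choice of the dynamo_frontend_ prefix is illustrative only. Merge the metric_relabel_configs block into the full job definition rather than using this fragment on its own.

```yaml
agent:
  config:
    receivers:
      prometheus/nvidia-dynamo:
        config:
          scrape_configs:
          - job_name: dynamo-frontend
            # Service discovery and relabel_configs for this job are omitted
            # here; keep them exactly as in your existing configuration.
            metric_relabel_configs:
            # Drop every scraped series whose metric name does not start
            # with dynamo_frontend_ (Prometheus anchors the regex).
            - action: keep
              source_labels:
              - __name__
              regex: dynamo_frontend_.*
```

The keep action runs after the scrape and before the metrics enter the pipeline, so filtered series never reach Splunk Observability Cloud.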

Attributes

The following resource attributes are available for NVIDIA Dynamo.

| Attribute name | Description |
| --- | --- |
| service.name | Logical name of the component emitting data. |
| service.instance.id | Unique ID for one running instance (pod or replica) of that component. |
| k8s.cluster.name | Name of the Kubernetes cluster hosting the workload. |
| namespace | Kubernetes namespace where the Dynamo pods run. Isolates an environment or team. |
| model_name | LLM or other model that the metrics refer to. |
| nvidia.com/dynamo-graph-deployment-name | Label that links pods to one Dynamo graph deployment. |
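
This page does not state how each attribute is populated. As a hedged illustration, the following sketch annotates a scrape job with the mappings the Prometheus receiver generally applies: the job label becomes service.name, and the instance label becomes service.instance.id. The last relabel rule is an optional addition, not part of the original configuration, that copies the pod's namespace onto each scraped series in case that attribute is missing from your data.

```yaml
scrape_configs:
# The job label (taken from job_name) is mapped to service.name.
- job_name: dynamo-frontend
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names:
      - dynamo
  relabel_configs:
  # The rewritten __address__ is also used as the instance label,
  # which the receiver maps to service.instance.id.
  - source_labels:
    - __meta_kubernetes_pod_ip
    target_label: __address__
    replacement: $1:8000
  # Optional (assumption): copy the pod's namespace onto every series.
  - source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
```

With this job, data from a frontend pod would carry a service.name of dynamo-frontend and a service.instance.id made of the pod IP and port 8000.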