Configure the Prometheus receiver to collect NVIDIA NIM metrics

Learn how to configure the Prometheus receiver to collect NVIDIA NIM metrics.

You can monitor the performance of NVIDIA NIMs by configuring your Kubernetes cluster to send NVIDIA NIM metrics to Splunk Observability Cloud.

This solution uses the Prometheus receiver to collect metrics from NVIDIA NIM, which can be installed on its own or as part of the NVIDIA NIM Operator. For more information on the NVIDIA NIM Operator, see About the Operator in the NVIDIA documentation. NVIDIA NIM publishes Prometheus-compatible metrics at the /metrics endpoint on port 8000.

Complete the following high-level steps to collect metrics from and monitor your NVIDIA NIMs.

  1. Ensure that you meet the prerequisites.
  2. Configure and activate the component for NVIDIA NIM.
  3. Use the NVIDIA NIM navigator to monitor the performance of your NVIDIA NIMs.

Prerequisites

Learn about the prerequisites required to configure the Prometheus receiver to collect NVIDIA NIM metrics.

To use the Prometheus receiver to collect metrics from NVIDIA NIMs, you must meet the following requirements.
  • You have installed NVIDIA NIM, either on its own or as part of the NVIDIA NIM Operator.

  • You have installed Prometheus for scraping metrics from NVIDIA NIM. For instructions, see Prometheus in the NVIDIA NIM documentation.

Configure and activate the component for NVIDIA NIM

Learn how to configure and activate the component for NVIDIA NIM.

Complete the following steps to configure and activate the component for NVIDIA NIM.
  1. Install the Splunk Distribution of the OpenTelemetry Collector for Kubernetes using Helm.
  2. To activate the Prometheus receiver for NVIDIA NIM manually in the Collector configuration, make the following changes to your configuration file:
    1. Add prometheus/nvidianim to the receivers section. For example:
      prometheus/nvidianim:
        config:
          scrape_configs:
            - job_name: 'nvidianim-metrics'
              metrics_path: /metrics
              scrape_interval: 15s
              static_configs:
                - targets: ["localhost:8000"]
      
    2. Add prometheus/nvidianim to the metrics pipeline of the service section. For example:
      service:
        pipelines:
          metrics:
            receivers: [prometheus/nvidianim]
  3. Restart the Splunk Distribution of the OpenTelemetry Collector.
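The configuration changes in step 2 can be sketched as a single Helm values file. This is a minimal sketch, not a definitive configuration. It assumes that you deployed the Collector with the Splunk OpenTelemetry Collector Helm chart and that you override the agent configuration through the chart's agent.config value. The cluster name, realm, and access token are placeholders, and the list of default receivers is an assumption that can differ between chart versions.

```yaml
# values.yaml (sketch; placeholder values are hypothetical)
clusterName: my-cluster                 # placeholder cluster name
splunkObservability:
  realm: us0                            # placeholder realm
  accessToken: "<your-access-token>"    # placeholder token

agent:
  config:
    receivers:
      # Prometheus receiver instance that scrapes the NVIDIA NIM endpoint
      prometheus/nvidianim:
        config:
          scrape_configs:
            - job_name: 'nvidianim-metrics'
              metrics_path: /metrics
              scrape_interval: 15s
              static_configs:
                # Same target as the example in step 2; replace it with
                # the host and port where your NIM is reachable.
                - targets: ["localhost:8000"]
    service:
      pipelines:
        metrics:
          # Helm replaces lists instead of merging them, so this list must
          # also name the receivers that your pipeline already uses.
          # Assumed defaults; check the agent metrics pipeline of your
          # chart version before you copy this list.
          receivers:
            - hostmetrics
            - kubeletstats
            - otlp
            - receiver_creator
            - signalfx
            - prometheus/nvidianim
```

You can pass a file like this to helm install or helm upgrade with the -f flag. Because the receivers list is replaced rather than merged, leaving out an existing receiver would remove it from the metrics pipeline.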

Monitor the performance of NVIDIA NIMs

Learn how to navigate to the NVIDIA NIM navigator, which you can use to monitor the performance of NVIDIA NIMs.

Complete the following steps to access the NVIDIA NIM navigator and monitor the performance of NVIDIA NIMs. For more information on navigators, see Use navigators.

  1. From the Splunk Observability Cloud main menu, select Infrastructure.
  2. Under AI/ML, select AI Frameworks.
  3. Select the NVIDIA NIM summary card.

Configuration settings

Learn about the configuration options for the Prometheus receiver.

To view the configuration options for the Prometheus receiver, see Settings.

Metrics

Learn about the available metrics for NVIDIA NIM.

For more information on the metrics available for NVIDIA NIM, see Observability for NVIDIA NIM for LLMs in the NVIDIA documentation.

Attributes

Learn about the resource attributes that are available for NVIDIA NIM.

The following resource attributes are available for NVIDIA NIM.
Resource attribute name   Type     Description                                  Example value
model_name                string   The name of the deployed model.              meta/llama-3.1-8b-instruct
computationId             string   The unique identifier for the computation.   comp-5678xyz

Troubleshoot

Learn how to get help if you can't see your data in Splunk Observability Cloud.

If you can't see your data in Splunk Observability Cloud, you can get help in the following ways:

  • Prospective customers and free trial users can ask a question and get answers through community support in the Splunk Community.