Configure the Prometheus receiver to collect NVIDIA NIM metrics
Learn how to configure the Prometheus receiver to collect NVIDIA NIM metrics.
You can monitor the performance of NVIDIA NIMs by configuring your Kubernetes cluster to send NVIDIA NIM metrics to Splunk Observability Cloud.
This solution uses the Prometheus receiver to collect metrics from NVIDIA NIM, which can be installed on its own or as part of the NVIDIA NIM Operator. For more information on the NVIDIA NIM Operator, see About the Operator in the NVIDIA documentation. NVIDIA NIM exposes a :8000/metrics endpoint that publishes Prometheus-compatible metrics.
Complete the following steps to collect metrics from NVIDIA NIMs.
Before you start, make sure that the following prerequisites are in place:
- You have installed NVIDIA NIM using one of the following methods:
  - To install NVIDIA NIM separately, see Get Started with NVIDIA NIM for LLMs in the NVIDIA NIM documentation.
  - To install NVIDIA NIM as part of the NVIDIA NIM Operator, see Installing NVIDIA NIM Operator in the NVIDIA documentation.
- You have installed Prometheus for scraping metrics from NVIDIA NIM. For instructions, see Prometheus in the NVIDIA NIM documentation.
- Install the Splunk Distribution of the OpenTelemetry Collector for Kubernetes using Helm.
- To activate the Prometheus receiver for NVIDIA NIM manually in the Collector configuration, add the receiver and a metrics pipeline to your configuration file, as shown in the sketch after this list.
- Restart the Splunk Distribution of the OpenTelemetry Collector.
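The following values.yaml fragment is a minimal sketch of these changes, assuming you deploy the Collector with the Helm chart and that your NIM inference service is reachable inside the cluster at an address such as nim-llm.nim.svc.cluster.local on port 8000. The service address, job name, and pipeline component names are placeholders rather than values taken from this guide; replace them with the ones that match your environment and chart version.

```yaml
# Sketch of a Helm values override for the Splunk Distribution of the
# OpenTelemetry Collector. The service address, job name, and pipeline
# components are assumptions; adjust them to your deployment.
agent:
  config:
    receivers:
      prometheus/nvidia-nim:
        config:
          scrape_configs:
            - job_name: nvidia-nim
              scrape_interval: 10s
              metrics_path: /metrics
              static_configs:
                # Replace with the Service name and namespace of your NIM deployment.
                - targets:
                    - nim-llm.nim.svc.cluster.local:8000
    service:
      pipelines:
        metrics/nvidia-nim:
          receivers: [prometheus/nvidia-nim]
          # Typical defaults for the chart's metrics pipeline; keep whatever
          # processors and exporters your existing pipeline already uses.
          processors: [memory_limiter, batch, resourcedetection, resource]
          exporters: [signalfx]
```

Apply the updated values, for example with helm upgrade, and then restart the Collector as described in the last step.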
Configuration settings
Learn about the configuration options for the Prometheus receiver.
To see the configuration options for the Prometheus receiver, see the Settings section of the Prometheus receiver documentation.
Metrics
Learn about the metrics available for NVIDIA NIM.
For details about the metrics available for NVIDIA NIM, see Observability for NVIDIA NIM for LLMs in the NVIDIA documentation.
Attributes
Learn about the resource attributes available for NVIDIA NIM.
| Resource attribute name | Type | Description | Example value |
|---|---|---|---|
| model_name | String | Name of the deployed model. | |
| computationId | String | Unique computation identifier. | comp-5678xyz |
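If you want to act on these resource attributes in the Collector configuration, for example to keep only the metrics for one deployed model, the following is a hedged sketch that uses the filter processor with an OTTL condition. The processor name and the model value are placeholders, and only the attribute name model_name comes from the table above; add the processor to the same metrics pipeline as the prometheus/nvidia-nim receiver for it to take effect.

```yaml
# Sketch only: drop NIM metrics whose model_name resource attribute does not
# match the model you want to keep. The processor name and model value are
# placeholders for illustration.
processors:
  filter/nvidia-nim-model:
    metrics:
      metric:
        # Metrics matching this condition are dropped, so only metrics from
        # the named model remain.
        - 'resource.attributes["model_name"] != "my-deployed-model"'
```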
Next steps
Learn how to monitor your AI components after you set up Observability for AI.
After you set up data collection from supported AI components to Splunk Observability Cloud, the data populates built-in experiences that you can use to monitor and troubleshoot your AI components.
| Monitoring tool | Use this tool to | Link to documentation |
|---|---|---|
| Built-in navigators | Orient and explore different layers of your AI tech stack. | |
| Built-in dashboards | Assess service, endpoint, and system health at a glance. | |
| Splunk Application Performance Monitoring (APM) service map and trace view | View all of your LLM service dependency graphs and user interactions in the service map or trace view. | |