NVIDIA NIM Metrics

This page describes the NVIDIA NIM metrics that AppDynamics collects from the NIM llm service. They cover running requests, token counts, KV cache utilization, finished-request counts, and approximate latency.

Prerequisites

Ensure that:
  • The NIM llm service is deployed in the nim namespace
  • The service exposes Prometheus-compatible metrics at /v1/metrics

Enable Prometheus Scraping for NVIDIA NIM

The examples in this repo use the following values:
  • service: llm
  • namespace: nim
  • port: 8000
  • path: /v1/metrics

Replace the service name and namespace with the ones used by the NIM LLM deployment in your target environment.
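
To sanity-check these values, you can build the scrape target from them. The sketch below is a minimal illustration. It assumes the standard Kubernetes service DNS form <service>.<namespace>.svc.cluster.local with the default cluster.local domain, which your cluster may not use. The function names are hypothetical and are not part of this repo.

```python
from urllib.request import urlopen


def nim_metrics_url(service: str = "llm", namespace: str = "nim",
                    port: int = 8000, path: str = "/v1/metrics") -> str:
    """Build the in-cluster URL of the NIM metrics endpoint.

    Assumes the default Kubernetes cluster domain (cluster.local).
    """
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{service}.{namespace}.svc.cluster.local:{port}{path}"


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Return the raw Prometheus text exposition served at url."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    url = nim_metrics_url()  # example values from this repo
    print(url)
    # fetch_metrics(url) only resolves from inside the cluster.
```

From outside the cluster, one way to confirm that the endpoint responds is to run `kubectl port-forward -n nim svc/llm 8000:8000` and then call fetch_metrics("http://localhost:8000/v1/metrics").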

Configure Machine Agent Ingestion

Infrastructure Visibility Prometheus monitoring loads the NIM exporter definition through prometheus-config-template.yaml.

Before enabling the scrape, set the service discovery fields in the exporter YAML to the service name and namespace used by your NIM deployment.

Exporter YAML Contract

  • file: exporter-yamls/nim-for-llms-exporter.yaml
  • key direct metrics:
    • num_requests_running
    • prompt_tokens_total
    • generation_tokens_total
    • request_finish_total
    • gpu_cache_usage_perc
  • computed-metric source series (drive the latency and per-request metrics):
    • e2e_request_latency_seconds_sum / e2e_request_latency_seconds_count
    • time_to_first_token_seconds_sum / time_to_first_token_seconds_count
    • time_per_output_token_seconds_sum / time_per_output_token_seconds_count
    • request_prompt_tokens_sum / request_prompt_tokens_count
    • request_generation_tokens_sum / request_generation_tokens_count
  • key computed metrics:
    • Avg E2E Latency (ms)
    • Avg TTFT (ms)
    • Avg TPOT (ms)
    • Total Tokens
    • Prompt Tokens per Request
    • Generation Tokens per Request
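
Before you wire up the exporter, it can help to confirm that a scrape contains every series the contract expects. The following is a minimal stdlib-only sketch. It assumes the scraped body uses the Prometheus text exposition format. The helper names are hypothetical, and the simplified parser assumes that label values never contain a closing brace.

```python
import re

# Direct metrics and histogram source series listed in the exporter YAML contract.
DIRECT_METRICS = [
    "num_requests_running",
    "prompt_tokens_total",
    "generation_tokens_total",
    "request_finish_total",
    "gpu_cache_usage_perc",
]
HISTOGRAM_BASES = [
    "e2e_request_latency_seconds",
    "time_to_first_token_seconds",
    "time_per_output_token_seconds",
    "request_prompt_tokens",
    "request_generation_tokens",
]
REQUIRED = DIRECT_METRICS + [f"{b}{s}" for b in HISTOGRAM_BASES for s in ("_sum", "_count")]

# A sample line looks like: name{label="v",...} value [timestamp]
# Simplification: label values are assumed not to contain "}".
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def sample_names(exposition: str) -> set[str]:
    """Return the set of metric names that have at least one sample line."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_series(exposition: str) -> list[str]:
    """List the contract series that are absent from the scraped text."""
    present = sample_names(exposition)
    return [name for name in REQUIRED if name not in present]
```

An empty list from missing_series means every direct metric and computed-metric source series appeared at least once in the scrape.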

Expected AppDynamics Custom Metric Paths

  • Custom Metrics|NIM|LLMs|{model_name}|Requests Running
  • Custom Metrics|NIM|LLMs|{model_name}|KV Cache Utilization (%)
  • Custom Metrics|NIM|LLMs|{model_name}|Prompt Tokens
  • Custom Metrics|NIM|LLMs|{model_name}|Generation Tokens
  • Custom Metrics|NIM|LLMs|{model_name}|{finished_reason}|Finished Requests
  • Custom Metrics|NIM|LLMs|{model_name}|Avg E2E Latency (ms)
  • Custom Metrics|NIM|LLMs|{model_name}|Avg TTFT (ms)
  • Custom Metrics|NIM|LLMs|{model_name}|Avg TPOT (ms)
  • Custom Metrics|NIM|LLMs|All Models|Prompt Tokens
  • Custom Metrics|NIM|LLMs|All Models|Generation Tokens
  • Custom Metrics|NIM|LLMs|{model_name}|Prompt Tokens per Request
  • Custom Metrics|NIM|LLMs|{model_name}|Generation Tokens per Request
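
Prompt Tokens and Generation Tokens are interval delta metrics: each value is the increase in the underlying token counter during the collection interval, not the cumulative total. The All Models|... leaves mirror these metrics aggregated across all models. The average latency and per-request leaves are derived metrics computed from the histogram _sum and _count source series.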
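
The arithmetic behind these leaves can be sketched as follows. This is an illustration under stated assumptions, not the Machine Agent's implementation. It assumes that each value comes from two consecutive scrapes of cumulative counters. It treats a decreasing counter as a reset, for example after a NIM pod restarts. It also assumes that the All Models leaves are sums of the per-model deltas. The function names are hypothetical.

```python
from typing import Dict, Optional


def interval_delta(prev: float, curr: float) -> float:
    """Increase of a cumulative counter between two scrapes.

    A decrease is treated as a counter reset (an assumption), so the
    current value is taken as the increase since the reset.
    """
    return curr - prev if curr >= prev else curr


def interval_mean(prev_sum: float, curr_sum: float,
                  prev_count: float, curr_count: float,
                  scale: float = 1.0) -> Optional[float]:
    """Mean of a histogram over one interval from its _sum and _count series.

    Returns None when no observations arrived during the interval.
    """
    observations = interval_delta(prev_count, curr_count)
    if observations == 0:
        return None
    return interval_delta(prev_sum, curr_sum) / observations * scale


def all_models_delta(prev: Dict[str, float], curr: Dict[str, float]) -> float:
    """Sum per-model counter deltas into one All Models value (assumed aggregation)."""
    return sum(interval_delta(prev.get(model, 0.0), value)
               for model, value in curr.items())


# Avg E2E Latency (ms): e2e_request_latency_seconds_sum rose 10.0 -> 12.5 s
# while _count rose 100 -> 105, i.e. 2.5 s spread over 5 requests.
avg_e2e_ms = interval_mean(10.0, 12.5, 100, 105, scale=1000)  # 500.0

# Prompt Tokens per Request: request_prompt_tokens_sum 4000 -> 4600, _count 20 -> 23.
prompt_per_request = interval_mean(4000, 4600, 20, 23)  # 200.0

# All Models|Prompt Tokens: prompt_tokens_total deltas of 30 and 20 across two models.
all_prompt = all_models_delta({"model-a": 100, "model-b": 50},
                              {"model-a": 130, "model-b": 70})  # 50
```

Returning None for an interval with no finished requests avoids reporting a misleading zero latency when the model was simply idle.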

Create Custom Dashboard

The custom dashboard script generates ready-to-import AppDynamics dashboard JSON files from a set of templates. You supply your environment's node names and, optionally, the custom metric path prefixes. The script substitutes them into the templates and writes the JSON files. See Create Custom Dashboards for AI Pods.
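
The sketch below illustrates the substitution step described above. It is not the actual script. The template structure, the ${node_name} and ${metric_prefix} placeholder names, and the output file naming are all hypothetical, and real AppDynamics dashboard JSON contains many more fields. It only shows the idea of filling one template per node and writing the result as JSON.

```python
import json
from pathlib import Path
from string import Template
from typing import Any, Dict, List

# Hypothetical, heavily trimmed template; real dashboard exports have many more fields.
EXAMPLE_TEMPLATE: Dict[str, Any] = {
    "name": "NIM LLM - ${node_name}",
    "widgets": [
        {
            "title": "Prompt Tokens (all models)",
            "metricPath": "${metric_prefix}|All Models|Prompt Tokens",
            "node": "${node_name}",
        }
    ],
}


def render(node: Any, values: Dict[str, str]) -> Any:
    """Recursively substitute ${...} placeholders in every string of a JSON-like value."""
    if isinstance(node, str):
        return Template(node).substitute(values)  # raises KeyError if a placeholder is unset
    if isinstance(node, list):
        return [render(item, values) for item in node]
    if isinstance(node, dict):
        return {key: render(value, values) for key, value in node.items()}
    return node


def write_dashboards(template: Dict[str, Any], node_names: List[str], out_dir: Path,
                     metric_prefix: str = "Custom Metrics|NIM|LLMs") -> List[Path]:
    """Write one rendered dashboard JSON file per node and return the file paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in node_names:
        dashboard = render(template, {"node_name": name, "metric_prefix": metric_prefix})
        path = out_dir / f"nim-llm-{name}.json"
        path.write_text(json.dumps(dashboard, indent=2))
        written.append(path)
    return written
```

For example, write_dashboards(EXAMPLE_TEMPLATE, ["gpu-node-1"], Path("dashboards")) would write dashboards/nim-llm-gpu-node-1.json. Because render substitutes values into parsed strings instead of raw JSON text, node names containing quotes cannot break the output JSON.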

Troubleshooting

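AppDynamics does not reproduce the histogram percentile widgets from the Splunk dashboards. Instead, it approximates them with interval mean values computed from the _sum and _count series, so latency values in AppDynamics are not expected to match the Splunk percentile values.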
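
The gap between the two views can be large when latencies are skewed. The sketch below compares an interval mean with a bucket-interpolated percentile for the same histogram interval. The percentile estimate follows the linear-interpolation approach of Prometheus histogram_quantile. Splunk's percentile computation may differ, and the bucket boundaries and data are made up for illustration.

```python
import math
from typing import List, Optional, Tuple


def mean_from_deltas(sum_delta: float, count_delta: float) -> Optional[float]:
    """Interval mean from _sum and _count deltas (the AppDynamics-style value)."""
    return sum_delta / count_delta if count_delta else None


def bucket_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative bucket deltas.

    buckets holds (upper_bound, cumulative_count) pairs sorted by bound, ending
    with (math.inf, total). Interpolates linearly inside the target bucket and
    assumes non-negative observations (the first bucket starts at 0).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                return prev_bound  # rank falls in the +Inf bucket: use the highest finite bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


# One interval: 90 requests took about 0.2 s and 10 took about 5 s.
latency_buckets = [(0.25, 90), (0.5, 90), (1.0, 90), (2.5, 90), (5.0, 100), (math.inf, 100)]
mean_s = mean_from_deltas(90 * 0.2 + 10 * 5.0, 100)
p99_s = bucket_quantile(0.99, latency_buckets)
print(f"mean {mean_s * 1000:.0f} ms, p99 {p99_s * 1000:.0f} ms")  # prints: mean 680 ms, p99 4750 ms
```

With this skew, the interval mean sits far below the p99 estimate. An Avg E2E Latency value in AppDynamics that is lower than the corresponding Splunk percentile widget is therefore usually expected and does not indicate a collection error.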