GPU Metrics

Once configured, the DCGM Exporter automatically collects GPU metrics and sends them to the Server Dashboard. Key metrics include:

| Metric | Description |
|---|---|
| GPU Utilization (%) | Percentage of time the GPU actively executes compute kernels. |
| GPU Memory Utilization (%) | Percentage of GPU memory in use (used / total × 100). |
| GPU PCIe Tx Throughput | Outbound PCIe bandwidth from the GPU to the host. |
| GPU Power Usage (W) | Instantaneous power draw of the GPU. |
| GPU PCIe Rx Throughput | Inbound PCIe bandwidth from the host to the GPU. |
| GPU Temperature (°C) | Current core temperature of the GPU. |
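
The memory utilization formula above (used / total × 100) can be sketched in a few lines of Python. This is only an illustration: the metric names `DCGM_FI_DEV_FB_USED` and `DCGM_FI_DEV_FB_FREE` follow DCGM field naming but are assumed here rather than taken from this page, the sample values are made up, and treating total memory as used + free is a simplifying assumption. The helper names (`parse`, `memory_utilization`, `memory_utilization_by_gpu`) are hypothetical.

```python
import re

# Sample lines in the Prometheus text format. The metric names follow DCGM
# field naming (an assumption), and the values are invented for illustration.
SAMPLE = """\
DCGM_FI_DEV_FB_USED{gpu="0"} 6144
DCGM_FI_DEV_FB_FREE{gpu="0"} 18432
DCGM_FI_DEV_FB_USED{gpu="1"} 0
DCGM_FI_DEV_FB_FREE{gpu="1"} 24576
"""

# Matches lines of the form: NAME{gpu="N"} VALUE
LINE = re.compile(r'^(\w+)\{gpu="(\d+)"\}\s+([0-9.]+)$')


def parse(text):
    """Return {gpu_index: {metric_name: value}} parsed from the sample text."""
    gpus = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            name, gpu, value = match.groups()
            gpus.setdefault(int(gpu), {})[name] = float(value)
    return gpus


def memory_utilization(used_mib, total_mib):
    """Compute used / total x 100, returning 0.0 when total is zero."""
    return 0.0 if total_mib == 0 else used_mib / total_mib * 100.0


def memory_utilization_by_gpu(text):
    """Return {gpu_index: memory utilization in percent}."""
    result = {}
    for gpu, metrics in parse(text).items():
        used = metrics["DCGM_FI_DEV_FB_USED"]
        # Assumption: total memory is approximated as used + free.
        total = used + metrics["DCGM_FI_DEV_FB_FREE"]
        result[gpu] = memory_utilization(used, total)
    return result


if __name__ == "__main__":
    print(memory_utilization_by_gpu(SAMPLE))
```

With the sample values, GPU 0 uses 6144 of 24576 MiB and GPU 1 uses none of its memory. The dashboard may compute its value differently, for example by accounting for reserved memory.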

The following GPU metrics are available on the Cluster, Pods, and Container dashboards:

| Metric | Description | Scope |
|---|---|---|
| Total GPUs | Total count of GPU devices detected across all nodes in the cluster. | CLUSTER |
| Active GPUs | Number of GPUs currently processing workloads. | CLUSTER |
| Idle GPUs | Number of idle GPUs in the cluster (GPUs showing no utilization for a threshold period of time). | CLUSTER |
| GPU Limit (%) | Total GPU compute capacity of the cluster, expressed as a percentage. | CLUSTER |
| GPU Used (%) | Sum of actual GPU usage across pods, as a percentage of total cluster GPUs. | CLUSTER |
| GPU Request (%) | Sum of GPU resource requests by pods, as a percentage. | CLUSTER |
| GPU Memory Limit (%) | Total GPU memory capacity of the cluster, expressed as a percentage. | CLUSTER |
| GPU Memory Used (%) | Sum of actual GPU memory usage across pods, as a percentage of total cluster GPU memory. | CLUSTER |
| GPU Memory Request (%) | Sum of GPU memory resource requests by pods, as a percentage. | CLUSTER |
| GPU % | Percentage of available GPU compute capacity currently used by a pod (relative to total node capacity). | POD |
| GPU Memory % | Percentage of total GPU memory in use by a pod (relative to total node capacity). | POD |
| GPU Utilization (%) | Percentage of time a container's GPU was actively processing compute work (relative to total node capacity). | CONTAINER |
| GPU Memory Utilization (%) | Percentage of a container's GPU memory in use (relative to total node capacity). | CONTAINER |
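
To make the cluster-scope counts more concrete, the sketch below derives Total, Active, and Idle GPUs plus a usage percentage from per-GPU utilization readings. It is a rough illustration under stated assumptions, not the dashboard's actual logic: the `GpuSample` type, the `idle_window` parameter (standing in for the unspecified idle threshold), and the choice to count a GPU as active when its latest reading is above zero are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GpuSample:
    """Hypothetical record of recent utilization readings for one GPU."""
    node: str
    gpu: int
    # Utilization readings in percent, oldest first.
    utilization_history: List[float] = field(default_factory=list)


def is_idle(sample, idle_window=3):
    """Assumption: a GPU is idle if its last `idle_window` readings are all 0."""
    recent = sample.utilization_history[-idle_window:]
    return len(recent) == idle_window and all(u == 0 for u in recent)


def cluster_summary(samples, idle_window=3):
    """Aggregate per-GPU readings into cluster-scope counts and a usage percentage."""
    total = len(samples)
    latest = [s.utilization_history[-1] for s in samples if s.utilization_history]
    # Assumption: a GPU is active if its most recent reading is above zero.
    active = sum(1 for u in latest if u > 0)
    idle = sum(1 for s in samples if is_idle(s, idle_window))
    # Usage in GPU-equivalents (100% on one GPU = 1.0), relative to all GPUs.
    used_gpus = sum(latest) / 100.0
    used_pct = 0.0 if total == 0 else used_gpus / total * 100.0
    return {
        "total_gpus": total,
        "active_gpus": active,
        "idle_gpus": idle,
        "gpu_used_pct": used_pct,
    }


if __name__ == "__main__":
    samples = [
        GpuSample("node-a", 0, [80, 90, 100]),
        GpuSample("node-a", 1, [0, 0, 0]),
        GpuSample("node-b", 0, [0, 0, 50]),
        GpuSample("node-b", 1, [10, 0, 0]),
    ]
    print(cluster_summary(samples))
```

In this sample, the last GPU is neither active nor idle: its latest reading is zero, but it has not stayed at zero for the whole window. How the dashboard treats that in-between state is not specified here.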

For the full list of available metrics, see the Metrics Browser.