Troubleshooting GPU Monitoring

Issue                       Possible Cause
No GPU metrics in the UI    sim.cluster.gpu.enabled=true is not set on the Controller.
                            gpuMonitoringEnabled: true is missing in the Cluster Agent spec.
                            Machine Agent environment variables are misconfigured.
Cross-node metric mixing    The Service is missing internalTrafficPolicy: Local.
DNS resolution failure      Machine Agent pods cannot resolve nvidia-dcgm-exporter.gpu-operator.svc.cluster.local. Verify DNS settings.
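
For the cross-node metric mixing case, the fragment below is a minimal sketch of where internalTrafficPolicy: Local sits in a Kubernetes Service manifest. The Service name and namespace are taken from the DNS name nvidia-dcgm-exporter.gpu-operator.svc.cluster.local. The selector label and port 9400 are assumptions for illustration and should be checked against the exporter Service actually deployed in your cluster.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nvidia-dcgm-exporter      # inferred from the DNS name in the table
  namespace: gpu-operator         # inferred from the DNS name in the table
spec:
  # Route in-cluster traffic only to endpoints on the same node as the caller,
  # so each Machine Agent pod reads metrics from its own node's exporter.
  internalTrafficPolicy: Local
  selector:
    app: nvidia-dcgm-exporter     # assumption: match your exporter pod labels
  ports:
    - name: metrics
      port: 9400                  # assumption: verify the exporter's port
      targetPort: 9400
```

With this policy, a node that has no local exporter endpoint gets no response from the Service rather than another node's metrics, so the absence of data on such a node is expected.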