Configure the DCGM Exporter (Standalone Node)
-
Enable GPU monitoring at the Controller account level using the following flag:
sim.cluster.gpu.enabled=true
-
Enable GPU monitoring on the Machine Agent using one of the following methods:
-
System Property:
-Dappdynamics.machine.agent.gpu.enabled=true
-
Controller Configuration File (controller-info.xml):
<gpu-enabled>true</gpu-enabled>
-
Environment Variable:
APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED=true
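As a rough illustration, the environment variable and system property forms might be applied at agent startup as sketched here. The installation path and the `java -jar machineagent.jar` launch line are assumptions about a typical setup, not values taken from this page; only one of the two options is needed.

```shell
# Assumption: hypothetical install location; adjust to your environment.
MACHINE_AGENT_HOME="${MACHINE_AGENT_HOME:-/opt/appdynamics/machine-agent}"

# Option 1: environment variable, exported before the agent starts.
export APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED=true

# Option 2: system property passed on the JVM command line.
# Assumption: the agent is launched directly from machineagent.jar.
agent_cmd="java -Dappdynamics.machine.agent.gpu.enabled=true -jar $MACHINE_AGENT_HOME/machineagent.jar"

# Print the command for review instead of running it.
echo "$agent_cmd"
```

The sketch only prints the command, so it can be reviewed before it is run.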
-
-
By default, the DCGM Exporter is used for GPU metrics collection. To override the exporter endpoint, specify the host and port using one of the following methods:
-
System Property:
-Dappdynamics.machine.agent.dcgm.exporter.service.host=<host>
-Dappdynamics.machine.agent.dcgm.exporter.service.port=<port> # Default: 9400
-
Controller Configuration File (controller-info.xml):
<dcgm-exporter-service-host><host></dcgm-exporter-service-host>
<dcgm-exporter-service-port><port></dcgm-exporter-service-port> <!-- Default: 9400 -->
-
Environment Variable:
export APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_HOST=<host>
export APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_PORT=<port> # Default: 9400
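Before pointing the agent at a custom endpoint, it can help to confirm that the exporter answers there. The following is a minimal sketch, not part of the product: the helper names `dcgm_metrics_url` and `check_dcgm_exporter` are hypothetical, and the `localhost` fallback is an assumption for illustration (only the default port, 9400, comes from this page). It relies on the DCGM Exporter serving Prometheus-format metrics at `/metrics`, with metric names prefixed `DCGM_`.

```shell
# Build the metrics URL from the same environment variables shown above.
# Assumption: "localhost" is used when no host is set.
dcgm_metrics_url() {
  host="${APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_HOST:-localhost}"
  port="${APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_PORT:-9400}"
  printf 'http://%s:%s/metrics\n' "$host" "$port"
}

# Fetch the metrics page and print how many DCGM metric lines it contains.
# Returns non-zero when the endpoint is unreachable or exposes no DCGM metrics.
check_dcgm_exporter() {
  curl -fsS --max-time 5 "$(dcgm_metrics_url)" | grep -c '^DCGM_'
}
```

Calling `check_dcgm_exporter` on the agent host prints a count of metric lines; a count of 0 or a curl error suggests that the host or port is wrong, or that the exporter is not running yet.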
-
-
Ensure that either Docker 19.03 or later (configured with the NVIDIA Container Runtime) or containerd is installed and configured.
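One way to check the 19.03 minimum is to compare version strings with `sort -V`. This is a sketch under assumptions: `docker_version_ok` is a hypothetical helper name, and `sort -V` (version sort) is assumed to be available, as it is in GNU coreutils.

```shell
# Return 0 when the given Docker version is at least 19.03.
docker_version_ok() {
  min="19.03"
  # Version-sort the two strings; the minimum must sort first (or be equal).
  lowest="$(printf '%s\n%s\n' "$min" "$1" | sort -V | head -n 1)"
  [ "$lowest" = "$min" ]
}

# Example usage on a host where Docker is installed:
#   docker_version_ok "$(docker version --format '{{.Server.Version}}')" \
#     && echo "Docker version is sufficient"
```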
-
Install and configure the NVIDIA Container Toolkit using the following commands:
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
  | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
  | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
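After Docker restarts, it may be worth confirming that the NVIDIA runtime was registered. The sketch below assumes that `docker info` lists registered runtimes on a line beginning with `Runtimes:`; `has_nvidia_runtime` is a hypothetical helper that reads that output from standard input.

```shell
# Succeed when the `docker info` text on stdin lists an "nvidia" runtime.
has_nvidia_runtime() {
  grep -Eq '^[[:space:]]*Runtimes:.*nvidia'
}

# Example usage on the GPU host:
#   docker info 2>/dev/null | has_nvidia_runtime \
#     && echo "NVIDIA runtime is registered with Docker"
```

If the check fails, rerunning the `nvidia-ctk runtime configure` step and restarting Docker is a reasonable first thing to try.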
Perform the following steps to deploy the DCGM Exporter as a standalone container: