Configure the DCGM Exporter (Standalone Node)

Ensure the following prerequisites are met before configuring the DCGM Exporter:
  • Enable GPU monitoring at the Controller account level using the following flag:
    sim.cluster.gpu.enabled=true
    
  • Enable GPU monitoring on the Machine Agent using one of the following methods:

    • System Property:
      -Dappdynamics.machine.agent.gpu.enabled=true
    • Controller Configuration File (controller-info.xml):
      <gpu-enabled>true</gpu-enabled>
    • Environment Variable:
      export APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED=true
  • By default, the Machine Agent collects GPU metrics from the DCGM Exporter. To override the exporter endpoint, specify its host and port using one of the following methods:
    • System Property:
      -Dappdynamics.machine.agent.dcgm.exporter.service.host=<host>
      -Dappdynamics.machine.agent.dcgm.exporter.service.port=<port>  # Default: 9400
    • Controller Configuration File (controller-info.xml):
      <dcgm-exporter-service-host><host></dcgm-exporter-service-host>
      <dcgm-exporter-service-port><port></dcgm-exporter-service-port>  <!-- Default: 9400 -->
    • Environment Variable:
      export APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_HOST=<host>
      export APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_PORT=<port>  # Default: 9400
  • Ensure that either Docker 19.03 or later (configured with the NVIDIA Container Runtime) or containerd is installed and configured.

  • Install and configure the NVIDIA Container Toolkit. On Debian-based distributions such as Ubuntu, use the following commands:
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
    curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
      | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
      | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    sudo apt-get update
    sudo apt-get install -y nvidia-container-toolkit
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
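The three methods for enabling GPU monitoring and overriding the exporter endpoint all carry the same settings. As a minimal sketch of the system-property method, the following Bash function prints a Machine Agent launch command with those properties set. It assumes the agent is started with java -jar machineagent.jar. The names build_agent_cmd, MACHINE_AGENT_HOME, DCGM_HOST, and DCGM_PORT are hypothetical and used only for illustration. The /opt/appdynamics/machine-agent path and localhost host are placeholder defaults, not documented values.

```shell
#!/usr/bin/env bash
# Sketch: print a Machine Agent launch command with GPU monitoring enabled
# and the DCGM Exporter endpoint set through system properties.
# MACHINE_AGENT_HOME, DCGM_HOST and DCGM_PORT are hypothetical variables.

build_agent_cmd() {
  local agent_home="${MACHINE_AGENT_HOME:-/opt/appdynamics/machine-agent}"  # placeholder path
  local host="${DCGM_HOST:-localhost}"  # placeholder: exporter on the same node
  local port="${DCGM_PORT:-9400}"       # 9400 is the documented default port

  # Print the command instead of running it so it can be reviewed first.
  printf '%s ' java \
    -Dappdynamics.machine.agent.gpu.enabled=true \
    "-Dappdynamics.machine.agent.dcgm.exporter.service.host=${host}" \
    "-Dappdynamics.machine.agent.dcgm.exporter.service.port=${port}" \
    -jar "${agent_home}/machineagent.jar"
  printf '\n'
}
```

For example, DCGM_HOST=gpu-node-01 build_agent_cmd prints the command for an exporter on a hypothetical host named gpu-node-01. Review the output, then run it yourself.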
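You can optionally confirm the container prerequisites from the shell before deploying the exporter. The following Bash sketch checks three things: the Docker server is version 19.03 or later, a runtime named nvidia is registered with Docker (nvidia-ctk runtime configure --runtime=docker adds it), and nvidia-ctk is on the PATH. The function names version_ge and check_prereqs are hypothetical. The version comparison relies on GNU sort -V, and containerd-only nodes are not checked.

```shell
#!/usr/bin/env bash
# Sketch: check the container prerequisites for the DCGM Exporter on this node.

# Succeeds if version $1 >= version $2 (uses GNU sort -V for version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

check_prereqs() {
  local status=0 server

  if command -v docker >/dev/null 2>&1; then
    server="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
    if [ -n "$server" ] && version_ge "$server" 19.03; then
      echo "OK: Docker server ${server}"
    else
      echo "FAIL: Docker server version '${server:-unknown}'; 19.03 or later is required"
      status=1
    fi
    # 'nvidia-ctk runtime configure --runtime=docker' registers a runtime named "nvidia".
    if docker info --format '{{json .Runtimes}}' 2>/dev/null | grep -q '"nvidia"'; then
      echo "OK: nvidia runtime registered with Docker"
    else
      echo "FAIL: nvidia runtime not listed by 'docker info'"
      status=1
    fi
  else
    echo "WARN: docker not found; containerd-only nodes are not checked here"
  fi

  if command -v nvidia-ctk >/dev/null 2>&1; then
    echo "OK: nvidia-ctk found at $(command -v nvidia-ctk)"
  else
    echo "FAIL: nvidia-ctk not found; install the NVIDIA Container Toolkit"
    status=1
  fi

  return "$status"
}
```

Run check_prereqs on the node after installing the toolkit. A non-zero exit status means at least one check failed.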

Perform the following steps to deploy the DCGM Exporter as a standalone container:

  1. Pull an exporter image. Use image version 3.3.8-3.6.x or later; image tags use the format <DCGM version>-<exporter version>-<OS>. For example:
    docker pull nvcr.io/nvidia/k8s/dcgm-exporter:4.2.3-4.1.1-ubuntu22.04
  2. Run the exporter as a container with access to all GPUs, and publish port 9400. For example:
    docker run -d --gpus all --cap-add SYS_ADMIN --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:4.2.3-4.1.1-ubuntu22.04
  3. Verify that the exporter is running and reachable by querying its metrics endpoint:
    curl http://localhost:9400/metrics
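The image version requirement in step 1 can be checked before pulling. This Bash sketch assumes the tag format <DCGM version>-<exporter version>-<OS>. It treats the requirement as DCGM 3.3.8 or later combined with exporter 3.6.0 or later. The helpers tag_meets_minimum and version_ge are hypothetical. Tags such as latest are rejected because they carry no version number.

```shell
#!/usr/bin/env bash
# Sketch: check whether a dcgm-exporter image tag meets the 3.3.8-3.6.x minimum.
# Assumes the tag format <DCGM version>-<exporter version>-<OS>,
# for example 4.2.3-4.1.1-ubuntu22.04.

# Succeeds if version $1 >= version $2 (uses GNU sort -V for version ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

tag_meets_minimum() {
  local tag="${1##*:}"          # drop the registry/repository part up to the last ':'
  local dcgm="${tag%%-*}"       # first field: DCGM version
  local rest="${tag#*-}"
  local exporter="${rest%%-*}"  # second field: exporter version
  local re='^[0-9]+(\.[0-9]+)*$'

  # Reject tags without numeric version fields (for example "latest").
  [[ $dcgm =~ $re && $exporter =~ $re ]] || return 1
  version_ge "$dcgm" 3.3.8 && version_ge "$exporter" 3.6.0
}
```

For example, tag_meets_minimum nvcr.io/nvidia/k8s/dcgm-exporter:4.2.3-4.1.1-ubuntu22.04 && echo supported prints supported.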
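Because the container starts in the background (-d), a curl issued immediately after step 2 may fail while the exporter is still initializing. The following minimal Bash sketch polls the endpoint until it returns at least one sample line whose metric name starts with DCGM_FI_. METRICS_URL, the 60-second default timeout, the 2-second retry interval, and the function names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: wait for the DCGM Exporter metrics endpoint and confirm it reports
# GPU metrics. METRICS_URL and the timeout/retry values are assumptions.

# Count Prometheus sample lines whose metric name starts with DCGM_FI_.
# '# HELP' and '# TYPE' comment lines do not match and are not counted.
count_dcgm_samples() {
  grep -c '^DCGM_FI_' || true  # grep -c prints 0 and exits 1 on no match
}

# Usage: wait_for_exporter [timeout_seconds]
wait_for_exporter() {
  local url="${METRICS_URL:-http://localhost:9400/metrics}"
  local timeout="${1:-60}"
  local deadline=$(( SECONDS + timeout ))
  local body n

  while (( SECONDS < deadline )); do
    if body="$(curl -fsS "$url" 2>/dev/null)"; then
      n="$(printf '%s\n' "$body" | count_dcgm_samples)"
      if (( n > 0 )); then
        echo "OK: ${n} DCGM samples at ${url}"
        return 0
      fi
    fi
    sleep 2
  done
  echo "FAIL: no DCGM metrics at ${url} after ${timeout}s" >&2
  return 1
}
```

After the docker run command in step 2, wait_for_exporter 30 either reports how many DCGM samples it found or fails after about 30 seconds. On a working node, the output typically includes metrics such as DCGM_FI_DEV_GPU_UTIL.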