Configure GPU Monitoring

Use the following properties to enable and customize GPU monitoring. Where available, a property can be set as a controller-info.xml tag, a system property, or an environment variable.

Enable GPU Monitoring

Controller-info.xml tag: <gpu-enabled>

System Property: -Dappdynamics.machine.agent.gpu.enabled

Environment Variable: APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED

Type: Boolean

Default: false

Required: No
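
As a minimal sketch, the environment-variable and system-property forms of this setting might be applied as shown here. The MACHINE_AGENT_HOME path is a hypothetical placeholder, and launching the agent with java -jar machineagent.jar is an assumption about your deployment, so the command is printed rather than executed.

```shell
# Option 1: environment variable, set before the Machine Agent starts.
export APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED=true

# Option 2: system property on the agent's Java command line.
# MACHINE_AGENT_HOME is a hypothetical install path; adjust it for your host.
MACHINE_AGENT_HOME="${MACHINE_AGENT_HOME:-/opt/appdynamics/machine-agent}"
GPU_OPTS="-Dappdynamics.machine.agent.gpu.enabled=true"

# Print the launch command instead of running it, so the sketch is safe to try.
echo "java $GPU_OPTS -jar $MACHINE_AGENT_HOME/machineagent.jar"
```

In controller-info.xml, the equivalent would presumably be the <gpu-enabled> tag with a value of true.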

Configure DCGM-Exporter

DCGM-Exporter is a tool, built on the Go APIs to NVIDIA DCGM, that gathers GPU metrics so that you can understand workload behavior or monitor GPUs in clusters. Configure this property to specify the DCGM-Exporter host or domain name. The DCGM-Exporter Service Name and DCGM-Exporter Namespace properties refer to the Kubernetes service name and namespace. Machine Agent supports DCGM-Exporter version 3.3.8-3.6.0 and later.

Controller-info.xml tag: <dcgm-exporter-service-host>

System Property: -Dappdynamics.machine.agent.dcgm.exporter.service.host

Environment Variable: APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_HOST

Type: String

Default: ""

Required: No
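
For illustration, the host property might be set through its environment variable as in the sketch below. The host name gpu-node-01.example.com is hypothetical, and the sketch assumes GPU monitoring is enabled in the same environment.

```shell
# Enable GPU monitoring and point the agent at a DCGM-Exporter host.
# gpu-node-01.example.com is a hypothetical host name used for illustration.
export APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED=true
export APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_HOST="gpu-node-01.example.com"

# Show the value the agent process would inherit.
echo "DCGM-Exporter host: $APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_HOST"
```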

Specify DCGM-Exporter Namespace

Configure this property to specify the DCGM-Exporter namespace, that is, the Kubernetes namespace of the DCGM-Exporter service.

Controller-info.xml tag: <dcgm-exporter-service-namespace>

System Property: -Dappdynamics.machine.agent.dcgm.exporter.service.namespace

Environment Variable: APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAMESPACE

Type: String

Default: gpu-operator

Required: No

Specify DCGM-Exporter Service Name

Configure this property to specify the DCGM-Exporter service name, that is, the name of the Kubernetes service that exposes DCGM-Exporter.

Controller-info.xml tag: <dcgm-exporter-service-name>

System Property: -Dappdynamics.machine.agent.dcgm.exporter.service.name

Environment Variable: APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAME

Type: String

Default: nvidia-dcgm-exporter

Required: No

Specify DCGM-Exporter Service Port

Configure this property to specify the DCGM-Exporter service port.

Controller-info.xml tag: <dcgm-exporter-service-port>

System Property: -Dappdynamics.machine.agent.dcgm.exporter.service.port

Environment Variable: APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_PORT

Type: Integer

Default: 9400

Required: No
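
The namespace, service name, and port settings can be sketched together as environment variables, here using the documented defaults. The URL built at the end follows the standard Kubernetes in-cluster DNS form (service.namespace.svc.cluster.local) and the /metrics path that DCGM-Exporter serves; how the Machine Agent itself combines these values is not stated here, so treat the URL only as a way to check that the exporter is reachable.

```shell
# Documented defaults, set explicitly for clarity.
export APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED=true
export APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAMESPACE=gpu-operator
export APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAME=nvidia-dcgm-exporter
export APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_PORT=9400

# Assumption: the service is reachable at its standard in-cluster DNS name.
DCGM_URL="http://${APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAME}.${APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAMESPACE}.svc.cluster.local:${APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_PORT}/metrics"

# Print the URL; from a pod inside the cluster you could then run:
#   curl -s "$DCGM_URL" | head
echo "$DCGM_URL"
```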

Enable NVIDIA System Management Interface

Configure this property to enable metric collection using the NVIDIA System Management Interface (nvidia-smi). nvidia-smi is a command-line utility, built on top of the NVIDIA Management Library (NVML), that helps manage and monitor NVIDIA GPU devices. By default, metrics are collected through DCGM-Exporter.

Controller-info.xml tag: N/A

System Property: -Dappdynamics.machine.agent.gpu.collection.nvml.enabled

Environment Variable: N/A

Type: Boolean

Default: false

Required: No
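
Because this setting has no environment variable or controller-info.xml tag, it can only be passed as a system property. The sketch below assumes that GPU monitoring must also be enabled, that the agent is started with java -jar machineagent.jar, and that MACHINE_AGENT_HOME (a hypothetical path) points to the install directory. It prints the command rather than running it.

```shell
# MACHINE_AGENT_HOME is a hypothetical install path; adjust it for your host.
MACHINE_AGENT_HOME="${MACHINE_AGENT_HOME:-/opt/appdynamics/machine-agent}"

# Assumption: GPU monitoring is enabled alongside nvidia-smi collection.
NVML_OPTS="-Dappdynamics.machine.agent.gpu.enabled=true -Dappdynamics.machine.agent.gpu.collection.nvml.enabled=true"

# Optional pre-check: confirm nvidia-smi is on the PATH of the agent host.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "nvidia-smi found"
else
  echo "nvidia-smi not found on PATH"
fi

# Print the launch command instead of running it.
echo "java $NVML_OPTS -jar $MACHINE_AGENT_HOME/machineagent.jar"
```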

Specify GPU Metrics Collection Sampling Interval

Configure this property to specify the time interval (in milliseconds) for scheduling metric collection.

Controller-info.xml tag: N/A

System Property: -Dappdynamics.machine.agent.gpu.collection.sampling.interval

Environment Variable: N/A

Type: Long

Default: 30000

Required: No
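
Since the interval is given in milliseconds, a 60-second interval corresponds to a value of 60000. A minimal sketch of passing it as a system property follows; the 60-second figure, the MACHINE_AGENT_HOME path, and the java -jar launch form are illustrative assumptions, and the command is printed rather than executed.

```shell
# Convert a chosen interval in seconds to the milliseconds the property expects.
INTERVAL_SECONDS=60
INTERVAL_MS=$((INTERVAL_SECONDS * 1000))

# MACHINE_AGENT_HOME is a hypothetical install path; adjust it for your host.
MACHINE_AGENT_HOME="${MACHINE_AGENT_HOME:-/opt/appdynamics/machine-agent}"
SAMPLING_OPTS="-Dappdynamics.machine.agent.gpu.enabled=true -Dappdynamics.machine.agent.gpu.collection.sampling.interval=${INTERVAL_MS}"

# Print the launch command instead of running it.
echo "java $SAMPLING_OPTS -jar $MACHINE_AGENT_HOME/machineagent.jar"
```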