Configure GPU Monitoring
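Configure the following properties to enable and customize GPU monitoring. Where available, each property can be set with a controller-info.xml tag, a system property, or an environment variable: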
Enable GPU Monitoring
Configure this property to enable GPU monitoring.
Controller-info.xml tag: <gpu-enabled>
System Property: -Dappdynamics.machine.agent.gpu.enabled
Environment Variable: APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED
Type: Boolean
Default: false
Required: No
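The Machine Agent reads -D system properties as a Java application, so the following Java sketch shows one way to resolve this Boolean setting. The class GpuSettingResolver and its method are hypothetical, not Machine Agent APIs. The precedence it applies (system property first, then environment variable, then the documented default of false) is an assumption, and the controller-info.xml tag is not modeled.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of the Machine Agent API.
 * ASSUMPTION: a -D system property overrides the environment variable,
 * which overrides the documented default (false). controller-info.xml is not modeled.
 */
public final class GpuSettingResolver {

    static final String SYS_PROP = "appdynamics.machine.agent.gpu.enabled";
    static final String ENV_VAR = "APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED";

    private GpuSettingResolver() {}

    /** Resolves the effective value from snapshots of system properties and the environment. */
    static boolean resolveGpuEnabled(Map<String, String> sysProps, Map<String, String> env) {
        String raw = sysProps.containsKey(SYS_PROP) ? sysProps.get(SYS_PROP) : env.get(ENV_VAR);
        // parseBoolean is true only for "true" (any case); null or other text yields false,
        // which matches the documented default.
        return Boolean.parseBoolean(raw);
    }

    public static void main(String[] args) {
        // Copy the running JVM's system properties into a String map.
        Map<String, String> sysProps = new HashMap<>();
        for (String key : System.getProperties().stringPropertyNames()) {
            sysProps.put(key, System.getProperty(key));
        }
        System.out.println("GPU monitoring enabled: " + resolveGpuEnabled(sysProps, System.getenv()));
    }
}
```

Under the assumed precedence, starting the JVM with -Dappdynamics.machine.agent.gpu.enabled=true makes the sketch print true even when the environment variable is unset or set to false.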
Configure DCGM-Exporter
DCGM-Exporter is a tool based on the Go APIs for NVIDIA Data Center GPU Manager (DCGM). It gathers GPU metrics so that you can understand workload behavior and monitor GPUs in clusters. Machine Agent supports DCGM-Exporter version 3.3.8-3.6.0 and later.
Configure this property to specify the host or domain name of the DCGM-Exporter service. The DCGM-Exporter service name and namespace, configured separately, are the Kubernetes service name and namespace.
Controller-info.xml tag: <dcgm-exporter-service-host>
System Property: -Dappdynamics.machine.agent.dcgm.exporter.service.host
Environment Variable: APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_HOST
Type: String
Default: ""
Required: No
Specify DCGM-Exporter Namespace
Configure this property to specify the Kubernetes namespace of the DCGM-Exporter service.
Controller-info.xml tag: <dcgm-exporter-service-namespace>
System Property: -Dappdynamics.machine.agent.dcgm.exporter.service.namespace
Environment Variable: APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAMESPACE
Type: String
Default: gpu-operator
Required: No
Specify DCGM-Exporter Service Name
Configure this property to specify the Kubernetes service name of DCGM-Exporter.
Controller-info.xml tag: <dcgm-exporter-service-name>
System Property: -Dappdynamics.machine.agent.dcgm.exporter.service.name
Environment Variable: APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAME
Type: String
Default: nvidia-dcgm-exporter
Required: No
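Because the namespace and service name identify a Kubernetes Service, a client inside the cluster can reach DCGM-Exporter at the Service's DNS name, which Kubernetes forms as <service>.<namespace>.svc.<cluster-domain>. The Java sketch below builds that name from the two properties and their documented defaults. The class name is hypothetical, the cluster domain cluster.local is an assumption (it is the common default but can differ per cluster), and whether the Machine Agent contacts the exporter through this exact name is not stated here.

```java
/**
 * Hypothetical helper: builds the in-cluster DNS name of the DCGM-Exporter Service.
 * ASSUMPTION: the cluster uses the default cluster domain "cluster.local".
 */
public final class DcgmServiceDns {

    static final String DEFAULT_SERVICE_NAME = "nvidia-dcgm-exporter"; // documented default
    static final String DEFAULT_NAMESPACE = "gpu-operator";            // documented default
    static final String CLUSTER_DOMAIN = "cluster.local";              // assumption

    private DcgmServiceDns() {}

    /** Returns <service>.<namespace>.svc.<cluster-domain>, substituting defaults for blank inputs. */
    static String serviceDnsName(String serviceName, String namespace) {
        String name = isBlank(serviceName) ? DEFAULT_SERVICE_NAME : serviceName.trim();
        String ns = isBlank(namespace) ? DEFAULT_NAMESPACE : namespace.trim();
        return name + "." + ns + ".svc." + CLUSTER_DOMAIN;
    }

    private static boolean isBlank(String s) {
        return s == null || s.isBlank();
    }

    public static void main(String[] args) {
        // Reads the documented environment variables; unset variables fall back to the defaults.
        System.out.println(serviceDnsName(
                System.getenv("APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAME"),
                System.getenv("APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAMESPACE")));
    }
}
```

With neither environment variable set, the sketch prints nvidia-dcgm-exporter.gpu-operator.svc.cluster.local.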
Specify DCGM-Exporter Service Port
Configure this property to specify the DCGM-Exporter service port.
Controller-info.xml tag: <dcgm-exporter-service-port>
System Property: -Dappdynamics.machine.agent.dcgm.exporter.service.port
Environment Variable: APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_PORT
Type: Integer
Default: 9400
Required: No
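The host and port settings can be combined into a scrape URL, as in the Java sketch below. It assumes plain HTTP, the /metrics path that DCGM-Exporter serves by default, and that a non-empty service host takes precedence over an in-cluster fallback name supplied by the caller. The class and method names are hypothetical, not Machine Agent APIs.

```java
import java.net.URI;

/**
 * Hypothetical sketch (not the Machine Agent's code): combines the DCGM-Exporter
 * host and port settings into a scrape URL.
 * ASSUMPTIONS: plain HTTP, the exporter's default "/metrics" path, and an explicit
 * service host taking precedence over a caller-supplied fallback host.
 */
public final class DcgmEndpoint {

    static final int DEFAULT_PORT = 9400; // documented default

    private DcgmEndpoint() {}

    /** Parses the Integer-typed port setting, falling back to the documented default. */
    static int parsePort(String raw) {
        if (raw == null || raw.isBlank()) {
            return DEFAULT_PORT;
        }
        int port = Integer.parseInt(raw.trim()); // NumberFormatException for non-numeric input
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("DCGM-Exporter port out of range: " + port);
        }
        return port;
    }

    /** Uses the configured host if non-empty (its documented default is ""), else the fallback. */
    static URI metricsUri(String configuredHost, String fallbackHost, String rawPort) {
        String host = (configuredHost == null || configuredHost.isBlank())
                ? fallbackHost
                : configuredHost.trim();
        return URI.create("http://" + host + ":" + parsePort(rawPort) + "/metrics");
    }

    public static void main(String[] args) {
        URI uri = metricsUri(
                System.getenv("APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_HOST"),
                "nvidia-dcgm-exporter.gpu-operator.svc.cluster.local", // assumed fallback
                System.getenv("APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_PORT"));
        System.out.println(uri);
    }
}
```

Validating the port range up front surfaces a misconfigured value when the setting is read rather than at the first failed scrape.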
Enable NVIDIA System Management Interface
Controller-info.xml tag: N/A
System Property: -Dappdynamics.machine.agent.gpu.collection.nvml.enabled
Environment Variable: N/A
Type: Boolean
Default: false
Required: No
Specify GPU Metrics Collection Sampling Interval
Configure this property to specify the time interval (in milliseconds) for scheduling metric collection.
Controller-info.xml tag: N/A
System Property: -Dappdynamics.machine.agent.gpu.collection.sampling.interval
Environment Variable: N/A
Type: Long
Default: 30000
Required: No
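A collector that honors this interval can be sketched with Java's ScheduledExecutorService. This is an illustration under assumptions, not the Machine Agent's implementation: the class name and the placeholder task are hypothetical, and fixed-delay scheduling (the next run starts one interval after the previous run finishes, so slow collections never overlap) is an assumed design choice. scheduleAtFixedRate would instead aim for a constant start-to-start period.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of periodic GPU metric collection driven by
 * -Dappdynamics.machine.agent.gpu.collection.sampling.interval (milliseconds).
 */
public final class GpuSamplingScheduler {

    static final String INTERVAL_PROP = "appdynamics.machine.agent.gpu.collection.sampling.interval";
    static final long DEFAULT_INTERVAL_MS = 30_000L; // documented default

    private GpuSamplingScheduler() {}

    /** Reads the Long-typed property; Long.getLong returns the default if it is unset or not a number. */
    static long samplingIntervalMs() {
        return Long.getLong(INTERVAL_PROP, DEFAULT_INTERVAL_MS);
    }

    /** Runs collect immediately, then again intervalMs after each run completes (ASSUMPTION: fixed delay). */
    static ScheduledFuture<?> start(ScheduledExecutorService scheduler, Runnable collect, long intervalMs) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("sampling interval must be positive: " + intervalMs);
        }
        return scheduler.scheduleWithFixedDelay(collect, 0L, intervalMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long intervalMs = samplingIntervalMs();
        // Placeholder task (hypothetical); a real collector would fetch GPU metrics here.
        start(scheduler, () -> System.out.println("collecting GPU metrics"), intervalMs);
        // The executor's non-daemon thread keeps the JVM running until the process is stopped.
        System.out.println("sampling every " + intervalMs + " ms");
    }
}
```

For example, running the sketch with -Dappdynamics.machine.agent.gpu.collection.sampling.interval=60000 schedules one collection per minute.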