NVIDIA GPU Metrics

NVIDIA GPU metrics are collected from the DCGM exporter and mapped into AppDynamics custom metrics for the AI POD GPU dashboards.

Prerequisites

Ensure that:

  • NVIDIA DCGM exporter is deployed
  • NVIDIA GPU Operator is used by the environment
  • GPU nodes are reachable through the Kubernetes service path
  • Infrastructure Visibility, if it is scheduled only on GPU nodes, uses nodeSelector nvidia.com/gpu.present: "true" and the matching GPU toleration

Enable Prometheus Scraping for NVIDIA GPU

The following are example values from this repo:

  • service: nvidia-dcgm-exporter
  • namespace: nvidia-gpu-operator
  • port: 9400
  • path: /metrics

Replace these values with the DCGM service name and namespace used in the target environment.
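
As a rough illustration of how the four values above combine into a scrape target, the following sketch builds an in-cluster URL. The helper name dcgm_scrape_url and the default cluster domain cluster.local are assumptions made for this example; they are not taken from the repo, and a cluster with a custom DNS domain needs a different suffix.

```python
def dcgm_scrape_url(service: str, namespace: str, port: int = 9400,
                    path: str = "/metrics",
                    cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster URL for a Kubernetes Service endpoint.

    Hypothetical helper: the URL shape follows the usual
    <service>.<namespace>.svc.<cluster-domain> DNS convention.
    """
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}{path}"


# Example values from this document; replace them for the target environment.
print(dcgm_scrape_url("nvidia-dcgm-exporter", "nvidia-gpu-operator"))
```

Fetching this URL from a pod inside the cluster is a quick way to confirm that the exporter answers before enabling the scrape.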

Configure Machine Agent Ingestion

Infrastructure Visibility Prometheus monitoring loads the DCGM exporter definition through prometheus-config-template.yaml.

If GPU metrics are required, set these Infrastructure Visibility pod environment variables:

  • APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED=true
  • APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAME=nvidia-dcgm-exporter
  • APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAMESPACE=nvidia-gpu-operator
  • APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_PORT=9400
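
The variables above can be sanity-checked before rollout. The sketch below is only an illustration: the function check_gpu_env and its validation rules (non-empty values, "true" for the enable flag, a numeric port) are assumptions for this example and are not part of the Machine Agent.

```python
REQUIRED_GPU_ENV = (
    "APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED",
    "APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAME",
    "APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAMESPACE",
    "APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_PORT",
)


def check_gpu_env(env):
    """Return a list of problems found in env; an empty list means none were found."""
    problems = [f"{name} is not set" for name in REQUIRED_GPU_ENV if not env.get(name)]

    enabled = env.get("APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED", "")
    if enabled and enabled.lower() != "true":
        problems.append("APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED is not 'true'")

    port = env.get("APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_PORT", "")
    if port and not (port.isdigit() and 0 < int(port) < 65536):
        problems.append("APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_PORT is not a valid port")

    return problems


# Example using the values from this document.
example_env = {
    "APPDYNAMICS_MACHINE_AGENT_GPU_ENABLED": "true",
    "APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAME": "nvidia-dcgm-exporter",
    "APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_SERVICE_NAMESPACE": "nvidia-gpu-operator",
    "APPDYNAMICS_MACHINE_AGENT_DCGM_EXPORTER_PORT": "9400",
}
print(check_gpu_env(example_env))  # prints []
```

Passing os.environ instead of example_env runs the same check inside the Infrastructure Visibility pod.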

If the DCGM service is backed by node-local endpoints, verify that the Service sets internalTrafficPolicy: Local where the environment requires it, so that each agent scrapes the exporter on its own node.

Before enabling the scrape, update the exporter YAML service discovery fields to the service name and namespace used by your GPU metrics deployment.

Exporter YAML Contract

  • file: exporter-yamls/dcgm-exporter.yaml
  • key source metrics:
    • DCGM_FI_DEV_GPU_TEMP
    • DCGM_FI_DEV_GPU_UTIL
    • DCGM_FI_DEV_POWER_USAGE
    • DCGM_FI_DEV_FB_USED
    • DCGM_FI_DEV_FB_FREE
    • DCGM_FI_PROF_PIPE_TENSOR_ACTIVE
    • DCGM_FI_PROF_DRAM_ACTIVE
  • GPU memory used percent is not read from a DCGM field; it is produced as a computed metric
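
To make the computed metric concrete, here is a minimal sketch that parses DCGM exporter output in the Prometheus text format and derives a memory used percent per GPU. The formula used / (used + free) × 100 is an assumption for illustration; the actual expression is whatever exporter-yamls/dcgm-exporter.yaml defines. The sample values and the helper names are made up for this example.

```python
import re
from collections import defaultdict

# Illustrative sample only; real exporter output carries more labels and metrics.
SAMPLE = '''\
# HELP DCGM_FI_DEV_FB_USED Framebuffer memory used (in MiB).
# TYPE DCGM_FI_DEV_FB_USED gauge
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="worker1"} 20480
DCGM_FI_DEV_FB_FREE{gpu="0",Hostname="worker1"} 61440
DCGM_FI_DEV_FB_USED{gpu="1",Hostname="worker1"} 0
DCGM_FI_DEV_FB_FREE{gpu="1",Hostname="worker1"} 81920
'''

LINE_RE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each labelled sample line."""
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comments
        match = LINE_RE.match(line)
        if match:
            labels = dict(LABEL_RE.findall(match["labels"]))
            yield match["name"], labels, float(match["value"])


def memory_used_percent(text):
    """Return {gpu: percent}, assuming percent = used / (used + free) * 100."""
    per_gpu = defaultdict(dict)
    for name, labels, value in parse_samples(text):
        if name in ("DCGM_FI_DEV_FB_USED", "DCGM_FI_DEV_FB_FREE"):
            per_gpu[labels.get("gpu", "unknown")][name] = value

    result = {}
    for gpu, values in per_gpu.items():
        used = values.get("DCGM_FI_DEV_FB_USED")
        free = values.get("DCGM_FI_DEV_FB_FREE")
        if used is None or free is None or used + free == 0:
            continue  # incomplete or empty data for this GPU
        result[gpu] = 100.0 * used / (used + free)
    return result


print(memory_used_percent(SAMPLE))  # prints {'0': 25.0, '1': 0.0}
```

The simple regular expression is enough for a sketch but does not handle label values that contain a closing brace; a full Prometheus parser is the safer choice for anything beyond a spot check.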

Expected AppDynamics Custom Metric Paths

  • Custom Metrics|AI Pod|GPUs|{gpu}|GPU Temperature (C)
  • Custom Metrics|AI Pod|GPUs|{gpu}|GPU Utilization (%)
  • Custom Metrics|AI Pod|GPUs|{gpu}|GPU Power (W)
  • Custom Metrics|AI Pod|GPUs|{gpu}|Framebuffer Memory Used (MiB)
  • Custom Metrics|AI Pod|GPUs|{gpu}|Framebuffer Memory Free (MiB)
  • Custom Metrics|AI Pod|GPUs|{gpu}|Tensor Core Utilization (%)
  • Custom Metrics|AI Pod|GPUs|{gpu}|DRAM Utilization (%)
  • Custom Metrics|AI Pod|GPUs|gpu{gpu}|GPU Memory Used (%)
  • Custom Metrics|AI Pod|GPUs|Average GPU Utilization (%)
  • Custom Metrics|AI Pod|GPUs|Average GPU Memory Used (%)
  • Custom Metrics|AI Pod|GPUs|Total GPU Power Usage (W)
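
The paths above follow a simple pattern: a fixed prefix, an optional per-GPU segment, and a metric name. The sketch below shows one way to build them and to derive the three fleet-level values. It assumes that "Average" means an arithmetic mean across GPUs and "Total" means a sum; the function names and the sample readings are hypothetical and do not come from the exporter definition.

```python
from statistics import mean

PREFIX = "Custom Metrics|AI Pod|GPUs"


def per_gpu_path(gpu, metric_name):
    """Build a per-GPU path such as 'Custom Metrics|AI Pod|GPUs|0|GPU Power (W)'."""
    return "|".join((PREFIX, str(gpu), metric_name))


def fleet_metrics(util_by_gpu, mem_pct_by_gpu, power_by_gpu):
    """Derive the fleet-level values from non-empty per-GPU readings (assumed mean and sum)."""
    return {
        f"{PREFIX}|Average GPU Utilization (%)": mean(util_by_gpu.values()),
        f"{PREFIX}|Average GPU Memory Used (%)": mean(mem_pct_by_gpu.values()),
        f"{PREFIX}|Total GPU Power Usage (W)": sum(power_by_gpu.values()),
    }


# Made-up readings for two GPUs.
util = {"0": 80.0, "1": 40.0}
mem = {"0": 25.0, "1": 75.0}
power = {"0": 300.0, "1": 150.0}

print(per_gpu_path("0", "GPU Utilization (%)"))
for path, value in fleet_metrics(util, mem, power).items():
    print(path, value)
```

Comparing generated paths like these against the Metric Browser is a quick way to spot a prefix mismatch between the agent output and a dashboard's inputMetricPath values.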

Dashboard Dependencies

appd_dcgm_gpu.json
{
    "schemaVersion": null,
    "dashboardFormatVersion": "4.0",
    "name": "dashboard-ai-pod-gpus",
    "description": null,
    "properties": null,
    "templateEntityType": "APPLICATION_COMPONENT_NODE",
    "associatedEntityTemplates": null,
    "timeRangeSpecifierType": "GLOBAL",
    "minutesBeforeAnchorTime": -1,
    "startDate": null,
    "endDate": null,
    "refreshInterval": 120000,
    "backgroundColor": 15856629,
    "color": 15856629,
    "height": 768,
    "width": 1024,
    "canvasType": "CANVAS_TYPE_GRID",
    "layoutType": "",
    "widgetTemplates": [
        {
            "widgetType": "GraphWidget",
            "title": "GPU Utilization",
            "height": 3,
            "width": 6,
            "minHeight": 0,
            "minWidth": 0,
            "x": 0,
            "y": 0,
            "label": null,
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": false,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": false,
            "formatNumber": null,
            "numDecimals": 0,
            "removeZeros": null,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 1",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "VALUE",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Utilization (%)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Utilization (%)"
                        },
                        "rollupMetricData": false,
                        "expressionString": null,
                        "useActiveBaseline": false,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": "LEFT"
                }
            ],
            "verticalAxisLabel": null,
            "hideHorizontalAxis": null,
            "horizontalAxisLabel": null,
            "axisType": "LINEAR",
            "stackMode": null,
            "multipleYAxis": null,
            "customVerticalAxisMin": null,
            "customVerticalAxisMax": null,
            "showEvents": null,
            "interpolateDataGaps": false,
            "showAllTooltips": null,
            "staticThresholdList": [],
            "eventFilterTemplate": null
        },
        {
            "widgetType": "GraphWidget",
            "title": "GPU Power (W)",
            "height": 3,
            "width": 6,
            "minHeight": 0,
            "minWidth": 0,
            "x": 6,
            "y": 0,
            "label": null,
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": false,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": false,
            "formatNumber": null,
            "numDecimals": 0,
            "removeZeros": null,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 1",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "VALUE",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Power (W)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Power (W)"
                        },
                        "rollupMetricData": false,
                        "expressionString": null,
                        "useActiveBaseline": false,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": "LEFT"
                }
            ],
            "verticalAxisLabel": null,
            "hideHorizontalAxis": null,
            "horizontalAxisLabel": null,
            "axisType": "LINEAR",
            "stackMode": null,
            "multipleYAxis": null,
            "customVerticalAxisMin": null,
            "customVerticalAxisMax": null,
            "showEvents": null,
            "interpolateDataGaps": false,
            "showAllTooltips": false,
            "staticThresholdList": [
                {
                    "value": 0,
                    "color": 0,
                    "name": null
                }
            ],
            "eventFilterTemplate": null
        },
        {
            "widgetType": "GraphWidget",
            "title": "GPU Memory Used",
            "height": 3,
            "width": 6,
            "minHeight": 0,
            "minWidth": 0,
            "x": 0,
            "y": 3,
            "label": null,
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": false,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": false,
            "formatNumber": null,
            "numDecimals": 0,
            "removeZeros": null,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 1",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "VALUE",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Memory Used (%)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Memory Used (%)"
                        },
                        "rollupMetricData": false,
                        "expressionString": null,
                        "useActiveBaseline": false,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": "LEFT"
                }
            ],
            "verticalAxisLabel": null,
            "hideHorizontalAxis": null,
            "horizontalAxisLabel": null,
            "axisType": "LINEAR",
            "stackMode": null,
            "multipleYAxis": null,
            "customVerticalAxisMin": null,
            "customVerticalAxisMax": null,
            "showEvents": null,
            "interpolateDataGaps": false,
            "showAllTooltips": null,
            "staticThresholdList": [],
            "eventFilterTemplate": null
        },
        {
            "widgetType": "GraphWidget",
            "title": "GPU Temperature",
            "height": 3,
            "width": 6,
            "minHeight": 0,
            "minWidth": 0,
            "x": 6,
            "y": 3,
            "label": null,
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": false,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": false,
            "formatNumber": null,
            "numDecimals": 0,
            "removeZeros": null,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 1",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "VALUE",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Temperature (C)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Temperature (C)"
                        },
                        "rollupMetricData": false,
                        "expressionString": null,
                        "useActiveBaseline": false,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": "LEFT"
                }
            ],
            "verticalAxisLabel": null,
            "hideHorizontalAxis": null,
            "horizontalAxisLabel": null,
            "axisType": "LINEAR",
            "stackMode": null,
            "multipleYAxis": null,
            "customVerticalAxisMin": null,
            "customVerticalAxisMax": null,
            "showEvents": null,
            "interpolateDataGaps": false,
            "showAllTooltips": null,
            "staticThresholdList": [],
            "eventFilterTemplate": null
        },
        {
            "widgetType": "PieWidget",
            "title": "GPU Power (W)",
            "height": 3,
            "width": 4,
            "minHeight": 0,
            "minWidth": 0,
            "x": 0,
            "y": 6,
            "label": null,
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": false,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": true,
            "formatNumber": null,
            "numDecimals": 0,
            "removeZeros": null,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 1",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "VALUE",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Power (W)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Power (W)"
                        },
                        "rollupMetricData": true,
                        "expressionString": null,
                        "useActiveBaseline": false,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": null
                }
            ],
            "showLabels": true,
            "showPercentValues": null
        },
        {
            "widgetType": "PieWidget",
            "title": "GPU Memory Used",
            "height": 3,
            "width": 6,
            "minHeight": 0,
            "minWidth": 0,
            "x": 4,
            "y": 6,
            "label": null,
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": false,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": true,
            "formatNumber": null,
            "numDecimals": 0,
            "removeZeros": null,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 1",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "VALUE",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Memory Used (%)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Memory Used (%)"
                        },
                        "rollupMetricData": true,
                        "expressionString": null,
                        "useActiveBaseline": false,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": null
                }
            ],
            "showLabels": true,
            "showPercentValues": null
        },
        {
            "widgetType": "PieWidget",
            "title": "Average GPU Utilization",
            "height": 3,
            "width": 2,
            "minHeight": 0,
            "minWidth": 0,
            "x": 10,
            "y": 6,
            "label": "",
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": true,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": true,
            "formatNumber": true,
            "numDecimals": 0,
            "removeZeros": true,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 0",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "CURRENT",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|Average GPU Utilization (%)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|Average GPU Utilization (%)"
                        },
                        "rollupMetricData": true,
                        "expressionString": null,
                        "useActiveBaseline": true,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": null
                }
            ],
            "showLabels": true,
            "showPercentValues": null
        },
        {
            "widgetType": "PieWidget",
            "title": "Average GPU Memory Used",
            "height": 2,
            "width": 12,
            "minHeight": 0,
            "minWidth": 0,
            "x": 0,
            "y": 9,
            "label": "",
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": true,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": true,
            "formatNumber": true,
            "numDecimals": 0,
            "removeZeros": true,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 0",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "CURRENT",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|Average GPU Memory Used (%)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|Average GPU Memory Used (%)"
                        },
                        "rollupMetricData": true,
                        "expressionString": null,
                        "useActiveBaseline": true,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": null
                }
            ],
            "showLabels": true,
            "showPercentValues": null
        },
        {
            "widgetType": "PieWidget",
            "title": "GPU Power (W)",
            "height": 3,
            "width": 6,
            "minHeight": 0,
            "minWidth": 0,
            "x": 6,
            "y": 11,
            "label": null,
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": false,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": true,
            "formatNumber": null,
            "numDecimals": 0,
            "removeZeros": null,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 1",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "VALUE",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Power (W)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Power (W)"
                        },
                        "rollupMetricData": true,
                        "expressionString": null,
                        "useActiveBaseline": false,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": null
                }
            ],
            "showLabels": true,
            "showPercentValues": null
        },
        {
            "widgetType": "PieWidget",
            "title": "GPU Memory Used",
            "height": 3,
            "width": 6,
            "minHeight": 0,
            "minWidth": 0,
            "x": 6,
            "y": 14,
            "label": null,
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": false,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": true,
            "formatNumber": null,
            "numDecimals": 0,
            "removeZeros": null,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 1",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "VALUE",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Memory Used (%)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|*|GPU Memory Used (%)"
                        },
                        "rollupMetricData": true,
                        "expressionString": null,
                        "useActiveBaseline": false,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": null
                }
            ],
            "showLabels": true,
            "showPercentValues": null
        },
        {
            "widgetType": "PieWidget",
            "title": "Average GPU Utilization",
            "height": 3,
            "width": 6,
            "minHeight": 0,
            "minWidth": 0,
            "x": 6,
            "y": 17,
            "label": "",
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": true,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": true,
            "formatNumber": true,
            "numDecimals": 0,
            "removeZeros": true,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 0",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "CURRENT",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|Average GPU Utilization (%)",
                            "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|Average GPU Utilization (%)"
                        },
                        "rollupMetricData": true,
                        "expressionString": null,
                        "useActiveBaseline": true,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": null
                }
            ],
            "showLabels": true,
            "showPercentValues": null
        },
        {
            "widgetType": "PieWidget",
            "title": "GPU Memory % Utilization",
            "height": 3,
            "width": 6,
            "minHeight": 0,
            "minWidth": 0,
            "x": 0,
            "y": 14,
            "label": null,
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": false,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": true,
            "formatNumber": null,
            "numDecimals": 0,
            "removeZeros": null,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 1",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker3.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker2.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                },
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "worker1.flashstack.local",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Boolean",
                            "operator": {
                                "type": "DIVIDE"
                            },
                            "expression1": {
                                "metricExpressionType": "Boolean",
                                "operator": {
                                    "type": "MULTIPLY"
                                },
                                "expression1": {
                                    "metricExpressionType": "Logical",
                                    "functionType": "VALUE",
                                    "displayName": "used",
                                    "inputMetricText": false,
                                    "inputMetricPath": null,
                                    "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|Total Framebuffer Memory Used (MiB)"
                                },
                                "expression2": {
                                    "metricExpressionType": "Literal",
                                    "literalValue": 100
                                }
                            },
                            "expression2": {
                                "metricExpressionType": "Boolean",
                                "operator": {
                                    "type": "PLUS"
                                },
                                "expression1": {
                                    "metricExpressionType": "Logical",
                                    "functionType": "VALUE",
                                    "displayName": "used",
                                    "inputMetricText": false,
                                    "inputMetricPath": null,
                                    "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|Total Framebuffer Memory Used (MiB)"
                                },
                                "expression2": {
                                    "metricExpressionType": "Logical",
                                    "functionType": "VALUE",
                                    "displayName": "free",
                                    "inputMetricText": false,
                                    "inputMetricPath": null,
                                    "relativeMetricPath": "Custom Metrics|Temp|AI Pod|GPUs2|Total Framebuffer Memory Free (MiB)"
                                }
                            }
                        },
                        "rollupMetricData": true,
                        "expressionString": "({used}*100)/({used}+{free})",
                        "useActiveBaseline": false,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": null
                }
            ],
            "showLabels": true,
            "showPercentValues": null
        },
        {
            "widgetType": "GraphWidget",
            "title": "Average GPU Utilization",
            "height": 3,
            "width": 6,
            "minHeight": 0,
            "minWidth": 0,
            "x": 0,
            "y": 17,
            "label": null,
            "description": null,
            "drillDownUrl": null,
            "openUrlInCurrentTab": false,
            "useMetricBrowserAsDrillDown": true,
            "drillDownActionType": null,
            "backgroundColor": 16777215,
            "backgroundColors": null,
            "backgroundColorsStr": "16777215,16777215",
            "color": 1646891,
            "fontSize": 12,
            "useAutomaticFontSize": false,
            "borderEnabled": false,
            "borderThickness": 0,
            "borderColor": 14408667,
            "backgroundAlpha": 1,
            "showValues": false,
            "formatNumber": null,
            "numDecimals": 0,
            "removeZeros": null,
            "compactMode": false,
            "showTimeRange": false,
            "renderIn3D": false,
            "showLegend": true,
            "legendPosition": "POSITION_BOTTOM",
            "legendColumnCount": 1,
            "timeRangeSpecifierType": "BEFORE_NOW",
            "startTime": null,
            "endTime": null,
            "minutesBeforeAnchorTime": 15,
            "isGlobal": true,
            "propertiesMap": null,
            "dataSeriesTemplates": [
                {
                    "seriesType": "LINE",
                    "metricType": null,
                    "showRawMetricName": false,
                    "colorPalette": null,
                    "name": "Series 1",
                    "metricMatchCriteriaTemplate": {
                        "entityMatchCriteria": {
                            "matchCriteriaType": "SpecificEntities",
                            "entityType": "APPLICATION_COMPONENT_NODE",
                            "agentTypes": null,
                            "entityNames": [
                                {
                                    "applicationName": "Server & Infrastructure Monitoring",
                                    "entityType": "APPLICATION_COMPONENT_NODE",
                                    "entityName": "appd-aipod-cluster-agent-appdynamics",
                                    "scopingEntityType": "APPLICATION_COMPONENT",
                                    "scopingEntityName": "Root",
                                    "subtype": null,
                                    "uniqueKey": null
                                }
                            ],
                            "summary": false
                        },
                        "metricExpressionTemplate": {
                            "metricExpressionType": "Logical",
                            "functionType": "VALUE",
                            "displayName": "null",
                            "inputMetricText": true,
                            "inputMetricPath": "Hardware Resources|Cluster|GPU|Utilization (%)",
                            "relativeMetricPath": "Hardware Resources|Cluster|GPU|Utilization (%)"
                        },
                        "rollupMetricData": false,
                        "expressionString": null,
                        "useActiveBaseline": false,
                        "sortResultsAscending": false,
                        "maxResults": 20,
                        "evaluationScopeType": null,
                        "baselineName": null,
                        "applicationName": "Server & Infrastructure Monitoring",
                        "metricDisplayNameStyle": "DISPLAY_STYLE_AUTO",
                        "metricDisplayNameCustomFormat": null,
                        "includeHistoricalNodes": false,
                        "includeAbove": true,
                        "includeBelow": false,
                        "includeBoth": false,
                        "includeBand12": false,
                        "includeBand23": false,
                        "includeBand34": false,
                        "includeBand45": false,
                        "includeShade": false
                    },
                    "axisPosition": "LEFT"
                }
            ],
            "verticalAxisLabel": null,
            "hideHorizontalAxis": null,
            "horizontalAxisLabel": null,
            "axisType": "LINEAR",
            "stackMode": null,
            "multipleYAxis": null,
            "customVerticalAxisMin": null,
            "customVerticalAxisMax": null,
            "showEvents": null,
            "interpolateDataGaps": false,
            "showAllTooltips": null,
            "staticThresholdList": [],
            "eventFilterTemplate": null
        }
    ],
    "warRoom": false,
    "template": false
}
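
The GPU Memory % Utilization widget above evaluates the expression ({used}*100)/({used}+{free}) over the Total Framebuffer Memory Used (MiB) and Total Framebuffer Memory Free (MiB) metrics. As a rough sanity check of the values the widget should show, the same arithmetic can be sketched in Python. The function name and the handling of a zero total are assumptions made for illustration; they are not part of the dashboard export or of AppDynamics.

```python
from typing import Optional


def gpu_memory_used_percent(used_mib: float, free_mib: float) -> Optional[float]:
    """Mirror the widget expression ({used}*100)/({used}+{free}).

    Returns None when used + free is zero (an assumption for this sketch,
    chosen to avoid a division by zero when no framebuffer data is reported).
    """
    total_mib = used_mib + free_mib
    if total_mib == 0:
        return None
    return (used_mib * 100) / total_mib


# Example: 2048 MiB used and 6144 MiB free gives 25.0 percent.
print(gpu_memory_used_percent(2048, 6144))
```

Comparing this result with the raw used and free values in the Metric Browser can help confirm that the widget is reading the intended metric paths.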

Troubleshooting

A "Connection refused" error usually points to a problem with the DCGM service or the exporter pod state, not with the scrape interval.
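
One way to tell a refused connection apart from a timeout or a name resolution failure is a plain TCP probe run from a pod in the cluster. The following is a minimal sketch using only the Python standard library. The function name probe_tcp and its return labels are hypothetical, and the host in the usage comment assumes the example service name, namespace, and port from this document plus the standard Kubernetes <service>.<namespace>.svc DNS form; substitute the values from your environment.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a TCP connection and classify the outcome.

    Returns one of: "open", "refused", "timeout", "error".
    """
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host was reached, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No response within the timeout, for example dropped packets.
        return "timeout"
    except OSError:
        # Anything else, including DNS resolution failures.
        return "error"


# Hypothetical usage from inside the cluster (example values only):
# print(probe_tcp("nvidia-dcgm-exporter.nvidia-gpu-operator.svc", 9400))
```

A result of "refused" is consistent with the guidance above: check that the exporter pod is running and that the service port matches the port the exporter listens on. A result of "timeout" or "error" suggests looking at network policy or the service name and namespace instead.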