Set up alerts for Edge Processor metrics

As an Edge Processor administrator, you can set up alerts that trigger when Edge Processor metrics meet certain criteria. These alerts let you monitor the health and status of your Edge Processors and take action to troubleshoot potential issues. You set up these alerts in the Splunk Cloud Platform deployment that is connected to your Edge Processor tenant.

For each Edge Processor metric, the following entries describe the conditions that should trigger an alert, an example search query, and the action items to take when the alert triggers. You can create these searches and alerts using Splunk Cloud Platform functionality. For more information on how to configure alerts in Splunk Cloud Platform, see Getting started with alerts in the Splunk Cloud Platform Alerting Manual.

Each entry lists the metric, the alert trigger condition, an example search, and the recommended action items.

Edge Processor availability

Alert trigger condition: The Edge Processor is not sending any metrics to the connected Splunk Cloud Platform deployment. Edge Processors send metrics to Splunk Cloud Platform every 30 seconds. If the search for this alert returns 0, the Edge Processor has not sent any metrics during the search time range, which indicates that it is not running as expected.

Example search: The following SPL query returns the number of metric data points that the Edge Processor has sent:

| mstats count(*) WHERE index=_metrics sourcetype=edge-metrics
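
The example search counts data points across all Edge Processors, so it returns 0 only when every Edge Processor has stopped reporting. The following is a sketch of a per-host variant. It flags each host whose most recent data point is older than 5 minutes, which is ten missed 30-second reporting intervals. It assumes that the edge-metrics data carries a host dimension, as the CPU and memory searches on this page do. The 5-minute cutoff is an arbitrary starting value. Schedule the search over a time range such as the last 24 hours, and set the alert to trigger when the number of results is greater than 0.

```spl
| mstats count(_value) AS datapoints WHERE index=_metrics sourcetype=edge-metrics metric_name=* BY host span=1m
| where datapoints > 0
| stats max(_time) AS last_seen BY host
| where last_seen < relative_time(now(), "-5m")
| eval last_seen=strftime(last_seen, "%Y-%m-%d %H:%M:%S")
```

Each returned row names a host that has gone silent, together with the last time it reported.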

Action items: First, verify that the Edge Processor is not in the "Error" status. See An Edge Processor instance is in the "Error" status for troubleshooting guidance.

If the alert persists, verify that the host machine meets the network requirements and that the Edge Processor can send data to Splunk Cloud Platform. See Network requirements.

Edge Processor data ingestion in bytes

Alert trigger condition: The amount of data that the Edge Processor ingests is below a certain threshold. For example, a value of 0 indicates that the Edge Processor is not ingesting any data at all.

Example search: The following SPL query returns the amount of ingested data in bytes:

| mstats sum(processor_bytes_in_total) as bytes_in where index=_metrics
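
To turn this search into an alert, you can append a threshold condition so that the search returns a row only when ingestion falls below the threshold. Then set the alert to trigger when the number of results is greater than 0. In the following sketch, the 1024-byte threshold is a hypothetical value that you would replace with one suited to your expected traffic. The fillnull command treats an empty sum as 0.

```spl
| mstats sum(processor_bytes_in_total) AS bytes_in WHERE index=_metrics
| fillnull value=0 bytes_in
| where bytes_in < 1024
```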

Action items: First, verify that the Edge Processor is not in the "Error" status. See An Edge Processor instance is in the "Error" status for troubleshooting guidance.

If the alert persists, verify that the ports for receiving data are configured correctly and that your data sources are configured to send data to those ports. See Configure shared Edge Processor settings.

Edge Processor queue size

Alert trigger condition: The queue size is above a certain threshold, such as 70% of the queue's capacity. This indicates that you need to increase your queue size.

Example search: The following SPL query returns the latest queue size for each exporter:

| mstats latest(exporter_queue_size) as current_queue_size where index=_metrics by exporter
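
The example search returns an absolute queue size, so expressing it as a percentage requires the queue's maximum capacity. The following sketch hardcodes that capacity as a hypothetical value of 10000, in the same unit that exporter_queue_size reports. Replace it with your configured maximum queue size. The search returns only the exporters whose queue is more than 70% full, so the alert can trigger when the number of results is greater than 0.

```spl
| mstats latest(exporter_queue_size) AS current_queue_size WHERE index=_metrics BY exporter
| eval queue_capacity=10000
| eval queue_pct=round(100 * current_queue_size / queue_capacity, 1)
| where queue_pct > 70
```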

Action items: Increase your queue size so that the Edge Processor can buffer more data before sending it. See these topics for more information:

Destination data send failure

Alert trigger condition: The number of errors that occur when the Edge Processor fails to send data to a destination is above a certain threshold. This indicates that your destination configuration might be incorrect or that the destination might be offline.

Example search: The following SPL query returns the total number of send errors for each dataset:

| mstats sum(exporter_error_count) as export_failures_total where index=_metrics by dataset_name
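
The following sketch turns the example search into an alert search. It keeps only the datasets whose error count over the alert's time range, such as the last 15 minutes, exceeds a threshold. It then lists the worst offenders first. The threshold of 10 errors is a hypothetical starting value. Set the alert to trigger when the number of results is greater than 0.

```spl
| mstats sum(exporter_error_count) AS export_failures_total WHERE index=_metrics BY dataset_name
| where export_failures_total > 10
| sort - export_failures_total
```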

Action items: Check the edge.log file to verify that the destination information for your Edge Processors is correct. See View logs for the Edge Processor solution for more information.

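CPU usage

Alert trigger condition: The CPU usage on the host is above a certain threshold. CPU usage here means the share of CPU time spent in states other than idle, so an equivalent condition is that the idle share is below a certain threshold. This indicates that the host CPU can't handle the required workload.

Example search: The following SPL query returns the CPU time by state for each host:

| mstats sum(system.cpu.time) where index=_metrics by host,state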
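
The example search splits CPU time by state. The following sketch converts it into a busy percentage per host by dividing the CPU time spent in states other than idle by the total CPU time. It assumes that the idle state is reported with the value idle. The 80% threshold is hypothetical. If system.cpu.time is a cumulative counter in your deployment, this ratio is dominated by time accumulated before the search window, and the mstats rate() function might be a better fit than sum().

```spl
| mstats sum(system.cpu.time) AS cpu_time WHERE index=_metrics BY host, state
| eval busy_time=if(state="idle", 0, cpu_time)
| stats sum(busy_time) AS busy_time sum(cpu_time) AS total_time BY host
| eval cpu_busy_pct=round(100 * busy_time / total_time, 1)
| where cpu_busy_pct > 80
```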

Action items: Determine what is causing the high CPU usage and take action accordingly. For example, increase the CPU specifications of the host or add another host to share the traffic. See An Edge Processor instance is in the "Warning" status for more information.

Memory usage

Alert trigger condition: The memory usage on the host is above a certain threshold. This indicates that the host memory can't handle the required workload.

Example search: The following SPL query returns the memory usage in bytes for each host:

| mstats latest(system.memory.usage) where index=_metrics by host
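
The example search reports memory usage in bytes rather than as a percentage, so a simple alert compares it with a fixed byte threshold. In the following sketch, the threshold of 6 GiB (6442450944 bytes) is hypothetical. Choose a value below the total memory of your host. If system.memory.usage in your deployment also carries a state dimension, as the OpenTelemetry host metrics convention does, add state to the BY clause. Then keep only the used state so that latest() does not pick a value from an arbitrary state.

```spl
| mstats latest(system.memory.usage) AS mem_bytes WHERE index=_metrics BY host
| eval mem_gib=round(mem_bytes / 1073741824, 2)
| where mem_bytes > 6442450944
```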

Action items: Determine what is causing the high memory usage and take action accordingly, such as by increasing the memory specifications of the host. See An Edge Processor instance is in the "Warning" status for more information.