Monitor detector service latency for a group of customers

This Splunk APM example describes how to monitor service latency for a specific group of customers.

Kai, a site reliability engineer at the fictitious Buttercup Games, wants to monitor a latency issue in a critical checkout workflow. The issue involves the cart service and its /getcart endpoint, and it affects a specific set of customers who most frequently have problems with the service.

Kai takes the following steps to monitor latency in the cart service:

  1. Kai generates a Monitoring MetricSet (MMS) and filters by span tag

  2. Kai creates service latency detectors to track metrics

  3. Kai sets up charts, dashboards, and alerts for custom dimensions

Kai generates a Monitoring MetricSet (MMS) and filters by span tag

To generate Monitoring MetricSets (MMS) by customer, Kai first indexes version_id, a span tag that identifies each customer. Kai then generates an MMS that uses version_id as a dimension. Kai sets the scope of the MMS to cartservice and filters on the version_id tag values that represent the specific customers Kai wants to investigate.

This image shows an example MMS configuration for the cartservice endpoint /getcart and a filter by tag values for version_id:

This screenshot shows how to add a custom Monitoring MetricSet for a single service.

Kai creates service latency detectors to track metrics

Kai uses the custom dimensionalized MMS they created to monitor the performance of this critical checkout workflow in the cart service. To do this, Kai creates a detector using the same custom indexed tag, version_id, to track error rates associated with the checkout workflow.

Kai uses the guided detector setup to create a detector based on the error rate in cartservice:GetCart, filtered to the custom dimension version_id.
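A detector like the one Kai builds in guided setup can be approximated as a SignalFlow program. The following Python sketch only assembles the program text and does not call any Splunk API. The metric name spans.count, the dimensions sf_service, sf_operation, sf_error, and sf_dimensionalized, the version_id values, and the 5% threshold are all assumptions for illustration; use the metric finder to confirm the names your organization actually reports.

```python
# Sketch only: builds SignalFlow program text; it does not call any Splunk API.
# Metric and dimension names are assumptions; confirm them in the metric finder.
COUNT_METRIC = "spans.count"  # assumed endpoint-level request count metric


def sf_filter(dimension, *values):
    """Render a SignalFlow filter() call for one dimension and one or more values."""
    quoted = ", ".join(f"'{v}'" for v in values)
    return f"filter('{dimension}', {quoted})"


def error_rate_program(service, operation, version_ids, threshold_pct, lasting="5m"):
    """Return SignalFlow text that fires when the error rate per version_id exceeds a threshold."""
    scope = " and ".join([
        sf_filter("sf_service", service),
        sf_filter("sf_operation", operation),
        sf_filter("sf_dimensionalized", "true"),
        sf_filter("version_id", *version_ids),
    ])
    lines = [
        f"scope = {scope}",
        # Requests that ended in an error, one series per customer.
        f"errors = data('{COUNT_METRIC}', filter=scope and {sf_filter('sf_error', 'true')}).sum(by=['version_id'])",
        # All requests, one series per customer.
        f"total = data('{COUNT_METRIC}', filter=scope).sum(by=['version_id'])",
        "error_rate = (errors / total * 100)",
        f"detect(when(error_rate > {threshold_pct}, lasting='{lasting}')).publish('GetCart error rate by version_id')",
    ]
    return "\n".join(lines)


# Hypothetical version_id values standing in for the affected customers.
program = error_rate_program("cartservice", "GetCart", ["id-1", "id-2"], threshold_pct=5)
print(program)
```

Keeping version_id in the sum(by=...) grouping is what gives one alerting series per customer, rather than a single error rate for the whole endpoint.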

Kai uses the metric finder to find additional information on the metrics and metadata for their system. Kai applies the filter sf_dimensionalized:true to see the metrics related to custom MMS, as shown in the following image.

This screenshot shows how to filter the MetricFinder for metrics related to custom MMS.

Kai sets up charts, dashboards, and alerts for custom dimensions

Kai also creates charts and dashboards that use the custom dimensions they created.

This screenshot shows how to filter the MetricFinder for metrics related to custom Monitoring MetricSets.
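A chart of latency by customer can be sketched in the same way. The Python below assembles SignalFlow text for a p99 latency plot with one series per version_id. The metric name spans.duration.ns.p99, the sf_* dimension names, and the version_id values are assumptions for illustration, not confirmed names from this example.

```python
# Sketch only: assembles chart SignalFlow text; metric and dimension names are assumptions.
LATENCY_METRIC = "spans.duration.ns.p99"  # assumed endpoint-level p99 latency metric, in nanoseconds


def latency_chart_program(service, operation, version_ids):
    """Return SignalFlow text plotting p99 latency in milliseconds, one series per version_id."""
    values = ", ".join(f"'{v}'" for v in version_ids)
    scope = (
        f"filter('sf_service', '{service}') and "
        f"filter('sf_operation', '{operation}') and "
        "filter('sf_dimensionalized', 'true') and "
        f"filter('version_id', {values})"
    )
    return (
        # max() is a conservative way to combine p99 series that share a version_id.
        f"latency_ns = data('{LATENCY_METRIC}', filter={scope}).max(by=['version_id'])\n"
        # Convert nanoseconds to milliseconds before publishing the plot.
        "(latency_ns / 1000000).publish(label='p99 latency (ms) by version_id')"
    )


# Hypothetical version_id values standing in for the affected customers.
chart = latency_chart_program("cartservice", "GetCart", ["id-1", "id-2"])
print(chart)
```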

Summary

By generating an MMS with version_id as a custom dimension and filtering it to the customers affected by the issue, Kai set up a detector to monitor service and endpoint latency by customer. Kai also created charts and dashboards that show service and endpoint latency for specific customers over time.

Learn more