Monitor and proactively troubleshoot issues in an application environment using the service map

This scenario shows how a site reliability engineer can use the service map to monitor an application environment and proactively troubleshoot issues.

Alex, a site reliability engineer for Buttercup Games, manages the company’s e-commerce site. Alex is responsible for the uptime of the e-commerce fulfillment service, ecom-fulfillment-green-svc, and wants to monitor the environment and proactively find issues before they cause problems. However, the environment is complex.

To quickly monitor the environment, Alex takes the following steps:

  1. Alex navigates to the service map.

  2. Alex finds the service and determines if it has any issues.

  3. Alex groups the service map by application name to determine if there are other issues that might affect the service.

Alex navigates to the service map

Alex uses the Splunk Observability Cloud main menu to navigate to the APM service map and view all of the services in the environment. Alex selects the constellation view icon to display a larger view of the environment.

Using the service map, Alex can see the relationships between services and quickly spot error and latency issues.

A screenshot of the constellation view of the APM service map.

Alex finds the service and determines if it has any issues

Alex wants to quickly find the fulfillment service, ecom-fulfillment-green-svc, but many of the services have similar names. In the Search for service field, Alex enters the name of the service and selects it from the drop-down menu.

A screenshot of the search bar in the APM service map.

The service map zooms in on the service and displays a panel that summarizes information about the service.

In the service map, Alex sees that the service doesn’t have a colored ring around its node. This indicates that the service has no health status because it doesn’t have any reported alerts. Alex then views the metric charts in the panel and sees that the service doesn’t have any apparent errors or latency issues.

A screenshot of the APM service map with a service selected, which displays a panel with metrics charts.

Alex groups the service map by application name to determine if there are other issues that might affect the service

Alex sees that some services on the service map have red rings around their nodes, which indicates that each of these services has at least one major or critical alert. Alex wants to understand whether these alerts could affect the fulfillment service.

Alex selects the Group by drop-down menu and selects service.namespace. This action groups services that share the same value for service.namespace. Alex’s team uses this attribute to annotate application boundaries, which aligns with the OpenTelemetry convention of using the attribute to distinguish a group of services.
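Grouping by service.namespace only works if each service reports that attribute. The following is a minimal sketch of one way a team might set it, assuming the services are instrumented with an OpenTelemetry SDK that reads the standard OTEL_RESOURCE_ATTRIBUTES environment variable. The helper name format_resource_attributes is hypothetical, and the scenario does not say how Alex’s team actually sets the attribute.

```python
from urllib.parse import quote


def format_resource_attributes(attributes):
    """Build a value for the OTEL_RESOURCE_ATTRIBUTES environment variable.

    Hypothetical helper for illustration. The variable holds comma-separated
    key=value pairs, so special characters in values are percent-encoded.
    """
    return ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in attributes.items()
    )


# Values taken from the scenario: the fulfillment service and its service group.
value = format_resource_attributes(
    {
        "service.name": "ecom-fulfillment-green-svc",
        "service.namespace": "ad-ecommerce-o11y",
    }
)

# Print the line you could export in the service's environment before startup.
print(f"OTEL_RESOURCE_ATTRIBUTES={value}")
```

Every service that shares the same service.namespace value then appears in the same group when the service map is grouped by that attribute.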

A screenshot of grouping services by the service.namespace value using the Group by drop-down menu in the APM service map.

Alex’s service belongs to the ad-ecommerce-o11y service group, which represents the e-commerce fulfillment application. Alex sees that there are no dependencies outside of the service group.

A screenshot of the APM service map, grouped by the service.namespace value.

Alex selects the node twice to return to the filtered service map, which displays only the individual services in the group. Alex now sees that one service in this service group has issues, as denoted by the red ring around its node. Alex contacts the team member responsible for that service to determine whether the issue is already being investigated, so that it doesn’t affect the rest of the service group.
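The checks Alex performs on the grouped map can be pictured as simple operations on a list of services. The following sketch is illustrative only and is not how the service map is implemented. The inventory is hypothetical: apart from ecom-fulfillment-green-svc and the ad-ecommerce-o11y group, every service name, alert, and call relationship is invented for the example.

```python
from collections import defaultdict

# Hypothetical inventory. Only ecom-fulfillment-green-svc and the
# ad-ecommerce-o11y namespace come from the scenario; the rest is invented.
SERVICES = [
    {
        "name": "ecom-fulfillment-green-svc",
        "namespace": "ad-ecommerce-o11y",
        "alert": None,  # no ring: no reported alerts
        "calls": ["ecom-inventory-svc"],
    },
    {
        "name": "ecom-inventory-svc",
        "namespace": "ad-ecommerce-o11y",
        "alert": "critical",  # red ring
        "calls": [],
    },
    {
        "name": "billing-ledger-svc",
        "namespace": "billing",
        "alert": "major",  # red ring, but in a different group
        "calls": [],
    },
]


def group_by_namespace(services):
    """Group services that share the same service.namespace value."""
    groups = defaultdict(list)
    for svc in services:
        groups[svc["namespace"]].append(svc)
    return dict(groups)


def services_with_red_ring(groups, namespace):
    """Return names of services in a group with a major or critical alert."""
    return [
        svc["name"]
        for svc in groups.get(namespace, [])
        if svc["alert"] in ("major", "critical")
    ]


def external_dependencies(groups, namespace):
    """Return names of services called from a group that are not members of it."""
    members = {svc["name"] for svc in groups.get(namespace, [])}
    return sorted(
        {
            callee
            for svc in groups.get(namespace, [])
            for callee in svc["calls"]
            if callee not in members
        }
    )


groups = group_by_namespace(SERVICES)
print(services_with_red_ring(groups, "ad-ecommerce-o11y"))
print(external_dependencies(groups, "ad-ecommerce-o11y"))
```

With this sample data, the first call lists the one service in the group that has a red ring, and the second returns an empty list because the group has no dependencies outside itself, which mirrors what Alex sees on the map.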

Summary

Alex used the service map to visualize the services and relationships in their application environment. By searching for the ecom-fulfillment-green-svc service, Alex isolated the service and confirmed that it didn’t have any issues.

After grouping services by application name, Alex identified an issue in a different service that could affect the ecom-fulfillment-green-svc service. Alex then took steps to proactively troubleshoot the issue before it affected the e-commerce fulfillment application.

Learn more

To learn more about the service map, see: