Find a code-level issue affecting a business transaction using Call Graph Profiling

This example walks through a real-world scenario in which Call Graph Profiling is used to find a code-level issue affecting a business transaction.

Kai, a site reliability engineer at Buttercup Games, sees that a problem in a critical business transaction is affecting the checkout process for the company’s e-commerce site.

To determine how to resolve the issue, Kai takes the following steps:

  1. Kai views the list of business transactions.

  2. Kai investigates the affected business transaction.

  3. Kai finds problematic traces.

  4. Kai views the call graph and determines the code owner for the issue.

Kai views the list of business transactions

Kai uses the Splunk Observability Cloud main menu to navigate to the APM Overview. Kai selects the Business transactions tab to view the list of business transactions configured for the e-commerce application. The list represents the business processes that are most important to Buttercup Games.

Kai finds a business transaction named ecom-ecommerce-green-svc:POST /ecommerce/checkout with a Critical alert. This business transaction represents the traces involved in the checkout process of the e-commerce site.

A screenshot of the Business Transactions tab in the APM Overview page.

Kai selects the alert to view additional details and sees that the triggered alert signals elevated latency. Kai now knows that users may be facing slow response times in the checkout process of the application.

A screenshot of selecting an alert for a business transaction in the Business Transactions tab of the APM Overview page.

Kai investigates the affected business transaction

Kai selects the name of the business transaction to navigate to the business transaction view. Kai views the Business transaction duration chart and confirms that the business transaction has elevated latency.

Kai then views the service map and sees that the business transaction involves 2 services, 1 inferred service, and 3 inferred databases, which narrows the investigation.

A screenshot of the business transaction view in Splunk APM.

Kai finds problematic traces

Kai selects the Traces tab to view the list of Long traces, which are traces with high latency. In the list, Kai sees a trace that has a call graph, which is denoted by a blue icon.

A screenshot of the Traces tab in the business transaction view.

Kai views the call graph and determines the code owner for the issue

Kai selects the trace link to navigate to the trace view, which displays the spans that represent the individual operations in the business transaction. Kai selects the blue call graph icon on the first span.

A screenshot of the trace view, which contains a span with a call graph.

Kai investigates the individual methods executed during this span. Kai spots the longest-running method, a call to java.lang.StackTraceElement.initStackTraceElements(StackTraceElement.java:0), which makes up 96% of the total method execution time.
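To make the 96% figure more concrete, the following minimal Python sketch shows one common way a sampling profiler can derive such a percentage: count how often each method appears at the top of the sampled stacks and divide by the total number of samples. This is an illustration only, not a description of how Splunk APM computes call graph percentages. The function name self_time_share, the sample data, and the CheckoutHandler, IdentityService, and CartService frames are all hypothetical.

```python
from collections import Counter


def self_time_share(samples):
    """Return {frame: fraction of samples in which frame is the top of the stack}.

    Each sample is a list of frame names ordered root-first, so the last
    element is the method that was executing when the sample was taken.
    """
    top_frames = Counter(stack[-1] for stack in samples if stack)
    total = sum(top_frames.values())
    return {frame: count / total for frame, count in top_frames.items()}


# Hypothetical data: 25 stack samples captured during one span.
samples = (
    [[
        "CheckoutHandler.handle",
        "IdentityService.verify",
        "java.lang.StackTraceElement.initStackTraceElements",
    ]] * 24
    + [["CheckoutHandler.handle", "CartService.total"]]
)

shares = self_time_share(samples)
hottest = max(shares, key=shares.get)
print(hottest, f"{shares[hottest]:.0%}")
```

In this made-up data set, 24 of the 25 samples end in the same method, so that method accounts for 96% of the sampled execution time, which mirrors the kind of hot spot Kai sees in the call graph.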

A screenshot of a call graph.

Kai recognizes that this code belongs to the Application Identity team. Kai copies the link to the call graph and sends it to the code owner, who is able to track the issue back to a recent code push that accidentally increased latency. The code owner reverts the code and resolves the issue.

Summary

Kai identified a business transaction with a latency issue and used a call graph to identify the exact method that caused the latency, as well as the owner of the problematic code. Kai then shared the call graph with the code owner, who used it to quickly troubleshoot and resolve the issue.

Learn more

To learn more about business transactions, see Configure business transaction rules.

To learn more about Call Graph Profiling, see Introduction to Call Graph Profiling in Splunk APM.