Find a code-level issue affecting a business transaction using Call Graph Profiling
This real-world scenario shows how to use Call Graph Profiling to find a code-level issue that affects a business transaction.
Kai, a site reliability engineer at Buttercup Games, sees that an issue with a critical business transaction is affecting the checkout process for the company’s e-commerce site.
To determine how to resolve the issue, Kai takes the following steps:
Kai views the list of business transactions
Kai uses the Splunk Observability Cloud main menu to navigate to the APM Overview. Kai selects the Business transactions tab to view the list of business transactions configured for the e-commerce application. The list represents the business processes that are most important to Buttercup Games.
Kai finds a business transaction named ecom-ecommerce-green-svc:POST /ecommerce/checkout with a Critical alert. This business transaction represents the traces involved in the checkout process of the e-commerce site. Kai selects the alert to view additional details and sees that the triggered alert signals elevated latency. Kai now knows that users might be facing slow response times in the checkout process of the application.
Kai investigates the affected business transaction
Kai selects the name of the business transaction to navigate to the business transaction view. Kai views the Business transaction duration chart and confirms that the business transaction has elevated latency.
Kai then views the service map and sees that the business transaction involves 2 services, 1 inferred service, and 3 inferred databases, which narrows the investigation.
Kai finds problematic traces
Kai selects the Traces tab to view the list of Long traces, which are the traces with the highest latency. In the list, Kai sees a trace with a call graph, which is denoted by a blue icon.
Kai views the call graph and determines the code owner for the issue
Kai selects the trace link to navigate to the trace view, which displays the spans that represent the individual operations in the business transaction. Kai selects the blue call graph icon on the first span.
Kai investigates the individual methods executed during this time. Kai spots the longest method, a call to java.lang.StackTraceElement.initStackTraceElements(StackTraceElement.java:0), which makes up 96% of the total method execution time.
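To make the 96% figure concrete, the following is a minimal Python sketch of how a method's share of execution time can be estimated from sampled call stacks. It assumes a profiler that captures stack samples at a fixed interval, so that the fraction of samples with a method as the innermost frame approximates its share of time. The functions `top_frame_shares` and `hottest_method`, the sample data, and every frame name other than `java.lang.StackTraceElement.initStackTraceElements` are hypothetical and for illustration only. This is not the Splunk APM implementation or API.

```python
from collections import Counter


def top_frame_shares(samples):
    """Map each method to the fraction of samples in which it is the
    innermost (last) frame. Each sample is a list of frame names ordered
    from outermost caller to innermost callee. Empty stacks are ignored."""
    counts = Counter(stack[-1] for stack in samples if stack)
    total = sum(counts.values())
    return {method: n / total for method, n in counts.items()}


def hottest_method(samples):
    """Return (method, share) for the method with the largest share,
    or None if there are no usable samples."""
    shares = top_frame_shares(samples)
    if not shares:
        return None
    method = max(shares, key=shares.get)
    return method, shares[method]


# Hypothetical data: 25 stack samples taken during the first span.
# The caller frames below are invented for this example.
slow_stack = [
    "CheckoutController.post",
    "IdentityService.verify",
    "java.lang.StackTraceElement.initStackTraceElements",
]
other_stack = ["CheckoutController.post", "CartService.total"]
samples = [slow_stack] * 24 + [other_stack]

method, share = hottest_method(samples)
print(f"{method}: {share:.0%} of samples")
```

With 24 of 25 samples ending in the same method, that method accounts for 96% of the sampled time, which is the kind of signal Kai reads from the call graph.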
Kai recognizes that this code belongs to the Application Identity team. Kai copies the link to the call graph and sends it to the code owner, who is able to track the issue back to a recent code push that accidentally increased latency. The code owner reverts the code and resolves the issue.
Summary
Kai identified a business transaction with a latency issue and used a call graph to pinpoint the exact method that caused the latency, as well as the owner of the problematic code. Kai then shared the call graph with the code owner, who used it to quickly troubleshoot and resolve the issue.
Learn more
To learn more about business transactions, see Configure business transaction rules.
To learn more about Call Graph Profiling, see Introduction to Call Graph Profiling in Splunk APM.