Drill Into a Suspected Cause
Click More Details for the Suspected Cause to review:
- Simplified timeline
- Metrics graphed over time
Two types of graphed metrics display:
- Top Deviating Metrics for the Business Transaction
- Suspected Cause Metrics
Examine Top Deviating Metrics for the Business Transaction
Deviating Business Transaction metrics can indicate why an anomaly was important enough to surface. (The system does not surface anomalies for every transitory or slight deviation in metrics. Such anomalies would be of dubious value, since their customer impact is minimal. For the same reason, anomalies are not surfaced for Business Transactions that have a CPM (calls per minute) of under 20.)
Each deviating metric is shown as a thin blue line (the metric's value) against a wide gray band (the metric's Expected Range).
You can:
- Scroll along the graph to compare a metric’s value with its Expected Range at any time point
- Hover over a time point to view the metric's value and Expected Range in numerical form
In this example:
- The deviating metric spiked, remained elevated for about 30 minutes, then subsided back into its Expected Range
- Seven minutes after the metric returned to its Expected Range, the Severity Level changed from Critical to Warning, and eight minutes after that, to Normal
Hovering over time points shows that, during the period of deviation, the Average Response Time was around 1200 ms or higher, while its Expected Range was 370.08 to 721.24 ms.
With a key metric elevated by this large amount, it made sense for the system to surface this anomaly.
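To put a number on "this large amount", the following sketch does the arithmetic on the figures quoted above. It only compares a hovered value with its Expected Range. The function name and variables are hypothetical, and it is not how the system decides whether to surface an anomaly.

```python
def deviation_from_range(value, low, high):
    """Return how far value lies outside [low, high]; 0.0 if it is inside."""
    if value > high:
        return value - high
    if value < low:
        return value - low
    return 0.0


# Figures read from the example above, in milliseconds.
art_ms = 1200.0
expected_low_ms, expected_high_ms = 370.08, 721.24

# Distance above the upper bound of the Expected Range.
excess_ms = deviation_from_range(art_ms, expected_low_ms, expected_high_ms)

# Observed value as a multiple of the upper bound.
ratio_to_upper = art_ms / expected_high_ms

print(f"Excess over Expected Range: {excess_ms:.2f} ms")
print(f"Multiple of upper bound: {ratio_to_upper:.2f}x")
```

On these figures, the Average Response Time sits roughly 479 ms above the top of its Expected Range, or about two-thirds higher than the upper bound.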
The Top Deviating Metrics timeline also displays the evaluation period of an anomaly in gray. The evaluation period is the time window in which data is analyzed to detect the anomaly. This timeline helps you identify precisely when the issue started. The following image shows the evaluation period:
Examine Suspected Cause Metrics
You can view, scroll through, and hover over Suspected Cause Metrics in the same way as Top Deviating Metrics.
In this example:
- Suspected Cause Metrics are shown for the eretail.prod.payment01_1 node within the ProcessPayment1 tier
- That is the only node the tier has. If the tier had multiple nodes, metrics could be viewed separately for each node
- The pattern of elevation in the Process CPU Burnt and Process CPU Used metrics perfectly matches the pattern we saw in the Business Transaction metrics.
The hypothesis is now confirmed:
- CPU usage spiked on ProcessPayment1, a tier that is downstream from the tier where the Business Transaction starts.
- This slowed down response time on ProcessPayment1, including its HTTP response to the HTTP request from OrderService.
- The slow HTTP call, in turn, slowed response time on OrderService.
- Since OrderService is where the Checkout Business Transaction starts, Checkout has a slow response time anomaly.
- Since the Process CPU Burnt issue on ProcessPayment1 is the Suspected Cause that is deepest in the entity tree, it is the root cause of the anomaly.
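The "deepest Suspected Cause in the entity tree" reasoning can be illustrated with a small sketch. The Entity class, the tree shape, and the choice of which entities are flagged as suspected are all assumptions made for illustration. This is not the product's data model or its root cause algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Entity:
    """Hypothetical node in an entity tree (illustration only)."""
    name: str
    suspected: bool = False
    children: List["Entity"] = field(default_factory=list)


def deepest_suspected(entity: Entity, depth: int = 0) -> Optional[Tuple[int, Entity]]:
    """Return (depth, entity) for the deepest suspected entity, or None."""
    best = (depth, entity) if entity.suspected else None
    for child in entity.children:
        candidate = deepest_suspected(child, depth + 1)
        if candidate is not None and (best is None or candidate[0] > best[0]):
            best = candidate
    return best


# Assumed tree for the Checkout example; the suspected flags are illustrative.
cpu = Entity("Process CPU Burnt on ProcessPayment1", suspected=True)
node = Entity("eretail.prod.payment01_1 (node)", children=[cpu])
payment_tier = Entity("ProcessPayment1 (tier)", children=[node])
http_call = Entity("HTTP call to ProcessPayment1", suspected=True,
                   children=[payment_tier])
order_tier = Entity("OrderService (tier)", children=[http_call])
checkout = Entity("Checkout (Business Transaction)", children=[order_tier])

result = deepest_suspected(checkout)
if result is not None:
    depth, root_cause = result
    print(f"Likely root cause at depth {depth}: {root_cause.name}")
```

In this sketch both the slow HTTP call and the CPU issue are flagged, and the traversal picks the CPU issue because it sits further down the tree, which mirrors the conclusion in the last bullet.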