Monitor and optimize your resource allocation for Kubernetes workloads
Access and use the Kubernetes Workload Optimization view.
Beta features described in this document are provided by Splunk to you "as is" without any warranties, maintenance and support, or service-level commitments. Splunk makes this beta feature available in its sole discretion and may discontinue it at any time. These documents are not yet publicly available and we ask that you keep such information confidential. Use of beta features is subject to the Splunk Pre-Release Agreement for Hosted Services.
You can use this view to:
- Gain insights into the efficiency of your CPU and memory resource allocations for your Kubernetes workloads.
- Identify resource over-provisioning that may be contributing to extra costs, or under-provisioning that may be causing performance or reliability issues.
- Optimize your resource allocations based on recommendations.
Prerequisites
To use workload optimization, you must meet the following requirements.
- You have configured data collection from Kubernetes. For instructions, see Collect Kubernetes data.
- You are using a supported Kubernetes distribution. For more information, see Supported Kubernetes distributions.
- You are using one of the following supported Kubernetes workload kinds:
  - Deployment
  - StatefulSet
  - DaemonSet
- All metrics that are collected by default with the Splunk Distribution of the OpenTelemetry Collector for Kubernetes are present in your data. No action is required if you have not deactivated any metrics. For the list of default metrics, see Collected metrics and dimensions for Kubernetes.
How workload optimization recommendations are calculated
Splunk Observability Cloud provides recommendations to optimize your resource allocation based on an algorithm that analyzes the last 14 days of your workload data. Recommendations are only provided when 90% or more of the last 24 hours of data is available.
The algorithm determines if a resource is under-provisioned or over-provisioned based on a defined threshold. It provides a recommendation to either increase or decrease your resource allocation to align with the threshold.
| Resource status | Threshold | Recommendation |
|---|---|---|
| Under-provisioned | The 95th percentile of your data points over a 14-day sample period reported greater than 85% resource usage. | Increase your CPU or memory settings to align with the 85% usage threshold. |
| Over-provisioned | The absolute maximum of your data points over a 14-day sample period reported less than 85% resource usage. | Decrease your CPU or memory settings to align with the 85% usage threshold. |
For example:

- A workload whose 95th percentile CPU usage exceeds the 85% threshold is considered under-provisioned. It needs more resources to perform optimally. To align with the 85% usage threshold, Splunk Observability Cloud recommends increasing your CPU settings, in this example by 10%. The recommendation is provided in cores.
- A workload with an absolute maximum memory usage of 40% is considered over-provisioned. It has more resources than necessary to perform optimally, and you might reduce costs by reducing your resource allocation. To align with the 85% usage threshold, Splunk Observability Cloud recommends reducing your memory settings by 45%. The recommendation is provided in GiB.
Recommendations are rounded to practical values. For example, consider a workload with 4 CPU cores. If the optimal value calculated by the algorithm is 4.57 cores, the recommendation is rounded to 4.50 cores. If the optimal value is 4.07 cores, the recommendation is rounded to 4.25 cores. Recommendations aren't provided for small scale-downs if you'd save less than one step (0.25 cores in this example).
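The threshold and rounding behavior described above can be sketched in code. This is an illustrative approximation, not the product's actual algorithm: the function names, the 0.25-core step, and the tie-breaking rule for small increases are assumptions inferred from the examples.

```python
# Illustrative sketch of the recommendation rules described above.
# The 85% threshold and 0.25-core step come from the examples in the
# text; the exact internal algorithm is not public.

THRESHOLD = 0.85   # 85% target usage
CORE_STEP = 0.25   # CPU recommendations round to quarter-core steps


def classify(p95_usage: float, max_usage: float) -> str:
    """Classify a workload from usage ratios in the range 0.0-1.0."""
    if p95_usage > THRESHOLD:
        return "under-provisioned"
    if max_usage < THRESHOLD:
        return "over-provisioned"
    return "ok"


def round_cores(optimal: float, current: float) -> float:
    """Round an optimal core count to a practical 0.25-core step."""
    rounded = round(optimal / CORE_STEP) * CORE_STEP
    # Assumption: if an increase would round back to the current value,
    # bump it one step so the recommendation is actionable.
    if optimal > current and rounded <= current:
        rounded = current + CORE_STEP
    return rounded


print(classify(p95_usage=0.95, max_usage=0.99))  # under-provisioned
print(round_cores(4.57, current=4.0))            # 4.5
print(round_cores(4.07, current=4.0))            # 4.25
```

With a current allocation of 4 cores, an optimal value of 4.57 rounds to the nearest step (4.50), while 4.07 rounds up to 4.25 so that the recommendation differs from the current setting, matching the examples in the text.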
Access the Workload Optimization view
To access this view, use one of the following methods:
- From the Splunk Observability Cloud main menu, select .
- From the new Kubernetes experience, select view all in the Workload optimization - starvation risk chart.
The following screenshot shows an example of the Workload Optimization view.
Monitor the resource allocation for all Kubernetes workloads
The Workload Optimization view displays:
- Panels with aggregate metrics that you can use to track the number, starvation risk, and resource footprint of all of the Kubernetes workloads in your environment.
- A Kubernetes Workloads table that lists the workloads in your environment, along with properties and metrics related to resource allocation.
The following sections describe how to use the panels in this view:
Monitor the number of total and processed workloads
The Workloads panel displays the number of Kubernetes workloads successfully processed by the Workload Optimization view. This number is the total number of workloads minus the following:
- Workloads that you added fewer than 24 hours ago. The Workload Optimization view processes data once every 24 hours.
- Workload kinds that are not supported by the Workload Optimization view.
- Workloads that were not processed due to an error.
Monitor the starvation risk of your workloads
The Workloads by Starvation Risk panel displays the number of workloads at high, medium, low, or minimal risk of running out of CPU or memory.
| Starvation risk category | Description |
|---|---|
| High | The container is using 95% or more of its limit settings. |
| Medium | The container meets at least one of the following conditions: |
| Low | The container's CPU or memory usage is lower than the target usage. |
| Minimal | None of the above conditions are detected. All containers have request settings for both CPU and memory, and neither of these resources has usage exceeding its target usage. |
Monitor the resource footprint of your workloads
The Resource Footprint panel displays the following metrics:
| Metric name | Description |
|---|---|
| Current | The sum of the request settings for the pods of your Kubernetes workloads. If the request value is not set, the number represents the sum of actual usage. |
| Recommended | The projected resource usage if all recommendations are applied. |
| Impact | The impact of applying the resource usage recommendations, in absolute units and as a percentage difference between the recommended and current usage. |
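As an arithmetic sketch of how Impact relates the Current and Recommended values, assuming a simple difference-and-percentage formula inferred from the descriptions above (the function name is hypothetical):

```python
# Hypothetical sketch of the Impact metric: the absolute and
# percentage change between the recommended and current footprint.

def impact(current: float, recommended: float) -> tuple[float, float]:
    """Return (absolute change, percent change) of the footprint."""
    absolute = recommended - current
    percent = absolute / current * 100
    return absolute, percent

# A scale-down from 10 GiB to 7.5 GiB:
print(impact(current=10.0, recommended=7.5))  # (-2.5, -25.0)
```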
Drill down to the workload optimization detail view
From the Workload Optimization view, select a workload name in the Kubernetes Workloads table to drill down to the detail view for the workload. You can use the detail view to analyze and optimize the resource allocation of a workload.
The following sections describe how to use the detail view:
Analyze the resource efficiency of a workload
The Efficiency Analysis section displays the following panels that you can use to analyze the resource efficiency of the workload:
- Resource Starvation Risk: The workload's risk of running out of CPU or memory. For more information on the risk categories, see Monitor the starvation risk of your workloads.
- Average Pod Count: The number of pod replicas running for this workload, averaged over the analysis period.
- Resource Footprint: The current resource usage, the resource usage if all recommendations are applied, and the impact of applying the resource usage recommendations. For details, see Monitor the resource footprint of your workloads.
- Resource Efficiency: The ratio of resource usage to resource allocation, expressed as a percentage relative to allocated resources. Resource efficiency above 70-80% may introduce resource starvation risks.
View and optimize the resource allocation for the containers in a workload
For each container in the workload, the detail view displays:

- The current and recommended resource usage.
- A chart visualizing the historical resource usage.
- YAML snippets you can copy and paste into your configuration to improve your resource usage.
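As an illustration of the kind of snippet involved, a container's resource settings live in the pod template of the workload manifest. The names and values below are placeholders, not actual recommendations:

```yaml
# Hypothetical fragment of a Deployment pod template. The container
# name and resource values are placeholders for your recommended values.
spec:
  template:
    spec:
      containers:
        - name: example-container
          resources:
            requests:
              cpu: "2.5"     # recommended CPU, in cores
              memory: 4Gi    # recommended memory, in GiB
            limits:
              cpu: "3"
              memory: 5Gi
```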
Optimize your HPA resource allocation
If the workload has an associated horizontal pod autoscaler (HPA), the HPA Recommendation section lists compatible adjustments you can make to your HPA configuration to improve its resource allocation.
You can specify your HPA configuration by sending HPA metrics to Splunk Observability Cloud. If the HPA configuration does not populate in this section, you can manually specify the configuration with the following steps.
1. In the Type column, specify the scope:
   - Select Pod if your HPA's CPU utilization target applies to the CPU utilization of the pod as a whole. This is the only option supported for HPA v1 resources.
   - Select Container if your HPA's CPU utilization target applies to a particular container only. This capability is supported for HPA v2 resources.
2. In the Current target column, enter the percentage that matches the current CPU utilization target value from your HPA configuration file.
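For reference, the two scopes correspond to different metric types in a Kubernetes HPA manifest: a pod-wide target uses the Resource metric type, while a per-container target uses the ContainerResource metric type available in the autoscaling/v2 API. The names and target values in this sketch are placeholders:

```yaml
# Hypothetical HPA manifest showing both scopes. Workload and
# container names and target values are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Pod scope: the CPU target applies to the pod as a whole.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Container scope: the CPU target applies to a single container.
    - type: ContainerResource
      containerResource:
        name: cpu
        container: example-container
        target:
          type: Utilization
          averageUtilization: 70
```

The `averageUtilization` value is the percentage you would enter in the Current target column.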