Monitor and optimize your resource allocation for Kubernetes workloads
Access and use the Kubernetes Workload Optimization view.
Beta features described in this document are provided by Splunk to you "as is" without any warranties, maintenance and support, or service-level commitments. Splunk makes this beta feature available in its sole discretion and may discontinue it at any time. These documents are not yet publicly available and we ask that you keep such information confidential. Use of beta features is subject to the Splunk Pre-Release Agreement for Hosted Services.
You can use this view to:
- Gain insights into the efficiency of your CPU and memory resource allocations for your Kubernetes workloads.
- Identify resource over-provisioning that may be contributing to extra costs, or under-provisioning that may be causing performance or reliability issues.
- Optimize your resource allocations based on recommendations.
Prerequisites
To use workload optimization, you must meet the following requirements.
- You have configured data collection from Kubernetes. For instructions, see Collect Kubernetes data.
- You are using a supported Kubernetes distribution. For more information, see Supported Kubernetes distributions.
- You are using one of the following supported Kubernetes workload kinds:
  - Deployment
  - StatefulSet
  - DaemonSet
- All metrics that are collected by default with the Splunk Distribution of the OpenTelemetry Collector for Kubernetes are present in your data. No action is required if you have not deactivated any metrics. For the list of default metrics, see Collected metrics and dimensions for Kubernetes.
How workload optimization recommendations are calculated
Splunk Observability Cloud provides recommendations to optimize your resource allocation based on an algorithm that analyzes the last 14 days of your workload data. Recommendations are only provided when 90% or more of the last 24 hours of data is available.
The algorithm determines if a resource is under-provisioned or over-provisioned based on a defined threshold. It provides a recommendation to either increase or decrease your resource allocation to align with the threshold.
| Resource status | Threshold | Recommendation |
|---|---|---|
| Under-provisioned | The 95th percentile of your data points over a 14-day sample period reported greater than 85% resource usage. | Increase your CPU or memory settings to align with the 85% usage threshold. |
| Over-provisioned | The absolute maximum of your data points over a 14-day sample period reported less than 85% resource usage. | Decrease your CPU or memory settings to align with the 85% usage threshold. |
For example:

- A workload whose 95th percentile CPU usage exceeds the 85% threshold is considered under-provisioned. It needs more resources to perform optimally. To align with the 85% usage threshold, Splunk Observability Cloud recommends increasing your CPU settings, in this example by 10%. The recommendation is provided in cores.
- A workload with an absolute maximum memory usage of 40% is considered over-provisioned. It has more resources than necessary to perform optimally, and you might reduce costs by reducing your resource allocation. To align with the 85% usage threshold, Splunk Observability Cloud recommends reducing your memory settings by 45%. The recommendation is provided in GiB.
Recommendations are rounded to practical values. For example, consider a workload with 4 CPU cores. If the optimal value calculated by the algorithm is 4.57 cores, the recommendation is rounded to 4.50 cores. If the optimal value is 4.07 cores, the recommendation is rounded to 4.25 cores. Recommendations aren't provided for small scale-downs if you'd save less than one step (0.25 cores in this example).
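The threshold and rounding behavior described above can be sketched in code. This is an illustrative approximation, not the product's actual algorithm: the function names, the 0.25-core step, and the tie-breaking rule for small increases are assumptions inferred from the examples.

```python
# Illustrative sketch of the recommendation rules described above.
# The 85% threshold and 0.25-core step come from the examples in the
# text; the exact internal algorithm is not public.

THRESHOLD = 0.85   # 85% target usage
CORE_STEP = 0.25   # CPU recommendations round to quarter-core steps


def classify(p95_usage: float, max_usage: float) -> str:
    """Classify a workload from usage ratios in the range 0.0-1.0."""
    if p95_usage > THRESHOLD:
        return "under-provisioned"
    if max_usage < THRESHOLD:
        return "over-provisioned"
    return "ok"


def round_cores(optimal: float, current: float) -> float:
    """Round an optimal core count to a practical 0.25-core step."""
    rounded = round(optimal / CORE_STEP) * CORE_STEP
    # Assumption: if an increase would round back to the current value,
    # bump it one step so the recommendation is actionable.
    if optimal > current and rounded <= current:
        rounded = current + CORE_STEP
    return rounded


print(classify(p95_usage=0.95, max_usage=0.99))  # under-provisioned
print(round_cores(4.57, current=4.0))            # 4.5
print(round_cores(4.07, current=4.0))            # 4.25
```

With a current allocation of 4 cores, an optimal value of 4.57 rounds to the nearest step (4.50), while 4.07 rounds up to 4.25 so that the recommendation differs from the current setting, matching the examples in the text.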
Access the Workload Optimization view
To access this view, use one of the following methods:
- From the Splunk Observability Cloud main menu, select .
- From the new Kubernetes experience, select view all in the Workload optimization - starvation risk chart.
The following screenshot shows an example of the Workload Optimization view.
Monitor the resource allocation for all Kubernetes workloads
The Workload Optimization view displays:
- Panels with aggregate metrics that you can use to track the number, starvation risk, and resource footprint of all of the Kubernetes workloads in your environment.
- A Kubernetes Workloads table that lists the workloads in your environment, along with properties and metrics related to resource allocation.
The following sections describe how to use the panels in this view:
Monitor the number of total and processed workloads
The Workloads panel displays the number of Kubernetes workloads successfully processed by the Workload Optimization view. This number is the total number of workloads minus the following:
- Workloads that you added fewer than 24 hours ago. The Workload Optimization view processes data once every 24 hours.
- Workload kinds that are not supported by the Workload Optimization view.
- Workloads that were not processed due to an error.
Monitor the starvation risk of your workloads
The Workloads by Starvation Risk panel displays the number of workloads at high, medium, low, or minimal risk of running out of CPU or memory.
| Starvation risk category | Description |
|---|---|
| High | The container is using 95% or more of its limit settings. |
| Medium | The container meets at least one of the following conditions: |
| Low | The container's CPU or memory usage is lower than the target usage. |
| Minimal | None of the above conditions are detected. All containers have request settings for both CPU and memory, and neither of these resources has usage exceeding its target usage. |
Monitor the resource footprint of your workloads
The Resource Footprint panel displays the following metrics:
| Metric name | Description |
|---|---|
| Current | The sum of the request settings for the pods of your Kubernetes workloads. If the request value is not set, the number represents the sum of actual usage. |
| Recommended | The projected resource usage if all recommendations are applied. |
| Impact | The impact of applying the resource usage recommendations, in absolute units and as a percentage difference between the recommended and current usage. |
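As an arithmetic sketch of how Impact relates the Current and Recommended values, assuming a simple difference-and-percentage formula inferred from the descriptions above (the function name is hypothetical):

```python
# Hypothetical sketch of the Impact metric: the absolute and
# percentage change between the recommended and current footprint.

def impact(current: float, recommended: float) -> tuple[float, float]:
    """Return (absolute change, percent change) of the footprint."""
    absolute = recommended - current
    percent = absolute / current * 100
    return absolute, percent

# A scale-down from 10 GiB to 7.5 GiB:
print(impact(current=10.0, recommended=7.5))  # (-2.5, -25.0)
```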
Drill down to the workload optimization detail view
From the Workload Optimization view, select a workload name in the Kubernetes Workloads table to drill down to the detail view for the workload. You can use the detail view to analyze and optimize the resource allocation of a workload.
The following sections describe how to use the detail view:
Analyze the resource efficiency of a workload
The Efficiency Analysis section displays the following panels that you can use to analyze the resource efficiency of the workload:
- Resource Starvation Risk: The workload's risk of running out of CPU or memory. For more information on the risk categories, see Monitor the starvation risk of your workloads.
- Average Pod Count: The number of pod replicas running for this workload, averaged over the analysis period.
- Resource Footprint: The current resource usage, the resource usage if all recommendations are applied, and the impact of applying the resource usage recommendations. For details, see Monitor the resource footprint of your workloads.
- Resource Efficiency: The ratio of resource usage to resource allocation, expressed as a percentage relative to allocated resources. Resource efficiency above 70-80% may introduce resource starvation risks.
View and optimize the resource allocation for the containers in a workload
For each container in the workload, the detail view displays:

- The current and recommended resource usage.
- A chart visualizing the historical resource usage.
- YAML snippets you can copy and paste into your configuration to improve your resource usage.
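As an illustration of the kind of snippet involved, a container's resource settings live in the pod template of the workload manifest. The names and values below are placeholders, not actual recommendations:

```yaml
# Hypothetical fragment of a Deployment pod template. The container
# name and resource values are placeholders for your recommended values.
spec:
  template:
    spec:
      containers:
        - name: example-container
          resources:
            requests:
              cpu: "2.5"     # recommended CPU, in cores
              memory: 4Gi    # recommended memory, in GiB
            limits:
              cpu: "3"
              memory: 5Gi
```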
Optimize your HPA resource allocation
If the workload has an associated horizontal pod autoscaler (HPA), the HPA Recommendation section lists compatible adjustments you can make to your HPA configuration to improve its resource allocation.
You can specify your HPA configuration by sending HPA metrics to Splunk Observability Cloud. If the HPA configuration does not populate in this section, you can manually specify the configuration with the following steps.
1. In the Type column, specify the scope:
   - Select Pod if your HPA's CPU utilization target applies to the CPU utilization of the pod as a whole. This is the only option supported for HPA v1 resources.
   - Select Container if your HPA's CPU utilization target applies to a particular container only. This capability is supported for HPA v2 resources.
2. In the Current target column, enter the percentage that matches the current CPU utilization target value from your HPA configuration file.
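For reference, the two scopes correspond to different metric types in a Kubernetes HPA manifest: a pod-wide target uses the Resource metric type, while a per-container target uses the ContainerResource metric type available in the autoscaling/v2 API. The names and target values in this sketch are placeholders:

```yaml
# Hypothetical HPA manifest showing both scopes. Workload and
# container names and target values are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
    # Pod scope: the CPU target applies to the pod as a whole.
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Container scope: the CPU target applies to a single container.
    - type: ContainerResource
      containerResource:
        name: cpu
        container: example-container
        target:
          type: Utilization
          averageUtilization: 70
```

The `averageUtilization` value is the percentage you would enter in the Current target column.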