Splunk Operator for Kubernetes
Overview of the Splunk Operator for Kubernetes (SOK) and its general features and limitations.
The Splunk Operator for Kubernetes (SOK) provides a scalable, Kubernetes-native solution for deploying and managing Splunk Enterprise in private and public cloud environments. Built on Kubernetes best practices, it leverages custom resource objects to automate deployment, streamline operations, and ensure high availability.
The true power of SOK is realized in multi-environment, large-scale deployments, particularly when supporting multiple use cases such as setting up non-production and production environments, deploying dedicated search heads for distinct functions (e.g., security and business operations), and managing distinct environments for different teams within the same organization. This makes it especially valuable for organizations with mature DevOps practices and established Kubernetes infrastructure.
This Splunk Validated Architecture outlines deployment options and best practices across key areas, including security, App Framework, premium apps, and data ingestion, to help organizations fully leverage the capabilities of SOK.
General Features and Limitations
Benefits
- Reduce manual work and daily management with built-in automation, leveraging Kubernetes’ reliability for efficient Splunk operation.
- Seamless scalability by using Horizontal Pod Autoscalers (HPA) to automatically adjust indexer capacity based on real-time CPU usage.
- Enhanced high availability with Pod Disruption Budgets (PDBs) that maintain service continuity during node disruptions, for example by enforcing that a minimum of three standalone pods remain active during maintenance events or other disruptions (see the sketch after this list).
- When configuring Universal Forwarders with SOK, it is not necessary to specify multiple indexer targets explicitly. The Kubernetes service associated with the Splunk indexer cluster deployed by SOK inherently provides load balancing across available indexers. The Universal Forwarders can rely on the Kubernetes service to distribute data efficiently, simplifying deployment and ensuring scalable data ingestion.
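For illustration, the following is a minimal Pod Disruption Budget sketch that enforces the three-standalone-pod example above. The namespace, resource name, and pod label selector are assumptions and should be verified against the labels the operator applies to your pods.

```yaml
# Minimal PDB sketch: keep at least three standalone pods available during
# voluntary disruptions. Namespace, name, and labels are illustrative; check
# the labels on your pods (kubectl get pods --show-labels) before applying.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: splunk-standalone-pdb
  namespace: splunk
spec:
  minAvailable: 3
  selector:
    matchLabels:
      app.kubernetes.io/instance: splunk-example-standalone
```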
Limitations
- SmartStore is mandatory for production deployments, limiting flexibility in storage configurations.
- No reduction in hardware requirements compared to traditional VM-based Splunk setups; resource demands remain the same.
- In single-site indexer clusters, if the number of replicas specified in the IndexerCluster CR is set below the replication factor (RF) defined on the Cluster Manager, SOK will automatically scale the cluster up to meet the replication factor.
- SOK does not offer a dedicated Custom Resource Definition (CRD) for heavy forwarders or deployment servers. However, a standalone instance CRD can be repurposed to act as a heavy forwarder or deployment server, allowing flexibility in deployment while maintaining SOK compatibility (see the example after this list).
- SOK does not support the splunk/splunkforwarder image. As a result, Universal Forwarders must be deployed and managed outside of the SOK’s orchestration scope. These forwarders can still forward data into the SOK-managed indexer layer via standard Splunk configurations.
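As an illustration of the repurposing approach above, the following is a minimal Standalone custom resource sketch that could serve as a heavy forwarder. The names are hypothetical, and the forwarding configuration itself (outputs.conf) would still need to be delivered as an app or a custom image.

```yaml
# Minimal sketch: a Standalone CR repurposed as a heavy forwarder.
# Names are illustrative; outputs.conf (forwarding targets) is not managed by
# this CR and would typically be delivered via the App Framework or a custom image.
apiVersion: enterprise.splunk.com/v4
kind: Standalone
metadata:
  name: heavy-forwarder
  namespace: splunk
spec:
  replicas: 1
```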
Deployment Options
Details on various Splunk Operator for Kubernetes (SOK) deployment configurations, including single-site, multi-site, and different namespace strategies.
Single-Site Deployment
SOK fully supports the following single-site deployments defined under Splunk Validated Architectures:
- Single Server Deployment (S1)
- Distributed Clustered Deployment - Single Site (C1 / C11)
- Distributed Clustered Deployment with SHC - Single Site (C3 / C13)
The Distributed Non-Clustered Deployment (D1 / D11) architecture is not recommended when using SOK, as deployments without indexer clustering lack the redundancy normally expected within Kubernetes environments.
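For reference, the following is a minimal sketch of a C3-style deployment expressed as SOK custom resources. Names, namespace, and replica counts are illustrative, and a production deployment would also require SmartStore configuration (see Storage Considerations).

```yaml
# Minimal C3-style sketch: one Cluster Manager, one indexer cluster, one SHC.
# Names and counts are illustrative; SmartStore must be added for production.
apiVersion: enterprise.splunk.com/v4
kind: ClusterManager
metadata:
  name: cm
  namespace: splunk
---
apiVersion: enterprise.splunk.com/v4
kind: IndexerCluster
metadata:
  name: idxc
  namespace: splunk
spec:
  replicas: 3
  clusterManagerRef:
    name: cm
---
apiVersion: enterprise.splunk.com/v4
kind: SearchHeadCluster
metadata:
  name: shc
  namespace: splunk
spec:
  replicas: 3
  clusterManagerRef:
    name: cm
```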
Available Helm Chart Support:
- Single Server Deployment (S1)
- Distributed Clustered Deployment + SHC - Single Site (C3)
Benefits
- Centralized deployment within a single Kubernetes cluster reduces operational complexity.
- Data is accessed and processed locally, minimizing latency and improving query response times.
- Consolidation of resources in one site can reduce infrastructure and networking costs.
Limitations
- Despite redundancy, the entire deployment is confined to one site, which may be vulnerable to site-wide outages or disasters.
- Does not provide disaster recovery across multiple locations.
- Maintenance or upgrades may require careful planning to avoid service interruptions due to the single-site nature.
Multi-Site Deployment
SOK fully supports all multi-site deployments defined under Splunk Validated Architectures:
- Distributed Clustered Deployment - Multisite (M2 / M12)
- Distributed Clustered Deployment with SHC - Multisite (M3 / M13)
- Distributed Clustered Deployment + Stretched SHC - Multisite (M4 / M14)
Helm Chart Support
- Distributed Clustered Deployment + SHC - Multi-Site (M4)
Benefits
- Highly available across multiple availability zones.
- Some operations, such as Splunk upgrades and scaling up resources, are performed per site, which mitigates the risk of impact to the whole cluster.
- Dedicated indexer services are created per site, allowing forwarders to send events to indexers in the same zone and avoiding the cost of cross-zone traffic (see the sketch after this list).
- Allows site affinity to be configured for the SHC.
- Scaling up of indexers is handled smoothly on a per-zone basis.
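As a rough sketch of the per-site pattern referenced above, one IndexerCluster resource can be created per site, each referencing the same Cluster Manager. The names, sites, and inline defaults shown here are illustrative and should be verified against the operator's multisite examples for your SOK version.

```yaml
# Multisite sketch: one IndexerCluster per site, both joined to the same
# Cluster Manager. Names, sites, and the inline defaults are illustrative.
apiVersion: enterprise.splunk.com/v4
kind: IndexerCluster
metadata:
  name: example-site1
  namespace: splunk
spec:
  replicas: 3
  clusterManagerRef:
    name: example
  defaults: |-
    splunk:
      multisite_master: splunk-example-cluster-manager-service
      site: site1
---
apiVersion: enterprise.splunk.com/v4
kind: IndexerCluster
metadata:
  name: example-site2
  namespace: splunk
spec:
  replicas: 3
  clusterManagerRef:
    name: example
  defaults: |-
    splunk:
      multisite_master: splunk-example-cluster-manager-service
      site: site2
```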
Limitations
- Deployed within one region only.
- All the IndexerCluster resources must be located in the same K8s namespace.
- A single object store, accessible across zones, is required to meet the SmartStore requirement.
- Cluster Manager high availability is managed by Kubernetes; an active/standby option is not yet available with SOK.
- Management server high availability is limited to a single zone.
- Search head clusters can only be deployed within one zone.
One SOK Instance, Multiple Kubernetes Namespaces
Architecture and Topology
- A single SOK instance is configured with cluster-wide permissions to monitor and manage multiple Kubernetes namespaces, each containing a distinct Splunk deployment.
- Each Splunk cluster is deployed in its own dedicated namespace, ensuring separation of resources and configurations.
- The License Manager must be externally deployed and managed by the customer outside the Kubernetes cluster.
- Full isolation between Splunk clusters through namespace-level segregation.
- Each Splunk cluster maintains its own Kubernetes secret object.
- Centralized management with independent operational domains per namespace.
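The cluster-wide scope is typically controlled through the operator Deployment's WATCH_NAMESPACE environment variable. The sketch below is a kustomize-style patch; the Deployment and container names are assumed from a default SOK 2.x install and should be verified against your installation manifests.

```yaml
# Sketch of a strategic-merge patch for the operator Deployment.
# An empty WATCH_NAMESPACE makes the operator watch all namespaces (cluster-wide);
# setting it to a single namespace gives the per-namespace scoping used in the
# "Multiple SOK Instances" pattern. Deployment and container names are assumed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: splunk-operator-controller-manager
  namespace: splunk-operator
spec:
  template:
    spec:
      containers:
        - name: manager
          env:
            - name: WATCH_NAMESPACE
              value: ""   # "" = all namespaces; set to "team-a" to scope to one namespace
```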
Use Cases
- Centralized Operations Team: A single team responsible for operating and maintaining SOK across the cluster.
- Decentralized Access Control: Different business teams have isolated access to their respective Splunk clusters and namespaces, ensuring data and access segregation.
Benefits
- A single SOK instance handles all deployments across namespaces, reducing the need to deploy and manage multiple SOK instances.
- Each Splunk cluster operates in its own namespace, ensuring strong separation of resources, secrets, and configurations.
- Easily extendable to new teams or environments by creating additional namespaces and deploying new clusters under the same SOK instance.
- Centralized governance allows a single operations team to manage SOK, while still maintaining isolation for each Splunk deployment.
- Kubernetes RBAC policies can restrict access to specific namespaces, ensuring that only authorized teams can view or modify their deployments.
Limitations
- Since a single SOK instance manages all namespaces, upgrades apply to all Splunk clusters simultaneously, limiting flexibility for tenant-specific upgrade cycles.
- SOK must be granted cluster-wide permissions, which may raise security concerns in highly regulated environments.
- Licensing cannot be managed within Kubernetes and must be externally hosted and maintained, adding architectural complexity and management overhead.
- Although namespaces provide logical isolation, runtime isolation is still limited by the shared nature of the Kubernetes cluster (e.g., noisy neighbors).
Multiple SOK Instances, Multiple Kubernetes Namespaces
Architecture and Topology
- Multiple namespaces exist within a single Kubernetes cluster
- Each namespace hosts:
  - One SOK instance
  - One or more Splunk Validated Architecture (SVA) deployments
- Each Splunk deployment is assigned its own dedicated Kubernetes secret object
- A single License Manager can be externally deployed and managed outside of SOK, or a License Manager can be deployed per namespace.
- Complete operational and resource isolation between namespaces and their respective Splunk deployments
- Each namespace operates independently, with no shared components or dependencies
Use Cases
- Independent Team Management
- Teams that need to fully manage their own Splunk deployments, including access to the SOK and infrastructure setup
- Tenants requiring strict separation of management, operations, and data across different Splunk environments
Benefits
- Complete separation of resources, configurations, and access per tenant or team, which enhances security and fault containment between deployments.
- Easy to onboard new teams or business units by provisioning a new namespace with its own SOK instance and SVAs.
- Kubernetes RBAC can be applied at the namespace level to enforce fine-grained access and operational boundaries.
Limitations
- Running multiple SOK instances and Splunk components in separate namespaces can increase CPU/memory usage and cluster complexity.
- Each namespace needs separate management (SOK upgrades, secret management, troubleshooting), which may lead to duplicated effort.
- A License Manager cannot be shared across namespaces from within the cluster; a shared License Manager must be externally deployed, introducing complexity and potential cost.
- Although namespaces are isolated, they still share the underlying cluster resources (CPU, memory, storage limits), which can lead to contention.
- Without centralized governance, deployments may diverge in configuration, security standards, or monitoring practices.
- As the number of namespaces and deployments grows, managing them at scale (e.g., monitoring, upgrades, security patching) becomes more challenging.
One SOK instance, One Kubernetes Namespace
Architecture and Topology
- Single Kubernetes Cluster
- Single Namespace
- One SOK instance managing multiple Splunk clusters within the same namespace
- Example deployments within the namespace:
  - Two distinct SVAs (Splunk Validated Architectures)
    - 2 Search Head Clusters + 1 Indexer Cluster
    - 1 Search Head Cluster + 2 Indexer Clusters
- License Manager can be deployed:
  - Within the same namespace, or
  - Externally managed by the customer outside the K8s environment
- Centralized Management: All Splunk components are deployed and managed within a single namespace, simplifying operations and upgrades.
- Inter-Cluster Communication: Splunk components can freely communicate across clusters (e.g., search heads querying multiple indexer clusters).
- Secret Management:
  - Different Kubernetes secret objects can be used for each SVA.
  - SOK creates a copy of the global secret per cluster, which can then be manually modified if customization is required.
- No Isolation Required: The architecture is designed for use cases where isolation between Splunk clusters is not a requirement.
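The following is a minimal sketch of one such shared topology: an ad-hoc SHC and an ES SHC attached to the same indexer cluster in a single namespace. All names are illustrative.

```yaml
# Two search head clusters sharing one indexer cluster in the same namespace.
# Names are illustrative; a ClusterManager named "cm" and its IndexerCluster
# are assumed to already exist in this namespace.
apiVersion: enterprise.splunk.com/v4
kind: SearchHeadCluster
metadata:
  name: adhoc
  namespace: splunk
spec:
  replicas: 3
  clusterManagerRef:
    name: cm
---
apiVersion: enterprise.splunk.com/v4
kind: SearchHeadCluster
metadata:
  name: es
  namespace: splunk
spec:
  replicas: 3
  clusterManagerRef:
    name: cm
```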
Use Cases
- Teams requiring cross-access between multiple Splunk deployments (e.g., ES and core SHC setups).
- Example setups include:
  - One Ad-hoc Search Head + One ES Search Head Cluster + Shared Indexer Set
  - Two Indexer Clusters + One Shared Search Head Cluster
  - Multiple SVA setups within the same namespace
Benefits
- All Splunk components are managed centrally within one namespace, reducing deployment complexity and operational overhead.
- A single SOK instance simplifies upgrades and configuration management across all Splunk clusters.
- Although secrets are copied from a global object, they can be manually modified per cluster to accommodate different credentials or configurations.
- Well-suited for internal teams or business units that require collaboration and shared access across different Splunk setups (e.g., core + ES).
- Capable of handling multiple SVAs within the same namespace, including shared and independent search head/indexer clusters.
Limitations
- Limited access control granularity if multiple teams operate within the same namespace — RBAC and role isolation are harder to enforce.
- Secret objects require manual editing per cluster after the initial copy, increasing administrative effort and risk of inconsistency.
- All clusters are affected by SOK-level changes, making staggered upgrades or cluster-specific versioning difficult.
- As more clusters are added to the same namespace, resource contention and namespace-level limits (e.g., number of objects) may become bottlenecks.
Storage Considerations
Information on storage requirements for Splunk Operator for Kubernetes (SOK), focusing on SmartStore and persistent volumes.
SOK mandates the use of SmartStore, a feature that decouples compute and storage by leveraging remote object storage for indexed data. This approach enhances scalability and storage efficiency by offloading warm and cold data to external storage while keeping hot data on local persistent volumes. Storage provisioning still relies on Kubernetes Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) for hot data and configuration persistence, but SmartStore fundamentally changes the storage architecture and operational considerations. SmartStore has its own Splunk Validated Architecture, which should be reviewed alongside this document.
Key Considerations:
- SOK requires SmartStore to be enabled, which uses remote object storage for warm and cold data, reducing local storage needs and improving scalability.
- Local persistent volumes are still required for hot buckets and cache storage to ensure low-latency access and indexing performance.
- Select and configure compatible remote object storage solutions (e.g., S3-compatible storage) that meet performance, durability, and security requirements, as described in the Splunk SmartStore documentation.
- Choose Kubernetes storage classes optimized for high IOPS and low latency to support hot data storage.
- Use appropriate access modes for local persistent volumes; remote object storage access is managed by SmartStore.
- Optimize local storage for fast access to hot data; monitor network and object storage performance for cold data retrieval.
- Critical data must not reside on ephemeral storage, as it does not survive pod rescheduling or node failures.
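The following is a minimal sketch of how SmartStore and the hot/configuration volumes are typically declared on a SOK custom resource. The bucket, endpoint, secret, and storage class names are assumptions and must be replaced with values from your environment.

```yaml
# SmartStore and persistent volume sketch on a ClusterManager CR.
# Bucket, endpoint, secretRef, and storageClassName are illustrative.
apiVersion: enterprise.splunk.com/v4
kind: ClusterManager
metadata:
  name: cm
  namespace: splunk
spec:
  smartstore:
    defaults:
      volumeName: s3-remote
    indexes:
      - name: main
        volumeName: s3-remote
        remotePath: $_index_name
    volumes:
      - name: s3-remote
        path: my-smartstore-bucket/indexes
        endpoint: https://s3.us-east-1.amazonaws.com
        secretRef: s3-secret          # Kubernetes secret holding the object store credentials
  etcVolumeStorageConfig:
    storageClassName: gp3             # choose a low-latency, high-IOPS class for hot data
    storageCapacity: 10Gi
  varVolumeStorageConfig:
    storageClassName: gp3
    storageCapacity: 100Gi
```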
Security Principles
Guidelines for securing Splunk deployments with SOK, covering ingress, network security, password management, and FIPS compliance.
Ingress and Network Security
Protecting Ingress in Kubernetes against security threats.
Ingress in Kubernetes allows external traffic to securely access internal services and pods. In a Splunk deployment, common use cases include accessing Splunk Web via browser, forwarding data into the cluster, or calling REST APIs from outside.
It is strongly recommended that all communications with external systems be encrypted using SSL/TLS to protect data in transit. Refer to Splunk’s official security documentation for detailed guidance.
There are two modes of ingress encryption termination:
- End-to-End Termination: Data is encrypted by the forwarder and only decrypted by the indexer. The entire path, including gateways and virtual services, carries the data securely without modification.
- Gateway Termination: Data is encrypted by the forwarder but decrypted at the ingress gateway before being passed along in plain text to internal services.
Forwarder Management
Forwarding Data Ingress supports both end-to-end termination and gateway termination.
Key Considerations:
- When configuring ingress for use with Splunk forwarders, the configured ingress load balancer must resolve to two or more IPs so that the forwarders' automatic load balancing capability is preserved.
- For Ingress we recommend using separate ports for encrypted and non-encrypted traffic.
- Indexer Discovery is not supported on a Kubernetes cluster. Instead, the ingress controllers are responsible for connecting forwarders to the peer nodes in indexer clusters.
Forwarding Data Ingress:
End-to-End Termination:
The forwarder encrypts the data and only the indexer decrypts it; the data remains encrypted along the entire path, passing through the Gateway and Virtual Service unchanged.
When using TLS for ingress, we recommend adding an additional port for secure communication. By default, port 9997 is assigned to non-encrypted traffic, and any other available port can be used for secure communications.
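The following is a rough sketch of end-to-end termination using an Istio Gateway in TLS passthrough mode (other ingress controllers follow the same principle). The hostname, namespace, port, and the generated indexer service name are assumptions to verify in your environment, and the indexers must expose a TLS-enabled input on the chosen port.

```yaml
# TLS passthrough sketch for forwarder (S2S) traffic: the gateway forwards the
# encrypted stream untouched and the indexers terminate TLS on a dedicated port.
# Hostname, port, and service name are illustrative.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: splunk-s2s-gw
  namespace: splunk
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 9998
        name: tls-s2s
        protocol: TLS
      tls:
        mode: PASSTHROUGH
      hosts:
        - "splunk.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: splunk-s2s
  namespace: splunk
spec:
  hosts:
    - "splunk.example.com"
  gateways:
    - splunk-s2s-gw
  tls:
    - match:
        - port: 9998
          sniHosts:
            - "splunk.example.com"
      route:
        - destination:
            host: splunk-example-indexer-service   # service created by SOK; name varies per deployment
            port:
              number: 9998                         # indexers must have a TLS input on this port
```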
Benefits
- End to end encryption: Confidentiality is preserved throughout the entire transport path, including through the Gateway and Virtual Services.
- Compliance: This approach helps meet regulatory requirements (e.g., HIPAA, GDPR) by ensuring data is encrypted in transit across all network boundaries.
- Using a dedicated port for encrypted traffic (different from the default 9997) allows for clear separation between secure and non-secure channels which helps with traffic management and firewall rules.
Limitations
- More operational effort is required to configure TLS certificates on the forwarder as well as on any Splunk Enterprise indexers, cluster peers, or standalone instances.
- cert-manager is not yet supported, so certificate lifecycle management must be handled manually (support is planned).
Gateway Termination:
The forwarder encrypts the data and sends it to the Gateway, which decrypts the data before handing it to the Virtual Service.
Note that in this case the forwarder's outputs.conf should be configured for TLS, while the indexer's inputs.conf should be configured to accept non-encrypted traffic.
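The following is a rough sketch of the same gateway, this time terminating TLS (Istio shown for illustration). The certificate secret, hostname, and service name are assumptions; once TLS is terminated, the traffic is routed as plain TCP to the default 9997 input.

```yaml
# Gateway-termination sketch: TLS ends at the ingress gateway and plain TCP is
# routed to the indexers on 9997. Certificate secret, hostname, and service
# name are illustrative.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: splunk-s2s-term-gw
  namespace: splunk
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 9998
        name: tls-s2s
        protocol: TLS
      tls:
        mode: SIMPLE
        credentialName: splunk-s2s-cert   # TLS cert/key stored as a Kubernetes secret
      hosts:
        - "splunk.example.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: splunk-s2s-term
  namespace: splunk
spec:
  hosts:
    - "splunk.example.com"
  gateways:
    - splunk-s2s-term-gw
  tcp:
    - match:
        - port: 9998
      route:
        - destination:
            host: splunk-example-indexer-service   # service created by SOK; name varies per deployment
            port:
              number: 9997                         # default non-encrypted S2S input
```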
Benefits
- Simplified certificate deployment, renewal and rotation especially in large clusters as TLS is handled on the Gateway only.
- Scaling indexers is easier since you don’t need to provision or manage TLS certificates for each indexer.
Limitations
- No end-to-end encryption.
- The forwarder still needs to be configured for TLS, which means managing certificates and secure outputs on the sending side.
- The mixed configuration, in which outputs.conf on the forwarder is set to use TLS while inputs.conf on the indexer accepts non-TLS traffic, can introduce a risk of misalignment or misconfiguration.
Splunk Web and REST API
Splunk Web and the REST API only support end-to-end TLS.
Splunk Web and REST API only support end-to-end TLS, meaning that encrypted traffic from the client must be terminated directly at the Splunk service, not at the Gateway. In this configuration, the Gateway is used in TLS passthrough mode, forwarding encrypted traffic without decrypting it. This approach ensures full encryption from the client to Splunk, enhancing security and compliance, but it also prevents use of Layer 7 functions like HTTP routing, inspection, or authentication on this traffic.
Password Management
SOK utilizes a global Kubernetes secret object to manage credentials.
SOK utilizes a global Kubernetes secret object (splunk-<namespace>-secret) to centrally manage authentication credentials across all Splunk Enterprise custom resources (CRs) within a namespace. This secret object contains the following configurable tokens:
- HEC token: HTTP Event Collector token for data ingestion
- Admin password: default administrator password
- pass4Symmkey: Shared authentication key for internal communication
- IDXC pass4Symmkey: Indexer clustering secret
- SHC pass4Symmkey: Search head clustering secret
The secret is volume-mounted on all pods within the namespace and is automatically consumed by SOK during instance initialization and updates. All secrets must be maintained within this object. Modifications via the Splunk UI or CLI are not supported and can result in misalignment.
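For reference, the following is a sketch of the global secret's shape. The key names shown reflect the operator's documented secret tokens but should be verified against your SOK version, and the values are placeholders.

```yaml
# Sketch of the global secret (splunk-<namespace>-secret) for a namespace named
# "splunk". Key names should be verified against your operator version; values
# are placeholders. Manage all changes through this object, not the Splunk UI/CLI.
apiVersion: v1
kind: Secret
metadata:
  name: splunk-splunk-secret
  namespace: splunk
type: Opaque
stringData:
  hec_token: REPLACE-WITH-HEC-TOKEN
  password: REPLACE-WITH-ADMIN-PASSWORD
  pass4SymmKey: REPLACE-WITH-GENERAL-KEY
  idxc_secret: REPLACE-WITH-IDXC-KEY
  shc_secret: REPLACE-WITH-SHC-KEY
```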
Benefits
- Centralized secret management ensures consistency across all Splunk components in the namespace by maintaining a single source of truth.
- Simplified automation by integrating with SOK’s workflow for initializing and updating Splunk instances.
- Namespace isolation by scoping secrets to specific namespaces; this is primarily beneficial when SOK is managing multiple Splunk deployments across different namespaces.
Limitations
- No built-in rotation or expiry of secrets; manual intervention is needed to rotate or expire them. If secrets are modified directly via the Splunk UI or CLI, SOK will not reflect these changes, potentially leading to misconfiguration.
- Service disruptions are expected during secret updates.
Compliance: FIPS 140-3 Enabled Cluster
SOK is fully certified to run on FIPS 140-3 compliant clusters.
Application Lifecycle Management
Information on managing Splunk applications and premium apps within SOK, including the App Framework and support for specific Splunk products.
App Framework
The App Framework automates the distribution and management of Splunk apps.
The SOK App Framework is a feature of SOK that automates the distribution and management of Splunk apps across different types of Splunk resources, such as indexer clusters, search head clusters, and standalone instances. It allows you to define app sources hosted in remote object storage and uses a declarative approach to fetch, stage, and deploy apps to the appropriate pods. This enables centralized app lifecycle management, consistent configuration across environments, and seamless integration with CI/CD pipelines, helping teams scale Splunk deployments efficiently in Kubernetes-native environments.
Key Considerations:
- The App Framework relies on external S3-compatible object storage (like AWS S3, Azure Blob Storage, or GCP Cloud Storage) to host Splunk apps and add-ons. This necessitates designing and implementing a robust, secure, and highly available external storage solution as a prerequisite. Implementing strict read-only access for SOK to the app storage location is a critical security best practice to maintain the integrity of the stored data. Secure connections using a minimum of TLS 1.2 should be enforced.
- The framework's use of external storage aligns well with CI/CD pipelines, allowing for automated testing, versioning, and deployment of Splunk apps. The design should incorporate how app updates will be managed and pushed to the storage bucket.
- The SOK pod itself requires a persistent storage volume. The design needs to account for this requirement to ensure SOK’s state is maintained.
- Consider the methods for SOK and Splunk instances to authenticate with the external storage, such as using Kubernetes Secrets with static credentials, IAM roles (for AWS), Managed Identity (for Azure), or Workload Identity (for GCP). Workload Identity is often preferred for enhanced security by avoiding the need to store credentials in secrets.
- Ensure the SOK version is compatible with the desired Splunk Enterprise version and the Kubernetes distribution being used. Reviewing release notes and compatibility matrices is essential during the design phase.
- SOK, including the App Framework, leverages the Kubernetes Operator pattern and custom resources for declarative management of Splunk deployments. The solution design should embrace this declarative approach for consistency and automation.
- Ingest Actions via App Framework: Rulesets and filtering logic can be enabled using the App Framework to manage Ingest Actions centrally. This allows for refined control over what data is indexed, routed, or dropped at the ingestion point, supporting use cases such as metadata enrichment or event routing directly from the heavy forwarder or standalone instance.
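To make the declarative model above concrete, the following is a sketch of an App Framework configuration on a SearchHeadCluster resource. The bucket, path, secret, and app source names are assumptions, and field names should be checked against the App Framework documentation for your SOK version.

```yaml
# App Framework sketch: apps staged in S3 are fetched by the operator and pushed
# to SHC members through the Deployer (scope: cluster). Bucket, paths, and names
# are illustrative.
apiVersion: enterprise.splunk.com/v4
kind: SearchHeadCluster
metadata:
  name: shc
  namespace: splunk
spec:
  replicas: 3
  clusterManagerRef:
    name: cm
  appRepo:
    appsRepoPollIntervalSeconds: 600
    defaults:
      volumeName: app-store
      scope: cluster            # distributed to SHC members via the Deployer
    appSources:
      - name: search-apps
        location: shc-apps/
    volumes:
      - name: app-store
        storageType: s3
        provider: aws
        path: my-app-bucket/apps
        endpoint: https://s3.us-east-1.amazonaws.com
        region: us-east-1
        secretRef: app-s3-secret   # omit in favor of IAM/Workload Identity where supported
```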
Benefits
- The framework streamlines the process of deploying Splunk apps and add-ons by fetching them from a central external location, abstracting the need for direct filesystem access within containers. In newer versions (from SOK 2.0), the ability to update apps without requiring pod restarts enhances operational efficiency and reduces downtime. It is particularly beneficial for managing apps in scaled-out Splunk environments like Search Head and Indexer Clusters, handling content distribution.
- The framework integrates with Splunk Enterprise custom resources, allowing for centralized definition and management of the desired state of the Splunk deployment, including app configurations.
- By externalizing app management, the App Framework enables greater automation of deployment and configuration workflows.
Limitations
- The App Framework does not support the automatic deletion of applications. Manual intervention is required to remove apps from pods if they are removed from the external storage.
- Deploying apps to certain components like the cluster manager or standalone search heads might not trigger automatic restarts, requiring manual steps.
- The App Framework itself does not preview, analyze, or verify the versions or contents of the Splunk apps and add-ons. The administrator is responsible for ensuring app compatibility and validity.
- The framework might deploy apps immediately upon detecting changes in the external storage, which may not always be the desired behavior for controlled rollouts.
- Support for certain premium Splunk apps, such as Splunk Enterprise Security (ES), can have version dependencies and may require additional manual steps for deploying supplemental components.
- The requirement for external object storage adds a dependency on external infrastructure and necessitates managing credentials and network connectivity.
Premium Apps
Splunk Premium Apps can interact with Splunk Operator for Kubernetes.
Splunk Enterprise Security (ES)
Splunk ES is supported for deployment automation with SOK in the following architectures:
- Single Server Deployment (S1)
- Distributed Clustered Deployment - Single Site (C1 / C11)
- Distributed Clustered Deployment + SHC - Single Site (C3 / C13)
Key Considerations:
- For standalone Splunk instances and standalone search heads, SOK will install Splunk ES and all associated domain add-ons (DAs) and supporting add-ons (SAs).
- When installing ES on a search head cluster, SOK will automatically deploy the app through the Deployer by running the essinstall command and pushing the cluster bundle from the Deployer to the search heads.
- When installing ES in an indexer clustering environment through SOK, it is necessary to manually extract the supplemental Splunk_TA_ForIndexers app from the ES package and deploy it to the indexer cluster members via the cluster manager and the App Framework (as sketched below).
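The following is a sketch of that indexer-side step: once Splunk_TA_ForIndexers has been extracted from the ES package and uploaded to the app storage location, it can be declared as a cluster-scoped app source on the Cluster Manager. Names and paths are illustrative.

```yaml
# Sketch: distributing Splunk_TA_ForIndexers to indexer cluster members via the
# Cluster Manager and App Framework. Bucket, paths, and names are illustrative.
apiVersion: enterprise.splunk.com/v4
kind: ClusterManager
metadata:
  name: cm
  namespace: splunk
spec:
  appRepo:
    defaults:
      volumeName: app-store
      scope: cluster                    # pushed to peers via the cluster bundle
    appSources:
      - name: es-indexer-ta
        location: es-ta-for-indexers/   # contains the extracted Splunk_TA_ForIndexers package
    volumes:
      - name: app-store
        storageType: s3
        provider: aws
        path: my-app-bucket/apps
        endpoint: https://s3.us-east-1.amazonaws.com
        secretRef: app-s3-secret
```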
Benefits
- It is possible to automate deployment of ES with SOK.
- You can scale horizontally much easier using Kubernetes and SOK for the SHC nodes.
- Multiple architectures are supported.
Limitations
- SOK manages the SHC lifecycle but does not fully abstract all components involved with ES.
- Auto-scaling is limited; Splunk pods are mostly stateful and horizontal scaling isn't always linear.
Splunk IT Service Intelligence (ITSI)
Splunk ITSI is not currently supported for deployment with SOK; this may change in future versions. ITSI search heads can be deployed on physical hardware or VMs and configured to use resources, such as indexers, inside the Kubernetes environment. This is the only workaround available at the current time.
Benefits
- It is possible to deploy ITSI alongside SOK-managed environments.
Limitations
- ITSI search heads have to be managed separately from SOK resources, adding operational overhead.
Resilience and Scaling
Details on how Splunk Operator for Kubernetes (SOK) ensures high availability, fault tolerance, and scalability for Splunk deployments.
SOK incorporates resilience features designed to ensure high availability, fault tolerance, and operational continuity of Splunk deployments within Kubernetes environments. These features help maintain service uptime and data integrity despite failures or disruptions in the underlying infrastructure.
Key Considerations
- SOK leverages Kubernetes native capabilities such as StatefulSets and Persistent Volumes to maintain stable identities and persistent storage for Splunk pods, which is critical for resilience.
- SOK monitors the health of Splunk components and automatically restarts or replaces failed pods to minimize downtime.
- The use of persistent storage ensures that data is retained across pod restarts and node failures, preventing data loss.
- Horizontal Pod Autoscaler (HPA) is supported and will automatically scale the number of pod replicas based on observed CPU/memory usage or custom metrics to handle varying workloads.
- Pod Disruption Budgets (PDBs) are honored by SOK and can define minimum available pods during voluntary disruptions (e.g., node maintenance) to maintain service continuity.
- Pod Health Checks are implemented using liveness and readiness probes to monitor pod health and automatically restart or remove unhealthy pods.
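For illustration, the following is a sketch of an HPA targeting an IndexerCluster resource, assuming the CRD exposes the scale subresource as implied by the operator's HPA support. Names and thresholds are illustrative and should be tuned (and paired with a PDB) for your workload.

```yaml
# HPA sketch scaling indexer replicas on CPU utilization. Target name, bounds,
# and threshold are illustrative; verify that your SOK version exposes the
# scale subresource on the IndexerCluster CRD.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: idxc-hpa
  namespace: splunk
spec:
  scaleTargetRef:
    apiVersion: enterprise.splunk.com/v4
    kind: IndexerCluster
    name: idxc
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```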
Benefits
- Automatic pod recovery and Kubernetes orchestration ensure Splunk services remain available with minimal manual intervention.
- Persistent volumes safeguard indexed data and configuration, enabling recovery from node or pod failures without data loss.
- SOK’s self-healing capabilities reduce downtime and operational overhead.
- Resilience is enhanced by the ability to scale out Splunk components, balancing load and improving fault tolerance.
- HPA adjusts pod counts in response to workload changes, maintaining performance and resource efficiency.
- PDBs prevent simultaneous pod disruptions, ensuring Splunk services remain available during planned maintenance.
- Health checks enable Kubernetes to detect and recover from pod failures without manual intervention.
Limitations
- Resilience is contingent on the underlying Kubernetes cluster’s health and configuration; cluster-level failures can impact Splunk availability.
- The durability of data depends on the storage solution used; not all storage classes provide replication or high availability features.
- Insufficient CPU, memory, or storage resources can impair SOK’s ability to maintain resilience.
- While SOK handles common failure modes, complex or cascading failures may require manual intervention.
- HPA effectiveness depends on accurate and timely metrics; delays or inaccuracies can cause suboptimal scaling.
- HPA thresholds should be tuned to avoid oscillations and ensure smooth scaling.
- Overly strict PDBs can delay or prevent node upgrades or scaling operations, impacting cluster agility.
- Improperly configured probes may cause false positives, leading to unnecessary pod restarts.