Troubleshoot Splunk POD

Diagnose and resolve issues with Splunk Kubernetes clusters deployed using the Kubernetes Installer.

This topic covers diagnostic commands, log retrieval procedures, and solutions for common deployment scenarios.

Diagnostic tools

Troubleshoot the cluster using Kubernetes installer commands or open a kubectl debug session from the bastion node.

Installer status commands

Use these commands to get a high-level overview of the cluster state directly from the installer:

  • List all pods:
    CODE
    ./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -status
  • List worker nodes:
    CODE
    ./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -status.workers

Access the kubectl debug shell

Standard Kubernetes commands are available through a debug shell on the bastion node.

  1. Start a debug session:
    CODE
    ./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -kubectl
  2. Common diagnostic commands:
    Action                               Command
    List all nodes                       kubectl get nodes
    View node details                    kubectl describe node <node-name>
    List Splunk pods with node mapping   kubectl get pods -n splunk -o wide
    List all pods across all namespaces  kubectl get pods -A
    View Splunk pod details and events   kubectl describe pod -n splunk <pod-name>
    View pod logs                        kubectl logs -n <namespace> <pod-name>
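
Inside the debug shell, the commands above can be chained into a quick triage pass. This sketch assumes the Splunk pods live in the splunk namespace, as shown in the table:

```shell
# Quick triage pass from the debug shell
kubectl get nodes                          # are all nodes Ready?
kubectl get pods -n splunk -o wide         # which worker hosts each Splunk pod?
# List any Splunk pod that is not in the Running phase:
kubectl get pods -n splunk --field-selector=status.phase!=Running
# Then drill into a specific problem pod:
kubectl describe pod -n splunk <pod-name>
kubectl logs -n splunk <pod-name>
```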

Log management and retrieval

Locate and retrieve logs for the various components of the Splunk POD environment. Logs are categorized by the component they monitor: the installer, the Kubernetes operator, or the Splunk application itself.

Installer audit logs

These logs track actions performed by the installer on the bastion node.

Location: ~/.splunk/splunk-kubernetes-installer.log (or kubernetes-installer.log)
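
To inspect the audit log from the bastion shell, check both filenames listed above (a sketch; adjust the paths if your installer writes elsewhere):

```shell
# Print the tail of the installer audit log, trying both known filenames.
LOG="$HOME/.splunk/splunk-kubernetes-installer.log"
[ -f "$LOG" ] || LOG="$HOME/.splunk/kubernetes-installer.log"
if [ -f "$LOG" ]; then
  tail -n 50 "$LOG"
else
  echo "no installer log found under $HOME/.splunk"
fi
```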

Splunk Ansible logs

Splunk pods use Ansible for initial configuration. If a pod fails to start, check these logs for Ansible output or splunkd startup errors.

First, open a kubectl session:

CODE
./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -kubectl

Then, run the following kubectl command to view the logs:

CODE
kubectl logs -n splunk -f <pod-name>
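
Ansible output can be long, so filtering for failed tasks narrows it quickly. This grep pattern is an illustration, not part of the installer tooling:

```shell
# Show only failed/fatal Ansible tasks and splunkd errors from the pod log:
kubectl logs -n splunk <pod-name> | grep -iE 'fatal|failed|error'
# Include the previous container instance if the pod has restarted:
kubectl logs -n splunk --previous <pod-name> | grep -iE 'fatal|failed|error'
```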

Splunk Operator for Kubernetes (SOK) logs

Use these logs to troubleshoot the controller manager's operations.

Identify the controller pod:

CODE
kubectl get pods -n splunk-operator

View the SOK logs:

CODE
kubectl logs -n splunk-operator -f <controller-pod-name>
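
Alternatively, kubectl can read logs straight from the operator's Deployment, skipping the pod lookup. The deployment name shown here (splunk-operator-controller-manager) is the default in recent SOK releases and may differ in your installation:

```shell
# Stream operator logs without naming an individual pod:
kubectl logs -n splunk-operator -f deploy/splunk-operator-controller-manager
```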

Splunk Enterprise logs

To retrieve logs generated by the splunkd process, run the following command:

CODE
./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -get.logs

Select the desired pod from the prompted list. Logs are downloaded to: <installer-dir>/logs/<pod-name>/

Splunk Diags

Splunk diags contain all Splunk-produced logs and configuration files in a single package. Run the following command:

CODE
./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -get.diag

Select the pod when prompted. Diags are downloaded to: <installer-dir>/diags/<pod-name>/

SSH and connectivity issues

Resolve "Permission denied" or "Host key changed" errors during installation.

Symptom: "Permission denied" errors during SCP, "Host key changed" errors, or failed connections to controller or worker nodes after an installation attempt.

Resolution:

  1. Verify that the SSH key and user specified in your static configuration are correct.
  2. Remove or rename the ~/.ssh/known_hosts file on the bastion node, then retry the installation.
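
The steps above can be run as follows; <node-ip>, <user>, and the key path are placeholders from your static configuration. Note that ssh-keygen -R removes only the stale entry, a lighter-touch alternative to deleting the whole known_hosts file:

```shell
# Check key permissions (ssh refuses keys readable by others):
chmod 600 ~/.ssh/<your-key>
# Remove only the stale host key for one node:
ssh-keygen -R <node-ip>
# Non-interactive connectivity test; fails fast instead of prompting:
ssh -i ~/.ssh/<your-key> -o BatchMode=yes <user>@<node-ip> true && echo "SSH OK"
```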

Installer freezes

Resolve issues where the installer hangs at the "starting registry service" stage.

Symptom: The installer hangs at the "starting registry service" stage.

Cause: SELinux interference.

Resolution: Ensure SELinux is disabled or set to permissive mode on all controller and worker nodes, as required by the deployment prerequisites.
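
On each controller and worker node, you can verify and adjust the SELinux mode like this (standard RHEL-family commands, not installer-specific):

```shell
getenforce                     # prints Enforcing, Permissive, or Disabled
sudo setenforce 0              # switch to permissive until the next reboot
# Persist the change across reboots:
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
```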

Pod scheduling issues

Resolve issues where pods remain in a "Pending" state or are scheduled on incorrect nodes.

Symptom: Pods remain in a "Pending" state or are scheduled on incorrect nodes.

Resolution:

  1. Check resources: Run kubectl describe nodes | grep -A5 "Allocated resources" to verify CPU and memory availability.
  2. Check events: Run kubectl describe pod -n splunk <pod-name> and look at the "Events" section for scheduling failures.
  3. Verify configuration: Ensure that disk space and disk configuration are consistent with the Splunk POD CVD. Confirm that the resource constraints in your cluster-config.yaml match the physical capabilities of your worker nodes.
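
A pending pod usually explains itself in its events, and sorting cluster events by time is a quick way to spot the latest scheduling failure (generic kubectl usage, not installer-specific):

```shell
# Show the 20 most recent events in the splunk namespace:
kubectl get events -n splunk --sort-by=.lastTimestamp | tail -n 20
# Scheduling failures typically read like:
#   FailedScheduling ... 0/4 nodes are available: insufficient memory.
```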

Licensing issues

Resolve issues where only the License Manager pod is ready while other pods remain in a non-running state.

Symptom: The License Manager pod is "Ready," but all other Splunk pods show 0/1 Running.

Cause: An expired Splunk Enterprise license or an incorrect license path in the configuration.

Resolution:

  1. Open the Licensing page in the License Manager UI to check license usage: https://<worker-IP>:2443/en-US/manager/system/licensing
  2. Update the license file path in cluster-config.yaml.
  3. Rerun the installer.
    The Splunk Kubernetes Operator will initiate a rolling restart (this may take 30–40 minutes).
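
You can watch the rolling restart from the debug shell; it is complete when every Splunk pod returns to 1/1 Running:

```shell
# Watch pod status as the rolling restart progresses (Ctrl+C to stop):
kubectl get pods -n splunk -w
```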