Troubleshoot Splunk POD
Use this topic to diagnose and resolve issues with Splunk Kubernetes clusters deployed using the Kubernetes Installer. It covers diagnostic commands, log retrieval procedures, and solutions for common deployment scenarios.
Diagnostic tools
Troubleshoot the cluster using Kubernetes installer commands or open a kubectl debug session from the bastion node.
Installer status commands
Use these commands to get a high-level overview of the cluster state directly from the installer:
- List all pods:

  ./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -status

- List worker nodes:

  ./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -status.workers
Access the Kubectl debug shell
Standard Kubernetes commands are available through a debug shell on the bastion node.
- Start a debug session:

  ./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -kubectl

- Common diagnostic commands:

  Action                                Command
  List all nodes                        kubectl get nodes
  View node details                     kubectl describe node <node-name>
  List Splunk pods with node mapping    kubectl get pods -n splunk -o wide
  List all pods across all namespaces   kubectl get pods -A
  View Splunk pod details and events    kubectl describe pod -n splunk <pod-name>
  View pod logs                         kubectl logs -n <namespace> <pod-name>
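A quick triage pattern is to filter the pod listing for anything not in the Running state. The sketch below runs against captured sample output (the pod names are illustrative); in a live cluster, pipe `kubectl get pods -n splunk --no-headers` into the same awk filter.

```shell
# Filter pod-listing output for pods that are not Running.
# Sample output captured here (illustrative) so the filter can be tried offline:
sample='splunk-cm-0    1/1   Running   0   5d
splunk-idx-0   0/1   Pending   0   5d
splunk-sh-0    1/1   Running   0   5d'
# Live cluster equivalent:
#   kubectl get pods -n splunk --no-headers | awk '$3 != "Running"'
printf '%s\n' "$sample" | awk '$3 != "Running" {print $1, $3}'
```

This prints only the pods needing attention, which is useful when a namespace contains many healthy pods.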
Log management and retrieval
Logs are categorized by the component that produces them: the installer, the Kubernetes operator, or the Splunk application itself. Use the following procedures to locate and retrieve logs for each component of the Splunk POD environment.
Installer audit logs
These logs track actions performed by the installer on the bastion node.
Location: ~/.splunk/splunk-kubernetes-installer.log (or kubernetes-installer.log)
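To scan the audit log for failures, a guarded grep works; this is a minimal sketch using the log path given above, with a guard that simply reports when the file is absent.

```shell
# Show the most recent errors recorded by the installer, if the log exists.
LOG="$HOME/.splunk/splunk-kubernetes-installer.log"
if [ -f "$LOG" ]; then
  grep -iE 'error|fail' "$LOG" | tail -n 20
else
  echo "installer log not found at $LOG"
fi
```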
Splunk Ansible logs
Splunk pods use Ansible for initial configuration. If a pod fails to start, check these logs for Ansible output or splunkd startup errors.
First, open a kubectl session:
./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -kubectl
Then, run the following kubectl command to view the logs:
kubectl logs -n splunk -f <pod-name>
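Ansible reports failed tasks with `fatal:`/`FAILED!` markers, so the pod logs can be filtered for them. The sketch below runs against illustrative sample lines; in a live cluster, pipe `kubectl logs -n splunk <pod-name>` into the same grep.

```shell
# Scan startup output for failed Ansible tasks or splunkd errors.
# Sample lines captured here (illustrative) so the filter can be tried offline:
sample='TASK [splunk_common : Start Splunk] *****
fatal: [localhost]: FAILED! => {"msg": "splunkd failed to start"}
PLAY RECAP *****'
# Live cluster equivalent:
#   kubectl logs -n splunk <pod-name> | grep -iE "fatal|failed|error"
printf '%s\n' "$sample" | grep -iE 'fatal|failed|error'
```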
Splunk Operator for Kubernetes (SOK) logs
Use these logs to troubleshoot the controller manager's operations.
Identify the controller pod:
kubectl get pods -n splunk-operator
View the SOK logs:
kubectl logs -n splunk-operator -f <controller-pod-name>
Splunk Enterprise logs
To retrieve logs generated by the splunkd process, run the following command:
./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -get.logs
Select the desired pod from the prompted list. Logs are downloaded to: <installer-dir>/logs/<pod-name>/
Splunk Diags
Splunk diags contain all Splunk-produced logs and configuration files in a single package. Run the following command:
./kubernetes-installer-standalone -static.cluster <cluster-config.yaml> -get.diag
Select the pod when prompted. Diags are downloaded to: <installer-dir>/diags/<pod-name>/
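Splunk diags are typically packaged as `.tar.gz` archives. The sketch below creates a dummy archive so the inspection commands can be tried offline; in practice, substitute the file downloaded under the diags directory for the pod.

```shell
# Sketch: list and extract a downloaded diag archive for inspection.
# A dummy archive is built here (illustrative) so the commands run offline.
workdir=$(mktemp -d)
mkdir -p "$workdir/diag-example/log"
echo "sample" > "$workdir/diag-example/log/splunkd.log"
tar -C "$workdir" -czf "$workdir/diag-example.tar.gz" diag-example

tar -tzf "$workdir/diag-example.tar.gz"        # list the archive contents
mkdir "$workdir/extracted"
tar -xzf "$workdir/diag-example.tar.gz" -C "$workdir/extracted"   # extract
```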
SSH and connectivity issues
Resolve "Permission denied" or "Host key changed" errors during installation.
"Permission denied" errors during SCP, "Host key changed" errors, or failed connections to controller/worker nodes after an installation attempt.
- Verify that the SSH key and user specified in your static configuration are correct.
- Remove or rename the ~/.ssh/known_hosts file on the bastion node and re-attempt the installation.
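Rather than removing known_hosts wholesale, the stale entry for a single node can be dropped with `ssh-keygen -R`. This sketch works against a scratch file so nothing real is modified, and the node address is a documentation placeholder; on the bastion node you would target `~/.ssh/known_hosts` and your actual node address.

```shell
# Remove the stale host key for one node instead of the whole known_hosts file.
# 192.0.2.10 is a placeholder address; the scratch file stands in for
# ~/.ssh/known_hosts so this can be tried safely.
KNOWN_HOSTS=$(mktemp)
echo "192.0.2.10 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIPlaceholderKey" >> "$KNOWN_HOSTS"
if command -v ssh-keygen >/dev/null 2>&1; then
  ssh-keygen -R 192.0.2.10 -f "$KNOWN_HOSTS"
else
  echo "ssh-keygen not available on this host"
fi
```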
Installer freezes
Resolve issues where the installer hangs at the "starting registry service" stage.
Symptom: The installer hangs at the "starting registry service" stage.
Cause: SELinux interference.
Resolution: Ensure SELinux is disabled or set to permissive mode on all controller and worker nodes, as required by the deployment prerequisites.
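The current SELinux mode can be checked on each node before retrying; a sketch, guarded in case the SELinux tooling is not installed:

```shell
# Check the current SELinux mode (should report Disabled or Permissive).
if command -v getenforce >/dev/null 2>&1; then
  getenforce
else
  echo "SELinux tooling not present on this host"
fi
# To switch to permissive immediately (root required):  setenforce 0
# To persist across reboots, set SELINUX=permissive in /etc/selinux/config.
```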
Pod scheduling issues
Resolve issues where pods remain in a "Pending" state or are scheduled on incorrect nodes.
Symptom: Pods remain in a "Pending" state or are scheduled on incorrect nodes.
Resolution:
- Check resources: Run kubectl describe nodes | grep -A5 "Allocated resources" to verify CPU and memory availability.
- Check events: Run kubectl describe pod -n splunk <pod-name> and review the "Events" section for scheduling failures.
- Verify configuration: Ensure that disk space and disk configuration are consistent with the Splunk POD CVD. Confirm that the resource constraints in your cluster-config.yaml match the physical capabilities of your worker nodes.
Licensing issues
Resolve issues where only the License Manager pod is ready while other pods remain in a non-running state.
Symptom: The License Manager pod is "Ready," but all other Splunk pods show 0/1 Running.
Cause: An expired Splunk Enterprise license or an incorrect license path in the configuration.
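One way to confirm the cause is that splunkd records license problems in its logs, so the pod logs can be grepped for them. The sketch below runs against illustrative sample lines (the component names are hypothetical); in a live cluster, pipe `kubectl logs -n splunk <pod-name>` into the same filter.

```shell
# Scan splunkd output for license-related errors.
# Sample lines captured here (illustrative) so the filter can be tried offline:
sample='INFO  LMStackMgr - added stack
ERROR LicenseMgr - license expired
INFO  TailReader - batch mode'
# Live cluster equivalent:
#   kubectl logs -n splunk <pod-name> | grep -iE "licen.*(expir|error|violat)"
printf '%s\n' "$sample" | grep -iE 'licen.*(expir|error|violat)'
```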