Advanced configuration for Kubernetes
Advanced configurations for the Splunk Distribution of OpenTelemetry Collector for Kubernetes.
See the following advanced configuration options for the Collector for Kubernetes.
For basic Helm chart configuration, see Configure the Collector for Kubernetes with Helm. For log configuration, see Collect logs and events with the Collector for Kubernetes.
Override the default configuration
You can override the default configuration to use your own. To do this, include a custom configuration using the agent.config, clusterReceiver.config, or gateway.config parameter in the values.yaml file. Find examples at values.yaml, agent, cluster receiver, and gateway.
For example:
agent:
  config:
    processors:
      # Exclude logs from pods named 'podNameX'
      filter/exclude_logs_from_pod:
        logs:
          exclude:
            match_type: regexp
            resource_attributes:
              - key: k8s.pod.name
                value: '^(podNameX)$'
    # Define the logs pipeline with the default values as well as your new processor component
    service:
      pipelines:
        logs:
          processors:
            - memory_limiter
            - k8sattributes
            - filter/logs
            - batch
            - resourcedetection
            - resource
            - resource/logs
            - filter/exclude_logs_from_pod
This custom configuration is merged into the default agent configuration. After the configurations are merged, you need to fully redefine the parts of the configuration that you modify, such as service, pipelines, logs, and processors.

Configure control plane metrics
Control plane metrics are available for the following components: coredns, etcd, kube-controller-manager, kubernetes-apiserver, kubernetes-proxy, and kubernetes-scheduler. You can use the Collector Helm agent to obtain control plane metrics from a specific component by setting agent.controlPlaneMetrics.{otel_component} to true.
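For example, the following values.yaml snippet is a minimal sketch that turns on etcd control plane metrics, assuming the agent.controlPlaneMetrics.{otel_component}.enabled key layout used in the chart's values.yaml:

agent:
  controlPlaneMetrics:
    etcd:
      # Collect etcd control plane metrics (key layout assumed from the chart's values.yaml)
      enabled: true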
On each node, the Helm chart configures the Collector to use the receiver creator to create control plane receivers at runtime. The receiver creator has a set of discovery rules that determine which control plane receivers to create. The default discovery rules can vary depending on the Kubernetes distribution and version. See Receiver creator receiver for more information.
If your control plane is using non-standard specifications, then you can provide a custom configuration to allow the Collector to successfully connect to it.
Supported versions
The Collector relies on pod-level network access to collect metrics from the control plane pods. Because most cloud Kubernetes-as-a-service distributions don't expose the control plane pods to the end user, collecting control plane metrics from these distributions is not supported. For the list of Kubernetes distributions that support control plane metrics collection, see the Splunk OpenTelemetry Collector Helm chart documentation.
See the agent template for the default configurations for the control plane receivers.
Use custom configurations for non-standard control plane components
You can override the default configuration values used to connect to the control plane. If your control plane uses nonstandard ports or custom TLS settings, you need to override the default configurations.
The following example shows how to connect to a nonstandard API server that uses port 3443 for metrics and custom TLS certs stored in the /etc/myapiserver/ directory.
agent:
  config:
    receivers:
      receiver_creator:
        receivers:
          # Template for overriding the discovery rule and configuration.
          # smartagent/{control_plane_receiver}:
          #   rule: {rule_value}
          #   config:
          #     {config_value}
          smartagent/kubernetes-apiserver:
            rule: type == "port" && port == 3443 && pod.labels["k8s-app"] == "kube-apiserver"
            config:
              clientCertPath: /etc/myapiserver/clients-ca.crt
              clientKeyPath: /etc/myapiserver/clients-ca.key
              skipVerify: true
              useHTTPS: true
              useServiceAccount: false
Activate Kubernetes control plane metrics with the Prometheus receiver
To activate control plane metrics with the OpenTelemetry Prometheus receiver instead, use the useControlPlaneMetricsHistogramData feature flag:

featureGates:
  useControlPlaneMetricsHistogramData: true

To learn more, see Prometheus receiver.
Known issues
There is a known limitation for the Kubernetes proxy control plane receiver. In clusters created with kops, a network connectivity issue prevents proxy metrics from being collected. To address the limitation, update the kubeProxy metrics bind address in the kops cluster specification:

1. Set kubeProxy.metricsBindAddress: 0.0.0.0 in the kops cluster specification.
2. Run kops update cluster {cluster_name} and kops rolling-update cluster {cluster_name} to deploy the change.
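For reference, a sketch of the relevant part of the kops cluster specification, assuming the standard spec.kubeProxy layout; edit it with kops edit cluster {cluster_name}:

spec:
  kubeProxy:
    # Bind the kube-proxy metrics endpoint to all interfaces so the Collector can reach it
    metricsBindAddress: 0.0.0.0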
Run the container in non-root user mode
Collecting logs often requires reading log files that are owned by the root user. By default, the container runs with securityContext.runAsUser = 0, which gives the root user permission to read those files.

To run the container in non-root user mode, use agent.securityContext to adjust log data permissions to match the securityContext configurations. For instance:

agent:
  securityContext:
    runAsUser: 20000
    runAsGroup: 20000
Configure custom TLS certificates
If your organization requires custom TLS certificates for secure communication with the Collector, follow these steps:
1. Create a Kubernetes secret containing the Root CA certificate, TLS certificate, and private key files
Store your custom CA certificate, TLS certificate, and private key files in a Kubernetes secret in the same namespace as your Splunk Helm chart.
For example, you can run this command:
kubectl create secret generic my-custom-tls --from-file=ca.crt=/path/to/custom_ca.crt --from-file=apiserver.key=/path/to/custom_key.key --from-file=apiserver.crt=/path/to/custom_cert.crt -n <namespace>
2. Mount the secret in the Splunk Helm Chart
Apply this configuration to the agent, clusterReceiver, or gateway using the following Helm values:

- agent.extraVolumes, agent.extraVolumeMounts
- clusterReceiver.extraVolumes, clusterReceiver.extraVolumeMounts
- gateway.extraVolumes, gateway.extraVolumeMounts
Learn more about Helm components at Helm chart architecture and components.
For example:
agent:
  extraVolumes:
    - name: custom-tls
      secret:
        secretName: my-custom-tls
  extraVolumeMounts:
    - name: custom-tls
      mountPath: /etc/ssl/certs/
      readOnly: true

clusterReceiver:
  extraVolumes:
    - name: custom-tls
      secret:
        secretName: my-custom-tls
  extraVolumeMounts:
    - name: custom-tls
      mountPath: /etc/ssl/certs/
      readOnly: true

gateway:
  extraVolumes:
    - name: custom-tls
      secret:
        secretName: my-custom-tls
  extraVolumeMounts:
    - name: custom-tls
      mountPath: /etc/ssl/certs/
      readOnly: true
3. Override your TLS configuration
Update the TLS configuration for specific Collector components, such as the agent's kubeletstats receiver, to use the mounted certificate, key, and CA files.
For example:
agent:
  config:
    receivers:
      kubeletstats:
        auth_type: "tls"
        # File names match the keys used when creating the my-custom-tls secret in step 1
        ca_file: "/etc/ssl/certs/ca.crt"
        key_file: "/etc/ssl/certs/apiserver.key"
        cert_file: "/etc/ssl/certs/apiserver.crt"
        insecure_skip_verify: true
Collect network telemetry using eBPF
You can collect network metrics and analyze them in Network Explorer using the OpenTelemetry eBPF Helm chart. See Introduction to Network Explorer for more information. To install and configure the eBPF Helm chart, see Install the eBPF Helm chart.
The networkExplorer setting of the Splunk OpenTelemetry Collector Helm chart is deprecated. If you want to continue using Network Explorer to see data in Splunk Observability Cloud, point the upstream eBPF Helm chart to the OpenTelemetry Collector running as a gateway, as explained in Migrate from networkExplorer to eBPF Helm chart. While Splunk Observability Cloud fully supports the Network Explorer navigator, the upstream OpenTelemetry eBPF Helm chart is not covered under official Splunk support, and feature updates, security fixes, and bug fixes for it are not bound by any SLAs.

Prerequisites
The OpenTelemetry eBPF Helm chart requires:
- Kubernetes 1.24 or higher
- Helm 3.9 or higher

Network metrics collection is only supported in the following Kubernetes-based environments on Linux hosts:

- Red Hat Linux 7.6 or higher
- Ubuntu 16.04 or higher
- Debian Stretch or higher
- Amazon Linux 2
- Google COS
Modify the reducer footprint
The reducer is a single pod per Kubernetes cluster. If your cluster contains a large number of pods, nodes, and services, you can increase the resources allocated to it.
The reducer processes telemetry in multiple stages, with each stage partitioned into one or more shards, where each shard is a separate thread. Increasing the number of shards in each stage expands the capacity of the reducer. There are 3 stages: ingest, matching, and aggregation. You can set from 1 to 32 shards for each stage. By default, there is one shard per reducer stage.
The following example sets the reducer to use 4 shards per stage:
reducer:
  ingestShards: 4
  matchingShards: 4
  aggregationShards: 4
Customize network telemetry generated by eBPF
You can deactivate metrics through the Helm chart configuration, either individually or by entire categories. See the values.yaml for a complete list of categories and metrics.
To deactivate an entire category, give the category name followed by .all:
reducer:
  disableMetrics:
    - tcp.all
Deactivate individual metrics by their names:
reducer:
  disableMetrics:
    - tcp.bytes
You can mix categories and names. For example, to turn off all HTTP metrics and the udp.bytes metric, use:
reducer:
  disableMetrics:
    - http.all
    - udp.bytes
Reactivate metrics
To activate metrics you previously deactivated, use enableMetrics.

The disableMetrics flag is evaluated before enableMetrics, so you can deactivate an entire category and then reactivate the individual metrics in that category that you are interested in.

For example, to deactivate all internal and HTTP metrics but keep ebpf_net.collector_health, use:
reducer:
  disableMetrics:
    - http.all
    - ebpf_net.all
  enableMetrics:
    - ebpf_net.collector_health
Configure features using gates
Use the agent.featureGates, clusterReceiver.featureGates, and gateway.featureGates configs to activate or deactivate features of the otel-collector agent, clusterReceiver, and gateway, respectively. These configs are used to populate the otelcol binary startup argument --feature-gates.
For example, to activate feature1 in the agent, activate feature2 in the clusterReceiver, and deactivate feature2 in the gateway, run:
helm install {name} --set agent.featureGates=+feature1 --set clusterReceiver.featureGates=feature2 --set gateway.featureGates=-feature2 {other_flags}
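If you prefer to keep these settings in values.yaml instead of passing --set flags, the same example can be sketched as follows (feature1 and feature2 are the placeholder gate names from the command above):

agent:
  featureGates: +feature1
clusterReceiver:
  featureGates: feature2
gateway:
  featureGates: -feature2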
Set the pod security policy manually
Support for Pod Security Policies (PSP) was removed in Kubernetes 1.25. If you still rely on PSPs in an older cluster, you can add a PSP manually:
1. Run the following command to install the PSP. Don't forget to add the --namespace kubectl argument if needed:

cat <<EOF | kubectl apply -f -
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: splunk-otel-collector-psp
  labels:
    app: splunk-otel-collector-psp
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'runtime/default'
    apparmor.security.beta.kubernetes.io/allowedProfileNames: 'runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'runtime/default'
    apparmor.security.beta.kubernetes.io/defaultProfileName: 'runtime/default'
spec:
  privileged: false
  allowPrivilegeEscalation: false
  hostNetwork: true
  hostIPC: false
  hostPID: false
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'hostPath'
    - 'secret'
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
EOF
2. Add the following custom ClusterRole rule in your values.yaml file, along with all other required fields such as clusterName, splunkObservability, or splunkPlatform:

rbac:
  customRules:
    - apiGroups: [extensions]
      resources: [podsecuritypolicies]
      verbs: [use]
      resourceNames: [splunk-otel-collector-psp]
3. Install the Helm chart:

helm install my-splunk-otel-collector -f my_values.yaml splunk-otel-collector-chart/splunk-otel-collector
Configure data persistence queues
Without any configuration, data is queued in memory only. When data can’t be sent, it’s retried a few times for up to 5 minutes by default, and then dropped. If, for any reason, the Collector is restarted in this period, the queued data is discarded.
If you want the queue to be persisted on disk so that it survives Collector restarts, set splunkPlatform.sendingQueue.persistentQueue.enabled=true to enable support for logs, metrics, and traces.

By default, data is persisted in the /var/addon/splunk/exporter_queue directory. To override this path, use the splunkPlatform.sendingQueue.persistentQueue.storagePath option.
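For example, the following values.yaml sketch uses the options named above to turn on the persistent queue and keep the default storage path:

splunkPlatform:
  sendingQueue:
    persistentQueue:
      # Persist queued data on disk so it survives Collector restarts
      enabled: true
      # Optional: override the default storage location
      storagePath: /var/addon/splunk/exporter_queue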
See Data Persistence in the OpenTelemetry Collector for a detailed explanation.
Config examples
Use the following in your values.yaml file to deactivate data persistence for logs, metrics, or traces:
Logs

agent:
  config:
    exporters:
      splunk_hec/platform_logs:
        sending_queue:
          storage: null

Metrics

agent:
  config:
    exporters:
      splunk_hec/platform_metrics:
        sending_queue:
          storage: null

Traces

agent:
  config:
    exporters:
      splunk_hec/platform_traces:
        sending_queue:
          storage: null
Support for persistent queue
The following support is offered:
Support for GKE/Autopilot and EKS/Fargate
Persistent buffering is not supported for GKE/Autopilot and EKS/Fargate, since the queue directory needs to be mounted via hostPath.

Also, GKE/Autopilot and EKS/Fargate don't allow hostPath volume mounts, because the underlying infrastructure is managed by the cloud provider rather than by you.
Refer to aws/fargate and gke/autopilot for more information.
Gateway support
The filestorage extension acquires an exclusive lock for the queue directory.

It's not possible to run persistent buffering if there are multiple replicas of a pod. Even if support could be provided, only one of the pods would be able to acquire the lock and run, while the others would be blocked and unable to operate.
Cluster Receiver support
The Cluster receiver is a 1-replica deployment of the OpenTelemetry Collector. Because the Kubernetes control plane can select any available node to run the cluster receiver pod (unless clusterReceiver.nodeSelector is explicitly set to pin the pod to a specific node), hostPath or local volume mounts don't work for such environments.
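If you do want to pin the cluster receiver to a specific node so that a host-backed volume can be used, a minimal sketch using the standard kubernetes.io/hostname label follows; the node name is a placeholder:

clusterReceiver:
  nodeSelector:
    # Pin the single cluster receiver replica to one node (placeholder node name)
    kubernetes.io/hostname: my-node-1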
Data persistence is currently not applicable to Kubernetes cluster metrics and Kubernetes events.