Apache Spark receiver

The Apache Spark receiver monitors Apache Spark clusters and the applications running on them by collecting performance metrics such as memory utilization, CPU utilization, and shuffle operations. The supported pipeline type is metrics. See Process your data with pipelines for more information.

Note: Out-of-the-box dashboards and navigators aren’t supported for the Apache Spark receiver.

The receiver retrieves metrics through the Apache Spark REST API using the following endpoints: /metrics/json, /api/v1/applications/[app-id]/stages, /api/v1/applications/[app-id]/executors, and /api/v1/applications/[app-id]/jobs.

Prerequisites

This receiver supports Apache Spark version 3.3.2 or higher.

Deploy the collector

See Deploy the Splunk Distribution of the OpenTelemetry Collector.

Configure the receiver

To activate the Apache Spark receiver, add apachespark to the receivers section of your configuration file:

YAML

receivers:
  apachespark:
    collection_interval: 60s
    endpoint: http://localhost:4040
    application_names:
    - PythonStatusAPIDemo
    - PythonLR

To complete the configuration, include the receiver in the metrics pipeline of the service section of your configuration file:

YAML

service:
  pipelines:
    metrics:
      receivers: [apachespark]

Advanced configurations

Activate or deactivate specific metrics

You can activate or deactivate specific metrics by setting the enabled field in the metrics section for each metric. For example:

YAML

receivers:
  samplereceiver:
    metrics:
      metric-one:
        enabled: true
      metric-two:
        enabled: false

The following example shows a host metrics receiver configuration with one metric activated:

YAML

receivers:
  hostmetrics:
    scrapers:
      process:
        metrics:
          process.cpu.utilization:
            enabled: true

Note: Deactivated metrics aren’t sent to Splunk Observability Cloud.
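
Applied to the Apache Spark receiver, the same pattern might look like the following sketch. The metric names spark.executor.memory.usage and spark.stage.shuffle.write_time are illustrative assumptions here; confirm the exact names and their default state against the metrics reference before relying on them.

```yaml
receivers:
  apachespark:
    endpoint: http://localhost:4040
    metrics:
      # Assumed metric names; verify them in the metrics reference.
      spark.executor.memory.usage:
        enabled: true
      spark.stage.shuffle.write_time:
        enabled: false
```

Metrics that aren't listed under metrics keep their default state.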

Billing

If you’re on an MTS-based subscription, all metrics count towards metrics usage.
If you’re on a host-based plan, metrics listed as active (Active: Yes) in this document are considered default and are included free of charge.

Learn more at Infrastructure Monitoring subscription usage (Host and metric plans).

Restart the collector

The restart command varies depending on what platform you deployed the collector on and what tool you used to deploy it. Here are general examples of the restart command:

Linux

Linux with installer script:

BASH

sudo systemctl restart splunk-otel-collector

Windows

Windows with installer script:

POWERSHELL

stop-service splunk-otel-collector
start-service splunk-otel-collector

Kubernetes

Kubernetes with Helm:

BASH

helm upgrade your-splunk-otel-collector splunk-otel-collector-chart/splunk-otel-collector -f your-override-values.yaml

In this command, splunk-otel-collector-chart is the name you gave to the Helm chart repository in the helm repo add command.

Settings reference

The following settings are optional:

- collection_interval. 60s by default. Sets how often this receiver collects metrics.
  - This value must be a string readable by Golang’s time.ParseDuration. Learn more in Go’s official documentation at https://pkg.go.dev/time#ParseDuration.
  - Valid time units are ns, us (or µs), ms, s, m, h.
- initial_delay. 1s by default. Determines how long this receiver waits before collecting metrics for the first time.
- endpoint. http://localhost:4040 by default. Apache Spark endpoint to connect to in the form of [http][://]{host}[:{port}].
- application_names. An array of Spark application names to collect metrics for. If no application names are specified, metrics are collected for all Spark applications running on the cluster at the specified endpoint.
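
As a sketch of how these optional settings fit together, the following configuration sets each of them explicitly. The host name spark-master.example.com is a hypothetical placeholder, and the interval values are arbitrary examples of valid duration strings rather than recommendations.

```yaml
receivers:
  apachespark:
    # Collect every 2 minutes; other valid duration strings include 30s or 1m30s.
    collection_interval: 2m
    # Wait 10 seconds after startup before the first collection.
    initial_delay: 10s
    # Hypothetical host; the default is http://localhost:4040.
    endpoint: http://spark-master.example.com:4040
    # Omit application_names to collect metrics for all applications at the endpoint.
    application_names:
      - PythonStatusAPIDemo
```

Any setting left out of the configuration falls back to the default listed above.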

The full list of settings exposed for this receiver is documented in the Apache Spark receiver config repo in GitHub.

Metrics reference

The following metrics, resource attributes, and attributes are available.

Note: The SignalFx exporter excludes some available metrics by default. Learn more about default metric filters in List of metrics excluded by default.

https://raw.githubusercontent.com/splunk/collector-config-tools/main/metric-metadata/apachesparkreceiver.yaml

Troubleshooting

See Troubleshoot the Splunk OpenTelemetry Collector.
