Apache Spark

Use this Splunk Observability Cloud integration to monitor Apache Spark clusters. See benefits, installation, configuration, and metrics.

Note: If you’re using the Splunk Distribution of the OpenTelemetry Collector and want to collect Apache Spark cluster metrics, use the native OTel component Apache Spark receiver.
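For reference, a minimal Collector configuration using the native receiver might look like the following sketch. The endpoint shown assumes Spark's monitoring API is reachable on its default application UI port; adjust it to your deployment:

```yaml
receivers:
  apachespark:
    collection_interval: 60s
    # Placeholder endpoint: point this at your Spark application UI
    endpoint: http://localhost:4040

service:
  pipelines:
    metrics:
      receivers: [apachespark]
```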

The Splunk Distribution of the OpenTelemetry Collector uses the Smart Agent receiver with the Apache Spark monitor type to monitor Apache Spark clusters. It does not support fetching metrics from Spark Structured Streaming.

For the following cluster modes, the integration only supports HTTP endpoints:

  • Standalone

  • Mesos

  • Hadoop YARN

This collectd plugin is not compatible with Kubernetes cluster mode. You need to select distinct monitor configurations and discovery rules for primary and worker processes. For the primary configuration, set isMaster to true. When you run Apache Spark on Hadoop YARN, this integration can only report application metrics from the primary node.
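As a sketch, a primary-node configuration for a Hadoop YARN deployment might look like the following. The host and port values are placeholders for your Spark primary:

```yaml
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost    # placeholder: your Spark primary host
    port: 8080         # placeholder: your Spark primary web UI port
    clusterType: Yarn
    isMaster: true
    collectApplicationMetrics: true
```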

This integration is only available on Linux.

Benefits

After you configure the integration, you can access these features:

Installation

Follow these steps to deploy this integration:

  1. Deploy the Splunk Distribution of OpenTelemetry Collector to your host or container platform.

  2. Configure the monitor, as described in the Configuration section.

  3. Restart the Splunk Distribution of OpenTelemetry Collector.

Configuration

To use this integration of a Smart Agent monitor with the Collector:

  1. Include the Smart Agent receiver in your configuration file.

  2. Add the monitor type to the Collector configuration, both in the receiver and pipelines sections.

Example

To activate this integration, add one of the following to your Collector configuration:

receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    ...  # Additional config

receivers:
  smartagent/collectd_spark_worker:
    type: collectd/spark
    ...  # Additional config
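For example, a Standalone-mode worker configuration might fill in the required fields like this. The host and port values are placeholders for your environment:

```yaml
receivers:
  smartagent/collectd_spark_worker:
    type: collectd/spark
    host: localhost    # placeholder: Spark worker host
    port: 8081         # placeholder: Spark worker web UI port
    clusterType: Standalone
    isMaster: false
```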

Next, add the monitor to the service.pipelines.metrics.receivers section of your configuration file:

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_master]

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_worker]

Note: The names collectd_spark_master and collectd_spark_worker are for identification purposes only and don’t affect functionality. You can use either name in your configuration, but you need to select distinct monitor configurations and discovery rules for primary and worker processes. For the primary configuration, see the isMaster field in the configuration settings section.
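When you monitor both primary and worker processes from the same Collector, you can list both receivers in the same pipeline, for example:

```yaml
service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_master, smartagent/collectd_spark_worker]
```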

Configuration settings

The following table shows the configuration options for this integration:

| Option | Required | Type | Description |
| --- | --- | --- | --- |
| `pythonBinary` | no | string | This option specifies the path to a Python binary that executes the Python code. If you don't set this option, the system uses a built-in runtime. You can also include arguments to the binary. |
| `host` | yes | string | |
| `port` | yes | integer | |
| `isMaster` | no | bool | Set this option to `true` when you want to monitor a primary Spark node. The default is `false`. |
| `clusterType` | yes | string | Set this option to the type of cluster you're monitoring. The allowed values are `Standalone`, `Mesos`, or `Yarn`. The system doesn't collect cluster metrics for `Yarn`. Use the `collectd/hadoop` monitor to gain insight into your cluster's health. |
| `collectApplicationMetrics` | no | bool | The default is `false`. |
| `enhancedMetrics` | no | bool | The default is `false`. |
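Putting several of these options together, a primary-node configuration might look like the following sketch. The host, port, and Python binary path are placeholders for your environment:

```yaml
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost                   # placeholder: Spark primary host
    port: 8080                        # placeholder: Spark primary web UI port
    clusterType: Standalone
    isMaster: true
    enhancedMetrics: true
    pythonBinary: /usr/bin/python3    # optional; omit to use the built-in runtime
```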

Metrics

These are the metrics available for this integration:

https://raw.githubusercontent.com/signalfx/splunk-otel-collector/main/internal/signalfx-agent/pkg/monitors/collectd/spark/metadata.yaml

Notes

  • To learn more about the metric types available in Splunk Observability Cloud, see Metric types.

  • In host-based subscription plans, default metrics are those metrics included in host-based subscriptions in Splunk Observability Cloud, such as host, container, or bundled metrics. Custom metrics are not provided by default and might be subject to charges. See Metric categories for more information.

  • In MTS-based subscription plans, all metrics are custom.

  • To add additional metrics, see how to configure extraMetrics in Add additional metrics.
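As a sketch, you set extraMetrics on the monitor entry itself. For example, to emit every metric this monitor can produce (host and port values are placeholders):

```yaml
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost    # placeholder
    port: 8080         # placeholder
    clusterType: Standalone
    isMaster: true
    extraMetrics:
      - "*"            # glob pattern; you can also list specific metric names
```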

Troubleshooting

If you are a Splunk Observability Cloud customer and are not able to see your data in Splunk Observability Cloud, you can get help in the following ways.

Available to Splunk Observability Cloud customers

Available to prospective customers and free trial users

  • Ask a question and get answers through community support at Splunk Answers.

  • Join the Splunk #observability user group Slack channel to communicate with customers, partners, and Splunk employees worldwide. To join, see Chat groups.