Apache Spark
Use this Splunk Observability Cloud integration for the Apache Spark clusters monitor. See the benefits, installation, configuration, and metrics.
The Splunk Distribution of the OpenTelemetry Collector uses the Smart Agent receiver with the Apache Spark monitor type to monitor Apache Spark clusters. It does not support fetching metrics from Spark Structured Streaming.
For the following cluster modes, the integration only supports HTTP endpoints:
- Standalone
- Mesos
- Hadoop YARN
This collectd plugin is not compatible with Kubernetes cluster mode. You need to select distinct monitor configurations and discovery rules for primary and worker processes. For the primary configuration, set isMaster to true. When you run Apache Spark on Hadoop YARN, this integration can only report application metrics from the primary node.
This integration is only available on Linux.
Benefits
After you configure the integration, you can access these features:
- View metrics. You can create your own custom dashboards, and most monitors provide built-in dashboards as well. For information about dashboards, see View dashboards in Splunk Observability Cloud.
- View a data-driven visualization of the physical servers, virtual machines, AWS instances, and other resources in your environment that are visible to Infrastructure Monitoring. For information about navigators, see Use navigators in Splunk Infrastructure Monitoring.
- Access the Metric Finder and search for metrics sent by the monitor. For information, see Search the Metric Finder and Metadata Catalog.
Installation
Follow these steps to deploy this integration:
- Deploy the Splunk Distribution of the OpenTelemetry Collector to your host or container platform.
- Configure the monitor, as described in the Configuration section.
- Restart the Splunk Distribution of the OpenTelemetry Collector.
Configuration
To use this integration of a Smart Agent monitor with the Collector:
- Include the Smart Agent receiver in your configuration file.
- Add the monitor type to the Collector configuration, both in the receiver and pipelines sections.
- See how to Use Smart Agent monitors with the Collector.
- See how to set up the Smart Agent receiver.
- For a list of common configuration options, refer to Common configuration settings for monitors.
- Learn more about the Collector at Get started: Understand and use the Collector.
Example
To activate this integration, add one of the following to your Collector configuration:
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    ... # Additional config

receivers:
  smartagent/collectd_spark_worker:
    type: collectd/spark
    ... # Additional config
Next, add the monitor to the service.pipelines.metrics.receivers section of your configuration file:
service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_master]

service:
  pipelines:
    metrics:
      receivers: [smartagent/collectd_spark_worker]
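As a hedged illustration, the fragments above might be combined into a single Collector configuration along the following lines. Only type and isMaster come from this page. The host, port, and clusterType option names are assumptions to verify against the monitor's settings reference. The host and port values are placeholders (8080 and 8081 are the usual Spark standalone web UI ports), and the signalfx exporter block stands in for whatever exporter your deployment already defines.

```yaml
receivers:
  # Primary process. The name after the slash is only a label.
  smartagent/collectd_spark_master:
    type: collectd/spark
    host: localhost          # placeholder; assumed option name
    port: 8080               # placeholder; assumed option name
    clusterType: Standalone  # assumed option name and value
    isMaster: true
  # Worker process, configured separately from the primary.
  smartagent/collectd_spark_worker:
    type: collectd/spark
    host: localhost          # placeholder
    port: 8081               # placeholder
    clusterType: Standalone
    isMaster: false

exporters:
  # Assumed exporter; replace with the one your deployment uses.
  signalfx:
    access_token: "${SPLUNK_ACCESS_TOKEN}"
    realm: "${SPLUNK_REALM}"

service:
  pipelines:
    metrics:
      receivers:
        - smartagent/collectd_spark_master
        - smartagent/collectd_spark_worker
      exporters: [signalfx]
```

Listing both receivers in one pipeline is a choice made for this sketch. On a host that runs only one process type, keep only the matching receiver.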
collectd_spark_master and collectd_spark_worker are for identification purposes only and don’t affect functionality. You can use either name in your configuration, but you need to select distinct monitor configurations and discovery rules for primary and worker processes. For the primary configuration, see the isMaster field in the configuration settings section.
Configuration settings
The following table shows the configuration options for this integration:
Option | Required | Type | Description |
---|---|---|---|
 | no | | This option specifies the path to a Python binary that executes the Python code. If you don’t set this option, the system uses a built-in runtime. You can also include arguments to the binary. |
 | yes | | |
 | yes | | |
isMaster | no | | Set this option to true when you want to monitor a primary Spark node. The default is |
 | yes | | Set this option to the type of cluster you’re monitoring. The allowed values are |
 | no | | The default is |
 | no | | The default is |
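A minimal sketch of how the optional settings might look on a primary-node receiver. Apart from type and isMaster, every option name here (host, port, clusterType, pythonBinary, collectApplicationMetrics, enhancedMetrics) is an assumption, because the option names are missing from the table above. Check each one against the monitor's settings reference before use. All values are placeholders.

```yaml
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    # Assumed names for the required options; values are placeholders.
    host: localhost
    port: 8080
    clusterType: Standalone
    # Marks this receiver as the primary-node configuration.
    isMaster: true
    # Assumed names for optional Boolean settings.
    collectApplicationMetrics: true
    enhancedMetrics: false
    # Assumed name for the Python binary override. Leave it unset
    # to use the built-in runtime.
    # pythonBinary: /usr/bin/python3
```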
Metrics
These are the metrics available for this integration:
https://raw.githubusercontent.com/signalfx/splunk-otel-collector/main/internal/signalfx-agent/pkg/monitors/collectd/spark/metadata.yaml
Notes
- To learn more about the metric types available in Splunk Observability Cloud, see Metric types.
- In host-based subscription plans, default metrics are those metrics included in host-based subscriptions in Splunk Observability Cloud, such as host, container, or bundled metrics. Custom metrics are not provided by default and might be subject to charges. See Metric categories for more information.
- In MTS-based subscription plans, all metrics are custom.
- To add additional metrics, see how to configure extraMetrics in Add additional metrics.
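As a rough sketch, extraMetrics is set on the receiver as a list of metric names. The metric name below is illustrative only and is assumed, not confirmed, to exist for this monitor. Replace it with names from the metadata.yaml file linked above.

```yaml
receivers:
  smartagent/collectd_spark_master:
    type: collectd/spark
    ... # Additional config
    extraMetrics:
      # Illustrative placeholder; use a metric name from metadata.yaml.
      - gauge.jvm.heap.used
```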
Troubleshooting
If you are a Splunk Observability Cloud customer and are not able to see your data in Splunk Observability Cloud, you can get help in the following ways.
Available to Splunk Observability Cloud customers
- Submit a case in the Splunk Support Portal.
- Contact Splunk Support.
Available to prospective customers and free trial users
- Ask a question and get answers through community support at Splunk Answers.
- Join the Splunk #observability user group Slack channel to communicate with customers, partners, and Splunk employees worldwide. To join, see Chat groups.