About the Splunk App for Data Science and Deep Learning
The Splunk App for Data Science and Deep Learning is a free app you can download from Splunkbase.
The Splunk App for Data Science and Deep Learning (DSDL), formerly known as the Deep Learning Toolkit (DLTK), lets you integrate advanced custom machine learning and deep learning systems with the Splunk platform. The app extends the Splunk Machine Learning Toolkit (MLTK) with prebuilt Docker containers for TensorFlow, PyTorch, and a collection of data science, NLP, and classical machine learning libraries.
Using the predefined JupyterLab notebook workflows, you can build, test, and operationalize custom models on the Splunk platform. You can use GPUs for compute-intensive training tasks and deploy models on CPU- or GPU-enabled containers.
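The DSDL notebook templates organize model code into stage functions, such as init, fit, apply, and summary, which the container calls when a search uses the model. The following is a minimal sketch of that layout. It assumes function names and signatures modeled on those templates, although the exact signatures in your DSDL version may differ. The toy z-score outlier model and the plain Python list input, used here instead of a pandas DataFrame so the example stays self-contained, are illustrative assumptions.

```python
from statistics import mean, pstdev

# Hypothetical stage functions loosely modeled on a DSDL notebook template.
# Input is a plain list of floats instead of a pandas DataFrame.

def init(values, param):
    """Create an empty model object; param carries options passed from the search."""
    return {"threshold": param.get("threshold", 3.0), "mean": None, "std": None}

def fit(model, values, param):
    """Learn the mean and population standard deviation of the training values."""
    model["mean"] = mean(values)
    model["std"] = pstdev(values) or 1.0  # avoid division by zero on constant data
    return {"message": f"fitted on {len(values)} values"}

def apply(model, values, param):
    """Flag values whose absolute z-score exceeds the configured threshold."""
    results = []
    for v in values:
        z = (v - model["mean"]) / model["std"]
        results.append({"value": v, "z": z, "is_outlier": abs(z) > model["threshold"]})
    return results

def summary(model=None):
    """Describe the fitted model, similar in spirit to the ML-SPL summary command."""
    if model is None:
        return {}
    return {"mean": model["mean"], "std": model["std"], "threshold": model["threshold"]}

# Example usage: train on stable values, then score a normal and an extreme value.
train = [10.0, 11.0, 9.0, 10.0, 10.0]
model = init(train, {"threshold": 2.0})
fit(model, train, {})
print(apply(model, [10.0, 20.0], {}))
```

Keeping each stage in its own function mirrors how the search-time commands map onto separate steps, so you can test each stage in the notebook before operationalizing the model.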
The following image shows a high-level workflow you can follow when using DSDL.
Splunk App for Data Science and Deep Learning features
The following features are included with the Splunk App for Data Science and Deep Learning:
- More than 30 examples that showcase different deep learning and machine learning algorithms for classification, regression, forecasting, clustering, graph analytics, and NLP. These examples show how you might approach advanced data science use cases in IT operations, security, application development, IoT, and business analytics.
- Rapid model development workflows using JupyterLab notebooks, which let you address advanced modeling use cases that MLTK alone cannot address.
- Familiar SPL syntax from MLTK, including the ML-SPL commands fit, apply, and summary.
- An extension of MLTK functionality that lets you develop custom analytics with high computational workloads using any open source Python library, and operationalize your custom models on the Splunk platform, including in dashboards and alerts.
- Ability to connect your Splunk search head to container environments including Docker, Kubernetes, and OpenShift, each with optional GPU support.
- Access to prebuilt containers, including the Golden Image GPU for TensorFlow, PyTorch, and Dask.
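Conceptually, when a search runs fit, apply, or summary against a DSDL model, the search head sends the search results to the container, and the matching step of your code runs there. The sketch below imitates that routing with a JSON payload. The payload fields (command, model_name, data), the handler functions, and the in-memory model store are all hypothetical and do not reproduce the actual DSDL container API.

```python
import json

# Hypothetical in-memory model store keyed by model name.
MODELS = {}

def handle_fit(name, data):
    # Toy "model": remember the mean of each numeric field in the training rows.
    fields = data[0].keys() if data else []
    MODELS[name] = {f: sum(row[f] for row in data) / len(data) for f in fields}
    return {"status": "fitted", "model": name}

def handle_apply(name, data):
    # Add a "<field>_delta" column with each value's distance from the learned mean.
    means = MODELS[name]
    return [{**row, **{f"{f}_delta": row[f] - m for f, m in means.items()}} for row in data]

def handle_summary(name, data):
    # Return the learned parameters; incoming data is ignored.
    return MODELS[name]

HANDLERS = {"fit": handle_fit, "apply": handle_apply, "summary": handle_summary}

def dispatch(raw_request: str) -> str:
    """Route a JSON request to the handler for its ML-SPL command and return JSON."""
    request = json.loads(raw_request)
    handler = HANDLERS.get(request["command"])
    if handler is None:
        return json.dumps({"error": f"unknown command {request['command']!r}"})
    return json.dumps(handler(request["model_name"], request.get("data", [])))

# Example usage: fit on two rows, then apply the model to a new row.
dispatch(json.dumps({"command": "fit", "model_name": "cpu_model",
                     "data": [{"cpu": 40}, {"cpu": 60}]}))
print(dispatch(json.dumps({"command": "apply", "model_name": "cpu_model",
                           "data": [{"cpu": 70}]})))
```

Because every request passes through the search head to a single container, this routing runs in one place per model, which matches the "No indexer distribution" restriction described later in this topic.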
Requirements for the Splunk App for Data Science and Deep Learning
To run the Splunk App for Data Science and Deep Learning, you need the following:
- Splunk Enterprise 8.1.x or higher, or Splunk Cloud Platform.
- The correct version of the Python for Scientific Computing (PSC) add-on from Splunkbase for your operating system:
  - macOS environment.
  - Windows 64-bit environment.
  - Linux 64-bit environment.
- The Splunk Machine Learning Toolkit app from Splunkbase.
- An internet-connected Docker (https://www.docker.com), Kubernetes (https://kubernetes.io), or OpenShift (https://www.openshift.com) environment.
  - If you use Docker in an air-gapped environment, see Configure the Splunk App for Data Science and Deep Learning.
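Before installing, you might script a quick check of some of these requirements. The sketch below assumes you already know your Splunk Enterprise version string. It compares that version against the 8.1 minimum and reports which of the docker, kubectl, or oc command-line clients are on the PATH. The function names are hypothetical, and finding a client binary does not prove that the environment is reachable or internet connected.

```python
import re
import shutil

MIN_SPLUNK_VERSION = (8, 1)  # Splunk Enterprise 8.1.x or higher

def parse_version(version: str) -> tuple:
    """Turn a version string such as '9.0.4' into (9, 0, 4) by extracting digit runs."""
    return tuple(int(part) for part in re.findall(r"\d+", version))

def splunk_version_ok(version: str) -> bool:
    """Compare the major and minor version against the documented minimum."""
    return parse_version(version)[:2] >= MIN_SPLUNK_VERSION

def available_container_clients() -> list:
    """Return which container CLIs (Docker, Kubernetes, OpenShift) are found on PATH."""
    return [cli for cli in ("docker", "kubectl", "oc") if shutil.which(cli)]

# Example usage with a hypothetical version string.
print(splunk_version_ok("9.1.2"), available_container_clients())
```

This check covers only Splunk Enterprise. Splunk Cloud Platform satisfies the platform requirement without a version comparison.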
Splunk App for Data Science and Deep Learning navigation
Key terms in the Splunk App for Data Science and Deep Learning
DSDL offers an open, plug-in architecture for any algorithm, runtime, or execution environment. See the following key terms to gain familiarity with the DSDL structure.
Key term | Description |
---|---|
Model | When you run an algorithm on a dataset, you create a model. Models are typically created to detect patterns in your data. |
Algorithm | An algorithm operates like a program, mapping input data to output data. Use an algorithm to train a model or apply a pre-trained model on new data. In DSDL an algorithm always depends on a specific runtime. |
Runtime | The framework an algorithm runs in, which typically provides a specific set of libraries. In DSDL, to execute an algorithm, you must deploy it with a specific runtime into an environment. |
Environment | The infrastructure or service that executes the algorithm and serves the model. |
Golden Image | Available from the Container Image dropdown menu on the Containers page. The Golden Image for CPU and GPU contains most of the popular recent libraries, including TensorFlow, PyTorch, and others. Images prebuilt for specific libraries, such as Spark, River, or RAPIDS, are also available. |
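The relationships between these terms can be summarized as a small data model: an algorithm depends on a runtime, you deploy it into an environment, and running it on a dataset produces a model. The following Python sketch expresses that structure with dataclasses. The class names, field names, and the library-coverage check are illustrative assumptions, not DSDL objects or APIs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Runtime:
    name: str                       # hypothetical image name, for example "golden-image-cpu"
    libraries: tuple = ()           # libraries the runtime provides

@dataclass(frozen=True)
class Environment:
    kind: str                       # "docker", "kubernetes", or "openshift"
    gpu: bool = False               # whether GPU support is enabled

@dataclass(frozen=True)
class Algorithm:
    name: str
    runtime: Runtime                # in DSDL an algorithm always depends on a specific runtime
    required_libraries: tuple = ()

@dataclass
class Model:
    name: str
    algorithm: Algorithm
    environment: Environment

def create_model(algorithm: Algorithm, environment: Environment, model_name: str) -> Model:
    """Check that the runtime provides the algorithm's libraries, then return a Model record.

    No training happens here; the function only records which algorithm,
    runtime, and environment the model is tied to.
    """
    missing = set(algorithm.required_libraries) - set(algorithm.runtime.libraries)
    if missing:
        raise ValueError(f"runtime {algorithm.runtime.name!r} lacks {sorted(missing)}")
    return Model(name=model_name, algorithm=algorithm, environment=environment)

# Example usage with hypothetical names.
cpu_runtime = Runtime("golden-image-cpu", ("tensorflow", "torch"))
model = create_model(Algorithm("autoencoder", cpu_runtime, ("tensorflow",)),
                     Environment("docker"), "my_autoencoder")
print(model.algorithm.runtime.name)
```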
Splunk App for Data Science and Deep Learning restrictions
See the following table for DSDL app limitations and restrictions:
App limitation or restriction | Description |
---|---|
Docker, Kubernetes, and OpenShift environments | The architecture only supports Docker, Kubernetes, and OpenShift as target container environments. |
No indexer distribution | Data is processed on the search head and sent to the container environment. Data cannot be processed in a distributed manner, such as streaming data in parallel from indexers to one or many containers. However, the search itself still gets all the advantages of a distributed Splunk platform deployment. |
Security protocols | Data is sent from a search head to a container over HTTPS. Splunk administrators must take steps to secure the DSDL setup and the container environment accordingly. |
Atomic container model | Models created using the Splunk App for Data Science and Deep Learning (DSDL) are atomic: each model is served by a single container. |
Global model sharing | To serve a model from a dedicated container, you must share it by setting the model permission to Global. |
Naming convention | Model names must not include whitespace. Otherwise, model configuration does not work properly. |
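Several of these restrictions are easy to check before you run fit. The sketch below validates that a model name contains no whitespace and keeps a one-to-one mapping between models and containers. It also refuses to serve a model from a dedicated container unless the model permission is Global. The function, the ContainerRegistry class, and its methods are hypothetical and only illustrate the rules in the table.

```python
import re

def validate_model_name(name: str) -> str:
    """Reject empty model names and names that contain any whitespace character."""
    if not name or re.search(r"\s", name):
        raise ValueError(f"invalid model name {name!r}: whitespace is not allowed")
    return name

class ContainerRegistry:
    """Hypothetical bookkeeping in which each container serves exactly one model."""

    def __init__(self):
        self._container_for_model = {}

    def assign(self, model_name: str, container_id: str, permission: str) -> None:
        validate_model_name(model_name)
        if permission.lower() != "global":
            raise PermissionError(
                f"{model_name!r} must have Global permission to be served from a dedicated container")
        # Find any other model already served by this container.
        owner = next((m for m, c in self._container_for_model.items() if c == container_id), None)
        if owner is not None and owner != model_name:
            raise RuntimeError(f"container {container_id!r} already serves {owner!r}")
        self._container_for_model[model_name] = container_id

    def container_for(self, model_name: str) -> str:
        return self._container_for_model[model_name]

# Example usage with hypothetical model and container names.
registry = ContainerRegistry()
registry.assign("churn_model", "container-1", "Global")
print(registry.container_for("churn_model"))
```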