Extend the Splunk App for Data Science and Deep Learning with custom notebooks
You can define custom notebooks for specialized machine learning or deep learning tasks with the Splunk App for Data Science and Deep Learning (DSDL). By writing your own Jupyter notebooks, you can incorporate custom algorithms, advanced Python libraries, and domain-specific logic, and pull data from the Splunk platform into the same environment.
Create, export, and maintain notebooks so that they integrate seamlessly with the ML-SPL commands `fit`, `apply`, and `summary`.
Overview
When you develop a notebook in DSDL you can perform the following tasks:
- Write Python code for data preprocessing, model training, or inference.
- Expose that code to ML-SPL by defining functions such as `fit` and `apply` within special notebook cells.
- Automatically export the code into a Python module at runtime.
- Call those functions from Splunk platform searches.
- Pull data directly from the Splunk platform using the Splunk Search API integration, allowing for interactive data exploration in your Jupyter notebook environment.
DSDL notebook components
A DSDL notebook typically includes the following components:
Component | Description |
---|---|
Imports and setup | Imports libraries such as NumPy, Pandas, and PyTorch. Can define global constants or utility functions. |
`fit` function | A Python function that trains or fits your model. Accepts data as a Pandas DataFrame and hyperparameters, and returns model artifacts. |
`apply` function | (Optional) Used for inference or prediction. Accepts new data and the trained model, and returns predictions. |
`summary` function | (Optional) Provides metadata about the model such as hyperparameters or training stats. |
Other utility functions | (Optional) Runs data cleaning, advanced transforms, or direct data pulls using the Splunk Search API. |
When you save your notebook, DSDL exports the relevant code to a Python module in the `/srv/notebooks/app/` directory or a similar directory. The corresponding .py file uses the same base name as the notebook, for example `my_notebook.py`. When saved, the `fit`, `apply`, and `summary` functions can be called from ML-SPL.

The following example notebook includes each of these components:
# ---
# jupyter:
#   jupytext:
#     formats: ipynb,py
#     notebook_metadata_filter: all
# ---

import json
import os

import numpy as np
import pandas as pd

MODEL_DIRECTORY = "/srv/app/model/data/"

# Pull data from the Splunk platform into a Pandas DataFrame
from dsdlsupport import SplunkSearch as SplunkSearch
search = SplunkSearch.SplunkSearch()
df = search.as_df()
df

# Placeholder parameters for interactive development.
# At runtime, ML-SPL passes the param dictionary to these functions.
param = {}

# Initialize the model
def init(df, param):
    model = {}
    model['hyperparameter'] = 42.0
    return model

model = init(df, param)
print(model)

# Train the model
def fit(model, df, param):
    info = {"message": "model trained"}
    return info

print(fit(model, df, param))

# Apply the model
def apply(model, df, param):
    y_hat = df.index
    result = pd.DataFrame(y_hat, columns=['index'])
    return result

print(apply(model, df, param))

# Return model summary information
def summary(model=None):
    returns = {"version": {"numpy": np.__version__, "pandas": pd.__version__}}
    return returns
Notebook-to-module mechanism
DSDL runs the following internal mechanism, which scans the notebook for functions named `fit`, `apply`, and `summary`:
- Trigger autosave: Each time you save the notebook in JupyterLab, a conversion step occurs.
- Export Python: Relevant Python cells, such as the cell containing the `fit` function, are written into a .py module, for example `/srv/notebooks/app/<notebook_name>.py`.
- Run ML-SPL lookup: The `MLTKContainer` command dynamically imports `<notebook_name>` at runtime to run the `fit`, `apply`, and `summary` functions, as sketched after this list.
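The export-and-import round trip can be illustrated with standard Python tooling. The following is a minimal sketch of the dynamic-import pattern, not DSDL's internal code; the notebook name, placeholder data, and parameters are assumptions for illustration only:

import importlib.util
import pandas as pd

# Minimal sketch of the dynamic-import pattern, not DSDL's internal code.
notebook_name = "my_notebook"                            # assumed notebook base name
module_path = f"/srv/notebooks/app/{notebook_name}.py"   # exported module location

spec = importlib.util.spec_from_file_location(notebook_name, module_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

# Call the exported functions the same way ML-SPL would.
df = pd.DataFrame({"feature_1": [1.0, 2.0, 3.0]})        # placeholder data
param = {}                                               # placeholder parameters
model = module.init(df, param)
print(module.fit(model, df, param))
print(module.apply(model, df, param))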
You can help ensure this internal mechanism runs well in the following ways:
- Avoid function name collisions, such as two separate `fit` definitions in the same notebook.
- If you rename your notebook file, a new .py module is created but the older file isn't deleted. Remove older files that you no longer need.
Defining and passing parameters
Document your notebook's expected parameters so users know which SPL arguments to provide. Use sensible defaults to avoid a Python KeyError if a parameter (param) is missing. All parameter values from ML-SPL are strings. You can convert parameters to `int`, `float`, or `bool` as needed.
In the following example, all ML-SPL arguments after `algo=<my_notebook>` are passed to your notebook's Python code as the param dictionary:
| fit MLTKContainer algo=my_notebook alpha=0.01 epochs=10 ...
def fit(df, param):
alpha = float(param.get('alpha', 0.001))
epochs = int(param.get('epochs', 10))
...
Use `param.get('key', default_value)` to handle optional arguments.
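Boolean options also arrive as strings such as "true" or "false", so parse them explicitly. The following is a minimal sketch; the `normalize` argument and the helper function are hypothetical examples, not DSDL-defined names:

def str_to_bool(value, default=False):
    # ML-SPL passes parameter values as strings, so map them explicitly.
    if value is None:
        return default
    return str(value).lower() in ("1", "true", "t", "yes")

# Hypothetical param dictionary as ML-SPL would pass it for an argument like normalize=true
param = {"normalize": "true"}
normalize = str_to_bool(param.get("normalize"), default=False)
print(normalize)  # True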
Stage data for iterative development in notebooks
You can use `mode=stage` for iterative development and data staging. Complete the following steps:
- If you want to push only a subset of Splunk platform data to your notebook without training, follow this syntax:
| fit MLTKContainer mode=stage algo=my_notebook features_* into app:MyDevModel
This stages the data and parameters in the container without running the `fit` command.
- Open JupyterLab, and define or call a helper function as follows:
def stage(name):
    with open("data/"+name+".csv", 'r') as f:
        df = pd.read_csv(f)
    with open("data/"+name+".json", 'r') as f:
        param = json.load(f)
    return df, param

df, param = stage("MyDevModel")
- To debug, open my_notebook.ipynb in JupyterLab and test or modify code using the staged data.
- Manually call your `init`, `fit`, or `apply` functions on that data to debug as needed, as shown in the sketch after these steps.
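For example, after staging data under the name MyDevModel, you can exercise the notebook functions interactively. This sketch assumes the stage() helper above and the function signatures from the example notebook earlier in this topic:

# Load the staged data and parameters, then run each function by hand.
df, param = stage("MyDevModel")

model = init(df, param)
print(model)

info = fit(model, df, param)
print(info)

predictions = apply(model, df, param)
print(predictions.head())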
Pull data directly into a notebook using the Splunk Search API
In addition to staging data with `mode=stage`, you can pull data directly using the Splunk Search API.
Complete the following steps:
- Turn on access to the Splunk platform on the DSDL Setup page. Provide your Splunk host, port 8089, and a valid token.
- Import `SplunkSearch` in your notebook, then either use an interactive search widget or define a predefined query.
- Run the query to retrieve data into a Pandas DataFrame in your notebook.
For example:
from dsdlsupport import SplunkSearch
search = SplunkSearch.SplunkSearch()
df = search.as_df()
df
Note: If you encounter connectivity issues, confirm firewall rules or check the `_internal` logs for `mltk-container` errors referencing timeouts.
Storing and sharing notebooks
Apply the following methods for version control and collaboration with custom models:
- Store notebooks in a Git repo, allowing for merges, pull requests, and versioning.
- By default, notebooks are stored in `/srv/notebooks/`. You can organize them by project or by team.
- Jupyter saves automatically, but consider manually committing .ipynb and .py changes to Git for auditing.
Advanced notebook patterns
You can use advanced notebook patterns with custom models:
Notebook pattern | Description |
---|---|
Multiple models per notebook | You can define multiple training algorithms in a single .ipynb file, but only one `fit` function is recognized. If you want to differentiate between them, parse extra arguments in param or create separate notebooks for clarity. |
Additional utility functions | You can define custom data preparation, feature engineering, or advanced plotting in separate Python cells. As long as they're not named `fit`, `apply`, or `summary`, they won't be exported to ML-SPL. |
Auto-generating additional metrics | You can log metrics or epoch-by-epoch logs to the Splunk platform. For example, you can write them to a CSV file that's forwarded, or send them to the HTTP Event Collector (HEC) in real time, as sketched after this table. |
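For the real-time option, metrics can be posted to HEC from inside the notebook. The following is a minimal sketch, assuming an HEC endpoint and token are reachable from the container; the URL, token, sourcetype, and metric names shown are placeholders, not values defined by DSDL:

import json
import requests

# Placeholder HEC settings; replace with your own endpoint and token.
HEC_URL = "https://splunk.example.com:8088/services/collector/event"
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"

def log_metrics_to_hec(metrics, sourcetype="mltk:training:metrics"):
    """Send a dictionary of training metrics to the HTTP Event Collector."""
    payload = {"event": metrics, "sourcetype": sourcetype}
    response = requests.post(
        HEC_URL,
        headers={"Authorization": "Splunk " + HEC_TOKEN},
        data=json.dumps(payload),
        verify=False,  # adjust TLS verification to match your environment
    )
    response.raise_for_status()

# Hypothetical usage inside a training loop in your fit function:
# log_metrics_to_hec({"epoch": 3, "loss": 0.127})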
Best practices for creating notebooks
Consider the following when creating custom notebooks:
- DSDL only recognizes exact function names. Be mindful of typos when using `init`, `fit`, `apply`, and `summary`.
- All parameter values from ML-SPL are strings. You can convert parameters to `int`, `float`, or `bool` as needed.
- If your container image lacks a required library, importing it results in an `ImportError`. Add large libraries to the container image through Docker. A sketch of guarding such imports follows this list.
- Use unique .ipynb filenames to help avoid conflicts or overwriting files in `/srv/notebooks/app/`.
- If you rely on Splunk search, ensure the container can reach the Splunk platform, and check firewall, DNS, and TLS settings.
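One way to handle a library that might be missing from the container image is to guard the import and fall back gracefully. A minimal sketch, using PyTorch only as an example of a large optional library:

# Guarded import: keep the notebook usable even if the library isn't installed.
try:
    import torch
    TORCH_AVAILABLE = True
except ImportError:
    # The container image doesn't ship this library; add it through Docker
    # or fall back to a lighter implementation in fit and apply.
    torch = None
    TORCH_AVAILABLE = False

print("PyTorch available:", TORCH_AVAILABLE)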
Example: Create a custom notebook
The following is an example workflow of creating a custom notebook:
- Start a dev container in DSDL, then open JupyterLab.
- Create a notebook and save it as `my_custom_algo.ipynb` in JupyterLab.
- Define code: Write cells for `init`, `fit`, `apply`, and `summary`, optionally using the Splunk Search API.
- Pull data with `df, param = stage("MyTestModel")` or use the Splunk Search API. Test logic interactively.
- Save your file. DSDL exports your code to `my_custom_algo.py`.
- Train the model in the Splunk platform:
index=my_data | fit MLTKContainer algo=my_custom_algo features_* into app:MyProdModel
- Apply the model:
index=my_data | apply MyProdModel
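Because the notebook also defines a `summary` function, you can inspect the saved model's metadata with the summary ML-SPL command. Assuming the model name saved in the training step, the call looks like this:
| summary MyProdModel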