Model governance and security in the Splunk App for Data Science and Deep Learning
Train and serve advanced ML models in containerized environments with the Splunk App for Data Science and Deep Learning (DSDL). Enterprise-grade machine learning might require model governance, secure container management, and strict access controls to ensure that data, models, and container images meet compliance and operational standards.
Ensure you fulfill these model governance and security requirements for your advanced ML models.
Overview
DSDL supports the following model governance features:
- Model retraining or versioning
- Automatic sync for notebooks and models
- Container image security, including private registries, image scanning, restricted GPU usage, and custom TLS certificates
- Roles, capabilities, and container access
- Auditing and traceability
- Transport Layer Security (TLS) and data encryption
The following permissions are available with your models:
Permissions | Description |
---|---|
App context | By default, model names such as app:MyModel are recognized by DSDL. |
Sharing | Splunk knowledge object sharing can be set to User, App, or Global. |
User | Visible only to the model creator. |
App | Shared by users of the same Splunk app. |
Global | Visible across the Splunk platform and suitable for widely used HPC or production models. |
Model retraining or versioning
First, run the following command to create and train a new model:
| fit MLTKContainer algo=my_notebook ... into app:MyModel
DSDL spins up a container, runs the training, and saves model artifacts under app:MyModel.
To retrain the model, run the following command with new data or parameters. This overwrites old artifacts:
| fit MLTKContainer algo=my_notebook ... into app:MyModel
To version the model, specify a new name in the into clause, for example into app:MyModel_v2.
Automatic sync for notebooks and models
DSDL automatically stores your notebooks and model files in the Splunk platform instance. Because containers are ephemeral by default, automatic sync prevents data loss if ephemeral or NFS volumes go offline and lets new containers retrieve the same notebooks and models.
The SyncHandler and related scripts remove orphaned containers, reconcile stanzas with actual containers, and ensure ephemeral data is synced. This protects your environment from data loss, letting you focus on the machine learning workflow rather than container lifecycle details.
Container image security
Review the following options to secure your container images.
Private registry and air-gapped images
You can use a private Docker registry or an air-gapped approach. Push the golden-cpu, golden-gpu, or custom images to your internal registry. In DSDL, go to Setup, then Container Settings, and specify the private registry URL so that DSDL pulls from it.
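Mirroring an image into a private registry can be sketched as follows. This is a minimal example and not part of DSDL. The image name splunk/mltk-container-golden-cpu:5.2.0 and the registry host registry.example.internal:5000 are assumptions, and run is a hypothetical helper that only prints each command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: mirror a DSDL image into a private registry.
# SRC_IMAGE and REGISTRY are assumptions; replace them with your own values.
SRC_IMAGE="${SRC_IMAGE:-splunk/mltk-container-golden-cpu:5.2.0}"
REGISTRY="${REGISTRY:-registry.example.internal:5000}"
DRY_RUN="${DRY_RUN:-1}"   # 1 prints the commands, 0 executes them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Pull the source image, retag it under the private registry, and push it.
mirror_image() {
  src="$1"
  dest="$2/$1"
  run docker pull "$src" &&
  run docker tag "$src" "$dest" &&
  run docker push "$dest"
}

mirror_image "$SRC_IMAGE" "$REGISTRY"
```

Run the script once with the default DRY_RUN=1 to review the commands, then set DRY_RUN=0 after logging in to the registry with docker login.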
You can build images with scripts such as bulk_build.sh. Keep a separate Git or artifact repository with Dockerfiles and pinned requirements.
Image scanning and hardening
Follow these best practices for image scanning and hardening:
- Use scripts from [splunk-mltk-container-docker](https://github.com/splunk/splunk-mltk-container-docker) or tools such as Trivy to detect known common vulnerabilities and exposures (CVE).
- Remove unneeded packages for minimal images.
- Regularly patch OS-level vulnerabilities in base images such as Debian, Red Hat UBI, and so on.
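A scan step built on the practices above might look like the following sketch. It assumes that Trivy is installed and that the image name shown is the one you pushed to your registry. Both the image name and the run helper are hypothetical.

```shell
#!/bin/sh
# Sketch: fail a build when an image has fixable HIGH or CRITICAL CVEs.
# IMAGE is an assumed name; replace it with your own image reference.
IMAGE="${IMAGE:-registry.example.internal:5000/splunk/mltk-container-golden-cpu:5.2.0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 prints the command, 0 executes it

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

scan_image() {
  # --severity limits the report to the listed levels.
  # --ignore-unfixed skips CVEs that have no patched package yet.
  # --exit-code 1 makes trivy exit non-zero when findings remain.
  run trivy image --severity HIGH,CRITICAL --ignore-unfixed --exit-code 1 "$1"
}

scan_image "$IMAGE"
```

With DRY_RUN=0, the non-zero exit code lets a CI job stop before the image is pushed.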
GPU resource restrictions
In Kubernetes or OpenShift, define resource requests so only authorized machine learning tasks can claim GPUs.
In single-host Docker containers, pass --gpus or runtime=nvidia to control GPU usage.
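As a manual check outside of DSDL, the following sketch starts a throwaway container that can see only one GPU and lists what is visible inside it. It assumes the NVIDIA Container Toolkit is installed on the Docker host. The image name is an assumption and run is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: confirm which GPUs a container can see on a single Docker host.
# IMAGE is an assumed name; replace it with your own GPU image.
IMAGE="${IMAGE:-splunk/mltk-container-golden-gpu:5.2.0}"
DRY_RUN="${DRY_RUN:-1}"   # 1 prints the command, 0 executes it

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 is the value for --gpus, for example "all", "1", or "device=0".
# --entrypoint replaces the image's normal startup with nvidia-smi.
check_gpu_access() {
  run docker run --rm --gpus "$1" --entrypoint nvidia-smi "$2"
}

check_gpu_access "device=0" "$IMAGE"
```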
Embedding custom certificates for production HTTPS
In production environments, you must have trusted HTTPS on container endpoints. DSDL images can include your own TLS certificates instead of the default, self-signed certificates. The splunk-mltk-container-docker repo includes a certificates folder showing how to embed custom certificates.
Follow these steps:
- Clone the repo:
git clone https://github.com/splunk/splunk-mltk-container-docker
- Place your certificates in the certificates directory, named dltk.key for the private key and dltk.pem for the certificate.
- (Optional) Generate self-signed certificates for testing:
openssl req -x509 -nodes -days 3650 -newkey rsa:2048 -keyout dltk.key -out dltk.pem -subj "/CN=bobobobobbo"
- Build your container image using scripts:
./build.sh golden-cpu-custom splunk/ 5.2.0
The build copies dltk.key and dltk.pem into /dltk/.jupyter/. This sets up the container to serve HTTPS endpoints with your certificate.
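Before building, it can help to confirm that the certificate and the private key belong together. The following is a minimal sketch that compares the public key in each file with OpenSSL. The CERT_DIR variable and the cert_matches_key function are hypothetical names, and the check is not part of the repo's build scripts.

```shell
#!/bin/sh
# Sketch: confirm dltk.pem and dltk.key form a matching pair before building.
CERT_DIR="${CERT_DIR:-certificates}"   # assumed location inside the cloned repo

# Succeeds only when the certificate ($1) and private key ($2) share a public key.
cert_matches_key() {
  cert_pub="$(openssl x509 -in "$1" -noout -pubkey 2>/dev/null)" || return 1
  key_pub="$(openssl pkey -in "$2" -pubout 2>/dev/null)" || return 1
  [ -n "$cert_pub" ] && [ "$cert_pub" = "$key_pub" ]
}

# Run the check only when both files are present.
if [ -f "$CERT_DIR/dltk.pem" ] && [ -f "$CERT_DIR/dltk.key" ]; then
  if cert_matches_key "$CERT_DIR/dltk.pem" "$CERT_DIR/dltk.key"; then
    echo "certificate and key match"
    # Show who the certificate was issued to and when it expires.
    openssl x509 -in "$CERT_DIR/dltk.pem" -noout -subject -enddate
  else
    echo "certificate and key do NOT match" >&2
  fi
fi
```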
Name the files dltk.key and dltk.pem, or adapt the Dockerfile references so the container recognizes them. Only these exact filenames are used at runtime.
Roles, capabilities, and container access
Review the following for information on roles and permissions in DSDL.
DSDL roles and capabilities
DSDL offers the following container-related capabilities:
Capability | Description |
---|---|
configure_mltk_container | Manages container settings such as Observability tokens and certificate configurations. |
list_mltk_container | Lists containers on the container dashboard. |
control_mltk_container | Starts or stops containers from the DSDL app. |
Assign configure_mltk_container to Splunk admins, control_mltk_container to data-science roles, and list_mltk_container to general users.
Model permissions
The following permissions are available with your models:
Permissions | Description |
---|---|
App context | By default, model names such as app:MyModel are recognized by DSDL. |
Sharing | Shares Splunk knowledge objects at the user, app, or global level. |
User | Shares the model only with the model creator. |
App | Shares the model with users of the same Splunk app. |
Global | Shares the model across the Splunk platform. Suitable for widely used HPC or production models. |
By default, only the model creator sees the model. For HPC or large production usage, set model sharing to Global.
Securing HEC, Observability, and container endpoints
Use Splunk HEC tokens carefully if you log partial training data. If Observability is enabled, guard your Observability Access Token. If you want production-level TLS in the container, embed custom certificates as described in Embedding custom certificates for production HTTPS.
Auditing and traceability
Review the following options for model auditing and traceability.
Option | Description |
---|---|
Track model creation in _internal logs | Use _internal logs to help track who trained which model and when. When you run fit ... into app:MyModel, logs appear in _internal, referencing information including container staging. |
Audit with model summary and metadata | Running summary MyModel returns model information such as hyperparameters and creation time. You can build a model catalog or store these events in a dedicated Splunk index for extended auditing. |
Collaborate on and roll back changes with notebook versioning in Git | DSDL automatically syncs notebooks to the Splunk platform, but you can also store .ipynb files in Git for collaboration and rollback. |
TLS and data encryption
Review the following table for information on TLS and data encryption in model governance and security:
Option | Description |
---|---|
TLS from the Splunk platform to container | Developer containers can use self-signed certificates. Production containers must have properly signed certificates for TLS. For a Docker single-host container, the container endpoints handle TLS. For Kubernetes, an Ingress object often handles TLS termination. |
GPU data in transit | Data from the Splunk platform is subject to TLS encryption, even if the container uses GPUs. GPU usage does not affect encryption. Ephemeral volumes can still lose data when a container stops, which the automatic sync to the Splunk platform mitigates. |
Governance and security guidelines
Review the following guidelines for model governance and security:
- Restrict advanced container management capabilities to admin or power users.
- Use minimal images, adding only the libraries you need.
- Use DSDL's automatic sync to avoid ephemeral data loss, and store .ipynb files in Git for version control.
- Scan container images with Trivy or the built-in scripts from splunk-mltk-container-docker.
- Use custom certificates for production HTTPS in containers.
- If Observability is toggled on in DSDL, container endpoints are auto-instrumented with OTel. Confirm your endpoint, token, and service name.
Troubleshooting model governance and security
See the following table for issues you might experience and how to resolve them:
Problem | Cause | Solution |
---|---|---|
You see the error model not found: MyModel | The model is private or in a different app context. | Adjust sharing permissions or check the container logs. Search the _internal logs for mltk-container for references to your model. |
HPC node can't pull image | There is a private registry or TLS error. | Check your Docker or Kubernetes credentials, or check your images.conf file references to the registry. |
Observability instrumentation not active on endpoints | Observability is toggled off or has an invalid token in DSDL. | In DSDL, go to Setup, then Observability Settings. You might need to restart the container with new configurations. |
Notebooks vanish after container restarts | Ephemeral volume is wiped or NFS is gone. | Restore the notebooks with the automatic notebook and model sync in the Splunk platform. Check the _internal logs and mltk-container for any sync errors. |
You see Invalid certificate on container endpoint | The container uses self-signed or misnamed certificates, or the container lacks your official CA. | Place your real certificate in certificates/dltk.pem and certificates/dltk.key and then rebuild the container. Review Docker logs for TLS load errors. |
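When you troubleshoot an invalid certificate, it can help to look at the certificate the endpoint actually serves. The following sketch uses OpenSSL for that. The function name, the DSDL_HOST and DSDL_PORT variables, and the fallback port 5000 are assumptions, so use the host and port from your own container settings.

```shell
#!/bin/sh
# Sketch: print the certificate that a container endpoint actually serves.
# Host and port are assumptions; take the real values from your DSDL setup.
show_endpoint_cert() {
  host="$1"
  port="$2"
  # s_client opens a TLS connection; reading stdin from /dev/null makes it
  # exit after the handshake. x509 then prints subject, issuer, and expiry.
  openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null |
    openssl x509 -noout -subject -issuer -enddate
}

# Runs only when DSDL_HOST is set, for example:
#   DSDL_HOST=dsdl.example.internal DSDL_PORT=5000 sh check_endpoint.sh
if [ -n "${DSDL_HOST:-}" ]; then
  show_endpoint_cert "$DSDL_HOST" "${DSDL_PORT:-5000}"
fi
```

If the subject or issuer still shows the self-signed test certificate, the image was probably built before the real dltk.pem and dltk.key were placed in the certificates directory.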