Advanced container customization

The Splunk App for Data Science and Deep Learning (DSDL) relies on container images for CPU or GPU-based machine learning workloads. The app ships with certain default images, including Golden CPU and Golden GPU.

Your organization might need to modify, secure, or extend these images to support specialized libraries, security requirements, or offline, also known as air-gapped, environments. The splunk-mltk-container-docker GitHub repository provides the Dockerfiles, requirements files, and scripts required to build, customize, and manage these container images.

You can use the repository to build and tailor container images for high-performance computing (HPC)-level model training, specialized Python dependencies, or compliance with internal security standards. By understanding the scripts, tag_mapping.csv, and Dockerfiles, you can complete the following tasks:

  • Add new libraries or pinned dependencies.
  • Switch to GPU runtimes or different base OS images.
  • Scan containers for vulnerabilities.
  • Automate builds with a continuous integration (CI) pipeline.
  • Integrate seamlessly with the images.conf file for a frictionless DSDL experience.

If you require additional customization, such as Red Hat environment compliance or packaging proprietary libraries, the flexible scripting approach in the build.sh, bulk_build.sh, and compile_image_python_requirements.sh files ensures you can keep your images in sync with your enterprise policies. Combine this customization with other DSDL advanced workflows to fully operationalize machine learning at scale within the Splunk platform.

What's in the splunk-mltk-container-docker repository

The splunk-mltk-container-docker GitHub repository hosts the Dockerfiles, scripts, and requirements needed to create the container images used by DSDL.

Review the following table for descriptions of the repository components:

Component Description
/dockerfiles/ Contains multiple Dockerfiles and include files for different base operating systems, such as Debian and UBI, and library sets, such as golden-cpu, minimal-cpu, spark, and rapids.

For example, Dockerfile.debian.golden-cpu installs an array of data science libraries on Debian. Dockerfile.redhat.ubi.golden-cpu does the same for Red Hat UBI9.

/requirements_files/ Houses Python requirements files that define which libraries get installed in each image variant.

Each image is typically split into base_requirements and specific_requirements to handle minimal and golden additions.

/scripts/ Shell scripts for building, bulk-building, scanning, and testing images.

For example:

  • The build.sh shell script builds a single image based on a chosen tag from tag_mapping.csv.
  • The bulk_build.sh shell script builds multiple images from a list of tags.
  • The scan_container.sh shell script uses Trivy to scan images for vulnerabilities.
  • The test_container.sh shell script runs Playwright-based UI tests on a container.
tag_mapping.csv A critical CSV file that enumerates the build configurations. For example, golden-cpu, golden-gpu, and minimal-cpu.

Each row maps a tag to the base image, Dockerfile, requirements files, and runtime flags.

/images_conf_files/ Stores the generated images.conf stanzas that you can merge into the Splunk platform /mltk-container/local/ directory.

Each .conf file corresponds to a built image, letting DSDL know which images are available.
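As an illustration of the base and specific requirements split described in the table, a base file pins broadly useful packages while a specific file layers on the heavyweight machine learning stack. The package names and version pins below are hypothetical examples, not the repository's actual contents:

```
# base_requirements (for example, base_functional.txt): shared foundations
numpy==1.26.4
pandas==2.2.2

# specific_requirements (for example, specific_golden_cpu.txt): golden additions
tensorflow==2.16.1
torch==2.3.0
```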

Configuring the main scripts in the splunk-mltk-container-docker repository

Review the following descriptions of the main scripts and how they fit into a typical build pipeline:

Note: All scripts are in the /scripts/ directory of the repository.

build.sh

Script information Description
Overview Builds a single container image using configuration from tag_mapping.csv.
Syntax ./build.sh <build_configuration_tag> <repo_name> <version>
Parameters
  • build_configuration_tag: References a row in tag_mapping.csv.
  • repo_name: Optional prefix for the Docker repo.
  • version: A version tag for the final image.
Example ./build.sh golden-cpu splunk/ 5.1.1
Workflow
  1. Reads the row for golden-cpu in tag_mapping.csv to find the base image, Dockerfile, and requirements files.
  2. Optionally compiles Python requirements if not pre-compiled.
  3. Executes docker build with the chosen Dockerfile and context.
  4. Generates a .conf file in /images_conf_files/ describing the new image.

bulk_build.sh

Script information Description
Overview Builds all containers or a subset from a CSV listing.
Key step Iterates over each line in tag_mapping.csv to call build.sh for each configured tag.
Syntax ./bulk_build.sh <tag_mapping.csv> <repo_name> <version>
Example ./bulk_build.sh tag_mapping.csv splunk/ 5.1.1
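Conceptually, the iteration works like the following dry-run sketch, which builds a small sample CSV in the tag_mapping.csv format and substitutes echo for the real build call (the actual bulk_build.sh logic may differ):

```shell
# Create a small sample in the tag_mapping.csv format.
cat > /tmp/tag_mapping_sample.csv <<'EOF'
Tag,base_image,dockerfile,base_requirements,specific_requirements,runtime,requirements_dockerfile
golden-cpu,debian:bullseye,Dockerfile.debian.golden-cpu,base_functional.txt,specific_golden_cpu.txt,none,Dockerfile.debian.requirements
minimal-cpu,debian:bullseye,Dockerfile.debian.minimal-cpu,base_functional.txt,,none,Dockerfile.debian.requirements
EOF

# Skip the header row, take the first column (the tag), and invoke the
# per-tag build for each one. echo stands in for the real ./build.sh call.
tail -n +2 /tmp/tag_mapping_sample.csv | cut -d',' -f1 | while read -r tag; do
  echo "./build.sh ${tag} splunk/ 5.1.1"
done
```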

compile_image_python_requirements.sh

Script information Description
Overview Pre-compiles or locks Python dependencies for a given image variant to reduce build time or avoid dynamic dependency resolution.
Syntax ./compile_image_python_requirements.sh <tag_name>
Workflow
  1. Builds a minimal environment from a dedicated Dockerfile to resolve Python packages.
  2. Outputs a pinned/locked requirements file in requirements_files/compiled_*.txt.
  3. Speeds up future builds by installing pinned versions of each library.

test_container.sh

Script information Description
Overview Runs a set of integration tests against a built container using Playwright or other testing frameworks. Used to simulate Splunk platform and Jupyter interactions, or to validate that container endpoints are running as expected.
Prerequisite A local Python virtual environment or system Python with the correct dependencies to run Playwright.
Syntax ./test_container.sh <tag_name> <repo_name> <version>

scan_container.sh

Script information Description
Overview Uses Trivy to scan the built container for vulnerabilities.
Syntax ./scan_container.sh <tag_name> <repo_name> <version>
Benefits
  • Identifies potential Common Vulnerabilities and Exposures (CVEs) or insecure packages in the final image.
  • Ensures compliance with security standards for production-grade images.
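Under the hood, the scan amounts to a Trivy image scan. The following dry-run sketch echoes an equivalent direct invocation rather than running it; the severity filter shown is an illustrative choice, not necessarily what scan_container.sh uses:

```shell
IMAGE="splunk/golden-cpu:5.1.1"

# Fail a CI pipeline when HIGH or CRITICAL findings are present.
# echo prints the command; remove it to execute against a built image.
echo trivy image --severity HIGH,CRITICAL --exit-code 1 "$IMAGE"
```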

Configuring tag mapping

The tag_mapping.csv file orchestrates the following build logic:

Column name Details
Tag Short name for the image variant. For example, golden-cpu, minimal-cpu, or ubi-golden-cpu.

Used as <build_configuration_tag> in build.sh.

base_image Base operating system (OS) image. For example, debian:bullseye or registry.access.redhat.com/ubi9/ubi:latest.

Note: Must be accessible to docker pull.
dockerfile The Dockerfile to use. For example, Dockerfile.debian.golden-cpu.

Located in /dockerfiles/.

base_requirements The base Python requirements file. For example, base_functional.txt.

Found in /requirements_files/.

specific_requirements Additional specialized libraries. For example, specific_golden_cpu.txt.

Usually contains large machine learning libraries such as TensorFlow or PyTorch.

runtime Set to none, or to nvidia for GPU usage.

If set to nvidia, the script sets up GPU libraries.

requirements_dockerfile Optional Dockerfile used for pre-compiling Python dependencies.

For example, Dockerfile.debian.requirements.

The following is an example of the content found within 1 row of the tag_mapping.csv file:

Tag,base_image,dockerfile,base_requirements,specific_requirements,runtime,requirements_dockerfile
golden-cpu,debian:bullseye,Dockerfile.debian.golden-cpu,base_functional.txt,specific_golden_cpu.txt,none,Dockerfile.debian.requirements
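The columns of a row map to build inputs the way a shell script such as build.sh might consume them. The following is a sketch of that mapping, not the script's actual parsing code:

```shell
# One row from tag_mapping.csv, as the build script would receive it.
row='golden-cpu,debian:bullseye,Dockerfile.debian.golden-cpu,base_functional.txt,specific_golden_cpu.txt,none,Dockerfile.debian.requirements'

# Split the comma-separated columns into named fields (bash herestring).
IFS=',' read -r tag base_image dockerfile base_req specific_req runtime req_dockerfile <<< "$row"

echo "Building ${tag} from ${base_image} with ${dockerfile} (runtime: ${runtime})"
```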

Customize container images

Review the options for customizing container images.

Add extra libraries

If you need a library such as pyarrow or transformers that is not included in specific_golden_cpu.txt, you can complete these steps:

  1. Fork or clone the repository.
  2. Edit or create a new requirements_files/ text file with your library pinned. For example:
    # my_custom_libraries.txt
    pyarrow==10.0.1
    transformers==4.25.1
    
  3. Create or edit a row in tag_mapping.csv referencing your new file. For example:
    Tag,base_image,dockerfile,base_requirements,specific_requirements,runtime,requirements_dockerfile
    golden-cpu-custom,debian:bullseye,Dockerfile.debian.golden-cpu,base_functional.txt,my_custom_libraries.txt,none,Dockerfile.debian.requirements
  4. Build the custom variant. For example:
    ./build.sh golden-cpu-custom splunk/ 5.2.0
  5. After building, your new image is tagged as splunk/golden-cpu-custom:5.2.0, plus a .conf file is created in images_conf_files/ that you can merge into the Splunk platform images.conf.
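After the build, one way to confirm the pinned libraries actually landed in the image is to import them inside the container. The following sketch echoes the command rather than running it, since executing it requires the built image to be present:

```shell
IMAGE="splunk/golden-cpu-custom:5.2.0"

# Importing the packages inside the container proves the pins installed cleanly.
# echo prints the command; remove the echo layer to execute it.
cmd="docker run --rm $IMAGE python -c 'import pyarrow, transformers'"
echo "$cmd"
```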

Use a Red Hat Universal Base Image (UBI) minimal approach

If you need a Red Hat UBI9 base for enterprise compliance, complete these steps:

  1. Select or create a row referencing Dockerfile.redhat.ubi.golden-cpu.
  2. Edit the ubi-minimal or ubi-golden-cpu Dockerfile for your internal repos and set tag_mapping.csv accordingly.

Accommodate air-gapped deployments

If your deployment is air-gapped or offline, complete these steps:

  1. Bulk build images on an internet-connected machine using bulk_build.sh.
  2. Scan the images with scan_container.sh.
  3. Push the images to a local registry, or save them as a .tar archive using docker save.
  4. Transfer the archive to the offline environment and load it using docker load.
  5. Update the Splunk platform images.conf to point to your internal registry references.
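The transfer steps above can be sketched as the following dry-run, where echo prints each command so the sequence is visible without Docker installed, and the internal registry hostname is a placeholder:

```shell
IMAGE="splunk/golden-cpu:5.1.1"
ARCHIVE="dsdl-golden-cpu.tar"
REGISTRY="registry.internal.example.com"   # placeholder internal registry

# On the connected host: export the scanned image to a portable archive.
echo docker save -o "$ARCHIVE" "$IMAGE"

# On the offline host, after transferring the archive: load, retag, push.
echo docker load -i "$ARCHIVE"
echo docker tag "$IMAGE" "$REGISTRY/$IMAGE"
echo docker push "$REGISTRY/$IMAGE"
```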

Guidelines for using customized containers

Consider the following guidelines when using customized containers:

Component Guideline
Python dependency conflicts Some advanced ML libraries conflict with older ones. Always run scan_container.sh and consider using compile_image_python_requirements.sh to lock consistent versions.
Large image sizes Golden CPU or GPU images can be multiple gigabytes (GB) in size. Consider minimal images if you need only a subset of libraries.
Requirements Dockerfile If you update requirements_files/, remove or regenerate the compiled files in /requirements_files/compiled_*, or they won't reflect your new pins.
No official support Some scripts and Dockerfiles are unofficial or community features. The Splunk platform fully supports only the official DSDL containers for standard usage.

CAUTION: Use your custom builds at your own risk.
Security hardening For production, consider scanning your images frequently and applying OS-level hardening. The scan_container.sh script is useful, but you can also consider removing unneeded packages or reducing root privileges in Dockerfiles.
Version management Maintain a separate branch or fork of the splunk-mltk-container-docker repository. Tag each commit with the container version you produce so you can replicate or revert builds if needed.

Example: Container customization

The following is an example workflow for a custom golden-cpu image with pinned requirements:

  1. Clone the repo:
    git clone https://github.com/splunk/splunk-mltk-container-docker
    cd splunk-mltk-container-docker
    
  2. Create or edit your row in tag_mapping.csv:
    Tag,base_image,dockerfile,base_requirements,specific_requirements,runtime,requirements_dockerfile
    golden-cpu-custom,debian:bullseye,Dockerfile.debian.golden-cpu,base_functional.txt,my_custom_libraries.txt,none,Dockerfile.debian.requirements
    
  3. (Optional) Pre-compile Python requirements:
    ./compile_image_python_requirements.sh golden-cpu-custom
    
  4. Build the new image:
    ./build.sh golden-cpu-custom splunk/ 5.2.0
    
  5. Scan the built image:
    ./scan_container.sh golden-cpu-custom splunk/ 5.2.0
    
  6. Push the image to your Docker registry:
    docker push splunk/golden-cpu-custom:5.2.0
    
  7. Copy the generated .conf snippet from ./images_conf_files/ into your Splunk search head mltk-container/local/images.conf file.
  8. Restart the Splunk platform or reload DSDL to see the new container listed.
  9. Use the new container in DSDL commands:
    index=my_data
    | fit MLTKContainer algo=barebone_template mode=stage into app:MyNewModel container_image="splunk/golden-cpu-custom:5.2.0"