Share data in the Splunk Machine Learning Toolkit
What data is collected
The Splunk Machine Learning Toolkit collects the following basic usage information:
Component | Description | Example |
---|---|---|
algo_name | Name of the algorithm used in `fit` or `apply`. | |
app_context | Name of the app from which the search is run. | |
apply_time | Time the `apply` command took. | |
app.session.Splunk_ML_Toolkit.changeSmartAssistantStep | User progress through an MLTK Smart Assistant. | |
app.session.Splunk_ML_Toolkit.createExperiment | User creating an MLTK Experiment. | |
app.session.Splunk_ML_Toolkit.createExperimentAlert | Users creating alerts for MLTK Experiments. | |
app.session.Splunk_ML_Toolkit.loadAssistant | Number of times the user has loaded an MLTK Assistant. | |
app.session.Splunk_ML_Toolkit.saveExperiment | Users saving their work in MLTK Experiments. | |
app.session.Splunk_ML_Toolkit.scheduleExperimentTraining | Users scheduling model retraining for MLTK Experiments. | |
col_dimension | Dimension of the dataset, taken from the model schema. Triggered during `apply`. | |
columns | The number of columns being run through the `fit` command. | |
command | `fit`, `apply`, or `score`. | |
csv_parse_time | CSV parse time. | |
csv_read_time | CSV read time. | |
csv_render_time | CSV render time. | |
deployment.app | Apps installed per Splunk instance. | |
df_shape | Shape of the data input received from Splunk. Triggered during `apply`. | |
example_name | Name of the Showcase example being run. | |
experiment_id | ID of the `fit` and `apply` run on the Experiments page. All preprocessing steps and the final `fit` share the same ID. | |
fit_time | Amount of time it took to run the `fit` command. | |
full_punct | The punct of the data during `fit` or `apply`. | |
handle_time | Time for the handler to handle the data. | |
metrics_type | Type of request sent. Used to differentiate model upload and model inference call flows. Contains two values: | |
modelId | Model ID under which the user saves their model. | |
model_upload | Monitors the model upload process to determine whether the model has been successfully uploaded and is ready for inference. | |
numColumns | Total number of columns in the dataset. | |
numRows | Total number of rows (events) in the dataset. | |
num_fields | Total number of fields. | |
num_fields_fs | Number of fields that have the fs (Field Selector) prefix. | |
num_fields_PC | Number of fields that have the PC (preprocessed) prefix. | |
num_fields_prefixed | Total number of preprocessed fields. | |
num_fields_RS | Number of fields that have the RS (Robust Scaler) prefix. | |
num_fields_SS | Number of fields that have the SS (Standard Scaler) prefix. | |
num_fields_tfidf | Number of fields that used term frequency-inverse document frequency (TF-IDF) preprocessing. | |
onnx_input_shape | Shape of the input data stored in the ONNX model schema. Triggered during `apply`. | |
onnx_model_size_on_disk | Total size in MB of the model file on disk after encoding. Triggered during model upload. | |
onnx_upload_time | Time taken to upload an ONNX model file from the UI. Triggered during model upload. | |
orig_sourcetype | The original sourcetype of the machine data. | |
params | Optional parameters used in the `fit` step. | |
partialFit | Whether or not the fit is a partial fit action. | |
PID | Process identifier associated with the command. | |
pipeline_stage | Each preprocessing step on the Experiments page is assigned a number starting from 0, which determines the order of the preprocessing steps and the length of the pipeline. | |
rows | The number of rows being run through the `fit` command. | |
scoringName | Name of the scoring operation if whitelisted. If the name is not whitelisted, the hash of the scoringName is logged. | |
scoringTimeSec | Time taken by the scoring operation. | |
UUID | 128-bit universally unique identifier associated with the command, used to keep each `fit`/`apply` run unique. | |
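To illustrate when several of these fields are emitted, consider an ordinary MLTK search that trains and then applies a model. The sketch below is a hypothetical example (the lookup file, field names, and model name are illustrative, not taken from this document): running the `fit` stage would populate fields such as algo_name, command, rows, columns, params, and fit_time, while the `apply` stage would populate apply_time, df_shape, and col_dimension.

```spl
| inputlookup my_metrics.csv
| fit LinearRegression target_field from feature_1 feature_2 into my_model
| apply my_model
```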