# Share data in the AI Toolkit

## What data is collected
The AI Toolkit collects the following basic usage information:
| Component | Description |
|---|---|
| `ai_processing_time` | Time taken to process the `ai` command request. Triggered during `ai` command usage. |
| `algo_name` | Name of the algorithm used in `fit` or `apply`. |
| `app_context` | Name of the app from which the search is run. |
| `apply_time` | Time the `apply` command took. |
| `app.session.Splunk_ML_Toolkit.changeSmartAssistantStep` | User progress through an AI Toolkit Smart Assistant. |
| `app.session.Splunk_ML_Toolkit.createExperiment` | User creating an AI Toolkit Experiment. |
| `app.session.Splunk_ML_Toolkit.createExperimentAlert` | Users creating alerts for AI Toolkit Experiments. |
| `app.session.Splunk_ML_Toolkit.loadAssistant` | Number of times the user has loaded an AI Toolkit Assistant. |
| `app.session.Splunk_ML_Toolkit.saveExperiment` | Users saving their work in AI Toolkit Experiments. |
| `app.session.Splunk_ML_Toolkit.scheduleExperimentTraining` | Users scheduling model retraining for AI Toolkit Experiments. |
| `col_dimension` | Dimension of the dataset from the model schema. Triggered during `apply`. |
| `columns` | The number of columns being run through the `fit` command. |
| `command` | `fit`, `apply`, or `score`. |
| `csv_parse_time` | CSV parse time. |
| `csv_read_time` | CSV read time. |
| `csv_render_time` | CSV render time. |
| `deployment.app` | Apps installed per Splunk instance. |
| `df_shape` | Shape of the data input received from Splunk. Triggered during `apply`. |
| `example_name` | Name of the Showcase example being run. |
| `experiment_id` | ID of the `fit` and `apply` run on the Experiments page. All preprocessing steps and the final fit share the same ID. |
| `fit_time` | Amount of time it took to run the `fit` command. |
| `full_punct` | The `punct` (punctuation pattern) of the data during `fit` or `apply`. |
| `handle_time` | Time for the handler to handle the data. |
| `metrics_type` | The type of request sent. Used to differentiate model upload and model inference call flows. Contains two values. |
| `model` | The LLM model name under the specified provider while running the `ai` command. |
| `modelId` | Model ID under which the user saves their model. |
| `model_upload` | Monitors the model upload process to determine whether the model has been successfully uploaded and is ready for inference. |
| `numColumns` | Total number of columns in the dataset. |
| `numRows` | Total number of rows (events) in the dataset. |
| `num_fields` | Total number of fields. |
| `num_fields_fs` | Number of fields that have the `fs` (Field Selector) prefix. |
| `num_fields_PC` | Number of fields that have the `PC` (preprocessed) prefix. |
| `num_fields_prefixed` | Total number of preprocessed fields. |
| `num_fields_RS` | Number of fields that have the `RS` (Robust Scaler) prefix. |
| `num_fields_SS` | Number of fields that have the `SS` (Standard Scaler) prefix. |
| `num_fields_tfidf` | Number of fields that used term frequency-inverse document frequency (TF-IDF) preprocessing. |
| `onnx_input_shape` | Shape of the input data stored in the ONNX model schema. Triggered during `apply`. |
| `onnx_model_size_on_disk` | Total size in MB taken up by the model file on disk after encoding. Triggered during model upload. |
| `onnx_upload_time` | Time taken to upload an ONNX model file from the UI. Triggered during model upload. |
| `orig_sourcetype` | The original sourcetype of the machine data. |
| `params` | Optional parameters used in the `fit` step. |
| `params` | The boolean value of `supervise_split_by`. Checks whether `DecisionTreeRegressor` is used as part of `DensityFunction`. |
| `partialFit` | Whether or not the fit is a partial fit action. |
| `PID` | Process identifier associated with the command. |
| `pipeline_stage` | Each preprocessing step on the Experiments page is assigned a number starting from 0. This helps determine the order of the preprocessing steps and the length of the pipeline. |
| `provider` | The provider name while running the `ai` command. |
| `rows` | The number of rows being run through the `fit` command. |
| `rows` | The number of rows processed for a given `ai` command request. |
| `rows_processor_time` | Time taken to process the rows, in seconds, while using the `ai` command. |
| SageMaker model apply/inference event | The AWS SageMaker model apply/inference event. |
| `scoringName` | Name of the scoring operation if whitelisted. If the name is not whitelisted, logs the hash of the `scoringName`. |
| `scoringTimeSec` | Time taken by the scoring operation. |
| `UUID` | Universally unique identifier associated with the command. This is 128-bit and used to keep each `fit`/`apply` unique. |
| `container_id`, `status`, `cluster_type`, `hpa`, and memory usage of a container | Information about the container, including memory usage, cluster type, and HPA behavior, when the container is started and when the `fit` command is executed in AI Toolkit version 5.7.0. |
| `model`, `rows`, `rows_processing_time`, `column_count` | Information about the model used, including the number of rows processed, the processing time taken, and the number of columns passed to the model when the `predictai` command is executed in AI Toolkit version 5.7.0. |
| Invocation of the `ai` command using Splunk-hosted LLMs | Information about the model used, with input and output tokens. |
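As a rough illustration only, the sketch below assembles a single usage event for a `fit` command run from the field names listed above. The event structure and every value are hypothetical, invented for this example; they do not reflect the actual payload format the AI Toolkit sends.

```python
import json
import uuid

# Hypothetical usage event. Field names come from the table above;
# the values and the flat-dictionary structure are invented for illustration.
event = {
    "command": "fit",                 # fit, apply, or score
    "algo_name": "LinearRegression",  # algorithm used in fit or apply
    "rows": 10000,                    # rows run through the fit command
    "columns": 12,                    # columns run through the fit command
    "fit_time": 3.42,                 # time the fit command took
    "partialFit": False,              # whether this was a partial fit action
    "PID": 4242,                      # process identifier for the command
    "UUID": str(uuid.uuid4()),        # 128-bit ID keeping each fit/apply unique
}

# Serialize as JSON, the format used for the examples in this table.
payload = json.dumps(event, indent=2)
print(payload)
```

The `UUID` field is generated per invocation, which is what lets separate `fit`/`apply` runs be told apart even when every other field matches.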