What is the AI Toolkit process?
Machine learning is a process for gaining insights from your data and creating actionable models from it. The process comprises a series of steps, from data ingestion to model operationalization. The AI Toolkit operates like an extension to the Splunk platform and enables you to complete this process.
The machine learning process
The machine learning process comprises several generally accepted steps. The following image shows a common machine learning process:
Although the process typically begins with data collection and ends with model deployment, it is not always straightforward, and it does not account for the time spent finding and cleaning data. Data scientists and analysts also spend time experimenting with the collected and cleaned data, evaluating experiment results, and then adjusting experiment settings to produce machine learning results that are suitable to put into operation.
Using the AI Toolkit, this entire process, from data ingestion to model deployment, can occur inside the Splunk platform.
The machine learning process within the AI Toolkit
The AI Toolkit is a way to create custom machine learning outcomes. The AI Toolkit enables a machine learning workflow using a suite of guided modeling Assistants. The AI Toolkit can also be used outside of the guided framework with a series of machine learning specific Search Processing Language (SPL) commands and over 30 algorithms.
Explore Data
Once data is ingested, explore it to ensure it is suitable for, and ready to be used in, a machine learning process. Data you ingest into the AI Toolkit is easily visualized in both tables and graphics. The Splunk platform and AI Toolkit also offer several methods to clean and transform your data and address common data issues, including identifying and removing errors, handling missing values, and potentially converting categorical values into numeric values.
- To learn about the options and methods to clean and transform your machine data, see Preparing your data for machine learning.
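To make two of these cleaning steps concrete, the following is a minimal plain-Python sketch, not AI Toolkit or SPL code. It fills missing numeric values with the mean of the observed values and converts a categorical field into numeric 0/1 columns. The field names `cpu_pct` and `host_type`, the sample events, and both helper functions are hypothetical and chosen only for illustration.

```python
from statistics import mean


def fill_missing_with_mean(rows, field):
    """Replace None in `field` with the mean of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fallback = mean(observed)
    return [{**r, field: fallback if r[field] is None else r[field]} for r in rows]


def one_hot_encode(rows, field):
    """Replace a categorical field with one 0/1 column per category."""
    categories = sorted({r[field] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != field}
        for category in categories:
            new_row[f"{field}_{category}"] = 1 if r[field] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical events: one has a missing numeric value.
events = [
    {"cpu_pct": 40.0, "host_type": "web"},
    {"cpu_pct": None, "host_type": "db"},
    {"cpu_pct": 60.0, "host_type": "web"},
]

cleaned = one_hot_encode(fill_missing_with_mean(events, "cpu_pct"), "host_type")
```

Mean imputation and one-hot encoding are only two of many possible choices. Which method fits depends on the data and on the algorithm you plan to use.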
Experiment
Data experimentation is the process of training a machine learning model on your data to create a working model. The AI Toolkit offers several machine learning commands and built-in algorithms through which you can perform data experimentation. The AI Toolkit also offers guided machine learning workflows through a series of Smart Assistants and Experiment Assistants.
- To learn about the supported machine learning Search Processing Language (SPL) commands, see Search commands for machine learning.
- To learn about the algorithms available in the AI Toolkit, see Algorithms in the AI Toolkit.
- To learn about the available suite of Smart Assistants, see Smart Assistant guided workflows.
- To learn about the available suite of Experiment Assistants, see Experiment Assistant guided workflows.
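As a conceptual illustration of experimentation, the sketch below holds out part of a data set, trains a simple model on the rest, and then uses the trained model to predict. It is plain Python and does not show how AI Toolkit commands or Assistants are invoked. The least-squares line, the 25 percent hold-out fraction, the seed, and the synthetic data are all assumptions made for the example.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic data that follows y = 2x + 1 exactly.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
train, test = train_test_split(data)
model = fit_line(train)
```

Keeping the test rows out of training is what lets later evaluation reflect how the model behaves on data it has not seen.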
Evaluate Results
Evaluating results as you experiment with your data is an important part of getting a useful machine learning model. Are you asking the right question of your data? Do you have enough data, sufficiently cleaned data, or the right data to conduct your experiment? Do your experiment outcomes give you the results you expected? The AI Toolkit guided modeling Assistants all include data visualizations through which you can quickly assess experiment results. You can also choose from a range of scoring metrics to measure your machine learning results.
- To learn about the available data visualizations in the AI Toolkit, see Custom visualizations in the AI Toolkit.
- To learn about using the score command in the AI Toolkit, see Scoring metrics in the AI Toolkit.
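To show what a scoring metric measures, the following plain-Python sketch computes accuracy, precision, and recall from actual and predicted labels. It illustrates the arithmetic only and is not the `score` command or its syntax. The labels `fraud` and `ok`, the sample values, and the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one positive label."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(c):
    """Share of all predictions that were correct."""
    return (c["tp"] + c["tn"]) / sum(c.values())


def precision(c):
    """Share of predicted positives that were actually positive."""
    denominator = c["tp"] + c["fp"]
    return c["tp"] / denominator if denominator else 0.0


def recall(c):
    """Share of actual positives that the model found."""
    denominator = c["tp"] + c["fn"]
    return c["tp"] / denominator if denominator else 0.0


# Hypothetical labels from a held-out test set.
actual = ["fraud", "ok", "ok", "fraud", "ok", "fraud"]
predicted = ["fraud", "ok", "fraud", "ok", "ok", "fraud"]

counts = confusion_counts(actual, predicted, positive="fraud")
```

Looking at more than one metric helps, because a model can score well on accuracy while still missing most of the rare cases you care about.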
Tune and Iterate
As part of evaluating experiment results, you can tune and iterate the machine learning model. Adjusting model settings ensures you get the desired machine learning results prior to applying the model to unseen data and putting the model into operation. The AI Toolkit guided Assistants make it simple to adjust model settings and gauge model performance improvement.
Repeat the steps of experimenting, evaluating, and tuning until you are ready to put your trained model into production.
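The loop of experimenting, evaluating, and tuning can be sketched as trying several candidate settings and keeping the one that scores best on validation data. The example below is plain Python under stated assumptions: the setting being tuned is a hypothetical anomaly-score threshold, the metric is accuracy, and the validation pairs are invented. It does not represent how the guided Assistants adjust settings.

```python
def accuracy_at(threshold, scored):
    """Accuracy when scores at or above the threshold are flagged as anomalies."""
    correct = sum((score >= threshold) == is_anomaly for score, is_anomaly in scored)
    return correct / len(scored)


def tune_threshold(candidates, validation):
    """Try each candidate setting and keep the first one with the best accuracy."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in candidates:
        current = accuracy_at(threshold, validation)
        if current > best_accuracy:
            best_threshold, best_accuracy = threshold, current
    return best_threshold, best_accuracy


# Hypothetical (anomaly score, is_anomaly) pairs held out for validation.
validation = [
    (0.1, False), (0.2, False), (0.35, False),
    (0.6, True), (0.8, True), (0.9, True),
]

best_threshold, best_accuracy = tune_threshold([0.1, 0.3, 0.5, 0.7], validation)
```

Because the chosen setting is fitted to the validation data, it is worth confirming the result on separate data before putting the model into production.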
Deploy Model
A trained machine learning model is ready for deployment and application on new, never-before-seen data. As a best practice, regularly check your model outcomes and the sources of the new data, and make adjustments to your machine learning model settings as needed.
In the AI Toolkit guided modeling Assistants, you can schedule model retraining, set up alerts on models, and publish models.
Learn more
Learn to use the AI Toolkit by working through this User Guide or with the following links:
- To learn about implementing analytics and data science projects using Splunk platform statistics, machine learning, and built-in and custom visualization capabilities, see the Splunk for Analytics and Data Science course.
- To learn more about adding and searching your data in the Splunk platform, begin with the Search Tutorial.
- To see a series of use cases based on different machine learning goals, see Showcases in the AI Toolkit.
- To read more about the available data sets within the toolkit, see AI Toolkit data set credits.
- To learn about installing the AI Toolkit, see Install the AI Toolkit.
- To learn about companion apps, cheat-sheets, videos, and courses, see Learn more about the AI Toolkit.
- To learn about further support available for the AI Toolkit, see Support for the AI Toolkit.