AI service data in Splunk AI Assistant for SPL

Allowing access to AI service data is how you provide feedback. When you interact with the Splunk AI Assistant for SPL, Splunk may use your chat history, including inputs and outputs, context data collected from your environment as noted in this section and updated from time to time, and in-product feedback you give to maintain and modify the assistant.

Use of your data for maintenance and modification purposes does not include using your data for Training and Fine-Tuning, which is defined as follows:

Training and Fine-Tuning means teaching or conditioning AI Models to learn patterns and perform specific tasks by supplying the AI Models with datasets and optimizing their relevant parameters. It includes adapting pre-trained AI Models to improve performance through methods such as adjusting relevant weights.

How to opt in or out of Training and Fine-Tuning

Allowing the use of your AI Service Data is turned on by default. You can turn access off from within Splunk AI Assistant for SPL on the Settings tab of the app.

Toggle the selector next to Allow Splunk to use your AI Service Data for Training and Fine-Tuning as defined in the Splunk AI Features Specific Offerings Terms as shown in the following image:

This image shows the Settings tab of Splunk AI Assistant for SPL. A tick-box labeled as Allow Splunk to use your AI Service Data for Training and Fine-Tuning as defined in the Splunk AI Features Specific Offerings Terms is highlighted.

What data is collected

Splunk AI Assistant for SPL collects different context data depending on whether you opt in to allow use of your AI Service Data and whether you opt in to use the Personalization feature.

Context data

In addition to your chat history, including inputs and outputs, and in-product feedback, Splunk AI Assistant for SPL collects the following context data:


Category	Description
User prompts or inputs	This is the text entered into the AI assistant chat by an end-user. Examples are "Show storage freespace in winhostmon", "What data is being collected in my environment?", and "index=myindex source="WinEventLog:Security" EventCode=123".
Grounding data	This is the data processed by the AI model during the retrieval-augmented generation step to generate a relevant response to a user prompt. This includes relevant searches and contextual metadata such as index, sourcetype, and field names. Some of the data comes from the Splunk knowledge base, but if you opt into personalization, this can also come from your Splunk deployment.
Assistant responses	The output generated by the AI assistant. This might contain an SPL search or a derivation of it, such as an optimized version of the search or an explanation of the search, or a summarized answer to a Splunk product question drawn from the Splunk documentation.
Feedback	Any user-entered feedback.
Service data	Service data is described more fully in the Splunk Privacy Statement. Examples include "thumbs up", "thumbs down", "chat ID", "copy", "token used", and "response length".

Personalization data

Personalization is turned off by default. You can turn Personalization on or off from within Splunk AI Assistant for SPL on the Settings tab of the app by selecting or deselecting the box next to Personalize results.

Collected data is stored in the vector database. If you opt out of Personalization at a later date, a weekly cleanup job deletes this information.

Data retention

Data outlined in this section is retained as set forth in the Splunk Data Retention Policy.

Chat data is stored in the KVStore on the customer's stack. If you choose to delete a chat, that chat data is deleted from your local KVStore collection.

Note: If you opt in to the Personalization feature, the collected data is stored in the vector database. If you opt out of Personalization at a later date, a weekly cleanup job deletes any collected data.

Field-specific data details


Component	Description	Example
`app.session.copy_spl_clicked`	Data collected when SPL generated using the app is copied with the "Copy" button.	`app: splunk_instrumentation component: app.session.copy_spl_clicked data: { [-] app: Splunk_AI_Assistant_Cloud page: dashboard source: SAIA UI Telemetry spl: index=_internal sourcetype=splunkd log_level=ERROR| timechart count| rename _time as Time, count as Count }`
`app.Splunk_AI_Assistant`	Information including `type`, `tenant`, `query`, `enabled_features`, and `request_id`.	`{ 'type': 'inference_spl_generation', 'tenant': 'saia-stg-custom', 'query': ' SAIA has expert knowledge of the Splunk platform and Splunk...', 'enabled_features': "['customization']", 'request_id' : c88bbad8-92ab-4851-ac5f-b417b984f53c }`
`app.Splunk_AI_Assistant`	Information including `type` and `tenant`.	`{ 'type': 'customization_opt_in', 'tenant': 'saia-stg-custom' }`
`app.Splunk_AI_Assistant.splgen`	Collects the `chat_id`.	`{ .... 'chat_id': 4 }`
`app.Splunk_AI_Assistant.splgen.feedback`	Information including `enabled_features`, `feedback_id`, and `query`.	`{ enabled_features : ['customization'] feedback_id : '4e618319-2276-4ae7-9436-ab2713735629' query : 'List available indices' }`
`app.Splunk_AI_Assistant_Cloud.splgen`	Logging from Splunk AI Assistant for SPL Splunk app REST handlers.	`2024-05-27 16:26:25 UTC, Level=INFO, Pid=1063271, Logger=ChatHistoryHandler, File=chat_history_handler.py, Line=43, UUID="34547aed-648c-4d3f-b2ce-f1ce066a57ad", message="Handling chat history request"`
`app.Splunk_AI_Assistant_Cloud.splgen`	Generation time: the end-to-end (e2e) time from request start to request end.	`2024-05-24 18:05:50 UTC, Level=INFO, Pid=2248783, Logger=AsyncHttpJobs, File=jobs.py, Line=87, UUID="4475f233-2559-42ee-b7ff-c2891ae0d549", apply_time="2.16974", user="haydn"`
`app.Splunk_AI_Assistant_Cloud.splgen.openinsearch`	Data collected when the user clicks the "Open in Search" button for generated SPL.	`{ "data": { "_time": 1688763330, "_sourcetype": "splgen_feedback", "session_id": "1dd4af3e-a567-4d68-a491-75964913d868", "spl": "'| rest splunk_server=local /services/cluster/master/peers | stats sum(bucket_count) by label | rename label as peer'", "user": "<hashed username>", "_kv": 1, "_serial": 0 } }`
`app.Splunk_AI_Assistant_Cloud.splgen.usage`	Feedback submitted by users through the thumbs up, thumbs down, and additional details UI in the app.	{ "data": { "_time": 1688763330, "response": "'Concise Summary:\nThe query retrieves the total number of buckets per peer in a Splunk cluster.\nDetailed Explanation:\n- `| rest splunk_server=local /services/cluster/master/peers`: This part of the query uses the REST command to access the local Splunk cluster master'", "_sourcetype": "splgen_feedback", "session_id": "1dd4af3e-a567-4d68-a491-75964913d868", "query": "'| rest splunk_server=local /services/cluster/master/peers | stats sum(bucket_count) by label | rename label as peer'", "correct": "true", "_kv": 1, "_serial": 0 } }
`inference_spl_generation` `inference_spl_explanation`	Natural language prompt entered by the user in the `user_prompt` field, and intermediate RAG and metadata responses retrieved from the large language models (LLMs).	{ 'user_prompt' : "show storage freespace in winhostmon", 'retrieved_rag': ```search 'search index=windows sourcetype=WinHostMon Type=Disk | table host, Name, DriveType, TotalSpaceGB, FreeSpaceGB, FreeSpacePct | sort FreeSpacePct'```, 'retrieved_personalization_metadata': ['component', 'datetime', 'log_level', 'data.total_size', 'data.name', 'dns_alt_name', 'sh_label', 'data.total_bucket_count', 'data.bucket_dirs.cold.capacity', 'data.bucket_dirs.home.capacity'], 'generated_response': ``` index=windows sourcetype=WinHostMon Type=Disk | stats sum(FreeSpaceKB) as total_free_space by Name | eval total_free_space_GB = round(total_free_space / 1024 / 1024, 2) | table Name, total_free_space_GB ``` }
`saia-tenant-id`	Hashed name of the tenant or stack ID.	`{ ..... saia-tenant-id: 1b366eb2-3dfa-520e-b353-8178af77cfbd sourcetype: saia_api_event }`
`stackID` `userID` `chat_id` `app_version`	Information collected from the StackID, UserID, ChatID, and App Version fields.	`{ stackID=CLOUD-7e42604c501e415b0b72b841bd788e84db49ea089713d9a5afe2a17d74e9b7a9, userID=677ee9314a5407cfdb0a224f, chat_id=0, app_version="1.0.6", }`
`job_id` `user_key` `user` `chat_id`	Information collected from the JobID, UserKey, User, and ChatID fields.	`.... request_id: job_id=5637081e-ab41-432d-bce9-9f76c61c9b1c user_key=677ee9314a5407cfdb0a224f chat_id=0 user=2340314992997373707 }`
`input_word_count` `input_char_count` `output_word_count` `output_char_count`	Total word and character counts for inputs and for output responses.	`{ input_char_count: 115 input_word_count: 20 output_char_count: 1896 output_word_count: 236 }`
`source_app_id`	SourceAppID information.	`source_app_id: Splunk_AI_Assistant_Cloud_Custom`
`num_distinct_clusters` `avg_clusters_per_srctype` `avg_fields_per_cluster` `min_fields_per_cluster` `max_fields_per_cluster`	Information collected on distinct clusters formed for each tenant, average number of clusters formed per sourcetype, average number of field lists collected per cluster, minimum number of fields per cluster, and maximum number of fields per cluster.	`{ num_distinct_clusters: 11 avg_clusters_per_srctype: 2 avg_fields_per_cluster: 4.5 min_fields_per_cluster: 1 max_fields_per_cluster: 139 }`
`generate_optimized_spl`	Tracks runtime optimization decisions and user behaviors during SPL search generation.	`{ "query_id": "def-789", "user_id": "u-998", "timestamp": "2025-06-30T16:48:02Z", "original_spl": "search error | stats count by host", "optimized_spl": "search index=_internal error | stats count by host", "optimization_type": "index_specifier", "optimization_applied": true, "parsability": null, "manual_override": true }`
`enabled_features`	Tracks the app features currently turned on by the customer.	`{ "query_id": "def-789", "user_id": "-9922228", "timestamp": "2025-06-30T16:48:02Z", "query": "search error | stats count by host", "response": "search index=_internal error | stats count by host", "enabled_features": ["customization", "external_llm"] }`
`orchestration_decision`	Tracks the intent that the intent orchestration component returns for a given user input.	`{ "job_id": "4d8ee15b-162f-4c06-8882-176823653220", "intent": 0, "user_prompt": "Use the common information model to search for successful logins", "tool_content": {'name': 'write_spl', 'query': 'Use the common information model to search for successful logins', 'confidence': 0.9998138806751069, 'id': 'call_Lm0ILbsV7hIZyzzG0wix8DqA', 'arguments': '{"user_prompt":"Use the common information model to search for successful logins"}'} }`
`data_upload_sourcetype_metadata`	Tracks statistics for description generation and metadata collection, such as the number of sourcetypes and indexes for which data has been collected.	`{ "saia-tenant-id": "4d8ee15b-162f-4c06-8882-176823653220", "deployment-id": "CLOUD-ccj3ted162f4c068882176823653220", "app_version": "1.5.0", "num_entries_saved": 40, "num_unique_indexes": 10, "num_of_unique_sourcetypes": 20, "num_of_unique_index_sourcetypes": 20 }`
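
The REST handler log examples in the preceding table follow a timestamp-then-`key=value` layout. As a minimal sketch, assuming that layout holds for every such line, the following Python splits one line into a dictionary. The function name `parse_handler_log` and the `timestamp` key are illustrative choices, not part of the app.

```python
import re

# A key=value pair whose value is either double-quoted or runs to the next comma.
PAIR = re.compile(r'(\w+)=("([^"]*)"|[^,]*)')


def parse_handler_log(line):
    """Split a handler log line into a dict (hypothetical helper)."""
    # Everything before the first ", " is treated as the timestamp.
    timestamp, _, rest = line.partition(", ")
    fields = {"timestamp": timestamp}
    for match in PAIR.finditer(rest):
        key = match.group(1)
        quoted = match.group(3)
        # Prefer the unquoted content when the value was wrapped in quotes.
        fields[key] = quoted if quoted is not None else match.group(2).strip()
    return fields


# Sample line copied from the table above.
sample = (
    '2024-05-27 16:26:25 UTC, Level=INFO, Pid=1063271, '
    'Logger=ChatHistoryHandler, File=chat_history_handler.py, Line=43, '
    'UUID="34547aed-648c-4d3f-b2ce-f1ce066a57ad", '
    'message="Handling chat history request"'
)

parsed = parse_handler_log(sample)
print(parsed["Level"], parsed["Logger"], parsed["message"])
```

All values are returned as strings; convert fields such as `Line` or `apply_time` to numbers where needed.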

Personalization data

Personalization is turned off by default. You can turn Personalization on or off from within Splunk AI Assistant for SPL on the Settings tab of the app. Select or deselect the box next to Personalize results, as shown in the following image:

This image shows the Settings tab of Splunk AI Assistant for SPL. The toggle button next to Personalize results is highlighted.

The following context data is collected if you opt in to use Personalization.

This data is collected using two saved searches bundled with the assistant. These searches are enabled only if you opt in to Personalization:

Splunk AI Assistant for SPL - Field Summary
Splunk AI Assistant for SPL - Search Logs
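
One way to confirm whether these two searches are currently enabled is to inspect a saved-search listing exported from your deployment. The following is a minimal sketch, assuming the listing is already available as JSON with entries shaped like `{"name": ..., "content": {"disabled": ...}}`. That payload shape, the sample data, and the function name `personalization_search_status` are assumptions for illustration and are not taken from this documentation.

```python
# Names of the two saved searches listed above.
PERSONALIZATION_SEARCHES = (
    "Splunk AI Assistant for SPL - Field Summary",
    "Splunk AI Assistant for SPL - Search Logs",
)


def personalization_search_status(listing):
    """Map each search name to 'enabled', 'disabled', or 'missing'."""
    status = {name: "missing" for name in PERSONALIZATION_SEARCHES}
    for entry in listing.get("entry", []):
        name = entry.get("name")
        if name in status:
            disabled = entry.get("content", {}).get("disabled", False)
            # Accept booleans as well as "1"/"true" style strings.
            is_disabled = str(disabled).lower() in ("1", "true")
            status[name] = "disabled" if is_disabled else "enabled"
    return status


# Hypothetical listing: one search enabled, one disabled, one unrelated.
sample_listing = {
    "entry": [
        {"name": "Splunk AI Assistant for SPL - Field Summary",
         "content": {"disabled": False}},
        {"name": "Splunk AI Assistant for SPL - Search Logs",
         "content": {"disabled": True}},
        {"name": "Some other search", "content": {"disabled": False}},
    ]
}

for name, state in personalization_search_status(sample_listing).items():
    print(f"{name}: {state}")
```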

Collected data is stored in the vector database, and a cleanup job runs weekly to delete this information if you decide to opt out of Personalization.


Component	Description	Example
`app.Splunk_AI_Assistant.index_metadata`	Sourcetype metadata	`{ "tenant": "caeinternal1", "index_metadata": "[{ 'max': '2846', 'min': '0', 'mean': '2.054869684499314', 'count': '3645', 'field': 'duration_command_search_rawdata', 'index': 'main', 'sourcetype':'audittrail', 'stdev': '51.19505709576045', 'is_exact': '1', 'distinct_count': '33', 'numeric_count': '3645', 'is_numeric': True}]" }`
`app.Splunk_AI_Assistant.previous_searches`	Previous searches	`{ "tenant": "saia-play-custom", "searches": [ { "user": "admin", "spl": "| search index=\"_internal\" sourcetype=\"splunk_ai_assistant-3\" | fieldsummary | eval index=\"_internal\", sourcetype=\"splunk_ai_assistant-3\"", "count": 1, "roles": ["admin" , "mltk_model_admin"] }, { "user": "admin", "spl": "| search index=\"_introspection\" sourcetype=\"splunk_telemetry\" | fieldsummary | eval index=\"_introspection\", sourcetype=\"splunk_telemetry\"", "count": 1, "roles": ["admin" , "power_user", "mltk_model_admin"] } ] }`
`num_indexes` `num_distinct_indexes` `num_sourcetypes` `num_distinct_sourcetypes` `average_sourcetype_per_index` `num_spls` `num_distinct_spls` `num_users` `num_distinct_users` `average_spls_per_user`	Vector database metrics for all tenants that opted in to the Personalization feature.	`{ average_spls_per_user: 1 num_distinct_spls: 11 num_distinct_users: 2 num_spls: 11 num_users: 11 } ........ { average_sourcetype_per_index: 6.625 num_distinct_indexes: 8 num_distinct_sourcetypes: 49 num_indexes: 53 num_sourcetypes: 53 }`
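
In the `app.Splunk_AI_Assistant.index_metadata` example in the preceding table, the `index_metadata` value is a string that holds a list written with single quotes and `True`, which a JSON parser rejects. As a minimal sketch, assuming events keep that shape, the following Python decodes the outer event with `json.loads` and the nested string with `ast.literal_eval`. The function name `decode_index_metadata` is illustrative only.

```python
import ast
import json

# Sample event copied from the index_metadata row above.
raw_event = '''{ "tenant": "caeinternal1", "index_metadata": "[{ 'max': '2846', 'min': '0', 'mean': '2.054869684499314', 'count': '3645', 'field': 'duration_command_search_rawdata', 'index': 'main', 'sourcetype':'audittrail', 'stdev': '51.19505709576045', 'is_exact': '1', 'distinct_count': '33', 'numeric_count': '3645', 'is_numeric': True}]" }'''


def decode_index_metadata(event_text):
    """Return the per-field statistics as a list of dicts (hypothetical helper)."""
    event = json.loads(event_text)
    # literal_eval only evaluates Python literals, so it does not run code.
    return ast.literal_eval(event["index_metadata"])


fields = decode_index_metadata(raw_event)
first = fields[0]
print(first["index"], first["sourcetype"], first["field"], float(first["mean"]))
```

Most statistics arrive as strings, so convert them with `float` or `int` before doing arithmetic.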
