AI service data in Splunk AI Assistant for SPL

Allowing access to AI service data is how you provide feedback. When you interact with the Splunk AI Assistant for SPL, Splunk may use your chat history, including inputs and outputs, context data collected from your environment as noted in this section and updated from time to time, and in-product feedback you give to maintain and modify the assistant.

Use of your data for maintenance and modification purposes does not include using your data for Training and Fine-Tuning, which is defined as follows:

Training and Fine-Tuning means teaching or conditioning AI Models to learn patterns and perform specific tasks by supplying the AI Models with datasets and optimizing their relevant parameters. It includes adapting pre-trained AI Models to improve performance through methods such as adjusting relevant weights.

How to opt in or out of Training and Fine-Tuning

Allowing the use of your AI Service Data is turned on by default. You can turn access off from within Splunk AI Assistant for SPL on the Settings tab of the app.

Toggle the selector next to Allow Splunk to use your AI Service Data for Training and Fine-Tuning as defined in the Splunk AI Features Specific Offerings Terms, as shown in the following image:

This image shows the Settings tab of Splunk AI Assistant for SPL. A tick-box labeled as Allow Splunk to use your AI Service Data for Training and Fine-Tuning as defined in the Splunk AI Features Specific Offerings Terms is highlighted.

What data is collected

Splunk AI Assistant for SPL collects different context data depending on whether you opt in to allow use of your AI Service Data and whether you opt in to the Personalization feature.

Context data

In addition to your chat history, including inputs and outputs, and in-product feedback, Splunk AI Assistant for SPL collects the following context data:

Category: Description
User prompts or inputs: The text that an end user enters into the AI assistant chat. Examples are "Show storage freespace in winhostmon", "What data is being collected in my environment?", and "index=myindex source="WinEventLog:Security" EventCode=123".
Grounding data: The data processed by the AI model during the retrieval-augmented generation step to generate a relevant response to a user prompt. This includes relevant searches and contextual metadata such as index, sourcetype, and field names. Some of the data comes from the Splunk knowledge base, but if you opt in to Personalization, it can also come from your Splunk deployment.
Assistant responses: The output generated by the AI assistant. This might contain an SPL search or a derivation of it, such as an optimized version of the search or an explanation of the search, or a summarized answer to a Splunk product question from the Splunk documentation.
Feedback: Any user-entered feedback.
Service data: Service data is described more fully in the Splunk Privacy Statement. Examples include "thumbs up", "thumbs down", "chat ID", "copy", "token used", and "response length".

Personalization data

Personalization is turned off by default. You can turn Personalization on or off from within Splunk AI Assistant for SPL on the Settings tab of the app. Select or deselect the box next to Personalize results.

Collected data is stored in the vector database. If you later opt out of Personalization, a cleanup job that runs weekly deletes this information.

Data retention

Data outlined in this section is retained as set forth in the Splunk Data Retention Policy.

Chat data is stored in the KV Store on the customer's stack. If you choose to delete a chat, that chat data is deleted from your local KV Store collection.
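For orientation only, the following sketch shows what removing a single record from a KV Store collection looks like through the Splunk REST API path storage/collections/data. It builds the request without sending it. The host name, the collection name chat_history, and the record key are hypothetical placeholders, because this page does not name the collection that the assistant uses. The app namespace is assumed from the telemetry examples on this page. When you delete a chat in the app, the app performs the deletion for you.

```python
from urllib.parse import quote
from urllib.request import Request


def build_kvstore_delete(host, app, collection, key, owner="nobody", port=8089):
    """Build (but do not send) a DELETE request for one KV Store record."""
    path = "/servicesNS/{}/{}/storage/collections/data/{}/{}".format(
        quote(owner, safe=""),
        quote(app, safe=""),
        quote(collection, safe=""),
        quote(key, safe=""),
    )
    return Request("https://{}:{}{}".format(host, port, path), method="DELETE")


# Hypothetical values: the collection name and the key are placeholders only.
req = build_kvstore_delete(
    "splunk.example.com",
    "Splunk_AI_Assistant_Cloud",
    "chat_history",
    "example-chat-key",
)
print(req.get_method(), req.full_url)
```

Sending the request would additionally require authentication and TLS settings that match your deployment, which are left out of this sketch.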

Note: If you opt in to the Personalization feature, the collected data is stored in the vector database. If you later opt out of Personalization, a cleanup job that runs weekly deletes any collected data.

Field-specific data details

Component Description Example
app.session.copy_spl_clicked Data collected when SPL generated using the app is copied with the "Copy" button.
JSON
app: splunk_instrumentation
component: app.session.copy_spl_clicked
data: {
  app: Splunk_AI_Assistant_Cloud
  page: dashboard
  source: SAIA UI Telemetry
  spl: index=_internal sourcetype=splunkd log_level=ERROR | timechart count | rename _time as Time, count as Count
}
app.Splunk_AI_Assistant Information including type, tenant, query, enabled_features, and request_id.
JSON
{
  'type': 'inference_spl_generation',
  'tenant': 'saia-stg-custom',
  'query': ' SAIA has expert knowledge of the Splunk platform and Splunk...',
  'enabled_features': "['customization']",
  'request_id': 'c88bbad8-92ab-4851-ac5f-b417b984f53c'
}
app.Splunk_AI_Assistant Information including tenant, and type.
JSON
{
  'type': 'customization_opt_in',
  'tenant': 'saia-stg-custom'
}
app.Splunk_AI_Assistant.splgen Collects the chat_id.
JSON
{
....
'chat_id': 4
}
app.Splunk_AI_Assistant.splgen.feedback Information including enabled_features, feedback_id, and query.
JSON
{
  enabled_features : ['customization']
  feedback_id : '4e618319-2276-4ae7-9436-ab2713735629'
  query : 'List available indices'
}
app.Splunk_AI_Assistant_Cloud.splgen Logging from the REST handlers of the Splunk AI Assistant for SPL app.
CODE
2024-05-27 16:26:25 UTC, Level=INFO, Pid=1063271, Logger=ChatHistoryHandler, File=chat_history_handler.py, Line=43, UUID="34547aed-648c-4d3f-b2ce-f1ce066a57ad", message="Handling chat history request"
app.Splunk_AI_Assistant_Cloud.splgen Generation time: the end-to-end (e2e) time from the start of a request to its end.
CODE
2024-05-24 18:05:50 UTC, Level=INFO, Pid=2248783, Logger=AsyncHttpJobs, File=jobs.py, Line=87, UUID="4475f233-2559-42ee-b7ff-c2891ae0d549", apply_time="2.16974", user="haydn"
app.Splunk_AI_Assistant_Cloud.splgen.openinsearch Data collected when the user clicks the "Open in Search" button for generated SPL.
JSON
{
  "data": {
    "_time": 1688763330,
    "_sourcetype": "splgen_feedback",
    "session_id": "1dd4af3e-a567-4d68-a491-75964913d868",
    "spl": "'| rest splunk_server=local /services/cluster/master/peers | stats sum(bucket_count) by label | rename label as peer'",
    "user": "<hashed username>",
    "_kv": 1,
    "_serial": 0
  }
}
app.Splunk_AI_Assistant_Cloud.splgen.usage Feedback that users submit with the thumbs up, thumbs down, and additional details controls in the app.
JSON
{
  "data": {
    "_time": 1688763330,
    "response": "'Concise Summary:\nThe query retrieves the total number of buckets per peer in a Splunk cluster.\nDetailed Explanation:\n- `| rest splunk_server=local /services/cluster/master/peers`: This part of the query uses the REST command to access the local Splunk cluster master'",
    "_sourcetype": "splgen_feedback",
    "session_id": "1dd4af3e-a567-4d68-a491-75964913d868",
    "query": "'| rest splunk_server=local /services/cluster/master/peers | stats sum(bucket_count) by label | rename label as peer'",
    "correct": "true",
    "_kv": 1,
    "_serial": 0
  }
}
inference_spl_generation, inference_spl_explanation The natural language prompt entered by the user, in the user_prompt field, and the intermediate RAG and metadata responses retrieved from the large language models (LLMs).
JSON
{
'user_prompt' : "show storage freespace in winhostmon",
'retrieved_rag': ```search 'search index=windows sourcetype=WinHostMon Type=Disk | table host, Name, DriveType, TotalSpaceGB, FreeSpaceGB, FreeSpacePct | sort FreeSpacePct'```,
'retrieved_personalization_metadata': ['component', 'datetime', 'log_level', 'data.total_size', 'data.name', 'dns_alt_name', 'sh_label', 'data.total_bucket_count', 'data.bucket_dirs.cold.capacity', 'data.bucket_dirs.home.capacity'],
'generated_response': ``` index=windows sourcetype=WinHostMon Type=Disk | stats sum(FreeSpaceKB) as total_free_space by Name | eval total_free_space_GB = round(total_free_space / 1024 / 1024, 2) | table Name, total_free_space_GB ```
}
saia-tenant-id Hashed name of the tenant or stack ID.
JSON
{
  .....
  saia-tenant-id: 1b366eb2-3dfa-520e-b353-8178af77cfbd
  sourcetype: saia_api_event
}
stackID, userID, chat_id, app_version Information collected from the StackID, UserID, ChatID, and App Version fields.
CODE
{
stackID=CLOUD-7e42604c501e415b0b72b841bd788e84db49ea089713d9a5afe2a17d74e9b7a9,
userID=677ee9314a5407cfdb0a224f,
chat_id=0,
app_version="1.0.6",
}
job_id, user_key, user, chat_id Information collected from the JobID, UserKey, User, and ChatID fields.
CODE
{
....
request_id: 
job_id=5637081e-ab41-432d-bce9-9f76c61c9b1c
user_key=677ee9314a5407cfdb0a224f
chat_id=0
user=2340314992997373707
}
input_word_count, input_char_count, output_word_count, output_char_count Total word and character counts for the input and for the output response.
JSON
{
input_char_count: 115
input_word_count: 20
output_char_count: 1896
output_word_count: 236
}
source_app_id SourceAppID information.
CODE
source_app_id: Splunk_AI_Assistant_Cloud_Custom
num_distinct_clusters, avg_clusters_per_srctype, avg_fields_per_cluster, min_fields_per_cluster, max_fields_per_cluster Information collected on the number of distinct clusters formed for each tenant, the average number of clusters formed per sourcetype, the average number of field lists collected per cluster, and the minimum and maximum number of fields per cluster.
JSON
{
num_distinct_clusters: 11
avg_clusters_per_srctype: 2
avg_fields_per_cluster: 4.5
min_fields_per_cluster: 1
max_fields_per_cluster: 139
}
generate_optimized_spl Tracks runtime optimization decisions and user behaviors during SPL search generation.
JSON
{
  "query_id": "def-789",
  "user_id": "u-998",
  "timestamp": "2025-06-30T16:48:02Z",
  "original_spl": "search error | stats count by host",
  "optimized_spl": "search index=_internal error | stats count by host",
  "optimization_type": "index_specifier",
  "optimization_applied": true,
  "parsability": null,
  "manual_override": true
}
enabled_features Tracks the app features currently turned on by the customer.
JSON
{
  "query_id": "def-789",
  "user_id": "-9922228",
  "timestamp": "2025-06-30T16:48:02Z",
  "query": "search error | stats count by host",
  "response": "search index=_internal error | stats count by host",
  "enabled_features": ["customization", "external_llm"]
}
orchestration_decision Tracks the intent that the intent orchestration component returns for a given user input.
JSON
{
  "job_id": "4d8ee15b-162f-4c06-8882-176823653220",
  "intent": 0,
  "user_prompt": "Use the common information model to search for successful logins",
  "tool_content": {'name': 'write_spl', 'query': 'Use the common information model to search for successful logins', 'confidence': 0.9998138806751069, 'id': 'call_Lm0ILbsV7hIZyzzG0wix8DqA', 'arguments': '{"user_prompt":"Use the common information model to search for successful logins"}'}
}
data_upload_sourcetype_metadata Tracks statistics for description generation and metadata collection, such as the number of sourcetypes and indexes for which data has been collected.
JSON
{
  "saia-tenant-id": "4d8ee15b-162f-4c06-8882-176823653220",
  "deployment-id": "CLOUD-ccj3ted162f4c068882176823653220",
  "app_version": "1.5.0",
  "num_entries_saved": 40,
  "num_unique_indexes": 10,
  "num_of_unique_sourcetypes": 20,
  "num_of_unique_index_sourcetypes": 20
  }
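The REST handler log examples in the preceding table use a timestamp followed by comma-separated key=value pairs. As a minimal sketch, assuming that layout holds for other lines from the same loggers, the following code splits one of the sample lines into a dictionary. The function name parse_log_line and the regular expression are illustrative and are not part of the app.

```python
import re

# Sample line copied from the REST handler logging example in the table above.
LINE = (
    '2024-05-27 16:26:25 UTC, Level=INFO, Pid=1063271, '
    'Logger=ChatHistoryHandler, File=chat_history_handler.py, Line=43, '
    'UUID="34547aed-648c-4d3f-b2ce-f1ce066a57ad", '
    'message="Handling chat history request"'
)

# A key, an equals sign, then either a double-quoted value or text up to the next comma.
PAIR = re.compile(r'(\w+)=("([^"]*)"|[^,]*)')


def parse_log_line(line):
    """Split a log line into its timestamp and key=value fields (all strings)."""
    timestamp, _, rest = line.partition(", ")
    fields = {"timestamp": timestamp}
    for match in PAIR.finditer(rest):
        quoted = match.group(3)
        # Prefer the unquoted content of a quoted value; otherwise use the bare value.
        fields[match.group(1)] = quoted if quoted is not None else match.group(2).strip()
    return fields


fields = parse_log_line(LINE)
print(fields["Logger"], fields["message"])
```

All values are kept as strings, so a field such as apply_time from the generation time example would still need to be converted to a number before any arithmetic.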

Personalization data

Personalization is turned off by default. You can turn Personalization on or off from within Splunk AI Assistant for SPL on the Settings tab of the app. Select or deselect the box next to Personalize results, as shown in the following image:

This image shows the Settings tab of Splunk AI Assistant for SPL. The toggle button next to Personalize results is highlighted.

The following context data is collected if you opt in to Personalization.

This data is collected using two saved searches bundled with the assistant. These searches are enabled only if you opt in to Personalization:

  • Splunk AI Assistant for SPL - Field Summary
  • Splunk AI Assistant for SPL - Search Logs

Collected data is stored in the vector database. If you opt out of Personalization, a cleanup job that runs weekly deletes this information.

Component Description Example
app.Splunk_AI_Assistant.index_metadata Sourcetype metadata
JSON
{
"tenant": "caeinternal1",
"index_metadata": "[{ 'max': '2846', 'min': '0', 'mean': '2.054869684499314', 'count': '3645', 'field': 'duration_command_search_rawdata', 'index': 'main', 'sourcetype':'audittrail', 'stdev': '51.19505709576045', 'is_exact': '1', 'distinct_count': '33', 'numeric_count': '3645', 'is_numeric': True}]"
}
app.Splunk_AI_Assistant.previous_searches Previous searches
JSON
{
  "tenant": "saia-play-custom",
  "searches": [
    {
      "user": "admin",
      "spl": "| search index=\"_internal\" sourcetype=\"splunk_ai_assistant-3\" | fieldsummary | eval index=\"_internal\", sourcetype=\"splunk_ai_assistant-3\"",
      "count": 1,
      "roles": ["admin", "mltk_model_admin"]
    },
    {
      "user": "admin",
      "spl": "| search index=\"_introspection\" sourcetype=\"splunk_telemetry\" | fieldsummary | eval index=\"_introspection\", sourcetype=\"splunk_telemetry\"",
      "count": 1,
      "roles": ["admin", "power_user", "mltk_model_admin"]
    }
  ]
}
num_indexes, num_distinct_indexes, num_sourcetypes, num_distinct_sourcetypes, average_sourcetype_per_index, num_spls, num_distinct_spls, num_users, num_distinct_users, average_spls_per_user Vector database metrics for all tenants that opted in to the Personalization feature.
JSON
{
average_spls_per_user: 1
num_distinct_spls: 11
num_distinct_users: 2
num_spls: 11
num_users: 11
}
........
{
average_sourcetype_per_index: 6.625
num_distinct_indexes: 8
num_distinct_sourcetypes: 49
num_indexes: 53
num_sourcetypes: 53
}
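This page does not state how the two average fields are derived. As a rough check on the sample values only, the following sketch divides a total by a group count. The choice of divisors (num_users for searches, num_distinct_indexes for sourcetypes) is an assumption made because it reproduces the sample numbers, and the helper name ratio is illustrative.

```python
# Sample values copied from the two example payloads above.
sample = {
    "num_spls": 11,
    "num_users": 11,
    "num_distinct_users": 2,
    "num_sourcetypes": 53,
    "num_distinct_indexes": 8,
}


def ratio(total, groups):
    """Average number of items per group; 0.0 when there are no groups."""
    return total / groups if groups else 0.0


# Assumed divisors, picked only because they match the sample payloads.
avg_spls = ratio(sample["num_spls"], sample["num_users"])
avg_sourcetypes = ratio(sample["num_sourcetypes"], sample["num_distinct_indexes"])
print(avg_spls, avg_sourcetypes)  # 1.0 6.625
```

Dividing num_spls by num_distinct_users instead would give 5.5, which does not match the sample value of 1, so treat the divisors as an observation about this one example and not as the documented formula.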