Context in Splunk AI Assistant for SPL

You can allow Splunk AI Assistant to use grounding metadata to generate more relevant results that are based on your unique data and environment. When you opt in to Context, the app collects non-personal data from your environment when generating results. The context data is only used within your environment.

Context supports role-based access control (RBAC), so users do not see data that they don't have access to.

Context is optional and granular. Splunk administrators can opt in to or out of Context as a whole, or individually allow index and sourcetype metadata, user search logs, and knowledge objects.

CAUTION: Depending on the size of your Splunk deployment, it can take up to 2 weeks to get full coverage over the context data.
Note: This feature was previously named Personalization.

Context options

The following options are available when configuring Context settings:

Option | Description
Allow AI to understand your environment | Allows AI Assistant to use grounding metadata from your environment to generate more relevant results. This metadata is only used within your environment. This is the overall setting for Context. If toggled off, no Context is used. If toggled on, you can further select from the more specific context options, such as index metadata and knowledge objects.
Index and sourcetype metadata | Collects index, sourcetype, and field names from events in your environment. Field values are not collected.
User search logs | Collects searches run by users in your environment. Only includes searches previously run by the requesting user.
Knowledge objects | Collects knowledge objects from your environment, including macros, data models, lookups, dashboards, reports, and alerts.

Configure settings

To opt in to or out of this feature, go to the Settings tab of the assistant and select the Context tab. Select or deselect the options, as shown in the following image:

This image shows the Settings page of the Splunk AI Assistant and the tab called Context. From this tab you can entirely turn the Context data feature on or off, or make specific choices about what context data you allow.

Only users with administrator privileges can opt in to or out of this feature, and they can do so at any time. This setting applies at the app level, across all users, and not at the individual user level.

If you opt out of Context data, Splunk AI Assistant cannot use the context of your data and environment when generating responses, which leads to less relevant responses.

How Context data works

Context data works by taking the following actions:

Action taken | Description
Collection of metadata | Scheduled jobs run daily to collect metadata from the stack. Metadata includes names of indexes, source types, fields, search query logs, macros, lookups, and data models.
Add metadata to knowledge base | The collected metadata and AI-generated descriptions of the metadata are added to a Splunk AI Assistant knowledge base. Each stack has its own knowledge base, and the knowledge of one stack cannot be used by another.
Retrieval Augmented Generation (RAG) | The user prompt is augmented with the most relevant metadata, which helps the large language model (LLM) generate a more specific response with tailored index, source type, and field information, and past search queries.

What if sensitive information or personally identifiable information (PII) is included in the search?

Context data does not pass your search results data. Your Splunk AI Assistant searches are only retained as part of your organization's search history.

If you use sensitive information or PII in your Splunk AI Assistant search, the information is retained verbatim as part of the search input, but the actual values in your search results are not retained.

For example, when using the app to create SPL, the app generates an SPL search that includes your specific field names, but not the actual results obtained by running the search.

To ensure your data remains secure, the system enforces strict user-level segregation on historical search logs collected by the Context data feature. Historical search logs are retrieved as generation context only for the user who ran those searches.

Data collected by Context

Using the Context feature allows for the collection of some data for training and fine-tuning. For details on what data is collected, see AI service data in Splunk AI Assistant.

Note: The data does not leave the region to which the deployment belongs.

If you opt in to Context, collected data is stored in a Splunk database. If you later opt out of Context, a weekly cleanup job deletes any collected data.

Coverage details

The Context tab includes the option to view Coverage details as shown in the following image:

This image shows the Settings page and the Context tab. A button labeled as Coverage details is highlighted. Selecting this brings up a modal window.

On the resulting modal window, administrators can choose to Calculate coverage. This action looks at the index and sourcetype combinations in your environment over the last 7 days and checks whether Splunk AI Assistant has sampled metadata for them.

Select Calculate coverage as shown in the following image:

This image shows the modal that appears after selecting Coverage details. There are 2 options available: one to close the window and another to Calculate coverage.

Selecting Calculate coverage downloads sourcetype_coverage.csv, which is a report for each index and sourcetype pair, and vector_db_collection.raw, which is the raw service response used to build that report.
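
To get a rough idea of which index and sourcetype pairs the coverage report is likely to consider, you can list the pairs that received events in the last 7 days. The following search is only an approximation written for illustration. It is not the logic the app runs, and the index=* filter is an assumption that you can narrow to suit your environment:

```spl
| tstats count where index=* earliest=-7d latest=now by index sourcetype
| sort - count
```

Each row is one index and sourcetype pair with its event count, which you can compare by eye against the pairs listed in sourcetype_coverage.csv.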

Context search macros

Context runs the following search to gather the sourcetype metadata used for tailored results:

CODE
| tstats count where `saias_field_summary_indexes` by sourcetype index 
| dedup sourcetype, index 
| rename index as indexname, sourcetype as sourcetypename 
| map maxsearches=1000 search="| search index=\"$indexname$\" sourcetype=\"$sourcetypename$\" | `saias_field_summary_limit` | fieldsummary | eval index=\"$indexname$\", sourcetype=\"$sourcetypename$\"" 
| submitfielddata

The search consists of 2 parts:

  • A tstats command to determine all of the unique index and sourcetype combinations present.
  • A map subsearch which runs a fieldsummary command over each unique index and sourcetype combination. This determines what fields exist within that index and source type combination.
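
To see what the second part produces for one combination, you can run the same steps by hand for a single pair. This is a sketch for exploration only: the index name main and the sourcetype access_combined are hypothetical placeholders, and head 50000 is written out in place of the saias_field_summary_limit macro:

```spl
index="main" sourcetype="access_combined"
| head 50000
| fieldsummary
| fields field count distinct_count
```

The output lists one row per field found in the sampled events. The sketch stops there and does not submit anything to the app.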

The following 2 macros within the search are configurable:

Note: These macros can only be configured by your stack administrator.
Configurable macro | Details
saias_field_summary_indexes | Defaults to (index=* OR index=_*). You can restrict the macro to specific indexes to be searched by the Context saved search that gathers source type metadata. This change can reduce the total surface area over which the search runs, and reduce the computational cost of the saved search.
saias_field_summary_limit | Limits the total number of events scanned for each unique index and source type combination in the fieldsummary subsearch. The macro is set to head 50000 to limit the performance impact of the map subsearch on large indexes.

CAUTION: Changing these values can lead to app performance problems. For example, if you find searches are taking too long, your adjusted macro values might need review from your stack administrator.
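
Before changing saias_field_summary_indexes, it can help to preview how many index and sourcetype pairs a narrower filter would produce. The following sketch assumes two hypothetical index names, main and security, standing in for the indexes you intend to keep:

```spl
| tstats count where (index=main OR index=security) by sourcetype index
| stats count as pair_count
```

Because the Context search calls map with maxsearches=1000, a pair_count well below 1000 suggests that every pair can get its own fieldsummary subsearch. A much larger number is a hint that the filter is still broad.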

Monitor scheduled searches for metadata collected by Context

You can monitor the scheduled, saved searches for metadata collected by Context. Complete the following steps:

  1. Open Splunk AI Assistant. Select Settings from the top navigation bar.
  2. Select Searches, reports, and alerts.
  3. Set the Owner filter to All.
  4. To check the status of the user search logs saved search, select View Recent for the saved search named "Splunk AI Assistant - Search Logs".
  5. To check the status of the metadata modular input (modinput), go to the Search tab in your Splunk instance and run the following SPL with a 24-hour lookback:
    CODE
    index=_internal source=/opt/splunk/var/log/splunk/splunk_ai_assistant.log "Index metadata submitted successfully"

    The log events that appear indicate a successful modinput execution.

    Note: These scheduled searches must not be interrupted or modified for Context data to work properly.
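
As an alternative to View Recent in step 4, the scheduler logs can summarize how the user search logs saved search has run recently. This sketch assumes that your role can search the _internal index, that scheduler events are logged under the standard scheduler sourcetype, and that the saved search name matches the one shown in step 4:

```spl
index=_internal sourcetype=scheduler savedsearch_name="Splunk AI Assistant - Search Logs" earliest=-24h
| stats count max(run_time) as max_run_time_sec by status
```

Rows with a status other than success, such as skipped, are worth a closer look because they can indicate that the saved search did not complete.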

Context known issues

Consider the following before opting in for Context:

  • Depending on the size of your Splunk deployment, it can take up to 2 weeks to get full coverage over the context data.
    • Turning Context off and on can extend this time delay.

  • Generating results with Context takes marginally longer than generating results without it. This slight increase in generation time allows the results to be specific to your environment and data.
  • Saved searches that include Context data, especially those collecting source type metadata, can be expensive. You can tune these searches with the 2 provided search macros to reduce the cost.
  • Saved searches that include Context data can run up against workload management rules and return partial results. Admins can double-check results of the Context data saved searches and make sure that no errors occurred while running the saved search.
  • The saias_field_summary_indexes macro has a default value of (index=* OR index=_*). This can be redefined to only select indexes that admins want to be searched by the Context data saved search for gathering source type metadata. Doing so can reduce the total surface area over which the search runs, and reduce computational costs of the saved search.