Specifying remote datasets
When you create a federated index, you map the index to a specific remote dataset on a standard mode federated provider. Remote datasets can be events indexes, metrics indexes, saved searches, scheduled search jobs, or data models.
| Remote dataset type | Definition |
|---|---|
| Index | Index datasets are events indexes. Each events index on a federated provider is a searchable dataset. |
| Metric index | Each metrics index on a federated provider is a searchable dataset. |
| Saved search | The result set produced by an ad-hoc run of a saved search on a federated provider is a searchable dataset. |
| Last job | The result set from the last job run by a scheduled search on a federated provider is a searchable dataset. |
| Data model | The set of events defined by a data model on a federated provider is a searchable dataset. |
You can map a federated index to an accelerated data model and then search it with the tstats command. See Run federated searches over remote Splunk platform deployments.
Use cases for saved search and last job dataset types
When you determine whether to set up federated indexes that map to saved search datasets or last job datasets, answer the following questions:
- Are you concerned about the amount of federated search processing that might take place on the remote search head?
- Do you require that the dataset contain fresh results, or can the dataset contain results from a search that was run in the recent past?
When you run a federated search that invokes a federated index that maps to a saved search dataset, the remote search head runs the saved search to get a result set, and then your federated search runs over that result set to get the final results of the search.
If users are running a large number of searches on the remote search head, you might prefer to use last job datasets, which return the results of the last job run by a given scheduled search. Federated searches for Splunk that invoke last job datasets do not need to run a search on the remote search head to get the result set. Such searches use the existing result set from the last job run by the scheduled search.
The last job dataset approach can drastically reduce the amount of search processing overhead that federated searches might add to a remote search head. If your users run multiple federated searches around the same time, and these searches each invoke the same last job federated index, those searches can all run over the same result set without requiring additional search jobs to be run on the remote search head.
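The difference in remote load can be illustrated with a small count. The following is a minimal sketch, not Splunk code: the function name `extra_remote_jobs` and the dataset type strings are hypothetical, and the function models only the behavior described above, where each federated search over a saved search dataset triggers one saved search job and federated searches over a last job dataset trigger none.

```python
def extra_remote_jobs(dataset_type: str, federated_searches: int) -> int:
    """Count the saved search jobs that the remote search head must run.

    Hypothetical helper: models only the behavior described in this topic.
    """
    if dataset_type == "saved search":
        # Each federated search causes the remote search head to run the saved search.
        return federated_searches
    if dataset_type == "last job":
        # All federated searches reuse the result set of the last scheduled job.
        return 0
    raise ValueError(f"unsupported dataset type: {dataset_type}")


print(extra_remote_jobs("saved search", 5))  # 5
print(extra_remote_jobs("last job", 5))      # 0
```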
This table summarizes the tradeoff between the saved search and last job dataset types.
| Dataset type | Amount of search processing required on the remote search head | Recency of data in dataset |
|---|---|---|
| Saved search | Requires the remote search head to run an ad-hoc saved search job to get a result set. This result set is then sent to the federated search head for federated search processing. | Current. When you launch your federated search, Splunk software runs a saved search job and then runs your federated search over the data returned by that saved search job. |
| Last job | Does not require additional search jobs. The result set from a previously run scheduled search job is sent to the federated search head for federated search processing. | Depends on the interval of the scheduled search. For example, if the scheduled search runs on the hour, the result set can be up to an hour out of date. |
By default, all scheduled search jobs expire after a period of time that is two times the interval of the scheduled search, which means there is always a scheduled search job available for federated searches. See Extending job lifetimes in the Search Manual.
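The freshness and lifetime arithmetic for a last job dataset can be worked out as follows. This is a minimal sketch that assumes the default behavior described above. The function name `last_job_freshness` is hypothetical and is not part of any Splunk API.

```python
from datetime import timedelta


def last_job_freshness(schedule_interval: timedelta) -> dict:
    """Return worst-case staleness and default job lifetime for a last job dataset."""
    return {
        # Just before the next scheduled run, results are one full interval old.
        "max_staleness": schedule_interval,
        # By default, a scheduled search job expires after two times its interval.
        "default_job_lifetime": 2 * schedule_interval,
    }


hourly = last_job_freshness(timedelta(hours=1))
print(hourly["max_staleness"])         # 1:00:00
print(hourly["default_job_lifetime"])  # 2:00:00
```

Because the default lifetime is twice the interval, the job from the previous run is still available when the next run starts.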
Use saved search or last job datasets to route around federated search limitations
You can use saved search or last job datasets to get around certain limitations of federated searches over a standard mode federated provider. For example, standard mode federated searches cannot belong to the following search categories:
- Searches that use metrics search commands other than mstats or mcatalog to search data in metrics indexes, such as mpreview or mcollect
- Searches that use any generating commands other than search, from, loadjob, mstats, mcatalog, or tstats
However, you can create federated indexes that map to saved search or last job datasets that use commands that federated search does not support. Then you can write federated searches that reference those federated indexes. See Run federated searches over remote Splunk platform deployments.
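To decide whether a search needs this workaround, you can check its leading generating command against the supported list above. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the parser is deliberately naive (it inspects only the first command and does not handle subsearches or quoting), and it does not cover the metrics command category.

```python
# Generating commands that this topic lists as supported in standard mode.
SUPPORTED_GENERATING_COMMANDS = {"search", "from", "loadjob", "mstats", "mcatalog", "tstats"}


def leading_command(spl: str) -> str:
    """Return the first command of a search string (simplified, hypothetical parser)."""
    text = spl.strip()
    if text.startswith("|"):
        text = text[1:].strip()
        return text.split()[0].lower() if text else ""
    # A search string that does not start with a pipe implicitly starts with search.
    return "search"


def needs_dataset_workaround(spl: str) -> bool:
    """True if the search starts with a generating command outside the supported list."""
    return leading_command(spl) not in SUPPORTED_GENERATING_COMMANDS


print(needs_dataset_workaround("| inputlookup hosts.csv"))          # True
print(needs_dataset_workaround("| tstats count where index=main"))  # False
print(needs_dataset_workaround("index=main error"))                 # False
```

A search flagged by this check is a candidate for being saved on the federated provider and exposed through a saved search or last job dataset.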
Remote dataset restrictions
The following kinds of indexes, searches, and data models cannot be used as remote datasets for federated searches. Do not map federated indexes to them.
- Federated indexes
- Saved and scheduled searches with federated index references in their search strings
- Data models with constraint searches that refer to federated indexes
The saved search and data model limitations relate to the fact that federated search does not support federated index chaining.
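Before you map a federated index to a saved search or data model, you can scan its search string for federated index references. The following is a minimal sketch that assumes federated indexes are referenced with a `federated:` prefix, as in `index=federated:<federated_index_name>`. This topic does not state that syntax, so treat the pattern as an assumption. The function name and the example index names are hypothetical.

```python
import re

# Assumption: a federated index reference looks like federated:<name>.
FEDERATED_REF = re.compile(r"\bfederated:[\w.-]+", re.IGNORECASE)


def references_federated_index(search_string: str) -> bool:
    """True if the search string appears to reference a federated index."""
    return FEDERATED_REF.search(search_string) is not None


print(references_federated_index("index=federated:sales_fed error"))  # True
print(references_federated_index("index=main error"))                 # False
```

A search string that matches would create a federated index chain, so it cannot be used as a remote dataset.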
Remote dataset permissions
Review the permission settings on the saved searches, scheduled searches, and data models that you want to use as federated search datasets. These knowledge objects must either be shared globally or have the same app context as the federated provider that the federated index is associated with. In either case, they must be shared with read permissions enabled.
For example, if you are creating a federated index for a federated provider that is associated with the Search app, any saved search dataset for that index must be shared with the Search app as well, or shared globally.
See Manage knowledge object permissions in the Knowledge Manager Manual.
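The permission rule above can be expressed as a simple check. This is a minimal sketch, not a Splunk API: the `KnowledgeObject` class, its fields, the function name, and the app and object names are all hypothetical and serve only to model the rule.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeObject:
    """Hypothetical model of a saved search, scheduled search, or data model."""
    name: str
    app: str
    shared_globally: bool
    read_enabled: bool


def usable_as_remote_dataset(obj: KnowledgeObject, provider_app: str) -> bool:
    """Apply the rule: shared globally or in the provider's app, and readable."""
    in_scope = obj.shared_globally or obj.app == provider_app
    return in_scope and obj.read_enabled


report = KnowledgeObject("errors_by_host", app="search", shared_globally=False, read_enabled=True)
print(usable_as_remote_dataset(report, provider_app="search"))     # True
print(usable_as_remote_dataset(report, provider_app="other_app"))  # False
```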