Implement an agent management cluster

You can scale your agent management solution by implementing an agent management cluster consisting of multiple agent management servers that share their configurations and activities.

Agent management cluster architecture

An agent management cluster consists of multiple agent management servers that share configurations and activities via a shared drive. The agents connect to the pool of agent management servers through a load balancer or DNS mapping.

This diagram illustrates the basic architecture for an agent management cluster that uses a load balancer to connect with the agents. The DNS-based solution simply substitutes DNS mapping for the load balancer.

An agent management cluster that uses a load balancer to connect with the agents

The agents point to the load balancer, rather than directly to an agent management server.

The shared drive contains the two directories shared by all agent management servers in the cluster: the deployment apps directory and a log directory named client_events.

These directories must be mounted on each agent management server.

The deployment app bundle contains the usual set of user-defined deployment apps, as well as a system app, _splunk_ds_info. This app is new in Splunk Enterprise version 9.2. It includes the set of server class configuration files, which are shared across agent management servers. Do not directly edit the contents of this directory.

The shared log file directory is named client_events. It is new in version 9.2, and it tracks the clients' phone home events through log entries.
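
The following sketch shows roughly how the shared content appears once mounted on an agent management server; the app names are placeholders:

$SPLUNK_HOME/etc/deployment-apps/        <- mounted deployment apps directory
    _splunk_ds_info/                     <- system app holding the shared server class configuration; do not edit
        serverclass.conf
    <your_deployment_app_1>/
    <your_deployment_app_2>/
$SPLUNK_HOME/var/log/client_events/      <- mounted log directory for agent phone home events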

Deploy an agent management cluster

To deploy an agent management cluster:

  1. Set up the shared drive.
  2. Install and configure the agent management servers.
  3. Set up a load balancer or DNS mapping.

System requirements

Each agent management server in the cluster requires its own dedicated Splunk Enterprise instance, version 9.2 or higher.

The agents connecting to the cluster can run a pre-9.2 version, since their interactions are the same whether they connect to a cluster or to a standalone agent management server. They simply need to connect to a load balancer or DNS record, rather than directly to an agent management server.

To calculate the maximum number of agents that the cluster can support, multiply the number of agent management servers by 25K. For example, a cluster of 3 agent management servers can service a maximum of 75K agents.

The maximum number of agent management servers in a cluster is limited to 3.

Note: Requirements for the shared drive and load balancer or DNS are discussed in their respective sections.

New configuration files

If you examine the directories of an agent management server, you will notice some differences compared to pre-9.2 versions. In particular, there is an app, etc/apps/SplunkDeploymentServerConfig, which contains configuration files necessary for the proper functioning of the agent management server. Do not alter this directory or its files in any way. Note that this app is not a deployment app and so does not reside in etc/deployment-apps.

In addition, the system places new configurations in savedsearches.conf and macros.conf. Do not edit these system-generated configurations.

Set up a shared drive

The shared drive must be a network shared drive, such as an NFS host. The drive needs two high-level directories, one for deployment apps and another for log files. They must be readable and writable by the splunk user on each agent management server.

The shared drive does not need to be dedicated to serving the agent management servers, but its use for the cluster requires sufficient space. Specifically, the log file directory needs approximately 100MB per agent management server for log files, possibly more if you increase your logging beyond the defaults. The deployment app directory size depends on the number and size of deployment apps that your agent management servers need to manage.

When you create the two directories on the shared drive, you can name them as you wish. However, you must mount the directories on each agent management server with the following local paths and directory names:

  • $SPLUNK_HOME/etc/deployment-apps
  • $SPLUNK_HOME/var/log/client_events
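
For example, assuming the shared drive is an NFS host named nfs.example.com that exports /exports/ds-apps and /exports/ds-logs (hypothetical names), and that $SPLUNK_HOME is /opt/splunk, the mount entries on each agent management server might look like this:

# /etc/fstab on each agent management server (hypothetical NFS host and export names)
nfs.example.com:/exports/ds-apps   /opt/splunk/etc/deployment-apps     nfs   defaults   0 0
nfs.example.com:/exports/ds-logs   /opt/splunk/var/log/client_events   nfs   defaults   0 0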

Initially, place any existing deployment apps in the mounted deployment-apps directory on one of the agent management servers. Since this directory is mounted to the shared drive, the deployment apps become available to all agent management servers.

When one of the agent management servers first updates its serverclass.conf file and then reloads, the system creates a _splunk_ds_info directory in the mounted deployment-apps directory and places the serverclass.conf file in it.

Important: Do not manually edit the _splunk_ds_info directory.

Logs for agent phonehome events are placed in the mounted client_events directory.

Configure agent management servers

  1. If you are planning to convert a standalone agent management server to a cluster member, back up its deployment-apps directory and serverclass.conf file before performing the upgrade.
  2. Install Splunk Enterprise instances, version 9.2 or higher, and configure them as agent management servers in the usual way. If you are incorporating an existing standalone agent management server, upgrade it to 9.2 or higher.
  3. On each agent management server, add the setting syncMode = sharedDir to the serverclass.conf file, as shown in the example after this list. This setting indicates that the agent management server is part of a cluster and will share the app bundle and client_events directories, as well as the set of server classes.
  4. On each agent management server, set up mounts to the directories on the shared drive.
    Note: If incorporating an existing agent management server, first confirm that the $SPLUNK_HOME/var/log/client_events directory exists locally on that server. If it doesn't, create it before mounting the corresponding shared directory.
  5. If incorporating an existing agent management server:
    • Move the backed-up deployment-apps directory to the deployment apps directory on the shared drive.
    • Run reload on the existing agent management server. This step is necessary to share its serverclass.conf file across all agent management servers.
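
The following is a minimal sketch of the serverclass.conf entry from step 3, assuming the syncMode setting belongs in the [global] stanza:

# serverclass.conf on each agent management server
[global]
syncMode = sharedDir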

Choose load balancer or DNS mapping

For efficient use of the agent management cluster, insert a third-party load balancer or DNS record between the agents and the agent management servers. A load balancer is preferred because it lets you configure sticky sessions. Choose a load balancer that supports sticky sessions and the REST-based health check API, described in the REST API Reference Manual: cluster/manager/ha_active_status.

If you want the agents to tap into the pool of agent management servers, rather than always connecting to the same agent management server, you must update the agents' configurations to point to the load balancer or DNS record instead of directly to an agent management server. You do so on each agent by updating its targetUri setting under the [target-broker:deploymentServer] stanza in deploymentclient.conf and then restarting the agent. See Specify the agent management.
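
For example, assuming a load balancer reachable at ds-lb.example.com on management port 8089 (a hypothetical address), each agent's configuration would look like this:

# deploymentclient.conf on each agent
[target-broker:deploymentServer]
targetUri = ds-lb.example.com:8089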

You can update the targetUri setting on the agents over time. In the meantime, each agent continues to interact directly with its configured agent management server. Any agent interactions are recorded in the shared log directory, so the information is available to all agent management servers.

Add server classes

You can add or edit server classes in the usual way, locally on any agent management server. After making your server class changes, you must run reload on that agent management server to make the changes available to all agent management servers.

Note: If you use the agent management interface to make the server class changes, the interface automatically triggers the reload when you save your changes.

Following the reload, the updated serverclass.conf file will be uploaded to _splunk_ds_info in the shared drive's deployment apps directory. When other agent management servers poll for changes, they will find the updated file and use it to overwrite their local serverclass.conf files. It can take up to 60 seconds for the agent management servers to sync server class updates.
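
For example, on a standard Splunk Enterprise instance, you can trigger the reload from the command line with the deployment server reload command, run on the agent management server where you made the changes:

$SPLUNK_HOME/bin/splunk reload deploy-server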

Customize settings

You can change the settings in the limits.conf file, located in the etc/system/local directory, to adjust the search command limits to your needs.

If your environment has more than 50K forwarders, increase the subsearch result limits to be greater than the number of forwarders that you have. To increase these limits, change the following settings:

# limits.conf in etc/system/local
[searchresults]
# Default: 50000
maxresultrows = <integer greater than your number of forwarders>

[join]
# Default: 50000
subsearch_maxout = <integer greater than your number of forwarders>