Perform a rolling upgrade of a search head cluster

Splunk Enterprise version 7.1.0 and higher supports rolling upgrades for search head clusters. A rolling upgrade upgrades cluster members in phases, one member at a time, which minimizes disruption to ongoing searches while you move the cluster to a new version of Splunk Enterprise.

Requirements and considerations

Review the following requirements and considerations before you initiate a rolling upgrade:

- Rolling upgrade applies only to upgrades from version 7.1.x to higher versions of Splunk Enterprise.
- All search head cluster members, the indexer cluster manager node, and all indexer cluster peer nodes must be running version 7.1.0 or higher.
- When performing a rolling upgrade to Splunk Enterprise version 9.0 or higher, you must manually migrate the KV store to the WiredTiger storage engine and server version 4.2, if you have not already done so. For detailed instructions, see Migrate the KV store in a clustered deployment.
- Do not attempt any clustering maintenance operations, such as rolling restarts, bundle pushes, or node additions, during a rolling upgrade.

Note: Hardware or network failures that prevent node shutdown or restart might require manual intervention.

How a rolling upgrade works

When you initiate a rolling upgrade, you select a cluster member and put that member into manual detention. While in manual detention, the member cannot accept new search jobs, and all in-progress searches try to complete within a configurable timeout. When all searches are complete, you perform the software upgrade and bring the member back online. You repeat this process for each cluster member until the rolling upgrade is complete.

A rolling upgrade behaves in the following ways:

- Cluster members are upgraded one at a time.
- While in manual detention, the following applies to a cluster member:
  - The cluster member cannot receive new searches, execute ad hoc searches, or receive new search artifacts from other members.
  - The cluster member continues to participate in most cluster operations, such as captain election and automatic configuration replication.
  - New scheduled searches are executed on other members.
- The cluster member waits for in-progress searches to complete, up to a maximum time set by the user. The default of 180 seconds is enough time for the majority of searches to complete in most cases.
- Rolling upgrades apply to both historical and real-time searches.
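
The waiting behavior described above can be pictured from the operator's side as a bounded polling loop. The following Python sketch is illustrative only: wait_for_searches, get_counts, and the 5-second polling interval are hypothetical names and values, not part of Splunk Enterprise. Only the 180-second default comes from this topic.

```python
import time
from typing import Callable, Tuple


def wait_for_searches(
    get_counts: Callable[[], Tuple[int, int]],
    timeout_secs: float = 180.0,
    poll_secs: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until no searches are active or the timeout expires.

    get_counts is a hypothetical callable that returns
    (historical_count, realtime_count) for the member in detention.
    Returns True if both counts reached zero, False on timeout.
    """
    deadline = clock() + timeout_secs
    while True:
        historical, realtime = get_counts()
        if historical == 0 and realtime == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_secs)
```

In practice, get_counts would wrap whatever you use to read the member's active search counts, such as the command or endpoint shown in step 4. The clock and sleep parameters exist only so that the loop can be exercised without real waiting.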

Perform a rolling upgrade

To upgrade a search head cluster with minimal search interruption, perform the following steps:

1. Run preliminary health checks

On any cluster member, run the splunk show shcluster-status command using the verbose option to confirm that the cluster is in a healthy state before you begin the upgrade:

splunk show shcluster-status --verbose

Here is an example of the output from the command:

Captain:
		decommission_search_jobs_wait_secs : 180
		               dynamic_captain : 1
		               elected_captain : Tue Mar  6 23:35:52 2018
		                            id : FEC6F789-8C30-4174-BF28-674CE4E4FAE2
		              initialized_flag : 1
                    kvstore_maintenance_status : enabled
		                         label : sh3
		 max_failures_to_keep_majority : 1
		                      mgmt_uri : https://sroback180306192122accme_sh3_1:8089
		         min_peers_joined_flag : 1
		               rolling_restart : restart
		          rolling_restart_flag : 0
		          rolling_upgrade_flag : 0
		            service_ready_flag : 1
		                stable_captain : 1

 Cluster Manager(s):
	https://sroback180306192122accme_manager1_1:8089		splunk_version: 7.1.0

 Members:
	sh3
                                kvstore_status : maintenance
		                         label : sh3
		              manual_detention : off
		                      mgmt_uri : https://sroback180306192122accme_sh3_1:8089
		                mgmt_uri_alias : https://10.0.181.9:8089
		              out_of_sync_node : 0
		             preferred_captain : 1
		              restart_required : 0
		                splunk_version : 7.1.0
		                        status : Up
	sh2
                                kvstore_status : maintenance
		                         label : sh2
		         last_conf_replication : Wed Mar  7 05:30:09 2018
		              manual_detention : off
		                      mgmt_uri : https://sroback180306192122accme_sh2_1:8089
		                mgmt_uri_alias : https://10.0.181.4:8089
		              out_of_sync_node : 0
		             preferred_captain : 1
		              restart_required : 0
		                splunk_version : 7.1.0
		                        status : Up
	sh1
                                kvstore_status : maintenance
		                         label : sh1
		         last_conf_replication : Wed Mar  7 05:30:09 2018
		              manual_detention : off
		                      mgmt_uri : https://sroback180306192122accme_sh1_1:8089
		                mgmt_uri_alias : https://10.0.181.2:8089
		              out_of_sync_node : 0
		             preferred_captain : 1
		              restart_required : 0
		                splunk_version : 7.1.0
		                        status : Up

The output shows a stable, dynamically elected captain, enough members to support the replication factor, no out-of-sync nodes, and all members running a compatible Splunk Enterprise version (7.1.0 or higher). This indicates that the cluster is in a healthy state to perform a rolling upgrade.

For information on health check criteria, see Health check output details.

Note: Health checks do not cover all potential cluster health issues. Checks apply only to the criteria listed.

Or, send a GET request to the following endpoint to monitor cluster health:

/services/shcluster/status?advanced=1

For endpoint details, see shcluster/status in the REST API Reference Manual.

CAUTION: Based on the health check results, either fix any issues that affect cluster health before you continue, or proceed with the upgrade with caution.
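
The criteria described in this step can also be checked in a script. The following Python sketch parses text in the format of the example output and reports a subset of problems: a captain that is not stable or not dynamically elected, members that are out of sync, members that are not Up, and members below a minimum version. The function names and the exact set of checks are assumptions made for illustration. The sketch does not cover every health criterion, and the parsing is keyed to the example output in this step, which might differ across versions.

```python
import re
from typing import Dict, List, Tuple

# Matches "key : value" lines such as "stable_captain : 1".
KEY_VALUE = re.compile(r"^(\w+)\s+:\s*(.*)$")


def parse_status(text: str) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Split the status text into captain fields and per-member fields."""
    captain: Dict[str, str] = {}
    members: Dict[str, Dict[str, str]] = {}
    section = ""
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        pair = KEY_VALUE.match(line)
        if pair is None and line.endswith(":"):
            section, current = line[:-1], ""  # for example "Captain" or "Members"
        elif pair is None and section == "Members":
            current = line  # a member label such as "sh1"
            members[current] = {}
        elif pair is not None and section == "Captain":
            captain[pair.group(1)] = pair.group(2)
        elif pair is not None and section == "Members" and current:
            members[current][pair.group(1)] = pair.group(2)
    return captain, members


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn "7.1.0" into (7, 1, 0); an empty string becomes ()."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def health_problems(text: str, min_version: Tuple[int, ...] = (7, 1, 0)) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    captain, members = parse_status(text)
    problems: List[str] = []
    if captain.get("stable_captain") != "1":
        problems.append("captain is not stable")
    if captain.get("dynamic_captain") != "1":
        problems.append("captain is not dynamically elected")
    for label, fields in members.items():
        if fields.get("status") != "Up":
            problems.append(f"{label}: status is not Up")
        if fields.get("out_of_sync_node") != "0":
            problems.append(f"{label}: out of sync")
        version = fields.get("splunk_version", "")
        if version_tuple(version) < min_version:
            problems.append(f"{label}: version {version} is below minimum")
    return problems
```

You could feed health_problems the captured output of splunk show shcluster-status --verbose and stop the upgrade if the returned list is not empty.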

2. Initialize rolling upgrade

To initialize the rolling upgrade, run the following CLI command on any cluster member:

splunk upgrade-init shcluster-members

Or, send a POST request to the following endpoint:

/services/shcluster/captain/control/control/upgrade-init

For endpoint details, see shcluster/captain/control/control/upgrade-init in the REST API Reference Manual.

3. Put a member into manual detention mode

Select a search head cluster member other than the captain and put that member into manual detention mode:

splunk edit shcluster-config -manual_detention on

Note: The first upgraded member is elected captain when that member restarts after upgrade. This captaincy transfer occurs only once during a rolling upgrade.

Or, send a POST request to the following endpoint, with manual_detention=on as the request data:

/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention \
-d manual_detention=on

For endpoint details, see shcluster/member/control/control/set_manual_detention in the REST API Reference Manual.

For more information on manual detention mode, see Put a search head into detention.
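
If you script the REST alternative, the request can be assembled with the Python standard library. The following sketch only builds the request and does not send it. The function name detention_request is hypothetical, and the management URI and Authorization header value are assumptions that you supply for your own deployment, because this topic does not specify an authentication scheme.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint path from this step, written with a leading slash.
DETENTION_PATH = (
    "/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention"
)


def detention_request(mgmt_uri: str, state: str, auth_header: str) -> Request:
    """Build (but do not send) a POST that sets manual detention on or off.

    mgmt_uri is the member's management URI, for example the mgmt_uri
    value shown in the cluster status output. auth_header is whatever
    Authorization header value your deployment expects (an assumption).
    """
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    body = urlencode({"manual_detention": state}).encode("ascii")
    return Request(
        mgmt_uri.rstrip("/") + DETENTION_PATH,
        data=body,
        method="POST",
        headers={"Authorization": auth_header},
    )
```

Sending the request, for example with urllib.request.urlopen, also requires TLS settings that match the certificates on your management port, which are outside the scope of this sketch.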

4. Confirm the member is ready for upgrade

Run the following command to confirm that all searches are complete:

splunk list shcluster-member-info | grep "active"

The following output indicates that all historical and real-time searches are complete:

active_historical_search_count:0
active_realtime_search_count:0

Or send a GET request to the following endpoint:

/services/shcluster/member/info

For endpoint details, see shcluster/member/info in the REST API Reference Manual.
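
In a script, the same readiness check can be applied to the captured command output. The following Python sketch assumes the key:value format shown in the example output above. The function names active_search_counts and ready_for_upgrade are hypothetical.

```python
from typing import Dict


def active_search_counts(output: str) -> Dict[str, int]:
    """Collect active_*_search_count values from key:value output lines."""
    counts: Dict[str, int] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key.startswith("active_") and key.endswith("_search_count"):
            counts[key] = int(value.strip())
    return counts


def ready_for_upgrade(output: str) -> bool:
    """True only if both counts are present and equal to zero."""
    counts = active_search_counts(output)
    required = ("active_historical_search_count", "active_realtime_search_count")
    return all(counts.get(name) == 0 for name in required)
```

Treating a missing count as not ready is a deliberate choice in this sketch: if the output cannot be parsed, the script does not proceed to the upgrade.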

5. Upgrade the member

Upgrade the search head following the standard Splunk Enterprise upgrade procedure. See How to upgrade Splunk Enterprise in the Installation Manual.

6. Bring the member back online

Run the following command on the cluster member:
```
splunk start
```
Turn off manual detention mode:
```
splunk edit shcluster-config -manual_detention off
```
Or, send a POST request to the following endpoint, with manual_detention=off as the request data:
```
/servicesNS/admin/search/shcluster/member/control/control/set_manual_detention \
-d manual_detention=off
```
For endpoint details, see shcluster/member/control/control/set_manual_detention in the REST API Reference Manual.

7. Check cluster health status

After you bring the member back online, check that the cluster is in a healthy state.

Run the following command on the cluster member:

splunk show shcluster-status --verbose

Or, use this endpoint to monitor cluster health:

/services/shcluster/status?advanced=1

For endpoint details, see shcluster/status in the REST API Reference Manual.

For information on what determines a healthy search head cluster, see Health check output details.

8. Repeat steps 3-7 for all members

Repeat steps 3-7 until you have upgraded all cluster members.
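
Because step 3 asks you to start with a member other than the captain, one simple way to plan the repetition is to list the non-captain members first and leave the member that was captain at the start of the upgrade for last. The following Python sketch shows that ordering. The function name upgrade_order is hypothetical, and handling the original captain last is an assumption of this sketch, not a stated requirement.

```python
from typing import List


def upgrade_order(members: List[str], captain: str) -> List[str]:
    """Return member labels with the starting captain moved to the end."""
    if captain not in members:
        raise ValueError(f"captain {captain!r} is not in the member list")
    others = [label for label in members if label != captain]
    return others + [captain]
```

With the example cluster in step 1, where sh3 is the captain, upgrade_order(["sh3", "sh2", "sh1"], "sh3") returns ["sh2", "sh1", "sh3"].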

9. Upgrade the deployer

Make sure that you upgrade the deployer right after you upgrade the cluster members. The deployer must run the same version as the cluster members, down to the minor level. For example, if members are running 7.1.1, the deployer must run 7.1.x.

To upgrade the deployer, do the following:

Stop the deployer.
Upgrade the deployer, following standard Splunk Enterprise upgrade procedure. See How to upgrade Splunk Enterprise in the Installation Manual.
Start the deployer.

For more information on the deployer, see Deployer requirements.
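
The version rule in this step, that the deployer and members must match down to the minor level, can be expressed as a small check. The following Python sketch is illustrative, and the function names major_minor and deployer_matches are hypothetical.

```python
import re
from typing import Iterable, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as "7.1.1"."""
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def deployer_matches(deployer_version: str, member_versions: Iterable[str]) -> bool:
    """True if every member shares the deployer's major and minor version."""
    target = major_minor(deployer_version)
    return all(major_minor(version) == target for version in member_versions)
```

For example, a deployer on 7.1.4 matches members on 7.1.1, but a deployer on 7.2.0 does not.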

10. Finalize the rolling upgrade

Run the following CLI command on any search head cluster member:

splunk upgrade-finalize shcluster-members

Or, send a POST request to the following endpoint:

/services/shcluster/captain/control/control/upgrade-finalize

For endpoint details, see shcluster/captain/control/control/upgrade-finalize in the REST API Reference Manual.

Example upgrade automation script

Splunk Enterprise version 7.1.0 and higher includes an example automation script (shc_upgrade_template.py) that you can use as the basis for automating the search head cluster rolling upgrade process. Modify this template script to suit your deployment.

shc_upgrade_template.py is located in SPLUNK_HOME/bin and includes detailed usage and workflow information.

CAUTION: shc_upgrade_template.py is an example script only. Do not apply the script to a production instance without editing it to suit your environment and testing it extensively.