Manage a High Availability Deployment

Set Up Monitoring for the HA Pair

You can set up monitoring for your HA pair by installing another Controller to act as the monitoring Controller.

If you do not already have an HA pair, set one up.
Install the monitoring Controller on the Enterprise Console host in a new platform by selecting Custom Install:
1. Create a platform (for example, Controller Monitor Platform).
  
  Warning: This platform should not be used for installing any other services.
2. Install a Controller.
3. Make sure to deselect the Install Events Service option before clicking Install.
Complete the monitoring setup by installing and configuring the App Agents and Machine Agents on your HA pair:
- Set Up App Agents for Monitoring
- Install and Set Up Machine Agents for Monitoring

Set Up App Agents for Monitoring

You can set up App Agents, which are automatically installed on the Controller hosts by the Enterprise Console, on both Controllers of an HA pair to report to the monitoring Controller. This can be done by updating the JVM options of your HA pair platform. To set up your App Agents using the Enterprise Console, perform the following steps:

SSH into the primary Controller host and run the following commands to update the controller-info.xml of the primary Controller App Agent:
```
cd <controller-install-dir>/appserver/jetty/appagent
cp conf/controller-info.xml ver<version#>/conf/
```
Repeat the previous step for the secondary Controller.
In the Enterprise Console UI, select your HA pair platform, and navigate to the JVM Options section by selecting Configurations > Controller Settings > Appserver Configurations.
Make the following updates to JVM Options:
1. Update the appdynamics.controller.hostName to the monitoring Controller IP.
2. Add the following required jvm-options in the jvm config block for monitoring:
  -Dappdynamics.agent.applicationName=<app_name>, -Dappdynamics.agent.tierName=<tier_name>, -Dappdynamics.agent.nodeName=<node_name>, -Dappdynamics.agent.accountName=<account_name>, -Dappdynamics.agent.accountAccessKey=<access_key>
  Note: You can get the access key from the Controller UI by navigating to Settings > License > Account. When you log in to the Controller, use the account specified in appdynamics.agent.accountName.
Click Save. The job applies these properties and restarts both the primary and secondary Controllers.
In the Enterprise Console UI, select your Controller Monitor Platform, and navigate to the Controller page.
Click External URL on the widget to open the UI of the monitoring Controller.
Log in to the Controller. You should be able to see the monitoring application for both the primary and secondary Controllers.

Install and Set Up Machine Agents for Monitoring

You must install Machine Agents on both Controllers of an HA pair to report to the monitoring Controller. These agents are Java programs that collect hardware metrics. To install and set up your machine agents, perform the following steps:

Install the Machine Agent on the primary Controller box. Do not start the agent.
Repeat the previous step for the secondary Controller.
Configure the Machine Agent properties for both Machine Agents by editing the controller-info.xml file located in the <machine_agent_home>/conf directory.
1. Update the <controller-host> to the monitoring Controller's IP.
2. Model the rest of your controller-info.xml file.
Start both Machine Agents.
In the Enterprise Console UI, select your Controller Monitor Platform, and navigate to the Controller page.
Click on External URL on the widget to open the UI of the monitoring Controller.
Log in to the Controller. You should be able to see the monitoring application for both the primary and secondary Controllers.
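
The <controller-host> update described above can be scripted when you have several agents to configure. The following is a minimal sketch, assuming the file already contains a <controller-host> element on a single line. The function name set_controller_host and the example address are hypothetical, not part of the product.

```shell
#!/bin/sh
# Hypothetical helper: point a Machine Agent's controller-info.xml at the
# monitoring Controller by rewriting the <controller-host> element.
# Assumes the element sits on one line and the host contains no '|' or '&'.
set_controller_host() {
  file="$1"
  host="$2"
  # Write to a temporary file first so a failed sed leaves the original intact.
  sed "s|<controller-host>.*</controller-host>|<controller-host>${host}</controller-host>|" \
    "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

# Example (hypothetical address; substitute your own agent home):
# set_controller_host "<machine_agent_home>/conf/controller-info.xml" 10.0.0.5
```

Run the helper on both Controller hosts before starting the Machine Agents, and review the resulting file by hand.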

Bouncing the Primary Controller Without Triggering Failover

The Enterprise Console does not allow you to stop and start the primary Controller without initiating failover. To work around this, perform the following steps:

Log in to the Enterprise Console and navigate to the Appserver Configurations page by selecting Configurations > Controller Settings > Appserver Configurations.
Deselect Enable Auto Failover and click Save.
SSH to the Controller machine where the Controller is installed.
Run the following commands on the Enterprise Console host. This bounces the primary Controller in HA mode:
```
bin/platform-admin.sh stop-controller-appserver
bin/platform-admin.sh start-controller-appserver
```
Re-enable auto failover on the Enterprise Console Appserver Configurations page.

Starting and Stopping the Controller

The Enterprise Console does not allow you to shut down the primary Controller. However, you can restart the secondary Controller via the start and stop Controller commands.

To start or stop the Controller manually, use the following commands:

To start:

bin/platform-admin.sh start-controller-appserver --with-db

To stop:

bin/platform-admin.sh stop-controller-appserver --with-db

Automatic Failover

The Enterprise Console includes a High Availability (HA) module that uses the Controller Watchdog for auto-failover. Auto-failover is enabled while the watchdog script is running and disabled while it is stopped.

You can also disable or enable automatic failover through the CLI by stopping or starting the Controller Watchdog:

To stop the Controller Watchdog:

./platform-admin.sh submit-job --job stop-controller-watchdog --service controller

To start the Controller Watchdog:

./platform-admin.sh submit-job --job start-controller-watchdog --service controller

Performing a Manual Failover and Failback

To fail over from the primary to the secondary manually, click the HA Failover option on the Controller page of the Enterprise Console, or run the following command on the Enterprise Console host:

bin/platform-admin.sh submit-job --service controller --job ha-failover --platform-name <name_of_the_platform>

This makes the Appserver on the secondary the primary and the database on the secondary the replication master. The old primary becomes the secondary.

The process for performing a failback to the old primary is the same as failing over to the secondary. You can run the following command on the Enterprise Console host:

bin/platform-admin.sh submit-job --service controller --job ha-failover --platform-name <name_of_the_platform>

Note: If the database has been down for more than seven days, you need to revive the database, as described in Re-enable Controller Database Replication.

Initiate Controller Database Incremental Replication

Re-enable Broken Replication

Incremental replication (replication via rsync while the primary database is up) is required when database replication on the secondary Controller lags behind the primary Controller by more than three days. This type of replication allows the primary Controller to keep operating while the disk contents are copied to the secondary node.

To initiate incremental replication:

Run the following command on the Enterprise Console host:
```
bin/platform-admin.sh submit-job --service controller --job incremental-replication
```
This launches a continuously running background job.
Make sure replication occurs four or more times by running either one of the following commands:
1. Query the replication script:
```
cd <controller_home>/controller-ha
./ha_replicate.sh -r status
```
2. Read the status file directly:
```
cd <controller_home>/controller-ha/tmp
cat replication.status
```
Note: If replication fails, go to the secondary host and stop all rsync and ha-replicate.sh processes. Then try running the incremental-replication job again.
Finalize the job by running the following command on the Enterprise Console host:
```
bin/platform-admin.sh submit-job --service controller --job finalize-replication
```
This stops the incremental replication loop. The command will restart the primary Controller, resulting in downtime.
Make sure replication is working by checking that there is no significant gap between the primary and secondary Controllers. You can run the following command on the Enterprise Console host to check the replication status:
```
bin/platform-admin.sh show-service-status --platform-name <platform_name> --service controller
```
It may take a few minutes to display the secondary status.

Add a Secondary Controller Using Incremental Replication

You can convert a single Controller with a large amount of data to an HA pair by using incremental replication. This way, you can rsync most of the Controller data while the Controller is still running, limiting the downtime of adding a secondary Controller.

To add a secondary Controller using incremental replication:

Start the incremental replication, giving host and rsync parameters:

bin/platform-admin.sh submit-job --service controller --job incremental-replication --args controllerSecondaryHost=1.1.1.1 rsyncThrottle=40000 rsyncCompress=true

This launches a continuously running background job.

Make sure replication occurs four or more times by checking <controller_home>/controller-ha/tmp/replication.status on the primary database host. Sample rsync status file output:
```
rsync started at Mon Mar  5 11:49:56 PST 2018
rsync completed at Mon Mar  5 11:50:56 PST 2018
rsync started at Mon Mar  5 11:51:01 PST 2018
rsync completed at Mon Mar  5 11:51:11 PST 2018
```
Note: If replication fails, go to the secondary host and stop all rsync and ha-replicate.sh processes. Then try running the incremental-replication job again.
Run the add-secondary job. The Enterprise Console performs a final rsync and adds the secondary Controller.
```
bin/platform-admin.sh submit-job --service controller --job add-secondary --args controllerSecondaryHost=secondary mysqlRootPassword='password'
```
The command will restart the primary Controller, resulting in downtime.
Note: Until you trigger the add-secondary command, the secondary Controller is not added to the Enterprise Console platform. Therefore, the Enterprise Console will not be able to perform any other operations on the secondary Controller.

If you need to stop replication, you can run the following command:

bin/platform-admin.sh submit-job --service controller --job stop-incremental-replication
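
The "four or more times" check on replication.status can be scripted rather than read by eye. The following is a minimal sketch, assuming each finished pass writes one line beginning with "rsync completed at", as in the sample output above. The function names count_completed_passes and enough_passes are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count finished rsync passes in a replication.status file.
# Assumes one "rsync completed at ..." line per finished pass.
count_completed_passes() {
  grep -c '^rsync completed at' "$1"
}

# Succeeds once at least four passes have finished, which is the threshold
# the procedure above asks for before running add-secondary.
enough_passes() {
  [ "$(count_completed_passes "$1")" -ge 4 ]
}

# Example:
# enough_passes "<controller_home>/controller-ha/tmp/replication.status" \
#   && echo "ready for add-secondary"
```

Counting completion lines rather than start lines avoids counting a pass that is still running.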

Set Replication Factors for Rsync Threads

Using the Enterprise Console UI or the CLI, you can set the number of parallel rsync threads as a job parameter when you perform incremental or finalize replication.

From the Enterprise Console UI:
1. Log in to the Enterprise Console and access the Controller page.
2. From the More menu, based on which replication you are performing, select either Incremental Replication or Finalize Replication.
3. Enter a number in the Number of parallel rsync threads field and click Submit. The default value is 1.

From the CLI, based on which replication you are performing, run either of the following commands from the Enterprise Console host and set the numberThreadForRsync argument.
```
bin/platform-admin.sh submit-job --job incremental-replication --args numberThreadForRsync=<number>
bin/platform-admin.sh submit-job --job finalize-replication --args numberThreadForRsync=<number>
```

Enable MySQL Parallel Replication

Using the Enterprise Console UI or the CLI, you can enable MySQL parallel replication (available from MySQL 5.7) when you perform finalize replication.

From the Enterprise Console UI:
1. Log in to the Enterprise Console and access the Controller page.
2. From the More menu, select Finalize Replication.
3. Select the Database parallel replication check box to enable parallel replication with the MySQL database.
4. Click Submit.
From the CLI, run the following command from the Enterprise Console host to enable MySQL parallel replication. The default value is true.
```
bin/platform-admin.sh submit-job --job finalize-replication --args dbParallelReplication=true
```

Troubleshooting the Incremental Replication Status

If your first incremental replication run is taking longer than usual, you can check the replication status by executing either one of the below commands:

```
cd <controller_home>/controller-ha
./ha_replicate.sh -r status
```
Alternatively, read the status file directly:
```
cd <controller_home>/controller-ha/tmp
cat replication.status
```
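
When a run seems slow, one quick signal is whether the status file ends with a start marker that has no matching completion yet. The following is a minimal sketch, assuming the status file is appended in order and uses the "rsync started at" and "rsync completed at" lines shown in the sample output earlier on this page. The function name pass_in_progress is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether an rsync pass appears to be running,
# judged only by the last line of replication.status.
# Assumes lines are appended chronologically.
pass_in_progress() {
  last_line=$(tail -n 1 "$1")
  case "$last_line" in
    "rsync started at"*) return 0 ;;  # started, not yet completed
    *) return 1 ;;                    # idle, finished, or empty file
  esac
}

# Example:
# pass_in_progress "<controller_home>/controller-ha/tmp/replication.status" \
#   && echo "a pass is still running"
```

This only reads the log. It does not confirm that an rsync process is actually alive on the host.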

Re-enable Controller Database Replication

The Controller databases can be synchronized using the replicate script if they have been out of sync for more than seven days. Synchronizing a database that is more than seven days behind a master is considered reviving a Controller database. Reviving a database involves the same procedure as adding a new secondary Controller to an existing production Controller, as described in Set Up the Secondary Controller and Initiate Replication. You can also follow these steps in the case of an HA failover that failed at replication.

To re-enable replication or revive a Controller database:

On the Controller page, click Remove Controller, or run the following command on the Enterprise Console host:
```
bin/platform-admin.sh submit-job --job remove --service controller
```
Enter the database root credentials.

Check Remove Binaries, or run the following command on the Enterprise Console host:

bin/platform-admin.sh submit-job --job remove --service controller --args removeBinaries=true

Uncheck Remove Controller Cluster. If it is already unchecked, remove the secondary server.
Click Submit.
Add a secondary Controller from the Controller page, or run the following command on the Enterprise Console host. The command restarts the primary Controller, resulting in downtime:
```
bin/platform-admin.sh submit-job --service controller --job add-secondary --args controllerSecondaryHost=secondary mysqlRootPassword='password'
```

The Enterprise Console will onboard the secondary Controller and re-enable replication.

Backing Up and Restoring Controller Data in an HA Pair

An HA deployment makes backing up Controller data relatively straightforward since the secondary Controller offers a complete set of production data on which you can perform a cold backup without disrupting the primary Controller service.

After setting up HA, perform a backup by stopping the secondary Controller from the Enterprise Console and performing a file-level copy of the Splunk AppDynamics home directory (that is, a cold backup). When finished, restart the Controller from the Enterprise Console. The secondary then catches up with the primary.

When restoring the database from a backup in an HA or standalone environment, you should check that the primary and secondary server ha.type and ha.mode are set to active and passive, respectively.

Updating the Configuration in an HA Pair

The Enterprise Console copies any file-level configuration customizations made on the primary Controller to the secondary Controller, such as changes in the Jetty XML files and db.cnf.

If you need to modify the Controller configuration later, always make those changes in the Enterprise Console on the Controller Settings page under Configurations. These changes are preserved during upgrades. Any changes made outside the Enterprise Console are not preserved after an upgrade.

Troubleshooting HA

Controller Diagnostic Data

The Enterprise Console writes log messages pertaining to HA to the platform-admin-server.log on the Enterprise Console host.

To diagnose the Controller, run the following command:

bin/platform-admin.sh submit-job --platform-name <name_of_the_platform> --job diagnosis --service controller

Refer to the Controller diagnostic data in the platform-admin-server.log.

Sample Controller diagnostic data

Linux

Controller diagnostic data:
123.45.0.1:
controller_database: running
controller_appserver: running
reports_service: running
operating_system: Linux
controller_version: 004-004-001-000
controller_performance_profile: small
controller_ha_type: primary
controller_appserver_mode: active
controller_metric_data_per_min: N/A
slave_io_state: Waiting for master to send event
seconds_behind_master: 0
master_server_id: 567.
master_host: controller-secondary
master_ssl_allowed: No
123.45.0.2:
controller_database: running
controller_appserver: not running
reports_service: running
operating_system: Linux
controller_version: 004-004-001-000
controller_performance_profile: small
controller_ha_type: secondary
controller_appserver_mode: passive

Invalid HA Controller Roles

If your HA Controller roles in the Controller databases are incorrect, the Enterprise Console will prevent discover and upgrade jobs. An invalid HA Controller state is when both of your Controller role types are identical, such as in a primary/primary or secondary/secondary case.
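
The invalid state described above can be spotted from the diagnostic output before you attempt a discover or upgrade job. The following is a minimal sketch, assuming the diagnostic data has been saved to a text file in the "key: value" layout shown in the sample on this page. The function names ha_roles and roles_valid are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every controller_ha_type value found in saved
# diagnostic output, one per line. Leading whitespace on keys is tolerated.
ha_roles() {
  awk -F': *' '
    { key = $1; gsub(/^[ \t]+/, "", key) }
    key == "controller_ha_type" { print $2 }
  ' "$1"
}

# Succeeds only when exactly one primary and one secondary are reported.
# primary/primary, secondary/secondary, or a missing host all fail.
roles_valid() {
  roles=$(ha_roles "$1" | sort | paste -s -d ' ' -)
  [ "$roles" = "primary secondary" ]
}

# Example (diag.txt is a hypothetical file holding the diagnostic data):
# roles_valid diag.txt || echo "invalid HA Controller roles"
```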

To fix this issue:

Identify which server is the primary.
1. Log in to one of the Controller databases by running the following command in the Controller installation directory:
```
bin/controller.sh login-db
```
2. Run the following command:
```
select * from global_configuration_local where name='ha.controller.type';
```
Ensure that ha.controller.type is set correctly in the database.
1. Log in to the Controller database you would like to change by running the following command in the Controller installation directory:
```
bin/controller.sh login-db
```
2. Run the following commands to set the database to the primary or secondary:
  Primary:
  use controller;
  update global_configuration_local set value='primary' where name='ha.controller.type';
  update global_configuration_local set value='active' where name='appserver.mode';
  Secondary:
  use controller;
  update global_configuration_local set value='secondary' where name='ha.controller.type';
  update global_configuration_local set value='passive' where name='appserver.mode';
Restart the database for the change to take effect on the Appserver:
```
bin/platform-admin.sh stop-controller-appserver --with-db
bin/platform-admin.sh start-controller-appserver --with-db
```
If the secondary Appserver is already in a shutdown state, then there is no need to restart the database.
Verify the replication is healthy:
```
show slave status\G
```
Slave_IO_Running and Slave_SQL_Running should show Yes.

You may now retry the discover and upgrade job.
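
The replication health check in the procedure above can be automated against saved output of show slave status\G. The following is a minimal sketch, assuming the output uses the standard MySQL vertical format with one "Field: value" pair per line. The function name replication_healthy is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when both replication threads report Yes.
# Reads the named file, or standard input when no file is given.
replication_healthy() {
  awk -F': *' '
    { key = $1; gsub(/^[ \t]+/, "", key) }   # fields are right-aligned
    key == "Slave_IO_Running"  { io = $2 }
    key == "Slave_SQL_Running" { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  ' "$@"
}

# Example (slave_status.txt is a hypothetical file holding the query output):
# replication_healthy slave_status.txt || echo "replication is not healthy"
```

The exact key match keeps Slave_SQL_Running_State from being mistaken for Slave_SQL_Running.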

Failover Prevention

If failover is prevented on your Controller HA configuration, it may be due to one of two scenarios:

The secondary database is down. Failover cannot occur when the secondary database is not running. To fix this issue, restart the secondary database by running the following command on the secondary host:
```
bin/controller.sh start-db
```
If this does not enable failover, then it may be due to the second scenario.
Database replication is not healthy. Failover is not allowed when the database replication is not healthy. There are various reasons why this may be the case. Contact customer support to correct the issue.
