Troubleshoot performance issues
Troubleshoot performance issues that occur in agent management.
Performance issues
Responses from the deployment/server/clients endpoint can become slow, which may cause a gradual decrease in performance. The following symptoms indicate the performance issue:
- The response from the deployment/server/clients endpoint takes more than 30 seconds.
- The Agent Management home page takes more than 30 seconds to load.
- Searching by hostname or app takes more than 30 seconds to complete.
- The list of matched agents for server class takes more than 30 seconds to load.
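The first symptom can be checked from the command line. The following is a minimal sketch, not an official diagnostic: the management URL (https://localhost:8089), the /services/ path prefix, and the SPLUNK_USER and SPLUNK_PASSWORD variables are assumptions for illustration, and the function names are hypothetical. Only the endpoint name and the 30-second threshold come from the symptoms above.

```shell
#!/bin/sh
# Assumption: management URL of the agent management instance.
SPLUNK_MGMT_URL="${SPLUNK_MGMT_URL:-https://localhost:8089}"
# Threshold taken from the symptom list above.
THRESHOLD_SECS=30

# Print the total request time, in seconds, for the clients endpoint.
# Assumption: credentials are supplied in SPLUNK_USER and SPLUNK_PASSWORD.
measure_clients_endpoint() {
    curl -k -s -o /dev/null -w '%{time_total}' \
        -u "${SPLUNK_USER:?set SPLUNK_USER}:${SPLUNK_PASSWORD:?set SPLUNK_PASSWORD}" \
        "$SPLUNK_MGMT_URL/services/deployment/server/clients"
}

# Exit with status 0 when the elapsed time exceeds the threshold.
is_slow() {
    awk -v t="$1" -v max="$THRESHOLD_SECS" 'BEGIN { exit !(t + 0 > max + 0) }'
}
```

For example, `is_slow "$(measure_clients_endpoint)" && echo "endpoint is slow"` prints a message only when the request takes longer than 30 seconds.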
Solutions
Apply workarounds to reduce the performance issue. You can use any of the proposed workarounds individually or combine them.
Reload the agent management periodically
Over time, agent management may slow down. Each reload should improve performance, so reload the agent management periodically, or whenever the performance becomes unacceptable.
To reload the agent management, run the splunk reload deploy-server command on the agent management machine.
You can achieve similar results by restarting the whole Splunk instance with the splunk restart command. However, this also restarts all the other Splunk subsystems, so it is not recommended.
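A periodic reload can be scripted and run from a scheduler such as cron. The following is a minimal sketch under stated assumptions: the /opt/splunk default for SPLUNK_HOME is only a placeholder, the function name and the --run argument are hypothetical, and the CLI may ask for credentials depending on your setup. Only the reload command itself comes from the text above.

```shell
#!/bin/sh
# Assumption: SPLUNK_HOME points at the Splunk installation directory.
# /opt/splunk is a placeholder default, not a documented location.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunk}"

# Run the reload command named above on the agent management machine.
reload_agent_management() {
    "$SPLUNK_HOME/bin/splunk" reload deploy-server
}

# Perform the reload only when the script is called with --run,
# so that sourcing the file for inspection has no side effects.
if [ "${1:-}" = "--run" ]; then
    reload_agent_management
fi
```

Scheduling this script with the --run argument at an interval shorter than the time it takes for performance to become unacceptable keeps the reload from being a manual task.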
Change the index retention period
By default, the retention period of the _dsphonehome index is 7 days. If you have many clients that phone home frequently, the data gathered during the 7-day period can grow large enough to degrade performance. To improve performance, you can shorten the retention period of the stored phone home data.
The retention period for the _dsphonehome index is defined by the frozenTimePeriodInSecs key in the [_dsphonehome] stanza in the $SPLUNK_HOME/etc/apps/SplunkDeploymentServerConfig/default/indexes.conf file. You can override this value. Note that the retention period is the minimum time the data remains available in the index. The data might stay there for some time after the retention period, depending on the index bucket contents.
To change the index retention period, follow these steps:
- Create the $SPLUNK_HOME/etc/apps/SplunkDeploymentServerConfig/local/indexes.conf file on the agent management. If this file already exists, go to the next step.
- Modify the frozenTimePeriodInSecs key in the [_dsphonehome] stanza in the $SPLUNK_HOME/etc/apps/SplunkDeploymentServerConfig/local/indexes.conf file.
- Restart the agent management using the splunk restart command.
For example, to set the retention period to 3 days, insert the following stanza in the $SPLUNK_HOME/etc/apps/SplunkDeploymentServerConfig/local/indexes.conf file:
[_dsphonehome]
frozenTimePeriodInSecs = 259200
259200 = 3 (days) * 86400 (seconds in a day)
If you change the index retention period, data from offline clients stops appearing once it is older than the new retention period. For example, if you want to see offline clients for up to 3 days, the retention period cannot be lower than 3 days.
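The same days-to-seconds arithmetic applies to any other retention period. The following sketch prints a stanza for a given number of days; the function names are hypothetical helpers, and the output is meant to be reviewed and placed in the local indexes.conf file by hand rather than written there automatically.

```shell
#!/bin/sh
# Convert a retention period in days to seconds (86400 seconds per day).
retention_secs() {
    echo $(( $1 * 86400 ))
}

# Print a [_dsphonehome] stanza for the given number of days.
print_retention_stanza() {
    printf '[_dsphonehome]\nfrozenTimePeriodInSecs = %s\n' "$(retention_secs "$1")"
}

# Example: a 3-day retention period, matching the stanza shown above.
print_retention_stanza 3
```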
Change the phone home interval on every agent
You can change the value of the phoneHomeIntervalInSecs key in the $SPLUNK_HOME/etc/system/local/deploymentclient.conf file on an agent to control how quickly application changes are propagated to the agents. Application changes are sent to agents when a phone home is received.
The phone home interval is the time interval, in seconds, at which an agent contacts the agent management to check for configuration updates.
Whatever phoneHomeIntervalInSecs value is set, a configuration change (such as an application change) takes up to that amount of time to apply.
A higher phone home frequency can affect performance. Increasing the value of the phoneHomeIntervalInSecs key helps to mitigate the issue.
To change phoneHomeIntervalInSecs, follow the steps:
- Calculate the phoneHomeIntervalInSecs value that you need, using the following formula:
  NewPhoneHomeIntervalInSecs = ([previousPhoneHomeIntervalInSecs] * [frozenTimePeriodInSecs]) / ([hours] * 3600)
  Where:
  [hours] is the number of hours during which the performance is still acceptable. For example, if the performance is acceptable for the first two days, [hours] = 48.
  [previousPhoneHomeIntervalInSecs] is the phone home interval value that will be replaced. The assumption is that phoneHomeIntervalInSecs is the same across all agents. If that is not the case, use the average or the mode for the calculations.
  [frozenTimePeriodInSecs] is the value of the frozenTimePeriodInSecs key under the [_dsphonehome] stanza in the $SPLUNK_HOME/etc/apps/SplunkDeploymentServerConfig/local/indexes.conf file (or, if this does not exist, the default value: 604800).
  The value calculated with this formula is the lowest phoneHomeIntervalInSecs value that gives performance acceptable to you. You can use a higher value. The higher the value, the better the performance. For example, if the value calculated with the formula is 120 but a 240-second phone home interval is acceptable, you can use 240 instead.
- On an agent, in the $SPLUNK_HOME/etc/system/local/deploymentclient.conf file, change the value of the phoneHomeIntervalInSecs key. Note: You have to change this value on every agent in your environment.
- For maximum effectiveness, change the phoneHomeIntervalInSecs value on all agents. Each agent with an unchanged phoneHomeIntervalInSecs contributes to bad performance.
- Restart every modified agent using the splunk restart command.
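The formula in the first step can be worked through with a short script. This is a sketch under stated assumptions: the function name is hypothetical, the previous interval of 60 seconds is only an example input, and rounding up is a choice made here so that the result never falls below the value the formula gives.

```shell
#!/bin/sh
# Arguments: previous interval (seconds), frozenTimePeriodInSecs (seconds),
# and the number of hours during which performance is still acceptable.
new_phone_home_interval() {
    prev=$1
    frozen=$2
    hours=$3
    denom=$(( hours * 3600 ))
    # Integer division rounded up, so the result is never below the
    # exact value of (prev * frozen) / (hours * 3600).
    echo $(( (prev * frozen + denom - 1) / denom ))
}

# Example inputs (assumed): previous interval 60 s, default retention
# of 604800 s, performance acceptable for the first 48 hours.
new_phone_home_interval 60 604800 48   # prints 210
```

With these inputs, (60 * 604800) / (48 * 3600) = 36288000 / 172800 = 210, so a phoneHomeIntervalInSecs of 210 or higher would be the target.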
The following is an example of a stanza with phoneHomeIntervalInSecs set to 300:
[deployment-client]
disabled = 0
phoneHomeIntervalInSecs=300
phoneHomeIntervalInSecs
value.