Architecture and performance considerations
When adding Splunk DB Connect to your deployment, take into account architecture and performance considerations. You can install and run Splunk DB Connect on Splunk Enterprise deployments ranging from a single host (indexer and Splunk Web both running on the same system) to a large distributed deployment (multiple search heads, search head clusters, indexers, load-balanced forwarders, and so on). Performance considerations and expectations vary based on your deployment and capacity requirements.
Database performance considerations
If Splunk DB Connect retrieves a large amount of data from your database, it might affect your database performance, especially for the initial run. Subsequent runs of the same query might have less impact, as the database might cache results and only retrieve new data since the previous run of the query.
Performance considerations in distributed environments
To use Splunk DB Connect in a distributed search environment, including search head clusters, you must determine the planned use cases. For ad hoc, interactive usage of database connections by live users, install the app on search heads. For scheduled indexing from databases and output of data to databases, install the app on heavy forwarders.
When planning a large DB Connect deployment, the ideal configuration for your needs can depend on a number of factors, including:
- Total number of Forwarders in the deployment, and the hardware specifications of each.
- Total expected data volume to transfer.
- Number of database inputs per Forwarder.
- Dataset size, per input, per interval.
- Execution Frequency, the interval length between a database input's separate executions.
- Fetch size (Not all JDBC drivers use this parameter for returning result sets).
Overloading the system can lead to data loss, so performance measurement and tuning can be critical. Use the performance expectations as a reference to plan your deployment, and monitor expected data returns for loss conditions.
Performance expectations
This section provides measured throughput data achieved under certain operating conditions. Use the information here as a basis for estimating and optimizing the DB Connect throughput performance in your own production environment. As performance might vary based on user characteristics, application usage, server configurations, and other factors, Splunk can't guarantee specific performance results.
Splunk produced the performance data in the following tables with the following test bed and DB Connect configuration (increasing cores or RAM might improve scaling characteristics):
- Server: 8-core 2.60GHz CPU, 16GB RAM, 1Gb Ethernet NIC, 64bit Linux
- JVM config: MaxHeapSize = 4GB. (For more information about the JVM memory setting, see "Performance tuning advice".)
- Data Source: Oracle 11g
Inputs
- Number of inputs: 1600
- Data payload (per input execution): 250KB
- Duration: 45 minutes
- Interval: 1 minute
total data volume = data payload * duration / interval * number of inputs = 17.5 GB
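The total above follows from the listed test parameters. The following is a minimal sketch of that arithmetic. The function name is illustrative, and because the source does not say whether it uses decimal (1 GB = 1,000,000 KB) or binary (1 GiB = 1,048,576 KB) units, the sketch prints both.

```python
def total_payload_kb(payload_kb: float, duration_min: float,
                     interval_min: float, inputs: int) -> float:
    """data payload * duration / interval * number of inputs, in KB."""
    executions_per_input = duration_min / interval_min
    return payload_kb * executions_per_input * inputs


# Values taken from the test description above.
total_kb = total_payload_kb(payload_kb=250, duration_min=45,
                            interval_min=1, inputs=1600)

print(f"{total_kb:,.0f} KB")
print(f"{total_kb / 1000**2:.2f} GB (decimal units, an assumption)")
print(f"{total_kb / 1024**2:.2f} GiB (binary units, an assumption)")
```

Depending on the unit convention, the result is roughly 17 to 18 GB, in line with the approximate figure quoted above.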
Queries
Rows in data set | 100 | 1,000 | 10,000 | 100,000 | 1,000,000 |
---|---|---|---|---|---|
DB Connect 3 | 1.2 seconds | 1.3 seconds | 1.6 seconds | 4.1 seconds | 22.9 seconds |
DB Connect 2 | 1.4 seconds | 1.5 seconds | 2.4 seconds | 11.4 seconds | 103.5 seconds |
Lookups
Rows in data set | 100 | 10,000 | 100,000 |
---|---|---|---|
DB Connect 3 | 1.2 seconds | 2.8 seconds | 36.0 seconds |
DB Connect 2 | 0.2 seconds | 4.3 seconds | 70.0 seconds |
Outputs
Rows in data set | 100 | 1,000 | 10,000 | 100,000 | 1,000,000 |
---|---|---|---|---|---|
DB Connect 3 | 2.1 seconds | 1.9 seconds | 3.0 seconds | 9.1 seconds | 67.2 seconds |
DB Connect 2 | 1.0 seconds | 1.5 seconds | 10.0 seconds | 83.9 seconds | 644.0 seconds |
Performance nonfactors
During testing, varying the following factors had a negligible effect on performance:
- There was no discernable performance difference between running in batch mode (all events processed) and running in rising column mode (only the new events processed) with the same dataset.
- The number of defined database connections does not limit performance. The number of connections is different from the number of database inputs.
How to scale Splunk DB Connect to support medium and high workloads
Since version 3.10.0, DB Connect uses Splunk Modular Inputs, so input execution is triggered by calling a REST API and is then processed using Java concurrency features.
Bottleneck
HTTP Connection Pool: currently allows 1024 concurrent requests, and its queue has a size of 1024. So only 1024 inputs can run in parallel with 1024 waiting; any further requests are refused.
Java Thread Pool Executor: currently allows 32 concurrent executions, and its queue has a size of 128. So only 32 inputs can be executed in parallel with 128 waiting; any further executions are rejected.
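To make the two limits concrete, the following is a simplified sketch that splits a burst of simultaneous requests into running, waiting, and rejected counts. It is an illustrative model only, not DB Connect code: the names `PoolLimits` and `admit` are hypothetical, and the model ignores the fact that slots free up as executions finish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolLimits:
    concurrent: int  # how many can run at the same time
    queue: int       # how many can wait for a free slot


# Default limits as described above.
HTTP_POOL = PoolLimits(concurrent=1024, queue=1024)
THREAD_POOL = PoolLimits(concurrent=32, queue=128)


def admit(requests: int, limits: PoolLimits) -> dict:
    """Split a burst arriving at the same instant into running/waiting/rejected."""
    running = min(requests, limits.concurrent)
    waiting = min(requests - running, limits.queue)
    rejected = requests - running - waiting
    return {"running": running, "waiting": waiting, "rejected": rejected}


print(admit(200, THREAD_POOL))
print(admit(3000, HTTP_POOL))
```

Under this model, a burst of 200 executions against the default thread pool leaves 40 rejected, which is why the pool and queue sizes are the first settings to review for larger workloads.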
Configurations
HTTP Connection Pool
It is specified in $SPLUNK_HOME/etc/apps/splunk_app_db_connect/config/dbx_task_server.yml, as part of the server attributes.
Example:
server:
  minThreads: 128
  maxThreads: 1256
  maxQueuedRequests: 1256
Java Thread Pool Executor
It is specified in $SPLUNK_HOME/etc/apps/splunk_app_db_connect/config/dbx_task_server.yml, as root attributes (at the end of the file).
JDBC Connection Pool
It is specified in $SPLUNK_HOME/etc/apps/splunk_app_db_connect/local/db_connections.conf. The settings apply to all connections if they are defined under the [default] stanza, or to a single connection if they are defined in that connection's stanza.
Java Heap Memory Size
It is specified in Splunk DB Connect > Configuration > Settings > General > Task Server JVM Options.
Example: -Xms8g -Xmx8g
Note: the amount of memory to allocate to the Java heap depends on the amount of available memory. Allocate between 30 and 50 percent of it.
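The 30 to 50 percent guideline can be turned into a JVM options string with a few lines of code. The following is a minimal sketch; the helper name `jvm_heap_options` and the choice to round down to whole gigabytes are assumptions made for illustration, while `-Xms` and `-Xmx` are the standard JVM flags for initial and maximum heap size.

```python
def jvm_heap_options(available_gib: float, fraction: float) -> str:
    """Build a -Xms/-Xmx string from available memory and a heap fraction."""
    if not 0.30 <= fraction <= 0.50:
        raise ValueError("fraction should be between 0.30 and 0.50")
    # Round down to whole gigabytes, with a floor of 1 GiB (an assumption).
    heap_gib = max(1, int(available_gib * fraction))
    return f"-Xms{heap_gib}g -Xmx{heap_gib}g"


# A host with 16 GiB available at 50 percent (hypothetical host size).
print(jvm_heap_options(16, 0.50))
# A host with 64 GiB available at 30 percent (hypothetical host size).
print(jvm_heap_options(64, 0.30))
```

Setting the initial and maximum heap to the same value, as in the example above, avoids heap resizing while the task server is running.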
Adjusting the Configuration
Workload | HTTP Connection Pool | Java Thread Pool Executor | JDBC Connection Pool | Heap Memory Size |
---|---|---|---|---|
Low Workload (Default). From 0 to 500 inputs. | - | - | - | - |
Medium Workload. From 500 to 1500 inputs. | | | | 30 percent of the available memory |
High Workload. From 1500 to 5000 inputs. | | | | 50 percent of the available memory |
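The workload ranges in the table can be expressed as a small lookup, which is handy when scripting deployment checks. The following is a minimal sketch: the function and dictionary names are hypothetical, and because adjacent ranges share their endpoints (500 and 1500), assigning those boundary values to the lower tier is an assumption.

```python
from typing import Optional

# Heap guidance per tier, as a fraction of available memory.
# None means "keep the defaults".
HEAP_FRACTION = {"low": None, "medium": 0.30, "high": 0.50}


def workload_tier(num_inputs: int) -> str:
    """Map a number of inputs to the workload tier in the table."""
    if num_inputs < 0:
        raise ValueError("num_inputs must be non-negative")
    if num_inputs <= 500:    # boundary assigned to "low" (assumption)
        return "low"
    if num_inputs <= 1500:   # boundary assigned to "medium" (assumption)
        return "medium"
    return "high"


def heap_fraction_for(num_inputs: int) -> Optional[float]:
    """Return the recommended heap fraction, or None for the default setup."""
    return HEAP_FRACTION[workload_tier(num_inputs)]


for n in (200, 900, 2000):
    print(n, workload_tier(n), heap_fraction_for(n))
```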
Benchmarking & Scalability Metrics
The benchmarking focused on testing three DBX versions: 3.9.0, 3.10.0, and 3.17.2. The scalability tests ran DBX version 3.18.0 on three configurations (low, medium, and high) to assess performance across different numbers of inputs (500, 1500, and 5000) in a given environment. The goal of the benchmarking and scalability tests was to measure DBX's performance in terms of CPU and memory utilisation, event ingestion rates, and input handling capacity across different environments. By identifying the best-performing configuration, we aim to determine the optimal specifications for running DBX at scale, ensuring high efficiency and reliability under heavy loads. DBX was tested on an m5.4xlarge instance type with the following specifications:
- CPU: 16 vCPUs
- Memory: 64 GiB
- OS: Ubuntu 22.04
- Storage: 200 GiB
Benchmarking Metrics (for 3.17.2)
Rising Input
DBX was configured to collect data from a database containing 10 million events. The test measured how long DBX took to ingest all events and the resulting ingestion rate.
Event size [B] | Total number of events ingested | Ingestion rate [events/s] | Time to ingest all events [s] |
---|---|---|---|
75 | 10,000,000 | 48,544 | 206 |
512 | 10,000,000 | 39,370 | 254 |
2048 | 10,000,000 | 25,126 | 398 |
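The ingestion rate column is the total number of events divided by the ingestion time. The following is a minimal sketch that recomputes it from the table and also derives an approximate byte throughput. The throughput figure assumes that the event size is the full payload per event and ignores any protocol or indexing overhead, so treat it as a rough estimate.

```python
# (event size in bytes, total events, seconds), copied from the table above.
rising_results = [
    (75, 10_000_000, 206),
    (512, 10_000_000, 254),
    (2048, 10_000_000, 398),
]


def ingestion_rate(events: int, seconds: float) -> float:
    """Events ingested per second."""
    return events / seconds


for size_bytes, events, seconds in rising_results:
    rate = ingestion_rate(events, seconds)
    # Approximate payload throughput in decimal megabytes per second.
    mb_per_s = rate * size_bytes / 1_000_000
    print(f"{size_bytes:>5} B  {rate:>9,.0f} events/s  ~{mb_per_s:.1f} MB/s")
```

Although the event rate drops as events get larger, the estimated byte throughput rises, which is worth keeping in mind when comparing rows.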
Batch Input
In this scenario, DBX ingested 2 million events at 1-second intervals. The test checked how many events could be collected over a 10-minute window.
Event size [B] | Ingestion rate [events/s] | Time to ingest all events [s] | Total number of events ingested | Number of input executions |
---|---|---|---|---|
75 | 68071.31 | 617 | 42,000,000 | 21 |
512 | 50078.25 | 639 | 32,000,000 | 16 |
1024 | 40625.00 | 640 | 26,000,000 | 13 |
Multiple Inputs
The test involved creating multiple inputs (5, 50, 100) to collect events from a database. Each input ingested 10,000 rows every 5 seconds, and performance was measured as more inputs were added. The inputs ran in rising mode, and each input had to collect 1,000,000 events.
Number of inputs | Ingestion rate [events/s] | Time to ingest all events [s] | Total number of events ingested |
---|---|---|---|
5 | 20,243 | 247 | 5,000,000 |
50 | 92,937 | 269 | 25,000,000 |
100 | 112,108 | 446 | 50,000,000 |
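One way to read the table is to divide the aggregate rate by the number of inputs and compare it with the 5-input case. The following is a minimal sketch of that calculation. The "scaling efficiency" metric is an illustrative choice and not something the benchmark itself reports.

```python
# Number of inputs -> aggregate ingestion rate [events/s], from the table above.
multi_input_results = {5: 20_243, 50: 92_937, 100: 112_108}


def per_input_rate(total_rate: float, inputs: int) -> float:
    """Average ingestion rate contributed by each input."""
    return total_rate / inputs


baseline = per_input_rate(multi_input_results[5], 5)

for inputs, total_rate in multi_input_results.items():
    rate = per_input_rate(total_rate, inputs)
    # Fraction of the 5-input per-input rate that is retained.
    efficiency = rate / baseline
    print(f"{inputs:>3} inputs: {rate:,.1f} events/s per input "
          f"({efficiency:.0%} of the 5-input baseline)")
```

The aggregate rate keeps growing as inputs are added, but each input contributes less, which is consistent with the inputs competing for a fixed number of executor threads and CPU cores.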
Conclusions
- Smaller event sizes (75B) had higher ingestion rates but required more resources.
- Larger event sizes (2048B) showed slower ingestion rates but better CPU/memory performance.
- Performance degraded from version 3.9.0 to 3.17.2, especially for smaller event sizes.
- CPU usage showed an increase across versions, with 3.17.2 showing the least CPU consumption but the longest ingestion window.
Scalability Metrics (for 3.18.0)
Performance was tested with 500, 1500, and 5000 inputs on different configurations (low, medium, high). The inputs ran in batch mode, collecting events from two MySQL databases.
500 Inputs
Configuration Spec | Time to create all inputs [s] | Time to remove inputs [s] | Total number of events collected | Ingestion rate during 10-minute ingestion window [events/s] |
---|---|---|---|---|
low | 1190.51 | 1136.63 | 316,210,000 | 105,187 |
medium | 1063.30 | 988.52 | 322,143,000 | 124,997 |
high | 1201.19 | 1129.26 | 317,332,000 | 105,432 |
1500 Inputs
Configuration Spec | Time to create all inputs [s] | Time to remove inputs [s] | Total number of events collected | Ingestion rate during 10-minute ingestion window [events/s] |
---|---|---|---|---|
low | x | x | 2,039,730,000 | x |
medium | 20239.31 | 18627.15 | 3,720,000,000 | 120,058 |
high | 27287.15 | 27215.46 | 5,650,000,000 | 106,881 |
Conclusions
- CPU & Memory Usage: Resource usage (both CPU and memory) was similar across all configurations, with medium spec providing the most efficient performance and reaching 100% resource utilisation for a higher number of simultaneously running inputs.
- Medium spec outperformed the others, particularly in input creation/removal times and event ingestion rates, handling ~125k events per second, a gain of ~10-15% compared to the other configurations.
- High spec had slightly higher memory consumption and CPU usage but showed diminishing returns in ingestion rates.
Note that adding CPUs improves performance, because more cores can execute more inputs concurrently.
Challenges
- For 5000 inputs, the tests failed due to high memory and CPU usage reaching 100%, causing Splunk and VM crashes.
- The low spec couldn't handle higher input loads beyond 982 inputs.
More performance help
If you are still experiencing performance issues, or want to receive feedback tailored to your setup, you have the following options:
- Post a request to the community on Splunk Answers.
- Contact Splunk Support.