Architecture and performance considerations

When adding Splunk DB Connect to your deployment, take into account architecture and performance considerations. You can install and run Splunk DB Connect on Splunk Enterprise deployments ranging from a single host (indexer and Splunk Web both running on the same system) to a large distributed deployment (multiple search heads, search head clusters, indexers, load-balanced forwarders, and so on). Performance considerations and expectations vary based on your deployment and capacity requirements.

Database performance considerations

If Splunk DB Connect retrieves a large amount of data from your database, it might affect your database performance, especially for the initial run. Subsequent runs of the same query might have less impact, as the database might cache results and only retrieve new data since the previous run of the query.

Performance considerations in distributed environments

To use Splunk DB Connect in a distributed search environment, including search head clusters, you must determine the planned use cases. For ad hoc, interactive usage of database connections by live users, install the app on search heads. For scheduled indexing from databases and output of data to databases, install the app on heavy forwarders.

When planning a large DB Connect deployment, the ideal configuration for your needs can depend on a number of factors, including:

Total number of Forwarders in the deployment, and the hardware specifications of each.
Total expected data volume to transfer.
Number of database inputs per Forwarder.
Dataset size, per input, per interval.
Execution Frequency, the interval length between a database input's separate executions.
Fetch size (Not all JDBC drivers use this parameter for returning result sets).

Overloading the system can lead to data loss, so performance measurement and tuning can be critical. Use performance expections as the reference to plan your deployment, and monitor expected data returns for loss conditions.

Performance expectations

This section provides measured throughput data achieved under certain operating conditions. Use the information here as a basis for estimating and optimizing the DB Connect throughput performance in your own production environment. As performance might vary based on user characteristics, application usage, server configurations, and other factors, Splunk can't guarantee specific performance results.

Splunk produced the performance data in the following table with the following test bed and DB Connect configuration (Increasing cores or RAM might improve scaling characteristics):

Server: 8-core 2.60GHz CPU with hyper-threading enabled, providing 16 virtual CPUs (vCPUs), 16GB RAM, 1Gb Ethernet NIC, 64bit Linux
JVM config: MaxHeapSize = 4GB. (For more information about the JVM memory setting, see "Performance tuning advice".)
Data Source: Oracle 11g

Inputs

Number of inputs: 1600
Data payload (per input execution) : 250KB
Duration = 45 minutes
Interval: 1 minute

total data volume = data payload * duration / interval * number of inputs = 17.5 GB

Note: Data payload per input execution is the same for different input modes (rising column and batch)

Queries


Rows in data set	100	1,000	10,000	100,000	1,000,000
DB Connect 3	1.2 seconds	1.3 seconds	1.6 seconds	4.1 seconds	22.9 seconds
DB Connect 2	1.4 seconds	1.5 seconds	2.4 seconds	11.4 seconds	103.5 seconds

Lookups


Rows in data set	100	10,000	100,000
DB Connect 3	1.2 seconds	2.8 seconds	36.0 seconds
DB Connect 2	0.2 seconds	4.3 seconds	70.0 seconds

Outputs


Rows in data set	100	1,000	10,000	100,000	1,000,000
DB Connect 3	2.1 seconds	1.9 seconds	3.0 seconds	9.1 seconds	67.2 seconds
DB Connect 2	1.0 seconds	1.5 seconds	10.0 seconds	83.9 seconds	644.0 seconds

General performance tuning considerations

While it's impossible to provide prescriptive advice for maximizing performance in every situation, the following observations and tips can help you tune and improve performance in your unique distributed deployment:

Only select columns if necessary. A table can contain many types of columns. When ingesting data from a database into DB Connect, you likely don't need all of them. Therefore, instead of using a SELECT * FROM ... clause to retrieve all the columns, select only what you need by using a SELECT columnNeeded1, columnNeeded2, ... FROM ... clause. More columns means more memory claimed by the task server; omit those unnecessary columns to make smarter use of your available memory. See SQL tips and tricks for more details.
Avoid reaching the 5MB/10MB limit. Large column sizes can cause DB Connect to potentially run out of memory and behave erratically, so DB Connect has a column size limit of 10MB for data columns that hold two-byte data types and 5MB for one-byte data types. Splunk truncates data for columns with data that exceeds these limits. If possible, trim the amount of data stored per column so that you avoid the DB Connect hard caps.
Adjust the fetch size based on your scenario. The Fetch Size input parameter specifies the number of rows returned at a time from a database, which defaults to 300 rows. A higher fetch size means Splunk receives more records per database request, so you can use fewer database requests to retrieve the same total number of records. This increases resource utilization on the database and in DB Connect, but can lead to performance improvements. Lowering the fetch size parameter can help prevent the Task Server from hitting its memory limit. If you receive out of memory errors when you increase the fetch size, you might need to increase the memory heap size from its default of 1/4 of system RAM.
Reduce the total number of database inputs. It can increase the amount of data that each input is taking in. This helps ensure that CPU cores have to handle fewer processes within a given window of time. Small datasets can be slower than large because of environment initialization.
Reduce the concurrency of scheduled database tasks. Shifting start times for scheduled tasks reduces choke points during which inputs and outputs have to share resources. For more information, see "Set parameters" in Create and manage database inputs.
Adjust batch_upload_size field. The batch_upload_size field defines the number of events sent to splunkd through HEC per request, which defaults to 1,000 records. A higher batch upload size means Splunk sends more records per HTTP post, so you can use fewer server transactions to index the same total number of records. This increases resource utilization on the Forwarder, but can lead to performance improvements. You can increase the batch_upload_size field under $SPLUNK_HOME/etc/apps/splunk_app_db_connect/local/db_inputs.conf to have better performance.
Specify sufficient hardware. In general, Splunk best practice is to use the same hardware specifications for DB Connect as requiredfor Splunk Enterprise. Increased hardware might be necessary for increased indexing loads.
Configure Java for performance. Current Java engines automatically reserves 25% of the machine's RAM when opening. If your JVM Options setting is as -Xmx1024m (which is the default value from DB Connect version 2.0 to 2.2). You can remove it and use the default JVM setting. For more information about changing JVM options, see "JVM Options" in Configure DB Connect Settings.
Configure Splunk for performance. Increase Splunkd's index queue size and number of Parallel Ingestion Pipelines to avoid concurrency limits.
Configure DB Connect for performance Set SchedulerThreadPoolSize to match the number of processor cores.

Performance nonfactors

During testing, varying the following factors had a negligible effect on performance:

There was no discernable performance difference between running in batch mode (all events processed) and running in rising column mode (only the new events processed) with the same dataset.
The number of defined database connections does not limit performance. The number of connections is different from the number of database inputs.

How to scale Splunk DB Connect to support medium and high workloads

Since version 3.10.0 we use Splunk Modular Inputs, so the input execution is triggered by calling a REST API, then it is processed using Java concurrent features.

Depending on the instance resources (CPU and memory) where DB Connect is running, users may observe performance degradation when adding more inputs.This signals the need to consider scaling. For our simplest architectures: standalone Splunk or Heavy Forwarder, we offer the following ways of scaling:

Tuning DB Connect configuration - adjusting the configuration to handle higher workloads as described in section below.
Vertical scaling - add more resources to the instance running DB Connect or migrating to an instance with more resources.
Horizontal scaling - adding additional instances with DB Connect and splitting inputs between available instances.

Please Note: Increasing the number of CPUs will have positive impact on the number of threads which can concurrently execute inputs, but this also has some limits so observing resource usage is also crucial to find out if we are getting desired DB Connect performance vs instance resources usage

Results from our lab have shown that, for instance, with 16vCPUs, 64GiB RAM it was safe to process approximately ~1000 inputs. The best performance was achieved with a medium workload configuration, providing up to 15% improvement compared to other configurations. We achieved an ingestion rate of 120k EPS(Events per second), which translates to 8.58MB/sec. The input execution rate was approximately 33 inputs per second with each input ingesting an average of 370k events. Please Note: That above results were received on fully dedicated DB Connect environment:

separate instance for Splunk with DB Connect ( no other apps/connectors installed)
MySQL databases were also running on separate instances

Each environment should be treated individually and performance may be impacted by many factors. Once performance issues are noticed, this is the signal to check the environment configuration and how resource usage looks and it is also time to think about scaling.

Please find detailed results here [[Benchmarking & Scalability Metrics]]

Bottleneck

HTTP Connection Pool: currently it allows 1024 concurrent requests, and the queue has size equal to 1024. So only 1024 inputs can run in parallel and 1024 waiting, if more are coming they are refused.

Java Thread Pool Executor: currently it allows 32 concurrent executions, and the queue has size equals to 128. So only 32 inputs can be executed in parallel and 128 waiting, if more are coming they are rejected.

Configurations

HTTP Connection Pool

It is specified in $SPLUNK_HOME/etc/apps/splunk_app_db_connect/config/dbx_task_server.yml, as part of server attributes.

Example:

 server:
   minThreads: 128
   maxThreads: 1256
   maxQueuedRequests: 1256

Java Thread Pool Executor

It is specified in $SPLUNK_HOME/etc/apps/splunk_app_db_connect/config/dbx_task_server.yml, as root attributes (at the end of the file).

JDBC Connection Pool

It is specified in $SPLUNK_HOME/etc/apps/splunk_app_db_connect/local/db_connections.conf. It can be defined for all connections if they are present under [default] stanza or individually for each connection stanza.

Java Heap Memory Size

It is specified in Splunk DB Connect > Configuration > Settings > General > Task Server JVM Options.

Example: -Xms8g -Xmx8g

Note: the amount of memory that will be allocated to the Java Heap will depend on the amount of available memory you have. Allocate between 30 to 50 percent.

Adjusting the Configuration

HTTP Connection Pool Java Thread Pool Executor JDBC Connection Pool Heap Memory Size

	HTTP Connection Pool	Java Thread Pool Executor	JDBC Connection Pool	Heap Memory Size
Low Workload (Default up to version 3.18.0). From `0` to `500` inputs.	`minThreads: 8 maxThreads: 1024 maxQueuedRequests: 1024`	`scheduledJobExecutorMinIdlePoolSize: 8 scheduledJobExecutorMaxPoolSize: 32 scheduledJobExecutorJobQueueSize: 128`	`maxConnLifetimeMillis = 1800000 minIdle = 1 maxTotalConn = 8`	-
Medium Workload (Default since version 3.18.1). From `500` to `1500` inputs.	`minThreads: 128 maxThreads: 1512 maxQueuedRequests: 1512`	`scheduledJobExecutorMinIdlePoolSize: 128 scheduledJobExecutorMaxPoolSize: 128 scheduledJobExecutorJobQueueSize: 1512`	`maxConnLifetimeMillis = 1800000 minIdle = 8 maxTotalConn = 32`	`30` percent of the available memory
High Workload. From `1500` to `5000` inputs.	`minThreads: 256 maxThreads: 2048 maxQueuedRequests: 2048`	`scheduledJobExecutorMinIdlePoolSize: 256 scheduledJobExecutorMaxPoolSize: 256 scheduledJobExecutorJobQueueSize: 2048`	`maxConnLifetimeMillis = 3600000 minIdle = 16 maxTotalConn = 64`	`50` percent of the available memory

Low Workload (Default up to version 3.18.0). From 0 to 500 inputs.

 minThreads: 8
 maxThreads: 1024
 maxQueuedRequests: 1024

 scheduledJobExecutorMinIdlePoolSize: 8 
 scheduledJobExecutorMaxPoolSize: 32
 scheduledJobExecutorJobQueueSize: 128

 maxConnLifetimeMillis = 1800000
 minIdle = 1
 maxTotalConn = 8

Medium Workload (Default since version 3.18.1). From 500 to 1500 inputs.

 minThreads: 128
 maxThreads: 1512
 maxQueuedRequests: 1512

 scheduledJobExecutorMinIdlePoolSize: 128 
 scheduledJobExecutorMaxPoolSize: 128
 scheduledJobExecutorJobQueueSize: 1512

 maxConnLifetimeMillis = 1800000
 minIdle = 8
 maxTotalConn = 32

30 percent of the available memory

High Workload. From 1500 to 5000 inputs.

 minThreads: 256
 maxThreads: 2048
 maxQueuedRequests: 2048

 scheduledJobExecutorMinIdlePoolSize: 256 
 scheduledJobExecutorMaxPoolSize: 256
 scheduledJobExecutorJobQueueSize: 2048

 maxConnLifetimeMillis = 3600000
 minIdle = 16
 maxTotalConn = 64

50 percent of the available memory

Note: the above configuration will allow more inputs to be processed simultaneously, but will also increase resource consumption. You should consider scaling by adding more HF instances and avoiding large volumes of inputs to be executed in parallel.

Benchmarking & Scalability Metrics

The benchmarking focused on testing three DBX versions: 3.9.0, 3.10.0, and 3.17.2, while the scalability tests involved running DBX version 3.18.0 on three configurations (low, medium, and high) to assess their performance across different number of inputs (500, 1500, and 5000) running on a given environment. The goal of benchmarking and scalability tests was to measure DBX's performance in terms of CPU and memory utilisation, event ingestion rates, and input handling capacity across different environments. By identifying the best-performing configuration, we aim to determine the optimal specifications for running DBX at scale, ensuring high efficiency and reliability under heavy loads. DBX has been tested on m5.4xlarge instance type with following specifications:

CPU: 16 vCPUs)
Memory: 64 GiB
OS: Ubuntu 22.04
Storage: 200 GiB

Benchmarking Metrics (for 3.17.2)

Rising Input

DBX was configured to collect data from a database containing 10 million events. The test checked how fast DBX can ingest all events and what ingestion rate is.


Event size [B]	Total number of events ingested	Ingestion rate	Time to ingest all events [s]
75	10,000,000	48,544	206
512	10,000,000	39,370	254
2048	10,000,000	25,126	398

Batch Input

In this scenario, DBX ingested 2 million events at 1 second intervals. The test checked how many events could be collected over a 10 minute window.


Event size [B]	Ingestion rate	Time to ingest all events [s]	Total number of events ingested	Number of input executions
75	68071.31	617	42,000,000	21
512	50078.25	639	32,000,000	16
1024	40625.00	640	26,000,000	13

Multiple Inputs

The test involved creating multiple inputs (5, 50, 100) to collect events from a database. Each input ingested 10,000 rows every 5 seconds, measuring performance as more inputs were added. Inputs were working in raising more each input had to collect 1000000 of events.


Number of inputs	Ingestion rate	Time to ingest all events [s]	Total number of events ingested
5	20,243	247	5,000,000
50	92,937	269	25,000,000
100	112,108	446	50,000,000

=Conclusions=

Smaller event sizes (75B) had higher ingestion rates but required more resources.
Larger event sizes (2048B) showed slower ingestion rates but better CPU/memory performance.
Degradation in performance from version 3.9.0 to 3.17.2, especially for smaller event sizes.
CPU usage showed an increase across versions, with 3.17.2 showing the least CPU consumption but longest ingestion window.

Scalability Metrics (for 3.18.0)

Performance was tested with 500, 1500, and 5000 inputs on different configurations (low, medium, high). Inputs were working in batch mode collecting events from two mySQL databases.

500 Inputs


Configuration Spec	Time to create all inputs [s]	Time to remove inputs [s]	Total number of events collected	Ingestion rate during 10 minutes ingestion window
low	1190.51	1136.63	316,210,000	105,187
medium	1063.30	988.52	322,143,000	124,997
high	1201.19	1129.26	317,332,000	105,432

1500 Inputs


Configuration Spec	Time to create all inputs [s]	Time to remove inputs [s]	Total number of events collected	Ingestion rate during 10 minutes ingestion window
low	x	x	2,039,730,000	x
medium	20239.31	18627.15	3,720,000,000	120,058
high	27287.15	27215.46	5,650,000,000	106,881

Conclusions

CPU & Memory Usage: Resource usage (both CPU and memory) was similar across all configurations, with medium spec providing the most efficient performance and reaching 100% resource utilisation for a higher number of simultaneously running inputs.
Medium spec outperformed others, particularly in input creation/removal times and event ingestion rates, handling ~125k events per second which gives ~10-15% gain compared to other configurations
High spec had slightly higher memory consumption and CPU usage but showed diminishing returns in ingestion rates.

Please note that having more CPUs will have beneficial effects on the performance resulting in more cores able to concurrently execute more inputs

Challenges

For 5000 inputs, the tests failed due to high memory and CPU usage reaching 100%, causing Splunk and VM crashes.
The low spec couldn't handle higher input loads beyond 982 inputs.

More performance help

If you are still experiencing performance issues, or want to receive feedback tailored to your setup, you have the following options:

Post a request to the community on Splunk Answers.
Contact Splunk Support.

Documentation