Configure High Availability on top of Heavy Forwarders
Since version 4.0.0, Splunk DB Connect implements high availability features. This is possible using an etcd cluster, which replicates configuration changes and coordinates tasks across instances. This feature is experimental and still under development. However, it is already functional, and we encourage you to use it to provide feedback and help refine future versions.
Requirements for High Availability
- etcd up and running. Review the hardware recommendations guide.
- etcd is configured to work as a cluster. We recommend a cluster of at least 3 nodes.
- Splunk DB Connect up and running.
- Splunk DB Connect is configured to work as a cluster. We recommend at least 3 instances running as a cluster.
- Each Splunk DB Connect instance has the required JDBC Add-ons installed. JDBC Add-ons are not replicated.
Install and configure the etcd cluster
etcd is a lightweight distributed key-value store that replicates configuration changes reliably. The installation process is simple and does not require previous experience with etcd. Install etcd on the same instances as Splunk DB Connect to avoid increasing infrastructure costs.
Download and install etcd
Review the official documentation related to etcd installation steps in Install etcd. The procedures on this page describe how to install etcd on Linux instances (AMD x64). Note that these steps are subject to change.
1. Download etcd:
$ ETCD_VERSION=v3.4.34
$ DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download
$ curl -L ${DOWNLOAD_URL}/${ETCD_VERSION}/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -o /opt/etcd-${ETCD_VERSION}-linux-amd64.tar.gz
2. Unpack etcd:
$ mkdir /opt/etcd-${ETCD_VERSION}
$ tar xzvf /opt/etcd-${ETCD_VERSION}-linux-amd64.tar.gz -C /opt/etcd-${ETCD_VERSION} --strip-components=1
3. Verify the etcd version:
$ /opt/etcd-${ETCD_VERSION}/etcd --version
$ /opt/etcd-${ETCD_VERSION}/etcdctl version
Configure etcd to work as a cluster
Review the official documentation related to etcd cluster steps in etcd clustering. The procedures on this page describe how to configure etcd as a cluster on Linux instances (AMD x64), but be aware that these steps are subject to change. Create a systemd service so that etcd can restart after an unexpected event. During the configuration, provide an IP address for each instance that joins the cluster. You must create the service on each instance.
1. Update IP address to hostname mapping:
$ sudo nano /etc/hosts
<node-1-ip> etcd-node-1
<node-2-ip> etcd-node-2
<node-3-ip> etcd-node-3
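For illustration only, a completed mapping might look like the following. The 10.0.0.x addresses are hypothetical; substitute the private IP addresses of your own instances.
10.0.0.11 etcd-node-1
10.0.0.12 etcd-node-2
10.0.0.13 etcd-node-3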
2. Create etcd service:
$ sudo nano /etc/systemd/system/etcd.service
[Unit]
Description=etcd cluster for Splunk DB Connect
After=network.target
[Service]
User=root
Type=notify
ExecStart=/opt/etcd-v3.4.34/etcd \
--name etcd-node-<1..3> \
--initial-advertise-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
--listen-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
--listen-client-urls http://<node-ip>:<client-request-port | 2379>,http://<loopback-ip | 127.0.0.1>:<client-request-port | 2379> \
--advertise-client-urls http://<node-ip>:<client-request-port | 2379> \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcd-node-1=http://<node-1-ip>:<peer-communication-port | 2380>,etcd-node-2=http://<node-2-ip>:<peer-communication-port | 2380>,etcd-node-3=http://<node-3-ip>:<peer-communication-port | 2380> \
--initial-cluster-state new \
--data-dir /var/lib/etcd
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
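For reference, here is how the ExecStart line might look on etcd-node-1, using the hypothetical 10.0.0.x addresses from the /etc/hosts example above and the default ports 2380 (peer) and 2379 (client). Adjust names, addresses, and paths to match your environment.
ExecStart=/opt/etcd-v3.4.34/etcd \
--name etcd-node-1 \
--initial-advertise-peer-urls http://10.0.0.11:2380 \
--listen-peer-urls http://10.0.0.11:2380 \
--listen-client-urls http://10.0.0.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://10.0.0.11:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcd-node-1=http://10.0.0.11:2380,etcd-node-2=http://10.0.0.12:2380,etcd-node-3=http://10.0.0.13:2380 \
--initial-cluster-state new \
--data-dir /var/lib/etcd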
3. Start the etcd service:
$ sudo systemctl daemon-reload
$ sudo systemctl enable etcd
$ sudo systemctl start etcd
$ systemctl status etcd.service
4. Verify cluster status:
$ export ETCDCTL_API=3
$ /opt/etcd-v3.4.34/etcdctl --endpoints=http://<node-1-ip>:<client-request-port | 2379>,http://<node-2-ip>:<client-request-port | 2379>,http://<node-3-ip>:<client-request-port | 2379> endpoint health
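Healthy members report output similar to the following (the exact timings vary):
http://<node-1-ip>:2379 is healthy: successfully committed proposal: took = 1.8ms
http://<node-2-ip>:2379 is healthy: successfully committed proposal: took = 2.1ms
http://<node-3-ip>:2379 is healthy: successfully committed proposal: took = 2.4ms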
Configure authentication in etcd
Review the official documentation related to etcd authentication in Authentication Guides. Set the following ENDPOINTS environment variable to avoid verbosity and allow reuse:
$ ENDPOINTS=http://<node-1-ip>:<client-request-port | 2379>,http://<node-2-ip>:<client-request-port | 2379>,http://<node-3-ip>:<client-request-port | 2379>
1. Create user/role root for administrative purposes:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} role add root
Provide a password when adding a new user.
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} user add root
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} user grant-role root root
2. Enable authentication:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} auth enable
You can validate the authentication using the health API:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user root:<root-password> endpoint health
Create custom role with read/write access
The user/role root created in the previous procedure is used for administrative purposes. For security reasons create a new role and user with more specific access.
1. Create role dbx:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> role add dbx
2. Add privileges to the role. Give read/write access to all keys with prefix dbx. Splunk DB Connect uses dbx as prefix:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> role grant-permission dbx --prefix=true readwrite dbx
3. Create user dbx and grant role:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> user add dbx
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user=root:<root-password> user grant-role dbx dbx
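As an optional sanity check, you can verify that the dbx user can read and write under the dbx prefix. The key dbx/ha-test below is arbitrary and used only for this test:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user dbx:<dbx-password> put dbx/ha-test ok
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user dbx:<dbx-password> get dbx/ha-test
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user dbx:<dbx-password> del dbx/ha-test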
Configure TLS in etcd
Review the official documentation related to TLS in Transport security model. The procedures on this page describe how to configure TLS in etcd, but be aware that these steps are subject to change.
1. Obtain/generate certificates
You need to obtain or generate TLS certificates for each etcd member. You must provide the private key, certificates, and CA.
Add host entries (IP addresses and/or FQDNs) to the certificate's Subject Alternative Name (SAN) field.
The certificate must list every address the node uses to communicate, including its public and private IPs and 127.0.0.1 if used for local health checks.
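To confirm that a certificate includes the expected SAN entries, you can inspect it with openssl; the path below is an example:
$ openssl x509 -in /path-to-certs/etcd-node-1-cert.pem -noout -text | grep -A1 "Subject Alternative Name"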
2. Enable TLS for client (DB Connect) to etcd communication
Note that the protocol for advertise-client-urls and listen-client-urls is now HTTPS instead of HTTP. You must also add the cert-file and key-file attributes, with paths to the certificate and the private key, respectively.
/opt/etcd-v3.4.34/etcd \
--name etcd-node-<1..3> \
--cert-file=/path-to-certs/etcd-node-<1..3>-cert.pem \
--key-file=/path-to-certs/etcd-node-<1..3>-key.pem \
--initial-advertise-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
--listen-peer-urls http://<node-ip>:<peer-communication-port | 2380> \
--listen-client-urls https://<node-ip>:<client-request-port | 2379>,https://<loopback-ip | 127.0.0.1>:<client-request-port | 2379> \
--advertise-client-urls https://<node-ip>:<client-request-port | 2379> \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcd-node-1=http://<node-1-ip>:<peer-communication-port | 2380>,etcd-node-2=http://<node-2-ip>:<peer-communication-port | 2380>,etcd-node-3=http://<node-3-ip>:<peer-communication-port | 2380> \
--initial-cluster-state new \
--data-dir /var/lib/etcd
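After restarting etcd with these options, you can verify the TLS client endpoint with etcdctl by passing the CA certificate through the --cacert flag; the paths shown are examples:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=https://<node-1-ip>:<client-request-port | 2379> --cacert=/path-to-certs/<certificate-authority>.crt --user root:<root-password> endpoint health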
3. Enable TLS for peer (etcd) to peer (etcd) communication
Note that the protocol for initial-advertise-peer-urls, listen-peer-urls, and initial-cluster is now HTTPS instead of HTTP. The peer-cert-file and peer-key-file attributes, with paths to the certificate and the private key, are added, along with peer-client-cert-auth and peer-trusted-ca-file.
/opt/etcd-v3.4.34/etcd \
--name etcd-node-<1..3> \
--cert-file=/path-to-certs/etcd-node-<1..3>-cert.pem \
--key-file=/path-to-certs/etcd-node-<1..3>-key.pem \
--peer-cert-file=/path-to-certs/etcd-node-<1..3>-cert.pem \
--peer-key-file=/path-to-certs/etcd-node-<1..3>-key.pem \
--peer-client-cert-auth=true \
--peer-trusted-ca-file=/path-to-certs/<certificate-authority>.crt \
--initial-advertise-peer-urls https://<node-ip>:<peer-communication-port | 2380> \
--listen-peer-urls https://<node-ip>:<peer-communication-port | 2380> \
--listen-client-urls https://<node-ip>:<client-request-port | 2379>,https://<loopback-ip | 127.0.0.1>:<client-request-port | 2379> \
--advertise-client-urls https://<node-ip>:<client-request-port | 2379> \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcd-node-1=https://<node-1-ip>:<peer-communication-port | 2380>,etcd-node-2=https://<node-2-ip>:<peer-communication-port | 2380>,etcd-node-3=https://<node-3-ip>:<peer-communication-port | 2380> \
--initial-cluster-state new \
--data-dir /var/lib/etcd
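If you manage etcd through the systemd unit created earlier, apply the updated flags by editing the unit file on each node and restarting the service:
$ sudo systemctl daemon-reload
$ sudo systemctl restart etcd
$ systemctl status etcd.service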
Maintenance
To keep your etcd cluster running at its optimal capacity, you might need to apply some specific configurations described in the maintenance guide.
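For illustration, two common maintenance operations are history compaction and defragmentation. The following is a minimal sketch of running them manually with etcdctl, extracting the current revision with grep; adjust endpoints and credentials to your environment:
$ rev=$(/opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user root:<root-password> endpoint status --write-out=json | grep -o '"revision":[0-9]*' | grep -o '[0-9]\+' | head -1)
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user root:<root-password> compact $rev
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user root:<root-password> defrag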
Configure DB Connect to work as a cluster
To make Splunk DB Connect work as a cluster, add the etcd cluster member information in Configurations > Settings > High Availability Cluster.
- Go to Configurations > Settings > High Availability Cluster.
- If authentication is enabled for the etcd user, specify user and password.
- Click Add and enter host and port information for each etcd cluster member.
- Click Save.
When you save, your configuration is validated and status information is shown for each etcd member. Information similar to "This is an Active-Active cluster, at this moment this node is configured as Member" appears.
The High Availability cluster supported by Splunk DB Connect works in Active-Active mode, which means the workload (data ingestion) is distributed across all the members.
*.conf files are not replicated automatically.
Enable TLS
To allow DB Connect to communicate with etcd using TLS, you must switch on the TLS Enabled option. If you use a self-signed certificate, you need to add the CA or the certificate itself to the DB Connect KeyStore.
Reconciliation options
Splunk DB Connect implements an automatic reconciliation mechanism to ensure that all members always contain the same configuration data. However, a manual reconciliation option is also available if needed.
Import configurations
This reconciliation option allows you to synchronize one specific instance with the other cluster members; any local configurations that were not replicated may be lost.
Export configurations
This reconciliation option allows you to replicate local configuration to other cluster members; any configuration made on other members may be lost.
Use Cases for Splunk DB Connect High Availability Cluster
Since version 4.1.0, Splunk DB Connect provides High Availability with workload distribution and load balancing.
High Availability benefits
High Availability benefits you in the following cases:
- You configured redundancy across multiple servers to cover Splunk DB Connect downtimes.
- The server where you run Splunk DB Connect has a high downtime rate.
- You want to minimize the risk of having delays or losing your data.
- You split Splunk DB Connect across multiple servers to handle a high volume of data ingestion and for better performance.
Upgrade scenarios for High Availability
If you have Splunk DB Connect installed on multiple instances and you want them to become a cluster to provide High Availability, those instances are either redundant (same configuration) or they have different configurations.
- Install and configure the etcd cluster.
- Configure Splunk DB Connect to work as a cluster.
- Review the requirements section and make sure you meet them.
- Choose the instance with the configurations that you want to replicate to other instances. Then go to Configurations > Settings > High Availability Cluster and click Export configurations.
- Go to the other instances. Then go to Configurations > Settings > High Availability Cluster and click Import configurations.
- Done
Scale the High Availability cluster
For more resiliency or to scale, set up a new server with a Heavy Forwarder and Splunk DB Connect.
- Add the new member's information to the cluster. Run the following command on one of the nodes that already belongs to the cluster:
$ sudo /opt/etcd-${ETCD_VERSION}/etcdctl member add etcd-node-<new> --peer-urls=http://<new-node-ip>:<peer-communication-port | 2380>
- Follow Install and configure the etcd cluster to add the new etcd member to the cluster. Make sure you replace --initial-cluster-state new with --initial-cluster-state existing and include the new member in --initial-cluster; this is only necessary for the new member.
- Follow Configure DB Connect to work as a cluster to configure the new Splunk DB Connect instance.
- After startup, the new instance contains the same configuration data as the other members. If any configuration is missing, go to Configurations > Settings > High Availability Cluster and select Import configurations.
Replicating configurations
Replicated configurations
- Identities
- Connections
- Inputs
- Checkpoints (for Inputs)
- Certificates (stored in the KeyStore)
Not replicated configurations
- Outputs and Lookups (as these features are not supported on Heavy Forwarders).
- HTTP Event Collector configuration
- Logging configuration
- General settings
- JDBC drivers
Auto reconciliation
Splunk DB Connect implements an automatic reconciliation mechanism to ensure that all members always contain the same configuration data.
Versioning allows members to detect if there are changes that were not replicated.
- $SPLUNK_HOME/etc/apps/splunk_app_db_connect/versions/identities.version
- $SPLUNK_HOME/etc/apps/splunk_app_db_connect/versions/connections.version
- $SPLUNK_HOME/etc/apps/splunk_app_db_connect/versions/inputs.version
- $ETCD_HOME/etcdctl get "dbx/sync/version/identities"
- $ETCD_HOME/etcdctl get "dbx/sync/version/connections"
- $ETCD_HOME/etcdctl get "dbx/sync/version/inputs"
- component=version_publisher
- component=version_listener
The system compares local against the remote versions and requests synchronization if they do not match.
- component=data_sync_request_publisher
- component=connection_sync_request_publisher
- component=identity_sync_request_publisher
- component=input_sync_request_publisher
- component=data_sync_request_listener
- component=identity_sync_request_listener
- component=connection_sync_request_listener
- component=input_sync_request_listener
The system receives synchronization updates, which are processed by the member that sent the request.
- component=data_sync_response_listener
- component=identity_sync_response_listener
- component=connection_sync_response_listener
- component=input_sync_response_listener
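As a manual spot-check of this mechanism (not something DB Connect requires you to run), you can compare a local version file with its counterpart stored in etcd, for example for identities, using the ENDPOINTS variable and dbx user defined earlier:
$ cat $SPLUNK_HOME/etc/apps/splunk_app_db_connect/versions/identities.version
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user dbx:<dbx-password> get "dbx/sync/version/identities"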
Troubleshooting High Availability
Data is not being ingested after configuring the Splunk Cloud HEC
HTTP Event Collector configurations are not replicated. By default, the local HEC is used, but if you need to configure an external HEC, you need to do it for each Splunk DB Connect instance.
Data configurations are not the same on each Splunk DB Connect instance
Make sure the etcd cluster is up and running. You can review the status of the members in Configurations > Settings > High Availability Cluster. Data reconciliation is limited for now. If you see configurations that were not replicated, use the manual reconciliation option: go to Configurations > Settings > High Availability Cluster and select Import configurations.
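In addition to the UI status, a quick check from the command line can confirm that all etcd members are present and healthy; for example, using the ENDPOINTS variable defined earlier:
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user root:<root-password> member list
$ /opt/etcd-v3.4.34/etcdctl --endpoints=${ENDPOINTS} --user root:<root-password> endpoint health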