Sizing guidelines for Edge Processors

Scale your Edge Processors to provide the necessary support for your incoming data.

The system requirements for your Edge Processor deployment vary depending on the volume of data being processed and the complexity of the processing operations involved. To support higher volumes of incoming data, or more complex data transformations such as field extraction and normalization, you might need to install your Edge Processors on host machines with higher specifications or scale out your Edge Processors by adding more instances.

To determine how to scale your Edge Processor deployment, start by estimating the amount of incoming data that your Edge Processors will ingest and process per day. Then, refer to the table in the Sizing guidelines based on daily data ingestion volume section on this page.

Be aware that there is a soft limit on the maximum number of Edge Processor instances that can be supported. See Tested and recommended service limits (soft limits) in the Splunk Cloud Platform Service Details for more information.

Context for sizing guidelines

The guidelines shown in the Sizing guidelines based on daily data ingestion volume section on this page were determined based on internal performance testing using the following configurations:

  • The Edge Processor was installed on an Amazon EC2 C5 instance. For information about the specifications, see “Amazon EC2 C5 Instances” on the Amazon Web Services website.

  • The Splunk-to-Splunk (S2S) protocol was used for data transmission.

  • The data was sent from a forwarder to the Edge Processor, and then from the Edge Processor to a Splunk platform deployment.

  • The incoming data consisted of Palo Alto Networks logs with an average event size of 400 to 600 bytes. These logs were consistent with the pan:firewall source type and looked like the following:
    CODE
    unused,2025/06/10 18:23:49,001234567890,TRAFFIC,URL,1.0,2025/06/10 18:23:49,112.52.238.98,68.25.111.114,,,block-social-media-apps-strict,carol.networking@example.org,,,dns-long-request-lookup,vsys1,DMZ-Frontline-Application-Network,Untrust-Network-Segment,ethernet1/1,ethernet1/2,,12345,0,54823,443,12345,54321,0xABCDEF,tcp,deny,www.example-longdomain.org,,,informational,1001,,US,Global,,,,,,,,,"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)",text/html,,,,,,,Company1,Company2,Company3,Company4,vsys1,firewall-1,,,,,,,proxy-tag-info,,high,
    
    unused,2025/06/10 18:23:49,001234567890,TRAFFIC,URL,1.0,2025/06/10 18:23:49,108.181.41.167,82.57.184.190,,,allow-internal-web-access-to-dmz,bob.superadmin@example.com,,,ssl-encrypted-browsing,vsys1,Internal-Zone-Long-Label,Untrust-Network-Segment,ethernet1/1,ethernet1/2,,12345,0,54823,443,12345,54321,0xABCDEF,tcp,drop,www.example-longdomain.org,,,informational,1001,,US,Global,,,,,,,,,"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)",text/html,,,,,,,Company1,Company2,Company3,Company4,vsys1,firewall-1,,,,,,,proxy-tag-info,,high,
    
    unused,2025/06/10 18:23:49,001234567890,TRAFFIC,URL,1.0,2025/06/10 18:23:49,55.212.51.43,38.66.82.16,,,restrict-vpn-access-on-weekends,bob.superadmin@example.com,,,http-cleartext,vsys1,Trust-Extended-Zone-Name,External-SaaS-Zone,ethernet1/1,ethernet1/2,,12345,0,54823,443,12345,54321,0xABCDEF,tcp,allow,www.example-longdomain.org,,,informational,1001,,US,Global,,,,,,,,,"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)",text/html,,,,,,,Company1,Company2,Company3,Company4,vsys1,firewall-1,,,,,,,proxy-tag-info,,high,
  • The data was processed by a pipeline that extracted fields, filtered data, and normalized data. This pipeline configuration represents mid-to-high complexity data transformations. For example:
    SPL2
    // Strip human-readable timestamps from _raw.
    function remove_readable_timestamp($source) {
        return | from $source
        | eval readable_time_regex = "\\w{3}\\s\\d{2}\\s\\d+:\\d+:\\d+"
        | eval _raw = replace(_raw, readable_time_regex, "")
        | fields - readable_time_regex
    }
    
    function route_test($source) {
        return | from $source
        | eval route_win = 1
    }
    
    // Drop events sent to internal (RFC 1918) addresses whose destination zone matches the filter pattern.
    function external_ip($source) {
        return | from $source
        | eval cond_ip = if(cidrmatch("10.0.0.0/8", dest_ip) OR cidrmatch("172.16.0.0/12", dest_ip) OR cidrmatch("192.168.0.0/16", dest_ip), "internal", "external")
        | eval cond_vw = if(match(dest_zone, "String$"), 1, 0)
        | where NOT(cond_ip = "internal" AND cond_vw = 1)
    }
    
    $pipeline = | from $source
    // Extract only the fields needed for filtering before running the full extraction.
    | rex field=_raw /^.*?,.*?,.*?,.*?,.*?,.*?,.*?,.*?,(?P<dest_ip>.*?),.*?,.*?,(?P<rule>.*?),.*?,.*?,.*?,.*?,(?P<src_zone>.*?),(?P<dest_zone>.*?),/
    | where not(dest_ip = "192.168.0.1")
    | external_ip
    // Full field extraction for the pan:firewall log format.
    | rex field=_raw /^(?P<future_use1>.*?),(?P<receive_time>.*?),(?P<serial_number>.*?),(?P<type>.*?),(?P<log_subtype>.*?),(?P<version>.*?),(?P<generated_time>.*?),(?P<src_ip>.*?),(?P<dest_ip>.*?),(?P<src_translated_ip>.*?),(?P<dest_translated_ip>.*?),(?P<rule>.*?),(?P<src_user>.*?),(?P<dest_user>.*?),(?P<app>.*?),(?P<vsys>.*?),(?P<src_zone>.*?),(?P<dest_zone>.*?),(?P<src_interface>.*?),(?P<dest_interface>.*?),(?P<log_forwarding_profile>.*?),(?P<future_use3>.*?),(?P<session_id>.*?),(?P<repeat_count>.*?),(?P<src_port>.*?),(?P<dest_port>.*?),(?P<src_translated_port>.*?),(?P<dest_translated_port>.*?),(?P<session_flags>.*?),(?P<transport>.*?),(?P<action>.*?),(?P<misc>.*?),(?P<threat>.*?),(?P<raw_category>.*?),(?P<severity>.*?),(?P<direction>.*?),(?P<sequence_number>.*?),(?P<action_flags>.*?),(?P<src_location>.*?),(?P<dest_location>.*?),(?P<future_use4>.*?),(?P<content_type>.*?),(?P<pcap_id>.*?),(?P<file_hash>.*?),(?P<cloud_address>.*?),(?P<url_index>.*?),(?P<user_agent>.*?),(?P<file_type>.*?),(?P<xff>.*?),(?P<referrer>.*?),(?P<sender>.*?),(?P<subject>.*?),(?P<recipient>.*?),(?P<report_id>.*?),(?P<devicegroup_level1>.*?),(?P<devicegroup_level2>.*?),(?P<devicegroup_level3>.*?),(?P<devicegroup_level4>.*?),(?P<vsys_name>.*?),(?P<dvc_name>.*?),(?P<future_use5>.*?),(?P<src_vm>.*?),(?P<dest_vm>.*?),(?P<http_method>.*?),(?P<tunnel_id>.*?),(?P<tunnel_monitor_tag>.*?),(?P<tunnel_session_id>.*?),(?P<tunnel_start_time>.*?),(?P<tunnel_type>.*?),(?P<threat_category>.*?),(?P<content_version>.*?),(?P<future_use6>.*?)/
    | eval index = "main"
    | eval ep = "1"
    | eval src_user = if(src_user = "", "unknown", src_user)
    | eval _time = strptime(generated_time + " PST", "%Y/%m/%d %H:%M:%S %Z")
    // Replace _raw with a friendly string to prevent the Palo Alto Networks TA from rewriting the time incorrectly.
    | eval _raw = src_user + " on " + src_ip + " via policy " + rule + " accessing " + misc
    | fields - future_use3, future_use1, receive_time
    | into $destination;

When using the configurations described above to test the performance of a single-instance Edge Processor on host machines with different specifications, the following results were observed:

| Amazon EC2 instance type | Maximum CPU utilization | Maximum memory usage | Throughput of incoming data |
|--------------------------|-------------------------|----------------------|-----------------------------|
| t2.nano                  | 74%                     | 200 MB               | 0.70 TB per day             |
| c5.large                 | 97%                     | 280 MB               | 1.17 TB per day             |
| c5.xlarge                | 94%                     | 350 MB               | 2.34 TB per day             |
| c5.2xlarge               | 70%                     | 740 MB               | 3.75 TB per day             |
| c5.4xlarge               | 47%                     | 1300 MB              | 5.16 TB per day             |

When testing the performance of a multi-instance Edge Processor, the following results were observed:

| Amazon EC2 instance type | Number of Edge Processor instances | Throughput of incoming data |
|--------------------------|------------------------------------|-----------------------------|
| c5.2xlarge               | 5                                  | 16.37 TB per day            |

In all of these cases, the Edge Processor ingested and processed the data without accumulating events in the persistent queue or applying backpressure.

Sizing guidelines based on daily data ingestion volume

This table describes the following system requirements for processing a given amount of incoming data per day:

  • The number of Edge Processors that must be installed
  • The number of instances that each Edge Processor must be scaled out to
  • The amount of CPU and memory that the host machine of each Edge Processor must have
Note: These sizing guidelines are based on internal performance testing. Results can vary for different environments and use cases. Be sure to evaluate the resulting throughput of your Edge Processors and scale your deployment as necessary for your particular needs.

| Data ingestion volume    | Edge Processors | Instances                           | CPU     | Memory |
|--------------------------|-----------------|-------------------------------------|---------|--------|
| Less than 1 TB per day   | 1               | 1                                   | 2 vCPUs | 4 GiB  |
| 1 to 2 TB per day        | 1               | 1                                   | 4 vCPUs | 8 GiB  |
| 2 to 3.5 TB per day      | 1               | 1                                   | 8 vCPUs | 16 GiB |
| More than 3.5 TB per day | 1               | 1 instance for every 3.5 TB of data | 8 vCPUs | 16 GiB |

Sizing guidelines for containerized Edge Processors

Size your Edge Processor deployment when running containers on Kubernetes using the Splunk-provided Helm chart.

The underlying Edge Processor binary is identical to the non-containerized version, so the throughput characteristics described on this page apply equally. The primary difference is how resources are expressed and managed: instead of selecting a host machine of a given size, you configure CPU and memory limits on each pod, and Kubernetes handles scheduling those pods onto appropriately sized nodes.

Kubernetes prerequisites

The following infrastructure requirements apply to the containerized Edge Processor deployment and have no equivalent in the non-containerized model:

  • Storage: A CSI-compatible storage provisioner is required for persistent volumes. On AWS, this means the Amazon EBS CSI driver must be installed and configured in your cluster. The Helm chart can optionally create a gp3-backed StorageClass for you when you set cloudProvider.aws.createStorageClass: true (see the sketch after this list).

  • Cluster capacity: Ensure your node pool has enough total allocatable CPU and memory to satisfy not only your initial deployment.replicaCount × container.resources.limits, but also any additional replicas that the Horizontal Pod Autoscaler (HPA) might add during scale-up.

  • Pod Disruption Budget: The chart deploys a PodDisruptionBudget with minAvailable: 2 by default, meaning at least two pods must remain running during voluntary disruptions such as node maintenance. Plan your replica count and node pool size accordingly.
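
For reference, the prerequisite settings called out above can be expressed in a values.yaml fragment like the following. This is a minimal sketch: the cloudProvider.aws.createStorageClass, deployment.replicaCount, and container.resources keys are the ones referenced in this section, while the surrounding structure should be verified against your chart version's documented schema.

YAML
# Minimal sketch of prerequisite-related settings; verify paths against the chart schema.
cloudProvider:
  aws:
    createStorageClass: true   # create a gp3-backed StorageClass (requires the Amazon EBS CSI driver)

deployment:
  replicaCount: 3              # keep above the PodDisruptionBudget's default minAvailable: 2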

Pod resource limits

In a containerized deployment model, the sizing unit is the pod. You can control how much CPU, memory, and storage each Edge Processor pod is allowed to consume by setting container.resources.requests and container.resources.limits in your chart's values.yaml file. The following table maps pod CPU and memory limits to the equivalent EC2 instance type from the performance analysis above, along with the expected throughput for a single Edge Processor instance. These equivalences hold because the C5 instance benchmarks are bottlenecked by vCPU count and available memory; therefore, a pod with matching limits will exhibit comparable throughput when scheduled on a node with sufficient capacity.

| Pod CPU limit | Pod memory limit | Equivalent EC2 instance | Throughput per pod |
|---------------|------------------|-------------------------|--------------------|
| 2 CPUs        | 4 GiB            | c5.large                | ~1.17 TB per day   |
| 4 CPUs        | 8 GiB            | c5.xlarge               | ~2.34 TB per day   |
| 8 CPUs        | 16 GiB           | c5.2xlarge              | ~3.75 TB per day   |
| 16 CPUs       | 32 GiB           | c5.4xlarge              | ~5.16 TB per day   |

The default limits provided in the Helm chart are 2 CPUs and 4 GiB of memory, which is equivalent to a c5.large EC2 instance. Adjust these values based on your expected daily ingestion volume. To scale beyond a single pod's throughput, increase your chart's desired replica count. Consistent with the multi-instance results shown previously, total cluster throughput scales approximately linearly with replica count.
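
As an illustration of those overrides, the following values.yaml sketch sizes each pod to the c5.xlarge row of the table above and scales out to two replicas. The container.resources and deployment.replicaCount keys come from this section; the exact nesting is an assumption to confirm against your chart version.

YAML
container:
  resources:
    requests:
      cpu: "4"          # request what the pod is expected to use so scheduling is predictable
      memory: 8Gi
    limits:
      cpu: "4"          # matches the c5.xlarge row: ~2.34 TB per day per pod
      memory: 8Gi

deployment:
  replicaCount: 2       # throughput scales roughly linearly: ~4.7 TB per day total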

Node sizing

Each Kubernetes node must have enough allocatable CPU, memory, and storage to satisfy the combined resource requests of all pods scheduled onto it. As a general rule, provision nodes whose allocatable capacity is at least as large as the pod limits you have configured, with sufficient headroom for the Kubernetes system components and any other workloads sharing the node.

Persistent volume sizing

Each Edge Processor pod makes use of two persistent volumes:

  • Event queue (ep-event-queue): All events are written through a per-destination queue file before being exported. The queue drains continuously under normal conditions, but unsent events begin to accumulate if the ingestion rate exceeds the export rate or if the downstream destination is unavailable. If the queue reaches full capacity, the Edge Processor instance applies backpressure by no longer accepting new events, and upstream sources must handle the blockage on their own (for example, by buffering locally or retrying). Defaults to 15 GiB.

  • Instance identity (ep-instance-identity): Stores the pod's registered identity and configuration. Defaults to 1 MiB and should not require any manual adjustment.

Sizing the event queue

The event queue is designed to act as a buffer. Size it based on your total peak ingestion rate, the number of replicas sharing that traffic, and how long you want the queue to absorb incoming data before the pod must apply backpressure. To calculate the size in GiB, use the following formula:

CODE
queue size (GiB) = 38.805 × peak ingestion rate (TB/day) × desired tolerance (hours) ÷ number of replicas

The constant 38.805 converts TB per day to GiB per hour (1 TB/day = 1,000,000,000,000 bytes/day ÷ 24 hours/day ÷ 1,073,741,824 bytes/GiB ≈ 38.805 GiB/hour). The queue size is inversely proportional to the number of replicas because each pod handles a commensurate share of the traffic, so the per-pod queue size decreases as the number of replicas increases. For example, a 3-replica deployment with a peak ingestion rate of 1.17 TB/day and a desired 1-hour tolerance requires approximately 15.1 GiB per pod, so the default 15 GiB is a suitable starting point.
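
Expressed as a values.yaml setting, that worked example looks like the following minimal sketch. The persistence.eventQueue.size key is the one described in the note below; the Gi suffix is standard Kubernetes quantity notation.

YAML
persistence:
  eventQueue:
    # 38.805 × 1.17 TB/day × 1 hour ÷ 3 replicas ≈ 15.1 GiB per pod
    size: 15Gi   # set at install time; resizing requires reprovisioning the volume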

Note: The event queue size is set at install time using persistence.eventQueue.size and cannot be resized without reprovisioning the persistent volume.

As ingestion volume grows over time, prefer increasing the replica count over enlarging the per-pod queue. Scaling out distributes the additional load across more pods and keeps individual queue sizes manageable, whereas growing the queue only delays the point at which backpressure is applied without increasing actual throughput capacity. Reserve queue size increases for situations where you need a longer tolerance window, such as surviving a longer downstream outage, not as a substitute for horizontal scaling.

Horizontal scaling and scale-down behavior

The Helm chart includes a Horizontal Pod Autoscaler (HPA) that scales the number of Edge Processor pods based on CPU and memory utilization. However, scale-down is disabled by default: terminating a pod before its event queue has fully drained orphans any queued events on the persistent volume. Until queue-depth-aware scaling is available, automating the removal of pods is not considered safe.
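
If you tune the autoscaler's scale-up behavior, the relevant settings live alongside the chart's HPA configuration. The key names in the following sketch are hypothetical placeholders (this section does not document the chart's autoscaling schema), shown only to illustrate the safe posture described above.

YAML
# Hypothetical key names for illustration only; check your chart's values schema.
autoscaling:
  enabled: true
  minReplicas: 3                         # stay above the PodDisruptionBudget's minAvailable: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70     # add pods under sustained CPU pressure
  # Keep scale-down disabled until queue-depth-aware scaling is available,
  # so pods are never terminated with undrained event queues.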