Disk I/O Requirements

A critical factor in a machine's ability to support the performance requirements of a Controller in a production environment is the machine's disk I/O performance.

There are two requirements related to I/O latency:

  • Disk I/O must perform well enough that the maximum write latency for the Controller’s primary storage does not exceed 3 milliseconds while the Controller is under sustained load. Splunk AppDynamics cannot provide support for Controller problems resulting from excessive disk latency.
  • Self-monitoring must be set up for the Controller. Self-monitoring consists of a SIM agent that measures the latency of the data partitions on the Controller host. The configuration must include a dashboard and health rule alerts that trigger when the maximum latency exceeds 3 ms. For details on Controller self-monitoring, contact customer support.
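
The 3 ms rule above can be illustrated with a minimal Python sketch. It is not the SIM agent or a health rule definition; the function names and the list of latency samples are hypothetical, and only the 3 ms limit comes from the requirements above.

```python
# Limit taken from the requirements above; everything else is illustrative.
MAX_WRITE_LATENCY_MS = 3.0


def latency_breaches(samples_ms, limit_ms=MAX_WRITE_LATENCY_MS):
    """Return the latency samples (in ms) that exceed the limit."""
    return [s for s in samples_ms if s > limit_ms]


def is_within_limit(samples_ms, limit_ms=MAX_WRITE_LATENCY_MS):
    """True when the maximum observed latency does not exceed the limit."""
    return max(samples_ms, default=0.0) <= limit_ms


if __name__ == "__main__":
    # Hypothetical samples, in milliseconds, from one data partition.
    samples = [0.8, 1.4, 2.9, 3.6]
    print("within limit:", is_within_limit(samples))
    print("breaches:", latency_breaches(samples))
```

The check is on the maximum rather than the average because the requirement is stated as a maximum: a single slow write is enough to count as a breach.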

Disk I/O Operations

The Controller performs two types of I/O operations important to Controller performance:

  • The MySQL intent log is very sensitive to latency, and MySQL performs writes using varying block sizes.
  • MySQL’s InnoDB storage engine uses random, asynchronous, 16 KB reads and writes to move database pages between storage and cache. In a properly sized Controller, most reads are satisfied from one of the software caches.

For best performance, the stripe size of the RAID configuration should match the write size. The two write sizes are 16 KB (for the database) and 128 KB (for the logs). Use the smallest stripe size supported, but no smaller than 16 KB. If you use a hardware-based RAID controller, be sure that it supports these stripe sizes. To determine the stripe size, multiply the number of data disks by the strip size (also called the segment or chunk size), which is the portion of data stored on a single disk.
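
The stripe size arithmetic described above can be sketched in Python. This is an illustration only; the function names and the example disk counts are assumptions, while the 16 KB minimum and the multiplication rule come from the paragraph above.

```python
MIN_STRIPE_KB = 16  # smallest stripe size allowed by the guidance above


def stripe_size_kb(data_disks, strip_kb):
    """Stripe size = number of data disks x strip (segment/chunk) size."""
    if data_disks < 1 or strip_kb <= 0:
        raise ValueError("data_disks must be >= 1 and strip_kb must be > 0")
    return data_disks * strip_kb


def meets_minimum(stripe_kb, minimum_kb=MIN_STRIPE_KB):
    """True when the stripe size is not smaller than the 16 KB minimum."""
    return stripe_kb >= minimum_kb


if __name__ == "__main__":
    # Hypothetical array: 4 data disks, each holding a 16 KB strip.
    stripe = stripe_size_kb(data_disks=4, strip_kb=16)
    print(stripe, "KB stripe, meets minimum:", meets_minimum(stripe))
```

Only data disks count toward the stripe size. Disks that hold mirror copies or parity do not add to it.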

SAN-based Storage Limitations

While onboard disks typically satisfy I/O requirements, SAN-based storage could be hampered by poor I/O latency performance. In addition, refrain from using an NFS-mounted filesystem. NFS adds latency and throughput constraints that can negatively affect Controller performance and even lead to data corruption. Similarly, you should avoid iSCSI or other SAN technologies that are subject to quality of service issues from the underlying network.

If you choose to deploy one of these latency-challenged storage technologies on a system that is expected to process 1M metrics/min or greater, we recommend a mirrored NVMe device configured as a write-back cache for all storage accesses. Such a device hides some of the longer latencies that have been seen in these environments.

In all cases, be sure to thoroughly test the deployment with real-world traffic load before putting a Controller into a live environment.
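
As a rough first check before a full load test, the following Python sketch times synchronous 16 KB writes to a directory on the candidate storage. It is a sketch under stated assumptions, not a Splunk AppDynamics tool: the function name, the number of writes, and the use of a temporary file are all hypothetical choices, and an idle-system probe like this does not replace testing under real-world traffic.

```python
import os
import tempfile
import time


def probe_write_latency_ms(directory, writes=50, block_size=16 * 1024):
    """Time `writes` write+fsync cycles and return each latency in ms.

    Assumption: a 16 KB block mirrors the database page size described
    earlier; the scratch file is removed when the probe finishes.
    """
    block = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            start = time.perf_counter()
            os.write(fd, block)
            os.fsync(fd)  # force the data to storage before stopping the timer
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


if __name__ == "__main__":
    # Point this at a directory on the storage under test.
    samples = probe_write_latency_ms(tempfile.gettempdir())
    print(f"max write latency: {max(samples):.2f} ms over {len(samples)} writes")
```

A maximum above 3 ms on an idle system is a strong hint that the storage will not meet the requirement under sustained load. A maximum below 3 ms proves little, because latency usually rises once the Controller is processing real traffic.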