Calculate sizes of dynamic fields

This search determines which fields in your events consume the most disk space, without any prior knowledge of field names and number of events.

Scenario

index=_internal earliest=-15m latest=now
| fieldsummary 
| rex field=values max_match=0 "value\":\"(?<values>[^\"]*)\","
| mvexpand values 
| eval bytes=len(values)
| rex field=field "^(?!date|punct|host|hostip|index|linecount|source|sourcetype|timeendpos|timestartpos|splunk_server)(?<FieldName>.*)"
| stats count sum(bytes) as SumOfBytesInField values(values) as Values max(bytes) as MaxFieldLengthInBytes by FieldName
| rename count as NumOfValuesPerField
| eventstats sum(NumOfValuesPerField) as TotalEvents sum(SumOfBytesInField) as TotalBytes
| eval PercentOfTotalEvents=round(NumOfValuesPerField/TotalEvents*100,2)
| eval PercentOfTotalBytes=round(SumOfBytesInField/TotalBytes*100,2)
| eval ConsumedMB=SumOfBytesInField/1024/1024
| eval TotalMB=TotalBytes/1024/1024
| table FieldName NumOfValuesPerField SumOfBytesInField ConsumedMB PercentOfTotalBytes PercentOfTotalEvents
| addcoltotals labelfield=FieldName label=Totals
| sort - PercentOfTotalEvents

The results appear on the Statistics tab and look something like this:

Note: The totals do not add up to exactly 100% because this search is adding up the values after the values have been rounded. To avoid this, you can move the rounding to the end of the search string.

Walkthrough

Let's walk through each part of the search.

  1. The example begins with a search to retrieve all events in index=_internal within the last 15 minutes.
    index=_internal earliest=-15m latest=now
    

    Note: You can replace this with any search string and time range.

  2. Next, add the the fieldsummary command to create a summary of all the fields in the previously retrieved events.
    | fieldsummary
    

    The results appear on the Statistics tab and look something like this:

  3. The values of each field are extracted with a regex into a multivalue field called values, and then expanded. The length of each value is calculated in bytes.
    | rex field=values max_match=0 "value\":\"(?<values>[^\"]*)\","
    | mvexpand values 
    | eval bytes=len(values)
    
  4. The values of the field are extracted with another regex, with some exceptions.
    | rex field=field "^(?!date|punct|host|hostip|index|linecount|source|sourcetype|timeendpos|timestartpos|splunk_server)(?<FieldName>.*)"
    
  5. The stats command is used to perform multiple calculations using stats functions, including the count and the sum of the bytes (SumOfBytesInField). The values function is used to returns the list of all distinct values of the values field as a multivalue entry (Values). The max function calculates the maximum field length in bytes (MaxFieldLengthInBytes). The results are organized by field name.
    | stats count sum(bytes) as SumOfBytesInField values(values) as Values max(bytes) as MaxFieldLengthInBytes by FieldName
    | rename count as NumOfValuesPerField
    
  6. The eventstats command is used to calculate several sums, the number of values in each field (TotalEvents) and the sum of the bytes in each field (Total Bytes).
    | eventstats sum(NumOfValuesPerField) as TotalEvents sum(SumOfBytesInField) as TotalBytes
    
  7. Several eval commands are run to calculate the percentage of total events, the percentage of total bytes, the megabytes consumed, and the total megabytes.
    | eval PercentOfTotalEvents=round(NumOfValuesPerField/TotalEvents*100,2)
    | eval PercentOfTotalBytes=round(SumOfBytesInField/TotalBytes*100,2)
    | eval ConsumedMB=SumOfBytesInField/1024/1024
    | eval TotalMB=TotalBytes/1024/1024
    
  8. The table command is used to display on a specific set of fields. The addfoltotals command is used to calculate the total for each column. The sort command is used sort the list in descending order by the PercentageOfTotalEvents field.
    | table FieldName NumberOfValuesPerField SumOfBytesInField ConsumedMB PercentageOfTotalBytes PercentageOfTotalEvents
    | addcoltotals labelfield=FieldName label=Totals
    | sort - PercentageOfTotalEvents
    

The results appear on the Statistics tab and look something like this: