Calculate sizes of dynamic fields
This search determines which fields in your events consume the most disk space, without any prior knowledge of field names and number of events.
Scenario
index=_internal earliest=-15m latest=now
| fieldsummary
| rex field=values max_match=0 "value\":\"(?<values>[^\"]*)\","
| mvexpand values
| eval bytes=len(values)
| rex field=field "^(?!date|punct|host|hostip|index|linecount|source|sourcetype|timeendpos|timestartpos|splunk_server)(?<FieldName>.*)"
| stats count sum(bytes) as SumOfBytesInField values(values) as Values max(bytes) as MaxFieldLengthInBytes by FieldName
| rename count as NumOfValuesPerField
| eventstats sum(NumOfValuesPerField) as TotalEvents sum(SumOfBytesInField) as TotalBytes
| eval PercentOfTotalEvents=round(NumOfValuesPerField/TotalEvents*100,2)
| eval PercentOfTotalBytes=round(SumOfBytesInField/TotalBytes*100,2)
| eval ConsumedMB=SumOfBytesInField/1024/1024
| eval TotalMB=TotalBytes/1024/1024
| table FieldName NumOfValuesPerField SumOfBytesInField ConsumedMB PercentOfTotalBytes PercentOfTotalEvents
| addcoltotals labelfield=FieldName label=Totals
| sort - PercentOfTotalEvents
The results appear on the Statistics tab and look something like this:
| FieldName | NumValuesPerField | SumOfBytesInField | ConsumedMB | PercentOfTotalBytes | PercentOfTotalEvents |
|---|---|---|---|---|---|
| Totals | 1802 | 45700 | 0.0436 | 99.87 | 100.29 |
| cumulative_hits | 100 | 587 | 0.0006 | 1.28 | 5.55 |
| eps | 100 | 1862 | 0.0018 | 4.07 | 5.55 |
| kb | 100 | 1159 | 0.0011 | 2.54 | 5.55 |
| kbps | 100 | 1881 | 0.0018 | 4.12 | 5.55 |
| req_time | 100 | 3000 | 0.0029 | 6.56 | 5.55 |
| uri | 100 | 10559 | 0.0101 | 23.11 | 5.55 |
| uri_query | 100 | 3532 | 0.0034 | 7.73 | 5.55 |
| message | 96 | 11633 | 0.0111 | 25.46 | 5.33 |
| avg_age | 76 | 280 | 0.0012 | 2.80 | 4.22 |
| ev | 62 | 140 | 0.0001 | 0.31 | 3.44 |
| average_kbps | 59 | 1071 | 0.0010 | 2.34 | 3.27 |
Walkthrough
Let's walk through each part of the search.
- The example begins with a search to retrieve all events in
index=_internalwithin the last 15 minutes.index=_internal earliest=-15m latest=nowNote: You can replace this with any search string and time range.
- Next, add the the fieldsummary command to create a summary of all the fields in the previously retrieved events.
| fieldsummaryThe results appear on the Statistics tab and look something like this:
field count distinct _count is _exact max mean min numeric _count stdev values abandoned _channels 29 1 1 0.0 0.00 0.0 29 0.00 [{"value":"0","count":29}] active 29 1 1 0.0 0.00 0.0 29 0.00 [{"value":"0","count":29}] active _hist _searches 31 2 1 1.0 0.13 0.0 31 0.34 [{"value":"0","count":27}, {"value":"1","count":4}]
average _kbps 87 59 1 0.3 0.21 0.0 87 0.15 [{"value":"0","count":29}, {"value":"0.31239045073685034","count":1}, {"value":"0.31240549380412547","count":1}, {"value":"0.3124557194522294","count":1}, {"value":"0.3124707607545469","count":1}]
- The values of each field are extracted with a regex into a multivalue field called values, and then expanded. The length of each value is calculated in bytes.
| rex field=values max_match=0 "value\":\"(?<values>[^\"]*)\"," | mvexpand values | eval bytes=len(values) - The values of the field are extracted with another regex, with some exceptions.
| rex field=field "^(?!date|punct|host|hostip|index|linecount|source|sourcetype|timeendpos|timestartpos|splunk_server)(?<FieldName>.*)" - The
statscommand is used to perform multiple calculations using stats functions, including the count and the sum of the bytes (SumOfBytesInField). The values function is used to returns the list of all distinct values of thevaluesfield as a multivalue entry (Values). The max function calculates the maximum field length in bytes (MaxFieldLengthInBytes). The results are organized by field name.| stats count sum(bytes) as SumOfBytesInField values(values) as Values max(bytes) as MaxFieldLengthInBytes by FieldName | rename count as NumOfValuesPerField - The
eventstatscommand is used to calculate several sums, the number of values in each field (TotalEvents) and the sum of the bytes in each field (Total Bytes).| eventstats sum(NumOfValuesPerField) as TotalEvents sum(SumOfBytesInField) as TotalBytes - Several
evalcommands are run to calculate the percentage of total events, the percentage of total bytes, the megabytes consumed, and the total megabytes.| eval PercentOfTotalEvents=round(NumOfValuesPerField/TotalEvents*100,2) | eval PercentOfTotalBytes=round(SumOfBytesInField/TotalBytes*100,2) | eval ConsumedMB=SumOfBytesInField/1024/1024 | eval TotalMB=TotalBytes/1024/1024 - The
tablecommand is used to display on a specific set of fields. Theaddfoltotalscommand is used to calculate the total for each column. Thesortcommand is used sort the list in descending order by the PercentageOfTotalEvents field.| table FieldName NumberOfValuesPerField SumOfBytesInField ConsumedMB PercentageOfTotalBytes PercentageOfTotalEvents | addcoltotals labelfield=FieldName label=Totals | sort - PercentageOfTotalEvents
The results appear on the Statistics tab and look something like this:
| FieldName | NumValuesPerField | SumOfBytesInField | ConsumedMB | PercentOfTotalBytes | PercentOfTotalEvents |
|---|---|---|---|---|---|
| Totals | 1802 | 45700 | 0.0436 | 99.87 | 100.29 |
| cumulative_hits | 100 | 587 | 0.0006 | 1.28 | 5.55 |
| eps | 100 | 1862 | 0.0018 | 4.07 | 5.55 |
| kb | 100 | 1159 | 0.0011 | 2.54 | 5.55 |
| kbps | 100 | 1881 | 0.0018 | 4.12 | 5.55 |
| req_time | 100 | 3000 | 0.0029 | 6.56 | 5.55 |
| uri | 100 | 10559 | 0.0101 | 23.11 | 5.55 |
| uri_query | 100 | 3532 | 0.0034 | 7.73 | 5.55 |
| message | 96 | 11633 | 0.0111 | 25.46 | 5.33 |
| avg_age | 76 | 280 | 0.0012 | 2.80 | 4.22 |
| ev | 62 | 140 | 0.0001 | 0.31 | 3.44 |
| average_kbps | 59 | 1071 | 0.0010 | 2.34 | 3.27 |