Control data to ingest using the Collector
Remove data before ingestion with the Splunk Distribution of OpenTelemetry Collector.
Depending on its configuration, the Splunk Distribution of OpenTelemetry Collector can forward a wide range of telemetry, such as metrics, traces, or logs, to the Splunk Observability Cloud ingest endpoints. For certain scenarios, some of this data can be redundant, unnecessary, or sensitive, causing technical complications, increased cost, or legal issues.
To address these situations, the Collector provides processors that modify or delete unwanted elements of the data it receives before Splunk Observability Cloud ingests them. For example, you can use the attributes processor to edit or remove unwanted attributes.
Scenario: Remove dimensions using the attributes processor
Moira, a performance engineer, notices high cardinality in ingested metrics, which heavily impacts data charges. Moira is considering dropping certain dimensions to cut costs.
Moira checks which dimensions are being used and realizes that for the metric cpu.utilization, the dimensions hostname and source_host are irrelevant. They decide that those dimensions don’t need to be ingested at all.
To prevent both dimensions from being ingested, Moira first adds the attributes processor to the Collector’s configuration, set up to delete the unnecessary dimensions:
extensions:
  ...
processors:
  attributes/delete:
    actions:
      - key: hostname
        action: delete
      - key: source_host
        action: delete
service:
  ...
...
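Next, Moira adds the attributes/delete processor to the processors list of the metrics pipeline, under service.pipelines in the Collector’s configuration. Because the dimensions belong to a metric, the processor must be in the metrics pipeline to have any effect:
...
service:
  pipelines:
    metrics:
      receivers: ...
      processors: [..., attributes/delete, ...]
      ...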
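An attributes processor defined with only a list of actions applies those actions to everything that passes through the pipeline it is attached to, so hostname and source_host would be removed from every metric, not only cpu.utilization. If Moira wanted to limit the change to that single metric, a configuration along the following lines might work. This is a minimal sketch, assuming the include filter with match_type and metric_names described in the upstream OpenTelemetry Collector Contrib attributes processor documentation. The processor name attributes/delete_cpu is a hypothetical label chosen for illustration.

```yaml
processors:
  # Hypothetical processor name; any attributes/<name> label can be used.
  attributes/delete_cpu:
    # Assumption: only data points of the listed metric are processed;
    # all other metrics pass through unchanged.
    include:
      match_type: strict
      metric_names:
        - cpu.utilization
    actions:
      - key: hostname
        action: delete
      - key: source_host
        action: delete
```

As with any processor, this one takes effect only after it is listed in the processors of the relevant pipeline under service.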
Scenario: Delete, redact, or hash tags from spans in the Splunk Distribution of OpenTelemetry Collector
Moira, a performance engineer, is looking at trace data and realizes that manual instrumentation is emitting sensitive data from the checkoutService by mistake. While Moira is updating the instrumentation to prevent this leak, they need to hide the values of all span tags with the potential to contain sensitive customer information. Using the attributes processor, they can delete, redact, or hash sensitive information.
The following is an example of a processor that Moira can add to their Splunk Distribution of OpenTelemetry Collector configuration file. In this example, they delete the keys and values of the user.password attribute from the spans associated with checkoutService because they know this value is not relevant for debugging application performance.
Additionally, Moira hashes the value of user.name to replace the user’s name with a unique hash value that doesn’t contain PII. This way, during debugging, they can use these unique hash values to see whether an issue is impacting one or more users without revealing their names.
Moira also redacts the values of credit.card.number, cvv, and credit.card.expiration.date tags from incoming spans because it’s useful to know in debugging that a value was entered for these fields, but not necessary to discern the contents of that value.
extensions:
  ...
processors:
  attributes/update:
    actions:
      - key: user.password
        action: delete
      - key: user.name
        action: hash
      - key: credit.card.number
        value: redacted
        action: update
      - key: cvv
        value: redacted
        action: update
      - key: credit.card.expiration.date
        value: redacted
        action: update
service:
  ...
...
After configuring the processor, Moira adds the attributes/update processor to the processors list of the traces pipeline, under service.pipelines in their OpenTelemetry Collector configuration YAML file:
...
service:
  pipelines:
    traces:
      receivers: ...
      processors: [..., attributes/update, ...]
      ...
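An attributes processor with no include or exclude filter applies its actions to every span in the pipeline. Because the leak in this scenario comes from checkoutService only, the actions could be limited to spans from that service. The following is a minimal sketch, assuming the include filter with the services field described in the upstream OpenTelemetry Collector Contrib attributes processor documentation, which matches against the span's service name. The processor name attributes/update_checkout is hypothetical.

```yaml
processors:
  # Hypothetical processor name chosen for illustration.
  attributes/update_checkout:
    # Assumption: only spans whose service name is exactly
    # checkoutService are processed.
    include:
      match_type: strict
      services:
        - checkoutService
    actions:
      # Remove the attribute entirely (key and value).
      - key: user.password
        action: delete
      # Replace the value with a hash so users stay distinguishable.
      - key: user.name
        action: hash
      # update only changes attributes that already exist on the span.
      - key: credit.card.number
        value: redacted
        action: update
      - key: cvv
        value: redacted
        action: update
      - key: credit.card.expiration.date
        value: redacted
        action: update
```

With a filter like this, spans from other services keep their attributes untouched, which may or may not be what you want if the same keys can appear elsewhere.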
Alternatives to alter or remove data
You can also use the metrics pipeline management tool to control how you ingest and store your metrics. Learn more at Introduction to metrics pipeline management.