Why Filtering Matters
Instrumentation often generates noisy or redundant data. Some metrics or traces may include unnecessary information, such as personal data or debug logs, which leads to bloated dashboards, higher storage costs, and privacy risks. Filtering data within the OpenTelemetry Collector allows you to:
- Remove unwanted data to reduce noise.
- Mask or drop sensitive information such as personally identifiable information (PII).
- Transform data formats (e.g., unify naming conventions).
- Control data flow to different backends by sending filtered data to specific pipelines.
How the OpenTelemetry Collector Filters Data
The Collector consists of receivers, processors, and exporters. Each plays a role in data filtering:
- Receivers: Collect data from sources like applications and agents.
- Processors: Modify, filter, or batch data before it is exported.
- Exporters: Send the processed data to observability backends, such as Prometheus or Grafana.
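As a rough illustration of how these three component types fit together, the sketch below wires one receiver, two processors, and one exporter into a single traces pipeline. The processor name `filter/drop_health`, the `/healthz` route, and the `backend.example.com:4317` endpoint are placeholders assumed for this example. The `filter` processor ships with the Collector contrib distribution, so this assumes that distribution is in use.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  # Hypothetical filter: drop spans whose route is the health-check endpoint.
  # A span is dropped when any listed condition evaluates to true.
  filter/drop_health:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'
  # Group the remaining data into batches before export.
  batch:

exporters:
  otlp:
    endpoint: backend.example.com:4317  # placeholder backend address

service:
  pipelines:
    traces:
      receivers: [otlp]
      # Processors run in the order listed here.
      processors: [filter/drop_health, batch]
      exporters: [otlp]
```

Placing the filter ahead of `batch` means dropped spans never take up space in a batch.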
Processor-based Filtering
Processors transform and filter data mid-pipeline. Their configuration is intentionally limited and covers the most common tasks, such as:
- Drop attributes or entire spans: Remove unnecessary fields or reduce trace volume.
- Normalize data formats: Convert snake_case to camelCase, or unify attribute labels.
- Mask sensitive data: Scrub PII before it is transmitted.
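The tasks above can be sketched with the `attributes` processor, which applies a list of actions to attribute keys. This is a minimal example under assumptions: the attribute names (`debug.payload`, `user.email`, `userId`, `user.id`) are hypothetical and would need to match whatever your instrumentation actually emits.

```yaml
processors:
  attributes/scrub:
    actions:
      # Drop an attribute that is not needed downstream.
      - key: debug.payload
        action: delete
      # Mask sensitive data: replace the raw value with its hash.
      - key: user.email
        action: hash
      # Normalize naming: copy userId into user.id, then remove the old key.
      - key: user.id
        from_attribute: userId
        action: upsert
      - key: userId
        action: delete
```

Actions are applied in the order listed, so the `upsert` copies the value before the `delete` removes the original key. Dropping entire spans is outside the scope of this processor. That is typically handled by a separate processor such as `filter`.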
Transformations Using OpenTelemetry Transform Language (OTTL)
The OpenTelemetry Transform Language (OTTL) provides enhanced flexibility for complex transformations. OTTL allows for:
- Pattern-based transformations: Modify data dynamically using regex or templates.
- Extract and relocate attributes: Move user IDs from URL parameters to span attributes.
- Conditional filtering: Drop traces based on specific conditions, such as redundant spans.
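A sketch of how these three capabilities might look in practice, using OTTL statements in the `transform` processor and an OTTL condition in the `filter` processor (both from the contrib distribution). The attribute `url.full`, the query parameters `token` and `user_id`, and the span name `cache.ping` are assumptions made for illustration only.

```yaml
processors:
  transform/urls:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Extract and relocate: capture the user_id query parameter and
          # merge it into the span attributes under the key "user_id".
          - 'merge_maps(attributes, ExtractPatterns(attributes["url.full"], "user_id=(?P<user_id>[^&]+)"), "upsert") where IsString(attributes["url.full"])'
          # Pattern-based transformation: mask the value of the token parameter.
          - 'replace_pattern(attributes["url.full"], "token=[^&]+", "token=REDACTED")'

  # Conditional filtering: drop spans that match the condition.
  filter/redundant:
    error_mode: ignore
    traces:
      span:
        - 'name == "cache.ping"'
```

The `transform` processor rewrites telemetry in place and does not drop it, which is why the conditional drop is expressed in the `filter` processor instead. Both processors take effect only once they are listed in a pipeline under `service`.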
Stateful Operations and Their Limitations
While the Collector excels at real-time data transformations, it is stateless by design, which keeps it lightweight and fast. Stateful operations, such as aggregating historical data, are not natively supported. Common limitations include:
- No support for complex span re-linking: Spans cannot be re-associated once collected.
- Limited historical context: Filtering decisions are based only on the current data batch.
Filtering Use Cases
- PII Masking: Remove personal data before sending traces to public dashboards.
- Debug Data Suppression: Drop debug-level logs from production pipelines.
- Multi-pipeline Export: Send sanitized data to public dashboards while retaining full data for internal analysis.
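The three use cases can be combined in one configuration. The following is a sketch rather than a production setup: the exporter endpoints, processor names, and attribute keys (`user.email`, `client.address`) are hypothetical, and the `filter` and `attributes` processors are assumed to be available (they are part of the contrib distribution).

```yaml
receivers:
  otlp:
    protocols:
      grpc:

processors:
  # Debug data suppression: drop log records below INFO severity.
  filter/no_debug:
    error_mode: ignore
    logs:
      log_record:
        - 'severity_number < SEVERITY_NUMBER_INFO'
  # PII masking: remove or hash hypothetical personal attributes.
  attributes/pii:
    actions:
      - key: user.email
        action: delete
      - key: client.address
        action: hash
  batch:

exporters:
  otlp/internal:
    endpoint: internal-backend.example.com:4317  # placeholder
  otlp/public:
    endpoint: public-backend.example.com:4317    # placeholder

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [filter/no_debug, batch]
      exporters: [otlp/internal]
    # Full-fidelity traces for internal analysis.
    traces/internal:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/internal]
    # Sanitized copy of the same traces for public dashboards.
    traces/public:
      receivers: [otlp]
      processors: [attributes/pii, batch]
      exporters: [otlp/public]
```

Both trace pipelines read from the same receiver, so each gets its own copy of the data and only the public copy is scrubbed. Note that the severity condition also matches log records whose severity was never set, since an unspecified severity number is lower than INFO. Tighten the condition if those records should be kept.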