A Beginner's Guide to the OpenTelemetry Collector

Ayooluwa Isaiah
Updated on September 10, 2024

Contents

* Prerequisites
* What is the OpenTelemetry Collector?
* Benefits of using OpenTelemetry Collector
* How the OpenTelemetry Collector works
* Installing the OpenTelemetry Collector
* Configuring the OpenTelemetry Collector
* Exploring the OpenTelemetry Collector components
* Understanding feature gates
* Final thoughts

The first step towards observability with OpenTelemetry is instrumenting your application so that it generates essential telemetry signals such as traces, logs, and metrics. Once telemetry data is being generated, it must be sent to a backend tool that may perform many functions, including analysis, visualization, and alerting.

While you could send this data directly to the observability backend, using an intermediary tool between your services and the backend offers significant advantages. In this article, we'll examine the reasons behind the growing popularity of the OpenTelemetry Collector, and why it is often the recommended intermediary tool for building observability pipelines.

Prerequisites

Before proceeding with this article, ensure that you're familiar with basic OpenTelemetry concepts.

What is the OpenTelemetry Collector?

[Image: The OpenTelemetry Collector sits between instrumented services and the observability backend]

The Collector is a core element of the OpenTelemetry observability framework, acting as a neutral intermediary for collecting, processing, and forwarding telemetry signals (traces, metrics, and logs) to an observability backend.

It aims to simplify your observability setup by eliminating the need for multiple agents for different telemetry types. Instead, it consolidates everything into a single, unified collection point. This approach not only streamlines your setup but also places a buffer between your applications and your observability backends, providing a layer of abstraction and flexibility.

It natively supports the OpenTelemetry Protocol (OTLP) but also accommodates other formats like Jaeger, Prometheus, Fluent Bit, and others. Its vendor-neutral design also lets you export your data to various open-source or commercial backends.

Built in Go and licensed under Apache 2.0, the OpenTelemetry Collector encourages you to extend its functionality by incorporating custom components. This flexibility is invaluable when you need to extend its capabilities beyond standard use cases.
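Because a single Collector instance can ingest several formats at once, one process can replace a handful of per-signal agents. Here's a minimal sketch of a receivers block that mixes OTLP, Prometheus scraping, and log file tailing — it assumes the contrib distribution (which ships the prometheus and filelog receivers), and the scrape target and log path are placeholders:

receivers:
  otlp:                      # traces, metrics, and logs over OTLP
    protocols:
      grpc:
      http:
  prometheus:                # scrape an existing Prometheus endpoint
    config:
      scrape_configs:
        - job_name: "my-app"
          scrape_interval: 15s
          static_configs:
            - targets: ["localhost:9100"]
  filelog:                   # tail application log files
    include: [/var/log/myapp/*.log]

Each of these receivers would still need to be wired into a pipeline under the service section, which is covered later in this article.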
Benefits of using OpenTelemetry Collector

[Image: Preventing vendor lock-in]

While sending telemetry data directly to an observability backend might seem convenient at first, using the OpenTelemetry Collector as a middleman between your services and the backend offers significant advantages for building a more flexible and resilient observability pipeline. Let's delve into a few of the most compelling reasons:

1. Preventing vendor lock-in

Direct telemetry reporting or using a vendor-specific agent can create a tight coupling between your services and the specific backend you're using. This makes it challenging to switch backends in the future or even experiment with multiple backends simultaneously.

With the OpenTelemetry Collector, you can effectively decouple your applications from any specific observability backend. By configuring the Collector to send data to various backends, or even multiple backends at once, you have the freedom to choose the best tools for your needs without being locked into a single platform. If you ever decide to migrate to a different backend, you only need to update the Collector's configuration, not your entire application codebase.

2. Consolidation of observability tooling

Using the OpenTelemetry Collector can simplify your observability stack by acting as a unified collection point for telemetry data from various sources. By supporting numerous open-source and commercial protocols and formats for logs, traces, and metrics, it eliminates the need for multiple agents and shippers, which reduces complexity and cognitive load for your engineering teams.

3. Filtering sensitive data

[Image: Illustration of the OpenTelemetry Collector process]

A common challenge in observability is the inadvertent logging of sensitive information, such as API keys or user data like credit card numbers, by monitored services. Without a collector, this data could be exposed within your observability system, posing a significant security risk.

The Collector addresses this by allowing you to filter and sanitize your telemetry data before it's exported. This ensures compliance and strengthens your security posture by preventing sensitive information from reaching the backend.

4. Reliable and efficient data delivery

The OpenTelemetry Collector optimizes telemetry data transmission through efficient batching and retries to minimize network overhead and ensure reliable data delivery even in the face of network disruptions.
5. Managing costs

Through features like filtering, sampling, and aggregation, the Collector can help you move away from a "spray and pray" approach to signal collection by selectively reducing the amount of data transmitted. This allows you to focus on the most relevant information, minimizing unnecessary storage and analysis costs.

6. The OpenTelemetry Collector is observable

A core strength of the OpenTelemetry Collector lies in its inherent observability. It doesn't just collect and process telemetry data from your applications; it also meticulously monitors its own performance and health by emitting logs, metrics, and traces, allowing you to track key performance indicators, resource utilization, and potential bottlenecks. This level of transparency fosters confidence in your observability pipeline, guaranteeing that the very tool responsible for gathering insights also remains under close observation.

How the OpenTelemetry Collector works

[Image: Overview of how the OpenTelemetry Collector works]

At a high level, the OpenTelemetry Collector operates in three primary stages:

1. Data reception: It collects telemetry data from a variety of sources, including instrumented applications, agents, and other collectors. This is done through receiver components.
2. Data processing: It uses processors to process the collected data, performing tasks like filtering, transforming, enriching, and batching to optimize it for storage and analysis.
3. Data transmission: It sends the processed data to various backend systems, such as observability platforms, databases, or cloud services, through exporters for storage, visualization, and further analysis.

By combining receivers, processors, and exporters in the Collector configuration, you can create pipelines, each of which serves as a separate processing lane for logs, traces, or metrics. Data enters from various sources, undergoes transformations via processors, and is ultimately delivered to one or more backends through exporters.

Connector components can also link one pipeline's output to another's input, allowing you to use the processed data from one pipeline as the starting point for another. This enables more complex and interconnected data flows within the Collector.

Installing the OpenTelemetry Collector

There are several ways to install the OpenTelemetry Collector, and each release comes with pre-built binaries for Linux, macOS, and Windows. For the complete list of options, refer to the official docs. The key decision is choosing the appropriate distribution to install:

* Core: This contains only the most essential components along with frequently used extras like the filter and attribute processors, and popular exporters such as Prometheus, Kafka, and others. It's distributed under the otelcol binary name.
* Contrib: This is the comprehensive version, including almost everything from both the core and contrib repositories, except for components that are still under development. It's distributed under the otelcol-contrib binary name.
* Kubernetes: This distribution is tailored for use within a Kubernetes cluster to monitor the Kubernetes infrastructure and the various services deployed within it. It's distributed under the otelcol-k8s binary name.

There are also third-party distributions provided by various vendors, which are tailored for easier deployment and integration with their specific backends.

The contrib distribution is generally recommended for most users since it includes a wider range of components and out-of-the-box functionality to address various observability needs.
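If you're ever unsure exactly which components a given binary ships with, recent Collector releases expose a components subcommand that prints the built-in receivers, processors, exporters, connectors, and extensions, alongside the usual version flag. Availability and output format depend on your version, so treat this as a rough sketch to verify against your installation:

otelcol-contrib components

otelcol-contrib --version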
The easiest way to get started with the Collector is through the official Docker images, which you can download using:

docker pull otel/opentelemetry-collector:latest          # OpenTelemetry core

docker pull otel/opentelemetry-collector-contrib:latest  # OpenTelemetry contrib

docker pull otel/opentelemetry-collector-k8s:latest      # OpenTelemetry K8s

For more advanced users, the OpenTelemetry Collector Builder offers the ability to create a custom distribution containing only the components you need from the core, contrib, or even third-party repositories. While beyond the scope of this article, we'll be sure to explore this in a future tutorial.

Configuring the OpenTelemetry Collector

[Image: The OpenTelemetry Collector configuration file]

The Collector's configuration is managed through a YAML file. On Linux, this file is typically found at /etc/<otelcol-directory>/config.yaml, where <otelcol-directory> varies based on the specific Collector version or distribution you're using (e.g., otelcol, otelcol-contrib).

You can also provide a custom configuration file when starting the Collector using the --config option:

otelcol --config=/path/to/otelcol.yaml

For Docker, mount your custom configuration file as a volume when launching the container with:

docker run -v $(pwd)/otelcol.yaml:/etc/otelcol-contrib/config.yaml otel/opentelemetry-collector-contrib:latest

The configuration can also be loaded from other sources, such as environment variables, YAML strings, or even external URLs, offering great flexibility in how you choose to manage your settings:

otelcol --config=env:OTEL_COLLECTOR_CONFIG

otelcol --config=https://example.com/otelcol.yaml

otelcol --config="yaml:exporters::debug::verbosity: normal"

If multiple --config flags are provided, they will be merged into a final configuration. The Collector also automatically expands environment variables within the configuration so that you can keep sensitive data, like API secrets, outside of the version-controlled configuration files:

processors:
  attributes/example:
    actions:
      - key: ${env:API_SECRET}
        action: ${env:OPERATION}

Here's a quick overview of the basic structure of a Collector configuration file:

# otelcol.yaml
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlp:
    endpoint: jaeger:4317

extensions:
  health_check:

service:
  extensions: [health_check]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

This configuration sets up an OpenTelemetry Collector that receives trace data via the OTLP protocol over HTTP on port 4318, applies batch processing, and then exports the processed traces to a Jaeger endpoint located at jaeger:4317. It also includes a health_check extension for monitoring the Collector's status.

Each component within the configuration is assigned a unique identifier using the format type/<name>. The <name> part is optional if you only have a single instance of a particular component type. However, when you need to define multiple components of the same type, providing a distinct <name> for each one becomes necessary:

processors:
  batch:
  batch/2:
    send_batch_size: 10000
    timeout: 10s
  batch/test:
    timeout: 1s

The service section is also crucial, as it controls which configured components are enabled. Any component not mentioned there is silently ignored, even if it's configured in other sections.
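For example, a named component is referenced by its full identifier when you enable it in a pipeline. Here's a small, hypothetical fragment that reuses the batch/2 processor defined above (the otlp receiver and exporter are assumed to be configured elsewhere in the file):

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch/2]   # referenced as type/name
      exporters: [otlp]

In this sketch, the plain batch and batch/test processors defined earlier would remain inactive, because no pipeline references them.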
Once you're done configuring your Collector instance, be sure to validate the configuration with the validate command:

otelcol validate --config=/path/to/config.yaml

[Image: Validation errors reported by the validate command]

In the next section, we'll dive deeper into the individual components of the Collector configuration.

Exploring the OpenTelemetry Collector components

[Image: OpenTelemetry Collector components]

Let's now delve into the heart of the OpenTelemetry Collector: its components. In this section, we'll explore the building blocks that enable the Collector to receive, process, and export telemetry data. We'll cover receivers, processors, exporters, extensions, and connectors, understanding their roles and how they work together to create a powerful and flexible observability pipeline.

Let's begin with the receivers first.

Receivers

[Image: Overview of the OpenTelemetry Collector receivers]

Receivers are the components responsible for collecting telemetry data from various sources, serving as the entry points into the Collector. They gather traces, metrics, and logs from instrumented applications, agents, or other systems, and translate the incoming data into OpenTelemetry's internal format, preparing it for further processing and export.

For the Collector to work properly, your configuration needs to include and enable at least one receiver. The core distribution includes the versatile OTLP receiver, which can be used in trace, metric, and log pipelines:

receivers:
  otlp:

The otlp receiver here starts an HTTP and a gRPC server at localhost:4318 and localhost:4317 respectively, then waits for the instrumented services to connect and start transmitting data in the OTLP format. Similarly, many other receivers come with default settings, so specifying the receiver's name is enough to configure it.

To change the default configuration, you may override the default values. For example, you may disable the gRPC protocol by simply not specifying it in the list of protocols:

receivers:
  otlp:
    protocols:
      http:

You can also change the default endpoint through http.endpoint:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318

The contrib repository boasts over 90 additional receivers, catering to a wide array of data formats and protocols, including popular sources like Jaeger, Prometheus, Apache Kafka, PostgreSQL, Redis, AWS X-Ray, GCP PubSub, and many more.

Processors

[Image: Overview of the OpenTelemetry Collector processors]

Processors are components that modify or enhance telemetry data as it flows through the pipeline. They perform various operations on the collected telemetry data, such as filtering, transforming, enriching, and batching, so that it is ready to be exported.

While no processors are enabled by default, you'll typically want to include the batch processor:

processors:
  batch:

This processor groups spans, metrics, or logs into time-based and size-based batches, enhancing efficiency. Additionally, it supports sharding data based on client metadata, allowing for effective multi-tenant data processing even with high volumes.

Another processor in the otelcol core distribution is the memory_limiter, which helps prevent out-of-memory errors by periodically checking the service's memory usage against defined limits:

processors:
  memory_limiter:
    check_interval: 5s
    limit_mib: 4000        # hard limit of 4000 MiB
    spike_limit_mib: 800   # soft limit is `limit_mib` minus `spike_limit_mib` (3200 MiB)

It operates with a soft and a hard limit. Exceeding the soft limit results in new data being rejected until memory is freed up. Breaching the hard limit triggers garbage collection so that memory usage drops below the soft limit. This mechanism adds back pressure to the Collector, making it resilient to overload. However, it requires receivers to handle data rejections gracefully, usually through retries with exponential backoff.
Beyond these, the contrib repository offers several other processors for tasks like filtering sensitive data, adding geolocation details, appending Kubernetes metadata, and more.

Exporters

[Image: Overview of the OpenTelemetry Collector exporters]

Exporters serve as the final stage in the Collector's pipeline and are responsible for sending processed telemetry data to various backend systems such as observability platforms, databases, or cloud services, where the data is stored, visualized, and analyzed.

To operate, the Collector requires at least one exporter configured through the exporters property. Here's a sample configuration exporting trace data to a local Jaeger instance:

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

This configuration defines an exporter named otlp/jaeger that targets a local Jaeger instance listening on port 4317 via gRPC. The insecure: true setting disables encryption, which is not recommended for production environments.

For a broader range of destinations, the contrib repository provides various other exporters, supporting diverse observability platforms, databases, and cloud services.

Extensions

[Image: Overview of OpenTelemetry Collector extensions]

Extensions add supplementary features to the OpenTelemetry Collector beyond the core data collection, processing, and export functions. They offer capabilities like health checks, performance profiling, authentication, and integration with external systems. Here's a sample configuration for extensions:

extensions:
  pprof:
  health_check:
  zpages:

The pprof extension here enables Go's net/http/pprof endpoint on http://localhost:1777 so that you can collect performance profiles and investigate issues with the service.

The health_check extension offers an HTTP URL (http://localhost:13133/ by default) that can be used to monitor the Collector's status. You can use this URL to implement liveness checks (to check if the Collector is running) and readiness checks (to confirm if the Collector is ready to accept data).

[Image: Screenshot of the Health Check extension]

A new and improved health check extension is currently being developed to enable individual components within the Collector (like receivers, processors, and exporters) to provide their own health status updates.

The zPages extension is equally useful. It provides various HTTP endpoints for monitoring and debugging the Collector without relying on any backend. This enables you to inspect traces, metrics, and the Collector's internal state directly, assisting in troubleshooting and performance optimization.

[Image: Screenshot of the zPages extension]

Authentication extensions also play a vital role in security by allowing you to authenticate both incoming connections at the receiver level and outgoing requests at the exporter level.
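As a rough sketch of how that fits together, here's what protecting the OTLP receiver's HTTP endpoint could look like with the contrib basicauth extension. The extension choice, credentials, and layout here are illustrative assumptions; check the extension's documentation for the exact fields supported by your version:

extensions:
  basicauth/server:
    htpasswd:
      inline: |
        otel-user:changeme    # placeholder credentials

receivers:
  otlp:
    protocols:
      http:
        auth:
          authenticator: basicauth/server

service:
  extensions: [basicauth/server]
  # pipelines omitted for brevity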
Beyond these examples, the contrib repository offers a wide array of extensions to further expand the Collector's capabilities.

Connectors

[Image: Overview of OpenTelemetry Collector connectors]

Connectors are specialized components that bridge the different pipelines within the OpenTelemetry Collector. They function as both an exporter for one pipeline and a receiver for another, allowing telemetry data to flow seamlessly between pipelines, even if they handle different types of data. Some use cases for connectors are:

* Conditional routing: Direct telemetry data to specific pipelines based on predefined rules, ensuring that the right data reaches the appropriate destination for processing or analysis.
* Data replication: Create copies of data and send them to multiple pipelines, enabling diverse processing or analysis approaches.
* Data summarization: Condense large volumes of telemetry data into concise overviews for easier comprehension.
* Data transformation: Convert one type of telemetry data into another, such as transforming raw traces into metrics for simplified aggregation and alerting.

The connectors section in your Collector configuration file is where you define these connections. Note that each connector is designed to work with specific data types and can only connect pipelines that handle those types.

connectors:
  count:
    logs:
      app.event.count:
        description: "Log count by event"
        attributes:
          - key: event

For instance, the count connector can count various telemetry data types. In the above example, it groups incoming logs based on the event attribute and counts the occurrences of each event type. The result is exported as the metric app.event.count, allowing you to track the frequency of different events in your logs.

Services

[Image: OpenTelemetry Collector services overview]

The service section specifies which components, such as receivers, processors, exporters, connectors, and extensions, are active and how they are interconnected through pipelines. If a component is configured but not defined within the service section, it will be silently ignored. It consists of three subsections:

1. Extensions

The service.extensions subsection determines which of the configured extensions will be enabled:

service:
  extensions: [health_check, pprof, zpages]

2. Pipelines

The service.pipelines subsection configures the data processing pathways within the Collector. These pipelines are categorized into three types: traces, metrics, and logs.

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

Each pipeline comprises a collection of receivers, processors, and exporters. Note that each component must be configured in its respective section (receivers, processors, exporters) before it can be incorporated into a pipeline.

Pipelines can have multiple receivers feeding data to the first processor. Each processor processes and passes data to the next, potentially dropping some if sampling or filtering is applied. The final processor distributes data to all exporters in the pipeline, ensuring each receives a copy of the processed data.
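Connectors are also wired up in this section: the connector's name goes in the exporters list of the pipeline that produces the data and in the receivers list of the pipeline that consumes it. Here's a small sketch reusing the count connector from earlier (the otlp receiver and exporter are assumed to be configured elsewhere):

service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [count]   # the connector acts as an exporter here...
    metrics:
      receivers: [count]   # ...and as a receiver here
      processors: [batch]
      exporters: [otlp]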
3. Telemetry

The service.telemetry subsection controls the telemetry data generated by the Collector itself.

Metrics

service:
  telemetry:
    metrics:
      address: 0.0.0.0:8888
      level: detailed

Metrics are exposed through a Prometheus interface, which defaults to port 8888, and there are four verbosity levels:

* none: No metrics are collected.
* basic: The most essential service telemetry.
* normal: The default level, which adds a few more standard indicators to the basic-level metrics.
* detailed: The most verbose level, which emits additional low-level metrics like HTTP and RPC statistics.

You can also configure the Collector to scrape its own metrics with a Prometheus receiver and send them through the configured pipelines, but this could put your telemetry data at risk if the Collector isn't performing optimally.

The metrics cover resource consumption, data rates, drop rates, throttling states, connection counts, queue sizes, latencies, and more. For the full list, refer to the internal metrics page.

Logs

service:
  telemetry:
    logs:

OpenTelemetry Collector logs are written to standard error by default, and you can use the operating environment's logging mechanisms (journalctl, docker logs, etc.) to view and manage them. Logs provide insights into Collector events like startups, shutdowns, data drops, and crashes.

Just like with metrics, you can configure a verbosity level (it defaults to INFO), as well as a log sampling policy, static metadata fields, and whether to encode the logs in JSON format. Under the hood, the Collector uses Uber's highly regarded Zap library to write the logs.
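For instance, a small sketch of turning up the Collector's own log verbosity and switching to JSON output might look like this (the field names reflect the commonly documented service::telemetry::logs settings; confirm them against the docs for your Collector version):

service:
  telemetry:
    logs:
      level: debug      # defaults to info
      encoding: json    # defaults to console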
Traces

While the Collector doesn't currently expose traces by default, there's ongoing work to change that. This involves adding the ability to configure the OpenTelemetry SDK used for the Collector's internal telemetry. For now, this functionality is controlled by the following feature gate:

otelcol --config=config.yaml --feature-gates=telemetry.useOtelWithSDKConfigurationForInternalTelemetry

Once enabled, you can then register a service.telemetry.traces section like this:

service:
  telemetry:
    traces:
      processors:
        - batch:
            exporter:
              otlp:
                protocol: grpc/protobuf
                endpoint: jaeger:4317

Understanding feature gates

The OpenTelemetry Collector's feature gates offer a valuable way to manage the adoption of new features by allowing them to be easily turned on or off. This provides a safe environment for testing and experimenting with new functionality in production without fully committing to it. Each feature gate typically progresses through a lifecycle similar to Kubernetes:

* Alpha: The feature is initially disabled by default and requires explicit activation.
* Beta: The feature becomes enabled by default but can be deactivated if necessary.
* Stable: The feature is considered fully integrated and generally available, and the feature gate is removed, leaving it permanently enabled.

In some cases, features might be deprecated if they prove unworkable. Such features remain available for a limited time (typically two additional releases) before being removed completely.

You can control feature gates using the --feature-gates flag:

otelcol --config=config.yaml --feature-gates=transform.flatten.logs

To disable a feature gate, prefix its identifier with a -:

otelcol --config=config.yaml --feature-gates=-transform.flatten.logs

If you use the zPages extension, you can see all the feature gates you have enabled by going to http://localhost:55679/debug/featurez:

[Image: Feature gates listed on the zPages featurez page]

Final thoughts

Throughout this article, we've explored the key concepts of the OpenTelemetry Collector, so you should now have a good grasp of its capabilities and how it can help you build effective observability pipelines.

For a deeper dive into configuring the OpenTelemetry Collector, I recommend exploring the opentelemetry-collector and opentelemetry-collector-contrib repositories on GitHub and their official docs. These contain extensive documentation and examples that will guide you through setting up and tailoring the Collector to your specific requirements.

The best way to follow the development of the Collector is through its GitHub repo. In particular, you will find the changes that are being planned for upcoming releases on the roadmap page. An official #otel-collector channel on the CNCF Slack also exists for community discussions.

Thanks for reading, and until next time!

Article by Ayooluwa Isaiah

Ayo is the Head of Content at Better Stack. His passion is simplifying and communicating complex technical ideas effectively. His work has been featured in several esteemed publications, including LWN.net, Digital Ocean, and CSS-Tricks. When he's not writing or coding, he loves to travel, bike, and play tennis.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC-BY-NC-SA).