How OpenTelemetry Enhances Distributed Tracing of MQTT Messages

by Nasir Qureshi
15 min read

What is OpenTelemetry?

In simple terms, OpenTelemetry (OTel) is an open-source collection of tools, APIs, and SDKs that provides a standard framework for collecting and sending observability data.

OTel is used to instrument, generate, collect, and export telemetry data to help you analyze and understand the performance and behavior of software (e.g., an IoT application). It provides a unified set of libraries and APIs, primarily for data collection and transmission.

What is Telemetry Data?

Telemetry data is a collection of logs, metrics, and traces generated from software or IoT applications.

The bulk of OpenTelemetry data comes from backend applications running in data centers. IoT devices, by contrast, usually sit in remote, hard-to-reach areas. The major challenge, however, is not their remote location but collecting data from logically complex architectures and deployments.

For instance, an environment may run a Kubernetes cluster with 5,000 pods. Say only 80 of these pods are involved in processing a request. If latency spikes sporadically, where should teams start looking for the problem?

Capturing telemetry data in IoT environments is critical to understanding how your IoT applications perform. This performance data is gathered and then processed by Application Performance Monitoring (APM) tools such as Datadog and Honeycomb.

What are Telemetry Data Logs?

Logs are readable files that show the results of any transaction in your IoT ecosystem. They provide a continuous, event-based record of these transactions and make it easy to correlate any issues or irregularities.

For instance, a plain text CONNECT log (shown below) can help you identify where an error might have occurred or which part of the process may be causing latency in the transaction.

Logs can be structured, unstructured, or plain text. Each type of log serves a specific purpose for its users.

Figure: Verbose CONNECT message

HiveMQ’s message log extension is helpful for application debugging and development. It lets engineers and developers follow, on the terminal, any clients communicating with the HiveMQ broker.
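As a rough illustration of the difference between plain-text and structured logs, the Python sketch below renders the same CONNECT event both ways. The field names (`event`, `client_id`, `keep_alive`), the client ID `car-42`, and the line layout are hypothetical; they do not reproduce the output format of HiveMQ’s message log extension.

```python
import json
from datetime import datetime, timezone


def plain_text_log(ts: datetime, client_id: str, keep_alive: int) -> str:
    """Render a CONNECT event as a human-readable line."""
    return (
        f"{ts.isoformat()} - Received CONNECT from client "
        f"'{client_id}': Keep Alive: {keep_alive}s"
    )


def structured_log(ts: datetime, client_id: str, keep_alive: int) -> str:
    """Render the same event as JSON so tools can filter by field."""
    record = {
        "timestamp": ts.isoformat(),
        "event": "CONNECT",
        "client_id": client_id,  # hypothetical field name
        "keep_alive": keep_alive,  # hypothetical field name
    }
    return json.dumps(record, sort_keys=True)


ts = datetime(2022, 8, 22, 8, 10, 10, tzinfo=timezone.utc)
print(plain_text_log(ts, "car-42", 60))
print(structured_log(ts, "car-42", 60))
```

The plain-text line is easier to read on a terminal, while the structured record can be filtered by field (for example, all events for one client) without fragile string matching.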

What are Telemetry Data Metrics?

Telemetry data metrics are time-aggregated data points (counts, timestamps, values, or event names). Metrics can be extracted simply by querying the databases that store them.

For instance, a metric can be a numeric value at a moment in time (e.g., CPU % used). Generally, every metric has a timestamp, a name, and one or more numeric values. Here’s what a metric might look like in a database:

Time Stamp          | Metric Name | Value
22/08/2022 08:10:10 | CPU Usage   | 10%
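To make the "timestamp, name, value" shape concrete, here is a minimal Python sketch that parses the example row into a typed record. The `MetricPoint` class and `parse_metric_row` helper are hypothetical names used only for illustration, and the day/month/year timestamp format is inferred from the example row.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MetricPoint:
    timestamp: datetime
    name: str
    value: float
    unit: str


def parse_metric_row(timestamp: str, name: str, raw_value: str) -> MetricPoint:
    """Parse a row such as ('22/08/2022 08:10:10', 'CPU Usage', '10%')."""
    ts = datetime.strptime(timestamp, "%d/%m/%Y %H:%M:%S")
    if raw_value.endswith("%"):
        # Keep the number and the unit separately so values can be compared.
        return MetricPoint(ts, name, float(raw_value[:-1]), "%")
    return MetricPoint(ts, name, float(raw_value), "")


point = parse_metric_row("22/08/2022 08:10:10", "CPU Usage", "10%")
print(point)
```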

The OpenTelemetry Collector, an application that lets you process telemetry and send it to various destinations, can be used to collect HiveMQ cluster metrics via the Prometheus or InfluxDB extension. The picture below gives a quick view of Cluster Metrics from HiveMQ’s Control Center dashboard:

Figure: Cluster Metrics from HiveMQ’s Control Center dashboard

Here’s what each metric means:

Metric                | Description
Connections           | Current amount of active connections on all nodes
Inbound Publish Rate  | Current amount of incoming Publishes per second over all cluster nodes
Outbound Publish Rate | Current amount of outgoing Publishes per second over all cluster nodes
Subscriptions         | Current amount of Subscriptions and replicas stored in the cluster
Retained Messages     | Current amount of Retained Messages and replicas stored in the cluster
Queued Messages       | Current amount of Queued Messages and replicas stored in the cluster (may show Queued Messages of already disconnected clean session clients)
Cluster Nodes         | Current amount of Cluster Nodes

Monitoring metrics is vital to proactively identify and fix issues before they grow into larger, more complex problems.

What are Telemetry Data Traces?

Traces are all about tracking processes end-to-end (e.g., tracking API requests). Tracing can help developers understand how services connect across the entire IoT ecosystem. It can also help developers know if the system is working correctly and, if it isn’t, quickly start troubleshooting because they know where to look.

Tracing includes unique identifiers, operation names, timestamps, logs, events, and indexes.

The illustration below shows an example of a transaction (unlocking a car door via a mobile app) going through an IoT environment.

Figure: A transaction (unlocking a car door via a mobile app) going through an IoT environment

  1. A customer sends a request via the app to unlock their car’s door.

  2. The request is received by the web server, processed (in HTTP), and a Trace ID is generated and attached to the message.

  3. Next, the web server sends the message to the HiveMQ broker (in MQTT) for further processing.

  4. The broker receives the message (along with the Trace ID) and sends it to two entities:

    • First, it forwards the message to the Kafka broker (via HiveMQ’s Kafka extension) with the same Trace ID.

    • Second, the broker delivers the message (via its Distributed Tracing Extension) to an Application Performance Monitoring (APM) solution (Datadog, Grafana Tempo) using the OpenTelemetry framework. This ensures the APM solutions get the message in a standardized format.

  5. Finally, the Kafka broker receives the record and sends it to the backend application for further processing.

  6. The backend application queries the database to process the request and transmits the result via Kafka.

  7. After the message is processed and authenticated, the broker sends it to both the car and the phone. The car receives an ‘unlock door’ command (via Kafka), and the outcome is either success or failure (error). A message is also sent to the phone application (via Kafka) confirming that the car’s door is unlocked.

It is important to note that these transactions happen in milliseconds, so a slight delay (latency) in message delivery/processing can be very problematic.
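The key idea in the steps above is that the Trace ID stays the same while each hop adds its own span. The Python sketch below illustrates this using the W3C Trace Context `traceparent` layout (`version-traceid-spanid-flags`). It is an assumption that the ID is carried this way at every hop; the dictionaries standing in for HTTP headers, MQTT user properties, and Kafka record headers are hypothetical placeholders, not HiveMQ APIs.

```python
import secrets
from typing import Dict


def new_traceparent() -> str:
    """Start a trace: version 00, 16-byte trace ID, 8-byte span ID, sampled flag."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)  # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def child_traceparent(parent: str) -> str:
    """Continue a trace on the next hop: same trace ID, fresh span ID."""
    version, trace_id, _parent_span_id, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def trace_id_of(traceparent: str) -> str:
    """Extract the trace ID field from a traceparent value."""
    return traceparent.split("-")[1]


# Hypothetical hops from the car-unlock example; each one forwards the header.
web_server_headers: Dict[str, str] = {"traceparent": new_traceparent()}
mqtt_user_properties = {
    "traceparent": child_traceparent(web_server_headers["traceparent"])
}
kafka_record_headers = {
    "traceparent": child_traceparent(mqtt_user_properties["traceparent"])
}

for hop, headers in [
    ("web server", web_server_headers),
    ("MQTT broker", mqtt_user_properties),
    ("Kafka", kafka_record_headers),
]:
    print(hop, headers["traceparent"])
```

Because every hop reuses the trace ID and only replaces the span ID, an APM tool can later stitch the individual spans back into one end-to-end trace.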

Let’s see how this message (with its Trace ID) would appear in a database.

Time Stamp             | Trace ID | Service ID             | Duration (seconds)
22/08/2022 08:10:10.10 | 123456   | Phone Application      | 0.10
22/08/2022 08:10:10.30 | 123456   | Web Server             | 0.20
22/08/2022 08:10:10.40 | 123456   | MQTT Broker            | 0.10
22/08/2022 08:10:10.62 | 123456   | Kafka Broker (Produce) | 0.22
22/08/2022 08:10:10.65 | 123456   | Kafka Broker (Consume) | 0.25
22/08/2022 08:10:10.90 | 123456   | Backend Application    | 0.28

From the example above, we can clearly see which stages of the process are taking too long. For instance, if it takes 0.28 seconds for the message to travel from the Kafka broker to the Backend Application, we know there is a time lag (latency) that must be addressed. Because of the Trace ID, engineers know which message is causing the problem and at what stage, and they can start fixing it.
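The same reasoning can be automated once the spans for a trace are available as data. The following Python sketch uses the durations from the example table for Trace ID 123456; the helper names and the 0.20-second threshold are hypothetical choices for illustration, not recommended values.

```python
from typing import List, Tuple

# (service, duration in seconds) for Trace ID 123456, from the example table.
spans: List[Tuple[str, float]] = [
    ("Phone Application", 0.10),
    ("Web Server", 0.20),
    ("MQTT Broker", 0.10),
    ("Kafka Broker (Produce)", 0.22),
    ("Kafka Broker (Consume)", 0.25),
    ("Backend Application", 0.28),
]


def slowest_stage(rows: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the (service, duration) pair with the longest duration."""
    return max(rows, key=lambda row: row[1])


def stages_over(rows: List[Tuple[str, float]], threshold_seconds: float) -> List[str]:
    """Return the services whose duration exceeds the threshold."""
    return [service for service, duration in rows if duration > threshold_seconds]


service, duration = slowest_stage(spans)
print(f"Slowest stage: {service} ({duration:.2f} s)")
print("Over 0.20 s:", stages_over(spans, 0.20))
```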

How Does OpenTelemetry Work?

OpenTelemetry features specialized protocols that collect telemetry data and export it to a designated system. The diagram below illustrates the OpenTelemetry data lifecycle.

Figure: OpenTelemetry data lifecycle

With Native OpenTelemetry Integration, HiveMQ Enables Distributed Tracing

Organizations usually deploy IoT applications in a distributed environment. The messages exchanged within this setup must transit through multiple components, including MQTT brokers.

For DevOps and SRE teams, it is essential to have the ability to trace these messages throughout their distributed environment. Unfortunately, most MQTT brokers cannot continuously gather metadata on requests/messages, which creates gaps that impact the service level objectives of the responsible teams.

HiveMQ solves this problem with the help of Distributed Tracing. Distributed Tracing is a method to follow messages through multiple and complex systems. It allows a high-level overview of a message’s journey so teams analyzing issues can isolate potential problems and dive deeper into systems.

HiveMQ’s OpenTelemetry integration allows you to trace and debug MQTT data streams between devices and cloud service providers in real time. The HiveMQ broker, with the Distributed Tracing Extension, offers OpenTelemetry capabilities that extend to traffic transiting the Enterprise Extension for Kafka.

To dive deeper into how distributed tracing improves what you can observe in your systems, read “Distributed Tracing maximizes the Observability of your IoT applications.” To learn how to start monitoring OpenTelemetry traces from HiveMQ in an APM tool like Datadog, read “Use HiveMQ and OpenTelemetry to monitor IoT applications in Datadog.”

What Role Does OpenTelemetry Play in IoT Observability?

IoT Observability is a method that defines how users (engineers and developers) get granular visibility into their IoT applications’ key components and metrics.

IoT Observability enables users to:

  1. Debug their IoT applications quickly because they have more precise insights.

  2. Improve their IoT applications by quickly identifying critical issues and solving them before they become insidious problems.

  3. Develop a deep understanding of how their IoT applications work in the broader distributed structure.

An essential part of IoT observability is tracing ‘events.’ Events are simply instances where data is transferred from a publisher to a subscriber, via an intermediary ‘broker’ like HiveMQ. Tracking events is important because if there is a situation where the subscriber didn’t receive data, teams should know where to look for potential issues.

With the help of a broker, OpenTelemetry can generate a trace to confirm:

  1. Whether a publisher actually sent the event, and

  2. When a consumer first received the event.

This proof helps verify that the data transfer occurred; if it did not, teams know which side (publisher or subscriber) failed.
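As a sketch of how such a check might look in practice, the Python snippet below classifies a trace by which broker-side events were recorded for it. The event names `publish_received` and `delivered_to_subscriber`, the record layout, and the trace IDs are all hypothetical; they are not names emitted by HiveMQ or OpenTelemetry.

```python
from typing import Dict, List


def diagnose_delivery(events: List[Dict[str, str]], trace_id: str) -> str:
    """Classify one trace by which broker-side events were recorded for it."""
    seen = {e["event"] for e in events if e["trace_id"] == trace_id}
    if "publish_received" not in seen:
        return "publisher: the broker never received the event"
    if "delivered_to_subscriber" not in seen:
        return "subscriber: the event was published but not delivered"
    return "ok: published and delivered"


# Hypothetical trace records collected from the broker.
events = [
    {"trace_id": "a1", "event": "publish_received"},
    {"trace_id": "a1", "event": "delivered_to_subscriber"},
    {"trace_id": "b2", "event": "publish_received"},
]

for tid in ("a1", "b2", "c3"):
    print(tid, "->", diagnose_delivery(events, tid))
```

Here trace `a1` completed, `b2` points at the subscriber side, and `c3` (no events at all) points at the publisher side.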

Learn more about OpenTelemetry and IoT Observability here.

Conclusion

To summarize, OpenTelemetry standardizes telemetry data. When an application monitoring tool like Datadog or Honeycomb.io receives data, it makes the information observable and displays it in an easy-to-read form. Teams can then see how their IoT applications relate to each other and explain why things aren’t working as expected.

Contact our team to learn more about how the HiveMQ Enterprise MQTT broker uses the OpenTelemetry standard and distributed tracing for end-to-end IoT observability.

Nasir Qureshi

Nasir Qureshi is a Senior Product Marketing Manager at HiveMQ. With a passion for working on disruptive technology products, Nasir has helped SaaS companies in their hyper-growth journey for over 3 years now. He holds an MBA from California State University with a major in Technology and Data Management. His interests include IoT devices, networking, data security, and privacy.