
Building a Robust MQTT Architecture and UNS for Scalability

by HiveMQ Team

In today's fast-paced industrial environments, building a robust MQTT architecture is crucial for ensuring seamless communication across multiple sites. MQTT offers a lightweight and efficient messaging protocol that is vital for handling the extensive data flows typical of large-scale operations. 

The HiveMQ Community team hosted an event titled CONNACK, where Jean-Romain Bardet, Co-Founder at Scorp-io, gave a talk on ‘Exploring Real-World MQTT-Based IIoT Architectures.’ In this talk, Jean-Romain covered the challenges and solutions involved in scaling an MQTT architecture to dozens of industrial sites, drawing on a real-world deployment that uses Unified Namespace (UNS) and Kafka to illustrate key strategies for success.

UNS is like asking your kids to clean their room. Some kids will just throw all the toys into a big box, but some will sort each type of toy into its own small box so they can find them faster later. UNS is exactly the same. You can organize your data however you want, but you need to think about how you'll need to access it faster. So, clean your data based on your use case and the needs of the end-user. Otherwise, it's like cleaning someone else's room without knowing how they want to get their toys.

Jean-Romain Bardet, Co-Founder at Scorp-io

Exploring Real-World MQTT-Based IIoT Architectures

Transcript of the Video

Jean-Romain Bardet: Hi guys. I'm Jean-Romain from Scorp-io. I'm the CEO and co-founder of Scorp-io, and what we do at Scorp-io is pretty much monitoring and control in the cloud. So, today, I'm going to talk about building a robust MQTT architecture to scale across dozens of sites. We have taken a real use case from last year, and we used our own software, which is based on MQTT.

So, I'm going to try to help you understand what we did and the main issues we faced building this kind of architecture, because of course, our software is not running across a dozen sites, but across hundreds of sites. So we had a lot of issues at scale. I will try to give you as many tips as I can. Let's go. I am going to talk about a Belgian company, which is one of our main customers at the moment. They produce lime, and for people who don't know about lime, producing it is quite simple, to be honest.

Lime is the result of heating limestone in a big oven like this one. It's quite simple to produce, but you need a lot of energy to do it. So they face many issues. They have hundreds of sites across 25 countries. They have small sites with hundreds of data points and big sites with thousands of data points. They have multiple PLCs, as you may know, and they have multiple on-premise software packages as well as different machines. They have a lot of stuff. These machines produce data using a lot of different protocols: MQTT, for example, but also Modbus, OPC UA, OPC DA, LoRaWAN, and other protocols they installed over the last decade.

They also use other software like an ERP, which is complicated to get data from; they mostly use a REST API for that. So the customer came with a simple demand: "We have 5,000 people in our company and we want to bring the right data to the right person at the right moment." That was the challenge. That was the demand from them. The main issue they faced was that their automation team spent like 50 percent of their time downloading data from the sites and sharing it with their colleagues. So it was like 50 percent of their time wasted getting the data and putting it in the hands of the right guy.

So that was the main demand: to bring the data to the right person at the right moment. And to be honest, it's quite challenging because you have to face multiple challenges. The first one is the result of the architecture. It's not the architecture itself, of course, but delivering a smooth user experience in the user interface at the end of the project, because you can have the best architecture possible, but if in the end the user experience is poor, the software won't be used. That's it. It's over.

So, for me, it's not the architecture itself, but more about user experience. And of course, if you have the best architecture possible, you will also have the best user experience possible. So that was the first challenge.

The second challenge was also about the real-time data. Vincent talked about it, but you know, the customers have to deal with real-time data and not lose any data. That's why you need a robust architecture. You don't want to lose any data at scale. I mean, from one edge device to the cloud, it's quite simple, but when you have hundreds of sites, it's more complicated. You have to deal with store and forward, you have to deal with scalability at the historian level, you have to deal with scalability at the data streaming level, and you have to deal with transforming your data. When you do all of this, you don't want to lose any data. So the third challenge is that you have to take care of sustainability and scalability.

The fourth is maintenance and security. In this kind of big company, they don't want problems with security. So the first thing you put in place is the best possible security, because the more edge devices you deploy in the company, the deeper the hole gets if you didn't take care of it at the start.

Maintenance, of course, I will talk about a bit later, but last but not least is people and training. When you develop software for a big company like this, you have to onboard people, because otherwise they get frustrated when they don't know how to use it.

People are key in this kind of project. And the last one, I guess, is our society. The industry is like our society now: they want it now. They want the best architecture possible with the best user experience possible, but "I want it now, and I want all my sites in one year, please." So, that's how it works and you have to deal with it. You have to make the best decisions possible to make it happen.

So you have to start by thinking about your edge devices. I mean, edge devices are the key to your architecture. And to build this kind of architecture, you have to think at a global level. That means you have to first think about security, of course, but the second thing is versioning. Versioning and security go together. And for this particular problem, you have different tech stacks. One I liked very much was balena.io. If you don't know balena.io, it's quite impressive to use, because you centralize your operating system in one platform and you can push updates whenever you want.

And believe me, if you end up doing it one by one, you're going to waste a lot of time, for example when you configure your firewall.

Okay. We have talked about security. We have talked about version management, which is not only about your operating system; it's also about your application, your edge application, the way you're going to get data from the PLC, for example, to your cloud platform.

You also need to take care of versioning and how you're going to deploy to hundreds of devices when you find something better, fix a bug, or patch a security issue. For this kind of use case, we of course use our own software, but if you want to do it yourself with open source software, I think you should use FlowFuse. FlowFuse is a way to centralize your edge application in one platform, and you can deploy it at the same time to any of your devices.

The third challenge is UNS. We talked a lot about it before. Here is an example of the UNS we have implemented in this kind of use case. You have, of course, the geographical division. Then there is also the machine, the time series sensors, and also the ERP information. So, for me, to be honest, UNS is like asking your kids to clean their room. Some kids will just throw all the toys into a big box, but some will sort each type of toy into its own small box so they can find them faster later. UNS is exactly the same. You can organize your data however you want, but you need to think about how you'll need to access it faster. So, clean your data based on your use case and the needs of the end-user. Otherwise, it's like cleaning someone else's room without knowing how they want to get their toys.

So we made it like this, and it was very much based on the customer's needs. We studied a lot about what they needed. The customers have the business knowledge. As a software vendor, we don't have this kind of business knowledge. We just know how to build stuff, but we don't know how they are going to use it in a real use case.
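As an illustration (not from the talk), the hierarchy described above, geography first, then machine, then time series or ERP information, could map to MQTT topics roughly as follows. The level names, the `UnsPath` class, and the sample values are all hypothetical; the actual hierarchy should come from the end-users' needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnsPath:
    """One leaf in a hypothetical UNS hierarchy: geography, machine, then signal."""

    country: str
    site: str
    machine: str
    category: str  # hypothetical split, e.g. "timeseries" or "erp"
    signal: str

    def topic(self) -> str:
        """Join the levels into an MQTT topic, rejecting reserved characters."""
        parts = [self.country, self.site, self.machine, self.category, self.signal]
        for part in parts:
            # '/' separates topic levels; '+' and '#' are MQTT wildcards.
            if not part or any(ch in part for ch in "/+#"):
                raise ValueError(f"invalid topic level: {part!r}")
        return "/".join(parts)


kiln_temp = UnsPath("be", "site-01", "kiln-2", "timeseries", "temperature")
print(kiln_temp.topic())  # be/site-01/kiln-2/timeseries/temperature
```

With a layout like this, a subscriber interested in every time series of one site could use a wildcard subscription such as `be/site-01/+/timeseries/#`.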

So UNS is very important because it's going to make your data work together in the cloud. For me, it's better to do it on the edge, of course. And UNS is not only about cleaning your data. There are two notions which are very important: sampling and hysteresis, because, believe me, you don't want useless data in the cloud. To get rid of useless data, you have to set sampling and hysteresis in the best way possible. Otherwise, you're going to be flooded with useless data, and useless data is a mess in the end. I will talk a bit more about store and forward because it's also key when you talk about distributed architecture.
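As an illustration (not from the talk), here is a minimal sketch of how sampling and hysteresis can cut useless data at the edge. It assumes "hysteresis" means a deadband: a reading is forwarded only if enough time has passed since the last forwarded reading and the value has moved by at least a threshold. The `DeadbandFilter` class, its parameters, and the sample readings are hypothetical.

```python
from typing import List, Optional, Tuple


class DeadbandFilter:
    """Forward a reading only if it is both late enough and different enough."""

    def __init__(self, min_interval_s: float, deadband: float) -> None:
        self.min_interval_s = min_interval_s  # sampling: minimum gap between publishes
        self.deadband = deadband              # hysteresis: minimum change worth publishing
        self._last_time: Optional[float] = None
        self._last_value: Optional[float] = None

    def should_publish(self, timestamp_s: float, value: float) -> bool:
        if self._last_time is None or self._last_value is None:
            publish = True  # always forward the first reading
        else:
            waited = timestamp_s - self._last_time >= self.min_interval_s
            moved = abs(value - self._last_value) >= self.deadband
            publish = waited and moved
        if publish:
            # Compare future readings against the last *published* one.
            self._last_time = timestamp_s
            self._last_value = value
        return publish


# (timestamp in seconds, oven temperature), made-up values for the demo.
readings: List[Tuple[float, float]] = [
    (0.0, 900.0), (1.0, 900.2), (5.0, 900.3), (6.0, 903.0),
    (7.0, 906.0), (12.0, 906.1), (13.0, 910.0),
]
flt = DeadbandFilter(min_interval_s=5.0, deadband=2.0)
kept = [(t, v) for t, v in readings if flt.should_publish(t, v)]
print(kept)  # [(0.0, 900.0), (6.0, 903.0), (12.0, 906.1)]
```

In this run, seven raw readings become three published ones. The right interval and deadband depend on the signal and on what the end-user needs to see.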

And the last one is monitoring. You need to monitor your edge devices. Without your edge device, nothing on top of it will work. You have to monitor your edge device the same way you monitor the machine. Otherwise, it won't work at all. I guess that's it.

Let me explain with an example of an architecture we made for an edge device. 

Architecture diagram from the talk: Exploring Real-World MQTT-Based IIoT Architectures. Image source: Event Presentation | Image credit: Jean-Romain Bardet

On the left, you have level-one edge protocols, which are pretty much classical. In the middle, you have a UNS that cleans the data. As I said, it cleans the data because there is a template for each machine you're going to connect. You create it in the cloud and push it to your edge device, along with the sampling and hysteresis settings, which are very, very important, as I said before.

Some people say, MQTT is not good for store and forward technology. I mean, if your broker is not on the edge device, I don't get how you can do store and forward with MQTT.

So, for me, MQTT goes with Kafka.

Architecture diagram from the talk: Exploring Real-World MQTT-Based IIoT Architectures. Image source: Event Presentation | Image credit: Jean-Romain Bardet

Kafka is kind of complicated, but it's very, very powerful. If your backend services are cities and MQTT is the road, then Kafka is the highway. You know, you have to pay for it, but you can go faster in the third lane and also slower in the first lane. So, I think they go together, to be honest. So, you have Kafka in the middle of the pub/sub. That means if you break this link, everything goes to Kafka and waits, then synchronizes when the link comes back. It does work well, because we did it. Across the hundreds of devices we have, we depend on bandwidth and other things, and it works well. It was one of the best decisions we made, because we had tried things like SQL Server or other databases on the edge device, and it's quite complicated to make that work. I'm also an MQTT Sparkplug B maximalist. So, to publish the data to the broker, which is a scalable broker like HiveMQ, we use MQTT Sparkplug B.
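As an illustration (not from the talk), the store-and-forward behavior described above, where messages wait while the link is broken and synchronize when it returns, can be sketched with a simple in-memory queue. This is a stand-in for the idea only: the talk uses Kafka on the edge for durable buffering, whereas this sketch loses its buffer on restart. `StoreAndForward`, `FakeLink`, and the topic names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Message = Tuple[str, bytes]


class StoreAndForward:
    """Buffer messages locally and forward them in order once the link is up."""

    def __init__(self, send: Callable[[str, bytes], bool], max_buffered: int = 10_000) -> None:
        self._send = send  # returns True if the message was delivered upstream
        # A bounded deque silently drops the OLDEST message when full.
        self._buffer: Deque[Message] = deque(maxlen=max_buffered)

    def publish(self, topic: str, payload: bytes) -> None:
        self._buffer.append((topic, payload))
        self.flush()

    def flush(self) -> int:
        """Try to drain the buffer; stop at the first failure to keep ordering."""
        sent = 0
        while self._buffer:
            topic, payload = self._buffer[0]
            if not self._send(topic, payload):
                break  # link is down: keep the message for the next attempt
            self._buffer.popleft()
            sent += 1
        return sent

    @property
    def pending(self) -> int:
        return len(self._buffer)


class FakeLink:
    """Test double for the uplink; flip `up` to simulate an outage."""

    def __init__(self) -> None:
        self.up = False
        self.delivered: List[Message] = []

    def send(self, topic: str, payload: bytes) -> bool:
        if self.up:
            self.delivered.append((topic, payload))
        return self.up


link = FakeLink()
saf = StoreAndForward(link.send)
saf.publish("kiln-2/temperature", b"900")
saf.publish("kiln-2/temperature", b"903")
pending_during_outage = saf.pending  # both messages are waiting
link.up = True
saf.publish("kiln-2/temperature", b"906")  # triggers a flush of the backlog
print(pending_during_outage, saf.pending, len(link.delivered))  # 2 0 3
```

A production version would persist the buffer to disk (or to a log such as Kafka, as in the talk) and call `flush()` from a reconnect callback rather than only on the next publish.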

I have a ton of stories about MQTT Sparkplug B, but I don't know if I have the time. Anyway, why MQTT Sparkplug B? Because, of course, the birth and death mechanism is very powerful. Also because MQTT is light on bandwidth, of course. And the single source of truth is very powerful, of course. And on top you have a scalable MQTT broker.
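As an illustration (not from the talk), the birth and death mechanism is visible in the Sparkplug B topic namespace itself, which has the form `spBv1.0/<group_id>/<message_type>/<edge_node_id>[/<device_id>]`. The helper below only builds topic strings; it does not encode payloads, and STATE messages for host applications are left out. The function name and the group, node, and device identifiers are hypothetical.

```python
from typing import Optional

NAMESPACE = "spBv1.0"
# Node-level types start with "N", device-level types with "D".
MESSAGE_TYPES = {"NBIRTH", "NDEATH", "DBIRTH", "DDEATH", "NDATA", "DDATA", "NCMD", "DCMD"}


def sparkplug_topic(
    group_id: str,
    message_type: str,
    edge_node_id: str,
    device_id: Optional[str] = None,
) -> str:
    """Build a Sparkplug B topic string for a node- or device-level message."""
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown Sparkplug message type: {message_type}")
    node_level = message_type.startswith("N")
    if node_level and device_id is not None:
        raise ValueError("node-level messages must not carry a device_id")
    if not node_level and device_id is None:
        raise ValueError("device-level messages require a device_id")
    parts = [NAMESPACE, group_id, message_type, edge_node_id]
    if device_id is not None:
        parts.append(device_id)
    return "/".join(parts)


print(sparkplug_topic("lime-be", "NBIRTH", "site-01-edge"))
# spBv1.0/lime-be/NBIRTH/site-01-edge
print(sparkplug_topic("lime-be", "DDATA", "site-01-edge", "kiln-2"))
# spBv1.0/lime-be/DDATA/site-01-edge/kiln-2
```

In Sparkplug, an edge node registers its NDEATH message as the MQTT Will when it connects, so the broker announces the node's death on its behalf if the connection drops unexpectedly.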

Why did we make it scalable? Because when you build software like we did, you think about thousands of connected devices. We don't want to be capped at something like 100 devices and then have to create a new broker with a new address, and so on. So we went with a scalable broker.

And on the top, you have on the left, of course, the MQTT broker, and Kafka in the middle. Kafka is like the backbone of our cloud solution. That means each microservice gets its data through Kafka. So you get standard data from it. We also have tons of connectors which are not MQTT, obviously: they could be REST API connectors, SQL/JDBC connectors, or any connector that plugs into Kafka.

Kafka is like a toolbox, so it's quite powerful. Every message goes through Kafka and on to each microservice. We have real-time data streaming microservices, historian microservices, of course, data services, and transform microservices, because even if you deal with clean data, you want to transform it sometimes: you want to add alarms, or you want to transform data to build use cases on it. So you need one microservice to transform the data before sending it to the historian or to the real-time data streaming. For the scalable historian DB, of course, there are multiple technologies. It depends on the skills of your team. If your team is more about SQL, you can choose TimescaleDB, which is great. But if you prefer InfluxDB, you can go with it. It's up to your team. So don't bother benchmarking too much, because these technologies are very good.
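As an illustration (not from the talk), the transform step mentioned above, adding alarms to already clean data before it reaches the historian or the real-time stream, could look like the pure function below. The message shape, the `add_alarm` name, and the limits are hypothetical; in a real service the function would sit between a Kafka consumer and a Kafka producer.

```python
from typing import Any, Dict, Optional


def add_alarm(
    reading: Dict[str, Any],
    high_limit: float,
    low_limit: Optional[float] = None,
) -> Dict[str, Any]:
    """Return a copy of the reading enriched with an 'alarm' state."""
    value = reading["value"]
    if value > high_limit:
        state = "HIGH"
    elif low_limit is not None and value < low_limit:
        state = "LOW"
    else:
        state = "OK"
    # Copy instead of mutating, so the raw message can still be archived as-is.
    return {**reading, "alarm": state}


raw = {"topic": "be/site-01/kiln-2/timeseries/temperature", "value": 1015.0}
enriched = add_alarm(raw, high_limit=1000.0, low_limit=850.0)
print(enriched["alarm"])  # HIGH
```

Keeping the transform free of I/O makes it easy to unit test and to run in several replicas at once.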

So in this kind of architecture, you have to think about this: when a microservice goes down, how am I going to deal with the messages? If something goes wrong and you lose data, your customers are going to come to you and say, "I have no data during this time. What happened?" So, you are facing scalability issues and you have to work on them. To scale across thousands of sites, we use Kubernetes, which is a bit like Kafka: it's expensive at first, but very, very powerful. Basically, it makes your microservices resilient. That means, for example, if I take this microservice here, it scales with the load on it. So if there are a lot of people working on it, it's going to scale up until the load is handled. Then it's going to scale down once the load is gone. Even if it fails, another pod is going to pop up and it will work fine. This makes your architecture sustainable and scalable. Otherwise, you just have trouble with losing data. Kafka is very sustainable because if it doesn't deliver a message, the message stays in the queue and it's fine.
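As an illustration (not from the talk), scaling with the load can be made concrete with the rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The talk does not say which autoscaling mechanism is used, so treating it as the Horizontal Pod Autoscaler is an assumption, and the sketch omits details such as the tolerance band and stabilization windows. The function name and the numbers are hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Scale the replica count in proportion to load, clamped to configured bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Two pods at 90% average CPU with a 50% target: scale up.
print(desired_replicas(2, 90.0, 50.0))   # 4
# Four pods at 20% with the same target: scale down.
print(desired_replicas(4, 20.0, 50.0))   # 2
# A load spike is capped by max_replicas.
print(desired_replicas(4, 200.0, 50.0))  # 10
```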

The results: after eight months, we had deployed 10 sites, which was great, to be honest. After 12 months, 25 sites were deployed, and this year we aim to deploy all sites across the 25 countries where they operate. At the moment, we have 100 active users, which is a lot for a monitoring and control system. And I'm quite proud of it: whenever I check the software, I see that there are like 10 or 20 people from the customer company working in it. It means it is working well, and we have very, very good feedback.

And there are two use cases here. The first one was monitoring air pollution. It's quite simple as a use case. But after a few months, they made their own use case, which is the energy consumption of each cycle of the industrial ovens. I value this use case because you deal with millions of dollars' worth of capabilities and functionalities. Getting this use case is really, really great. What I mean is they started small and now they are thinking big because they have the tool to do it. So when you provide the right tool with the right user experience, you can expect the right use case.

Thank you, Kudzai, for letting me talk, and thank you for letting me talk at this kind of event. Sorry for my English, it's not that good. I did my best, and if you want to talk to me, I would be happy to explain it better in French.

Conclusion

The journey to effectively scaling an MQTT architecture across various industrial locations involves navigating a complex landscape of technical challenges and strategic decisions. The insights provided here, based on a practical case study, offer valuable lessons on enhancing system robustness, improving data handling, and ensuring security at scale. As industrial needs continue to evolve, so too must the architectures that support them. We hope this exploration aids in your efforts to implement more resilient and efficient MQTT systems in your own operations.

HiveMQ Team

The HiveMQ team loves writing about MQTT, Sparkplug, Industrial IoT, protocols, how to deploy our platform, and more. We focus on industries ranging from energy, to transportation and logistics, to automotive manufacturing. Our experts are here to help, contact us with any questions.
