Building an Industrial Unified Namespace with HiveMQ and MongoDB
Chapters
- 00:00 - Introduction
- 02:30 - Overview
- 04:05 - HiveMQ usage in industries
- 06:45 - What is a Unified Namespace?
- 10:36 - Logical blocks of building a UNS
- 25:20 - HiveMQ MongoDB Extension
- 26:20 - MongoDB Introduction
- 36:56 - Building UNS with HiveMQ and MongoDB
- 49:47 - Demo
- 58:53 - Q&A
Webinar Overview
As factories become more connected and data-driven, it is essential to have a unified and standardized approach to manufacturing data management. An Industrial Unified Namespace (IUN) follows an event-driven architecture in which different manufacturing applications publish events and context in real time to a central data repository. This results in a decoupled ecosystem, allowing applications and services to provide and consume data when and where needed.
This webinar delves into the powerful synergy of HiveMQ and MongoDB, showcasing how these technologies collaboratively help customers construct a scalable and flexible Industrial Unified Namespace.
Join us as we explore the fundamental principles and data models for aggregating and contextualizing industrial data. We will demonstrate the pivotal role of the HiveMQ Platform in streamlining MQTT transmission and MongoDB Atlas as a robust and scalable database solution.
We’ll share:
The significance of an IUN in Industry 4.0 architectures
IUN architecture design
Data modeling and establishing connectivity to the IUN
The roles HiveMQ and MongoDB play in the manufacturing sector
Real-world industrial use cases
Best practices for scaling and sustaining an IUN
Whether you work in OT or IT, as a developer, architect, or data scientist, this webinar promises valuable insights and actionable advice for elevating your factory data infrastructure.
Key Takeaways
The webinar covered building an Industrial Unified Namespace (UNS) using HiveMQ's MQTT platform and MongoDB's database platform.
HiveMQ:
Provides an MQTT broker for decoupled communication between industrial devices/systems
Offers cloud, self-hosted, and edge connectivity products
Used for OT/IT integration in manufacturing, automotive, transportation, etc.
The Unified Namespace (UNS) concept:
Breaks down traditional automation pyramid by making all components (devices, systems) publish data to a central MQTT broker
Provides real-time access to operational data across the enterprise
Enables edge-driven, lightweight, open architecture
Allows rapidly swapping components without tight coupling
Traditional challenges that UNS addresses:
Inability to handle advanced analytics use cases
Consolidating data from proliferating industrial IoT devices
Point-to-point integration leading to "spaghetti" architectures
UNS logical components:
Control domain (PLCs, SCADA) pushes data to MQTT broker
Enterprise domain consumes data from broker
MQTT broker (e.g. HiveMQ) provides communication fabric
Data platform (e.g. MongoDB) provides data modeling and transformation
MongoDB:
Flexible document data model for schema variety
Built-in search, BI workload isolation, horizontal scaling
Multi-cloud, edge device sync capabilities
Used for IoT, digital twins, supply chain, connected products
The demonstration showed:
Shop floor simulation publishing pump/tank data to HiveMQ
HiveMQ extension persisting data to MongoDB Atlas
MongoDB normalizing data into appropriate collections
Triggers publishing transformed data back to HiveMQ
Charts/visualization and vector semantic search in MongoDB
In brief, the webinar showcased using the HiveMQ MQTT platform combined with the MongoDB database platform to implement a Unified Namespace architecture for industrial data integration, transformation, and analytics.
Transcript
Introduction
Jayashree Hegde: 00:00:06.225 Hello, everyone. Good morning, good afternoon, good evening. I'm Jayashree Hegde, welcoming you all to this exciting webinar on building an industrial Unified Namespace, where our experts will walk you through the concepts of UNS, share insights into data modeling and connectivity inside an industrial UNS, and show how HiveMQ and MongoDB play a key role in many manufacturing use cases. Stay tuned with us throughout the presentation, as we will show you a use case in a live demo. Now, allow me to introduce you all to our speakers for today: Kudzai Manditereza, Developer Advocate at HiveMQ, and Dr. Humza Akhtar, Principal for the Industry Solutions Team at MongoDB. Kudzai, Humza, thank you for taking time today for this presentation. I will let you both introduce yourselves later. Joining them are Nasir Qureshi and Erin from HiveMQ and Samantha from MongoDB, who will be helping with Q&A and other info during the presentation. A warm welcome to you all.
Jayashree Hegde: 00:01:06.185 Before we kick off this session, I would like to share that we are recording this webinar, and we will share the recording and the slide presentation via follow-up email. We will open the floor for questions after the presentation. During the session, if you have any questions, feel free to add them in the Q&A box. We will launch a poll as well later. I request all of you to share your feedback. Now, without further ado, I will hand it over to Kudzai. Welcome, everyone.
Kudzai Manditereza: 00:01:35.711 Thank you. Thank you so much, Jayashree. And welcome, everyone, to our webinar today. So as Jayashree already mentioned, my name is Kudzai Manditereza. I'm a Developer Advocate here at HiveMQ. So basically, my role here is, I take care of all the evangelism around smart manufacturing and industrial IoT, with a particular focus on the Unified Namespace. I will let Humza quickly introduce himself before we proceed.
Humza Akhtar: 00:02:03.231 Yes. Very nice to meet everyone. Thank you for joining our webinar. My name is Humza. I'm a Principal for Industry Solutions at MongoDB. I'm basically the manufacturing and automotive SME. And I've been working on developing smart manufacturing solutions using MongoDB technology. And today we're going to show you a live demo of how we work together with HiveMQ to build a UNS for you.
Problem Statement
Kudzai Manditereza: 00:02:27.778 Awesome. Thank you, Humza. Yeah. So basically, as you can see, we've got a jam-packed session for you today. The problem that we're really going to be addressing here: I'm going to first take you through the basic concepts of the Unified Namespace and show all the logical building blocks of the UNS. And then Humza is going to show you a real live demo and also take you through some of the architectures that we're going to be demonstrating today. And really, the core of the problem that the Unified Namespace tries to address is the fact that we see a lot of digital transformation projects that are failing for a number of reasons. First of all, they fail to address the rapid growth of the advanced analytics use cases that are more and more demanded in the manufacturing space. With current architectures, that is not possible. And also, because we've got a lot of devices that are being rapidly connected to industrial IoT environments with varying contexts of data, consolidating all of that information and making sense of it is proving to be extremely difficult using traditional architectures.
HiveMQ Usage in Industries
Kudzai Manditereza: 00:03:52.218 So basically, this is the problem that the Unified Namespace tries to address, and we're going to really flesh it out for you all today and explain how it does that. So just to quickly take you through what HiveMQ is. HiveMQ is a company based out of Germany, founded in 2012. Initially, it was really focused more on the connected car and mobility space. This is really where HiveMQ cut its teeth. But over the years, it explored all the other different industries, which include manufacturing and industrial automation, transportation and logistics, and also connected assets and products. So now we're really covering a whole lot of different verticals, offering an MQTT platform, which I'm going to talk to you about shortly. So basically, what HiveMQ offers is an MQTT platform. What that means is that it's a suite of products, with the flagship product being the MQTT broker itself. And then we also have an extension ecosystem; the extension we're going to be focusing on here is the one connecting to MongoDB.
Kudzai Manditereza: 00:05:10.028 And then on top of that, we do have some functionality that we recently added to the HiveMQ Platform, like Data Hub, which allows you to perform all the MQTT data validation, and we also recently introduced the ability to perform some transformations on top of that. So the HiveMQ Platform comes in many different flavors. We've got the self-managed edition, which you can deploy on-premises using a Kubernetes Operator and also on OpenShift. And we've also got HiveMQ Cloud, which is a fully managed service that you can readily deploy on all the big cloud vendors with one simple click. And then we also recently introduced HiveMQ Edge, which is basically a connectivity platform that allows you to connect to industrial machines using traditional protocols like OPC UA and Modbus, convert all of that information into MQTT, and integrate it into your IT network. And we also have a variety of HiveMQ clients, including C# and Java, that you can use to build your own MQTT clients that integrate with the HiveMQ Platform. And then we've also got the Security Extension, and you are able to build your own custom extensions if you need to directly transfer all of this information that is coming through your MQTT broker into any application of your choice.
What is a Unified Namespace?
Kudzai Manditereza: 00:06:38.853 So this is basically the HiveMQ Platform in a snapshot. Now let's dive into the topic of the day. To really talk about the Unified Namespace and explore how it works and what advantage it adds over traditional architectures, we need to look at what currently exists, the current lay of the land in the industrial environment. And I believe many of you on this call are already familiar with this automation pyramid, which is basically the way that the current assets or production systems are segmented within a manufacturing enterprise, where you've got your sensors, actuators, your PLC network, your SCADA, MES, and ERP systems. So basically, the issue with this pyramid is that when you need to integrate your data for advanced analytics, it forces you to move that data through all these layers as you propagate it up.
Kudzai Manditereza: 00:07:43.908 So you've got your PLCs connected to your sensors. You've got your SCADA collecting data. And then you've got MES connected to your SCADA. And then you've got your ERP or cloud. Which means you've got this siloed, vertical way of integrating data from one network segment to the other, which is really a big bottleneck. Because what it means is that you need some point-to-point data integration to, first of all, collect data from your shop floor equipment. And then you also need point-to-point integration, again, to move your data from your SCADA into your MES, and so on until you reach the cloud. So a lot of you that have been or are involved in industrial system integration would know that at each point of this integration, you require a certain specialist skill or expertise to be able to do MES integration or ERP integration or SCADA integration. So this is a costly exercise as it is.
Kudzai Manditereza: 00:08:49.585 Number two — this approach means that you're most likely going to end up with a tightly coupled system, because in most cases, you're building native connectors to connect to all the shop floor equipment, which becomes technical debt, as it were, because now you can't easily exchange all of this information. You can't easily scale your architecture because all of these systems are tightly integrated with each other. That might work if you're only connecting one PLC, or just one part of your environment, to a cloud platform. But as you try to build out your infrastructure, as you try to extract more value from your data, you soon realize that this is not a scalable approach, and you cannot integrate more and more equipment into your system without facing the fragility that this brings. So basically, if you do that point-to-point integration, you end up with what we call a spaghetti architecture, which looks like this. Because essentially, each and every component within your system needs native drivers that talk to a particular PLC or a particular gateway or a particular application. So as you can see, as you scale to hundreds or thousands of different devices and systems, this really becomes unmanageable. And this is the reason why a lot of companies are facing a challenge in scaling their digital transformation architecture and failing to realize any benefits out of it.
Logical Blocks of Building a UNS
Kudzai Manditereza: 00:10:35.172 So the Unified Namespace concept really is a way of solving all those challenges. And how it does that — it really knocks down this traditional approach of OT/IT data integration by looking at it from a perspective where each and every component in your system needs to become a node in your data ecosystem, a node that plugs into a common data infrastructure. So as opposed to you connecting your devices to certain applications, you're now connecting your devices and applications into an open infrastructure. So typically, how this is organized is that you've got your control domain, which consists of all your PLCs, your SCADA systems that are collecting all of this information and then pushing it out into a common data infrastructure. And then you've also got your systems that are collecting all of this information and doing some calculations and then pushing that information also into a common infrastructure.
Kudzai Manditereza: 00:11:41.907 So basically the idea is that instead of having a master, or rather a client, that is continuously polling and asking for information, trying to collect all of this data from your field and propagating it up from network segment to network segment as you go up, you instead have this one central hub where all these applications simply plug into that common infrastructure. So number one — that eliminates all these different layers of integration that you need to deal with. And number two — because you're all publishing this information through a common infrastructure, it means all your components have only got one common endpoint or one common interface with which they need to interact. And that interface is an open interface, which means that, first of all, you're not locked into a system and you do not have any technical debt, because you can easily exchange your systems. So it doesn't matter if you've got a PLC from one vendor and another PLC from another vendor. If they're all communicating or pushing their information into a common data infrastructure using open technologies, they can easily be swapped out. So this is a really loosely coupled architecture.
Kudzai Manditereza: 00:13:00.976 And one of the core principles around this idea of a Unified Namespace, where you've got all your components being nodes in an ecosystem pushing all of that information into one central hub, is the idea of information being edge-driven. So instead of an application that is coming out from IT, reaching out into OT and asking for information to say, "Has the temperature changed?" Or, "Has so and so occurred?" All of this information is instead pushed from the edge into the common data infrastructure whenever a change has been detected, which relates to another of the principles, which is report by exception. Now, because you've got all these events and states being pushed from the edge, from all these components of your ecosystem, into a central hub or a common data infrastructure, it gives you a snapshot of the current state of your operations. So if you need to find out, at any given time, what is the current state within any part of your business, at any business unit, at any domain of your business, you just simply need to connect to your common data infrastructure, which is an MQTT broker. And then you can get that real-time snapshot of your enterprise.
Kudzai Manditereza: 00:14:22.002 And one thing to also emphasize here is the idea of an open architecture that allows you to reuse components. But not only that, it really allows you to experiment, because what digital transformation really is about is trying to make sense of the data, trying to extract value from your data to make you operate more efficiently. And that's not something that you can already know out of the box how to do. It means that you're going to have to perform a lot of experiments, a lot of data experiments. You're going to have to come up with hypotheses to say, "What if we do this? Are we going to operate more efficiently or not? What if we change this? Are we going to improve quality or not?" So for you to be able to rapidly experiment, you need to have an open architecture that allows you to interchangeably and easily connect systems just by plugging them onto the infrastructure, instead of having to sit down and start to write custom connectors to get your data. Otherwise, at the end of the day, you end up spending time focusing on the compatibility of your systems rather than on experimenting with how you can actually get value from the data.
Kudzai Manditereza: 00:15:31.771 And another point to raise here is the idea of it being lightweight, because you have situations where you've got thousands or millions of tags in really large-scale systems. So you really want your data to be lightweight. You want to use a protocol that makes sure that your data is compliant with an SLA. Some companies would say, we really need to have this data hit the servers or be available across our enterprise within, say, two seconds, right? So if you have got all of this data to account for, which means you've got limited bandwidth and you need to move all of this data and reach your SLA goals, it means that you need a lightweight protocol that allows you to hit those SLAs. So those are the core principles that really drive the Unified Namespace and make it a suitable architecture for OT/IT integration and your digital transformation. So we've spoken about this idea of using a decoupled system with a central hub of information to which all the different components connect and publish their data, and from which they also consume data. The hub is not only for pushing data; it is also for consuming data. So you only need that one common endpoint to do that. Now, there are multiple ways that you could do that, but MQTT, over the years, has really emerged as the de facto protocol that allows you to implement such an architecture with all the principles that I've highlighted previously: open architecture, lightweight, report by exception, and edge-driven.
Kudzai Manditereza: 00:17:16.107 So for some of you on this call who are not familiar with MQTT: MQTT is an IoT protocol that allows you to create a network of components that exchange information using a publish/subscribe pattern of communication. As opposed to request-response, where you need a direct connection between, say, a client and a server, with MQTT you've got what is called an MQTT broker, which is a server that acts as a mediator and coordinates communication between your different systems. So all your components don't need to know anything about each other at all. They just need to know the broker, and then the broker handles all of this communication. Because of all this capability, MQTT is the de facto protocol for implementing the Unified Namespace.
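For readers new to MQTT, here is a minimal sketch of the publish/subscribe pattern using the open-source MQTT.js client. The broker URL, topic, and payload are illustrative, not taken from the demo:

```javascript
// Minimal MQTT publish/subscribe sketch with MQTT.js (npm install mqtt).
const mqtt = require("mqtt");

const client = mqtt.connect("mqtt://broker.local:1883"); // placeholder broker

client.on("connect", () => {
  // A consumer only needs to know the broker and a topic, not the producer:
  client.subscribe("site1/line1/tank1/level");

  // A producer publishes state changes; `retain: true` keeps the last value
  // on the broker so late subscribers immediately get the current state.
  client.publish(
    "site1/line1/tank1/level",
    JSON.stringify({ value: 172, unit: "cm", ts: Date.now() }),
    { qos: 1, retain: true }
  );
});

client.on("message", (topic, payload) => {
  console.log(`${topic}: ${payload.toString()}`);
});
```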
Kudzai Manditereza: 00:18:14.193 Now let's dive into the idea of a Unified Namespace a bit here, because we've established that a Unified Namespace can be implemented using an MQTT broker, with the idea of having all of this information in one place, having this real-time snapshot of your enterprise. Now, the MQTT broker on its own is really just a medium for exchanging information. So on its own, it is not enough to create a Unified Namespace. You still need a platform that will allow you to contextualize all of this information, because, as you would know, all this information that is coming out of your PLCs is really in a raw format. Some of it, the naming convention is not understandable. You've got electrical engineers who are writing this PLC code. Sometimes they use notations such as x001. So if that information lands in a database, you've got IT personnel who need to make sense of this information. And they've got no idea what that information means, even though it might mean this is a valve in a certain compressor, so to speak. So for this idea of making sure that you've got a consistent way of naming information across your enterprise, you need a platform that allows you to do that.
Kudzai Manditereza: 00:19:34.167 You also need a platform that allows you to model your data in a way that it is understood. Because when your data is coming out of your PLCs, it mostly comes as discrete tags, even though it represents, say, a complete object. So you want to have a way of actually creating models of all your objects within your enterprise, and then being able to map all those discrete tags to create that unified model of an object, such that when that information lands in a database, such as MongoDB, we are able to actually visualize it in its full complexity and understand that this is a pump that consists of such and such attributes. So you need a platform that allows you to do all of this normalization and transformation. You also need to convert your information between systems: perhaps you've got a plant in the United States that is using degrees Fahrenheit and another plant in Germany that is using degrees Celsius. When that information hits a central database, it's not going to make much sense. So you also need to be able to normalize. So this is really where HiveMQ and MongoDB are a perfect combination to create that core of the Unified Namespace, where you've got the MQTT broker handling all the communication and the decoupling, and you've got the MongoDB data platform doing all the modeling, the contextualization, transformation, and normalization, and pushing that information back into the MQTT broker. And you're going to see all of that in action when Humza shows you later on in this talk.
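As a small illustration of the unit normalization described here, the sketch below uses a MongoDB aggregation to convert Fahrenheit readings to Celsius; the collection and field names are hypothetical, not from the demo:

```javascript
// Normalize mixed-unit temperature readings to Celsius.
// Collection and field names are hypothetical.
db.rawReadings.aggregate([
  {
    $addFields: {
      value: {
        $cond: [
          { $eq: ["$unit", "degF"] },
          { $multiply: [{ $subtract: ["$value", 32] }, 5 / 9] }, // F -> C
          "$value" // already Celsius, pass through unchanged
        ]
      },
      unit: "degC"
    }
  },
  { $merge: { into: "normalizedReadings" } } // persist the normalized view
]);
```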
Kudzai Manditereza: 00:21:12.607 Now I've got here for you a reference architecture, or model, of what a Unified Namespace would typically look like when put together. So as you can see here, what we have is a HiveMQ broker that is deployed at a site. This allows you to create a localized namespace within a site. And then you could have edge connectivity software like HiveMQ Edge, which is open source, by the way. So if you want to try that, you can download it from GitHub. I think Jayashree should be able to share a link for you to download that, and you can try it out with no limitations at all. So HiveMQ Edge allows you to connect to all your machines and then create a namespace, or organize your data, before you push it out into your site-level Unified Namespace, where data also gets organized based on that site. And then that is pushed up to an enterprise Unified Namespace, where this data is then organized in a globally understood namespace. And then from there, this is where we can connect to MongoDB and persist all of that information. And then MongoDB can do some massaging of the data and publish it back into the Unified Namespace again.
Kudzai Manditereza: 00:22:35.750 So this is the reference architecture, to give you a picture of what that looks like. Now, this is also something that gives you an idea of what that looks like in practice. Using an MQTT topic namespace, you are able to design, or lay out, the topic namespace that allows all the components to access all of this data. Because we spoke about the idea that we've got components that are pushing information to a centralized MQTT broker, but how that information is organized within that broker is really the crux of the Unified Namespace, right? That idea that information needs to be understood intuitively. So, following the way that your organization is already organized, MQTT's topic structuring capability allows you to recreate the hierarchy that currently exists within your organization and create pockets of information at every level of your organizational hierarchy, in a way that allows any component that already understands how your organization is arranged to explore that topic namespace and find the information that makes sense to it.
Kudzai Manditereza: 00:23:53.893 So if I'm a developer and I need to find out what information is on a certain line at a certain site, all I need to do is traverse that topic namespace: go to the enterprise, look for the site, look for the line, and then I automatically find all of the information that is being published under that specific line. Or if I need to go down into a specific cell, I'm able to do that semantically, in a way that I already understand from an organizational hierarchy point of view, without having to have that information hardcoded. So this really allows you to have the real-time data access that lets all the different components within your organization access information as and when they need it. And this is really key because it allows you to build your business model around your data, because all the information that you can get in real time is a true reflection of the current state of your business. So if you've got a shipping component that needs to understand the current state of a certain product that is being produced, it will get that information straight from that real-time data access. So as you can see, this is where you've got your information that has been contextualized, whether it is KPIs or OEE or work order information. And we're also going to see a lot of that in action when Humza shows you.
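To make the topic-namespace traversal concrete, here is a sketch of an ISA-95-style hierarchy and a wildcard subscription against it; the names are illustrative, not from the demo:

```javascript
// Topics mirror the organizational hierarchy, e.g.
//   <enterprise>/<site>/<area>/<line>/<cell>/<metric>
//   acme/stuttgart/assembly/line1/cell3/temperature
const mqtt = require("mqtt");
const client = mqtt.connect("mqtt://broker.local:1883"); // placeholder broker

client.on("connect", () => {
  // '+' matches exactly one level, '#' matches the remaining subtree,
  // so a developer can "walk" the namespace without hardcoding every topic:
  client.subscribe("acme/stuttgart/+/line1/#"); // everything on line1, any area
});

client.on("message", (topic, payload) => {
  console.log(topic, payload.toString());
});
```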
HiveMQ MongoDB Extension
Kudzai Manditereza: 00:25:20.707 So this is basically a diagram that shows you the integration between HiveMQ and MongoDB. We have got an extension, which basically is a capability that runs within your broker, that scales with your broker as well, and that allows you to persist data directly into your MongoDB data platform without you having to write any code. You just put in the configuration information, and then you're able to get all of your information into MongoDB. And from there, there's a whole lot of tools that you can use to dig into your information. And you can also transform that information and publish it back to HiveMQ again. So I hope this gives you a basic understanding of what the Unified Namespace is all about, and also sets the stage for when Humza shows you the demo. So that's it for me. I'll hand it over to you, Humza.
MongoDB Introduction
Humza Akhtar: 00:26:22.951 Thank you, Kudzai. I'll share my screen so it's easier, because I have to move towards the demo as well. Hope you can see my screen now. All right. So thank you, Kudzai, again for going through all the logical concepts of how a Unified Namespace works and what benefits it brings. What I'm going to do now is introduce the MongoDB developer data platform to you. And then we're going to start building a Unified Namespace, and I'll show you each and every single component in action. I hope you'll find it very interesting. So as a start, as an introduction to our company: MongoDB provides a general-purpose, document-based, distributed data platform to build modern applications quickly, build them with flexibility, and apply them to a wide variety of use cases. Our core value proposition is to make developers and the development process as productive as possible so that, as developers, you can work on innovation rather than dealing with the trouble of managing your database, managing infrastructure, and working on schema changes all the time. So we provide you a very flexible database, and then a data platform on top of it with additional features that I will share with you.
Humza Akhtar: 00:27:50.044 MongoDB has grown a lot as a company in the past few years. We have 40,000 clients that are using our data platform. And we are in many countries, as you can see in some statistics listed on the right-hand side of this slide here. When it comes to manufacturing, we truly support our clients across the end-to-end value chain, from supply chain use cases like track and trace, logistics management, logistics optimization, and inventory management, to factory-specific use cases; for example, clients are using us to build their IoT platforms, with MongoDB as the central data store for all the raw IoT data. We also help in use cases related to creating digital twins of assets, and even creating full-blown virtual factories and connecting the workforce together so that they have the right data from the processes that are running in the factory. Then we have a good presence on the retail and commerce side. And then finally, when it comes to connected vehicles and connected products, products that are out in the field sending time series IoT data back into a central MongoDB cluster, we are pretty prevalent in that as well. And fleet management is one of the nice use cases, where many of our clients are using us to connect their fleets of trucks and fleets of cars and so on. So some of the names you see here, these are all published stories. We have tens of thousands of customers out there that are using us in a wide variety of industries. Not just in manufacturing, but also in the retail sector, financial services, telecom, insurance, healthcare, and so on.
Humza Akhtar: 00:29:41.926 So what exactly is MongoDB Atlas, or the MongoDB developer data platform that I mentioned before? On the right-hand side, you see this hamburger diagram. On the top, you see "Document Model" written. This is exactly what our database, MongoDB, is. It is a document model database, meaning that all of your data, instead of being stored in rows and columns in a tabular format, is going to be stored as JSON documents. Internally, inside MongoDB, we convert that JSON into binary JSON, or BSON. But from a user perspective, you are pushing data as JSON to MongoDB and then retrieving it back as a JSON document. That means that you get this flexibility where you can use the same database to store not just key-value pairs, but relational data, any sort of graph information, any geospatial information, time series data, or objects, encoded directly into one database. So that reduces the complexity from the get-go. And then however you are storing data inside, you access it via one unified interface, or API. So that also reduces the complexity of the final application. For example, if you are setting up a remaining-useful-life prediction application for a machine, you would not just need the sensor data that is capturing important parameters such as vibration or temperature values. You would also need to store the data related to the maintenance logs, or the maintenance that happened on that machine: when was the last maintenance done?
Humza Akhtar: 00:31:20.831 So instead of having a time series database and a relational database, you can combine all this data inside MongoDB and serve your application, or serve your ML model, from MongoDB. So that's the benefit of the MongoDB flexible document-based database. And then on top of that, everything you see in blue here is the additional capability that we provide with MongoDB Atlas, our developer data platform. You get the search engine right out of the box. So our clients don't need to set up a separate search engine to do full-text search on the data that's stored in the database. They can do it right inside MongoDB Atlas. So there's no need to do any ETL jobs to keep two separate engines, one database engine and one search engine, in sync. And you can extend it to perform semantic search and enable GenAI use cases. And I'll show you one simple example later on in the live demo. You can actually separate your database nodes, with nodes serving your operational applications that need real-time, fast access, and analytical nodes as well, which are connected to your BI tools such as Power BI or Tableau, where you're running long-running analytical or BI queries. So it supports both operational and analytical use cases, and provides you with that workload isolation so that both your analytical and operational applications can perform in the right manner.
Humza Akhtar: 00:32:55.905 Finally, when your data increases, MongoDB scales horizontally. This is what we call sharding, meaning that you don't need to keep investing in bigger servers and keep scaling vertically. You can scale horizontally, meaning MongoDB will create shards with the same configuration, take care of splitting the data across these various shards, and take care of load balancing for you. You just need to specify what kind of sharding, or horizontal scaling, you want, whether it's range-based, hash-based, or location-based. And location-based is important because, in many cases, for regulatory reasons, you would need the data that's generated in one particular site or one country to stay in that particular country, right? And on top of that, when it comes to mobile apps, or IoT gateways where you need some sort of local storage on the edge, we provide something called Atlas Device SDKs. It's like a small object-oriented database that you can install in a mobile app; it becomes part of your code. It's object-oriented, so you don't need to design tabular schemas for it. Or you can put it inside an IoT gateway. Our clients are looking to use it inside actual cars so that they can enable connected car applications. So that little database will store data locally, and then it will automatically synchronize this data to the cloud, in this case, to MongoDB Atlas. So we provide you with bidirectional synchronization right out of the box.
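For reference, the sharding strategies mentioned here are configured roughly like this in mongosh; the namespaces, shard keys, and zone names are illustrative:

```javascript
// mongosh: shard collections so MongoDB splits and balances the data.
sh.enableSharding("uns");
sh.shardCollection("uns.measurements", { "metadata.site": 1 });       // ranged
sh.shardCollection("uns.rawdata", { "metadata.deviceId": "hashed" }); // hashed

// Location-based placement uses zones, e.g. pin one site's data to EU shards:
sh.addShardToZone("shardEU01", "EU");
sh.updateZoneKeyRange(
  "uns.measurements",
  { "metadata.site": "berlin" },       // range lower bound (inclusive)
  { "metadata.site": "berlin\uffff" }, // range upper bound (exclusive)
  "EU"
);
```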
Humza Akhtar: 00:34:31.847 And then finally, MongoDB is fully multi-cloud and highly secure. When I talk about multi-cloud, I mean you can deploy MongoDB Atlas on AWS, Azure, or GCP, or you can even split the database nodes across different cloud providers. So let's say you want to use one particular service from AWS and another service from GCP; you can have the same database split across two cloud provider regions. And this multi-cloud, multi-region functionality frees you from hyperscaler lock-in and also gives you global reach and scalability when you're trying to publish an application that has a global audience, right? So those are some of the key features of MongoDB Atlas. And with that, I'll go into the main topic for today, where we're going to build a Unified Namespace. Some of the key requirements that we're going to look at: I already mentioned most of these, so I'm just going to quickly recap them. We are looking to provide connectivity to data producers and consumers. You can have an application that is both publishing and subscribing to data at the same time. So these are our data producers and consumers. We will use a manufacturing execution system. We will use assets on the shop floor that are pushing out data to HiveMQ. Then we need data modeling flexibility to represent the data in that tree structure, so that we can clearly see that we have a site which has a number of lines. And then in each line, we have a number of tanks and pumps. And then for each pump, for example, we have the status information, whether it's on or off.
Building UNS with HiveMQ and MongoDB
Humza Akhtar: 00:36:25.981 And then for each tank, we have the level information, or what's the level in that particular water tank, for example. So we need to be able to transform the raw data that is coming from these assets and then store it inside the database. And the database needs to be able to handle the time series data coming in at high speed and high volume as well. So with this simple set of requirements, let's look into what we're going to build today. We have an example site here that has two lines, and each line has a pump and a tank. So what happens is that Pump 1 takes water from a central reservoir and pushes it into Tank 1. And then Pump 2 is like a transfer pump, so it takes the water from Tank 1 and pushes it into Tank 2, and then it's going to be consumed by other equipment on the site. For the sake of simplicity, we are just looking at these four assets under two lines and under one site. So they are sending out raw data, publishing it to the MQTT broker, which in our case is HiveMQ. And then what happens at HiveMQ is that it's going to push that information, using the HiveMQ Extension for MongoDB, inside MongoDB Atlas. So that data is pushed to MongoDB by the extension that I mentioned. It's a very nice extension. We are very proud of it. And it opens up that ease of connectivity between HiveMQ and MongoDB Atlas.
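For the high-rate sensor requirement, MongoDB has native time series collections. A sketch of how the demo's raw data store might be created follows; the names echo the demo, and the options are assumptions:

```javascript
// Create a time series collection tuned for high-rate sensor ingest.
db.createCollection("rawdata", {
  timeseries: {
    timeField: "timestamp",  // when the measurement was taken
    metaField: "metadata",   // per-sensor labels (sensorID, parent_equipment, ...)
    granularity: "seconds"   // expected interval between measurements
  },
  expireAfterSeconds: 60 * 60 * 24 * 30 // optional: age out raw data after ~30 days
});
```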
Humza Akhtar: 00:37:59.069 And then once the data is in Atlas, we will structure the data as it comes in. We will set up a trigger, and I'll show you how that is set up. So we structure the data and push it to the right collections inside MongoDB. One thing to know here: when you talk about MongoDB as a database, there are no rows, tables, or columns. What you get is a database, and inside the database, you have collections. And these collections are collections of JSON documents. So when I talk about a collection, think of what would be a table in a relational database or SQL, where each row is a JSON document, right? So we'll structure the data and push it to the right collections. If you're getting level information from the tanks, you push it to the levels collection. If you're getting status information from the pumps, you push it to the status collection. And the raw data can stay in the raw data collection. And then this document model that we have enables that flexible storage, because in each JSON document that you have in any collection, you can add fields at runtime. You can remove fields and values at runtime. You don't need to worry about bringing the database down to do any schema changes, performing joins between different tables, or making sure that your schema stays performant. MongoDB gives you that flexibility and, on top of that, the speed to serve your applications in a real-time manner.
Humza Akhtar: 00:39:30.338 So once the data is in MongoDB, we have a feature in Atlas called Atlas Charts, which you can use to quickly set up your dashboards to visualize the data that is coming in. This is a very neat technology that we have, which allows people to quickly build visualizations to see what is happening with the data in the database, and it also gives you the freedom to embed these charts into your end-user application. I can show you that as well. And finally, we can actually send data back from MongoDB to HiveMQ. So let's say we receive some information from HiveMQ in the Atlas platform, and we transformed it, we worked on it. We can use a feature called Atlas Triggers, which will listen to a collection. And as soon as new data comes in, it will run a function. And that function can be used to publish data back into HiveMQ, where there are subscribers listening for that data. And in our case, we are actually using one external application, a manufacturing execution system from a company called Arcstone. They're based in Singapore, and they have offices in many other countries, including the US as well. A pretty neat MES provider. And they gave us their MES sandbox environment for us to test and work on this demo. So we're grateful to them.
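On the embedding point mentioned above, Atlas Charts ships an embedding SDK. A minimal sketch follows; the base URL, chart ID, and container element are placeholders:

```javascript
import ChartsEmbedSDK from "@mongodb-js/charts-embed-dom";

const sdk = new ChartsEmbedSDK({
  baseUrl: "https://charts.mongodb.com/charts-your-project-xxxxx", // placeholder
});

const chart = sdk.createChart({
  chartId: "00000000-0000-0000-0000-000000000000", // placeholder chart ID
  height: 400,
});

// Render the chart into a container element in the end-user application.
await chart.render(document.getElementById("tank-levels"));
```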
Humza Akhtar: 00:41:02.000 So we connected their manufacturing execution system to HiveMQ, and they are listening on the specific topics which the MES is interested in. So we are pushing some of the transformed data back from MongoDB Atlas to HiveMQ, and then it goes into the manufacturing execution system. You can connect more and more applications as you see fit and based on your use case. But in this particular example, we are just using one third-party application. So the UNS that is formed here is a combination of the HiveMQ broker and the MongoDB Atlas platform. It is doing all the data transformation. It is doing data publishing. It's doing data visualization. It's doing the data storage, and it is acting as that single source of truth for all the other applications that are outside of the UNS and want to get the right information at the right time from the site. So some of the features that we're going to look at — and I'll quickly go through them in a couple of slides before we go into the demo. We're going to look into what the document model means and how you can use the aggregation framework that MongoDB provides for data transformation. We're going to look into how you can traverse through graph data. And we're going to look at Atlas Charts and Atlas Triggers, which respond to events such as new data coming in or transformed data getting saved into a collection. How do you trigger an event?
Humza Akhtar: 00:42:29.423 And then since everyone is talking about GenAI, we thought we'll talk about it too. Actually, MongoDB Atlas is a pretty neat platform to help you quickly create semantic search or GenAI use cases. And we have done it in our demo as well. So we'll show you how easily you can do that too. So starting with the document model. As I mentioned, MongoDB is a document database. On the right-hand side, you see an example of the JSON documents that you'll find in a typical MongoDB setup. These are examples of time series data. You will see there is a timestamp, and there is some metadata that has the `sensorID`, what the sensor type is (it's `temperature`), and what the `parent_equipment` is that this temperature sensor is attached to. And then we have the actual values and the units. And you can add more metadata as you want. But this is essentially what a time series data point can look like. And you can have different types of time series data. You can have alerts, and you can have non-time-series data as well, which you can use to represent, say, your production collection: all the production assets, like what's my machine ID, manufacturer name, where it's installed, what's the current status, and so on. So the nice thing here is that in one collection, you can have different types of time series data. You can have different metadata. And for a non-time-series collection, each document can look different from another, and that is totally fine.
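A data point along the lines described here might be inserted like this; the values are illustrative:

```javascript
// One time series data point: timestamp, metadata labels, value, unit.
db.rawdata.insertOne({
  timestamp: ISODate("2024-01-15T10:30:00Z"),
  metadata: {
    sensorID: "TT-101",        // illustrative sensor name
    type: "temperature",
    parent_equipment: "Pump 1" // the asset this sensor is attached to
  },
  value: 74.2,
  unit: "degF"
});
```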
Humza Akhtar: 00:44:09.714 The second thing is aggregations. As we have the data coming in, MongoDB provides you with a nice aggregation framework to do data processing for data transformation and analytics. So I'm showing you an example here. Let's say I have an orders collection, and as I mentioned before, each collection in MongoDB will have different JSON documents. So I have simple customer order data here. Three of the documents have a status of accepted, and one of them is declined, for example. So I can use a match operator to filter out all the documents that have a status of accepted. It will act as a filter, and then I can group these filtered values into another set of documents, which will tell me, okay, for this particular customer ID, A123, what was the total amount: 750. So I can use this Unix-pipeline-style set of operators, putting them one after another, to do all these data transformations inside MongoDB, which is pretty nice. You can create very complicated aggregation pipelines that way. Then, once you have transformed your data, you can use Atlas Charts to do your real-time data visualization. Atlas Charts is built for JSON data. It's integrated inside Atlas and can create very neat visualizations. There are tons of charts and widgets provided, and you can embed them via iframe, or there are SDKs provided as well that you can use to embed the charts you've created into your applications, whether it's a mobile app or a web app.
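The pipeline walked through here corresponds to something like the following; field names and values are illustrative:

```javascript
// Stage 1 filters accepted orders; stage 2 groups them and sums the amounts.
db.orders.aggregate([
  { $match: { status: "accepted" } },
  { $group: { _id: "$custID", total: { $sum: "$amount" } } }
  // e.g. yields { _id: "A123", total: 750 }
]);
```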
Humza Akhtar: 00:45:58.199 So we talked about the document model, how the data looks when it comes in, how you transform it, how you visualize it. And then if you want to publish the data back into another system, let's say HiveMQ, you can use triggers. Atlas Triggers can be set up through the Atlas user interface. And how it works is there are two types of triggers. One is the database trigger. It will respond to any changes, inserts, or deletions in a specific collection. So you will set up the trigger to listen to a specific collection in the database. And as soon as a new document comes in, for example, it will run a function. You can add your own logic to that function, and the data transformation that that function does can be stored in Atlas and pushed to a third-party service. You can also have scheduled triggers. So you can set up some timing. Let's say at the end of my shift in manufacturing, I want to run a trigger that will look into my performance and quality data collections and calculate my OEE, just one time per shift. So you can set up those scheduled triggers as well, and then calculate the OEE and store it in a separate collection as a separate document.
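A scheduled trigger of the kind described might look like the sketch below. The collections, fields, and shift logic are hypothetical; only the `exports` entry point and the `context.services` accessor follow Atlas Functions conventions:

```javascript
// Scheduled Atlas Trigger function: run once per shift to compute OEE.
// Collection and field names are hypothetical.
exports = async function () {
  const db = context.services.get("mongodb-atlas").db("UNS");

  // Per-shift aggregates assumed to be maintained elsewhere during the shift:
  const perf = await db.collection("performance").findOne({ shift: "A" });
  const qual = await db.collection("quality").findOne({ shift: "A" });

  // OEE = availability x performance x quality
  const oee = perf.availability * perf.performance * qual.qualityRate;

  await db.collection("kpis").insertOne({
    kpi: "OEE",
    shift: "A",
    value: oee,
    computedAt: new Date()
  });
};
```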
Humza Akhtar: 00:47:09.671 Finally, vector search. So Atlas, as I mentioned, provides you with full-text search functionality. We have expanded that to include vector search, or semantic search, as well. What that means is, if you have been reading about and working on GenAI applications, you need to take your unstructured information, your PDF files, your images, your videos, and create vector embeddings out of it, and these vector embeddings are arrays of decimal numbers that have many dimensions. So what we provide is the ability for you to store your vector embeddings right next to your actual data. And then, on these vectors that are stored inside MongoDB, we provide the capability to do semantic search. So let's say I have a PDF file that describes the repair procedures for how to fix a pump. I create vector embeddings of it using Hugging Face or any of the other embedding models out there in the market, and there are many. And then I can store those vector embeddings inside MongoDB. And then we provide the Atlas Vector Search capability. So, actually, in your code, you can do a semantic search on top of those vector embeddings, and you can pass in your query. That query will also be converted into vector embeddings. We will automatically find the closest neighbors to it and then return the results of the search.
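In Atlas, that search is expressed as an aggregation stage. A sketch follows, assuming a Node-driver `db` handle, a vector index named `manual_vector_index` over an `embedding` field, and a pre-computed query embedding:

```javascript
// Semantic search over embedded manual chunks; queryEmbedding must come from
// the same embedding model that was used to embed the documents.
const results = await db.collection("manuals").aggregate([
  {
    $vectorSearch: {
      index: "manual_vector_index",
      path: "embedding",
      queryVector: queryEmbedding,
      numCandidates: 100, // ANN candidates considered
      limit: 5            // top matches returned
    }
  },
  { $project: { text: 1, score: { $meta: "vectorSearchScore" } } }
]).toArray();
```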
Humza Akhtar: 00:48:39.166 Now, these results can feed into an LLM, and that way, the LLM application will not hallucinate, and the GenAI application will give you the right answer. As an example, suppose you ask a question to your chatbot: "Why did the pump failure happen on my pump during this particular job?" Now, in Atlas, you have all the data. You have the job information. You have the pump sensor information. You have the technical manuals as vector embeddings. You can maybe have pump sound information also stored inside as vector embeddings. Now this can provide the LLM with the right context so that it can give you a specific, well-informed answer: that the pump failure during this job on your machine likely occurred due to high vibration and worn bearings. So instead of hallucinating and giving you a wrong answer, based on the data that's stored in Atlas and the vector embeddings that are stored in Atlas, it gets the right context. And that is actually how the RAG pattern works. So I'll show you that in action as well.
Demo
Humza Akhtar: 00:49:47.091 And with that, let's go to the demo for the next five minutes. I wanted to show you a couple of things that we have done. And this is important here for us to go back to. So we have set up the site and a couple of lines with pumps and tanks. So I used [inaudible] to set it up. If you see here, I have Pump 1, which is pushing water into this Tank 1. You can see the water level changing. And then I have a transfer pump, which is taking water from Tank 1 and pushing it into Tank 2. And the way both pumps work is based on the start and end set points. And we have the tank alarms here as well. So as the tank level goes down below, let's say, 135, it will show the low alarm. If it goes up above 215, it will show the high alarm. Same for Tank 2. The status, the run mode of each pump, changes based on the set points that we set for each pump as well. So right now, this one is stopped. But as it comes down below 150, it will start again. So every second, we are running the pumps, and this data is generated. And the data that we are sending to HiveMQ is the level information and the status of the pumps. We are sending the alarms. We are sending the set points as well. So you can see that now, Pump 2 has started running and the water level is going up. As soon as it goes above 200, the pump will stop and the water level will come down again. So it's pretty neat. So all of this data that I mentioned, the levels, the status of the pumps, and also the alarms, we are pushing it to HiveMQ.
Humza Akhtar: 00:51:40.246 This is the dashboard for the HiveMQ broker that is set up on an AWS EC2 instance; it's very easy to set up. And then here, if I go into the extensions, you can see that all of these entities that I showed you before, the pumps and tanks, are sending data to a particular raw data topic. And all the data that we receive on this topic is pushed to MongoDB Atlas through the HiveMQ Enterprise Extension for MongoDB. So that's pretty neat. I set up a route from MQTT to MongoDB, and it's using the extension. Now, this means that all the data from my site is being sent to HiveMQ, and from there to MongoDB. So how does it look inside MongoDB? This is the MongoDB Atlas interface, for those seeing it for the first time. Inside here, I set up a database called UNS. You can create more databases in our clusters. And then inside each database, I have a series of collections. Let's look at the raw data collection, because that's where the data from HiveMQ is coming in. You can see the data looks like this. You can set up the quality of service inside the extension settings at HiveMQ, but the payload that I'm receiving is actually a Base64 string. So I need to transform this data. This data in its current form is of no use to the end-user applications. So what I'll do is set up a trigger. We talked about Atlas Triggers before. Inside MongoDB Atlas, we can set up triggers. And what triggers do is, as I mentioned, look for changes (document insertions, updates, or deletions) in the collection that you ask the trigger to watch.
Humza Akhtar: 00:53:34.140 So let's look at the standard message trigger here that I set up. All of the triggers are enabled. For the standard message trigger, I just need to enable it, and I ask it to look at the Unified Namespace cluster, the UNS database, and the raw data collection shown before, which had those Base64 strings. So as soon as there is an insertion (we're watching for the insert operation), it will run a function. Here you see the function. I wrote this little JavaScript function, which just looks at that data and moves it to the right collection. If the data looks like level information, it will move it into the levels collection. If it's an alarm, it will move it into the alarms collection. And if the data is the status of a pump, it will move it into the status collection. And looking here, if you look at, let's say, the levels collection, which the trigger has populated, you can see the data is now very easy to read, and now it can be published back into HiveMQ. So the level, type level, value of the level, and this is Tank 1. Similarly, Tank 2 has the value and the type level. So I'm transforming the data as soon as it comes in and storing it in a separate collection. And then what I can do is run another trigger. As soon as I have data in the levels collection, I can run another trigger to actually push it back into HiveMQ.
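A condensed sketch of the kind of routing function described here follows. The payload decoding and the `type` field are assumptions about the demo's message shape; the `exports` entry point and `context.services` accessor follow Atlas Functions conventions:

```javascript
// Database trigger on the raw data collection: route each message
// to the levels, alarms, or status collection.
exports = async function (changeEvent) {
  const db = context.services.get("mongodb-atlas").db("UNS");
  const raw = changeEvent.fullDocument;

  // The extension delivers the MQTT payload Base64-encoded; assuming a
  // Node-style Buffer is available in the function runtime for decoding.
  const msg = JSON.parse(Buffer.from(raw.payload, "base64").toString("utf8"));

  // Route by message type (collection names follow the demo).
  if (msg.type === "level") {
    await db.collection("levels").insertOne(msg);
  } else if (msg.type === "alarm") {
    await db.collection("alarms").insertOne(msg);
  } else if (msg.type === "status") {
    await db.collection("status").insertOne(msg);
  }
};
```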
Humza Akhtar: 00:55:01.309 So I set up another trigger called Publish MQTT level and enabled it. I'm looking at the levels collection now. And on each insertion, I run another function. And what this function does is just publish the data back into HiveMQ on a specific topic: Site 1, Line 2, Tank 2 level, or Site 1, Line 1, Tank 1 level. And that's how HiveMQ receives this information back, and then it can publish it to the end-user applications. So let me just quickly show you a couple of other things. One is the charts. We talked about data visualization. In Atlas, in the same interface, you can create very nice charts. You just click on this Add Chart button, and you just need to specify the collection that you're looking at. So I was looking at the alarms collection, and it will create these charts for you, and then you can modify them as needed. So it gives you a very quick ability to look at your data in the database.
Humza Akhtar: 00:56:01.469 Now, one thing I want to show you is how you traverse the data. For that, I'm going to use MongoDB Compass, which is a free tool that you can download and use. It gives you the ability to go through all your MongoDB data. So we have a collection here called Equipment, which has data showing that I have a site, this site has two lines, and each line has tanks and pumps. So I can set up an aggregation pipeline query here, which is a graph lookup. And it is very easy; there are just four lines here, a simple query that will traverse through this whole data set as if it were a graph relationship. So the result is shown here. I have the site, and there's no parent for that site. For a line, I have a parent, which is Site 1. And for Tank 2, I have two ancestors: one is Site 1, and the other one is Line 2. And Line 2's parent is Site 1. So I can do graph traversals as well.
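The graph query described here is very likely a `$graphLookup` stage along these lines; the field names are assumed from the demo:

```javascript
// For each equipment document, walk parent links upward and collect ancestors.
db.equipment.aggregate([
  {
    $graphLookup: {
      from: "equipment",          // self-join on the same collection
      startWith: "$parent",       // begin from this document's parent
      connectFromField: "parent", // keep following parent links...
      connectToField: "name",     // ...matching them against names
      as: "ancestors"             // e.g. Tank 2 -> [Line 2, Site 1]
    }
  }
]);
```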
Humza Akhtar: 00:57:09.371 Second to last is the MES. We are using the Arcstone MES here. This is the dashboard I set up there. You can see the tank values changing. They are getting updated from the data that we are pushing back from HiveMQ. So if you see these values, 140, going up to 160, they're following the data that is being generated here. So if you see a 186 tank level, you can see it's coming down at the same time. Similarly, we're capturing the status information as well. So this is getting updated from the UNS as we go. The last thing, I promise, that I'll show you is vector search. Basically, I have a repair manual stored inside Atlas. And I can actually ask a query, that my pump is making some strange sounds. And what this will do is go through those manuals and tell you what procedure to apply. So I wrote a very simple question, just "strange sound on the pump," but it gave me the exact procedure of what to do. I can ask, like, "Heating up," and it will tell me that there's a thermal overload reset and I have to fix it. So that's all possible. That's the power of vector search, and we can talk about this more. Inside MongoDB Atlas, I set up a vector search index which goes through all the data that I have from the PDF files and everything, and it does the semantic search for you. Very easy to set up. It takes a couple of seconds, just these few lines of code that you have to write, and you can set up your search applications. So I think, in the interest of time, I'll stop here. I showed you a lot of things. Let's look at some questions now.
Q&A
Jayashree Hegde: 00:58:56.581 Thank you so much, Humza. Thanks, Kudzai. This was a wonderful presentation and demo. To all the attendees, if you liked the demo and our presentation, give us some reactions in there; it would be really encouraging for us. And we are really close to the top of the hour. Maybe we will be able to take up a couple of questions, but we can get back to you all with the answers later on. Thank you so much for those reactions. I would also like to launch a poll question while we cover a couple of questions. Nasir, over to you.
Nasir Qureshi: 00:59:37.122 Yeah, sure. Sure. Thank you. Thank you, everyone, for putting in these questions. One question I do want to address is about using the MongoDB Extension for free as a trial. So you can do that. I have shared the download link on the chat, and it is part of our platform. But the question is: Does the extension work with the free version of MongoDB? So Humza, can you answer that?
Humza Akhtar: 01:00:06.839 Yes, it works. I tested my demo on the free version of MongoDB. You can set up a free-forever cluster. It gives you all the features. The only limitation is the size of data that you can store. I think it's 512 MB, but you'll get everything. So no worries.
Nasir Qureshi: 01:00:26.849 Okay. Perfect. Thank you for that, Humza. All right, another question that we have is, well, sorry. Where have you all implemented LLM models with MongoDB in the manufacturing industry?
Humza Akhtar: 01:00:49.752 Yeah. So our clients are using LLM applications mostly for knowledge management in manufacturing, and also knowledge preservation. So let's say you have experts that are retiring; you can create videos of how they set up their robot, for example, or you have a bunch of technical manuals that are required to do changeover processes. It becomes very easy for new operators if they have access to an LLM-based application, instead of going through all that data, all those videos, to understand how to do their job. So that's where we've seen a lot of traction.
Nasir Qureshi: 01:01:31.805 Okay. Thank you, Humza. I think we can have another question. So what latency is incurred when data transformation happens in MongoDB before pushing the data to the consumer?
Humza Akhtar: 01:01:45.590 So latency depends on how complex your function is in the Atlas trigger. In this particular example, everything was being done in a few milliseconds. But if you're doing very large calculations, then of course you will incur some latency. We have a new product coming up called Atlas Stream Processing; it's in private preview now, and we'll be going to a public preview soon. That will allow you to do transformation before the data is even stored in MongoDB. So that will also handle any complex transformations that you want to run before the data is stored inside MongoDB. So that will help as well.
Nasir Qureshi: 01:02:35.302 Okay. Perfect. Jayashree, do we have time for one more?
Jayashree Hegde: 01:02:40.410 [laughter] I think, yeah, a quick one, yes.
Nasir Qureshi: 01:02:42.649 Okay. All right. Let's do this. So let's do this one. So how can you handle your alarms — in [inaudible] as all status for each alarm? I believe that question is about alarms. How can you handle alarms?
Humza Akhtar: 01:03:08.475 Yeah, yeah. So we are not an alarm management platform. What we are doing here, in our application and the demo we showed, is getting the alarms from the site. When the alarm comes in and it has been acknowledged by an end user, what our clients do is use the Atlas Device SDK in their application. So as soon as the value of the alarm changes, that change is synchronized automatically. And then as soon as we receive that change, the field in that particular document is updated automatically. So an end-user application has to change the alarm status. As soon as that is changed, we have the synchronization and update mechanisms to make the changes on the backend side.
Nasir Qureshi: 01:04:11.683 Okay. Perfect. Thank you so much, Humza. Thank you, Kudzai, and everyone else for joining today. Over to you, Jayashree.
Jayashree Hegde: 01:04:19.153 Yeah, thank you so much. Thanks, Humza and Kudzai. Again, thanks to all the attendees for joining us today. There have been lots of questions coming in. We know we are over time, so we are not able to address all of them. But if we have the bandwidth, we will get back to you personally with the answers; the contact information of Kudzai as well as Humza will be provided in the slide presentation. We will share both the recording and the presentation in a follow-up email. You will be able to contact us and ask any other queries you have. So thanks for joining, everyone. Thanks again. See you all next time. Take care.
Humza Akhtar: 01:05:02.466 Thank you.
Jayashree Hegde: 01:05:03.067 Bye-bye.
Kudzai Manditereza
Kudzai is a tech influencer and electronic engineer based in Germany. As a Sr. Industry Solutions Advocate at HiveMQ, he helps developers and architects adopt MQTT and HiveMQ for their IIoT projects. Kudzai runs a popular YouTube channel focused on IIoT and Smart Manufacturing technologies and he has been recognized as one of the Top 100 global influencers talking about Industry 4.0 online.
Humza Akhtar
Dr. Humza Akhtar is a Principal in the Industry Solutions Team at MongoDB, designing Industry 4.0 solutions for the manufacturing and energy sectors. Prior to joining MongoDB, he worked at Ernst & Young Canada as a Senior Manager in the digital operations consultancy practice. Humza attained his Ph.D. at Nanyang Technological University, Singapore, and worked with the Singapore manufacturing industry for a number of years on Industry 4.0 research and implementation. He has spent most of his career enabling smart and connected factories for many manufacturing clients.