Scalable Industrial Data Management for IIoT and Smart Manufacturing
Chapters
- 00:00 - Introduction
- 01:34 - The Power of Effective Data Management for Smart Manufacturing
- 03:21 - Aligning Data Management With Business Objectives
- 04:34 - Identifying, Acquiring and Integrating Plant-Floor Data for Smart Manufacturing
- 06:33 - Potential Sources for Data Acquisition
- 21:38 - DataOps for Industrial IoT Data Management
- 25:44 - Data Modelling for IIoT
- 26:49 - Data Normalization for IIoT
- 28:30 - Data Transformation for IIoT
- 29:53 - Data Contextualization for IIoT
- 31:26 - Data Modelling Standards for Smart Manufacturing (MQTT Sparkplug, OPC UA, AAS, etc.)
- 39:21 - Semantic Data Structuring with MQTT Sparkplug and Unified Namespace
- 44:31 - Key Steps to Designing Your UNS Data Architecture
- 50:34 - IIoT Data Storage and Actionable Analytics Generation (Time-series, Structured, Data Lakes, GraphDB, etc.)
- 54:30 - Q&A
Webinar Overview
To consistently generate actionable insights, make real-time decisions, and significantly improve your manufacturing business operations, you need a well-thought-out industrial data management strategy that empowers you to gather, store, process, and interpret data meaningfully. More importantly, effective data management facilitates the abstraction of the automation layer, making it easier for you to scale your smart manufacturing solution across a broad range of plant environments, business units, and production lines.
Watch Kudzai Manditereza, Developer Advocate at HiveMQ, in this webinar recording discussing key strategies and industrial data management approaches that scale, allowing you to harness the potential of your data in driving the success of your smart manufacturing use cases.
Key Takeaways
- Uncover best practices for collecting, processing, and utilizing data to optimize production, reduce downtime, and enhance quality control.
- Learn how to implement a distributed industrial data management strategy, as opposed to the unscalable approach of storing all your data in one place.
- Learn how to avoid the trap of a complex and tightly coupled data landscape in favor of a more modern, powerful, and flexible architecture with MQTT at its core.
- Get expert tips and recommendations for developing a comprehensive data management strategy tailored to your organization's unique needs.
Additional technical takeaways:
- Having an effective data management strategy is crucial for successful digital transformation and smart manufacturing initiatives.
- Align your data management strategy with your specific business objectives, whether that's improving product quality, gaining a competitive advantage, etc.
- Common sources for acquiring plant-floor data include PLCs, SCADA systems, historians, and connectivity software like KEPServer that exposes data from different devices via a standardized interface like OPC UA.
- Utilize a publish/subscribe architecture with a central MQTT broker to decouple OT and IT systems and enable real-time data processing and scalability.
- Implement a DataOps layer on top of your OT data sources to perform data modeling, normalization, transformation, and contextualization. This abstraction layer makes your system more scalable and governance-driven.
- Consider adopting data modeling standards like MQTT Sparkplug and others to structure data consistently across the enterprise.
- Create a Unified Namespace using the ISA-95 hierarchy to semantically represent your real-time production data and enable event-driven process automation across systems.
- Store historized IIoT data for analytics using a combination of time-series databases, structured databases, data lakes, and graph databases, based on your use-case needs.
- Integrate your IIoT data streams seamlessly with existing analytics platforms by aligning edge contextualization rules to how the data will be consumed by those platforms.
- Measure the success of your industrial data management over time by how quickly you can derive actionable insights and how easily the strategy can be replicated across facilities.
This webinar is designed for:
- Industrial IoT Solution Architects and Digital Transformation Specialists looking to leverage IIoT and data analytics for manufacturing operational excellence.
- Manufacturing professionals and executives seeking to maximize the potential of their production processes.
- Technology enthusiasts and solution providers looking to explore the latest trends and advancements in smart manufacturing.
Transcript
Introduction
Jayashree Hegde: 00:00:08.741 [music] Hello, everyone. Good morning. Good afternoon. Good evening. I am Jayashree Hegde, welcoming you all to this exciting webinar on Scalable Industrial Data Management, discussing key strategies on how to manage industrial data so you can scale easily. Join me in welcoming our speaker for today, Kudzai Manditereza, Developer Advocate at HiveMQ. Hi, Kudzai. Before we begin, I'd like to share a few housekeeping items as usual. The session is being recorded, and we will share it in a follow-up email later this week. If you have any questions throughout the presentation, please use the Q&A box at the bottom of your screen. We will try to answer as many questions as possible during the Q&A. Lastly, we will run a short poll during the Q&A session. I request you all to cast your votes. Now, without further ado, I will hand it over to Kudzai. Welcome, everyone.
Kudzai Manditereza: 00:00:59.198 Thanks, Jayashree, and thanks for the introduction. And welcome to everyone who's joined us today for this webinar. So today we're going to be discussing data management, scalable industrial data management for smart manufacturing. So as Jayashree has already introduced me, I'm a developer advocate here at HiveMQ. And I also host a podcast where I talk about smart manufacturing. So as already mentioned, today we're going to be talking about data management for industrial IoT or smart manufacturing. And a key part of talking about this topic is starting by discussing why data management matters, and why you really need a data management strategy.
The Power of Effective Data Management for Smart Manufacturing
Kudzai Manditereza: 00:01:57.465 So a lot of times what we see is that some companies, whenever they sort of embark on smart manufacturing or digital transformation, they start really just by connecting applications together without a data foundation. And in some cases, they might see some successes initially because they're just getting data, and then they're visualizing the data, and then they're doing it within that confined environment. But as soon as you start to include your entire enterprise to get a clear picture of exactly what is happening across your manufacturing enterprise, it becomes very difficult to scale that out. Mainly because you already have a complex web of technologies where everything is directly connected to everything else. So you've got a lot of dependencies that you need to maintain, which really makes it difficult to scale out your solution so that you can get insights across your enterprise. So that's why it's really critical to first lay down a foundation of data management to say, "This is the data that I have in my enterprise. This is how I'm going to make it available for consumption by all the other applications," and then be able to scale that out to the rest of your entire enterprise so that you can generate some actionable analytics, which is really what we want to have at the end of the day by putting this digital transformation initiative together.
Aligning Data Management With Business Objectives
Kudzai Manditereza: 00:03:32.170 Now, another key to success when it comes to data management is being able to align your data management strategy with whatever business objectives you have. So a lot of companies, when they embark on digital transformation, happen to have some business objectives that they want to achieve, whether it could be improving the quality of their product, or gaining a competitive advantage, or maybe just making sure that they don't get disrupted by new competition in the space. There's always an underlying objective. And the key to success, really, is to take that objective and then be able to align your data management strategy to it, to say, "For me to achieve A, B, C, and D, how do I make my data available, or how do I structure the data so that all the apps that are consuming that data will be able to provide the analytics that are required for an effective digital transformation strategy?" And a big part of putting together your data management strategy is to start by identifying sources of data. Because at the end of the day, it's all about integrating data from the operations technology into your IT technology domain. Which means within your operations technology, you've got a lot of systems where you could potentially get data from, right? So that's the first step — to really identify which is the best place to collect this data so that it can be easily integrated with the IT domain. So that is the critical first step, to identify those sources of data where you could perform that integration of the data.
Identifying, Acquiring and Integrating Plant Floor Data for Smart Manufacturing
Kudzai Manditereza: 00:05:21.663 Now, for you to really get a picture of exactly where within your enterprise you'd be able to get data that you could acquire or collect for a digital transformation project, we could start by looking at this diagram, which I believe a lot of you on this call are already familiar with, which is the Purdue model, or the ISA-95 hierarchy model, which creates this segmentation of the different systems within a manufacturing enterprise, where you've got your sensors and actuators at Level 1. You've got your PLCs and remote terminal units at Level 2, SCADA, historian, and so on, until you have got your ERP at Level 5. So looking at this picture, this is where you could then start to think about, "Where within this hierarchy, where within this pyramid are there some systems that I could potentially get data from to start building that digital transformation or to start laying that data management foundation?" Because this is really where it starts.
Potential Sources for Data Acquisition
Kudzai Manditereza: 00:06:31.500 So when you look at, say, Level 1, already you can see that we've got sensors that are directly connected to the process. And they are not a good point for data acquisition because, in any case, all of those sensors at the lowest level are already connected to PLCs. So the PLCs then become that ideal point of data acquisition if you need to collect data about your process or any underlying manufacturing production system. And also, another potential source of data collection is your SCADA system, which is at Level 3. So as you know, the "DA" in SCADA really stands for data acquisition. So you already have some data acquisition being performed by SCADA. We're going to look at all the different advantages and disadvantages of all these different systems to help you think about where to plug in when you need to collect data. And then another potential source of data is your historian, which is also at Level 3. So once you have had a look at this, you can see all these potential sources where you could collect data from your plant floor.
Kudzai Manditereza: 00:07:51.105 Now, let's start by looking at how to collect data from programmable logic controllers. So in some bigger manufacturing facilities, you'd find that there's a situation whereby you've got a PLC that is acting as a data concentrator to which other low-level PLCs are connected. So if you find yourself in a situation like that, the data concentrator, that PLC to which other PLCs are connected, becomes your ideal point of data acquisition. You could then collect data from there. And PLCs, in most cases, expose that information using some protocols such as OPC UA or Modbus or whatever protocol they're using, which means you're aggregating all of the information from the different PLCs and you're only using just that one PLC as your source of data. And you could actually have multiple of these data concentrators scattered across your plant floor. So these are the sources of data that you could use to collect your data because it's already aggregated and concentrated in one place. And another normal scenario is where you've got PLCs that are connected to your sensors. And each and every PLC has the potential of providing data to a higher level that you could then use as a point for data acquisition to get your data into your IT network.
Kudzai Manditereza: 00:09:24.503 So let's quickly look at the pros and cons of using a PLC as a source of data for your digital transformation strategy. So I mean, the biggest advantage with PLCs is their reliability, right? You all know that PLCs are always on. It's rare to find a PLC that is down. So PLCs are these reliable sources. They are stable. They're deterministic. All the data that is available there you always get at any given point. And the other advantage of getting your data from PLCs is that PLCs have got a really high scan-time resolution, where data can be collected every 10 milliseconds, 20 milliseconds. So you've got this fine-grained data that you can collect through the PLCs because of these rapid scanning capabilities. And again, I've already spoken about the minimal downtime that is guaranteed whenever you're collecting data from your PLCs. The disadvantages, however, are the lower levels of data abstraction within a PLC. So those of you who work in controls or as automation engineers would know that with PLCs, a lot of data is very raw. Even the naming convention itself, you find that a pump is actually named in a code where it's just 002, or X001 for a valve. So this lower-level abstraction makes it difficult to work with that data once you collect it. However, we're going to discuss later on in this session how you can deal with that situation if you find yourself collecting data directly from a PLC, where the naming convention is not so favorable.
Kudzai Manditereza: 00:11:12.174 So moving on, let's look at integrating data from SCADA systems for your digital transformation or for your data management strategy. So in the same scenario, you could have a SCADA system that is connected to your PLCs that are acting as data concentrators and also connected to PLCs that are standalone. So your SCADA system, as already mentioned, is already built for data acquisition, which is really the biggest advantage of collecting data from a SCADA system, because it already has got this robust mechanism of collecting data from all this equipment on your factory floor. It has already got some built-in drivers for all the different PLCs. So that's the advantage of using your SCADA system as a source of your data acquisition. So remember, the idea here really is to identify those points. It doesn't need to be a single point. It could be multiple points of data acquisition so that you can collect your data and make it available for integration into your IT network.
Kudzai Manditereza: 00:12:22.640 So another advantage of a SCADA system is that within a SCADA system, there is some semblance of a data model. So when you're working with a SCADA system, in most cases, when you create a data point, you can specify the upper threshold, the lower threshold. You can specify the units. So some SCADA systems already force you to conform to some form of data model. So that is an advantage where you're accessing data that is already contextualized to some extent. So that is an advantage of getting your data from your SCADA system. And the SCADA system is already really within your corporate or plant network. So it makes it easy for you to then integrate it within an IT network, because all you need to do is to tap into that Ethernet network, collect data, and then move it into the other domain.
Kudzai Manditereza: 00:13:18.659 However, SCADA systems, as you might be aware, are less reliable sources of data acquisition. They run on a Windows PC in most cases, and if you need to do some updates that require multiple restarts, you can't always rely on the SCADA system being up and running, which is a big disadvantage. And also, the resolution of the data is not as fine as within the PLC. So if you are doing an application that requires you to have data in its original resolution, you might find a situation whereby it's really hard for you to work with the data that is coming from a SCADA system. And in cases where the SCADA system is not exposing this data via a standardized interface, you'd find yourself having to connect directly to the SCADA system using an SDK or API, which really creates a tightly-coupled scenario and a lot of dependencies, which are going to make it difficult for you to scale your solution as you grow your smart manufacturing initiative.
Kudzai Manditereza: 00:14:29.397 So the next data source that we're going to look at is historians. So historians are really interesting because they already follow some form of hierarchical asset model, where all this information is being arranged according to the asset model that exists within your plant, which is really a big advantage because, again, you've got some more context on the data that you're collecting, which you could then use to your advantage whenever you are integrating that into an IT domain. Because the bottom line is that whenever you're integrating two systems, there's going to be some contextualization that needs to be done, because these are two separate domains. Or even just moving data between two different applications, there's always contextualization that needs to happen. So if some of that heavy lifting has already been done, it really makes it easy for you to work with that data. So that's a big advantage with the historian. And the ability to store this data, and the store-and-forward capabilities, also make it suitable in case you have got an interruption of your network connection between your IT and OT environments. And again, historians are already integrated within a corporate network, which really makes it easy for you to integrate into IT.
Kudzai Manditereza: 00:15:51.454 The disadvantage, however, is that in most cases, historians are not collecting all the data within a plant. Maybe they're really focused on a specific data set or specific areas of your plant. So you might not get all the data that you need. So that's a big disadvantage, because within a smart manufacturing initiative, you don't make assumptions about how that data is going to be consumed. So you want to be able to collect each and every piece of data and be able to then push it wherever it needs to go and then make sense out of that data. And another issue is that, especially when you're dealing with machine learning systems, in most cases, these systems require your data to be in a time snapshot, where each snapshot includes all the specific data points about a machine. So if it's a compressor, a snapshot that has got all the data points included there with a timestamp is the kind of data that makes it easy for a machine learning system to work with. And another disadvantage is that if a historian also doesn't expose its information via a standardized interface, you're also going to find yourself having to connect directly using an SDK or API, which again creates that tightly-coupled connection, which you don't want whenever you want to scale your data management solution.
Kudzai Manditereza: 00:17:21.133 Now, bringing all these data sources together, we have a situation whereby you could have PLCs connected to a SCADA system and some PLCs connected directly to a historian. So you've got your historian collecting data from some PLCs. You've got your SCADA system collecting data from some PLCs. And in the event that you don't have that setup, what you need to do is to have a component that is then responsible for collecting data from all your PLCs or devices and exposing that via a standardized interface. Because the idea is that, for this data to be easily integrated, and for you to really create a data management foundation that is easy to work with and highly scalable, you are going to want to expose your data via a standardized interface that makes the data available for consumption by IT systems. So in most cases, this is where you find a situation whereby maybe you need to use a connectivity software like KEPServer, which allows you to connect to all these different CNCs, PLCs, and what have you, and then collect that data and expose it via a single standardized interface, which in most cases is OPC UA. And then you have got all sources of your data on your plant network, and you can then tap your data acquisition software, or whatever tool you're using, into that plant network and push the data out into your IT network.
Kudzai Manditereza: 00:18:56.767 So now, when you're pushing your data out into your IT network — this is also a very interesting point because this is also a potential for creating a system that is not really scalable or a system that won't allow you to get the best out of your data. So typically, you'd have point-to-point connections between your enterprise software and your OT tools or systems. So what you want to do here is really use a publish-subscribe pattern of communication so that cloud connectivity or IT connectivity is not dependent on the direct coupling or the direct connection between these two applications. Because what that is going to do for you is that it allows you to interchange systems, and it allows you to create some transparency, whereby systems that are in the IT domain don't really need to know anything about systems that are in the OT domain. They're all just connecting via a centralized hub which everything is plugged into, so there's no tight coupling in this scenario. So this really allows you to scale your system, because as you add more components, it doesn't really affect how that data is being consumed. Because if you scale to 10,000 devices, if you scale to 10,000 data points, it doesn't really matter, because everything is being published to that central hub, which is an MQTT broker. So all you need to do is just to make sure that your broker is able to handle whatever amount of connections or whatever amount of traffic is being directed at it. So you're able to scale your data that way.
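To make that decoupling concrete, here is a minimal publish/subscribe sketch in Python, assuming the Eclipse paho-mqtt client library (1.x API); the broker address and topic names are hypothetical. The OT-side publisher and the IT-side consumer only ever talk to the broker, never to each other.

```python
# A minimal sketch assuming the Eclipse paho-mqtt package (pip install paho-mqtt).
# Broker address and topic names are hypothetical.
import json
import paho.mqtt.client as mqtt
import paho.mqtt.publish as publish

BROKER = "broker.example.com"  # the central hub everything plugs into

# --- OT side: push a reading to the broker; knows nothing about consumers ---
publish.single(
    "acme/munich/pressing/line1/temperature",
    json.dumps({"value": 72.4, "unit": "C"}),
    qos=1,
    hostname=BROKER,
)

# --- IT side: react in real time; knows nothing about PLCs or gateways ---
def on_message(client, userdata, msg):
    print(msg.topic, json.loads(msg.payload))

consumer = mqtt.Client()          # paho-mqtt 1.x constructor
consumer.on_message = on_message
consumer.connect(BROKER, 1883)
consumer.subscribe("acme/munich/#", qos=1)  # wildcard: everything from Munich
consumer.loop_forever()
```

Adding a tenth or a ten-thousandth consumer is just another subscription against the broker; nothing on the OT side changes.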
Kudzai Manditereza: 00:20:44.945 So again, scalability is a big issue here when you come to integrating from OT to IT and then the ability for real-time data processing because whenever you're using publish-subscribe communication, which means data is being pushed from the edge or from your OT environment into that central hub, so all the other systems are able to respond in real time to whatever data is being pushed to it because it's all reported by exception. Whenever there is an event, whenever there is an alarm, whenever there's an order that has been finished, whatever is the case, that data is being pushed out into that central hub as an event. And then all the other systems are able to respond to that in real time. So you're able to create a real-time system that allows you to respond as and when changes occur.
DataOps for Industrial IoT Data Management
Kudzai Manditereza: 00:21:38.475 But a crucial aspect here to bring to the conversation is that whenever you're collecting all of this data, again, as I mentioned, it's most likely that this data is going to need to be contextualized because remember, these are two separate domains. This is OT and this is IT. Even within IT itself or within OT itself, you've got System A and System B. There's a high chance that these systems don't use the same data format, or they don't contextualize data the same way. So what that means is that for you to make sure that your system is scalable, you're going to need to add another layer on top of whatever data you're collecting from all those different data sources. And this data layer is really typically what is called a DataOps layer.
Kudzai Manditereza: 00:22:30.304 So this is a layer that allows you to then create models of your data. So say you've got a compressor — maybe you've got 300 compressors across different geographical areas of your enterprise, and these compressors are of the same type. And you want to be able to look at the data in the same way. You want to be able to correlate, to see the performance. If it's a predictive maintenance application, or an application that allows you to compare the performance of one machine in one plant or in one line, or maybe in the same line but different cells, you want to create a standardized way of representing these machines, or the data that is being produced by these machines. So data modeling is that first step that you need to implement before you push your data out into the IT network, so that whenever it gets integrated within the IT network, it already has that modeling aspect created. We're going to talk about what exactly modeling looks like in a few seconds here.
Kudzai Manditereza: 00:23:41.748 And the other things that you want to do are data normalization, data transformation, and contextualization. So again, to emphasize, the DataOps layer is the layer that you then bring on top of your OT data, or whatever foundation you've set up from all your different data sources, so that your system really is scalable. Because once you create that abstraction, you're no longer relying on the code that is within a PLC or the settings that are within a machine. So you've got this abstraction that you can then repeat across your entire enterprise where you've got the same setting, because it can be the case that one programmer is naming things one way in a plant, say, in Munich, and another programmer is naming things differently in another plant, say, in Massachusetts or wherever that is.
Kudzai Manditereza: 00:24:37.958 So you want to create that data abstraction that allows you to create data governance — right? — that you can roll out across the enterprise to say, "This is how we name machines. This is how we do A, B, C, and D," so that when you integrate your data into the IT enterprise, it's all within the same context. And it makes it easy for data scientists to then work on the data. Because without that layer, it means you're getting all of this data raw through that publish-subscribe communication pattern. You're getting it raw into the IT department, which then requires that all of this contextualization be performed there. But the disadvantage is that we are now just simply pushing that work up the stack to people who really are not familiar with the OT equipment, right? They don't really know what's happening. So you want that to be performed on the factory floor by automation engineers who know exactly what is what within that environment. So that's the importance of being able to put that data layer in place.
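As a toy illustration of that abstraction, a DataOps layer can maintain a governed alias map so that raw PLC tag names never leak into IT. This is a minimal sketch with hypothetical tag names, not any specific DataOps product's API.

```python
# Hypothetical alias map maintained by the automation engineers who know
# what each raw PLC tag actually is (a minimal sketch, not a product API).
TAG_ALIASES = {
    "X001": "valve_inlet_open",      # valve named X001 in the PLC program
    "002":  "pump_feedwater_speed",  # pump named 002 by another programmer
}

def rename_tags(raw: dict) -> dict:
    """Apply the governed naming convention before data leaves the plant."""
    return {TAG_ALIASES.get(tag, tag): value for tag, value in raw.items()}

print(rename_tags({"X001": True, "002": 1450}))
# {'valve_inlet_open': True, 'pump_feedwater_speed': 1450}
```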
Data Modelling for IIoT
Kudzai Manditereza: 00:25:43.863 So this is an example of exactly what we mean when we're talking about data modeling. So basically, what you're doing is you're defining the structure, relationships, and characteristics of your data. So again, this is a simple example of a data model for a machine, whereby you could specify to say, for each machine, we need to have a machine ID. We need to have a timestamp for when this reading was taken. And then we need to have all the data points that are within that machine. So if that machine consists of temperature, vibration, and output speed measurements, you want to be able to create a structure that incorporates all of those into one unified model that you can then use to create instances of those machines, if you happen to have a lot of those machines across your enterprise. This really allows you to create data in a uniform way and makes it easy for analytics applications to consume that information, because it's all modeled in a consistent manner.
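A minimal sketch of what that model could look like in code, using the field names from the example above; the exact schema and units are assumptions you'd set yourself.

```python
from dataclasses import dataclass, asdict

@dataclass
class MachineReading:
    """One instance of the machine data model described above."""
    machine_id: str
    timestamp: str          # ISO 8601, normalized to UTC
    temperature_c: float
    vibration_mm_s: float
    output_speed_rpm: float

reading = MachineReading("compressor-017", "2023-08-01T09:30:00Z",
                         71.5, 2.3, 1480.0)
payload = asdict(reading)  # serialize the same way for all 300 compressors
```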
Data Normalization for IIoT
Kudzai Manditereza: 00:26:50.403 Okay. So data normalization is what we're going to look at next. So when we talk about data normalization, basically what we mean is that in any system, there are different conventions about things like, let's say, time. So some device might be representing time in UTC format. Another might be representing it in some local time, right? You could have one system that is representing temperature in degrees Celsius, and a plant in America that is representing temperature in Fahrenheit. And maybe the units of vibration are different in different plants, and also the units of speed are different in different plants. So you can imagine, if you just collect this data and put it into a data lake, or whatever system you're using to then generate your analytics, it would be hard to make sense of the data. Or you could get incorrect data and then make the wrong decisions, just because of that inconsistency in the units that are being used or the time format that is being used. So data normalization is about looking at all those potential discrepancies that could happen within your data to make sure that you've got that data integrity, right? So that's what you do. You make sure that you standardize across your entire enterprise on how you represent time, or what the unit of speed is, what the unit of temperature is. So this is what we mean when we talk about data normalization. So this is also another step that you need to do within that layer.
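A minimal normalization sketch, assuming the enterprise-wide rules are "UTC ISO 8601 timestamps and degrees Celsius"; the target conventions themselves are whatever your governance decides.

```python
from datetime import datetime, timezone

def f_to_c(fahrenheit: float) -> float:
    """Normalize a US plant's Fahrenheit readings to degrees Celsius."""
    return (fahrenheit - 32.0) * 5.0 / 9.0

def to_utc_iso(local: datetime) -> str:
    """Normalize any timezone-aware local timestamp to UTC ISO 8601."""
    return local.astimezone(timezone.utc).isoformat()
```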
Data Transformation for IIoT
Kudzai Manditereza: 00:28:30.530 And then you've got data transformation. So whenever you are dealing with an analytics system, or maybe generating some analytics — because the bottom line really with this whole data collection is that you want to be able to provide data, or put it in front of decision-makers, in a way that makes sense to them. So maybe it makes sense for a certain decision-maker to see speed as an average output speed. Maybe just the raw speed doesn't make sense or is of no use to them, right? They're just looking at speed, but what does it mean? So as an organization, you already know what makes sense to you. So this is where data transformation comes into play, whereby you're looking at your data and saying, "Okay. It makes sense for us to actually represent the speed as an average output speed over a 48-hour period." For whatever reason that could be, but all the different organizations have got different ways of looking at data based on the context of their operations. So this is the data transformation stage. And also, you've got different shifts. Maybe you want to look at data after every shift. This is how you could then transform information that is coming out of a machine into a format that makes sense whenever you put it into the analytics systems, according to how you want to be able to make decisions.
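For example, a transformation step could collapse raw speed samples into an average output speed per shift or reporting period before publishing. A minimal sketch, with the shift window and sample values as hypothetical stand-ins:

```python
def average_output_speed(samples: list[float]) -> float:
    """Collapse raw speed samples from one shift into a single average."""
    return sum(samples) / len(samples) if samples else 0.0

# e.g. samples collected over an 8-hour shift at one-minute resolution
shift_samples = [1480.0, 1475.5, 1490.2]  # truncated for illustration
print(round(average_output_speed(shift_samples), 1))  # 1481.9
```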
Data Contextualization for IIoT
Kudzai Manditereza: 00:29:52.833 And then we move on to data contextualization. So again, if you're collecting data from these systems, PLCs, SCADA systems, there isn't full context to that data. So sometimes the SCADA system doesn't even say where that SCADA system is located. So maybe you just want to add the area, to say this is coming from plant A. Or another way of talking about contextualization is to say — you've got data that you're collecting from a machine. What was the recent activity that was performed in the last 48 hours? Maybe it makes sense for you, whenever you're collecting data, to point out if ever there was a maintenance activity being carried out in the last 20 minutes or two hours or whatever the case may be. You're adding context to that data. Are there any environmental anomalies? Was there a power outage reported at any given time? You're adding context. And again, only you as an organization, once you've looked at your objectives, once you've looked at how you want to consume the data, will be able to bake those data governance strategies into your data by contextualizing it, to say, "This is what we care about whenever we're reading data. We want to know whether there was a maintenance activity being carried out or whether there was an event, an environmental anomaly, that happened." So at least you have got a full view of that data that you are getting.
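A sketch of that enrichment step with hypothetical context fields; which fields you attach is driven entirely by your own objectives and governance.

```python
def contextualize(reading: dict) -> dict:
    """Attach the context IT consumers can't reconstruct on their own."""
    return {
        **reading,
        "enterprise": "acme",               # hypothetical enterprise name
        "site": "plant-a",                  # where this data comes from
        "maintenance_in_last_2h": True,     # recent maintenance activity
        "power_outage_reported": False,     # environmental anomaly flag
    }
```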
Data Modelling Standards for Smart Manufacturing (MQTT Sparkplug, OPC UA, AAS, etc.)
Kudzai Manditereza: 00:31:26.193 Right. So what we've been discussing so far is a way of modeling your data. And this really is a way of custom modeling, whereby you decide how you're going to structure the data. So I showed you an example of modeling a machine. So you decide, according to how it makes sense to you, what data points you want to put in there, how to represent that data. Now, that's perfectly fine, and it works in a lot of cases. But there's another way to look at it, another thing to include — and what we see in most cases is organizations that mix both strategies here. The other strategy is to get inspiration from data modeling standards. Or you could adopt those outright, right? Or you could mix and say, "This is the standard. We're going to take this section of it, and then we're going to use it, and we're just going to discard this." Or if you want your foundation to be standards-based 100% across your enterprise, it's a good idea to look at data modeling standards to say, "How do I represent my data? How do I structure my data?" So we're going to look at different data modeling standards that allow you to represent data in a standardized or consistent way that is universally understood. And the advantage, again, of this is being able to also incorporate — because in most cases, you might find a situation whereby you need to incorporate systems from your customers or systems from your suppliers who might be implementing the same standards. So that would work to your advantage. But again, if it doesn't work for your organization, then you're better off mixing both or sticking to custom-built models.
Kudzai Manditereza: 00:33:20.291 So the first one that we're going to look at is the Digital Twin Definition Language, right? So the Digital Twin Definition Language really is mostly for digital twin use cases. So where if you want to create models that represent different machines and components in a way that interconnects all these different systems into that one big picture, what you'd call a digital twin, this is the standard that you'd want to look at, the Digital Twin Definition Language. So the Digital Twin Definition Language allows you to capture those relationships between different devices and their properties, commands and telemetry, right? And it uses JSON Linked Data. So it's a form of JSON that allows you to then create those connections between different data points or between different properties within your structure. So again, Digital Twin Definition Language is mostly suited whenever you are kind of taking the approach of digital twin modeling as it were.
Kudzai Manditereza: 00:34:27.856 And you've also got MQTT Sparkplug. So MQTT Sparkplug, for those of you who are not familiar with it, is really a specification that is built on top of MQTT. So as you may know, flat MQTT is really like an open book, so to speak, whereby whenever you're publishing to a topic, you get to decide how you represent that topic. You get to decide how you represent the session state. And more importantly, you get to define how you represent that payload, right? So you might have a situation whereby you want to use MQTT, but you want a standardized representation of your payload structure. So this is a situation whereby MQTT Sparkplug would be useful, because MQTT Sparkplug provides definitions for data types, data sets, and also for complex data types, using what are called templates. So basically, templates are a way of modeling your objects. Say, if you've got a machine that you want to represent through templates, you can capture the properties of that machine in one object, such that when you publish it into an MQTT network, it is represented as that complete object, which can then be consumed by other analytics systems that make use of that.
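Conceptually, a Sparkplug template instance for such a machine is a named set of typed metrics, as the sketch below illustrates. Note that this Python structure is for illustration only; on the wire, Sparkplug B payloads are Protobuf-encoded, typically produced with a library such as Eclipse Tahu.

```python
# Conceptual shape of a Sparkplug template instance (illustration only;
# real Sparkplug B payloads are Protobuf-encoded, e.g. via Eclipse Tahu).
temperature_sensor_template = {
    "name": "TemperatureSensor",   # template (complex data type) name
    "metrics": [
        {"name": "Temperature", "datatype": "Float",  "value": 71.5},
        {"name": "Units",       "datatype": "String", "value": "C"},
        {"name": "ScanRateMs",  "datatype": "Int32",  "value": 1000},
    ],
}
```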
Kudzai Manditereza: 00:35:59.703 So for example, Cirrus Link have got a connector that allows you to tap into an MQTT Sparkplug network and integrate that data into Azure Digital Twins. So if you've got Sparkplug data represented as a Sparkplug template, that is directly integratable into Azure Digital Twins. And it's also directly integratable into AWS IoT SiteWise. And there's another version for Snowflake. So there is a lot more that you could do with data represented as a Sparkplug template, integrating it into those analytics systems and then generating some insights from that. So this is an example of a temperature sensor that is being represented using a Sparkplug template. As you can see, you've got your properties represented there.
Kudzai Manditereza: 00:36:53.640 And another option is using Companion Specifications from OPC UA. So this is kind of a unified approach for integrating data that is coming from an OPC UA server. So if you've got a robotic arm that supports OPC UA, there are some standards that define how that robotic arm exposes its information via OPC UA. So that would be a consistent structure. So it might be of use to you to say, if you want to push that information out, we are using an OPC UA Companion Specification. So I know of some projects from some organizations, like the Advanced Manufacturing Research Centre at the University of Sheffield, where they took OPC UA Companion Specifications because they wanted to stick to the standard, and then they mapped them into a Sparkplug payload. So it's a way of conforming to the Companion Specification but being able to publish that into an MQTT Sparkplug network. So that kind of gives you the best of both worlds, whereby you are using that information model, and then you are pushing it out into an MQTT Sparkplug network. So Companion Specifications use an XML structure. So that's an example of what that looks like.
Kudzai Manditereza: 00:38:12.749 And then there's also the Asset Administration Shell, which is also a way of representing data, or structuring your data, in a way that is consistent with international standards. So in most cases, you'd use the Asset Administration Shell in a situation whereby you really want to capture information about an asset, things like technical features. Or if you want to capture a data sheet — so you've got a machine that has a data sheet, a PDF, whatever. You want to capture it in a digital way, in a way that is understood by other systems. The Asset Administration Shell allows you to do that. And it also allows you to create that trail across your value chain. So from when the equipment is being manufactured, there are certain specs that are being captured there. When it is being used, there is certain data that is being captured. So you can create a trail of your system or your components within your value chain. So this is where you'd want to use the Asset Administration Shell for structuring your data.
Semantic Data Structuring with MQTT Sparkplug and Unified Namespace
Kudzai Manditereza: 00:39:21.160 So we've kind of looked at how to get data, what's the strategy for getting data from your equipment and how to expose it, how to integrate it using a Pub/Sub network, how to structure the data, and how to then kind of look at options of modeling your data. Now, the next part that we're going to talk about really is kind of creating that repository of information where every other component has got access to that data. So I mean, we've got a choice of saying, "Now I've got the data. Now let's put it into a format that I want. Where do I send that data? Do I send that data into a data lake? Do I send that data into a data warehouse? How do I make that data available for consumption?" So these are all the different options.
Kudzai Manditereza: 00:40:16.630 Another option for real-time data consumption is to use a Unified Namespace, where you are creating a semantic structure of your data that makes it easy for all the participants or analytics systems of your network to consume that data, to go through that data. Because we've spoken about events — to say, whenever an event occurs within your operations technology and you want an analytics system to kick off a workflow whenever that event occurs, how do you make that data available in real time so that you are able to automate your business processes based on the way that your data is structured? This is where the Unified Namespace really makes it easy for you to expose real-time data across your entire enterprise so that it is a single source of truth. Any system that needs to find out what is the current state of a certain production line, what is the current state of a certain order, will simply look up the Unified Namespace and be able to find that information and act accordingly. That is really a valuable way of structuring your data, because it allows you to automate your business process, to recreate your business model around that, because you know that whatever data you're looking at is the most current data. It's in the most current representation, and it's arranged according to how you already understand your organization.
Kudzai Manditereza: 00:41:45.490 So let's look at — how do you go about creating your information in that hierarchy? How do you go about structuring all this data that you have been collecting in a way that makes sense across your entire enterprise for that real-time consumption? And by the way, once you have that Unified Namespace as a real-time source of information, you'd still need some external data store. So you'd have a situation whereby you've got your external data store still subscribing to that real-time data, that real-time snapshot of your enterprise, and persisting that information for later use or for later retrieval. So this is the kind of information that will then be consumed, maybe, if you need to train a machine learning model, right? So all of that data is still being historized. But first, you need to create that real-time snapshot, that data that has already been contextualized, which makes it easy for any kind of analytics system, whether a real-time or historical system, to consume that information.
Kudzai Manditereza: 00:42:50.131 So, a way of really creating an information hierarchy. It's not a hard and fast rule, but it's recommended to follow [inaudible] ISA-95 Part 2, which is kind of a shared framework that shows you how to manage your production processes and resources. So it's sort of an asset model to see — how do you structure or lay out your assets within your enterprise? So a lot of systems, ERPs, already follow this convention. So it would make sense also for you to pick it as a way of structuring your data — to say, "This is how I'm going to represent my data sources." So again, as you can see, it's Enterprise, Site, Area. And then, depending on which vertical you're in, you can then say a Production Line, Work Cell, and so on. And there's also the S88 extension if you need a more fine-grained level of hierarchy to represent your organization. So following ISA-95 is a good way because it's a globally recognized standard for merging your enterprise and control systems. So you could then pick it and say, "This is how I'm going to represent sources." So we've been talking about sources, SCADA, CNC machines, and stuff like that. But how do we structure that? How do we present it within a hierarchy without using that Purdue model, which really just creates different levels that are directly connected to each other? So ISA-95 is a good place to start doing that.
Key Steps to Designing Your UNS Data Architecture
Kudzai Manditereza: 00:44:30.950 So once you've identified that you're going to be using the ISA-95 Part 2 hierarchy for collecting your data or representing the semantic structure of your data, the next thing is to kind of look at — how do you then go about designing your Unified Namespace data architecture? So the first step is to kind of identify existing namespaces because — remember — all we've spoken about so far is that you've got data coming from PLCs. You've got data coming from the SCADA system, and this and that. But we haven't really spoken about what kind of data is it? What sort of namespaces are those? And in most cases, these are kinds of things that are unique to a company. But there are some namespaces that are kind of common throughout the industry. For example, Overall Equipment Effectiveness could be one of the things that you're kind of really looking at as something of interest within your enterprise. For each production line, you want to know, what is the equipment effectiveness within that production line? So that's an example of a namespace that already exists because you already use that. Whether you're using it manually, pen-and-paper, whatever the case may be, it's a namespace that already exists within your enterprise.
Kudzai Manditereza: 00:45:51.957 So the first step is to identify those namespaces, mean time to failure, and so on and so forth, identify those namespaces that already exist. And then the second step is to collect and contextualize the data. So this is where that DataOps layer helps you to contextualize the data, to create those namespaces. And then once you've done that, you then plug those namespaces into the suitable level in the hierarchy where it makes sense for that data to live, say, if the OEE of production Line 1 in Site A needs to live in that particular place in the hierarchy. So using the ISA-95 hierarchical representation, you will then be able to publish data into that specific packet of information, where it will make sense for other systems that need to find that data to locate it and then act upon it. So if you need to act on the OEE of a particular production line, you know exactly where to go to find that information. So again, with MQTT, the MQTT topic structure allows you to create that hierarchy, where you are mapping your ISA-95 representation of the organization onto that MQTT topic namespace. And you can then use wildcards to navigate your enterprise structure to find all of this information according to how you have named that information.
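In MQTT terms, that mapping is just one topic level per ISA-95 level, and wildcards do the navigation. A minimal sketch with a hypothetical enterprise:

```python
# Hypothetical ISA-95 levels mapped onto an MQTT topic (a sketch).
levels = {
    "enterprise": "acme",
    "site": "munich",
    "area": "pressing",
    "line": "line1",
    "cell": "cell3",
}
topic = "/".join(levels.values()) + "/OEE"
# -> "acme/munich/pressing/line1/cell3/OEE"

# Wildcards then navigate the structure:
#   "acme/munich/+/+/+/OEE"  -> OEE of every cell on the Munich site
#   "acme/#"                 -> everything in the enterprise
```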
Kudzai Manditereza: 00:47:18.625 So this is an example of how that would look once you have put your ISA-95 structure together. So you've got an enterprise. You've got a site area. So this is where you would then be able to plug your namespaces. So you could have a functional namespace within your enterprise level. So maybe that is a definition of OEE across your entire enterprise. So maybe the enterprise is only interested in something that affects or something that is relevant across the entire enterprise. Or someone who's looking at information in a certain production line, they're only concerned about finding out what is the OEE of this particular production line. So they will know where they need to navigate, to go and find that particular information based on that structure that you have already laid out. So this is an example that shows you how that information could be represented.
Kudzai Manditereza: 00:48:11.321 So this is an example of a Unified Namespace structure where you've got your ISA-95 common data model representing your enterprise, where, for example, we've got a bottling company here. And as you can see here, we've got OEE being pushed and consumed by an MES system under Filling Area 1. And you could have an ERP pushing a work order into Filling Line 1 and also consuming that information. So every other participant of that network can consume that information. And again, from what we've been discussing, if you've got some non-IoT-ready PLCs that are not able to push information, this is where that data first needs to cascade: being exposed via a standardized interface, then exposed to a DataOps layer, and then finally being pushed into that MQTT broker, that Unified Namespace, where it will then live.
Kudzai Manditereza: 00:49:13.536 So when it comes to Sparkplug — you could represent your Unified Namespace using just a flat MQTT topic, whereby your topic namespace shows your hierarchy. Now, when you come to Sparkplug, you've got a limitation as to the number of nodes or elements that you could put there. As you can see, you've got the group ID, edge node ID, and device ID. So that is the MQTT topic representation that Sparkplug specifies. It's not enough to put together or stitch together the different levels of your hierarchy. So another option is to use delimiters to pack all of that information into that topic representation and then unpack it whenever you consume this information. So if you need to use Sparkplug, this is how you currently do it. So there are currently talks to have this in the standard. It's not necessarily going to be delimiters. I don't know what they're going to come up with, but this is going to be taken care of, meaning that in the next version of Sparkplug, you're not going to have to use delimiters. The standard itself is going to have provision for you to represent your hierarchy.
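A sketch of that workaround: pack the ISA-95 levels into the single Sparkplug group ID element with a delimiter and unpack them on consumption. The colon delimiter and the names here are assumptions; as noted above, the current spec doesn't prescribe one.

```python
SP_NAMESPACE = "spBv1.0"  # Sparkplug B topic namespace element

def pack_group_id(*isa95_levels: str, delim: str = ":") -> str:
    """Stitch ISA-95 levels into the single Sparkplug group_id element."""
    return delim.join(isa95_levels)

group_id = pack_group_id("acme", "munich", "pressing")  # enterprise:site:area
topic = f"{SP_NAMESPACE}/{group_id}/DDATA/line1-gateway/cell3"
# -> "spBv1.0/acme:munich:pressing/DDATA/line1-gateway/cell3"

def unpack_group_id(group_id: str, delim: str = ":") -> list[str]:
    """Recover the hierarchy levels when consuming the data."""
    return group_id.split(delim)
```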
IIoT Data Storage and Actionable Analytics Generation (Time-series, Structured, Data Lakes, GraphDB, etc.)
Kudzai Manditereza: 00:50:34.128 Now, you have collected your data. You have represented it in a real-time snapshot. Now, you need to store that data, because for a majority of use cases, your data really needs to be historized. It needs to be stored. And then this is where you can create some machine learning models, or, if you just need to create some simple trends, be able to go back to the data and see what has been happening and create some actionable insights from there. So there are different mechanisms of storing IIoT data that you can look at. So there are time series databases. So basically, these are types of database systems that store your data in a timestamped format, where your data is collected over time and is represented in that form. And these are crucial whenever you are storing and retrieving data in a way where you need to access that data efficiently. Because of the way that time series data is stored, it makes for easy, faster access. So if you need to create a dashboard for a process engineer to really dig into the data to find different permutations or different correlations, a time series database would be a suitable way for you to store that data and make it available for analytics consumption.
Kudzai Manditereza: 00:51:52.201 So there, I've got an example of a data model of a common time series database called InfluxDB, where you can see how a machine is represented: you've got the machine ID, the location, and whoever is the operator of that machine. And then you've got some fields, where it could be temperature. And then you've got the timestamp. So a time series database is one way of storing your data. And then there are also structured databases. So structured databases have been around for a long time. So this is really a way of creating predefined structures, tables. So really, this is a way of managing your metadata, because it's hard to manage metadata within a time series database. So for example, if you want to store data about where a particular machine is located, GPS coordinates, or the firmware version of that machine, you don't want to continuously send that information or store it in a time series database. You want to store it in a place where it can live, because it's data that doesn't really change much. So this is the kind of information that gets stored in a structured database, including things like transaction logs. They could be stored there in your database. But again, your system doesn't need to use only one type of database. In most cases, you're going to use a mixture of these databases.
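A sketch of writing that kind of model with the influxdb-client Python package for InfluxDB 2.x, where tags carry the indexed metadata and fields carry the measured values; the URL, token, org, and bucket are hypothetical placeholders.

```python
# Sketch using the influxdb-client package for InfluxDB 2.x
# (URL, token, org, and bucket are hypothetical placeholders).
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="TOKEN", org="acme")
write_api = client.write_api(write_options=SYNCHRONOUS)

point = (
    Point("machine")                      # measurement
    .tag("machine_id", "compressor-017")  # tags: indexed metadata
    .tag("location", "munich")
    .tag("operator", "shift-a")
    .field("temperature", 71.5)           # fields: the actual readings
    .field("vibration", 2.3)
)
write_api.write(bucket="iiot", record=point)
```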
Kudzai Manditereza: 00:53:18.151 And then we've got data lakes. Basically, data lakes allow you to put together different raw information in one place without having to predefine a structure. So if you want to send video, if you want to send text, if you want to send this, a data lake allows you to collect all of that information and only define the structure whenever you read it. So that's one way of representing data. And then you've got graph databases, which are really a way of showing the relationships between different machines. So again, this is a way of storing your data. So in most cases, you find yourself using all of these different databases and then creating a unified view, like what you could call a single pane of glass, whereby your time series database is referencing your structured database, and if you need to show the connections, you've got your graph databases. So with a unified view, you're then able to create cross-contextualization, which means you're contextualizing across different data stores, and then you're able to create some correlations between your data points. So yeah, that brings us to the end of this session. If there are any questions, we'd be happy to take those.
Q&A
Jayashree Hegde: 00:54:40.273 Awesome. Thank you so much, Kudzai. It's been a really insightful webinar. To all the attendees, if you have liked the presentation, give us a thumbs-up. Cool. Maybe before moving on to questions, I would like to launch a poll. I request all the attendees to cast your votes. Cool, I have launched the polls now. We can probably cover a couple of questions because we are really running out of time. There is one question from Eric in the chat. So the question is, "Where would this contextualization be happening? Are there recommended workflows or APIs or software plugins from HiveMQ? Or is this completely dependent on the DataOps import strategy to the cloud? If we are on Amazon IoT versus Azure IoT Hub, should we use these connectors? Or is there a need for additional Python or Node-RED intermediaries?"
Kudzai Manditereza: 00:55:55.726 Yeah. So good question. Now, it really depends, right? So in most cases, you could have a situation whereby you are using an IIoT platform. For example, tools like Ignition, which is a SCADA system. You've got tools like HighByte, which is a DataOps system. You've got [inaudible]. So there are different platforms that you could use to perform that contextualization at the edge, or within your enterprise, before you push that into your MQTT broker or that central hub of information. So it depends. And Node-RED is also very much usable. If you want to use Node-RED, you can create that contextualization there, create your models, your data structure within Node-RED, and then push that information out. So there aren't really any hard rules around that; it's about paying attention to whether a system allows you to perform this in a repeatable way, right? So if you're going to standardize on, say, Node-RED — is this something that is going to make it easy for you to scale across your entire enterprise? Or maybe you're just concerned with contextualizing data for just one location. So there are different ways of looking at it.
Jayashree Hegde: 00:57:15.529 Cool. Thank you so much, Kudzai. There are lots of other interesting questions coming in. So the next question is, "How can I ensure seamless integration between existing data analytics platforms and data generated through MQTT-based data streams?"
Kudzai Manditereza: 00:57:36.311 Yeah. So it's really very much dependent upon the data analytics application that is going to consume that information. In most cases, I like to look at how your information is going to be consumed first, before you contextualize it at the edge. So if you're going to be consuming that information using a particular analytics application or a particular system, look at that. What is expected of that data? And then those are the rules that you apply at the edge for you to push that data accordingly. You don't just contextualize it however you think best at the edge. Even though that's possible, it's better to look at how you expect your data to be consumed. So the analytics systems are the ones that determine how that data is prepared at the edge before it is pushed.
Jayashree Hegde: 00:58:30.346 Cool. Thank you so much, Kudzai. Maybe we can take one last question. To all the attendees, if you have any follow-up questions, please do contact us. We will help you after the session. So the next question is — what are some key indicators or metrics to measure the success and effectiveness of industrial data management over time?
Kudzai Manditereza: 00:58:59.332 Yeah. So basically, how you could measure industrial data management is how easy it is, or how fast it is, for you to get to that actionable insight. So if your goal is to be able to determine when a piece of equipment is about to fail, within 30 minutes, within 20 days, however you define it, that's a measure of success. If your data management foundation is not prepared in a way that allows you to get to those insights quicker, then you know that you've got a problem. Or if your data management strategy doesn't allow you to scale, doesn't allow you to repeat it whenever you need to implement it elsewhere, then you know that there is a problem with the data management strategy. Those are the things that you could look at — time to insights and also how easy it is to repeat the same process across your enterprise.
Jayashree Hegde: 00:59:48.246 Awesome. Thank you so much, Kudzai. So we are really at the top of the hour, so maybe we should call it a day. To all the attendees, thanks for tuning in. Thanks, Kudzai, for a wonderful presentation. So as I shared earlier, we will be sharing the recording of this webinar as well as the slide presentation in a follow-up email. So stay tuned with us. Also, Kudzai has written a wonderful new guide on industrial data management that we have just released. I have already shared the links in the chat section. Do check it out. Download your copy. It's free. Do also check out our other resources, the six-part article series on our website. Also, happy to share that we released a new HiveMQ Edge this morning. So do check it out on our website. So thank you all for tuning in, and see you all next time. Take care. Bye-bye.
Kudzai Manditereza: 01:00:49.856 Thank you.
Kudzai Manditereza
Kudzai is a tech influencer and electronic engineer based in Germany. As a Sr. Industry Solutions Advocate at HiveMQ, he helps developers and architects adopt MQTT and HiveMQ for their IIoT projects. Kudzai runs a popular YouTube channel focused on IIoT and Smart Manufacturing technologies and he has been recognized as one of the Top 100 global influencers talking about Industry 4.0 online.