Why a Data Warehouse Can’t Be the Unified Namespace
Welcome to Part 2 of the blog series, An Advanced Guide to Building UNS for IIoT: Beyond the Basics. There was a time when many industrial companies had limited uses for data. They focused primarily on generating reports and dashboards to manage operational risks, comply with regulations, and support decisions that were made at a slower pace. Consequently, for most use cases it was sufficient to consolidate all data in a Data Warehouse or Data Lake and rely on a highly specialized central data team to convert it into useful information.
Today, however, is an entirely different story. A rapidly growing number of diverse data sources are being connected across OT and IT domains, and manufacturing organizations are growing more complex. Data use cases have proliferated across every part of the organization, from asset reliability to digital twins, industrial AI agents, personalized customer experiences, machine learning for product design, real-time logistics, and many more.
Moreover, expectations for data access and timeliness have shifted: instead of relying on a centralized data team, organizations now expect democratized access, where anyone can use data in real time. Meeting these expectations requires a new approach to industrial data management, one that seamlessly accommodates diverse uses of data. This approach must support various data access modes, from simple structured views for reporting to continuously reshaped data for advanced use cases such as machine learning.
How the Unified Namespace (UNS) Works
The Unified Namespace (UNS) has gained considerable popularity recently due to its flexibility, scalability, openness, quick time to value, and support for digital infrastructure driven by citizen developers. It allows manufacturers to address today's diverse data needs and expectations. In essence, the UNS provides contextualized, normalized, standardized, and unified real-time data and information about an organization's current state and events through an MQTT data layer. MQTT is an open, lightweight publish/subscribe messaging protocol whose hierarchical topics support flexible information structuring for different access modes, fostering the data management agility needed for an expanding variety of data sources and use cases.
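Much of that flexibility comes from MQTT's hierarchical topics: a consumer subscribes with a topic filter and receives only the slice of the namespace it needs. The sketch below implements MQTT's topic-filter matching rules (single-level `+`, multi-level `#`, and the rule that leading wildcards do not match `$`-prefixed system topics) in plain Python, so you can see how a subscription selects topics. The enterprise/site/area/line/asset topic paths are hypothetical, and in a real deployment the broker performs this matching for you.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name.

    '+' matches exactly one level; '#' (valid only as the last level)
    matches the parent level and any number of child levels. Filters that
    start with a wildcard do not match topics beginning with '$'.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # Leading wildcards must not match broker/system topics such as $SYS/...
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches everything from here down,
            # including the parent level itself ("a/#" matches "a").
            # It is only valid as the final level of the filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level mismatch
    # Every filter level matched; the topic must not have extra levels.
    return len(filter_levels) == len(topic_levels)


# Hypothetical enterprise/site/area/line/asset/measurement topics.
topics = [
    "acme/plant-a/utilities/line-1/compressor-01/discharge_pressure",
    "acme/plant-a/utilities/line-1/compressor-02/discharge_pressure",
    "acme/plant-a/utilities/line-1/compressor-01/motor_temperature",
]
# Select the discharge pressure of every asset in plant-a.
selected = [t for t in topics if topic_matches("acme/plant-a/+/+/+/discharge_pressure", t)]
print(selected)
```

Because the hierarchy carries the context, a reliability engineer and a data scientist can subscribe to different slices of the same namespace without anyone building a dedicated extract for them.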
However, there is a misconception that a Unified Namespace can be built on a Data Warehouse or Data Lake platform. While Data Warehouses and Data Lakes still play significant roles in data historization and powerful analytics, they are no longer the first point of integration or the single source of truth for operational data; that role belongs to the UNS. Instead, they serve as nodes within a larger UNS ecosystem. Below, I outline why a Data Lake or Data Warehouse cannot function as the UNS.
Key Reasons Why a Data Warehouse Can't Be the UNS
Lack of Agility
As noted earlier, manufacturers must keep up with the rapid growth of data use cases to succeed with digital transformation. This demands continuous transformation, aggregation, and reshaping of data to support a test-and-learn innovation cycle, so that teams can move swiftly from one data use case to the next. An effective data architecture and organizational model must therefore be resilient to constant changes in data sources and use cases.
The Unified Namespace (UNS), leveraging MQTT, addresses these needs through an edge-driven, decentralized, and decoupled approach. This allows for real-time data management and access at scale, regardless of organizational complexity.
Conversely, monolithic data architectures such as Data Warehouses (DW) and Data Lakes (DL) struggle to deliver the agility expected of a UNS. Because they rely on a central data team, these traditional models cannot keep pace with the demands for speed and scale, resulting in data-to-insight response times ranging from 24 hours to weeks. Such delays prevent data from being distributed and used quickly.
Additionally, centralized monolithic architectures create a disconnect between two groups: the people and systems that need the data and understand its use cases, and the sources, teams, and systems that generate the data and know it best. This gap prolongs the time needed to access the correct data and impedes hypothesis-driven development. Consequently, the return on data investment stagnates.
Complex Data Architecture
In manufacturing, using a Data Warehouse as your Unified Namespace requires consolidating data from the Control, Operations, Information, and Business domains into a central data store to generate decision-making insights. However, data attributes from different domains often share the same names but carry different meanings and definitions. As more data is collected and stored centrally, these conflicts and inconsistencies multiply, making the data harder to understand. The result is complex, unwieldy pipelines of batch or streaming jobs that only a central team of highly specialized data engineers understands and maintains. Data lineage and dependencies become obscured and hard to track, so the supposedly unified context becomes meaningless to most users.
A Unified Namespace (UNS), on the other hand, avoids this complexity by allowing subject matter experts from each domain to gather and contextualize data, tailoring it to the demands of each use case. For instance, suppose you need data to train a predictive model for an industrial compressor. Instead of building an ETL pipeline to extract and clean data from a Data Warehouse (which can be challenging, because the data team typically lacks deep knowledge of compressors), you can subscribe through the UNS's MQTT data layer to receive and persist contextualized, normalized predictive-modeling parameters published directly from the control domain by the domain experts who prepared them. This effectively abstracts the underlying complexity away.
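The division of labor in the compressor example can be sketched in a few lines of plain Python. On the producer side, domain knowledge (what each raw controller tag means, its scaling, and its unit) turns raw values into a contextualized, normalized JSON payload; on the consumer side, the data scientist only parses and stores it. Everything here is an assumption for illustration: the tag names, scaling factors, topic path, and payload layout are hypothetical, and the MQTT transport itself is omitted (in practice the producer would publish `message` on `topic` and the consumer would receive it through a subscription).

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Tuple

# Hypothetical raw controller values: scaled integers with no context.
RAW_TAGS: Dict[str, int] = {"TAG_104": 725, "TAG_106": 684}

# Hypothetical domain-expert knowledge: meaning, scale factor, and unit per tag.
TAG_MAP: Dict[str, Tuple[str, float, str]] = {
    "TAG_104": ("discharge_pressure", 0.01, "bar"),  # stored in hundredths of a bar
    "TAG_106": ("motor_temperature", 0.1, "degC"),   # stored in tenths of a degree
}


def build_payload(asset_path: str, raw: Dict[str, int]) -> Tuple[str, str]:
    """Producer side: turn raw tags into a contextualized, normalized message."""
    measurements = {
        name: {"value": round(raw[tag] * scale, 3), "unit": unit}
        for tag, (name, scale, unit) in TAG_MAP.items()
    }
    payload = {
        "asset": asset_path.rsplit("/", 1)[-1],
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "measurements": measurements,
    }
    return f"{asset_path}/model_inputs", json.dumps(payload)


def persist_training_row(message: str, store: List[dict]) -> None:
    """Consumer side: no tag mapping or unit cleanup needed, just parse and store."""
    data = json.loads(message)
    row = {"asset": data["asset"], "timestamp": data["timestamp"]}
    row.update({name: m["value"] for name, m in data["measurements"].items()})
    store.append(row)


training_rows: List[dict] = []
topic, message = build_payload("acme/plant-a/utilities/line-1/compressor-01", RAW_TAGS)
persist_training_row(message, training_rows)
print(topic)
print(training_rows[0])
```

The point of the design is where the knowledge lives: the mapping table belongs to the team that owns the compressor, so a change in controller scaling is fixed once at the source rather than in every downstream pipeline.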
Lack of Data Quality
Beyond complexity and scalability limitations, a Unified Namespace (UNS) built on a Data Warehouse faces significant challenges related to data quality and resilience to change. The core issue is that the teams and domains most familiar with the data are not responsible for its quality. Instead, a central data team, distant from the data source and isolated from domain expertise, is tasked with restoring data quality through cleansing and enrichment pipelines. Often, data is adjusted by engineers in the staging layer before being loaded into the Data Warehouse, resulting in data that loses its original form and meaning by the time it reaches the central system.
In contrast, by publishing contextualized and normalized data straight from the originating systems in each functional domain into a common MQTT data infrastructure, the UNS architecture creates a continuously updated, real-time representation of high-quality data products. These data products are created by domain experts and shared with consumers in other domains, effectively establishing a single, authoritative reference point for your business model.
Tightly Coupled Data Architecture
When a Data Warehouse (DW) serves as the UNS and single source of truth, it creates a tightly coupled system in which every application must access data through dedicated extraction pipelines. This leads to numerous cross-dependencies that become difficult to manage at scale. The situation worsens when data consumers, trying to avoid long wait times, quickly combine data into their own views, accumulating significant technical debt that causes problems later. Small changes can then trigger a ripple effect that requires further adjustments. Additionally, Data Warehouses are tightly coupled to their underlying technology, forcing consumers with different read patterns to export data to other environments.
An MQTT-based Unified Namespace (UNS) addresses these issues by providing an enterprise-wide information model that data producers and consumers can access intuitively through a single, standard interface. Applications only need to plug into the infrastructure to publish and consume data, and they can just as easily unplug afterward. Data producers and consumers are completely decoupled from each other and share information solely through the common data infrastructure.
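The plug-in/unplug behavior can be illustrated with a toy broker in plain Python. This is a sketch, not an MQTT implementation: the hypothetical `InMemoryBroker` class supports only exact topic matches, but it mimics MQTT's retained messages, so a consumer that plugs in late still receives the current state. The producer publishes without knowing who, if anyone, is listening; the topic path is also hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, str], None]


class InMemoryBroker:
    """Toy stand-in for an MQTT broker (exact topic matching only).

    Producers and consumers only ever talk to the broker, never to each
    other, which is the decoupling the UNS relies on.
    """

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._retained: Dict[str, str] = {}

    def publish(self, topic: str, payload: str, retain: bool = False) -> None:
        if retain:
            # Like an MQTT retained message: keep the last value for late joiners.
            self._retained[topic] = payload
        for handler in list(self._subscribers[topic]):
            handler(topic, payload)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)
        if topic in self._retained:
            # A newly plugged-in consumer immediately receives the current state.
            handler(topic, self._retained[topic])

    def unsubscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].remove(handler)


broker = InMemoryBroker()
topic = "acme/plant-a/packaging/line-2/filler/state"  # hypothetical topic
broker.publish(topic, "RUNNING", retain=True)  # no subscribers yet; value is retained

dashboard_log: List[str] = []


def dashboard(received_topic: str, payload: str) -> None:
    dashboard_log.append(payload)


broker.subscribe(topic, dashboard)    # plug in: receives the retained "RUNNING"
broker.publish(topic, "STOPPED", retain=True)
broker.unsubscribe(topic, dashboard)  # unplug: the producer code is unaffected
broker.publish(topic, "RUNNING", retain=True)
print(dashboard_log)  # ['RUNNING', 'STOPPED']
```

Neither side holds a reference to the other, so adding a second dashboard or retiring this one requires no change to the producer.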
Lack of Real-Time Operationalization of Insights
One of the most significant advantages of a Unified Namespace (UNS) is its ability to improve manufacturing operations by operationalizing analytics insights, such as real-time predictions. Insights generated from data can be published back into the UNS immediately, where they become available in their operational context. This is difficult to achieve with a centralized Data Warehouse architecture; it requires an infrastructure that lets different business units share operational and analytical information with each other in real time through a standard data layer such as MQTT.
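The feedback loop described above can be sketched with a minimal in-memory broker: an analytics consumer subscribes to a motor-temperature topic, computes a rolling mean, and publishes its result back to a sibling `insights` topic that any operational application can subscribe to. The topic names, the three-sample window, and the 80 degC threshold are hypothetical placeholders for a real predictive model, and the `Broker` class stands in for an actual MQTT broker and client library.

```python
import json
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List


class Broker:
    """Minimal in-memory stand-in for an MQTT broker (exact topics only)."""

    def __init__(self) -> None:
        self.subs: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self.subs[topic].append(handler)

    def publish(self, topic: str, payload: str) -> None:
        for handler in list(self.subs[topic]):
            handler(payload)


ASSET = "acme/plant-a/utilities/line-1/compressor-01"  # hypothetical asset path
WINDOW, LIMIT_C = 3, 80.0  # hypothetical rule parameters, not a real model


class TemperatureWatch:
    """Analytics consumer that writes its insight back into the namespace."""

    def __init__(self, broker: Broker) -> None:
        self.broker = broker
        self.recent: Deque[float] = deque(maxlen=WINDOW)
        broker.subscribe(f"{ASSET}/motor_temperature", self.on_reading)

    def on_reading(self, payload: str) -> None:
        self.recent.append(float(payload))
        mean = sum(self.recent) / len(self.recent)
        insight = {"rolling_mean_c": round(mean, 2), "alert": mean > LIMIT_C}
        # Publish next to the asset's own data so operational consumers
        # (HMI, MES, maintenance apps) can act on it immediately.
        self.broker.publish(f"{ASSET}/insights/overheating", json.dumps(insight))


broker = Broker()
watcher = TemperatureWatch(broker)

# An operational consumer, e.g. an HMI, listening for the insight.
alerts: List[dict] = []
broker.subscribe(f"{ASSET}/insights/overheating", lambda p: alerts.append(json.loads(p)))

for reading in ["76.0", "79.5", "84.0", "88.5"]:
    broker.publish(f"{ASSET}/motor_temperature", reading)

print(alerts[-1])  # {'rolling_mean_c': 84.0, 'alert': True}
```

Because the insight lands in the same namespace as the raw measurements, the HMI consumes it exactly as it would any other topic, with no separate integration to the analytics system.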
Does Not Support Citizen Developers
Expanding the use of data to everyone who can create value is essential for the success of digital transformation in manufacturing and realizing its full economic benefits. The Unified Namespace (UNS) is inherently designed for extended reusability, enabling frictionless sharing of information and access by all systems and, more importantly, people who can create value for the organization.
In essence, the UNS allows manufacturers to extract value from data more cost-effectively at scale by empowering a broader population of generalist technologists to become data developers. With most applications supporting MQTT connectivity, tools to produce and consume information from the UNS are readily accessible to everyone in the organization, from automation engineers tweaking a user interface to data scientists applying machine learning.
In contrast, monolithic data architectures like Data Warehouses and Data Lakes hinder this capability. Instead of turning data into value themselves, subject matter experts must rely on a highly specialized central team of data engineers.
Conclusion
In conclusion, the rapid evolution of data use cases and the increasing complexity of manufacturing organizations necessitate a shift from traditional monolithic data architectures, such as Data Warehouses and Data Lakes, to more agile, decentralized, and real-time data management solutions. The Unified Namespace (UNS), leveraging MQTT technology, offers a robust alternative by providing a flexible, scalable, and open platform that supports diverse data access modes and democratizes data usage across the organization.
Unlike the tightly coupled and slow-to-respond nature of Data Warehouses, the UNS enables real-time data integration and operationalization, ensuring high data quality and supporting a culture of innovation and rapid adaptation to new data insights. By empowering subject matter experts and fostering an environment where data is easily accessible and utilizable by all, the UNS significantly enhances the ability of manufacturers to derive value from their data, thus driving digital transformation and achieving long-term operational excellence.
Whether you're an engineer, IIoT Solution Architect, Digital Transformation Specialist, or decision-maker, understanding the UNS is crucial for leveraging the full potential of IIoT and driving digital transformation in your organization. Download our eBook on Architecting a Unified Namespace for IIoT with MQTT or contact us to learn more.
Other Blogs from the Series
Part 1: The Business Value of Unified Namespace for Industry 4.0
Part 3: UNS Semantic Data Hierarchy with MQTT: Explained with an Example
Part 4: Data Modeling for The Unified Namespace: Best Practices
Part 5: Automating Manufacturing Business Processes with the Unified Namespace
Kudzai Manditereza
Kudzai is a tech influencer and electronic engineer based in Germany. As a Sr. Industry Solutions Advocate at HiveMQ, he helps developers and architects adopt MQTT and HiveMQ for their IIoT projects. Kudzai runs a popular YouTube channel focused on IIoT and Smart Manufacturing technologies and he has been recognized as one of the Top 100 global influencers talking about Industry 4.0 online.