MQTT Experts Answer Your Questions - July 2022
Webinar Overview
Drawing on the success of our previous 'Ask Me Anything About MQTT' webinar, and by popular demand, we are continuing a webinar series under the same name. In this July 2022 edition, our MQTT experts answered questions about MQTT topics, MQTT security, MQTT Sparkplug, and OPC UA.
Florian Raschbichler, Head of Support at HiveMQ, and Matthias Hofschen, Senior Consultant Professional Services at HiveMQ, personally answered questions live during the webinar.
Feel free to ask questions on the HiveMQ Community Forum.
Key Takeaways
HiveMQ is a distributed system (a cluster of servers). Because HiveMQ takes high availability and the prevention of message loss extremely seriously, it persists everything to disk.
HiveMQ is clusterable (Professional and Enterprise Editions) for redundancy and scalability. Replication factor for data persistence allows recovery from node failures.
MQTT Sparkplug is evolving to address specific industry needs. HiveMQ is active in the specification committee, and there is potential for further customization through additional payloads and protocols.
MQTT's strength lies in its data-agnostic nature. This gives various use cases and verticals the flexibility to define their own data structures on top of MQTT.
MQTT doesn't specify the payload format, because leaving the data format open enables diverse applications and avoids restricting future possibilities.
The HiveMQ Community Extension SDK enables building custom functionality, including monitoring subscriptions and usernames. Client snapshots in the HiveMQ Control Center offer another option.
Transcript
Welcome and poll
Jayashree Hegde: 00:00:08.584 [music] Hello, everyone. Good morning, good afternoon, good evening. A very warm welcome to the July edition of our Ask Me Anything About MQTT session. Thank you for taking the time to attend this webinar; it's a pleasure to have you all. I'm Jayashree Hegde, and I'm moderating this webinar today. Allow me to introduce the experts who will be answering your questions: Florian Raschbichler, Head of Support, and Matthias Hofschen, Senior Consultant in our Professional Services team. Joining Florian and Matthias is Gaurav Suman, Director of Product Marketing at HiveMQ, who will be moderating the Q&A. Thank you, Florian, Matthias, and Gaurav. Before we kick off, I'd like to mention that we are recording this session, and the recording will be shared in a follow-up email. Feel free to submit your questions in the Q&A pod, and kindly refrain from using the chat pod for questions; you are welcome to use it to interact with each other, though. We have already received many questions, which we will address one by one. During the session, I will unmute you if we need more context or clarity on a question you have asked. Lastly, there will be two polls, one right at the beginning and one at the end. I request you all to participate. Without further ado, I'll hand it over to Gaurav and start the first poll. Welcome, everyone.
Gaurav Suman: 00:01:45.359 Thank you, Jayashree. Welcome, everybody. Appreciate you making time. And thank you, Matthias and Florian, for making time for the community. Jayashree has put up the first poll for you here, so please take a moment to answer these questions for us, if you don't mind.
Jayashree Hegde: 00:02:00.420 I'll end the polls now. Gaurav, it's all yours.
Introducing the panel and Q&A format
Gaurav Suman: 00:02:03.004 Thank you. Excellent. Thank you, Jayashree. So, a quick background refresher for everybody who's here. The format is this: we received some questions before this panel conversation, and we'll go through those. I can already see that a lot of the people who submitted questions are here, which is fantastic, because once we answer your question, we'll give you the opportunity to ask a follow-up. And I could not have asked for a better panel than Florian and Matthias. Florian works day to day with our customers worldwide; his team ensures customer success and customer support across the globe. Matthias is one of the senior specialists in our Professional Services team, so he thinks about both the current problems and the future architectural guidance our customers need. So this is a fantastic place to ask and clarify your MQTT questions.
What's the built-in security in MQTT?
Gaurav Suman: 00:03:03.244 And honestly, the only bad question here is the one you choose not to ask. So I encourage you to ask if your question hasn't already been included or if what we cover doesn't answer what you have in mind. We are not here to judge anybody. Feel free to ask questions in the Q&A pod, as Jayashree mentioned, and we'll keep picking them up and going through them one by one. All right. With that said, my name is Gaurav. Thank you again for joining us today. I'm joining you from Ottawa, Canada, and I lead Product Marketing here at HiveMQ. The first question I want to begin with is about security, and it comes from Aloysius Lee. Aloysius is asking, "What security is built into MQTT to make sure that only authorized access is allowed, and what restrictions and authorization for subscribe and publish are built into the MQTT protocol?" So let's go to our experts and see how they want to answer this.
Matthias Hofschen: 00:04:06.981 I can take the first crack at answering it. Hello, everybody. Authentication and authorization are really important for security. Authorization in MQTT, specifically with HiveMQ, can be done via the Enterprise Security Extension. There, you can define permissions explicitly for a user who has authenticated and is a member of a role. For each role, you can then define permissions in terms of publishing to certain topics, subscribing to topics, using QoS levels, using retained messages, and so on. So that would be my first crack at the question. Florian, do you want to add something?
Florian Raschbichler: 00:05:08.483 Yeah, happily. And thank you, Matthias, Gaurav, Jayashree, and everybody on this webinar. Matthias correctly covered one solution to the question at hand: HiveMQ comes with the Enterprise Security Extension, which has this built in. From a protocol specification point of view, the specification is actually fairly open about what an implementation should or should not do here. It is only precise in that there need to be ways of doing authorization. And at HiveMQ, we are all about making MQTT work for you. So in case the Enterprise Security Extension, with its vast feature set, is not suitable for your use case for whatever reason, we have an Extension SDK. It lets users implement their own custom extensions, and then you can be really creative and really, really granular, like Matthias said. It's not only about topics, publish, subscribe, and which quality of service level, but also about details like "Can you use the retained flag? Are you able to use shared subscriptions?" and so on. So the spec leaves this open, but HiveMQ is very clear about it and allows fully granular authorization for all MQTT clients.
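Because the protocol leaves authorization to the implementation, the following is only a minimal Python sketch of the kind of role-based rules described above: a role grants publish or subscribe rights on MQTT topic filters, with a QoS ceiling and a switch for the retained flag. The role names, the `Permission` fields, and the `may_publish`/`may_subscribe` helpers are hypothetical; this is not the Enterprise Security Extension's configuration format or the Extension SDK API.

```python
from dataclasses import dataclass


def filter_covers(allowed: str, requested: str) -> bool:
    """Return True if every topic matched by `requested` is also matched by `allowed`.

    `requested` may be a concrete topic name (PUBLISH) or a topic filter (SUBSCRIBE).
    Simplification: the special rule for '$'-prefixed topics is ignored.
    """
    allowed_levels = allowed.split("/")
    requested_levels = requested.split("/")
    for i, level in enumerate(allowed_levels):
        if level == "#":
            return True  # multi-level wildcard covers this level and everything below
        if i >= len(requested_levels):
            return False
        if requested_levels[i] == "#":
            return False  # the request reaches deeper than `allowed` permits
        if level != "+" and requested_levels[i] != level:
            return False  # literal mismatch (also rejects a '+' request against a literal)
    return len(allowed_levels) == len(requested_levels)


@dataclass(frozen=True)
class Permission:
    topic_filter: str
    publish: bool = False
    subscribe: bool = False
    max_qos: int = 2
    retain: bool = False  # may this role publish retained messages?


# Hypothetical role table: role name -> granted permissions.
ROLES = {
    "sensor": [Permission("sensors/+/telemetry", publish=True, max_qos=1)],
    "dashboard": [Permission("sensors/#", subscribe=True, max_qos=1)],
}


def may_publish(role: str, topic: str, qos: int, retain: bool) -> bool:
    return any(
        p.publish
        and filter_covers(p.topic_filter, topic)
        and qos <= p.max_qos
        and (p.retain or not retain)
        for p in ROLES.get(role, [])
    )


def may_subscribe(role: str, topic_filter: str, qos: int) -> bool:
    return any(
        p.subscribe and filter_covers(p.topic_filter, topic_filter) and qos <= p.max_qos
        for p in ROLES.get(role, [])
    )
```

With this table, a sensor may publish to `sensors/d1/telemetry` at QoS 1 but not with the retained flag, and a dashboard may subscribe to `sensors/+/telemetry` but not to `#`. A real broker may also downgrade a subscription's QoS instead of rejecting it; the sketch simply rejects.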
Does HiveMQ support certificate-based authentication between device and the broker?
Gaurav Suman: 00:06:34.483 Excellent. Thank you. I don't think the person who asked this question is on the call right now. So what I'd suggest is that we move to another question that is again about security, or rather about authentication. This question is from Debello Maceri, who is asking whether HiveMQ supports certificate-based authentication between the device and the broker. So how about we answer that and see if Debello wants to ask a follow-up question.
Florian Raschbichler: 00:07:05.679 Yes, absolutely. I'm happy to take that. The short answer is yes: the aforementioned Enterprise Security Extension has this as one of its options. Technically speaking, you can access certain claims from the certificate presented by the client, such as the subject distinguished name, and then do your authorization based on the contents of these claims. On top of that, of course, we would always recommend using certificates and encryption in addition to authentication and authorization. But if you want to use the contents of the client certificate for authorization, this can absolutely be done.
Matthias Hofschen: 00:07:57.303 And just to add here, the Enterprise Security Extension is actually really flexible to work with. It defines a pipeline in which you can preprocess the certificate, extract values from it, and then pass them on for authorization. So it's actually a very good system.
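To make "authorize based on certificate claims" concrete, here is a minimal Python sketch that reads the subject of a client certificate in the dictionary shape returned by the standard library's `ssl.SSLSocket.getpeercert()` and derives a topic namespace from its common name. The policy (one `devices/<CN>/` namespace per device) and all helper names are illustrative assumptions; this is not how the Enterprise Security Extension pipeline is configured.

```python
from typing import Dict, Optional


def subject_fields(peer_cert: Optional[dict]) -> Dict[str, str]:
    """Flatten the 'subject' of a cert dict shaped like ssl.SSLSocket.getpeercert() output."""
    fields: Dict[str, str] = {}
    # getpeercert() returns None without a certificate and {} if it was not validated.
    # 'subject' is a tuple of RDNs; each RDN is a tuple of (attribute, value) pairs.
    for rdn in (peer_cert or {}).get("subject", ()):
        for attribute, value in rdn:
            fields[attribute] = value
    return fields


def client_identity(peer_cert: Optional[dict]) -> str:
    """Hypothetical policy: the subject common name (CN) identifies the device."""
    fields = subject_fields(peer_cert)
    if "commonName" not in fields:
        raise PermissionError("client certificate has no subject common name")
    return fields["commonName"]


def may_use_topic(peer_cert: Optional[dict], topic: str) -> bool:
    """Hypothetical rule: a device may only use topics below devices/<CN>/."""
    return topic.startswith(f"devices/{client_identity(peer_cert)}/")


# Example in the getpeercert() shape (values are made up).
example_cert = {
    "subject": (
        (("countryName", "AT"),),
        (("organizationName", "Example Corp"),),
        (("commonName", "device-42"),),
    ),
}
```

Here `client_identity(example_cert)` yields `device-42`, so that client may use `devices/device-42/telemetry` but not another device's topics.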
What about the use of private security certificates rather than using a public certificate authority?
Gaurav Suman: 00:08:23.394 Sounds good. Thank you. Another topic Florian covered in a previous webinar on security is using private security certificates, that is, not using a public certificate authority, which obviously has its own pros and cons. From a customer use-case perspective, are there specific areas where it makes sense to give customers the flexibility to use their own private certificate authority for their digital certificates?
Florian Raschbichler: 00:08:56.207 So I'm going to take this, even though Matthias might be better suited to answer it, and I'm sure he will add to it. From my point of view, in a typical MQTT IoT use case, so-called self-signed certificates, signed by your company's own certificate authority, are actually advantageous over certificates issued by a trusted public certificate authority. That's because you can configure the broker to trust only the certificates signed by your own certificate authority, which increases security even further. And on a positive side note, it's also cheaper.
Matthias Hofschen: 00:09:46.133 Nothing to add.
Gaurav Suman: 00:09:46.482 Anything to add, Matthias?
Matthias Hofschen: 00:09:48.856 Nothing to add. All good.
Gaurav Suman: 00:09:50.032 Okay. And is it fair to also say that, in a way, you are reducing your threat quotient a little bit? Because even when you rely on a public and perhaps well-established certificate authority, it's still out there; it's not part of your organization. Is that something customers worry about?
Florian Raschbichler: 00:10:12.565 So I can speak from my experience. As you mentioned in the intro, I've had ample customer interactions across the globe, and that particular argument typically does not come up in that way. Cost is more often the deciding argument. But I would agree that, in theory, you reduce the attack vectors by removing a PKI that's outside of your control. And depending on the size of the company, these decisions might sit with a security or compliance desk and are not necessarily for the developers of the IT applications to make, so that's also something to consider. This is why with HiveMQ, you have all the options: you can use trusted CAs and just use the default trust stores on your machine, or create your own.
Gaurav Suman: 00:11:12.470 Sounds good.
Matthias Hofschen: 00:11:13.622 And maybe one thing to add here: it depends on the use case. If the devices and the broker belong to one company, then private, self-signed certificates work well. But if you're running a service and your clients want assurance that the broker is trustworthy, then you might want to go with a public certificate authority.
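The principle Florian describes, a broker that accepts only client certificates signed by your own CA, can be sketched with Python's standard `ssl` module. This is an illustration of the idea only; HiveMQ's own TLS listener and trust store configuration is not shown, and `broker_tls_context` and its parameter are hypothetical names.

```python
import ssl
from typing import Optional


def broker_tls_context(private_ca_pem: Optional[str] = None) -> ssl.SSLContext:
    """TLS server context that trusts client certificates only from the supplied CA.

    Unlike ssl.create_default_context(), a bare SSLContext does not load the
    system's public CA bundle, so nothing is trusted unless added explicitly.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # every client must present a certificate (mutual TLS)
    if private_ca_pem is not None:
        # PEM text of your own (private) certificate authority.
        ctx.load_verify_locations(cadata=private_ca_pem)
    # A real broker would also call ctx.load_cert_chain(certfile, keyfile)
    # with its own server certificate; omitted to keep the sketch self-contained.
    return ctx
```

This covers the broker verifying its clients. Matthias's point runs in the other direction: when external clients need to verify the broker, a server certificate from a public CA lets them rely on their default trust stores.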
Gaurav Suman: 00:11:42.661 Makes sense. Thank you. So, Debello, we've given you the option to unmute yourself if you like. I know you also have a follow-up question about edge devices that come with this enabled. So first, please ask your follow-up, and could you also clarify what exactly you mean by "enabled"? If you'd like, you can unmute yourself and ask Florian and Matthias.
Are certificates for authentication available as a default feature?
Debello Maceri: 00:12:10.825 Thanks, guys. The question is more about the device itself. We're looking at getting devices. We have a device called an Arc; I forget the specific name. I want to say it's an archive, or something like that. It's by a third-party vendor, I believe. We were looking for devices that use certificates for authentication, and we wanted to know whether that was available as a default feature, because I could not, for the life of me, find the configuration to use certificates. It was just username and password.
Florian Raschbichler: 00:12:59.222 Okay. So yes, it's definitely possible. Again, as Matthias mentioned, you have to define the Enterprise Security Extension pipeline accordingly. But if you're talking about authentication and not authorization, authenticating with a certificate usually comes down to trusting the certificate's signer. This is done on the broker side, independent of any extension, to make sure that you only trust the certificates used by your devices. If you want to dig deeper into this and get some hands-on help, please feel free to send an email to support@hivemq.com, and my team will gladly pick it up and help you get it set up.
Debello Maceri: 00:13:59.085 Okay. Understood. Thank you.
Is there a way to reduce the topic string to reduce the data?
Gaurav Suman: 00:14:02.220 Excellent. Thank you for the question, Debello. There was another related question from Puja Jeven Kataria about securing the MQTT connection with the HiveMQ infrastructure, which I think we've answered. They also asked, "How do we create certificates and keys?" I think we've handled that appropriately here as well, so let's keep moving. I'd love to pick this question from Tobias Parson. Tobias is asking, "For IoT devices, you often want to reduce your data volume. Is there a way to shorten the topic string to reduce the data, such as sending a two-byte key instead of repeating the same topic string with each message?" They use the example of a thermometer, where the data itself is just two bytes while the topic string might be excessively long, so they are concerned about the amount of data being transmitted. How about we help Tobias out on this?
Matthias Hofschen: 00:15:00.273 Yeah. There is a feature in the MQTT 5 specification called Topic Alias that could be used in this case. Yep. And Florian posted the link in the chat.
Florian Raschbichler: 00:15:15.200 Yeah, there's a blog post describing it, and there's also a YouTube video at the end for the visual learners among us. In short, it does exactly what Tobias is asking for. With MQTT 5 clients, the client and the broker negotiate a maximum number of so-called topic aliases, and the topic string can then be replaced by an integer. So how many bytes is that integer, Matthias? I think it's eight, right, not two bytes. But you can definitely save a lot of data, especially if you always use the same topics and/or have high-frequency messaging.
Matthias Hofschen: 00:16:03.742 I mean, it makes a lot of sense when you transfer a temperature value and the topic string is multiple times as large as the actual value that you're transferring.
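For reference, the MQTT 5 specification defines Topic Alias as a Two Byte Integer property (identifier 0x23) whose value ranges from 1 up to the Topic Alias Maximum each side announces at connect time, so the alias is in fact the two-byte key Tobias describes. Once a PUBLISH has carried both the topic name and the alias, later packets can send an empty topic name plus the alias. The Python sketch below estimates the topic bytes saved under that model; it ignores the rest of the packet and assumes the properties-length field does not grow.

```python
TOPIC_ALIAS_PROPERTY_ID = 0x23  # MQTT 5 property identifier for Topic Alias
ALIAS_PROPERTY_BYTES = 1 + 2    # property identifier byte + Two Byte Integer value


def topic_field_bytes(topic: str) -> int:
    """Topic Name in a PUBLISH: 2-byte length prefix followed by the UTF-8 bytes."""
    return 2 + len(topic.encode("utf-8"))


def aliased_publish_topic_bytes() -> int:
    """After the alias is established: empty Topic Name plus the Topic Alias property."""
    return topic_field_bytes("") + ALIAS_PROPERTY_BYTES


def topic_bytes_saved(topic: str, messages: int) -> int:
    """Topic-related bytes saved over `messages` PUBLISH packets (messages >= 1).

    Assumes the first PUBLISH sends both the full topic and the alias to set up
    the mapping, and every later PUBLISH sends only the alias.
    """
    if messages < 1:
        raise ValueError("messages must be >= 1")
    without_alias = messages * topic_field_bytes(topic)
    with_alias = (topic_field_bytes(topic) + ALIAS_PROPERTY_BYTES
                  + (messages - 1) * aliased_publish_topic_bytes())
    return without_alias - with_alias
```

For a 50-character topic and 1,000 messages, `topic_bytes_saved` returns 46950. For a topic as short as `a/b`, the alias costs as much as the topic itself, so it only pays off for longer topics, which matches the thermometer case.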
Is there a tool which can create dynamic topics and apply security and business rules?
Gaurav Suman: 00:16:16.755 Right. And while we're on the topic of topics, there are a bunch of questions about managing topics. Paul Perez is actually on the call, and I'm going to start with his question about topic management features and what we can associate with topics. Paul is looking for a tool that can create dynamic topics and apply security and business rules, for example QoS, throttling, and so on, on a per-topic basis. How can we help Paul here?
Matthias Hofschen: 00:16:55.363 So maybe the first thing to say here is that there is no need for a tool that dynamically creates topics. Topics are extremely lightweight in MQTT. As a publisher, a topic starts to exist as soon as I publish a message to it, so there is no need to pre-create topics. If somebody is thinking of Kafka here: in Kafka, you typically create a topic before you can use it. That's not the case in MQTT, and certainly not with HiveMQ. Florian, do you want to pick up the rest of the question, the business rules? I mean, we kind of answered this already.
Florian Raschbichler: 00:17:46.225 Yeah, let me definitely add to that. When we talk about client access to certain topics, certain QoS levels, and so on, this is again where the often-mentioned Enterprise Security Extension comes into play. One of the options it supports is, for example, an SQL database as the source of the authorization rules. This is what most of our customers choose to do. You can then change those permissions in the database and have dynamic permissions for your clients as well. When it comes to throttling on a per-topic basis, that's where the Extension SDK, the ability to extend HiveMQ's capabilities to your own needs, comes into play, and you could certainly implement per-topic throttling as well.
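The Extension SDK's actual interfaces are Java and are not reproduced here. As a language-neutral illustration of the logic such an extension might apply, here is a minimal Python token-bucket sketch that limits publishes per topic. The class names, the rate and burst parameters, and the injectable clock are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenBucket:
    rate: float      # tokens added per second
    capacity: float  # maximum burst size
    tokens: float
    updated: float   # timestamp of the last refill

    def try_take(self, now: float) -> bool:
        # Refill in proportion to the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerTopicThrottle:
    """Hypothetical per-topic publish limiter (not the HiveMQ Extension SDK API)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, topic: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(topic)
        if bucket is None:
            # Buckets are created lazily, the same way MQTT topics appear on first use.
            bucket = TokenBucket(self.rate, self.capacity, self.capacity, now)
            self.buckets[topic] = bucket
        return bucket.try_take(now)
```

With `rate=1.0` and `capacity=2.0`, a topic can burst two messages, then gets one more per second, and each topic is limited independently. What to do with a rejected message (drop it, disconnect the client, or delay it) is a separate policy decision.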
Gaurav Suman: 00:18:58.637 Right. I think what Matthias highlighted is so critical. There are multiple messaging systems, and I hesitate to call Apache Kafka messaging middleware as such, but I'm thinking about data moving around. There are ways and means to organize and streamline that data, and in the case of Apache Kafka, as you're pointing out, you have to predetermine the topic structures and what those strings look like. With our technology, and MQTT in general, that's not expected on the broker side. It's part of the architecture that you can send data to us and decide which QoS you want to transmit it at, and all the policy is managed at the broker level: "Should we accept this data? What should we do with it?" All of that decision-making happens at the broker level. So the clients, in the larger scheme of things, do not have to organize themselves around predefined topics. Of course, you need an architecture, and you need to think through how it will all be organized, but unlike Apache Kafka, there's a different degree of flexibility here.
Florian Raschbichler: 00:20:10.947 If you allow me, Gaurav, to —
Gaurav Suman: 00:20:12.772 Yeah. Go ahead, Florian.
Florian Raschbichler: 00:20:13.487 ...quickly add on top of that. What that situation also allows is that you can be much more creative and much broader in designing your topic tree and topic structure. For example, you can take advantage of the publish/subscribe architecture and multicast certain messages on shared topics, but even with a million devices connected, you can also have a device-specific topic to address each device individually through publish/subscribe. As a quick anecdote, I know that one of our biggest customers has 160 million topics in their MQTT broker at any given time. So when we say it's more dynamic and more lightweight than other messaging technologies, it's really many, many orders of magnitude more flexible and dynamic.
Matthias Hofschen: 00:21:05.691 And that actually reminds me of one particular use case with ActiveMQ, where the biggest problem was that the number of topics that could be created hit a limit at some point. That wouldn't happen with something like HiveMQ.
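A toy Python model of the points above: publishers and subscribers simply agree on topic names, nothing is declared up front, and one topic namespace can hold both a shared broadcast topic and one topic per device. `TinyBroker` is hypothetical and deliberately ignores wildcards, QoS, sessions, and persistence; it says nothing about how HiveMQ stores topics internally.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives (topic, payload).
Handler = Callable[[str, bytes], None]


class TinyBroker:
    """Toy in-memory pub/sub (hypothetical): exact topic matches only."""

    def __init__(self) -> None:
        # No topic is declared up front: a topic is just a key that
        # appears when someone subscribes to it.
        self.subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver to every subscriber of `topic`; return the number of deliveries."""
        handlers = self.subscribers.get(topic, [])  # .get() does not create a key
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


broker = TinyBroker()
inbox: DefaultDict[str, List[bytes]] = defaultdict(list)

for device_id in ("d1", "d2", "d3"):
    def on_message(topic: str, payload: bytes, device_id: str = device_id) -> None:
        inbox[device_id].append(payload)

    broker.subscribe(f"devices/{device_id}/commands", on_message)  # one topic per device
    broker.subscribe("fleet/broadcast", on_message)                # one topic shared by all

delivered_broadcast = broker.publish("fleet/broadcast", b"sync-clock")
delivered_unicast = broker.publish("devices/d2/commands", b"calibrate")
delivered_nowhere = broker.publish("devices/d9/commands", b"hello")  # valid, just unheard
```

The broadcast reaches all three devices, the command reaches only `d2`, and publishing to a topic nobody has subscribed to is simply a message with no recipients, not an error.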
Gaurav Suman: 00:21:19.953 Sounds good. Thank you, Matthias. So Paul is on the call, Jayashree. Paul had a couple of other questions, and we'll cover those too, but first let's see whether Paul has a follow-up on dynamic topic management and whatever intelligence we can build into topics. Paul, you have the option to unmute yourself and ask a follow-up if you like.
Paul: 00:21:40.531 Yes. Thank you very much for the answer. It's very impressive to see that we can create so many topics. What we would like to do is associate a service publication with a channel. We are building a kind of channel management that associates the service with many channels; that's why we want to create them dynamically when a service provider publishes a service in our system. And it seems much better than the feedback we have had from Kafka, in fact. It's lighter, and maybe it's faster.
Florian Raschbichler: 00:22:25.765 Yes, exactly. In the case you describe, neither the publisher nor the subscriber, in your case you or your channel partners, needs to be concerned with managing the topics at all. The one thing you do need is to agree on the same topic so that the data can flow through, of course.
Paul: 00:22:46.692 Yeah, that's exactly what we do. We have central management, so when you invoke a service, we reply with a channel number or channel identifier, and then we are able to connect the provider and the consumer.
Comparing MQTT 3 and MQTT 5, what's the market share and adoption?
Gaurav Suman: 00:23:05.985 So Paul also had a couple of other questions. Paul, feel free to stay unmuted for a bit, because let's see what our experts have to say about them, and then you can clarify or ask any follow-up you might have. I'm going to address Paul's two questions together, because they are more about market share and the strategic choices some hyperscale companies make around MQTT. The first question is: between MQTT 3 and MQTT 5, what's the market share and adoption? And why is it that large hyperscale companies, and Paul particularly calls out AWS, do not talk about the MQTT 3 specification? Never mind 5, they don't even talk about 3, or perhaps they talk about their limited compliance with MQTT 3. What's our take on this? Then we'll move to the second question, where Paul is looking for an opinion on HTTP in a similar context. I'll open it up for whoever wants to answer. Go ahead, Matthias.
Matthias Hofschen: 00:24:13.042 I'd like to start by saying that, of course, we don't know why AWS or Azure don't fully implement the MQTT protocol. I know from some of their websites, specifically Azure's, which I read yesterday, that they explicitly do not support the full MQTT specification, which is okay. That's their choice. From our perspective, of course, only a full implementation of the MQTT protocol gives full interoperability. So it's too bad that they don't, but that's what they do. I want to point out one thing here: for example, if there is no retained message support, a proper Sparkplug implementation is not possible. So I'll just let that sit there. In terms of market share, I don't have an answer. Florian, do you know something about that?
Florian Raschbichler: 00:25:19.210 Yeah. Yes, I can. We see a clear trend. HiveMQ was literally the first broker on the market to support MQTT 5. We were on the committee that specified it, and a lot of what's in there actually comes from earlier HiveMQ features. So as a company, we are big advocates of MQTT 5, because it just makes sense; it's the next step in the evolution. As for what we see with customers: whenever we see what you would call a greenfield project, where nothing exists yet and everything is built from the ground up, MQTT 5 is extremely relevant on the backend application side, where you basically have a free choice of library. I would probably say that in these cases it's more than 90% MQTT 5. On the other hand, of course, there are a lot of devices already out there. There are PLC vendors with MQTT support and all kinds of hardware with MQTT built in, and these usually do not have MQTT 5 enabled yet, but we see a trend there too, and more and more of it will come. So if our users have the choice, we see them going with MQTT 5 every time, because there's really no downside to it, only benefits, since with a broker like HiveMQ you can make the two protocol versions interoperable. You can have some old devices in the field running an older version, like 3.1, while your backend operates fully with all the MQTT 5 features, and with HiveMQ you can take advantage of both.
Matthias Hofschen: 00:27:04.188 I want to pick up on that to stress how important it is, if you're looking at migration scenarios, to have mixed support for MQTT 3 and MQTT 5, because it's simply impossible to upgrade every machine in a factory to a new protocol at the same time. It is really critical that a broker supports both specifications seamlessly. I just want to point that out.
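Serving MQTT 3 and MQTT 5 clients side by side starts with recognizing which version each client speaks. Per the specifications, every CONNECT packet carries a protocol name and a protocol level byte: name `MQIsdp` with level 3 for 3.1, `MQTT` with level 4 for 3.1.1, and `MQTT` with level 5 for MQTT 5. The Python sketch below reads those fields from a raw CONNECT packet. It illustrates the protocol, not HiveMQ's internals, and performs no further validation.

```python
from typing import Tuple

# (protocol name, protocol level) -> MQTT version, as defined by the specifications.
PROTOCOL_VERSIONS = {("MQIsdp", 3): "3.1", ("MQTT", 4): "3.1.1", ("MQTT", 5): "5.0"}


def decode_remaining_length(data: bytes, start: int) -> Tuple[int, int]:
    """Decode MQTT's variable byte integer; return (value, index just after it)."""
    multiplier, value, i = 1, 0, start
    while True:
        byte = data[i]
        value += (byte & 0x7F) * multiplier
        i += 1
        if byte & 0x80 == 0:  # high bit clear: last byte of the length
            return value, i
        multiplier *= 128
        if multiplier > 128 ** 3:  # more than four bytes is malformed
            raise ValueError("malformed remaining length")


def mqtt_version_of_connect(packet: bytes) -> str:
    """Return the MQTT version a client requests in its CONNECT packet."""
    if not packet or packet[0] >> 4 != 1:  # packet type 1 = CONNECT
        raise ValueError("not a CONNECT packet")
    _, i = decode_remaining_length(packet, 1)
    name_len = int.from_bytes(packet[i:i + 2], "big")  # UTF-8 string length prefix
    name = packet[i + 2:i + 2 + name_len].decode("utf-8")
    level = packet[i + 2 + name_len]
    try:
        return PROTOCOL_VERSIONS[(name, level)]
    except KeyError:
        raise ValueError(f"unsupported protocol {name!r} level {level}") from None
```

A broker that supports both versions can branch on this value per connection, which is what allows older 3.1 or 3.1.1 devices and MQTT 5 backend applications to share the same broker.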
Gaurav Suman: 00:27:34.917 Thank you. Now I must point out the amazing resources we have on our website. One of the things we at HiveMQ believe in strongly is education and enablement, so you'll find the MQTT 5 Essentials series, where Florian is our coach on that journey, explaining some really critical features. I invite you to check those out; there's lots to learn and very practical examples to start implementing. Going back to what you mentioned before, Matthias, about Sparkplug: for people who do not know, that's a specification built on top of what MQTT 3 specifies, adding a further specification for the manufacturing industry. And there are other standards too; there are some questions about those that we'll pick up. But back to what Matthias shared: if a company or provider decides not to comply with that foundational specification, then whatever they can or cannot build on top of it will also carry that debt, and that debt keeps getting worse.
Is it fair to say that what HTTP was in the '90s is what MQTT is now in the 2020s?
Gaurav Suman: 00:28:45.675 So when somebody goes to AWS and says, "Hey, could you do Sparkplug? We love it. Here are the 10 things it does; we read about it," they will come back with a blank, or with their own way of trying to solve those problems, but it will not be the way the industry is approaching the problem, because they chose not to follow those specifications. That's something to think about from an architectural perspective. So, Paul, I'm going to ask your other question too, and then see whether you want to ask a follow-up on everything holistically. Paul's other question is about HTTP: "Is it fair to say that MQTT is the HTTP of today, that what HTTP was in the '90s, MQTT is now in the 2020s?" Paul notes that there's a lot of development going on. Could REST and HTTP potentially all be replaced by MQTT from an enterprise architecture perspective? I'm eager to know your take, Florian and Matthias.
Florian Raschbichler: 00:29:42.992 If it's okay, Matthias, I'd like to go first, and I'll let you go deeper on the expertise. From a high-level perspective, what I like to say is that MQTT is to machines what HTTP is to humans. Everybody here on this panel, and we at HiveMQ, strongly believe that MQTT has already won the competition to be the connectivity protocol for the Internet of Things. And if I interpret what Paul is saying correctly, we also see a trend of some applications using MQTT even though they are not technically machines. So it's not only for machines, but it's certainly the best for machines. As for whether it can replace HTTP or REST completely, maybe Matthias has some thoughts on that, but my take would be probably not, because in some cases a direct request-response still makes sense.
Matthias Hofschen: 00:30:52.161 Yeah, I certainly hope it doesn't replace HTTP; I'm not sure what my browsing experience as a person would be then. There are applications for both protocols, that's for sure. MQTT is on the rise; we can all see that. And I think, yes, it's going to take over most IoT communication in the future. But HTTP will always stick around, if only because of how difficult it is to evolve a specification and change the whole ecosystem around it, just as a side note.
Gaurav Suman: 00:31:41.169 Sounds good. Paul, if you have a follow-up or anything you want to clarify from Florian and Matthias, feel free to unmute yourself. Great question, by the way. All great questions. Please go ahead.
Paul: 00:31:51.916 Thank you very much. I'm mainly working on integration, and today integration is mostly done with HTTP, REST, and so on, and we have issues with quality of delivery and scalability. My point of view is that, just as HTTP pushed away the old protocols, such as COBOL or things like that, at the enterprise level (I'm not talking about the internet, but about the backend level) we can replace HTTP with MQTT in many places. Of course, it requires more development and more understanding from the developer, and it could be MQTT, or it could be something like Redis, but asynchronous communication must be the future of integration.
Matthias Hofschen: 00:32:55.574 Yeah, I can subscribe to that. Just take the example of microservices that use HTTP for coordination, which creates a very complicated web of interactions between the different services. That could definitely be improved by a central messaging component that brokers the messages between them, as an example.
Gaurav Suman: 00:33:27.678 Sounds good. Thank you. Before the next one, let me check on Aloysius Lee. Aloysius, I don't know if you were there when we answered your question earlier. In case you heard the answer and would like to ask a follow-up, you can type in the chat, and my colleague Jayashree will make sure you have the option to unmute yourself. I don't want to assume whether you heard the answer or not, but if you want to unmute...
Aloysius Lee: 00:33:58.392 So —
Gaurav Suman: 00:33:59.141 Yes, please.
Aloysius Lee: 00:34:00.475 Sorry, I missed the answer because I dialed in a couple of minutes late, if you don't mind.
Gaurav Suman: 00:34:07.004 So how about I do this, Aloysius? I'll follow up with you. Of course, you can listen to the recording, but we talked about the authentication —
Aloysius Lee: 00:34:13.220 Okay. Thanks.
Gaurav Suman: 00:34:13.877 — authorization mechanism, and I can send you some literature about that and happy to follow up. But thank you for joining us. I appreciate it.
Florian Raschbichler: 00:34:22.776 Yes, I also just answered a question in the Q&A panel about MQTT security with links to blog posts, Aloysius. Those probably are also interesting for you.
Aloysius Lee: 00:34:32.714 Thank you.
What’s the practical number of messages that HiveMQ MQTT Broker can handle on a per-second basis on a single node?
Gaurav Suman: 00:34:35.234 Let me move to a question that was submitted in the Q&A pod by Akshay Dumbre. Akshay is asking about the practical number of messages (practical is the key word here) that HiveMQ's MQTT broker can handle per second on a single node, and what RAM, disk, and other configuration would be associated with that.
Matthias Hofschen: 00:35:00.010 Florian, that's a question for you.
Florian Raschbichler: 00:35:03.150 Yes. And I see Matthias smiling and looking at me. This is, of course, a question we get quite regularly, since our customers want to know how to size their deployments. The quick answer is that it's not that simple. There are so many factors involved that it's really hard to answer just like that. We don't really calculate in single servers: HiveMQ is a distributed system, a cluster of servers. We take high availability really seriously, and we take avoiding message loss extremely seriously, which is why we persist everything to disk. So key differences would be, for example, "Do you have a single server, or do you have a cluster? Do you have in-memory persistence or disk persistence?" And then, of course, "What's the number of topics? What's the number of clients? How big are the messages?" So it's really not easy to answer right away. But I can tell you this: with a HiveMQ deployment that meets the minimum requirements, which is 4 CPUs, 4 gigabytes of RAM, and 100 gigabytes of disk, you can certainly reach thousands of messages per second. If you want to go beyond that, we need to take a deeper look, and this is where Matthias or someone from his team comes in and looks into topic structures or how we can optimize the setup. We cannot give a straight answer, because it depends on so many factors, but thousands of messages per second are certainly possible.
Matthias Hofschen: 00:36:48.394 Just to add to this, to make it clear: the minimum configuration for HiveMQ is four CPUs, four gigabytes of memory for the process, and eight gigabytes for the overall machine. Something that's often overlooked is the speed of the disk, that is, the number of IOPS the attached SSD can provide. And it should be an SSD, because spinning disks run at about 180 IOPS or so, which is not good if you want high message throughput. The higher the message throughput you want to reach, the more IOPS the disk needs to be able to provide.
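Matthias's point about disk IOPS can be turned into a rough, pessimistic bound. Assume, purely for illustration, that each persisted message costs a fixed number of synchronous disk writes and that writes are never batched. Real brokers typically batch and coalesce writes, so actual numbers differ. Under that assumption, the disk caps throughput at roughly IOPS divided by writes per message. The SSD IOPS figure and writes-per-message value below are hypothetical.

```python
def max_persisted_msgs_per_sec(disk_iops: float, writes_per_message: float = 1.0) -> float:
    """Naive upper bound on persisted messages/s if every message needs unbatched disk writes.

    Assumption (illustrative only): no write batching or coalescing.
    """
    if writes_per_message <= 0:
        raise ValueError("writes_per_message must be positive")
    return disk_iops / writes_per_message


if __name__ == "__main__":
    spinning_disk_iops = 180  # rough figure quoted in the webinar
    ssd_iops = 20_000         # hypothetical value; check your disk's actual rating
    print(max_persisted_msgs_per_sec(spinning_disk_iops))
    print(max_persisted_msgs_per_sec(ssd_iops, writes_per_message=2))
```

Even under this crude model, the gap between roughly 180 IOPS and tens of thousands of IOPS shows why an SSD matters for high throughput.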
Gaurav Suman: 00:37:44.740 Thank you. Let's see if Akshay has a follow-up. Jayashree, could you give Akshay the right to unmute, in case they have a follow-up question? Akshay?
Akshay Dumbre: 00:37:56.276 Yeah, no, thank you very much. Thank you, guys.
What's the redundancy of HiveMQ?
Gaurav Suman: 00:37:59.050 No problem. Thank you for the question, Akshay. Related to this, while we are on node configuration and on HiveMQ being a distributed system, there's a short question from Peter from Econext: "What's the redundancy of HiveMQ?" So, on overall reliability and redundancy, what's your take, experts?
Matthias Hofschen: 00:38:23.517 Yeah. Okay. Let me start this time. The HiveMQ Professional and Enterprise Editions are clusterable. It's basically a peer-to-peer setup. You can determine how many nodes you want to run, depending on the size of your use case. Let's say we decide on three nodes, each with eight CPUs and eight gigabytes of memory. Then you can determine the replication factor for the data that is kept. To put this into perspective, the data that HiveMQ keeps in its persistence is related to message delivery. What we persist is mostly messages in flight, for example with QoS 1: you publish a message, and then a PUBACK comes back. If the communication gets interrupted for whatever reason before the PUBACK has been sent back, that state is persisted, and it can be picked up again. That's what the replication factor is about.
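The in-flight state described here can be illustrated with a toy model of the sending side of a QoS 1 exchange. Each PUBLISH is remembered under its packet identifier until the matching PUBACK arrives. Anything still unacknowledged after a reconnect (with the session kept) is sent again with its original packet identifier and the DUP flag set, as MQTT requires for QoS 1 redelivery. This is a simplified in-memory sketch with hypothetical class names, not how HiveMQ persists or replicates this state.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Publish:
    packet_id: int
    topic: str
    payload: bytes
    dup: bool = False


@dataclass
class InFlightStore:
    """Toy sender-side store of QoS 1 messages awaiting PUBACK (hypothetical names)."""
    pending: Dict[int, Publish] = field(default_factory=dict)
    _next_id: int = 1

    def publish(self, topic: str, payload: bytes) -> Publish:
        # Packet identifiers are non-zero 16-bit values. A real client would also
        # skip identifiers that are still in flight; this toy simply wraps around.
        packet_id = self._next_id
        self._next_id = self._next_id % 65535 + 1
        msg = Publish(packet_id, topic, payload)
        self.pending[packet_id] = msg  # kept until the PUBACK arrives
        return msg

    def on_puback(self, packet_id: int) -> None:
        # Acknowledged: the message no longer needs to be kept.
        self.pending.pop(packet_id, None)

    def on_reconnect(self) -> List[Publish]:
        # Resend everything still unacknowledged, same packet id, DUP flag set.
        resend = []
        for msg in self.pending.values():
            msg.dup = True
            resend.append(msg)
        return resend
```

For example, publishing two messages, acknowledging only the first, and then reconnecting yields a resend list containing just the second message, flagged as a duplicate. A clustered broker additionally replicates this kind of state to other nodes, which is where the replication factor comes in.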
Matthias Hofschen: 00:39:45.164 It's elastic, which means that if you observe a higher load, you can add nodes to the system, depending on the load you currently see. If I lose one of the three nodes, the remaining nodes keep operating, and all data is still available. Of course, that assumes the nodes are not heavily loaded. If you have three nodes at 80% CPU load and you take one of them away, the load is too much for the remaining two nodes. Please keep that in mind. But otherwise, that's basically how it works.
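The 80% example can be checked with simple arithmetic. If the load is spread evenly and a failed node's share is redistributed across the survivors, each survivor carries n × load / (n − k), where n is the node count and k the number of failed nodes. The sketch below assumes perfectly even load distribution and CPU as the only bottleneck, which is a simplification.

```python
def load_after_failure(nodes: int, load_per_node: float, failed: int = 1) -> float:
    """Per-node load after `failed` nodes drop out, assuming total load is redistributed evenly."""
    survivors = nodes - failed
    if survivors <= 0:
        raise ValueError("at least one node must survive")
    return nodes * load_per_node / survivors


def max_safe_load(nodes: int, failed: int = 1, ceiling: float = 1.0) -> float:
    """Highest even per-node load that keeps survivors at or below `ceiling` after `failed` failures."""
    return ceiling * (nodes - failed) / nodes
```

For three nodes at 0.8 (80%), each survivor would need 1.2, that is, 120% of its CPU. Staying under about 67% per node leaves room to lose one node out of three.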
Gaurav Suman: 00:40:30.209 Anything to add, Florian?
Florian Raschbichler: 00:40:34.658 Yes, nothing to add.
Gaurav Suman: 00:40:35.867 Okay. Let's see if they are on the call. Yes, I think Peter is here. Peter, we'll give you the opportunity to unmute yourself and ask a follow-up if you like. Jayashree, would you please...?
How will HiveMQ determine when you should upgrade from using your free services?
Peter: 00:40:53.878 Hi there. No, I'm all good here. But I've got another question that we hadn't asked yet. How will HiveMQ determine the moment when you should upgrade from using your free services to having your own HiveMQ deployment on a dedicated server?
Matthias Hofschen: 00:41:18.384 So this would be a HiveMQ Cloud-related question, right?
Peter: 00:41:22.757 It's actually a bit of both. We have IoT devices out in the field, and another company currently runs our broker for us. But we recently redid our development, and for some reason, they won't allow our devices to connect. Since we started the redevelopment, we have been testing everything on HiveMQ, and HiveMQ works really well.
Matthias Hofschen: 00:41:49.553 I'm not quite sure I understand. Are you talking about the HiveMQ Community Edition versus the Professional Edition, or the HiveMQ Cloud free version versus the paid version?
Peter: 00:42:05.694 My question is actually: how are you going to decide to stop allowing me to use the free version and force me to upgrade to something that is paid for?
Matthias Hofschen: 00:42:19.926 Well, we wouldn't force you. Your use case would force you. In essence, if you are running a use case with a certain number of clients sending a certain number of messages, with the free version you're going to hit a limit in terms of scalability. You're exhausting the capacity of a system that is not clusterable, and you have to upgrade because your use case requires more throughput and the ability to connect more clients. Did I understand this correctly?
Peter: 00:43:06.893 Yes, yes, that's fine. And can you maybe share more about the numbers? If we have 1,000 devices, how many messages would those devices be able to send? Where would our message cap be?
Matthias Hofschen: 00:43:29.288 Florian.
Florian Raschbichler: 00:43:29.288 All of this information should be on our website in detail for the various cloud versions. I don't have it at hand. As far as I know, the free version goes up to 100 devices, and there are data volume restrictions as well. All of the cloud versions also come with a control panel that shows your current usage and how much you have used up this month. If you realize on the fifth of the month that you've already used 50% of your data volume, then it's probably a good idea to upgrade to the pay-as-you-go version. That's how I would go about it.
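Florian's rule of thumb (50% of the data volume used by the fifth of the month) amounts to a linear projection of month-end usage. Here is a minimal sketch, assuming usage grows at a steady rate over a 30-day month. The function names are hypothetical, and the used fraction would come from the plan's control panel.

```python
def projected_month_usage(used_fraction: float, day_of_month: int, days_in_month: int = 30) -> float:
    """Linearly extrapolate the fraction of the monthly quota used by month end."""
    if not 1 <= day_of_month <= days_in_month:
        raise ValueError("day_of_month must be within the month")
    return used_fraction / day_of_month * days_in_month


def should_consider_upgrade(used_fraction: float, day_of_month: int, days_in_month: int = 30) -> bool:
    """True if the linear projection exceeds the monthly quota."""
    return projected_month_usage(used_fraction, day_of_month, days_in_month) > 1.0
```

At 50% on day 5, the projection is about 300% of the monthly volume, well past the limit. Bursty or seasonal traffic would need a less naive forecast.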
Matthias Hofschen: 00:44:15.761 So how many devices do you have right now?
Peter: 00:44:20.600 Before we did the redevelopment, we were at over 1,000 devices, and they publish at least 24 to 48 messages a day.
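For a sense of scale, Peter's figures translate into the volumes below. This assumes the 24 to 48 messages are per device per day and a 30-day month. Both are assumptions made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def message_volume(devices: int, msgs_per_device_per_day: int, days: int = 30) -> dict:
    """Daily, monthly, and average per-second message counts for a fleet of devices."""
    per_day = devices * msgs_per_device_per_day
    return {
        "per_day": per_day,
        "per_month": per_day * days,
        "avg_per_second": per_day / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    for rate in (24, 48):
        print(rate, message_volume(1_000, rate))
```

Even at 48 messages per device, the average rate stays well under one message per second. With over 1,000 devices, the device limits mentioned above matter more than raw throughput.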
Matthias Hofschen: 00:44:31.955 Yeah. So you must already be on the pay-as-you-go because —
Peter: 00:44:35.923 No, this is on a different broker than HiveMQ.
Matthias Hofschen: 00:44:40.454 Oh, okay, okay. The pay-as-you-go plan goes up to 1,000 devices. Once you surpass 1,000, in the cloud that would be a dedicated deployment. If you have your own installation, then we would probably be talking about the Professional Edition.
Peter: 00:45:10.270 Unfortunately, I'm based in South Africa, so everything would have to be hosted. We did speak to the people at HiveMQ before, but that was a long time ago, and it was quite expensive for us to go with you.
Matthias Hofschen: 00:45:24.152 Oh, okay. Great.
Peter: 00:45:27.180 Okay. But that's all for me. Thank you.
Matthias Hofschen: 00:45:28.992 You're welcome.
Will the MQTT Sparkplug standard evolve?
Gaurav Suman: 00:45:29.452 Yeah, no problem, Peter. Feel free to also reach out via the website, and we can get another conversation going. If it's been a while since you spoke to us about the overall solution, we're happy to follow up and help you with your production requirements. Thank you for the question, Peter. So I'm going to move to a question around... okay, we tackled this one from Peter already. There's a question around Sparkplug, and like I said, we were going to get to Sparkplug: "Will the MQTT Sparkplug standard evolve?" The person asking is from a large manufacturer, so it's perhaps an interesting exploration they're doing. As examples, they mention different topic structures to better support location information, perhaps for building a digital twin, or decentralizing control of applications. So they're talking about particular industry requirements that might require the standard to evolve in the future. What's our opinion on it?
Matthias Hofschen: 00:46:47.261 Florian?
Florian Raschbichler: 00:46:50.626 Sorry, I was distracted for a second there, to be honest.
Matthias Hofschen: 00:46:52.834 So, on the Sparkplug standard: maybe upfront, our experts sit on the specification committee, so we're also trying to drive the specification forward. But it's clear that Sparkplug is centered on IIoT and tries to solve that domain as well as possible. We're also looking at other payloads and other specifications that different verticals define on top of MQTT. For example, for railways, there is the SFERA protocol. So it's interesting to see MQTT adopted as the underlying protocol, with further definitions on top. Yeah, that's my take on this.
Gaurav Suman: 00:47:57.815 Anything to add, Florian?
Florian Raschbichler: 00:48:00.809 No, Matthias covered it well. One thing to add: I personally think this is just the beginning, and more and more industries will adopt specifications built on top of MQTT. I can't say for 100% sure, but those specifications could be about anything. There will never be any data model restrictions in the MQTT protocol itself, whichever version we get. Being data agnostic is one of its key principles.
Why does MQTT not specify the payload?
Gaurav Suman: 00:48:35.843 Sounds good. Unfortunately, the person who asked the question is not on the call, so we can't do a follow-up. But there's another question, submitted by Svetozar Yolov, that is related to what you just said about being payload agnostic. It's a very detailed question, but in essence it is: Why does MQTT not specify the payload? If I may mention a couple of points from their question in particular: they say that the broker does not care about the data structure the publishing client uses, but in the end, all of that data must be consumed by other clients, which could be a historian or any other endpoint. So why is it that vendors are not able to agree on particular standards? And why does MQTT not take on that responsibility? It's a fairly broad question. Matthias and Florian, you've seen it. What's your take on this overall? Please go ahead.
Matthias Hofschen: 00:49:44.513 Again, we've just said that one of the key strengths of MQTT is that it does not specify the payload format. That is better left to vertical specifications that define it on top, like SFERA or Sparkplug. I think this is absolutely critical. It's just as critical as HTTP not defining what HTML, or any other data format transported via HTTP, has to look like. That's how these protocols can be leveraged in many different situations, with many different use cases and verticals.
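To make the "payload agnostic" point concrete: to the broker, an MQTT payload is just a sequence of bytes, so different applications can use completely different encodings over the same protocol. The sketch below is a hypothetical illustration using JSON and a fixed binary layout from Python's standard library. It does not show the actual encodings used by Sparkplug or SFERA, and `broker_forward` is a stand-in, not a real broker API.

```python
import json
import struct


# Two hypothetical encodings of the same reading; an MQTT broker only ever sees the bytes.
def encode_json(sensor_id: int, temperature: float) -> bytes:
    return json.dumps({"sensor": sensor_id, "temp": temperature}).encode("utf-8")


def encode_binary(sensor_id: int, temperature: float) -> bytes:
    # Network byte order: unsigned 16-bit sensor id followed by a 32-bit float.
    return struct.pack("!Hf", sensor_id, temperature)


def broker_forward(payload: bytes) -> bytes:
    """Stand-in for the broker: it routes the payload by topic without inspecting it."""
    return payload


def decode_json(payload: bytes) -> tuple:
    data = json.loads(payload.decode("utf-8"))
    return data["sensor"], data["temp"]


def decode_binary(payload: bytes) -> tuple:
    return struct.unpack("!Hf", payload)
```

Only the publisher and its subscribers need to agree on the format. That agreement is exactly the layer that vertical specifications such as Sparkplug add on top of MQTT.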
Gaurav Suman: 00:50:32.872 Right. Anything to add, Florian?
Florian Raschbichler: 00:50:39.934 No, I think Matthias has covered it quite well.
How can we identify which subscriber has subscribed to which topics?
Gaurav Suman: 00:50:42.128 Sounds good. Sounds good. Let's go to a question in the Q&A pod. Actually, maybe we answered it offline. There was a question by Gaurav Upadhyay; I don't recall exactly, but they were looking to find out about subscription permissions. Did we already answer that offline? Okay. In any case, let me read the question from the chat pod. I think they are on the call, so they can also ask a follow-up if they like. So Gaurav's question is... I seem to have lost it somehow. Yes, here we go: "How can we identify which subscriber has subscribed to which topics on host, on Android platform?" And they add, "I mean, username." Florian or Matthias, do you think we should clarify the question before we answer? We could do that if you like.
Matthias Hofschen: 00:51:50.037 Yeah, we can. Go ahead... My two cents on this: wanting to know which topics a subscriber has subscribed to probably means that some other entity, maybe a backend application, needs that information. In that case, the HiveMQ Extension SDK would perhaps be the right answer here.
Gaurav Suman: 00:52:23.267 Sure. Anything to add, Florian?
Florian Raschbichler: 00:52:27.500 Yeah, that's probably the best generic answer. The Extension SDK can provide that. Alternatively, you can take a client snapshot in the HiveMQ Control Center, where you can see a list of clients, including both their subscriptions and the username they used to connect.
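Whatever the mechanism, the underlying bookkeeping is the same. Whether you use an extension callback or a snapshot, you need a mapping from each client to its username and topic filters, which can then be queried by username. The sketch below is a hypothetical, in-memory illustration in Python. It is not the HiveMQ Extension SDK, which is a Java API with its own interfaces, and the method names are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Set


class SubscriptionIndex:
    """Hypothetical in-memory index: client id -> username and subscribed topic filters."""

    def __init__(self) -> None:
        self.usernames: Dict[str, str] = {}
        self.filters: Dict[str, Set[str]] = defaultdict(set)

    def on_connect(self, client_id: str, username: str) -> None:
        self.usernames[client_id] = username

    def on_subscribe(self, client_id: str, topic_filter: str) -> None:
        self.filters[client_id].add(topic_filter)

    def on_unsubscribe(self, client_id: str, topic_filter: str) -> None:
        self.filters[client_id].discard(topic_filter)

    def on_disconnect(self, client_id: str) -> None:
        # Simplification: forget the client entirely (ignores persistent sessions).
        self.usernames.pop(client_id, None)
        self.filters.pop(client_id, None)

    def subscriptions_for_user(self, username: str) -> Dict[str, Set[str]]:
        """Topic filters per client id for every client that connected with `username`."""
        return {
            cid: set(self.filters.get(cid, set()))
            for cid, user in self.usernames.items()
            if user == username
        }
```

A single username can back several client connections, which is why the query returns filters grouped per client id rather than one flat set.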
Gaurav Suman: 00:52:54.607 So I've seen some examples where —
Matthias Hofschen: 00:52:55.661 Perhaps just to —
Gaurav Suman: 00:52:57.352 Go ahead, Matthias.
Matthias Hofschen: 00:52:58.290 Just one short note of explanation. When we talk about the Extension SDK, think of the HiveMQ broker as a kind of Lego house: you can plug modules into the broker via the Extension SDK. Those modules can, for example, listen for incoming publishes, get a callback on each one, and then take an action, or observe incoming subscriptions and take an action. So you can build quite flexible additional functionality with this kind of SDK.
Gaurav Suman: 00:53:43.875 Sounds good. Jayashree, I know we've only got five or so minutes left. I have one quick comment, or perhaps a follow-up, for Matthias and Florian on the point they just made, and then I'll hand it back to you for the poll we want to run. Matthias and Florian, I've seen examples where, as part of the connection process itself, the client can query the broker for information such as what rights it has, which QoS levels it can publish with, and similar information. Is that a feature of the extension SDK? Does the broker need to have the extension SDK enabled for that purpose?
Florian Raschbichler: 00:54:22.839 Just for clarification, the Extension SDK is not something that needs to be enabled. It's always there. It's the user's choice whether to take advantage of it and build their own custom extension. What you describe is potentially possible, but it sounds like a security nightmare to me. The way MQTT works is that if you try publishing to a topic you're not allowed to, you get kicked out of the broker, right? If you tell the connecting client up front what it is and isn't allowed to do, that may still serve some purpose, but it certainly undermines part of the point of authorization. So yes, a custom extension would be needed to implement functionality like that. Matthias, maybe you have an opinion as well, but personally, I would not recommend introducing a function like that, at least outside of development stages.
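The "Lego house" idea is essentially an observer pattern. Modules register callbacks for broker events such as an incoming publish or subscribe, and the broker invokes them as those events occur. The Python sketch below illustrates that pattern only. The class, event names, and callback signature are hypothetical and do not reflect the actual Java interfaces of the HiveMQ Extension SDK.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str, str], None]  # (client_id, topic or topic filter) -> None


class ToyBroker:
    """Hypothetical broker core that lets plug-in modules observe events."""

    def __init__(self) -> None:
        self._callbacks: Dict[str, List[Callback]] = defaultdict(list)

    def register(self, event: str, callback: Callback) -> None:
        # A "module" plugs in by registering for an event type.
        self._callbacks[event].append(callback)

    def _emit(self, event: str, client_id: str, topic: str) -> None:
        for callback in self._callbacks[event]:
            callback(client_id, topic)

    def handle_publish(self, client_id: str, topic: str) -> None:
        self._emit("publish", client_id, topic)
        # ...core routing to subscribers would happen here...

    def handle_subscribe(self, client_id: str, topic_filter: str) -> None:
        self._emit("subscribe", client_id, topic_filter)


if __name__ == "__main__":
    broker = ToyBroker()
    audit_log: List[str] = []
    broker.register("publish", lambda cid, t: audit_log.append(f"PUBLISH {cid} {t}"))
    broker.register("subscribe", lambda cid, t: audit_log.append(f"SUBSCRIBE {cid} {t}"))
    broker.handle_subscribe("dashboard", "plant/#")
    broker.handle_publish("sensor-1", "plant/line1/temp")
    print(audit_log)
```

The core broker stays unchanged; new behavior such as auditing, metrics, or forwarding is added by registering another callback.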
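Florian's point is that the broker enforces authorization at the moment a client acts, rather than advertising permissions to the client up front. Below is a minimal sketch of such a check. It assumes a hypothetical per-client list of allowed topic filters and implements MQTT's `+` (single-level) and `#` (multi-level) wildcard matching. Real HiveMQ permissions are configured through extensions and offer more options.

```python
from typing import Iterable


def topic_matches(topic_filter: str, topic: str) -> bool:
    """MQTT topic filter matching with '+' (one level) and '#' (remaining levels)."""
    # Filters starting with a wildcard must not match topics beginning with '$' (e.g. $SYS).
    if topic.startswith("$") and topic_filter[:1] in ("+", "#"):
        return False
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return i == len(f_levels) - 1  # '#' is only valid as the last level
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


def may_publish(allowed_filters: Iterable[str], topic: str) -> bool:
    """True if any allowed filter covers the topic; otherwise the broker refuses the publish."""
    return any(topic_matches(f, topic) for f in allowed_filters)
```

In this model, the client never needs to be told its permissions. The broker simply refuses a publish that falls outside them, which, as described in the webinar, means the client is disconnected.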
Matthias Hofschen: 00:55:32.738 Agreed.
Gaurav Suman: 00:55:33.874 Okay. Perfect. So, Jayashree, I think we've tackled everything. If you saw anything I may have missed, let me know. Otherwise, back to you.
Closing words
Jayashree Hegde: 00:55:45.696 Yeah, I think we have covered most of the questions. If we have missed any, we will get back to you by email. I'll now launch the second poll, and I request you all to participate. Meanwhile, we will send out all the resource links we shared during the session, so that you can refer to them later. Also, do check out our YouTube channel as well as our blog section, which we continuously update with new content. The recording will be shared in a follow-up email. Any closing thoughts, Matthias, Florian, Gaurav, before we end the session?
Florian Raschbichler: 00:56:35.558 Thank you, everybody. These were really great questions all around. It's really great to see people thinking about security, reliability, and so on.
Matthias Hofschen: 00:56:48.158 Thanks, everyone.
Jayashree Hegde: 00:56:54.805 I think there are two more minutes left, so I will leave the polls open. If anyone has any last questions, please submit them in the Q&A pod. Our community forum is also always open for questions, so you can submit your questions there too. See you all in the next session. Thank you.
Gaurav Suman: 00:57:19.581 Thank you, Jayashree. Thank you, Matthias. Thank you, Florian.
Jayashree Hegde: 00:57:22.516 Bye-bye.
Matthias Hofschen: 00:57:23.583 Bye-bye.