
Robust and Responsive Overload Protection for HiveMQ Broker

by Lukas Brandl
13 min read

For an organization that operates an MQTT broker, the behavior of the MQTT clients is not always entirely under its control. Certain client behaviors can put the availability of the broker at risk, especially under high load. The HiveMQ MQTT broker uses an Overload Protection mechanism to mitigate this risk effectively.

Overload protection prevents outages when MQTT clients temporarily send more packets than the resource-constrained broker can process. The broker achieves this by determining whether it should stop accepting more MQTT packets based on a set of criteria. By not reading more data from the TCP socket of the publisher, the broker avoids running out of memory, which is crucial for maintaining availability.

With backpressure applied to the clients, the broker can gracefully handle the temporary spikes in traffic (aka flash crowd) by spreading the workload over a longer period of time.
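To make the socket-level backpressure concrete: HiveMQ's networking layer is built on Netty, where pausing reads from a client's TCP socket corresponds to disabling auto-read on the channel. The following is a minimal sketch of that pattern; the handler, counter, and limit are illustrative, not HiveMQ's actual implementation.

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

// Illustrative sketch: apply backpressure to a single client connection by
// pausing socket reads once too many packets are in flight. Assumes all
// methods are invoked on the channel's event loop thread.
public class BackpressureHandler extends ChannelInboundHandlerAdapter {

    private static final int MAX_IN_FLIGHT = 100; // hypothetical per-client limit
    private int inFlight = 0;

    @Override
    public void channelRead(final ChannelHandlerContext ctx, final Object msg) {
        inFlight++;
        if (inFlight >= MAX_IN_FLIGHT) {
            // Stop reading from the TCP socket. The kernel buffers fill up and
            // TCP flow control slows the sender down; no data is lost.
            ctx.channel().config().setAutoRead(false);
        }
        ctx.fireChannelRead(msg);
    }

    // Called once a packet is fully processed (e.g. after the PUBACK was sent).
    public void packetCompleted(final ChannelHandlerContext ctx) {
        inFlight--;
        if (inFlight < MAX_IN_FLIGHT) {
            ctx.channel().config().setAutoRead(true); // resume reading
        }
    }
}
```

Because no data is dropped while auto-read is off, the sender is slowed down rather than disconnected, which is exactly the graceful spreading of work described above.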

The main challenge for such an Overload Protection mechanism is that it must be responsive enough to prevent outages, yet not so aggressive that it unnecessarily throttles incoming traffic.

For an MQTT-broker cluster, there is the additional challenge that high load can lead to cascading failure. Once a broker instance runs out of memory because too much unfinished work is in progress, the workload for the remaining broker instances usually increases: the clients that were connected to the failed broker reconnect to the remaining instances, so the same workload must be processed by fewer brokers. This can lead to further service degradation. A more in-depth explanation can be found in the blog post HiveMQ: High Availability Through Replication and Failover.

Overload Protection 

The Overload Protection mechanism limits the number of MQTT PUBLISH, SUBSCRIBE, and UNSUBSCRIBE packets that each broker instance accepts from each connected MQTT client independently. In addition, the Overload Protection subsystem limits the total number of MQTT CONNECT packets that each broker instance accepts from all the clients attempting to connect.

The original Overload Protection mechanism of HiveMQ Broker limited the rate of MQTT packets the broker accepted per second. The computed rate was sometimes overly restrictive. The new approach adapts better to rapid changes in usage, providing more balanced overload protection while making effective use of the available broker resources.

Overload Protection was redesigned to limit the number of packets the broker processes in parallel. It is no longer necessary to estimate the rate of incoming packets that the broker can process. This approach is also more in line with well-established mechanisms such as TCP congestion control. Companies that serve large user bases, such as Netflix, employ similar concurrent-request-limiting approaches.

For CONNECT packets, the previous Overload Protection declined packets that exceeded the limit by sending a CONNACK with reason code 137 “Server busy” and closing the connection. The new Overload Protection attempts to buffer excess CONNECT packets and only declines the connection attempt if a limit for the buffer is reached. The buffer is capped to ensure the broker doesn’t run out of memory.

PUBLISH, SUBSCRIBE and UNSUBSCRIBE Limit

For every MQTT client, the broker only processes a limited number of MQTT packets in parallel. When the limit is reached, the broker does not read from the client's TCP socket until it has finished the immediate work required to process one of the packets in progress, such as persisting it to disk and replicating it over the network. Typically, the broker considers processing to be complete when it sends the corresponding PUBACK, PUBCOMP, SUBACK, or UNSUBACK packet.

There is a single per-client limit that covers all PUBLISH, SUBSCRIBE, and UNSUBSCRIBE packets. Each packet counts as one toward the limit, regardless of its type or size.

The number of packets that the broker processes concurrently per client is computed dynamically based on the total current load on the whole HiveMQ Broker cluster. When the load is low, each broker instance accepts up to 100 packets per client in parallel. This window gradually shrinks to 1 packet as the load increases, which means the broker processes packets sent by the same client sequentially when the load is high.
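As a rough mental model, the window can be pictured as an interpolation between the maximum of 100 and the minimum of 1, driven by a normalized load signal. The linear formula and the load metric below are assumptions for illustration; the broker's actual computation is not part of this post.

```java
// Illustrative sketch: derive the per-client window from a cluster load signal
// normalized to [0.0, 1.0]. The linear interpolation is an assumption; it is
// not HiveMQ's published algorithm.
public final class WindowSizer {

    private static final int MAX_WINDOW = 100; // packets per client under low load
    private static final int MIN_WINDOW = 1;   // sequential processing under high load

    public static int windowFor(final double clusterLoad) {
        final double load = Math.min(1.0, Math.max(0.0, clusterLoad));
        return (int) Math.round(MAX_WINDOW - load * (MAX_WINDOW - MIN_WINDOW));
    }
}
```

In this sketch, windowFor(0.0) yields 100 and windowFor(1.0) yields 1, matching the behavior described above.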

Figure 1 shows a sequence diagram that depicts how a broker updates the number of packets in progress in an example cluster use case.

Figure 1: Overload Protection v2 PUBLISH, SUBSCRIBE Example

In the sequence diagram above, the broker cluster contains two nodes: Broker A and Broker B. MQTT client Client 1 sends a PUBLISH packet to Broker A. Broker A increases the number of packets in progress for Client 1 to 1. Broker A enqueues the PUBLISH and replicates the queue entry to Broker B. Broker B stores the replicated queue entry. Broker A then receives a SUBSCRIBE MQTT packet from Client 1 and increases the number of packets in progress to 2.

Broker A then adds the new subscription to the session of Client 1 and replicates the subscription to Broker B. When Broker A receives the response for the queue entry replication, it decreases the number of packets in progress to 1. Once Broker A receives the confirmation that the replication of the subscription is complete, it decreases the number of packets in progress for Client 1 to 0.
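The bookkeeping in the diagram reduces to a per-client counter that is incremented when a packet arrives and decremented when the replication of its effect is confirmed. A minimal sketch follows, assuming the replication layer exposes a CompletableFuture; the class and method names are invented for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative per-client bookkeeping: count packets in progress and decrement
// once the asynchronous replication of the packet's effect (queue entry,
// subscription, ...) is confirmed by the other broker instances.
public class InFlightTracker {

    private final AtomicInteger inProgress = new AtomicInteger();

    // Called when a PUBLISH, SUBSCRIBE, or UNSUBSCRIBE packet arrives.
    public void packetReceived(final CompletableFuture<Void> replicationDone) {
        inProgress.incrementAndGet();
        replicationDone.whenComplete((result, error) -> inProgress.decrementAndGet());
    }

    // Compared against the dynamically computed window to decide whether the
    // broker should keep reading from this client's TCP socket.
    public int inProgressCount() {
        return inProgress.get();
    }
}
```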

CONNECT Limit

Each broker only processes a limited number of CONNECT MQTT packets in parallel. When the limit is reached, the excess CONNECT packets are buffered. If the buffer limit is reached, the broker responds with a CONNACK with reason code 137 “Server busy.”
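A sketch of how such a two-stage limit could look: a cap on CONNECTs processed in parallel, plus a bounded buffer that absorbs the overflow; only when the buffer is also full is the attempt declined. All names, types, and sizes below are illustrative, not HiveMQ's implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of a two-stage CONNECT limit: a parallelism cap plus a
// capped buffer. Limits, names, and the ConnectPacket type are invented.
public class ConnectLimiter {

    private static final int PARALLEL_LIMIT = 1_000;   // CONNECTs processed at once
    private final AtomicInteger inProgress = new AtomicInteger();
    private final BlockingQueue<ConnectPacket> buffer =
            new ArrayBlockingQueue<>(10_000);          // capped to bound memory use

    enum Decision { PROCESS_NOW, BUFFERED, DECLINE_SERVER_BUSY }

    Decision admit(final ConnectPacket connect) {
        if (inProgress.incrementAndGet() <= PARALLEL_LIMIT) {
            return Decision.PROCESS_NOW;               // under the parallel limit
        }
        inProgress.decrementAndGet();
        if (buffer.offer(connect)) {
            return Decision.BUFFERED;                  // parked until capacity frees up
        }
        return Decision.DECLINE_SERVER_BUSY;           // CONNACK reason code 137
    }

    // Called once a CONNECT is fully processed. If a buffered CONNECT is
    // waiting, it takes over the freed slot and is returned for processing.
    ConnectPacket onConnectDone() {
        final ConnectPacket next = buffer.poll();
        if (next == null) {
            inProgress.decrementAndGet();
        }
        return next;
    }

    record ConnectPacket(String clientId) {}
}
```

In this sketch, a buffered CONNECT takes over the freed slot whenever an earlier CONNECT finishes, so the number of CONNECTs in progress never exceeds the cap while the buffer bounds memory use.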

Figure 2 shows how a broker updates the number of CONNECT packets in progress in an example use case. As before, the cluster consists of two broker instances.

Figure 2: Overload Protection v2 CONNECT Example

In the sequence diagram above, an MQTT client Client 1 sends a CONNECT packet to a broker instance Broker A. Broker A increases the number of CONNECT packets in progress to 1. Broker A updates the session information of Client 1 and replicates the session to another broker instance, Broker B. Another client, Client 2, sends a CONNECT packet to Broker A. Broker A increases the number of CONNECT packets in progress to 2 and repeats the session storage and replication. Once Broker A receives confirmation that the replication of the session for Client 1 is complete, it decreases the number of CONNECT packets in progress to 1. When Broker A receives confirmation that the replication of the session for Client 2 is also complete, it decreases the number of CONNECT packets in progress back to 0.

Results

We ran two tests to compare the new Overload Protection (HiveMQ 4.28.10) with the old one (HiveMQ 4.28.9 and earlier). The new Overload Protection mechanism prevented all reproducible cases of service degradation that Overload Protection is expected to prevent. In addition, it nearly halved the connection time for 200,000 clients (see Figure 4).

Figure 3 shows CPU usage and incoming PUBLISH rate over time when a single MQTT client sends PUBLISH packets to 10,000 subscribers as fast as possible. The clients are connected to a cluster of 3 broker instances. The graphs show that the brokers now process more PUBLISH packets while making better use of the available compute resources.

Figure 3: Publish rate metrics for a single publisher

Figure 4 shows the number of created client sessions and the total connect attempts in a test with 200,000 clients connecting to 3 brokers simultaneously. This shows that it is now much less likely that the broker will decline any connect attempts, even when many clients send an MQTT CONNECT packet at the same time.

Figure 4: Connect attempts for 200,000 connections (session count includes replicas)

Learn more about this HiveMQ feature in the HiveMQ documentation.

Lukas Brandl

Lukas Brandl is a Senior Software Engineer at HiveMQ. He has been with HiveMQ for more than a decade and plays a key role in product development and product engineering.
