Resilience Engineering of an MQTT Client Application
Resilience engineering deals with designing and developing systems that can withstand and recover from unexpected events and failures. In today’s world of connected devices and IoT, resilience engineering has become an essential aspect of software development, especially for applications that require real-time communication, like autonomous vehicles (connected cars), manufacturing and industrial process control systems, and QA in the food and beverage industry.
MQTT was designed with unreliable and unstable networks in mind, making it an ideal choice for communication between IoT devices, which often have limited connectivity options. Because connection losses are expected in such environments, it is important to test how an MQTT client application behaves when they and other unexpected events occur.
In this blog post, we demonstrate how to test the robustness of your MQTT client application.
What you need
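To start resilience engineering of your MQTT client, you will need the tools used in the example in this post: Docker, JUnit 5, and Testcontainers with HiveMQ and Toxiproxy containers.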
Assuming your client project uses Gradle, declare the required third-party dependencies in your build.gradle.kts.
Example
For demonstration, we have created a simple MQTT client application that publishes the numbers from 0 to 100 at a 100ms interval. Because the payloads are sequential numbers, a test can check that the messages arrive in the correct order, without duplicates or gaps, even when the connection between the client and the server is impaired.
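The publishing loop described above can be sketched without any MQTT library. This is only an illustration under stated assumptions: the class name NumberPublisher, the method publishRange, and the Consumer that stands in for the real MQTT publish call are hypothetical, and the range is assumed to include both 0 and 100.

```java
import java.util.function.Consumer;

// Library-free sketch of the demo application's publishing loop.
// The Consumer is a stand-in for the real MQTT publish call (an assumption).
final class NumberPublisher {

    // Hands first..last (inclusive) to the publish callback as decimal strings,
    // pausing intervalMillis between two consecutive messages.
    static void publishRange(int first, int last, long intervalMillis, Consumer<String> publish) {
        for (int n = first; n <= last; n++) {
            publish.accept(Integer.toString(n));
            if (n < last) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop publishing.
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }
    }
}
```

Calling NumberPublisher.publishRange(0, 100, 100, System.out::println) would print the 101 payloads over roughly ten seconds. In the real application the callback would publish each payload through the MQTT client instead.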
Note the following implementation details:
The client library is configured to automatically reconnect. This means that the client tries to reconnect to the server once it detects a broken connection.
The client connects with a keepAlive of one second. If no MQTT control packets are successfully exchanged within this timeframe, the client regards the connection to the server as broken.
The client connects with a persistent session by setting cleanStart=false and noSessionExpiry. With this setting, all message flows between the server and the client resume at the correct stage upon reconnecting.
The client publishes the messages with Quality of Service EXACTLY_ONCE, which means that each message is delivered exactly once, with no duplicates and no message loss.
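To illustrate why a persistent session combined with EXACTLY_ONCE avoids duplicates after a reconnect, here is a simplified, library-free model of the receiving side of a QoS 2 flow. It is a sketch only and not how HiveMQ or the client library is implemented: the class ExactlyOnceReceiver and its methods are hypothetical names, and the PUBREC and PUBCOMP acknowledgements are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the receiver in an MQTT QoS 2 (EXACTLY_ONCE) flow.
// Illustration only; acknowledgement packets are omitted.
final class ExactlyOnceReceiver {

    // Packet identifiers whose PUBLISH has arrived but has not yet been released by a PUBREL.
    // With a persistent session this state is kept across reconnects.
    private final Set<Integer> pendingRelease = new HashSet<>();
    private final List<String> delivered = new ArrayList<>();

    // Handles a PUBLISH. Returns true if the payload was passed on to the application.
    boolean onPublish(int packetId, String payload) {
        if (!pendingRelease.add(packetId)) {
            // Same identifier seen again: a retransmission after a reconnect, so do not redeliver.
            return false;
        }
        delivered.add(payload);
        return true;
    }

    // Handles a PUBREL: the flow is complete and the identifier may be reused.
    void onPubRel(int packetId) {
        pendingRelease.remove(packetId);
    }

    // Returns a copy of everything delivered to the application so far.
    List<String> delivered() {
        return new ArrayList<>(delivered);
    }
}
```

If the sender loses the connection before the flow completes and retransmits the PUBLISH with the same packet identifier after reconnecting, onPublish returns false and the payload is not delivered a second time.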
To test the application, we create an integration test with JUnit 5.
1. We use the @Testcontainers annotation so that the container lifecycle methods of all fields annotated with @Container are invoked.
2. We create a Docker network to connect Toxiproxy and HiveMQ.
3. We instantiate a HiveMQ container and add it to the Docker network. We use withNetworkAliases("hivemq") to make HiveMQ reachable under that DNS name within the Docker network.
4. To simulate connection failures, we create a Toxiproxy container and add it to the Docker network.
5. We use a Toxiproxy client to create a proxy for the connection between the MqttApplication and HiveMQ, allowing us to simulate network failures. Here, we provide the network alias of HiveMQ and the standard port for MQTT (1883).
6. To test that all messages are correctly published, we create a second MQTT client to receive them and connect it directly to the HiveMQ container. The client subscribes to # to receive all published messages.
7. Now we instantiate and start the MQTT application under test, which begins publishing the numbers from 0 to 100 at a 100ms interval. Rather than connecting directly to the HiveMQ container, the application connects to the proxy, which forwards the connection to HiveMQ.
8. After one second, we cut the connection between HiveMQ and the MQTT application using Toxiproxy's timeout toxic. This stops all data from getting from the client to the server. The connection remains open, and data is delayed until the connection is re-enabled.
9. When one second has passed, we re-enable the connection between HiveMQ and the client.
10. After another second, we cut the connection between HiveMQ and the MQTT application again, this time using the bandwidth toxic. This limits the connection to a maximum of 0 kilobytes per second and causes the MQTT connection to fail.
11. We wait one second and remove the limit on the connection bandwidth.
12. After this, we iterate over all messages that the test client from step 6 received and assert that the messages are in order and that no messages are duplicated or lost.
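The final assertion can be sketched as a small helper that checks the received payloads against the expected sequence. This is a minimal sketch under assumptions: the class SequenceVerifier and the method findProblems are hypothetical names, the payloads are assumed to be the decimal strings 0 to 100 inclusive, and the list is assumed to hold them in the order in which the test client received them.

```java
import java.util.ArrayList;
import java.util.List;

// Checks that received payloads form the sequence first..last without gaps,
// duplicates, or reordering. Hypothetical helper for illustration.
final class SequenceVerifier {

    // Returns one description per deviation; an empty list means the sequence is intact.
    static List<String> findProblems(List<String> payloads, int first, int last) {
        List<String> problems = new ArrayList<>();
        int expected = first;
        for (String payload : payloads) {
            int actual = Integer.parseInt(payload);
            if (actual < first || actual > last) {
                problems.add("unexpected message: " + actual);
            } else if (actual == expected) {
                expected++;
            } else if (actual < expected) {
                // Already seen, or arrived after a later number.
                problems.add("duplicate or out-of-order message: " + actual);
            } else {
                // A jump ahead means everything in between has not arrived yet.
                problems.add("missing messages " + expected + ".." + (actual - 1));
                expected = actual + 1;
            }
        }
        if (expected <= last) {
            problems.add("missing messages " + expected + ".." + last);
        }
        return problems;
    }
}
```

In a JUnit 5 test, the result could be checked with assertTrue(SequenceVerifier.findProblems(received, 0, 100).isEmpty()), where received is the list collected by the subscribing test client. Returning descriptions rather than a boolean makes a failing run easier to diagnose.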
Conclusion
Resilience testing is an essential step in ensuring the reliability and robustness of your MQTT client application. By simulating various failure scenarios and testing the application’s ability to handle them, you can identify weaknesses and vulnerabilities and make the necessary improvements.
We have demonstrated how to apply resilience engineering techniques to an MQTT client application with a simple example of continuously publishing numbers from 0 to 100 at a 100ms interval. Using Toxiproxy and HiveMQ together, we simulated network failures in a controlled environment.
By following the techniques outlined in this post, you can increase the fault tolerance of your MQTT client application and minimize the risk of downtime, data loss, and other negative consequences. Don’t overlook the importance of resilience engineering: make it an integral part of your software development process.
HiveMQ Team