Using HAProxy to Load Balance HiveMQ with the New Health API
In a HiveMQ deployment handling thousands of connected devices, it is vital that incoming MQTT traffic is evenly distributed across the nodes in the cluster. This is typically accomplished by placing the HiveMQ cluster behind a load balancer. A significant advantage of this setup is that most load balancers can be configured to perform health checks, detecting when a node is down and routing traffic away from it until it comes back online.
With the release of HiveMQ 4.14 we are happy to announce the introduction of the Health API. When configured and enabled, this API exposes HTTP endpoints that can be used by load balancers for these health checks.
This blog post describes how the Health API can be used with the HAProxy load balancer.
The Health API
To use the Health API, we must first enable it by adding the following section to our HiveMQ config.xml configuration file:
<health-api>
    <enabled>true</enabled>
    <listeners>
        <http>
            <bind-address>0.0.0.0</bind-address>
            <port>8889</port>
        </http>
    </listeners>
</health-api>
This will expose two new probe endpoints on port 8889:
/api/v1/health/liveness: The liveness probe always returns a status code of 200 while the broker is running.
/api/v1/health/readiness: The readiness probe returns a status code of 200 only when the broker is in an OK state and accepting incoming MQTT connections.
We will only make use of the readiness probe in this example. For more information, please refer to our documentation on the Health API.
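Once a broker is running with this configuration, the probes can be inspected with any HTTP client. The following is a minimal sketch using curl against a node reachable on localhost; adjust the host if the broker runs elsewhere:
# Liveness: returns 200 whenever the broker process is running
curl -i http://localhost:8889/api/v1/health/liveness
# Readiness: returns 200 only when the broker accepts incoming MQTT connections
curl -i http://localhost:8889/api/v1/health/readiness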
HAProxy
HAProxy is a simple and easy-to-install load balancer.
[Figure: HAProxy distributes incoming MQTT connections to the different nodes in the cluster.]
To use it with our HiveMQ cluster, we create the following HAProxy configuration file called haproxy.cfg:
global
    stats socket /var/run/api.sock user haproxy group haproxy mode 660 level admin expose-fd listeners
    log stdout format raw local0 info

defaults
    log global
    mode tcp
    option tcplog
    maxconn 1024000
    timeout connect 30000
    timeout client 600s
    timeout server 600s
    default-server init-addr last,libc,none

frontend stats
    mode http
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

frontend health_frontend
    mode tcp
    option tcplog
    bind *:8889
    default_backend health_backend

backend health_backend
    mode http
    server HMQ1 HMQ-node1:8889
    server HMQ2 HMQ-node2:8889
    server HMQ3 HMQ-node3:8889

frontend mqtt_frontend
    mode tcp
    option tcplog
    bind *:1883
    default_backend mqtt_backend

backend mqtt_backend
    mode tcp
    stick-table type string len 32 size 100k expire 30m
    stick on req.payload(0,0),mqtt_field_value(connect,client_identifier)
    option httpchk
    http-check send meth GET uri /api/v1/health/readiness
    server HMQ1 HMQ-node1:1883 check port 8889
    server HMQ2 HMQ-node2:1883 check port 8889
    server HMQ3 HMQ-node3:1883 check port 8889
This configures HAProxy to route all incoming MQTT traffic on port 1883 to one of the three HiveMQ nodes in our cluster. We enable health checks for the nodes by adding check port 8889 after each server line in the backend mqtt_backend section. The lines option httpchk and http-check send meth GET uri /api/v1/health/readiness specify that the health check is performed by requesting our readiness probe URI. HAProxy periodically sends this request to each node. A response with a 2xx or 3xx HTTP status code tells HAProxy that the node is healthy, which matches the behavior of the Health API. Any other status code marks the node as unhealthy, for example because it has not finished starting up. HAProxy stops routing traffic to an unhealthy node until it becomes healthy again.
The stats frontend allows us to monitor the current status of our cluster by visiting localhost:8404 in a browser.
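Alternatively, the runtime API exposed through the stats socket defined in the global section can be queried from inside the HAProxy container. This is a minimal sketch that assumes the socat utility is available in the container image:
# Query HAProxy's runtime API over the stats socket from haproxy.cfg
# (assumes socat is installed in the HAProxy container)
docker-compose exec haproxy-lb sh -c 'echo "show stat" | socat stdio /var/run/api.sock'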
Cluster Setup
We will use Docker Compose to run our three-node HiveMQ cluster on a local network behind HAProxy. First, we add the following section to our HiveMQ config.xml file:
<cluster>
    <enabled>true</enabled>
    <transport>
        <tcp>
            <bind-address>0.0.0.0</bind-address>
            <bind-port>7800</bind-port>
        </tcp>
    </transport>
    <discovery>
        <static>
            <node>
                <host>172.31.0.101</host>
                <port>7800</port>
            </node>
            <node>
                <host>172.31.0.102</host>
                <port>7800</port>
            </node>
            <node>
                <host>172.31.0.103</host>
                <port>7800</port>
            </node>
        </static>
    </discovery>
</cluster>
This will allow the HiveMQ nodes to discover each other and successfully form a cluster.
Next, we create a docker-compose.yml file for the Docker Compose setup. With it, we need to ensure the following:
We need three instances of the latest HiveMQ Docker image and one instance of the HAProxy Docker image, all running on the same internal network with the necessary ports exposed.
Because we have specified static IP addresses in the HiveMQ configuration, each node must be assigned its correct IP address when running in Docker.
We need to mount volumes to provide the config.xml file to the HiveMQ containers and the haproxy.cfg file to the HAProxy container.
version: "3.3"
services:
  hmq-node1:
    image: hivemq/hivemq4
    networks:
      hiveMQ.net:
        ipv4_address: "172.31.0.101"
    volumes:
      - "./conf/:/opt/hivemq/conf"
  hmq-node2:
    image: hivemq/hivemq4
    networks:
      hiveMQ.net:
        ipv4_address: "172.31.0.102"
    volumes:
      - "./conf/:/opt/hivemq/conf"
  hmq-node3:
    image: hivemq/hivemq4
    networks:
      hiveMQ.net:
        ipv4_address: "172.31.0.103"
    volumes:
      - "./conf/:/opt/hivemq/conf"
  haproxy-lb:
    image: haproxytech/haproxy-alpine:2.6.6
    networks:
      - hiveMQ.net
    volumes:
      - "./haproxy/:/usr/local/etc/haproxy:ro"
    ports:
      - "1883:1883"
      - "8889:8889"
      - "8404:8404"
    depends_on:
      - hmq-node1
      - hmq-node2
      - hmq-node3
networks:
  hiveMQ.net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.31.0.0/24
          gateway: 172.31.0.1
We place our config.xml file in a directory called conf, and our haproxy.cfg file in a directory called haproxy. This allows us to mount these files as volumes and make them available to our Docker containers. Our directory structure should now look like this:
|- docker-compose.yml
|- conf/
|  |- config.xml
|- haproxy/
|  |- haproxy.cfg
We can now start our cluster by running the command docker-compose up in this directory.
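To verify that connections are accepted through the load balancer, any MQTT client can connect to port 1883 on the host. Here is a minimal sketch using the Mosquitto command-line clients, assuming they are installed locally; the client IDs and topic are arbitrary examples:
# Terminal 1: subscribe to a test topic through HAProxy
mosquitto_sub -h localhost -p 1883 -i subscriber-1 -t test/topic
# Terminal 2: publish a message through HAProxy
mosquitto_pub -h localhost -p 1883 -i publisher-1 -t test/topic -m "hello cluster"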
Demonstration
To test the functionality of the Health API in our setup, we can kill one of the nodes and monitor its status in HAProxy. To find the container ID of a node in our cluster, we use docker container ls. The command docker container kill <CONTAINER_ID> then kills that node. If we visit localhost:8404 in a browser, we see a list of all nodes visible to HAProxy, and the killed node should now appear as down. Restarting the node with docker container start <CONTAINER_ID> changes its status back to up in the HAProxy stats overview.
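Putting these steps together, the sequence might look like the following sketch; <CONTAINER_ID> is a placeholder for the value reported by docker container ls:
# List the running containers and note the ID of one HiveMQ node
docker container ls
# Kill that node; its readiness check fails and HAProxy marks it as DOWN
docker container kill <CONTAINER_ID>
# Restart the node; once its readiness probe returns 200, HAProxy marks it as UP again
docker container start <CONTAINER_ID>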
See Also
Load balancers other than HAProxy can also be configured to use the Health API for health checks.
Some other useful links:
The hivemq-HAProxy repo, a ready-to-go example of using HiveMQ with HAProxy.
HiveMQ and Docker, an overview of running HiveMQ in Docker.