RabbitMQ is the backbone of many event-driven systems — and one of the most dangerous services to leave unmonitored. When a consumer crashes and stops acknowledging messages, the queue grows unboundedly. When the broker itself goes down, publishers start buffering or dropping messages without any visible error to end users. Dead letter queues fill silently while the underlying problem festers.
Vigilmon gives you external uptime monitoring for the RabbitMQ management API and heartbeat monitoring for RabbitMQ consumers and scheduled message processors. This tutorial shows you how to configure both.
Why RabbitMQ Monitoring Matters
RabbitMQ ships with a management plugin that exposes a rich API, but it only helps when something is actively polling it. Internal monitoring (systemd, Docker health checks) only tells you the process is running — it cannot tell you:
- Whether queues are backing up because consumers have crashed or slowed down
- Whether dead letter queues are accumulating unprocessed failures
- Whether the management API itself is responding correctly
- Whether consumers have silently stopped processing without the broker noticing
- Whether a network partition has split your broker cluster
External monitoring through Vigilmon catches these conditions before your queue depth reaches critical levels or your message SLA is breached.
Step 1: Enable the RabbitMQ Management Plugin
The RabbitMQ management plugin exposes an HTTP API for queue inspection and a built-in health check endpoint. Enable it if you haven't already:
rabbitmq-plugins enable rabbitmq_management
The management HTTP API is available at port 15672 by default. The health check endpoint is:
GET http://localhost:15672/api/health/checks/alarms
This returns {"status":"ok"} when the broker has no active alarms. It returns a 503 with details when something is wrong. RabbitMQ 3.8+ also provides:
GET /api/health/checks/node-is-mirror-sync-critical
GET /api/health/checks/virtual-hosts
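The endpoints above all follow the same contract described in this step: a 200 with {"status":"ok"} when healthy, a 503 with details otherwise. A minimal sketch of polling them with only the Python standard library follows; the function names and the 5-second timeout are illustrative assumptions, not part of RabbitMQ or Vigilmon.

```python
import base64
import json
import urllib.error
import urllib.request

# Health check paths named in this tutorial.
HEALTH_CHECK_PATHS = [
    "/api/health/checks/alarms",
    "/api/health/checks/node-is-mirror-sync-critical",
    "/api/health/checks/virtual-hosts",
]


def interpret_health_response(status_code: int, body: str) -> bool:
    """Return True only for a 200 response whose JSON body has status "ok"."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def run_health_checks(base_url: str, user: str, password: str) -> dict:
    """Query each health check path and map it to True (healthy) or False."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    results = {}
    for path in HEALTH_CHECK_PATHS:
        request = urllib.request.Request(
            base_url + path, headers={"Authorization": f"Basic {token}"}
        )
        try:
            with urllib.request.urlopen(request, timeout=5) as response:
                results[path] = interpret_health_response(
                    response.status, response.read().decode()
                )
        except urllib.error.HTTPError as err:
            # urllib raises on 4xx/5xx; a 503 here means the check failed.
            results[path] = interpret_health_response(err.code, err.read().decode())
        except OSError:
            # Connection refused, DNS failure, or timeout: treat as unhealthy.
            results[path] = False
    return results
```

Calling run_health_checks("http://localhost:15672", user, password) on the broker host returns one boolean per path, which is a quick way to confirm the plugin is enabled before wiring up external monitoring.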
Step 2: Build a Proxy Health Endpoint
For brokers on private networks, add a /health/rabbitmq endpoint to your application that proxies the RabbitMQ management API check:
# healthcheck.py: FastAPI RabbitMQ health proxy
import os

import httpx
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

# Defaults are for local testing only; RabbitMQ restricts the built-in
# guest user to localhost connections by default, so set real credentials.
RABBITMQ_API = os.environ.get("RABBITMQ_API_URL", "http://rabbitmq:15672")
RABBITMQ_USER = os.environ.get("RABBITMQ_USER", "guest")
RABBITMQ_PASS = os.environ.get("RABBITMQ_PASS", "guest")


@app.get("/health/rabbitmq")
async def rabbitmq_health():
    try:
        auth = (RABBITMQ_USER, RABBITMQ_PASS)
        async with httpx.AsyncClient() as client:
            # Check broker alarms
            resp = await client.get(
                f"{RABBITMQ_API}/api/health/checks/alarms",
                auth=auth, timeout=5
            )
            if resp.status_code != 200:
                return JSONResponse(status_code=503, content={
                    "status": "alarm",
                    "detail": resp.json(),
                })

            # Check queue depth on a critical queue (%2F is the default vhost "/")
            q_resp = await client.get(
                f"{RABBITMQ_API}/api/queues/%2F/orders",
                auth=auth, timeout=5
            )
            # A 404 (queue missing) or 401 must not be read as an empty queue
            q_resp.raise_for_status()
            q_data = q_resp.json()
            depth = q_data.get("messages", 0)
            if depth > 10000:
                return JSONResponse(status_code=503, content={
                    "status": "backlog",
                    "queue": "orders",
                    "depth": depth,
                })

        return JSONResponse(status_code=200, content={"status": "ok"})
    except Exception as e:
        # Connection errors, timeouts, and unexpected statuses all report "down"
        return JSONResponse(status_code=503, content={"status": "down", "error": str(e)})
// healthcheck.js: Express RabbitMQ health proxy
const express = require('express');
const axios = require('axios');

const app = express();
const RABBITMQ_API = process.env.RABBITMQ_API_URL || 'http://rabbitmq:15672';
const auth = {
  username: process.env.RABBITMQ_USER || 'guest',
  password: process.env.RABBITMQ_PASS || 'guest',
};

app.get('/health/rabbitmq', async (req, res) => {
  try {
    // axios rejects on non-2xx by default; accept any status so a 503 from
    // the broker is reported as an alarm instead of falling into catch
    const { status, data } = await axios.get(
      `${RABBITMQ_API}/api/health/checks/alarms`,
      { auth, timeout: 5000, validateStatus: () => true }
    );
    if (status !== 200 || data.status !== 'ok') {
      return res.status(503).json({ status: 'alarm', detail: data });
    }
    return res.status(200).json({ status: 'ok' });
  } catch (err) {
    // Network errors and timeouts still land here
    return res.status(503).json({ status: 'down', error: err.message });
  }
});

app.listen(3001);
Verify it manually:
curl -i https://your-app.example.com/health/rabbitmq
# HTTP/1.1 200 OK
# {"status":"ok"}
Step 3: Configure a Vigilmon HTTP Monitor for RabbitMQ
- Log in to vigilmon.online and go to Monitors → New Monitor
- Choose HTTP / HTTPS
- Set the URL to your RabbitMQ health endpoint
- Set the check interval to 1 minute
- Under Expected response, configure:
  - Status code: 200
  - Response body contains: "status":"ok"
  - Response time threshold: 2000ms
- Under Alert channels, assign your Slack or email channel
- Save the monitor
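Before relying on the monitor, it can help to reproduce its three assertions locally against a captured response. The sketch below approximates the settings listed above (status code, body substring, response time); the function and class names are hypothetical, and the order in which Vigilmon itself evaluates the assertions is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    passed: bool
    reason: str


def evaluate_check(
    status_code: int,
    body: str,
    elapsed_ms: float,
    expected_status: int = 200,
    expected_substring: str = '"status":"ok"',
    max_elapsed_ms: float = 2000,
) -> CheckResult:
    """Apply the three assertions from the monitor settings, in order."""
    if status_code != expected_status:
        return CheckResult(False, f"status {status_code} != {expected_status}")
    if expected_substring not in body:
        return CheckResult(False, "expected text not found in body")
    if elapsed_ms > max_elapsed_ms:
        return CheckResult(False, f"response took {elapsed_ms:.0f}ms")
    return CheckResult(True, "ok")
```

One detail worth noting: the body assertion is a plain substring match, so it depends on the endpoint emitting compact JSON. A serializer that writes "status": "ok" with a space after the colon would fail the check even though the broker is healthy.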
What This Catches
| Failure | Internal tools | Vigilmon |
|---|---|---|
| RabbitMQ process crash | ✓ | ✓ |
| Management API unresponsive | ✗ | ✓ |
| Active broker alarm (disk/memory) | ✗ | ✓ |
| Queue backlog exceeding threshold | ✗ | ✓ |
| Network partition from app to broker | ✗ | ✓ |
Step 4: Monitor Queue Depth for Dead Letter Queues
Dead letter queues (DLQs) are where messages go when processing fails: a consumer rejects them without requeueing, they expire, or the source queue exceeds its length limit. A growing DLQ is a strong signal that something in your message processing pipeline is broken.
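Dead-lettering is configured on the source queue, not on the DLQ. A minimal sketch of the queue arguments involved, assuming a dead letter exchange named dlx (a hypothetical name) and the <queue>.dlq naming that the monitors in this step expect. x-dead-letter-exchange and x-dead-letter-routing-key are standard RabbitMQ queue arguments; the helper function itself is illustrative.

```python
def dead_letter_arguments(queue_name: str, dlx_exchange: str = "dlx") -> dict:
    """Build queue arguments that route dead-lettered messages to <queue_name>.dlq.

    `dlx_exchange` is a hypothetical exchange name; it must exist and have
    a binding that routes the key below to the DLQ.
    """
    return {
        "x-dead-letter-exchange": dlx_exchange,
        "x-dead-letter-routing-key": f"{queue_name}.dlq",
    }
```

These arguments are passed when the source queue is declared. Because redeclaring an existing queue with different arguments fails, many deployments set the same two values through a broker policy instead.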
Add a dedicated monitor per critical DLQ:
# Add to healthcheck.py from Step 2; reuses app, RABBITMQ_API, and the credentials
@app.get("/health/rabbitmq/dlq/{queue_name}")
async def dlq_health(queue_name: str, threshold: int = 100):
    try:
        auth = (RABBITMQ_USER, RABBITMQ_PASS)
        async with httpx.AsyncClient() as client:
            resp = await client.get(
                f"{RABBITMQ_API}/api/queues/%2F/{queue_name}.dlq",
                auth=auth, timeout=5
            )
            # A missing DLQ (404) must not be reported as an empty one
            resp.raise_for_status()
            q = resp.json()
            depth = q.get("messages", 0)
            if depth > threshold:
                return JSONResponse(status_code=503, content={
                    "status": "dlq_backlog",
                    "queue": f"{queue_name}.dlq",
                    "depth": depth,
                    "threshold": threshold,
                })

        return JSONResponse(status_code=200, content={"status": "ok", "depth": depth})
    except Exception as e:
        return JSONResponse(status_code=503, content={"status": "down", "error": str(e)})
Create Vigilmon monitors per DLQ:
- https://your-app.example.com/health/rabbitmq/dlq/orders
- https://your-app.example.com/health/rabbitmq/dlq/payments
- https://your-app.example.com/health/rabbitmq/dlq/notifications
Step 5: Heartbeat Monitoring for RabbitMQ Consumers
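Consumers can stop processing messages without the broker detecting it. When a consumer disconnects, the broker requeues its unacknowledged messages. A consumer that stays connected but stops acknowledging is harder to spot: with manual acknowledgements it holds its prefetched messages indefinitely, and with automatic acknowledgement every message it receives is already counted as delivered, so anything it fails to process is lost.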
Vigilmon's heartbeat monitors catch silent consumer death. Your consumer pings Vigilmon after each successful message processing cycle.
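Heartbeats watch the consumer from the inside. The same failure modes also leave a trace on the broker side: the queue object returned by the management API (the one fetched from /api/queues/%2F/orders in Step 2) includes consumers, messages_ready, and messages_unacknowledged counters. A rough sketch of reading them follows; the function name, labels, and threshold are assumptions, and a single snapshot cannot tell a slow consumer from a stuck one, so treat the result as a hint rather than a verdict.

```python
def consumer_symptoms(queue: dict, unacked_limit: int = 10) -> list:
    """Flag possible consumer problems from one management API queue object."""
    symptoms = []
    consumers = queue.get("consumers", 0)
    ready = queue.get("messages_ready", 0)
    unacked = queue.get("messages_unacknowledged", 0)

    # Messages are waiting but nothing is subscribed to the queue.
    if consumers == 0 and ready > 0:
        symptoms.append("no_consumers")

    # Consumers are attached and holding at least `unacked_limit` deliveries
    # without acknowledging them (compare against your prefetch setting).
    if consumers > 0 and unacked >= unacked_limit:
        symptoms.append("unacked_at_limit")

    return symptoms
```

A busy but healthy consumer can also sit at its prefetch limit, which is why the heartbeat remains the primary liveness signal.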
Set Up the Heartbeat Monitor
- In Vigilmon, go to Monitors → New Monitor → Heartbeat
- Set the name: rabbitmq-order-consumer
- Set the expected interval based on your message volume (e.g., 2 minutes for a busy consumer)
- Set the grace period: 5 minutes
- Save and copy the heartbeat URL
Wire It Into Your Consumer
Python (aio-pika / asyncio):
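import os
import aio_pika
import aiohttp

VIGILMON_HB = os.environ["VIGILMON_HEARTBEAT_URL"]

async def process_message(message: aio_pika.IncomingMessage):
    # message.process() acks on success and rejects the message if the block raises
    async with message.process():
        await handle_order(message.body)  # your business logic
    # Ping after the ack, outside the block, so a failed ping cannot reject the message
    try:
        async with aiohttp.ClientSession() as session:
            await session.get(VIGILMON_HB, timeout=aiohttp.ClientTimeout(total=5))
    except Exception:
        pass  # the heartbeat is best-effort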
Node.js (amqplib):
const amqp = require('amqplib');
const axios = require('axios');

async function startConsumer() {
  const conn = await amqp.connect(process.env.RABBITMQ_URL);
  const ch = await conn.createChannel();
  await ch.assertQueue('orders', { durable: true });
  await ch.prefetch(10);

  ch.consume('orders', async (msg) => {
    if (msg === null) return; // the broker cancelled this consumer
    try {
      await processOrder(JSON.parse(msg.content.toString()));
      ch.ack(msg);
      // Ping heartbeat only after successful ack
      await axios.get(process.env.VIGILMON_HEARTBEAT_URL).catch(() => {});
    } catch (err) {
      // Requeue on failure; with requeue=false the message is dead-lettered
      // instead, provided the queue has a dead letter exchange configured
      ch.nack(msg, false, true);
    }
  });
}
Ruby (bunny):
require 'bunny'
require 'json'
require 'net/http'

conn = Bunny.new(ENV['RABBITMQ_URL'])
conn.start
ch = conn.create_channel
queue = ch.queue('orders', durable: true)

queue.subscribe(manual_ack: true, block: true) do |delivery_info, _props, body|
  begin
    process_order(JSON.parse(body))
    ch.ack(delivery_info.delivery_tag)
  rescue => e
    ch.nack(delivery_info.delivery_tag, false, true) # requeue on failure
    warn("Consumer error: #{e}") # use Rails.logger.error inside a Rails app
    next
  end
  # Ping after the ack; a failed ping must not nack an already-acked message
  begin
    Net::HTTP.get(URI(ENV['VIGILMON_HEARTBEAT_URL']))
  rescue StandardError
    nil
  end
end
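The consumers above ping after every processed message, which on a high-volume queue means one extra HTTP request per message. A possible refinement is to throttle the ping so it fires at most once per interval, comfortably below the monitor's expected interval. The sketch below is one way to do that; the class name, the 30-second default, and the injected send and clock callables are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class HeartbeatThrottle:
    """Send at most one heartbeat per `min_interval` seconds."""

    def __init__(
        self,
        send: Callable[[], None],
        min_interval: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._send = send                  # e.g. a function that GETs the heartbeat URL
        self._min_interval = min_interval
        self._clock = clock                # injectable so the class can be tested
        self._last_sent: Optional[float] = None

    def beat(self) -> bool:
        """Call after each successfully processed message.

        Returns True if a ping was actually sent.
        """
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False
        try:
            self._send()
        except Exception:
            # A failed ping must never break message processing;
            # the next processed message will retry.
            return False
        self._last_sent = now
        return True
```

In a consumer, the direct HTTP call would be replaced by a single throttle.beat() after the ack. The monitor still goes quiet if processing stops, because pings are only ever triggered by processed messages.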
Step 6: Alert Routing for Broker and Consumer Failures
In Vigilmon, configure alert priority by monitor type:
- Broker health monitor → immediate Slack + PagerDuty (P1 — broker down means all consumers fail)
- DLQ depth monitors → Slack + email (P2 — processing failures accumulating)
- Consumer heartbeat monitors → Slack + email (P2 — consumers silently stopped)
Create a RabbitMQ Status Page in Vigilmon grouping broker health, DLQ monitors, and consumer heartbeats for a single pane of glass during incidents.
Summary
RabbitMQ failures are insidious — the broker can appear healthy while queues back up and consumers fail silently. Vigilmon gives you:
| Monitor Type | What It Covers |
|---|---|
| HTTP monitor on /health/rabbitmq | Broker uptime, alarms, queue depth |
| HTTP monitors on DLQ endpoints | Dead letter accumulation per queue |
| Heartbeat monitors | Consumer and processor liveness |
Get started free at vigilmon.online — your first RabbitMQ monitor is live in under two minutes.