How to Monitor RabbitMQ Uptime and Consumer Health with Vigilmon

RabbitMQ is the backbone of many event-driven systems — and one of the most dangerous services to leave unmonitored. When a consumer crashes and stops acknowledging messages, the queue grows unboundedly. When the broker itself goes down, publishers start buffering or dropping messages without any visible error to end users. Dead letter queues fill silently while the underlying problem festers.

Vigilmon gives you external uptime monitoring for the RabbitMQ management API and heartbeat monitoring for RabbitMQ consumers and scheduled message processors. This tutorial shows you how to configure both.

Why RabbitMQ Monitoring Matters

RabbitMQ ships with a management plugin that exposes a rich API, but it only helps when something is actively polling it. Internal monitoring (systemd, Docker health checks) only tells you the process is running — it cannot tell you:

Whether queues are backing up because consumers have crashed or slowed down
Whether dead letter queues are accumulating unprocessed failures
Whether the management API itself is responding correctly
Whether consumers have silently stopped processing without the broker noticing
Whether a network partition has split your broker cluster

External monitoring through Vigilmon catches these conditions before your queue depth reaches critical levels or your message SLA is breached.

Step 1: Enable the RabbitMQ Management Plugin

The RabbitMQ management plugin exposes an HTTP API for queue inspection and a built-in health check endpoint. Enable it if you haven't already:

rabbitmq-plugins enable rabbitmq_management

The management HTTP API is available at port 15672 by default. The health check endpoint is:

GET http://localhost:15672/api/health/checks/alarms

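This returns {"status":"ok"} with a 200 when the broker has no active alarms, and a 503 with details when something is wrong. Like the rest of the management API, it requires HTTP basic authentication. RabbitMQ 3.8+ also provides further checks, including one for classic mirrored queue synchronisation and one for virtual hosts: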

GET /api/health/checks/node-is-mirror-sync-critical
GET /api/health/checks/virtual-hosts
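
To make the contract of these endpoints concrete, here is a minimal sketch that calls the alarms check with basic auth and reduces the response to a healthy/unhealthy summary. It uses only the Python standard library. The function names `classify_health` and `check_alarms` are illustrative, not part of RabbitMQ or Vigilmon, and the sketch assumes the endpoint answers with a JSON object.

```python
import base64
import json
import urllib.error
import urllib.request


def classify_health(status_code: int, body: bytes) -> dict:
    """Reduce a health-check response to a small summary dict."""
    try:
        payload = json.loads(body or b"{}")
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    # Healthy only when both the HTTP status and the JSON body agree
    healthy = status_code == 200 and payload.get("status") == "ok"
    return {"healthy": healthy, "http_status": status_code, "detail": payload}


def check_alarms(base_url: str, user: str, password: str, timeout: float = 5.0) -> dict:
    """Call /api/health/checks/alarms with HTTP basic auth.

    Connection failures (urllib.error.URLError) are left to the caller.
    """
    req = urllib.request.Request(f"{base_url}/api/health/checks/alarms")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return classify_health(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx, so the 503 alarm response arrives here
        return classify_health(err.code, err.read())
```

Calling `check_alarms("http://localhost:15672", user, password)` against a running broker should return a dict whose `healthy` key is true only when no alarm is active.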

Step 2: Build a Proxy Health Endpoint

For brokers on private networks, add a /health/rabbitmq endpoint to your application that proxies the RabbitMQ management API check:

# healthcheck.py — FastAPI RabbitMQ health proxy
import os, httpx
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()
RABBITMQ_API = os.environ.get("RABBITMQ_API_URL", "http://rabbitmq:15672")
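# RabbitMQ's default "guest" user may only connect from localhost, so set
# real credentials when the broker runs on another host.
RABBITMQ_USER = os.environ.get("RABBITMQ_USER", "guest")
RABBITMQ_PASS = os.environ.get("RABBITMQ_PASS", "guest")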

@app.get("/health/rabbitmq")
async def rabbitmq_health():
    try:
        auth = (RABBITMQ_USER, RABBITMQ_PASS)
        async with httpx.AsyncClient() as client:
            # Check broker alarms
            resp = await client.get(
                f"{RABBITMQ_API}/api/health/checks/alarms",
                auth=auth, timeout=5
            )
            if resp.status_code != 200:
                return JSONResponse(status_code=503, content={
                    "status": "alarm",
                    "detail": resp.json(),
                })

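            # Check queue depth on a critical queue
            # ("%2F" is the URL-encoded default virtual host "/")
            q_resp = await client.get(
                f"{RABBITMQ_API}/api/queues/%2F/orders",
                auth=auth, timeout=5
            )
            if q_resp.status_code != 200:
                # A missing queue or failed lookup must not read as depth 0
                return JSONResponse(status_code=503, content={
                    "status": "queue_check_failed",
                    "queue": "orders",
                    "http_status": q_resp.status_code,
                })
            depth = q_resp.json().get("messages", 0)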

            if depth > 10000:
                return JSONResponse(status_code=503, content={
                    "status": "backlog",
                    "queue": "orders",
                    "depth": depth,
                })

        return JSONResponse(status_code=200, content={"status": "ok"})
    except Exception as e:
        return JSONResponse(status_code=503, content={"status": "down", "error": str(e)})

// healthcheck.js — Express RabbitMQ health proxy
const express = require('express');
const axios = require('axios');

const app = express();
const RABBITMQ_API = process.env.RABBITMQ_API_URL || 'http://rabbitmq:15672';
const auth = {
  username: process.env.RABBITMQ_USER || 'guest',
  password: process.env.RABBITMQ_PASS || 'guest',
};

app.get('/health/rabbitmq', async (req, res) => {
  try {
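    const resp = await axios.get(
      `${RABBITMQ_API}/api/health/checks/alarms`,
      // axios rejects on non-2xx by default; accept any status so a 503
      // alarm response reaches the check below instead of the catch block
      { auth, timeout: 5000, validateStatus: () => true }
    );

    if (resp.status !== 200 || resp.data.status !== 'ok') {
      return res.status(503).json({ status: 'alarm', detail: resp.data });
    }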
    return res.status(200).json({ status: 'ok' });
  } catch (err) {
    return res.status(503).json({ status: 'down', error: err.message });
  }
});

app.listen(3001);

Verify it manually:

curl -i https://your-app.example.com/health/rabbitmq
# HTTP/1.1 200 OK
# {"status":"ok"}

Step 3: Configure a Vigilmon HTTP Monitor for RabbitMQ

Log in to vigilmon.online and go to Monitors → New Monitor
Choose HTTP / HTTPS
Set the URL to your RabbitMQ health endpoint
Set the check interval to 1 minute
Under Expected response, configure:
- Status code: 200
- Response body contains: "status":"ok"
- Response time threshold: 2000ms
Under Alert channels, assign your Slack or email channel
Save the monitor
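
Before relying on the monitor, it can help to reproduce its three assertions locally. The sketch below is a hypothetical stand-in for the check, assuming the body assertion is a plain substring match and the threshold is "slower than 2000 ms fails"; Vigilmon's actual matching rules may differ. The name `evaluate_check` is illustrative.

```python
def evaluate_check(status_code: int, body: str, elapsed_ms: float,
                   needle: str = '"status":"ok"', max_ms: float = 2000) -> list:
    """Return a list of failed assertions; an empty list means the check passes."""
    failures = []
    if status_code != 200:
        failures.append(f"status code {status_code} is not 200")
    if needle not in body:
        # Substring matching is whitespace-sensitive
        failures.append(f"body does not contain {needle}")
    if elapsed_ms > max_ms:
        failures.append(f"response took {elapsed_ms:.0f}ms, over {max_ms:.0f}ms")
    return failures
```

One consequence worth knowing: a substring match on "status":"ok" only succeeds against compact JSON. FastAPI's JSONResponse and Express's res.json should both emit compact output by default, but a framework that pretty-prints or adds a space after the colon would fail the body assertion even when the broker is healthy.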

What This Catches

| Failure | Internal tools | Vigilmon |
|---|---|---|
| RabbitMQ process crash | ✓ | ✓ |
| Management API unresponsive | ✗ | ✓ |
| Active broker alarm (disk/memory) | ✗ | ✓ |
| Queue backlog exceeding threshold | ✗ | ✓ |
| Network partition from app to broker | ✗ | ✓ |

Step 4: Monitor Queue Depth for Dead Letter Queues

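Dead letter queues (DLQs) are where messages go when processing fails: a consumer rejects or nacks them without requeueing, they expire, or the source queue exceeds its length limit. A message that is nacked with requeue enabled goes back to its original queue and never reaches the DLQ. A growing DLQ is a strong signal that something in your message processing pipeline is broken.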
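
For context, dead-lettering is switched on per queue, either through a policy or through optional queue arguments. The sketch below builds the arguments dict for the second approach. `x-dead-letter-exchange`, `x-dead-letter-routing-key`, and `x-message-ttl` are standard RabbitMQ argument names; the helper `dead_letter_arguments` and the exchange name in the usage note are hypothetical.

```python
from typing import Optional


def dead_letter_arguments(dlx_name: str,
                          routing_key: Optional[str] = None,
                          message_ttl_ms: Optional[int] = None) -> dict:
    """Build optional queue arguments that route failed messages to a DLX."""
    args = {"x-dead-letter-exchange": dlx_name}
    if routing_key is not None:
        # Without this, the message keeps its original routing key
        args["x-dead-letter-routing-key"] = routing_key
    if message_ttl_ms is not None:
        # Messages older than this are dead-lettered as "expired"
        args["x-message-ttl"] = message_ttl_ms
    return args
```

Most client libraries accept such a dict when declaring a queue, for example as the `arguments` parameter of a declare call, so `dead_letter_arguments("orders.dlx", routing_key="orders.dlq")` would send rejected messages from `orders` to whatever queue is bound to `orders.dlx` under that key.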

Add a dedicated monitor per critical DLQ:

@app.get("/health/rabbitmq/dlq/{queue_name}")
async def dlq_health(queue_name: str, threshold: int = 100):
    try:
        auth = (RABBITMQ_USER, RABBITMQ_PASS)
        async with httpx.AsyncClient() as client:
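            # Assumes DLQs live in the default vhost and are named "<queue>.dlq"
            resp = await client.get(
                f"{RABBITMQ_API}/api/queues/%2F/{queue_name}.dlq",
                auth=auth, timeout=5
            )
            if resp.status_code != 200:
                # Treat a missing or unreadable queue as a failure, not depth 0
                return JSONResponse(status_code=503, content={
                    "status": "queue_check_failed",
                    "queue": f"{queue_name}.dlq",
                    "http_status": resp.status_code,
                })
            depth = resp.json().get("messages", 0)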

        if depth > threshold:
            return JSONResponse(status_code=503, content={
                "status": "dlq_backlog",
                "queue": f"{queue_name}.dlq",
                "depth": depth,
                "threshold": threshold,
            })
        return JSONResponse(status_code=200, content={"status": "ok", "depth": depth})
    except Exception as e:
        return JSONResponse(status_code=503, content={"status": "down", "error": str(e)})

Create Vigilmon monitors per DLQ:

https://your-app.example.com/health/rabbitmq/dlq/orders
https://your-app.example.com/health/rabbitmq/dlq/payments
https://your-app.example.com/health/rabbitmq/dlq/notifications
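
The health endpoints in this tutorial hard-code %2F, which is the URL-encoded name of the default virtual host "/". If your queues live in another virtual host, or their names contain characters that are not URL-safe, a small helper along these lines keeps the management API path correct. The function name `queue_api_url` is illustrative.

```python
from urllib.parse import quote


def queue_api_url(base_url: str, vhost: str, queue: str) -> str:
    """Build the management API URL for one queue, percent-encoding both
    path segments so that "/" in a vhost name becomes "%2F"."""
    return (
        f"{base_url.rstrip('/')}/api/queues/"
        f"{quote(vhost, safe='')}/{quote(queue, safe='')}"
    )
```

With this in place, `queue_api_url(RABBITMQ_API, "/", f"{queue_name}.dlq")` produces the same URL as the hard-coded f-string for the default virtual host.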

Step 5: Heartbeat Monitoring for RabbitMQ Consumers

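Consumers can stop processing messages without the broker detecting it. When a consumer disconnects, the broker requeues its unacknowledged messages. A consumer that stays connected but stops acknowledging is harder to spot: with manual acknowledgements it holds its prefetched messages unacknowledged (recent RabbitMQ versions close the channel only after a delivery acknowledgement timeout, 30 minutes by default), and with automatic acknowledgements the broker treats messages as handled on delivery, so they are lost without any error.

Vigilmon's heartbeat monitors catch silent consumer death. Your consumer pings Vigilmon after each successful message processing cycle, so a missing ping means either the consumer has stalled or no messages arrived. Choose an interval that your normal traffic reliably meets.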

Set Up the Heartbeat Monitor

In Vigilmon, go to Monitors → New Monitor → Heartbeat
Set the name: rabbitmq-order-consumer
Set the expected interval based on your message volume (e.g., 2 minutes for a busy consumer)
Set the grace period: 5 minutes
Save and copy the heartbeat URL
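
To reason about how the interval and grace period combine, the sketch below assumes the common heartbeat model in which an alert fires once no ping has arrived for interval plus grace period. That model is an assumption here, not confirmed Vigilmon behaviour, so check the monitor's documentation for the exact rule. The name `is_overdue` is illustrative.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_ping: datetime, now: datetime,
               interval: timedelta = timedelta(minutes=2),
               grace: timedelta = timedelta(minutes=5)) -> bool:
    """True when the time since the last ping exceeds interval + grace."""
    return now - last_ping > interval + grace


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_overdue(last, last + timedelta(minutes=6)))  # inside the 7-minute window
print(is_overdue(last, last + timedelta(minutes=8)))  # past the 7-minute window
```

Under this model, the settings above tolerate up to seven minutes of silence, so a consumer on a queue that routinely sits idle for longer would need a larger interval or a periodic ping that does not depend on message arrival.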

Wire It Into Your Consumer

Python (aio-pika / asyncio):

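import asyncio
import os

import aio_pika
import aiohttp

VIGILMON_HB = os.environ["VIGILMON_HEARTBEAT_URL"]

async def process_message(message: aio_pika.IncomingMessage):
    # message.process() acks on clean exit and rejects if the block raises
    async with message.process():
        # Your business logic (handle_order is your own coroutine)
        await handle_order(message.body)

    # Ping after the ack, outside the block, so that a failed ping
    # cannot cause an already-processed message to be rejected
    try:
        timeout = aiohttp.ClientTimeout(total=5)
        async with aiohttp.ClientSession(timeout=timeout) as session:
            await session.get(VIGILMON_HB)
    except (aiohttp.ClientError, asyncio.TimeoutError):
        pass  # a missed ping should never break message processing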

Node.js (amqplib):

const amqp = require('amqplib');
const axios = require('axios');

async function startConsumer() {
  const conn = await amqp.connect(process.env.RABBITMQ_URL);
  const ch = await conn.createChannel();
  await ch.assertQueue('orders', { durable: true });
  ch.prefetch(10);

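  ch.consume('orders', async (msg) => {
    if (msg === null) return; // the broker cancelled this consumer
    try {
      await processOrder(JSON.parse(msg.content.toString()));
      ch.ack(msg);
      // Ping heartbeat only after successful ack; ignore ping failures
      await axios.get(process.env.VIGILMON_HEARTBEAT_URL, { timeout: 5000 }).catch(() => {});
    } catch (err) {
      // Requeue on failure. A message that always fails will loop forever;
      // use ch.nack(msg, false, false) to send it to a dead letter queue instead
      ch.nack(msg, false, true);
    }
  });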
}

Ruby (bunny):

require 'bunny'
require 'json'
require 'net/http'

conn = Bunny.new(ENV['RABBITMQ_URL'])
conn.start
ch = conn.create_channel
queue = ch.queue('orders', durable: true)

queue.subscribe(manual_ack: true, block: true) do |delivery_info, _props, body|
  begin
    process_order(JSON.parse(body))
    ch.ack(delivery_info.delivery_tag)
  rescue => e
    ch.nack(delivery_info.delivery_tag, false, true) # requeue on failure
    warn("Consumer error: #{e}")
    next
  end

  # Ping after the ack; a failed ping must not nack an already-acked message
  begin
    Net::HTTP.get(URI(ENV['VIGILMON_HEARTBEAT_URL']))
  rescue StandardError => e
    warn("Heartbeat ping failed: #{e}")
  end
end
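
The consumers above ping once per processed message, which on a busy queue means one outbound HTTP request per message. A possible refinement is to throttle pings to at most one per fixed window. The sketch below shows that idea with an injectable clock. The class name `HeartbeatThrottle` and the 30-second default are assumptions for illustration; the window should stay well below the heartbeat interval configured in Vigilmon.

```python
import time


class HeartbeatThrottle:
    """Allow at most one heartbeat ping per min_interval_s seconds."""

    def __init__(self, min_interval_s: float = 30.0, clock=time.monotonic):
        self._min_interval_s = min_interval_s
        self._clock = clock          # injectable so the logic can be tested
        self._last_ping = None       # monotonic time of the last allowed ping

    def should_ping(self) -> bool:
        now = self._clock()
        if self._last_ping is None or now - self._last_ping >= self._min_interval_s:
            self._last_ping = now
            return True
        return False
```

In a consumer, call `should_ping()` after each successful acknowledgement and send the HTTP request only when it returns true.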

Step 6: Alert Routing for Broker and Consumer Failures

In Vigilmon, configure alert priority by monitor type:

Broker health monitor → immediate Slack + PagerDuty (P1 — broker down means all consumers fail)
DLQ depth monitors → Slack + email (P2 — processing failures accumulating)
Consumer heartbeat monitors → Slack + email (P2 — consumers silently stopped)

Create a RabbitMQ Status Page in Vigilmon grouping broker health, DLQ monitors, and consumer heartbeats for a single pane of glass during incidents.

Summary

RabbitMQ failures are insidious — the broker can appear healthy while queues back up and consumers fail silently. Vigilmon gives you:

| Monitor Type | What It Covers |
|---|---|
| HTTP monitor on /health/rabbitmq | Broker uptime, alarms, queue depth |
| HTTP monitors on DLQ endpoints | Dead letter accumulation per queue |
| Heartbeat monitors | Consumer and processor liveness |

Get started free at vigilmon.online — your first RabbitMQ monitor is live in under two minutes.