How to Monitor MongoDB Uptime and Health with Vigilmon

MongoDB is the backbone of millions of modern applications — but replica set elections, oplog window exhaustion, and connection pool saturation can degrade your database silently. A secondary that falls too far behind the primary won't throw errors; it will just serve stale data until you notice. A connection pool at capacity won't crash your app; it will queue requests until response times spiral.

Vigilmon gives you external visibility into MongoDB health through HTTP probe monitoring and heartbeat monitoring for change streams and background jobs. This tutorial walks through both.

Why MongoDB Monitoring Needs More Than Process Checks

systemd, Docker health checks, and cloud-provider dashboards tell you the mongod process is running. They cannot tell you:

Whether MongoDB is reachable from your application servers across the network
Whether a replica set secondary has fallen behind the primary's oplog
Whether your connection pool is exhausted and queuing write operations
Whether a MongoDB change stream listener or background job has silently stopped
Whether a missing index is forcing a query into a full collection scan at scale

These are the failure modes that produce slow, degraded experiences without clean error signals. External monitoring through Vigilmon catches them by probing the actual connectivity and logic paths your application relies on.

Step 1: Build a MongoDB Health Endpoint

MongoDB does not expose an HTTP health endpoint natively. Add a thin health route to your existing application server, or deploy a lightweight sidecar process.

Node.js / Express Example

// health/mongodb.js
const express = require('express');
const { MongoClient } = require('mongodb');

const app = express();
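// Fail fast: the driver's default server selection timeout is 30 seconds,
// far longer than a health check should wait
const client = new MongoClient(process.env.MONGO_URL, { serverSelectionTimeoutMS: 2000 });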

async function connect() {
  await client.connect();
}
connect().catch(console.error);

app.get('/health/mongodb', async (req, res) => {
  try {
    const db = client.db('admin');

    // Ping verifies basic connectivity
    await db.command({ ping: 1 });

    // Check replica set status
    let replicationLag = null;
    try {
      const status = await db.command({ replSetGetStatus: 1 });
      const primary = status.members.find(m => m.stateStr === 'PRIMARY');
      const self = status.members.find(m => m.self === true);
      if (primary && self && self.stateStr === 'SECONDARY') {
        replicationLag = Math.round(
          (primary.optimeDate - self.optimeDate) / 1000
        );
        if (replicationLag > 30) {
          return res.status(503).json({
            status: 'degraded',
            reason: 'replication_lag',
            lag_seconds: replicationLag,
          });
        }
      }
    } catch {
      // Not a replica set — standalone, skip replication check
    }

    return res.status(200).json({ status: 'ok', replication_lag_seconds: replicationLag });
  } catch (err) {
    return res.status(503).json({ status: 'down', error: err.message });
  }
});

app.listen(3002);

Python (FastAPI) Example

# health_mongodb.py
from fastapi import FastAPI
from fastapi.responses import JSONResponse
from motor.motor_asyncio import AsyncIOMotorClient
import os

app = FastAPI()
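# Fail fast: the driver's default server selection timeout is 30 seconds
client = AsyncIOMotorClient(os.environ["MONGO_URL"], serverSelectionTimeoutMS=2000)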

@app.get("/health/mongodb")
async def mongodb_health():
    try:
        db = client.admin
        await db.command("ping")

        replication_lag = None
        try:
            status = await db.command("replSetGetStatus")
            primary = next((m for m in status["members"] if m["stateStr"] == "PRIMARY"), None)
            self_node = next((m for m in status["members"] if m.get("self")), None)
            if primary and self_node and self_node["stateStr"] == "SECONDARY":
                replication_lag = int(
                    (primary["optimeDate"] - self_node["optimeDate"]).total_seconds()
                )
                if replication_lag > 30:
                    return JSONResponse(status_code=503, content={
                        "status": "degraded",
                        "reason": "replication_lag",
                        "lag_seconds": replication_lag,
                    })
        except Exception:
            pass  # standalone

        return {"status": "ok", "replication_lag_seconds": replication_lag}
    except Exception as e:
        return JSONResponse(status_code=503, content={"status": "down", "error": str(e)})
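
The lag arithmetic used in both examples can be pulled into a pure function, which makes it testable without a running replica set. The sketch below assumes the `members` list has the shape returned by `replSetGetStatus` (`stateStr`, `self`, `optimeDate`); the name `replication_lag_seconds` is illustrative and not part of any driver.

```python
from datetime import datetime, timedelta
from typing import Optional


def replication_lag_seconds(members: list[dict]) -> Optional[int]:
    """Return this node's lag behind the primary in whole seconds.

    Returns None when the answering node is not a secondary or when
    no primary is currently visible (for example, during an election).
    """
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    self_node = next((m for m in members if m.get("self")), None)
    if primary is None or self_node is None:
        return None
    if self_node.get("stateStr") != "SECONDARY":
        return None
    delta = primary["optimeDate"] - self_node["optimeDate"]
    # Clamp at zero so a slightly stale view of the primary never yields
    # a negative lag.
    return max(0, int(delta.total_seconds()))


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, 0)
    members = [
        {"stateStr": "PRIMARY", "optimeDate": now},
        {"stateStr": "SECONDARY", "self": True, "optimeDate": now - timedelta(seconds=42)},
    ]
    print(replication_lag_seconds(members))  # 42
```

Returning `None` for "no primary visible" keeps that case distinct from "zero lag", so the handler can decide separately how to report an election in progress.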

Verify the endpoint manually before wiring up Vigilmon:

curl -i https://your-app.example.com/health/mongodb
# HTTP/1.1 200 OK
# {"status":"ok","replication_lag_seconds":null}

Step 2: Configure Vigilmon HTTP Monitor for MongoDB

1. Log in to vigilmon.online and go to Monitors → New Monitor
2. Choose HTTP / HTTPS
3. Set the URL to your health endpoint: https://your-app.example.com/health/mongodb
4. Set the check interval to 1 minute
5. Under Expected response, configure:
   - Status code: 200
   - Response body contains: "status":"ok"
   - Response time threshold: 2000ms
6. Under Alert channels, assign your Slack or PagerDuty channel
7. Save the monitor
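
Before relying on the monitor, it can help to reproduce its three pass conditions locally. The function below only approximates the settings listed above (status code, body substring, response time) and is not Vigilmon's implementation; `check_passes` and its parameters are hypothetical names.

```python
def check_passes(
    status_code: int,
    body: str,
    elapsed_ms: float,
    expected_status: int = 200,
    must_contain: str = '"status":"ok"',
    max_elapsed_ms: float = 2000.0,
) -> bool:
    """Mirror the monitor settings: status code, body substring, response time."""
    return (
        status_code == expected_status
        and must_contain in body
        and elapsed_ms <= max_elapsed_ms
    )


if __name__ == "__main__":
    healthy = '{"status":"ok","replication_lag_seconds":null}'
    print(check_passes(200, healthy, 35.0))             # True
    # A substring match is whitespace-sensitive: '"status": "ok"' (with a
    # space after the colon) does not contain '"status":"ok"'.
    print(check_passes(200, '{"status": "ok"}', 35.0))  # False
```

The second call shows why the endpoint's JSON should stay compact: a serializer that adds a space after the colon would fail a literal substring match even though the service is healthy.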

Vigilmon probes from multiple geographic regions simultaneously, requiring multi-region consensus before opening an incident. You get confident, actionable alerts rather than noise from transient single-probe hiccups.

Monitoring Replica Sets and Shards Separately

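For replica sets, create one monitor per node. Each monitor should target a health endpoint instance that connects directly to that node; an endpoint that uses the replica-set connection string is routed to the primary by default and never exercises the secondary lag check: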

[mongo-primary] /health/mongodb — immediate P1 page on failure
[mongo-secondary-1] /health/mongodb — P2 Slack alert (degraded reads)
[mongo-secondary-2] /health/mongodb — P2 Slack alert
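
One way to pin a health endpoint to a single node is the `directConnection=true` connection string option, which tells the driver to talk to the named host instead of discovering the whole replica set. The sketch below builds such URIs; the host names and the helper `direct_uri` are placeholders, and credentials are omitted for brevity.

```python
from urllib.parse import urlencode


def direct_uri(host: str, port: int = 27017, **options: str) -> str:
    """Build a URI that pins the driver to one node instead of the whole set."""
    params = {"directConnection": "true", **options}
    return f"mongodb://{host}:{port}/?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host names: one health endpoint instance per node.
    for host in ("mongo-primary.internal", "mongo-secondary-1.internal"):
        print(direct_uri(host))
    # mongodb://mongo-primary.internal:27017/?directConnection=true
    # mongodb://mongo-secondary-1.internal:27017/?directConnection=true
```

Each health endpoint instance would then receive its own URI through `MONGO_URL`, so the three monitors listed above report on three different nodes.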

For sharded clusters, monitor the mongos router as well:

# mongos health endpoint — expose via your app the same way
curl -i https://your-app.example.com/health/mongos

Use Vigilmon's status page grouping to show all MongoDB monitors in a single pane for your team.

Step 3: Heartbeat Monitoring for Change Streams and Background Jobs

MongoDB change streams, migration scripts, and scheduled aggregation jobs run silently in the background. When they stall, whether from a connection drop, oplog window exhaustion, or an unhandled exception, no alert reaches you. The job just stops.

Vigilmon heartbeat monitors detect silent stalls: your job pings Vigilmon after each successful cycle. If pings stop arriving within the expected window, Vigilmon fires an alert.

Set Up the Heartbeat Monitor

1. In Vigilmon, go to Monitors → New Monitor → Heartbeat
2. Set the name: mongodb-change-stream-listener
3. Set the expected interval: 5 minutes (adjust to your stream's event frequency)
4. Set the grace period: 10 minutes
5. Save, then copy the unique heartbeat URL, e.g. https://vigilmon.online/heartbeat/abc123xyz
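
The interval and grace period together decide when a heartbeat counts as missed. The sketch below assumes the simplest reading, that a ping is overdue once interval plus grace has elapsed since the last one; Vigilmon's exact rule may differ, and `is_overdue` is a hypothetical helper for reasoning about the settings, not part of any Vigilmon API.

```python
from datetime import datetime, timedelta


def is_overdue(
    last_ping: datetime,
    now: datetime,
    interval: timedelta = timedelta(minutes=5),
    grace: timedelta = timedelta(minutes=10),
) -> bool:
    """Assumed rule: alert once interval + grace has passed without a ping."""
    return now - last_ping > interval + grace


if __name__ == "__main__":
    last = datetime(2024, 1, 1, 12, 0)
    print(is_overdue(last, last + timedelta(minutes=14)))  # False
    print(is_overdue(last, last + timedelta(minutes=16)))  # True
```

Under this reading, a healthy listener on a collection that receives no writes for more than 15 minutes would still trigger an alert, which is why the interval should follow the stream's real event frequency.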

Wire It Into a Change Stream Listener

// changestream-worker.js
const { MongoClient } = require('mongodb');
const axios = require('axios');

const client = new MongoClient(process.env.MONGO_URL);

async function run() {
  await client.connect();
  const db = client.db('mydb');
  const collection = db.collection('orders');

  const changeStream = collection.watch([], { fullDocument: 'updateLookup' });

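  changeStream.on('change', async (change) => {
    try {
      // processChange is your application's own handler
      await processChange(change);
      // Ping Vigilmon after each successful change event; ignore ping failures
      await axios.get(process.env.VIGILMON_HEARTBEAT_URL, { timeout: 5000 }).catch(() => {});
    } catch (err) {
      // Without this, a rejected handler becomes an unhandled promise rejection
      console.error('Failed to process change:', err);
      process.exit(1);
    }
  });

  changeStream.on('error', (err) => {
    console.error('Change stream error:', err);
    // Pings stop, so Vigilmon alerts once the grace period expires
    process.exit(1);
  });
}

run().catch((err) => {
  console.error('Worker failed to start:', err);
  process.exit(1);
});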
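
Pinging after every change event sends one HTTP request per document change, which can be excessive on a busy collection. A throttle that pings at most once per interval keeps the heartbeat cheap. The worker above is JavaScript; this sketch uses Python to show the idea, and `HeartbeatThrottle`, its `send` callback, and the 60-second default are all illustrative assumptions.

```python
import time
from typing import Callable, Optional


class HeartbeatThrottle:
    """Call `send` at most once every `min_interval` seconds."""

    def __init__(
        self,
        send: Callable[[], None],
        min_interval: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._min_interval = min_interval
        self._clock = clock
        self._last_sent: Optional[float] = None

    def beat(self) -> bool:
        """Send a heartbeat if enough time has passed; return True if sent."""
        now = self._clock()
        if self._last_sent is not None and now - self._last_sent < self._min_interval:
            return False
        self._send()
        self._last_sent = now
        return True


if __name__ == "__main__":
    sent = []
    fake_time = [0.0]
    # Inject a fake clock and a recording sender so the demo runs instantly.
    throttle = HeartbeatThrottle(lambda: sent.append(fake_time[0]), 60.0, lambda: fake_time[0])
    for t in (0.0, 10.0, 59.0, 60.0, 130.0):
        fake_time[0] = t
        throttle.beat()
    print(sent)  # [0.0, 60.0, 130.0]
```

In a real worker, `send` would wrap the HTTP request to the heartbeat URL, and `min_interval` should stay well below the monitor's expected interval so that throttling never causes a missed heartbeat.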

For batch jobs (e.g., nightly aggregations), ping at the end of each successful run:

# nightly_aggregation.py
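import os
import requests

def run_aggregation():
    # ... your aggregation logic ...
    try:
        # Reached only when the logic above did not raise
        requests.get(os.environ["VIGILMON_HEARTBEAT_URL"], timeout=5)
    except requests.RequestException as exc:
        # A failed ping should not fail an otherwise successful job
        print(f"Heartbeat ping failed: {exc}")

if __name__ == "__main__":
    run_aggregation()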

Step 4: Connection Pool and Slow Query Alert Routing

Connection pool exhaustion is one of the most common MongoDB production failures. Under high concurrency, every new request blocks waiting for a connection slot, and response times climb until connections free up or a timeout fires.

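Expose connection metrics in your health endpoint. Note that serverStatus reports connections on the server side, across all clients, so it approximates pool pressure only when your application is the main client. Inside the /health/mongodb handler:

// serverStatus is a database command; with access control enabled,
// the health check user needs monitoring privileges to run it
const serverStatus = await db.command({ serverStatus: 1 });
const connInfo = serverStatus.connections;

// Open connections vs. connections the server can still accept
const currentConnections = connInfo.current;
const availableConnections = connInfo.available;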

if (availableConnections < 10) {
  return res.status(503).json({
    status: 'degraded',
    reason: 'connection_pool_near_exhaustion',
    current: currentConnections,
    available: availableConnections,
  });
}
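
The same check can be written in Python as a pure function over the `connections` section of `serverStatus`. A fixed floor of 10 free connections may be too coarse for large connection limits, so this sketch also accepts a ratio; `connection_headroom_ok`, `min_available`, and `min_free_ratio` are hypothetical names and defaults.

```python
def connection_headroom_ok(
    connections: dict,
    min_available: int = 10,
    min_free_ratio: float = 0.0,
) -> bool:
    """Return False when the server is close to its connection limit.

    `connections` is the `connections` section of serverStatus, which
    reports `current` (open) and `available` (remaining) connections.
    """
    current = connections["current"]
    available = connections["available"]
    total = current + available
    free_ratio = available / total if total else 0.0
    return available >= min_available and free_ratio >= min_free_ratio


if __name__ == "__main__":
    print(connection_headroom_ok({"current": 120, "available": 880}))  # True
    print(connection_headroom_ok({"current": 995, "available": 5}))    # False
    print(connection_headroom_ok({"current": 950, "available": 50}, min_free_ratio=0.1))  # False
```

In the FastAPI handler, the input would come from `(await db.command("serverStatus"))["connections"]`, and a `False` result would map to the same 503 "degraded" response as the Node.js version.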

For slow query detection, include the number of long-running operations reported by currentOp in your health response, and configure a Vigilmon response-body alert to page when that count exceeds a limit you choose.
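
One way to derive that count is to filter the operations returned by `currentOp`. The sketch below works on the `inprog` list and relies on the `active` and `secs_running` fields that `currentOp` reports; the 5-second cutoff and the name `count_slow_ops` are assumptions to adjust for your workload.

```python
def count_slow_ops(inprog: list[dict], min_secs_running: int = 5) -> int:
    """Count active operations running for at least `min_secs_running` seconds."""
    return sum(
        1
        for op in inprog
        if op.get("active") and op.get("secs_running", 0) >= min_secs_running
    )


if __name__ == "__main__":
    # Hand-written sample documents with the fields used above.
    sample = [
        {"opid": 1, "active": True, "secs_running": 12, "op": "query"},
        {"opid": 2, "active": True, "secs_running": 0, "op": "insert"},
        {"opid": 3, "active": False, "op": "none"},
    ]
    print(count_slow_ops(sample))  # 1
```

With PyMongo or Motor, the list would typically come from running the `currentOp` command against the `admin` database and reading its `inprog` array; like `serverStatus`, this needs monitoring privileges when access control is enabled.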

Configure alert routing in Vigilmon:

| Monitor | Alert Channel | Priority |
|---|---|---|
| MongoDB primary /health/mongodb | Slack + PagerDuty | P1 |
| MongoDB secondaries /health/mongodb | Slack | P2 |
| Heartbeat: change stream | Slack + email | P2 |
| Heartbeat: nightly aggregation | Email | P3 |

Set response time thresholds as early warnings:

Alert at 500ms for the health endpoint (MongoDB ping should be near-instant)
Alert at 3000ms for application endpoints backed by MongoDB (signals index or pool issues)

Summary

MongoDB failures surface in subtle ways — replication lag, connection exhaustion, silent worker stalls — long before your users see errors. Vigilmon gives you external visibility across the full failure surface:

| Monitor Type | What It Covers |
|---|---|
| HTTP monitor on /health/mongodb | Connectivity, replica lag, connection pool |
| HTTP monitor per replica/shard | Replica sync, shard availability |
| Heartbeat monitor | Change stream liveness, job completion |

Get started free at vigilmon.online — your first MongoDB monitor is running in under two minutes.