Uptime monitoring for Express.js applications

You shipped your Express.js API. It's running behind PM2 on a VPS. But how quickly do you find out when a process crash takes it down at 2am? Is your /api/users route still responding, or did a memory leak silently kill the worker four hours ago?

This tutorial covers production-grade uptime monitoring for Express.js applications using Vigilmon. We will walk through:

A /health route with dependency checks
Vigilmon HTTP monitoring and webhook alerts
A PM2-aware heartbeat pattern so worker restarts and silent crashes trigger alerts

Prerequisites

Node.js 18+
An existing Express.js application
PM2 installed globally (optional but recommended for production)
A free account at vigilmon.online

Part 1: Add a health route

A health check endpoint gives your monitoring service something meaningful to ping. A bare 200 OK from your root route shows only that the process is accepting connections; a proper health route also checks your real dependencies.

Basic health route

// routes/health.js
const express = require('express');
const router = express.Router();

router.get('/', async (req, res) => {
  const checks = {};
  let status = 'ok';

  // Example: check database connectivity
  try {
    await req.app.locals.db.query('SELECT 1');
    checks.database = 'ok';
  } catch (err) {
    checks.database = `error: ${err.message}`;
    status = 'degraded';
  }

  // Example: check Redis
  if (req.app.locals.redis) {
    try {
      await req.app.locals.redis.ping();
      checks.redis = 'ok';
    } catch (err) {
      checks.redis = `error: ${err.message}`;
      status = 'degraded';
    }
  }

  const httpStatus = status === 'ok' ? 200 : 503;

  return res.status(httpStatus).json({
    status,
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks,
  });
});

module.exports = router;

Mount it in your main app file:

// app.js
const express = require('express');
const healthRouter = require('./routes/health');

const app = express();

app.use(express.json());

// Mount health check before auth middleware so Vigilmon can reach it
app.use('/health', healthRouter);

// ... rest of your routes and middleware

module.exports = app;

Test it:

curl http://localhost:3000/health

{
  "status": "ok",
  "timestamp": "2026-06-29T07:00:00.000Z",
  "uptime": 3612.4,
  "checks": {
    "database": "ok",
    "redis": "ok"
  }
}

When the database is unreachable, the route responds with HTTP 503. Vigilmon treats that as a failure and fires an alert.
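
One gap in the route above is that each dependency call is awaited with no upper bound, so a hung database connection could make /health hang instead of returning 503. The sketch below shows one way to bound each check. The helper names withTimeout and runCheck and the 2-second default are assumptions for illustration, not part of Express or Vigilmon.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
function withTimeout(promise, ms, label = 'check') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; clear the timer so it never outlives the check.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: run one named check and report 'ok' or an error string.
// It never throws, so the health route can always build a response.
async function runCheck(name, fn, ms = 2000) {
  try {
    await withTimeout(Promise.resolve().then(fn), ms, name);
    return 'ok';
  } catch (err) {
    return `error: ${err.message}`;
  }
}
```

Inside the route this would read something like checks.database = await runCheck('database', () => req.app.locals.db.query('SELECT 1')). Note that the timeout only stops the route from waiting; it does not cancel the underlying query.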

TypeScript version

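// routes/health.ts
import { Router, Request, Response } from 'express';
// Assumption: the pool is a node-postgres Pool; swap in your own client's type
import type { Pool } from 'pg';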

const router = Router();

router.get('/', async (req: Request, res: Response): Promise<void> => {
  const checks: Record<string, string> = {};
  let status = 'ok';

  try {
    await (req.app.locals.pool as Pool).query('SELECT 1');
    checks.database = 'ok';
  } catch (err: unknown) {
    checks.database = `error: ${(err as Error).message}`;
    status = 'degraded';
  }

  res.status(status === 'ok' ? 200 : 503).json({
    status,
    timestamp: new Date().toISOString(),
    checks,
  });
});

export default router;

Part 2: Set up HTTP monitoring in Vigilmon

Log in to vigilmon.online and click Add Monitor.
Choose HTTP(S) monitor.
Enter: https://yourapi.example.com/health
Set interval to 1 minute.
Add your alert channel (email, Slack webhook, or webhook URL).
Click Save.

Vigilmon will now ping your health route every 60 seconds. The first non-2xx response or timeout triggers an immediate alert to your configured channel.
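
Before relying on the monitor, it can help to run the same kind of check locally. The sketch below approximates the pass/fail rule described above (a 2xx response within a timeout passes, anything else fails). The probe function, its option names, and the 5-second default are assumptions for illustration; Vigilmon's actual timeout and retry behavior may differ.

```javascript
// Hypothetical local probe: treats a 2xx response within the timeout as "up".
// `fetchImpl` is injectable so the function can be exercised without a network.
async function probe(url, { timeoutMs = 5000, fetchImpl = fetch } = {}) {
  const started = Date.now();
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return { up: res.ok, status: res.status, ms: Date.now() - started };
  } catch (err) {
    // Network errors and timeouts both land here; there is no HTTP status to report.
    return { up: false, status: null, error: err.name, ms: Date.now() - started };
  }
}
```

For example, probe('http://localhost:3000/health').then(console.log) should report up: true while the dependencies are healthy, and up: false with status 503 once the route reports degraded.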

Skipping auth for the health route

If you use JWT middleware globally, exclude /health:

// app.js
const { authenticate } = require('./middleware/auth');

app.use('/health', healthRouter); // No auth — before the global authenticate middleware
app.use(authenticate);            // Auth applied to everything after
app.use('/api', apiRouter);

Or use a more explicit exclusion pattern:

app.use((req, res, next) => {
  if (req.path === '/health') return next();
  return authenticate(req, res, next);
});
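
If more than one path needs to stay public, the exclusion above can be generalized. The following is a minimal sketch, assuming a hypothetical helper named unlessPath. It also normalizes trailing slashes, since an exact comparison against '/health' would send '/health/' through authentication.

```javascript
// Hypothetical helper: wrap a middleware so it is skipped for the listed paths.
function unlessPath(paths, middleware) {
  // Strip trailing slashes so '/health' and '/health/' are treated the same.
  const normalize = (p) => {
    const trimmed = p.replace(/\/+$/, '');
    return trimmed === '' ? '/' : trimmed;
  };
  const open = new Set(paths.map(normalize));

  return function conditional(req, res, next) {
    if (open.has(normalize(req.path))) return next();
    return middleware(req, res, next);
  };
}
```

Usage would look like app.use(unlessPath(['/health'], authenticate)), with authenticate being the middleware imported earlier.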

Part 3: Receive Vigilmon webhooks

Add a webhook alert endpoint to fan out Vigilmon DOWN/UP events to Slack, PagerDuty, or any internal system:

// routes/webhooks.js
const express = require('express');
const router = express.Router();

router.post('/vigilmon', express.json(), (req, res) => {
  // Fall back to {} so a missing or unparsed body cannot throw here
  const { monitor_name, status, url, response_code, checked_at } = req.body ?? {};

  if (status === 'down') {
    console.error('[VIGILMON] Monitor DOWN', {
      monitor: monitor_name,
      url,
      code: response_code,
      at: checked_at,
    });

    // Forward to Slack, PagerDuty, internal systems, etc.
    notifyOnCall({ monitor: monitor_name, url, code: response_code });
  } else if (status === 'up') {
    console.info('[VIGILMON] Monitor recovered', { monitor: monitor_name });
  }

  res.sendStatus(204);
});

function notifyOnCall({ monitor, url, code }) {
  // Example: Slack incoming webhook
  const slackUrl = process.env.SLACK_WEBHOOK_URL;
  if (!slackUrl) return;

  fetch(slackUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `*ALERT*: ${monitor} is DOWN\nURL: ${url}\nHTTP: ${code}`,
    }),
  }).catch(err => console.error('Slack notify failed:', err));
}

module.exports = router;

Mount it:

app.use('/webhook', require('./routes/webhooks'));

Vigilmon sends a POST with this payload on every DOWN and UP transition:

{
  "monitor_id": "mon_abc123",
  "monitor_name": "Production API /health",
  "status": "down",
  "url": "https://yourapi.example.com/health",
  "checked_at": "2026-06-29T07:01:00Z",
  "response_code": 503,
  "response_time_ms": 1204
}
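
Because the webhook URL is reachable from the internet, it is worth validating the body before acting on it. The sketch below checks only the fields shown in the payload above. The function name parseVigilmonEvent and the shape of the normalized result are assumptions for illustration.

```javascript
// Hypothetical validator for the payload shown above.
// Returns a normalized event, or null if the body is not usable.
function parseVigilmonEvent(body) {
  if (body === null || typeof body !== 'object') return null;

  const { monitor_id, monitor_name, status, url, checked_at, response_code } = body;
  if (typeof monitor_name !== 'string' || monitor_name === '') return null;
  if (status !== 'down' && status !== 'up') return null;

  // Only parse string timestamps; new Date(null) would silently become 1970.
  const checkedAt = typeof checked_at === 'string' ? new Date(checked_at) : null;

  return {
    monitorId: typeof monitor_id === 'string' ? monitor_id : null,
    monitor: monitor_name,
    status,
    url: typeof url === 'string' ? url : null,
    code: Number.isInteger(response_code) ? response_code : null,
    checkedAt: checkedAt && !Number.isNaN(checkedAt.getTime()) ? checkedAt : null,
  };
}
```

In the handler, a null result could be answered with res.sendStatus(400) before any alert is forwarded.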

Part 4: Heartbeat monitoring with PM2

HTTP monitoring confirms the web tier responds. It will not catch a worker that is running but no longer processing jobs. It can also miss a PM2 crash loop: if processes restart faster than Vigilmon's 1-minute poll, one of them may happen to be up each time a check arrives.

The heartbeat pattern closes this gap: your worker pings a Vigilmon heartbeat URL on each successful processing cycle. If the pings stop, Vigilmon alerts you.

Create the heartbeat monitor in Vigilmon

In Vigilmon, click Add Monitor.
Choose Heartbeat monitor.
Set the expected interval to cover your worker's full cycle. The worker below sleeps 5 minutes between runs, so allow 5 minutes plus the time a run takes.
Copy the unique URL: https://vigilmon.online/heartbeat/abc123xyz

Worker heartbeat implementation

// workers/jobProcessor.js
// Runs as a standalone PM2 process (not a worker thread); needs Node.js 18+ for global fetch()

const HEARTBEAT_URL = process.env.VIGILMON_HEARTBEAT_URL;
const POLL_INTERVAL_MS = 5 * 60 * 1000; // 5 minutes

async function processJobs() {
  // fetchAndProcessPendingJobs() is a placeholder for your own job logic;
  // it should resolve to the number of jobs handled.
  const processed = await fetchAndProcessPendingJobs();
  console.log(`Processed ${processed} jobs`);
  return processed;
}

async function pingHeartbeat() {
  if (!HEARTBEAT_URL) return;
  try {
    const res = await fetch(HEARTBEAT_URL, { signal: AbortSignal.timeout(10_000) });
    if (!res.ok) {
      console.warn(`Heartbeat ping returned ${res.status}`);
    }
  } catch (err) {
    console.warn('Vigilmon heartbeat ping failed:', err.message);
  }
}

async function run() {
  while (true) {
    try {
      await processJobs();
      await pingHeartbeat(); // Only ping on success — silent failure = no ping = Vigilmon alert
    } catch (err) {
      console.error('Job processing cycle error:', err);
      // Don't ping — let Vigilmon detect the stalled heartbeat
    }

    await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS));
  }
}

run().catch(err => {
  console.error('Worker fatal error:', err);
  process.exit(1);
});
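
One trade-off in the worker above is that a single failed ping (for example a brief network blip) leaves the whole cycle without a heartbeat, which can raise a false alert. A minimal sketch of a retrying ping follows. The name pingWithRetry, the attempt count, and the delays are assumptions for illustration.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: try the heartbeat URL up to `attempts` times and
// resolve true on the first 2xx response, false if every attempt fails.
async function pingWithRetry(url, { attempts = 3, delayMs = 2000, fetchImpl = fetch } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const res = await fetchImpl(url, { signal: AbortSignal.timeout(10_000) });
      if (res.ok) return true;
      console.warn(`Heartbeat attempt ${i} returned ${res.status}`);
    } catch (err) {
      console.warn(`Heartbeat attempt ${i} failed: ${err.message}`);
    }
    // Linear backoff between attempts; no wait after the last one.
    if (i < attempts) await sleep(delayMs * i);
  }
  return false;
}
```

Keep the total retry time well below the heartbeat's expected interval, so that retries cannot hide a worker that is actually stuck.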

Start it under PM2:

pm2 start workers/jobProcessor.js --name job-processor
pm2 save

PM2 ecosystem file

For a production setup, define everything in ecosystem.config.js:

module.exports = {
  apps: [
    {
      name: 'api',
      script: 'server.js',
      instances: 'max',
      exec_mode: 'cluster',
      env: {
        NODE_ENV: 'production',
        PORT: 3000,
      },
    },
    {
      name: 'job-processor',
      script: 'workers/jobProcessor.js',
      instances: 1,
      exec_mode: 'fork',
      env: {
        NODE_ENV: 'production',
        VIGILMON_HEARTBEAT_URL: 'https://vigilmon.online/heartbeat/abc123xyz',
      },
    },
  ],
};

pm2 start ecosystem.config.js
pm2 save

Watching PM2 restart events

PM2's programmatic API exposes an event bus that reports process events. A small companion script can subscribe to it and log, or forward to your own alerting, whenever job-processor restarts. This gives you a "restart storm" signal that is separate from the job heartbeat:

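// pm2-monitor.js (run with: pm2 start pm2-monitor.js)
const pm2 = require('pm2');

pm2.connect((err) => {
  if (err) {
    console.error('PM2 connect failed:', err);
    process.exit(2);
  }
  pm2.launchBus((err, bus) => {
    if (err) {
      console.error('PM2 bus launch failed:', err);
      process.exit(2);
    }
    // Crash restarts may surface as 'exit' then 'online' on some PM2 versions;
    // log every event once while testing to confirm which ones you receive.
    bus.on('process:event', ({ event, process: proc }) => {
      if (event === 'restart' && proc.name === 'job-processor') {
        // The counter's location is hedged: read it from either shape of the packet
        const restarts = proc.restart_time ?? proc.pm2_env?.restart_time;
        console.log(`PM2: job-processor restarted (count: ${restarts})`);
        // Optionally POST to your own webhook or Vigilmon webhook
      }
    });
  });
});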
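
A single restart is rarely worth a page, while several in quick succession usually are. The sketch below shows one way to turn individual restart events into a "restart storm" signal using a sliding window. The name createRestartTracker and the default window and threshold are assumptions for illustration.

```javascript
// Hypothetical sliding-window counter: reports a storm when more than
// `threshold` restarts have been recorded within the last `windowMs`.
function createRestartTracker({ windowMs = 5 * 60 * 1000, threshold = 3 } = {}) {
  const timestamps = [];

  return function recordRestart(now = Date.now()) {
    timestamps.push(now);
    // Drop restarts that have aged out of the window.
    while (timestamps.length > 0 && now - timestamps[0] > windowMs) {
      timestamps.shift();
    }
    return timestamps.length > threshold;
  };
}
```

Inside the bus handler you would create the tracker once, call it on each restart event, and only send an alert when it returns true.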

Part 5: Memory and event loop health checks

For Node.js specifically, two failure modes are easy to miss with a basic HTTP check: memory leaks and event loop lag. The extended health route below reports heap usage and marks the service degraded when it runs high:

// routes/health.js (extended)
const v8 = require('v8');

router.get('/', async (req, res) => {
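  const heapStats = v8.getHeapStatistics();
  const heapUsedMb = Math.round(heapStats.used_heap_size / 1024 / 1024);
  // heap_size_limit is V8's configured ceiling, not the currently allocated heap
  const heapLimitMb = Math.round(heapStats.heap_size_limit / 1024 / 1024);
  // Compute the percentage from raw bytes to avoid compounding rounding errors
  const heapPercent = Math.round((heapStats.used_heap_size / heapStats.heap_size_limit) * 100);

  const checks = {
    heap_used_mb: heapUsedMb,
    heap_limit_mb: heapLimitMb,
    heap_percent: heapPercent,
  };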

  let status = 'ok';

  // Warn if heap is over 85%
  if (heapPercent > 85) {
    status = 'degraded';
    checks.heap_warning = 'Heap usage above 85%';
  }

  // ... existing database/redis checks

  return res.status(status === 'ok' ? 200 : 503).json({
    status,
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks,
  });
});

With this in place, your health route returns HTTP 503 when heap usage is dangerously high, so Vigilmon can alert you before the process runs out of memory and crashes.
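
Event loop lag can be reported in the same way. The sketch below uses Node's built-in perf_hooks.monitorEventLoopDelay to sample the delay and expose a 99th-percentile figure. The function name eventLoopCheck, the response field names, and the 200 ms threshold are assumptions to tune for your workload.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Sample event loop delay in the background; values are reported in nanoseconds.
const loopDelay = monitorEventLoopDelay({ resolution: 20 });
loopDelay.enable();

// Hypothetical helper: summarize the histogram for the health response.
// The 200 ms threshold is an assumption, not a recommended value.
function eventLoopCheck(histogram = loopDelay, thresholdMs = 200) {
  const p99Ms = Math.round(histogram.percentile(99) / 1e6);
  const result = { event_loop_p99_ms: p99Ms, degraded: p99Ms > thresholdMs };
  // Start a fresh window so an old spike does not keep the route degraded.
  histogram.reset();
  return result;
}
```

In the route, the result could be merged into checks and used to set status to 'degraded'. Because the histogram is reset on every call, each reading covers the time since the previous health request. If something other than Vigilmon also polls /health, the windows will be shorter than one minute.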

Summary

Your Express.js application now has three layers of production monitoring:

/health route — real dependency checks (database, Redis, heap), returns HTTP 503 when degraded so Vigilmon fires an alert automatically.
Webhook endpoint — receives Vigilmon DOWN/UP events and routes them to Slack, PagerDuty, or any internal system.
PM2 heartbeat worker — pings Vigilmon on each successful processing cycle; silence means the worker is dead or stuck.

This is the minimum viable monitoring setup for an Express.js application in production. Vigilmon handles the check scheduling, alert routing, and uptime history — you write the health logic that fits your app.

Monitor your Express app free at vigilmon.online
