You shipped your Express.js API. It's running behind PM2 on a VPS. But how quickly do you find out when a process crash takes it down at 2am? Is your /api/users route still responding, or did a memory leak silently kill the worker four hours ago?
This tutorial covers production-grade uptime monitoring for Express.js applications using Vigilmon. We will walk through:
- A /health route with dependency checks
- Vigilmon HTTP monitoring and webhook alerts
- A PM2-aware heartbeat pattern so worker restarts and silent crashes trigger alerts
Prerequisites
- Node.js 18+
- An existing Express.js application
- PM2 installed globally (optional but recommended for production)
- A free account at vigilmon.online
Part 1: Add a health route
A health check endpoint gives your monitoring service something meaningful to ping. A bare 200 OK from your root route shows only that the process is accepting requests; a proper health route also checks your real dependencies.
Basic health route
// routes/health.js
const express = require('express');
const router = express.Router();

router.get('/', async (req, res) => {
  const checks = {};
  let status = 'ok';

  // Example: check database connectivity
  try {
    await req.app.locals.db.query('SELECT 1');
    checks.database = 'ok';
  } catch (err) {
    checks.database = `error: ${err.message}`;
    status = 'degraded';
  }

  // Example: check Redis
  if (req.app.locals.redis) {
    try {
      await req.app.locals.redis.ping();
      checks.redis = 'ok';
    } catch (err) {
      checks.redis = `error: ${err.message}`;
      status = 'degraded';
    }
  }

  const httpStatus = status === 'ok' ? 200 : 503;
  return res.status(httpStatus).json({
    status,
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks,
  });
});

module.exports = router;
Mount it in your main app file:
// app.js
const express = require('express');
const healthRouter = require('./routes/health');

const app = express();
app.use(express.json());

// Mount health check before auth middleware so Vigilmon can reach it
app.use('/health', healthRouter);

// ... rest of your routes and middleware

module.exports = app;
Test it:
curl http://localhost:3000/health
{
  "status": "ok",
  "timestamp": "2026-06-29T07:00:00.000Z",
  "uptime": 3612.4,
  "checks": {
    "database": "ok",
    "redis": "ok"
  }
}
When the database is unreachable, the route responds with HTTP 503, which Vigilmon treats as a failure and reports through your alert channel.
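One gap worth closing: if a dependency accepts the connection but never answers, the awaited check can hang, and the monitor sees a timeout instead of a clear 503 with details. A minimal sketch of a timeout wrapper follows. `withTimeout` and the 2-second budget in the usage comment are illustrative choices, not part of Express or Vigilmon.

```javascript
// Hypothetical helper: reject if a dependency check takes longer than `ms`.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} check timed out after ${ms}ms`)),
      ms
    );
  });
  // Whichever promise settles first wins. The timer is always cleared so it
  // cannot fire later or keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage inside the health route (assumes the same app.locals.db as above):
//   await withTimeout(req.app.locals.db.query('SELECT 1'), 2000, 'database');
```

A rejected timeout lands in the existing `catch` block, so the route still reports `degraded` with a readable error message.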
TypeScript version
// routes/health.ts
import { Router, Request, Response } from 'express';
// Assumption: the pool is a node-postgres (pg) Pool; swap in your own client type.
import { Pool } from 'pg';

const router = Router();

router.get('/', async (req: Request, res: Response): Promise<void> => {
  const checks: Record<string, string> = {};
  let status = 'ok';

  try {
    await (req.app.locals.pool as Pool).query('SELECT 1');
    checks.database = 'ok';
  } catch (err: unknown) {
    checks.database = `error: ${(err as Error).message}`;
    status = 'degraded';
  }

  res.status(status === 'ok' ? 200 : 503).json({
    status,
    timestamp: new Date().toISOString(),
    checks,
  });
});

export default router;
Part 2: Set up HTTP monitoring in Vigilmon
- Log in to vigilmon.online and click Add Monitor.
- Choose HTTP(S) monitor.
- Enter the URL: https://yourapi.example.com/health
- Set interval to 1 minute.
- Add your alert channel (email, Slack webhook, or webhook URL).
- Click Save.
Vigilmon will now ping your health route every 60 seconds. The first non-2xx response or timeout triggers an immediate alert to your configured channel.
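Before relying on the hosted monitor, it can help to reproduce the same pass/fail rule locally. The sketch below assumes the rule described above (any non-2xx status or a timeout counts as down). `probe` and its 10-second default are illustrative, and Vigilmon's real timeout may differ.

```javascript
// Hypothetical local probe mirroring the rule "non-2xx or timeout = down".
// Uses the global fetch and AbortSignal.timeout available in Node.js 18+.
async function probe(url, timeoutMs = 10_000) {
  const started = Date.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    return {
      up: res.ok, // res.ok is true only for 200-299 responses
      code: res.status,
      ms: Date.now() - started,
    };
  } catch (err) {
    // Network errors and timeouts both land here.
    return { up: false, code: null, ms: Date.now() - started, error: err.name };
  }
}

// Example:
//   probe('https://yourapi.example.com/health').then(console.log);
```

Running this against a deliberately broken dependency is a quick way to confirm that the health route really returns 503 before you wait for a real incident.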
Skipping auth for the health route
If you use JWT middleware globally, exclude /health:
// app.js
const { authenticate } = require('./middleware/auth');
app.use('/health', healthRouter); // No auth — before the global authenticate middleware
app.use(authenticate); // Auth applied to everything after
app.use('/api', apiRouter);
Or use a more explicit exclusion pattern:
app.use((req, res, next) => {
  if (req.path === '/health') return next();
  return authenticate(req, res, next);
});
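If more public paths accumulate over time, the exclusion can be generalised. This is only a sketch: `unless` is a hypothetical helper name, and `authenticate` is assumed to be a standard Express middleware with the usual `(req, res, next)` signature.

```javascript
// Hypothetical helper: wrap a middleware so it is skipped for the listed paths.
function unless(paths, middleware) {
  const open = new Set(paths);
  return (req, res, next) => {
    // req.path excludes the query string, so /health?verbose=1 still matches.
    if (open.has(req.path)) return next();
    return middleware(req, res, next);
  };
}

// Usage (authenticate comes from ./middleware/auth as above):
//   app.use(unless(['/health'], authenticate));
```

The match is exact, so `/health/` or `/healthz` would still go through authentication unless you list them explicitly.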
Part 3: Receive Vigilmon webhooks
Add a webhook alert endpoint to fan out Vigilmon DOWN/UP events to Slack, PagerDuty, or any internal system:
// routes/webhooks.js
const express = require('express');
const router = express.Router();

router.post('/vigilmon', express.json(), (req, res) => {
  const { monitor_name, status, url, response_code, checked_at } = req.body;

  if (status === 'down') {
    console.error('[VIGILMON] Monitor DOWN', {
      monitor: monitor_name,
      url,
      code: response_code,
      at: checked_at,
    });
    // Forward to Slack, PagerDuty, internal systems, etc.
    notifyOnCall({ monitor: monitor_name, url, code: response_code });
  } else if (status === 'up') {
    console.info('[VIGILMON] Monitor recovered', { monitor: monitor_name });
  }

  res.sendStatus(204);
});

function notifyOnCall({ monitor, url, code }) {
  // Example: Slack incoming webhook
  const slackUrl = process.env.SLACK_WEBHOOK_URL;
  if (!slackUrl) return;

  fetch(slackUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `*ALERT*: ${monitor} is DOWN\nURL: ${url}\nHTTP: ${code}`,
    }),
  }).catch(err => console.error('Slack notify failed:', err));
}

module.exports = router;
Mount it:
app.use('/webhook', require('./routes/webhooks'));
Vigilmon sends a POST with this payload on every DOWN and UP transition:
{
  "monitor_id": "mon_abc123",
  "monitor_name": "Production API /health",
  "status": "down",
  "url": "https://yourapi.example.com/health",
  "checked_at": "2026-06-29T07:01:00Z",
  "response_code": 503,
  "response_time_ms": 1204
}
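Because the receiver is reachable from the public internet, it is worth validating the body before acting on it. The sketch below checks only the fields shown in the payload above. `parseVigilmonEvent` is a hypothetical helper, and no fields or status values beyond those in the example are assumed.

```javascript
// Hypothetical validator for the webhook payload shown above.
// Returns a normalised event, or null when the body does not look right.
function parseVigilmonEvent(body) {
  if (body === null || typeof body !== 'object') return null;

  const { monitor_id, monitor_name, status, url, checked_at, response_code } = body;
  if (typeof monitor_id !== 'string' || typeof monitor_name !== 'string') return null;
  if (status !== 'down' && status !== 'up') return null;
  if (typeof url !== 'string' || typeof checked_at !== 'string') return null;

  const checkedAt = new Date(checked_at);
  if (Number.isNaN(checkedAt.getTime())) return null;

  return {
    monitorId: monitor_id,
    monitorName: monitor_name,
    status,
    url,
    checkedAt,
    // Assumption: response_code may be missing (for example on a timeout).
    responseCode: Number.isInteger(response_code) ? response_code : null,
  };
}
```

In the route handler, a `null` result could be answered with `res.sendStatus(400)` so malformed or spoofed requests never reach `notifyOnCall`.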
Part 4: Heartbeat monitoring with PM2
HTTP monitoring confirms that the web tier responds. It will not catch a background worker that is still running but has stopped processing jobs, nor a PM2 cluster stuck in a crash loop whose processes happen to be up each time Vigilmon's 1-minute poll arrives.
The heartbeat pattern closes this gap: your worker pings a Vigilmon heartbeat URL on each successful processing cycle. If the pings stop, Vigilmon alerts you.
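Conceptually, a heartbeat monitor is a dead man's switch: it records the time of the last ping and raises an alert once that timestamp is older than the expected interval plus some tolerance. The sketch below illustrates the idea only. The function name and the 30-second grace period are assumptions, not Vigilmon's actual implementation.

```javascript
// Illustrative only: how a heartbeat monitor might decide that pings have stopped.
function isHeartbeatStale(lastPingAt, expectedIntervalMs, graceMs, now = Date.now()) {
  return now - lastPingAt > expectedIntervalMs + graceMs;
}

const FIVE_MIN = 5 * 60 * 1000;
const lastPing = Date.parse('2026-06-29T07:00:00Z');

// 4 minutes after the last ping: still inside the window.
console.log(isHeartbeatStale(lastPing, FIVE_MIN, 30_000, lastPing + 4 * 60 * 1000)); // false
// 6 minutes after the last ping: past the interval plus the 30s grace.
console.log(isHeartbeatStale(lastPing, FIVE_MIN, 30_000, lastPing + 6 * 60 * 1000)); // true
```

One practical consequence: a worker that sleeps for five minutes between cycles pings every five minutes plus its processing time, so an expected interval set somewhat above the sleep duration helps avoid false alarms.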
Create the heartbeat monitor in Vigilmon
- In Vigilmon, click Add Monitor.
- Choose Heartbeat monitor.
- Set expected interval to 5 minutes (match your worker's cycle time).
- Copy the unique URL: https://vigilmon.online/heartbeat/abc123xyz
Worker heartbeat implementation
// workers/jobProcessor.js
// Runs as a standalone process under PM2 and relies on the global fetch in Node.js 18+.
const HEARTBEAT_URL = process.env.VIGILMON_HEARTBEAT_URL;
const POLL_INTERVAL_MS = 5 * 60 * 1000; // 5 minutes

// Placeholder: replace with your real queue logic. Returns the number of jobs handled.
async function fetchAndProcessPendingJobs() {
  return 0;
}

async function processJobs() {
  // ... your actual job processing logic
  const processed = await fetchAndProcessPendingJobs();
  console.log(`Processed ${processed} jobs`);
  return processed;
}

async function pingHeartbeat() {
  if (!HEARTBEAT_URL) return;
  try {
    const res = await fetch(HEARTBEAT_URL, { signal: AbortSignal.timeout(10_000) });
    if (!res.ok) {
      console.warn(`Heartbeat ping returned ${res.status}`);
    }
  } catch (err) {
    // A failed ping must never crash the worker; log it and carry on.
    console.warn('Vigilmon heartbeat ping failed:', err.message);
  }
}

async function run() {
  while (true) {
    try {
      await processJobs();
      // Only ping on success: a failed cycle means no ping, and no ping means a Vigilmon alert.
      await pingHeartbeat();
    } catch (err) {
      console.error('Job processing cycle error:', err);
      // Don't ping here; let Vigilmon detect the stalled heartbeat.
    }
    await new Promise(resolve => setTimeout(resolve, POLL_INTERVAL_MS));
  }
}

run().catch(err => {
  console.error('Worker fatal error:', err);
  process.exit(1);
});
Start it under PM2:
pm2 start workers/jobProcessor.js --name job-processor
pm2 save
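PM2 stops a process by sending SIGINT and follows up with SIGKILL once its kill timeout expires, so a worker that is mid-cycle can be cut off. One possible refinement, sketched below, is a loop that finishes the current cycle and then exits instead of sleeping. `sleep`, `runLoop`, and `runCycle` are hypothetical names, with `runCycle` standing in for "process jobs, then ping the heartbeat".

```javascript
// Sketch: a sleep that can be cut short so the loop exits promptly on shutdown.
function sleep(ms, signal) {
  return new Promise(resolve => {
    if (signal.aborted) return resolve();
    const timer = setTimeout(done, ms);
    function done() {
      clearTimeout(timer);
      signal.removeEventListener('abort', done);
      resolve();
    }
    signal.addEventListener('abort', done, { once: true });
  });
}

const shutdown = new AbortController();
// PM2 sends SIGINT on stop/restart; SIGTERM covers other process managers.
for (const sig of ['SIGINT', 'SIGTERM']) {
  process.once(sig, () => shutdown.abort());
}

// runCycle is a placeholder for one unit of work plus the heartbeat ping.
async function runLoop(runCycle, intervalMs) {
  while (!shutdown.signal.aborted) {
    try {
      await runCycle();
    } catch (err) {
      console.error('Job processing cycle error:', err);
    }
    await sleep(intervalMs, shutdown.signal);
  }
}
```

If a single cycle can outlast PM2's kill timeout, the process is still killed mid-job, so this only helps when cycles are short relative to that timeout.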
PM2 ecosystem file
For a production setup, define everything in ecosystem.config.js:
module.exports = {
  apps: [
    {
      name: 'api',
      script: 'server.js',
      instances: 'max',
      exec_mode: 'cluster',
      env: {
        NODE_ENV: 'production',
        PORT: 3000,
      },
    },
    {
      name: 'job-processor',
      script: 'workers/jobProcessor.js',
      instances: 1,
      exec_mode: 'fork',
      env: {
        NODE_ENV: 'production',
        VIGILMON_HEARTBEAT_URL: 'https://vigilmon.online/heartbeat/abc123xyz',
      },
    },
  ],
};
pm2 start ecosystem.config.js
pm2 save
PM2 monitoring via startup heartbeat
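PM2's programmatic API exposes an event bus that reports process events. A small script can subscribe to it and notify Vigilmon or your own webhook when PM2 restarts your app unexpectedly, which gives you a separate "restart storm" signal distinct from the job heartbeat:
// pm2-monitor.js: run with `pm2 start pm2-monitor.js`
const pm2 = require('pm2');

pm2.connect(err => {
  if (err) {
    console.error('Could not connect to PM2:', err);
    process.exit(2);
  }
  pm2.launchBus((busErr, bus) => {
    if (busErr) {
      console.error('Could not open the PM2 event bus:', busErr);
      process.exit(2);
    }
    bus.on('process:event', ({ event, process: proc }) => {
      if (event === 'restart' && proc.name === 'job-processor') {
        // The restart counter's location may vary, so check both shapes defensively.
        const restarts = proc.restart_time ?? proc.pm2_env?.restart_time;
        console.log(`PM2: job-processor restarted (count: ${restarts})`);
        // Optionally POST to your own webhook or a Vigilmon webhook
      }
    });
  });
});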
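A single restart is usually not worth a page, but several in quick succession are. One way to turn restart events into a "restart storm" signal is a sliding-window counter, sketched below. `createRestartTracker`, the 60-second window, and the threshold of 3 are all hypothetical choices.

```javascript
// Hypothetical sliding-window counter for detecting a restart storm.
function createRestartTracker({ windowMs, threshold }) {
  const timestamps = [];
  return function recordRestart(now = Date.now()) {
    timestamps.push(now);
    // Drop restarts that have fallen out of the window.
    while (timestamps.length && now - timestamps[0] > windowMs) {
      timestamps.shift();
    }
    return timestamps.length >= threshold;
  };
}

// Example: flag 3 or more restarts within 60 seconds.
const isStorm = createRestartTracker({ windowMs: 60_000, threshold: 3 });
console.log(isStorm(0));       // false
console.log(isStorm(10_000));  // false
console.log(isStorm(20_000));  // true
console.log(isStorm(100_000)); // false (earlier restarts have aged out)
```

Inside the `restart` handler you would call the tracker with no arguments and send your notification only when it returns `true`.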
Part 5: Memory and event loop health checks
For Node.js specifically, two failure modes are hard to see from a plain HTTP check: memory leaks and event loop lag. The extended health route below adds a heap usage check:
// routes/health.js (extended)
const v8 = require('v8');

router.get('/', async (req, res) => {
  const heapStats = v8.getHeapStatistics();
  const heapUsedMb = Math.round(heapStats.used_heap_size / 1024 / 1024);
  const heapTotalMb = Math.round(heapStats.heap_size_limit / 1024 / 1024);
  const heapPercent = Math.round((heapUsedMb / heapTotalMb) * 100);

  const checks = {
    heap_used_mb: heapUsedMb,
    heap_total_mb: heapTotalMb,
    heap_percent: heapPercent,
  };
  let status = 'ok';

  // Warn if heap is over 85%
  if (heapPercent > 85) {
    status = 'degraded';
    checks.heap_warning = 'Heap usage above 85%';
  }

  // ... existing database/redis checks

  return res.status(status === 'ok' ? 200 : 503).json({
    status,
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks,
  });
});
With this in place, the health route returns HTTP 503 when heap usage is dangerously high, so Vigilmon can alert you before the Node.js process runs out of memory and crashes.
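Event loop lag can be measured in a similar way with `monitorEventLoopDelay` from Node's built-in `perf_hooks` module. The sketch below is one possible approach: `eventLoopCheck`, the 20 ms resolution, and the 200 ms threshold are assumptions to tune, not established defaults.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Sample event loop delay continuously; resolution is the sampling interval in ms.
const loopDelay = monitorEventLoopDelay({ resolution: 20 });
loopDelay.enable();

// Hypothetical helper: summarise lag since the last call, then reset the histogram.
function eventLoopCheck(thresholdMs = 200) {
  // Histogram values are reported in nanoseconds.
  const p99Ms = loopDelay.percentile(99) / 1e6;
  const meanMs = loopDelay.mean / 1e6;
  loopDelay.reset();
  return {
    event_loop_p99_ms: Math.round(p99Ms),
    // mean is NaN when no samples have been recorded yet.
    event_loop_mean_ms: Number.isFinite(meanMs) ? Math.round(meanMs) : 0,
    lagging: p99Ms > thresholdMs,
  };
}

// Usage inside the health route:
//   const loop = eventLoopCheck();
//   Object.assign(checks, loop);
//   if (loop.lagging) status = 'degraded';
```

Because the histogram is reset on every call, each health check reflects only the period since the previous check. An idle loop typically reports values close to the sampling resolution rather than zero, so thresholds should sit well above it.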
Summary
Your Express.js application now has three layers of production monitoring:
- /health route: real dependency checks (database, Redis, heap), returns HTTP 503 when degraded so Vigilmon fires an alert automatically.
- Webhook endpoint: receives Vigilmon DOWN/UP events and routes them to Slack, PagerDuty, or any internal system.
- PM2 heartbeat worker: pings Vigilmon on each successful processing cycle; silence means the worker is dead or stuck.
This is the minimum viable monitoring setup for an Express.js application in production. Vigilmon handles the check scheduling, alert routing, and uptime history — you write the health logic that fits your app.
Monitor your Express app free at vigilmon.online