AWS Lambda is one of the most widely adopted serverless platforms in the world. The operational model is compelling: no servers to provision, automatic scaling, and pay-per-invocation pricing. But "serverless" doesn't mean "worry-free." Lambda functions can fail silently, cold starts inflate response times, and scheduled jobs can stop running without any visible error. Vigilmon gives you the external visibility layer that Lambda's built-in metrics can't provide: an independent check, made from outside AWS, that confirms your function actually responds correctly.
This guide walks through adding production-grade uptime monitoring to your Lambda functions.
What You'll Build
- A health endpoint Lambda function that checks real dependencies
- A Vigilmon HTTP monitor targeting your function via API Gateway or Lambda URL
- A heartbeat monitor for scheduled (EventBridge) Lambda jobs
- Alert channels via Slack and webhook
Prerequisites
- An AWS account with Lambda and API Gateway (or Lambda Function URLs) configured
- Node.js or Python Lambda functions (examples cover both)
- A free account at vigilmon.online
Step 1: Add a Health Endpoint to Your Lambda Function
The simplest and most reliable monitoring pattern for Lambda is a dedicated /health route that probes your real dependencies — database, downstream APIs, cache — and returns a structured status. This goes beyond a static 200 and tells you whether the function is actually usable.
Node.js (direct Lambda handler behind API Gateway)
// health.js
// Assumes the Node.js 18+ Lambda runtime, where fetch and AbortSignal.timeout are built in.
const { DynamoDBClient, DescribeTableCommand } = require("@aws-sdk/client-dynamodb");
const dynamo = new DynamoDBClient({ region: process.env.AWS_REGION });
exports.handler = async (event) => {
  const checks = {};
  let ok = true;
  // DynamoDB probe: describe the table to verify IAM permissions and connectivity
  try {
    await dynamo.send(new DescribeTableCommand({ TableName: process.env.TABLE_NAME }));
    checks.dynamodb = "ok";
  } catch (err) {
    checks.dynamodb = `error: ${err.message}`;
    ok = false;
  }
  // Downstream API probe, abandoned after 3 seconds
  try {
    const resp = await fetch("https://api.example.com/ping", {
      signal: AbortSignal.timeout(3000),
    });
    checks.upstream = resp.ok ? "ok" : `http_${resp.status}`;
    if (!resp.ok) ok = false;
  } catch {
    checks.upstream = "timeout_or_unreachable";
    ok = false;
  }
  // 200 when every probe passed, 503 when any probe failed
  return {
    statusCode: ok ? 200 : 503,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ status: ok ? "ok" : "degraded", checks }),
  };
};
Python (direct Lambda handler)
import json
import os
import boto3
def health_handler(event, context):
    checks = {}
    ok = True
    # S3 probe: check bucket accessibility
    try:
        s3 = boto3.client("s3")
        s3.head_bucket(Bucket=os.environ["BUCKET_NAME"])
        checks["s3"] = "ok"
    except Exception as e:
        checks["s3"] = f"error: {e}"
        ok = False
    # RDS connectivity probe: open and close a connection
    try:
        # Imported here so a missing driver is reported as a failed check
        import psycopg2
        conn = psycopg2.connect(
            host=os.environ["DB_HOST"],
            dbname=os.environ["DB_NAME"],
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASS"],
            connect_timeout=3,
        )
        conn.close()
        checks["rds"] = "ok"
    except Exception as e:
        checks["rds"] = f"error: {e}"
        ok = False
    # 200 when every probe passed, 503 when any probe failed
    return {
        "statusCode": 200 if ok else 503,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"status": "ok" if ok else "degraded", "checks": checks}),
    }
Deploy this as a Lambda function and expose it through API Gateway or a Lambda Function URL. Your health endpoint URL will look like:
https://<api-id>.execute-api.<region>.amazonaws.com/prod/health
# or with Lambda Function URL:
https://<url-id>.lambda-url.<region>.on.aws/health
Step 2: Understanding Cold Start Impact on Monitoring
Lambda functions that haven't been invoked recently go "cold": Lambda reclaims the idle execution environment and must initialize a new one on the next request. Cold starts typically add roughly 200 ms to 2 s of latency, depending on runtime, memory, and package size.
This matters for monitoring because:
- A cold-start response can exceed a tight monitor timeout, so the first probe after an idle period may fail even though the function is healthy.
- Cold start latency looks like a performance incident in response time graphs.
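One way to tell a cold start apart from a real slowdown is to have the function report it. The sketch below relies on the fact that module-level code runs once per execution environment, so a module-level flag is true only on the first invocation after a cold start. The response fields `cold_start` and `environment_age_s` are hypothetical names chosen for this example, not something Vigilmon requires.

```python
import json
import time

# Module-level code runs once per execution environment, i.e. once per cold start.
_COLD_START = True
_INIT_TIME = time.time()


def handler(event, context):
    """Health handler that reports whether this invocation paid for a cold start."""
    global _COLD_START
    was_cold = _COLD_START
    _COLD_START = False  # later invocations in this environment are warm
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "status": "ok",
            "cold_start": was_cold,
            "environment_age_s": round(time.time() - _INIT_TIME, 1),
        }),
    }
```

With this in place, a slow probe whose body carries `"cold_start": true` can be read as initialization cost, while a slow probe on a warm environment points at the function or its dependencies.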
Vigilmon settings for cold-start-aware monitoring
When creating your monitor in Vigilmon:
- Set Timeout to at least 10 seconds for Lambda endpoints — this gives room for a cold start without false alarms.
- Set Check interval to 60 seconds. Frequent checks also keep the probed function warm, which reduces cold starts for production traffic when the health route lives in the same function as your production routes.
- Enable Multi-region consensus — Vigilmon probes from multiple regions simultaneously. A cold start in one region won't fire an alert unless other regions also see a failure.
This last point is critical. A cold start affects only the request that lands on a freshly initialized execution environment, so one slow or timed-out probe from eu-west-1 doesn't mean your US traffic is affected. Multi-region consensus keeps an isolated cold start from waking your on-call.
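The idea behind consensus can be sketched in a few lines. This is an illustration of the general technique, not Vigilmon's actual algorithm: the function name `should_alert` and the `min_failing` threshold are assumptions made for the example.

```python
from typing import Dict


def should_alert(results: Dict[str, bool], min_failing: int = 2) -> bool:
    """Return True when at least `min_failing` regions report a failed probe.

    `results` maps a probe region name to True (probe passed) or False (probe failed).
    """
    failing = [region for region, passed in results.items() if not passed]
    return len(failing) >= min_failing


if __name__ == "__main__":
    # One region hit a cold start and timed out: no alert.
    print(should_alert({"eu-west-1": False, "us-east-1": True, "ap-southeast-1": True}))
    # Two regions failed in the same round: alert.
    print(should_alert({"eu-west-1": False, "us-east-1": False, "ap-southeast-1": True}))
```

A higher threshold trades detection speed for fewer false alarms; a threshold of 1 is equivalent to having no consensus at all.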
Step 3: Configure Vigilmon HTTP Monitor
- Log in to Vigilmon and click Add Monitor → HTTP.
- Set URL to your Lambda health endpoint URL.
- Set Check interval to 60 seconds.
- Set Timeout to 10 seconds.
- Set Expected status code to 200.
- Under Advanced → JSON body assertion, add:
  - Path: status
  - Expected value: ok
- Save and verify the first check appears green.
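To see what the JSON body assertion adds on top of the status code check, here is a rough sketch of the comparison it performs. The dot-separated path syntax and the function name `json_assertion_passes` are assumptions for this example; Vigilmon's own path syntax may differ.

```python
import json
from typing import Any


def json_assertion_passes(body: str, path: str, expected: Any) -> bool:
    """Check that the value at a dot-separated `path` in a JSON body equals `expected`."""
    try:
        node = json.loads(body)
    except json.JSONDecodeError:
        return False  # a non-JSON body (for example an HTML error page) fails the check
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node == expected


if __name__ == "__main__":
    healthy = '{"status": "ok", "checks": {"dynamodb": "ok"}}'
    print(json_assertion_passes(healthy, "status", "ok"))
```

The assertion catches cases a status code alone would miss, such as a proxy or misrouted path that answers 200 with an unrelated body.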
Adding alert channels
Navigate to Alerts → Channels and set up:
- Slack: paste your Slack incoming webhook URL — Vigilmon will post a message to your chosen channel when your Lambda goes down.
- Webhook: configure a webhook URL for PagerDuty, Opsgenie, or any HTTP endpoint to receive structured JSON incident payloads.
- Email: add your on-call email for direct paging.
Step 4: Heartbeat Monitor for Scheduled Lambda Functions
If your Lambda runs on an EventBridge schedule (cron or rate expression), a standard HTTP monitor can't tell you whether it ran: the function has no public endpoint to probe between runs. Instead, configure the function to send a heartbeat ping to Vigilmon at the end of each successful run.
EventBridge-triggered Node.js function
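// scheduled-job.js
// runYourScheduledWork() is a placeholder for your own job logic.
exports.handler = async (event) => {
  try {
    await runYourScheduledWork();
  } catch (err) {
    // Do NOT ping the heartbeat on failure: Vigilmon will fire an alert after the grace period
    console.error("Scheduled job failed:", err);
    throw err;
  }
  // Notify Vigilmon the job completed successfully. A failed ping is logged, not thrown,
  // so a network blip does not make Lambda retry work that already succeeded.
  try {
    const resp = await fetch(`https://vigilmon.online/api/heartbeat/${process.env.VIGILMON_HEARTBEAT_ID}`, {
      method: "POST",
      signal: AbortSignal.timeout(5000),
    });
    if (!resp.ok) console.warn(`Heartbeat ping returned HTTP ${resp.status}`);
  } catch (err) {
    console.warn("Heartbeat ping failed:", err);
  }
  return { status: "ok" };
};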
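A Python version of the same pattern might look like the sketch below. It uses only the standard library and reuses the heartbeat URL shape from the Node.js example. The names `run_job`, `send_heartbeat`, and `run_your_scheduled_work` are hypothetical, and the 5-second ping timeout is an arbitrary choice.

```python
import os
import urllib.request
from typing import Callable

# URL shape taken from the Node.js example above.
HEARTBEAT_BASE = "https://vigilmon.online/api/heartbeat/"


def send_heartbeat() -> None:
    """POST an empty body to the heartbeat URL; log and continue if the ping fails."""
    url = HEARTBEAT_BASE + os.environ["VIGILMON_HEARTBEAT_ID"]
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=5):
            pass
    except OSError as exc:  # URLError, HTTPError, and socket timeouts are OSError subclasses
        print(f"Heartbeat ping failed: {exc}")


def run_job(work: Callable[[], None], ping: Callable[[], None] = send_heartbeat) -> dict:
    """Run the job, then ping. If `work` raises, the ping is skipped and the error propagates."""
    work()
    ping()
    return {"status": "ok"}


def run_your_scheduled_work() -> None:
    """Hypothetical placeholder for the real job logic."""


def handler(event, context):
    """Lambda entry point for the EventBridge schedule."""
    return run_job(run_your_scheduled_work)
```

Passing the work and the ping in as callables keeps the ordering rule (ping only after success) in one place and lets you verify it locally without network access.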
Set up the heartbeat in Vigilmon
- In Vigilmon, click Add Monitor → Heartbeat.
- Name it (e.g., nightly-report-job).
- Set Grace period to a few minutes longer than your schedule interval. For example, for a 15-minute EventBridge rate, set grace to 20 minutes.
- Copy the ID from the end of the heartbeat URL and store it as a Lambda environment variable (VIGILMON_HEARTBEAT_ID).
If Vigilmon doesn't receive a ping within the grace period, it fires an alert. This catches:
- Function crashes or unhandled exceptions
- EventBridge rule disabled or misconfigured
- IAM permission revocations that prevent execution
- Deployment failures that break the handler
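The grace period rule reduces to a single comparison. The sketch below is a simplified model of a heartbeat monitor, not Vigilmon's implementation; `heartbeat_overdue` is a hypothetical name, and the times mirror the 15-minute schedule with a 20-minute grace period from the setup steps.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_overdue(last_ping: datetime, grace: timedelta, now: datetime) -> bool:
    """Return True when no ping has arrived within the grace period."""
    return now - last_ping > grace


if __name__ == "__main__":
    last_ping = datetime(2026, 6, 30, 12, 0, tzinfo=timezone.utc)
    grace = timedelta(minutes=20)
    # 16 minutes after the last ping: the next run is slightly late but within grace.
    print(heartbeat_overdue(last_ping, grace, last_ping + timedelta(minutes=16)))
    # 21 minutes after the last ping: a run was missed, so the monitor alerts.
    print(heartbeat_overdue(last_ping, grace, last_ping + timedelta(minutes=21)))
```

The gap between the schedule interval and the grace period is the slack you allow for run duration and scheduling jitter; set it too small and a slow run looks like a missed one.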
Step 5: Webhook Integration for Incident Automation
Vigilmon can POST structured JSON to any webhook endpoint when a monitor changes state (up → down or down → up). This enables automation beyond simple alerting.
Example payload Vigilmon sends:
{
  "monitor": "lambda-health",
  "status": "down",
  "url": "https://<your-lambda>.lambda-url.eu-west-1.on.aws/health",
  "region": "eu-west-1",
  "timestamp": "2026-06-30T14:22:00Z",
  "response_time_ms": null,
  "error": "Connection timed out"
}
You can wire this to a Lambda function that:
- Opens a GitHub issue or Jira ticket automatically
- Triggers an AWS Systems Manager Automation runbook
- Posts a structured incident update to a Slack channel
Configure the webhook URL in Alerts → Channels → Add Webhook.
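A receiving function could start from something like the sketch below. It assumes the webhook arrives through a Lambda Function URL or an API Gateway proxy integration, so the JSON payload is a string in `event["body"]`, and it ignores base64-encoded bodies for brevity. The field names come from the example payload above; `summarize_incident` is a hypothetical helper, and the call to your ticketing or chat system is deliberately left out.

```python
import json


def summarize_incident(payload: dict) -> dict:
    """Turn a webhook payload into a ticket title and body."""
    monitor = payload.get("monitor", "unknown-monitor")
    status = str(payload.get("status", "unknown")).upper()
    lines = [
        f"URL: {payload.get('url', 'n/a')}",
        f"Region: {payload.get('region', 'n/a')}",
        f"Time: {payload.get('timestamp', 'n/a')}",
        f"Error: {payload.get('error') or 'none reported'}",
    ]
    return {"title": f"[{status}] {monitor}", "body": "\n".join(lines)}


def handler(event, context):
    """Entry point for a Lambda Function URL or API Gateway proxy integration."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    if not isinstance(payload, dict):
        return {"statusCode": 400, "body": json.dumps({"error": "expected a JSON object"})}
    ticket = summarize_incident(payload)
    # Hand `ticket` to your issue tracker or chat client here; that call is not part of this sketch.
    return {"statusCode": 200, "body": json.dumps(ticket)}
```

Returning a 400 for malformed input keeps bad requests visible in the function's logs instead of failing with an unhandled exception.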
Monitoring Coverage Summary
| Failure scenario | Detection method |
|---|---|
| Lambda function throws unhandled exception | HTTP monitor sees 5xx response |
| Downstream database is unreachable | Health check returns degraded (503) |
| Cold start causes timeout | HTTP monitor with 10s timeout catches runaway starts |
| Scheduled job silently stops running | Heartbeat monitor fires after grace period |
| Lambda function URL or API Gateway misconfigured | HTTP monitor returns connection error |
| SSL certificate on custom domain expires | Vigilmon certificate monitor alerts 14 days before expiry |
Lambda's abstraction layer makes deployments faster and operations simpler, but it doesn't eliminate the need for external monitoring. Vigilmon gives you the independent vantage point that Amazon CloudWatch metrics can't: a probe from outside your AWS account that confirms the function responds correctly to real requests, from multiple geographies, with multi-region consensus filtering out single-region noise.
Set up monitoring for your Lambda functions in under 5 minutes — start free at vigilmon.online.