AWS Lambda functions are invisible by default. They run, they sleep, they run again — but unless you're actively watching CloudWatch metrics, you won't know that your /api/checkout Lambda started throwing 502s three hours ago. A CloudWatch alarm helps, but it's inside your AWS account, configured by the same team that broke things. Vigilmon gives you an external, independent health check that fires alerts even when your AWS account itself has a problem.
This tutorial shows you how to wire Lambda-based APIs into Vigilmon for uptime monitoring, scheduled-job heartbeats, and smart alert thresholds.
What You'll Build
- A Lambda health handler exposed through API Gateway
- A Vigilmon HTTP monitor with cold-start-aware timeout settings
- An EventBridge rule that drives heartbeat pings for scheduled Lambdas
- A multi-region monitoring strategy
Prerequisites
- AWS account with at least one Lambda function behind API Gateway (HTTP API or REST API)
- AWS CLI or SAM/CDK configured locally
- A free account at vigilmon.online
Step 1: Add a Health Handler to Your Lambda
Whether you use Node.js, Python, or Go, the pattern is the same: add a route for GET /health that checks your real dependencies.
Node.js (ESM)
// handlers/health.mjs
import { DynamoDBClient, DescribeTableCommand } from "@aws-sdk/client-dynamodb";

const dynamo = new DynamoDBClient({});

export async function handler(event) {
  // HTTP API (payload v2) puts the method under requestContext.http;
  // REST API events carry it as httpMethod instead.
  const method = event.requestContext?.http?.method ?? event.httpMethod;
  if (method !== "GET") {
    return { statusCode: 405, body: "Method Not Allowed" };
  }

  const checks = {};
  let ok = true;

  // DynamoDB connectivity probe
  try {
    await dynamo.send(new DescribeTableCommand({ TableName: process.env.TABLE_NAME }));
    checks.dynamodb = "ok";
  } catch (err) {
    checks.dynamodb = `error: ${err.message}`;
    ok = false;
  }

  // Downstream HTTP dependency (2-second budget so a slow dependency cannot stall the check)
  try {
    const resp = await fetch(process.env.DOWNSTREAM_URL + "/ping", {
      signal: AbortSignal.timeout(2000),
    });
    checks.downstream = resp.ok ? "ok" : `http_${resp.status}`;
    if (!resp.ok) ok = false;
  } catch (err) {
    checks.downstream = `error: ${err.message}`;
    ok = false;
  }

  return {
    statusCode: ok ? 200 : 503,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      status: ok ? "ok" : "degraded",
      region: process.env.AWS_REGION,
      checks,
    }),
  };
}
Python
# handlers/health.py
import json
import os

import boto3
from botocore.exceptions import BotoCoreError, ClientError

# Created once per execution environment and reused across warm invocations
dynamo = boto3.client("dynamodb")


def handler(event, context):
    checks = {}
    ok = True

    try:
        dynamo.describe_table(TableName=os.environ["TABLE_NAME"])
        checks["dynamodb"] = "ok"
    except ClientError as e:
        # The service answered with an error (missing table, access denied, ...)
        checks["dynamodb"] = f"error: {e.response['Error']['Code']}"
        ok = False
    except BotoCoreError as e:
        # The request never completed (endpoint unreachable, timeout, ...)
        checks["dynamodb"] = f"error: {type(e).__name__}"
        ok = False

    status = "ok" if ok else "degraded"
    return {
        "statusCode": 200 if ok else 503,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"status": status, "region": os.environ.get("AWS_REGION"), "checks": checks}),
    }
Wire this as a separate route in your API Gateway configuration:
# SAM template excerpt
HealthFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: handlers/health.handler
    Runtime: nodejs22.x
    Events:
      HealthApi:
        Type: HttpApi
        Properties:
          Path: /health
          Method: GET
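The Node.js handler in Step 1 also probes a downstream HTTP dependency, while the Python handler checks only DynamoDB. A minimal sketch of an equivalent probe using only the standard library might look like the following. The function name check_downstream and the /ping path on the dependency are assumptions carried over from the Node.js example, not part of any Vigilmon or AWS API.

```python
import urllib.error
import urllib.request


def check_downstream(base_url: str, timeout: float = 2.0) -> tuple[bool, str]:
    """Return (ok, detail) for a GET request to <base_url>/ping."""
    url = base_url.rstrip("/") + "/ping"
    try:
        # urlopen raises HTTPError for 4xx/5xx, so reaching the body means success
        with urllib.request.urlopen(url, timeout=timeout):
            return True, "ok"
    except urllib.error.HTTPError as err:
        # The dependency answered, but with an error status
        return False, f"http_{err.code}"
    except (OSError, ValueError) as err:
        # Connection refused, DNS failure, timeout, or a malformed URL
        return False, f"error: {err}"
```

Inside the Python handler you could call it as `dep_ok, checks["downstream"] = check_downstream(os.environ["DOWNSTREAM_URL"])` and set `ok = False` when `dep_ok` is false, mirroring the Node.js version.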
Step 2: Configure Vigilmon with Cold-Start-Aware Timeouts
Lambda cold starts can add 200–800 ms (sometimes more for large JVM runtimes). Set your Vigilmon HTTP monitor accordingly.
- Log in to Vigilmon → Add Monitor → HTTP.
- URL: https://<api-id>.execute-api.<region>.amazonaws.com/health (or your custom domain).
- Check interval: 60 seconds.
- Response timeout: Set to 10 seconds to give cold-start headroom without masking real hangs.
- Expected status: 200.
- JSON assertion: path status, expected value ok.
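Tip: At a 60-second check interval, Vigilmon's own requests usually keep one execution environment of the health function warm, so most checks skip the cold start. That warmth does not carry over to your other functions: if /health runs as its own Lambda (as in the SAM excerpt above), use Provisioned Concurrency for the user-facing functions where cold-start latency matters in production.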
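To make the interaction between the timeout, status, and JSON assertion settings concrete, here is a small sketch that applies the same three rules to one recorded response. It is an illustration of how the settings combine, not Vigilmon's actual implementation; evaluate_check and its parameters are hypothetical names.

```python
import json


def evaluate_check(status_code: int, body: str, elapsed_s: float,
                   timeout_s: float = 10.0) -> tuple[bool, str]:
    """Apply the Step 2 monitor rules to a single response."""
    # Rule 1: the response must arrive within the timeout (cold starts included)
    if elapsed_s > timeout_s:
        return False, f"timeout after {timeout_s:g}s"
    # Rule 2: the status code must be exactly 200
    if status_code != 200:
        return False, f"unexpected status {status_code}"
    # Rule 3: the JSON body must contain status == "ok"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False, "body is not valid JSON"
    if not isinstance(payload, dict) or payload.get("status") != "ok":
        return False, "JSON assertion failed: status != ok"
    return True, "healthy"
```

A cold start that answers in 0.9 seconds passes, while a hung request that takes 12 seconds fails on the timeout rule before the body is ever inspected.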
Step 3: EventBridge Heartbeat for Scheduled Lambdas
If you have a Lambda that runs on a schedule (data pipeline, cleanup job, report generator), a standard HTTP monitor won't tell you it stopped running. Use EventBridge to trigger the Lambda and have the Lambda itself ping Vigilmon on success.
EventBridge rule (CDK)
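import * as cdk from "aws-cdk-lib";
import * as events from "aws-cdk-lib/aws-events";
import * as targets from "aws-cdk-lib/aws-events-targets";

// pipelineFn is the Lambda function construct for the scheduled job, defined elsewhere in the stack
new events.Rule(this, "PipelineSchedule", {
  schedule: events.Schedule.rate(cdk.Duration.minutes(15)),
  targets: [new targets.LambdaFunction(pipelineFn)],
});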
Lambda handler with heartbeat ping
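// handlers/pipeline.mjs
// runPipeline() stands in for your own job logic.
export async function handler(event) {
  try {
    await runPipeline();
  } catch (err) {
    console.error("Pipeline failed, heartbeat withheld:", err);
    // Re-throw so Lambda records the invocation as failed (async invocations are then retried)
    throw err;
  }

  // Ping Vigilmon only after success. A failed ping is logged rather than re-thrown,
  // so a heartbeat problem does not cause the pipeline itself to be retried.
  try {
    const resp = await fetch(process.env.VIGILMON_HEARTBEAT_URL, {
      method: "POST",
      signal: AbortSignal.timeout(5000),
    });
    console.log(resp.ok ? "Pipeline complete, heartbeat sent" : `Heartbeat rejected: http_${resp.status}`);
  } catch (err) {
    console.error("Pipeline complete, but heartbeat ping failed:", err);
  }
}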
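For teams running the scheduled job in Python, a rough counterpart to the Node.js handler could look like this. It is a sketch under assumptions: run_pipeline is a placeholder for your job, send_heartbeat is a hypothetical helper, and the job and ping parameters exist only so the handler can be exercised without network access.

```python
import os
import urllib.request


def send_heartbeat(url: str, timeout: float = 5.0) -> int:
    """POST an empty body to the heartbeat URL and return the HTTP status."""
    req = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def run_pipeline() -> None:
    """Placeholder for the real scheduled job."""


def handler(event, context, job=run_pipeline, ping=send_heartbeat):
    # If the job raises, the exception propagates and no heartbeat is sent
    job()
    try:
        ping(os.environ["VIGILMON_HEARTBEAT_URL"])
    except Exception as err:
        # A heartbeat problem should not mark a successful run as failed
        print(f"heartbeat ping failed: {err}")
        return {"pipeline": "ok", "heartbeat": "failed"}
    return {"pipeline": "ok", "heartbeat": "sent"}
```

Keeping the ping outside the job's failure path is a design choice: a missed heartbeat still surfaces in Vigilmon once the grace period expires, but it does not trigger Lambda's retry of a pipeline run that already succeeded.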
In Vigilmon, create a Heartbeat monitor and set the grace period to 20 minutes (for a 15-minute schedule). Store the heartbeat URL in an SSM Parameter:
aws ssm put-parameter \
  --name "/myapp/prod/vigilmon-heartbeat-url" \
  --value "https://vigilmon.online/api/heartbeat/<your-id>" \
  --type SecureString
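Expose it to the handler as VIGILMON_HEARTBEAT_URL. Note that CloudFormation does not resolve SecureString parameters into Lambda environment variables, so with SAM or CDK either have the function read the parameter at runtime (SSM GetParameter with decryption enabled) or store the URL as a plain String parameter and reference that in your template.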
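One way to read the SecureString at runtime is sketched below with boto3's get_parameter call. The function name get_heartbeat_url and the module-level cache are assumptions for illustration; the parameter name is the one used in the put-parameter command above.

```python
_cached_url = None


def get_heartbeat_url(ssm_client=None,
                      name: str = "/myapp/prod/vigilmon-heartbeat-url") -> str:
    """Fetch and decrypt the heartbeat URL, caching it for warm invocations."""
    global _cached_url
    if _cached_url is None:
        if ssm_client is None:
            # Imported lazily so the function can be exercised with a stub client
            import boto3
            ssm_client = boto3.client("ssm")
        resp = ssm_client.get_parameter(Name=name, WithDecryption=True)
        _cached_url = resp["Parameter"]["Value"]
    return _cached_url
```

The function's execution role needs ssm:GetParameter on that parameter, plus kms:Decrypt if the parameter is encrypted with a customer-managed key.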
Step 4: Multi-Region Lambda Monitoring
If you deploy Lambda to multiple AWS regions for redundancy, you need a monitor per region.
- Deploy the same /health Lambda to each region.
- In Vigilmon, create one HTTP monitor per regional API Gateway URL:
  - https://<api-id>.execute-api.us-east-1.amazonaws.com/health
  - https://<api-id>.execute-api.eu-west-1.amazonaws.com/health
- Group them under a single Status Page — visitors see one aggregate status.
- Wire the regional monitors to the same alert channels so you get one Slack message with the failing region in the body.
The /health response already includes "region": process.env.AWS_REGION, so when Vigilmon logs the response body for a failing check you'll see which region returned degraded.
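As a sketch of the bookkeeping involved, the snippet below builds the per-region monitor URLs and picks the failing regions out of a set of /health response bodies. The helper names and the API IDs in the usage note are hypothetical; each regional deployment gets its own API ID, so the mapping has to be supplied by you.

```python
import json


def build_regional_urls(api_ids: dict[str, str], path: str = "/health") -> dict[str, str]:
    """Map region -> health URL using the default execute-api hostname pattern."""
    return {
        region: f"https://{api_id}.execute-api.{region}.amazonaws.com{path}"
        for region, api_id in api_ids.items()
    }


def degraded_regions(bodies: list[str]) -> list[str]:
    """Return the regions whose /health body reports a status other than ok."""
    regions = []
    for body in bodies:
        payload = json.loads(body)
        if payload.get("status") != "ok":
            # The handler includes its own region in every response
            regions.append(payload.get("region", "unknown"))
    return regions
```

For example, `build_regional_urls({"us-east-1": "abc123", "eu-west-1": "def456"})` yields one URL per region to paste into the corresponding Vigilmon monitor.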
Step 5: Alerting Strategy
| Alert type | Recommended channel |
|---|---|
| Primary on-call | Email or PagerDuty webhook |
| Team visibility | Slack webhook |
| Status page | Public Vigilmon status page embed |
Set alert escalation: notify immediately on the first failure, then re-notify after 5 minutes if the monitor is still down. A transient blip such as brief Lambda throttling then produces a single alert instead of a stream of repeats, while a sustained outage gets a follow-up within minutes.
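The effect of that schedule can be worked out with a few lines of arithmetic. The sketch below assumes the re-notification repeats every 5 minutes for as long as the monitor stays down; that repetition is an assumption for illustration, and alert_offsets is a hypothetical helper rather than anything Vigilmon exposes.

```python
def alert_offsets(outage_seconds: float, renotify_every: float = 300.0) -> list[float]:
    """Seconds after the first failure at which an alert would be sent."""
    offsets = [0.0]  # the first failure notifies immediately
    t = renotify_every
    # Re-notify only while the outage is still ongoing at time t
    while t < outage_seconds:
        offsets.append(t)
        t += renotify_every
    return offsets
```

Under these assumptions a 90-second throttling blip produces one alert, while a 12-minute outage produces three (at 0, 5, and 10 minutes).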
What Vigilmon Catches That CloudWatch Misses
| Scenario | CloudWatch | Vigilmon |
|---|---|---|
| API Gateway returns 502 | Needs an explicit alarm on the API Gateway 5XX error metric | HTTP monitor catches immediately |
| Cold start causes timeout | Needs P99 latency alarm | Response timeout fires alert |
| Scheduled Lambda stops running | Needs EventBridge + SNS plumbing | Heartbeat grace period fires alert |
| Your AWS account has an IAM issue | Your alarms may also break | Vigilmon is external and unaffected |
| Region-level degradation | Per-region alarm config required | One monitor per region, same alert channel |
Serverless doesn't mean worry-free. Lambdas fail silently, cold starts surprise you, and scheduled jobs vanish without a trace. External monitoring from Vigilmon gives you the independent signal you need.
Start monitoring your Lambda functions in under 5 minutes — register free at vigilmon.online.