Monitoring AWS Lambda with Vigilmon: API Gateway Health Checks, EventBridge Heartbeats & Multi-Region Strategy

AWS Lambda functions are invisible by default. They run, they sleep, they run again — but unless you're actively watching CloudWatch metrics, you won't know that your /api/checkout Lambda started throwing 502s three hours ago. A CloudWatch alarm helps, but it's inside your AWS account, configured by the same team that broke things. Vigilmon gives you an external, independent health check that fires alerts even when your AWS account itself has a problem.

This tutorial shows you how to wire Lambda-based APIs into Vigilmon for uptime monitoring, scheduled-job heartbeats, and smart alert thresholds.

What You'll Build

- A Lambda health handler exposed through API Gateway
- A Vigilmon HTTP monitor with cold-start-aware timeout settings
- An EventBridge rule that drives heartbeat pings for scheduled Lambdas
- A multi-region monitoring strategy

Prerequisites

- AWS account with at least one Lambda function behind API Gateway (HTTP API or REST API)
- AWS CLI or SAM/CDK configured locally
- A free account at vigilmon.online

Step 1: Add a Health Handler to Your Lambda

Whether you use Node.js, Python, or Go, the pattern is the same: add a route for GET /health that checks your real dependencies.

Node.js (ESM)

// handlers/health.mjs
import { DynamoDBClient, DescribeTableCommand } from "@aws-sdk/client-dynamodb";

const dynamo = new DynamoDBClient({});

export async function handler(event) {
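  // HTTP API (payload v2) and REST API proxy events carry the method in different fields
  const method = event.requestContext?.http?.method ?? event.httpMethod;
  if (method !== "GET") {
    return { statusCode: 405, body: "Method Not Allowed" };
  }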

  const checks = {};
  let ok = true;

  // DynamoDB connectivity probe
  try {
    await dynamo.send(new DescribeTableCommand({ TableName: process.env.TABLE_NAME }));
    checks.dynamodb = "ok";
  } catch (err) {
    checks.dynamodb = `error: ${err.message}`;
    ok = false;
  }

  // Downstream HTTP dependency
  try {
    const resp = await fetch(process.env.DOWNSTREAM_URL + "/ping", {
      signal: AbortSignal.timeout(2000),
    });
    checks.downstream = resp.ok ? "ok" : `http_${resp.status}`;
    if (!resp.ok) ok = false;
  } catch (err) {
    checks.downstream = `error: ${err.message}`;
    ok = false;
  }

  return {
    statusCode: ok ? 200 : 503,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      status: ok ? "ok" : "degraded",
      region: process.env.AWS_REGION,
      checks,
    }),
  };
}

Python

# handlers/health.py
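import json
import os
import urllib.error
import urllib.request

import boto3
from botocore.exceptions import BotoCoreError, ClientError

dynamo = boto3.client("dynamodb")

def handler(event, context):
    checks = {}
    ok = True

    # DynamoDB connectivity probe
    try:
        dynamo.describe_table(TableName=os.environ["TABLE_NAME"])
        checks["dynamodb"] = "ok"
    except ClientError as e:
        checks["dynamodb"] = f"error: {e.response['Error']['Code']}"
        ok = False
    except BotoCoreError as e:
        # Connection-level failures (timeouts, DNS) are not ClientErrors
        checks["dynamodb"] = f"error: {type(e).__name__}"
        ok = False

    # Downstream HTTP dependency (same probe as the Node.js handler)
    try:
        with urllib.request.urlopen(os.environ["DOWNSTREAM_URL"] + "/ping", timeout=2):
            checks["downstream"] = "ok"
    except urllib.error.HTTPError as e:
        checks["downstream"] = f"http_{e.code}"
        ok = False
    except Exception as e:
        checks["downstream"] = f"error: {e}"
        ok = False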

    status = "ok" if ok else "degraded"
    return {
        "statusCode": 200 if ok else 503,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"status": status, "region": os.environ.get("AWS_REGION"), "checks": checks}),
    }

Wire this as a separate route in your API Gateway configuration:

# SAM template excerpt
HealthFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: handlers/health.handler
    Runtime: nodejs22.x
    Events:
      HealthApi:
        Type: HttpApi
        Properties:
          Path: /health
          Method: GET
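
The Node.js handler above runs its probes one after another, so one slow dependency delays the whole /health response. One possible refinement is to run the probes in parallel, each with its own timeout. The sketch below is illustrative only: runChecks and its probes argument are hypothetical names, not part of the handlers above or of any AWS SDK.

```javascript
// Run named dependency probes in parallel, each capped by its own timeout.
// `probes` maps a check name to a function that returns a promise and
// rejects (or throws) when the dependency is unhealthy.
async function runChecks(probes, timeoutMs = 2000) {
  const results = await Promise.all(
    Object.keys(probes).map(async (name) => {
      let timer;
      const timeout = new Promise((_, reject) => {
        timer = setTimeout(
          () => reject(new Error(`timed out after ${timeoutMs} ms`)),
          timeoutMs
        );
      });
      try {
        // Whichever settles first decides the outcome of this check
        await Promise.race([probes[name](), timeout]);
        return [name, "ok"];
      } catch (err) {
        return [name, `error: ${err.message}`];
      } finally {
        clearTimeout(timer);
      }
    })
  );
  return {
    ok: results.every(([, status]) => status === "ok"),
    checks: Object.fromEntries(results),
  };
}
```

Inside the handler, the DynamoDB and downstream calls would each become one entry in the probes object, and the returned ok and checks values map directly onto the status code and response body used above.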

Step 2: Configure Vigilmon with Cold-Start-Aware Timeouts

Lambda cold starts can add 200–800 ms (sometimes more for large JVM runtimes). Set your Vigilmon HTTP monitor accordingly.

- Log in to Vigilmon → Add Monitor → HTTP.
- URL: https://<api-id>.execute-api.<region>.amazonaws.com/health (or your custom domain). REST API URLs include the stage name before the path, for example /prod/health.
- Check interval: 60 seconds.
- Response timeout: Set to 10 seconds to give cold-start headroom without masking real hangs.
- Expected status: 200.
- JSON assertion: path status, expected value ok.
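
Before creating the monitor, it can help to reproduce the same pass/fail rules locally against a sample response. The sketch below encodes the settings listed above (10-second timeout, expected status 200, JSON field status equal to ok). It is not Vigilmon's actual implementation; evaluateCheck and its field names are hypothetical.

```javascript
// Decide whether one probe result would pass under the monitor settings above.
// `result` holds the HTTP status code, the raw response body, and the elapsed time.
function evaluateCheck(
  { statusCode, body, elapsedMs },
  { timeoutMs = 10_000, expectedStatus = 200 } = {}
) {
  if (elapsedMs > timeoutMs) return { up: false, reason: "timeout" };
  if (statusCode !== expectedStatus) {
    return { up: false, reason: `status_${statusCode}` };
  }
  let parsed;
  try {
    parsed = JSON.parse(body);
  } catch {
    return { up: false, reason: "invalid_json" };
  }
  // JSON assertion: the top-level "status" field must equal "ok"
  if (parsed?.status !== "ok") {
    return { up: false, reason: `status_field_${parsed?.status}` };
  }
  return { up: true, reason: "ok" };
}
```

Under these rules a cold start that stretches the response to 850 ms still passes, while a 503 from the health handler fails on the status code before the JSON assertion is evaluated.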

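Tip: To reduce cold starts in production, use Provisioned Concurrency. Vigilmon's 60-second checks also act as a keep-warm ping: Lambda usually keeps an idle execution environment around for a few minutes (AWS does not guarantee this), so the function will typically stay warm between checks. This warms only the function behind /health, not your other Lambdas.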

Step 3: EventBridge Heartbeat for Scheduled Lambdas

If you have a Lambda that runs on a schedule (data pipeline, cleanup job, report generator), a standard HTTP monitor won't tell you it stopped running. Use EventBridge to trigger the Lambda and have the Lambda itself ping Vigilmon on success.

EventBridge rule (CDK)

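import * as cdk from "aws-cdk-lib";
import * as events from "aws-cdk-lib/aws-events";
import * as targets from "aws-cdk-lib/aws-events-targets";

// pipelineFn: the lambda.Function construct for the scheduled handler, defined elsewhere in this stack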
new events.Rule(this, "PipelineSchedule", {
  schedule: events.Schedule.rate(cdk.Duration.minutes(15)),
  targets: [new targets.LambdaFunction(pipelineFn)],
});

Lambda handler with heartbeat ping

// handlers/pipeline.mjs
export async function handler(event) {
  try {
    await runPipeline();

    // Ping Vigilmon heartbeat only on success
    await fetch(process.env.VIGILMON_HEARTBEAT_URL, { method: "POST" });
    console.log("Pipeline complete — heartbeat sent");
  } catch (err) {
    console.error("Pipeline failed — heartbeat withheld:", err);
    // Re-throw so Lambda records the invocation as failed and retries the asynchronous event
    throw err;
  }
}
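
Two details of the handler above are worth guarding against. First, fetch resolves normally on a non-2xx response, so a rejected ping goes unnoticed. Second, a network error while pinging Vigilmon lands in the catch block, so a pipeline run that actually succeeded is reported as failed and retried. One possible hardening is a helper with a timeout and a few retries, assuming a missed heartbeat should be logged rather than fail the run. sendHeartbeat and its options are hypothetical names.

```javascript
// Send the heartbeat with a per-attempt timeout and a few retries; never throws.
// `fetchImpl` is injectable so the helper can be exercised without network access.
async function sendHeartbeat(
  url,
  { attempts = 3, timeoutMs = 3000, fetchImpl = fetch } = {}
) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const resp = await fetchImpl(url, {
        method: "POST",
        signal: AbortSignal.timeout(timeoutMs),
      });
      if (resp.ok) return { sent: true, attempt };
      console.warn(`Heartbeat attempt ${attempt} returned HTTP ${resp.status}`);
    } catch (err) {
      console.warn(`Heartbeat attempt ${attempt} failed: ${err.message}`);
    }
  }
  // All attempts failed; the caller decides whether that matters
  return { sent: false, attempt: attempts };
}
```

In the handler this would replace the bare fetch call after runPipeline() completes. If the ping never gets through, Vigilmon's grace period still raises the alert, which is the intended behaviour when the heartbeat path itself is broken.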

In Vigilmon, create a Heartbeat monitor and set the grace period to 20 minutes (for a 15-minute schedule). Store the heartbeat URL in an SSM Parameter:

aws ssm put-parameter \
  --name "/myapp/prod/vigilmon-heartbeat-url" \
  --value "https://vigilmon.online/api/heartbeat/<your-id>" \
  --type SecureString

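Expose it to the function through your SAM/CDK config. CloudFormation does not resolve SecureString parameters into Lambda environment variables, so either pass the parameter name as an environment variable and read the value at runtime with ssm:GetParameter (with decryption enabled), or store the URL as a plain String parameter if you are comfortable treating it as non-secret.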
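
The 20-minute grace period suggested above for a 15-minute schedule can be read as the schedule interval plus headroom for the job's runtime and some scheduling jitter. The sketch below makes that arithmetic explicit. The split into a 3-minute worst-case runtime and a 2-minute buffer is an assumption for illustration, and graceMinutes and isOverdue are hypothetical helpers, not Vigilmon APIs.

```javascript
// Grace period = schedule interval + worst-case runtime + safety buffer (minutes).
function graceMinutes(intervalMin, maxRuntimeMin, bufferMin = 2) {
  return intervalMin + maxRuntimeMin + bufferMin;
}

// True when no ping has arrived within the grace period.
// Timestamps are in milliseconds since the epoch.
function isOverdue(lastPingMs, nowMs, graceMin) {
  return nowMs - lastPingMs > graceMin * 60_000;
}

// A 15-minute schedule with an assumed 3-minute worst-case runtime
const grace = graceMinutes(15, 3);
console.log(`grace period: ${grace} minutes`);
```

A grace period equal to the schedule interval would alert whenever a run finishes slightly later than the previous one, so some headroom above the interval is usually worth having.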

Step 4: Multi-Region Lambda Monitoring

If you deploy Lambda to multiple AWS regions for redundancy, you need a monitor per region.

- Deploy the same /health Lambda to each region.
- In Vigilmon, create one HTTP monitor per regional API Gateway URL:
  - https://<api-id>.execute-api.us-east-1.amazonaws.com/health
  - https://<api-id>.execute-api.eu-west-1.amazonaws.com/health
- Group them under a single Status Page so that visitors see one aggregate status.
- Wire the regional monitors to the same alert channels so you get one Slack message with the failing region in the body.

The /health response already includes "region": process.env.AWS_REGION, so when Vigilmon logs the response body for a failing check you'll see which region returned degraded.
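
With more than a couple of regions, it may be easier to generate the monitor URLs from a region-to-API-ID map than to type them by hand. The sketch below assumes the default execute-api hostname pattern shown above; the API IDs are placeholders, and regionalHealthUrls and failingRegions are hypothetical helpers.

```javascript
// Build one health-check URL per region from a { region: apiId } map.
function regionalHealthUrls(apiIdsByRegion, path = "/health") {
  return Object.entries(apiIdsByRegion).map(([region, apiId]) => ({
    region,
    url: `https://${apiId}.execute-api.${region}.amazonaws.com${path}`,
  }));
}

// Given parsed /health response bodies, list the regions that did not report "ok".
function failingRegions(bodies) {
  return bodies.filter((b) => b.status !== "ok").map((b) => b.region);
}

// Placeholder API IDs for illustration only
const monitors = regionalHealthUrls({
  "us-east-1": "abc123",
  "eu-west-1": "def456",
});
console.log(monitors.map((m) => m.url).join("\n"));
```

For a REST API, pass a path that includes the stage name, such as /prod/health.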

Step 5: Alerting Strategy

| Alert type | Recommended channel |
|---|---|
| Primary on-call | Email or PagerDuty webhook |
| Team visibility | Slack webhook |
| Status page | Public Vigilmon status page embed |

Set alert escalation to notify on the first failure and re-notify after 5 minutes if the monitor is still down. A transient blip, such as brief Lambda throttling, then produces a single alert rather than one per failed check, while a sustained outage keeps escalating.
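
The escalation rule can be pictured as a small state machine that runs once per check result. The sketch below is an illustration of that rule, not Vigilmon's implementation; nextAlertState, the state shape, and the notification labels are hypothetical.

```javascript
const RENOTIFY_MS = 5 * 60_000; // re-notify interval from the rule above

// state: { downSince: number | null, lastNotifiedAt: number | null }
// Returns the next state plus which notification, if any, to send.
function nextAlertState(state, checkUp, nowMs) {
  if (checkUp) {
    // Recovery clears the incident; no notification in this sketch
    return { state: { downSince: null, lastNotifiedAt: null }, notify: null };
  }
  if (state.downSince === null) {
    // First failure: notify immediately
    return { state: { downSince: nowMs, lastNotifiedAt: nowMs }, notify: "down" };
  }
  if (nowMs - state.lastNotifiedAt >= RENOTIFY_MS) {
    // Still down after the re-notify interval
    return { state: { ...state, lastNotifiedAt: nowMs }, notify: "still_down" };
  }
  // Failing, but already notified recently: stay quiet
  return { state, notify: null };
}
```

With 60-second checks, an outage that starts at t=0 produces a down notification at once, silence for the failed checks at minutes 1 through 4, and a still_down notification at minute 5.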

What Vigilmon Catches That CloudWatch Misses

| Scenario | CloudWatch | Vigilmon |
|---|---|---|
| API Gateway returns 502 | Needs an explicit alarm on the 5XXError metric (5xx for HTTP APIs) | HTTP monitor catches immediately |
| Cold start causes timeout | Needs P99 latency alarm | Response timeout fires alert |
| Scheduled Lambda stops running | Needs EventBridge + SNS plumbing | Heartbeat grace period fires alert |
| Your AWS account has an IAM issue | Your alarms may also break | Vigilmon is external and unaffected |
| Region-level degradation | Per-region alarm config required | One monitor per region, same alert channel |

Serverless doesn't mean worry-free. Lambdas fail silently, cold starts surprise you, and scheduled jobs vanish without a trace. External monitoring from Vigilmon gives you the independent signal you need.

Start monitoring your Lambda functions in under 5 minutes — register free at vigilmon.online.