Monitoring AWS App Runner with Vigilmon
AWS App Runner is the simplest way to run containers on AWS. You point it at a container image or a source code repository, and App Runner handles the deployment, load balancing, auto-scaling, and TLS — no VPCs, no ECS task definitions, no ALB configuration required.
But "managed" doesn't mean "observable." App Runner can restart your service when it crashes, but it won't tell you your /api/orders endpoint was returning 500s for six minutes before the restart happened. Vigilmon fills that gap with external, independent HTTP monitoring that catches failures within one check interval of their start, rather than after the fact in CloudWatch logs.
This tutorial covers:
- A health endpoint in your App Runner service
- Vigilmon external HTTP monitor setup
- Alerting on deployment failures
- Heartbeat monitoring for background tasks
Step 1: Add a health check route to your service
App Runner uses a configurable health check path to determine if your service is healthy. Make that path check your real dependencies.
Node.js / Express:
// src/health.js
// Requires Node.js 18+ for the global fetch and AbortSignal.timeout
import { Router } from 'express';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, GetCommand } from '@aws-sdk/lib-dynamodb';

const dynamo = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Mount this in your Express app with: app.use(healthRouter)
export const healthRouter = Router();

healthRouter.get('/health', async (req, res) => {
  const checks = {};
  let healthy = true;

  // DynamoDB probe: a read of a (possibly missing) key proves the table is reachable
  try {
    await dynamo.send(new GetCommand({
      TableName: process.env.TABLE_NAME,
      Key: { pk: '_health', sk: 'probe' },
    }));
    checks.dynamodb = 'ok';
  } catch (err) {
    checks.dynamodb = `error: ${err.message}`;
    healthy = false;
  }

  // Downstream HTTP dependency, capped at 3 seconds
  try {
    const resp = await fetch(process.env.DOWNSTREAM_URL + '/ping', {
      signal: AbortSignal.timeout(3000),
    });
    checks.downstream = resp.ok ? 'ok' : `http_${resp.status}`;
    if (!resp.ok) healthy = false;
  } catch (err) {
    checks.downstream = `error: ${err.message}`;
    healthy = false;
  }

  // 503 tells both App Runner and external monitors that the service is degraded
  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'ok' : 'degraded',
    checks,
    timestamp: new Date().toISOString(),
  });
});
Python / FastAPI:
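# app/health.py
import os
from datetime import datetime, timezone

import boto3
from botocore.exceptions import BotoCoreError, ClientError
from fastapi import APIRouter, Response

router = APIRouter()
dynamo = boto3.client('dynamodb')


# Plain `def` rather than `async def`: boto3 calls block, so FastAPI
# runs this handler in its threadpool instead of stalling the event loop
@router.get('/health')
def health(response: Response):
    checks = {}
    healthy = True

    try:
        dynamo.describe_table(TableName=os.environ['TABLE_NAME'])
        checks['dynamodb'] = 'ok'
    except ClientError as e:
        # The service answered with an error (permissions, missing table, ...)
        checks['dynamodb'] = f"error: {e.response['Error']['Code']}"
        healthy = False
    except BotoCoreError as e:
        # The request never completed (connection or credential failure)
        checks['dynamodb'] = f"error: {type(e).__name__}"
        healthy = False

    # Return 503 only when a check actually failed
    if not healthy:
        response.status_code = 503

    return {
        'status': 'ok' if healthy else 'degraded',
        'checks': checks,
        'timestamp': datetime.now(tz=timezone.utc).isoformat(),
    }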
Keep the health check fast — App Runner polls it every few seconds and slow responses can trigger unnecessary replacements.
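One way to keep the endpoint fast is to run the dependency probes concurrently and cap each one with a timeout, so a single hung dependency cannot stall the whole response. The following is a minimal sketch of that idea using asyncio; the `run_probes` helper, the probe names, and the 2-second default budget are illustrative assumptions, not part of App Runner or Vigilmon.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

# A probe is any zero-argument coroutine function that raises on failure.
Probe = Callable[[], Awaitable[None]]


async def run_probes(
    probes: Dict[str, Probe], timeout_s: float = 2.0
) -> Tuple[bool, Dict[str, str]]:
    """Run all probes concurrently; each one must finish within timeout_s."""

    async def run_one(name: str, probe: Probe) -> Tuple[str, str]:
        try:
            await asyncio.wait_for(probe(), timeout=timeout_s)
            return name, "ok"
        except asyncio.TimeoutError:
            return name, "timeout"
        except Exception as exc:  # any other failure marks the probe as failed
            return name, f"error: {exc}"

    results = await asyncio.gather(*(run_one(n, p) for n, p in probes.items()))
    checks = dict(results)
    healthy = all(value == "ok" for value in checks.values())
    return healthy, checks
```

With this shape, the total response time is bounded by the slowest probe or the timeout, whichever is smaller, rather than by the sum of all probes.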
Step 2: Configure App Runner health checks
App Runner health check settings live in the service configuration. Set them to use your /health route:
AWS Console:
- Go to App Runner → your service → Configuration → Health check
- Set Protocol to HTTP
- Set Path to /health
- Set Interval to 10 seconds
- Set Timeout to 5 seconds
- Set Healthy threshold to 1
- Set Unhealthy threshold to 5
CloudFormation / CDK:
// lib/app-runner-stack.ts
import * as apprunner from 'aws-cdk-lib/aws-apprunner';

const service = new apprunner.CfnService(this, 'AppRunnerService', {
  serviceName: 'my-app',
  sourceConfiguration: {
    imageRepository: {
      imageIdentifier: 'public.ecr.aws/my-org/my-app:latest',
      imageRepositoryType: 'ECR_PUBLIC',
      imageConfiguration: {
        port: '8080',
      },
    },
    autoDeploymentsEnabled: false,
  },
  healthCheckConfiguration: {
    protocol: 'HTTP',
    path: '/health',
    interval: 10,
    timeout: 5,
    healthyThreshold: 1,
    unhealthyThreshold: 5,
  },
});
Terraform:
resource "aws_apprunner_service" "app" {
  service_name = "my-app"

  source_configuration {
    image_repository {
      image_identifier      = "public.ecr.aws/my-org/my-app:latest"
      image_repository_type = "ECR_PUBLIC"

      image_configuration {
        port = "8080"
      }
    }

    auto_deployments_enabled = false
  }

  health_check_configuration {
    protocol            = "HTTP"
    path                = "/health"
    interval            = 10
    timeout             = 5
    healthy_threshold   = 1
    unhealthy_threshold = 5
  }
}
App Runner's internal health check controls instance replacement. You still need an external probe to monitor user-facing availability.
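To see why an external probe still matters, it helps to estimate how long a failing instance can stay in service under the settings above. The sketch below is a rough approximation only: it assumes checks are evenly spaced at the configured interval and ignores the per-check timeout, and the function name is hypothetical.

```python
def seconds_until_unhealthy(interval_s: int, unhealthy_threshold: int) -> int:
    """Approximate time of consecutive failures before an instance is marked unhealthy.

    Assumes one check per interval and ignores the per-check timeout.
    """
    return interval_s * unhealthy_threshold


# Settings from this tutorial: interval 10 seconds, unhealthy threshold 5
print(seconds_until_unhealthy(10, 5))
```

With an interval of 10 seconds and an unhealthy threshold of 5, that is roughly 50 seconds of failed checks before App Runner reacts, and nobody is notified during that time unless something outside the service is watching.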
Step 3: Set up Vigilmon external HTTP monitoring
App Runner health checks are internal to AWS. They catch container crashes, but they won't detect:
- DNS misconfiguration on your custom domain
- TLS certificate expiry
- App Runner service quota exhaustion blocking new deployments
- Upstream dependency failures your health check doesn't cover
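As an illustration of what an outside-in check looks like for one of these cases, the sketch below reads the TLS certificate a host presents and computes the days left before it expires, using only the Python standard library. It is not how Vigilmon implements its checks; the function names are hypothetical.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout_s: float = 5.0) -> str:
    """Return the certificate's notAfter field, e.g. 'Jan  1 00:00:00 2030 GMT'.

    With the default context, an already expired or untrusted certificate
    makes the handshake raise ssl.SSLCertVerificationError instead.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (epoch seconds, defaults to the current time) and expiry."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400
```

A scheduled job could call `days_until_expiry(fetch_not_after("your-domain.example"))` and warn when the result drops below a threshold such as 14 days; both the hostname and the threshold here are placeholders.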
Connect your App Runner service to Vigilmon:
- Sign up at vigilmon.online — free, no card required
- Click New Monitor → HTTP
- URL: https://your-service-id.us-east-1.awsapprunner.com/health (or your custom domain)
- Check interval: 1 minute (paid) or 5 minutes (free)
- Expected status: 200
- JSON assertion (optional): path status, expected value ok
- Save
App Runner auto-scaling note: App Runner does not scale to zero. It keeps at least one provisioned instance, but idle instances are CPU-throttled and scale-out events start fresh instances, so the first request after an idle period may take a few seconds. Set Response timeout to 15 seconds and Confirm Down After to 2 failures to avoid false alerts from isolated slow responses.
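The effect of the Confirm Down After setting can be pictured as a counter of consecutive failures. The sketch below models that behavior under the assumption that any passing check resets the counter; it illustrates the concept and is not Vigilmon's actual implementation. The `DownConfirmer` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DownConfirmer:
    """Report 'down' only after `confirm_after` consecutive failed checks."""

    confirm_after: int = 2
    consecutive_failures: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True if the monitor counts as down."""
        if check_passed:
            self.consecutive_failures = 0  # any success resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.confirm_after
```

A single slow response between two healthy checks never reaches the threshold, while two failures in a row do, which is the trade-off between alert noise and detection delay that this setting controls.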
Step 4: Alerting on deployment failures
App Runner supports automatic deployments from ECR. When a new image is pushed, App Runner deploys it and runs your health check against the new revision. If the new revision fails health checks, App Runner rolls back — but it takes a few minutes, and traffic may hit the broken revision during that window.
Vigilmon catches that window.
In Vigilmon, go to Notifications → New Channel:
Slack:
1. Create an incoming webhook at api.slack.com/apps
2. Paste the URL into Vigilmon
3. Enable on your App Runner monitors
Email: Add your on-call address as a notification channel.
PagerDuty: Create a Vigilmon integration in PagerDuty and paste the integration key into Vigilmon's notification settings.
When a bad deployment causes failures, Vigilmon sends an immediate alert. You can then roll back manually:
# List recent operations on the service, including deployments
aws apprunner list-operations \
  --service-arn arn:aws:apprunner:us-east-1:123456789012:service/my-app/abc123

# The service auto-rolls back on sustained health check failures.
# start-deployment deploys whatever image the configured tag currently points to,
# so re-push or re-tag the last known good image first, then redeploy:
aws apprunner start-deployment \
  --service-arn arn:aws:apprunner:us-east-1:123456789012:service/my-app/abc123
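The same lookup can be scripted. The sketch below filters the result of the App Runner ListOperations call for operations that failed or rolled back. It takes the client as a parameter so it stays testable; the `failed_operations` helper is hypothetical, and it reads only the first page of results.

```python
from typing import Any, Dict, List

# Operation statuses that indicate a failed or rolled-back change
BAD_STATUSES = {
    "FAILED",
    "ROLLBACK_IN_PROGRESS",
    "ROLLBACK_SUCCEEDED",
    "ROLLBACK_FAILED",
}


def failed_operations(client: Any, service_arn: str) -> List[Dict[str, Any]]:
    """Return operations on the service whose status is in BAD_STATUSES.

    `client` is expected to behave like boto3.client("apprunner").
    Only the first page of results is inspected (no NextToken handling).
    """
    response = client.list_operations(ServiceArn=service_arn)
    return [
        op
        for op in response.get("OperationSummaryList", [])
        if op.get("Status") in BAD_STATUSES
    ]
```

With boto3 installed, calling `failed_operations(boto3.client("apprunner"), service_arn)` after a Vigilmon alert shows whether a recent deployment is the likely cause.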
Step 5: Heartbeat monitoring for background workers
If your App Runner service runs background tasks (queue consumers, scheduled work via an in-process scheduler), a standard HTTP monitor won't detect those tasks failing silently.
Add a heartbeat ping at the end of each successful background run:
// workers/processor.js
// Uses the global fetch built into Node.js 18+.
// `sqs` and `processMessage` stand in for your own queue client and handler.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function processQueue() {
  while (true) {
    try {
      const messages = await sqs.receiveMessages();
      for (const msg of messages) {
        await processMessage(msg);
        await sqs.deleteMessage(msg);
      }

      // Ping the heartbeat after every successful cycle, including empty polls,
      // so an idle queue is not reported as a dead worker
      const heartbeatUrl = process.env.VIGILMON_WORKER_HEARTBEAT;
      if (heartbeatUrl) {
        await fetch(heartbeatUrl, { method: 'GET', signal: AbortSignal.timeout(5000) });
      }
    } catch (err) {
      console.error('Worker error:', err);
      // Do NOT ping the heartbeat on error: Vigilmon will alert on the missing ping
    }

    await sleep(30_000);
  }
}
Store the heartbeat URL in AWS Secrets Manager and reference it in App Runner:
CloudFormation:
AppRunnerService:
  Type: AWS::AppRunner::Service
  Properties:
    InstanceConfiguration:
      InstanceRoleArn: !GetAtt AppRunnerRole.Arn
    SourceConfiguration:
      ImageRepository:
        ImageIdentifier: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/my-app:latest"
        ImageRepositoryType: ECR
        ImageConfiguration:
          Port: "8080"
          RuntimeEnvironmentSecrets:
            - Name: VIGILMON_WORKER_HEARTBEAT
              Value: !Sub "arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:vigilmon-heartbeat-url"
In Vigilmon, create a Heartbeat Monitor and set the grace period to slightly longer than your worker's ping interval.
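To make the grace period concrete, the sketch below models a heartbeat monitor as a simple deadline: the worker counts as overdue once more than the grace period has passed since its last ping. This is an illustrative model rather than Vigilmon's documented behavior, and the function name and the 45-second figure are assumptions.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_overdue(last_ping: datetime, now: datetime, grace: timedelta) -> bool:
    """Return True when the time since the last ping exceeds the grace period."""
    return now - last_ping > grace


# A worker that sleeps 30 seconds between cycles, plus processing time,
# might use a 45-second grace period.
last_ping = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
grace = timedelta(seconds=45)
print(heartbeat_overdue(last_ping, last_ping + timedelta(seconds=40), grace))
print(heartbeat_overdue(last_ping, last_ping + timedelta(seconds=46), grace))
```

If the worker's cycles vary a lot in length, size the grace period from the slowest expected cycle rather than the average, otherwise long but healthy runs will trigger alerts.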
Step 6: Public status page
Go to Status Pages → New Status Page in Vigilmon, add your App Runner monitors, and publish the page at a custom subdomain. You can also add the page's status badge to your README.

What you've built
| What | How |
|------|-----|
| Health endpoint | /health route checking real dependencies |
| Internal health check | App Runner healthCheckConfiguration on /health |
| External uptime monitoring | Vigilmon HTTP monitor on public URL |
| Deployment failure alerts | Vigilmon catches bad revision window |
| Background task monitoring | Heartbeat ping in workers |
| Slack/email/PagerDuty alerts | Vigilmon notification channels |
| Status page | Vigilmon public status page |
App Runner is simple by design. Your monitoring setup should be too. Vigilmon adds the external visibility layer App Runner doesn't include out of the box.
Next steps
- Monitor each App Runner environment separately (prod, staging)
- Watch response time trends to detect gradual resource pressure
- Add heartbeat monitors for every background worker that processes critical data
Get started free at vigilmon.online.