Monitoring AWS App Runner with Vigilmon

AWS App Runner is the simplest way to run containers on AWS. You point it at a container image or a source code repository, and App Runner handles the deployment, load balancing, auto-scaling, and TLS — no VPCs, no ECS task definitions, no ALB configuration required.

But "managed" doesn't mean "observable." App Runner can restart your service when it crashes, but it won't tell you your /api/orders endpoint was returning 500s for six minutes before the restart happened. Vigilmon fills that gap with external, independent HTTP monitoring that catches failures the moment they start — not after the fact in CloudWatch logs.

This tutorial covers:

A health endpoint in your App Runner service
Vigilmon external HTTP monitor setup
Alerting on deployment failures
Heartbeat monitoring for background tasks

Step 1: Add a health check route to your service

App Runner uses a configurable health check path to determine if your service is healthy. Make that path check your real dependencies.

Node.js / Express:

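// src/health.js
import express from 'express';
import { DynamoDBDocumentClient, GetCommand } from '@aws-sdk/lib-dynamodb';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

// In a larger app, export a router instead of creating the app here.
const app = express();
const dynamo = DynamoDBDocumentClient.from(new DynamoDBClient({}));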

app.get('/health', async (req, res) => {
  const checks = {};
  let healthy = true;

  // DynamoDB probe
  try {
    await dynamo.send(new GetCommand({
      TableName: process.env.TABLE_NAME,
      Key: { pk: '_health', sk: 'probe' },
    }));
    checks.dynamodb = 'ok';
  } catch (err) {
    checks.dynamodb = `error: ${err.message}`;
    healthy = false;
  }

  // Downstream HTTP dependency
  try {
    const resp = await fetch(process.env.DOWNSTREAM_URL + '/ping', {
      signal: AbortSignal.timeout(3000),
    });
    checks.downstream = resp.ok ? 'ok' : `http_${resp.status}`;
    if (!resp.ok) healthy = false;
  } catch (err) {
    checks.downstream = `error: ${err.message}`;
    healthy = false;
  }

  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'ok' : 'degraded',
    checks,
    timestamp: new Date().toISOString(),
  });
});

Python / FastAPI:

# app/health.py
import os
import boto3
from fastapi import APIRouter, Response
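from botocore.exceptions import BotoCoreError, ClientError
from datetime import datetime, timezone

router = APIRouter()
dynamo = boto3.client('dynamodb')

# Plain `def` (not `async def`): boto3 calls block, so let FastAPI run
# this handler in its threadpool instead of stalling the event loop.
@router.get('/health')
def health(response: Response):
    checks = {}
    healthy = True

    try:
        dynamo.describe_table(TableName=os.environ['TABLE_NAME'])
        checks['dynamodb'] = 'ok'
    except ClientError as e:
        checks['dynamodb'] = f"error: {e.response['Error']['Code']}"
        healthy = False
    except BotoCoreError as e:
        # Connection failures and timeouts carry no service error code
        checks['dynamodb'] = f"error: {type(e).__name__}"
        healthy = False

    if not healthy:
        response.status_code = 503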

    return {
        'status': 'ok' if healthy else 'degraded',
        'checks': checks,
        'timestamp': datetime.now(tz=timezone.utc).isoformat(),
    }

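Keep the health check fast. App Runner polls it on a fixed interval (every 10 seconds with the settings in Step 2), and a response that takes longer than the health check timeout counts as a failure, so slow dependency probes can trigger unnecessary instance replacements.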
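
One way to keep the endpoint inside its time budget is to run the dependency probes in parallel and stop waiting once a shared deadline passes. The following is a minimal sketch using only the Python standard library; `run_checks`, the probe names, and the 2-second default budget are illustrative assumptions, not part of App Runner or Vigilmon.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_checks(probes, budget_seconds=2.0):
    """Run probe callables in parallel and return (healthy, checks).

    `probes` maps a name to a zero-argument callable that raises on failure.
    A probe that has not finished when the shared budget runs out is
    reported as "timeout" instead of delaying the response.
    """
    checks = {}
    if not probes:
        return True, checks

    executor = ThreadPoolExecutor(max_workers=len(probes))
    deadline = time.monotonic() + budget_seconds
    futures = {name: executor.submit(fn) for name, fn in probes.items()}

    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            future.result(timeout=remaining)
            checks[name] = "ok"
        except FutureTimeout:
            checks[name] = "timeout"
        except Exception as exc:  # the probe itself raised
            checks[name] = f"error: {exc}"

    # Do not block on stragglers; they finish in the background.
    executor.shutdown(wait=False)
    return all(value == "ok" for value in checks.values()), checks
```

A handler would call `run_checks` with its own probes (for example a DynamoDB read and a downstream ping) and return 503 when the first element of the result is false. The budget should stay below the health check timeout configured in App Runner.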

Step 2: Configure App Runner health checks

App Runner health check settings live in the service configuration. Set them to use your /health route:

AWS Console:

1. Go to App Runner → your service → Configuration → Health check
2. Set Protocol to HTTP
3. Set Path to /health
4. Set Interval to 10 seconds
5. Set Timeout to 5 seconds
6. Set Healthy threshold to 1
7. Set Unhealthy threshold to 5

CloudFormation / CDK:

// lib/app-runner-stack.ts
import * as apprunner from 'aws-cdk-lib/aws-apprunner';

const service = new apprunner.CfnService(this, 'AppRunnerService', {
  serviceName: 'my-app',
  sourceConfiguration: {
    imageRepository: {
      imageIdentifier: 'public.ecr.aws/my-org/my-app:latest',
      imageRepositoryType: 'ECR_PUBLIC',
      imageConfiguration: {
        port: '8080',
      },
    },
    autoDeploymentsEnabled: false,
  },
  healthCheckConfiguration: {
    protocol: 'HTTP',
    path: '/health',
    interval: 10,
    timeout: 5,
    healthyThreshold: 1,
    unhealthyThreshold: 5,
  },
});

Terraform:

resource "aws_apprunner_service" "app" {
  service_name = "my-app"

  source_configuration {
    image_repository {
      image_identifier      = "public.ecr.aws/my-org/my-app:latest"
      image_repository_type = "ECR_PUBLIC"
      image_configuration {
        port = "8080"
      }
    }
    auto_deployments_enabled = false
  }

  health_check_configuration {
    protocol            = "HTTP"
    path                = "/health"
    interval            = 10
    timeout             = 5
    healthy_threshold   = 1
    unhealthy_threshold = 5
  }
}

App Runner's internal health check controls instance replacement. You still need an external probe to monitor user-facing availability.
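
To see why the internal check alone reacts slowly, it helps to estimate how long an instance can be failing before App Runner marks it unhealthy. The sketch below uses a simplified model (checks start once per interval, and the unhealthy threshold counts consecutive failures); the function name is hypothetical and the result is an estimate, not a documented guarantee.

```python
def unhealthy_detection_window(interval_s: int, timeout_s: int,
                               unhealthy_threshold: int) -> tuple[int, int]:
    """Return (best_case, worst_case) seconds from failure to 'unhealthy'.

    Best case: the failure starts just before a check that fails at once.
    Worst case: the failure starts just after a check, and the final
    failing check only fails when it times out.
    """
    best = (unhealthy_threshold - 1) * interval_s
    worst = unhealthy_threshold * interval_s + timeout_s
    return best, worst


# Settings from this step: interval 10 s, timeout 5 s, unhealthy threshold 5
print(unhealthy_detection_window(10, 5, 5))  # (40, 55)
```

Under these assumptions a failing instance can keep receiving traffic for roughly 40 to 55 seconds before replacement starts, which is the gap an external probe helps cover.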

Step 3: Set up Vigilmon external HTTP monitoring

App Runner health checks are internal to AWS. They catch container crashes, but they won't detect:

DNS misconfiguration on your custom domain
TLS certificate expiry
App Runner service quota exhaustion blocking new deployments
Upstream dependency failures your health check doesn't cover
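
As an illustration of the kind of check that only works from outside AWS, the sketch below reads the TLS certificate a public endpoint presents and computes the days left before it expires. It uses only the Python standard library; the function names are hypothetical, and this is not how Vigilmon implements its checks.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` and a certificate's notAfter string.

    `not_after` uses the format returned by getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect over TLS and return the peer certificate's notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]
```

Calling `days_until_expiry(fetch_not_after("your-custom-domain.example"), datetime.now(timezone.utc))` from a machine outside your AWS account shows what a real client sees, including a certificate that was never renewed.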

Connect your App Runner service to Vigilmon:

1. Sign up at vigilmon.online (free, no card required)
2. Click New Monitor → HTTP
3. URL: https://your-service-id.us-east-1.awsapprunner.com/health (or your custom domain)
4. Check interval: 1 minute (paid) or 5 minutes (free)
5. Expected status: 200
6. JSON assertion (optional): path status, expected value ok
7. Save

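App Runner scaling note: App Runner does not scale to zero. It keeps at least the configured minimum number of instances provisioned and throttles their CPU while the service is idle, so the first request after an idle period, or a request that arrives while new instances are starting, may take a few seconds. Set Response timeout to 15 seconds and Confirm Down After to 2 failures to avoid false alerts from isolated slow responses.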
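
Before creating the monitor, it can be useful to run the same kind of check by hand: request the URL, expect a 200, and assert that the JSON body has `status` equal to `ok`. The sketch below approximates that logic with the Python standard library; `evaluate` and `probe` are hypothetical helper names, and the 15-second timeout mirrors the setting suggested above.

```python
import json
import urllib.error
import urllib.request


def evaluate(status_code: int, body: str, expected_status: int = 200,
             json_key: str = "status", expected_value: str = "ok") -> tuple[bool, str]:
    """Apply the status-code and JSON assertions to one response."""
    if status_code != expected_status:
        return False, f"unexpected status {status_code}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False, "body is not valid JSON"
    if not isinstance(payload, dict) or payload.get(json_key) != expected_value:
        return False, f"{json_key!r} is not {expected_value!r}"
    return True, "ok"


def probe(url: str, timeout: float = 15.0) -> tuple[bool, str]:
    """Fetch `url` once and evaluate the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return evaluate(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; evaluate them like any other
        return evaluate(err.code, err.read().decode("utf-8", "replace"))
    except OSError as err:
        # DNS failures, refused connections, TLS errors, and timeouts
        return False, f"request failed: {err}"
```

Running `probe("https://your-service-id.us-east-1.awsapprunner.com/health")` from a laptop confirms that the endpoint behaves the way the monitor expects before any alerts are wired up.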

Step 4: Alerting on deployment failures

App Runner supports automatic deployments from private Amazon ECR repositories. When a new image is pushed, App Runner deploys it and runs your health check against the new revision. If the new revision fails health checks, App Runner rolls back, but the rollback takes a few minutes, and traffic may hit the broken revision during that window.

Vigilmon catches that window.

In Vigilmon, go to Notifications → New Channel:

Slack:

1. Create an incoming webhook at api.slack.com/apps
2. Paste the URL into Vigilmon
3. Enable on your App Runner monitors
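
It can help to confirm the webhook works before relying on it for alerts. Slack incoming webhooks accept a JSON body with a `text` field sent by HTTP POST; the sketch below builds and sends such a request with the Python standard library. The function names and the message text are illustrative.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build the POST request a Slack incoming webhook expects."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(webhook_url: str,
                      text: str = "Test alert for App Runner monitors") -> int:
    """Send one message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status
```

If `send_test_message` returns 200 and the message appears in the channel, the same URL should work when pasted into Vigilmon.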

Email: Add your on-call address as a notification channel.

PagerDuty: Create a Vigilmon integration in PagerDuty and paste the integration key into Vigilmon's notification settings.

When a bad deployment causes failures, Vigilmon sends an immediate alert. You can then roll back manually:

# List recent deployments
aws apprunner list-operations \
  --service-arn arn:aws:apprunner:us-east-1:123456789012:service/my-app/abc123

# App Runner rolls back automatically when a deployment fails its health checks.
# start-deployment redeploys whatever image the configured tag currently points to,
# so re-tag and push the last known good image first, then run:
aws apprunner start-deployment \
  --service-arn arn:aws:apprunner:us-east-1:123456789012:service/my-app/abc123
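
When an alert fires, the list-operations output shows whether a recent deployment failed or was rolled back. The sketch below filters that JSON for such operations. The field names (`OperationSummaryList`, `Type`, `Status`) and status values reflect my understanding of the ListOperations response and should be checked against your own CLI output; the helper name is hypothetical.

```python
import json

# Status values that indicate a deployment did not complete cleanly
# (assumed from the App Runner ListOperations API; verify before relying on them).
FAILED_STATUSES = {
    "FAILED",
    "ROLLBACK_IN_PROGRESS",
    "ROLLBACK_SUCCEEDED",
    "ROLLBACK_FAILED",
}


def failed_operations(list_operations_json: str) -> list[dict]:
    """Return the operations whose status signals a failure or rollback."""
    response = json.loads(list_operations_json)
    operations = response.get("OperationSummaryList", [])
    return [op for op in operations if op.get("Status") in FAILED_STATUSES]
```

One way to use it is to save the CLI output with `--output json` to a file and pass the file contents to `failed_operations`, then compare the timestamps of the returned operations with the start of the Vigilmon incident.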

Step 5: Heartbeat monitoring for background workers

If your App Runner service runs background tasks (queue consumers, scheduled work via an in-process scheduler), a standard HTTP monitor won't detect those tasks failing silently.

Add a heartbeat ping at the end of each successful background run:

// workers/processor.js
import fetch from 'node-fetch';
import { setTimeout as sleep } from 'node:timers/promises';
// `sqs` and `processMessage` are your application's own queue wrapper and handler.

async function processQueue() {
  while (true) {
    try {
      const messages = await sqs.receiveMessages();
      for (const msg of messages) {
        await processMessage(msg);
        await sqs.deleteMessage(msg);
      }

      // Ping after every successful cycle, including empty polls,
      // so an idle queue is not reported as a dead worker.
      const heartbeatUrl = process.env.VIGILMON_WORKER_HEARTBEAT;
      if (heartbeatUrl) {
        await fetch(heartbeatUrl, { method: 'GET' });
      }
    } catch (err) {
      console.error('Worker error:', err);
      // Do NOT ping the heartbeat on error. Vigilmon alerts on the missing ping.
    }

    await sleep(30_000);
  }
}
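
For Python workers, the same pattern can be sketched as a single-cycle function that pings the heartbeat only when the whole batch succeeded. The queue operations and the ping are passed in as callables so the logic stays independent of any particular SQS client; all names here are hypothetical.

```python
from typing import Any, Callable, Iterable


def run_cycle(receive: Callable[[], Iterable[Any]],
              process: Callable[[Any], None],
              delete: Callable[[Any], None],
              ping: Callable[[], None]) -> bool:
    """Process one batch and ping the heartbeat only on success.

    Returns True when the ping was sent, False when any step raised.
    An empty batch still counts as a successful cycle.
    """
    try:
        for message in receive():
            process(message)
            delete(message)
        ping()
    except Exception as exc:
        # No ping on failure: the missing heartbeat is what raises the alert.
        print(f"Worker error: {exc}")
        return False
    return True
```

A worker loop would call `run_cycle` and then sleep for its polling interval, with `ping` issuing a GET request to the URL stored in `VIGILMON_WORKER_HEARTBEAT`.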

Store the heartbeat URL in AWS Secrets Manager and reference it in App Runner:

CloudFormation:

AppRunnerService:
  Type: AWS::AppRunner::Service
  Properties:
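    InstanceConfiguration:
      # The instance role must allow secretsmanager:GetSecretValue on the secret below.
      InstanceRoleArn: !GetAtt AppRunnerRole.Arn
    SourceConfiguration:
      # Private ECR images need an access role; AppRunnerAccessRole is a placeholder name.
      AuthenticationConfiguration:
        AccessRoleArn: !GetAtt AppRunnerAccessRole.Arn
      ImageRepository: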
        ImageIdentifier: !Sub "${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/my-app:latest"
        ImageRepositoryType: ECR
        ImageConfiguration:
          Port: "8080"
          RuntimeEnvironmentSecrets:
            - Name: VIGILMON_WORKER_HEARTBEAT
              Value: !Sub "arn:aws:secretsmanager:${AWS::Region}:${AWS::AccountId}:secret:vigilmon-heartbeat-url"

In Vigilmon, create a Heartbeat Monitor and set the grace period slightly longer than the worker's longest expected gap between pings: the 30-second sleep plus the slowest processing cycle.
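
As a worked example of sizing the grace period, the sketch below adds the sleep interval to the slowest expected processing cycle and applies a safety margin. The 10-second cycle time and the 1.25 margin are assumptions chosen for illustration; measure your own worker before picking values.

```python
import math


def heartbeat_grace_seconds(sleep_s: float, max_cycle_s: float,
                            margin: float = 1.25) -> int:
    """Longest expected gap between pings, padded by a safety margin."""
    return math.ceil((sleep_s + max_cycle_s) * margin)


# 30 s sleep plus an assumed 10 s worst-case processing cycle
print(heartbeat_grace_seconds(30, 10))  # 50
```

A grace period set too close to the raw interval produces false alerts whenever one batch runs long, while a very generous one delays detection of a stuck worker.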

Step 6: Public status page

Go to Status Pages → New Status Page in Vigilmon, add your App Runner monitors, and publish the page at a custom subdomain. Add the status badge to your README:

![Service Status](https://vigilmon.online/badge/your-monitor-slug)

What you've built

| What | How |
|------|-----|
| Health endpoint | /health route checking real dependencies |
| Internal health check | App Runner healthCheckConfiguration on /health |
| External uptime monitoring | Vigilmon HTTP monitor on public URL |
| Deployment failure alerts | Vigilmon catches bad revision window |
| Background task monitoring | Heartbeat ping in workers |
| Slack/email/PagerDuty alerts | Vigilmon notification channels |
| Status page | Vigilmon public status page |

App Runner is simple by design. Your monitoring setup should be too. Vigilmon adds the external visibility layer App Runner doesn't include out of the box.

Next steps

Monitor each App Runner environment separately (prod, staging)
Watch response time trends to detect gradual resource pressure
Add heartbeat monitors for every background worker that processes critical data

Get started free at vigilmon.online.