Uptime Monitoring for Flask Applications: A Complete Guide

Flask's simplicity is a strength, but it makes monitoring easy to overlook. A broken database connection returns a 500. A crashed background worker silently stops processing jobs. A misconfigured Gunicorn deployment answers with 502 from nginx while your Flask process exits with a traceback nobody saw. Vigilmon catches all of these from outside your infrastructure, before your users do.

In this guide you'll add production-grade uptime monitoring to a Flask application — a health blueprint, an HTTP monitor on Vigilmon, alert channels, and a Gunicorn deployment setup that survives rolling restarts.

What You'll Build

A Flask /health blueprint that checks the database and Redis
A Vigilmon HTTP monitor pointed at your endpoint
Email and Slack alert channels
A Gunicorn production deployment with health-check-aware pre-loading
An optional APScheduler heartbeat for background tasks

Prerequisites

Python 3.10+
A Flask project (SQLAlchemy and Redis optional but shown)
A free Vigilmon account

Step 1: Create the Health Blueprint

Isolate the health check in a Blueprint so that a bug in your application's main routes doesn't kill the monitoring endpoint. Place it in app/health/views.py:

# app/health/views.py
import time
from flask import Blueprint, current_app, jsonify

health_bp = Blueprint("health", __name__, url_prefix="")


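def _check_database():
    try:
        # With Flask-SQLAlchemy 3.x, the extension instance is registered here.
        db = current_app.extensions["sqlalchemy"]
        with db.engine.connect() as conn:
            conn.execute(db.text("SELECT 1"))
        return "ok", None
    except Exception as exc:
        # str(exc) can expose connection details; consider logging it
        # instead of returning it if the endpoint is public.
        return "error", str(exc)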


def _check_redis():
    redis_url = current_app.config.get("REDIS_URL")
    if not redis_url:
        return "not_configured", None
    try:
        import redis as redis_lib
        client = redis_lib.from_url(redis_url, socket_timeout=2)
        client.ping()
        return "ok", None
    except Exception as exc:
        return "error", str(exc)


@health_bp.route("/health")
def health_check():
    start = time.monotonic()
    checks = {}
    overall = "ok"

    db_status, db_err = _check_database()
    checks["database"] = db_status if not db_err else f"error: {db_err}"
    if db_status == "error":
        overall = "degraded"

    redis_status, redis_err = _check_redis()
    checks["redis"] = redis_status if not redis_err else f"error: {redis_err}"
    if redis_status == "error":
        overall = "degraded"

    http_status = 200 if overall == "ok" else 503
    return jsonify({
        "status": overall,
        "latency_ms": round((time.monotonic() - start) * 1000, 1),
        "checks": checks,
    }), http_status

Register the blueprint in your application factory:

# app/__init__.py
from flask import Flask
from app.health.views import health_bp


def create_app(config=None):
    app = Flask(__name__)
    # ... your existing config and extensions ...
    app.register_blueprint(health_bp)
    return app

Test it locally:

curl -s http://localhost:5000/health | python -m json.tool

Expected output (Flask's jsonify sorts keys alphabetically by default):

{
    "checks": {
        "database": "ok",
        "redis": "ok"
    },
    "latency_ms": 1.8,
    "status": "ok"
}
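A health check that hangs is as unhelpful as one that fails, so it can be worth giving the dependency checks a shared time budget. The sketch below is one possible approach, not part of the blueprint above: run_checks and budget_s are hypothetical names, and each check is assumed to be a zero-argument callable returning the same (status, error) tuple as _check_database and _check_redis.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_checks(checks, budget_s=2.0):
    """Run zero-argument checks in threads under one shared time budget.

    Each check is assumed to return a (status, error) tuple. A check that
    does not finish in time is reported as "timeout"; one that raises is
    reported as "error".
    """
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(len(checks), 1))
    futures = {name: pool.submit(fn) for name, fn in checks.items()}
    deadline = time.monotonic() + budget_s

    for name, future in futures.items():
        remaining = max(deadline - time.monotonic(), 0)
        try:
            status, _error = future.result(timeout=remaining)
            results[name] = status
        except FutureTimeout:
            results[name] = "timeout"
        except Exception:
            results[name] = "error"

    # Do not block the response on checks that are still running.
    pool.shutdown(wait=False)

    healthy = all(s in ("ok", "not_configured") for s in results.values())
    return results, (200 if healthy else 503)
```

One caveat: current_app is bound to the thread handling the request, so checks run this way would need to push their own application context (for example with app.app_context()) before touching Flask extensions.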

Step 2: Set Up a Vigilmon HTTP Monitor

Log into Vigilmon and click New Monitor → HTTP.
Set the URL to https://your-domain.com/health.
Set the check interval to 60 seconds (free tier) or 30 seconds (paid).
Under Assertions, add:
- Status code equals 200
- Response body contains "status":"ok" (no space after the colon: outside debug mode, Flask's jsonify emits compact JSON)
Click Save.

Vigilmon will now probe your health endpoint from multiple regions and alert you when a check times out or returns a non-200 status. Because checks run on an interval, detection can lag the actual failure by up to one check interval.
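To make the two assertions concrete, here is a rough local equivalent of what a probe evaluates. This is an illustration only, not Vigilmon's implementation; evaluate_probe is a hypothetical helper. It parses the JSON instead of matching a substring, because a substring match is sensitive to whitespace in the serialized body.

```python
import json


def evaluate_probe(status_code, body):
    """Return (passed, reason) for a single probe result."""
    if status_code != 200:
        return False, f"unexpected status code {status_code}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False, "body is not valid JSON"
    if not isinstance(payload, dict) or payload.get("status") != "ok":
        return False, "status field is not 'ok'"
    return True, "ok"
```

Both '{"status":"ok"}' and '{"status": "ok"}' pass this check, whereas a plain substring assertion would accept only one of the two spellings.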

Why the /health endpoint, not just /?

Your index route might serve a cached response from an in-memory store even when the database is down. A dedicated health endpoint performs active dependency checks and returns 503 when the app can't serve real traffic, giving Vigilmon an accurate signal.

Step 3: Configure Alert Channels

Navigate to Alerts → Channels in the Vigilmon dashboard.

Email

Add your on-call email address. Vigilmon sends an alert within 30 seconds of the first failed check, and a recovery notification when the endpoint comes back up.

Slack

Click Add Channel → Slack.
Paste your Slack incoming webhook URL.
Save.

Now set which monitors trigger which channels under Alerts → Routing. A good starting rule: route all CRITICAL alerts to both email and Slack.
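Before relying on the Slack channel, it may be worth confirming that the webhook itself works. Slack incoming webhooks accept a JSON POST with a "text" field; the sketch below builds such a request with the standard library. The function names are hypothetical, and the webhook URL is whatever Slack issued for your workspace.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(webhook_url, text="Test alert: webhook is reachable"):
    """Send the message and return the HTTP status code Slack answers with."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling send_test_message with your real webhook URL should post the message to the configured channel; if nothing appears there, Vigilmon's alerts would not arrive either.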

Step 4: Deploy with Gunicorn

Gunicorn manages multiple worker processes for you. The settings that matter most for health-check reliability are timeout, preload_app, and graceful_timeout:

# gunicorn.conf.py
bind = "0.0.0.0:5000"
workers = 4
worker_class = "gthread"
threads = 2
timeout = 30
keepalive = 5

# Preload the app once in the master process, then fork, so workers start
# faster after a crash. Do not share live DB connections across the fork:
# open them lazily in each worker, or dispose the pool in a post_fork hook.
preload_app = True

# Give workers 30 s to finish in-flight requests before SIGKILL.
graceful_timeout = 30

Start with:

gunicorn -c gunicorn.conf.py "app:create_app()"

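Tip: Keep timeout shorter than the time Vigilmon waits before alerting. If a worker stays silent for longer than timeout, Gunicorn kills and restarts it automatically. With gthread workers, timeout tracks whether the worker process is still responsive rather than how long a single request takes, so one stuck request thread will not trigger a restart on its own. Vigilmon will flag any check that fails during a restart (typically a 502 from the reverse proxy).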
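When preload_app is enabled, any database connections opened before the fork are shared by every worker, which SQLAlchemy's documentation warns against. One way to handle this is a post_fork hook that discards the inherited pool in each worker. The sketch below is an assumption-laden example, not a drop-in fix: it assumes Flask-SQLAlchemy 3.x (extension instance stored in app.extensions["sqlalchemy"]), SQLAlchemy 1.4.33 or later (for dispose(close=False)), and that worker.app.wsgi() returns the Flask app Gunicorn loaded.

```python
# gunicorn.conf.py (possible addition)


def post_fork(server, worker):
    """Gunicorn server hook: runs in each worker process right after fork."""
    # Assumption: worker.app is Gunicorn's application wrapper and wsgi()
    # returns the (already preloaded) Flask application object.
    flask_app = worker.app.wsgi()

    db = flask_app.extensions.get("sqlalchemy")
    if db is None:
        return  # App does not use Flask-SQLAlchemy; nothing to reset.

    # Flask-SQLAlchemy 3.x resolves db.engine through the app context.
    with flask_app.app_context():
        # close=False drops the inherited pool without closing sockets
        # that the parent process may still own.
        db.engine.dispose(close=False)
```

Each worker then opens its own connections on first use. If the app never touches the database while being created, the pool is empty at fork time and the hook is harmless.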

Step 5: Add a Heartbeat for Background Tasks (APScheduler)

If your Flask app runs background jobs with APScheduler, a stuck job won't show up in HTTP uptime checks. Create a heartbeat monitor in Vigilmon:

Go to New Monitor → Heartbeat.
Set the expected interval to 5 minutes (or whatever your job period is).
Copy the unique ping URL (looks like https://vigilmon.online/ping/abc123).

Then ping it from your scheduler:

# app/scheduler.py
import httpx
from apscheduler.schedulers.background import BackgroundScheduler

VIGILMON_HEARTBEAT_URL = "https://vigilmon.online/ping/YOUR_HEARTBEAT_ID"

scheduler = BackgroundScheduler()


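def my_background_job(app):
    # Jobs run on a scheduler thread, outside any request, so push an
    # application context before touching current_app or the database.
    with app.app_context():
        # ... your job logic ...
        process_pending_emails()

    # Signal success to Vigilmon (skipped if the job raised above)
    try:
        httpx.get(VIGILMON_HEARTBEAT_URL, timeout=5)
    except Exception:
        pass  # Don't let a monitoring failure crash the job


def init_scheduler(app):
    # Wrapping scheduler.start() in an app context would not carry over
    # to the job thread, so pass the app to the job instead.
    scheduler.add_job(my_background_job, "interval", minutes=5, args=[app])
    scheduler.start()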

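Call init_scheduler(app) in your app factory. If the job stops running, whether due to a crash, a deadlock, or a bad deploy, Vigilmon will alert you after one missed interval. Be aware of how this interacts with Gunicorn: without preload_app, each worker runs the factory and starts its own scheduler, so the job runs once per worker; with preload_app, the scheduler thread lives only in the master process. Running the scheduler in a single dedicated process avoids both surprises.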
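If several jobs need heartbeats, the ping-on-success pattern can be factored into a decorator. The following is a minimal sketch using only the standard library; with_heartbeat, ping, and ping_fn are hypothetical names, and the URL would be the ping URL copied from your heartbeat monitor.

```python
import functools
import urllib.request


def ping(url, timeout=5):
    """Issue a GET request to the heartbeat URL and discard the response."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def with_heartbeat(url, ping_fn=ping):
    """Decorate a job so that the heartbeat is pinged only after it succeeds."""
    def decorator(job):
        @functools.wraps(job)
        def wrapper(*args, **kwargs):
            # If the job raises, the exception propagates and no ping is
            # sent, so the monitor sees a missed interval.
            result = job(*args, **kwargs)
            try:
                ping_fn(url)
            except Exception:
                pass  # A monitoring failure must not fail the job itself.
            return result
        return wrapper
    return decorator
```

The ping_fn parameter exists mainly so that tests can substitute a fake and avoid network calls.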

Step 6: Verify End-to-End

Confirm the monitor shows UP in the Vigilmon dashboard.
Temporarily return a 503 from your health endpoint and verify an alert arrives within 2 minutes.
Restore the 200 and confirm the recovery notification.

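You can also simulate a database failure in development. Set the bad URL in your config before the SQLAlchemy engine is created; changing it on an already-initialized app has no effect because the engine is cached:

# Temporarily break the DB URL (before the extension is initialized)
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://bad:creds@localhost/none"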

Your health endpoint should return 503 with "database": "error: ..." in the checks payload, and Vigilmon should catch it.
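To inspect the failure response from a script rather than the dashboard, a small standard-library helper can fetch the endpoint and keep the body even on a 503. This is a sketch; fetch_health is a hypothetical name. Note that urllib raises HTTPError for 4xx and 5xx responses, so the body has to be read from the exception.

```python
import json
import urllib.error
import urllib.request


def fetch_health(url, timeout=5):
    """Return (status_code, parsed_json_or_None) for a health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            code, raw = response.status, response.read()
    except urllib.error.HTTPError as err:
        # Non-2xx responses arrive as exceptions but still carry a body.
        code, raw = err.code, err.read()
    try:
        return code, json.loads(raw)
    except ValueError:
        return code, None
```

Against the broken-database setup above, fetch_health("http://localhost:5000/health") would be expected to return 503 together with the checks payload.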

Production Checklist

[ ] /health blueprint registered and returning 200
[ ] Vigilmon HTTP monitor created with status-code and body assertions
[ ] Email + Slack alert channels configured
[ ] Gunicorn timeout and graceful_timeout set
[ ] APScheduler heartbeat monitor configured (if using background jobs)
[ ] Health endpoint excluded from authentication middleware (it must be publicly reachable)

Summary

You now have a Flask application that:

Exposes a /health endpoint that actively checks the database and Redis
Feeds an external Vigilmon monitor that probes from multiple regions
Fires Slack and email alerts shortly after a failed check
Survives Gunicorn worker crashes with graceful restarts
Monitors background APScheduler jobs with a heartbeat

The entire setup takes under 30 minutes and the monitoring is free on Vigilmon's starter tier. You will hear about downtime from Slack, not from your users' support tickets.