Flask's simplicity is a strength, but it makes monitoring easy to overlook. A broken database connection returns a 500. A crashed background worker silently stops processing jobs. A misconfigured Gunicorn deployment answers with 502 from nginx while your Flask process exits with a traceback nobody saw. Vigilmon catches all of these from outside your infrastructure, before your users do.
In this guide you'll add production-grade uptime monitoring to a Flask application — a health blueprint, an HTTP monitor on Vigilmon, alert channels, and a Gunicorn deployment setup that survives rolling restarts.
What You'll Build
- A Flask /health blueprint that checks the database and Redis
- A Vigilmon HTTP monitor pointed at your endpoint
- Email and Slack alert channels
- A Gunicorn production deployment with health-check-aware pre-loading
- An optional APScheduler heartbeat for background tasks
Prerequisites
- Python 3.10+
- A Flask project (SQLAlchemy and Redis optional but shown)
- A free Vigilmon account
Step 1: Create the Health Blueprint
Isolate the health check in a Blueprint so that a bug in your application's main routes doesn't kill the monitoring endpoint. Place it in app/health/views.py:
# app/health/views.py
import time

from flask import Blueprint, current_app, jsonify

health_bp = Blueprint("health", __name__, url_prefix="")


def _check_database():
    """Run a trivial query; return (status, error_message)."""
    try:
        # Flask-SQLAlchemy 3.x stores the extension instance under this key.
        db = current_app.extensions["sqlalchemy"]
        with db.engine.connect() as conn:
            conn.execute(db.text("SELECT 1"))
        return "ok", None
    except Exception as exc:
        return "error", str(exc)


def _check_redis():
    """Ping Redis if REDIS_URL is configured; return (status, error_message)."""
    redis_url = current_app.config.get("REDIS_URL")
    if not redis_url:
        return "not_configured", None
    try:
        # Imported lazily so the app still starts without the redis package.
        import redis as redis_lib
        client = redis_lib.from_url(redis_url, socket_timeout=2)
        client.ping()
        return "ok", None
    except Exception as exc:
        return "error", str(exc)


@health_bp.route("/health")
def health_check():
    start = time.monotonic()
    checks = {}
    overall = "ok"

    db_status, db_err = _check_database()
    checks["database"] = db_status if not db_err else f"error: {db_err}"
    if db_status == "error":
        overall = "degraded"

    redis_status, redis_err = _check_redis()
    checks["redis"] = redis_status if not redis_err else f"error: {redis_err}"
    if redis_status == "error":
        overall = "degraded"

    # Any failed dependency turns the response into a 503.
    http_status = 200 if overall == "ok" else 503
    return jsonify({
        "status": overall,
        "latency_ms": round((time.monotonic() - start) * 1000, 1),
        "checks": checks,
    }), http_status
Register the blueprint in your app factory:
# app/__init__.py
from flask import Flask

from app.health.views import health_bp


def create_app(config=None):
    app = Flask(__name__)
    # ... your existing config and extensions ...
    app.register_blueprint(health_bp)
    return app
Test it locally:
curl -s http://localhost:5000/health | python -m json.tool
Expected output:
{
    "status": "ok",
    "latency_ms": 1.8,
    "checks": {
        "database": "ok",
        "redis": "ok"
    }
}
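To make the status mapping explicit, here is a minimal sketch of the aggregation the view performs, pulled out as a pure function so it can be tried without a database or Redis. The name summarize is hypothetical and not part of the blueprint above; it only mirrors the same rules.

```python
def summarize(results):
    """Map per-dependency (status, error) pairs to a payload and HTTP status.

    `results` is a dict such as {"database": ("ok", None)}.
    Any dependency reporting "error" degrades the response to 503;
    "not_configured" counts as healthy, as in the view above.
    """
    checks = {}
    overall = "ok"
    for name, (status, err) in results.items():
        checks[name] = status if not err else f"error: {err}"
        if status == "error":
            overall = "degraded"
    http_status = 200 if overall == "ok" else 503
    return {"status": overall, "checks": checks}, http_status


# Example: the database is down and Redis is not configured.
payload, code = summarize({
    "database": ("error", "connection refused"),
    "redis": ("not_configured", None),
})
print(code, payload)
```

A single failing dependency is enough to produce a 503, which is the signal the external monitor relies on.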
Step 2: Set Up a Vigilmon HTTP Monitor
- Log into Vigilmon and click New Monitor → HTTP.
- Set the URL to https://your-domain.com/health.
- Set the check interval to 60 seconds (free tier) or 30 seconds (paid).
- Under Assertions, add:
  - Status code equals 200
  - Response body contains "status": "ok"
- Click Save.
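Vigilmon will now probe your health endpoint from multiple regions and alert you when it stops responding or returns a non-200 status. Detection is bounded by the check interval: at 60 seconds, an outage can go unnoticed for up to a minute before the first failed check.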
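The two assertions can be reproduced locally before relying on the monitor. The sketch below is an assumption about what a probe checks, not Vigilmon's implementation, and evaluate_probe is a hypothetical name. It parses the JSON instead of matching a substring, because Flask's jsonify may emit compact JSON outside debug mode (no space after the colon), which can make a literal substring match sensitive to whitespace.

```python
import json


def evaluate_probe(status_code, body):
    """Return a list of assertion failures; an empty list means the probe passed."""
    failures = []
    if status_code != 200:
        failures.append(f"status code was {status_code}, expected 200")
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        failures.append("body is not valid JSON")
        return failures
    if not isinstance(data, dict) or data.get("status") != "ok":
        failures.append('body does not report "status": "ok"')
    return failures


# Compact and pretty-printed bodies both pass.
print(evaluate_probe(200, '{"status":"ok"}'))
print(evaluate_probe(503, '{"status": "degraded"}'))
```

If your monitor's body assertion is a plain substring match, it is worth checking the exact bytes your production endpoint returns with curl.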
Why the /health endpoint, not just /?
Your index route might serve a cached response from an in-memory store even when the database is down. A dedicated health endpoint performs active dependency checks and returns 503 when the app can't serve real traffic, giving Vigilmon an accurate signal.
Step 3: Configure Alert Channels
Navigate to Alerts → Channels in the Vigilmon dashboard.
Email
Add your on-call email address. Vigilmon sends an alert within 30 seconds of the first failed check, and a recovery notification when the endpoint comes back up.
Slack
- Click Add Channel → Slack.
- Paste your Slack incoming webhook URL.
- Save.
Now set which monitors trigger which channels under Alerts → Routing. A good starting rule: route all CRITICAL alerts to both email and Slack.
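Before trusting the Slack channel, it can help to confirm that the webhook itself works. Slack incoming webhooks accept a JSON POST with a text field; the sketch below builds such a request with the standard library and leaves the actual send commented out. The webhook URL is a placeholder and build_slack_request is a hypothetical helper name.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; substitute the webhook URL you pasted into Vigilmon.
request = build_slack_request(
    "https://hooks.slack.com/services/YOUR/WEBHOOK/PATH",
    "Test message: alert channel check",
)

# To actually send the message, uncomment the next line:
# urllib.request.urlopen(request, timeout=5)
```

If the test message does not appear in the channel, the problem is with the webhook rather than with the alert routing.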
Step 4: Deploy with Gunicorn
Gunicorn manages multiple worker processes for you. Two settings are critical for health-check reliability, preload_app and graceful_timeout:
# gunicorn.conf.py
bind = "0.0.0.0:5000"
workers = 4
worker_class = "gthread"
threads = 2
timeout = 30
keepalive = 5
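# Preload the app once in the master process, then fork.
# Workers inherit the already-imported application code, so they start faster after a crash.
# Avoid opening database connections before the fork; a connection shared across processes is unsafe.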
preload_app = True
# Give workers 30 s to finish in-flight requests before SIGKILL.
graceful_timeout = 30
Start with:
gunicorn -c gunicorn.conf.py "app:create_app()"
Tip: Set timeout shorter than Vigilmon's alert threshold. If a worker hangs for longer than timeout, Gunicorn kills and restarts it. Vigilmon will catch any 503 during the restart and alert you, but the restart is automatic.
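During a restart there can be a short window in which the endpoint refuses connections or returns 503. A deploy script can wait for /health to return 200 before moving on. The following is a minimal sketch using only the standard library; the function names, the localhost URL, and the retry counts are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the connection fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # e.g. 503 while a dependency is down
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, or timeout


def wait_until_healthy(probe, attempts=10, delay=1.0, sleep=time.sleep):
    """Call probe() until it returns 200; return True on success, False otherwise."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A deploy step could then call wait_until_healthy(lambda: http_status("http://localhost:5000/health")) and abort the rollout if it returns False.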
Step 5: Add a Heartbeat for Background Tasks (APScheduler)
If your Flask app runs background jobs with APScheduler, a stuck job won't show up in HTTP uptime checks. Create a heartbeat monitor in Vigilmon:
- Go to New Monitor → Heartbeat.
- Set the expected interval to 5 minutes (or whatever your job period is).
- Copy the unique ping URL (looks like https://vigilmon.online/ping/abc123).
Then ping it from your scheduler:
# app/scheduler.py
import httpx
from apscheduler.schedulers.background import BackgroundScheduler

VIGILMON_HEARTBEAT_URL = "https://vigilmon.online/ping/YOUR_HEARTBEAT_ID"

scheduler = BackgroundScheduler()


@scheduler.scheduled_job("interval", minutes=5)
def my_background_job():
    # ... your job logic ...
    # If this raises, the ping below is skipped and Vigilmon sees a missed beat.
    process_pending_emails()

    # Signal success to Vigilmon
    try:
        httpx.get(VIGILMON_HEARTBEAT_URL, timeout=5)
    except Exception:
        pass  # Don't let a monitoring failure crash the job


def init_scheduler(app):
    # Note: this context does not extend to the scheduler's job threads.
    # Push an app context inside the job if it needs current_app.
    with app.app_context():
        scheduler.start()
Call init_scheduler(app) in your app factory. If the job stops running — whether due to a crash, a deadlock, or a bad deploy — Vigilmon will alert you after one missed interval.
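The ping-on-success pattern can be factored into a decorator so every job gets the same behavior. This is a sketch under assumptions: with_heartbeat and ping_url are hypothetical names, and it uses urllib from the standard library in place of httpx so that it has no third-party dependencies.

```python
import functools
import urllib.request


def ping_url(url, timeout=5):
    """Send a GET to the heartbeat URL and swallow any error."""
    try:
        urllib.request.urlopen(url, timeout=timeout).close()
    except Exception:
        pass  # a monitoring failure must not fail the job


def with_heartbeat(url, ping=ping_url):
    """Decorator: ping url only after the wrapped job returns without raising."""
    def decorator(job):
        @functools.wraps(job)
        def wrapper(*args, **kwargs):
            result = job(*args, **kwargs)  # an exception here skips the ping
            ping(url)
            return result
        return wrapper
    return decorator
```

Because the ping happens only after the job returns, a job that raises produces a missed heartbeat, which is the condition the heartbeat monitor alerts on. The ping parameter is injectable so the wrapper can be tested without network access.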
Step 6: Verify End-to-End
- Confirm the monitor shows UP in the Vigilmon dashboard.
- Temporarily return a 503 from your health endpoint and verify an alert arrives within 2 minutes.
- Restore the 200 and confirm the recovery notification.
You can also simulate a database failure in development:
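# Temporarily break the DB URL. Set this before the SQLAlchemy extension is
# initialized; changing it afterwards may have no effect on an existing engine.
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql://bad:creds@localhost/none"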
Your health endpoint should return 503 with "database": "error: ..." in the checks payload, and Vigilmon should catch it.
Production Checklist
- [ ] /health blueprint registered and returning 200
- [ ] Vigilmon HTTP monitor created with status-code and body assertions
- [ ] Email + Slack alert channels configured
- [ ] Gunicorn timeout and graceful_timeout set
- [ ] APScheduler heartbeat monitor configured (if using background jobs)
- [ ] Health endpoint excluded from authentication middleware (it must be publicly reachable)
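Because the endpoint is publicly reachable, the raw exception text placed in the checks payload may expose connection details such as hostnames or credentials embedded in a URL. One option, sketched here with a hypothetical helper named safe_error, is to report only the exception class and keep the full message in server logs.

```python
def safe_error(exc):
    """Describe an exception by class name only, omitting its message."""
    return type(exc).__name__


try:
    # Simulated failure whose message contains sensitive details.
    raise ConnectionError("could not connect to postgresql://user:secret@db/prod")
except Exception as exc:
    public_detail = safe_error(exc)

print(public_detail)  # prints: ConnectionError
```

In the health blueprint, this would mean returning safe_error(exc) instead of str(exc) from the check functions.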
Summary
You now have a Flask application that:
- Exposes a /health endpoint that actively checks the database and Redis
- Feeds an external Vigilmon monitor that probes from multiple regions
- Fires Slack and email alerts within seconds of a failure
- Survives Gunicorn worker crashes with graceful restarts
- Monitors background APScheduler jobs with a heartbeat
The entire setup takes under 30 minutes and the monitoring is free on Vigilmon's starter tier. You will hear about downtime from Slack, not from support tickets.