Flask's simplicity is a strength, but it makes monitoring easy to skip. A broken database connection returns a 500, a crashed Celery worker silently stops processing jobs, and a failed Redis connection can take down your session store without any visible error in your logs. Vigilmon catches all of these. In this tutorial you'll add comprehensive monitoring to a Flask application — from a health endpoint to background worker heartbeats to alert routing.
What You'll Build
- A Flask /health blueprint that checks DB and Redis
- A Vigilmon HTTP monitor pointed at your app
- A Celery beat heartbeat task that pings Vigilmon on success
- An APScheduler alternative for apps without Celery
- Gunicorn/uWSGI deployment health tips
- Email and Slack alert channels
Prerequisites
- Python 3.10+
- A Flask project
- A free Vigilmon account
- Optionally: SQLAlchemy, Redis, and Celery
Step 1: Create the Health Blueprint
Blueprints keep your health check isolated from application logic. This is important — if your app's main blueprint has a bug, the health endpoint should still respond.
# app/health/views.py
import time

from flask import Blueprint, current_app, jsonify
from sqlalchemy import text
import redis

health_bp = Blueprint("health", __name__, url_prefix="")


def _check_database():
    """Ping the SQLAlchemy database connection."""
    try:
        db = current_app.extensions["sqlalchemy"]
        with db.engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        return "ok", None
    except Exception as exc:
        return "error", str(exc)


def _check_redis():
    """Ping the Redis connection (if configured)."""
    redis_url = current_app.config.get("REDIS_URL")
    if not redis_url:
        return "not_configured", None
    try:
        client = redis.from_url(redis_url, socket_timeout=2)
        client.ping()
        return "ok", None
    except Exception as exc:
        return "error", str(exc)


@health_bp.route("/health")
def health_check():
    checks = {}
    overall_status = "ok"

    db_status, db_err = _check_database()
    checks["database"] = db_status if not db_err else f"error: {db_err}"
    if db_status == "error":
        overall_status = "degraded"

    redis_status, redis_err = _check_redis()
    checks["redis"] = redis_status if not redis_err else f"error: {redis_err}"
    if redis_status == "error":
        overall_status = "degraded"

    http_status = 200 if overall_status == "ok" else 503
    return jsonify(
        status=overall_status,
        timestamp=time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        checks=checks,
    ), http_status
Register the blueprint in your application factory:
# app/__init__.py
from flask import Flask

from app.health.views import health_bp


def create_app(config=None):
    app = Flask(__name__)
    # ... other setup ...
    app.register_blueprint(health_bp)
    return app
Test it:
curl -s http://localhost:5000/health | python3 -m json.tool
# {
#     "status": "ok",
#     "timestamp": "2025-06-29T10:00:00Z",
#     "checks": {
#         "database": "ok",
#         "redis": "ok"
#     }
# }
When your database is unreachable, the endpoint returns 503 Service Unavailable. Vigilmon treats any non-2xx response as a failure.
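As more dependencies are added, the repeated if-blocks in health_check can be folded into a single helper. The sketch below is one possible refactoring, not part of Flask or Vigilmon: summarize_checks is a hypothetical name, and it assumes every check returns the same (status, error) pair used above.

```python
def summarize_checks(results):
    """Collapse {name: (status, error)} into (overall, checks, http_status)."""
    checks = {}
    overall = "ok"
    for name, (status, err) in results.items():
        # Mirror the endpoint's format: plain status, or "error: <message>".
        checks[name] = f"error: {err}" if err else status
        if status == "error":
            overall = "degraded"
    http_status = 200 if overall == "ok" else 503
    return overall, checks, http_status


print(summarize_checks({
    "database": ("ok", None),
    "redis": ("error", "connection refused"),
}))
```

A "not_configured" result leaves the overall status at "ok", which matches how the endpoint treats a missing REDIS_URL.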
Step 2: Create a Vigilmon HTTP Monitor
Log in to Vigilmon and create a new HTTP Monitor:
| Field | Value |
|---|---|
| URL | https://yourapp.com/health |
| Method | GET |
| Check interval | 60 seconds |
| Expected status | 200 |
| Timeout | 10 seconds |
| Regions | 2–3 for triangulation |
Under Alert Channels, add your email. Slack comes in Step 6.
Tip: if you're behind a load balancer or reverse proxy, point the monitor at the public URL — this validates the full network path, not just the Flask process. If you also want to monitor individual instances, use internal monitors from each host.
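For a quick per-instance check from a host, something like the following standard-library probe may be enough. It is only a sketch of what an HTTP monitor does (any non-2xx response or a missing response counts as a failure); the function name probe and the default timeout are assumptions, and it is not how Vigilmon itself is implemented.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10):
    """Return (healthy, status_code); status_code is None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300, resp.status
    except urllib.error.HTTPError as exc:
        # urllib raises HTTPError for 4xx/5xx, such as the 503 from /health.
        return False, exc.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout.
        return False, None
```

Calling probe("http://localhost:5000/health") on each host would exercise only the local Flask process, while the public URL covers the full network path.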
Step 3: Celery Beat Heartbeat Task
Celery workers can silently stop consuming tasks after an unhandled exception or OOM event. The heartbeat pattern keeps Vigilmon informed: your beat task pings a heartbeat URL on success. If pings stop arriving, Vigilmon alerts you.
First, grab your Heartbeat URL from Vigilmon (Dashboard → Heartbeat Monitors → New):
https://vigilmon.online/api/heartbeats/YOUR-UUID/ping
Create a Celery task that does real work and then pings:
# app/tasks/heartbeat.py
import logging
import os

import requests
from celery import shared_task

logger = logging.getLogger(__name__)

VIGILMON_HEARTBEAT_URL = os.environ.get("VIGILMON_HEARTBEAT_URL")


@shared_task(name="tasks.heartbeat", bind=True, max_retries=0)
def heartbeat_ping(self):
    """
    Run periodically via Celery Beat. Pings Vigilmon on success
    so a silent worker failure triggers an alert automatically.
    """
    try:
        # Your actual scheduled work goes here:
        # e.g. send_pending_notifications()
        #      refresh_exchange_rates()
        #      prune_expired_sessions()
        logger.info("[heartbeat] scheduled work complete")

        # Only ping Vigilmon when work succeeds
        if VIGILMON_HEARTBEAT_URL:
            resp = requests.get(VIGILMON_HEARTBEAT_URL, timeout=5)
            resp.raise_for_status()
            logger.info("[heartbeat] pinged Vigilmon, status %d", resp.status_code)
    except Exception as exc:
        logger.error("[heartbeat] failed: %s", exc)
        # Do NOT ping: silence is the signal to Vigilmon
        raise
Register the beat schedule in your Celery config:
# celery_config.py (or wherever you configure Celery)
beat_schedule = {
    "heartbeat-every-minute": {
        "task": "tasks.heartbeat",
        "schedule": 60.0,  # seconds; matches your Vigilmon heartbeat window
    },
}
Add the environment variable:
VIGILMON_HEARTBEAT_URL=https://vigilmon.online/api/heartbeats/YOUR-UUID/ping
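Start the beat scheduler, and make sure at least one worker is running to execute the task that beat enqueues:
celery -A app.celery beat --loglevel=info
celery -A app.celery worker --loglevel=info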
Vigilmon will alert if the ping stops arriving — whether because the beat process died, a Redis connection error stopped task dispatch, or a worker got OOM-killed.
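Conceptually, a heartbeat monitor is a dead man's switch: it alerts when the most recent ping is older than the expected interval. The sketch below illustrates that logic only. It is not Vigilmon's implementation, and the function name and the 30-second grace period are assumptions added for illustration.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_overdue(last_ping, now, interval_s=60, grace_s=30):
    """True when no ping has arrived within the interval plus a grace period."""
    return now - last_ping > timedelta(seconds=interval_s + grace_s)


now = datetime(2025, 6, 29, 10, 5, 0, tzinfo=timezone.utc)
print(heartbeat_overdue(now - timedelta(seconds=70), now))   # False: late, but within grace
print(heartbeat_overdue(now - timedelta(seconds=120), now))  # True: two intervals missed
```

A small grace period of this kind helps avoid false alerts when a task runs a few seconds late.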
Step 4: APScheduler Alternative (No Celery)
If your app doesn't use Celery, APScheduler is a lightweight alternative that runs inside your Flask process:
pip install apscheduler
# app/scheduler.py
import atexit
import logging
import os

import requests
from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.triggers.interval import IntervalTrigger

logger = logging.getLogger(__name__)

VIGILMON_HEARTBEAT_URL = os.environ.get("VIGILMON_HEARTBEAT_URL")


def heartbeat_job():
    """Runs every 60 seconds inside the Flask process."""
    try:
        # Your scheduled work here
        logger.info("[scheduler] job ran successfully")
        if VIGILMON_HEARTBEAT_URL:
            resp = requests.get(VIGILMON_HEARTBEAT_URL, timeout=5)
            resp.raise_for_status()
    except Exception as exc:
        logger.error("[scheduler] job failed: %s", exc)


def init_scheduler(app):
    """Call this from your application factory."""
    scheduler = BackgroundScheduler()
    scheduler.add_job(
        heartbeat_job,
        trigger=IntervalTrigger(seconds=60),
        id="heartbeat",
        replace_existing=True,
    )
    scheduler.start()
    # Shut down cleanly when the interpreter exits
    atexit.register(scheduler.shutdown)
    return scheduler
Wire it in your factory:
# app/__init__.py
from flask import Flask

from app.scheduler import init_scheduler


def create_app(config=None):
    app = Flask(__name__)
    # ... other setup ...
    with app.app_context():
        init_scheduler(app)
    return app
Gunicorn caveat: when Gunicorn spawns multiple workers (e.g. --workers 4), each worker process runs init_scheduler and starts its own scheduler, so every job fires once per worker. The duplicate heartbeat pings are harmless for Vigilmon, but any real scheduled work is repeated too. Use a file-based lock or move to Celery Beat for multi-worker deployments:
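import fcntl

# Module-level reference: the lock is released once the file object is closed
# or garbage-collected, so it must outlive init_scheduler().
_scheduler_lock = None


def init_scheduler(app):
    global _scheduler_lock
    lock = open("/tmp/scheduler.lock", "w")
    try:
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock.close()
        return None  # Another worker already holds the lock
    _scheduler_lock = lock
    # ... rest of scheduler init ...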
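An flock lock lasts only as long as the file object that holds it, so whichever variant you use, the handle has to stay referenced for the life of the process. The following sketch isolates that behaviour in a hypothetical helper, try_acquire_lock (the name and the return convention are assumptions), and relies on fcntl, which is available on Unix only.

```python
import fcntl


def try_acquire_lock(path):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle  # the caller must keep this object alive to keep the lock
```

Storing the returned handle somewhere long-lived, for example in a module-level variable, keeps other workers from acquiring the lock later.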
Step 5: Gunicorn/uWSGI Deployment Health Tips
Gunicorn
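# gunicorn.conf.py
bind = "0.0.0.0:5000"
workers = 4
worker_class = "gthread"
threads = 2
# Critical: workers that exceed this timeout are killed and restarted.
# Keep it lower than Vigilmon's check timeout (with 30 here, raise the
# 10-second monitor timeout from Step 2, or lower this value).
timeout = 30
keepalive = 5
# Seconds a worker gets to finish in-flight requests during a restart
graceful_timeout = 25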
Run with:
gunicorn -c gunicorn.conf.py "app:create_app()"
Health check in Docker:
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:5000/health || exit 1
uWSGI
; uwsgi.ini
[uwsgi]
; The factory call already yields the WSGI app, so no separate callable is needed
module = app:create_app()
master = true
processes = 4
; Kill workers that exceed 30s; prevents hangs
harakiri = 30
py-autoreload = 0
; Expose a stats socket for monitoring
stats = /tmp/uwsgi-stats.sock
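Both servers can also report metrics: Gunicorn can emit StatsD metrics, and uWSGI serves JSON from its stats socket, so tools like Prometheus can collect them through an exporter. For a quick sanity check, the Vigilmon HTTP monitor against /health is sufficient.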
Step 6: Alert Routing
Email Alerts
Configure under Vigilmon → Alert Channels → Email. You receive alerts when:
- /health returns non-2xx
- The endpoint doesn't respond within timeout
- The heartbeat window expires without a ping
Slack Webhook
- Create a Slack incoming webhook
- Vigilmon → Alert Channels → Add Channel → Webhook → paste the Slack URL
- Assign to your HTTP monitor and heartbeat monitor
Example alert:
🔴 *yourapp.com/health* is DOWN
Status: 503 | checks.database: error: FATAL: remaining connection slots reserved
Duration: 1m 12s
You can also integrate with PagerDuty for on-call escalations or send to a Microsoft Teams channel.
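Before attaching the Slack webhook to Vigilmon, it can help to confirm that the webhook itself works by posting a test message. The sketch below uses Slack's simple JSON payload with a single text field. The function names are hypothetical, and the webhook URL you pass in is your own.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build a POST request carrying Slack's simple {"text": ...} payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(webhook_url):
    """Post a test message and return the HTTP status code."""
    req = build_slack_request(webhook_url, "Test alert: Flask monitoring webhook is wired up")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

If the message appears in the channel, any later silence from Vigilmon alerts points at the channel assignment rather than the webhook.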
Step 7: Test the Full Loop
- Simulate a DB failure: set an invalid DATABASE_URL and restart Flask; /health should return 503.
- Verify Vigilmon detects it: within one check interval, your monitor goes red and an alert fires.
- Kill the worker: stop the Celery beat process and wait for the Vigilmon heartbeat window to expire; you should get an alert.
- Redis failure: point REDIS_URL at a non-existent host; /health should show redis: error: ... and return 503.
- Recover: fix each issue and verify Vigilmon sends "back online" notifications.
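While running these failure drills, a small helper can pull out which checks failed from the /health response body. This is a sketch that assumes the JSON shape produced by the blueprint in Step 1; failing_checks is a hypothetical name.

```python
import json


def failing_checks(body):
    """Return the names of checks whose value starts with 'error' in a /health body."""
    payload = json.loads(body)
    return sorted(
        name
        for name, value in payload.get("checks", {}).items()
        if str(value).startswith("error")
    )


sample = '{"status": "degraded", "checks": {"database": "ok", "redis": "error: timeout"}}'
print(failing_checks(sample))  # ['redis']
```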
Production Checklist
- [ ] /health blueprint is isolated: no imports from your main app blueprint
- [ ] Database and Redis checks have explicit timeouts
- [ ] VIGILMON_HEARTBEAT_URL is set in the environment, not hard-coded
- [ ] Celery Beat (or APScheduler) only pings on successful task completion
- [ ] Gunicorn timeout is lower than Vigilmon's check timeout
- [ ] Alert channels tested end-to-end
- [ ] Maintenance windows configured for deployments
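One way to give every check an explicit time budget, regardless of whether the underlying driver supports timeouts, is to run it in a worker thread and stop waiting after a deadline. This is a sketch under assumptions: run_check and the pool size are hypothetical, and a timed-out check keeps running in the background until it returns on its own.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Shared pool so a slow check does not block the request thread indefinitely.
_pool = ThreadPoolExecutor(max_workers=4)


def run_check(check, timeout_s=2.0):
    """Run check() with a deadline; return ("ok", None) or ("error", message)."""
    future = _pool.submit(check)
    try:
        future.result(timeout=timeout_s)
        return "ok", None
    except FutureTimeout:
        return "error", f"timed out after {timeout_s}s"
    except Exception as exc:
        return "error", str(exc)
```

The return value uses the same (status, error) pair as the checks in Step 1, so it could wrap a database or Redis ping without changing the endpoint's response format.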
Wrapping Up
You now have layered monitoring for your Flask app:
- Uptime: Vigilmon polls /health every 60 seconds, checking DB and Redis
- Heartbeat: Celery Beat (or APScheduler) confirms background workers are alive
- Alert routing: email and Slack with escalation policies
The combination means you'll know about a production failure before your users do — and the specific check that failed (database, redis, or heartbeat silence) tells you exactly where to look.
Sign up for Vigilmon — free tier includes multiple monitors, no credit card required.
Have questions or a specific Flask setup? Drop it in the comments!