Docker containers introduce a monitoring challenge that traditional server monitoring wasn't designed for: ephemeral IPs, health checks buried in compose files, and containers that restart silently — appearing "healthy" at the infrastructure layer while serving errors to users. Vigilmon solves this by monitoring the exposed HTTP endpoints from outside your container network, giving you the same view your users have.
What You'll Set Up
- Vigilmon HTTP monitors for containerised service health endpoints
- A docker-compose healthcheck that gates container readiness
- Webhook alerting when a container restart loop causes downtime
- Cron heartbeat monitoring for scheduled container tasks
Prerequisites
- Docker Engine 20+ and Docker Compose V2
- A containerised web service (any language/framework)
- A free Vigilmon account
The Core Problem: Ephemeral IPs vs Exposed Ports
When you run a container, Docker assigns it an internal IP from the bridge network (e.g. 172.17.0.3). This IP is:
- Unreachable from outside the Docker host without extra network configuration
- Ephemeral: it can change whenever the container restarts or is recreated
External monitoring tools like Vigilmon cannot probe 172.17.0.3:8080. What they can probe is your published port on the host — the HOST_PORT:CONTAINER_PORT mapping from your compose file, accessible at HOST_IP:HOST_PORT or your domain name if you have a reverse proxy (nginx, Caddy, Traefik) in front.
The correct approach: always expose your health endpoint through a published port and monitor that.
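To make the "outside view" concrete, here is a minimal Python sketch of what a single external probe against a published port might look like. The function name `probe` is an illustrative assumption and not part of Vigilmon; it only shows the kind of request an external monitor sends.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def probe(url: str, timeout: float = 5.0) -> Tuple[bool, Optional[int]]:
    """Send one GET request and return (is_up, http_status).

    http_status is None when no HTTP response arrived at all
    (connection refused, DNS failure, timeout).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300, resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return False, exc.code
    except OSError:
        # Nothing usable answered on the published port.
        return False, None
```

Calling `probe("http://203.0.113.10:8080/health")` (a placeholder documentation address) targets HOST_IP:HOST_PORT, never the container's bridge IP.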
Step 1: Add a Health Check Endpoint to Your Container
Your service should expose a /health route that confirms the application — not just the container — is working. Here's a minimal example for a few common stacks:
Node.js / Express
app.get('/health', (req, res) => {
  res.json({ status: 'ok', uptime: process.uptime() });
});
Python / FastAPI
@app.get("/health")
async def health():
    return {"status": "ok"}
Go / net/http
http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	fmt.Fprintln(w, `{"status":"ok"}`)
})
The endpoint should return 200 OK when healthy and 503 Service Unavailable when degraded (e.g. database unreachable).
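The examples above always return 200. As a rough sketch of the 200/503 split, the following uses only Python's standard library. `database_reachable` is a hypothetical placeholder for a real dependency check, and the "degraded" body value is an assumption chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def database_reachable() -> bool:
    """Hypothetical dependency check; replace with a real query such as SELECT 1."""
    return True


def health_response(db_ok: bool) -> Tuple[int, str]:
    """Map dependency state to (HTTP status code, compact JSON body)."""
    state = "ok" if db_ok else "degraded"
    body = json.dumps({"status": state}, separators=(",", ":"))
    return (200 if db_ok else 503), body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_response(database_reachable())
        payload = body.encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    """Bind to all interfaces so the published port can reach the server."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Binding to `0.0.0.0` rather than `127.0.0.1` matters inside a container: a server that listens only on the container's loopback interface is not reachable through a published port.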
Step 2: Publish the Port in docker-compose.yml
services:
  api:
    image: myapp:latest
    ports:
      - "8080:8080"  # host:container; Vigilmon probes host port 8080
    environment:
      DATABASE_URL: postgres://db:5432/myapp
    depends_on:
      db:
        condition: service_healthy
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
With this configuration, your API is accessible at http://YOUR_HOST_IP:8080/health — and that's the URL you'll give Vigilmon.
If you're behind a reverse proxy, use your domain name instead: https://api.yourdomain.com/health.
Step 3: Add a docker-compose Healthcheck for the App Container
Docker's built-in healthcheck adds a health status (starting, healthy, or unhealthy) to the container's reported state. Without it, Docker reports the container as "running" the moment the process starts, before the application is actually ready to serve traffic. Add one to your service (the curl-based test assumes curl is installed in the image):
services:
  api:
    image: myapp:latest
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s  # grace period on first start
What each field does:
| Field | Value | Meaning |
|---|---|---|
| test | curl -f /health | Exit non-zero if HTTP status ≥ 400 |
| interval | 30s | Check every 30 seconds |
| timeout | 10s | Fail if no response within 10 seconds |
| retries | 3 | Mark unhealthy after 3 consecutive failures |
| start_period | 15s | Don't count failures in first 15 seconds |
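To show how these fields interact, here is a simplified Python model of the status transitions. It is a sketch under stated assumptions, not Docker's implementation: it assumes the first check runs one `interval` after the container starts, it ignores `timeout`, and the function name `health_states` is made up for illustration.

```python
from typing import Iterable, Iterator


def health_states(
    results: Iterable[bool],
    interval: int = 30,
    retries: int = 3,
    start_period: int = 15,
) -> Iterator[str]:
    """Yield the modelled health status after each check.

    results: one boolean per check, True meaning the test command exited 0.
    """
    status = "starting"
    failures = 0
    started = False  # becomes True after the first successful check
    for n, passed in enumerate(results, start=1):
        elapsed = n * interval  # assumed time of the n-th check, in seconds
        if passed:
            failures = 0
            started = True
            status = "healthy"
        elif started or elapsed > start_period:
            # Failures only count once the grace period is over
            # or the container has already passed a check.
            failures += 1
            if failures >= retries:
                status = "unhealthy"
        yield status
```

For example, `list(health_states([True, False, False, False, True]))` yields `healthy` three times, then `unhealthy`, then `healthy` again. In this model the container is only marked unhealthy on the third consecutive failure, and a single success resets the count.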
You can observe the health status with:
docker ps --format "table {{.Names}}\t{{.Status}}"
# api Up 2 hours (healthy)
# db Up 2 hours (healthy)
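If you want to read the same status from a script, the following Python sketch parses `docker ps` output. The helper names are illustrative assumptions. It relies on the status suffixes Docker prints: `(healthy)`, `(unhealthy)`, and `(health: starting)`.

```python
import re
import subprocess
from typing import Dict, Optional

_HEALTH = re.compile(r"\((healthy|unhealthy|health: starting)\)")


def parse_ps(output: str) -> Dict[str, Optional[str]]:
    """Map container name to 'healthy', 'unhealthy', 'starting', or None.

    Expects tab-separated "name<TAB>status" lines. None means the
    container has no healthcheck configured.
    """
    health: Dict[str, Optional[str]] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        match = _HEALTH.search(status)
        health[name] = match.group(1).replace("health: ", "") if match else None
    return health


def current_health() -> Dict[str, Optional[str]]:
    """Run `docker ps` on the local host and parse its output."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ps(result.stdout)
```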
Step 4: Add Your Container Endpoint to Vigilmon
- Log in to vigilmon.online and click Add Monitor.
- Set Type to HTTP / HTTPS.
- Enter the URL: https://api.yourdomain.com/health (or http://HOST_IP:8080/health for setups without a reverse proxy).
- Set Check interval to 1 minute.
- Under Advanced, set Expected body contains to "status":"ok". Vigilmon will then alert even if the container responds with 200 but reports a degraded internal state.
- Click Save.
Step 5: Alert on Container Restart Loops
A container in a restart loop (Restarting state) is an easy failure to miss: Docker keeps bringing it back up, so it looks alive at the infrastructure layer while serving little or no traffic. Requests to your health endpoint fail or return errors during each restart window.
Vigilmon's multi-region consensus check catches this: when multiple probes agree the endpoint is failing, an alert fires. To tune the sensitivity:
- Open your monitor settings.
- Set Consecutive failures before alert to 2 (catches a restart loop within ~2 minutes at 1-minute intervals).
- Add your alert channel (email, Slack, webhook) under Alert Channels.
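The interplay between region consensus and the consecutive-failure threshold can be sketched in Python as follows. This illustrates the idea only and is not Vigilmon's actual algorithm: the quorum of 2 failing regions is an assumption borrowed from a "2/3 probes failing" style of alert, and the function and region names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def region_consensus(results: Dict[str, bool], quorum: int = 2) -> bool:
    """Return True when at least `quorum` regions report the endpoint as failing."""
    failing = sum(1 for ok in results.values() if not ok)
    return failing >= quorum


def first_alert_round(
    rounds: Iterable[Dict[str, bool]],
    threshold: int = 2,
    quorum: int = 2,
) -> Optional[int]:
    """Return the 1-based check round in which an alert would fire, or None.

    Each round maps region name to True (probe succeeded) or False (probe failed).
    """
    streak = 0
    for n, results in enumerate(rounds, start=1):
        if region_consensus(results, quorum):
            streak += 1
            if streak >= threshold:
                return n
        else:
            streak = 0  # a round without consensus resets the count
    return None
```

With a threshold of 2 and one round per minute, a single failing region never triggers an alert in this model, while two consecutive rounds with two failing regions do.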
A Vigilmon Slack alert for a container restart loop looks like:
🚨 api.yourdomain.com/health is DOWN
Status: 502 Bad Gateway
Duration: 3m 14s
Region: EU-West, US-East (2/3 probes failing)
Step 6: Monitor Scheduled Container Tasks with Cron Heartbeats
If you run scheduled work in containers — database backups, report generation, cleanup jobs — use Vigilmon's cron heartbeat monitor to confirm the job ran.
- In Vigilmon, click Add Monitor → Cron Heartbeat.
- Set the expected interval to match your job schedule (e.g. 60 minutes).
- Copy the unique ping URL provided (e.g. https://vigilmon.online/heartbeat/abc123).
- At the end of your container script, add the ping:
#!/bin/bash
set -e
# Your scheduled work
python manage.py generate_report
# Signal success to Vigilmon
curl -s --retry 3 https://vigilmon.online/heartbeat/abc123
If the container fails mid-script and never pings, Vigilmon will alert after the expected interval passes without a heartbeat.
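If the job is launched from Python rather than bash, the same "ping only on success" pattern might look like the sketch below. The helper names are illustrative assumptions, and the heartbeat URL is the example URL from above, not a real endpoint.

```python
import subprocess
import time
import urllib.request
from typing import Callable, List

HEARTBEAT_URL = "https://vigilmon.online/heartbeat/abc123"  # example URL from above


def ping(url: str, attempts: int = 3, delay: float = 2.0) -> bool:
    """GET the heartbeat URL, retrying on network or HTTP errors."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            pass  # URLError and HTTPError are both OSError subclasses
        if attempt < attempts:
            time.sleep(delay)
    return False


def run_job_then_ping(
    command: List[str],
    url: str = HEARTBEAT_URL,
    pinger: Callable[[str], object] = ping,
) -> int:
    """Run the job; send the heartbeat only if it exited with status 0."""
    result = subprocess.run(command)
    if result.returncode != 0:
        # No ping is sent, so the monitor alerts once the interval lapses.
        return result.returncode
    pinger(url)
    return 0
```

A scheduled container could call `run_job_then_ping(["python", "manage.py", "generate_report"])` as its entry point. The `pinger` parameter exists only so the heartbeat call can be swapped out in tests.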
For compose-based scheduled tasks:
services:
  scheduler:
    image: myapp:latest
    command: ["bash", "-c", "python manage.py generate_report && curl -s https://vigilmon.online/heartbeat/abc123"]
    restart: "no"  # run once; never restart automatically
Going Further
- Multi-container stacks: Add a monitor for each public-facing service — your API, your frontend, and any public-facing sidecar (metrics exporter, etc.).
- TCP monitoring: If your service is not HTTP-based (e.g. a Redis proxy, a custom TCP server), use Vigilmon's TCP monitor to probe the port directly.
- Docker Swarm / Kubernetes: Exposed service ports work the same way. For Kubernetes, monitor the LoadBalancer or Ingress URL, since internal cluster IPs are not reachable externally.
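For the TCP case, a port probe amounts to checking whether a connection can be opened at all. A minimal Python sketch, with `tcp_probe` as an illustrative name rather than anything Vigilmon exposes:

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, DNS failure, or timed out.
        return False
```

This confirms only that something is listening on the published port; it says nothing about whether the service behind it is answering correctly.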
With Vigilmon watching your exposed endpoints, you'll know about container failures before users do — even if Docker itself thinks everything is fine.