Monitoring Docker Containers with Vigilmon

Docker containers introduce a monitoring challenge that traditional server monitoring wasn't designed for: ephemeral IPs, health checks buried in compose files, and containers that restart silently — appearing "healthy" at the infrastructure layer while serving errors to users. Vigilmon solves this by monitoring the exposed HTTP endpoints from outside your container network, giving you the same view your users have.

What You'll Set Up

Vigilmon HTTP monitors for containerised service health endpoints
A docker-compose healthcheck that gates container readiness
Webhook alerting when a container restart loop causes downtime
Cron heartbeat monitoring for scheduled container tasks

Prerequisites

Docker Engine 20+ and Docker Compose V2
A containerised web service (any language/framework)
A free Vigilmon account

The Core Problem: Ephemeral IPs vs Exposed Ports

When you run a container, Docker assigns it an internal IP from the bridge network (e.g. 172.17.0.3). This IP is:

Unreachable from outside the Docker host without extra network configuration
Ephemeral: it can change whenever the container is restarted or recreated

External monitoring tools like Vigilmon cannot probe 172.17.0.3:8080. What they can probe is your published port on the host — the HOST_PORT:CONTAINER_PORT mapping from your compose file, accessible at HOST_IP:HOST_PORT or your domain name if you have a reverse proxy (nginx, Caddy, Traefik) in front.

The correct approach: always expose your health endpoint through a published port and monitor that.
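As a quick illustration of why the container IP is a dead end for an outside probe, the following sketch uses Python's standard ipaddress module to test whether an address is globally routable. The helper name externally_routable is made up for this example, and 8.8.8.8 is included only as a well-known public address for contrast.

```python
import ipaddress


def externally_routable(address: str) -> bool:
    """Return True if the address lies outside private and reserved ranges."""
    return ipaddress.ip_address(address).is_global


# Docker's default bridge network uses 172.17.0.0/16, which is private space.
print(externally_routable("172.17.0.3"))  # False
print(externally_routable("8.8.8.8"))     # True
```

A probe running on the public internet has no route to the first address, which is why the published host port (or the reverse proxy in front of it) is the thing to monitor.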

Step 1: Add a Health Check Endpoint to Your Container

Your service should expose a /health route that confirms the application — not just the container — is working. Here's a minimal example for a few common stacks:

Node.js / Express

app.get('/health', (req, res) => {
  res.json({ status: 'ok', uptime: process.uptime() });
});

Python / FastAPI

@app.get("/health")
async def health():
    return {"status": "ok"}

Go / net/http

http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
    w.Header().Set("Content-Type", "application/json")
    fmt.Fprintln(w, `{"status":"ok"}`)
})

The endpoint should return 200 OK when healthy and 503 Service Unavailable when degraded (e.g. database unreachable).
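The stack examples above always answer 200, so here is a minimal sketch of the degraded path using only Python's standard library. It assumes a hypothetical database_reachable() helper that you would replace with a real dependency check (for example a SELECT 1 against your database); the names health_response, HealthHandler, and serve are likewise illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def database_reachable() -> bool:
    """Hypothetical dependency check; swap in a real ping of your database."""
    return True


def health_response(db_ok: bool) -> Tuple[int, bytes]:
    """Map dependency state to an HTTP status code and a compact JSON body."""
    status, state = (200, "ok") if db_ok else (503, "degraded")
    # Compact separators produce {"status":"ok"} with no spaces.
    body = json.dumps({"status": state}, separators=(",", ":")).encode()
    return status, body


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, body = health_response(database_reachable())
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Start the server on all interfaces; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve() starts the endpoint on port 8080. Keeping the status decision in a plain function makes it easy to unit-test without opening a socket.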

Step 2: Publish the Port in docker-compose.yml

services:
  api:
    image: myapp:latest
    ports:
      - "8080:8080"       # host:container — Vigilmon probes host port 8080
    environment:
      DATABASE_URL: postgres://postgres:secret@db:5432/myapp
    depends_on:
      db:
        condition: service_healthy

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: secret
      POSTGRES_DB: myapp   # create the database named in DATABASE_URL
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

With this configuration, your API is accessible at http://YOUR_HOST_IP:8080/health — and that's the URL you'll give Vigilmon.

If you're behind a reverse proxy, use your domain name instead: https://api.yourdomain.com/health.

Step 3: Add a docker-compose Healthcheck for the App Container

A healthcheck tells Docker how to test whether the container is actually working, and the result appears in the container's reported status (starting, healthy, or unhealthy). Without one, Docker reports the container as "running" as soon as the process starts, before it is ready to serve traffic. Add it to your service:

services:
  api:
    image: myapp:latest
    ports:
      - "8080:8080"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 15s   # grace period on first start

What each field does:

| Field | Value | Meaning |
|---|---|---|
| test | curl -f /health | Exit non-zero if HTTP status ≥ 400 |
| interval | 30s | Check every 30 seconds |
| timeout | 10s | Fail if no response within 10 seconds |
| retries | 3 | Mark unhealthy after 3 consecutive failures |
| start_period | 15s | Don't count failures in first 15 seconds |
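The test command above only works if curl is installed in the image, which is not the case for many slim base images. If your image already ships Python, a small probe can stand in for it. This is a sketch under that assumption: the function names and the file name healthcheck.py are illustrative, and the exit codes roughly mirror curl -f (0 for a status below 400, 1 otherwise or on any connection failure).

```python
import urllib.error
import urllib.request


def status_to_exit_code(status: int) -> int:
    """Mirror curl -f: succeed for statuses below 400, fail otherwise."""
    return 0 if 200 <= status < 400 else 1


def exit_code(url: str, timeout: float = 5.0) -> int:
    """Probe the URL and return a process exit code for Docker's healthcheck."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return status_to_exit_code(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; use the real status.
        return status_to_exit_code(err.code)
    except Exception:
        # Refused connections, timeouts, and malformed responses all count
        # as unhealthy.
        return 1
```

With this saved as healthcheck.py in the container's working directory, the test entry could become ["CMD", "python", "-c", "import sys, healthcheck; sys.exit(healthcheck.exit_code('http://localhost:8080/health'))"].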

You can observe the health status with:

docker ps --format "table {{.Names}}\t{{.Status}}"
# api   Up 2 hours (healthy)
# db    Up 2 hours (healthy)
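If you want the same status in a script rather than a table, docker inspect exposes it as JSON under State.Health.Status. The sketch below parses that output; the function names are illustrative, and container_health assumes the docker CLI is on the PATH of the machine running it.

```python
import json
import subprocess


def health_from_inspect(inspect_output: str) -> str:
    """Extract State.Health.Status from the JSON printed by docker inspect."""
    containers = json.loads(inspect_output)
    state = containers[0]["State"]
    # The Health key is absent when the container defines no healthcheck.
    return state.get("Health", {}).get("Status", "none")


def container_health(name: str) -> str:
    """Run docker inspect NAME and return starting, healthy, unhealthy, or none."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return health_from_inspect(result.stdout)
```

For example, container_health("api") would return "healthy" for the api container shown in the docker ps output above.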

Step 4: Add Your Container Endpoint to Vigilmon

Log in to vigilmon.online and click Add Monitor.
Set Type to HTTP / HTTPS.
Enter the URL: https://api.yourdomain.com/health (or http://HOST_IP:8080/health if there is no reverse proxy and the published port is reachable from the internet).
Set Check interval to 1 minute.
Under Advanced, set Expected body contains to "status":"ok" — Vigilmon will alert even if the container responds with 200 but reports a degraded internal state.
Click Save.
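To make the effect of the body check concrete, here is a small model of a status-plus-substring check. It assumes a plain, case-sensitive substring match; Vigilmon's actual matching rules are not specified here and may differ, and the function name check_passes is illustrative.

```python
def check_passes(status: int, body: str, expected_substring: str) -> bool:
    """Pass only on a 2xx status AND when the expected text is in the body."""
    return 200 <= status < 300 and expected_substring in body


expected = '"status":"ok"'

print(check_passes(200, '{"status":"ok","uptime":12.5}', expected))  # True
print(check_passes(200, '{"status":"degraded"}', expected))          # False
print(check_passes(200, '{"status": "ok"}', expected))               # False
```

The last case is worth noting: under a literal substring match, a serializer that adds a space after the colon would fail the check even though the service is healthy, so confirm the exact bytes your endpoint returns before setting the expected text.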

Step 5: Alert on Container Restart Loops

A container stuck in a restart loop (Restarting state) is easy to miss: the restart policy keeps bringing it back, so it never stays stopped, yet it serves little or no traffic. During each restart window, requests to your health endpoint fail or return errors.

Vigilmon's multi-region consensus check catches this: when multiple probes agree the endpoint is failing, an alert fires. To tune the sensitivity:

Open your monitor settings.
Set Consecutive failures before alert to 2 (catches a restart loop within ~2 minutes at 1-minute intervals).
Add your alert channel (email, Slack, webhook) under Alert Channels.
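The consecutive-failures setting can be pictured as a simple streak counter. The sketch below is a generic model, not Vigilmon's implementation (which also involves multi-region consensus); the function name and the sample results are illustrative.

```python
from typing import List, Optional


def first_alert_index(results: List[bool], threshold: int) -> Optional[int]:
    """Return the index of the check that triggers an alert, or None.

    results: one entry per check, True for success and False for failure.
    threshold: consecutive failures required before alerting.
    """
    streak = 0
    for index, ok in enumerate(results):
        # Any success resets the streak; each failure extends it.
        streak = 0 if ok else streak + 1
        if streak >= threshold:
            return index
    return None


# One isolated failure (index 1) is ignored; the loop starting at index 3
# triggers on its second failing check.
checks = [True, False, True, False, False, False]
print(first_alert_index(checks, threshold=2))  # 4
```

With 1-minute intervals, a threshold of 2 means the alert fires on the second failing check, which is where the "within about 2 minutes" figure comes from. A threshold of 1 would alert sooner but also fire on the isolated blip.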

A Vigilmon Slack alert for a container restart loop looks like:

🚨 api.yourdomain.com/health is DOWN
Status: 502 Bad Gateway
Duration: 3m 14s
Region: EU-West, US-East (2/3 probes failing)

Step 6: Monitor Scheduled Container Tasks with Cron Heartbeats

If you run scheduled work in containers — database backups, report generation, cleanup jobs — use Vigilmon's cron heartbeat monitor to confirm the job ran.

In Vigilmon, click Add Monitor → Cron Heartbeat.
Set the expected interval to match your job schedule (e.g. 60 minutes).
Copy the unique ping URL provided (e.g. https://vigilmon.online/heartbeat/abc123).
At the end of your container script, add the ping:

#!/bin/bash
set -e

# Your scheduled work
python manage.py generate_report

# Signal success to Vigilmon
curl -fsS --retry 3 https://vigilmon.online/heartbeat/abc123

If the container fails mid-script and never pings, Vigilmon will alert after the expected interval passes without a heartbeat.
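If your job is launched from Python rather than bash, the same ping-only-on-success pattern might look like the sketch below. The heartbeat URL is the placeholder from the step above, the function names are illustrative, and the ping is a plain GET on the assumption that the heartbeat URL accepts one (as the curl example implies).

```python
import subprocess
import sys
import urllib.request
from typing import Callable, Sequence

PING_URL = "https://vigilmon.online/heartbeat/abc123"  # placeholder ping URL


def send_ping(url: str = PING_URL, timeout: float = 10.0) -> None:
    """Issue a GET to the heartbeat URL; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def run_then_ping(
    command: Sequence[str],
    ping: Callable[[], None] = send_ping,
) -> int:
    """Run the job and ping only if it exits 0. Returns the job's exit code."""
    result = subprocess.run(command)
    if result.returncode == 0:
        ping()
    return result.returncode
```

A container entrypoint could then end with sys.exit(run_then_ping(["python", "manage.py", "generate_report"])), so a failed job both skips the heartbeat and surfaces a non-zero exit code.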

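For compose-based scheduled tasks (Compose does not schedule anything itself, so trigger the service from host cron or a similar scheduler, e.g. with docker compose run --rm scheduler):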

services:
  scheduler:
    image: myapp:latest
    command: ["bash", "-c", "python manage.py generate_report && curl -fsS --retry 3 https://vigilmon.online/heartbeat/abc123"]
    restart: "no"   # run once; never restart automatically, whether it succeeds or fails

Going Further

Multi-container stacks: Add a monitor for each public-facing service — your API, your frontend, and any public-facing sidecar (metrics exporter, etc.).
TCP monitoring: If your service is not HTTP-based (e.g. a Redis proxy, a custom TCP server), use Vigilmon's TCP monitor to probe the port directly.
Docker Swarm / Kubernetes: Exposed service ports work the same way. For Kubernetes, monitor the LoadBalancer or Ingress URL — internal cluster IPs are not reachable externally.

With Vigilmon watching your exposed endpoints, you'll know about container failures before users do — even if Docker itself thinks everything is fine.