Cron Job Monitoring Guide: Never Miss a Scheduled Task Again (2026)

Your cron jobs are lying to you. Not intentionally — they just don't tell you anything when they fail. A database backup that stops running silently, a nightly report that hasn't executed in two weeks, an invoice generation job that skipped three customers and no one noticed: cron failures are quiet, and quiet failures are dangerous.

This guide covers the problem of silent cron failures, how to monitor scheduled tasks using Vigilmon's heartbeat monitor system, alert configuration best practices, and examples for the most common types of scheduled tasks.


The Silent Failure Problem

Cron is one of Unix's oldest and most reliable tools. It does exactly what you tell it to do — and only that. If your cron job exits with an error, cron doesn't notify anyone by default. If the job's output isn't captured, the error disappears. If the server running the job is down, or the process is killed mid-execution, or the script has a dependency that stopped working — cron reports nothing.

Consider the real-world failure patterns that never get caught without dedicated monitoring:

Database backups that silently fail: The backup job runs, encounters a database connection error, and exits non-zero. Cron has no one to tell. Weeks pass. The DBA assumes backups are running. Then a disk fails.

Report generation jobs that stop running: A dependency is upgraded, breaking the import. The nightly report job silently fails every night. The business intelligence team doesn't notice for two weeks because they assumed the dashboard was pulling cached data.

Invoice and billing jobs that skip runs: A cron job that generates monthly invoices fails twice due to a temporary third-party API timeout. The jobs aren't retried. Two customers don't receive invoices. The discrepancy is caught three months later during an audit.

Integration sync jobs that fall behind: A sync job that copies records from an external CRM fails because an API key was rotated. Records stop syncing. Customer support teams work from stale data for a week before anyone connects the symptom to the cause.

In each case, the root cause isn't complex — but the silence is. Without monitoring, you have no way to know whether a scheduled task ran, succeeded, or even started.


Heartbeat Monitoring: The Right Model for Cron

The right monitoring model for cron jobs is the heartbeat monitor (also called a dead man's switch or a check-in monitor). Instead of an external probe checking whether something is up, a heartbeat monitor waits for your cron job to check in after it runs successfully. If the check-in doesn't arrive within the expected window, the monitor fires an alert.

This inverts the usual monitoring model:

  • Uptime monitoring: "I'll poke your URL every minute to see if you respond"
  • Heartbeat monitoring: "Call home after each successful run; if you don't call in 90 minutes, something's wrong"

Heartbeat monitoring catches failures that uptime monitoring can't: jobs that never started, jobs that exited early, jobs on servers with no public HTTP endpoint, and jobs where success isn't a matter of returning a 200 but of actually completing work.
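The check-in rule can be sketched in a few lines. This is only an illustration of the dead man's switch idea, not Vigilmon's actual implementation; the function name is_overdue and its parameters are made up for this example.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_ping, interval, grace, now):
    """Return True once the next check-in deadline has passed.

    The deadline is the last successful ping plus the expected
    interval plus the grace period.
    """
    deadline = last_ping + interval + grace
    return now > deadline


# Hourly job with a 30-minute grace period: alert after 90 minutes of silence
last_ping = datetime(2026, 1, 1, 2, 0, tzinfo=timezone.utc)
interval = timedelta(hours=1)
grace = timedelta(minutes=30)

print(is_overdue(last_ping, interval, grace,
                 datetime(2026, 1, 1, 3, 15, tzinfo=timezone.utc)))  # False
print(is_overdue(last_ping, interval, grace,
                 datetime(2026, 1, 1, 3, 45, tzinfo=timezone.utc)))  # True
```

Note that the monitor never needs to reach the job's server. Silence alone is the signal, which is why this model works for hosts with no public endpoint.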


Setting Up Cron Job Monitoring with Vigilmon

Vigilmon's heartbeat monitor creates a unique URL that your cron job pings after successful completion. Vigilmon watches for that ping on a schedule you define. No ping in the expected window → alert fires.

Step 1: Create a Heartbeat Monitor

  1. Log in to vigilmon.online
  2. Click Add Monitor → select Heartbeat
  3. Name the monitor (e.g., "Nightly DB Backup")
  4. Set the expected interval — how often should this job run? (every hour, every 24 hours, every Monday at 2 AM)
  5. Set the grace period — how long after the expected time before an alert fires? (5 minutes for frequent jobs, 30 minutes for daily jobs)
  6. Copy the generated ping URL — this is the URL your cron job calls after successful completion

Step 2: Add the Ping to Your Cron Job

The simplest integration is a curl call appended to your existing script or cron command. The ping should only be sent when the job completes successfully — not on failure.

Option A: Inline curl in the crontab

# Crontab entry — runs backup at 2 AM, pings on success
0 2 * * * /opt/scripts/backup.sh && curl -fsS --retry 3 https://hb.vigilmon.online/ping/your-unique-id > /dev/null 2>&1

The && operator ensures the ping only fires if backup.sh exits 0. If the backup fails, no ping is sent, and Vigilmon alerts.
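If a crontab line becomes hard to read, the same "run, then ping only on exit 0" logic can live in a small wrapper. The following is a minimal sketch, assuming Python 3 is available on the host; run_and_ping and ping are hypothetical helper names, and the send parameter exists only so the ping can be swapped out in tests.

```python
import subprocess
import sys
import urllib.request


def ping(url, timeout=10):
    """Send the heartbeat; urlopen raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def run_and_ping(command, url, send=ping):
    """Run command and call send(url) only if it exits 0.

    Returns the command's exit code so the caller can pass it on.
    """
    exit_code = subprocess.run(command).returncode
    if exit_code == 0:
        send(url)
    return exit_code


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): wrapper.py <ping-url> <command> [args...]
    sys.exit(run_and_ping(sys.argv[2:], sys.argv[1]))
```

Returning the original exit code keeps the wrapper transparent to anything else that inspects the job's result.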

Option B: Add the ping inside the script

#!/bin/bash
set -euo pipefail

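# Compute the filename once so both steps use the same path,
# even if the job runs across midnight
BACKUP_FILE="/backups/mydb-$(date +%Y%m%d).sql.gz"

# Run backup
pg_dump mydb | gzip > "$BACKUP_FILE"

# Sync to S3
aws s3 cp "$BACKUP_FILE" s3://my-backups/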

# Report success to Vigilmon
curl -fsS --retry 3 "https://hb.vigilmon.online/ping/your-unique-id" > /dev/null

With set -euo pipefail, any failing command (including pg_dump on the left side of the pipe) aborts the script before it reaches the ping. Vigilmon never receives the check-in, and the alert fires once the grace period expires.

Option C: Language-native HTTP call

For Python scripts:

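import sys

import requests

PING_URL = "https://hb.vigilmon.online/ping/your-unique-id"

def run_job():
    # Your job logic here (process_invoices and sync_to_crm are placeholders)
    process_invoices()
    sync_to_crm()

if __name__ == "__main__":
    try:
        run_job()
    except Exception as e:
        # Job failed: skip the ping so Vigilmon alerts
        print(f"Job failed: {e}", file=sys.stderr)
        sys.exit(1)

    try:
        # Ping Vigilmon on success; raise_for_status surfaces HTTP errors
        requests.get(PING_URL, timeout=10).raise_for_status()
    except requests.RequestException as e:
        # The job itself succeeded, so report the ping problem separately
        print(f"Job succeeded but heartbeat ping failed: {e}", file=sys.stderr)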

For Node.js:

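const https = require("https");

const PING_URL = "https://hb.vigilmon.online/ping/your-unique-id";

async function runJob() {
  // Your job logic here (processInvoices and syncDatabase are placeholders)
  await processInvoices();
  await syncDatabase();
}

runJob()
  .then(() => {
    // Ping Vigilmon on success; drain the response so the socket closes,
    // and handle request errors so a failed ping doesn't crash the process
    https
      .get(PING_URL, (res) => res.resume())
      .on("error", (err) => console.error("Heartbeat ping failed:", err));
  })
  .catch((err) => {
    console.error("Job failed:", err);
    process.exit(1);
  });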

Configuring Alerts for Cron Monitoring

Choosing the Right Grace Period

The grace period is the buffer between when a job should have checked in and when Vigilmon fires an alert. Getting this right reduces false positives while keeping alert latency low.

| Job Type | Suggested Grace Period |
|---|---|
| Runs every minute | 2–3 minutes |
| Runs every 5–15 minutes | 5 minutes |
| Runs hourly | 15 minutes |
| Runs daily (short job) | 30 minutes |
| Runs daily (long job, e.g. a full backup) | 1–2 hours |
| Runs weekly | 2–4 hours |

For long-running jobs, add a buffer that accounts for variable execution time. A database backup that sometimes takes 20 minutes and sometimes takes 80 minutes (due to database size growth) needs a grace period that tolerates the slowest plausible run.
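One way to pick that buffer is to derive it from observed run times. The sketch below is a rule of thumb rather than a Vigilmon feature: suggest_grace_period, the 1.25 safety factor, and the 30-minute floor are all assumptions chosen for illustration.

```python
from datetime import timedelta


def suggest_grace_period(durations_minutes, safety_factor=1.25, floor_minutes=30):
    """Suggest a grace period from observed run durations (in minutes).

    Takes the slowest observed run, adds headroom via safety_factor,
    and never goes below floor_minutes.
    """
    if not durations_minutes:
        raise ValueError("need at least one observed duration")
    slowest = max(durations_minutes)
    return timedelta(minutes=max(floor_minutes, slowest * safety_factor))


# A backup that has taken 20, 35, and 80 minutes on past runs
print(suggest_grace_period([20, 35, 80]))  # 1:40:00
```

Revisit the number as the data grows, since a backup that slows down over time will eventually outrun a grace period chosen once.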

Alert Channels

Vigilmon supports multiple alert channels per monitor:

  • Slack: Direct message or channel alert with monitor name, failure timestamp, and link to status
  • Email: Formatted alert to one or more email addresses
  • Webhook: POST to any URL — PagerDuty, OpsGenie, custom incident automation

For cron job failures, a Slack alert to your engineering channel with an email fallback is a reliable baseline. For business-critical jobs (billing, payroll, compliance reports), consider a webhook to your incident management system.


Monitoring Patterns for Common Job Types

Database Backup Jobs

#!/bin/bash
set -euo pipefail

BACKUP_FILE="/backups/$(date +%Y%m%d_%H%M%S).sql.gz"
VIGILMON_PING="https://hb.vigilmon.online/ping/your-backup-monitor-id"

# Dump and compress
pg_dump "$DATABASE_URL" | gzip > "$BACKUP_FILE"

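# Verify the archive is intact and actually contains data
# (gzip writes a header even for empty input, so a plain -s size check would pass)
gzip -t "$BACKUP_FILE"
[ "$(gzip -dc "$BACKUP_FILE" | head -c 1 | wc -c)" -gt 0 ] || { echo "Backup is empty" >&2; exit 1; }

# Upload to offsite storage
aws s3 cp "$BACKUP_FILE" "s3://my-backups/postgres/"

# Confirm to Vigilmon
curl -fsS --retry 3 "$VIGILMON_PING" > /dev/null

Key: verify the backup before pinging. Because the dump is piped through gzip, the output file is never zero bytes, even when pg_dump writes nothing, so the check looks at the decompressed content rather than the file size. A dump that comes out empty due to connection issues shouldn't be treated as success.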

Invoice and Billing Jobs

For financial jobs, consider sending additional context in the ping:

# Ping with status metadata (Vigilmon ignores the body, but it's logged for debugging)
curl -fsS --retry 3 -X POST \
  -H "Content-Type: application/json" \
  -d "{\"invoices_generated\": $COUNT, \"timestamp\": \"$(date -u +%Y-%m-%dT%H:%M:%SZ)\"}" \
  "https://hb.vigilmon.online/ping/your-billing-monitor-id"

Data Sync and Integration Jobs

For jobs that sync data between systems, monitoring the job's run is necessary but not sufficient — you also want to know if the sync is working. Consider a two-stage approach:

  1. Heartbeat monitor confirms the sync job ran
  2. Separate HTTP monitor on an internal health endpoint that reports last-sync timestamp

from datetime import datetime, timezone

import requests

# After syncing, update a health endpoint your HTTP monitor checks
requests.post(
    "https://yourapp.com/internal/health/crm-sync",
    json={"last_sync": datetime.now(timezone.utc).isoformat()},
    headers={"Authorization": f"Bearer {INTERNAL_TOKEN}"},
    timeout=10,
).raise_for_status()

# Then ping Vigilmon
requests.get("https://hb.vigilmon.online/ping/your-sync-monitor-id", timeout=10)
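On the receiving side, the health endpoint has to turn the stored timestamp into a status the HTTP monitor can act on. A minimal sketch of that decision follows, assuming the endpoint answers 200 when the sync is fresh and 503 when it is stale; sync_health and the response fields are hypothetical, and wiring it into a web framework is left out.

```python
from datetime import datetime, timedelta, timezone


def sync_health(last_sync_iso, max_age, now):
    """Return (http_status, body) for a last-sync health check.

    last_sync_iso is an ISO 8601 string such as the one produced by
    datetime.isoformat(); naive timestamps are treated as UTC.
    """
    last_sync = datetime.fromisoformat(last_sync_iso)
    if last_sync.tzinfo is None:
        last_sync = last_sync.replace(tzinfo=timezone.utc)
    age = now - last_sync
    status = 200 if age <= max_age else 503
    body = {
        "last_sync": last_sync.isoformat(),
        "age_seconds": int(age.total_seconds()),
    }
    return status, body


now = datetime(2026, 1, 1, 3, 0, tzinfo=timezone.utc)
print(sync_health("2026-01-01T02:00:00+00:00", timedelta(hours=2), now)[0])     # 200
print(sync_health("2026-01-01T02:00:00+00:00", timedelta(minutes=30), now)[0])  # 503
```

A job that runs on schedule but writes nothing new keeps the heartbeat happy, and this second check is the one that notices.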

Cleanup and Maintenance Jobs

For jobs that clean up temporary files, rotate logs, or archive old records:

#!/bin/bash
set -euo pipefail

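# Clean up temp files older than 7 days (files only, so directories stay in place)
find /tmp/app-uploads -type f -mtime +7 -delete

# Compress logs older than 30 days, skipping files that are already gzipped.
# "-exec ... +" makes find exit non-zero if gzip fails, so set -e catches it.
find /var/log/myapp -type f -mtime +30 ! -name '*.gz' -exec gzip {} +

# Report success to Vigilmon
curl -fsS --retry 3 "https://hb.vigilmon.online/ping/your-cleanup-monitor-id" > /dev/null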

Beyond Cron: What Heartbeat Monitoring Catches

Heartbeat monitors are useful beyond traditional cron jobs:

Kubernetes CronJobs: The same ping pattern works in containerized jobs. Add the curl ping to your job's entrypoint script and create a heartbeat monitor per Kubernetes CronJob.

GitHub Actions scheduled workflows: Add a final step to your scheduled workflows that pings Vigilmon on success. This catches cases where the GitHub Actions scheduler silently skips runs due to repository inactivity or platform issues.

Celery beat scheduled tasks: Add a Vigilmon ping as the final operation in your Celery beat task functions.

Serverless scheduled functions (AWS Lambda, Cloud Functions, Azure Functions): Call the Vigilmon ping URL from your function before returning. Catches cold start failures, timeout issues, and permission errors that Lambda's own monitoring sometimes surfaces too slowly.
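For Python-based tasks in these environments (a Celery task function or a Lambda handler, for instance), the ping can be factored into a decorator. This is a sketch under assumptions: heartbeat and ping are hypothetical names, the ping uses only the standard library, and the send parameter is there so the network call can be replaced in tests.

```python
import functools
import sys
import urllib.request


def ping(url, timeout=10):
    """Send the heartbeat; urlopen raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def heartbeat(url, send=ping):
    """Decorator: call send(url) after the wrapped function returns normally."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # If func raises, the exception propagates and no ping is sent
            result = func(*args, **kwargs)
            try:
                send(url)
            except Exception as exc:
                # The task succeeded; log the ping problem without failing it
                print(f"heartbeat ping failed: {exc}", file=sys.stderr)
            return result
        return wrapper
    return decorator
```

Applying it looks like @heartbeat("https://hb.vigilmon.online/ping/your-unique-id") above the task function. Swallowing ping errors is a deliberate choice here: a failed ping still leads to an alert from the missed check-in, while the task's own result is preserved.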


Conclusion

Cron jobs are the invisible backbone of most production systems, and a common source of silent failures that go undetected until a business process breaks badly enough for someone to notice. Adding a Vigilmon heartbeat monitor takes under five minutes per job and closes most of that gap: a job that fails to start, crashes, or exits early now triggers an alert instead of silence.

The pattern is simple: create a heartbeat monitor, add one line to your script, and get alerted the next time a scheduled task doesn't complete on time. After running this for a month, most teams discover at least one job they thought was running reliably that wasn't.

Add your first heartbeat monitor for free at vigilmon.online — no credit card required, no agent installation, setup in under two minutes.


Tags: #devops #cron #monitoring #sre #automation #scheduling #reliability
