
Monitoring GitHub Actions CI/CD with Vigilmon: Catch Silent Workflow Failures

Use Vigilmon heartbeat monitors to detect silent CI/CD failures — scheduled workflow drift, stuck jobs, and pipelines that stop running entirely.

Your nightly test suite stopped running three weeks ago. Nobody noticed until a critical regression shipped to production. GitHub Actions had no errors to report — the workflow simply wasn't being triggered anymore. Your email notifications only fire on failures, and you can't fail if you never run.

This is the silent CI/CD failure problem. HTTP monitors can't catch it because there's nothing to ping. The fix is heartbeat monitoring: your workflow pings a unique URL at the end of every successful run, and if Vigilmon doesn't receive that ping within the expected window, you get alerted.

This tutorial shows you how to instrument GitHub Actions workflows with Vigilmon heartbeat monitors so that silent failures are caught before they reach production.

What You'll Cover

  • Heartbeat monitoring for scheduled workflows
  • Catching workflows that stop triggering entirely
  • Alerting on long-running or stuck jobs
  • Multi-environment CI monitoring (staging and production)
  • Detecting scheduled workflow drift

Prerequisites

  • A GitHub repository with Actions workflows
  • A free account at vigilmon.online

The Problem: What Standard CI Monitoring Misses

GitHub Actions has built-in email notifications — but they only fire when a job fails. They won't alert you when:

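  • A schedule trigger stops firing (GitHub can delay or drop scheduled runs during periods of high load, and it automatically disables scheduled workflows in public repositories after 60 days without repository activity)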
  • A workflow is accidentally disabled
  • A required status check is removed from branch protection, so nobody notices CI is skipped
  • A deployment pipeline runs but silently skips the actual deployment step

Vigilmon's heartbeat pattern closes all of these gaps.


Step 1: Create a Heartbeat Monitor in Vigilmon

A heartbeat monitor expects a ping on a regular interval. No ping → alert.

  1. Log in to Vigilmon and click New Monitor → Heartbeat.
  2. Give it a name like Nightly Test Suite — main.
  3. Set the Expected interval to match your workflow schedule. For a nightly cron, use 24 hours. For hourly CI, use 90 minutes (adds a 50% buffer).
  4. Set the Grace period to 30 minutes for short intervals, or 1 hour for daily jobs. This prevents false alerts from slight schedule drift.
  5. Save and copy the Ping URL — it looks like https://vigilmon.online/api/heartbeat/<unique-id>.
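Before wiring the URL into a workflow, it can help to send one ping by hand and confirm that the monitor registers it. The following is a minimal sketch, assuming the endpoint accepts a bare POST in the same way the workflow steps in this tutorial use it. `send_heartbeat` is an illustrative helper name, not part of any Vigilmon tooling, and the placeholder URL must be replaced with your own.

```shell
#!/usr/bin/env bash
# send_heartbeat is a hypothetical helper (not a Vigilmon tool).
# It sends one POST to the given ping URL and returns curl's exit status.
send_heartbeat() {
  local url="$1"
  if [ -z "$url" ]; then
    echo "usage: send_heartbeat <ping-url>" >&2
    return 2
  fi
  # -f: treat HTTP errors as failures; -sS: stay quiet but still print errors
  curl -fsS -X POST "$url" --max-time 10 --retry 3 --retry-delay 2 -o /dev/null
}

# Example (replace the placeholder with the Ping URL copied in step 5):
# send_heartbeat "https://vigilmon.online/api/heartbeat/<unique-id>" && echo "ping accepted"
```

If the call succeeds, the monitor should show a recent ping in the Vigilmon dashboard.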

Step 2: Add the Heartbeat Ping to a Scheduled Workflow

Here's a complete example of a nightly test workflow with Vigilmon instrumentation:

# .github/workflows/nightly-tests.yml
name: Nightly Tests

on:
  schedule:
    - cron: "0 2 * * *"  # 02:00 UTC every night
  workflow_dispatch:      # manual trigger for testing

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test

      - name: Ping Vigilmon heartbeat
        if: success()
        run: |
          curl -fsS -X POST "${{ secrets.VIGILMON_NIGHTLY_HEARTBEAT_URL }}" \
            --max-time 10 \
            --retry 3 \
            --retry-delay 2
        # Heartbeat is only sent on success.
        # A failed or cancelled job skips this step, triggering a Vigilmon alert
        # after the grace period expires.

Key points about this setup:

  • if: success() — the heartbeat ping is only sent when all previous steps pass. If tests fail, the job fails, and no ping is sent.
  • --retry 3 — transient network issues won't cause a false "missed heartbeat" alert.
  • --max-time 10 — the ping step won't hang and block the runner.
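The same bounding idea can be applied to the test command itself, so a hung test run fails promptly and the heartbeat step is skipped. This is a sketch assuming GNU coreutils `timeout` is available, as it is on Linux runners. `run_bounded` and the 30-minute limit in the usage comment are illustrative choices. GitHub Actions also offers a native `timeout-minutes` setting on jobs and steps, which may be the simpler option.

```shell
#!/usr/bin/env bash
# run_bounded is a hypothetical wrapper: run a command, but give up after a limit.
# GNU `timeout` exits with status 124 when the limit is reached.
run_bounded() {
  local limit="$1"
  shift
  timeout "$limit" "$@"
}

# In a workflow `run:` step this might look like:
# run_bounded 30m npm test
```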

Store the secret

Go to your GitHub repo → Settings → Secrets and variables → Actions → New repository secret:

  • Name: VIGILMON_NIGHTLY_HEARTBEAT_URL
  • Value: the ping URL from Step 1
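The same secret can be created from a terminal with the GitHub CLI instead of the web UI. This is a sketch assuming `gh` is installed and authenticated against the repository. `store_heartbeat_secret` is an illustrative wrapper name.

```shell
#!/usr/bin/env bash
# store_heartbeat_secret is a hypothetical wrapper around `gh secret set`.
# Assumes the GitHub CLI is installed and authenticated for the current repository.
store_heartbeat_secret() {
  local name="$1" url="$2"
  if [ -z "$name" ] || [ -z "$url" ]; then
    echo "usage: store_heartbeat_secret <secret-name> <ping-url>" >&2
    return 2
  fi
  gh secret set "$name" --body "$url"
}

# Example (replace the placeholder with the Ping URL from Step 1):
# store_heartbeat_secret VIGILMON_NIGHTLY_HEARTBEAT_URL "https://vigilmon.online/api/heartbeat/<unique-id>"
```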

Step 3: Monitor Your Deployment Pipeline

Heartbeats are even more valuable for deployment pipelines than for tests — a silently stuck deploy leaves production stale without any error to page you.

# .github/workflows/deploy-production.yml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production

    steps:
      - uses: actions/checkout@v4

      - name: Build
        run: npm run build

      - name: Deploy
        run: |
          # your deployment command here
          ./scripts/deploy.sh production

      - name: Run smoke tests
        run: npm run test:smoke

      - name: Ping Vigilmon deployment heartbeat
        if: success()
        env:
          HEARTBEAT_URL: ${{ secrets.VIGILMON_DEPLOY_PROD_HEARTBEAT_URL }}
        run: |
          curl -fsS -X POST "$HEARTBEAT_URL" \
            --max-time 10 \
            --retry 3

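Create a separate heartbeat monitor for this workflow with an interval of 48 hours (or however often you expect to deploy). If no successful deployment reaches production within that window, Vigilmon pages you. Because this workflow only runs on pushes to main, a stretch with no merges also trips the monitor, so choose an interval longer than your normal gap between releases.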

This is useful for catching deploy freezes — situations where commits pile up on main but deployments silently stop going out.


Step 4: Detect Scheduled Workflow Drift

GitHub's schedule trigger has known limitations: scheduled runs can be delayed, or dropped entirely, during periods of high load on GitHub Actions, and in public repositories GitHub automatically disables scheduled workflows after 60 days without repository activity.

Vigilmon's heartbeat monitor catches this automatically — if the cron misses a run for any reason, no ping arrives, and you get alerted.

To make each ping easier to trace back to a specific run, add run metadata (workflow name, run ID, and commit SHA) to the heartbeat payload:

- name: Ping Vigilmon heartbeat with metadata
  if: success()
  run: |
    curl -fsS -X POST "${{ secrets.VIGILMON_NIGHTLY_HEARTBEAT_URL }}" \
      -H "Content-Type: application/json" \
      -d "{\"workflow\": \"${{ github.workflow }}\", \"run_id\": \"${{ github.run_id }}\", \"sha\": \"${{ github.sha }}\"}" \
      --max-time 10 \
      --retry 3

Vigilmon accepts the body but doesn't require it; it only checks whether the ping arrived before the expected interval and grace period ran out.
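A workflow name containing quotes or backslashes would break hand-written JSON. One way to avoid that is to build the payload with `jq`, reading the values from the runner's default environment variables rather than interpolating expressions into the script. This is a sketch assuming `jq` is available (it is preinstalled on GitHub-hosted Ubuntu runners) and that `HEARTBEAT_URL` is supplied through the step's `env:` as in Step 3. `build_heartbeat_payload` is an illustrative name.

```shell
#!/usr/bin/env bash
# build_heartbeat_payload is a hypothetical helper. It uses jq to build the JSON
# so that special characters in any value are escaped correctly.
build_heartbeat_payload() {
  local workflow="$1" run_id="$2" sha="$3"
  jq -cn --arg workflow "$workflow" --arg run_id "$run_id" --arg sha "$sha" \
    '{workflow: $workflow, run_id: $run_id, sha: $sha}'
}

# In a workflow step, GITHUB_WORKFLOW, GITHUB_RUN_ID and GITHUB_SHA are set by the runner:
# build_heartbeat_payload "$GITHUB_WORKFLOW" "$GITHUB_RUN_ID" "$GITHUB_SHA" |
#   curl -fsS -X POST "$HEARTBEAT_URL" -H "Content-Type: application/json" \
#     -d @- --max-time 10 --retry 3
```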


Step 5: Multi-Environment Monitoring

For teams with staging and production pipelines, create one heartbeat monitor per environment per workflow:

| Workflow | Environment | Monitor name | Interval |
|---|---|---|---|
| Deploy workflow | Production | Deploy → Production | 48 h |
| Deploy workflow | Staging | Deploy → Staging | 24 h |
| Nightly tests | main | Nightly Tests — main | 25 h |
| Weekly security scan | — | Weekly Security Scan | 8 days |

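Use separate secrets per environment, named to match the pattern the step looks up (here VIGILMON_DEPLOY_PROD_HEARTBEAT_URL and VIGILMON_DEPLOY_STAGING_HEARTBEAT_URL):

- name: Ping heartbeat
  if: success()
  env:
    # PROD for pushes to main, STAGING for every other ref
    ENVIRONMENT: ${{ github.ref == 'refs/heads/main' && 'PROD' || 'STAGING' }}
  run: |
    curl -fsS -X POST "${{ secrets[format('VIGILMON_DEPLOY_{0}_HEARTBEAT_URL', env.ENVIRONMENT)] }}" \
      --max-time 10 \
      --retry 3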

Step 6: Alert Channels

In Vigilmon, go to Notifications → New Channel and configure:

  • Email — immediate alert when a heartbeat is missed
  • Slack webhook — ping your #alerts or #ci-cd channel

When a workflow stops pinging:

🔴 MISSED HEARTBEAT: Nightly Tests — main
Last ping: 26 hours ago
Expected interval: 24 hours

When it resumes:

✅ HEARTBEAT RECOVERED: Nightly Tests — main
Gap: 26 hours
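The numbers in the example alert follow from simple arithmetic. The sketch below assumes the alert fires once the time since the last ping exceeds the expected interval plus the grace period; that is how this tutorial describes the behavior, not a documented Vigilmon formula. The function names are illustrative.

```shell
#!/usr/bin/env bash
# alert_deadline: epoch seconds at which a monitor would count as missed,
# ASSUMING the rule "last ping + expected interval + grace period".
alert_deadline() {
  local last_ping_epoch="$1" interval_min="$2" grace_min="$3"
  echo $(( last_ping_epoch + (interval_min + grace_min) * 60 ))
}

# is_overdue: succeeds (exit status 0) when "now" is past the deadline.
is_overdue() {
  local now_epoch="$1" deadline_epoch="$2"
  [ "$now_epoch" -gt "$deadline_epoch" ]
}

# Example: last ping at epoch 0, 24 h interval, 1 h grace, checked 26 h later.
deadline=$(alert_deadline 0 1440 60)
if is_overdue $(( 26 * 3600 )) "$deadline"; then
  echo "missed heartbeat"
fi
```

With a 24-hour interval and a 1-hour grace period the deadline falls 25 hours after the last ping, so a check at 26 hours reports a missed heartbeat.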

Step 7: Protect Against Accidental Workflow Disabling

One more failure mode: a developer accidentally disables a workflow in the GitHub Actions UI (the Disable workflow button is easy to click). Since the workflow never runs, no ping arrives, and Vigilmon alerts you once the expected interval and grace period have passed.

No code change needed — the heartbeat monitor already covers this case.
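Once the alert fires, the GitHub CLI can help confirm whether a disabled workflow is the cause and switch it back on. This is a sketch assuming `gh` is installed and authenticated. The helper names are illustrative, and the `grep` filter is a rough match on the state column, so a workflow whose name contains "disabled" would also be listed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the GitHub CLI; assumes `gh` is authenticated
# for the current repository.

# Print workflows whose state is disabled (for example disabled_manually or
# disabled_inactivity). Returns non-zero when nothing matches.
list_disabled_workflows() {
  gh workflow list --all | grep -i 'disabled'
}

# Re-enable a workflow by name, ID, or file name.
reenable_workflow() {
  local workflow="$1"
  if [ -z "$workflow" ]; then
    echo "usage: reenable_workflow <name|id|file>" >&2
    return 2
  fi
  gh workflow enable "$workflow"
}

# Example:
# list_disabled_workflows
# reenable_workflow nightly-tests.yml
```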


Complete Workflow Template

Here's a reusable template you can copy into any workflow:

# Add this step at the end of any job you want heartbeat monitoring on
- name: Ping Vigilmon heartbeat
  if: success()
  run: |
    curl -fsS -X POST "${{ secrets.VIGILMON_HEARTBEAT_URL }}" \
      --max-time 10 \
      --retry 3 \
      --retry-delay 2

Replace VIGILMON_HEARTBEAT_URL with the specific secret name for that workflow's monitor (use one heartbeat monitor per workflow).


What You're Now Catching

| Silent failure mode | How Vigilmon detects it |
|---|---|
| Scheduled cron skipped by GitHub | No ping arrives → alert after grace period |
| Workflow accidentally disabled | No ping arrives → alert after grace period |
| Tests pass but deployment step skipped | No ping on deploy job → alert |
| Workflow hung on a stuck step | Job timeout → no ping → alert |
| Branch protection removed, CI bypassed | No ping on missed runs → alert |
| Deploy pipeline stopped shipping | Deploy heartbeat missed → alert |


GitHub Actions is reliable — until it quietly stops working. Vigilmon's heartbeat monitors give you the external signal that GitHub itself can't provide: a definitive alert when your CI/CD pipeline hasn't run in longer than expected.

Add heartbeat monitoring to your CI/CD pipelines today — register free at vigilmon.online.
