
Monitoring Windmill with Vigilmon: Version Endpoint, Web Editor, Worker Liveness & SSL Alerts

How to monitor Windmill self-hosted workflow automation with Vigilmon — version health endpoint, web editor availability, worker liveness via API, and SSL certificate alerts for automation infrastructure.

Windmill is a self-hosted, open-source workflow automation platform and script runner (an alternative to n8n and Temporal) that lets engineering teams build and deploy scripts, flows, and apps directly from a web editor, with scheduled or event-triggered execution on managed workers. Teams choose Windmill to run data pipelines, automate internal operations, and orchestrate multi-step jobs without maintaining custom job runners.

When Windmill goes down, every scheduled workflow and event-triggered script stops executing: data pipelines miss their runs, operational automations fail silently, and the web editor becomes inaccessible to engineers who need to debug or deploy scripts. Windmill's worker fleet executes jobs asynchronously, so the server can appear healthy while workers are starved or crashed, leaving jobs pending indefinitely. Vigilmon gives you external visibility into Windmill's version endpoint, web editor, jobs API, and SSL certificate, with HTTP checks running every 60 seconds.

What You'll Build

  • A monitor on Windmill's /api/version health endpoint
  • An HTTP monitor for the Windmill web editor
  • An HTTP monitor for the jobs API liveness check (/api/w/{workspace}/jobs/ returning 401 confirms the API is alive and enforcing authentication; worker health itself needs the heartbeat described in Step 4)
  • SSL certificate monitoring for your Windmill domain
  • An alerting setup tuned for automation pipeline criticality

Prerequisites

  • A running Windmill instance with a public or network-reachable domain
  • HTTPS configured (e.g., https://windmill.example.com)
  • A workspace already created in Windmill (e.g., ops or main)
  • A free account at vigilmon.online

Step 1: Verify Windmill's Version Endpoint

Windmill exposes a lightweight health-like endpoint at /api/version that returns the running server version:

curl -i https://windmill.example.com/api/version

A healthy instance returns HTTP 200 with a plain-text or JSON version string:

"1.xx.x"

This endpoint requires no authentication and confirms that the Windmill API server (written in Rust/Axum) is running and accepting requests. A non-200 response or timeout indicates the server process has crashed, the database is unreachable, or the container is restarting.
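
The same check can be scripted for ad-hoc verification before you create the monitor. This is a minimal sketch, not part of Windmill or Vigilmon: the function names are illustrative, and the 15-second limit is an assumed timeout you can adjust.

```shell
# Classify the HTTP status code returned by /api/version.
# curl reports "000" when no HTTP response arrived at all
# (timeout, refused connection, DNS or TLS failure).
classify_version_status() {
  case "$1" in
    200) echo "healthy" ;;
    000) echo "unreachable" ;;
    *)   echo "unhealthy" ;;
  esac
}

# Fetch only the status code for <base-url>/api/version and classify it.
# "|| true" keeps the script going when curl itself exits non-zero,
# so the "000" status can still be classified.
check_version() {
  status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$1/api/version") || true
  classify_version_status "$status"
}

# Example (placeholder domain):
# check_version "https://windmill.example.com"
```

Separating classification from the network call keeps the decision logic easy to read and to reuse for the other endpoints.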

PostgreSQL dependency: Windmill stores all scripts, flows, schedules, job queues, audit logs, variables, and resource definitions in PostgreSQL, and the server must reach the database to start and become ready. Treat a failing version endpoint as a full outage: job scheduling, script execution, flow orchestration, and the web editor are all unavailable until it recovers.


Step 2: Create a Vigilmon HTTP Monitor for the Version Endpoint

  1. Log in to Vigilmon and go to Add Monitor → HTTP.
  2. URL: https://windmill.example.com/api/version.
  3. Check interval: 60 seconds.
  4. Response timeout: 15 seconds.
  5. Expected status: 200.
  6. Label: Windmill Version.
  7. Click Save.

This monitor catches:

  • Windmill API server crashes or process failures
  • PostgreSQL connectivity failures — all job scheduling, script storage, and flow definitions are in PostgreSQL; a database outage makes Windmill entirely non-functional
  • Container restart loops triggered by misconfigured environment variables (DATABASE_URL, BASE_URL, JWT_SECRET)
  • Failed database migrations after a Windmill version upgrade that prevent the server from starting

Rust/Axum server reliability: Windmill's API server is built in Rust on the Axum framework, which keeps its memory footprint small and makes crashes in the server process itself uncommon. When the version endpoint is unreachable, the more likely cause is a database connectivity problem or a misconfigured environment variable rather than memory exhaustion in the server. Investigate PostgreSQL first when this alert fires.


Step 3: Monitor the Windmill Web Editor

The Windmill web editor is where engineers write scripts, build flows, test automations, and monitor job execution. Monitor it independently from the API to catch reverse proxy failures and frontend asset serving problems:

  1. Add Monitor → HTTP.
  2. URL: https://windmill.example.com.
  3. Check interval: 60 seconds.
  4. Expected status: 200.
  5. Keyword: Windmill.
  6. Label: Windmill Web Editor.
  7. Click Save.

This monitor catches nginx or reverse proxy failures, CDN misconfiguration, and frontend asset serving errors that prevent engineers from accessing the editor — even when the backend API is responding. A broken editor means engineers cannot inspect failed jobs, modify broken scripts, or deploy new automations during an incident.
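
To see what the status-plus-keyword check amounts to, here is a rough shell equivalent. It is a sketch under assumptions: the function names are made up for illustration, the keyword match is case-sensitive, and the 15-second timeout is arbitrary. Vigilmon's own matching rules may differ.

```shell
# Succeeds when the text on stdin contains the keyword as a fixed string.
body_has_keyword() {
  grep -q -F -- "$1"
}

# Fetch the page and require both a successful response and the keyword.
# -f makes curl exit non-zero on HTTP status 400 and above.
check_editor() {
  body=$(curl -fsS --max-time 15 "$1") || {
    echo "down: no successful response"
    return 1
  }
  if printf '%s' "$body" | body_has_keyword "$2"; then
    echo "up"
  else
    echo "down: keyword missing"
    return 1
  fi
}

# Example (placeholder domain):
# check_editor "https://windmill.example.com" "Windmill"
```

The keyword step matters because a proxy error page or an empty shell page can still come back with a 200.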


Step 4: Monitor Worker Liveness via the Jobs API

Windmill workers are separate processes that pull jobs from the PostgreSQL queue and execute scripts. The server can be fully healthy — version endpoint returning 200, web editor loading — while all workers are crashed or unresponsive, leaving every scheduled and triggered job in a perpetual queued state.

The jobs API path at /api/w/{workspace}/jobs/ is a useful liveness check for the job-facing part of the Windmill API. Calling it without a valid token returns 401 Unauthorized, which confirms the API is alive and enforcing authentication:

curl -i https://windmill.example.com/api/w/ops/jobs/
# Expected: HTTP 401 (API is alive, authentication is enforced)

Replace ops with your actual workspace name (e.g., main, default, or whatever you named your workspace during setup).

  1. Add Monitor → HTTP.
  2. URL: https://windmill.example.com/api/w/ops/jobs/.
  3. Check interval: 60 seconds.
  4. Expected status: 401.
  5. Label: Windmill Jobs API.
  6. Click Save.

A 401 is the correct liveness signal: it proves the API server accepted the connection, ran the authentication middleware, and returned a proper HTTP response — meaning the Windmill server can route and handle job-related requests. A 404 may mean the workspace name is wrong or the workspace was deleted. A 502 or 504 means the reverse proxy is running but the Windmill process is not responding.
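
The status codes discussed above can be summarized as a small lookup, which is handy in a runbook or a manual triage script. This is an illustrative sketch: the function names and message wording are not part of Windmill or Vigilmon.

```shell
# Map a status code from /api/w/<workspace>/jobs/ to a triage hint.
interpret_jobs_status() {
  case "$1" in
    401)     echo "api alive, authentication enforced" ;;
    404)     echo "check workspace name or whether the workspace was deleted" ;;
    502|504) echo "reverse proxy up, windmill process not responding" ;;
    000)     echo "no http response" ;;
    *)       echo "unexpected status" ;;
  esac
}

# Fetch the status code for a workspace and interpret it.
# $1 is the base URL, $2 is the workspace name.
jobs_status() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$1/api/w/$2/jobs/") || true
  interpret_jobs_status "$code"
}

# Example (placeholder domain and workspace):
# jobs_status "https://windmill.example.com" "ops"
```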

Worker starvation is silent: When Windmill workers crash, the API and web editor continue working normally. Jobs are submitted, queued, and visible in the dashboard, but they never execute. The only observable symptom is jobs accumulating in queued status with no completed_at timestamp. To cover this gap, schedule a small test script in Windmill (under Schedules in your workspace) that sends a heartbeat to Vigilmon after it runs successfully. This gives you an end-to-end worker health signal that the Jobs API liveness check cannot provide.
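
A heartbeat job of this kind could look roughly like the Bash script below. Everything specific here is an assumption: HEARTBEAT_URL is a hypothetical placeholder (use whatever heartbeat URL your Vigilmon monitor provides), and the function names are illustrative.

```shell
# Hypothetical placeholder: replace with the heartbeat URL from your monitor.
HEARTBEAT_URL="${HEARTBEAT_URL:-https://example.invalid/heartbeat/REPLACE_ME}"

# Send the heartbeat. -f makes curl exit non-zero on HTTP errors.
ping_heartbeat() {
  curl -fsS --max-time 10 -o /dev/null "$HEARTBEAT_URL"
}

# Run the given command and ping only if it succeeded, so a missing
# heartbeat means the job did not run or did not finish cleanly.
run_then_ping() {
  "$@" && ping_heartbeat
}

# Example body for the scheduled script (any cheap command works as the test job):
# run_then_ping true
```

Because the ping only fires after a worker has actually picked up and completed the job, a gap in heartbeats points at the worker fleet rather than the API server.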


Step 5: Monitor SSL Certificates

An expired SSL certificate on your Windmill instance breaks automation infrastructure in cascading ways:

  • The web editor becomes inaccessible to all engineers
  • External webhook triggers that POST to Windmill flows fail with TLS errors
  • Any scripts inside Windmill that call the Windmill API itself (meta-automations) fail
  • CI/CD pipelines that trigger Windmill jobs via the REST API fail with certificate errors
  • Windmill webhooks sent to external services through HTTPS may fail if the server-side certificate is involved in mutual TLS

  1. Add Monitor → SSL Certificate.
  2. Domain: windmill.example.com.
  3. Alert when expiry is within: 30 days.
  4. Alert again: 14 days, 7 days, 3 days, 1 day.
  5. Click Save.
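
For a manual spot check of the same thresholds, the sketch below uses the openssl command-line tool. The function names are illustrative, and the tier logic only mirrors the 30/14/7/3/1-day schedule configured above. It is not how Vigilmon computes its alerts.

```shell
# Exit 0 if the certificate served by host $1 is still valid $2 days from now.
# Non-zero means it expires sooner or the certificate could not be fetched.
cert_valid_beyond() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -checkend "$(( $2 * 86400 ))" >/dev/null 2>&1
}

# Given the number of days left, print the tightest alert threshold reached
# (1, 3, 7, 14, or 30), or "none" if expiry is more than 30 days away.
alert_tier() {
  days=$1
  for t in 1 3 7 14 30; do
    if [ "$days" -le "$t" ]; then
      echo "$t"
      return 0
    fi
  done
  echo "none"
}

# Example (placeholder domain):
# cert_valid_beyond windmill.example.com 30 || echo "renew soon"
```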

Step 6: Configure Alerting

In Vigilmon under Settings → Notifications, configure your alert channels:

| Monitor | Trigger | Action |
|---|---|---|
| /api/version | Non-200 response | Check Windmill server container; inspect PostgreSQL connectivity; review server logs |
| Web Editor | Non-200 or keyword missing | Check nginx/reverse proxy; verify frontend asset serving; inspect container logs |
| Jobs API | Non-401 response | Check Windmill server process; verify workspace exists; inspect API routing |
| SSL certificate | < 30 days to expiry | Renew certificate; verify Let's Encrypt auto-renewal is functioning |

Alert after: 2 consecutive failures for HTTP monitors. For the Jobs API monitor, treat even a single non-401 failure seriously — it may indicate the Windmill server has crashed between scheduled job executions.
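
To make the "consecutive failures" rule concrete, here is a small sketch of the logic. It only illustrates the rule and says nothing about how Vigilmon implements it; the function name and the ok/fail labels are made up for the example.

```shell
# should_alert <threshold> <result>...
# Results are listed oldest to newest, each "ok" or "fail".
# Succeeds when the most recent results form a failure streak
# at least <threshold> long.
should_alert() {
  threshold=$1
  shift
  streak=0
  for r in "$@"; do
    if [ "$r" = "fail" ]; then
      streak=$((streak + 1))
    else
      streak=0
    fi
  done
  [ "$streak" -ge "$threshold" ]
}

# Example: two failures in a row trigger an alert at threshold 2.
# should_alert 2 ok fail fail && echo "alert"
```

With a 60-second interval, a threshold of 2 means an alert fires roughly one to two minutes after the outage begins, while a threshold of 1 trades more noise for faster paging.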

Escalation for automation teams: Route Windmill alerts to your platform or SRE on-call channel. A Windmill failure during off-hours means every scheduled overnight job — data pipelines, report generation, cleanup tasks — will fail silently until the server is restored. The sooner an engineer is paged, the smaller the data gap.


Common Windmill Failure Modes and What Vigilmon Catches

| Scenario | Vigilmon signal |
|---|---|
| Windmill API server crash | Version endpoint unreachable; alert within 60 s |
| PostgreSQL down or unreachable | Version endpoint fails; all scheduling and execution stops |
| PostgreSQL disk full from job log retention | Version endpoint may pass; jobs fail on insert; workers log errors |
| Worker containers crashed (OOM or panic) | Jobs API returns 401 (server healthy); jobs queue but never execute |
| Worker fleet scaled to zero | Jobs API returns 401 (server healthy); cron heartbeat stops |
| Reverse proxy misconfiguration after update | Web editor monitor fires; version endpoint may still pass |
| Frontend asset serving failure | Web editor keyword check fails; blank screen on load |
| SSL certificate expires | TLS errors across all API callers; webhooks fail; editor inaccessible |
| DNS misconfiguration | All monitors fire simultaneously |
| Failed database migration after version upgrade | Server fails to start; version endpoint returns 500 or is unreachable |
| Workspace accidentally deleted | Jobs API returns 404 for workspace-scoped routes; jobs cannot be triggered |
| JWT secret rotation (missing env var update) | Workers cannot authenticate to server; jobs queued but never claimed |


Automation infrastructure is only valuable when it runs reliably: a silent Windmill failure means missed pipeline runs and stale data that compounds until someone notices the symptom, not the cause. Vigilmon watches Windmill's version endpoint, web editor, jobs API, and SSL certificate, with HTTP checks every 60 seconds, so you hear about an outage before a missed midnight job becomes a morning data incident.

Start monitoring Windmill in under 5 minutes — register free at vigilmon.online.
