Monitoring HashiCorp Vault with Vigilmon: Health Endpoints, Seal Status, HA Detection & SSL Alerts

HashiCorp Vault is the secrets backbone for many production environments — storing TLS certificates, API keys, database credentials, and encryption keys for every service that runs in your infrastructure. When Vault seals itself due to a quorum loss or operator error, every service that depends on dynamic secrets or encryption stops working. When the HA cluster loses its active node, secret reads and writes fail until a standby promotes. Vigilmon gives you external visibility into Vault's health: the built-in health endpoint, seal status, HA active-node detection, and SSL certificate expiry — so you know before your services do.

What You'll Build

A monitor on Vault's /v1/sys/health endpoint to detect sealed and uninitialized states
A seal-status check that distinguishes sealed Vault from a downed process
An HA standby detection monitor that confirms an active leader is available
SSL certificate monitoring for your Vault domain
An alerting setup that routes seal events and HA failovers differently from process crashes

Prerequisites

A running HashiCorp Vault 1.9+ instance with a public or network-reachable domain
HTTPS configured (e.g., https://vault.example.com)
A free account at vigilmon.online

Step 1: Understand Vault's Health Endpoint

Vault's /v1/sys/health endpoint returns different HTTP status codes depending on node state — not just 200 vs. non-200. Understanding these codes is essential for correct monitoring:

curl -s -o /dev/null -w "%{http_code}" https://vault.example.com/v1/sys/health

| HTTP status | Meaning |
|---|---|
| 200 | Initialized, unsealed, active |
| 429 | Unsealed, standby (HA) |
| 472 | DR secondary (replication) |
| 473 | Performance standby |
| 501 | Not initialized |
| 503 | Sealed (an unreachable node returns no status at all) |

For most deployments, 200 means "healthy and serving secrets." A 503 response means Vault is sealed — secrets cannot be read or written, and every dependent service is affected.
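For ad hoc checks from a terminal, the status table can be turned into a small lookup. The following is a minimal sketch: vault_state is a hypothetical helper name, not a Vault or Vigilmon command, and the 000 case relies on curl printing 000 for %{http_code} when it received no HTTP response.

```shell
#!/bin/sh
# Map a /v1/sys/health HTTP status code to a readable node state.
# vault_state is a hypothetical helper name, not a Vault command.
vault_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    472) echo "dr-secondary" ;;
    473) echo "performance-standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    000) echo "unreachable" ;;   # curl reports 000 when no response arrived
    *)   echo "unknown" ;;
  esac
}

# Example:
#   vault_state "$(curl -s -o /dev/null -w '%{http_code}' https://vault.example.com/v1/sys/health)"
```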

Check the full JSON body to inspect the state:

curl https://vault.example.com/v1/sys/health

{
  "initialized": true,
  "sealed": false,
  "standby": false,
  "performance_standby": false,
  "replication_performance_mode": "disabled",
  "replication_dr_mode": "disabled",
  "server_time_utc": 1719700000,
  "version": "1.16.0",
  "cluster_name": "vault-cluster",
  "cluster_id": "abc-123"
}

Step 2: Create a Vigilmon HTTP Monitor for the Health Endpoint

Log in to Vigilmon → Add Monitor → HTTP.
URL: https://vault.example.com/v1/sys/health.
Check interval: 60 seconds.
Response timeout: 10 seconds.
Expected status: 200.
Keyword: "sealed":false (confirms Vault is unsealed; the 200 status confirms the node is active).
Click Save.

This monitor fires when:

The Vault process crashes or the host is unreachable (connection error instead of 503)
Vault seals itself (returns 503 instead of 200)
Vault is not yet initialized (returns 501)
An HA standby is mistakenly contacted as the active node

Alert sensitivity: Set to trigger after 1 consecutive failure. A sealed Vault means zero secret access — this is a P1 incident for any service that depends on dynamic credentials.
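To reason about when this monitor passes or fails, it can help to reproduce the same two conditions locally. This is only an approximation of the check described above, not Vigilmon's implementation: health_check_passes is a hypothetical name, and stripping whitespace before matching is an assumption made here so that both compact and pretty-printed JSON match the keyword.

```shell
#!/bin/sh
# Pass (exit 0) only when the status is 200 AND the body contains the
# keyword "sealed":false. Whitespace is removed first so the match works
# on compact and pretty-printed JSON alike.
# health_check_passes is a hypothetical helper name.
health_check_passes() {
  code="$1"
  body="$2"
  [ "$code" = "200" ] || return 1
  printf '%s' "$body" | tr -d '[:space:]' | grep -qF '"sealed":false'
}

# Example:
#   body="$(curl -s https://vault.example.com/v1/sys/health)"
#   code="$(curl -s -o /dev/null -w '%{http_code}' https://vault.example.com/v1/sys/health)"
#   health_check_passes "$code" "$body" && echo "healthy" || echo "ALERT"
```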

Step 3: Monitor Seal Status Separately

The health monitor from Step 2 already catches a sealed Vault through the status code. A second, dedicated check on the JSON body gives you a separately labeled alert that is easier to tell apart in your notification channel:

Add Monitor → HTTP.
URL: https://vault.example.com/v1/sys/health.
Check interval: 60 seconds.
Expected status: 200 (a running Vault still answers on this endpoint when sealed, but with 503 rather than 200).
Keyword: "initialized":true (if this becomes false, Vault no longer finds initialized data in its storage backend).
Label: Vault seal status.
Click Save.

Why two monitors? The first monitor (expected status 200) alerts when Vault is sealed (returns 503). This second monitor (keyword check) catches edge cases where the process responds but the cluster state is unexpected, such as after a storage backend recovery that leaves Vault reporting itself as uninitialized.
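Because both monitors expect 200, a sealed Vault trips both of them. One possible variation, not the setup described above, is to use the sealedcode and uninitcode query parameters that Vault's health endpoint accepts to override the status code returned for sealed and uninitialized nodes. With both set to 200, a reachable but sealed Vault answers 200 with "sealed":true in the body, while a crashed process still produces a connection error, so a keyword check on "sealed":false isolates seal events. The helper name seal_probe_url is hypothetical.

```shell
#!/bin/sh
# Build a health URL that returns 200 for sealed and uninitialized nodes,
# so the JSON body (not the status code) carries the state.
# seal_probe_url is a hypothetical helper; sealedcode and uninitcode are
# query parameters of Vault's /v1/sys/health endpoint.
seal_probe_url() {
  base="${1%/}"   # drop one trailing slash, if present
  printf '%s/v1/sys/health?sealedcode=200&uninitcode=200\n' "$base"
}

# Example:
#   curl -s "$(seal_probe_url https://vault.example.com)"
```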

Step 4: Detect HA Active-Node Availability

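In an HA Vault cluster, only the active node services requests. Standby nodes do not answer reads or writes themselves; they forward or redirect client requests to the active node (Enterprise performance standbys, which serve reads locally, are the exception). If the active node goes down and no standby is promoted, the cluster cannot serve secrets at all. Monitor specifically for an active node: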

Add Monitor → HTTP.
URL: https://vault.example.com/v1/sys/health.
Check interval: 60 seconds.
Expected status: 200 (standby returns 429, active returns 200).
Keyword: "standby":false (confirms this node is active, not a standby).
Label: Vault HA active node.
Click Save.

When this monitor fires with a 429 status, your load balancer is pointing at a standby instead of the active node — check your load balancer health check configuration and Vault's active-node advertisement.

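Load balancer tip: Load balancers in front of Vault typically find the active node by health-checking /v1/sys/health on every node. Make sure the health check treats 200 as healthy and 429 as unhealthy so that traffic only reaches the active node. The /v1/sys/leader endpoint reports the current leader's address, but it does not redirect requests.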
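When the HA monitor reports 429, it can be useful to probe each node directly and see which one, if any, currently answers 200. A minimal sketch, assuming you can reach the nodes individually; active_nodes is a hypothetical helper name and the hostnames in the example are placeholders.

```shell
#!/bin/sh
# Read "node status_code" pairs on stdin and print the nodes whose health
# endpoint returned 200, i.e. the active node.
# active_nodes is a hypothetical helper name.
active_nodes() {
  while read -r node code; do
    [ "$code" = "200" ] && echo "$node"
  done
  return 0
}

# Example (placeholder hostnames; requires direct access to each node):
#   for n in vault-1.example.com vault-2.example.com; do
#     echo "$n $(curl -s -o /dev/null -w '%{http_code}' "https://$n/v1/sys/health")"
#   done | active_nodes
```

An empty result means no node is reporting itself as active, which points at a failover in progress rather than a load balancer routing problem.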

Step 5: Monitor SSL Certificates

Vault's TLS certificate is especially critical: every Vault client (the Vault agent, the SDK, CLI, and every service using dynamic secrets) validates the server certificate. An expired certificate causes all secret reads and writes to fail with TLS errors — often with cryptic messages in application logs.

Add Monitor → SSL Certificate.
Domain: vault.example.com.
Alert when expiry is within: 30 days.
Alert again: 14 days, 7 days, 3 days, 1 day.
Click Save.

Vault as its own CA: If Vault issues its own certificates via the PKI secrets engine, the Vault TLS certificate itself may be shorter-lived than typical Let's Encrypt certificates. Check your certificate's actual expiry with openssl s_client -connect vault.example.com:443 -servername vault.example.com < /dev/null 2>/dev/null | openssl x509 -noout -dates and set your alert threshold accordingly.
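The alert schedule in this step (30, 14, 7, 3, and 1 days) is simple arithmetic on the certificate's expiry time. The sketch below shows that arithmetic only; days_left and alert_tier are hypothetical helper names, and the commented example for obtaining the expiry timestamp assumes GNU date.

```shell
#!/bin/sh
# Whole days between two Unix timestamps: expiry first, then "now".
days_left() {
  echo $(( ($1 - $2) / 86400 ))
}

# Print the tightest alert threshold (in days) that has been crossed,
# using the 30/14/7/3/1 schedule from this step; print "none" otherwise.
# Both function names are hypothetical helpers.
alert_tier() {
  for t in 1 3 7 14 30; do
    if [ "$1" -le "$t" ]; then
      echo "$t"
      return 0
    fi
  done
  echo "none"
}

# Example (assumes GNU date for parsing the notAfter value):
#   end="$(openssl s_client -connect vault.example.com:443 -servername vault.example.com \
#          < /dev/null 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)"
#   alert_tier "$(days_left "$(date -d "$end" +%s)" "$(date +%s)")"
```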

Step 6: Monitor the Vault UI (if Enabled)

If you've enabled the Vault web UI (ui = true in vault.hcl), add a separate monitor for it. The UI is served through the same listener as the API, but its page and static assets can fail to load even when the API is healthy:

curl https://vault.example.com/ui/

Add Monitor → HTTP.
URL: https://vault.example.com/ui/.
Check interval: 5 minutes.
Expected status: 200.
Keyword: Vault (appears in the page title of the Vault UI).
Label: Vault UI.
Click Save.

Step 7: Configure Alerting

In Vigilmon under Settings → Notifications, configure your alert channels. The table below maps each monitor's trigger to a first response:

| Monitor | Trigger | Action |
|---|---|---|
| /v1/sys/health (status 200) | 503 or connection error | Check if Vault is sealed; run vault status; unseal if needed |
| Seal status (keyword) | initialized missing | Storage backend lost; check Raft/Consul backend health |
| HA active node | 429 (standby) or 503 | Active node is down; check if standby auto-promoted |
| SSL certificate | < 30 days to expiry | Renew certificate; check ACME or PKI automation |
| Vault UI | Non-200 or keyword missing | UI rendering issue; check Vault logs |

Alert after: 1 consecutive failure for all Vault monitors — Vault failures propagate immediately to every secret-dependent service.
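The "consecutive failures" threshold counts only the unbroken run of failures at the end of the check history, so a single success resets it. The following sketch illustrates that rule; it is an assumption about how such a threshold is typically evaluated, not Vigilmon's actual implementation, and should_alert is a hypothetical name.

```shell
#!/bin/sh
# Usage: should_alert THRESHOLD result...
# Each result is "ok" or "fail", oldest first. Exit 0 when the most recent
# THRESHOLD results are all failures.
# should_alert is a hypothetical helper name.
should_alert() {
  threshold="$1"
  shift
  streak=0
  for r in "$@"; do
    if [ "$r" = "fail" ]; then
      streak=$((streak + 1))
    else
      streak=0   # any success resets the run
    fi
  done
  [ "$streak" -ge "$threshold" ]
}

# Example: with a threshold of 1, the first failed check already alerts.
#   should_alert 1 ok ok fail && echo "ALERT"
```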

Common Vault Failure Modes and What Vigilmon Catches

| Scenario | Vigilmon monitor |
|---|---|
| Vault process crash | /v1/sys/health connection error; alert within 60 s |
| Vault auto-seals (HSM/KMS issue) | Health endpoint returns 503; sealed keyword check fires |
| HA failover delay | HA monitor returns 429; alert until standby promotes |
| Storage backend (Raft/Consul) down | Vault seals itself; health endpoint returns 503 |
| SSL certificate expires | SSL monitor alerts at 30-day threshold; all clients fail TLS |
| Vault not initialized after restore | Health endpoint returns 501; alert fires |
| Load balancer routing to standby | HA active-node monitor returns 429 while active node exists |
| DNS misconfiguration | All monitors fire simultaneously |

HashiCorp Vault is the kind of dependency that's invisible when healthy and catastrophic when it fails. A sealed Vault silently breaks every service that relies on dynamic secrets — database credentials don't rotate, API keys can't be fetched, and encryption operations fail. Vigilmon gives you layered external visibility into Vault's health status, seal state, HA topology, and certificate expiry so you know the moment something changes and can act before services start failing.

Start monitoring Vault in under 5 minutes — register free at vigilmon.online.