HashiCorp Vault Health Monitoring with Vigilmon

"Learn how to monitor HashiCorp Vault secrets management using Vigilmon — covering the /v1/sys/health endpoint, sealed/unsealed status detection, keyword monitors, and alerting on Vault outages."

HashiCorp Vault is one of the most widely adopted secrets management tools. Applications use it to fetch database passwords, API keys, TLS certificates, and encryption keys at runtime. When Vault goes down or enters a sealed state, every application that depends on dynamic credentials or runtime secret fetching fails to start or loses access to critical resources.

This guide covers how to monitor Vault with Vigilmon, including the health API, sealed state detection, and alerting strategies.


Vault's Health Endpoint

Vault exposes a health endpoint that is uniquely designed for monitoring: it returns different HTTP status codes depending on Vault's state, with no authentication required.

GET https://your-vault.com/v1/sys/health

Status Code Reference

| HTTP Status | Vault State |
|-------------|-------------|
| 200 | Active and unsealed (healthy) |
| 429 | Standby node (HA cluster): reachable but not active |
| 472 | Disaster recovery secondary (Enterprise replication) |
| 473 | Performance standby: readable but not writable |
| 501 | Not initialized |
| 503 | Sealed |

A 503 is the critical state to monitor for. Vault returns 503 when it is sealed — it holds the data but refuses to decrypt or serve anything until unsealed. An unsealing event is either a manual operation or triggered by auto-unseal configuration (AWS KMS, GCP Cloud KMS, Azure Key Vault).
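The status codes can be turned into a small lookup when you post-process check results yourself. The following is a minimal sketch, not part of Vigilmon or Vault: the type and function names are hypothetical, and the mapping follows the default codes in Vault's API documentation, where 472 is a disaster recovery secondary and 473 is a performance standby.

```go
package main

// VaultState is a coarse node state inferred only from the HTTP status code
// that /v1/sys/health returns when called with its default parameters.
type VaultState string

const (
	StateActive             VaultState = "active"
	StateStandby            VaultState = "standby"
	StateDRSecondary        VaultState = "dr-secondary"
	StatePerformanceStandby VaultState = "performance-standby"
	StateUninitialized      VaultState = "uninitialized"
	StateSealed             VaultState = "sealed"
	StateUnknown            VaultState = "unknown"
)

// ClassifyHealthStatus maps a health-endpoint status code to a VaultState.
// Codes outside the documented defaults are reported as StateUnknown.
func ClassifyHealthStatus(code int) VaultState {
	switch code {
	case 200:
		return StateActive
	case 429:
		return StateStandby
	case 472:
		return StateDRSecondary
	case 473:
		return StatePerformanceStandby
	case 501:
		return StateUninitialized
	case 503:
		return StateSealed
	default:
		return StateUnknown
	}
}
```

For example, `ClassifyHealthStatus(503)` yields `StateSealed`, while an unexpected code such as 418 yields `StateUnknown`, which is worth treating as an alert in its own right.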

Response Body

{
  "initialized": true,
  "sealed": false,
  "standby": false,
  "performance_standby": false,
  "replication_performance_mode": "disabled",
  "replication_dr_mode": "disabled",
  "server_time_utc": 1750000000,
  "version": "1.16.0",
  "cluster_name": "vault-cluster-a1b2c3",
  "cluster_id": "uuid-here"
}

The sealed field is the most important. When "sealed": true, Vault is locked and will not serve any secrets.
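If you consume the health response in your own tooling, decoding the JSON is more robust than string matching. This is a minimal sketch that decodes only a subset of the fields shown above; the struct and the `CanServeSecrets` helper are hypothetical names chosen for illustration.

```go
package main

import "encoding/json"

// HealthResponse holds a subset of the fields returned by /v1/sys/health.
type HealthResponse struct {
	Initialized        bool   `json:"initialized"`
	Sealed             bool   `json:"sealed"`
	Standby            bool   `json:"standby"`
	PerformanceStandby bool   `json:"performance_standby"`
	Version            string `json:"version"`
	ClusterName        string `json:"cluster_name"`
}

// ParseHealth decodes a health response body.
func ParseHealth(body []byte) (HealthResponse, error) {
	var h HealthResponse
	err := json.Unmarshal(body, &h)
	return h, err
}

// CanServeSecrets reports whether the node is initialized and unsealed.
// It says nothing about whether the node is the active one.
func (h HealthResponse) CanServeSecrets() bool {
	return h.Initialized && !h.Sealed
}
```

Because the body is parsed, the result does not depend on whitespace or field order in the response.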


Setting Up Vigilmon Monitors

Monitor 1: Vault Availability (Unsealed)

  1. Log in to Vigilmon and go to Monitors → New Monitor
  2. Type: HTTP
  3. Method: GET
  4. URL: https://your-vault.com/v1/sys/health
  5. Interval: 1 minute
  6. Expected status: 200
  7. Keyword check: "sealed":false

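This monitor fires when Vault is sealed, down, or uninitialized. The combination of status code 200 and the "sealed":false keyword means only a fully healthy, unsealed, active Vault passes. A standby node answers 429, so point this monitor at the active node or at a load balancer that routes to it.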
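One detail worth checking is that a keyword check is sensitive to exact spacing. The sketch below mimics Monitor 1 under the assumption that the keyword check is a plain, case-sensitive substring match; that is an assumption about how such checks usually work, not a statement about Vigilmon's implementation, and the function name is hypothetical.

```go
package main

import "strings"

// sealedFalseKeyword is the exact substring the monitor in this guide looks for.
const sealedFalseKeyword = `"sealed":false`

// PassesAvailabilityCheck mimics Monitor 1, assuming the keyword check is a
// plain substring match on the raw response body.
func PassesAvailabilityCheck(status int, body string) bool {
	return status == 200 && strings.Contains(body, sealedFalseKeyword)
}
```

With a compact body such as `{"initialized":true,"sealed":false}` the check passes. With a pretty-printed body containing `"sealed": false` (note the space) it fails even though Vault is healthy, so confirm the keyword against the raw output of `curl` before relying on it.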

Monitor 2: Sealed State Detection

Create a second monitor specifically for sealed state:

  1. Type: HTTP
  2. Method: GET
  3. URL: https://your-vault.com/v1/sys/health
  4. Interval: 1 minute
  5. Keyword check: "sealed":false

Do not restrict this monitor to 200 responses or suppress non-200 status codes. When Vault is sealed, it answers with HTTP 503 and a body containing "sealed":true, so the "sealed":false keyword is missing and the status is non-2xx: two independent signals to act on.

In Vigilmon's monitor settings, set "Alert on non-2xx status" to enabled so that a 503 from Vault triggers the alert.


Understanding Vault Sealed State

Vault uses Shamir's Secret Sharing to protect its master key. On startup (or after any restart), Vault is sealed — it holds all the data but cannot decrypt anything. To unseal, you need a quorum of key shares:

# Unseal with a key share (requires quorum, typically 3 of 5)
vault operator unseal <unseal-key>
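To see how far a manual unseal has progressed, Vault's `/v1/sys/seal-status` endpoint reports the threshold (`t`), the total number of shares (`n`), and the shares submitted so far (`progress`). The following is a minimal sketch that decodes those fields; the struct and method names are hypothetical.

```go
package main

import "encoding/json"

// SealStatus holds the subset of /v1/sys/seal-status fields used here.
type SealStatus struct {
	Sealed    bool `json:"sealed"`
	Threshold int  `json:"t"`        // key shares required to unseal
	Shares    int  `json:"n"`        // total key shares
	Progress  int  `json:"progress"` // key shares submitted so far
}

// ParseSealStatus decodes a seal-status response body.
func ParseSealStatus(body []byte) (SealStatus, error) {
	var s SealStatus
	err := json.Unmarshal(body, &s)
	return s, err
}

// RemainingShares returns how many more key shares are needed to unseal.
// It returns 0 when Vault is already unsealed.
func (s SealStatus) RemainingShares() int {
	if !s.Sealed {
		return 0
	}
	if r := s.Threshold - s.Progress; r > 0 {
		return r
	}
	return 0
}
```

With a 3-of-5 configuration and one share already entered, `RemainingShares` returns 2.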

With auto-unseal (recommended for production), Vault uses a cloud KMS to automatically unseal:

# vault.hcl
seal "awskms" {
  region     = "us-east-1"
  kms_key_id = "arn:aws:kms:us-east-1:..."
}

Even with auto-unseal configured, monitor for sealed state. Auto-unseal can fail if:

  • The KMS key is disabled, deleted, or replaced without updating Vault's config
  • IAM permissions change
  • The Vault process restarts while the KMS is unreachable

A Vigilmon alert on sealed state tells you within about a minute, so you can unseal before more applications restart or request new credentials and fail.


Monitoring Vault in HA Mode

In a Vault HA cluster, only the active node serves writes. Standby nodes return HTTP 429 (not 200) from the health endpoint. This is by design — the 429 distinguishes "this node is healthy but not the leader" from "this node is broken."

Configure Vigilmon differently per node type:

Active node monitor:

  • Expected status: 200
  • Keyword: "standby":false

HA standby monitor (acceptable healthy state):

  • Expected status: 429 (or, if the monitor accepts ranges, any 2xx or 4xx status)
  • Keyword: "sealed":false

Alternatively, monitor the cluster through a load balancer that routes to the active node. The load balancer's own health check should send traffic only to the node that responds with 200.
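If a monitor can only expect 200, Vault's health endpoint accepts the query parameters `standbyok=true` and `perfstandbyok=true`, which make healthy standby and performance standby nodes answer with the active-node code instead of 429 or 473. The sketch below only builds such a URL; the function name and the base URL are placeholders.

```go
package main

import "net/url"

// HealthURL builds a /v1/sys/health URL for the given Vault base address.
// standbyOK and perfStandbyOK add the query parameters that make healthy
// standby nodes return the active status code.
func HealthURL(base string, standbyOK, perfStandbyOK bool) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/sys/health"

	q := url.Values{}
	if standbyOK {
		q.Set("standbyok", "true")
	}
	if perfStandbyOK {
		q.Set("perfstandbyok", "true")
	}
	u.RawQuery = q.Encode() // keys are emitted in sorted order
	return u.String(), nil
}
```

`HealthURL("https://your-vault.com", true, false)` produces `https://your-vault.com/v1/sys/health?standbyok=true`. A per-node monitor using that URL passes for any unsealed node, while a separate monitor without the parameter still tells you which node is active.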


Vault Agent Proxy Monitoring

Many teams use Vault Agent or Vault Proxy as a local sidecar that caches secrets. If Vault Agent's connection to Vault breaks, applications relying on the cache can continue briefly — but eventually fail. Monitor Vault Agent's local listener:

# Vault Agent's local API listener (address and port come from the agent's listener config; 8200 is an example)
curl http://127.0.0.1:8200/v1/sys/health

In containerized environments, add a sidecar health check to your container orchestration:

# Kubernetes sidecar health check
livenessProbe:
  httpGet:
    path: /v1/sys/health
    port: 8200
  initialDelaySeconds: 10
  periodSeconds: 30

For external monitoring with Vigilmon, expose the Vault Agent's status via a small health proxy if it isn't directly reachable.


Alerting Strategy for Vault

Vault outages require different urgency levels depending on the failure mode:

Sealed State — Immediate Critical Alert

Applications actively fetching secrets (at startup or for short-TTL dynamic credentials) start failing within seconds of Vault being sealed. Alert after 1 failed check.

🔴 CRITICAL: HashiCorp Vault is SEALED
Monitor: Vault Health
URL: https://your-vault.com/v1/sys/health
Status: 503 — "sealed": true
→ Manual unseal or auto-unseal failure. Applications cannot fetch secrets.

Uninitialized State — High Priority Alert

A 501 Not Initialized response means Vault has been wiped or is a new instance. This should never happen in production unexpectedly — it signals a serious incident.

Latency Threshold

Applications that fetch secrets at startup have an acceptable latency window. If Vault response time exceeds 500ms, secret-fetching apps may time out during startup. Configure Vigilmon's response time threshold at 500ms warning, 2000ms critical.
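The two thresholds amount to a simple classification of measured response time. This is a minimal sketch with hypothetical names, treating "exceeds" as strictly greater than the threshold.

```go
package main

import "time"

// Thresholds from the text above: warn above 500ms, critical above 2000ms.
const (
	WarnThreshold     = 500 * time.Millisecond
	CriticalThreshold = 2000 * time.Millisecond
)

// ClassifyLatency returns "ok", "warning", or "critical" for a measured
// Vault response time.
func ClassifyLatency(d time.Duration) string {
	switch {
	case d > CriticalThreshold:
		return "critical"
	case d > WarnThreshold:
		return "warning"
	default:
		return "ok"
	}
}
```

A 120ms response classifies as "ok", 800ms as "warning", and 3s as "critical".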

Webhook Integration Example

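// Go webhook handler (the payload shape is assumed; adjust it to Vigilmon's actual webhook schema)
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// notifyOncall is a placeholder: wire it to PagerDuty or the #vault-ops Slack channel.
func notifyOncall(msg string) {
	log.Printf("ONCALL: %s", msg)
}

func handleVigilmon(w http.ResponseWriter, r *http.Request) {
	var payload struct {
		Event   string `json:"event"`
		Monitor struct {
			Name string `json:"name"`
			URL  string `json:"url"`
		} `json:"monitor"`
	}

	if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}

	if payload.Event == "down" {
		log.Printf("VAULT DOWN: %s (%s)", payload.Monitor.Name, payload.Monitor.URL)
		notifyOncall("Vault sealed or unreachable: secrets unavailable")
	}

	// Acknowledge receipt with a small JSON body.
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusOK)
	json.NewEncoder(w).Encode(map[string]bool{"received": true})
}

func main() {
	http.HandleFunc("/webhooks/vigilmon", handleVigilmon)
	log.Fatal(http.ListenAndServe(":8080", nil)) // port 8080 is an example
}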

Monitoring Vault's PKI and Secret Engines

Beyond availability, monitor the outputs that Vault produces. If Vault is up but a specific secret engine is broken, applications that depend on that engine still fail.

For the PKI secrets engine (certificate issuance), monitor the CA bundle endpoint:

GET https://your-vault.com/v1/pki/ca/pem

A 200 response with a valid PEM certificate confirms that the PKI engine is mounted (at pki/ in this example) and has a CA configured. Add a keyword check for BEGIN CERTIFICATE.
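A keyword check only proves the text BEGIN CERTIFICATE appears in the body. If you want a stricter check in your own tooling, the body can be parsed and its validity window verified. This is a minimal sketch using Go's standard library; the function name is hypothetical and it inspects only the first PEM block.

```go
package main

import (
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
	"time"
)

// CheckCACert parses the first PEM block in body and returns an error if it
// is not a parseable certificate or is outside its validity window right now.
func CheckCACert(body []byte) error {
	block, _ := pem.Decode(body)
	if block == nil {
		return errors.New("no PEM block found in response body")
	}
	if block.Type != "CERTIFICATE" {
		return fmt.Errorf("unexpected PEM block type %q", block.Type)
	}
	cert, err := x509.ParseCertificate(block.Bytes)
	if err != nil {
		return fmt.Errorf("parse certificate: %w", err)
	}
	now := time.Now()
	if now.Before(cert.NotBefore) || now.After(cert.NotAfter) {
		return fmt.Errorf("CA certificate not valid now (valid %s to %s)",
			cert.NotBefore.Format(time.RFC3339), cert.NotAfter.Format(time.RFC3339))
	}
	return nil
}
```

Passing the body of `GET /v1/pki/ca/pem` to `CheckCACert` returns nil for a currently valid CA certificate and a descriptive error otherwise, including the case where the CA has expired while the endpoint still returns 200.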


Summary

| Vault State | HTTP Status | Keyword Signal | Vigilmon Action |
|-------------|-------------|----------------|-----------------|
| Healthy and unsealed | 200 | "sealed":false | Pass |
| HA standby | 429 | "sealed":false | Pass (configure allowlist) |
| Sealed | 503 | "sealed":true | ALERT CRITICAL |
| Uninitialized | 501 | "initialized":false | ALERT HIGH |
| Unreachable | Connection error | none | ALERT CRITICAL |

Vault is infrastructure that must not fail silently. With Vigilmon's 1-minute checks and immediate alerting on sealed state, you'll detect Vault failures before they cascade into application outages.

