Monitoring HashiCorp Consul with Vigilmon (Free, Multi-Region)

HashiCorp Consul is the control plane for your microservices. Service discovery, health checking, distributed configuration, and the service mesh all flow through it. Most teams think of Consul as infrastructure that monitors other things — but Consul itself can fail, and when it does, the consequences cascade fast.

This guide covers how to monitor the Consul HTTP API and UI with Vigilmon, giving you an external uptime check that operates independently of Consul's own health system.

Why Consul needs external monitoring

Consul has excellent built-in health checking for services registered in its catalog. But there's a blind spot: Consul's own availability.

If Consul becomes unreachable — because the node crashed, a network partition isolated it, or the server process hung — the built-in health checks stop reporting. Services that rely on Consul for discovery start failing. But nothing in Consul's own system can report the outage, because the reporter is down.

External monitoring solves this. Vigilmon probes from outside your infrastructure, so it detects Consul failures regardless of what's happening inside your cluster.

Step 1: Understand Consul's health endpoints

Consul exposes a comprehensive HTTP API. Three endpoints are particularly useful for monitoring.

`/v1/status/leader`

Returns the address of the current Raft leader as a JSON string. When a leader has been elected, the response looks like this:

curl http://consul.internal:8500/v1/status/leader
# "10.0.0.1:8300"

An empty string (`""`) or an error response means the cluster has no elected leader, typically because the servers have lost quorum. This is a serious cluster event.
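As a rough illustration, a script could interpret the response body along these lines. This is a minimal sketch, not part of Consul or Vigilmon: the function name `parse_leader` is hypothetical, and it assumes the body is a JSON-encoded string in the `host:port` form shown above.

```python
import json
from typing import Optional, Tuple


def parse_leader(body: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) for the reported leader, or None if there is none.

    Hypothetical helper: assumes the body is a JSON string such as
    "10.0.0.1:8300", and treats anything else as "no leader".
    """
    try:
        value = json.loads(body)
    except json.JSONDecodeError:
        return None
    # An empty string means no leader is currently elected.
    if not isinstance(value, str) or not value:
        return None
    # Split on the last colon so the host part may itself contain colons.
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        return None
    return host, int(port)


if __name__ == "__main__":
    print(parse_leader('"10.0.0.1:8300"'))
    print(parse_leader('""'))
```

Returning `None` for both the empty string and malformed bodies keeps the caller's logic simple: anything other than a parsed address is treated as "no leader".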

`/v1/health/service/consul`

Returns the health of the `consul` service, which the Consul servers register for themselves:

curl http://consul.internal:8500/v1/health/service/consul

The response is a JSON array with one entry per node, each containing `Node`, `Service`, and `Checks` fields. A node is failing if any of its checks has a `Status` of "critical".
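To make the structure concrete, here is a small sketch that lists the nodes with at least one critical check. The helper name `critical_nodes` and the trimmed `SAMPLE` payload are illustrative assumptions; a real response carries many more fields per entry.

```python
import json
from typing import List


def critical_nodes(body: str) -> List[str]:
    """Return the names of nodes that have at least one critical check.

    Hypothetical helper: expects the JSON array returned by
    /v1/health/service/consul.
    """
    failing = []
    for entry in json.loads(body):
        node_name = entry.get("Node", {}).get("Node", "<unknown>")
        checks = entry.get("Checks", [])
        if any(check.get("Status") == "critical" for check in checks):
            failing.append(node_name)
    return failing


# Trimmed, made-up sample for illustration only.
SAMPLE = json.dumps([
    {"Node": {"Node": "consul-1"}, "Checks": [{"Status": "passing"}]},
    {"Node": {"Node": "consul-2"}, "Checks": [{"Status": "critical"}]},
])

if __name__ == "__main__":
    print(critical_nodes(SAMPLE))
```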

`/v1/agent/self`

Returns information about the local Consul agent. Useful for monitoring a specific node:

curl http://consul.internal:8500/v1/agent/self
# {"Config":{"Datacenter":"dc1","NodeName":"..."},...}

Returns 200 when the agent is healthy, 500 or connection error when it's not.
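The status-code rule above can be sketched as a small classifier. The function name `agent_summary` and the convention of passing `None` for a failed connection are assumptions made for this example.

```python
import json
from typing import Optional


def agent_summary(status: Optional[int], body: str) -> str:
    """Summarize a /v1/agent/self probe result.

    Hypothetical helper: `status` is the HTTP status code, or None when
    the connection itself failed.
    """
    if status is None:
        return "unreachable"
    if status != 200:
        return f"unhealthy (HTTP {status})"
    try:
        config = json.loads(body).get("Config", {})
    except (json.JSONDecodeError, AttributeError):
        return "unhealthy (unexpected body)"
    node = config.get("NodeName", "?")
    datacenter = config.get("Datacenter", "?")
    return f"ok: {node} in {datacenter}"


if __name__ == "__main__":
    sample = '{"Config": {"Datacenter": "dc1", "NodeName": "node-a"}}'
    print(agent_summary(200, sample))
    print(agent_summary(500, ""))
```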

Which endpoint to monitor

For external uptime monitoring, use /v1/status/leader. It's lightweight, requires no authentication by default, and the presence of a leader address confirms the cluster is functional.

Step 2: Set up Vigilmon monitoring

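With the endpoint identified, point Vigilmon at Consul. Because Vigilmon probes from outside your infrastructure, the URL must be reachable from its probe regions; an internal-only hostname such as consul.internal will not resolve from the outside: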

Sign up at vigilmon.online — free tier, no credit card
Click New Monitor → HTTP
Enter http://your-consul-host:8500/v1/status/leader
Set the check interval (1 or 5 minutes depending on your tier)
Save

Vigilmon probes from multiple geographic regions. A connection failure or non-200 response from any region triggers an incident.

Add a keyword monitor for leader validation

The /v1/status/leader endpoint returns 200 even when there's no leader (it returns an empty string body). A keyword monitor lets you assert that a leader address is actually present:

New Monitor → Keyword
URL: http://your-consul-host:8500/v1/status/leader
Keyword: :8300" (the default server RPC port, which appears in every leader address unless you have changed the server port)
If absent, Vigilmon treats it as a failure

This catches quorum loss, where Consul responds but has no elected leader.
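The difference between the two monitor types can be illustrated with a short sketch. This is not Vigilmon's implementation, only the logic described above; the function names are hypothetical and the keyword assumes the default port 8300.

```python
def http_check(status: int) -> bool:
    """A plain HTTP monitor passes on any 200 response."""
    return status == 200


def keyword_check(status: int, body: str, keyword: str = ':8300"') -> bool:
    """A keyword monitor also requires the keyword to appear in the body."""
    return status == 200 and keyword in body


if __name__ == "__main__":
    # Leaderless cluster: Consul answers 200 with an empty JSON string.
    print(http_check(200), keyword_check(200, '""'))
    # Healthy cluster: the leader address contains the keyword.
    print(http_check(200), keyword_check(200, '"10.0.0.1:8300"'))
```

In the leaderless case the HTTP check still passes while the keyword check fails, which is why both monitors are worth having.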

Monitor the Consul UI

If your team uses the Consul UI for service discovery visibility, add a monitor for it:

URL: http://your-consul-host:8500/ui/
Type: HTTP
Expected status: 200

Recommended monitor set for Consul

| Monitor | URL | Type | What it catches |
|---|---|---|---|
| Cluster leader | /v1/status/leader | HTTP | Consul process down |
| Leader present | /v1/status/leader | Keyword | Quorum loss, split brain |
| Agent health | /v1/agent/self | HTTP | Specific node failure |
| Consul UI | /ui/ | HTTP | Web UI unreachable |

Step 3: Consul ACL considerations

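If you've enabled Consul ACLs (which you should in production), some of the endpoints above require a token. /v1/status/leader needs none, but /v1/agent/self requires agent read permission, and /v1/health/service/consul requires node and service read permission. You have two options:

Option A: Create a read-only monitoring token

# monitoring-policy.hcl
# agent_prefix covers /v1/agent/self
agent_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
service_prefix "" {
  policy = "read"
}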

consul acl policy create -name monitoring -rules @monitoring-policy.hcl
consul acl token create -description "Vigilmon monitoring" -policy-name monitoring

Add the token to your Vigilmon monitor as a header:

Header name: X-Consul-Token
Header value: your monitoring token
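Before wiring the token into a monitor, it can help to confirm it works with a quick request of your own. The sketch below builds such a request with Python's standard library; the function name `build_consul_request` and the example token are placeholders.

```python
import urllib.request


def build_consul_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that carries the Consul ACL token as a header.

    Hypothetical helper: the token value is whatever
    `consul acl token create` printed as the SecretID.
    """
    return urllib.request.Request(url, headers={"X-Consul-Token": token})


if __name__ == "__main__":
    req = build_consul_request(
        "http://consul.internal:8500/v1/agent/self",
        "example-token",  # placeholder, not a real token
    )
    # urllib stores header names capitalized as "X-consul-token";
    # HTTP header names are case-insensitive, so Consul accepts it.
    print(req.header_items())
    # To actually send it: urllib.request.urlopen(req, timeout=5)
```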

Option B: Expose an unauthenticated health endpoint

Attach a narrowly scoped read-only policy to Consul's anonymous token so that the monitored endpoints respond without a token. This is not recommended for security-sensitive environments.

Step 4: Configure alert delivery

Set up where Vigilmon sends alerts when Consul goes down.

Slack:

Create an incoming webhook in your Slack workspace
In Vigilmon: Notifications → New Channel → Slack
Enable the channel on your Consul monitors

When Consul becomes unavailable, you'll get:

🔴 DOWN: consul.internal:8500/v1/status/leader
Status: Connection refused
Regions: US-East, EU-West
Started: 2 minutes ago

Alert your on-call engineer immediately — a Consul outage is a P1 incident. Services relying on Consul for DNS-based discovery will start failing within seconds of Consul going down.

Step 5: Multi-datacenter monitoring

If you run Consul across multiple datacenters, monitor each one independently. Each datacenter elects its own leader, so one datacenter's Consul can fail while the others stay healthy:

| Datacenter | Monitor URL |
|---|---|
| us-east | http://consul-us-east:8500/v1/status/leader |
| eu-west | http://consul-eu-west:8500/v1/status/leader |
| ap-south | http://consul-ap-south:8500/v1/status/leader |

Create a separate Vigilmon monitor for each datacenter so alerts are datacenter-specific and actionable.
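If you manage many datacenters, generating the monitor URLs from a single mapping helps keep them consistent. A minimal sketch, assuming the hostnames from the table above; the names `CONSUL_HOSTS` and `leader_urls` are illustrative.

```python
from typing import Dict

# Datacenter name -> Consul hostname (taken from the table above).
CONSUL_HOSTS: Dict[str, str] = {
    "us-east": "consul-us-east",
    "eu-west": "consul-eu-west",
    "ap-south": "consul-ap-south",
}


def leader_urls(hosts: Dict[str, str], port: int = 8500) -> Dict[str, str]:
    """Map each datacenter to its /v1/status/leader monitor URL."""
    return {
        dc: f"http://{host}:{port}/v1/status/leader"
        for dc, host in hosts.items()
    }


if __name__ == "__main__":
    for dc, url in leader_urls(CONSUL_HOSTS).items():
        print(f"{dc}: {url}")
```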

Step 6: Create an internal status page for your ops team

Consul availability affects every team running services in your infrastructure. Give your engineering teams visibility:

Status Pages → New Status Page in Vigilmon
Name it "Infrastructure Status"
Add your Consul monitors (and Vault, if you have it)
Share the internal URL with your engineering org

Teams can check the status page themselves during incidents rather than pinging the ops team.

What you've built

| What | How |
|---|---|
| Process health | HTTP monitor on /v1/status/leader |
| Quorum validation | Keyword monitor asserting leader address present |
| Specific node check | HTTP monitor on /v1/agent/self |
| UI availability | HTTP monitor on /ui/ |
| ACL-safe monitoring | Read-only monitoring token |
| Multi-datacenter | Separate monitor per datacenter |
| Instant alerts | Slack/email on down + recovery |

Your Consul cluster is now observable from outside. The next time the cluster loses quorum, a network partition isolates your Consul servers, or the agent process crashes under load, you'll know about it in minutes, not after your services start returning DNS lookup failures to end users.

Get started free at vigilmon.online — your first monitor is running in under a minute.