Monitoring Headscale with Vigilmon: Health Endpoint, DERP Relay, Coordination API TCP & SSL Alerts

Headscale is the self-hosted implementation of the Tailscale control server — it manages your WireGuard mesh network's key exchange, DERP relay routing, and device coordination without relying on Tailscale's cloud infrastructure. Because Headscale is the control plane for your entire overlay network, its availability is foundational: if Headscale goes down, existing WireGuard tunnels between devices usually continue working (WireGuard is peer-to-peer), but new devices can't join, existing devices can't re-authenticate, and network policy changes can't be applied. If the DERP relay stops responding, devices behind NAT lose their fallback relay path and may lose connectivity entirely. Vigilmon gives you external visibility into Headscale's health endpoint, DERP relay, coordination API, TCP port, and SSL certificate so your network control plane is always monitored.

What You'll Build

A monitor on Headscale's /health endpoint
An HTTP monitor for the DERP relay availability
A monitor for the coordination server /key endpoint
A TCP monitor for Headscale's control port
SSL certificate monitoring for your Headscale domain

Prerequisites

A running Headscale instance with a public or network-reachable domain
HTTPS configured (e.g., https://headscale.example.com)
Headscale listening on port 8080 (default) or 443 (behind a reverse proxy)
A free account at vigilmon.online

Step 1: Verify Headscale's Health Endpoint

Headscale exposes a dedicated health check at /health:

curl https://headscale.example.com/health

A healthy Headscale returns HTTP 200 with a JSON body:

{"healthy":true}

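This endpoint is unauthenticated and confirms that the Headscale process is running and that its internal health check, which covers database connectivity, passes. The exact JSON body differs between Headscale versions (recent releases report {"status":"pass"}), so run the curl command against your own instance and note the string you will match on.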

Step 2: Create a Vigilmon HTTP Monitor for the Health Endpoint

Log in to Vigilmon → Add Monitor → HTTP.
URL: https://headscale.example.com/health.
Check interval: 60 seconds.
Response timeout: 10 seconds.
Expected status: 200.
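Keyword: healthy (use whichever string appears in your instance's healthy response; some versions report pass instead).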
Click Save.

This monitor catches:

Headscale process crashes or unexpected restarts
Database connectivity failures (Headscale stores device registrations and network policy in its database)
Configuration errors after Headscale upgrades
Key-loading failures that stop Headscale from starting and therefore block device authentication

Because Headscale is your network's control plane, prompt alerting is critical — even though existing WireGuard tunnels persist through a Headscale outage, you're operating blind to network changes and new device onboarding is completely blocked.
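To sanity-check the status-plus-keyword rule before relying on it, the same decision can be reproduced locally. The following is a minimal sketch, not Vigilmon's actual implementation: the function name evaluate_http_check and its argument order are made up for illustration, and the keyword should be whatever string your instance really returns.

```shell
# Hypothetical helper: decide whether an HTTP probe result counts as "up".
# Arguments: expected_status keyword actual_status body
evaluate_http_check() {
    expected_status=$1
    keyword=$2
    actual_status=$3
    body=$4

    # Rule 1: the status code must match exactly.
    if [ "$actual_status" != "$expected_status" ]; then
        echo "down: status $actual_status (expected $expected_status)"
        return 1
    fi

    # Rule 2: the body must contain the keyword as a substring.
    case $body in
        *"$keyword"*) echo "up"; return 0 ;;
        *) echo "down: keyword '$keyword' missing"; return 1 ;;
    esac
}

# Example wiring with curl, using the 10-second timeout from the monitor settings:
#   status=$(curl -sS --max-time 10 -o /tmp/health.json -w '%{http_code}' \
#       https://headscale.example.com/health)
#   evaluate_http_check 200 healthy "$status" "$(cat /tmp/health.json)"
```

Keeping the decision in a pure function makes it easy to try different keywords against a saved response without hitting the server again.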

Step 3: Monitor the DERP Relay

Headscale includes built-in DERP (Designated Encrypted Relay for Packets) relay support. DERP relays are the fallback path for devices that cannot establish direct WireGuard connections — typically devices behind symmetric NAT or strict firewalls. Monitor DERP relay availability to confirm devices have a fallback path:

curl https://headscale.example.com/derp

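The DERP endpoint returns a plain text response. Check the status code your instance actually returns before saving the monitor: the embedded relay must be enabled in config.yaml, and a plain GET without an Upgrade header is commonly answered with 426 Upgrade Required rather than 200. If you see 426, either set that as the expected status or, on versions that serve it, point the monitor at /derp/probe, which answers plain HTTP requests.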

Add Monitor → HTTP.
URL: https://headscale.example.com/derp.
Check interval: 60 seconds.
Expected status: 200.
Label: Headscale DERP Relay.
Click Save.

When the DERP relay monitor fires but the health endpoint is green, Headscale's core control functions are working but devices relying on the relay for NAT traversal may be losing connectivity. This is particularly important for mobile devices and remote workers behind corporate firewalls.
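When testing the DERP path by hand, it helps to classify the status code rather than only compare it to 200. The sketch below is illustrative: the function name interpret_derp_status is hypothetical, and the meanings attached to 426 and 404 are assumptions about typical relay behavior that you should verify against your own instance.

```shell
# Hypothetical helper: interpret the HTTP status of a plain GET to the DERP path.
# Argument: the status code as printed by curl -w '%{http_code}'
interpret_derp_status() {
    case $1 in
        200) echo "ok" ;;
        # Assumption: the relay is serving but wants a connection upgrade.
        426) echo "ok-upgrade-required" ;;
        # Assumption: the path is not routed, e.g. the embedded relay is disabled.
        404) echo "down: path not served (embedded DERP may be disabled)" ;;
        # curl reports 000 when no HTTP response was received at all.
        000) echo "down: no HTTP response" ;;
        *)   echo "down: unexpected status $1" ;;
    esac
}

# Example wiring with curl:
#   interpret_derp_status "$(curl -s -o /dev/null --max-time 10 \
#       -w '%{http_code}' https://headscale.example.com/derp)"
```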

Step 4: Monitor the Coordination Server /key Endpoint

Tailscale clients and the Headscale control server exchange public keys during device registration and re-authentication. The /key endpoint exposes Headscale's public key — it's the first thing a Tailscale client contacts when joining your network. Monitoring it confirms the coordination API is accessible:

curl https://headscale.example.com/key

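A healthy Headscale returns its public key. Depending on the version, this is a bare hexadecimal string or a small JSON document, and newer releases may expect a capability-version query parameter on the request, so confirm the response from your own instance before setting the expected status.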

Add Monitor → HTTP.
URL: https://headscale.example.com/key.
Check interval: 5 minutes.
Expected status: 200.
Label: Headscale Coordination API.
Click Save.

When the /key endpoint is unreachable but the health endpoint is green, there's likely a routing or reverse proxy configuration issue specifically affecting the coordination API path. Tailscale clients will fail to register or re-authenticate even though Headscale appears healthy.

Step 5: Create a TCP Monitor for the Control Port

Headscale's control port is the entry point for all Tailscale client traffic. A TCP-level check confirms the port is reachable even before verifying HTTP responses — useful for catching firewall rule changes or reverse proxy failures that drop connections at the TCP layer:

Add Monitor → TCP.
Host: headscale.example.com.
Port: 443 (if behind a reverse proxy) or 8080 (Headscale's default port if directly exposed).
Check interval: 60 seconds.
Response timeout: 10 seconds.
Label: Headscale Control Port.
Click Save.

When the TCP monitor fires but your HTTP monitors are green, first check which port each monitor targets. An HTTP check through the reverse proxy on 443 can pass while a TCP check against 8080 fails, which points to a firewall rule blocking that specific port. If both target the same port, suspect intermittent packet loss or asymmetric routing: Tailscale clients arriving over a different route may be blocked even though some probes succeed.
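A TCP-level check can also be run by hand from any machine that should be able to reach the control port. This is a rough sketch, assuming bash and the coreutils timeout command are installed; the function name tcp_open is made up for illustration.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens in time.
# Arguments: host port [timeout_seconds, default 10]
tcp_open() {
    host=$1
    port=$2
    wait_seconds=${3:-10}

    # bash treats /dev/tcp/HOST/PORT in a redirection as "open a TCP socket".
    # timeout kills the attempt if the connection hangs (exit status 124).
    timeout "$wait_seconds" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example:
#   if tcp_open headscale.example.com 443 10; then
#       echo "control port reachable"
#   else
#       echo "control port blocked or down"
#   fi
```

Running it from a client network as well as from the server's own network helps distinguish a firewall rule from a reverse proxy that is not listening.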

Step 6: Monitor SSL Certificates

Headscale's SSL certificate is especially critical because:

Tailscale clients verify the control server's TLS certificate during every authentication handshake — an expired certificate causes all clients to fail re-authentication
The DERP relay uses HTTPS — certificate expiry disables the relay fallback path
Headscale's gRPC API for the headscale CLI tool also requires valid TLS

Add Monitor → SSL Certificate.
Domain: headscale.example.com.
Alert when expiry is within: 30 days.
Alert again: 14 days, 7 days, 3 days, 1 day.
Click Save.

A Headscale certificate expiry is one of the most disruptive failures: existing WireGuard tunnels may survive for a while, but every device that contacts the control server fails TLS verification, so devices drop off the network as they attempt re-authentication. A 30-day warning gives you ample time to renew before impact.
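The alert schedule above (30, 14, 7, 3, and 1 days) amounts to simple arithmetic on the certificate's expiry time. The sketch below shows one way to compute it; the function names days_until_expiry and expiry_alert_tier are hypothetical and do not reflect how Vigilmon works internally.

```shell
# Hypothetical helper: whole days between now and expiry.
# Arguments: not_after_epoch now_epoch (both Unix epoch seconds)
days_until_expiry() {
    not_after_epoch=$1
    now_epoch=$2
    echo $(( (not_after_epoch - now_epoch) / 86400 ))
}

# Hypothetical helper: map remaining days to the tightest threshold reached.
# Prints 1, 3, 7, 14, or 30, or "none" when more than 30 days remain.
expiry_alert_tier() {
    days=$1
    for tier in 1 3 7 14 30; do
        if [ "$days" -le "$tier" ]; then
            echo "$tier"
            return 0
        fi
    done
    echo "none"
}

# Example wiring with openssl (date -d is GNU-specific):
#   end=$(echo | openssl s_client -servername headscale.example.com \
#       -connect headscale.example.com:443 2>/dev/null \
#       | openssl x509 -noout -enddate | cut -d= -f2)
#   expiry_alert_tier "$(days_until_expiry "$(date -d "$end" +%s)" "$(date +%s)")"
```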

Step 7: Configure Alerting

In Vigilmon under Settings → Notifications, configure your alert channels:

| Monitor | Trigger | Action |
|---|---|---|
| /health | Non-200 or healthy missing | Check systemctl status headscale; inspect Headscale logs |
| DERP relay /derp | Non-200 | Check DERP configuration in config.yaml; inspect relay logs |
| Coordination API /key | Non-200 | Check reverse proxy routing; inspect Headscale API availability |
| Control port TCP | Connection refused or timeout | Check firewall rules; verify reverse proxy is running |
| SSL certificate | < 30 days to expiry | Renew certificate; check ACME automation on reverse proxy |

Alert after: 2 consecutive failures for HTTP monitors, and 1 failure for the TCP monitor, because control port connection failures can indicate broad network issues affecting all Tailscale clients.
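The consecutive-failure rule is easy to reason about with a small model. The following sketch, with a made-up function name should_alert, counts the current failure streak in a list of probe results and compares it to the threshold; it illustrates the rule and is not Vigilmon's own logic.

```shell
# Hypothetical helper: decide whether the current failure streak warrants an alert.
# Arguments: threshold result... (each result is "ok" or "fail", oldest first)
should_alert() {
    threshold=$1
    shift
    streak=0
    for result in "$@"; do
        if [ "$result" = "fail" ]; then
            streak=$((streak + 1))
        else
            streak=0   # any success resets the streak
        fi
    done
    if [ "$streak" -ge "$threshold" ]; then
        echo "alert"
    else
        echo "quiet"
    fi
}

# HTTP monitors (threshold 2): a single blip stays quiet.
#   should_alert 2 ok ok fail
# TCP monitor (threshold 1): the first failure alerts.
#   should_alert 1 ok ok fail
```

With a 60-second interval, a threshold of 2 means an HTTP outage is reported after roughly two check cycles, while the TCP monitor reports on the first failed check.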

Common Headscale Failure Modes and What Vigilmon Catches

| Scenario | What Vigilmon catches |
|---|---|
| Headscale process crash | /health unreachable; alert within 60 s |
| Database goes down | Health check returns non-200; device registrations unavailable |
| DERP relay misconfiguration | DERP monitor fires; NAT-traversal fallback path broken |
| Control port blocked by firewall | TCP monitor fires; Tailscale clients can't re-authenticate |
| Reverse proxy misconfiguration | HTTP monitors fire; /key endpoint unreachable |
| SSL certificate expires | SSL monitor alerts at 30-day threshold; all client auth fails |
| Headscale upgrade failure | Health check non-200; rollback needed before clients expire |
| Key database corruption | Health check may stay green; device authentication fails |
| DNS misconfiguration | All HTTP and SSL monitors fire simultaneously |
| Server out of disk space | Headscale may crash; health check unreachable |
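Reading several monitors together narrows down the likely cause faster than reading each alone. The sketch below encodes a few of the combinations from this guide as a first-match lookup. The function name triage and its wording are hypothetical, and the suggested causes are starting points for investigation rather than a diagnosis.

```shell
# Hypothetical helper: suggest where to look first, given four monitor states.
# Arguments: health derp key tcp (each "ok" or "fail"); the first matching rule wins.
triage() {
    health=$1
    derp=$2
    key=$3
    tcp=$4

    if [ "$health$derp$key$tcp" = "failfailfailfail" ]; then
        echo "everything down: check DNS, the host itself, and the reverse proxy"
    elif [ "$tcp" = "fail" ]; then
        echo "control port unreachable: check firewall rules and the reverse proxy"
    elif [ "$health" = "fail" ]; then
        echo "health failing: check systemctl status headscale and the database"
    elif [ "$derp" = "fail" ]; then
        echo "DERP failing: check the DERP section of config.yaml"
    elif [ "$key" = "fail" ]; then
        echo "/key failing: check reverse proxy routing for the coordination API"
    else
        echo "no failing monitors"
    fi
}

# Example: health is green but the DERP monitor fired.
#   triage ok fail ok ok
```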

The paradox of Headscale monitoring is that its failure mode is gradual: existing tunnels keep working until devices need to re-authenticate (typically every few days), which means an outage can go unnoticed for hours. Vigilmon watches Headscale's health endpoint, DERP relay, coordination API, control port, and SSL certificate so you catch failures immediately — not when your team starts reporting that their VPN connections are dropping.

Start monitoring Headscale in under 5 minutes — register free at vigilmon.online.