Monitoring Netbird with Vigilmon: Management API, Signal Server TCP, Relay Availability & SSL Alerts

Netbird is an open-source WireGuard-based mesh VPN that lets you connect your servers, devices, and cloud workloads into a private network without manual firewall rules or complex VPN configurations. Because Netbird's management server is the control plane for your entire mesh — peers register through it, receive route updates from it, and authenticate against it — a management server outage means new peers can't join, disconnected peers can't reconnect, and your network policy changes can't propagate. If the signal server goes down, peer-to-peer hole-punching fails and connections fall back to (or entirely depend on) the relay. Vigilmon gives you external visibility into Netbird's management API, signal server, relay, web dashboard, and SSL certificate so you detect control-plane failures before your mesh network silently degrades.

What You'll Build

A monitor on Netbird's management API (returning 401 = server is alive)
A TCP monitor for the signal server
An HTTP monitor for the relay/TURN server availability
An HTTP monitor for the Netbird web dashboard
SSL certificate monitoring for your Netbird domain
An alerting setup that distinguishes management failures from signal or relay failures

Prerequisites

A self-hosted Netbird instance (management server, signal server, relay/TURN)
A public domain with HTTPS (e.g., https://netbird.example.com)
A free account at vigilmon.online

Step 1: Verify the Management API Health Check

Netbird's management API does not expose a public /health endpoint, but the /api/users endpoint responds with HTTP 401 Unauthorized when the management server is healthy and rejecting unauthenticated requests. A 401 is your health signal — it means the API is running and enforcing authentication:

curl -I https://netbird.example.com/api/users
# Expected: HTTP/2 401

A refused connection, a 502 from the reverse proxy, or a timeout means the management server itself is down or unreachable, which is a different condition from the expected 401.
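The same check can be scripted outside of curl. The following is a minimal Python sketch, not Vigilmon's implementation; the function names and the three verdict labels are hypothetical, and it treats every status other than 401 as unhealthy.

```python
import urllib.error
import urllib.request


def classify_management_status(status):
    """Map an HTTP status (or None for no response) to a liveness verdict."""
    if status == 401:
        return "healthy"        # API is up and enforcing authentication
    if status is None:
        return "unreachable"    # connection refused, DNS failure, or timeout
    return "unhealthy"          # e.g. 502 from the proxy or 500 from the API


def probe_status(url, timeout=10.0):
    """Return the HTTP status of an unauthenticated GET, or None if no response."""
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the status code is still the answer we want.
        return err.code
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return None
```

Calling `classify_management_status(probe_status("https://netbird.example.com/api/users"))` should then report `healthy` whenever the server answers with a 401.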

Step 2: Create a Vigilmon HTTP Monitor for the Management API

Log in to Vigilmon → Add Monitor → HTTP.
URL: https://netbird.example.com/api/users.
Check interval: 60 seconds.
Response timeout: 10 seconds.
Expected status: 401.
Label: Netbird Management API.
Click Save.

Why 401? Vigilmon lets you configure any HTTP status code as the "expected healthy" response. For APIs that require authentication, a 401 from an unauthenticated probe is the correct liveness signal — it means the server received the request and is enforcing access control normally.

This monitor catches:

Management server process crashes
Database connectivity failures severe enough to stop the management server from starting or answering requests (Netbird stores peer registrations and policies in its database)
TLS termination failures at the reverse proxy
Configuration errors after Netbird upgrades

Step 3: Create a TCP Monitor for the Signal Server

Netbird's signal server facilitates WireGuard peer-to-peer connection establishment (hole-punching). It typically runs on port 10000 (gRPC) or 443 (if configured behind a shared reverse proxy). A TCP check confirms the signal server is accepting connections:

Add Monitor → TCP.
Host: netbird.example.com (or the dedicated signal server host if separated).
Port: 10000 (or your configured signal port).
Check interval: 60 seconds.
Response timeout: 10 seconds.
Label: Netbird Signal Server.
Click Save.

When the signal server TCP monitor fires but the management API is healthy, existing WireGuard tunnels between peers with stable endpoints remain up — but peers that need hole-punching to establish new connections will fail silently. This is a subtle degradation that Vigilmon catches before users report connectivity issues.
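For a sense of what a TCP check verifies, here is a minimal Python sketch, assuming a plain TCP handshake is the whole test. The function name is hypothetical. A successful handshake only shows that something is accepting connections on the port; it does not prove the gRPC service behind it is healthy.

```python
import socket


def tcp_port_open(host, port, timeout=10.0):
    """Return True if a TCP handshake to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, DNS errors, and timeouts.
        return False
```

For the setup above, the call would be `tcp_port_open("netbird.example.com", 10000)`, with the port adjusted to your configured signal port.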

Step 4: Monitor the Netbird Web Dashboard

The management API and signal server handle peer connectivity, but the web dashboard is how administrators configure peers, routes, access control policies, and DNS. Monitor it independently:

Add Monitor → HTTP.
URL: https://netbird.example.com.
Check interval: 60 seconds.
Expected status: 200.
Keyword: Netbird.
Label: Netbird Dashboard.
Click Save.

This monitor catches reverse proxy failures, frontend static asset serving errors, and CDN misconfigurations that wouldn't affect peer connectivity but would prevent administrators from managing the network.
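The status-plus-keyword rule can be expressed as a small pure function. This is an illustrative sketch with a hypothetical function name. It assumes a case-sensitive substring match, which may differ from how Vigilmon matches keywords.

```python
def evaluate_dashboard(status, body, keyword="Netbird"):
    """Return (healthy, reason) for a dashboard response.

    Healthy means the status is 200 and the body contains the keyword.
    """
    if status != 200:
        return (False, f"unexpected status {status}")
    if keyword not in body:
        # A 200 without the keyword often means a proxy default page is being served.
        return (False, f"keyword {keyword!r} missing")
    return (True, "ok")
```

The keyword check is what separates a working dashboard from a reverse proxy that returns 200 with a placeholder page.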

Step 5: Monitor SSL Certificates

Netbird's management API and signal server both communicate over TLS. An expired SSL certificate causes:

Netbird clients to refuse to connect to the management API (certificate validation failure)
Signal server connections to fail, breaking hole-punching
The dashboard to become inaccessible to administrators

Add Monitor → SSL Certificate.
Domain: netbird.example.com.
Alert when expiry is within: 30 days.
Alert again: 14 days, 7 days, 3 days, 1 day.
Click Save.

Note: If your signal server runs on a different subdomain (e.g., signal.netbird.example.com), add a separate SSL monitor for that domain as well.
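To see how the remaining lifetime and the reminder thresholds relate, here is a minimal sketch using Python's standard `ssl` module. The function names are hypothetical, and the thresholds are the ones listed in the steps above. It is not a description of how Vigilmon computes expiry.

```python
import socket
import ssl
import time

REMINDER_DAYS = (30, 14, 7, 3, 1)  # thresholds taken from the steps above


def days_until_expiry(not_after, now=None):
    """Days left, given a notAfter string such as 'Jan  5 09:34:43 2018 GMT'."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def crossed_thresholds(days_left):
    """Return every reminder threshold the remaining lifetime has fallen within."""
    return [d for d in REMINDER_DAYS if days_left <= d]


def fetch_not_after(host, port=443, timeout=10.0):
    """Fetch the notAfter field of the certificate served at host:port.

    With the default context, an already-expired certificate fails the
    handshake and raises ssl.SSLCertVerificationError instead of returning.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A check for the domain above would combine them as `crossed_thresholds(days_until_expiry(fetch_not_after("netbird.example.com")))`.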

Step 6: Configure Alerting

In Vigilmon, under Settings → Notifications, configure your alert channels. Each monitor's trigger maps to a first response:

| Monitor | Trigger | Action |
|---|---|---|
| Management API (/api/users) | Non-401 response | Check management server process; inspect database; review proxy logs |
| Signal Server TCP | Connection refused or timeout | Restart signal server container; check port binding and firewall |
| Web Dashboard | Non-200 or keyword missing | Check reverse proxy; inspect frontend container |
| SSL certificate | < 30 days to expiry | Renew certificate; check ACME renewal configuration |

Alert after 2 consecutive failures for the HTTP monitors, and after 1 failure for the TCP signal server monitor, because signal server failures affect new peer connections immediately.
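The consecutive-failure rule can be pictured as a small counter that resets on any successful check. The class below is a hypothetical illustration of that idea, not Vigilmon's internal logic.

```python
class ConsecutiveFailureAlert:
    """Signal an alert once `threshold` checks in a row have failed."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.failures = 0

    def record(self, check_passed):
        """Record one check result; return True only on the check that trips the alert."""
        if check_passed:
            self.failures = 0   # any success resets the streak
            return False
        self.failures += 1
        return self.failures == self.threshold
```

With a threshold of 2, a single failed HTTP check followed by a success never alerts, which filters out one-off network blips. With a threshold of 1, the first failed TCP check alerts immediately.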

Common Netbird Failure Modes and What Vigilmon Catches

| Scenario | What Vigilmon catches |
|---|---|
| Management server crash | API returns non-401; new peers can't register; alert within 60 s |
| Database down | Management API returns 500; peer registrations fail |
| Signal server down | TCP monitor fires; peer hole-punching fails; relay fallback activates |
| Reverse proxy misconfiguration | Dashboard and API monitors fire simultaneously |
| SSL certificate expires | SSL monitor alerts at 30 days; clients refuse TLS connection |
| Port conflict after server update | TCP monitor catches signal server port change |
| Management DB migration failure | API returns 500 after upgrade; policy changes lost |
| DNS misconfiguration | All monitors fire simultaneously; entire mesh loses control plane |
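The scenarios in the table above suggest a simple triage rule: the combination of monitors that fire together points to where to look first. The sketch below encodes that reasoning. The monitor keys (`api`, `signal`, `dashboard`, `ssl`) and the returned labels are hypothetical names chosen for illustration.

```python
def likely_cause(failing):
    """Suggest where to look first, given the set of monitors currently failing."""
    failing = set(failing)
    if {"api", "signal", "dashboard"} <= failing:
        return "DNS or host-level outage"        # everything fails at once
    if {"api", "dashboard"} <= failing:
        return "reverse proxy"                   # both HTTP paths share the proxy
    if "api" in failing:
        return "management server or its database"
    if "signal" in failing:
        return "signal server"
    if "dashboard" in failing:
        return "dashboard frontend"
    if "ssl" in failing:
        return "certificate renewal"
    return "no failure detected"
```

The broadest patterns are checked first, so a DNS-level outage is not misread as three separate component failures.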

A mesh VPN control plane failure is one of the worst kinds of silent failure — existing WireGuard tunnels often stay up while the management server is down, making it easy to believe everything is fine until a peer reconnects or a new device tries to join. Vigilmon's external monitors catch Netbird management API failures, signal server TCP issues, and SSL certificate expiry before your next peer-reconnect event turns into an incident.

Start monitoring Netbird in under 5 minutes — register free at vigilmon.online.