Monitoring Linkerd with Vigilmon: Control Plane Health, Web Dashboard, Proxy Port TCP & SSL Certificate Alerts

Linkerd is a lightweight Kubernetes service mesh that injects a sidecar proxy into every meshed pod to handle mTLS, retries, load balancing, and observability without application changes. When Linkerd's control plane goes down, the mesh enters a degraded state: existing proxies keep running on cached configuration, but new pods start without an injected proxy, new endpoints are not discovered, and any pod that restarts cannot obtain a certificate for mutual TLS. When the identity component fails, proxies cannot renew their mTLS certificates, and meshed communication starts failing without an obvious cause once the current certificates expire (24 hours by default). Vigilmon gives you external visibility into Linkerd's control plane health: the identity, destination, and proxy-injector components, the web dashboard, the proxy TCP port, and SSL certificate expiry.

What You'll Build

Monitors on Linkerd's control plane component health endpoints (identity, destination, proxy-injector)
A web dashboard availability check to confirm the Linkerd viz extension is accessible
A TCP monitor on the proxy port to verify sidecar connectivity
SSL certificate monitoring for your Linkerd dashboard domain
An alerting setup that isolates component-level failures from cluster-wide issues

Prerequisites

A running Linkerd 2.x (stable or edge) cluster with control plane exposed via Ingress or port-forward
Linkerd viz extension installed and accessible over HTTPS (e.g., https://linkerd.example.com)
A free account at vigilmon.online

Step 1: Understand Linkerd's Control Plane Health Architecture

Linkerd's control plane consists of three core components, each with its own health endpoints:

linkerd-identity — issues mTLS certificates to sidecar proxies. If this fails, new pods cannot get certificates and mTLS breaks on restart.
linkerd-destination — provides service discovery and policy to proxies. If this fails, proxies cannot route traffic to new endpoints.
linkerd-proxy-injector — injects the Linkerd proxy sidecar into new pods. If this fails, new deployments start without the mesh.

Each component exposes a readiness endpoint at /ready:

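# Run from a machine with kubectl access; each port-forward exposes a component's admin port on localhost
kubectl -n linkerd port-forward deploy/linkerd-identity 9990:9990 &
kubectl -n linkerd port-forward deploy/linkerd-destination 9996:9996 &
kubectl -n linkerd port-forward deploy/linkerd-proxy-injector 9995:9995 &
sleep 2  # give the forwards a moment to establish
curl http://localhost:9990/ready   # linkerd-identity
curl http://localhost:9996/ready   # linkerd-destination (admin port 9996, not 9990)
curl http://localhost:9995/ready   # linkerd-proxy-injector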

A ready component returns 200 OK with body ok. Not-ready components return 500 or a connection error.
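The readiness contract described above (status 200 plus a body of ok) can be checked by hand before you configure any monitor. The following is a minimal sketch, assuming curl is installed; the helper names is_ready and probe are hypothetical and are not part of Linkerd or Vigilmon.

```shell
# is_ready STATUS BODY: succeed only for HTTP 200 with a body of "ok"
# (whitespace in the body is ignored). Hypothetical helper name.
is_ready() {
  status=$1
  body=$(printf '%s' "$2" | tr -d '[:space:]')
  [ "$status" = "200" ] && [ "$body" = "ok" ]
}

# probe URL: fetch a readiness URL and print "ready" or "not ready".
probe() {
  # -s silences progress output, -m 10 caps the request at 10 seconds,
  # -w appends the HTTP status code on its own line after the body.
  out=$(curl -s -m 10 -w '\n%{http_code}' "$1") || { echo "not ready"; return 1; }
  status=$(printf '%s\n' "$out" | tail -n 1)
  body=$(printf '%s\n' "$out" | sed '$d')
  if is_ready "$status" "$body"; then
    echo "ready"
  else
    echo "not ready"
    return 1
  fi
}
```

With a port-forward to the identity component's admin port in place, `probe http://localhost:9990/ready` should print ready for a healthy component. The 10-second cap mirrors the response timeout used for the monitors in Step 2.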

Exposing control plane health externally: To monitor these endpoints with Vigilmon, expose them via a Kubernetes Service of type LoadBalancer or route them through your Ingress. Alternatively, if you already expose the Linkerd dashboard via Ingress, add path-based routes for the health endpoints.

Step 2: Create Vigilmon Monitors for Control Plane Components

linkerd-identity Health Monitor

Log in to Vigilmon → Add Monitor → HTTP.
URL: https://linkerd.example.com/identity/ready (adjust path per your Ingress routing).
Check interval: 60 seconds.
Response timeout: 10 seconds.
Expected status: 200.
Keyword: ok.
Label: Linkerd identity.
Click Save.

linkerd-destination Health Monitor

Add Monitor → HTTP.
URL: https://linkerd.example.com/destination/ready.
Check interval: 60 seconds.
Response timeout: 10 seconds.
Expected status: 200.
Keyword: ok.
Label: Linkerd destination.
Click Save.

linkerd-proxy-injector Health Monitor

Add Monitor → HTTP.
URL: https://linkerd.example.com/proxy-injector/ready.
Check interval: 2 minutes.
Response timeout: 10 seconds.
Expected status: 200.
Keyword: ok.
Label: Linkerd proxy-injector.
Click Save.

These monitors catch:

Control plane pod OOM kills or crashes
Kubernetes API server failures that prevent Linkerd components from functioning
Configuration errors from linkerd upgrade operations
Certificate rotation failures in the linkerd-identity component
Mutating admission webhook failures in the proxy-injector

Alert sensitivity: Set to trigger after 1 consecutive failure for linkerd-identity. Use 2 consecutive failures for destination and proxy-injector, as brief restarts during upgrades are expected.
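To make the consecutive-failure rule concrete, here is a small sketch of the decision it implies. The function name should_alert and the "ok"/"fail" result labels are assumptions for illustration; Vigilmon applies its own logic internally.

```shell
# should_alert THRESHOLD RESULT...: RESULTs are "ok" or "fail", oldest first.
# Succeeds when the most recent THRESHOLD results are all failures.
should_alert() {
  threshold=$1
  shift
  streak=0
  for result in "$@"; do
    if [ "$result" = "fail" ]; then
      streak=$((streak + 1))   # extend the current run of failures
    else
      streak=0                 # any success resets the run
    fi
  done
  [ "$streak" -ge "$threshold" ]
}
```

For example, `should_alert 1 ok fail` succeeds (identity alerts on the first failure), while `should_alert 2 ok fail` does not, so a single failed check during a rolling restart of destination or proxy-injector stays quiet.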

Step 3: Monitor the Linkerd Web Dashboard

The Linkerd viz extension provides the web dashboard showing mesh-wide golden metrics (success rate, latency, RPS), service topology, and tap inspection. When the dashboard is unavailable, platform teams lose operational visibility into service mesh health:

curl https://linkerd.example.com
# Returns HTML with "Linkerd" in the title

Add Monitor → HTTP.
URL: https://linkerd.example.com.
Check interval: 2 minutes.
Response timeout: 15 seconds.
Expected status: 200.
Keyword: Linkerd (appears in the viz dashboard page title).
Label: Linkerd web dashboard.
Click Save.

Linkerd viz authentication: By default the dashboard is reached through linkerd viz dashboard, which opens a localhost port-forward using your kubectl credentials. If you expose it via Ingress behind an auth proxy, the monitor receives whatever the proxy returns to an unauthenticated request. The keyword check passes only if that response is a 200 page containing Linkerd, so confirm with curl what an unauthenticated request actually returns and adjust the expected status or keyword to match.
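The dashboard monitor pairs a status check with a keyword check because a 200 response alone does not prove the dashboard page rendered. The sketch below reproduces that pairing locally; classify_dashboard is a hypothetical helper name, and the plain substring match is an assumption about how a keyword check behaves.

```shell
# classify_dashboard STATUS [KEYWORD]: read the response body on stdin and
# print "up", "bad-status", or "keyword-missing".
# Hypothetical helper, not a Vigilmon or Linkerd API.
classify_dashboard() {
  status=$1
  keyword=${2:-Linkerd}
  if [ "$status" != "200" ]; then
    echo "bad-status"
  elif grep -qF -- "$keyword"; then   # fixed-string match anywhere in the body
    echo "up"
  else
    echo "keyword-missing"
  fi
}
```

One way to use it: `status=$(curl -s -m 15 -o page.html -w '%{http_code}' https://linkerd.example.com)` followed by `classify_dashboard "$status" < page.html`. A result of keyword-missing with a 200 status usually means something other than the dashboard answered the request.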

Step 4: Monitor the Proxy Port via TCP Check

Every Linkerd-injected pod runs a sidecar proxy (the Linkerd2-proxy, written in Rust) that listens on several ports:

4143: inbound proxy port (iptables redirects incoming traffic here; handles incoming mTLS-wrapped requests)
4140: outbound proxy port (the application's egress traffic is redirected here via iptables)
4191: admin port (serves the proxy's metrics and its /ready and /live health endpoints)

From an external monitoring perspective, check that port 4191 (the admin/health port) is reachable on your Linkerd-injected services. If you have a specific service exposed externally, monitor its proxy admin port:

nc -zv my-service.example.com 4191

For the Linkerd control plane's own proxy, monitor the admin port on the exposed control plane endpoint:

Add Monitor → TCP.
Host: linkerd.example.com (your exposed Linkerd endpoint).
Port: 4191.
Check interval: 2 minutes.
Label: Linkerd proxy admin port.
Click Save.

Internal vs. external: The proxy ports are typically not exposed externally. If your Linkerd setup is purely internal, skip this TCP check or substitute a TCP check on port 443 of the Linkerd dashboard Ingress to confirm TLS termination is functioning.
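If nc is not available on the machine you test from, the same reachability check can be sketched with bash's built-in /dev/tcp redirection. This assumes bash and the coreutils timeout command are installed; tcp_open is a hypothetical helper name.

```shell
# tcp_open HOST PORT [TIMEOUT_SECONDS]: succeed if a TCP connection to
# HOST:PORT can be opened within the timeout (default 5 seconds).
# Assumes bash (for /dev/tcp) and coreutils timeout are available.
tcp_open() {
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, `tcp_open linkerd.example.com 443 && echo reachable` exercises the fallback check on the Ingress port, and `tcp_open my-service.example.com 4191` is the equivalent of the nc command above.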

Step 5: Monitor SSL Certificates

Linkerd uses TLS in two separate layers: the external TLS on the viz dashboard Ingress (managed by cert-manager or your ingress controller), and internal mTLS between proxies (managed by the linkerd-identity component itself). Monitor the external certificate on your Linkerd domain:

openssl s_client -connect linkerd.example.com:443 -servername linkerd.example.com </dev/null 2>/dev/null | openssl x509 -noout -dates

Add Monitor → SSL Certificate.
Domain: linkerd.example.com.
Alert when expiry is within: 30 days.
Alert again: 14 days, 7 days, 3 days, 1 day.
Click Save.
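The alert schedule above (30, 14, 7, 3, and 1 days) amounts to mapping the remaining lifetime onto the tightest threshold it has crossed. A minimal sketch of that arithmetic follows; days_left and alert_tier are hypothetical helper names, and the commented usage assumes GNU date for parsing the certificate's end date.

```shell
# days_left EXPIRY_EPOCH NOW_EPOCH: print whole days until expiry
# (integer division, truncated toward zero).
days_left() {
  echo $(( ($1 - $2) / 86400 ))
}

# alert_tier DAYS: print the tightest threshold (in days) that DAYS falls
# within, or "none" if expiry is more than 30 days away.
alert_tier() {
  for tier in 1 3 7 14 30; do
    if [ "$1" -le "$tier" ]; then
      echo "$tier"
      return 0
    fi
  done
  echo "none"
}

# Example wiring (assumes GNU date and network access to the Ingress):
#   end=$(openssl s_client -connect linkerd.example.com:443 </dev/null 2>/dev/null \
#         | openssl x509 -noout -enddate | cut -d= -f2)
#   alert_tier "$(days_left "$(date -d "$end" +%s)" "$(date +%s)")"
```

A certificate with 10 days left maps to the 14-day tier, so it would already have triggered the 30-day and 14-day alerts and the 7-day alert comes next.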

Linkerd's trust anchor certificate: The most critical Linkerd certificate is the trust anchor (root CA). A trust anchor generated by linkerd install is valid for 365 days by default, while one you generate yourself has whatever validity you chose; either way it must be rotated before it expires. This is an internal certificate and cannot be monitored externally by Vigilmon. Instead, run linkerd check in a scheduled CronJob and alert on failure to track trust anchor and issuer certificate validity.
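The script run by such a CronJob can be very small. The sketch below wraps any health command and turns its exit status into a one-word result; run_check is a hypothetical helper name, and the sketch assumes the container has the linkerd CLI and cluster credentials available. Whether near-expiry warnings change the exit status of linkerd check is worth verifying for your Linkerd version.

```shell
# run_check COMMAND...: run a health command, print "healthy" or "failing",
# and return non-zero on failure so the CronJob run is marked as failed.
# Hypothetical wrapper; the command's own output goes to stderr on failure.
run_check() {
  if output=$("$@" 2>&1); then
    echo "healthy"
  else
    echo "failing"
    printf '%s\n' "$output" >&2   # keep the details in the Job logs
    return 1
  fi
}

# In the CronJob container (assumption: linkerd CLI is on PATH):
#   run_check linkerd check
```

Because the wrapper propagates the failure, any alerting you already have on failed Kubernetes Jobs will pick it up without further integration.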

Step 6: Configure Alerting

In Vigilmon under Settings → Notifications, configure your alert channels:

| Monitor | Trigger | Action |
|---|---|---|
| linkerd-identity | Non-200 or ok missing | Pods cannot get new mTLS certs; run linkerd check; check identity pod logs |
| linkerd-destination | Non-200 or ok missing | Service discovery failing; new routes won't resolve; check destination pod |
| linkerd-proxy-injector | Non-200 or ok missing | New pods start without mesh injection; check webhook configuration |
| Web dashboard | Non-200 or Linkerd missing | Viz extension issue; check linkerd-viz namespace pods |
| Proxy port TCP | Connection refused | Proxy connectivity issue; check iptables rules and proxy pod status |
| SSL certificate | < 30 days to expiry | Renew Ingress certificate; check cert-manager events |

Alert after: 1 consecutive failure for linkerd-identity. 2 consecutive failures for all other monitors.

Common Linkerd Failure Modes and What Vigilmon Catches

| Scenario | Vigilmon monitor |
|---|---|
| linkerd-identity pod OOM killed | Identity health monitor fires; new mTLS certificates can't be issued |
| linkerd-destination pod crashes | Destination health monitor fires; existing connections use stale routing |
| Proxy-injector webhook times out | Proxy-injector monitor fires; new deployments start uninjected |
| Trust anchor certificate expired | Not caught externally; use linkerd check in an in-cluster CronJob |
| Linkerd viz extension pods crash | Dashboard monitor fires; control plane components may be healthy |
| SSL certificate on Ingress expires | SSL monitor alerts at 30-day threshold; external dashboard access breaks |
| Kubernetes API server throttling | All control plane monitors fire as components lose API connectivity |
| DNS misconfiguration | All monitors fire simultaneously |
| Linkerd upgrade rollout breaks destination | Destination health monitor catches mid-upgrade failures |
| iptables rules cleared on node restart | Proxy traffic routing breaks; not caught by external monitoring |
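The pattern of which monitors fire together is itself a diagnostic signal, as the scenarios above suggest. A rough sketch of that triage logic follows; classify_outage and the short monitor names are hypothetical labels chosen for illustration, and the total of six matches the monitors created in this guide.

```shell
# classify_outage MONITOR...: given the names of currently failing monitors,
# print a coarse diagnosis. Names are hypothetical labels for this guide's
# six monitors: identity destination proxy-injector dashboard tcp ssl
classify_outage() {
  total=6
  control=0
  for m in "$@"; do
    case "$m" in
      identity|destination|proxy-injector) control=$((control + 1)) ;;
    esac
  done
  if [ "$#" -eq 0 ]; then
    echo "healthy"
  elif [ "$#" -ge "$total" ]; then
    echo "cluster-wide"            # e.g. DNS or Ingress path broken
  elif [ "$control" -eq 3 ]; then
    echo "control-plane"           # e.g. Kubernetes API server trouble
  elif [ "$#" -eq 1 ]; then
    echo "component:$1"
  else
    echo "multiple:$*"
  fi
}
```

For instance, `classify_outage dashboard` prints component:dashboard, which matches the case where the viz pods crash while the control plane stays healthy.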

Linkerd's control plane is what keeps your service mesh running — when the identity component fails, mTLS certificates stop being issued. When the destination component fails, service discovery degrades. When the proxy-injector fails, new deployments start outside the mesh. These failures are silent from the outside until applications start timing out. Vigilmon gives you external monitoring of every critical path: per-component readiness, the web dashboard, proxy port TCP connectivity, and SSL certificate expiry, so you catch Linkerd control plane problems before they cascade into application failures.

Start monitoring Linkerd in under 5 minutes — register free at vigilmon.online.