Monitoring Istio Service Mesh with Vigilmon: Control Plane Health, Kiali Availability & SSL Certificate Alerts

Istio is one of the most widely deployed service meshes for Kubernetes. It handles mutual TLS, traffic management, observability, and policy enforcement across microservices. When Istio's control plane (istiod) goes down, sidecars stop receiving configuration updates, certificate rotation stalls, and new deployments may fail to start. When the ingress gateway fails, external traffic stops reaching your services entirely. Vigilmon gives you external visibility into Istio's health through four checks: the istiod readiness endpoint, the Kiali observability dashboard, the ingress gateway TCP port, and the ingress SSL certificate. Together they let you catch control plane failures before they cascade into service disruptions.

What You'll Build

A monitor on istiod's health endpoint (/healthz/ready)
A monitor on the Kiali dashboard availability
A TCP port check on the Istio ingress gateway
SSL certificate monitoring for your ingress domain
Alerting that distinguishes control plane failures from data plane outages

Prerequisites

A running Istio installation on Kubernetes with istiod and ingress gateway accessible
Istiod's health endpoint reachable at a public or otherwise network-accessible URL (e.g., https://istio-health.example.com)
A free account at vigilmon.online

Step 1: Verify Istio's Health Endpoints

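Istio serves readiness checks over plain HTTP. The commands below use port 15021 and the path /healthz/ready. In a default installation that port and path belong to the Envoy status port on proxies such as the ingress gateway, while istiod's own readiness probe is normally /ready on port 8080, so check the readinessProbe on your istiod Deployment and adjust the port and path if they differ: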

```shell
# Istiod readiness check (in-cluster or port-forwarded)
curl http://istiod.istio-system.svc.cluster.local:15021/healthz/ready

# Via port-forward for external testing
kubectl port-forward -n istio-system svc/istiod 15021:15021
curl http://localhost:15021/healthz/ready

# Ingress gateway status
kubectl get svc istio-ingressgateway -n istio-system
```

A healthy istiod returns HTTP 200 OK on /healthz/ready. If you expose istiod's health port externally via a LoadBalancer or ingress, Vigilmon can monitor it directly.
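The manual check can be wrapped in a small helper that reports only the HTTP status code, which is the same signal an external HTTP monitor evaluates. This is a minimal sketch: the function names `readiness_status` and `is_ready` are illustrative and not part of Istio or Vigilmon, and the 10-second timeout simply mirrors the response timeout used later in this guide.

```shell
# readiness_status URL
# Prints the HTTP status code returned by URL, or 000 if the request
# fails outright (connection refused, DNS failure, timeout).
readiness_status() {
    code=$(curl --silent --output /dev/null --max-time 10 \
        --write-out '%{http_code}' "$1") || code=000
    printf '%s\n' "$code"
}

# is_ready URL
# Succeeds (exit status 0) only when the endpoint answers with HTTP 200.
is_ready() {
    [ "$(readiness_status "$1")" = "200" ]
}

# Example, assuming a port-forward to the health port is running:
#   is_ready http://localhost:15021/healthz/ready && echo "ready"
```

Treating any transport error as `000` keeps "unreachable" and "unhealthy" distinguishable in logs while still failing the readiness test in both cases.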

Step 2: Create a Vigilmon Monitor for Istiod Health

Expose the istiod health port (15021) externally or use a dedicated health-check URL, then:

Log in to Vigilmon → Add Monitor → HTTP.
URL: https://istio-health.example.com/healthz/ready (your exposed istiod health endpoint).
Check interval: 60 seconds.
Response timeout: 10 seconds.
Expected status: 200.
Label: Istiod Control Plane.
Click Save.

This is your primary control plane signal. When istiod fails:

Sidecar proxies stop receiving xDS configuration updates
mTLS certificate rotation stalls, so workload certificates (valid for 24 hours by default) expire once their TTL runs out
New pods may be rejected or hang at startup, because the sidecar injection webhook is served by istiod
Traffic management rules (VirtualServices, DestinationRules) stop propagating

Step 3: Monitor the Kiali Dashboard

Kiali is Istio's observability UI — it shows service topology, traffic flow, and configuration validation. Monitoring Kiali availability confirms the observability stack is functional:

Add Monitor → HTTP.
URL: https://kiali.example.com (your Kiali ingress URL).
Check interval: 5 minutes.
Response timeout: 15 seconds.
Expected status: 200.
Label: Kiali Dashboard.
Click Save.

Kiali's health endpoint is also available at /kiali/healthz if you want a more direct liveness probe rather than the UI page.
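Before pointing a monitor at the health path, you can confirm it responds from your own machine. The sketch below builds the health URL from the dashboard's base URL and probes it with curl. The helper names are hypothetical, and the `/kiali` prefix assumes Kiali's web root has not been changed from the path mentioned above.

```shell
# kiali_health_url BASE_URL
# Appends the Kiali health path to BASE_URL, tolerating one trailing slash.
kiali_health_url() {
    base=${1%/}
    printf '%s/kiali/healthz\n' "$base"
}

# kiali_is_healthy BASE_URL
# Succeeds when the health endpoint answers without an HTTP error status
# (curl --fail exits non-zero on 4xx/5xx responses and transport errors).
kiali_is_healthy() {
    curl --silent --fail --output /dev/null --max-time 15 \
        "$(kiali_health_url "$1")"
}

# Example:
#   kiali_is_healthy https://kiali.example.com && echo "Kiali is up"
```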

Step 4: Add a TCP Monitor for the Ingress Gateway

The Istio ingress gateway is the entry point for all external traffic. Add a TCP check to confirm it's accepting connections:

Add Monitor → TCP.
Host: ingress.example.com (your ingress gateway's external IP or hostname).
Port: 443 (HTTPS) or 80 (HTTP).
Check interval: 60 seconds.
Label: Istio Ingress Gateway.
Click Save.

If this monitor fires while istiod is healthy, the data plane has a failure — the gateway pod has crashed or the LoadBalancer IP is unreachable. External users cannot reach any services in your mesh.
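The reasoning above, combining the istiod monitor and the gateway monitor to locate the failure, can be written as a small decision table. This is an illustrative sketch rather than Vigilmon functionality: `classify_outage` and `tcp_state` are hypothetical helpers, and `tcp_state` assumes a netcat build that supports the `-z` and `-w` options.

```shell
# classify_outage ISTIOD_STATE GATEWAY_STATE
# Each argument is "up" or "down", as reported by the two monitors.
classify_outage() {
    case "$1:$2" in
        up:up)     echo "healthy" ;;
        up:down)   echo "data-plane" ;;
        down:up)   echo "control-plane" ;;
        down:down) echo "control-plane+data-plane" ;;
        *)         echo "unknown"; return 2 ;;
    esac
}

# tcp_state HOST PORT
# Prints "up" if a TCP connection succeeds within 5 seconds, else "down".
tcp_state() {
    if nc -z -w 5 "$1" "$2" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Example:
#   classify_outage up "$(tcp_state ingress.example.com 443)"
```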

Step 5: Monitor SSL Certificates

Istio uses mTLS internally, but your ingress gateway serves traffic to external clients with a standard TLS certificate. Monitor it to prevent expiry:

Add Monitor → SSL Certificate.
Domain: ingress.example.com.
Alert when expiry is within: 30 days.
Alert again: 14 days, 7 days, 3 days, 1 day.
Click Save.

If you use separate domains for Kiali or other Istio components exposed externally, add SSL monitors for those domains as well.
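To cross-check what the SSL monitor reports, you can read the expiry date straight from the gateway with openssl. The sketch below is one way to do that and to map the remaining days onto the 30/14/7/3/1-day thresholds from this step. Both function names are hypothetical, and `days_until_expiry` assumes GNU `date` (the `-d` option); on macOS or BSD the date parsing would need to change.

```shell
# days_until_expiry HOST [PORT]
# Prints whole days until the certificate presented for HOST expires.
# Returns 1 if no certificate could be read.
days_until_expiry() {
    host=$1
    port=${2:-443}
    end=$(openssl s_client -connect "$host:$port" -servername "$host" \
            </dev/null 2>/dev/null \
          | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
    [ -n "$end" ] || return 1
    end_epoch=$(date -d "$end" +%s)   # GNU date only
    now_epoch=$(date +%s)
    echo $(( (end_epoch - now_epoch) / 86400 ))
}

# alert_tier DAYS_LEFT
# Prints the tightest threshold (1, 3, 7, 14 or 30) that DAYS_LEFT falls
# within, or "none" when more than 30 days remain.
alert_tier() {
    days=$1
    for t in 1 3 7 14 30; do
        if [ "$days" -le "$t" ]; then
            echo "$t"
            return 0
        fi
    done
    echo "none"
}

# Example:
#   alert_tier "$(days_until_expiry ingress.example.com)"
```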

Step 6: Configure Alerting

In Vigilmon under Settings → Notifications, configure your alert channels:

| Monitor | Trigger | Action |
|---|---|---|
| Istiod health (/healthz/ready) | Non-200 | Control plane down; sidecar config updates stopped; check istiod pod logs |
| Kiali dashboard | Non-200 | Observability UI unavailable; check Kiali deployment |
| Ingress gateway TCP | TCP fail | No external traffic can reach services; check gateway pod and LoadBalancer |
| SSL certificate | < 30 days to expiry | Renew ingress certificate; update gateway TLS secret |

Alert after 1 consecutive failure for the istiod and ingress gateway monitors, since both sit on the critical path. For Kiali, alert after 2 consecutive failures so that a routine pod restart does not page anyone.
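If you mirror these thresholds in your own tooling, for example a script that decides whether to escalate, the policy reduces to a lookup and a comparison. This is a sketch under assumptions: the monitor keys (`istiod`, `ingress-gateway`, `kiali`) are names chosen for illustration, and defaulting unknown monitors to a threshold of 1 is a design choice, not something Vigilmon prescribes.

```shell
# failure_threshold MONITOR
# Consecutive failures required before alerting.
failure_threshold() {
    case "$1" in
        istiod|ingress-gateway) echo 1 ;;  # critical path: alert immediately
        kiali)                  echo 2 ;;  # tolerate one failed check
        *)                      echo 1 ;;  # assumption: unknown monitors are critical
    esac
}

# should_alert MONITOR CONSECUTIVE_FAILURES
# Succeeds when the failure count has reached the monitor's threshold.
should_alert() {
    [ "$2" -ge "$(failure_threshold "$1")" ]
}

# Example:
#   should_alert kiali 1 || echo "Kiali: waiting for a second failure"
```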

Common Istio Failure Modes and What Vigilmon Catches

| Scenario | Vigilmon signal |
|---|---|
| Istiod pod crash / OOM | Health endpoint unreachable; alert within 60 s |
| Istiod CrashLoopBackOff | Health endpoint returns non-200 intermittently |
| Ingress gateway pod crash | TCP monitor fires; istiod stays green |
| LoadBalancer IP changes / unassigned | TCP monitor fires; DNS resolves to wrong IP |
| Kiali pod evicted | Dashboard monitor fires |
| TLS certificate expires on ingress | SSL monitor alerts at 30-day threshold |
| Namespace-level network policy blocks health port | Health endpoint monitor fires |

Istio's control plane health is invisible to end users until something fails — sidecars continue serving traffic on cached config until that config becomes stale. Vigilmon's external monitoring of the istiod health endpoint, Kiali dashboard, ingress gateway, and SSL certificates gives you the early warning system you need before stale configuration, stalled certificate rotation, or a downed gateway causes a production incident.

Start monitoring Istio in under 5 minutes — register free at vigilmon.online.