Monitoring Argo CD with Vigilmon: Server Health API, UI Availability, gRPC Port & SSL Certificate Alerts

Argo CD is the GitOps continuous delivery engine that keeps your Kubernetes cluster in sync with your Git repositories — reconciling manifests, rolling out deployments, and surfacing sync drift to platform teams. When Argo CD's API server goes down, engineers lose the ability to trigger manual syncs, approve Applications, and inspect pod health through the UI. When the application controller crashes, Kubernetes resources drift from Git without anyone noticing. When the gRPC port is unreachable, the Argo CD CLI (argocd) stops working for every engineer in your organization. Vigilmon gives you external visibility into Argo CD's availability: the server health API, web UI, gRPC port, and SSL certificate expiry.

What You'll Build

A monitor on Argo CD's server health API to detect API server failures
A web UI availability check to confirm the dashboard is accessible
A TCP monitor on the gRPC port to catch CLI connectivity failures
SSL certificate monitoring for your Argo CD domain
An alerting setup that distinguishes API failures from UI rendering issues

Prerequisites

A running Argo CD 2.0+ instance exposed via an Ingress or LoadBalancer
HTTPS configured (e.g., https://argocd.example.com)
The gRPC port accessible externally (typically the same as HTTPS on port 443, or a separate port)
A free account at vigilmon.online

Step 1: Understand Argo CD's Health Endpoints

Argo CD exposes health endpoints through its API server. The primary liveness probe is at /healthz:

curl https://argocd.example.com/healthz
# Returns: ok

For a more detailed readiness check:

curl 'https://argocd.example.com/healthz?full=true'

A healthy response returns HTTP 200 with body ok. If the API server process has crashed or is unresponsive, you receive a connection error or a 502/503 from the ingress controller.

Argo CD also exposes Prometheus metrics at /metrics on internal ports (8082 for the application controller, 8083 for the API server), but the /healthz endpoint on the main HTTPS port is what external monitoring should target.
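To make the response handling concrete, here is a minimal shell sketch that turns a /healthz probe result into a coarse state. The helper names classify_healthz and probe_healthz are hypothetical, and treating 502/503/504 as an ingress-level failure is an assumption based on the behaviour described above, not something Argo CD reports itself.

```shell
#!/bin/sh
# classify_healthz CODE BODY
# Maps an HTTP status code and response body to a coarse health state.
# Both helpers in this block are illustrative names, not part of Argo CD.
classify_healthz() (
  code="$1"
  body="$2"
  case "$code" in
    200)
      if [ "$body" = "ok" ]; then echo "healthy"; else echo "unexpected-body"; fi ;;
    502|503|504) echo "ingress-error" ;;    # ingress answered, backend did not
    000)         echo "connection-error" ;; # curl prints 000 when no response arrived
    *)           echo "unhealthy" ;;
  esac
)

# probe_healthz URL
# Fetches URL with a 15-second ceiling and classifies the result.
probe_healthz() (
  url="$1"
  tmp=$(mktemp) || exit 1
  code=$(curl -s -o "$tmp" -w '%{http_code}' --max-time 15 "$url")
  body=$(cat "$tmp")
  rm -f "$tmp"
  classify_healthz "$code" "$body"
)
```

Running probe_healthz https://argocd.example.com/healthz from a shell prints one of the states above, which is handy for confirming what an external monitor will see before you configure it.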

Step 2: Create a Vigilmon HTTP Monitor for the Health API

Log in to Vigilmon → Add Monitor → HTTP.
URL: https://argocd.example.com/healthz.
Check interval: 60 seconds.
Response timeout: 15 seconds.
Expected status: 200.
Keyword: ok (the literal response body from a healthy Argo CD server).
Click Save.

This monitor catches:

Argo CD API server crashes or restarts
Kubernetes control plane failures that prevent Argo CD from starting
Memory pressure causing the API server pod to be OOM killed
Ingress controller failures that block access to the Argo CD service
Deployment failures after Argo CD version upgrades

Alert sensitivity: Set to trigger after 1 consecutive failure. When the Argo CD API server is down, no deployments can be manually synced, and engineers lose visibility into cluster drift.

Step 3: Monitor the Argo CD Web UI

The Argo CD web UI provides the dashboard that platform teams use daily to inspect application sync status, review resource trees, and trigger rollbacks. A UI failure is a separate failure mode from an API failure:

curl https://argocd.example.com
# Returns HTML with "Argo CD" in the title

Add Monitor → HTTP.
URL: https://argocd.example.com.
Check interval: 2 minutes.
Response timeout: 15 seconds.
Expected status: 200.
Keyword: Argo CD (appears in the web UI page title and content).
Label: Argo CD web UI.
Click Save.

When the web UI monitor fires but the /healthz monitor is green, the issue is typically in the Argo CD UI serving layer (the static assets or dex authentication front-end), not the API server itself. This pattern helps you isolate UI rendering issues from backend API failures.
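The isolation pattern can be written down as a small lookup. This is a sketch only: triage_ui_vs_api is a hypothetical helper, the "up"/"down" inputs stand in for the state of the two HTTP monitors, and every mapping other than "health up, UI down" is an assumption rather than something stated above.

```shell
#!/bin/sh
# triage_ui_vs_api HEALTH_STATE UI_STATE
# Each argument is "up" or "down", as reported by the /healthz monitor and
# the web UI monitor respectively. Prints the layer to investigate first.
triage_ui_vs_api() (
  health="$1"
  ui="$2"
  case "$health/$ui" in
    up/up)     echo "all-clear" ;;
    up/down)   echo "ui-serving-layer" ;;       # the case described in the text
    down/up)   echo "api-server" ;;             # assumption: health handler failing
    down/down) echo "api-server-or-ingress" ;;  # assumption: shared dependency
    *)         echo "invalid-input"; exit 2 ;;
  esac
)
```

For example, triage_ui_vs_api up down prints ui-serving-layer, pointing at static assets or the login front-end rather than the API server.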

Step 4: Monitor the gRPC Port via TCP Check

The Argo CD CLI (argocd) uses gRPC over HTTPS (port 443) or a separate gRPC port to communicate with the API server. TCP-level monitoring confirms that the port is accepting connections — essential for catching ingress TLS termination failures or load balancer misconfigurations that affect gRPC but not HTTP:

# Test gRPC connectivity at the TCP level
nc -zv argocd.example.com 443

Argo CD's API server multiplexes gRPC and HTTPS on a single port, so both are reachable on port 443 when the ingress passes TLS straight through or is configured to proxy gRPC (e.g., NGINX with nginx.ingress.kubernetes.io/backend-protocol: GRPC, usually on a dedicated gRPC hostname). Some deployments expose a separate gRPC port (typically 8080 behind a load balancer):

Add Monitor → TCP.
Host: argocd.example.com.
Port: 443 (or your dedicated gRPC port if different).
Check interval: 2 minutes.
Label: Argo CD gRPC port.
Click Save.

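gRPC and HTTP/2: Many ingress controllers (especially AWS ALB) handle gRPC and HTTP/1.1 on different ports or with different annotation sets. If your engineers report argocd login failures while the HTTP UI still loads, a TCP check on a dedicated gRPC port will catch the difference. When gRPC shares port 443 with the UI, the TCP check only confirms that the port accepts connections, so treat CLI failure reports as the signal for protocol-level problems.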
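For a quick local equivalent of the TCP monitor, the nc command shown earlier can be wrapped in a function with a timeout. This is a rough sketch: tcp_port_open is a hypothetical helper, the 5-second default is an arbitrary choice, and it assumes your nc build supports the -z and -w options.

```shell
#!/bin/sh
# tcp_port_open HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be established, non-zero otherwise.
# Only the TCP handshake is tested; no TLS or gRPC traffic is exchanged.
tcp_port_open() (
  host="$1"
  port="$2"
  timeout="${3:-5}"
  nc -z -w "$timeout" "$host" "$port" >/dev/null 2>&1
)

# Example (not executed here):
#   tcp_port_open argocd.example.com 443 && echo "port open" || echo "port closed"
```

Because only the handshake is exercised, a passing result says the listener is reachable, not that gRPC calls succeed.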

Step 5: Monitor SSL Certificates

Argo CD's SSL certificate is critical for both browser access and CLI use. The argocd CLI validates certificates by default, and engineers who hit a certificate error typically work around it with --insecure flags — a practice that masks future security issues. Proactive certificate monitoring prevents this:

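# -servername sends SNI; </dev/null closes stdin so s_client exits after the handshake
openssl s_client -connect argocd.example.com:443 -servername argocd.example.com </dev/null 2>/dev/null | openssl x509 -noout -dates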

Add Monitor → SSL Certificate.
Domain: argocd.example.com.
Alert when expiry is within: 30 days.
Alert again: 14 days, 7 days, 3 days, 1 day.
Click Save.

Cert-manager and Let's Encrypt: Most Kubernetes-native Argo CD deployments use cert-manager to issue Let's Encrypt certificates automatically. Auto-renewal can fail silently if the ClusterIssuer hits rate limits, if the ACME HTTP-01 challenge is blocked by a NetworkPolicy, or if the Ingress annotation changes during an upgrade. A 30-day alert window gives you time to investigate cert-manager events before the certificate actually expires.
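The alert schedule above (30, 14, 7, 3, and 1 days) can be mirrored locally when you want to double-check a certificate by hand. The sketch below uses two hypothetical helpers: expiry_tier maps a number of remaining days to the tightest threshold crossed, and cert_expires_within uses the openssl x509 -checkend option against the live endpoint. Port 443 and the exit-code convention are assumptions of this example.

```shell
#!/bin/sh
# expiry_tier DAYS_LEFT
# Prints the tightest alert threshold (in days) that DAYS_LEFT has crossed,
# or "none" when the certificate is outside every window.
expiry_tier() (
  days="$1"
  for t in 1 3 7 14 30; do
    if [ "$days" -le "$t" ]; then
      echo "$t"
      exit 0
    fi
  done
  echo "none"
)

# cert_expires_within HOST DAYS
# Exit 0: certificate on HOST:443 expires within DAYS days.
# Exit 1: certificate is valid beyond that window.
# Exit 2: no certificate could be retrieved.
cert_expires_within() (
  host="$1"
  seconds=$(( $2 * 86400 ))
  pem=$(openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
        | openssl x509 2>/dev/null) || exit 2
  [ -n "$pem" ] || exit 2
  # -checkend exits 0 when the certificate is still valid after SECONDS
  if printf '%s\n' "$pem" | openssl x509 -noout -checkend "$seconds" >/dev/null; then
    exit 1
  fi
  exit 0
)
```

For instance, expiry_tier 25 prints 30, while cert_expires_within argocd.example.com 30 succeeds only when the served certificate has fewer than 30 days left.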

Step 6: Monitor Dex (SSO) If Configured

Many Argo CD deployments use Dex as an OpenID Connect provider for SSO with GitHub, GitLab, or LDAP. If Dex is down, users cannot log in even when the Argo CD API server is healthy:

curl https://argocd.example.com/api/dex/.well-known/openid-configuration
# Returns JSON with the OIDC configuration

Add Monitor → HTTP.
URL: https://argocd.example.com/api/dex/.well-known/openid-configuration.
Check interval: 5 minutes.
Response timeout: 10 seconds.
Expected status: 200.
Keyword: issuer (present in all valid OIDC configuration responses).
Label: Argo CD Dex SSO.
Click Save.
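If you want to verify the Dex response by hand before relying on the keyword match, a small helper can pull the issuer value out of the discovery document. This is a sketch under assumptions: oidc_issuer is a hypothetical name, the sed pattern is a rough stand-in for a real JSON parser such as jq, and it assumes the issuer value contains no escaped quotes.

```shell
#!/bin/sh
# oidc_issuer JSON
# Prints the value of the "issuer" field from an OIDC discovery document,
# or nothing if the field is absent.
oidc_issuer() (
  printf '%s\n' "$1" \
    | sed -n 's/.*"issuer"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
)

# Example (not executed here):
#   doc=$(curl -s --max-time 10 https://argocd.example.com/api/dex/.well-known/openid-configuration)
#   oidc_issuer "$doc"
```

An empty result alongside an HTTP 200 suggests that something other than Dex answered the request, which is the case the issuer keyword is meant to catch.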

Step 7: Configure Alerting

In Vigilmon under Settings → Notifications, configure your alert channels:

| Monitor | Trigger | Action |
|---|---|---|
| /healthz | Non-200 or ok missing | Check Argo CD API server pod; run kubectl get pods -n argocd |
| Web UI | Non-200 or keyword missing | UI rendering issue; check Argo CD deployment and ingress logs |
| gRPC port TCP | Connection refused | gRPC port blocked; check ingress annotations and load balancer health |
| SSL certificate | < 30 days to expiry | Renew certificate; check cert-manager events with kubectl get certificate -n argocd |
| Dex SSO | Non-200 or keyword missing | SSO login broken; check Dex pod logs; engineers must use local admin account |

Alert after: 1 consecutive failure for the health API and gRPC port monitors, and 2 consecutive failures for the UI and SSO monitors.
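The "consecutive failures" rule is easy to reason about with a small sketch. The function below is a hypothetical illustration of the logic, not Vigilmon's implementation: it takes a threshold and a history of check results, and signals an alert only when the most recent results form an unbroken run of failures at least as long as the threshold.

```shell
#!/bin/sh
# should_alert THRESHOLD RESULT...
# RESULT values are "ok" or "fail", oldest first. Returns 0 (alert) when the
# current run of trailing failures is at least THRESHOLD long.
should_alert() (
  threshold="$1"
  shift
  streak=0
  for r in "$@"; do
    if [ "$r" = "fail" ]; then
      streak=$((streak + 1))
    else
      streak=0   # any success resets the run
    fi
  done
  [ "$streak" -ge "$threshold" ]
)
```

With a threshold of 2, the history ok fail ok fail does not alert because the single success resets the run, while ok fail fail does. That is the trade-off behind using 2 for the UI and SSO monitors: one slow page load is ignored, at the cost of one extra check interval before the alert fires.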

Common Argo CD Failure Modes and What Vigilmon Catches

| Scenario | Vigilmon monitor |
|---|---|
| API server pod OOM killed | /healthz returns connection error; alert within 60 s |
| Application controller crash | API server healthy but sync stops; requires separate app-level monitoring |
| Ingress controller failure | All monitors fire simultaneously |
| gRPC-HTTP/2 protocol mismatch after ingress upgrade | gRPC TCP monitor fails while HTTP UI monitor stays green |
| SSL certificate expires | SSL monitor alerts at 30-day threshold; CLI and browser access fail |
| Dex OIDC provider down | Health API green; Dex monitor fires; all SSO logins fail |
| Kubernetes API server degraded | Argo CD cannot reconcile; API server may still respond on /healthz |
| DNS misconfiguration | All monitors fire simultaneously |
| Argo CD upgrade breaks UI | UI keyword monitor fires while health API stays green |
| Repository server (git) failures | Sync operations fail; not caught by external monitoring — use Argo CD application health in the UI |

Argo CD is the control plane for GitOps deployments — when it fails, engineering teams lose the ability to deploy, roll back, and inspect drift across the entire Kubernetes fleet. Vigilmon gives you external visibility into every layer: the API server health, web UI availability, gRPC CLI port, SSO authentication, and SSL certificate expiry, so you know the moment something breaks and can restore GitOps operations before engineers notice they can't deploy.

Start monitoring Argo CD in under 5 minutes — register free at vigilmon.online.