Monitoring Grafana with Vigilmon: Health Endpoint, Login Page Uptime, Datasource Proxy & SSL Certificate Alerts

How to monitor Grafana with Vigilmon — health API endpoint checks, login page availability, datasource proxy uptime, and SSL certificate monitoring.

Grafana is where your team watches the health of everything else — dashboards surface metrics from Prometheus, Loki, InfluxDB, CloudWatch, and dozens of other sources. When Grafana itself goes down, you lose visibility into your entire stack at the worst possible moment: during an incident. When the datasource proxy becomes unreachable, dashboards go blank even though Grafana appears to load. When the SSL certificate expires, SAML and OAuth login breaks before the browser warning appears. Vigilmon gives you external monitoring of Grafana's own health: the health API, login page, datasource proxy, and SSL certificate — so you have visibility into your observability tool.

What You'll Build

  • A monitor on Grafana's /api/health health endpoint
  • A login page availability check to confirm users can reach Grafana
  • A datasource proxy check to verify Grafana can proxy metric queries
  • SSL certificate monitoring for your Grafana domain
  • An alerting setup that separates API failures from UI and datasource issues

Prerequisites

  • A running Grafana 8.0+ instance accessible over HTTPS
  • Your Grafana domain with a valid SSL certificate (e.g., https://grafana.example.com)
  • At least one configured datasource (Prometheus, InfluxDB, etc.)
  • A free account at vigilmon.online

Step 1: Verify Grafana's Health Endpoint

Grafana exposes a built-in health endpoint at /api/health that returns JSON with the current database state and Grafana version:

curl https://grafana.example.com/api/health

A healthy Grafana instance returns:

{
  "commit": "abc1234",
  "database": "ok",
  "version": "10.2.3"
}

The database field reflects the state of Grafana's embedded SQLite or external PostgreSQL/MySQL database. If the database connection fails, Grafana reports "database": "failing" and, in current versions, answers with HTTP 503 rather than 200. Monitor both the status code and the database field so that either signal raises an alert.

Grafana with an external database: If you run Grafana with PostgreSQL or MySQL for high availability, a "database": "failing" response means Grafana cannot read dashboard definitions or user sessions. Alert on this keyword explicitly.
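To make the two signals concrete, the sketch below classifies a health response from its status code and its database field. The function name classify_health and the labels it returns are hypothetical, chosen for this illustration; they are not Grafana or Vigilmon terminology.

```python
import json


def classify_health(status_code: int, body: str) -> str:
    """Classify a /api/health response as 'healthy', 'degraded', or 'down'.

    The three labels are illustrative only.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # A non-JSON body usually means something other than Grafana answered,
        # for example a reverse proxy error page.
        return "down"

    if not isinstance(payload, dict):
        return "down"

    if status_code == 200 and payload.get("database") == "ok":
        return "healthy"
    if payload.get("database") == "failing":
        # Grafana itself answered, but it cannot reach its database.
        return "degraded"
    return "down"
```

Checking the parsed field rather than the raw text keeps the result independent of how the JSON happens to be formatted.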


Step 2: Create a Vigilmon HTTP Monitor for the Health Endpoint

  1. Log in to Vigilmon → Add Monitor → HTTP.
  2. URL: https://grafana.example.com/api/health.
  3. Check interval: 60 seconds.
  4. Response timeout: 10 seconds.
  5. Expected status: 200.
  6. Keyword: "database": "ok" (copy the text, including the space after the colon, exactly as it appears in the response from Step 1; it confirms database connectivity).
  7. Click Save.

This monitor catches:

  • Grafana process crashes or OOM kills
  • Database connectivity failures (SQLite corruption, PostgreSQL connection pool exhaustion)
  • Failed Grafana upgrades that leave the API unresponsive
  • Reverse proxy failures that return 502/503 instead of Grafana responses
  • Memory pressure causing the Grafana container to restart

Alert sensitivity: Set to trigger after 1 consecutive failure. When Grafana is down, your team loses all dashboard visibility — exactly when you need it most during incidents.
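For readers who want to see what a check like this amounts to, here is a minimal sketch of one HTTP probe with a timeout, an expected status, and a keyword. It assumes the keyword test is a plain substring match, which is a guess about how such monitors typically behave and not a description of Vigilmon's implementation. The names evaluate and http_check are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Tuple


def evaluate(status: int, body: str, expected_status: int, keyword: str) -> Tuple[bool, str]:
    """Apply the two pass/fail rules: status code first, then keyword."""
    if status != expected_status:
        return False, f"expected HTTP {expected_status}, got {status}"
    if keyword and keyword not in body:
        return False, f"keyword {keyword!r} not found in response body"
    return True, "ok"


def http_check(url: str, expected_status: int = 200, keyword: str = "",
               timeout: float = 10.0) -> Tuple[bool, str]:
    """Fetch the URL once and evaluate it; connection errors count as failures."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the status and body are still readable.
        status = err.code
        body = err.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, TLS error, or timeout.
        return False, f"request failed: {err}"
    return evaluate(status, body, expected_status, keyword)
```

A call such as http_check("https://grafana.example.com/api/health", 200, '"database": "ok"') would mirror the settings above. Under the substring assumption the match is whitespace-sensitive, so the keyword has to be spelled exactly as the body returns it.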


Step 3: Monitor the Grafana Login Page

The Grafana login page (/login) is the entry point for users. It can fail independently of the health API — for example, if the static assets (JavaScript bundles) fail to load, if a reverse proxy strips security headers, or if Grafana's frontend plugin system crashes after an update:

curl https://grafana.example.com/login
# Returns HTML with "Grafana" in the title

  1. Add Monitor → HTTP.
  2. URL: https://grafana.example.com/login.
  3. Check interval: 2 minutes.
  4. Response timeout: 15 seconds.
  5. Expected status: 200.
  6. Keyword: Grafana (appears in the page title and content).
  7. Label: Grafana login page.
  8. Click Save.

When the login page monitor fires but /api/health stays green, the issue is typically in frontend serving: a broken JavaScript bundle after a plugin update, a CDN misconfiguration, or a reverse proxy that handles API routes correctly but breaks static asset serving. This separation directs your investigation immediately.
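A bare keyword match on "Grafana" can also succeed on an error page that merely mentions Grafana in its body. As a stricter variant for a custom script, the sketch below looks for the keyword in the page title only. TitleParser and login_page_looks_healthy are hypothetical names, and this is not how Vigilmon evaluates keywords.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def login_page_looks_healthy(status_code: int, html: str, keyword: str = "Grafana") -> bool:
    """True when the status is 200 and the keyword appears in the page title."""
    parser = TitleParser()
    parser.feed(html)
    return status_code == 200 and keyword in parser.title
```

Like any plain HTTP check, this inspects only the HTML document returned for /login; it does not fetch or execute the JavaScript that the page references.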


Step 4: Monitor the Grafana Datasource Proxy

Grafana proxies all metric queries from your browser through its backend to datasources like Prometheus, InfluxDB, and CloudWatch. This proxy is critical — if it fails, dashboards go blank even when Grafana's UI loads perfectly. The proxy endpoint is at /api/datasources/proxy/:id/.

An external monitor cannot send queries through the proxy without credentials. Instead, monitor how the datasource API responds to an unauthenticated request:

curl https://grafana.example.com/api/datasources
# Returns 401 Unauthorized when Grafana is up and its API layer is routing requests and enforcing authentication

  1. Add Monitor → HTTP.
  2. URL: https://grafana.example.com/api/datasources.
  3. Check interval: 5 minutes.
  4. Response timeout: 10 seconds.
  5. Expected status: 401 (unauthenticated requests confirm the endpoint is alive and routing correctly).
  6. Label: Grafana datasource API.
  7. Click Save.

A 401 response is the correct health signal here: it proves that requests reach Grafana's datasource API and that authentication is being enforced. It does not prove that the datasources behind the proxy are reachable. A connection error, 502, or 404 means the API layer that the proxy depends on is down or misrouted. If Grafana uses anonymous access (not recommended), change the expected status to 200 and add a keyword from the JSON response.
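The status-code reasoning above can be written down as a small lookup. This is only a sketch: interpret_datasource_status is a hypothetical helper, and the messages are illustrative wording, not output from Grafana or Vigilmon.

```python
from typing import Optional


def interpret_datasource_status(status_code: Optional[int],
                                anonymous_access: bool = False) -> str:
    """Map the response to an unauthenticated /api/datasources request to a verdict.

    Pass None as status_code when no HTTP response was received at all.
    """
    if status_code is None:
        return "down: no HTTP response (connection error or timeout)"

    # With anonymous access enabled, the text above expects 200 instead of 401.
    expected = 200 if anonymous_access else 401
    if status_code == expected:
        return "up: the API is routing requests and applying the expected auth policy"
    if status_code in (502, 503, 504):
        return "down: the reverse proxy cannot reach Grafana"
    if status_code == 404:
        return "down: route missing, so the request is probably not reaching Grafana's API"
    return f"unexpected: got HTTP {status_code}, expected {expected}"
```

Treating an unexpected 200 as suspicious is deliberate: if the endpoint suddenly answers without authentication, the access policy has changed and is worth a look.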


Step 5: Monitor SSL Certificates

Grafana commonly integrates with OAuth2 (GitHub, Google, Azure AD), SAML, and LDAP for authentication. An expired certificate breaks OAuth2 redirect flows and SAML assertions at the moment of expiry, with no grace period: browsers and any other client that validates the certificate reject the TLS handshake immediately. Monitor certificates with generous advance warning:

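openssl s_client -connect grafana.example.com:443 -servername grafana.example.com </dev/null 2>/dev/null | openssl x509 -noout -dates
# -servername sends SNI so the right certificate is returned; </dev/null closes stdin so s_client exits after the handshake

  1. Add Monitor → SSL Certificate.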
  2. Domain: grafana.example.com.
  3. Alert when expiry is within: 30 days.
  4. Alert again: 14 days, 7 days, 3 days, 1 day.
  5. Click Save.

Grafana with SAML or OAuth2: Certificate expiry manifests as "redirect_uri_mismatch" or "invalid certificate" errors during login — errors that are confusing to debug under time pressure. A 30-day alert window ensures you renew before any authentication path breaks.
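The same expiry check can be scripted. The sketch below reads the certificate's notAfter date and maps the remaining days onto the 30/14/7/3/1 schedule from the steps above. The function names are hypothetical, and the threshold logic is one reasonable reading of that schedule, not Vigilmon's implementation.

```python
import socket
import ssl
import time
from typing import Optional

# Alert schedule from the steps above, in days before expiry.
ALERT_THRESHOLDS_DAYS = (30, 14, 7, 3, 1)


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a certificate 'notAfter' string into days remaining."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def due_threshold(days_left: float) -> Optional[int]:
    """Return the tightest threshold already crossed, or None if none is."""
    crossed = [t for t in ALERT_THRESHOLDS_DAYS if days_left <= t]
    return min(crossed) if crossed else None


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Read 'notAfter' from the live certificate (requires network access).

    With the default context, an already expired or untrusted certificate
    makes the handshake raise ssl.SSLCertVerificationError.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, due_threshold(days_until_expiry(fetch_not_after("grafana.example.com"))) returns None while more than 30 days remain.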


Step 6: Configure Alerting

In Vigilmon under Settings → Notifications, configure your alert channels:

| Monitor | Trigger | Action |
|---|---|---|
| /api/health | Non-200 or "database": "ok" missing | Restart Grafana; check database connectivity; inspect grafana.log |
| Login page | Non-200 or Grafana missing | Frontend issue; check static assets; inspect reverse proxy logs |
| Datasource API | Not 401 (or not 200 with anonymous access) | Proxy layer down; Grafana may be partially functional; check API server |
| SSL certificate | < 30 days to expiry | Renew certificate; verify ACME automation or manual renewal schedule |

Alert after: 1 consecutive failure for the health endpoint; 2 consecutive failures for the login page and datasource API monitors.
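The "alert after N consecutive failures" rule is simple to state precisely in code. The class below is a sketch of that rule only; ConsecutiveFailureAlert is a hypothetical name, and the "alert" and "recovered" events are illustrative.

```python
from typing import Optional


class ConsecutiveFailureAlert:
    """Raise an alert once a check has failed `threshold` times in a row."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.failures = 0
        self.alerting = False

    def record(self, check_passed: bool) -> Optional[str]:
        """Record one result; return 'alert' or 'recovered' on a state change."""
        if check_passed:
            self.failures = 0  # any success resets the streak
            if self.alerting:
                self.alerting = False
                return "recovered"
            return None

        self.failures += 1
        if not self.alerting and self.failures >= self.threshold:
            self.alerting = True
            return "alert"
        return None
```

With a threshold of 2, a single failed check followed by a success never alerts, which is why the higher threshold suits the login page and datasource monitors, where a one-off slow response is more likely than on the lightweight health endpoint.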


Common Grafana Failure Modes and What Vigilmon Catches

| Scenario | Vigilmon monitor |
|---|---|
| Grafana process OOM killed | /api/health unreachable; alert within 60 s |
| Database connection exhausted (PostgreSQL) | /api/health returns "database": "failing"; keyword mismatch |
| JavaScript bundle corrupted after plugin update | Login page monitor fires; /api/health stays green |
| Reverse proxy strips required CORS headers | Datasource API monitor fires; login page may still load |
| SSL certificate expires | SSL monitor alerts at 30-day threshold; OAuth2/SAML login breaks |
| Grafana upgrade breaks UI rendering | Login page keyword monitor fires; health API stays green |
| Static asset CDN fails | Login page monitor fires; API monitors stay green |
| DNS misconfiguration | All monitors fire simultaneously |
| Grafana behind auth proxy (no anonymous access) | /login returns 200; API calls return 401; both expected |
| Plugin crash causing partial dashboard failure | Not caught by external monitoring; use Grafana's own alerting for panel-level errors |
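The table amounts to a triage rule: which combination of monitors is firing points to where to look first. A rough sketch of that rule follows. The monitor labels ("health", "login", "datasource", "ssl") and the triage function are hypothetical, and the suggestions only paraphrase the scenarios above.

```python
from typing import Set


def triage(firing: Set[str]) -> str:
    """Suggest a first place to look, given the set of monitors that are firing.

    Expected labels: "health", "login", "datasource", "ssl".
    """
    http_monitors = {"health", "login", "datasource"}
    if http_monitors <= firing:
        # Everything failing at once points outside Grafana's internals.
        return "full outage: check DNS, the reverse proxy, and the Grafana process"
    if "health" in firing:
        return "backend problem: check the Grafana process and its database"
    if "login" in firing:
        return "frontend serving problem: check static assets and reverse proxy rules"
    if "datasource" in firing:
        return "API routing problem: check how the reverse proxy handles /api routes"
    if "ssl" in firing:
        return "certificate nearing expiry: renew before login flows break"
    return "all monitors green"
```

The order of the checks matters: the broadest failure is tested first so that a full outage is not misread as a frontend-only problem.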


Grafana is the lens through which your team observes system health — when it fails, you lose visibility at exactly the moment you most need it. When the datasource proxy breaks, dashboards silently go blank. When SSL expires, authentication flows break before anyone logs a clear error. Vigilmon gives you external monitoring of Grafana's own health: the API, login page, datasource proxy, and SSL certificate, so you're alerted before your observability tool becomes an unobserved failure.

Start monitoring Grafana in under 5 minutes — register free at vigilmon.online.
