Hasura GraphQL Engine Monitoring with Vigilmon

Learn how to monitor Hasura GraphQL Engine using Vigilmon, covering the /healthz endpoint, metadata status, ping and keyword monitors, and alerting on engine failures.

Hasura is an open-source GraphQL engine that auto-generates a GraphQL API over your Postgres (or other supported) database. It's popular for rapidly building data-driven backends and for adding GraphQL capabilities to existing databases. When Hasura goes down, your entire GraphQL API layer disappears — every client querying your data fails simultaneously.

This guide covers how to monitor Hasura with Vigilmon, including built-in health endpoints, metadata status checks, and alerting configuration.


Hasura's Built-in Health Endpoints

Hasura ships with dedicated health endpoints you should monitor. Unlike a generic GraphQL API where you might craft a health query, Hasura exposes these out of the box:

/healthz — Primary Health Check

GET https://your-hasura-instance.com/healthz

A healthy Hasura instance responds with:

HTTP/1.1 200 OK

OK

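An unhealthy instance returns HTTP 500 with the body ERROR. This endpoint checks that the Hasura process is running and the database connection is active. If the engine is serving requests but its metadata is inconsistent, Hasura's API reference describes a 200 response whose body starts with WARN instead of OK, so pair the status check with a keyword check on OK.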
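As a rough sketch of how a client might interpret this endpoint, the function below maps a status code and body to a state. The function name and the state labels are illustrative and not part of Hasura or Vigilmon. The degraded branch assumes the behavior described in Hasura's API reference, where an instance with inconsistent metadata answers 200 with a body starting with WARN; verify that against your own version.

```javascript
// Hypothetical helper: interpret a /healthz response.
// The state names ('healthy', 'degraded', 'down') are illustrative labels.
function classifyHealthz(status, body) {
  const text = String(body ?? '').trim();
  if (status === 200 && text === 'OK') return 'healthy';
  // Assumption: a 200 with a WARN body signals inconsistent metadata.
  if (status === 200 && text.startsWith('WARN')) return 'degraded';
  // Anything else, including a 200 with an unexpected body, counts as down.
  return 'down';
}
```

Treating an unexpected 200 body as down is a deliberately conservative choice: a proxy error page served with status 200 should not pass as healthy.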

/v1/version — Version and Metadata

GET https://your-hasura-instance.com/v1/version

Response:

{
  "version": "v2.36.0",
  "is_metadata_inconsistent": false
}

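The is_metadata_inconsistent field is critical. When it is true, Hasura has loaded but its metadata (table relationships, permissions, event triggers) is in a broken state. Queries that touch the affected objects can fail while the rest of the API keeps working, which makes this harder to catch than a full outage and easy to miss without a keyword monitor. The fields in this response vary between Hasura versions, so confirm that your instance actually returns is_metadata_inconsistent before building a monitor on it.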
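For scripts that read this endpoint directly, a minimal sketch of parsing the flag might look like the following. It assumes the JSON shape shown above; the function name and return labels are hypothetical. A missing field is reported as unknown rather than consistent, so an instance that does not expose the flag is not mistaken for a healthy one.

```javascript
// Hypothetical helper: read the metadata flag from a /v1/version body.
// Returns 'consistent', 'inconsistent', or 'unknown' (field absent or
// body not valid JSON).
function metadataState(bodyText) {
  let parsed;
  try {
    parsed = JSON.parse(bodyText);
  } catch {
    return 'unknown';
  }
  if (parsed === null || typeof parsed !== 'object') return 'unknown';
  if (parsed.is_metadata_inconsistent === false) return 'consistent';
  if (parsed.is_metadata_inconsistent === true) return 'inconsistent';
  return 'unknown';
}
```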


Monitoring the Hasura Health Endpoint

Step 1: Create a Ping Monitor

  1. Log in to Vigilmon and go to Monitors → New Monitor
  2. Type: HTTP
  3. Method: GET
  4. URL: https://your-hasura-instance.com/healthz
  5. Interval: 1 minute
  6. Expected status: 200
  7. Keyword check: OK

This gives you 60-second detection of Hasura process failures and database connectivity issues.
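To make the monitor's logic concrete, here is a sketch of one check cycle: a GET request, a status comparison, and a keyword search, with a timeout. This is not Vigilmon's implementation. The function name, option names, and result shape are assumptions for illustration, and it relies on the global fetch and AbortSignal.timeout available in Node.js 18 and later.

```javascript
// Sketch of a single HTTP check: status code plus keyword, with a timeout.
// `fetchImpl` is injectable so the logic can be exercised without a network;
// it defaults to the global fetch in Node.js 18+.
async function runHttpCheck(
  { url, expectedStatus = 200, keyword = null, timeoutMs = 10000 },
  fetchImpl = fetch
) {
  const started = Date.now();
  try {
    const res = await fetchImpl(url, {
      method: 'GET',
      signal: AbortSignal.timeout(timeoutMs),
    });
    const body = await res.text();
    const statusOk = res.status === expectedStatus;
    const keywordOk = keyword === null || body.includes(keyword);
    return { up: statusOk && keywordOk, status: res.status, ms: Date.now() - started };
  } catch (err) {
    // Network errors, DNS failures, TLS errors, and timeouts all count as down.
    return { up: false, status: null, ms: Date.now() - started, error: String(err) };
  }
}
```

Calling `runHttpCheck({ url: 'https://your-hasura-instance.com/healthz', keyword: 'OK' })` once a minute from a scheduler would approximate the monitor configured above.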

Step 2: Metadata Consistency Monitor

Create a second monitor targeting /v1/version:

  1. Type: HTTP
  2. Method: GET
  3. URL: https://your-hasura-instance.com/v1/version
  4. Interval: 5 minutes
  5. Keyword check: "is_metadata_inconsistent":false

When Hasura's metadata becomes inconsistent (after a bad migration, a dropped table, or a permissions error), this monitor fires — even if /healthz still returns 200.
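One caveat with this keyword: a plain substring match is sensitive to how the JSON is serialized. If the server or a proxy returns pretty-printed JSON with a space after the colon, the exact keyword no longer appears. Whether Vigilmon normalizes whitespace is not covered here, so the sketch below assumes a literal substring match and shows an illustrative whitespace-insensitive alternative. Both function names are hypothetical.

```javascript
// Plain substring match, as a keyword monitor is assumed to perform it.
function keywordMatches(body, keyword) {
  return body.includes(keyword);
}

// Illustrative alternative: remove all whitespace before comparing.
// Crude (it also strips spaces inside string values), but adequate for a
// fixed key/value pair like this one.
function keywordMatchesIgnoringWhitespace(body, keyword) {
  const strip = (s) => s.replace(/\s+/g, '');
  return strip(body).includes(strip(keyword));
}

const keyword = '"is_metadata_inconsistent":false';
const compact = '{"version":"v2.36.0","is_metadata_inconsistent":false}';
const pretty = '{\n  "version": "v2.36.0",\n  "is_metadata_inconsistent": false\n}';

console.log(keywordMatches(compact, keyword)); // true
console.log(keywordMatches(pretty, keyword)); // false
console.log(keywordMatchesIgnoringWhitespace(pretty, keyword)); // true
```

Before relying on the monitor, fetch the endpoint once and copy the keyword from the actual response bytes.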


Monitoring Hasura on Docker or Kubernetes

If you're running Hasura via Docker Compose, the health endpoint is reachable on the mapped port:

# docker-compose.yml excerpt
services:
  hasura:
    image: hasura/graphql-engine:v2.36.0
    ports:
      - "8080:8080"
    environment:
      HASURA_GRAPHQL_DATABASE_URL: postgres://...
      HASURA_GRAPHQL_ENABLE_CONSOLE: "true"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/healthz"]
      interval: 30s
      timeout: 10s
      retries: 3

In Vigilmon, point to your public domain or load balancer, not the internal Docker host.

On Kubernetes, the standard liveness/readiness probes mirror what Vigilmon checks externally:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10

Vigilmon provides an external perspective that your cluster's internal probes cannot — it catches networking failures, ingress misconfigurations, and TLS certificate issues.
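As an illustration of the TLS side of that external view, the sketch below connects to a host, reads the peer certificate, and reports how many days remain before it expires. It uses Node's built-in tls module. The function names and the default timeout are assumptions for this example.

```javascript
const tls = require('node:tls');

// Days between `now` and a certificate's notAfter date (negative if expired).
function daysUntilExpiry(validTo, now = new Date()) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((new Date(validTo).getTime() - now.getTime()) / msPerDay);
}

// Connect to host:port, read the peer certificate, and resolve with the
// number of days left. Rejects on connection or handshake failure, which
// includes an already-expired or untrusted certificate.
function checkCertificate(host, port = 443, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect({ host, port, servername: host }, () => {
      const cert = socket.getPeerCertificate();
      socket.end();
      resolve(daysUntilExpiry(cert.valid_to));
    });
    socket.setTimeout(timeoutMs, () => socket.destroy(new Error('TLS timeout')));
    socket.on('error', reject);
  });
}
```

For example, `checkCertificate('your-hasura-instance.com').then((days) => console.log(days))` prints the remaining validity in days, which a script could compare against a warning threshold of its choosing.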


Monitoring the Hasura Console

If you expose the Hasura console for your team, monitor that separately:

GET https://your-hasura-instance.com/console

The console serves a React app. A 200 response with keyword Hasura in the HTML confirms the static assets are being served. This is useful for deployments where the API is up but a CDN or proxy is blocking console access.


Webhook Alerting on Hasura Failures

When Hasura goes down, you lose your entire GraphQL data layer. Configure Vigilmon to fire an alert immediately.

Slack Alerting

In Vigilmon → Alert Channels → Slack, add your Slack webhook URL. Configure the alert to trigger after 1 failed check — Hasura failures cascade quickly to user-facing errors and should not wait for multiple confirmation checks.

Webhook Integration

For custom incident handling:

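// Example webhook handler (Node.js/Express)
const express = require('express');

const app = express();

// Placeholder integrations (hypothetical): swap in your paging provider's client.
const notifyPagerDuty = (incident) => console.error('[PagerDuty] trigger', incident);
const resolvePagerDutyIncident = (source) => console.info('[PagerDuty] resolve', source);

app.post('/webhooks/vigilmon', express.json(), (req, res) => {
  // Payload shape is assumed from this example; confirm it against a real Vigilmon delivery.
  const { event, monitor } = req.body ?? {};
  if (!event || !monitor) {
    return res.status(400).json({ error: 'unexpected payload' });
  }

  if (event === 'down') {
    console.error(`[Vigilmon] Hasura DOWN: ${monitor.url}`);
    // Page on-call, create incident ticket, notify team
    notifyPagerDuty({
      summary: 'Hasura GraphQL Engine is DOWN',
      severity: 'critical',
      source: monitor.url,
    });
  } else if (event === 'up') {
    console.info(`[Vigilmon] Hasura RECOVERED: ${monitor.url}`);
    resolvePagerDutyIncident(monitor.url);
  }

  res.json({ received: true });
});

// Port 3000 is an arbitrary choice for this example.
app.listen(3000);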

Register this URL under Alert Channels → Webhook in Vigilmon.


Monitoring Hasura Cloud

If you're on Hasura Cloud rather than self-hosted, you still monitor the same endpoints — Hasura Cloud instances expose /healthz and /v1/version identically. Your instance URL is the hasura.app subdomain assigned to your project.

One additional consideration for Hasura Cloud: monitor your Postgres database host separately. Hasura Cloud can report healthy while your underlying database is degraded. Add a second Vigilmon monitor for your database's connection pooler or health endpoint if your provider exposes one.
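If your database provider exposes no HTTP health endpoint, a plain TCP reachability check on the pooler port is one fallback. The sketch below uses Node's built-in net module; the function name and default timeout are assumptions. It only shows that the port accepts connections, not that Postgres can serve queries.

```javascript
const net = require('node:net');

// Resolve true if a TCP connection to host:port opens within timeoutMs,
// false on refusal, error, or timeout.
function tcpReachable(host, port, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host, port });
    const finish = (ok) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('connect', () => finish(true));
    socket.once('error', () => finish(false));
  });
}
```

A call such as `tcpReachable('db.example.internal', 5432)` (hostname hypothetical) could run alongside the Hasura checks to separate database outages from engine outages.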


Alerting Strategy

| Scenario | Monitor | Alert Threshold |
|----------|---------|-----------------|
| Hasura process down | /healthz returns non-200 | 1 failed check |
| Database connection lost | /healthz returns non-200 | 1 failed check |
| Metadata inconsistent | /v1/version keyword missing | 2 failed checks |
| Console inaccessible | /console returns non-200 | 3 failed checks |
| Response latency spike | /healthz > 2000ms | Warning alert |
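The thresholds in the table above amount to counting consecutive failures per monitor. A minimal sketch of that logic follows, assuming each check result is an object with `up` and `ms` fields and that results are ordered oldest to newest. The monitor keys and function names are illustrative, not Vigilmon settings.

```javascript
// Alert thresholds from the table, keyed by an illustrative monitor name.
const THRESHOLDS = { healthz: 1, metadata: 2, console: 3 };

// Count how many of the most recent results are failures (newest last).
function consecutiveFailures(results) {
  let count = 0;
  for (let i = results.length - 1; i >= 0 && !results[i].up; i--) count++;
  return count;
}

// Fire exactly once, on the check where the streak reaches the threshold.
function shouldAlert(monitor, results) {
  return consecutiveFailures(results) === THRESHOLDS[monitor];
}

// Latency warning from the table: a successful check slower than 2000 ms.
function isLatencyWarning(result, limitMs = 2000) {
  return result.up && result.ms > limitMs;
}
```

Comparing with strict equality rather than "at least" keeps a long outage from re-sending the same alert on every subsequent failed check.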


Summary

| Component | Endpoint | Vigilmon Check |
|-----------|----------|----------------|
| Process + DB health | /healthz | Status 200 + keyword OK |
| Metadata consistency | /v1/version | Keyword "is_metadata_inconsistent":false |
| Console availability | /console | Status 200 |
| Event trigger latency | Custom endpoint | Response time threshold |

Hasura is a high-leverage component — one instance often serves many frontend clients. With Vigilmon's 1-minute check interval and instant alerting, you'll detect engine failures before users start reporting missing data.

