Uptime Monitoring for GraphQL APIs: A Complete Guide

"Learn how to monitor GraphQL APIs built with Apollo Server, Hasura, or any GraphQL implementation using Vigilmon — covering POST-based checks, introspection health probes, and alert configuration."

GraphQL APIs have a quirk that trips up most generic uptime monitors: everything goes to a single endpoint via POST. There's no /users to GET or /products/123 to check. A naive monitor that just does a GET request to https://api.example.com/graphql will get a 400 Bad Request and conclude your API is down — even when it's perfectly healthy.

This guide explains how to monitor GraphQL APIs correctly using Vigilmon, covering introspection health checks, custom health endpoints, and Apollo/Hasura-specific approaches.


The Core Challenge: GraphQL Is POST-First

A standard GraphQL request looks like this:

curl -X POST https://api.example.com/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "{ __typename }"}'
# {"data":{"__typename":"Query"}}

This returns 200 with valid data. But a plain GET to the same URL typically returns:

curl https://api.example.com/graphql
# 400 Bad Request: Must provide query string.

Your uptime monitor needs to send a proper GraphQL request body. Here are three strategies.
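Before looking at the strategies, here is a minimal sketch of the request a GraphQL-aware probe has to build. The helper name buildGraphQLProbe is hypothetical; it only assembles the arguments you would pass to fetch (Node 18+) and does not send anything.

```javascript
// Hypothetical helper: builds the POST request a GraphQL-aware probe should send.
function buildGraphQLProbe(url, query = '{ __typename }') {
  return {
    url,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // GraphQL over HTTP expects a JSON object with a "query" field.
      body: JSON.stringify({ query }),
    },
  };
}

const probe = buildGraphQLProbe('https://api.example.com/graphql');
console.log(probe.init.method); // POST
console.log(probe.init.body);   // {"query":"{ __typename }"}
// To actually send it: await fetch(probe.url, probe.init)
```

A plain GET carries none of this, which is why a generic monitor misreads a healthy server as down.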


Strategy 1: Dedicated HTTP Health Endpoint (Recommended)

The cleanest solution is a separate, non-GraphQL /health endpoint on the same server. This is the recommended approach because it:

  • Decouples your monitoring probe from your GraphQL schema
  • Avoids adding monitoring noise to your operation logs
  • Works even if your schema evolves

Apollo Server (Node.js)

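// server.js
const express = require('express');
const { ApolloServer } = require('@apollo/server');
const { expressMiddleware } = require('@apollo/server/express4');
// Assumed to exist in your project (hypothetical paths):
// your schema, and a Knex-style `db` instance
const { typeDefs, resolvers } = require('./schema');
const db = require('./db');

async function main() {
  const server = new ApolloServer({ typeDefs, resolvers });
  // start() must finish before expressMiddleware(server) is used
  await server.start();

  const app = express();

  // Health endpoint, checked by Vigilmon
  app.get('/health', async (req, res) => {
    try {
      // Lightweight DB ping
      await db.raw('SELECT 1');
      res.json({ status: 'ok', service: 'graphql-api' });
    } catch (err) {
      res.status(503).json({ status: 'down', error: err.message });
    }
  });

  // GraphQL endpoint
  app.use('/graphql', express.json(), expressMiddleware(server));

  app.listen(4000);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});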

Apollo Server with TypeScript

import express, { Request, Response } from 'express';
import { ApolloServer } from '@apollo/server';
import { expressMiddleware } from '@apollo/server/express4';

const app = express();

app.get('/health', async (_req: Request, res: Response) => {
  const checks = { database: false };

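  try {
    // `dataSource` is assumed to be an initialized database handle
    // with a query() method (for example, a TypeORM DataSource)
    await dataSource.query('SELECT 1');
    checks.database = true;
  } catch {
    // Leave checks.database false; the 503 response below reports it
  }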

  const healthy = Object.values(checks).every(Boolean);
  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'ok' : 'degraded',
    checks,
  });
});

// `server` is assumed to be an ApolloServer instance that has already been started
app.use('/graphql', express.json(), expressMiddleware(server));

Hasura

Hasura exposes /healthz out of the box:

curl https://your-hasura-instance.com/healthz
# OK

Hasura returns 200 OK (body: OK) when healthy and 500 when the metadata database is unreachable. Point Vigilmon directly at https://your-hasura-instance.com/healthz.


Strategy 2: GraphQL Introspection Ping via POST Monitor

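If you can't add a separate health endpoint (for example, behind a third-party API gateway), use Vigilmon's HTTP POST monitoring with a minimal query. { __typename } asks only for the name of the root type, so it is about the lightest valid operation you can send and does not depend on any field in your own schema: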

{ "query": "{ __typename }" }

In Vigilmon:

  1. Create a new monitor, select type HTTP
  2. Set Method to POST
  3. Set URL to https://api.example.com/graphql
  4. Add header: Content-Type: application/json
  5. Set Request body to: {"query":"{ __typename }"}
  6. Under Keyword check, add "data" — this ensures the response body contains a valid GraphQL data envelope, not an error object
  7. Save

This verifies the GraphQL layer is actually parsing and resolving queries, not just serving a 200 from a gateway that's already dropped the backend.
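The combination of status code and keyword check approximates a stricter rule: the response is JSON, carries data, and carries no errors. A minimal sketch of that rule follows. The function name evaluateProbeResponse is hypothetical, and this is not how Vigilmon evaluates responses internally; it is only meant to show what the keyword check is standing in for.

```javascript
// Hypothetical evaluator: decides whether a probe response counts as healthy.
function evaluateProbeResponse(status, bodyText) {
  if (status !== 200) return { healthy: false, reason: `HTTP ${status}` };

  let parsed;
  try {
    parsed = JSON.parse(bodyText);
  } catch {
    return { healthy: false, reason: 'body is not JSON' };
  }
  if (parsed === null || typeof parsed !== 'object') {
    return { healthy: false, reason: 'not a GraphQL envelope' };
  }
  // A 200 response can still carry errors, so inspect the body.
  if (Array.isArray(parsed.errors) && parsed.errors.length > 0) {
    return { healthy: false, reason: 'GraphQL errors present' };
  }
  if (parsed.data == null) return { healthy: false, reason: 'no data' };
  return { healthy: true, reason: 'ok' };
}

console.log(evaluateProbeResponse(200, '{"data":{"__typename":"Query"}}').healthy); // true
console.log(evaluateProbeResponse(200, '{"errors":[{"message":"boom"}]}').reason);  // GraphQL errors present
console.log(evaluateProbeResponse(400, 'Must provide query string.').reason);       // HTTP 400
```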

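A note on introspection in production: many teams disable introspection there for security reasons. That setting normally blocks only the schema fields __schema and __type, so { __typename } usually keeps working. Naming the operation is still worthwhile, because it makes the probe easy to recognize and filter in your operation logs: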

{ "query": "query HealthPing { __typename }" }

Or even better, add a dedicated health query to your schema:

type Query {
  health: HealthStatus!
}

type HealthStatus {
  ok: Boolean!
  version: String!
}

// resolver
Query: {
  health: async () => ({
    ok: true,
    version: process.env.APP_VERSION ?? '0.0.0',
  }),
},

Then monitor with:

{ "query": "{ health { ok version } }" }

Then add a keyword check on "ok":true.


Strategy 3: Apollo Server Built-in Health Check

Apollo Server 2.x and 3.x ship with a built-in health check route. Apollo Server 4 removed it, so on 4.x you define the route yourself.

// Apollo Server 3.x exposes /.well-known/apollo/server-health by default.
// You can also configure a custom healthCheckPath:
const server = new ApolloServer({
  typeDefs,
  resolvers,
  // healthCheckPath: '/health', // Apollo Server 3.x option
});

For Apollo Server 4 with Express, add the route explicitly:

// The built-in landing page serves GET /, but for a proper health check:
app.get('/.well-known/apollo/server-health', (req, res) => {
  res.json({ status: 'pass' });
});

Point Vigilmon at https://api.example.com/.well-known/apollo/server-health with a keyword check for "pass".
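If you are on Apollo Server 4 and would rather not add an Express route, the Apollo Server 4 documentation describes a GraphQL-level check instead: a GET request carrying { __typename } as a URL-encoded query parameter, plus an apollo-require-preflight header so the request passes CSRF prevention. A minimal sketch of building that request follows. The helper name buildApolloHealthCheck is hypothetical, and the details are worth confirming against the docs for your version.

```javascript
// Hypothetical helper: builds the GET-based health check described in the
// Apollo Server 4 docs (query parameter + apollo-require-preflight header).
function buildApolloHealthCheck(graphqlUrl) {
  const url = new URL(graphqlUrl);
  // searchParams percent-encodes the braces for us.
  url.searchParams.set('query', '{__typename}');
  return {
    url: url.toString(),
    // Without a header like this, CSRF prevention rejects a bare GET.
    headers: { 'apollo-require-preflight': 'true' },
  };
}

const check = buildApolloHealthCheck('https://api.example.com/graphql');
console.log(check.url); // https://api.example.com/graphql?query=%7B__typename%7D
```

In the monitor, that means method GET, the URL above, and one extra request header.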


Step 3: Create the Vigilmon Monitor

For a /health or /healthz endpoint:

  1. Log in to Vigilmon → Monitors → New Monitor
  2. Type: HTTP
  3. Method: GET
  4. URL: https://api.example.com/health
  5. Interval: 1 minute
  6. Keyword check: "status":"ok" (or OK for Hasura)
  7. Save

For a POST-based GraphQL probe:

  1. Type: HTTP
  2. Method: POST
  3. URL: https://api.example.com/graphql
  4. Headers: Content-Type: application/json
  5. Body: {"query":"{ __typename }"}
  6. Keyword check: "data"
  7. Save

Step 4: Handle Vigilmon Webhooks in Your GraphQL Server

When a monitor fires, Vigilmon can call your server's webhook endpoint. Add a mutation or a REST handler:

REST handler (recommended: webhook calls don't have to pass through your GraphQL auth layer)

// Express route alongside your GraphQL endpoint
app.post('/webhooks/vigilmon', express.json(), (req, res) => {
  const { event, monitor } = req.body;

  if (event === 'down') {
    console.error(`[Vigilmon] DOWN: ${monitor.name}`);
    // Notify Slack, PagerDuty, create incident ticket
  } else if (event === 'up') {
    console.info(`[Vigilmon] RECOVERED: ${monitor.name}`);
  }

  res.sendStatus(200);
});

Configure this URL in Vigilmon under Alert Channels → Webhook.
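Because the webhook route is public, it is worth validating the body before acting on it. The sketch below assumes only the two fields the handler above already reads, event and monitor.name; the rest of Vigilmon's payload is not assumed, and the function names are hypothetical.

```javascript
// Hypothetical parser: returns null for anything that is not a
// well-formed "down" or "up" event with a monitor name.
function parseVigilmonEvent(body) {
  if (!body || typeof body !== 'object') return null;
  const { event, monitor } = body;
  if (event !== 'down' && event !== 'up') return null;
  if (!monitor || typeof monitor.name !== 'string') return null;
  return { event, monitorName: monitor.name };
}

// Builds the same log lines the handler above prints.
function formatAlert(parsed) {
  return parsed.event === 'down'
    ? `[Vigilmon] DOWN: ${parsed.monitorName}`
    : `[Vigilmon] RECOVERED: ${parsed.monitorName}`;
}

const parsed = parseVigilmonEvent({ event: 'down', monitor: { name: 'GraphQL API' } });
console.log(formatAlert(parsed)); // [Vigilmon] DOWN: GraphQL API
console.log(parseVigilmonEvent({ event: 'paused' })); // null
```

Inside the Express handler you could call parseVigilmonEvent(req.body) first and respond with 400 when it returns null.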


Step 5: Subscription and WebSocket Monitoring

If you expose GraphQL subscriptions over WebSocket, your HTTP health check won't cover the WebSocket transport. Add a secondary monitor or a heartbeat:

// Send a heartbeat from the subscription server every 60 seconds
const HEARTBEAT_URL = process.env.VIGILMON_HEARTBEAT_URL;

setInterval(async () => {
  if (!HEARTBEAT_URL) return;
  try {
    await fetch(HEARTBEAT_URL);
  } catch (e) {
    console.warn('[Vigilmon] Heartbeat ping failed:', e.message);
  }
}, 60_000);

Create a Heartbeat monitor in Vigilmon with a 2-minute expected interval.
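The 2-minute window against a 60-second ping leaves room for one missed heartbeat before an alert. A minimal sketch of that arithmetic follows; the function name is hypothetical, and how Vigilmon evaluates heartbeats internally is not assumed here.

```javascript
// Hypothetical check: has the heartbeat been silent for longer than allowed?
function isHeartbeatOverdue(lastPingMs, nowMs, expectedIntervalMs = 120_000) {
  return nowMs - lastPingMs > expectedIntervalMs;
}

const lastPing = 0;
// 90s of silence: one 60s ping was missed, but we are still inside the window.
console.log(isHeartbeatOverdue(lastPing, 90_000));  // false
// 121s of silence: two pings missed, the monitor should alert.
console.log(isHeartbeatOverdue(lastPing, 121_000)); // true
```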


Step 6: Monitoring Federated GraphQL (Apollo Federation)

For a federated supergraph, monitor at multiple levels:

| Layer | What to monitor | Monitor type |
|-------|-----------------|--------------|
| Router / Gateway | https://gateway.example.com/health | HTTP GET |
| Auth subgraph | https://auth-subgraph.example.com/health | HTTP GET |
| Products subgraph | https://products-subgraph.example.com/health | HTTP GET |

If a subgraph goes down, the gateway may still respond 200 but return partial errors. Monitoring each subgraph independently lets you pinpoint which service failed.

To catch this at the gateway as well, probe it with a query that touches a specific subgraph's data:

{ "query": "{ products { id } }" }

Then add a keyword check that "data" is present and "errors" is absent.
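When a gateway returns partial data, each entry in the errors array can carry a path that points at the field that failed, as defined by the GraphQL spec. The sketch below pulls out the failing root fields so you can tell which part of the supergraph broke. The function name is hypothetical, and mapping a root field back to a subgraph is left to you.

```javascript
// Hypothetical summary of a GraphQL response that may contain partial errors.
function summarizePartialFailure(response) {
  const errors = Array.isArray(response.errors) ? response.errors : [];
  const failedRoots = new Set();
  for (const err of errors) {
    // path is optional; when present, its first entry is the root field.
    if (Array.isArray(err.path) && err.path.length > 0) {
      failedRoots.add(String(err.path[0]));
    }
  }
  return {
    hasData: response.data != null,
    errorCount: errors.length,
    failedRootFields: [...failedRoots],
  };
}

const summary = summarizePartialFailure({
  data: { products: null, me: { id: '1' } },
  errors: [{ message: 'subgraph unavailable', path: ['products'] }],
});
console.log(summary.hasData);          // true
console.log(summary.errorCount);       // 1
console.log(summary.failedRootFields); // contains "products"
```

Note that hasData is true here even though the products field failed, which is exactly why a check on "data" alone is not enough for a federated gateway.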


Step 7: Configure Alert Escalation

In Monitors → (your monitor) → Alert Channels, configure:

  • Immediate: Email or Slack when the monitor first fails
  • Escalation (10 min): Page the on-call engineer if unacknowledged
  • Recovery alert: Notify when the API comes back up so you can close the incident

For critical GraphQL APIs serving paying customers, set the interval to 1 minute and escalation to 5 minutes.
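The escalation policy boils down to one decision: page someone only when the outage has lasted past the threshold and nobody has acknowledged it. A minimal sketch of that decision follows. It illustrates the policy only; the function name and return values are hypothetical and do not describe Vigilmon's alerting engine.

```javascript
// Hypothetical policy: which channel should an ongoing outage be routed to?
function alertTier({ downForMs, acknowledged, escalateAfterMs = 10 * 60_000 }) {
  if (!acknowledged && downForMs >= escalateAfterMs) return 'page-on-call';
  return 'email-or-slack';
}

// First failure: notify the team channel.
console.log(alertTier({ downForMs: 0, acknowledged: false }));           // email-or-slack
// Ten minutes in, nobody has acknowledged: escalate.
console.log(alertTier({ downForMs: 10 * 60_000, acknowledged: false })); // page-on-call
// Critical API with a 5-minute escalation window.
console.log(alertTier({ downForMs: 6 * 60_000, acknowledged: false, escalateAfterMs: 5 * 60_000 })); // page-on-call
```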


Common Pitfalls

1. Monitoring the wrong thing. Don't monitor a CDN or gateway cache; monitor the actual origin. A cached 200 from a CDN doesn't mean your GraphQL server is healthy.

2. Ignoring partial errors. Most GraphQL servers return 200 even when the body is { "errors": [...] }. Use a keyword check to verify "data" is present rather than relying on the HTTP status code alone.

3. Requiring authentication on health endpoints. Your /health endpoint should be accessible without auth, because monitoring probes can't log in. Use a separate path that's explicitly public.

4. Overly heavy health checks. Don't run schema validation or resolver benchmarks inside your health endpoint. A simple SELECT 1 or ping is enough. The health check itself should complete in under 50ms.
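One way to keep a health check inside its time budget is to race each dependency ping against a timer, so a hung database turns into a fast 503 instead of a probe timeout. This is a minimal sketch; withTimeout is a hypothetical helper, not part of Express or Apollo.

```javascript
// Hypothetical wrapper: rejects if `promise` takes longer than `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// A fast check passes through untouched.
const fast = withTimeout(Promise.resolve('ok'), 50);

// A slow check (200ms) is cut off at 50ms; catch turns the error into its message.
const slow = withTimeout(
  new Promise((resolve) => setTimeout(resolve, 200, 'late')),
  50
).catch((err) => err.message);

Promise.all([fast, slow]).then((results) => console.log(results));
```

In a health handler, that would look like await withTimeout(db.raw('SELECT 1'), 50) inside the existing try block, where db is whatever database handle your endpoint already uses.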


Summary

| Strategy | Best for |
|----------|----------|
| Dedicated /health GET endpoint | Any GraphQL server you control |
| Introspection POST probe | Third-party APIs or managed services |
| Hasura /healthz | Hasura Cloud or self-hosted Hasura |
| Apollo health check route | Apollo Server 2.x/3.x (built in); on 4.x, a route you add yourself |

With Vigilmon monitoring your GraphQL API, you'll know within 60 seconds when resolvers stop working, databases disconnect, or the server crashes — before users start seeing broken queries.

