Uptime Monitoring for GraphQL APIs: A Complete Guide

GraphQL APIs have a quirk that trips up most generic uptime monitors: nearly everything goes to a single endpoint, usually via POST. There's no /users to GET or /products/123 to check. A naive monitor that just sends a bare GET request to https://api.example.com/graphql will typically get a 400 Bad Request and conclude your API is down, even when it's perfectly healthy.

This guide explains how to monitor GraphQL APIs correctly using Vigilmon, covering introspection health checks, custom health endpoints, and Apollo/Hasura-specific approaches.

The Core Challenge: GraphQL Is POST-First

A standard GraphQL request looks like this:

curl -X POST https://api.example.com/graphql \
  -H 'Content-Type: application/json' \
  -d '{"query": "{ __typename }"}'
# {"data":{"__typename":"Query"}}

This returns 200 with valid data. But a plain GET to the same URL typically returns:

curl https://api.example.com/graphql
# 400 Bad Request: Must provide query string.
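
The difference between the two calls comes down to the method, the Content-Type header, and a JSON body with a query field. As a rough sketch of what a GraphQL-aware probe has to send (the helper names buildProbeRequest and probeGraphQL are made up for illustration, and the global fetch assumes Node.js 18 or later):

```javascript
// Build the fetch() options for a minimal GraphQL POST probe.
function buildProbeRequest(query = '{ __typename }') {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  };
}

// Hypothetical helper: resolves to true only when the endpoint answers
// with a 2xx status, a non-null "data" field, and no "errors" field.
async function probeGraphQL(url) {
  try {
    const res = await fetch(url, buildProbeRequest());
    if (!res.ok) return false;
    const body = await res.json();
    return body.data != null && body.errors === undefined;
  } catch {
    return false; // network failure or a non-JSON response body
  }
}
```

Calling probeGraphQL('https://api.example.com/graphql') from a script is a quick way to confirm what a monitor should see before you configure one.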

Your uptime monitor needs to send a proper GraphQL request body. Here are three strategies.

Strategy 1: Dedicated HTTP Health Endpoint (Recommended)

The cleanest solution is a separate, non-GraphQL /health endpoint on the same server. This is the recommended approach because it:

- Decouples your monitoring probe from your GraphQL schema
- Avoids adding monitoring noise to your operation logs
- Works even if your schema evolves

Apollo Server (Node.js)

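// server.js
const express = require('express');
const { ApolloServer } = require('@apollo/server');
const { expressMiddleware } = require('@apollo/server/express4');
// Assumed local modules: your schema and a Knex-style `db` instance
const { typeDefs, resolvers } = require('./schema');
const db = require('./db');

async function main() {
  const app = express();
  const server = new ApolloServer({ typeDefs, resolvers });
  await server.start(); // must finish before expressMiddleware(server) is used

  // Health endpoint, checked by Vigilmon
  app.get('/health', async (req, res) => {
    try {
      // Lightweight DB ping
      await db.raw('SELECT 1');
      res.json({ status: 'ok', service: 'graphql-api' });
    } catch (err) {
      res.status(503).json({ status: 'down', error: err.message });
    }
  });

  // GraphQL endpoint
  app.use('/graphql', express.json(), expressMiddleware(server));

  app.listen(4000);
}

main().catch((err) => {
  console.error('Failed to start server:', err);
  process.exit(1);
});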

Apollo Server with TypeScript

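import express, { Request, Response } from 'express';
import { ApolloServer } from '@apollo/server';
import { expressMiddleware } from '@apollo/server/express4';
// Assumed local modules: an initialized data source (e.g. a TypeORM DataSource) and your schema
import { dataSource } from './data-source';
import { typeDefs, resolvers } from './schema';

const app = express();
const server = new ApolloServer({ typeDefs, resolvers });
await server.start(); // top-level await requires an ES module

app.get('/health', async (_req: Request, res: Response) => {
  const checks = { database: false };

  try {
    await dataSource.query('SELECT 1');
    checks.database = true;
  } catch {
    // Leave checks.database as false; the response below reports 503
  }

  const healthy = Object.values(checks).every(Boolean);
  res.status(healthy ? 200 : 503).json({
    status: healthy ? 'ok' : 'degraded',
    checks,
  });
});

app.use('/graphql', express.json(), expressMiddleware(server));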
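
The status logic in that handler grows quickly once you check more than one dependency. One way to keep it testable, sketched here in plain JavaScript under the assumption that each check has already been reduced to a boolean (summarizeChecks is a hypothetical helper name), is to separate the aggregation from the Express handler:

```javascript
// Turn a map of named dependency checks into an HTTP status and JSON body.
// Mirrors the handler above: 200 and "ok" only when every check passed.
function summarizeChecks(checks) {
  const healthy = Object.values(checks).every(Boolean);
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? 'ok' : 'degraded', checks },
  };
}
```

Inside the handler this becomes const { httpStatus, body } = summarizeChecks(checks); res.status(httpStatus).json(body);, and adding a cache or queue check is a one-line change to the checks object.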

Hasura

Hasura exposes /healthz out of the box:

curl https://your-hasura-instance.com/healthz
# OK

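Hasura returns 200 with the body OK when healthy and 500 when the metadata database is unreachable. It can also return 200 with a WARN body when the metadata contains inconsistent objects, so keep a keyword check on OK if you want that state to alert as well. Point Vigilmon directly at https://your-hasura-instance.com/healthz.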
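
If you script your own checks against /healthz, it helps to treat the response as three states instead of two. The sketch below assumes the behavior described in the Hasura API reference (200 with OK, 200 with a body starting with WARN for inconsistent metadata, 500 otherwise); classifyHasuraHealth is an illustrative name, not a Hasura or Vigilmon API:

```javascript
// Classify a Hasura /healthz response by status code and body text.
function classifyHasuraHealth(status, bodyText) {
  const body = String(bodyText).trim();
  if (status === 200 && body === 'OK') return 'healthy';
  if (status === 200 && body.startsWith('WARN')) return 'degraded';
  return 'down';
}
```

A status-code-only monitor would report the degraded state as up, which is why the keyword check on OK matters for Hasura.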

Strategy 2: GraphQL Introspection Ping via POST Monitor

If you can't add a separate health endpoint (e.g. the API sits behind a third-party gateway), use Vigilmon's HTTP POST monitoring with a minimal introspection-style query. { __typename } is about the lightest valid operation, since it only asks for the built-in __typename meta-field:

{ "query": "{ __typename }" }

In Vigilmon:

1. Create a new monitor, select type HTTP
2. Set Method to POST
3. Set URL to https://api.example.com/graphql
4. Add header: Content-Type: application/json
5. Set Request body to: {"query":"{ __typename }"}
6. Under Keyword check, add "data". This ensures the response body contains a GraphQL data envelope, not just an error object
7. Save

This verifies the GraphQL layer is actually parsing and resolving queries, not just serving a 200 from a gateway that's already dropped the backend.
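
A keyword check is a blunt instrument, and it is worth knowing where it stops. The sketch below compares a plain substring match (how Vigilmon matches keywords internally is an assumption here) with a stricter check that parses the body; both function names are illustrative:

```javascript
// Emulates a plain substring keyword check on the raw response body.
function keywordCheck(bodyText, keyword = '"data"') {
  return bodyText.includes(keyword);
}

// Stricter check: parse the JSON, require non-null data and no errors field.
function strictCheck(bodyText) {
  try {
    const body = JSON.parse(bodyText);
    return (
      body !== null &&
      typeof body === 'object' &&
      body.data != null &&
      !('errors' in body)
    );
  } catch {
    return false; // not JSON at all, e.g. an HTML error page from a proxy
  }
}
```

A healthy body such as {"data":{"__typename":"Query"}} passes both checks, and an error-only body such as {"errors":[{"message":"boom"}]} fails both. A failed execution that returns {"data":null,"errors":[...]} passes the substring check but fails the strict one, so pair the "data" keyword with a check that "errors" is absent if your monitor supports it.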

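A note on introspection in production: many teams disable introspection in production for security reasons. That setting typically blocks only the __schema and __type fields, so { __typename } keeps working. Naming the operation is still worthwhile, because it makes the probe easy to spot and filter in your operation logs: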

{ "query": "query HealthPing { __typename }" }

Or even better, add a dedicated health query to your schema:

type Query {
  health: HealthStatus!
}

type HealthStatus {
  ok: Boolean!
  version: String!
}

// Resolver map entry for the health query
const resolvers = {
  Query: {
    health: async () => ({
      ok: true,
      version: process.env.APP_VERSION ?? '0.0.0',
    }),
  },
};

Then monitor with:

{ "query": "{ health { ok version } }" }

Then add a keyword check on "ok":true.

Strategy 3: Apollo Server Built-in Health Check

Apollo Server 2.x and 3.x ship with a built-in health check route at /.well-known/apollo/server-health. Apollo Server 4 removed it, so on 4.x you need to add the route yourself.

Apollo Server 3.x:

// Apollo Server 3.x exposes /.well-known/apollo/server-health by default
const server = new ApolloServer({
  typeDefs,
  resolvers,
  // healthCheckPath: '/health', // Apollo Server 3.x option: serve the check on a custom path
});

For Apollo Server 4 with Express:

// Apollo Server 4 has no built-in health route, so re-create the well-known path
app.get('/.well-known/apollo/server-health', (req, res) => {
  res.json({ status: 'pass' });
});

Point Vigilmon at https://api.example.com/.well-known/apollo/server-health with a keyword check for "pass".
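
The Apollo Server 4 documentation also suggests an approach that needs no extra route: send a GET request with { __typename } as the query parameter and a non-empty apollo-require-preflight header, which lets the request pass Apollo's CSRF prevention. A small sketch of building that request follows; buildApolloHealthProbe is a hypothetical helper, and whether your monitor can attach custom headers to a GET is something to confirm in its settings:

```javascript
// Build the URL and headers for a GET-based Apollo Server 4 health probe.
function buildApolloHealthProbe(graphqlUrl) {
  const url = new URL(graphqlUrl);
  url.searchParams.set('query', '{__typename}');
  return {
    url: url.toString(),
    // Any non-empty value lets the GET pass Apollo's CSRF prevention.
    headers: { 'apollo-require-preflight': 'true' },
  };
}
```

For https://api.example.com/graphql this produces https://api.example.com/graphql?query=%7B__typename%7D. Unlike the hand-written route, this probe exercises the GraphQL execution path itself.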

Step 3: Create the Vigilmon Monitor

For a `/health` or `/healthz` endpoint:

1. Log in to Vigilmon → Monitors → New Monitor
2. Type: HTTP
3. Method: GET
4. URL: https://api.example.com/health
5. Interval: 1 minute
6. Keyword check: "status":"ok" (or OK for Hasura)
7. Save

For a POST-based GraphQL probe:

1. Type: HTTP
2. Method: POST
3. URL: https://api.example.com/graphql
4. Headers: Content-Type: application/json
5. Body: {"query":"{ __typename }"}
6. Keyword check: "data"
7. Save

Step 4: Handle Vigilmon Webhooks in Your GraphQL Server

When a monitor changes state, Vigilmon can call a webhook endpoint on your server. You could expose a mutation for this, but a plain REST handler is simpler:

REST handler (recommended — no auth needed for webhooks)

// Express route alongside your GraphQL endpoint
app.post('/webhooks/vigilmon', express.json(), (req, res) => {
  // Payload shape as used in this guide; guard against missing fields
  const { event, monitor } = req.body ?? {};
  const name = monitor?.name ?? 'unknown monitor';

  if (event === 'down') {
    console.error(`[Vigilmon] DOWN: ${name}`);
    // Notify Slack, PagerDuty, create incident ticket
  } else if (event === 'up') {
    console.info(`[Vigilmon] RECOVERED: ${name}`);
  }

  res.sendStatus(200);
});

Configure this URL in Vigilmon under Alert Channels → Webhook.
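
Skipping user authentication does not mean the route has to accept calls from anyone. One low-tech option, sketched below, is to append a secret to the webhook URL you configure (for example ?token=...) and reject requests that lack it. This is an assumption about your own setup, not a documented Vigilmon signing feature, and the names tokensMatch, requireWebhookToken, and WEBHOOK_TOKEN are hypothetical:

```javascript
const crypto = require('node:crypto');

// Compare two secrets without leaking their contents through timing.
// Hashing first gives both buffers the same length, which
// crypto.timingSafeEqual requires.
function tokensMatch(provided, expected) {
  const a = crypto.createHash('sha256').update(String(provided)).digest();
  const b = crypto.createHash('sha256').update(String(expected)).digest();
  return crypto.timingSafeEqual(a, b);
}

// Hypothetical Express middleware: reject webhook calls without ?token=<secret>.
function requireWebhookToken(expected) {
  return (req, res, next) => {
    if (!expected || !tokensMatch(req.query.token ?? '', expected)) {
      return res.sendStatus(401);
    }
    next();
  };
}
```

Mount it ahead of the handler, for example app.post('/webhooks/vigilmon', requireWebhookToken(process.env.WEBHOOK_TOKEN), express.json(), handler), and serve the route over HTTPS so the token is not exposed in transit.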

Step 5: Subscription and WebSocket Monitoring

If you expose GraphQL subscriptions over WebSocket, your HTTP health check won't cover the WebSocket transport. Add a secondary monitor or a heartbeat:

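// Send a heartbeat from the subscription server every 60 seconds
const HEARTBEAT_URL = process.env.VIGILMON_HEARTBEAT_URL;

setInterval(async () => {
  if (!HEARTBEAT_URL) return;
  try {
    // fetch() only rejects on network errors, so check the status too
    const res = await fetch(HEARTBEAT_URL);
    if (!res.ok) console.warn('[Vigilmon] Heartbeat rejected:', res.status);
  } catch (e) {
    console.warn('[Vigilmon] Heartbeat ping failed:', e.message);
  }
}, 60_000);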

Create a Heartbeat monitor in Vigilmon with a 2-minute expected interval.
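
With a 60-second ping and a 2-minute expected interval, a single missed ping does not trigger an alert, but two in a row do. Note that an unconditional heartbeat only proves the Node.js process is alive. A possible refinement, sketched here with injected callbacks so it can run without a network, is to send the ping only while your own liveness check for the subscription transport passes (createHeartbeat, isHealthy, and send are illustrative names):

```javascript
// Create a tick function that pings the heartbeat URL only while the
// caller-supplied isHealthy() check passes. Both callbacks are injected
// so the gate can be exercised without a network.
function createHeartbeat({ isHealthy, send }) {
  return async function tick() {
    let healthy = false;
    try {
      healthy = Boolean(await isHealthy());
    } catch {
      healthy = false; // a throwing check counts as unhealthy
    }
    if (!healthy) return 'skipped';
    try {
      await send();
      return 'sent';
    } catch {
      return 'failed';
    }
  };
}
```

Wire it up with setInterval(createHeartbeat({ isHealthy, send: () => fetch(HEARTBEAT_URL) }), 60_000), where isHealthy is whatever signal you trust, such as the WebSocket server still listening. When the check fails, the pings stop and the heartbeat monitor raises the alert.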

Step 6: Monitoring Federated GraphQL (Apollo Federation)

For a federated supergraph, monitor at multiple levels:

| Layer | What to monitor | Monitor type |
|-------|-----------------|--------------|
| Router / Gateway | https://gateway.example.com/health | HTTP GET |
| Auth subgraph | https://auth-subgraph.example.com/health | HTTP GET |
| Products subgraph | https://products-subgraph.example.com/health | HTTP GET |

If a subgraph goes down, the gateway may still respond 200 but return partial errors. Monitoring each subgraph independently lets you pinpoint which service failed.

To catch partial failures at the gateway itself, probe it with a query that touches the subgraph's data:

{ "query": "{ products { id } }" }

Then check that "data" is present in the response and "errors" is absent.
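
When a gateway response does contain errors, the path entry that the GraphQL specification defines for execution errors shows which top-level field failed, and that usually maps to one subgraph. A minimal sketch of extracting it (failedRootFields is a hypothetical helper, and the mapping from field to subgraph is yours to maintain):

```javascript
// Collect the top-level fields that failed in a GraphQL response.
// Relies on the spec-defined "path" entry of each error; errors raised
// before execution (e.g. validation errors) carry no path and are
// reported as '(request)'.
function failedRootFields(body) {
  const errors = Array.isArray(body.errors) ? body.errors : [];
  const fields = errors.map((e) =>
    Array.isArray(e.path) && e.path.length > 0 ? String(e.path[0]) : '(request)'
  );
  return [...new Set(fields)];
}
```

For a body like { data: { products: null }, errors: [{ message: '...', path: ['products'] }] } this returns ['products'], which points at the products subgraph in the table above.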

Step 7: Configure Alert Escalation

In Monitors → (your monitor) → Alert Channels, configure:

- Immediate: Email or Slack when the monitor first fails
- Escalation (10 min): Page the on-call engineer if unacknowledged
- Recovery alert: Notify when the API comes back up so you can close the incident

For critical GraphQL APIs serving paying customers, set the interval to 1 minute and escalation to 5 minutes.

Common Pitfalls

1. Monitoring the wrong thing. Don't monitor a CDN or gateway cache; monitor the actual origin. A cached 200 from a CDN doesn't mean your GraphQL server is healthy.

2. Ignoring partial errors. GraphQL servers commonly return 200 even when the body contains { "errors": [...] }. Use a keyword check to verify that "data" is present instead of relying on the HTTP status code alone.

3. Requiring authentication on health endpoints. Your /health endpoint should be reachable without auth, because monitoring probes can't log in. Use a separate path that's explicitly public.

4. Overly heavy health checks. Don't run schema validation or resolver benchmarks inside your health endpoint. A simple SELECT 1 or ping is enough. The health check itself should complete in under 50ms.
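
A slow dependency can also make the health endpoint hang instead of failing, which most monitors then report as a timeout with no useful detail. One way to bound the check, shown as a sketch with a hypothetical withTimeout helper, is to race the ping against a timer:

```javascript
// Reject if the given promise does not settle within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In the health handler, await withTimeout(db.raw('SELECT 1'), 50) turns a stalled database into a prompt 503. The timer only stops waiting; it does not cancel the underlying query.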

Summary

| Strategy | Best for |
|----------|----------|
| Dedicated /health GET endpoint | Any GraphQL server you control |
| Introspection POST probe | Third-party APIs or managed services |
| Hasura /healthz | Hasura Cloud or self-hosted Hasura |
| Apollo built-in health | Apollo Server 3.x (add the route manually on 4.x) |

With Vigilmon monitoring your GraphQL API, you'll know within 60 seconds when resolvers stop working, databases disconnect, or the server crashes — before users start seeing broken queries.
