Monitoring AI/LLM API Dependencies with Vigilmon

If your product uses OpenAI, Anthropic, Groq, or any other AI provider, you have a dependency you can't control — and it fails more often than you'd expect. Provider outages, rate limit changes, model deprecations, and latency spikes happen regularly. In 2026, AI API incidents are one of the top sources of unexpected user-facing degradation for products that have integrated LLMs.

The problem: your own infrastructure might be perfectly healthy while your AI features are completely broken because a provider is having an incident. Without monitoring, you find out when users complain.

This guide covers how to monitor AI API dependencies with Vigilmon so you know about provider problems before your users do.

Why AI APIs Need External Monitoring

Standard application monitoring (APM, error tracking) misses provider-side issues because:

The failure is upstream — your server makes a request, the AI API returns a 500 or times out, your error handler logs it, but there's no alert that distinguishes "our code broke" from "OpenAI is having an incident"
Errors look like application errors — an openai.APIStatusError looks like any other exception in Sentry or Datadog
Degraded performance isn't an error — if response times go from 2s to 45s, your error tracker doesn't fire, but your users are experiencing broken UX
Status pages are reactive — providers post to their status pages after they've detected and confirmed an incident, often 5–20 minutes after the first failures

External monitoring from Vigilmon lets you detect provider issues in real time, independent of your application logs.

What to Monitor for Each AI Provider

OpenAI

OpenAI exposes a public status page and API endpoint you can probe:

| Check | URL | Monitor type |
|-------|-----|--------------|
| Status page availability | https://status.openai.com | HTTP GET |
| API reachability | https://api.openai.com/v1/models | HTTP GET (with auth header) |

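The /v1/models endpoint is a lightweight API call that returns your available models. A 200 response means the API is reachable and your API key is valid. A 503 or connection timeout usually points to a provider incident, whereas a 401 means the monitoring key itself was rejected and should be checked before you assume an outage.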
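As a rough illustration of how a probe result might be read, the sketch below maps an HTTP status (or a failed connection) to a verdict. The `interpretProbe` name and the verdict labels are hypothetical, and the handling of 401/403 and 429 follows general HTTP conventions rather than anything specific to Vigilmon or a single provider.

```typescript
// Hypothetical helper: classify the outcome of a GET to a models endpoint.
// `status` is the HTTP status code, or null when the request timed out
// or the connection could not be established.
type ProbeVerdict =
  | 'healthy'
  | 'key_problem'
  | 'rate_limited'
  | 'provider_incident'
  | 'unknown';

export function interpretProbe(status: number | null): ProbeVerdict {
  if (status === null) return 'provider_incident'; // timeout or connection failure
  if (status >= 200 && status < 300) return 'healthy';
  if (status === 401 || status === 403) return 'key_problem'; // likely the key, not an outage
  if (status === 429) return 'rate_limited'; // the probe itself is being throttled
  if (status >= 500) return 'provider_incident';
  return 'unknown';
}
```

Separating a rejected key from a 5xx response keeps a rotated or revoked monitoring key from being reported as a provider outage.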

Vigilmon setup for OpenAI:

Monitor 1: https://status.openai.com — keyword check All Systems Operational
Monitor 2: https://api.openai.com/v1/models — header Authorization: Bearer sk-..., keyword check "gpt-4"

Security note: Use a dedicated read-only API key for monitoring. Create a project-scoped key in OpenAI's dashboard with no write permissions.

Anthropic

https://status.anthropic.com          (status page)
https://api.anthropic.com/v1/models   (API probe, requires API key)

Vigilmon setup for Anthropic:

Monitor 1: https://status.anthropic.com — keyword check All Systems Operational
Monitor 2: https://api.anthropic.com/v1/models — headers:
- x-api-key: sk-ant-...
- anthropic-version: 2023-06-01
- Keyword check: claude

Groq

Groq is popular for its fast inference speeds. It also exposes a models endpoint:

https://status.groq.com
https://api.groq.com/openai/v1/models

Vigilmon setup for Groq:

Monitor 1: https://status.groq.com — keyword check All Systems Operational
Monitor 2: https://api.groq.com/openai/v1/models — header Authorization: Bearer gsk_..., keyword check llama

Google Gemini / Vertex AI

https://status.cloud.google.com
https://generativelanguage.googleapis.com/v1beta/models?key=YOUR_KEY

Mistral AI

https://help.mistral.ai/en/
https://api.mistral.ai/v1/models
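The probe URLs and headers listed above can be gathered into one place, which helps if you also run probes from your own scripts. This is a minimal sketch: the `ProbeRequest` shape and `buildProbeRequest` name are hypothetical, the Gemini key is passed as a query parameter exactly as in the URL above, and Bearer authentication for Mistral is an assumption, since this guide does not list its headers.

```typescript
// Illustrative probe definitions built from the URLs and headers in this guide.
type Provider = 'openai' | 'anthropic' | 'groq' | 'gemini' | 'mistral';

interface ProbeRequest {
  url: string;
  headers: Record<string, string>;
}

export function buildProbeRequest(provider: Provider, apiKey: string): ProbeRequest {
  switch (provider) {
    case 'openai':
      return {
        url: 'https://api.openai.com/v1/models',
        headers: { Authorization: `Bearer ${apiKey}` },
      };
    case 'anthropic':
      return {
        url: 'https://api.anthropic.com/v1/models',
        headers: { 'x-api-key': apiKey, 'anthropic-version': '2023-06-01' },
      };
    case 'groq':
      return {
        url: 'https://api.groq.com/openai/v1/models',
        headers: { Authorization: `Bearer ${apiKey}` },
      };
    case 'gemini':
      // Key travels in the query string, matching the URL shown above.
      return {
        url: `https://generativelanguage.googleapis.com/v1beta/models?key=${encodeURIComponent(apiKey)}`,
        headers: {},
      };
    case 'mistral':
      // Assumption: Bearer auth, which this guide does not spell out.
      return {
        url: 'https://api.mistral.ai/v1/models',
        headers: { Authorization: `Bearer ${apiKey}` },
      };
  }
}
```

Because the Gemini key ends up in the URL, avoid writing that URL to logs.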

Setting Up the Vigilmon Monitors

Step-by-step for OpenAI API probe

Log in to Vigilmon → Monitors → New Monitor
Type: HTTP
Method: GET
URL: https://api.openai.com/v1/models
Interval: 5 minutes (avoid unnecessary API calls that count against your rate limits)
Custom headers:
- Authorization: Bearer sk-your-dedicated-monitoring-key
- Content-Type: application/json
Keyword check: "gpt-4" (confirms the model list returned valid data)
Response timeout: 10 seconds (flag slow responses as failures)
Save

Repeat for each provider your product depends on.
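To make the logic of this monitor concrete, here is a small sketch of the same check (status 200, keyword present, 10-second limit) written as a function. It is not Vigilmon's implementation. The `checkModelsEndpoint` name, the `Fetcher` type, and the result shape are all hypothetical, and the fetch function is injected so the check can be exercised without network access.

```typescript
// Minimal response shape so the sketch does not depend on a specific fetch typing.
interface MinimalResponse {
  status: number;
  text(): Promise<string>;
}
type Fetcher = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<MinimalResponse>;

export async function checkModelsEndpoint(
  fetcher: Fetcher,
  url: string,
  headers: Record<string, string>,
  keyword: string,
  timeoutMs = 10_000,
): Promise<{ ok: boolean; reason: string }> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  // Resolves with a sentinel when the time limit passes. The underlying
  // request is not cancelled; the check simply stops waiting for it.
  const timeout = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });

  try {
    const result = await Promise.race([
      fetcher(url, { headers }).then(async (res) => ({
        status: res.status,
        body: await res.text(),
      })),
      timeout,
    ]);

    if (result === 'timeout') return { ok: false, reason: `no response within ${timeoutMs}ms` };
    if (result.status !== 200) return { ok: false, reason: `HTTP ${result.status}` };
    if (!result.body.includes(keyword)) return { ok: false, reason: `keyword "${keyword}" not found` };
    return { ok: true, reason: 'ok' };
  } catch (err) {
    // Network-level failures (DNS, TLS, connection reset) land here.
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```

In a runtime that provides a global `fetch`, passing `fetch` as the first argument should work. A test can pass a stub that returns a canned status and body instead.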

Step-by-step for status page monitoring

Type: HTTP
Method: GET
URL: https://status.openai.com
Keyword check: All Systems Operational
Interval: 5 minutes
Save

When a provider posts an incident to their status page, this monitor will fail (the keyword won't match) and alert you. This is often your fastest signal outside of the API probe itself.
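The keyword check amounts to a substring match against the page body, which the sketch below illustrates. The `statusPageLooksHealthy` name is hypothetical, and the case-insensitive, whitespace-tolerant matching is an assumption for illustration; Vigilmon's own matching rules may differ.

```typescript
// Hypothetical illustration of a keyword check on a status page body.
export function statusPageLooksHealthy(
  pageText: string,
  keyword = 'All Systems Operational',
): boolean {
  // Collapse runs of whitespace and ignore case so minor markup changes
  // do not cause a mismatch.
  const normalize = (s: string) => s.replace(/\s+/g, ' ').trim().toLowerCase();
  return normalize(pageText).includes(normalize(keyword));
}
```

A check like this fails whenever the phrase is absent, including when the provider rewords its banner or renders it client-side. After creating the monitor, confirm that it passes while the provider is healthy.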

Synthetic AI API Availability Checks

A step beyond the models endpoint probe: make an actual lightweight API call to verify the AI pipeline is working end-to-end. Use a minimal prompt designed for monitoring.

Using a server-side synthetic check script

// health/ai-probe.ts — called by a cron or Vigilmon heartbeat
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function probeAnthropicAPI(): Promise<{
  ok: boolean;
  latencyMs: number;
  error?: string;
}> {
  const start = Date.now();

  try {
    const response = await client.messages.create({
      model: 'claude-haiku-4-5-20251001',
      max_tokens: 5,
      messages: [{ role: 'user', content: 'Reply with just: ok' }],
    });

    const latencyMs = Date.now() - start;
    const replied = response.content[0]?.type === 'text';

    return { ok: replied, latencyMs };
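  } catch (err: unknown) {
    // Narrow the error before reading .message; non-Error values are stringified
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, latencyMs: Date.now() - start, error };
  }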
}

Expose this as an HTTP health endpoint and point Vigilmon at it:

// app/api/health/ai/route.ts (Next.js)
import { NextResponse } from 'next/server';
import { probeAnthropicAPI } from '@/health/ai-probe';

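// Opt out of static caching so every request runs a fresh probe
// (GET route handlers are cached by default in some Next.js versions).
export const dynamic = 'force-dynamic';

export async function GET() {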
  const result = await probeAnthropicAPI();

  if (!result.ok) {
    return NextResponse.json(result, { status: 503 });
  }

  return NextResponse.json(result);
}

Monitor https://yourapp.com/api/health/ai with Vigilmon — keyword check "ok":true.

Use the cheapest, fastest model for probes (Haiku, Groq's llama-3.1-8b, etc.) — you're paying per token, and you're calling this every few minutes.
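To see what "every few minutes" adds up to, here is a back-of-the-envelope sketch. The function names are hypothetical, and the per-token price is a placeholder parameter rather than any provider's actual rate. Substitute your model's current pricing.

```typescript
// Number of probe calls in a month for a given check interval.
export function probesPerMonth(intervalMinutes: number, daysPerMonth = 30): number {
  const perDay = Math.floor((24 * 60) / intervalMinutes);
  return perDay * daysPerMonth;
}

// Rough monthly spend, assuming a flat price per million tokens
// (input and output combined; a simplification).
export function monthlyProbeCostUsd(
  intervalMinutes: number,
  tokensPerProbe: number,
  usdPerMillionTokens: number,
): number {
  return (probesPerMonth(intervalMinutes) * tokensPerProbe * usdPerMillionTokens) / 1_000_000;
}

// A 5-minute interval is 288 probes per day, or 8640 over 30 days.
const callsAtFiveMinutes = probesPerMonth(5); // 8640
```

At a few dozen tokens per probe the token cost stays small. The request count against your rate limit grows in direct proportion to how often you check.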

Handling Multi-Provider Fallback

If your application falls back to a secondary provider when the primary is down:

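// lib/ai-client.ts
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';

// Both clients read their API keys from environment variables.
const openaiClient = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const anthropicClient = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function generateWithFallback(prompt: string): Promise<string> {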
  // Try primary (OpenAI)
  try {
    const result = await openaiClient.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }],
    });
    return result.choices[0].message.content ?? '';
  } catch (primaryError) {
    console.warn('[AI] Primary provider failed, falling back:', primaryError);

    // Fall back to Anthropic
    const result = await anthropicClient.messages.create({
      model: 'claude-haiku-4-5-20251001',
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }],
    });
    // Guard against an empty content array before reading the text block
    const first = result.content[0];
    return first?.type === 'text' ? first.text : '';
  }
}

Set up Vigilmon monitors for both providers. When the primary goes down and you're silently falling back, your Vigilmon alert tells you that you're now running in degraded mode — so you can communicate it to users and plan for the primary to come back.
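If you also want the application itself to know when it is running on the fallback, one option is to return which provider served each request. The sketch below is a generic wrapper under that assumption. `withFallback`, `FallbackResult`, and the `onDegraded` callback are hypothetical names, not part of either SDK.

```typescript
interface FallbackResult<T> {
  value: T;
  provider: 'primary' | 'fallback';
}

// Runs `primary`; if it throws, reports the error and runs `fallback`.
// An error thrown by `fallback` propagates to the caller.
export async function withFallback<T>(
  primary: () => Promise<T>,
  fallback: () => Promise<T>,
  onDegraded?: (error: unknown) => void,
): Promise<FallbackResult<T>> {
  try {
    return { value: await primary(), provider: 'primary' };
  } catch (error) {
    onDegraded?.(error); // e.g. increment a metric or set a degraded-mode flag
    return { value: await fallback(), provider: 'fallback' };
  }
}
```

A caller could wrap the two provider calls from `generateWithFallback` in this helper and show a degraded-mode banner whenever `provider` is `'fallback'`.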

Alerting Strategy for AI API Dependencies

AI provider outages have a different urgency profile than infrastructure outages:

Your servers going down: Critical, wake someone up now
An AI provider going down: High priority, but the app may still work (features degrade gracefully)

Structure your alerts accordingly:

| Monitor | Alert type | Escalation |
|---------|-----------|------------|
| OpenAI API probe | Slack #ai-ops immediately | Email on-call at 15 min |
| Anthropic API probe | Slack #ai-ops immediately | Email on-call at 15 min |
| OpenAI status page | Slack #ai-ops immediately | None |
| Your AI health endpoint | PagerDuty if critical path | Slack otherwise |
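The routing in this table can be expressed as a small function, which is handy if you forward Vigilmon alerts through your own webhook. This is a sketch only. The `channelsFor` name, the `MonitorKind` categories, and the channel labels are hypothetical, and the 15-minute escalation is treated as inclusive.

```typescript
type MonitorKind = 'api_probe' | 'status_page' | 'health_endpoint';
type Channel = 'slack:#ai-ops' | 'email:on-call' | 'pagerduty';

// Returns the channels to notify for a failing monitor.
// `minutesDown` is how long the monitor has been failing.
export function channelsFor(
  kind: MonitorKind,
  minutesDown: number,
  criticalPath = false,
): Channel[] {
  switch (kind) {
    case 'api_probe':
      // Slack immediately, email the on-call once the outage reaches 15 minutes.
      return minutesDown >= 15 ? ['slack:#ai-ops', 'email:on-call'] : ['slack:#ai-ops'];
    case 'status_page':
      // Informational only, so no escalation.
      return ['slack:#ai-ops'];
    case 'health_endpoint':
      // Page only when the AI feature sits on a critical user path.
      return criticalPath ? ['pagerduty'] : ['slack:#ai-ops'];
  }
}
```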

Set a 5-minute check interval for AI API monitors. Checking every minute means 60 requests per hour per monitor instead of 12, which eats into your rate limits and, for the synthetic prompt check, multiplies token costs by five.

Monitoring Latency Degradation

Providers sometimes have incidents that manifest as extreme slowness rather than outright failures. Responses that normally take 2 seconds suddenly take 30 seconds. Your users see spinning loaders; your error trackers show nothing.

In Vigilmon, set a response time threshold:

Open your AI API monitor → Advanced Settings
Enable Response time threshold
Warning: 5000ms (flag slow responses)
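Critical: 15000ms (alert the team; note that a response timeout shorter than this value, such as the 10 seconds used earlier, would cut the request off first)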

For the synthetic API call monitor (the one that makes a real prompt), set thresholds relative to what you'd expect from the cheapest model: 2000ms warning, 8000ms critical.
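For reference, the two sets of thresholds can be written down as a simple classification. The names below are hypothetical, and treating a latency exactly at a threshold as having crossed it is an assumption.

```typescript
type LatencyLevel = 'ok' | 'warning' | 'critical';

interface Thresholds {
  warningMs: number;
  criticalMs: number;
}

// Values taken from this section.
export const MODELS_ENDPOINT_THRESHOLDS: Thresholds = { warningMs: 5000, criticalMs: 15000 };
export const SYNTHETIC_PROMPT_THRESHOLDS: Thresholds = { warningMs: 2000, criticalMs: 8000 };

export function classifyLatency(latencyMs: number, thresholds: Thresholds): LatencyLevel {
  if (latencyMs >= thresholds.criticalMs) return 'critical';
  if (latencyMs >= thresholds.warningMs) return 'warning';
  return 'ok';
}
```

The same function could consume the `latencyMs` field returned by the health endpoint, so the endpoint can report slowness as well as hard failures.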

Building a Provider Health Dashboard

Create a Vigilmon monitor group for all AI provider monitors:

In Vigilmon → Groups → New Group
Name: "AI Provider Dependencies"
Add all provider monitors to the group

This gives you a single dashboard view showing which providers are healthy, which are degraded, and historical uptime by provider — useful for vendor reviews and SLA discussions.

Summary

| Provider | Status page | API probe endpoint |
|----------|------------|-------------------|
| OpenAI | status.openai.com | api.openai.com/v1/models |
| Anthropic | status.anthropic.com | api.anthropic.com/v1/models |
| Groq | status.groq.com | api.groq.com/openai/v1/models |
| Google Gemini | status.cloud.google.com | generativelanguage.googleapis.com/v1beta/models |

Key principles:

Monitor both the status page and the API endpoint — one may flag an incident before the other
Use dedicated read-only API keys for monitoring probes
Check every 5 minutes to avoid burning rate limits
Set latency thresholds, not just availability — slowness is often the first sign of an incident
If you have fallback providers, monitor them too so you know when you're in degraded mode

With Vigilmon watching your AI API dependencies, you'll detect provider incidents within minutes — in time to add an in-app banner, trigger fallback behavior, or page the on-call before users start filing support tickets.
