Monitoring AI/LLM API Dependencies with Vigilmon
If your product uses OpenAI, Anthropic, Groq, or any other AI provider, you have a dependency you can't control — and it fails more often than you'd expect. Provider outages, rate limit changes, model deprecations, and latency spikes happen regularly. In 2026, AI API incidents are one of the top sources of unexpected user-facing degradation for products that have integrated LLMs.
The problem: your own infrastructure might be perfectly healthy while your AI features are completely broken because a provider is having an incident. Without monitoring, you find out when users complain.
This guide covers how to monitor AI API dependencies with Vigilmon so you know about provider problems before your users do.
Why AI APIs Need External Monitoring
Standard application monitoring (APM, error tracking) misses provider-side issues because:
- The failure is upstream — your server makes a request, the AI API returns a 500 or times out, your error handler logs it, but there's no alert that distinguishes "our code broke" from "OpenAI is having an incident"
- Errors look like application errors — an openai.APIStatusError looks like any other exception in Sentry or Datadog
- Degraded performance isn't an error — if response times go from 2s to 45s, your error tracker doesn't fire, but your users are experiencing broken UX
- Status pages are reactive — providers post to their status pages after they've detected and confirmed an incident, often 5–20 minutes after the first failures
External monitoring from Vigilmon lets you detect provider issues in real time, independent of your application logs.
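Inside your own code, one way to make the "our code broke" versus "the provider is having an incident" distinction explicit is to classify each failed call by where the fault most likely lies. The sketch below is a rough heuristic, not a rule published by any provider: the `FailureSource` labels and the `classifyFailure` helper are hypothetical names, and the mapping assumes conventional HTTP semantics.

```typescript
// Hypothetical labels for where a failed AI API call most likely originated.
type FailureSource = 'provider' | 'credentials' | 'rate_limit' | 'application';

// Classify a failed call from its HTTP status. Pass null when no response
// arrived at all (timeout, connection reset). Heuristic only: a timeout can
// also be caused by your own network.
function classifyFailure(status: number | null): FailureSource {
  if (status === null) return 'provider';   // no response: timeout or network failure
  if (status >= 500) return 'provider';     // upstream server error
  if (status === 429) return 'rate_limit';  // quota or rate limit reached
  if (status === 401 || status === 403) return 'credentials'; // key rejected
  return 'application';                     // other 4xx: most likely our request
}
```

Tagging logged errors with a label like this makes it easier to see at a glance whether a spike in exceptions lines up with an external monitor going red.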
What to Monitor for Each AI Provider
OpenAI
OpenAI exposes a public status page and API endpoint you can probe:
| Check | URL | Monitor type |
|-------|-----|--------------|
| Status page availability | https://status.openai.com | HTTP GET |
| API reachability | https://api.openai.com/v1/models | HTTP GET (with auth header) |
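The /v1/models endpoint is a lightweight API call that returns your available models. A 200 response means the API is reachable and your API key is valid. A 401 means the key was rejected, which is a credentials problem rather than a provider outage. A 503 or connection timeout usually points to a provider-side incident.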
Vigilmon setup for OpenAI:
- Monitor 1: https://status.openai.com — keyword check: All Systems Operational
- Monitor 2: https://api.openai.com/v1/models — header Authorization: Bearer sk-..., keyword check: "gpt-4"
Security note: Use a dedicated read-only API key for monitoring. Create a project-scoped key in OpenAI's dashboard with no write permissions.
Anthropic
https://status.anthropic.com (status page)
https://api.anthropic.com/v1/models (API probe, requires API key)
Vigilmon setup for Anthropic:
- Monitor 1: https://status.anthropic.com — keyword check: All Systems Operational
- Monitor 2: https://api.anthropic.com/v1/models
  - Headers: x-api-key: sk-ant-... and anthropic-version: 2023-06-01
  - Keyword check: claude
Groq
Groq is popular for its fast inference speeds. It also exposes a models endpoint:
https://status.groq.com
https://api.groq.com/openai/v1/models
Vigilmon setup for Groq:
- Monitor 1: https://status.groq.com — keyword check: All Systems Operational
- Monitor 2: https://api.groq.com/openai/v1/models — header Authorization: Bearer gsk_..., keyword check: llama
Google Gemini / Vertex AI
https://status.cloud.google.com
https://generativelanguage.googleapis.com/v1beta/models?key=YOUR_KEY
Mistral AI
https://help.mistral.ai/en/
https://api.mistral.ai/v1/models
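The Gemini and Mistral endpoints above can be probed with the same pattern as the other providers. As a minimal sketch, the probe settings could be written down as data like this. The `ProbeConfig` shape and `buildProbes` helper are hypothetical (they are not Vigilmon types), and the keyword values are assumptions about what a healthy model list contains, so check them against a real response before relying on them. The Gemini entry sends the key in the `x-goog-api-key` header instead of the `?key=` query parameter, which keeps the key out of logged URLs.

```typescript
// Hypothetical shape for one API probe; not a Vigilmon type.
interface ProbeConfig {
  provider: string;
  url: string;
  headers: Record<string, string>;
  keyword: string; // substring expected in a healthy response body
}

// Build probe definitions from dedicated monitoring keys.
function buildProbes(keys: { gemini: string; mistral: string }): ProbeConfig[] {
  return [
    {
      provider: 'Google Gemini',
      url: 'https://generativelanguage.googleapis.com/v1beta/models',
      // Header form of the API key, so the key stays out of the URL.
      headers: { 'x-goog-api-key': keys.gemini },
      keyword: 'gemini', // assumption: listed model names contain "gemini"
    },
    {
      provider: 'Mistral AI',
      url: 'https://api.mistral.ai/v1/models',
      headers: { Authorization: `Bearer ${keys.mistral}` },
      keyword: 'mistral', // assumption: listed model ids contain "mistral"
    },
  ];
}
```

Each entry maps directly onto the URL, custom header, and keyword check fields of an HTTP monitor.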
Setting Up the Vigilmon Monitors
Step-by-step for OpenAI API probe
- Log in to Vigilmon → Monitors → New Monitor
- Type: HTTP
- Method: GET
- URL: https://api.openai.com/v1/models
- Interval: 5 minutes (avoid unnecessary API calls that count against your rate limits)
- Custom headers:
  - Authorization: Bearer sk-your-dedicated-monitoring-key
  - Content-Type: application/json
- Keyword check: "gpt-4" (confirms the model list returned valid data)
- Response timeout: 10 seconds (flag slow responses as failures)
- Save
Repeat for each provider your product depends on.
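To make it concrete what a monitor configured this way is checking, here is a rough sketch of the pass/fail logic. It illustrates the idea and is not Vigilmon's actual implementation. `HttpGet` is a hypothetical transport type (for example a thin wrapper around `fetch`) so the logic stays independent of any HTTP library, and the timeout is compared after the fact, whereas a real monitor would abort the request.

```typescript
// Result of one HTTP GET, reduced to what the check needs.
interface HttpResult { status: number; body: string; elapsedMs: number }

// Hypothetical transport: any function that performs a GET and resolves with
// the status code, body text, and elapsed time. It should reject on network errors.
type HttpGet = (url: string, headers: Record<string, string>) => Promise<HttpResult>;

interface CheckResult { up: boolean; reason: string }

// Apply the three conditions from the steps above: 200 status, response
// within the timeout, and the keyword present in the body.
async function runKeywordCheck(
  get: HttpGet,
  url: string,
  headers: Record<string, string>,
  keyword: string,
  timeoutMs: number,
): Promise<CheckResult> {
  let res: HttpResult;
  try {
    res = await get(url, headers);
  } catch {
    return { up: false, reason: 'request failed' };
  }
  if (res.status !== 200) return { up: false, reason: `status ${res.status}` };
  if (res.elapsedMs > timeoutMs) return { up: false, reason: 'too slow' };
  if (!res.body.includes(keyword)) return { up: false, reason: 'keyword missing' };
  return { up: true, reason: 'ok' };
}
```

The keyword condition is what catches a 200 response whose body is an error page or an empty list.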
Step-by-step for status page monitoring
- Type: HTTP
- Method: GET
- URL: https://status.openai.com
- Keyword check: All Systems Operational
- Interval: 5 minutes
- Save
When a provider posts an incident to their status page, this monitor will fail (the keyword won't match) and alert you. This is often your fastest signal outside of the API probe itself.
Synthetic AI API Availability Checks
A step beyond the models endpoint probe: make an actual lightweight API call to verify the AI pipeline is working end-to-end. Use a minimal prompt designed for monitoring.
Using a server-side synthetic check script
// health/ai-probe.ts: called by a cron or Vigilmon heartbeat
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function probeAnthropicAPI(): Promise<{
  ok: boolean;
  latencyMs: number;
  error?: string;
}> {
  const start = Date.now();
  try {
    // Minimal prompt and a tiny max_tokens keep each probe cheap.
    const response = await client.messages.create({
      model: 'claude-haiku-4-5-20251001',
      max_tokens: 5,
      messages: [{ role: 'user', content: 'Reply with just: ok' }],
    });
    const latencyMs = Date.now() - start;
    // Healthy if the first content block is text; the wording itself is not checked.
    const replied = response.content[0]?.type === 'text';
    return { ok: replied, latencyMs };
  } catch (err: unknown) {
    // SDK errors extend Error; fall back to String() for anything else.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, latencyMs: Date.now() - start, error: message };
  }
}
Expose this as an HTTP health endpoint and point Vigilmon at it:
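// app/api/health/ai/route.ts (Next.js)
import { NextResponse } from 'next/server';
import { probeAnthropicAPI } from '@/health/ai-probe';

// Opt out of static caching so the response is computed per request, not at build time.
export const dynamic = 'force-dynamic';

export async function GET() {
  const result = await probeAnthropicAPI();
  // A 503 lets status-code-based monitors detect the failure as well.
  if (!result.ok) {
    return NextResponse.json(result, { status: 503 });
  }
  return NextResponse.json(result);
}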
Monitor https://yourapp.com/api/health/ai with Vigilmon — keyword check "ok":true.
Use the cheapest, fastest model for probes (Haiku, Groq's llama-3.1-8b, etc.) — you're paying per token, and you're calling this every few minutes.
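Because every request to the health endpoint triggers a billed API call, it can help to reuse a recent result when several callers (a monitor, a load balancer, a curious engineer) hit the endpoint close together. The wrapper below is a minimal sketch of that idea. `cacheProbe` is a hypothetical helper, and it assumes the wrapped probe resolves with a result instead of rejecting, as `probeAnthropicAPI` does.

```typescript
interface ProbeResult { ok: boolean; latencyMs: number; error?: string }

// Wrap a probe so that calls made within ttlMs of the last run share its
// result (including an in-flight one) instead of issuing another API request.
// The clock is injectable so the behavior can be tested without waiting.
function cacheProbe(
  probe: () => Promise<ProbeResult>,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<ProbeResult> {
  let cached: { at: number; pending: Promise<ProbeResult> } | undefined;
  return () => {
    const t = now();
    if (!cached || t - cached.at >= ttlMs) {
      // Cache the promise itself so concurrent callers share one request.
      cached = { at: t, pending: probe() };
    }
    return cached.pending;
  };
}
```

For example, `const probe = cacheProbe(probeAnthropicAPI, 60000)` caps the probe at roughly one real call per minute. Keep the TTL shorter than the monitor's check interval so each scheduled check still sees a fresh result.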
Handling Multi-Provider Fallback
If your application falls back to a secondary provider when the primary is down:
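// lib/ai-client.ts
import OpenAI from 'openai';
import Anthropic from '@anthropic-ai/sdk';

// Assumed setup: both clients read their keys from environment variables.
const openaiClient = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const anthropicClient = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

export async function generateWithFallback(prompt: string): Promise<string> {
  // Try primary (OpenAI)
  try {
    const result = await openaiClient.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [{ role: 'user', content: prompt }],
    });
    return result.choices[0]?.message.content ?? '';
  } catch (primaryError) {
    console.warn('[AI] Primary provider failed, falling back:', primaryError);
    // Fall back to Anthropic. If this call also throws, the error reaches the caller.
    const result = await anthropicClient.messages.create({
      model: 'claude-haiku-4-5-20251001',
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }],
    });
    const block = result.content[0];
    return block?.type === 'text' ? block.text : '';
  }
}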
Set up Vigilmon monitors for both providers. When the primary goes down and you're silently falling back, your Vigilmon alert tells you that you're now running in degraded mode — so you can communicate it to users and plan for the primary to come back.
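The fallback function above leaves only a log line behind when it switches providers. As a complement to the external monitors, the application can also record that it is in degraded mode. The sketch below is one possible shape: `createFallbackGenerator` and `FallbackState` are hypothetical names, and the two `Generate` functions stand in for the OpenAI and Anthropic calls.

```typescript
// Any function that turns a prompt into text, e.g. a wrapper around one provider's SDK call.
type Generate = (prompt: string) => Promise<string>;

interface FallbackState { usingFallback: boolean; lastPrimaryError?: string }

// Combine a primary and secondary provider, and track whether the most
// recent request had to fall back.
function createFallbackGenerator(primary: Generate, secondary: Generate) {
  const state: FallbackState = { usingFallback: false };

  async function generate(prompt: string): Promise<string> {
    try {
      const text = await primary(prompt);
      state.usingFallback = false; // primary answered, so we are back to normal
      delete state.lastPrimaryError;
      return text;
    } catch (err) {
      state.usingFallback = true;
      state.lastPrimaryError = err instanceof Error ? err.message : String(err);
      // If the secondary also fails, its error propagates to the caller.
      return secondary(prompt);
    }
  }

  return { generate, state };
}
```

Returning `state` from a health endpoint would let a keyword check such as `"usingFallback":false` flag degraded mode even when the primary's models endpoint still responds.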
Alerting Strategy for AI API Dependencies
AI provider outages have a different urgency profile than infrastructure outages:
- Your servers going down: Critical, wake someone up now
- An AI provider going down: High priority, but the app may still work (features degrade gracefully)
Structure your alerts accordingly:
| Monitor | Alert type | Escalation |
|---------|-----------|------------|
| OpenAI API probe | Slack #ai-ops immediately | Email on-call at 15 min |
| Anthropic API probe | Slack #ai-ops immediately | Email on-call at 15 min |
| OpenAI status page | Slack #ai-ops immediately | — |
| Your AI health endpoint | PagerDuty if critical path | Slack otherwise |
Set a 5-minute check interval for AI APIs. Probing every minute means 60 requests per hour per monitor instead of 12, which eats into your rate limits and, for probes that make a real completion call, adds token costs for little extra signal.
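The escalation table can be read as a small routing function. The sketch below encodes it under a few assumptions: the `MonitorKind` and `Channel` names are hypothetical, "Slack otherwise" is taken to mean the same #ai-ops channel, and the 15-minute escalation adds email on top of the initial Slack alert.

```typescript
type Channel = 'slack:#ai-ops' | 'email:on-call' | 'pagerduty';
type MonitorKind = 'api-probe' | 'status-page' | 'health-endpoint';

// Decide who to notify for a failing monitor, given how long it has been down.
function channelsFor(
  kind: MonitorKind,
  downMinutes: number,
  criticalPath = false, // only meaningful for your own AI health endpoint
): Channel[] {
  switch (kind) {
    case 'api-probe':
      // Slack immediately, then add on-call email once the outage reaches 15 minutes.
      return downMinutes >= 15 ? ['slack:#ai-ops', 'email:on-call'] : ['slack:#ai-ops'];
    case 'status-page':
      return ['slack:#ai-ops']; // informational, no escalation
    case 'health-endpoint':
      return criticalPath ? ['pagerduty'] : ['slack:#ai-ops'];
  }
}
```

Writing the policy down like this, even if it only lives in a runbook, makes it easier to spot gaps such as a critical-path feature whose only alert goes to a chat channel.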
Monitoring Latency Degradation
Providers sometimes have incidents that manifest as extreme slowness rather than outright failures. Responses that normally take 2 seconds suddenly take 30 seconds. Your users see spinning loaders; your error trackers show nothing.
In Vigilmon, set a response time threshold:
- Open your AI API monitor → Advanced Settings
- Enable Response time threshold
- Warning: 5000ms (flag slow responses)
- Critical: 15000ms (alert the team)
For the synthetic API call monitor (the one that makes a real prompt), set thresholds relative to what you'd expect from the cheapest model: 2000ms warning, 8000ms critical.
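If you also record `latencyMs` from your own probes, the same two-tier thresholds can be applied in code. This is a minimal sketch: the names are hypothetical, the numbers are the ones suggested above, and treating a value exactly at a threshold as breaching it is an assumption.

```typescript
type LatencyLevel = 'ok' | 'warning' | 'critical';
interface Thresholds { warningMs: number; criticalMs: number }

// Threshold pairs from the text: models-endpoint probe vs. synthetic prompt probe.
const MODELS_ENDPOINT: Thresholds = { warningMs: 5000, criticalMs: 15000 };
const SYNTHETIC_PROMPT: Thresholds = { warningMs: 2000, criticalMs: 8000 };

// Map a measured latency onto a level; the critical check comes first
// because any critical latency also exceeds the warning threshold.
function classifyLatency(latencyMs: number, t: Thresholds): LatencyLevel {
  if (latencyMs >= t.criticalMs) return 'critical';
  if (latencyMs >= t.warningMs) return 'warning';
  return 'ok';
}
```

A 6-second response is a warning for the models endpoint and also for the synthetic probe, while a 9-second response is still a warning for the former but critical for the latter.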
Building a Provider Health Dashboard
Create a Vigilmon monitor group for all AI provider monitors:
- In Vigilmon → Groups → New Group
- Name: "AI Provider Dependencies"
- Add all provider monitors to the group
This gives you a single dashboard view showing which providers are healthy, which are degraded, and historical uptime by provider — useful for vendor reviews and SLA discussions.
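For vendor reviews it helps to know how an uptime figure is derived. At a 5-minute interval each monitor runs 288 checks per day, so three failed checks in a day works out to 285 / 288, or about 98.96% uptime. The sketch below computes that percentage per provider from a list of check results. The `Check` shape is hypothetical and does not reflect a Vigilmon export format.

```typescript
// One recorded check result (hypothetical shape).
interface Check { provider: string; up: boolean }

// Percentage of successful checks per provider.
function uptimeByProvider(checks: Check[]): Map<string, number> {
  const totals = new Map<string, { up: number; all: number }>();
  for (const c of checks) {
    const t = totals.get(c.provider) ?? { up: 0, all: 0 };
    t.all += 1;
    if (c.up) t.up += 1;
    totals.set(c.provider, t);
  }
  const result = new Map<string, number>();
  totals.forEach((t, provider) => {
    result.set(provider, (t.up / t.all) * 100);
  });
  return result;
}
```

Check-based uptime is only as fine-grained as the interval: an outage shorter than 5 minutes can fall between checks and never show up in the number.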
Summary
| Provider | Status page | API probe endpoint |
|----------|------------|-------------------|
| OpenAI | status.openai.com | api.openai.com/v1/models |
| Anthropic | status.anthropic.com | api.anthropic.com/v1/models |
| Groq | status.groq.com | api.groq.com/openai/v1/models |
| Google Gemini | status.cloud.google.com | generativelanguage.googleapis.com/v1beta/models |
Key principles:
- Monitor both the status page and the API endpoint — one may flag an incident before the other
- Use dedicated read-only API keys for monitoring probes
- Check every 5 minutes to avoid burning rate limits
- Set latency thresholds, not just availability — slowness is often the first sign of an incident
- If you have fallback providers, monitor them too so you know when you're in degraded mode
With Vigilmon watching your AI API dependencies, you'll detect provider incidents within minutes — in time to add an in-app banner, trigger fallback behavior, or page the on-call before users start filing support tickets.