
SaaS Website Monitoring: What to Watch When You Charge Customers


When a personal blog goes down, it's inconvenient. When a SaaS product's checkout flow stops working for two hours on a Tuesday afternoon, it's a revenue event, a trust event, and potentially a churn event for every customer who hit that flow and saw an error instead of a confirmation.

SaaS has a different relationship with uptime than consumer sites. Your users are paying you. They have expectations. They're evaluating whether your reliability justifies their subscription renewal. Downtime has an immediate dollar amount attached to it, and that amount is often calculable.

This guide is for SaaS founders and engineering leads who want to monitor the right things, set the right alert thresholds, and understand the actual cost when something goes wrong.


What Makes SaaS Monitoring Different

Generic uptime monitoring — "is the homepage returning 200?" — is necessary but not sufficient for a SaaS product. The endpoints that actually matter are the ones your customers use to do the things they pay you to do.

Three principles apply:

1. Monitor the revenue path. If your checkout flow is broken but your marketing homepage is fine, a monitor on the homepage tells you nothing. The checkout endpoint is the one that needs a check.

2. External monitoring catches what internal monitoring misses. Your servers might look healthy from the inside while your CDN is routing incorrectly, your SSL certificate has expired, or your database is up but its connection pool is exhausted. Probing from outside your infrastructure, the way your users reach it, is the only view that actually represents their experience.

3. Speed matters as much as availability. A checkout that takes 18 seconds to load isn't "down" in a binary sense, but it's functionally broken for conversion. Response time monitoring alongside availability monitoring gives you the full picture.
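A minimal sketch of checking availability and response time in a single probe, using only Python's standard library. The helper names `probe` and `classify`, the 5-second "slow" threshold, and the 20-second timeout are illustrative assumptions, not values prescribed by any particular tool.

```python
import time
import urllib.error
import urllib.request


def classify(status, elapsed_s, slow_after_s=5.0):
    """Map one probe result to 'down', 'slow', or 'up'.

    status is the HTTP status code, or None if the request never completed.
    """
    if status is None or status >= 400:
        return "down"
    if elapsed_s > slow_after_s:
        return "slow"
    return "up"


def probe(url, timeout_s=20.0):
    """Fetch url once and return (status_code_or_None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            resp.read()  # include body transfer in the timing
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with a 4xx/5xx
    except (urllib.error.URLError, TimeoutError, OSError):
        status = None  # DNS failure, refused connection, timeout, TLS error
    return status, time.monotonic() - start
```

Calling `classify(*probe("https://app.yourproduct.com/checkout"))` from a machine outside your own infrastructure would report an 18-second checkout as "slow" rather than "up", which is the distinction a pure 200-check misses.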


What to Monitor in a SaaS Product

1. The Signup Flow

New user acquisition is expensive. If a potential customer clicks your pricing page CTA and hits a 500 error on the signup form, that acquisition cost is wasted.

Monitor:

  • GET /signup — the signup page loads
  • POST /auth/register or equivalent — the form submission succeeds (use synthetic checks for this)
  • Email confirmation delivery — consider monitoring deliverability separately if signup requires email verification

Alert threshold: any failure. The signup endpoint should be treated with the same urgency as your payments flow.

2. Login and Authentication

Existing customers who can't log in are immediately looking at alternatives. Authentication failures generate support tickets, social posts, and churn risk.

Monitor:

  • GET /login — the login page loads and your auth service is reachable
  • POST /auth/login or your identity provider's SAML/OAuth endpoint
  • Token refresh endpoints for long-lived sessions

Alert threshold: any failure, or latency above 3 seconds. Auth flows that are slow generate just as much support friction as auth flows that fail outright.

3. Checkout and Billing

This is the most direct revenue-critical endpoint in your stack. If Stripe (or your payment processor) integration is broken, payments fail silently for customers who are trying to upgrade or start subscriptions.

Monitor:

  • Your checkout page
  • Your payment API endpoint (the endpoint your frontend posts card data to)
  • Plan upgrade flows if they're separate from initial checkout

Alert threshold: any failure. Consider a 30-second check interval instead of 1-minute for payment flows specifically.

4. Core API Endpoints

The actions your customers take most often in your product — creating a record, running a report, uploading a file, triggering an integration — these are your "job to be done" endpoints. When they fail, customers can't do what they're paying you to let them do.

Monitor:

  • Your most-used API endpoints by request volume
  • Endpoints with external dependencies (integrations that call third-party services)
  • Webhook delivery endpoints if you receive webhooks from external platforms

Alert threshold: error rates above 1%, or p95 latency above 2× your baseline p95.
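As a rough illustration of how an error-rate and p95 threshold could be evaluated over a window of recent requests, here is a sketch in plain Python. The function names are hypothetical, counting only 5xx responses as errors is an assumption, and the 2× latency factor follows the thresholds table in this article; adjust each to your own definitions.

```python
def error_rate(status_codes):
    """Fraction of responses in the window that were server errors (5xx)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


def p95(latencies_ms):
    """95th percentile of the samples, using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n, 1-based
    return ordered[rank - 1]


def api_alert_due(status_codes, latencies_ms, baseline_p95_ms,
                  max_error_rate=0.01, latency_factor=2.0):
    """True when the window breaches the error-rate or the latency threshold."""
    too_many_errors = error_rate(status_codes) > max_error_rate
    too_slow = p95(latencies_ms) > latency_factor * baseline_p95_ms
    return too_many_errors or too_slow
```

Evaluating over a window (for example, the last few minutes of requests) instead of per request keeps a single failed call from paging anyone while still catching a sustained regression.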

5. Email Deliverability

SaaS products depend on transactional email for password resets, trial expiration reminders, billing notices, and onboarding sequences. If your email provider has an outage or your sending domain's SPF/DKIM fails, these emails silently stop delivering.

Monitor:

  • Your email provider's status (SendGrid, Postmark, SES — most have status pages you can monitor via HTTP)
  • Delivery rates if your ESP provides an API for this
  • Consider a synthetic check that sends a test message to a dedicated inbox and verifies receipt

Alert threshold: delivery rate drops or status page shows degraded service.

6. SSL Certificate Expiry

SSL certificate expiry is entirely preventable, yet it remains a recurring cause of SaaS outages. An expired certificate is effectively a hard outage: modern browsers block the page behind a full-screen warning, and most users won't click through a security override.

Monitor:

  • SSL validity on all domains your product uses (app subdomain, API subdomain, marketing site, any custom domain your customers configure)
  • Alert at 30 days remaining, again at 14 days, and daily from 7 days out
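The alert schedule above can be sketched with Python's standard `ssl` and `socket` modules. The function names are hypothetical, and the sketch assumes the check runs once per day so that the 30-day and 14-day marks are each hit exactly once.

```python
import socket
import ssl
import time


def days_until_expiry(hostname, port=443, timeout_s=10.0):
    """Connect over TLS and return whole days until the certificate expires.

    An already-expired certificate fails verification and raises
    ssl.SSLCertVerificationError, which should itself be treated as an alert.
    """
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout_s) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            cert = tls.getpeercert()
    expires_at = ssl.cert_time_to_seconds(cert["notAfter"])
    return int((expires_at - time.time()) // 86400)


def expiry_alert_due(days_remaining):
    """Alert at 30 days, again at 14 days, then every day from 7 days out."""
    return days_remaining in (30, 14) or days_remaining <= 7
```

Running `expiry_alert_due(days_until_expiry("app.yourproduct.com"))` for each domain in a daily job would cover the app, API, and marketing hostnames with the same logic.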

Recommended Alert Thresholds

| Endpoint Type | Availability Alert | Latency Alert |
|---|---|---|
| Signup flow | Any failure | > 5 seconds |
| Login / auth | Any failure | > 3 seconds |
| Checkout / payments | Any failure | > 5 seconds |
| Core API endpoints | > 1% error rate | > 2× baseline p95 |
| Email provider status | Any degradation reported | n/a |
| SSL certificates | < 30 days remaining | n/a |

For check intervals: critical revenue paths (checkout, login) warrant 30-second checks. Supporting flows can use 1-minute intervals. SSL checks can run hourly.


The Cost-of-Downtime Calculation

Understanding the dollar value of an hour of downtime gives you an anchor for prioritization decisions: how much to invest in monitoring, alerting, redundancy, and incident response tooling.

A simple SaaS downtime cost formula:

Hourly revenue = (Monthly Recurring Revenue / 730)
Downtime cost per hour = Hourly revenue × (affected user % / 100)

Example: $50,000 MRR, checkout flow down (affects 5% of active sessions):

Hourly revenue = $50,000 / 730 = $68.49/hour
Downtime cost = $68.49 × 0.05 = $3.42/hour in direct lost transactions
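The same arithmetic as a small Python sketch, in case you want to plug in your own numbers. The function names and the `hours_down` parameter are illustrative additions; the formula itself is the one given above.

```python
HOURS_PER_MONTH = 730  # 8,760 hours in a year / 12 months


def hourly_revenue(mrr):
    """Average revenue per hour for a given monthly recurring revenue."""
    return mrr / HOURS_PER_MONTH


def downtime_cost_per_hour(mrr, affected_pct):
    """Direct revenue lost per hour when affected_pct percent of users are hit."""
    return hourly_revenue(mrr) * (affected_pct / 100)


def outage_cost(mrr, affected_pct, hours_down):
    """Direct revenue lost over an outage lasting hours_down hours."""
    return downtime_cost_per_hour(mrr, affected_pct) * hours_down


# Reproduces the example: $50,000 MRR with 5% of sessions affected.
print(round(downtime_cost_per_hour(50_000, 5), 2))
```

For the two-hour checkout outage from the introduction, `outage_cost(50_000, 5, 2)` gives the direct loss for the whole incident under the same assumptions.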

That's the direct transaction impact. Add:

  • Support cost: Each downtime ticket takes roughly an engineering-hour or support-hour to resolve. At a loaded cost of $5–10K per engineer per month (about 170 working hours), each ticket costs roughly $30–60 in labor.
  • Churn multiplier: A customer who experiences a broken checkout during a purchase decision is more likely to cancel. The LTV impact of even one churn event from an avoidable outage can dwarf the $3/hour direct calculation.
  • Brand cost: For B2B SaaS especially, visible outages get shared on Slack by your customers' teams. A status page that shows you detected and communicated the issue fast is worth more than one that shows you found out when a customer emailed.

Setting Up Vigilmon for SaaS Monitoring

Vigilmon is built for exactly this use case: external monitoring of HTTP endpoints, TCP ports, and SSL certificates from multiple geographic regions, with no agent installation required.

For a SaaS product, a baseline Vigilmon configuration looks like:

Monitors to create:

  1. https://app.yourproduct.com/login — HTTP 200 check, 30-second interval
  2. https://app.yourproduct.com/signup — HTTP 200 check, 30-second interval
  3. https://app.yourproduct.com/checkout — HTTP 200 check, 30-second interval
  4. https://api.yourproduct.com/health — HTTP 200 check, 1-minute interval
  5. SSL certificate for app.yourproduct.com — alert at 30 days remaining
  6. Your email provider's status URL — HTTP 200 check, 5-minute interval

Multi-region consensus: Vigilmon's quorum-based alerting means a single regional blip doesn't wake your team. All checks run from multiple geographic probes; you only get alerted when the failure is confirmed across regions.
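To make the idea of quorum-based alerting concrete, here is a generic sketch in Python. It is not Vigilmon's actual algorithm, which this article does not describe; the function name, the region labels, and the default of a strict majority are all assumptions for illustration.

```python
def quorum_failed(region_results, quorum=None):
    """Return True when enough regions report failure to confirm an outage.

    region_results maps a region name to True (check passed) or False (failed).
    quorum is the number of failing regions required to alert; when omitted,
    a strict majority of the reporting regions is used.
    """
    if not region_results:
        return False
    failures = sum(1 for ok in region_results.values() if not ok)
    needed = quorum if quorum is not None else len(region_results) // 2 + 1
    return failures >= needed


# One failing region out of three is treated as a regional blip, not an outage.
blip = {"us-east": False, "eu-west": True, "ap-south": True}
print(quorum_failed(blip))
```

Requiring agreement across regions trades a little detection speed for far fewer false pages, which matters most for monitors on short check intervals.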

Alert routing: Connect Vigilmon to Slack for immediate team notification. Use webhook delivery to pipe alerts into your incident management workflow if you have one.

Status page: Vigilmon's built-in status page gives your customers a public URL to check when they experience issues — keeping support inboxes clear during incidents and demonstrating that you communicate proactively.


Operational Recommendations

Define ownership now, not during an incident. Who gets the Vigilmon Slack alerts? Who's responsible for responding? Who owns the status page update during an incident? Document this before the alert fires.

Runbooks for each monitor. When a checkout alert fires at 11pm, the on-call engineer shouldn't have to think about what to check first. A short runbook — check Stripe status, check database connections, check deploy queue — turns a 30-minute scramble into a 5-minute triage.

Incident communication is part of reliability. Your status page update speed during an incident directly affects customer trust. A status page showing "investigating" within 5 minutes of an alert is dramatically better than customers discovering the issue before you do.

Review false positives. If your monitors fire on transient issues (single-region DNS hiccups, CDN edge cases), adjust thresholds. Alert fatigue from false positives is the fastest way to ensure the team stops treating real alerts seriously.


Conclusion

SaaS monitoring is not the same as checking whether your homepage is up. Your revenue path — signup, login, checkout, core API — needs external, multi-region verification on short check intervals, with clear ownership and a status page ready to communicate when something fails.

The cost of not monitoring these paths properly is calculated in lost conversions, support overhead, and customer churn. The cost of monitoring them properly with a tool like Vigilmon is $0 to start.

Start monitoring your SaaS product at vigilmon.online — 5 monitors, 1-minute intervals, SSL monitoring, status page, Slack integration, no credit card required.


