Actix-web Uptime Monitoring: Zero-Downtime Deployment Guide

Actix-web is one of the fastest web frameworks available, but raw throughput doesn't protect you from a deadlocked actor, a connection pool exhausted after a traffic spike, or a botched deploy that restarts the process without draining in-flight requests. Vigilmon checks for all of these from outside your infrastructure and alerts you as soon as its checks fail, ideally before users notice.

In this guide you'll add production-grade uptime monitoring to an Actix-web service, wire up actor system health checks, set up a Docker build that works with cargo, and configure zero-downtime deployments with graceful shutdown.

What You'll Build

A /health handler that checks the database and reports latency
An actor health check that pings a background actor to verify it is alive
A Vigilmon HTTP monitor with multi-region probes
A multi-stage Dockerfile optimized for cargo build caching
Docker HEALTHCHECK that marks unhealthy containers so an orchestrator can restart them
A Vigilmon heartbeat for background tokio tasks
Email and Slack alerts

Prerequisites

Rust 1.75+ with cargo
An Actix-web project (or follow along from scratch)
A free Vigilmon account

Step 1: Add the Health Handler

Add the required dependencies to Cargo.toml:

[dependencies]
actix-web = "4"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
sqlx = { version = "0.7", features = ["postgres", "runtime-tokio-native-tls"] }
chrono = { version = "0.4", features = ["serde"] }
reqwest = { version = "0.11", features = ["json"] }

Create src/health.rs:

use actix_web::{web, HttpResponse};
use chrono::Utc;
use serde::Serialize;
use sqlx::PgPool;
use std::collections::HashMap;
use std::time::Instant;

#[derive(Serialize)]
pub struct HealthResponse {
    pub status: String,
    pub timestamp: String,
    pub latency_ms: f64,
    pub checks: HashMap<String, String>,
}

pub async fn health_check(pool: web::Data<PgPool>) -> HttpResponse {
    let start = Instant::now();
    let mut checks = HashMap::new();
    let mut overall = "ok".to_string();

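    // Ping the database with a 2-second timeout so a hung pool cannot stall the probe
    let ping = sqlx::query("SELECT 1").execute(pool.get_ref());
    match tokio::time::timeout(std::time::Duration::from_secs(2), ping).await {
        Ok(Ok(_)) => {
            checks.insert("database".to_string(), "ok".to_string());
        }
        Ok(Err(e)) => {
            checks.insert("database".to_string(), format!("error: {}", e));
            overall = "degraded".to_string();
        }
        Err(_) => {
            checks.insert("database".to_string(), "error: timeout".to_string());
            overall = "degraded".to_string();
        }
    }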

    let latency_ms = start.elapsed().as_secs_f64() * 1000.0;

    let body = HealthResponse {
        status: overall.clone(),
        timestamp: Utc::now().to_rfc3339(),
        latency_ms: (latency_ms * 10.0).round() / 10.0,
        checks,
    };

    if overall == "ok" {
        HttpResponse::Ok().json(body)
    } else {
        HttpResponse::ServiceUnavailable().json(body)
    }
}

Then register the route in src/main.rs:

use actix_web::{web, App, HttpServer};
use sqlx::PgPool;

mod health;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let db_url = std::env::var("DATABASE_URL")
        .expect("DATABASE_URL must be set");

    let pool = PgPool::connect(&db_url)
        .await
        .expect("Failed to connect to database");

    let pool = web::Data::new(pool);

    HttpServer::new(move || {
        App::new()
            .app_data(pool.clone())
            // Health endpoint — no auth middleware
            .route("/health", web::get().to(health::health_check))
            // ... rest of your routes ...
    })
    .bind("0.0.0.0:8080")?
    .run()
    .await
}

Test it:

cargo run
curl -s http://localhost:8080/health | jq .

{
  "status": "ok",
  "timestamp": "2025-06-29T10:00:00+00:00",
  "latency_ms": 0.9,
  "checks": {
    "database": "ok"
  }
}
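
The handler derives the overall status from the individual checks by hand. As a minimal std-only sketch of that rule (the overall_status and round_latency_ms helpers are hypothetical names, not part of Actix-web), the mapping from check results to a status string and HTTP status code could be pulled into small testable functions:

```rust
use std::collections::HashMap;

/// Hypothetical helper: derives the overall status and HTTP status code
/// from the per-dependency check results. Any check that is not "ok"
/// degrades the service, which is the rule the handler applies.
pub fn overall_status(checks: &HashMap<String, String>) -> (&'static str, u16) {
    if checks.values().all(|v| v == "ok") {
        ("ok", 200)
    } else {
        ("degraded", 503)
    }
}

/// Rounds a latency in milliseconds to one decimal place,
/// using the same arithmetic as the handler.
pub fn round_latency_ms(ms: f64) -> f64 {
    (ms * 10.0).round() / 10.0
}
```

Keeping the rule in one place makes it harder for a new check to be added without also affecting the response code.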

Step 2: Actor System Health Check

If you use Actix actors for background processing, the database-only /health check from Step 1 won't detect a deadlocked actor. Add an actor health probe using the Actix messaging system. This requires the actix crate (for example actix = "0.13") in Cargo.toml.

// src/actors/processor.rs
use actix::prelude::*;

pub struct ProcessorActor;

impl Actor for ProcessorActor {
    type Context = Context<Self>;
}

// A lightweight ping message; a reply proves the actor is still processing its mailbox
#[derive(Message)]
#[rtype(result = "bool")]
pub struct Ping;

impl Handler<Ping> for ProcessorActor {
    type Result = bool;

    fn handle(&mut self, _msg: Ping, _ctx: &mut Context<Self>) -> bool {
        true
    }
}

Then extend the health check to ping the actor:

// src/health.rs (updated)
use crate::actors::processor::{Ping, ProcessorActor};
use actix::Addr;

pub async fn health_check(
    pool: web::Data<PgPool>,
    actor: web::Data<Addr<ProcessorActor>>,
) -> HttpResponse {
    let start = Instant::now();
    let mut checks = HashMap::new();
    let mut overall = "ok".to_string();

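    // Database check, bounded by a 2-second timeout
    let ping = sqlx::query("SELECT 1").execute(pool.get_ref());
    match tokio::time::timeout(std::time::Duration::from_secs(2), ping).await {
        Ok(Ok(_)) => { checks.insert("database".to_string(), "ok".to_string()); }
        Ok(Err(e)) => {
            checks.insert("database".to_string(), format!("error: {}", e));
            overall = "degraded".to_string();
        }
        Err(_) => {
            checks.insert("database".to_string(), "error: timeout".to_string());
            overall = "degraded".to_string();
        }
    }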

    // Actor system check — 2-second timeout
    match tokio::time::timeout(
        std::time::Duration::from_secs(2),
        actor.send(Ping),
    ).await {
        Ok(Ok(true)) => { checks.insert("actor".to_string(), "ok".to_string()); }
        _ => {
            checks.insert("actor".to_string(), "error: timeout or dead".to_string());
            overall = "degraded".to_string();
        }
    }

    let latency_ms = start.elapsed().as_secs_f64() * 1000.0;
    let body = HealthResponse {
        status: overall.clone(),
        timestamp: Utc::now().to_rfc3339(),
        latency_ms: (latency_ms * 10.0).round() / 10.0,
        checks,
    };

    if overall == "ok" {
        HttpResponse::Ok().json(body)
    } else {
        HttpResponse::ServiceUnavailable().json(body)
    }
}

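Start the actor in main and register its address as app data:

// src/main.rs (also needs `mod actors;` next to `mod health;`)
use actix::Actor; // brings `.start()` into scope
use crate::actors::processor::ProcessorActor;

let processor = ProcessorActor.start();
let processor = web::Data::new(processor);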

HttpServer::new(move || {
    App::new()
        .app_data(pool.clone())
        .app_data(processor.clone())
        .route("/health", web::get().to(health::health_check))
})

Now a deadlocked or crashed actor shows up in your health response as a 503 with status "degraded", and Vigilmon flags it on its next check.
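
To see why the 2-second timeout around actor.send(Ping) matters, here is a std-only analogy. Threads and channels stand in for the Actix mailbox, and none of the names below are Actix APIs. A worker that is stuck never answers the ping, and the caller gives up at the deadline instead of hanging the health endpoint:

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;
use std::time::Duration;

/// A ping carries a reply channel, like an actor message with a result type.
pub struct Ping(pub Sender<bool>);

/// Spawns a worker that answers pings. If `stuck` is true, it blocks forever
/// before reading its mailbox, which simulates a deadlocked actor.
pub fn spawn_worker(stuck: bool) -> Sender<Ping> {
    let (tx, rx) = mpsc::channel::<Ping>();
    thread::spawn(move || {
        if stuck {
            loop {
                thread::park(); // never processes the mailbox
            }
        }
        for Ping(reply) in rx {
            let _ = reply.send(true);
        }
    });
    tx
}

/// Sends a ping and waits at most `timeout` for the reply.
/// Returns "ok" on a timely reply, otherwise an error string.
pub fn check_worker(mailbox: &Sender<Ping>, timeout: Duration) -> &'static str {
    let (reply_tx, reply_rx) = mpsc::channel();
    if mailbox.send(Ping(reply_tx)).is_err() {
        return "error: timeout or dead"; // the worker has exited
    }
    match reply_rx.recv_timeout(timeout) {
        Ok(true) => "ok",
        _ => "error: timeout or dead",
    }
}
```

Calling check_worker on a handle from spawn_worker(true) returns the error string once the timeout elapses, while a healthy worker answers right away. The handler gets the same behavior from tokio::time::timeout.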

Step 3: Set Up the Vigilmon HTTP Monitor

Log in to Vigilmon and click New Monitor → HTTP.
Enter https://your-domain.com/health.
Set check interval to 60 seconds (free) or 30 seconds (paid).
Add assertions:
- Status code equals 200
- Response body contains "status":"ok"
Save.
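
Note that the body assertion is written without a space after the colon. That matches the compact JSON the handler sends, since Actix-web's .json() serializes with serde_json's compact form. The jq output in Step 1 is pretty-printed and would not match. A minimal sketch of the two assertions, assuming the body check is a plain substring match (the assertions_pass function is a hypothetical model, not Vigilmon's implementation):

```rust
/// Hypothetical model of the two monitor assertions: the status code
/// equals 200 and the body contains the literal text "status":"ok".
pub fn assertions_pass(status_code: u16, body: &str) -> bool {
    status_code == 200 && body.contains(r#""status":"ok""#)
}
```

If you ever switch the handler to pretty-printed JSON, the substring would need to change with it.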

Vigilmon probes your endpoint from multiple regions and escalates to an alert if two or more consecutive checks fail.
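
The consecutive-failure rule is worth understanding because it trades detection speed for fewer false alarms. The sketch below is a hypothetical model of such a rule, not Vigilmon's actual implementation. The FailureTracker name and the threshold parameter are assumptions for illustration:

```rust
/// Hypothetical alerting rule: alert once `threshold` consecutive checks fail.
pub struct FailureTracker {
    threshold: u32,
    consecutive_failures: u32,
}

impl FailureTracker {
    pub fn new(threshold: u32) -> Self {
        Self { threshold, consecutive_failures: 0 }
    }

    /// Records one check result. Returns true exactly when the failure
    /// streak first reaches the threshold, so one outage yields one alert.
    pub fn record(&mut self, check_passed: bool) -> bool {
        if check_passed {
            self.consecutive_failures = 0;
            return false;
        }
        self.consecutive_failures += 1;
        self.consecutive_failures == self.threshold
    }
}
```

Under this model, with a threshold of 2 and a 60-second interval, a single failed probe is ignored and a sustained outage is reported after the second failed check.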

Step 4: Configure Alert Channels

In Alerts → Channels:

Email — add your on-call address. Vigilmon notifies within 30 seconds of the first failed check and sends a recovery alert when the service comes back.

Slack — click Add Channel → Slack, paste your incoming webhook URL. Route CRITICAL monitors to both channels under Alerts → Routing.

Step 5: Multi-Stage Dockerfile with Cargo Build Caching

Without layer caching, cargo build recompiles your entire dependency tree on every image build. This Dockerfile compiles dependencies in their own layer, separate from your application code, so rebuilds that only touch src/ recompile just your own crate:

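FROM rust:1.75-slim AS builder
# native-tls (used by sqlx and reqwest) links against OpenSSL at build time
RUN apt-get update && apt-get install -y pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*
WORKDIR /app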

# Cache dependency compilation separately from application code.
# Copy manifests first, build deps with a dummy main, then replace with real source.
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo "fn main() {}" > src/main.rs
RUN cargo build --release && rm -rf src

# Now compile the actual application
COPY src ./src
RUN touch src/main.rs && cargo build --release

FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates curl && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=builder /app/target/release/myapp .

EXPOSE 8080

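# Docker probes /health every 30s and marks the container unhealthy after 3 consecutive failures.
# Docker alone does not restart it; an orchestrator such as Swarm acts on the health status.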
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

CMD ["./myapp"]

Build and run:

docker build -t myapp .
docker run -p 8080:8080 -e DATABASE_URL=postgres://... myapp

The first build compiles all dependencies (slow). Subsequent builds that only change src/ reuse the cached dependency layer, which is much faster, provided your CI preserves the Docker layer cache between runs.
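
The HEALTHCHECK in this Dockerfile relies on curl being installed in the runtime image. If you would rather not ship curl, one option is a tiny probe built on the standard library. The sketch below makes several assumptions: the probe function name is hypothetical, it speaks plain HTTP/1.1 over TCP without TLS, and it reads only the status line of the response.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Sends `GET <path>` to `addr` and returns the HTTP status code,
/// or None on any connection, timeout, or parse failure.
pub fn probe(addr: &str, path: &str, timeout: Duration) -> Option<u16> {
    let mut stream = TcpStream::connect(addr).ok()?;
    stream.set_read_timeout(Some(timeout)).ok()?;
    stream.set_write_timeout(Some(timeout)).ok()?;

    let request = format!("GET {path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n");
    stream.write_all(request.as_bytes()).ok()?;

    // The status line looks like "HTTP/1.1 200 OK"; the code is the second token.
    let mut status_line = String::new();
    BufReader::new(stream).read_line(&mut status_line).ok()?;
    status_line.split_whitespace().nth(1)?.parse().ok()
}
```

Wired into a second binary (for example a hypothetical src/bin/healthcheck.rs) that exits non-zero unless probe("127.0.0.1:8080", "/health", ...) returns Some(200), this could stand in for the curl command in the HEALTHCHECK line.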

Step 6: Heartbeat Monitor for Background tokio Tasks

If you run background tokio::spawn tasks (cron jobs, queue consumers, etc.), HTTP monitoring won't detect a silently stopped task. Set up a heartbeat:

In Vigilmon, go to New Monitor → Heartbeat.
Set the expected interval (e.g., 5 minutes).
Copy the ping URL.

Ping it from your background task on each successful run:

// src/tasks/cleanup.rs
use std::time::Duration;

const VIGILMON_HEARTBEAT: &str = "https://vigilmon.online/ping/YOUR_HEARTBEAT_ID";

pub async fn run_cleanup_loop() {
    let mut interval = tokio::time::interval(Duration::from_secs(300));
    let client = reqwest::Client::new();

    loop {
        interval.tick().await;

        match cleanup_old_records().await {
            Ok(_) => {
                // Ping Vigilmon to signal the task is alive and healthy
                let _ = client
                    .get(VIGILMON_HEARTBEAT)
                    .timeout(Duration::from_secs(5))
                    .send()
                    .await;
            }
            Err(e) => {
                // Don't ping — Vigilmon will detect the missed heartbeat
                eprintln!("cleanup error: {}", e);
            }
        }
    }
}

Spawn it from main:

tokio::spawn(crate::tasks::cleanup::run_cleanup_loop());

If the task panics or is dropped, the next ping never arrives and Vigilmon fires an alert.
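
Seen from the monitor's side, a heartbeat is a deadline that each ping pushes forward. The following is a hypothetical model rather than Vigilmon's implementation. The Heartbeat type and the grace period are assumptions added for illustration:

```rust
use std::time::{Duration, Instant};

/// Hypothetical monitor-side view of a heartbeat: the task counts as
/// missing once no ping has arrived within `interval + grace`.
pub struct Heartbeat {
    interval: Duration,
    grace: Duration,
    last_ping: Instant,
}

impl Heartbeat {
    pub fn new(interval: Duration, grace: Duration, now: Instant) -> Self {
        Self { interval, grace, last_ping: now }
    }

    /// Called whenever the task pings the heartbeat URL.
    pub fn ping(&mut self, now: Instant) {
        self.last_ping = now;
    }

    /// True when the next ping is overdue at time `now`.
    pub fn is_overdue(&self, now: Instant) -> bool {
        now.duration_since(self.last_ping) > self.interval + self.grace
    }
}
```

This is also why the cleanup loop skips the ping on error: a run that fails leaves the deadline where it was, so repeated failures eventually trip the alert just like a crashed task would.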

Step 7: Graceful Shutdown

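Actix-web's HttpServer shuts down gracefully on SIGTERM: it stops accepting new connections and gives in-flight requests up to .shutdown_timeout() seconds (30 by default) to finish. SIGINT and SIGQUIT force an immediate shutdown instead. Set the timeout to a value less than your load balancer or orchestrator's drain timeout: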

HttpServer::new(move || {
    App::new()
        .app_data(pool.clone())
        .route("/health", web::get().to(health::health_check))
})
.bind("0.0.0.0:8080")?
// Give in-flight requests up to 10 seconds to finish after SIGTERM
.shutdown_timeout(10)
.run()
.await

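Combine this with a Kubernetes preStop hook, or a Docker stop grace period longer than the shutdown timeout (docker stop sends SIGKILL after 10 seconds by default), so connections can drain before the process is killed. With that in place, Vigilmon should see at most one missed check during a rolling deploy.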
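
Conceptually, a shutdown timeout is a bounded wait for the in-flight request count to reach zero. The std-only sketch below models that idea. It is not how Actix-web implements shutdown, and the drain function and its counter are hypothetical:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Waits until `in_flight` drops to zero or `timeout` elapses.
/// Returns true if every request finished in time, false if the
/// deadline passed and the remaining requests would be cut off.
pub fn drain(in_flight: &AtomicUsize, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while in_flight.load(Ordering::SeqCst) > 0 {
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(Duration::from_millis(5));
    }
    true
}
```

The false branch is the case to avoid in production: if the orchestrator's grace period is shorter than this wait, the process is killed before the drain can complete.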

Step 8: Verify End-to-End

Confirm the Vigilmon dashboard shows the monitor as UP.
Temporarily return 503 from your health handler and verify an alert fires.
Restore 200 and confirm the recovery notification.

To simulate a database failure:

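// Point the pool at a database that doesn't exist. connect_lazy defers the
// connection attempt to the first query, so the server still starts.
let pool = PgPool::connect_lazy("postgres://bad:creds@localhost/none")
    .expect("invalid database URL");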

The health check returns 503 with "database": "error: ..." in the checks, and Vigilmon catches it within the next interval.

Production Checklist

[ ] /health handler returning 200 with JSON body
[ ] Database sqlx::query("SELECT 1") in health check with timeout
[ ] Actor Ping message check if using Actix actors
[ ] Vigilmon HTTP monitor with status code + body assertions
[ ] Email and Slack alert channels configured
[ ] Multi-stage Dockerfile with cargo layer caching
[ ] HEALTHCHECK directive in Dockerfile
[ ] Heartbeat monitor for background tokio tasks
[ ] shutdown_timeout set in HttpServer

Summary

You now have an Actix-web service that:

Exposes a structured /health endpoint with active DB and actor checks
Reports health-check latency in the response body
Is probed every minute by Vigilmon from multiple regions
Fires Slack and email alerts when checks fail
Builds efficiently in Docker with cargo layer caching
Monitors background tokio tasks with heartbeat checks
Shuts down gracefully on SIGTERM, giving in-flight requests time to finish

Actix-web's performance means your health endpoint adds negligible overhead. The monitoring infrastructure is free on Vigilmon's starter tier. Your on-call team will know about failures in Slack — long before users find them.