Actix-web is one of the fastest web frameworks available, but raw throughput doesn't protect you from a deadlocked actor, a connection pool exhausted after a traffic spike, or a botched deploy that restarts the process without draining in-flight requests. Vigilmon catches all of these from outside your infrastructure and alerts you shortly after the first failure, before users notice.
In this guide you'll add production-grade uptime monitoring to an Actix-web service, wire up actor system health checks, set up a Docker build that works with cargo, and configure zero-downtime deployments with graceful shutdown.
What You'll Build
- A /health handler that checks the database and reports latency
- An actor health check that pings a background actor to verify it is alive
- A Vigilmon HTTP monitor with multi-region probes
- A multi-stage Dockerfile optimized for cargo build caching
- A Docker HEALTHCHECK that flags unhealthy containers so your orchestrator can restart them
- A Vigilmon heartbeat for background tokio tasks
- Email and Slack alerts
Prerequisites
- Rust 1.75+ with cargo
- An Actix-web project (or follow along from scratch)
- A free Vigilmon account
Step 1: Add the Health Handler
Add the required dependencies to Cargo.toml:
[dependencies]
actix-web = "4"
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
serde_json = "1"
sqlx = { version = "0.7", features = ["postgres", "runtime-tokio-native-tls"] }
chrono = { version = "0.4", features = ["serde"] }
reqwest = { version = "0.11", features = ["json"] }
Create src/health.rs:
use actix_web::{web, HttpResponse};
use chrono::Utc;
use serde::Serialize;
use sqlx::PgPool;
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Serialize)]
pub struct HealthResponse {
    pub status: String,
    pub timestamp: String,
    pub latency_ms: f64,
    pub checks: HashMap<String, String>,
}

pub async fn health_check(pool: web::Data<PgPool>) -> HttpResponse {
    let start = Instant::now();
    let mut checks = HashMap::new();
    let mut overall = "ok".to_string();

    // Ping the database, giving up after 2 seconds so a hung pool cannot stall the probe
    let db_ping = sqlx::query("SELECT 1").execute(pool.get_ref());
    match tokio::time::timeout(Duration::from_secs(2), db_ping).await {
        Ok(Ok(_)) => {
            checks.insert("database".to_string(), "ok".to_string());
        }
        Ok(Err(e)) => {
            checks.insert("database".to_string(), format!("error: {}", e));
            overall = "degraded".to_string();
        }
        Err(_) => {
            checks.insert("database".to_string(), "error: timeout".to_string());
            overall = "degraded".to_string();
        }
    }

    let latency_ms = start.elapsed().as_secs_f64() * 1000.0;
    let body = HealthResponse {
        status: overall.clone(),
        timestamp: Utc::now().to_rfc3339(),
        // Round to one decimal place
        latency_ms: (latency_ms * 10.0).round() / 10.0,
        checks,
    };

    // 200 when every check passed, 503 otherwise
    if overall == "ok" {
        HttpResponse::Ok().json(body)
    } else {
        HttpResponse::ServiceUnavailable().json(body)
    }
}
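The status logic in the handler reduces to a small pure function, which makes it easy to unit test without a database. The sketch below shows one way you might factor it out. The names `summarize` and `round_to_tenth` are hypothetical helpers for illustration, not part of Actix-web or sqlx.

```rust
use std::collections::HashMap;

/// Derives the overall status and HTTP status code from individual check results.
/// Any check whose value is not exactly "ok" degrades the whole response.
pub fn summarize(checks: &HashMap<String, String>) -> (&'static str, u16) {
    if checks.values().all(|v| v.as_str() == "ok") {
        ("ok", 200)
    } else {
        ("degraded", 503)
    }
}

/// Rounds a latency in milliseconds to one decimal place, as the handler does.
pub fn round_to_tenth(ms: f64) -> f64 {
    (ms * 10.0).round() / 10.0
}
```

Keeping this logic separate from the I/O means a new check only has to insert its result into the map; the status code follows automatically.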
Register the handler in src/main.rs:
use actix_web::{web, App, HttpServer};
use sqlx::PgPool;

mod health;

#[actix_web::main]
async fn main() -> std::io::Result<()> {
    let db_url = std::env::var("DATABASE_URL")
        .expect("DATABASE_URL must be set");
    let pool = PgPool::connect(&db_url)
        .await
        .expect("Failed to connect to database");
    let pool = web::Data::new(pool);

    HttpServer::new(move || {
        App::new()
            .app_data(pool.clone())
            // Health endpoint: no auth middleware
            .route("/health", web::get().to(health::health_check))
            // ... rest of your routes ...
    })
    .bind("0.0.0.0:8080")?
    .run()
    .await
}
Test it:
cargo run
curl -s http://localhost:8080/health | jq .
{
  "status": "ok",
  "timestamp": "2025-06-29T10:00:00+00:00",
  "latency_ms": 0.9,
  "checks": {
    "database": "ok"
  }
}
Step 2: Actor System Health Check
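If you use Actix actors for background processing, a /health check that only queries the database won't detect a deadlocked actor. Add an actor health probe using the Actix messaging system. The actor code below needs the actix crate (actix = "0.13") in Cargo.toml alongside actix-web.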
// src/actors/processor.rs
use actix::prelude::*;

pub struct ProcessorActor;

impl Actor for ProcessorActor {
    type Context = Context<Self>;
}

// A lightweight ping message; a reply proves the actor is still processing its mailbox
#[derive(Message)]
#[rtype(result = "bool")]
pub struct Ping;

impl Handler<Ping> for ProcessorActor {
    type Result = bool;

    fn handle(&mut self, _msg: Ping, _ctx: &mut Context<Self>) -> bool {
        true
    }
}
Then extend the health check to ping the actor:
// src/health.rs (updated)
use crate::actors::processor::{Ping, ProcessorActor};
use actix::Addr;

pub async fn health_check(
    pool: web::Data<PgPool>,
    actor: web::Data<Addr<ProcessorActor>>,
) -> HttpResponse {
    let start = Instant::now();
    let mut checks = HashMap::new();
    let mut overall = "ok".to_string();
    // Each probe gets at most 2 seconds
    let limit = std::time::Duration::from_secs(2);

    // Database check
    let db_ping = sqlx::query("SELECT 1").execute(pool.get_ref());
    match tokio::time::timeout(limit, db_ping).await {
        Ok(Ok(_)) => { checks.insert("database".to_string(), "ok".to_string()); }
        Ok(Err(e)) => {
            checks.insert("database".to_string(), format!("error: {}", e));
            overall = "degraded".to_string();
        }
        Err(_) => {
            checks.insert("database".to_string(), "error: timeout".to_string());
            overall = "degraded".to_string();
        }
    }

    // Actor check: a deadlocked or stopped actor never answers the Ping
    match tokio::time::timeout(limit, actor.send(Ping)).await {
        Ok(Ok(true)) => { checks.insert("actor".to_string(), "ok".to_string()); }
        _ => {
            checks.insert("actor".to_string(), "error: timeout or dead".to_string());
            overall = "degraded".to_string();
        }
    }

    let latency_ms = start.elapsed().as_secs_f64() * 1000.0;
    let body = HealthResponse {
        status: overall.clone(),
        timestamp: Utc::now().to_rfc3339(),
        latency_ms: (latency_ms * 10.0).round() / 10.0,
        checks,
    };

    if overall == "ok" {
        HttpResponse::Ok().json(body)
    } else {
        HttpResponse::ServiceUnavailable().json(body)
    }
}
Register the actor address as app data:
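// In src/main.rs (assumes src/actors/mod.rs declares `pub mod processor;`)
mod actors;
use actix::Actor; // brings `.start()` into scope
use actors::processor::ProcessorActor;

// Inside main(), before building the server:
let processor = ProcessorActor.start();
let processor = web::Data::new(processor);

HttpServer::new(move || {
    App::new()
        .app_data(pool.clone())
        .app_data(processor.clone())
        .route("/health", web::get().to(health::health_check))
})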
Now a deadlocked or crashed actor shows up in your health response as a 503 with status "degraded", and an external monitor such as Vigilmon flags it on its next check.
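The timeout matters more than the ping itself: a hung actor never replies, so without a deadline the health request would hang with it. The same pattern can be shown without Actix, using a plain thread and a channel from the standard library. This is only an analogy under stated assumptions. `spawn_worker` and `ping` are hypothetical helpers, not Actix APIs, and the sleeping thread merely stands in for a deadlocked actor.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;
use std::time::Duration;

/// A ping request carrying the channel on which the worker replies.
pub struct Ping(pub Sender<bool>);

/// Spawns a worker thread that answers pings. When `hang` is true the worker
/// blocks before reading its mailbox, which stands in for a deadlocked actor.
pub fn spawn_worker(hang: bool) -> Sender<Ping> {
    let (tx, rx) = mpsc::channel::<Ping>();
    thread::spawn(move || {
        if hang {
            thread::sleep(Duration::from_secs(3600));
        }
        for Ping(reply) in rx {
            // Ignore send errors: the caller may have given up already.
            let _ = reply.send(true);
        }
    });
    tx
}

/// Returns true only if the worker replies within `timeout`.
pub fn ping(worker: &Sender<Ping>, timeout: Duration) -> bool {
    let (reply_tx, reply_rx) = mpsc::channel();
    if worker.send(Ping(reply_tx)).is_err() {
        return false; // the worker thread has exited
    }
    matches!(reply_rx.recv_timeout(timeout), Ok(true))
}
```

A healthy worker answers almost instantly, while `ping(&spawn_worker(true), Duration::from_millis(100))` gives up after the deadline and returns false. The Actix version behaves the same way: the message is accepted into the mailbox, but no reply arrives before the timeout fires.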
Step 3: Set Up the Vigilmon HTTP Monitor
- Log in to Vigilmon and click New Monitor → HTTP.
- Enter https://your-domain.com/health.
- Set check interval to 60 seconds (free) or 30 seconds (paid).
- Add assertions:
  - Status code equals 200
  - Response body contains "status":"ok"
- Save.
Vigilmon probes your endpoint from multiple regions and escalates to an alert if two or more consecutive checks fail.
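To make the assertion and escalation rules concrete, here is a minimal sketch of how a monitor of this kind could evaluate probe results. It is an illustration only, not Vigilmon's implementation. `check_passes` and `AlertState` are hypothetical names, and the threshold of two is taken from the description above.

```rust
/// True when a probe satisfies both assertions from the monitor setup:
/// status code 200 and a body containing the compact `"status":"ok"` pair.
pub fn check_passes(status_code: u16, body: &str) -> bool {
    status_code == 200 && body.contains("\"status\":\"ok\"")
}

/// Tracks consecutive failures and reports when an alert should fire.
pub struct AlertState {
    consecutive_failures: u32,
    threshold: u32,
}

impl AlertState {
    pub fn new(threshold: u32) -> Self {
        Self { consecutive_failures: 0, threshold }
    }

    /// Records one probe result. Returns true exactly once per outage,
    /// on the probe where the failure count reaches the threshold.
    pub fn record(&mut self, passed: bool) -> bool {
        if passed {
            self.consecutive_failures = 0;
            return false;
        }
        self.consecutive_failures += 1;
        self.consecutive_failures == self.threshold
    }
}
```

Note that the substring match is whitespace-sensitive. Actix-web's `.json()` responder serializes compactly, so the real response contains `"status":"ok"` with no space, whereas the pretty-printed `jq` output from Step 1 has a space after the colon and would not match.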
Step 4: Configure Alert Channels
In Alerts → Channels:
Email — add your on-call address. Vigilmon notifies within 30 seconds of the first failed check and sends a recovery alert when the service comes back.
Slack — click Add Channel → Slack, paste your incoming webhook URL. Route CRITICAL monitors to both channels under Alerts → Routing.
Step 5: Multi-Stage Dockerfile with Cargo Build Caching
cargo build compiles your entire dependency tree from scratch unless you cache the layer correctly. This Dockerfile compiles dependencies in their own layer, separate from your application code, so rebuilds only recompile your own crate:
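FROM rust:1.75-slim AS builder
WORKDIR /app

# The native-tls backends used by sqlx and reqwest need OpenSSL headers at build time.
RUN apt-get update && apt-get install -y pkg-config libssl-dev && rm -rf /var/lib/apt/lists/*

# Cache dependency compilation separately from application code.
# Copy manifests first, build deps with a dummy main, then replace with real source.
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && echo "fn main() {}" > src/main.rs
RUN cargo build --release && rm -rf src

# Now compile the actual application.
# touch updates the mtime so cargo does not reuse the dummy build.
COPY src ./src
RUN touch src/main.rs && cargo build --release

FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates curl && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# "myapp" must match the package name in Cargo.toml
COPY --from=builder /app/target/release/myapp .
EXPOSE 8080

# Docker probes /health every 30s and marks the container unhealthy after
# 3 consecutive failures. Restarting it is up to your orchestrator (for example Docker Swarm).
HEALTHCHECK --interval=30s --timeout=5s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

CMD ["./myapp"]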
Build and run:
docker build -t myapp .
docker run -p 8080:8080 -e DATABASE_URL=postgres://... myapp
The first build compiles all dependencies (slow). Subsequent builds that only change src/ skip the dependency layer — much faster in CI.
Step 6: Heartbeat Monitor for Background tokio Tasks
If you run background tokio::spawn tasks (cron jobs, queue consumers, etc.), HTTP monitoring won't detect a silently stopped task. Set up a heartbeat:
- In Vigilmon, go to New Monitor → Heartbeat.
- Set the expected interval (e.g., 5 minutes).
- Copy the ping URL.
Ping it from your background task on each successful run:
// src/tasks/cleanup.rs
use std::time::Duration;

const VIGILMON_HEARTBEAT: &str = "https://vigilmon.online/ping/YOUR_HEARTBEAT_ID";

// cleanup_old_records() stands in for your own async job; it must return
// a Result whose error type implements Display.
pub async fn run_cleanup_loop() {
    // The first tick completes immediately, then every 5 minutes
    let mut interval = tokio::time::interval(Duration::from_secs(300));
    let client = reqwest::Client::new();

    loop {
        interval.tick().await;
        match cleanup_old_records().await {
            Ok(_) => {
                // Ping Vigilmon to signal the task is alive and healthy.
                // A failed ping is ignored; the job itself still succeeded.
                let _ = client
                    .get(VIGILMON_HEARTBEAT)
                    .timeout(Duration::from_secs(5))
                    .send()
                    .await;
            }
            Err(e) => {
                // Don't ping: Vigilmon will detect the missed heartbeat
                eprintln!("cleanup error: {}", e);
            }
        }
    }
}
Spawn it from main:
tokio::spawn(crate::tasks::cleanup::run_cleanup_loop());
If the task panics or is dropped, the next ping never arrives and Vigilmon fires an alert.
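A heartbeat monitor is a dead man's switch: the alert is triggered by the absence of a ping rather than by a failed probe. The sketch below shows the kind of check a monitoring service might run on its side. It is an assumption for illustration, not Vigilmon's actual logic, and `heartbeat_overdue` and the grace period parameter are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Returns true when no ping has arrived within the expected interval plus a
/// grace period that absorbs scheduling jitter and network delay.
pub fn heartbeat_overdue(
    last_ping: Instant,
    now: Instant,
    expected_interval: Duration,
    grace: Duration,
) -> bool {
    now.saturating_duration_since(last_ping) > expected_interval + grace
}
```

With a 300-second interval and a 60-second grace period, a ping that arrives 310 seconds after the previous one is still on time, while silence for 361 seconds counts as a missed heartbeat. Because the cleanup loop only pings after a successful run, a job that keeps failing looks the same to the monitor as a task that has stopped entirely.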
Step 7: Graceful Shutdown
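Actix-web's HttpServer shuts down gracefully on SIGTERM, while SIGINT and SIGQUIT force an immediate stop. .shutdown_timeout() sets how many seconds workers get to finish in-flight requests (30 by default). Set it to a value less than your load balancer or orchestrator's drain timeout: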
HttpServer::new(move || {
    App::new()
        .app_data(pool.clone())
        .route("/health", web::get().to(health::health_check))
})
.bind("0.0.0.0:8080")?
// Give in-flight requests up to 10 seconds to finish after SIGTERM
.shutdown_timeout(10)
.run()
.await
Combine this with a Kubernetes preStop hook (or your load balancer's deregistration delay) that stops new traffic and lets connections drain before SIGTERM is delivered, and Vigilmon will see at most one missed check during a rolling deploy.
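The timing constraint is simple arithmetic, sketched here assuming Kubernetes semantics, where the time spent in a preStop hook counts against terminationGracePeriodSeconds (30 seconds by default) and the process is killed once that period runs out. `shutdown_fits` is a hypothetical helper, not part of any library.

```rust
/// Checks that the termination sequence fits inside the orchestrator's grace period.
/// All values are in seconds. The pre-stop delay runs first, then the server gets
/// `shutdown_timeout` seconds to drain; both must finish before the forced kill.
pub fn shutdown_fits(pre_stop_delay: u64, shutdown_timeout: u64, grace_period: u64) -> bool {
    pre_stop_delay + shutdown_timeout < grace_period
}
```

For example, a 5-second preStop sleep plus `.shutdown_timeout(10)` fits comfortably in a 30-second grace period, whereas leaving the Actix default of 30 seconds does not. The same reasoning applies to plain `docker stop`, which waits 10 seconds by default before killing the container, so a 10-second shutdown timeout leaves no margin there.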
Step 8: Verify End-to-End
- Confirm the Vigilmon dashboard shows the monitor as UP.
- Temporarily return 503 from your health handler and verify an alert fires.
- Restore 200 and confirm the recovery notification.
To simulate a database failure:
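// Point the pool at a bad connection string temporarily.
// connect_lazy defers the connection attempt to the first query, so the
// server still starts and the failure surfaces in /health instead of at boot.
let pool = PgPool::connect_lazy("postgres://bad:creds@localhost/none")
    .expect("invalid database URL");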
The health check returns 503 with "database": "error: ..." in the checks, and Vigilmon catches it within the next interval.
Production Checklist
- [ ] /health handler returning 200 with JSON body
- [ ] Database sqlx::query("SELECT 1") in health check with timeout
- [ ] Actor Ping message check if using Actix actors
- [ ] Vigilmon HTTP monitor with status code + body assertions
- [ ] Email and Slack alert channels configured
- [ ] Multi-stage Dockerfile with cargo layer caching
- [ ] HEALTHCHECK directive in Dockerfile
- [ ] Heartbeat monitor for background tokio tasks
- [ ] shutdown_timeout set in HttpServer
Summary
You now have an Actix-web service that:
- Exposes a structured /health endpoint with active DB and actor checks
- Reports check latency in the health response body
- Is probed every minute by Vigilmon from multiple regions
- Fires Slack and email alerts within seconds of a failed check
- Builds efficiently in Docker with cargo layer caching
- Monitors background tokio tasks with heartbeat checks
- Shuts down gracefully on SIGTERM without dropping requests
Actix-web's performance means your health endpoint adds negligible overhead. The monitoring infrastructure is free on Vigilmon's starter tier. Your on-call team will know about failures in Slack — long before users find them.