Uptime Monitoring for Kotlin/Ktor Applications

Your Ktor application is running in production — but is it actually up? Ktor is lightweight by design; it doesn't ship an opinionated health system the way Spring Boot does. That makes adding your own health endpoint dead simple, but it also means no one is watching it unless you hook up external monitoring.

One silent 503 at 3 AM, a stuck coroutine job that stopped processing orders, a database connection pool that quietly exhausted itself: these failures cost you users before anyone notices. By the end of this guide you'll have external uptime monitoring with multi-region checks, heartbeat monitoring for your Kotlin coroutine jobs, Slack or Discord alerts, and a README uptime badge, all on the free tier.

Why Kotlin/Ktor apps still go dark

Ktor apps fail silently in two common ways:

Endpoint failures: your API starts returning 500s or timing out while the process stays alive. No exception bubbles out to kill the JVM, and your health route might still return OK while individual business routes are broken by a bad deploy or a saturated thread pool.

Silent coroutine job failures: a coroutine started with launch catches an exception internally and carries on, or its whole CoroutineScope is cancelled without any alert. Your nightly sync job stopped running three days ago and nobody knew.

Both are invisible from inside your own infrastructure. You need an independent external observer — a monitoring service that probes your app from outside and alerts you the moment something goes wrong.

Step 1: Add a health endpoint to your Ktor app

Add the following Ktor dependencies to your build.gradle.kts:

dependencies {
    implementation("io.ktor:ktor-server-core:2.3.12")
    implementation("io.ktor:ktor-server-netty:2.3.12")
    implementation("io.ktor:ktor-server-content-negotiation:2.3.12")
    implementation("io.ktor:ktor-serialization-kotlinx-json:2.3.12")
}
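The list above covers the server side only. The @Serializable classes in the next listing also need the kotlinx.serialization compiler plugin, and the HttpClient used in Steps 2 and 4 comes from the Ktor client artifacts. A minimal sketch of the additions, assuming Kotlin 1.9.24 and the CIO client engine (both are assumptions; keep whatever Kotlin version and engine your project already uses):

```kotlin
plugins {
    kotlin("jvm") version "1.9.24"                  // assumed version
    kotlin("plugin.serialization") version "1.9.24" // generates serializers for @Serializable classes
}

dependencies {
    implementation("io.ktor:ktor-client-core:2.3.12") // HttpClient used by the checkers and the heartbeat
    implementation("io.ktor:ktor-client-cio:2.3.12")  // assumed engine; any Ktor client engine works
}
```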

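Now add a /health route to your Ktor application. Putting it in its own Application extension function (a Ktor module, not a plugin in the install(...) sense) keeps health check logic separate from your business routes. DatabaseService is a placeholder for your own data access class; all the code assumes is a ping() method that throws when the database is unreachable: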

import io.ktor.http.*
import io.ktor.server.application.*
import io.ktor.server.response.*
import io.ktor.server.routing.*
import kotlinx.serialization.Serializable

@Serializable
data class HealthResponse(
    val status: String,
    val components: Map<String, ComponentStatus>
)

@Serializable
data class ComponentStatus(val status: String, val detail: String? = null)

fun Application.configureHealth(db: DatabaseService) {
    routing {
        get("/health") {
            val dbStatus = checkDatabase(db)
            val overall = if (dbStatus.status == "UP") HttpStatusCode.OK else HttpStatusCode.ServiceUnavailable

            call.respond(
                overall,
                HealthResponse(
                    status = if (overall == HttpStatusCode.OK) "UP" else "DOWN",
                    components = mapOf("database" to dbStatus)
                )
            )
        }
    }
}

private suspend fun checkDatabase(db: DatabaseService): ComponentStatus {
    return try {
        db.ping()
        ComponentStatus(status = "UP")
    } catch (e: Exception) {
        ComponentStatus(status = "DOWN", detail = e.message)
    }
}

When the database is healthy the endpoint returns HTTP 200 and "status": "UP". When it's down it returns HTTP 503 — exactly the signal your monitoring tool needs to open an incident.

Wire it into your server entry point:

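import io.ktor.serialization.kotlinx.json.*
import io.ktor.server.application.*
import io.ktor.server.engine.*
import io.ktor.server.netty.*
import io.ktor.server.plugins.contentnegotiation.*

fun main() {
    embeddedServer(Netty, port = 8080) {
        install(ContentNegotiation) { json() } // needed so call.respond can serialize HealthResponse
        val db = DatabaseService()
        configureHealth(db)
        configureRouting()
    }.start(wait = true)
}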

Verify it locally:

curl -s http://localhost:8080/health | jq .

Expected output when everything is healthy:

{
  "status": "UP",
  "components": {
    "database": {
      "status": "UP"
    }
  }
}

Step 2: Add a custom dependency health check

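For dependencies beyond your primary database (external APIs, message brokers, caches), add a HealthChecker interface so each check is self-contained. RedisClient below is a placeholder for the client class of whichever Redis library you use:

import io.ktor.client.*
import io.ktor.client.request.*
import io.ktor.http.*

interface HealthChecker {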
    val name: String
    suspend fun check(): ComponentStatus
}

class RedisHealthChecker(private val redis: RedisClient) : HealthChecker {
    override val name = "redis"

    override suspend fun check(): ComponentStatus {
        return try {
            redis.ping()
            ComponentStatus(status = "UP")
        } catch (e: Exception) {
            ComponentStatus(status = "DOWN", detail = e.message)
        }
    }
}

class ExternalApiHealthChecker(private val client: HttpClient, private val url: String) : HealthChecker {
    override val name = "external-api"

    override suspend fun check(): ComponentStatus {
        return try {
            val response = client.get(url)
            if (response.status.isSuccess()) {
                ComponentStatus(status = "UP")
            } else {
                ComponentStatus(status = "DOWN", detail = "HTTP ${response.status.value}")
            }
        } catch (e: Exception) {
            ComponentStatus(status = "DOWN", detail = e.message)
        }
    }
}
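The database check from Step 1 can be expressed through the same interface, so it keeps running once the route switches to a list of checkers. A minimal sketch, assuming DatabaseService exposes a suspending ping() that throws on failure; the DatabaseService interface here is a stand-in, and DatabaseHealthChecker is a hypothetical name:

```kotlin
// Redeclared from Steps 1 and 2 so the sketch compiles on its own.
data class ComponentStatus(val status: String, val detail: String? = null)

interface HealthChecker {
    val name: String
    suspend fun check(): ComponentStatus
}

// Stand-in for the article's DatabaseService, which is not shown; only ping() is assumed.
interface DatabaseService {
    suspend fun ping()
}

// Hypothetical adapter: wraps the Step 1 database check in the HealthChecker interface.
class DatabaseHealthChecker(private val db: DatabaseService) : HealthChecker {
    override val name = "database"

    override suspend fun check(): ComponentStatus =
        try {
            db.ping()
            ComponentStatus(status = "UP")
        } catch (e: Exception) {
            ComponentStatus(status = "DOWN", detail = e.message)
        }
}
```

The component key in the JSON response stays "database", so anything that already parses the Step 1 output keeps working.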

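Update the health module to run all checkers concurrently with async and awaitAll, since there is no point waiting for each one serially:

import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope

fun Application.configureHealth(vararg checkers: HealthChecker) {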
    routing {
        get("/health") {
            val results = coroutineScope {
                checkers.map { checker ->
                    async { checker.name to checker.check() }
                }.awaitAll().toMap()
            }

            val allUp = results.values.all { it.status == "UP" }
            val httpStatus = if (allUp) HttpStatusCode.OK else HttpStatusCode.ServiceUnavailable

            call.respond(
                httpStatus,
                HealthResponse(
                    status = if (allUp) "UP" else "DOWN",
                    components = results
                )
            )
        }
    }
}

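All health checks run in parallel, so the response time is bounded by your slowest dependency, not the sum of all of them. Because configureHealth now takes checkers instead of a DatabaseService, update the call in main to match, for example configureHealth(RedisHealthChecker(redis), ExternalApiHealthChecker(client, url)).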
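Being bounded by the slowest dependency also means a dependency that hangs will hang /health with it, and the external monitor then sees a timeout instead of a clean 503. One way to cap that is to wrap each checker in a time limit. The following is a sketch, not part of the original setup: TimeLimitedChecker is a hypothetical name and the 2-second default is an assumed budget.

```kotlin
import kotlinx.coroutines.withTimeoutOrNull

// Redeclared from Steps 1 and 2 so the sketch compiles on its own.
data class ComponentStatus(val status: String, val detail: String? = null)

interface HealthChecker {
    val name: String
    suspend fun check(): ComponentStatus
}

// Hypothetical wrapper: delegates to another checker but gives up after timeoutMs.
class TimeLimitedChecker(
    private val delegate: HealthChecker,
    private val timeoutMs: Long = 2_000 // assumed budget; tune per dependency
) : HealthChecker {
    override val name: String get() = delegate.name

    // withTimeoutOrNull returns null when the time limit is hit; that case is reported as DOWN.
    override suspend fun check(): ComponentStatus =
        withTimeoutOrNull(timeoutMs) { delegate.check() }
            ?: ComponentStatus(status = "DOWN", detail = "timed out after ${timeoutMs}ms")
}
```

Note that withTimeoutOrNull relies on cooperative cancellation: it can only cut off a check that suspends. A checker that blocks its thread on a driver call will still run to completion before the timeout takes effect.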

Step 3: Set up external monitoring with Vigilmon

With /health live and returning accurate 503s on failure, point Vigilmon at it:

1. Sign up at vigilmon.online (free tier, no credit card)
2. Click New Monitor → HTTP
3. Enter https://yourdomain.com/health
4. Set check interval (5 minutes on free tier)
5. Save

Vigilmon probes from multiple geographic regions simultaneously. If it gets a non-2xx response or a connection timeout, it opens an incident and alerts you — before your users notice.
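To make the rule in that last sentence concrete, here is a minimal sketch of what a single external probe boils down to. It is not Vigilmon's implementation: ProbeResult, classify, and probe are hypothetical names, and the 10-second timeout is an assumed value. It uses the JDK's built-in HTTP client (Java 11+) so it runs without extra dependencies.

```kotlin
import java.io.IOException
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.time.Duration

sealed interface ProbeResult {
    object Up : ProbeResult
    data class Down(val reason: String) : ProbeResult
}

// statusCode == null means no HTTP response arrived (timeout, refused connection, DNS failure).
fun classify(statusCode: Int?): ProbeResult = when {
    statusCode == null -> ProbeResult.Down("no response")
    statusCode in 200..299 -> ProbeResult.Up
    else -> ProbeResult.Down("HTTP $statusCode")
}

// Sends one GET request and classifies the outcome.
fun probe(url: String, timeout: Duration = Duration.ofSeconds(10)): ProbeResult {
    val client = HttpClient.newBuilder().connectTimeout(timeout).build()
    val request = HttpRequest.newBuilder(URI.create(url)).timeout(timeout).GET().build()
    val status = try {
        client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode()
    } catch (e: IOException) {
        null // timeouts and connection errors both surface as IOException subclasses
    }
    return classify(status)
}
```

This is why Step 1 returns a real 503 instead of a 200 with "status": "DOWN" in the body: a probe like this looks at the status code, not the JSON.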

Add multiple monitors for different layers of your stack:

| Endpoint | What it catches |
|---|---|
| /health | Database down, external API unreachable, custom dependency failures |
| /api/v1/ping | Business API layer breakage |
| / | Frontend / static asset serving broken |

Monitor your database and message broker separately if they expose an endpoint the monitor can reach; that way you can tell whether the problem is in Ktor or in one of its dependencies.

Step 4: Heartbeat monitoring for coroutine jobs

HTTP monitors don't catch silent background job failures. For that you need heartbeat monitoring.

The pattern: at the end of each successful coroutine job run, ping a unique URL. If Vigilmon stops receiving pings within the expected window, it fires an alert. No ping = job failed or crashed.
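The check on the monitoring side amounts to a comparison of timestamps. A minimal sketch of that logic, assuming the monitor stores the time of the last ping; isOverdue is a hypothetical helper, not Vigilmon's code:

```kotlin
import java.time.Duration
import java.time.Instant

// A heartbeat is overdue once more than interval + grace has passed since the last ping.
fun isOverdue(
    lastPing: Instant,
    now: Instant,
    interval: Duration,
    grace: Duration = Duration.ZERO
): Boolean = Duration.between(lastPing, now) > interval.plus(grace)
```

With a 24-hour interval and a 1-hour grace window, a job that last pinged at 02:00 is still fine at 02:30 the next day (24.5 hours elapsed) and overdue at 03:30 (25.5 hours elapsed). The grace window is what absorbs normal variation in how long the job takes.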

Create a heartbeat client that reads the URL from environment config so you can swap it per environment without code changes:

import io.ktor.client.*
import io.ktor.client.request.*

class HeartbeatClient(
    private val httpClient: HttpClient,
    private val heartbeatUrl: String?
) {
    suspend fun ping() {
        if (heartbeatUrl.isNullOrBlank()) return
        try {
            httpClient.get(heartbeatUrl)
        } catch (e: Exception) {
            // Log failure but don't rethrow — monitoring outage shouldn't crash the job
            println("Heartbeat ping failed: ${e.message}")
        }
    }
}

Use it in a scheduled coroutine job:

import kotlinx.coroutines.*

class ReportJob(
    private val reportService: ReportService,
    private val heartbeat: HeartbeatClient,
    private val scope: CoroutineScope
) {
    fun schedule() {
        scope.launch {
            while (isActive) {
                runCatching {
                    processReport()
                    // Only ping on success — an exception skips the heartbeat
                    heartbeat.ping()
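                }.onFailure { e ->
                    // runCatching also traps cancellation; rethrow it so shutdown is not logged as a job failure
                    if (e is CancellationException) throw e
                    println("Report job failed: ${e.message}")
                    // Don't ping heartbeat: let Vigilmon detect the missed beat
                }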

                delay(24 * 60 * 60 * 1000L) // 24 hours
            }
        }
    }

    private suspend fun processReport() {
        reportService.generate()
    }
}

fun Application.configureJobs(httpClient: HttpClient) {
    val heartbeat = HeartbeatClient(
        httpClient = httpClient,
        heartbeatUrl = environment.config.propertyOrNull("heartbeat.report.url")?.getString()
    )

    val reportJob = ReportJob(
        reportService = ReportService(),
        heartbeat = heartbeat,
        scope = this
    )

    reportJob.schedule()
}

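Add the property to your application.conf. The ${?HEARTBEAT_REPORT_URL} substitution pulls the value from the environment variable of the same name and leaves the property undefined when the variable is unset, which makes HeartbeatClient skip the ping: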

heartbeat {
    report {
        url = ${?HEARTBEAT_REPORT_URL}
    }
}
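One caveat worth checking in your setup: as far as the Ktor 2.x defaults go, application.conf is loaded when the server starts through EngineMain, while the embeddedServer(Netty, port = 8080) { ... } entry point from Step 1 starts with an empty config. In that case propertyOrNull returns null and the heartbeat silently never fires. A minimal sketch of a fallback that reads the environment variable directly; resolveHeartbeatUrl is a hypothetical helper:

```kotlin
// Prefer the value from application.conf; fall back to the environment variable.
// The env parameter defaults to the real environment and exists so the function can be tested.
fun resolveHeartbeatUrl(
    configValue: String?,
    env: Map<String, String> = System.getenv()
): String? =
    configValue?.takeIf { it.isNotBlank() }
        ?: env["HEARTBEAT_REPORT_URL"]?.takeIf { it.isNotBlank() }
```

In configureJobs this would replace the direct lookup: heartbeatUrl = resolveHeartbeatUrl(environment.config.propertyOrNull("heartbeat.report.url")?.getString()).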

In Vigilmon:

1. Click New Monitor → Heartbeat
2. Set the expected interval (e.g. 25 hours for a daily job, which gives a 1-hour grace window)
3. Copy the unique ping URL
4. Set HEARTBEAT_REPORT_URL in your deployment environment

Now if the coroutine job throws, the runCatching block logs the failure and skips heartbeat.ping(). Vigilmon sees a missed beat and alerts you.
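One detail of the scheduling loop is worth knowing when you pick the heartbeat interval: it runs the job once at startup and then sleeps 24 hours after each run, so the run time drifts by however long the job takes and resets on every deploy. If you want the job pinned to a wall-clock time instead, you can compute the delay on each iteration. A minimal sketch, assuming a fixed local run time and ignoring time zones and daylight-saving shifts; delayUntilNext is a hypothetical helper:

```kotlin
import java.time.Duration
import java.time.LocalDateTime
import java.time.LocalTime

// Time to wait until the next occurrence of runAt.
// If now is exactly runAt or later in the day, the next run is tomorrow.
fun delayUntilNext(runAt: LocalTime, now: LocalDateTime): Duration {
    var next = now.toLocalDate().atTime(runAt)
    if (!next.isAfter(now)) next = next.plusDays(1)
    return Duration.between(now, next)
}
```

In the loop this would become delay(delayUntilNext(LocalTime.of(2, 0), LocalDateTime.now()).toMillis()) placed before processReport(), which keeps consecutive runs roughly 24 hours apart and well inside a 25-hour heartbeat window.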

Step 5: Webhook alerts and the uptime badge

Slack and Discord alerts:

1. In Vigilmon go to Notifications → New Channel
2. Choose Slack or Discord, paste your webhook URL
3. Enable it on your monitors

You'll get an instant alert when a monitor goes down and a recovery notification when it comes back up. No more checking logs at 9 AM to find out the site was down all night.

Add an uptime badge to your README:

Open any monitor in Vigilmon and copy the badge embed code:

[![Uptime](https://vigilmon.online/badge/your-monitor-id.svg)](https://vigilmon.online?utm_source=devto&utm_medium=article&utm_campaign=kotlin-tutorial)

Drop it at the top of your README.md. The badge stays green while your monitors are healthy and flips red during an incident, which is useful for open-source Kotlin projects that want to show consumers their demo API is alive.

What you've built

| What | How |
|---|---|
| External health checks | /health endpoint + Vigilmon HTTP monitor |
| Parallel dependency checks | Concurrent async/await health checkers |
| Silent coroutine job detection | Heartbeat ping inside runCatching blocks |
| Instant incident alerts | Slack/Discord webhook notifications |
| README status badge | Vigilmon badge embed |

The whole setup runs on Vigilmon's free tier and takes under 30 minutes. Your /health endpoint returns accurate 503s when dependencies fail, Vigilmon probes it from multiple regions, and your coroutine jobs ping in on every successful run. The next silent failure gets caught before your users notice it.

Monitor your Kotlin app free at vigilmon.online — monitors running in under a minute, no credit card required.