
Monitoring IoT and Edge Devices with Vigilmon: A Practical Guide for 2026

IoT and edge device fleets present a monitoring challenge that conventional infrastructure tools don't fully address. Servers in a data center are assumed to be always-on and network-reachable. IoT devices — sensors on factory floors, gateways in remote locations, cameras in the field, edge compute nodes running inference workloads — have intermittent connectivity, constrained resources, and failure modes that don't map cleanly to standard uptime monitoring patterns.

This guide covers the practical strategies for monitoring IoT device availability and backend health using Vigilmon in 2026: HTTP health endpoints on edge devices, MQTT broker monitoring, firmware update detection, heartbeat monitoring for devices that phone home, alerting on device fleet health, and setting up Vigilmon for IoT backends.


The Core Challenge of IoT Availability Monitoring

IoT monitoring has several properties that distinguish it from traditional service monitoring:

Intermittent connectivity: Many IoT devices don't maintain persistent connections. A sensor might transmit readings every 5 minutes and be unreachable between transmissions. An always-on probe that checks connectivity every 30 seconds would generate false alerts constantly.

Diverse communication protocols: IoT devices use HTTP, MQTT, CoAP, AMQP, WebSockets, and proprietary protocols. Standard HTTP probing doesn't reach devices behind MQTT brokers or using non-HTTP transports.

Fleet scale: A single IoT deployment might involve hundreds or thousands of devices. Monitoring each individually is impractical; you need fleet-level health signals.

Backend dependency: The availability failure mode that matters most to end users is usually not "one device is offline" but "the IoT backend is down and no data is flowing from any device." The backend is the critical availability surface.

Silent failures: IoT devices fail in subtle ways — firmware updates that brick a device, sensors that keep transmitting without producing valid readings, edge nodes that appear up but have stale data caches. Simple TCP reachability doesn't catch these.

Vigilmon's monitoring approach handles the backend and communication-layer aspects of this problem cleanly. Here's how to apply each capability.


HTTP Health Endpoints on Edge Devices

If your edge devices run a web server or HTTP API — common in industrial IoT gateways, edge compute nodes, and Raspberry Pi deployments — exposing a /health endpoint creates a natural uptime monitoring surface.

Designing an IoT Health Endpoint

A useful health endpoint for an edge device should report:

{
  "status": "ok",
  "uptime_seconds": 86400,
  "last_reading_at": "2026-06-30T10:15:00Z",
  "sensor_readings": 1847,
  "connectivity": "online",
  "firmware_version": "2.4.1",
  "storage_free_mb": 512
}

This gives Vigilmon — and any monitoring tool — a rich signal surface to work with.

Configuring Vigilmon for Edge Device HTTP Monitoring

For edge devices with public IP addresses or accessible via a reverse proxy/VPN:

  1. Add an HTTP monitor in Vigilmon pointing to https://device.yourdomain.com/health
  2. Set status code check: 200
  3. Add a keyword check for "status": "ok" to validate the response body
  4. Set check interval to 1 minute (paid) or 5 minutes (free tier)
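
To make steps 2 and 3 concrete, here is a minimal sketch of the evaluation such a check performs. It assumes the keyword check is a plain substring match on the response body, which is an assumption about Vigilmon's behavior rather than something confirmed here. `evaluate_http_check` is a hypothetical helper for illustration, not part of any Vigilmon API.

```python
import json

def evaluate_http_check(status_code: int, body: str, keyword: str) -> bool:
    """Pass only if the status code is 200 and the keyword appears verbatim in the body."""
    return status_code == 200 and keyword in body

KEYWORD = '"status": "ok"'

# json.dumps with default separators writes a space after each colon ...
spaced = json.dumps({"status": "ok", "uptime_seconds": 86400})
# ... while compact separators drop it, so a verbatim keyword match would fail.
compact = json.dumps({"status": "ok", "uptime_seconds": 86400}, separators=(",", ":"))

print(evaluate_http_check(200, spaced, KEYWORD))
print(evaluate_http_check(200, compact, KEYWORD))
```

If your device firmware serializes JSON compactly, copy the keyword from a real response so the spacing matches what the device actually emits.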

Using keyword matching for data freshness: If your device reports last_reading_at in the health response, you can't validate the timestamp directly in Vigilmon's keyword check. Instead, have your health endpoint return a degraded status if the last reading is stale:

{
  "status": "degraded",
  "reason": "no_reading_for_300s"
}

Vigilmon then catches data freshness failures via the keyword check on "status": "ok".
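
A minimal sketch of that staleness logic on the device side might look like the following. The 300-second threshold mirrors the example payload above; `build_health` and `STALE_AFTER` are hypothetical names, and timestamps are assumed to be timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(seconds=300)  # assumed threshold; tune to your reporting interval

def build_health(last_reading_at: datetime, now: datetime) -> dict:
    """Return the health payload, downgrading status when the last reading is stale."""
    age = now - last_reading_at
    if age > STALE_AFTER:
        return {
            "status": "degraded",
            "reason": f"no_reading_for_{int(age.total_seconds())}s",
        }
    return {
        "status": "ok",
        # Assumes last_reading_at is in UTC, hence the literal "Z" suffix.
        "last_reading_at": last_reading_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Keeping the freshness decision on the device (or backend) means the monitor only ever needs to match one fixed string.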

Edge Devices Behind NAT or Firewalls

Most edge devices are not publicly addressable. They sit behind NAT, on private LAN segments, or behind cellular data connections that don't allow inbound connections. In this case, direct HTTP probing from Vigilmon isn't viable.

The pattern to use instead: the backend exposes device status via an API. Your IoT backend aggregates device connectivity state, and Vigilmon monitors that backend API endpoint.
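
One way the backend could derive connectivity state is from each device's last check-in time. The sketch below is an in-memory illustration under that assumption; `DeviceRegistry` and its methods are hypothetical, and a real backend would persist this state and serve it from an HTTP endpoint.

```python
from datetime import datetime, timedelta, timezone

class DeviceRegistry:
    """Tracks the last check-in time per device (in memory, for illustration only)."""

    def __init__(self, offline_after: timedelta):
        self.offline_after = offline_after
        self.last_seen = {}  # device_id -> datetime of last check-in

    def record_checkin(self, device_id: str, at: datetime) -> None:
        self.last_seen[device_id] = at

    def connectivity(self, now: datetime) -> dict:
        """Map each known device to 'online' or 'offline' based on its last check-in."""
        return {
            device_id: "online" if now - seen <= self.offline_after else "offline"
            for device_id, seen in self.last_seen.items()
        }
```

The devices only ever make outbound connections to the backend, so NAT and firewalls stop being an obstacle; the monitor probes the backend, which is publicly reachable.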


MQTT Broker Monitoring

MQTT is the dominant messaging protocol in IoT architectures. Devices publish sensor readings to an MQTT broker (Mosquitto, EMQX, AWS IoT Core, HiveMQ), and backend services subscribe to topics to process the data.

Vigilmon cannot subscribe to MQTT topics directly, but you can monitor MQTT broker availability in two ways:

TCP Port Monitoring for MQTT Brokers

By convention, MQTT listens on TCP port 1883 (unencrypted) or 8883 (TLS). Vigilmon's TCP monitor checks whether the port is open and accepting connections:

  1. Add a TCP monitor in Vigilmon
  2. Set host to your MQTT broker address
  3. Set port to 1883 or 8883
  4. Set check interval to 1 minute

This confirms the broker process is running and accepting connections. It does not validate message flow, but it catches broker crashes, firewall misconfigurations, and port binding failures within one check interval.
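
For reference, a TCP check of this kind amounts to attempting a connection and seeing whether the handshake completes. The sketch below shows the idea with Python's standard library; `tcp_port_open` is a hypothetical helper, not Vigilmon's implementation, and it is handy for reproducing a failed check from your own machine.

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False
```

Note that a successful connect on port 8883 says nothing about the TLS handshake or MQTT authentication; it only shows that something is listening.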

MQTT Health API Endpoint

Many production MQTT brokers expose health information over HTTP, either natively or through an add-on:

  • Mosquitto: broker stats are published on the $SYS/# topics (readable with mosquitto_sub -t '$SYS/#'); exposing them over HTTP typically needs a third-party dashboard or exporter
  • EMQX: Ships with a REST API; /api/v5/status reports broker health, and other API endpoints report connected clients and message rates
  • HiveMQ: REST API at /api/v1/health returns broker health status

Add an HTTP monitor to Vigilmon pointing at your MQTT broker's health API endpoint. For EMQX:

https://mqtt-broker.yourdomain.com/api/v5/status

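Keyword check: a string that appears only in a healthy response, for example "status": "running". The exact text differs by broker and version, so copy it from a live healthy response rather than guessing. This gives you application-layer MQTT health monitoring beyond the TCP port check.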


Firmware Update Detection via API

Firmware updates are one of the highest-risk events in an IoT device fleet. A failed firmware update can brick devices silently — the device stops reporting, but monitoring tools that only check network reachability don't catch the failure.

Pattern: Version Endpoint Monitoring

Have each device (or the backend on behalf of devices) expose a version endpoint:

GET /api/devices/{id}/firmware

Response:

{
  "device_id": "sensor-042",
  "firmware_version": "2.4.1",
  "expected_version": "2.4.1",
  "status": "current"
}

When a device is stuck on an old version after a fleet-wide update, the endpoint returns:

{
  "status": "outdated",
  "firmware_version": "2.3.8",
  "expected_version": "2.4.1"
}

Vigilmon keyword check on "status": "current" catches devices that failed to update.

For fleet-level detection, have your backend expose an aggregate endpoint:

GET /api/firmware/fleet-status

Response:

{
  "status": "ok",
  "devices_current": 248,
  "devices_outdated": 0,
  "devices_unreachable": 0
}

If any devices are stuck or unreachable post-update, the status changes to "degraded". Vigilmon detects it on the next check cycle.
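
The aggregate response could be computed along these lines. This is a sketch under the assumption that the backend knows each device's last reported version, with `None` standing in for a device that has not reported since the rollout; `fleet_firmware_status` is a hypothetical function name.

```python
def fleet_firmware_status(reported: dict, expected_version: str) -> dict:
    """Summarize firmware state.

    `reported` maps device_id -> firmware version string, or None if unreachable.
    """
    current = sum(1 for v in reported.values() if v == expected_version)
    unreachable = sum(1 for v in reported.values() if v is None)
    outdated = len(reported) - current - unreachable
    return {
        "status": "ok" if outdated == 0 and unreachable == 0 else "degraded",
        "devices_current": current,
        "devices_outdated": outdated,
        "devices_unreachable": unreachable,
    }
```

During a staged rollout you may prefer to tolerate some outdated devices for a while; in that case the "ok" condition would compare against a threshold instead of zero.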


Heartbeat Monitoring for Devices That Phone Home

The most powerful Vigilmon feature for IoT architectures is heartbeat monitoring — and it maps directly to the "phone home" pattern that IoT devices already use.

How IoT Heartbeats Work

Many IoT devices are designed to send periodic check-in messages to a backend service:

  • A smart meter sends a daily summary reading
  • An industrial sensor sends a heartbeat every 5 minutes confirming it's operational
  • A field device sends a status packet every 15 minutes

Normally, these check-ins are consumed by the backend and logged — but there's no alerting if they stop arriving. Vigilmon's heartbeat monitoring adds that alerting.

Setting Up Heartbeat Monitors in Vigilmon

  1. Create a heartbeat monitor in Vigilmon for each device type (or device group)
  2. Vigilmon generates a unique ping URL: https://hb.vigilmon.online/ping/your-heartbeat-id
  3. Configure each device to POST to this URL on each successful check-in
  4. Set the timeout period (e.g., 10 minutes for a device that should check in every 5 minutes)
  5. If no ping arrives within the timeout window, Vigilmon fires an alert
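
The timeout behavior in steps 4 and 5 is a dead man's switch: silence, not an error response, triggers the alert. The toy model below illustrates those semantics; it is not Vigilmon's implementation, and `HeartbeatMonitor` is a hypothetical class.

```python
from datetime import datetime, timedelta, timezone

class HeartbeatMonitor:
    """Toy model of a heartbeat monitor: overdue when no ping arrives within the timeout."""

    def __init__(self, timeout: timedelta, created_at: datetime):
        self.timeout = timeout
        self.last_ping = created_at  # treat creation time as the first ping

    def ping(self, at: datetime) -> None:
        self.last_ping = at

    def is_overdue(self, now: datetime) -> bool:
        return now - self.last_ping > self.timeout
```

With a 5-minute check-in and a 10-minute timeout, one missed check-in is tolerated and two consecutive misses trip the alert, which is a reasonable default for devices on flaky links.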

For a fleet of identical devices, you don't need one heartbeat monitor per device. Use a fleet-level heartbeat:

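# On each device check-in, proxy the heartbeat through the backend.
# `backend` is a placeholder for your own ingestion service object.
import requests

FLEET_HEARTBEAT_URL = "https://hb.vigilmon.online/ping/fleet-heartbeat"

def device_checkin(device_id: str, readings: dict) -> None:
    backend.record_readings(device_id, readings)
    # Only ping Vigilmon if at least one device checked in recently
    if backend.devices_active_count() > 0:
        try:
            requests.post(FLEET_HEARTBEAT_URL, timeout=5)
        except requests.RequestException:
            pass  # a failed ping must not break device check-in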

This fires Vigilmon's heartbeat alert when the entire fleet goes dark — no devices checking in — rather than alerting on individual device failures.

Per-Device Heartbeat Pattern

For smaller fleets where individual device health matters:

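# Each device sends its own heartbeat ping via the backend.
import requests

# Maps device_id -> that device's Vigilmon heartbeat URL (fill in your own).
DEVICE_HEARTBEAT_URLS: dict[str, str] = {}

def device_checkin(device_id: str, readings: dict) -> None:
    backend.record_readings(device_id, readings)
    heartbeat_url = DEVICE_HEARTBEAT_URLS.get(device_id)
    if heartbeat_url:
        requests.post(heartbeat_url, timeout=5)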

Maintain a mapping of device_id → Vigilmon heartbeat URL. Create one heartbeat monitor per critical device in Vigilmon.


Alerting on Device Fleet Health

The Backend API as Fleet Health Surface

Rather than monitoring individual devices, expose a fleet health endpoint from your IoT backend and monitor that endpoint with Vigilmon.

Example fleet health endpoint:

GET /api/fleet/health
{
  "status": "ok",
  "total_devices": 312,
  "online_devices": 311,
  "offline_devices": 1,
  "offline_threshold_pct": 5.0,
  "data_ingestion_rate_per_min": 2847,
  "last_ingestion_at": "2026-06-30T10:22:00Z"
}

Your backend sets "status" based on business-defined thresholds:

  • "ok" — fewer than 5% of devices offline, data flowing normally
  • "degraded" — 5–20% of devices offline or data ingestion rate below threshold
  • "critical" — more than 20% offline or data ingestion stopped

Vigilmon keyword check on "status": "ok" — if degraded or critical, the check fails and alerts fire. This single monitor covers fleet-wide availability without creating hundreds of individual device monitors.
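
The thresholds listed above translate into a small classification function. The sketch below is one possible reading of them; `fleet_status` and the `min_rate_per_min` parameter are hypothetical, and treating an empty fleet as "critical" is an assumption you may want to change.

```python
def fleet_status(total: int, online: int, ingestion_rate_per_min: float,
                 min_rate_per_min: float) -> str:
    """Classify fleet health as 'ok', 'degraded', or 'critical'."""
    # Assumption: an empty fleet or fully stopped ingestion counts as critical.
    if total == 0 or ingestion_rate_per_min == 0:
        return "critical"
    offline_pct = 100.0 * (total - online) / total
    if offline_pct > 20.0:
        return "critical"
    if offline_pct >= 5.0 or ingestion_rate_per_min < min_rate_per_min:
        return "degraded"
    return "ok"
```

For the example payload (312 devices, 311 online), the offline share is about 0.3%, so the status stays "ok" as long as ingestion is above the chosen minimum rate.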

Layered Monitoring Strategy

For comprehensive IoT backend coverage, use a layered approach:

| Layer | Monitor Type | What It Detects |
|---|---|---|
| MQTT broker TCP | TCP port 8883 | Broker process down |
| MQTT broker API | HTTP | Broker health degraded |
| IoT backend API | HTTP + keyword | Backend down or fleet degraded |
| Data ingestion endpoint | HTTP | Write path unavailable |
| Fleet heartbeat | Heartbeat | All device reporting stopped |
| Data processing job | Heartbeat | Pipeline processing stopped |


Practical Vigilmon Setup for an IoT Backend

Here's a complete setup for a typical IoT backend stack:

Monitor 1: Backend API Health

  • Type: HTTP
  • URL: https://api.iot.yourdomain.com/health
  • Expected status: 200
  • Keyword: "status": "ok"
  • Interval: 1 minute

Monitor 2: MQTT Broker Availability

  • Type: TCP
  • Host: mqtt.iot.yourdomain.com
  • Port: 8883
  • Interval: 1 minute

Monitor 3: MQTT Broker Health API

  • Type: HTTP
  • URL: https://mqtt.iot.yourdomain.com/api/v5/status
  • Expected status: 200
  • Keyword: "node_status": "Running" (confirm the exact string against a healthy response from your broker)
  • Interval: 2 minutes

Monitor 4: Fleet Health Dashboard

  • Type: HTTP
  • URL: https://api.iot.yourdomain.com/api/fleet/health
  • Expected status: 200
  • Keyword: "status": "ok"
  • Interval: 2 minutes

Monitor 5: Data Ingestion Pipeline Heartbeat

  • Type: Heartbeat
  • Timeout: 10 minutes (for a pipeline that processes every 5 minutes)
  • Ping URL: Configure your data processing job to POST on each successful run

Monitor 6: Device Firmware Status

  • Type: HTTP
  • URL: https://api.iot.yourdomain.com/api/firmware/fleet-status
  • Expected status: 200
  • Keyword: "status": "ok"
  • Interval: 5 minutes

Monitor 7: Backup Job Heartbeat

  • Type: Heartbeat
  • Timeout: 25 hours (for a daily backup job)
  • Ping URL: Configure your backup script to POST on successful completion
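
For the two heartbeat monitors (5 and 7), the important detail is that the ping is sent only after the job succeeds. A minimal sketch of such a wrapper, using only the standard library, could look like this; `post_ping` and `run_with_heartbeat` are hypothetical helpers, and a real script would log the caught exceptions instead of discarding them.

```python
import urllib.request
from typing import Callable

def post_ping(url: str, timeout: float = 5.0) -> None:
    """Send an empty POST to a heartbeat URL."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout):
        pass

def run_with_heartbeat(job: Callable[[], None], ping: Callable[[], None]) -> bool:
    """Run job; ping only if it finishes without raising. Returns True on success."""
    try:
        job()
    except Exception:
        return False  # no ping is sent, so the heartbeat monitor times out and alerts
    try:
        ping()
    except Exception:
        pass  # a failed ping should not turn a successful job into a failure
    return True
```

A backup script might call `run_with_heartbeat(do_backup, lambda: post_ping("https://hb.vigilmon.online/ping/your-heartbeat-id"))`, where `do_backup` stands in for your own job function.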

SSL Certificate Monitoring for IoT Backends

IoT backends often use mutual TLS (mTLS) for device authentication, but the public-facing API and device enrollment endpoints use standard TLS certificates. Vigilmon automatically monitors SSL certificate expiry for all HTTPS monitors.

An expired certificate on the device enrollment endpoint or the MQTT broker's TLS port breaks all device connections silently. Vigilmon's automatic SSL expiry warnings give you 30-day advance notice to renew before devices start failing to connect.
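
If you want to reproduce the expiry arithmetic yourself, for example in a pre-deployment check, the sketch below computes the days remaining from a certificate's notAfter string. It uses Python's `ssl.cert_time_to_seconds`; `days_until_expiry` and `needs_renewal` are hypothetical helpers, and the 30-day default simply mirrors the warning window mentioned above.

```python
import ssl

SECONDS_PER_DAY = 86400

def days_until_expiry(not_after: str, now_epoch: float) -> float:
    """Days left before a certificate's notAfter time, e.g. 'Jan  5 09:34:43 2018 GMT'."""
    return (ssl.cert_time_to_seconds(not_after) - now_epoch) / SECONDS_PER_DAY

def needs_renewal(not_after: str, now_epoch: float, warn_days: int = 30) -> bool:
    """True when the certificate expires within warn_days (or has already expired)."""
    return days_until_expiry(not_after, now_epoch) <= warn_days
```

The notAfter string is the format found in the dictionary returned by `ssl.SSLSocket.getpeercert()` under the "notAfter" key, and `now_epoch` would normally be `time.time()`.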


What Vigilmon Cannot Do for IoT

Be clear about the scope:

  • Vigilmon cannot monitor devices behind NAT without a phone-home / heartbeat pattern or backend proxy
  • Vigilmon cannot validate sensor data quality — it can check that your API returns "status": "ok", but whether the temperature readings are physically accurate requires domain-specific validation in your application
  • Vigilmon cannot monitor non-IP device protocols (Zigbee, Z-Wave, LoRaWAN, BLE) directly; these require protocol-specific gateways that expose HTTP health endpoints
  • Vigilmon cannot replace device management platforms like AWS IoT Core or Azure IoT Hub; these provide device registry, certificate management, and over-the-air updates that Vigilmon doesn't cover

Vigilmon's role in an IoT monitoring stack is the availability layer: confirming that the backends, brokers, and processing pipelines are up, and that devices are phoning home as expected.


Summary

IoT monitoring is a multi-layer problem. Vigilmon covers the availability and connectivity layers effectively:

  • HTTP health endpoints on edge devices and backends for status-code and keyword validation
  • TCP port monitoring for MQTT brokers and other non-HTTP services
  • Heartbeat monitoring for devices that phone home and for data processing pipelines
  • Fleet health API monitoring for aggregate device fleet status with a single monitor
  • SSL certificate monitoring for device enrollment and broker TLS endpoints

These patterns work with any IoT stack — AWS IoT, Azure IoT Hub, self-hosted EMQX, or custom backends — without requiring Vigilmon to understand the IoT protocols directly.

Try Vigilmon free at vigilmon.online — 5 monitors, no credit card, no trial expiry, multi-region consensus alerting from the first monitor.


