ScyllaDB is a high-performance, low-latency, Cassandra-compatible database written in C++. It's used for time-series data, IoT ingestion, ad-tech, and any workload where predictable low latency matters. When a ScyllaDB node goes down or drops out of a cluster, writes can fail to meet their consistency level or land on fewer replicas, degrading fault tolerance before any application-level error appears. Vigilmon gives you external visibility into ScyllaDB's health through four checks: the REST API health endpoint, the CQL native transport port, the metrics endpoint, and SSL certificates. Together they let you catch failures before they affect consistency or availability.
What You'll Build
- A monitor on ScyllaDB's REST health endpoint (`/`, port 10000)
- A TCP port check on the CQL native transport port (9042)
- A metrics endpoint availability check
- SSL certificate monitoring for your ScyllaDB domain
- Alerting that distinguishes node failures from cluster-wide degradation
Prerequisites
- A running ScyllaDB node or cluster with the REST API and CQL port accessible
- ScyllaDB REST API accessible at a reachable address (e.g., `http://scylladb.example.com:10000`)
- A free account at vigilmon.online
Step 1: Verify ScyllaDB's Health Endpoints
ScyllaDB exposes a REST API on port 10000. The root endpoint returns node status:
```shell
# REST health check (port 10000); -i prints the status line and headers
curl -i http://scylladb.example.com:10000/

# CQL port reachability
nc -zv scylladb.example.com 9042

# Metrics endpoint; -s hides curl's progress meter
curl -s http://scylladb.example.com:9180/metrics | head -20
```
A healthy ScyllaDB node returns 200 OK on the REST endpoint. The CQL port (9042) should accept TCP connections. The metrics endpoint at port 9180 serves Prometheus-format metrics.
Step 2: Create a Vigilmon Monitor for the ScyllaDB REST Endpoint
- Log in to Vigilmon → Add Monitor → HTTP.
- URL: `http://scylladb.example.com:10000/`.
- Check interval: 60 seconds.
- Response timeout: 10 seconds.
- Expected status: `200`.
- Label: `ScyllaDB REST Health`.
- Click Save.
This is your primary node health signal — a non-200 response means the ScyllaDB REST API is unavailable, which typically indicates the node process has crashed or is unreachable.
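The rule this monitor applies can be sketched in a few lines of Python. This is only an illustration of the check's logic under the settings above, not Vigilmon's implementation; the names `evaluate_http_check` and `probe` are hypothetical.

```python
import http.client
import time
import urllib.error
import urllib.request


def evaluate_http_check(status, elapsed_s, expected_status=200, timeout_s=10.0):
    """Return (ok, reason) for one probe result.

    status is the HTTP status code, or None if no response arrived.
    """
    if status is None:
        return False, "no response (connection failed or timed out)"
    if elapsed_s > timeout_s:
        return False, f"response took {elapsed_s:.1f}s, over the {timeout_s:.0f}s limit"
    if status != expected_status:
        return False, f"got HTTP {status}, expected {expected_status}"
    return True, "healthy"


def probe(url, timeout_s=10.0):
    """Fetch url once and return (status_or_None, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered, but with an error status
    except (OSError, http.client.HTTPException):
        status = None  # refused, unreachable, DNS failure, timeout, or bad reply
    return status, time.monotonic() - start
```

Calling `evaluate_http_check(*probe("http://scylladb.example.com:10000/"))` would then yield either `(True, "healthy")` or a failure reason suitable for an alert message.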
Step 3: Add a TCP Monitor for the CQL Port
Applications connect to ScyllaDB via the CQL native transport on port 9042. Monitoring TCP reachability independently of the REST API catches cases where the node is partially started or the CQL listener has failed:
- Add Monitor → TCP.
- Host: `scylladb.example.com`.
- Port: `9042`.
- Check interval: 60 seconds.
- Label: `ScyllaDB CQL Port`.
- Click Save.
If this monitor fires while the REST endpoint is green, the CQL listener has failed: applications cannot connect to this node even though its management API responds. Treat this as a critical failure, because no client can read or write through the node.
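A TCP reachability check of this kind amounts to attempting a connection and seeing whether it completes in time. The sketch below shows one way to do that with Python's standard library; `tcp_port_open` is a hypothetical helper name and the 5-second default timeout is an assumption, not a Vigilmon setting.

```python
import socket


def tcp_port_open(host, port, timeout_s=5.0):
    """Return True if a TCP connection to host:port completes within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS errors, and timeouts.
        return False
```

For example, `tcp_port_open("scylladb.example.com", 9042)` mirrors the `nc -zv` check from Step 1. A completed connection shows only that something is listening on the port; it does not confirm that the node answers CQL queries.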
Step 4: Monitor the Metrics Endpoint
ScyllaDB's Prometheus metrics endpoint (port 9180) provides rich observability data. Monitoring its availability ensures your metrics pipeline is healthy and that ScyllaDB's internal metrics subsystem is functioning:
- Add Monitor → HTTP.
- URL: `http://scylladb.example.com:9180/metrics`.
- Check interval: 5 minutes.
- Response timeout: 15 seconds.
- Expected status: `200`.
- Keyword: `scylla_` (every ScyllaDB metrics response contains metric names prefixed with `scylla_`).
- Label: `ScyllaDB Metrics Endpoint`.
- Click Save.
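To see what the keyword check is looking for, here is a small Python sketch that scans a Prometheus-format response for metric names starting with `scylla_`. It is slightly stricter than a plain substring match because it ignores comment lines; whether Vigilmon matches the raw body or something narrower is not stated here, so treat this as an illustration. The function names are hypothetical.

```python
def scylla_metric_names(body):
    """Return the set of metric names in a Prometheus text body that start with scylla_."""
    names = set()
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like "name{labels} value" or "name value".
        head = line.split("{", 1)[0].split()
        if head and head[0].startswith("scylla_"):
            names.add(head[0])
    return names


def metrics_check(status, body):
    """Pass only if the endpoint returned 200 and exposes at least one scylla_ metric."""
    return status == 200 and bool(scylla_metric_names(body))
```

Given a body containing the line `scylla_reactor_utilization{shard="0"} 12.5` (the value is made up for illustration), `metrics_check(200, body)` passes, while a 200 response with no `scylla_` samples fails.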
Step 5: Monitor SSL Certificates
If your ScyllaDB clients connect via TLS (recommended in production), monitor the certificate:
- Add Monitor → SSL Certificate.
- Domain: `scylladb.example.com`.
- Alert when expiry is within: 30 days.
- Alert again: 14 days, 7 days, 3 days, 1 day.
- Click Save.
An expired ScyllaDB TLS certificate causes clients that validate certificates to reject the connection with TLS handshake errors, so no reads or writes succeed over those connections.
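The reminder schedule above (30, 14, 7, 3, and 1 days) reduces to simple date arithmetic on the certificate's `notAfter` field. The following Python sketch shows that arithmetic using the standard `ssl` module. The helper names are hypothetical, and which port serves TLS depends on your ScyllaDB client encryption settings, so the `port` argument is left to the caller.

```python
import socket
import ssl
import time

REMINDER_DAYS = (30, 14, 7, 3, 1)  # thresholds from the steps above


def days_until_expiry(not_after, now_epoch=None):
    """Whole days between now and a certificate's notAfter string.

    not_after uses the format returned by getpeercert(),
    for example "Jan  5 09:34:43 2018 GMT". Negative means already expired.
    """
    if now_epoch is None:
        now_epoch = time.time()
    return int((ssl.cert_time_to_seconds(not_after) - now_epoch) // 86400)


def tightest_threshold(days_left, thresholds=REMINDER_DAYS):
    """Return the smallest reminder threshold already reached, or None."""
    reached = [t for t in thresholds if days_left <= t]
    return min(reached) if reached else None


def fetch_not_after(host, port, timeout_s=10.0):
    """Open a TLS connection and return the peer certificate's notAfter field.

    The default context verifies the chain, so an expired or untrusted
    certificate raises ssl.SSLError instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

With 10 days left, `tightest_threshold(10)` returns `14`, meaning the 30-day and 14-day reminders have both come due and the 7-day reminder is next.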
Step 6: Configure Alerting
In Vigilmon under Settings → Notifications, configure your alert channels:
| Monitor | Trigger | Action |
|---|---|---|
| REST health (/, port 10000) | Non-200 | Node process crashed or unreachable; check ScyllaDB logs |
| CQL port (9042) | TCP fail | CQL listener down; applications cannot connect |
| Metrics endpoint (port 9180) | Non-200 or keyword missing | Metrics subsystem failed; check node health |
| SSL certificate | < 30 days to expiry | Renew TLS certificate; update ScyllaDB TLS config |
Alert after 1 consecutive failure for the REST and CQL monitors, because ScyllaDB failures affect reads and writes immediately. Use 2 consecutive failures for the metrics endpoint to avoid false positives during brief collection pauses.
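The consecutive-failure rule is easy to reason about as a small counter that resets on every success. The Python sketch below is one way to express it; the class name `ConsecutiveFailureAlert` is hypothetical and this is not a description of how Vigilmon implements the setting.

```python
class ConsecutiveFailureAlert:
    """Fire once when failures in a row reach the threshold; reset on success."""

    def __init__(self, threshold):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.failures = 0

    def record(self, ok):
        """Record one check result; return True if an alert should be sent now."""
        if ok:
            self.failures = 0
            return False
        self.failures += 1
        # Alert only at the moment the threshold is reached, not on every
        # later failure, so one outage produces one alert.
        return self.failures == self.threshold
```

With a threshold of 2 and a 5-minute interval, the metrics alert fires on the second failed check, roughly 5 to 10 minutes after the endpoint stops responding. With a threshold of 1 and a 60-second interval, the REST and CQL alerts fire within about a minute.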
Common ScyllaDB Failure Modes and What Vigilmon Catches
| Scenario | Vigilmon monitor |
|---|---|
| Node process crash / OOM kill | REST endpoint unreachable; alert within 60 s |
| CQL listener fails to start | TCP monitor fires; REST endpoint may stay green |
| Node overloaded, REST API slow | REST timeout alert |
| TLS certificate expires | SSL monitor alerts at 30-day threshold |
| Metrics collection stops | Metrics keyword check fails |
| Network partition isolates node | All monitors fire simultaneously |
| Port 10000 firewall change | REST monitor fires; CQL may still pass |
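Read as a lookup, the scenarios above map combinations of monitor states to a probable cause. The Python sketch below encodes that mapping for the REST, CQL, and metrics monitors; `likely_cause` is a hypothetical helper, and its output is a starting point for investigation rather than a definitive diagnosis.

```python
def likely_cause(rest_ok, cql_ok, metrics_ok):
    """Suggest a probable cause from the state of three monitors."""
    if rest_ok and cql_ok and metrics_ok:
        return "healthy"
    if not (rest_ok or cql_ok or metrics_ok):
        # Everything fails at once: the node itself is gone or cut off.
        return "node down or isolated (process crash, OOM kill, or network partition)"
    if rest_ok and not cql_ok:
        return "CQL listener down while the REST API still answers"
    if cql_ok and not rest_ok:
        return "REST API blocked, slow, or down (for example a port 10000 firewall change)"
    if rest_ok and cql_ok:
        # Only the metrics monitor is failing.
        return "metrics collection stopped"
    # REST and CQL both fail while metrics still answers: not covered by the table.
    return "mixed failure; check the node directly"
```

For instance, `likely_cause(True, False, True)` points at the CQL listener, which matches the second row of the table.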
ScyllaDB's REST API and metrics endpoints make external monitoring straightforward, but the CQL port check is the critical signal your applications care about — a node can respond on port 10000 while refusing CQL connections. Vigilmon's layered monitoring of the REST API, CQL port, metrics endpoint, and SSL certificate gives you complete external visibility into ScyllaDB node health, catching failures before they degrade cluster consistency or availability.
Start monitoring ScyllaDB in under 5 minutes — register free at vigilmon.online.