Rocket.Chat is a self-hosted open-source team messaging platform — a Slack alternative that runs entirely on your own infrastructure, giving you full control over your team's messages, files, and integrations. Engineering and IT teams choose Rocket.Chat to eliminate per-seat SaaS costs, meet data sovereignty requirements, and integrate with internal tooling via its REST and webhook APIs. When Rocket.Chat goes down, the impact is felt immediately across the entire organisation: team messaging stops, support channels go dark, CI/CD notifications are not delivered, and any automation that posts to Rocket.Chat rooms fails silently. The platform has a layered architecture — the Node.js application server, MongoDB database, and optional file storage must all be healthy for the platform to function. Vigilmon gives you external visibility into Rocket.Chat's health API, web app, TCP port, and SSL certificate so failures are caught within 60 seconds.
What You'll Build
- A monitor on Rocket.Chat's /health API endpoint
- An HTTP monitor for the Rocket.Chat web application
- An HTTP monitor for the REST API liveness check (/api/v1/info returning 200 confirms the API is alive)
- A TCP monitor for Rocket.Chat's application port
- SSL certificate monitoring for your Rocket.Chat domain
- An alerting setup tuned for team communication criticality
Prerequisites
- A running Rocket.Chat instance with a public or network-reachable domain
- HTTPS configured (e.g., https://chat.example.com)
- A free account at vigilmon.online
Step 1: Verify Rocket.Chat's Health Endpoint
Rocket.Chat exposes a health check at /health that reports the status of the application server and its MongoDB connection:
curl -i https://chat.example.com/health
A healthy instance returns HTTP 200 with a JSON body:
{
  "status": "healthy"
}
This endpoint requires no authentication and is designed for uptime probes. A 200 with "status":"healthy" confirms the Node.js application server is running and the MongoDB replica set (or standalone) is reachable. A non-200 response or a timeout indicates the server has crashed, MongoDB is unreachable, or the container is restarting.
MongoDB dependency: Every Rocket.Chat message, channel, user account, room setting, role assignment, and notification preference is stored in MongoDB. If MongoDB goes down, the /health endpoint immediately reflects the failure, because Rocket.Chat cannot function in any capacity without its database. This is the most important monitor in your Rocket.Chat stack.
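The same check can be reproduced outside Vigilmon when you want to test the endpoint by hand. The following is a minimal Python sketch using only the standard library. The helper names `fetch` and `is_healthy` are illustrative, and the `{"status": "healthy"}` body shape is taken from the example response above, so adjust it if your Rocket.Chat version reports something different.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import Optional, Tuple


def fetch(url: str, timeout: float = 15.0) -> Tuple[Optional[int], str]:
    """GET the URL and return (status, body); status is None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses still carry a status code and usually a body.
        return err.code, err.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, TLS error, timeout, or broken response.
        return None, ""


def is_healthy(status: Optional[int], body: str) -> bool:
    """True only for HTTP 200 with a JSON object whose "status" is "healthy"."""
    if status != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an HTML error page from a reverse proxy.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"
```

Calling `is_healthy(*fetch("https://chat.example.com/health"))` would return `True` only for a response like the one shown above; a proxy error page, a non-200 status, or a timeout all yield `False`.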
Step 2: Create a Vigilmon HTTP Monitor for the Health Endpoint
- Log in to Vigilmon → Add Monitor → HTTP.
- URL: https://chat.example.com/health.
- Check interval: 60 seconds.
- Response timeout: 15 seconds.
- Expected status: 200.
- Keyword: healthy.
- Label: Rocket.Chat Health.
- Click Save.
This monitor catches:
- Node.js application server crashes or OOM kills from message volume spikes
- MongoDB connectivity failures: Rocket.Chat stores all messages, rooms, users, file metadata, and settings in MongoDB, so a database outage makes the entire platform non-functional
- MongoDB replica set election delays that temporarily render the primary unreachable
- Rocket.Chat process failures caused by misconfigured environment variables or failed migrations
The "healthy" keyword check confirms that the application itself is reporting a healthy database connection, rather than a reverse proxy returning 200 in front of an unreachable backend.
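To make the reasoning concrete, here is a hedged Python sketch of a status-plus-keyword check. `passes_check` and `PROXY_PAGE` are hypothetical names used for illustration only. This is not Vigilmon's implementation, and it assumes keyword matching is a plain substring test.

```python
def passes_check(status: int, body: str,
                 expected_status: int = 200, keyword: str = "healthy") -> bool:
    """Pass only when the status code matches and the keyword appears in the body."""
    return status == expected_status and keyword in body


# Hypothetical proxy error page that is (mis)served with a 200 status.
PROXY_PAGE = "<html><body>Service temporarily unavailable</body></html>"

# A status-only check would accept the proxy page; the keyword rejects it.
proxy_passes = passes_check(200, PROXY_PAGE)                  # False
backend_passes = passes_check(200, '{"status": "healthy"}')   # True

# Caveat: as a plain substring, "healthy" also occurs inside "unhealthy",
# so the expected-status condition should stay enabled alongside the keyword.
substring_pitfall = "healthy" in '{"status": "unhealthy"}'    # True
```

Under that substring assumption, the keyword alone would not distinguish an unhealthy body from a healthy one, which is why the sketch requires both the status code and the keyword to match.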
Step 3: Monitor the Rocket.Chat Web Application
The Rocket.Chat web application is the primary interface for all team members. Monitor it independently from the API to catch reverse proxy failures, static asset serving problems, and Meteor bundle loading errors:
- Add Monitor → HTTP.
- URL: https://chat.example.com.
- Check interval: 60 seconds.
- Expected status: 200.
- Keyword: Rocket.Chat.
- Label: Rocket.Chat Web App.
- Click Save.
This monitor catches nginx or reverse proxy failures, CDN misconfiguration, and static bundle serving errors that prevent users from loading the messaging interface — even when the backend API is healthy. A broken web app means your entire team loses access to messages, channels, and notifications simultaneously, regardless of whether the underlying Node.js process is running.
Step 4: Monitor the REST API Liveness
Rocket.Chat's REST API is used by integrations, bots, CI/CD notification webhooks, and external automation scripts. The /api/v1/info endpoint returns server version and configuration without requiring authentication, making it an ideal liveness check:
curl -i https://chat.example.com/api/v1/info
# Expected: HTTP 200 with version and server info
- Add Monitor → HTTP.
- URL: https://chat.example.com/api/v1/info.
- Check interval: 60 seconds.
- Expected status: 200.
- Keyword: version.
- Label: Rocket.Chat REST API.
- Click Save.
A 200 with "version" in the response confirms the Node.js application server and its Meteor framework are running and serving API routes. A 502 or 504 means the reverse proxy is running but the Rocket.Chat process is not responding. A timeout means the application has failed entirely.
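The interpretation above can be written down as a small lookup, which is handy in a runbook script. This Python sketch is illustrative only: `diagnose` is a hypothetical helper, and `status=None` is an assumed convention meaning that the request timed out or no HTTP response arrived.

```python
from typing import Optional


def diagnose(status: Optional[int], body: str = "") -> str:
    """Map a probe result for /api/v1/info to the interpretation described above."""
    if status is None:
        # Timeout or no response at all.
        return "no response: the application has likely failed entirely"
    if status in (502, 504):
        return "reverse proxy is up, but the Rocket.Chat process is not responding"
    if status == 200 and "version" in body:
        return "API is alive"
    # Anything else (200 without the keyword, 4xx, other 5xx) needs a closer look.
    return f"unexpected response (HTTP {status}): inspect proxy and application logs"
```

Keeping the mapping in one place means the same wording reaches whoever is on call, regardless of which tool ran the probe.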
Why the API matters beyond web users: Rocket.Chat integrations are often invisible until they break. CI/CD pipelines that post build results, monitoring systems that alert to Rocket.Chat rooms, and support ticketing systems that create DMs all depend on the REST API. An API failure silently breaks every automated notification your team relies on.
Step 5: Create a TCP Monitor for Rocket.Chat's Application Port
In the default Rocket.Chat deployment, the Node.js application listens on port 3000. If you expose this port directly (for internal network monitoring or before the nginx reverse proxy), a TCP check gives you the earliest possible failure signal:
- Add Monitor → TCP.
- Host: chat.example.com.
- Port: 3000.
- Check interval: 60 seconds.
- Response timeout: 10 seconds.
- Label: Rocket.Chat TCP Port 3000.
- Click Save.
Note: In most production Rocket.Chat deployments, port 3000 is not exposed to the public internet — traffic arrives on port 443 through nginx, which proxies to port 3000 internally. If port 3000 is not externally reachable, skip this monitor. The health endpoint and web app monitors provide equivalent coverage via the HTTPS path. If you run Rocket.Chat in a Docker network with a separate monitoring container on the same network, the TCP check is valuable for detecting Node.js process failures before nginx starts returning errors.
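Before adding the TCP monitor, it can help to confirm from the monitoring host that the port is actually reachable. The Python sketch below shows what a TCP check amounts to: attempting a connection and reporting whether it completed in time. The function name `tcp_port_open` is an assumption made for illustration; the 10-second default mirrors the response timeout configured above.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `tcp_port_open("chat.example.com", 3000)` returning `False` from outside your network is expected when only port 443 is exposed, in which case this monitor can be skipped as noted above.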
Step 6: Monitor SSL Certificates
An expired SSL certificate on your Rocket.Chat instance breaks all team communication simultaneously:
- The web application becomes inaccessible across all browsers and mobile clients
- Rocket.Chat desktop apps refuse to connect to the server
- Rocket.Chat mobile apps cannot authenticate against the expired certificate
- REST API integrations and webhook deliveries fail with TLS errors
- Any custom scripts that call the Rocket.Chat API break with certificate validation errors
- Add Monitor → SSL Certificate.
- Domain: chat.example.com.
- Alert when expiry is within: 30 days.
- Alert again: 14 days, 7 days, 3 days, 1 day.
- Click Save.
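The alert schedule above boils down to simple date arithmetic on the certificate's expiry date. The following Python sketch, using only the standard library, shows one way to compute it. All function names are hypothetical, and the threshold tuple simply restates the 30/14/7/3/1-day schedule configured in this step.

```python
import socket
import ssl
from datetime import datetime, timezone
from typing import Optional

ALERT_THRESHOLDS_DAYS = (30, 14, 7, 3, 1)  # schedule from the steps above


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the peer certificate's notAfter string.

    Note: the default context verifies the certificate, so an already
    expired certificate raises ssl.SSLCertVerificationError here.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between a timezone-aware `now` and the notAfter timestamp."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def crossed_threshold(days_left: int) -> Optional[int]:
    """Return the tightest threshold that days_left has reached, or None if above all."""
    reached = [t for t in ALERT_THRESHOLDS_DAYS if days_left <= t]
    return min(reached) if reached else None
```

A scheduled job could call `days_until_expiry(fetch_not_after("chat.example.com"), datetime.now(timezone.utc))` and pass the result to `crossed_threshold` to decide whether a reminder is due.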
Step 7: Configure Alerting
In Vigilmon under Settings → Notifications, configure your alert channels:
| Monitor | Trigger | Action |
|---|---|---|
| /health | Non-200 or keyword "healthy" missing | Check Rocket.Chat container; inspect MongoDB connectivity and replica set status; review application logs |
| Web Application | Non-200 or keyword missing | Check nginx/reverse proxy; verify static bundle serving; inspect Meteor logs |
| REST API (/api/v1/info) | Non-200 response | Check Node.js process; inspect Meteor framework startup; verify database migrations completed |
| TCP Port 3000 | Connection refused or timeout | Check Rocket.Chat container; inspect process health; verify Docker networking |
| SSL certificate | < 30 days to expiry | Renew certificate; verify Let's Encrypt auto-renewal is functioning |
Alert after: 2 consecutive failures for HTTP monitors, and 1 failure for the TCP monitor. A closed port means the process is not running and will not self-recover without intervention.
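The consecutive-failure rule can be expressed in a few lines. This Python sketch is an illustration of the logic rather than Vigilmon's implementation; `should_alert` is a hypothetical helper that takes check outcomes ordered oldest first.

```python
from typing import Sequence


def should_alert(results: Sequence[bool], failures_required: int) -> bool:
    """Decide whether to alert, given outcomes oldest first (True = check passed).

    Alerts once the most recent `failures_required` checks have all failed.
    """
    if failures_required < 1 or len(results) < failures_required:
        return False
    return not any(results[-failures_required:])


# HTTP monitors: one failed check is tolerated, two in a row trigger an alert.
http_single_blip = should_alert([True, True, False], failures_required=2)   # False
http_sustained = should_alert([True, False, False], failures_required=2)    # True

# TCP monitor: the first failed check triggers an alert.
tcp_first_failure = should_alert([True, False], failures_required=1)        # True
```

With a 60-second check interval, requiring two consecutive failures means an HTTP alert fires on the second failed check, about one interval after the first failure is observed. That delay is the price of filtering out single transient errors.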
Escalation for team communication: Route Rocket.Chat alerts to an out-of-band communication channel (email, PagerDuty, or a secondary messaging tool). When Rocket.Chat is down, you cannot use Rocket.Chat to coordinate the incident response — make sure your on-call engineer receives the alert through an independent channel.
Common Rocket.Chat Failure Modes and What Vigilmon Catches
| Scenario | Monitor behaviour and impact |
|---|---|
| Node.js/Meteor server OOM-killed by large room | Health endpoint returns 503; alert within 60 s |
| MongoDB down or replica set election | Health check fails with "status":"unhealthy"; all messaging stops |
| MongoDB disk full from oplog or GridFS growth | Health check fails; file uploads stop before messaging does |
| Rocket.Chat process crash after failed migration | Health endpoint unreachable; TCP monitor fires |
| Nginx reverse proxy misconfiguration | Web app monitor fires; direct port 3000 may still be reachable |
| Static bundle serving failure after upgrade | Web app keyword check fails; blank screen for all users |
| REST API router failure after Meteor update | REST API monitor fires; integrations and bots stop |
| SSL certificate expires | All web and API access blocked; mobile clients lose connection |
| DNS misconfiguration | All monitors fire simultaneously |
| File storage (S3 or local) full | Uploads fail silently; health check passes; users report attachment errors |
| Redis session store failure (if configured) | Users logged out across all sessions; re-login required |
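The pattern of which monitors fail together is often the quickest triage hint. The Python sketch below encodes a few rows of the table as a rough heuristic. The monitor keys (`health`, `web`, `api`, `tcp`) and the function `likely_cause` are hypothetical names chosen for this example, and the mapping is a starting point for investigation rather than a definitive diagnosis.

```python
from typing import Set

ALL_MONITORS = {"health", "web", "api", "tcp"}


def likely_cause(failing: Set[str]) -> str:
    """Suggest where to look first, based on which monitors are failing."""
    if not failing:
        return "no monitors failing"
    if failing >= ALL_MONITORS:
        # Everything failing at once points below the application layer.
        return "all monitors failing: check DNS and host reachability first"
    if failing == {"web"}:
        return "web app only: check reverse proxy and static bundle serving"
    if failing == {"api"}:
        return "REST API only: check API routing after recent updates"
    if {"health", "tcp"} <= failing:
        return "health and TCP failing: the Rocket.Chat process is likely down"
    if failing == {"health"}:
        return "health only: check MongoDB connectivity and replica set status"
    return "mixed failure: inspect proxy and application logs"
```

For instance, `likely_cause({"web"})` points at the reverse proxy, matching the nginx misconfiguration row above, while a failure of every monitor at once suggests starting with DNS.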
Team messaging is the nervous system of a distributed organisation — when Rocket.Chat goes down, coordination stops and incidents compound. Vigilmon watches Rocket.Chat's health API, web application, REST API, TCP port, and SSL certificate so you're alerted within 60 seconds of any failure, before your team loses the communication channel they need most during an outage.
Start monitoring Rocket.Chat in under 5 minutes — register free at vigilmon.online.