Monitoring OpenSearch with Vigilmon: Cluster Health API, Green/Yellow/Red Status Checks & SSL Certificate Alerts

OpenSearch is the open-source fork of Elasticsearch 7.10 that AWS started in 2021 (now developed under the Linux Foundation's OpenSearch Software Foundation), used in production for log analytics, full-text search, and security event pipelines. While OpenSearch keeps most of Elasticsearch's REST API, it ships with its own security plugin, dashboard interface (OpenSearch Dashboards), and a diverging configuration surface, and it fails in its own specific ways. When an OpenSearch cluster degrades to yellow or drops to red, search latency can spike and log ingestion backs up. If the node holding the last remaining copy of a shard is then lost, that data is gone for good. Vigilmon gives you external visibility into your OpenSearch cluster's health: the cluster health API status, index availability, and the green/yellow/red traffic-light signal that tells you at a glance how serious a problem is.

What You'll Build

A monitor on OpenSearch's /_cluster/health endpoint with keyword checks for cluster status
Index-level availability checks for your most critical indices
SSL certificate monitoring for your OpenSearch endpoint
An alerting setup that distinguishes yellow (degraded) from red (data unavailable) cluster states

Prerequisites

A running OpenSearch cluster (1.x or later) with HTTP API access
An OpenSearch endpoint accessible over HTTP or HTTPS (e.g., https://opensearch.example.com:9200)
A free account at vigilmon.online

Step 1: Verify the Cluster Health API

OpenSearch exposes the same cluster health endpoint as Elasticsearch at /_cluster/health:

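```bash
# admin:admin is the demo-configuration default before OpenSearch 2.12; from 2.12
# the demo admin password is whatever you set in OPENSEARCH_INITIAL_ADMIN_PASSWORD.
# Change it in production. -k skips TLS verification (self-signed demo certs only).
curl -u admin:admin -k https://opensearch.example.com:9200/_cluster/health

# Pretty-printed (quote the URL so the shell does not treat ? as a glob character)
curl -u admin:admin -k "https://opensearch.example.com:9200/_cluster/health?pretty"
```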

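A healthy cluster returns a response like this (abridged; OpenSearch 2.x also includes fields such as discovered_cluster_manager, delayed_unassigned_shards, and active_shards_percent_as_number):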

```json
{
  "cluster_name": "opensearch-cluster",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 15,
  "active_shards": 30,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0
}
```

The status field is the key signal:

green: All primary and replica shards are assigned and healthy
yellow: All primary shards are assigned, but some replicas are unassigned (reduced redundancy — data is available but not fully protected)
red: One or more primary shards are unassigned (some data is unavailable; searches or indexing may fail)
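The same traffic-light logic is handy in scripts and runbooks. Below is a minimal sketch, assuming bash with grep and sed (no jq). The function names cluster_status and severity_for are hypothetical, and the severity labels are illustrative, not anything Vigilmon defines:

```bash
#!/usr/bin/env bash
# Classify an OpenSearch /_cluster/health response by its "status" field.

# cluster_status: read a health response body on stdin, print green/yellow/red.
# Handles compact JSON ("status":"green") and ?pretty output ("status" : "green").
cluster_status() {
  grep -oE '"status"[[:space:]]*:[[:space:]]*"(green|yellow|red)"' \
    | head -n 1 \
    | sed -E 's/.*"(green|yellow|red)"$/\1/'
}

# severity_for: map a status colour to an illustrative alert severity.
severity_for() {
  case "$1" in
    green)  echo "OK" ;;        # all primary and replica shards assigned
    yellow) echo "WARNING" ;;   # primaries assigned, some replicas missing
    red)    echo "CRITICAL" ;;  # at least one primary missing: data unavailable
    *)      echo "UNKNOWN" ;;   # no status found: unreachable, 401/403, or unexpected body
  esac
}

# Example (requires a reachable cluster):
#   body=$(curl -s -u admin:admin -k https://opensearch.example.com:9200/_cluster/health)
#   severity_for "$(printf '%s' "$body" | cluster_status)"
```

Anchoring the match on the "status" key, rather than searching for a bare colour word, keeps a cluster name such as shared-green from being mistaken for a health state.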

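OpenSearch Security vs. Elasticsearch X-Pack: OpenSearch's default distribution bundles the Security plugin, and the demo configuration enables it with TLS out of the box. Elasticsearch historically shipped with security disabled. Basic security (authentication and TLS) has been free since Elasticsearch 6.8/7.1 and has been enabled by default since 8.0. In OpenSearch releases before 2.12, the demo superuser is admin:admin. From 2.12 onward, the demo setup requires you to set the admin password through OPENSEARCH_INITIAL_ADMIN_PASSWORD. Either way, change the admin credentials in production. For Vigilmon, create a dedicated monitoring user that can only read the cluster health and monitoring APIs.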
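One way to create that monitoring user is through the Security plugin's REST API. The sketch below assumes an admin account that is allowed to call that API (for example, one mapped to all_access). It uses the built-in cluster_monitor and indices_monitor action groups. The role name vigilmon_monitor, the user name vigilmon, and the helper create_vigilmon_user are hypothetical:

```bash
#!/usr/bin/env bash
# Create a read-only monitoring user via the OpenSearch Security REST API.
# Hypothetical names: role "vigilmon_monitor", user "vigilmon".
# Run once as an admin; add --cacert /path/to/root-ca.pem (or -k on a
# throwaway test cluster) to curl_opts if your certificate is self-signed.

create_vigilmon_user() {
  local base_url="$1" admin_auth="$2" monitor_password="$3"
  local -a curl_opts=(-sS -u "$admin_auth" -H 'Content-Type: application/json')
  local api="$base_url/_plugins/_security/api"

  # 1. Role limited to the built-in monitoring action groups.
  curl "${curl_opts[@]}" -X PUT "$api/roles/vigilmon_monitor" -d '{
    "cluster_permissions": ["cluster_monitor"],
    "index_permissions": [
      {"index_patterns": ["*"], "allowed_actions": ["indices_monitor"]}
    ]
  }'

  # 2. Internal user. The password must satisfy the cluster's password policy
  #    and must not contain characters that need JSON escaping (" or \).
  curl "${curl_opts[@]}" -X PUT "$api/internalusers/vigilmon" \
    -d "{\"password\": \"$monitor_password\"}"

  # 3. Map the user to the role.
  curl "${curl_opts[@]}" -X PUT "$api/rolesmapping/vigilmon_monitor" \
    -d '{"users": ["vigilmon"]}'
}

# Example:
#   create_vigilmon_user https://opensearch.example.com:9200 'admin:<admin-password>' '<strong-monitor-password>'
```

Granting only the monitoring action groups means a leaked Vigilmon credential can read health and statistics but cannot search documents or change settings.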

Step 2: Create a Vigilmon Monitor for Cluster Health

Log in to Vigilmon → Add Monitor → HTTP.
URL: https://opensearch.example.com:9200/_cluster/health.
Check interval: 60 seconds.
Response timeout: 15 seconds.
Expected status: 200.
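Keyword: "status":"green" (matches only the status field when the cluster is fully healthy; because the URL has no ?pretty, the response is compact JSON with no spaces around the colon).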
Label: OpenSearch Cluster Health.
Click Save.

Authentication: If your cluster requires authentication, Vigilmon supports basic auth headers. Add the Authorization: Basic <base64(user:password)> header using your monitoring user's credentials.
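To produce the header value, base64-encode the literal user:password string without a trailing newline. Here is a small bash sketch (basic_auth_header is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# basic_auth_header USER PASSWORD
# Prints the value for an HTTP Authorization header using the Basic scheme.
basic_auth_header() {
  local user="$1" pass="$2"
  # printf (not echo) avoids encoding a trailing newline into the credentials;
  # tr strips the line wrapping GNU base64 adds after 76 characters.
  printf 'Basic %s\n' "$(printf '%s:%s' "$user" "$pass" | base64 | tr -d '\n')"
}
```

For example, basic_auth_header vigilmon 'your-monitoring-password' prints the full value to use after Authorization: in the monitor's header settings.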

This monitor alerts when:

The cluster is unreachable or the request fails (connection refused, timeout, 401/403, or 5xx)
The cluster degrades to yellow or red (the green status keyword is absent from the response)

Note on yellow clusters: A keyword check for "green" alerts on both yellow and red states. This is intentional — yellow means replica loss and warrants investigation even though primary data is available.

Step 3: Monitor for Red Cluster Status Separately

Add a second monitor specifically for red cluster status — this is your critical/urgent alert while the green-check is your warning:

Add Monitor → HTTP.
URL: https://opensearch.example.com:9200/_cluster/health.
Check interval: 60 seconds.
Expected status: 200.
Keyword: "status":"red" (not a bare red, which also matches field names such as discovered_cluster_manager). Configure this monitor to alert when the keyword is present (inverted keyword check).
Label: OpenSearch cluster RED — data unavailable.
Click Save.

A red cluster in OpenSearch means primary shards are unassigned and some data cannot be read or written. This warrants immediate escalation to on-call engineers.
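Keyword checks are assumed here to behave like plain substring matches; confirm this against Vigilmon's documentation. Under that assumption, the bare words green and red are fragile. A cluster named green-logs always contains green. OpenSearch 2.x health responses also include the fields discovered_master and discovered_cluster_manager, and "discovered" ends in red. Matching the whole "status":"red" pair avoids both problems. The sketch below reproduces the false positive with grep -F on an abridged compact response:

```bash
#!/usr/bin/env bash
# Demonstrates why a keyword check should include the JSON key, assuming the
# check is a plain substring match (modelled here with grep -F).

# contains KEYWORD BODY: succeed if BODY contains KEYWORD as a fixed string.
contains() {
  printf '%s' "$2" | grep -qF -- "$1"
}

# Abridged compact response (no ?pretty) from a HEALTHY OpenSearch 2.x cluster.
green_body='{"cluster_name":"opensearch-cluster","status":"green","timed_out":false,"discovered_master":true,"discovered_cluster_manager":true}'

if contains 'red' "$green_body"; then
  echo 'bare red: matches a green cluster (false alarm via "discovered")'
fi
if ! contains '"status":"red"' "$green_body"; then
  echo '"status":"red": no match, as intended'
fi
```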

Step 4: Monitor Index Availability

For your most critical indices, add individual index-level health checks. The /_cluster/health/{index-name} endpoint narrows the health report to a single index:

```bash
curl -u admin:admin -k https://opensearch.example.com:9200/_cluster/health/my-critical-index
```
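Before adding one monitor per index, it can help to check the candidates from a shell. The sketch below assumes curl and an OS_AUTH environment variable holding user:password (a hypothetical name). It extracts the status the same way an anchored keyword check would; index_health is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Print "<index> <status>" for one index; status is "unknown" if none was found.
index_health() {
  local base_url="$1" index="$2" body status
  [ -n "${OS_AUTH:-}" ] || { echo "set OS_AUTH=user:password" >&2; return 1; }
  # Quoting the URL keeps wildcard patterns such as .ds-logs-* away from shell globbing.
  body=$(curl -sS --max-time 15 -u "$OS_AUTH" "$base_url/_cluster/health/$index")
  # Pull the first "status" string out of compact or pretty-printed JSON.
  # Error bodies (e.g. "status":401, a number) do not match and yield "unknown".
  status=$(printf '%s' "$body" \
    | grep -oE '"status"[[:space:]]*:[[:space:]]*"[a-z]+"' \
    | head -n 1 \
    | sed -E 's/.*"([a-z]+)"$/\1/')
  echo "$index ${status:-unknown}"
}

# Example:
#   export OS_AUTH='vigilmon:<password>'
#   for idx in my-critical-index '.ds-logs-*'; do
#     index_health https://opensearch.example.com:9200 "$idx"
#   done
```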

Add a monitor for each business-critical index:

Add Monitor → HTTP.
URL: https://opensearch.example.com:9200/_cluster/health/my-critical-index.
Check interval: 2 minutes.
Expected status: 200.
Keyword: "status":"green".
Label: Index: my-critical-index.
Click Save.

Prioritise indices that:

Back a user-facing search feature
Receive real-time log or event ingestion
Back a data stream (an append-only abstraction for logs and other time-series data)

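OpenSearch Data Streams: Data streams (also available in Elasticsearch 7.9 and later) replace the older pattern of a write alias plus manual rollover. They are not an OpenSearch-specific substitute for ILM; OpenSearch's lifecycle tool is ISM. Each stream writes to hidden backing indices named .ds-<stream>-<generation> (for example, .ds-logs-nginx-000001). Monitor the backing-index pattern, such as /_cluster/health/.ds-logs-*, or your specific stream's pattern. When testing from a shell, quote the URL so the * is not glob-expanded.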

Step 5: Monitor the OpenSearch Dashboards UI

OpenSearch Dashboards (OpenSearch's replacement for Kibana) is the web interface your team uses to query data and view dashboards. Add a separate HTTP monitor for it:

Add Monitor → HTTP.
URL: https://opensearch-dashboards.example.com (typically port 5601, or behind a reverse proxy on 443).
Check interval: 5 minutes.
Expected status: 200.
Keyword: OpenSearch Dashboards.
Label: OpenSearch Dashboards UI.
Click Save.

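OpenSearch Dashboards depends on the cluster. If it cannot reach OpenSearch at startup, it serves an "OpenSearch Dashboards server is not ready yet" page with HTTP 503, which the expected-status check catches. If the cluster fails after Dashboards is already running, the UI can still load with a 200 and show errors inside the app. Because that HTML still contains the text OpenSearch Dashboards, the keyword check will not catch this case; the cluster health monitors cover it instead.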

Step 6: Monitor SSL Certificates

With the Security plugin enabled, OpenSearch uses TLS for both the HTTP REST API and the inter-node transport layer (transport TLS is mandatory when the plugin is active). An expired certificate breaks:

Client connections to the REST API
Inter-node transport (new node-to-node connections fail, so the cluster can split or fail to re-form after a restart)
OpenSearch Dashboards connections to the cluster

Add Monitor → SSL Certificate.
Domain: opensearch.example.com.
Port: 9200 (if not behind a reverse proxy on 443).
Alert when expiry is within: 30 days.
Alert again: 14 days, 7 days, 3 days, 1 day.
Click Save.

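OpenSearch certificate types: OpenSearch configures the REST API certificate (plugins.security.ssl.http.*) separately from the transport-layer certificate (plugins.security.ssl.transport.*), and the two can have different expiry dates. Vigilmon's SSL monitor checks only the certificate presented on the REST endpoint. Track the transport certificate through your certificate management tooling, or query the Security plugin's certificates API (GET _plugins/_security/api/ssl/certs), which lists the not_after dates of both the HTTP and transport certificates.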
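For the transport certificate, a check against the PEM file on each node is often the simplest option. The sketch below assumes the OpenSSL command-line tool is installed. The file paths in the usage comments are hypothetical (the demo configuration names its node certificate esnode.pem), and cert_expires_within_days is a hypothetical helper:

```bash
#!/usr/bin/env bash
# cert_expires_within_days FILE DAYS
# Succeeds (exit 0) if the PEM certificate in FILE expires within DAYS days.
# An unreadable file is also reported as "expiring", which errs on the side of alerting.
cert_expires_within_days() {
  local cert_file="$1" days="$2"
  # -checkend N exits 0 when the certificate is still valid N seconds from now.
  if openssl x509 -checkend $(( days * 86400 )) -noout -in "$cert_file" >/dev/null 2>&1; then
    return 1   # still valid after the window: nothing to do
  fi
  return 0     # expires inside the window (or the file could not be read)
}

# Usage (paths and host are placeholders):
#   cert_expires_within_days /etc/opensearch/esnode.pem 30 && echo "renew soon"
#   openssl x509 -enddate -noout -in /etc/opensearch/esnode.pem
# The REST API certificate can be read over the network in the same spirit:
#   openssl s_client -connect opensearch.example.com:9200 -servername opensearch.example.com </dev/null 2>/dev/null \
#     | openssl x509 -noout -enddate
```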

Step 7: Configure Alerting

In Vigilmon under Settings → Notifications, configure your alert channels:

| Monitor | Trigger | Action |
|---|---|---|
| Cluster health (green check) | Keyword "green" absent | Investigate cluster state; check unassigned shards |
| Cluster health (red check) | Keyword "red" present | Escalate immediately; data availability is impacted |
| Index health | Non-green status | Check shard allocation for that specific index |
| Dashboards UI | Non-200 or keyword missing | Dashboards service or reverse proxy failure |
| SSL certificate | < 30 days to expiry | Renew certificate; check cert-manager or manual renewal |
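The first row's action, checking unassigned shards, maps to two read-only APIs: _cat/shards with the unassigned.reason column, and _cluster/allocation/explain. The triage sketch below assumes curl and a hypothetical OS_AUTH variable holding user:password. The helper name triage_unassigned is also hypothetical:

```bash
#!/usr/bin/env bash
# Triage helper for a yellow or red alert. base_url and OS_AUTH (user:password)
# are placeholders; add --cacert or -k to opts as your TLS setup requires.
triage_unassigned() {
  local base_url="$1"
  [ -n "${OS_AUTH:-}" ] || { echo "set OS_AUTH=user:password" >&2; return 1; }
  local -a opts=(-sS -u "$OS_AUTH")

  # Header row plus every shard copy that is not STARTED, with its unassigned reason.
  curl "${opts[@]}" "$base_url/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" \
    | awk 'NR == 1 || $4 != "STARTED"'

  # Without a request body, the explain API picks an arbitrary unassigned shard
  # and reports why it cannot be allocated (it errors if nothing is unassigned).
  curl "${opts[@]}" "$base_url/_cluster/allocation/explain?pretty"
}

# Example:
#   OS_AUTH='vigilmon:<password>' triage_unassigned https://opensearch.example.com:9200
```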

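Alert after: 2 consecutive failures for the monitors that detect yellow (the green-check and index monitors), because brief shard movements during rolling restarts create transient yellow states. Use 1 failure for the red-check monitor, since red means data is unavailable. A node restart that lasts longer than two check intervals will still trip the green-check, so expect that alert during planned maintenance.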

Key Differences from Elasticsearch Monitoring

OpenSearch and Elasticsearch share the same /_cluster/health API structure, but there are important differences when setting up monitoring:

| Area | Elasticsearch | OpenSearch |
|---|---|---|
| Security default | Disabled by default before 8.0 (basic security free since 6.8/7.1); enabled by default since 8.0 | Security plugin bundled and enabled in the default distribution |
| Default credentials | None before 8.0 (security off); since 8.0, a generated password for the elastic user | admin:admin before 2.12; set via OPENSEARCH_INITIAL_ADMIN_PASSWORD from 2.12 (change in production) |
| Dashboard | Kibana | OpenSearch Dashboards |
| HTTP TLS settings | xpack.security.http.ssl.* | plugins.security.ssl.http.* |
| Log format | Elasticsearch JSON | Same format; opensearch in log names |
| Index lifecycle | Index Lifecycle Management (ILM) | Index State Management (ISM) |

If you are migrating monitoring from Elasticsearch to OpenSearch, update the authentication headers for your Vigilmon monitors and add Dashboards monitoring if you are switching from Kibana.

Common OpenSearch Failure Modes and What Vigilmon Catches

| Scenario | What Vigilmon sees |
|---|---|
| Cluster process down | /_cluster/health connection refused; alert within 60 s |
| Node OOM / crash | Shards on the lost node become unassigned; cluster goes yellow or red |
| Disk full on a data node | Disk watermarks stop new shards being allocated to the node, so the cluster may go yellow; at the flood-stage watermark, indices with shards on that node become read-only, which the health status alone does not show |
| Network partition splits cluster | Cluster goes yellow or red, or loses its cluster manager; health monitors fire |
| OpenSearch Security plugin misconfiguration | Auth failures; REST API returns 401/403 |
| SSL/TLS certificate expires (REST API) | SSL monitor alerts at 30-day threshold |
| SSL/TLS transport certificate expires | New inter-node connections fail; cluster degrades or goes red |
| Rolling restart causes transient yellow | Green-check fires; red-check stays silent (non-critical) |
| Cluster manager (master) quorum lost | Health API times out or returns 503 instead of 200; all health monitors fire |
| ISM policy failure stops rollover | Indices keep growing until disk watermarks are hit; health changes only once shard allocation is affected |

OpenSearch clusters can fail in ways that never reach application error logs. A yellow cluster keeps answering queries normally while running without full replica protection. A red cluster can still return HTTP 200 for searches that silently skip the unavailable shards, visible only in the response's _shards.failed count. Vigilmon's external monitoring of the cluster health API with green/yellow/red keyword checks gives you the traffic-light view of your OpenSearch cluster from outside the system, catching degraded states before they escalate to full data unavailability.

Start monitoring OpenSearch in under 5 minutes — register free at vigilmon.online.