Harbor is the enterprise-grade container registry that teams self-host for image security scanning, role-based access control, and compliance. It hosts the Docker images that every Kubernetes deployment pulls. When Harbor's registry component goes down, docker pull fails in CI and the kubelet can't pull images for new deployments. When the core API is unhealthy, vulnerability scans stop and replication jobs fail silently. When the SSL certificate expires, every container runtime rejects TLS and image pulls fail across all environments. Vigilmon gives you external visibility into Harbor's health: the built-in health API, individual component status, SSL certificate expiry, and portal availability.
What You'll Build
- A monitor on Harbor's /api/v2.0/health endpoint for overall cluster health
- Component-level monitors for the registry, core, and jobservice components
- SSL certificate monitoring for your Harbor domain
- A web UI monitor for the Harbor portal
- An alerting setup that isolates component failures from full-stack outages
Prerequisites
- A running Harbor 2.0+ instance with HTTPS configured
- A domain pointing to Harbor (e.g., https://harbor.example.com)
- A free account at vigilmon.online
Step 1: Verify Harbor's Health API
Harbor exposes a unified health endpoint at /api/v2.0/health that aggregates the status of all internal components:
curl https://harbor.example.com/api/v2.0/health
A healthy Harbor returns a JSON body listing each component's status:
{
"status": "healthy",
"components": [
{"name": "core", "status": "healthy"},
{"name": "database", "status": "healthy"},
{"name": "jobservice", "status": "healthy"},
{"name": "portal", "status": "healthy"},
{"name": "redis", "status": "healthy"},
{"name": "registry", "status": "healthy"},
{"name": "registryctl", "status": "healthy"},
{"name": "trivy", "status": "healthy"}
]
}
The top-level "status": "healthy" field reflects the overall cluster state. If any component is unhealthy, this changes to "unhealthy".
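To make the response structure concrete, here is a minimal shell sketch that pulls the top-level status and the names of any unhealthy components out of a health response. It avoids jq and relies on plain text matching, so it assumes each component object lists name before status, as in the sample above. The function names are illustrative and not part of Harbor or Vigilmon.

```shell
# Sketch: summarize a Harbor /api/v2.0/health body using only tr, sed, and grep.
# Assumption: component objects are written as {"name":...,"status":...} in that
# key order, matching the sample response shown above.

# Remove all whitespace so pretty-printed and compact JSON look the same.
compact_json() {
  tr -d '[:space:]'
}

# Print the top-level status (the "status" key outside the components array).
top_level_status() {
  compact_json \
    | sed 's/"components":\[[^]]*]//' \
    | grep -o '"status":"[^"]*"' \
    | head -n 1 \
    | sed 's/"status":"\([^"]*\)"/\1/'
}

# Print one name per line for every component whose status is not "healthy".
unhealthy_components() {
  compact_json \
    | grep -o '"name":"[^"]*","status":"[^"]*"' \
    | grep -v '"status":"healthy"' \
    | sed 's/"name":"\([^"]*\)".*/\1/'
}

# Usage against a live instance (hostname is a placeholder):
#   curl -s https://harbor.example.com/api/v2.0/health | top_level_status
#   curl -s https://harbor.example.com/api/v2.0/health | unhealthy_components
```

Both functions read the body from standard input, so they can also be run against a saved response file while you decide which keywords to monitor.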
Authentication: The /api/v2.0/health endpoint does not require authentication; it is designed for external health checks and load balancers.
Step 2: Create a Vigilmon HTTP Monitor for the Health API
- Log in to Vigilmon → Add Monitor → HTTP.
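- URL: https://harbor.example.com/api/v2.0/health.
- Check interval: 60 seconds.
- Response timeout: 15 seconds.
- Expected status: 200.
- Keyword: "status":"healthy" (intended for the top-level status field; the same substring also appears in each healthy component entry, so pair this monitor with the component monitors in Step 3).
- Click Save.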
This monitor catches:
- Harbor process crashes or Docker daemon failures on the host
- Database connectivity failures (Harbor marks itself unhealthy when PostgreSQL is unreachable)
- Redis cache failures that affect session management and job queues
- Deployment failures after Harbor upgrades
Alert sensitivity: Set alerts to trigger after 1 consecutive failure. An unhealthy Harbor means image pulls may already be failing, and CI/CD pipelines will begin failing within minutes.
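The check configured in this step can be reproduced from a terminal, which helps when validating a keyword before saving the monitor. The sketch below assumes the monitor performs a literal substring test on the body and requires HTTP 200; it is not Vigilmon's implementation, and the function names are hypothetical.

```shell
# Sketch of what an HTTP keyword monitor evaluates (assumed behavior).

# evaluate_check STATUS BODY KEYWORD -> prints "up" or "down".
# "up" requires HTTP 200 and the keyword as a literal substring of the body.
evaluate_check() {
  status=$1
  body=$2
  keyword=$3
  if [ "$status" = "200" ] && printf '%s' "$body" | grep -qF -- "$keyword"; then
    echo "up"
  else
    echo "down"
  fi
}

# probe_health URL KEYWORD -> fetches URL with a 15 s timeout and evaluates it.
probe_health() {
  url=$1
  keyword=$2
  # -w appends the status code on its own final line; curl reports 000 when
  # the connection fails or times out.
  response=$(curl -s --max-time 15 -w '\n%{http_code}' "$url")
  status=$(printf '%s\n' "$response" | tail -n 1)
  body=$(printf '%s\n' "$response" | sed '$d')
  evaluate_check "$status" "$body" "$keyword"
}

# Usage (placeholder hostname):
#   probe_health https://harbor.example.com/api/v2.0/health '"status":"healthy"'
```

If the match is literal, as assumed here, a keyword written without spaces will not match a pretty-printed body, so compare the keyword against the exact bytes your instance returns.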
Step 3: Monitor Individual Components
The top-level health check tells you something is wrong, but component-level monitors tell you what is wrong. Add separate keyword monitors for the critical components:
Registry Component (Docker image push/pull)
The registry is the most business-critical Harbor component — it handles every docker push and docker pull:
- Add Monitor → HTTP.
- URL: https://harbor.example.com/api/v2.0/health.
- Check interval: 60 seconds.
- Expected status: 200.
- Keyword: {"name":"registry","status":"healthy"}.
- Label: Harbor registry component.
- Click Save.
Core Component (API and auth)
The core service handles all API requests, RBAC, and replication policy management:
- Add Monitor → HTTP.
- URL: https://harbor.example.com/api/v2.0/health.
- Check interval: 60 seconds.
- Expected status: 200.
- Keyword: {"name":"core","status":"healthy"}.
- Label: Harbor core component.
- Click Save.
Jobservice Component (vulnerability scans and replication)
The jobservice handles asynchronous work: Trivy vulnerability scans, image replication, and garbage collection:
- Add Monitor → HTTP.
- URL: https://harbor.example.com/api/v2.0/health.
- Check interval: 2 minutes.
- Expected status: 200.
- Keyword: {"name":"jobservice","status":"healthy"}.
- Label: Harbor jobservice (scans/replication).
- Click Save.
Why the same URL for component monitors? All component data is in the single /api/v2.0/health response. Vigilmon's keyword check searches the full response body: if {"name":"registry","status":"healthy"} appears, the registry is healthy. If the registry is unhealthy, that substring won't be in the response.
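The per-component keyword logic can be tried locally against a single saved response. This is a rough sketch that assumes the compact {"name":...,"status":...} layout used in the keywords above; the helper names are made up for illustration.

```shell
# Sketch: apply the per-component keyword check to one health response body.

# component_keyword NAME -> prints the literal substring a monitor would search for.
component_keyword() {
  printf '{"name":"%s","status":"healthy"}' "$1"
}

# component_is_healthy BODY NAME -> exit status 0 if the keyword is present.
component_is_healthy() {
  printf '%s' "$1" | grep -qF -- "$(component_keyword "$2")"
}

# report_components BODY NAME... -> prints "NAME ok" or "NAME MISSING" per name.
report_components() {
  body=$1
  shift
  for name in "$@"; do
    if component_is_healthy "$body" "$name"; then
      echo "$name ok"
    else
      echo "$name MISSING"
    fi
  done
}

# Usage (placeholder hostname):
#   body=$(curl -s https://harbor.example.com/api/v2.0/health)
#   report_components "$body" registry core jobservice
```

Fetching the body once and testing several keywords against it mirrors how the three monitors share one endpoint while alerting independently.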
Step 4: Monitor the Harbor Portal (Web UI)
The Harbor portal is the web UI used by developers and security teams to browse images, manage projects, and review vulnerability scan results. A portal failure doesn't affect docker pull, but it blocks operational work:
curl -I https://harbor.example.com
- Add Monitor → HTTP.
- URL: https://harbor.example.com.
- Check interval: 2 minutes.
- Response timeout: 15 seconds.
- Expected status: 200.
- Keyword: Harbor (appears in the Harbor portal page title).
- Label: Harbor portal (web UI).
- Click Save.
When the portal monitor fires but the health API and registry monitors are green, you have a frontend rendering issue or nginx/reverse proxy problem that doesn't affect image operations — useful for prioritizing incident response.
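That triage rule can be written down as a small decision function, which is handy if you script your on-call runbook. The sketch below is only an illustration of the reasoning in the paragraph above; the function name and the wording of its output are hypothetical.

```shell
# Sketch: classify a portal alert using the state of the other monitors.
# classify_portal_alert PORTAL HEALTH REGISTRY -> each argument is "up" or "down".
classify_portal_alert() {
  portal=$1
  health=$2
  registry=$3
  if [ "$portal" = "up" ]; then
    echo "portal ok"
  elif [ "$health" = "up" ] && [ "$registry" = "up" ]; then
    # Backend is fine, so image pulls and pushes should still work.
    echo "frontend or reverse proxy issue; image operations unaffected"
  else
    echo "portal and backend both affected; investigate Harbor services first"
  fi
}

# Usage:
#   classify_portal_alert down up up
```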
Step 5: Monitor SSL Certificates
Harbor's SSL certificate affects every interaction with the registry:
- docker login harbor.example.com fails if the certificate is invalid
- docker pull harbor.example.com/project/image:tag fails in all environments
- Kubernetes nodes that pull with imagePullSecrets start failing to pull images for new pod deployments
- CI/CD pipelines that push to Harbor start failing
- Add Monitor → SSL Certificate.
- Domain: harbor.example.com.
- Alert when expiry is within: 30 days.
- Alert again: 14 days, 7 days, 3 days, 1 day.
- Click Save.
Docker daemon certificate trust: Docker daemons cache TLS connections. Even after you renew the certificate, Docker nodes may need a daemon restart to pick up the new certificate. A 30-day warning gives you time to renew and coordinate node restarts outside peak hours.
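For a manual cross-check of the expiry date, the following sketch reads the certificate with openssl and maps the remaining days onto the alert thresholds listed in this step. It assumes openssl and GNU date (the -d option) are installed; the function names are illustrative only.

```shell
# Sketch: days until certificate expiry, mapped to the 30/14/7/3/1-day thresholds.

# cert_not_after HOST -> prints the certificate's notAfter date string.
cert_not_after() {
  host=$1
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate \
    | sed 's/^notAfter=//'
}

# days_until NOT_AFTER [NOW_EPOCH] -> whole days from now until NOT_AFTER.
# NOW_EPOCH defaults to the current time; passing it makes the result repeatable.
days_until() {
  end_epoch=$(date -u -d "$1" +%s)
  now_epoch=${2:-$(date -u +%s)}
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# expiry_threshold DAYS -> prints the tightest threshold reached, or "none".
expiry_threshold() {
  days=$1
  for limit in 1 3 7 14 30; do
    if [ "$days" -le "$limit" ]; then
      echo "$limit"
      return
    fi
  done
  echo "none"
}

# Usage (placeholder hostname):
#   expiry_threshold "$(days_until "$(cert_not_after harbor.example.com)")"
```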
Step 6: Monitor the Registry V2 API Directly
The Docker Distribution (OCI registry) API at /v2/ is what docker pull and docker push actually use. Monitoring it confirms that image operations are possible:
curl -u admin:password https://harbor.example.com/v2/
# Returns 200 with an empty JSON body {}
# Returns 401 if auth is required
- Add Monitor → HTTP.
- URL: https://harbor.example.com/v2/.
- Check interval: 2 minutes.
- Expected status: 401 (unauthenticated requests confirm the registry API is up and requiring auth).
- Label: Harbor registry v2 API.
- Click Save.
A 401 response is the correct health signal for an unauthenticated check against a Harbor registry. If the registry is down, you'll get a connection error, 502, or 503, not a 401.
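The status-code reasoning above fits in a few lines of shell. This sketch treats 200 and 401 as "up" and everything else, including curl's 000 for a failed connection, as "down". The function names and the 15-second timeout are assumptions borrowed from the earlier steps.

```shell
# Sketch: classify the registry v2 API by HTTP status code.

# registry_api_state CODE -> prints "up" for 200 or 401, "down" otherwise.
registry_api_state() {
  case "$1" in
    200|401) echo "up" ;;
    *)       echo "down" ;;
  esac
}

# probe_registry_api BASE_URL -> requests BASE_URL/v2/ and classifies the result.
probe_registry_api() {
  # -o /dev/null discards the body; -w prints only the status code.
  code=$(curl -s -o /dev/null --max-time 15 -w '%{http_code}' "$1/v2/")
  registry_api_state "$code"
}

# Usage (placeholder hostname):
#   probe_registry_api https://harbor.example.com
```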
Step 7: Configure Alerting
In Vigilmon under Settings → Notifications, configure your alert channels:
| Monitor | Trigger | Action |
|---|---|---|
| /api/v2.0/health | Non-200 or keyword missing | Check Harbor containers with docker ps; inspect Harbor logs |
| Registry component | Component keyword missing | Registry down; docker pull/push failing; check registry container |
| Core component | Component keyword missing | API/auth failing; Harbor admin console affected |
| Jobservice | Component keyword missing | Scans and replication paused; not critical for pulls but affects compliance |
| Portal (web UI) | Non-200 or keyword missing | Frontend issue; check nginx/Caddy reverse proxy |
| SSL certificate | < 30 days to expiry | Renew certificate; test docker login after renewal |
| Registry v2 API | Non-401/200 | Direct registry API down; docker pull will start failing |
Alert after: 1 consecutive failure for registry and health monitors. 2 consecutive failures for portal and jobservice monitors (less critical for immediate operations).
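The "consecutive failures" rule can be illustrated with a short function that looks at a monitor's recent results. This is a sketch of the general idea, not Vigilmon's alerting logic; the function name and the "ok"/"fail" encoding are assumptions.

```shell
# Sketch: decide whether to alert based on the trailing run of failed checks.
# should_alert THRESHOLD RESULT... -> results are oldest to newest, each "ok" or "fail".
# Prints "alert" when the most recent THRESHOLD results are all "fail".
should_alert() {
  threshold=$1
  shift
  streak=0
  for result in "$@"; do
    if [ "$result" = "fail" ]; then
      streak=$((streak + 1))
    else
      streak=0   # any success resets the run
    fi
  done
  if [ "$streak" -ge "$threshold" ]; then
    echo "alert"
  else
    echo "quiet"
  fi
}

# Usage:
#   should_alert 1 ok ok fail      # registry and health monitors
#   should_alert 2 ok fail fail    # portal and jobservice monitors
```

As a rough guide, the worst-case time to an alert is about the check interval multiplied by the threshold: around one minute for a 60-second monitor with a threshold of 1, and around four minutes for a 2-minute monitor with a threshold of 2.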
Common Harbor Failure Modes and What Vigilmon Catches
| Scenario | Vigilmon monitor |
|---|---|
| Harbor container crash | /api/v2.0/health unreachable; alert within 60 s |
| PostgreSQL database down | Health API returns unhealthy; core and registry fail |
| Redis cache unavailable | Jobservice and session management affected; health check fires |
| Registry storage backend full | Registry component unhealthy; docker push fails |
| SSL certificate expires | SSL monitor alerts at 30-day threshold; all docker operations fail |
| Harbor upgrade breaks registry | Registry component keyword missing; v2 API check fires |
| Trivy scanner crashes | Health API reports the trivy component unhealthy; scans stop but pulls continue |
| nginx reverse proxy misconfiguration | Portal monitor fires; v2 API may also be affected |
| DNS misconfiguration | All monitors fire simultaneously |
| Garbage collection job corrupts storage | Registry component unhealthy; manual GC recovery needed |
Harbor is the container image source of truth for production Kubernetes clusters. A registry outage doesn't just break CI builds — it prevents new pods from starting during deployments and autoscaling events. Vigilmon gives you layered external monitoring of Harbor's health API, individual components, web portal, and SSL certificate so you know the moment a component degrades and can act before it propagates to production image pulls.
Start monitoring Harbor in under 5 minutes — register free at vigilmon.online.