Self-Healing Containers
Automatically rebuild and redeploy container images to patch new CVEs — without waiting for the next feature release.
When a new CVE affects a base image or a package in your container, the usual response is a manual chain: someone notices, someone opens a PR, someone reviews, someone redeploys. Self-healing containers automate that chain so your production images are patched quickly, typically within an hour of a CVE being published.
How It Works
Self-healing runs a four-step loop, driven by continuous scanning:
- Detect: a new CVE affects a component in one of your container images.
- Plan: Griffin generates a patch plan (upgrade, apply a security patch, or swap in a Gold-hardened equivalent).
- Rebuild: the runner rebuilds the image with the patch and runs your existing test suite.
- Promote: if tests pass and the image passes guardrails, the image is pushed to your registry under a new tag, where your admission-controlled rollout picks it up.
Each step is pluggable; you decide where humans are in the loop.
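A minimal Python sketch of the loop, assuming each step is a function that takes and returns a state dict, and that a human gate is simply a named step that needs approval before it runs. `HealLoop`, `STEPS`, and the state keys are hypothetical illustrations, not Safeguard's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set

# Hypothetical types: a state dict flows through the four steps in order.
State = Dict[str, object]
Step = Callable[[State], State]
STEPS = ("detect", "plan", "rebuild", "promote")


@dataclass
class HealLoop:
    steps: Dict[str, Step]
    # Names of steps that need a human's approval before they run.
    human_gates: Set[str] = field(default_factory=set)

    def run(self, state: State, approve: Callable[[str, State], bool]) -> State:
        for name in STEPS:
            if name in self.human_gates and not approve(name, state):
                # Stop here; a human picks up the remaining steps.
                return {**state, "held_at": name}
            state = self.steps[name](state)
        return {**state, "held_at": None}
```

Moving a step name in or out of `human_gates` is the "pluggable" part: the loop itself doesn't change, only where it pauses.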
Modes
Advisory
Safeguard detects, plans, and files a PR against your Dockerfile repo or a Kubernetes manifest repo. Humans review and merge. Best for teams just turning self-healing on.
Autonomous (promotion gated)
Safeguard detects, plans, rebuilds, and promotes to a staging tag. Human approval required to promote to production tags. Default for most enterprise customers.
Autonomous (continuous)
Safeguard detects, plans, rebuilds, and promotes to production — end to end. Policies still apply: a change that fails the production policy bundle is held for review. Recommended once you have confidence in your test coverage.
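The three modes differ mainly in what happens after a rebuild. A rough sketch of that decision, assuming the test and policy results are already known. The enum values match the `mode` options in the configuration file. `next_action` and its action strings are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    # Values match the `mode` options in .safeguard/self-heal.yaml.
    ADVISORY = "advisory"
    AUTONOMOUS_STAGING = "autonomous-staging"
    CONTINUOUS = "continuous"


def next_action(mode: Mode, tests_passed: bool, policy_passed: bool) -> str:
    """Hypothetical post-rebuild decision; the action names are illustrative."""
    if mode is Mode.ADVISORY:
        return "open-pr"  # humans review and merge the PR
    if not (tests_passed and policy_passed):
        return "hold-for-review"  # nothing is promoted past a failed gate
    if mode is Mode.AUTONOMOUS_STAGING:
        return "promote-staging"  # production tags still need human approval
    return "promote-production"  # continuous mode: end to end
```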
Patch Strategies
Griffin picks a strategy per finding:
| Strategy | When it applies |
|---|---|
| Upstream bump | A patched upstream version exists. |
| Backport patch | No patched upstream release exists yet, but the fix has landed on the upstream development branch (HEAD). |
| Gold substitution | A Gold artifact replaces the vulnerable component. See Gold Registry. |
| Dependency pin | A safe version of a transitive dependency exists, but the resolver doesn't select it; Griffin pins it. |
| Defer | No safe fix exists; the finding is left open with a time-boxed exception. |
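The table doesn't specify a precedence when several strategies are possible. The sketch below assumes one, in table order, and lets an override such as `strategy_overrides[].prefer` win when that option is available. Only `upstream-bump` appears in the configuration example; the other identifiers and the `Finding` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed default precedence (table order). Identifiers other than
# "upstream-bump" are hypothetical.
DEFAULT_ORDER = ["upstream-bump", "backport-patch", "gold-substitution", "dependency-pin"]


@dataclass
class Finding:
    # Which kinds of fix exist for this finding (hypothetical fields).
    patched_upstream_release: bool = False
    fix_on_upstream_head: bool = False
    gold_replacement: bool = False
    pinnable_transitive_version: bool = False


def available_strategies(f: Finding) -> List[str]:
    flags = [f.patched_upstream_release, f.fix_on_upstream_head,
             f.gold_replacement, f.pinnable_transitive_version]
    return [name for name, ok in zip(DEFAULT_ORDER, flags) if ok]


def pick_strategy(f: Finding, prefer: Optional[str] = None) -> str:
    options = available_strategies(f)
    if prefer in options:
        return prefer  # a configured preference wins when it is possible
    # Otherwise take the first available option; with none, defer.
    return options[0] if options else "defer"
```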
Base-Image Self-Healing
The common case: your FROM node:20-alpine picks up a new libcrypto CVE. Self-healing rebuilds the image against the latest node:20-alpine digest (or a hardened Gold equivalent) and redeploys.
For FROM scratch and distroless images, Safeguard tracks which binaries are baked in and rebuilds the layer for the affected binary only.
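Rebuilding "against the latest digest" amounts to pinning the FROM reference to that digest. Here is a minimal sketch, assuming the new digest has already been looked up from the registry. `pin_base_image` is a hypothetical helper, and it handles only simple `FROM ref [AS name]` lines.

```python
def pin_base_image(dockerfile: str, image: str, digest: str) -> str:
    """Point every `FROM <image>` line at `<image>@<digest>`.

    Handles `FROM ref [AS name]`; lines with flags such as --platform are
    left unchanged in this sketch.
    """
    out = []
    for line in dockerfile.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            ref = parts[1].split("@", 1)[0]  # drop any digest already pinned
            if ref == image:
                parts[1] = f"{image}@{digest}"
                line = " ".join(parts)
        out.append(line)
    return "\n".join(out)
```

`FROM scratch` stages never match the image name, so they pass through unchanged.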
Dependency-Level Self-Healing
Inside your image, npm ci, pip install, cargo fetch, etc. pull in hundreds of transitive deps. When a CVE lands on one of them:
- The vulnerable dependency is bumped in the lockfile to the minimum version that fixes the CVE.
- The image is rebuilt with the updated lockfile.
- The test suite runs.
- A diff of the lockfile is written into the resulting image as a provenance attestation.
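"Minimum version that fixes the CVE" means the smallest available release that is newer than the current one and at or above the first fixed version. The sketch below assumes plain dotted numeric versions. Real package managers compare versions with their own ecosystem rules. `minimum_fixed_version` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    # Assumes plain dotted numeric versions such as "4.17.21". Real resolvers
    # use ecosystem rules (npm semver, PEP 440, Cargo) and handle pre-releases.
    return tuple(int(part) for part in version.split("."))


def minimum_fixed_version(current: str, available: List[str], fixed_in: str) -> Optional[str]:
    """Smallest available version newer than `current` and at or above `fixed_in`."""
    candidates = [
        v for v in available
        if parse(v) >= parse(fixed_in) and parse(v) > parse(current)
    ]
    return min(candidates, key=parse, default=None)
```

Choosing the minimum rather than the latest keeps the lockfile diff, and the behaviour change the test suite has to catch, as small as possible.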
Test Integration
Self-healing calls your existing test suite before promoting. Supported harnesses:
- Any CI provider: run safeguard heal with --ci github, --ci gitlab, --ci azure, --ci buildkite, --ci jenkins, or --ci circleci. The runner delegates the build and test to your pipeline and watches the result.
- Local runner: if you self-host the Safeguard runner, tests run there.
- Synthetic tests: a lightweight smoke test (health endpoints, startup time, memory footprint) runs for every rebuild, even if no full test suite is declared.
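The docs don't say what thresholds the synthetic smoke test applies. A minimal sketch assumes the check compares the rebuilt image with the pre-heal image. In this sketch the rebuild passes if its health endpoint returns 2xx and it is no more than `slack` times slower to start or larger in memory. The class, the function, and the 1.5x default are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SmokeResult:
    health_status: int     # HTTP status returned by the health endpoint
    startup_seconds: float
    memory_mib: float


def smoke_passes(new: SmokeResult, baseline: SmokeResult, slack: float = 1.5) -> bool:
    """Hypothetical rule: healthy, and within `slack` times the pre-heal image."""
    return (
        200 <= new.health_status < 300
        and new.startup_seconds <= baseline.startup_seconds * slack
        and new.memory_mib <= baseline.memory_mib * slack
    )
```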
Registry and Kubernetes Integration
Registry
Self-healed images are pushed with clear tagging conventions:
- <registry>/<image>:<original-tag>-sg-heal-<date> for staging promotions.
- <registry>/<image>:<original-tag>, overwritten in continuous mode (with immutable digest pinning).
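The tagging convention above fits in a small function. The docs leave the `<date>` format unspecified, so the YYYYMMDD form below is an assumption, and `heal_tag` is a hypothetical name.

```python
from datetime import date


def heal_tag(registry: str, image: str, original_tag: str, mode: str, on: date) -> str:
    """Build the reference a self-healed image is pushed under."""
    base = f"{registry}/{image}:{original_tag}"
    if mode == "continuous":
        # Original tag is overwritten; deployments should pin the digest.
        return base
    # Staging promotion. The <date> format (YYYYMMDD) is an assumption.
    return f"{base}-sg-heal-{on:%Y%m%d}"
```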
Every image carries:
- An SBOM attestation describing the patch.
- A Griffin explanation attestation — plain-English description of what was fixed.
- A signed provenance attestation referencing the source commit and build environment.
Kubernetes
If the Safeguard Helm operator is installed, rolling out a self-healed image is automatic:
- The operator watches registry tags.
- New patched digests trigger a controlled rollout (one pod at a time, respecting PodDisruptionBudgets).
- The rollout is paused automatically if readiness probes fail.
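The rollout behaviour in the list above can be sketched as a one-pod-at-a-time loop. It assumes three callbacks that stand in for cluster operations: replacing a pod, checking its readiness, and checking whether the PodDisruptionBudget allows another disruption. The function and callback names are hypothetical, and a real controller would wait and retry rather than return.

```python
from typing import Callable, Dict, List


def rolling_update(
    pods: List[str],
    replace: Callable[[str], None],
    ready: Callable[[str], bool],
    disruption_allowed: Callable[[], bool],
) -> Dict[str, object]:
    """Replace pods one at a time; stop on a PDB block or a failed readiness check."""
    updated: List[str] = []
    for pod in pods:
        if not disruption_allowed():
            # A real controller would wait and retry; this sketch just reports.
            return {"status": "waiting-on-pdb", "updated": updated}
        replace(pod)
        if not ready(pod):
            # Pause: leave the remaining pods on the old digest.
            return {"status": "paused", "updated": updated, "failed": pod}
        updated.append(pod)
    return {"status": "complete", "updated": updated}
```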
Observability
The Self-Healing dashboard shows:
- Images currently being watched.
- Pending, in-progress, and completed heal cycles.
- Time-to-heal histogram (CVE published → image deployed).
- Rollback count (heals that failed verification).
Representative numbers from customer tenants:
- Median time-to-heal: 20-45 minutes (CVE published → production rollout).
- P95 time-to-heal: 2 hours.
- Rollback rate: < 1%.
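Median and P95 are computed from the per-heal durations between CVE publication and deployment. The sketch below uses the nearest-rank P95. That is one of several percentile definitions, and the dashboard's own method isn't documented. The function name is hypothetical, and any durations you feed it in testing are illustrative, not customer data.

```python
import statistics
from typing import Dict, List


def time_to_heal_summary(minutes: List[float]) -> Dict[str, float]:
    """Median and nearest-rank P95 of CVE-published -> deployed durations."""
    if not minutes:
        raise ValueError("no completed heal cycles")
    ordered = sorted(minutes)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) in integer arithmetic, 1-based
    return {"median": statistics.median(ordered), "p95": ordered[rank - 1]}
```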
Rollback
Every heal is reversible:
- The previous image digest is preserved.
- A one-click rollback from the UI, or safeguard heal rollback --image <image>, restores it.
- Automatic rollback triggers on readiness probe failures, error-rate spikes, or policy violations detected on the new image.
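The automatic triggers can be expressed as a single predicate, plus a lookup of the preserved previous digest. The docs don't define an "error-rate spike". This sketch assumes it means at least twice the previous digest's error rate and at least 1% absolute. The class, the function names, and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HealthSignals:
    readiness_failures: int
    error_rate: float           # errors / requests since the new digest rolled out
    baseline_error_rate: float  # same ratio for the previous digest
    policy_violations: int


def should_roll_back(s: HealthSignals, spike_factor: float = 2.0, floor: float = 0.01) -> bool:
    # Hypothetical spike rule: at least spike_factor x the baseline and at least
    # `floor`, so a near-zero baseline doesn't turn noise into a rollback.
    spike = s.error_rate >= max(s.baseline_error_rate * spike_factor, floor)
    return s.readiness_failures > 0 or spike or s.policy_violations > 0


def rollback_target(deployed_digests: List[str]) -> str:
    """Previous digest from a history ordered oldest to newest."""
    if len(deployed_digests) < 2:
        raise ValueError("no previous digest preserved")
    return deployed_digests[-2]
```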
Air-Gapped Self-Healing
For air-gapped environments, self-healing runs entirely inside your infrastructure:
- The Safeguard operator bundles rebuild tooling and the required base images.
- Snapshot updates are delivered via signed tarballs.
- No egress required at runtime.
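Before a snapshot tarball is used, its contents should be checked against what was signed. The sketch below covers only the integrity half, comparing the tarball's SHA-256 with the digest listed in a manifest. It assumes that manifest's signature has already been verified with separate tooling, because public-key signature checks aren't in Python's standard library. The function names are hypothetical.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large tarballs need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_snapshot(path: Path, expected_sha256: str) -> bool:
    # Constant-time comparison against the digest from the (separately
    # signature-verified) manifest.
    return hmac.compare_digest(sha256_file(path), expected_sha256.strip().lower())
```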
Turning It On
```yaml
# .safeguard/self-heal.yaml
self_heal:
  mode: autonomous-staging   # advisory | autonomous-staging | continuous
  test_command: "npm test"
  notify:
    slack: "#sec-ops"
  strategy_overrides:
    - package: log4j-core
      prefer: upstream-bump
```

Or via the UI: ESSCM → Asset → Self-Healing → Configure.
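To catch typos before committing the file, you could run a pre-commit check along these lines. It assumes the YAML has already been parsed into a dict, for example with PyYAML's `yaml.safe_load`. The rules are inferred from the example configuration, not from an official schema, and `validate_self_heal` is a hypothetical name.

```python
from typing import Any, Dict, List

VALID_MODES = ("advisory", "autonomous-staging", "continuous")


def validate_self_heal(config: Dict[str, Any]) -> List[str]:
    """Return problems found in a parsed self-heal.yaml (empty list if none)."""
    section = config.get("self_heal")
    if not isinstance(section, dict):
        return ["missing top-level 'self_heal' mapping"]
    errors: List[str] = []
    mode = section.get("mode")
    if mode not in VALID_MODES:
        errors.append(f"mode must be one of {', '.join(VALID_MODES)}; got {mode!r}")
    # Each override needs both a package name and a preferred strategy.
    for i, override in enumerate(section.get("strategy_overrides", [])):
        if not {"package", "prefer"} <= set(override):
            errors.append(f"strategy_overrides[{i}] needs 'package' and 'prefer'")
    return errors
```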
Related
- Continuous Scanning — detects the CVE that triggers self-healing.
- Gold Registry — Gold images are the preferred substitution target.
- Auto-Fix — source-code auto-fix is the companion to container self-healing.
- Workflows — orchestrates self-healing at scale.
- Griffin AI — the model that plans and verifies each heal.