Safeguard.sh Documentation Center

Self-Healing Containers

Automatically rebuild and redeploy container images to patch new CVEs — without waiting for the next feature release.

When a new CVE affects a base image or a package in your container, the usual response is a manual chain: someone notices, someone opens a PR, someone reviews, someone redeploys. Self-healing containers automate that chain so your production images stay patched, typically within an hour of a CVE being published.

How It Works

Self-healing runs a four-step loop, driven by continuous scanning:

  1. Detect — a new CVE affects a component in one of your container images.
  2. Plan — Griffin generates a patch plan: upgrade, apply security patch, or swap to a Gold-hardened equivalent.
  3. Rebuild — the runner rebuilds the image with the patch, running your existing test suite.
  4. Promote — if tests pass and the image passes guardrails, the image is pushed to your registry under a new tag. An admission controller rollout picks it up.

Each step is pluggable; you decide where humans are in the loop.
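
The loop and its optional human gates can be pictured as a list of steps with an approval hook in front of each one. This is a minimal sketch, not Safeguard's implementation: HealContext, make_step, run_heal_cycle, and the approval hook are hypothetical names, and the steps here only record that they ran.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class HealContext:
    image: str
    cve: str
    log: List[str] = field(default_factory=list)


# A step receives the context and returns True when the cycle may continue.
Step = Callable[[HealContext], bool]
# An approval hook is asked before each step; returning False holds the cycle.
Approval = Callable[[str, HealContext], bool]


def make_step(name: str, succeeds: bool = True) -> Step:
    """Build a placeholder step that only records that it ran."""
    def step(ctx: HealContext) -> bool:
        ctx.log.append(name)
        return succeeds
    return step


def run_heal_cycle(ctx: HealContext,
                   steps: List[Tuple[str, Step]],
                   approve: Optional[Approval] = None) -> bool:
    """Run steps in order; stop on a declined approval or a failed step."""
    for name, step in steps:
        if approve is not None and not approve(name, ctx):
            ctx.log.append(f"held-before:{name}")
            return False
        if not step(ctx):
            ctx.log.append(f"failed:{name}")
            return False
    return True


steps = [(n, make_step(n)) for n in ("detect", "plan", "rebuild", "promote")]
ctx = HealContext(image="registry.example.com/api:1.4", cve="CVE-0000-0000")
# Put a human gate in front of promotion only; this hook always declines it.
completed = run_heal_cycle(ctx, steps, approve=lambda name, _ctx: name != "promote")
print(completed, ctx.log)
```

In this run the cycle stops before promotion, which is roughly what a promotion-gated setup looks like. Passing no approval hook lets all four steps run unattended.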

Modes

Advisory

Safeguard detects, plans, and files a PR against your Dockerfile repo or a Kubernetes manifest repo. Humans review and merge. Best for teams just turning self-healing on.

Autonomous (promotion gated)

Safeguard detects, plans, rebuilds, and promotes to a staging tag. Human approval is required to promote to production tags. Default for most enterprise customers.

Autonomous (continuous)

Safeguard detects, plans, rebuilds, and promotes to production — end to end. Policies still apply: a change that fails the production policy bundle is held for review. Recommended once you have confidence in your test coverage.
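
One way to compare the three modes is to ask where a person has to act. The sketch below encodes the descriptions above as a single function. The mode strings match the values used in the configuration example under Turning It On; the function name, the "staging" and "production" target labels, and the policy_passed flag are assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    ADVISORY = "advisory"
    AUTONOMOUS_STAGING = "autonomous-staging"
    CONTINUOUS = "continuous"


def needs_human(mode: Mode, target: str, policy_passed: bool = True) -> bool:
    """Return True when a person must act before `target` is updated.

    `target` is "staging" or "production" (hypothetical labels).
    """
    if target not in ("staging", "production"):
        raise ValueError(f"unknown target: {target}")
    if mode is Mode.ADVISORY:
        # Advisory only files a PR; humans review and merge it.
        return True
    if mode is Mode.AUTONOMOUS_STAGING:
        # Staging is automatic; production tags need approval.
        return target == "production"
    # Continuous: production is automatic unless the policy bundle fails.
    return target == "production" and not policy_passed


print(needs_human(Mode("autonomous-staging"), "production"))
```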

Patch Strategies

Griffin picks a strategy per finding:

Strategy             When
Upstream bump        A patched upstream version exists.
Backport patch       Upstream doesn't have a patched version, but a fix is in HEAD.
Gold substitution    A Gold artifact replaces the vulnerable component. See Gold Registry.
Dependency pin       A safe indirect dependency version exists but is not resolved; Griffin pins it.
Defer                No safe fix exists; the finding is left open with a time-boxed exception.
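
The table reads naturally as a decision function over facts about a finding. The sketch below is an illustration only: Griffin's actual selection logic is not documented here, the Finding fields are hypothetical, the fallback order simply follows the table's row order, and of the strategy identifiers only upstream-bump appears in the configuration example (the others are invented to match it).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    package: str
    patched_upstream_release: bool = False
    fix_in_upstream_head: bool = False
    gold_artifact_available: bool = False
    safe_indirect_version_unresolved: bool = False


def applicable_strategies(f: Finding) -> List[str]:
    """List the strategies whose 'When' condition holds, in table order."""
    out: List[str] = []
    if f.patched_upstream_release:
        out.append("upstream-bump")
    if not f.patched_upstream_release and f.fix_in_upstream_head:
        out.append("backport-patch")
    if f.gold_artifact_available:
        out.append("gold-substitution")
    if f.safe_indirect_version_unresolved:
        out.append("dependency-pin")
    return out


def pick_strategy(f: Finding, prefer: Optional[str] = None) -> str:
    """Honor a per-package preference when it applies; otherwise take the
    first applicable strategy, and defer when none applies."""
    options = applicable_strategies(f)
    if prefer in options:
        return prefer
    return options[0] if options else "defer"


finding = Finding("log4j-core", patched_upstream_release=True,
                  gold_artifact_available=True)
print(pick_strategy(finding, prefer="gold-substitution"))
```

The prefer argument plays the role that strategy_overrides plays in the configuration file: it changes the choice only when the preferred strategy is actually available for that finding.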

Base-Image Self-Healing

The common case: your FROM node:20-alpine picks up a new libcrypto CVE. Self-healing rebuilds the image against the latest node:20-alpine digest (or a hardened Gold equivalent) and redeploys.

For FROM scratch and distroless images, Safeguard tracks which binaries are baked in and rebuilds the layer for the affected binary only.

Dependency-Level Self-Healing

Inside your image, npm ci, pip install, cargo fetch, etc. pull in hundreds of transitive deps. When a CVE lands on one of them:

  • The lockfile is updated to the lowest version of the affected package that fixes the CVE.
  • The image is rebuilt with the updated lockfile.
  • The test suite runs.
  • A diff of the lockfile is written into the resulting image as a provenance attestation.
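
Choosing the smallest upgrade that clears a CVE can be sketched as follows. This is a simplified illustration with hypothetical function names: it assumes plain dotted numeric versions and a single "fixed in" version per advisory, whereas real advisories may list several affected ranges and real ecosystems have richer version syntax.

```python
from typing import Iterable, Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn '4.17.21' into (4, 17, 21) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def minimal_fixed_version(current: str,
                          available: Iterable[str],
                          fixed_in: str) -> Optional[str]:
    """Return the lowest available version that is at least `fixed_in`
    and not a downgrade from `current`, or None if there is none."""
    floor = max(parse_version(current), parse_version(fixed_in))
    candidates = [v for v in available if parse_version(v) >= floor]
    return min(candidates, key=parse_version) if candidates else None


available = ["4.17.15", "4.17.20", "4.17.21", "5.0.0"]
print(minimal_fixed_version("4.17.15", available, fixed_in="4.17.21"))
```

Taking the lowest qualifying version rather than the newest keeps the lockfile diff small, which limits how much untested change rides along with the security fix.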

Test Integration

Self-healing calls your existing test suite before promoting. Supported harnesses:

  • Any CI provider: safeguard heal --ci github / --ci gitlab / --ci azure / --ci buildkite / --ci jenkins / --ci circleci. The runner delegates the build and test to your pipeline and watches the result.
  • Local runner: if you self-host the Safeguard runner, tests run there.
  • Synthetic tests: a lightweight smoke test (health endpoints, startup time, memory footprint) runs for every rebuild, even if no full test suite is declared.

Registry and Kubernetes Integration

Registry

Self-healed images are pushed with clear tagging conventions:

  • <registry>/<image>:<original-tag>-sg-heal-<date> for staging promotions.
  • <registry>/<image>:<original-tag> overwritten for continuous mode (with immutable digest pinning).
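
The tagging convention above can be expressed as two small helpers. This is a sketch under stated assumptions: the helper names are hypothetical, and the YYYYMMDD rendering of <date> is a guess, since the exact date format is not specified here. The name@digest form used for pinning is the standard OCI image reference syntax.

```python
import re
from datetime import date


def staging_tag(registry: str, image: str, original_tag: str, on: date) -> str:
    """Build <registry>/<image>:<original-tag>-sg-heal-<date>.

    The YYYYMMDD date format is an assumption for illustration.
    """
    return f"{registry}/{image}:{original_tag}-sg-heal-{on:%Y%m%d}"


def pinned_ref(registry: str, image: str, digest: str) -> str:
    """Build a digest-pinned reference, which stays fixed even if a tag
    is later overwritten."""
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", digest):
        raise ValueError(f"not a sha256 digest: {digest}")
    return f"{registry}/{image}@{digest}"


print(staging_tag("registry.example.com", "api", "1.4", date(2024, 1, 15)))
```

Pinning deployments by digest is what makes overwriting a tag safe in continuous mode: the tag moves, but any workload that references a digest keeps running exactly the image it was given.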

Every image carries:

  • An SBOM attestation describing the patch.
  • A Griffin explanation attestation — plain-English description of what was fixed.
  • A signed provenance attestation referencing the source commit and build environment.

Kubernetes

If the Safeguard Helm operator is installed, rolling out a self-healed image is automatic:

  • The operator watches registry tags.
  • New patched digests trigger a controlled rollout (one pod at a time, respecting PodDisruptionBudgets).
  • The rollout is paused automatically if readiness probes fail.

Observability

The Self-Healing dashboard shows:

  • Images currently being watched.
  • Pending, in-progress, and completed heal cycles.
  • Time-to-heal histogram (CVE published → image deployed).
  • Rollback count (heals that failed verification).

Representative numbers from customer tenants:

  • Median time-to-heal: 20-45 minutes (CVE published → production rollout).
  • P95 time-to-heal: 2 hours.
  • Rollback rate: < 1%.
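
For readers who want to reproduce these statistics from their own heal records, the sketch below computes a median and a nearest-rank P95 over a list of durations in minutes. The sample values are synthetic and chosen only to make the arithmetic easy to follow; they are not customer data, and the dashboard may use a different percentile definition.

```python
from statistics import median
from typing import List


def percentile_nearest_rank(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


# Synthetic time-to-heal samples in minutes: 5, 10, ..., 100.
samples = [float(m) for m in range(5, 105, 5)]
median_minutes = median(samples)
p95_minutes = percentile_nearest_rank(samples, 95)
print(median_minutes, p95_minutes)
```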

Rollback

Every heal is reversible:

  • The previous image digest is preserved.
  • A one-click rollback from the UI or safeguard heal rollback --image <image> restores it.
  • Automatic rollback triggers on: readiness probe failures, error-rate spikes, or policy violations detected on the new image.
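
The three automatic triggers can be read as a simple check over post-rollout health signals. The sketch below is illustrative only: the RolloutHealth fields, the reason strings, and the rule that a "spike" means more than twice the baseline error rate are all assumptions, since the actual thresholds are not documented here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RolloutHealth:
    readiness_probe_failures: int
    error_rate: float            # observed on the new image
    baseline_error_rate: float   # observed on the previous image
    policy_violations: int


def rollback_reasons(h: RolloutHealth, spike_factor: float = 2.0) -> List[str]:
    """Return every trigger that fired; an empty list means keep the heal."""
    reasons: List[str] = []
    if h.readiness_probe_failures > 0:
        reasons.append("readiness-probe-failure")
    if h.error_rate > h.baseline_error_rate * spike_factor:
        reasons.append("error-rate-spike")
    if h.policy_violations > 0:
        reasons.append("policy-violation")
    return reasons


health = RolloutHealth(readiness_probe_failures=0, error_rate=0.09,
                       baseline_error_rate=0.01, policy_violations=0)
print(rollback_reasons(health))
```

Returning the list of reasons rather than a bare boolean makes it easier to record why a heal was reverted.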

Air-Gapped Self-Healing

For air-gapped environments, self-healing runs entirely inside your infrastructure:

  • The Safeguard operator bundles rebuild tooling and the required base images.
  • Snapshot updates are delivered via signed tarballs.
  • No egress required at runtime.

Turning It On

# .safeguard/self-heal.yaml
self_heal:
  mode: autonomous-staging   # advisory | autonomous-staging | continuous
  test_command: "npm test"
  notify:
    slack: "#sec-ops"
  strategy_overrides:
    - package: log4j-core
      prefer: upstream-bump

Or via the UI: ESSCM → Asset → Self-Healing → Configure.

Related

  • Continuous Scanning — detects the CVE that triggers self-healing.
  • Gold Registry — Gold images are the preferred substitution target.
  • Auto-Fix — source-code auto-fix is the companion to container self-healing.
  • Workflows — orchestrates self-healing at scale.
  • Griffin AI — the model that plans and verifies each heal.
