Safeguard.sh Documentation Center

AI-BOM & Model Security

Track, govern, and secure AI models — model cards, weight provenance, fine-tune lineage, and training-data risk.

As models ship inside products, the same supply chain risks that affect code affect model weights. Safeguard treats AI artifacts as first-class objects with their own SBOM — an AI-BOM — and applies equivalent governance to their lifecycle.

What an AI-BOM Contains

For every model in your environment, Safeguard captures:

  • Model identity — name, version, architecture, parameter count, dtype, quantization.
  • Weight provenance — where the weights came from (Hugging Face, internal registry, customer-provided) and the hash.
  • Signing status — whether the weights are signed; by whom; on what transparency log.
  • Fine-tune lineage — which base model, which adapter, which dataset.
  • Training / fine-tuning data — dataset references, licenses, PII / regulated-data flags where declared.
  • Dependencies — transformers, torch, vllm, quant libraries, tokenizer packages, config files.
  • Evaluation metrics — if provided by the producer.
  • Deployment targets — which workloads load these weights.

The AI-BOM is emitted in CycloneDX ML-BOM extension format and SPDX 3.0 AI profile.
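As an illustration of the CycloneDX output, a single model might appear as a component like the fragment below. The field names follow the CycloneDX 1.5 ML-BOM schema; the model name, hash, and dataset reference are invented placeholders, not real Safeguard output:

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {
      "type": "machine-learning-model",
      "name": "support-triage-llm",
      "version": "2.1.0",
      "hashes": [{ "alg": "SHA-256", "content": "<weight-file-digest>" }],
      "modelCard": {
        "modelParameters": {
          "architectureFamily": "llama",
          "datasets": [{ "ref": "urn:example:dataset:tickets-2024q3" }]
        }
      }
    }
  ]
}
```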

Integrations

Safeguard connects to:

  • Hugging Face — org, user, private repos. Auto-discovers models you've published or starred.
  • MLflow — model registry, experiments, versions.
  • AWS SageMaker Model Registry.
  • Vertex AI Model Registry (GCP).
  • Azure ML Model Registry.
  • Databricks Model Serving.
  • Kubeflow / KServe.
  • Custom registries via a simple CRUD API.

Risks Safeguard Tracks

Pickle-Based Payloads

PyTorch's default pickle serialization can execute code at load time. Safeguard scans serialized weight files for non-data opcodes and flags any model whose deserialization would execute code outside allow-listed operations.
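Safeguard's scanner itself isn't public, but the core idea can be reproduced with the Python standard library: walk the pickle opcode stream without executing it, and flag opcodes that import modules or invoke callables. A minimal sketch — the opcode set here is illustrative, not Safeguard's actual allow-list:

```python
import pickle
import pickletools

# Opcodes that can import modules or invoke callables during unpickling.
# Illustrative subset; a production scanner would be more thorough.
UNSAFE_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}

def unsafe_opcodes(data: bytes) -> list[str]:
    """Return the non-data opcodes found in a pickle, without loading it."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)
            if op.name in UNSAFE_OPCODES]

# Plain tensors-as-lists serialize with pure data opcodes...
assert unsafe_opcodes(pickle.dumps({"layer.weight": [0.1, 0.2]})) == []

# ...while a malicious __reduce__ payload needs GLOBAL/REDUCE to run code.
class Payload:
    def __reduce__(self):
        return (print, ("arbitrary code ran at load time",))

assert "REDUCE" in unsafe_opcodes(pickle.dumps(Payload()))
```

Note that the scanner only inspects the opcode stream; the payload above is never actually unpickled, so its `print` never fires.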

Model-Weight Backdoors

Safeguard runs statistical tests on weights for:

  • Trojan triggers (a specific input pattern produces attacker-controlled output).
  • Class-level backdoors (one class systematically mis-classified under a trigger).
  • Gradient-inversion susceptibility.

Findings are probabilistic; Safeguard reports the likelihood score and contributing evidence.
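The statistical tests themselves are proprietary, but the class-level backdoor check can be sketched in outline: stamp a candidate trigger onto many inputs and measure how strongly the model's predictions collapse onto a single class. The function below is a simplified illustration with a stubbed model interface, not Safeguard's actual detector:

```python
from collections import Counter

def backdoor_likelihood(model, inputs, apply_trigger):
    """Score how strongly a candidate trigger collapses predictions onto one class.

    model: callable mapping an input to a predicted class label (stub interface).
    apply_trigger: callable stamping the candidate trigger onto an input.
    Returns a score in [0, 1]; near 1.0 means almost every triggered input
    lands in a single class -- the signature of a class-level backdoor.
    """
    preds = Counter(model(apply_trigger(x)) for x in inputs)
    _top_class, top_count = preds.most_common(1)[0]
    return top_count / len(inputs)
```

The score is the "likelihood" side of the finding; the contributing evidence would be which class the predictions collapse onto and which trigger produced the collapse.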

Prompt Injection / Jailbreak Surfaces

For agentic systems that call Safeguard-wrapped LLMs, the AI-BOM annotates which models have:

  • Built-in jailbreak resistance tests passing / failing.
  • Input / output filter chains applied at serving time.
  • Tool-use allow-lists enforced.

Training-Data License Risk

For fine-tuned models, Safeguard tracks the dataset provenance and flags:

  • Unknown-license data.
  • Copyleft data used in proprietary models.
  • Regulated data (PII, PHI, PCI) used without controls.
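The three flags above amount to a triage pass over declared dataset metadata. A minimal sketch of that logic — the license category set and the metadata shape are illustrative assumptions, not Safeguard's policy data:

```python
# Illustrative copyleft set; a real policy engine would use full SPDX data.
COPYLEFT = {"GPL-3.0", "AGPL-3.0", "CC-BY-SA-4.0"}

def dataset_risk_flags(datasets):
    """Flag license and regulated-data risk for declared fine-tuning datasets.

    datasets: iterable of dicts like {"name": ..., "license": ..., "pii": bool}.
    Returns (dataset_name, flag) pairs.
    """
    flags = []
    for ds in datasets:
        lic = ds.get("license")
        if lic is None:
            flags.append((ds["name"], "unknown-license"))
        elif lic in COPYLEFT:
            flags.append((ds["name"], "copyleft-in-proprietary-model"))
        if ds.get("pii"):
            flags.append((ds["name"], "regulated-data"))
    return flags
```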

Model Card Discrepancies

If a vendor's model card claims a training set but the weights show signs of additional data, Safeguard surfaces the discrepancy.

Governance Policies

Enforce policies at load time:

apiVersion: safeguard.sh/v1
kind: Policy
metadata:
  name: ai-model-policy
spec:
  targets:
    - kind: Model
      labels:
        env: production
  rules:
    - id: require-signed-weights
      condition: signatures.valid == false
      effect: BLOCK
    - id: no-pickle-unsafe
      condition: pickle.unsafe_opcodes == true
      effect: BLOCK
    - id: allowlist-publishers
      condition: weights.publisher NOT IN ["internal", "huggingface:meta-llama", "huggingface:mistralai", "huggingface:anthropic"]
      effect: WARN

A model that trips a BLOCK rule during load-time verification is never loaded into serving.
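To make the rule semantics concrete, here is a toy evaluator for the equality subset of the condition grammar (`<dotted.path> == <literal>`). Safeguard's actual grammar also supports operators like `NOT IN`, which this sketch deliberately omits; the metadata shape is a hypothetical example:

```python
def resolve(path, metadata):
    """Look up a dotted path like 'signatures.valid' in nested metadata."""
    value = metadata
    for key in path.split("."):
        value = value[key]
    return value

def evaluate_rule(rule, metadata):
    """Evaluate a '<path> == <literal>' condition against model metadata.

    Returns the rule's effect (e.g. 'BLOCK', 'WARN') on a match, else 'PASS'.
    """
    path, _op, literal = rule["condition"].split(maxsplit=2)
    expected = {"true": True, "false": False}.get(literal, literal)
    return rule["effect"] if resolve(path, metadata) == expected else "PASS"
```

For example, an unsigned model (`signatures.valid == false` in its metadata) would match the `require-signed-weights` rule and be blocked.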

Load-Time Verification

The Safeguard model loader (a thin wrapper around torch.load, safetensors.torch.load_file, transformers.pipeline) verifies:

  • Weight hash matches the attested hash.
  • Signature verifies against the configured trust anchors.
  • No disallowed pickle operations on load.

If any check fails, loading aborts.
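The hash check — the first of the three — reduces to comparing a digest of the on-disk file against the attested value before any deserialization happens. A minimal sketch of that step alone (signature and pickle-opcode checks would follow in the real loader):

```python
import hashlib
from pathlib import Path

def verify_weights(path: str, attested_sha256: str) -> None:
    """Abort before deserialization if the on-disk hash differs from the attestation."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest != attested_sha256:
        raise RuntimeError(f"weight hash mismatch for {path}: got {digest}")
    # Only after this (and the signature and opcode checks) would the file
    # be handed to the real loader, e.g. safetensors.
```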

Fine-Tuning Lineage

Every fine-tune you run produces a signed attestation capturing:

  • Base model + version.
  • Training data snapshots (content hashes, not raw data).
  • Hyperparameters and seed.
  • Evaluation metrics.
  • Environment (hardware, software, time).

The lineage is queryable — for any served model, you can answer: "Show me every model this was derived from and every dataset that touched it."
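Because each attestation records its base model, that query is a walk up the base-model chain, collecting dataset references along the way. A sketch with the attestation store modeled as a plain dict (real attestations are signed documents; the model and dataset names are invented):

```python
# model id -> the lineage fields of its attestation (illustrative data).
LINEAGE = {
    "triage-v3": {"base": "triage-v2", "datasets": ["tickets-2024q3"]},
    "triage-v2": {"base": "llama-3-8b", "datasets": ["tickets-2024q1"]},
    "llama-3-8b": {"base": None, "datasets": []},
}

def ancestry(model_id, store):
    """Walk base-model links: every ancestor model and every dataset that touched it."""
    models, datasets = [], []
    while model_id is not None:
        att = store[model_id]
        models.append(model_id)
        datasets.extend(att["datasets"])
        model_id = att["base"]
    return models, datasets
```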

Agent Security (Inference Time)

For LLM-powered agents:

  • Tool-call logs are retained with prompt / response hashes.
  • Safeguard validates that each tool call matches a declared tool surface.
  • Unexpected tool calls (allowed but not typical for the workload) are surfaced.
  • Prompt-injection attempts are detected and scored.
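The tool-surface check in the second bullet is, at its core, a comparison of each call against the workload's declared tools and parameters. A sketch with a hypothetical declared surface — the tool names and the BLOCK/ALLOW verdicts here are illustrative, not Safeguard's API:

```python
# Hypothetical declared tool surface for one agent workload:
# tool name -> the set of parameters it is declared to accept.
DECLARED_TOOLS = {
    "search_tickets": {"query"},
    "get_customer": {"customer_id"},
}

def validate_tool_call(name, args, declared=DECLARED_TOOLS):
    """Reject calls to undeclared tools or with undeclared parameters."""
    if name not in declared:
        return "BLOCK"  # tool not on the declared surface
    if not set(args) <= declared[name]:
        return "BLOCK"  # unexpected parameter, a common injection vector
    return "ALLOW"
```

Calls that pass this check but deviate from the workload's typical pattern fall into the "allowed but not typical" bucket surfaced in the third bullet.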

See the MCP Threat Model blog post for deeper treatment.

API

safeguard aibom list --env production
safeguard aibom verify --model-id <id>
safeguard aibom export --project my-ai-app --format cyclonedx-ml > aibom.json
