Safeguard.sh Documentation Center

AI-BOM & Model Security

Track, govern, and secure AI models — model cards, weight provenance, fine-tune lineage, and training-data risk.

As models ship inside products, the same supply chain risks that affect code affect model weights. Safeguard treats AI artifacts as first-class objects with their own SBOM — an AI-BOM — and applies equivalent governance to their lifecycle.

What an AI-BOM Contains

For every model in your environment, Safeguard captures:

  • Model identity — name, version, architecture, parameter count, dtype, quantization.
  • Weight provenance — where the weights came from (Hugging Face, internal registry, customer-provided) and the hash.
  • Signing status — whether the weights are signed; by whom; on what transparency log.
  • Fine-tune lineage — which base model, which adapter, which dataset.
  • Training / fine-tuning data — dataset references, licenses, PII / regulated-data flags where declared.
  • Dependencies — transformers, torch, vllm, quant libraries, tokenizer packages, config files.
  • Evaluation metrics — if provided by the producer.
  • Deployment targets — which workloads load these weights.

The AI-BOM is emitted in CycloneDX ML-BOM extension format and SPDX 3.0 AI profile.
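As an illustration of the CycloneDX output, a single model might appear as a component like the fragment below. The field names follow the CycloneDX 1.5 ML-BOM schema; the model name, hash, and dataset reference are invented placeholders, not real Safeguard output:

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {
      "type": "machine-learning-model",
      "name": "support-triage-llm",
      "version": "2.1.0",
      "hashes": [{ "alg": "SHA-256", "content": "<weight-file-digest>" }],
      "modelCard": {
        "modelParameters": {
          "architectureFamily": "llama",
          "datasets": [{ "ref": "urn:example:dataset:tickets-2024q3" }]
        }
      }
    }
  ]
}
```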

Integrations

Safeguard connects to:

  • Hugging Face — org, user, private repos. Auto-discovers models you've published or starred.
  • MLflow — model registry, experiments, versions.
  • AWS SageMaker Model Registry.
  • Vertex AI Model Registry (GCP).
  • Azure ML Model Registry.
  • Databricks Model Serving.
  • Kubeflow / KServe.
  • Custom registries via a simple CRUD API.

Risks Safeguard Tracks

Pickle-Based Payloads

PyTorch's default pickle serialization can execute code at load time. Safeguard scans serialized weight files for non-data opcodes and flags any model whose deserialization would execute code outside allow-listed operations.
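Safeguard's scanner itself isn't public, but the core idea can be reproduced with the Python standard library: walk the pickle opcode stream without executing it, and flag opcodes that import modules or invoke callables. A minimal sketch — the opcode set here is illustrative, not Safeguard's actual allow-list:

```python
import pickle
import pickletools

# Opcodes that can import modules or invoke callables during unpickling.
# Illustrative subset; a production scanner would be more thorough.
UNSAFE_OPCODES = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}

def unsafe_opcodes(data: bytes) -> list[str]:
    """Return the non-data opcodes found in a pickle, without loading it."""
    return [op.name for op, _arg, _pos in pickletools.genops(data)
            if op.name in UNSAFE_OPCODES]

# Plain tensors-as-lists serialize with pure data opcodes...
assert unsafe_opcodes(pickle.dumps({"layer.weight": [0.1, 0.2]})) == []

# ...while a malicious __reduce__ payload needs GLOBAL/REDUCE to run code.
class Payload:
    def __reduce__(self):
        return (print, ("arbitrary code ran at load time",))

assert "REDUCE" in unsafe_opcodes(pickle.dumps(Payload()))
```

Note that the scanner only inspects the opcode stream; the payload above is never actually unpickled, so its `print` never fires.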

Model-Weight Backdoors

Safeguard runs statistical tests on weights for:

  • Trojan triggers (a specific input pattern produces attacker-controlled output).
  • Class-level backdoors (one class systematically mis-classified under a trigger).
  • Gradient-inversion susceptibility.

Findings are probabilistic; Safeguard reports the likelihood score and contributing evidence.
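The statistical tests themselves are proprietary, but the class-level backdoor check can be sketched in outline: stamp a candidate trigger onto many inputs and measure how strongly the model's predictions collapse onto a single class. The function below is a simplified illustration with a stubbed model interface, not Safeguard's actual detector:

```python
from collections import Counter

def backdoor_likelihood(model, inputs, apply_trigger):
    """Score how strongly a candidate trigger collapses predictions onto one class.

    model: callable mapping an input to a predicted class label (stub interface).
    apply_trigger: callable stamping the candidate trigger onto an input.
    Returns a score in [0, 1]; near 1.0 means almost every triggered input
    lands in a single class -- the signature of a class-level backdoor.
    """
    preds = Counter(model(apply_trigger(x)) for x in inputs)
    _top_class, top_count = preds.most_common(1)[0]
    return top_count / len(inputs)
```

The score is the "likelihood" side of the finding; the contributing evidence would be which class the predictions collapse onto and which trigger produced the collapse.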

Prompt Injection / Jailbreak Surfaces

For agentic systems that call Safeguard-wrapped LLMs, the AI-BOM annotates which models have:

  • Built-in jailbreak resistance tests passing / failing.
  • Input / output filter chains applied at serving time.
  • Tool-use allow-lists enforced.

Training-Data License Risk

For fine-tuned models, Safeguard tracks the dataset provenance and flags:

  • Unknown-license data.
  • Copyleft data used in proprietary models.
  • Regulated data (PII, PHI, PCI) used without controls.
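The three flags above amount to a triage pass over declared dataset metadata. A minimal sketch of that logic — the license category set and the metadata shape are illustrative assumptions, not Safeguard's policy data:

```python
# Illustrative copyleft set; a real policy engine would use full SPDX data.
COPYLEFT = {"GPL-3.0", "AGPL-3.0", "CC-BY-SA-4.0"}

def dataset_risk_flags(datasets):
    """Flag license and regulated-data risk for declared fine-tuning datasets.

    datasets: iterable of dicts like {"name": ..., "license": ..., "pii": bool}.
    Returns (dataset_name, flag) pairs.
    """
    flags = []
    for ds in datasets:
        lic = ds.get("license")
        if lic is None:
            flags.append((ds["name"], "unknown-license"))
        elif lic in COPYLEFT:
            flags.append((ds["name"], "copyleft-in-proprietary-model"))
        if ds.get("pii"):
            flags.append((ds["name"], "regulated-data"))
    return flags
```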

Model Card Discrepancies

If a vendor's model card claims a training set but the weights show signs of additional data, Safeguard surfaces the discrepancy.

Governance Policies

Enforce policies at load time:

apiVersion: safeguard.sh/v1
kind: Policy
metadata:
  name: ai-model-policy
spec:
  targets:
    - kind: Model
      labels:
        env: production
  rules:
    - id: require-signed-weights
      condition: signatures.valid == false
      effect: BLOCK
    - id: no-pickle-unsafe
      condition: pickle.unsafe_opcodes == true
      effect: BLOCK
    - id: allowlist-publishers
      condition: weights.publisher NOT IN ["internal", "huggingface:meta-llama", "huggingface:mistralai", "huggingface:anthropic"]
      effect: WARN

A model that trips a BLOCK rule during load-time verification is never loaded into serving.
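To make the rule semantics concrete, here is a toy evaluator for the equality subset of the condition grammar (`<dotted.path> == <literal>`). Safeguard's actual grammar also supports operators like `NOT IN`, which this sketch deliberately omits; the metadata shape is a hypothetical example:

```python
def resolve(path, metadata):
    """Look up a dotted path like 'signatures.valid' in nested metadata."""
    value = metadata
    for key in path.split("."):
        value = value[key]
    return value

def evaluate_rule(rule, metadata):
    """Evaluate a '<path> == <literal>' condition against model metadata.

    Returns the rule's effect (e.g. 'BLOCK', 'WARN') on a match, else 'PASS'.
    """
    path, _op, literal = rule["condition"].split(maxsplit=2)
    expected = {"true": True, "false": False}.get(literal, literal)
    return rule["effect"] if resolve(path, metadata) == expected else "PASS"
```

For example, an unsigned model (`signatures.valid == false` in its metadata) would match the `require-signed-weights` rule and be blocked.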

Load-Time Verification

The Safeguard model loader (a thin wrapper around torch.load, safetensors.torch.load_file, transformers.pipeline) verifies:

  • Weight hash matches the attested hash.
  • Signature verifies against the configured trust anchors.
  • No disallowed pickle operations on load.

If any check fails, loading aborts.
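The hash check — the first of the three — reduces to comparing a digest of the on-disk file against the attested value before any deserialization happens. A minimal sketch of that step alone (signature and pickle-opcode checks would follow in the real loader):

```python
import hashlib
from pathlib import Path

def verify_weights(path: str, attested_sha256: str) -> None:
    """Abort before deserialization if the on-disk hash differs from the attestation."""
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest != attested_sha256:
        raise RuntimeError(f"weight hash mismatch for {path}: got {digest}")
    # Only after this (and the signature and opcode checks) would the file
    # be handed to the real loader, e.g. safetensors.
```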

Fine-Tuning Lineage

Every fine-tune you run produces a signed attestation capturing:

  • Base model + version.
  • Training data snapshots (content hashes, not raw data).
  • Hyperparameters and seed.
  • Evaluation metrics.
  • Environment (hardware, software, time).

The lineage is queryable — for any served model, you can answer: "Show me every model this was derived from and every dataset that touched it."
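Because each attestation records its base model, that query is a walk up the base-model chain, collecting dataset references along the way. A sketch with the attestation store modeled as a plain dict (real attestations are signed documents; the model and dataset names are invented):

```python
# model id -> the lineage fields of its attestation (illustrative data).
LINEAGE = {
    "triage-v3": {"base": "triage-v2", "datasets": ["tickets-2024q3"]},
    "triage-v2": {"base": "llama-3-8b", "datasets": ["tickets-2024q1"]},
    "llama-3-8b": {"base": None, "datasets": []},
}

def ancestry(model_id, store):
    """Walk base-model links: every ancestor model and every dataset that touched it."""
    models, datasets = [], []
    while model_id is not None:
        att = store[model_id]
        models.append(model_id)
        datasets.extend(att["datasets"])
        model_id = att["base"]
    return models, datasets
```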

Agent Security (Inference Time)

For LLM-powered agents:

  • Tool-call logs are retained with prompt / response hashes.
  • Safeguard validates that each tool call matches a declared tool surface.
  • Unexpected tool calls (allowed but not typical for the workload) are surfaced.
  • Prompt-injection attempts are detected and scored.
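The tool-surface check in the second bullet is, at its core, a comparison of each call against the workload's declared tools and parameters. A sketch with a hypothetical declared surface — the tool names and the BLOCK/ALLOW verdicts here are illustrative, not Safeguard's API:

```python
# Hypothetical declared tool surface for one agent workload:
# tool name -> the set of parameters it is declared to accept.
DECLARED_TOOLS = {
    "search_tickets": {"query"},
    "get_customer": {"customer_id"},
}

def validate_tool_call(name, args, declared=DECLARED_TOOLS):
    """Reject calls to undeclared tools or with undeclared parameters."""
    if name not in declared:
        return "BLOCK"  # tool not on the declared surface
    if not set(args) <= declared[name]:
        return "BLOCK"  # unexpected parameter, a common injection vector
    return "ALLOW"
```

Calls that pass this check but deviate from the workload's typical pattern fall into the "allowed but not typical" bucket surfaced in the third bullet.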

See the MCP Threat Model blog post for deeper treatment.

API

safeguard aibom list --env production
safeguard aibom verify --model-id <id>
safeguard aibom export --project my-ai-app --format cyclonedx-ml > aibom.json
