AI-BOM & Model Security
Track, govern, and secure AI models — model cards, weight provenance, fine-tune lineage, and training-data risk.
As models ship inside products, the same supply chain risks that affect code affect model weights. Safeguard treats AI artifacts as first-class objects with their own SBOM — an AI-BOM — and applies equivalent governance to their lifecycle.
What an AI-BOM Contains
For every model in your environment, Safeguard captures:
- Model identity — name, version, architecture, parameter count, dtype, quantization.
- Weight provenance — where the weights came from (Hugging Face, internal registry, customer-provided) and the hash.
- Signing status — whether the weights are signed; by whom; on what transparency log.
- Fine-tune lineage — which base model, which adapter, which dataset.
- Training / fine-tuning data — dataset references, licenses, PII / regulated-data flags where declared.
- Dependencies — transformers, torch, vllm, quant libraries, tokenizer packages, config files.
- Evaluation metrics — if provided by the producer.
- Deployment targets — which workloads load these weights.
The AI-BOM is emitted in CycloneDX ML-BOM extension format and SPDX 3.0 AI profile.
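For illustration, a single model entry in the CycloneDX output might look like the fragment below. The component name, version, hash, and dataset reference are hypothetical; CycloneDX 1.5 defines the `machine-learning-model` component type and the `modelCard` object shown here, but consult the spec for the full field set.

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {
      "type": "machine-learning-model",
      "name": "support-bot-llm",
      "version": "2.1.0",
      "hashes": [{ "alg": "SHA-256", "content": "..." }],
      "modelCard": {
        "modelParameters": {
          "architectureFamily": "transformer",
          "task": "text-generation",
          "datasets": [{ "type": "dataset", "name": "internal-support-tickets-2024" }]
        }
      }
    }
  ]
}
```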
Integrations
Safeguard connects to:
- Hugging Face — org, user, private repos. Auto-discovers models you've published or starred.
- MLflow — model registry, experiments, versions.
- AWS SageMaker Model Registry.
- Vertex AI Model Registry (GCP).
- Azure ML Model Registry.
- Databricks Model Serving.
- Kubeflow / KServe.
- Custom registries via a simple CRUD API.
Risks Safeguard Tracks
Pickle-Based Payloads
PyTorch's default pickle serialization can execute arbitrary code at load time. Safeguard scans serialized weights for non-data opcodes and flags any model whose deserialization would run code outside allow-listed operations.
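The core of this check can be sketched with the standard-library pickletools module: walk the opcode stream and collect any opcode that can cause object construction or code execution during unpickling. (This is a simplified illustration, not Safeguard's scanner; the opcode subset below is illustrative.)

```python
import pickle
import pickletools

# Opcodes that can trigger imports or callable invocation during
# unpickling. (Illustrative subset; a production scanner covers more.)
UNSAFE_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST",
    "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD",
}

def unsafe_pickle_opcodes(data: bytes) -> set:
    """Return the potentially code-executing opcodes found in a pickle stream."""
    found = set()
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in UNSAFE_OPCODES:
            found.add(opcode.name)
    return found

# A plain data payload contains no unsafe opcodes...
safe = pickle.dumps({"weights": [0.1, 0.2]})
print(unsafe_pickle_opcodes(safe))  # set()

# ...but any object that reconstructs via a callable does.
import collections
risky = pickle.dumps(collections.OrderedDict(a=1))
print(sorted(unsafe_pickle_opcodes(risky)))
```

Note that the safetensors format avoids this class of issue entirely, which is why pure-data formats are generally preferred for weight distribution.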
Model-Weight Backdoors
Eagle runs statistical tests on weights for:
- Trojan triggers (a specific input pattern produces attacker-controlled output).
- Class-level backdoors (one class systematically mis-classified under a trigger).
- Gradient-inversion susceptibility.
Findings are probabilistic; Safeguard reports the likelihood score and contributing evidence.
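The trojan-trigger test above can be illustrated with a deliberately simplified sketch: stamp a candidate trigger onto a batch of inputs and measure how strongly the predictions collapse onto a single class. This is the shape of the idea, not Eagle's actual statistical tests; the toy model and trigger below are hypothetical.

```python
from collections import Counter

def trigger_likelihood(predict, inputs, apply_trigger):
    """Estimate how strongly a candidate trigger steers predictions.

    predict: maps an input to a class label.
    apply_trigger: stamps the candidate trigger pattern onto an input.
    Returns (score, target): the fraction of triggered inputs that land
    on the single most common post-trigger class, and that class.
    """
    triggered = [predict(apply_trigger(x)) for x in inputs]
    target, hits = Counter(triggered).most_common(1)[0]
    return hits / len(triggered), target

# Toy "model": classifies by sign, but a planted backdoor maps any
# input containing the marker 999 to the attacker's class.
def toy_predict(x):
    if 999 in x:
        return "attacker"
    return "pos" if sum(x) >= 0 else "neg"

score, target = trigger_likelihood(
    toy_predict,
    inputs=[[1, 2], [-3, 1], [0, -5], [4, 4]],
    apply_trigger=lambda x: x + [999],  # stamp the trigger
)
print(score, target)  # 1.0 attacker
```

A clean model's predictions stay spread across classes under a random candidate trigger, so the score stays low; a score near 1.0 for a specific pattern is evidence, not proof, which is why findings are reported with a likelihood and supporting evidence.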
Prompt Injection / Jailbreak Surfaces
For agentic systems that call Safeguard-wrapped LLMs, the AI-BOM annotates which models have:
- Built-in jailbreak resistance tests passing / failing.
- Input / output filter chains applied at serving time.
- Tool-use allow-lists enforced.
Training-Data License Risk
For fine-tuned models, Safeguard tracks the dataset provenance and flags:
- Unknown-license data.
- Copyleft data used in proprietary models.
- Regulated data (PII, PHI, PCI) used without controls.
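The flagging logic above reduces to a few rules over dataset records. The sketch below shows one possible shape; the record fields and license lists are hypothetical, not Safeguard's schema.

```python
COPYLEFT = {"GPL-3.0", "AGPL-3.0", "CC-BY-SA-4.0"}   # illustrative subset
REGULATED = {"PII", "PHI", "PCI"}

def dataset_findings(dataset, model_is_proprietary=True):
    """Return risk flags for one dataset record (hypothetical schema)."""
    findings = []
    license_id = dataset.get("license")
    if license_id is None:
        findings.append("unknown-license")
    elif license_id in COPYLEFT and model_is_proprietary:
        findings.append("copyleft-in-proprietary-model")
    tags = set(dataset.get("data_tags", []))
    if tags & REGULATED and not dataset.get("controls_attested", False):
        findings.append("regulated-data-without-controls")
    return findings

print(dataset_findings({"name": "crawl-2024"}))
# ['unknown-license']
print(dataset_findings({"name": "tickets", "license": "AGPL-3.0",
                        "data_tags": ["PII"]}))
# ['copyleft-in-proprietary-model', 'regulated-data-without-controls']
```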
Model Card Discrepancies
If a vendor's model card claims a training set but the weights show signs of additional data, Safeguard surfaces the discrepancy.
Governance Policies
Enforce policies at load time:
```yaml
apiVersion: safeguard.sh/v1
kind: Policy
metadata:
  name: ai-model-policy
spec:
  targets:
    - kind: Model
      labels:
        env: production
  rules:
    - id: require-signed-weights
      condition: signatures.valid == false
      effect: BLOCK
    - id: no-pickle-unsafe
      condition: pickle.unsafe_opcodes == true
      effect: BLOCK
    - id: allowlist-publishers
      condition: weights.publisher NOT IN ["internal", "huggingface:meta-llama", "huggingface:mistralai", "huggingface:anthropic"]
      effect: WARN
```

Served models that fail load-time verification won't be loaded.
Load-Time Verification
The Safeguard model loader (a thin wrapper around torch.load, safetensors.load, transformers.pipeline) verifies:
- Weight hash matches the attested hash.
- Signature verifies against the configured trust anchors.
- No disallowed pickle operations on load.
If any check fails, loading aborts.
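The hash check, which is the first of the three verifications, can be sketched as a streaming SHA-256 comparison against the attested digest. This is an illustrative stand-in for the Safeguard loader, not its implementation.

```python
import hashlib

def verify_weight_hash(path, attested_sha256, chunk_size=1 << 20):
    """Refuse to proceed unless the on-disk weights match the attested digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so multi-GB weight files don't load into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    if h.hexdigest() != attested_sha256:
        raise RuntimeError(f"weight hash mismatch for {path}")
    return True
```

Signature verification and the pickle-opcode scan run after this check, and a failure in any of the three aborts the load.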
Fine-Tuning Lineage
Every fine-tune you run produces a signed attestation capturing:
- Base model + version.
- Training data snapshots (content hashes, not raw data).
- Hyperparameters and seed.
- Evaluation metrics.
- Environment (hardware, software, time).
The lineage is queryable — for any served model, you can answer: "Show me every model this was derived from and every dataset that touched it."
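Because each attestation records exactly one base model, the lineage forms a chain (or, with adapters, a tree) that is cheap to walk. A minimal sketch of the query above, using hypothetical model IDs and in-memory maps in place of the attestation store:

```python
def ancestry(model_id, parent_of, dataset_of):
    """Walk the fine-tune lineage back to the base model.

    parent_of:  model_id -> base model_id (or absent for a base model),
                as recorded in signed attestations.
    dataset_of: model_id -> dataset content hashes used at that step.
    Returns (models, datasets) reachable from model_id.
    """
    models, datasets = [], []
    cur = model_id
    while cur is not None:
        models.append(cur)
        datasets.extend(dataset_of.get(cur, []))
        cur = parent_of.get(cur)  # None once we reach the base model
    return models, datasets

parent = {"support-bot-v3": "support-bot-v2", "support-bot-v2": "llama-3-8b"}
data = {"support-bot-v3": ["sha256:aa"], "support-bot-v2": ["sha256:bb"]}
print(ancestry("support-bot-v3", parent, data))
# (['support-bot-v3', 'support-bot-v2', 'llama-3-8b'], ['sha256:aa', 'sha256:bb'])
```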
Agent Security (Inference Time)
For LLM-powered agents:
- Tool-call logs are retained with prompt / response hashes.
- Safeguard validates that each tool call matches a declared tool surface.
- Unexpected tool calls (allowed but not typical for the workload) are surfaced.
- Prompt-injection attempts are detected and scored.
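The second and third bullets amount to a two-tier triage of each tool call: block calls outside the declared surface, and surface calls that are allowed but unusual for the workload. A minimal sketch, with hypothetical tool names and verdict labels:

```python
def triage_tool_call(call, declared, typical):
    """Triage one agent tool call (simplified sketch).

    declared: tool names the workload may call (enforced).
    typical:  tools usually observed for this workload (informational).
    """
    if call["tool"] not in declared:
        return "BLOCK"      # outside the declared tool surface
    if call["tool"] not in typical:
        return "SURFACE"    # allowed, but unusual for this workload
    return "ALLOW"

declared = {"search_docs", "create_ticket", "send_email"}
typical = {"search_docs", "create_ticket"}
print(triage_tool_call({"tool": "send_email"}, declared, typical))  # SURFACE
print(triage_tool_call({"tool": "exec_shell"}, declared, typical))  # BLOCK
```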
See the MCP Threat Model blog post for deeper treatment.
API
```shell
safeguard aibom list --env production
safeguard aibom verify --model-id <id>
safeguard aibom export --project my-ai-app --format cyclonedx-ml > aibom.json
```

Related
- AI Models (Griffin, Eagle, Lino) — Safeguard's own models.
- Asset Discovery — how AI models are discovered.
- Attestation & Signing — model-weight signing.
- Compliance — EU AI Act, NIST AI RMF, ISO 42001.