Malware Detection
Detect malicious packages, images, and model weights using Eagle — Safeguard's purpose-built classification model.
Supply chain attackers ship malicious code as packages, container layers, or model weights. Traditional antivirus and signature-based tooling miss most of them. Safeguard uses Eagle — our classification model trained on confirmed-malicious artifacts — to classify every artifact your organization pulls or publishes.
What Eagle Looks For
Eagle scores each artifact across seven indicator classes:
| Indicator | What it means |
|---|---|
| Install-script behavior | Post-install hooks that contact unusual hosts, install additional packages, or write outside the package's own directory. |
| Obfuscated code | Dense eval / Function / base64 / zlib-wrapped payloads; string encoding patterns known from real incidents. |
| Egress patterns | Hard-coded URLs matching known C2 infrastructure, Discord / Telegram exfiltration, Pastebin, etc. |
| Credential harvesting | Reads from ~/.aws, ~/.ssh, ~/.npmrc, browser profiles, clipboard. |
| Typosquat similarity | Name within edit-distance 2 of a top-1000 package, owned by a different publisher. |
| Metadata anomalies | Newly-published package with aggressive versioning, unusual author history, no repository URL. |
| Behavior divergence from prior versions | New network / file system / process activity that didn't exist in the previous release. |
The output per artifact:
- Score 0-100.
- Classification — benign / suspicious / malicious.
- Indicators fired — specific rules that contributed.
- Griffin-authored explanation — plain-English summary of what was unusual.
Coverage
Eagle classifies artifacts in:
| Ecosystem | Coverage |
|---|---|
| npm | Every publish, retroactive scan of the last 10 years |
| PyPI | Every publish, retroactive scan |
| Maven Central | Every publish + historical scan |
| RubyGems | Every publish |
| NuGet | Every publish |
| crates.io | Every publish |
| Go modules | Every publish |
| Composer | Every publish |
| Docker Hub / GHCR / ECR / ACR / GCR | Every tag push, plus configured periodic scan of running images |
| Hugging Face | Every model, every fine-tune, every config change |
Enforcement Points
Eagle's classification feeds into policy at multiple points:
Inline at Install
Via the Safeguard runner or a pre-install hook:
```yaml
# .safeguard/malware-policy.yaml
malware:
  block: [malicious]
  warn: [suspicious]
  confirm_ttl: 300 # how long a classification is trusted before re-check
```
If `npm install` pulls a package classified malicious, the install aborts with an explanation.
At Registry
The Gold Registry admits no artifact that Eagle classifies above score 30.
For your own internal registry, enable inline classification on pushes so internal publishes also get scored.
At CI
```yaml
- uses: safeguard/malware-scan@v2
  with:
    fail-on: malicious
```
At Admission
The Kubernetes admission controller refuses pods with an image layer containing malicious content.
Historical Scans
When Eagle is updated, it re-scores the entire corpus retroactively. If a package you depend on is reclassified as malicious because of a signal Eagle previously missed, you get a finding within seconds of the re-scoring pass.
The finding includes:
- When the package was installed / deployed.
- What specifically triggered the new classification.
- Recommended remediation (rollback, Gold substitution, or quarantine).
Real Examples
Eagle has publicly surfaced malicious activity in:
- npm packages with hidden post-install scripts exfiltrating ~/.aws/credentials.
- PyPI typosquats of popular ML packages.
- Compromised model repositories on Hugging Face distributing pickle-based payloads.
- OCI images with layers that dropped cryptominers on container start.
See the Incident Analysis blog category for writeups.
False Positives
Eagle is tuned for precision — the classification bands are:
- Score < 30: benign, released silently.
- Score 30-70: suspicious, surfaces as a warning with the indicator list.
- Score > 70: malicious, blocked by default policy.
Reported false-positive rate for score > 70 is approximately 1 in 10,000. Reporting a suspected false positive via research@safeguard.sh triggers human review within 24 hours.
Bypass Attempts
When attackers try to evade detection, Eagle flags the evasion itself:
- String splitting / concatenation of known malicious identifiers.
- Delayed-execution payloads that only run weeks after install.
- Package trust "warm-up" — publishing a few benign versions before the malicious one.
- Typosquat rotation (publish under many similar names to defeat allowlists).
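The typosquat indicator from the table above and the typosquat-rotation bypass in this list, as an added example that is not Safeguard's or Eagle's code: a Levenshtein check of a candidate name against a popular list with a publisher comparison, then a grouping that surfaces one publisher aiming several near-miss names at the same target. Package names, publishers and the popular list are constructed for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Package:
    name: str
    publisher: str

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insert, delete or substitute one character at cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def typosquat_hits(candidate: Package, popular: list[Package], max_distance: int = 2):
    """Popular packages 1..max_distance edits from the candidate's name, published by
    someone else. Distance 0 is the package itself; same-publisher names are forks."""
    hits = []
    for target in popular:
        if candidate.publisher == target.publisher:
            continue
        d = edit_distance(candidate.name.lower(), target.name.lower())
        if 1 <= d <= max_distance:
            hits.append((target.name, d))
    return sorted(hits, key=lambda h: h[1])

def rotation_clusters(candidates: list[Package], popular: list[Package], min_names: int = 3):
    """Typosquat rotation: one publisher aiming several near-miss names at the same
    popular package. Returns {(target, publisher): [names]} for clusters of min_names+."""
    clusters: dict[tuple[str, str], set[str]] = {}
    for c in candidates:
        for target, _ in typosquat_hits(c, popular):
            clusters.setdefault((target, c.publisher), set()).add(c.name)
    return {k: sorted(v) for k, v in clusters.items() if len(v) >= min_names}

# Constructed sample data: names, publishers and the "popular" list are made up.
POPULAR = [Package("requests", "psf"), Package("lodash", "jdalton")]
CANDIDATES = [
    Package("requestes", "acct-a"), Package("reqeusts", "acct-a"), Package("requets", "acct-a"),
    Package("lodahs", "acct-b"), Package("requests", "psf"),
]

if __name__ == "__main__":
    print(typosquat_hits(CANDIDATES[0], POPULAR))  # [('requests', 1)]
    print(rotation_clusters(CANDIDATES, POPULAR))  # one cluster of three names
```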
AI Model Malware
For AI artifacts specifically:
- Pickle payload detection — model weight files that embed executable Python via pickle.
- Hidden backdoors — statistical tests for neurons or prompt-triggered behaviors that produce attacker-controlled output.
- Config-level injection — transformers config files that reference external code with arbitrary URLs.
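The pickle check above, as an added example that is not Eagle's implementation: a static walk of the opcode stream with `pickletools` that reports every global the file would resolve and every opcode that would invoke it, without ever unpickling the data. The allowlist of permitted reconstructors and both sample pickles are constructed for the example.

```python
import pickle
import pickletools

# Constructed allowlist for this example: reconstructors a plain tensor container
# needs at load time. Any other module.name resolved by GLOBAL / STACK_GLOBAL is
# treated as embedded code.
SAFE_GLOBALS = {"collections.OrderedDict"}
CALL_OPCODES = {"REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD"}  # run a callable
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE",
                  "STRING", "BINSTRING", "SHORT_BINSTRING"}

def scan_pickle(data: bytes) -> list[str]:
    """Walk the opcode stream and report findings; the data is never unpickled,
    so nothing embedded in it can execute during the scan."""
    findings = []
    recent = []             # last two string operands, consumed by STACK_GLOBAL
    unsafe_pending = False  # a non-allowlisted global is waiting on the stack
    for op, arg, pos in pickletools.genops(data):
        if op.name in STRING_OPCODES:
            recent = (recent + [arg])[-2:]
        elif op.name in ("GLOBAL", "STACK_GLOBAL"):
            if op.name == "GLOBAL":
                module, name = arg.split(" ", 1)   # genops yields "module name"
            elif len(recent) == 2:
                module, name = recent
            else:
                findings.append(f"pos {pos}: STACK_GLOBAL with unresolvable operands")
                unsafe_pending = True
                continue
            ref = f"{module}.{name}"
            if ref not in SAFE_GLOBALS:
                findings.append(f"pos {pos}: {op.name} resolves {ref}")
                unsafe_pending = True
        elif op.name in CALL_OPCODES and unsafe_pending:
            findings.append(f"pos {pos}: {op.name} invokes that callable at load")
            unsafe_pending = False
    return findings

class _Constructed:
    """Constructed sample: a weight file whose load reduces to calling builtins.print."""
    def __reduce__(self):
        return (print, ("constructed sample",))

BENIGN = pickle.dumps({"weights": [0.1, 0.2], "layers": 2}, protocol=4)
SUSPECT = pickle.dumps(_Constructed(), protocol=4)

if __name__ == "__main__":
    print("benign :", scan_pickle(BENIGN))   # []
    print("suspect:", scan_pickle(SUSPECT))  # STACK_GLOBAL builtins.print, then REDUCE
```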
API
```shell
safeguard malware scan --npm react@19.0.0
safeguard malware scan --image ghcr.io/vendor/widget:1.2
safeguard malware history --package react --since 2020
```
Related
- AI Models — Eagle model versions and capabilities.
- Gold Registry — malware-free curated artifacts.
- Zero-Day Discovery — how malware findings feed Safeguard's research pipeline.
- Guardrails & Enforcement — enforcing malware policy at every point.