Safeguard.sh Documentation Center

Malware Detection

Detect malicious packages, images, and model weights using Eagle — Safeguard's purpose-built classification model.

Supply chain attackers ship malicious code as packages, container layers, or model weights. Traditional antivirus and signature-based tooling miss most of them. Safeguard uses Eagle — our classification model trained on confirmed-malicious artifacts — to classify every artifact your organization pulls or publishes.

What Eagle Looks For

Eagle scores each artifact across seven indicator classes:

  • Install-script behavior — Post-install hooks that contact unusual hosts, install additional packages, or write outside the package's own directory.
  • Obfuscated code — Dense eval / Function / base64 / zlib-wrapped payloads; string-encoding patterns known from real incidents.
  • Egress patterns — Hard-coded URLs matching known C2 infrastructure, Discord / Telegram exfiltration, Pastebin, etc.
  • Credential harvesting — Reads from ~/.aws, ~/.ssh, ~/.npmrc, browser profiles, or the clipboard.
  • Typosquat similarity — Name within edit distance 2 of a top-1000 package, owned by a different publisher.
  • Metadata anomalies — Newly published package with aggressive versioning, unusual author history, or no repository URL.
  • Behavior divergence from prior versions — New network, file-system, or process activity that didn't exist in the previous release.
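The typosquat-similarity indicator can be sketched with a standard Levenshtein edit distance. This is a simplified illustration, not Eagle's actual implementation — the `is_typosquat` helper and the popular-package set are hypothetical, and the different-publisher check is omitted:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def is_typosquat(name: str, popular_names: set[str]) -> bool:
    # Flag names within edit distance 2 of a popular package;
    # the "owned by a different publisher" condition is omitted here.
    return any(0 < edit_distance(name, p) <= 2 for p in popular_names)

print(is_typosquat("requets", {"requests", "numpy"}))   # True
print(is_typosquat("requests", {"requests", "numpy"}))  # False (exact match)
```

Excluding distance 0 keeps the legitimate package itself from being flagged.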

The output per artifact:

  • Score 0-100.
  • Classification — benign / suspicious / malicious.
  • Indicators fired — specific rules that contributed.
  • Griffin-authored explanation — plain-English summary of what was unusual.
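One way to picture that output is as a small record type. This is a hypothetical shape for illustration only — the field names below are not Safeguard's API:

```python
from dataclasses import dataclass, field

@dataclass
class EagleResult:
    score: int                    # 0-100
    classification: str           # "benign" | "suspicious" | "malicious"
    indicators: list[str] = field(default_factory=list)  # rules that fired
    explanation: str = ""         # Griffin-authored plain-English summary

result = EagleResult(
    score=82,
    classification="malicious",
    indicators=["install-script-behavior", "egress-patterns"],
    explanation="Post-install hook contacts a host not seen in prior releases.",
)
print(result.classification)  # malicious
```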

Coverage

Eagle classifies artifacts in:

  • npm — Every publish, plus a retroactive scan of the last 10 years
  • PyPI — Every publish, plus retroactive scan
  • Maven Central — Every publish, plus historical scan
  • RubyGems — Every publish
  • NuGet — Every publish
  • crates.io — Every publish
  • Go modules — Every publish
  • Composer — Every publish
  • Docker Hub / GHCR / ECR / ACR / GCR — Every tag push, plus configured periodic scans of running images
  • Hugging Face — Every model, every fine-tune, every config change

Enforcement Points

Eagle's classification feeds into policy at multiple points:

Inline at Install

Via the Safeguard runner or a pre-install hook:

# .safeguard/malware-policy.yaml
malware:
  block: [malicious]
  warn: [suspicious]
  confirm_ttl: 300  # how long a classification is trusted before re-check

If npm install pulls a package classified malicious, the install aborts with an explanation.
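A rough sketch of how a policy like the one above might be evaluated at install time. The keys match the YAML; everything else — the `decide` function, the in-memory cache, the stubbed classifier — is an assumption for illustration:

```python
import time

policy = {"block": ["malicious"], "warn": ["suspicious"], "confirm_ttl": 300}
_cache: dict[str, tuple[str, float]] = {}  # package -> (classification, scored_at)

def decide(package: str, classify) -> str:
    """Return 'block', 'warn', or 'allow', re-checking once confirm_ttl expires."""
    cached = _cache.get(package)
    if cached is None or time.time() - cached[1] > policy["confirm_ttl"]:
        cached = (classify(package), time.time())  # re-query the classifier (stubbed)
        _cache[package] = cached
    classification = cached[0]
    if classification in policy["block"]:
        return "block"
    if classification in policy["warn"]:
        return "warn"
    return "allow"

print(decide("some-package", lambda name: "suspicious"))  # warn
```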

At Registry

The Gold Registry admits no artifact that Eagle scores above 30.

For your own internal registry, enable inline classification on pushes so internal publishes also get scored.

At CI

- uses: safeguard/malware-scan@v2
  with:
    fail-on: malicious

At Admission

The Kubernetes admission controller refuses pods with an image layer containing malicious content.

Historical Scans

When Eagle is updated, it re-scores the entire corpus retroactively. If a package you depend on is reclassified as malicious because of a signal Eagle previously missed, you get a finding within seconds of the re-scoring pass.

The finding includes:

  • When the package was installed / deployed.
  • What specifically triggered the new classification.
  • Recommended remediation (rollback, Gold substitution, or quarantine).

Real Examples

Eagle has publicly surfaced malicious activity in:

  • npm packages with hidden post-install scripts exfiltrating ~/.aws/credentials.
  • PyPI typosquats of popular ML packages.
  • Compromised model repositories on Hugging Face distributing pickle-based payloads.
  • OCI images with layers that dropped cryptominers on container start.

See the Incident Analysis blog category for writeups.

False Positives

Eagle is tuned for precision — the classification bands are:

  • Score < 30: benign, released silently.
  • Score 30-70: suspicious, surfaces as a warning with the indicator list.
  • Score > 70: malicious, blocked by default policy.
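Those bands reduce to a simple threshold function. The function name is hypothetical; the thresholds are the documented ones:

```python
def classify(score: int) -> str:
    """Map an Eagle score (0-100) to its classification band."""
    if score < 30:
        return "benign"
    if score <= 70:
        return "suspicious"
    return "malicious"

print(classify(12), classify(55), classify(88))  # benign suspicious malicious
```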

Reported false-positive rate for score > 70 is approximately 1 in 10,000. Reporting a suspected false positive via research@safeguard.sh triggers human review within 24 hours.

Bypass Attempts

When attackers try to evade detection, Eagle flags the evasion itself:

  • String splitting / concatenation of known malicious identifiers.
  • Delayed-execution payloads that only run weeks after install.
  • Package trust "warm-up" — publishing a few benign versions before the malicious one.
  • Typosquat rotation (publish under many similar names to defeat allowlists).
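One way to catch the string-splitting trick is to collapse adjacent string literals before matching identifiers. The watchlist and regex below are a simplified illustration, not Eagle's actual logic:

```python
import re

WATCHLIST = {"eval", "child_process"}  # illustrative identifiers

def normalize(source: str) -> str:
    # Collapse "ev" + "al" style concatenation of string literals back together
    return re.sub(r'["\']\s*\+\s*["\']', "", source)

snippet = 'const f = "ev" + "al"; globalThis[f](payload);'
hits = {w for w in WATCHLIST if w in normalize(snippet)}
print(hits)  # {'eval'}
```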

AI Model Malware

For AI artifacts specifically:

  • Pickle payload detection — model weight files that embed executable Python via pickle.
  • Hidden backdoors — statistical tests for neurons or prompt-triggered behaviors that produce attacker-controlled output.
  • Config-level injection — transformers config files that reference external code at arbitrary URLs.
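The pickle-payload check can be approximated with the standard library's pickletools: walk the opcode stream and flag opcodes that can import or call arbitrary objects during unpickling. The opcode watchlist below is a common heuristic, not Eagle's exact rule set:

```python
import pickle
import pickletools

# Opcodes that can import or invoke arbitrary callables on pickle.load
DANGEROUS_OPS = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "NEWOBJ", "REDUCE"}

def scan_pickle(data: bytes) -> set[str]:
    """Return the dangerous opcodes present in a pickle stream."""
    return {op.name for op, arg, pos in pickletools.genops(data)
            if op.name in DANGEROUS_OPS}

class Payload:
    def __reduce__(self):              # runs attacker code on pickle.load!
        return (print, ("pwned",))

benign = pickle.dumps({"weights": [0.1, 0.2]})  # plain containers: no dangerous opcodes
evil = pickle.dumps(Payload())
print(scan_pickle(benign))                # set()
print("REDUCE" in scan_pickle(evil))      # True
```

This is why loading untrusted model weights via pickle is dangerous even when the file "looks like" tensors: the format itself can carry code.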

API

safeguard malware scan --npm react@19.0.0
safeguard malware scan --image ghcr.io/vendor/widget:1.2
safeguard malware history --package react --since 2020
