
πŸ›‘οΈ WeightsWatcher

Supply Chain Security for AI Models

WeightsWatcher is a cryptographic integrity verification system for Machine Learning artifacts. It protects production environments from Model Poisoning, Ransomware, and Time-of-Check to Time-of-Use (TOCTOU) attacks by ensuring that the model loaded into memory is bit-for-bit identical to the one validated during training.



🚨 The Problem

In modern MLOps, models are trained in secure environments but deployed to edge devices or cloud servers where file systems are vulnerable.

  • Pickle is unsafe: A standard torch.load() call can execute arbitrary code during deserialization (RCE).
  • Race Conditions (TOCTOU): Checking a hash before loading doesn't prevent an attacker from swapping the file during the read operation (see the sketch after this list).
  • "Evil Maid" Attacks: If an attacker gains write access to your server, they can overwrite both your model and your checksum file.

πŸ›‘οΈ The Solution

WeightsWatcher wraps standard loaders with a "Secure Shim" that enforces:

  1. Digital Signatures (RSA): Verifies that the lock file was signed by a trusted Private Key.
  2. Parallel Merkle Hashing: Uses multi-core processing to hash large models (10GB+) securely and efficiently.
  3. Active Sentry Mode: A background watchdog that monitors the file system and instantly locks down the API if the model file is touched.
  4. Safe Defaults: Enforces weights_only=True for PyTorch to prevent code execution (see the sketch below).
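
Point 4 mirrors a safeguard that plain PyTorch also exposes directly; a minimal sketch without WeightsWatcher:

```python
import torch

# weights_only=True restricts unpickling to tensors and plain containers,
# refusing the arbitrary objects that make classic torch.load an RCE vector.
weights = torch.load("production_model.pt", weights_only=True)
```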

🚀 Installation

WeightsWatcher is modular. Install only what you need:

```bash
# Core Library (CLI & Crypto only)
pip install weightswatcher

# With PyTorch support (Example 01)
pip install "weightswatcher[torch]"

# With LLM/Transformers support (Example 02)
pip install "weightswatcher[llm]"

# With API/FastAPI Sentry support (Example 03)
pip install "weightswatcher[api]"

# ⚡ For Development (All features + tests)
pip install -e ".[dev]"
```

πŸ› οΈ Usage

1. The CLI (DevOps)

Manage keys and lock files directly from the terminal.

```bash
# 1. Generate RSA Keypair
weightswatcher keygen --out .

# 2. Lock & Sign a Model (Training Stage)
weightswatcher lock production_model.pt --key private_key.pem

# 3. Verify a Model (Deployment Stage)
weightswatcher verify production_model.pt --key public_key.pem
```

2. Python API (Developers)

Integrate secure loading into your inference code.

```python
from weightswatcher import secure_load

# This will RAISE an exception if the signature is invalid
# or if the file content has been tampered with.
weights = secure_load(
    "production_model.pt",
    public_key_path="public_key.pem",
    weights_only=True,
)

model.load_state_dict(weights)
```
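
In a service you usually want to fail closed on any verification error. The exception type raised by secure_load is not specified above, so this sketch catches broadly:

```python
import sys

from weightswatcher import secure_load

try:
    weights = secure_load(
        "production_model.pt",
        public_key_path="public_key.pem",
        weights_only=True,
    )
except Exception as err:  # the exact exception class is an assumption
    print(f"Refusing to start: integrity check failed ({err})", file=sys.stderr)
    sys.exit(1)

# ...then proceed with model.load_state_dict(weights) as above.
```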

🧪 Examples & Demos

The examples/ directory contains runnable scripts demonstrating real-world attack vectors.

Example 01: Core Security Demo (Bit Rot vs. Evil Maid)

File: examples/01_real_world_crypto_test.py

A comprehensive security test that runs two scenarios sequentially:

  1. Act 2 (Bit Rot): Simulates random file corruption. WeightsWatcher blocks this via Hash Mismatch.
  2. Act 3 (Evil Maid): Simulates an attacker modifying the model and updating the lock file hashes to hide their tracks. WeightsWatcher blocks this via Signature Mismatch.

```text
[ACT 2] 💥 Attack A: Simple Corruption...
    ✅ SUCCESS: Blocked by Hash Mismatch.
    🛑 LOG: Corruption detected in Chunk #0

[ACT 3] 😈 Attack B: The 'Evil Maid'...
    ✅ SUCCESS: Blocked by Signature Verification.
    🛑 LOG: 🚨 INVALID SIGNATURE: The manifest file has been tampered with!
```
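
For intuition, the "Evil Maid" step amounts to something like the hypothetical sketch below: the attacker can rewrite the lock file's hashes, but without the private key cannot re-sign the manifest, so signature verification fails.

```python
# Hypothetical sketch of the attack the script simulates (file names follow
# the Usage section; the lock-file layout is an assumption).
with open("production_model.pt", "r+b") as f:
    f.write(b"EVIL")  # overwrite the first bytes of the model

# The attacker then recomputes the chunk hashes and writes them back into
# the lock file so every hash matches again. But private_key.pem is not on
# the server, so the RSA signature over the manifest cannot be regenerated,
# and verification against public_key.pem fails.
```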

Example 02: The LLM Lobotomy 🧠

File: examples/02_llm_test.py

Demonstrates why integrity matters for GenAI. The script downloads GPT-2 Medium (~1.5 GB) and shows how a "Silent Corruption" attack can leave a model running but brain-damaged.

The Scenario: An attacker modifies the model file on disk, zeroing out a 1MB block of the Embedding Matrix. The model still loads without errors (no crash), but its vocabulary is destroyed.

What you will see:

  1. Baseline: The model successfully lists the colors of the rainbow.
  2. The Attack: We inject the corruption.
  3. The Defense: WeightsWatcher detects the hash mismatch and refuses to load the file.
  4. The "What If": The script forcibly loads the corrupted model to show the consequences. The output becomes incoherent.
```text
[3] 🤖 Generating Text (Baseline)...
    📝 Prompt: 'The colors of the rainbow are red, orange, yellow,'
    ✅ Output: ...green, blue, indigo, and violet.

[4] 😈 SIMULATING ATTACK: Corrupting Vocabulary...
    ⚠️  Injected 1.0 MB of ZEROS at offset 10.0 MB.

[5] 🛡️  Attempting Secure Load...
    ✅ SUCCESS: Attack Blocked!
    🛑 LOG: Corruption detected in Chunk #1

[6] 💀 DEMO: Forcing load to show damage...
    📝 Prompt: 'The colors of the rainbow are red, orange, yellow,'
    💀 Output: ...black, car, dog, 2024, the, the...
```
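
The injection step amounts to something like this sketch (the sizes and offset are taken from the log above; the weights file path is hypothetical):

```python
MB = 1024 * 1024

# Zero out 1 MB of on-disk weights at a 10 MB offset. The file still
# deserializes cleanly, but every embedding row stored in that region
# becomes a zero vector: the "Silent Corruption" failure mode.
with open("gpt2-medium/pytorch_model.bin", "r+b") as f:  # path is an assumption
    f.seek(10 * MB)
    f.write(b"\x00" * MB)
```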

Example 03: The Sentry (Active Defense) 🛡️

File: examples/03_fastapi_integration.py

Runs a live FastAPI server protected by a background Watchdog. This demonstrates "Event-Driven Security" where the API automatically shuts down if the model file is tampered with.

How to Run & Verify:

1. Start the Server (Terminal 1)

```bash
python examples/03_fastapi_integration.py
```

2. Open the Test Interface (Browser) Navigate to http://localhost:8000/docs. The built-in Swagger UI lets you interact with the API without needing complex curl commands.

3. Send a Valid Request

  • Click on the green POST /predict bar.
  • Click the Try it out button (top right).
  • Click the big blue Execute button.
  • Result: Scroll down to "Server response". You should see Code 200 and a JSON body:

    ```json
    {
      "status": "success",
      "prediction": "class_1",
      "confidence": 0.98
    }
    ```

4. Attack the Model (Terminal 2) Keep the server running. Open a new terminal window and corrupt the model file on disk:

```bash
echo HACKED >> api_model.pt
```

Watch Terminal 1: You will see the Sentry detect the change and trip the kill-switch immediately.

5. Verify the Lockout (Browser)

  • Go back to your browser.
  • Click the blue Execute button again.
  • Result: The API now rejects the request. You will see Code 503 (Service Unavailable):

    ```json
    {
      "detail": "🚨 SECURITY ALERT: System compromised. Model integrity check failed."
    }
    ```

πŸ—οΈ Architecture

The "Manifest" Protocol

WeightsWatcher splits models into 10MB chunks. The manifest contains the SHA-256 hash of every chunk, and the manifest itself is signed with an RSA Private Key.
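
A condensed sketch of the lock step, assuming hashlib plus the pyca/cryptography package (the real implementation hashes chunks in parallel and builds a Merkle tree; this sequential version only shows the chunk-then-sign shape):

```python
import hashlib
import json

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

CHUNK = 10 * 1024 * 1024  # 10 MB, matching the chunk size described above

def build_manifest(path: str) -> bytes:
    """Hash the file chunk by chunk and serialize the hash list."""
    chunk_hashes = []
    with open(path, "rb") as f:
        while block := f.read(CHUNK):
            chunk_hashes.append(hashlib.sha256(block).hexdigest())
    return json.dumps({"file": path, "chunks": chunk_hashes}).encode()

def sign_manifest(manifest: bytes, private_key_path: str) -> bytes:
    """Sign the manifest bytes with an RSA private key (PSS padding)."""
    with open(private_key_path, "rb") as f:
        key = serialization.load_pem_private_key(f.read(), password=None)
    return key.sign(
        manifest,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256(),
    )
```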

The Sentry Pattern (Event-Driven Security)

Instead of re-hashing the model on every request (high latency), WeightsWatcher uses an OS-level file system watcher.

```mermaid
graph TD
    A[Attack: Malicious Write] -->|File Modified| B(OS Kernel Event)
    B -->|Notify| C{WeightsWatcher Sentry}
    C -->|Trigger| D[Parallel Integrity Scan]
    D -->|Fail| E[Global Panic Switch]
    E -->|Block| F["API Endpoints (503)"]
```
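
A minimal sketch of the pattern using the watchdog package (the dependency is an assumption; WeightsWatcher's internals may differ):

```python
import threading

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer

COMPROMISED = threading.Event()  # the global panic switch

class ModelSentry(FileSystemEventHandler):
    """Trips the panic switch when the watched model file changes."""

    def __init__(self, model_path: str):
        self.model_path = model_path

    def on_modified(self, event):
        if event.src_path.endswith(self.model_path):
            # Simplified: the real flow re-runs the integrity scan first
            # and only panics if that scan fails.
            COMPROMISED.set()

observer = Observer()
observer.schedule(ModelSentry("api_model.pt"), path=".", recursive=False)
observer.start()

# Each request handler then fails closed once the switch is tripped, e.g.:
# if COMPROMISED.is_set():
#     raise HTTPException(status_code=503, detail="Model integrity check failed.")
```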

🤖 CI/CD Integration

This repo includes GitHub Actions workflows to automate your supply chain security.

  • verify_models.yml: Runs on PRs. Scans the repo to ensure all .pt files match their lock files (a rough equivalent is sketched after this list).
  • release_model.yml: Runs on Tags (v*). Automatically signs release artifacts using a Private Key stored in GitHub Secrets.
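
As a rough picture of what the PR check does, an equivalent sketch in Python (the actual workflow contents are not reproduced here):

```python
import pathlib
import subprocess
import sys

# Verify every .pt file in the repo with the CLI from the Usage section;
# exit non-zero so the CI job fails if any model does not match its lock file.
failures = 0
for model in pathlib.Path(".").rglob("*.pt"):
    result = subprocess.run(
        ["weightswatcher", "verify", str(model), "--key", "public_key.pem"]
    )
    failures += result.returncode != 0

sys.exit(1 if failures else 0)
```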

πŸ—ΊοΈ Roadmap

  • Chunked Merkle Tree Hashing
  • RSA Digital Signatures
  • Parallel Processing (Multi-core hashing)
  • Active Sentry (Watchdog)
  • CLI Tool
  • Support for TensorFlow/Keras (.h5)
  • Integration with MLflow

License

MIT
