Catch PII leaks to LLMs before they hit production.
Privalyse CLI is a static analysis tool that builds a Semantic Data Flow Graph of your AI application. It traces PII from source to AI sink, detecting privacy violations that regex-based tools miss.
- ❌ Traditional Linter: "Variable `user_email` used in line 42."
- ✅ Privalyse: "User Email (Source) → Prompt Template → OpenAI API (Sink) = Privacy Leak"
Privalyse is purpose-built for LLM-integrated applications. It detects when sensitive user data is being sent to:
| Provider | Support |
|---|---|
| OpenAI (GPT-4, o1, Embeddings) | ✅ Full |
| Anthropic (Claude) | ✅ Full |
| Google (Gemini, Vertex AI) | ✅ Full |
| Mistral AI | ✅ Full |
| Groq | ✅ Full |
| Cohere | ✅ Full |
| Ollama (Local LLMs) | ✅ Full |
| LangChain / LlamaIndex | ✅ Full |
| Hugging Face | ✅ Full |
| Generic HTTP to AI APIs | ✅ Full |
privalyse-mask is our companion library for masking PII before sending it to LLMs.
Privalyse CLI automatically recognizes privalyse-mask usage and won't flag already-masked data as leaks.
```python
from privalyse_mask import PrivalyseMasker
from openai import OpenAI

masker = PrivalyseMasker()
client = OpenAI()

# User input with PII
user_input = "My name is Peter and my email is peter@example.com"

# ✅ Mask before sending to LLM
masked_text, mapping = masker.mask(user_input)
# -> "My name is {Name_x92} and my email is {Email_abc123}"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": masked_text}]  # ✅ Safe - masked data
)

# Restore original values in response
final_response = masker.unmask(response.choices[0].message.content, mapping)
```

Privalyse CLI will:
- ✅ Not flag `masked_text` being sent to OpenAI (it's sanitized)
- ⚠️ Flag if you send `user_input` directly without masking (see the example below)
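For contrast, here is a minimal sketch, based on the same example input, of the direct, unmasked call that Privalyse would flag:

```python
from openai import OpenAI

client = OpenAI()

# User input with PII
user_input = "My name is Peter and my email is peter@example.com"

# ⚠️ Flagged: raw PII flows straight into the OpenAI sink
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
)
```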
```bash
pip install privalyse-cli
privalyse
# ✅ Done. Check scan_results.md
```
```yaml
# .github/workflows/privacy.yml
name: AI Privacy Scan
on: [push, pull_request]

jobs:
  privalyse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Privalyse
        uses: privalyse/privalyse-cli@v0.3.1
```
```yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: privalyse
        name: Privalyse AI Privacy Scan
        entry: privalyse
        language: system
        pass_filenames: false
```

- Getting Started
- Integration Guide (CI/CD, Pre-commit)
- Configuration (Rules, Policies)
- Architecture
Specialized checks for LLM-integrated applications.
- Prevents: Sending sensitive customer data to model prompts
- Audits: OpenAI, Anthropic, Google Gemini, LangChain, and more
- Recognizes: `privalyse-mask` and other sanitization libraries
- Tracks: Data flow from user input → prompt → AI API (see the sketch below)
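To make the traced path concrete, here is a hedged sketch (the function and variable names are ours, purely for illustration) of a flow where PII enters at a source, passes through a prompt template, and reaches an AI sink:

```python
from openai import OpenAI

client = OpenAI()

def handle_support_request(customer_email: str, question: str) -> str:
    # Source: PII (the customer's email) enters from user input
    # Prompt template: the PII is interpolated into the prompt text
    prompt = f"Customer {customer_email} asks: {question}"

    # Sink: the prompt, still containing raw PII, is sent to OpenAI
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
```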
Detects hardcoded API keys, tokens, and credentials.
- Supports: AWS, Stripe, OpenAI, Slack, Anthropic, and generic high-entropy strings
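For example, a hedged sketch of the kind of hardcoded credential this check targets (the key value is fake, and loading secrets from the environment is our suggestion, not a tool requirement):

```python
import os
import stripe

# ⚠️ Flagged: hardcoded, high-entropy credential committed to source
stripe.api_key = "sk_live_51H8xFAKEFAKEFAKEFAKEFAKE"

# ✅ Better: load the secret from the environment at runtime
stripe.api_key = os.environ.get("STRIPE_API_KEY", "")
```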
Identifies PII leaking into logs, external APIs, or analytics.
- Detects: Emails, Phone Numbers, Credit Cards, SSNs, Names, Addresses
- Context Aware: Understands variable names like `user_email` or `customer_ssn` (see the example below)
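A hedged illustration of the kind of log sink this check looks for (logger and variable names are ours):

```python
import logging

logger = logging.getLogger("billing")

def charge_customer(user_email: str, amount_cents: int) -> None:
    # ⚠️ Flagged: PII (user_email) written to application logs
    logger.info("Charging %s for %d cents", user_email, amount_cents)

    # ✅ Safer: log a non-identifying message instead
    logger.info("Charging customer (amount=%d cents)", amount_cents)
```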
Maps data flows to ensure compliance.
- Flags: Data transfers to non-EU AI providers
- Verifies: Usage of sanitization/masking functions before data egress
Privalyse automatically recognizes these sanitization patterns and won't flag sanitized data:
| Library/Pattern | Recognition |
|---|---|
| `privalyse-mask` (`PrivalyseMasker.mask()`) | ✅ Full |
| `presidio` (Microsoft Presidio) | ✅ Full |
| `scrubadub` | ✅ Full |
| Custom functions with: `mask`, `anonymize`, `hash`, `encrypt`, `redact`, `sanitize` | ✅ Full |
| Masked text patterns: `{Name_xyz}`, `{Email_abc}` | ✅ Full |
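As a hedged example of the custom-function convention above (the helper name and hashing scheme are ours), a function whose name contains `redact` and produces the `{Email_...}` masked-text pattern:

```python
import hashlib

def redact_email(text: str, email: str) -> str:
    # The "redact" naming convention marks this helper as a sanitizer
    token = hashlib.sha256(email.encode()).hexdigest()[:8]
    return text.replace(email, f"{{Email_{token}}}")

user_input = "My name is Peter and my email is peter@example.com"
safe_prompt = redact_email(user_input, "peter@example.com")
# -> "My name is Peter and my email is {Email_...}"
```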
Privalyse is agent-friendly. Get structured JSON output for autonomous remediation:
```bash
privalyse --format json --out privalyse_report.json
```

AI coding agents can read the report and automatically fix privacy leaks.
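As a minimal sketch of how an agent might consume the report, assuming it is a JSON document containing a list of findings (the `findings`, `file`, `line`, and `message` fields below are illustrative assumptions, not a documented schema):

```python
import json
from pathlib import Path

report = json.loads(Path("privalyse_report.json").read_text())

# NOTE: field names are assumed for illustration; check the actual report schema
for finding in report.get("findings", []):
    print(f"{finding.get('file')}:{finding.get('line')} - {finding.get('message')}")
```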
- Python Support (Full AST Analysis)
- JavaScript/TypeScript Support (AST & Regex)
- Cross-File Taint Tracking
- privalyse-mask Integration
- VS Code Extension (Coming Soon)
- Custom Rule Engine
We love contributions! Check out CONTRIBUTING.md to get started.
MIT License. See LICENSE for details.

