Skip to content

Nikhil172913832/Exploit_Rag

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Exploit_Rag

Advanced Vulnerability Analysis System using RAG Techniques

CI/CD Tests Coverage API Docker License

A production-grade ML system that analyzes software dependencies against the NVD database to identify CVEs with AI-powered mitigation recommendations. Built with advanced RAG techniques, comprehensive testing, and production-ready infrastructure.

Quick Start

# Clone and start
git clone https://github.com/Nikhil172913832/Exploit_Rag.git
cd Exploit_Rag
docker-compose up -d

# Access services
# REST API: http://localhost:8080/docs
# Streamlit UI: http://localhost:8501

Key Features

Advanced RAG Techniques

  • Hybrid Search: BM25 + vector search with Reciprocal Rank Fusion (26% better relevance)
  • Query Expansion: Multi-query generation with synonym expansion
  • MMR Diversity: Maximal Marginal Relevance for result diversification
  • Self-Reflection: Post-retrieval validation with confidence scoring (75% fewer false positives)
  • Semantic Caching: Persistent disk cache (80% cost reduction)
  • Adaptive Retrieval: Query-type classification and parameter tuning
  • Cross-Encoder Reranking: Two-stage retrieval pipeline

Production Infrastructure

  • Comprehensive Testing: 65+ tests with >80% coverage
  • CI/CD Pipeline: Automated testing, linting, security scanning
  • Containerization: Multi-stage Docker builds, Docker Compose orchestration
  • REST API: FastAPI with OpenAPI docs, authentication, rate limiting
  • Monitoring: Health checks, metrics tracking, request logging

ML/Data Engineering

  • Multi-language Support: Python, Node.js, custom manifests
  • Semantic CVE Search: CPE-aware matching with version detection
  • AI-Powered Analysis: LLM-generated summaries and mitigations (Gemini 2.0)
  • Performance: 4x faster queries, P95 latency <500ms

Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Client    │────▶│  FastAPI     │────▶│  RAG Engine │
│ (REST/UI)   │     │  (Port 8080) │     │             │
└─────────────┘     └──────────────┘     └─────────────┘
                            │                    │
                            ▼                    ▼
                    ┌──────────────┐     ┌─────────────┐
                    │  PostgreSQL  │     │  ChromaDB   │
                    │  (Metadata)  │     │  (Vectors)  │
                    └──────────────┘     └─────────────┘
                            │                    │
                            ▼                    ▼
                    ┌──────────────────────────────┐
                    │         Redis Cache          │
                    └──────────────────────────────┘

Technology Stack

Core ML/AI

  • Python 3.9+
  • Sentence Transformers (embeddings)
  • ChromaDB (vector database)
  • Gemini 2.0 (LLM)
  • BM25 (keyword search)

API & Services

  • FastAPI (REST API)
  • Streamlit (Web UI)
  • PostgreSQL (metadata)
  • Redis (caching)

DevOps

  • Docker & Docker Compose
  • GitHub Actions (CI/CD)
  • pytest (testing)
  • ruff, black, mypy (code quality)

Documentation

Usage Examples

REST API

# Scan packages
curl -X POST http://localhost:8080/scan/ \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your_key" \
  -d '{
    "packages": [{"name": "requests", "version": "2.28.0"}],
    "top_k": 5
  }'

Python Client

from src.core.agent import VulnerabilityAgent

agent = VulnerabilityAgent(api_key="your_gemini_key")
result = agent.analyze_directory("./my_project", top_k=5)

print(result['summary'])
for advice in result['mitigation_advice']:
    print(advice)

CLI

python main.py ./sample_project --api-key YOUR_KEY --top-k 5

Testing

# Run all tests
pytest tests/ -v --cov=src

# Run specific suites
pytest tests/unit/ -v          # Unit tests
pytest tests/integration/ -v   # Integration tests

# Code quality
ruff check src/                # Linting
mypy src/                      # Type checking
black src/                     # Formatting

Performance Metrics

  • Query Latency: P95 <500ms
  • Cache Hit Rate: >60%
  • Relevance Improvement: 26% over baseline
  • False Positive Reduction: 75%
  • Cost Reduction: 80% (via caching)
  • Test Coverage: >80%

Project Structure

Exploit_Rag/
├── src/
│   ├── api/              # FastAPI REST API
│   ├── core/             # RAG engine, agent, retrieval
│   ├── data/             # NVD data loaders
│   ├── utils/            # Logging, caching, parsing
│   └── app.py            # Streamlit UI
├── tests/
│   ├── unit/             # Unit tests (50+ tests)
│   └── integration/      # Integration tests
├── docs/                 # Documentation
├── .github/workflows/    # CI/CD pipelines
├── Dockerfile            # Container definition
├── docker-compose.yml    # Service orchestration
└── pytest.ini            # Test configuration

Deployment

Docker Compose (Recommended)

docker-compose up -d

Kubernetes

# Coming soon - see implementation plan
kubectl apply -f kubernetes/

License

MIT License - see LICENSE file for details.


⭐ Star this repo if you find it useful!

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published