BasicBot πŸ€–

A sophisticated Research Assistant powered by GraphRAG technology for analyzing documents and research data.

Python · Node.js · FastAPI · Next.js · Neo4j · Ollama

🌟 Overview

BasicBot is an advanced research assistant that leverages Graph Retrieval-Augmented Generation (GraphRAG) to provide intelligent analysis of documents, research papers, and technical content. Built on a modern AI stack, it combines local LLM inference with graph-based knowledge representation for superior document understanding and question answering.

Key Technologies

  • FastAPI Backend - High-performance async API server
  • Next.js Frontend - Modern React-based user interface
  • Neo4j Graph Database - Advanced graph data storage and querying
  • Ollama Integration - Local LLM inference with Granite models
  • Vector Embeddings - Semantic search and similarity matching
  • RLHF Adaptation - Continuous learning from user interactions

✨ Features

πŸ” Intelligent Research Analysis

  • Multi-modal Retrieval: Hybrid search combining semantic vectors and graph relationships
  • Document Analysis: Advanced processing of various document types
  • Context-Aware Responses: Maintains conversation history and adapts to user needs
  • Citation Tracking: Automatically cites document sources in responses
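The hybrid retrieval described above can be pictured as simple score fusion: each candidate document receives a weighted combination of its vector-similarity score and a graph-relationship score, and the top results are returned. The function names and the 0.7/0.3 weighting below are illustrative assumptions, not BasicBot's actual implementation.

```python
def hybrid_score(vector_score: float, graph_score: float,
                 vector_weight: float = 0.7) -> float:
    """Fuse semantic and graph signals into one ranking score.

    The 0.7/0.3 split is an illustrative default, not the project's
    tuned value.
    """
    return vector_weight * vector_score + (1 - vector_weight) * graph_score


def rank_candidates(candidates: list[dict], top_k: int = 5) -> list[dict]:
    """Sort candidate documents by fused score, best first."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c["vector_score"], c["graph_score"]),
        reverse=True,
    )[:top_k]
```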

πŸ—οΈ Advanced Architecture

  • GraphRAG Implementation: Leverages Neo4j's graph data science capabilities
  • Adaptive RLHF: Learns and improves response quality over time
  • Plugin Architecture: Extensible system for additional data sources and models
  • Real-time Evaluation: Built-in performance metrics and quality grading

🎨 Modern User Experience

  • Responsive Web Interface: Clean, intuitive design with dark mode
  • Real-time Chat: Streaming responses with typing indicators
  • Document Management: Upload and organize research materials
  • System Monitoring: Comprehensive health checks and metrics

πŸ”§ Developer Features

  • Comprehensive API: RESTful endpoints with automatic documentation
  • Evaluation Framework: Built-in testing and performance measurement
  • Modular Design: Clean separation of concerns for easy maintenance
  • Docker Integration: Containerized deployment with docker-compose

πŸ›οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Next.js       β”‚    β”‚    FastAPI      β”‚    β”‚     Neo4j       β”‚
β”‚   Frontend      │◄──►│    Backend      │◄──►│   Graph DB      β”‚
β”‚   (Port 3000)   β”‚    β”‚   (Port 8000)   β”‚    β”‚   (Port 7687)   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                 β”‚
                                 β–Ό
                        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚  Ollama Models  β”‚
                        β”‚  (Port 11434)   β”‚
                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Core Components

  1. Frontend Layer

    • React 19 with Next.js 14
    • TypeScript for type safety
    • Radix UI components
    • Tailwind CSS styling
  2. Backend Layer

    • FastAPI with async support
    • Modular service architecture
    • Pydantic data validation
    • CORS-enabled for frontend integration
  3. Data Layer

    • Neo4j graph database with GDS
    • Vector embeddings with similarity search
    • Schema-based data modeling
    • Redis for caching and sessions
  4. AI/ML Layer

    • Ollama integration for local inference
    • Granite4 micro model for efficiency
    • MXBAI embeddings for semantic search
    • Adaptive RLHF learning system
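The vector-embedding similarity search mentioned above ultimately reduces to comparing embedding vectors, most commonly by cosine similarity. A minimal, dependency-free sketch follows; in the real system this comparison is delegated to Neo4j's vector search over MXBAI embeddings rather than computed in application code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (range -1..1).

    Identical directions score 1.0; orthogonal vectors score 0.0.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for degenerate vectors
    return dot / (norm_a * norm_b)
```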

πŸ“‹ Prerequisites

  • Python 3.9+ with pip package manager
  • Node.js 16+ with npm package manager
  • Docker Desktop for containerized services
  • Ollama for local LLM inference
  • 8GB+ RAM recommended for optimal performance (see System Requirements below)

System Requirements

Component   Minimum           Recommended
RAM         8GB               16GB+
CPU         4 cores           8+ cores
Storage     20GB              50GB+
Network     Stable internet   High-speed

πŸš€ Quick Start

1. Clone and Setup

git clone https://github.com/kliewerdaniel/basicbot.git
cd basicbot

2. Initial Setup

# Run comprehensive setup script
./setup.sh

This script will:

  • Create Python virtual environment
  • Install all dependencies
  • Setup Docker containers (Neo4j, Redis)
  • Pull required Ollama models
  • Create database schema and indexes
  • Perform initial data ingestion if files are present

3. Start the Application

# Start all services
./start.sh

4. Access the Application

  • Frontend: http://localhost:3000
  • API: http://localhost:8000 (interactive docs at http://localhost:8000/docs)
  • Neo4j Browser: http://localhost:7474

πŸ“– Usage

Web Interface

  1. Navigate to the web interface
  2. Upload documents through the document management panel
  3. Ask questions in the chat interface
  4. Review responses with source citations and relevance scores

API Usage

Chat Endpoint

curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main approaches to neural network optimization?",
    "chat_history": [],
    "session_id": "optional-session-id"
  }'

Document Search

curl "http://localhost:8000/api/search?q=neural%20network%20optimization&limit=10"

System Health

curl http://localhost:8000/api/health

Data Ingestion

CSV Data Files

# Ingest CSV data files
python3 scripts/ingest_data.py --csv data/your_data.csv --create-indexes

Research Papers

# Ingest PDF research papers
python3 scripts/ingest_research_data.py --directory data/research_papers/
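Ingestion pipelines like these typically split each document into overlapping chunks before embedding, so that sentences cut at a chunk boundary still appear intact in at least one chunk. A minimal sketch of that step; the chunk sizes and helper name are illustrative, not taken from the repository's scripts.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap,
    preserving context that straddles a chunk boundary."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap  # advance less than a full chunk
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```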

πŸ”§ Configuration

Environment Variables

Variable         Default                  Description
NEO4J_URI        bolt://localhost:7687    Neo4j connection URI
NEO4J_USERNAME   neo4j                    Neo4j username
NEO4J_PASSWORD   research2025             Neo4j password
REDIS_URL        redis://localhost:6379   Redis connection URL
OLLAMA_HOST      localhost:11434          Ollama server address
PORT             8000                     FastAPI server port

Model Configuration

Models are configured in data/persona.json:

{
  "name": "Research Assistant",
  "ollama_model": "granite4:micro-h",
  "rlhf_thresholds": {
    "retrieval_required": 0.6,
    "citation_requirement": 0.8,
    "formality_level": 0.7
  }
}
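The rlhf_thresholds above act as decision cutoffs: for instance, retrieval is only performed when the assistant's estimated need-for-context score clears retrieval_required. A sketch of how such gating might look; the scoring functions that produce the input scores are assumed here, not part of the config file.

```python
DEFAULT_THRESHOLDS = {
    "retrieval_required": 0.6,
    "citation_requirement": 0.8,
    "formality_level": 0.7,
}


def should_retrieve(need_score: float,
                    thresholds: dict = DEFAULT_THRESHOLDS) -> bool:
    """Perform document retrieval only when the estimated need for
    external context clears the configured threshold."""
    return need_score >= thresholds["retrieval_required"]


def must_cite(grounding_score: float,
              thresholds: dict = DEFAULT_THRESHOLDS) -> bool:
    """Require source citations when the response leans heavily
    on retrieved material."""
    return grounding_score >= thresholds["citation_requirement"]
```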

πŸ§ͺ Evaluation & Testing

Running Evaluations

# Run comprehensive evaluation suite
python3 evaluation/run_evaluation.py

Performance Metrics

The system provides several evaluation metrics:

  • Retrieval Quality: Precision and recall of document retrieval
  • Response Accuracy: Alignment with ground truth answers
  • Context Relevance: Usefulness of retrieved documents
  • Response Quality: Readability and completeness scores
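The retrieval-quality metric reduces to standard precision and recall over the retrieved document set. A minimal reference implementation, independent of the project's evaluation/metrics.py (whose internals are not shown here):

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Precision: fraction of retrieved docs that are relevant.
    Recall: fraction of relevant docs that were retrieved."""
    if not retrieved or not relevant:
        return 0.0, 0.0
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)
```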

Test Datasets

Evaluation datasets are located in evaluation/datasets/:

  • research_assistant_v1.json - General research questions
  • stress_tests.json - Edge cases and performance limits

πŸ—οΈ Development

Project Structure

basicbot/
β”œβ”€β”€ frontend/                 # Next.js application
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ app/             # Next.js app router
β”‚   β”‚   β”œβ”€β”€ components/      # React components
β”‚   β”‚   └── lib/             # Utilities and configurations
β”‚   └── package.json
β”œβ”€β”€ scripts/                  # Python core logic
β”‚   β”œβ”€β”€ eps_reasoning_agent.py
β”‚   β”œβ”€β”€ eps_retriever.py
β”‚   β”œβ”€β”€ graph_schema.py
β”‚   └── ingest_*.py
β”œβ”€β”€ evaluation/               # Testing and metrics
β”‚   β”œβ”€β”€ run_evaluation.py
β”‚   β”œβ”€β”€ metrics.py
β”‚   └── datasets/
β”œβ”€β”€ data/                     # Sample data and configurations
β”œβ”€β”€ test_*.py                 # Test scripts
β”œβ”€β”€ main.py                   # FastAPI application
β”œβ”€β”€ requirements.txt          # Python dependencies
β”œβ”€β”€ docker-compose.yml        # Container orchestration
└── README.md

Setting up Development Environment

# Backend development
cd basicbot
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt  # If available

# Frontend development
cd frontend
npm install
npm run dev

# Database development (in separate terminal)
docker-compose up neo4j redis

Running Tests

# Backend tests
python3 -m pytest

# Frontend tests
cd frontend
npm test

# Integration tests
./test.sh

πŸ“š API Reference

Core Endpoints

Endpoint        Method   Description
/api/chat       POST     Main chat interface with GraphRAG
/api/search     GET      Direct document search
/api/health     GET      System health check
/api/status     GET      Detailed system status
/api/ingest     POST     Trigger data ingestion
/api/evaluate   POST     Run evaluation suite

Request/Response Examples

POST /api/chat

Request:

{
  "query": "What are the benefits of using convolutional neural networks?",
  "chat_history": [
    {
      "role": "user",
      "content": "How do neural networks work?"
    },
    {
      "role": "assistant",
      "content": "Neural networks are computational models..."
    }
  ],
  "session_id": "session-123"
}

Response:

{
  "response": "Convolutional neural networks offer several key benefits...",
  "context_used": [...],
  "quality_grade": 0.85,
  "retrieval_method": "hybrid",
  "retrieval_performed": true,
  "sources": [...],
  "session_id": "session-123"
}

πŸ“Š Performance & Benchmarks

Typical Performance

  • Query Response Time: 2-5 seconds for complex questions
  • Document Ingestion: ~1000 documents/hour
  • Memory Usage: 4-8GB during normal operation
  • Concurrent Users: 10-20 simultaneous sessions

Scaling Considerations

  • Database: Neo4j can handle millions of documents
  • LLM: Ollama supports multiple concurrent requests
  • Frontend: Next.js handles high traffic efficiently
  • Caching: Redis layer improves response times for repeated queries
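The Redis caching layer can be pictured as a keyed lookup in front of the expensive GraphRAG pipeline: hash the query, return a stored answer if it is still fresh, otherwise compute and store. The sketch below uses an in-memory dict with a TTL as a stand-in for Redis; the key scheme and TTL are illustrative assumptions.

```python
import hashlib
import time


class QueryCache:
    """Tiny TTL cache keyed by a hash of the query text.

    A dict stands in for Redis here; a real deployment would use
    redis-py with the same get/set pattern and a server-side TTL.
    """

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    @staticmethod
    def _key(query: str) -> str:
        return hashlib.sha256(query.encode("utf-8")).hexdigest()

    def get(self, query: str):
        """Return the cached value, or None if absent or expired."""
        entry = self._store.get(self._key(query))
        if entry is None:
            return None
        expires, value = entry
        if time.monotonic() > expires:
            return None
        return value

    def set(self, query: str, value) -> None:
        """Store a value with an expiry timestamp."""
        self._store[self._key(query)] = (time.monotonic() + self.ttl, value)
```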

πŸ”’ Security & Privacy

  • Local AI: All processing happens locally using Ollama
  • No Data Transmission: Documents stay on your system
  • Container Isolation: Services run in isolated Docker containers
  • Input Sanitization: All inputs are validated and sanitized
  • Session Management: Secure session handling with UUIDs
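Session management of that kind typically comes down to issuing a random UUID per conversation, matching the optional session_id field in the chat API. A minimal sketch:

```python
import uuid


def new_session_id() -> str:
    """Issue an unguessable session identifier.

    UUID4 is randomly generated, so identifiers cannot be
    enumerated or predicted by other clients.
    """
    return str(uuid.uuid4())
```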

πŸ› Troubleshooting

Common Issues

Application won't start

# Check Docker services
docker-compose ps

# Check Ollama status
ollama list

# Verify ports are available
lsof -i :8000,3000,7474,7687

Poor response quality

# Check data ingestion
python3 -c "from scripts.retriever import Retriever; r=Retriever(); print(len(r.retrieve_context('test',1)))"

# Review RLHF thresholds
cat data/persona.json

Database connection errors

# Verify Neo4j is running
curl http://localhost:7474

# Check connection settings
docker-compose logs neo4j

Getting Help

  • Check the Health Endpoint: /api/health for system status
  • Review Logs: Check container logs with docker-compose logs
  • Run Diagnostics: Execute ./test.sh for system diagnostics

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • Neo4j for graph database technology
  • Ollama for local LLM capabilities
  • FastAPI for excellent Python web framework
  • Next.js for modern React development
  • Open Source Community for development tools and libraries

BasicBot - Transforming research analysis with GraphRAG technology πŸš€
