DeepResearchScribe - Intelligent Multi-Round Research Assistant


An advanced multi-round reasoning research tool powered by Large Language Models (LLMs) that automatically decomposes complex queries, executes multi-round searches, and generates in-depth analysis reports.

🎯 Project Overview

DeepResearchScribe is an intelligent research assistant that combines the power of large language models with advanced search capabilities to provide comprehensive research reports on complex topics. The system automatically breaks down complex research queries into multiple focused aspects, conducts iterative searches, and synthesizes the findings into structured, professional reports.
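
Conceptually, a single round of this pipeline looks like the sketch below. The function names and signatures are illustrative stand-ins, not the project's actual API; the real engine lives in src/core/researcher.py.

from typing import Callable, List

def research_once(
    topic: str,
    decompose: Callable[[str], List[str]],        # LLM: topic -> focused sub-questions
    search: Callable[[str], List[str]],           # search tool: sub-question -> findings
    synthesize: Callable[[str, List[str]], str],  # LLM: findings -> structured report
) -> str:
    """One round of the decompose -> search -> synthesize pipeline.
    The real engine repeats this, feeding gaps identified in one round
    back in as new sub-questions for the next."""
    findings: List[str] = []
    for aspect in decompose(topic):
        findings.extend(search(aspect))
    return synthesize(topic, findings)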

🌟 Results Showcase

Main Interface

[Screenshot: clean and intuitive web interface for research queries]

Search Content Analysis

[Screenshot: real-time search results and analysis interface]

Research Process Demo

[Demo: complete research workflow demonstration]

πŸ“„ Sample Output

Check out our Current Development of AI and Current Development of LLMs sample reports in example_report/: comprehensive analyses demonstrating the system's capability to generate detailed, structured research reports on complex topics such as AI development trends, key participants, technological breakthroughs, and strategic implications.

Key Features

  • πŸ” Intelligent Search: Integrated with Jina Search API supporting re-ranking and content filtering
  • 🧠 Multi-Round Reasoning: LLM-powered multi-round search reasoning and analysis
  • πŸ“Š Structured Reports: Automatically decompose complex topics into multiple sections and generate structured reports
  • 🌐 Web Interface: Intuitive Streamlit-based web interface
  • πŸ”§ Flexible Configuration: Support for multiple LLM providers and local model deployment
  • 🎯 Smart Filtering: AI-powered content filtering and relevance scoring

πŸš€ Quick Start

Requirements

  • Python 3.8+
  • Internet connection (for API calls and searches)
  • Optional: GPU for local model deployment

Installation

  1. Clone the repository:

    git clone https://github.com/Kwen-Chen/DeepResearchScribe.git
    cd DeepResearchScribe
  2. Install dependencies:

    pip install -r requirements.txt
  3. Configure environment variables:

    cp .env.example .env
    # Edit .env file and add your API keys

Usage

Web Interface (Recommended)

streamlit run src/ui/streamlit_app.py

Command Line Interface

python scripts/run_cli.py "Your research topic"

Python API

from src.core.researcher import DeepResearcher

researcher = DeepResearcher()
result = researcher.run("Artificial Intelligence in Healthcare")
print(result['final_report'])

πŸ“ Project Structure

DeepResearchScribe/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ researcher.py        # Main research engine
β”‚   β”‚   β”œβ”€β”€ llm_connector.py     # LLM connection handler
β”‚   β”‚   β”œβ”€β”€ search_tool.py       # Search tool integration
β”‚   β”‚   └── integrator.py        # Content integration
β”‚   β”œβ”€β”€ ui/
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   └── streamlit_app.py     # Web interface
β”‚   └── utils/
β”‚       β”œβ”€β”€ __init__.py
β”‚       β”œβ”€β”€ helpers.py           # Utility functions
β”‚       β”œβ”€β”€ parser.py            # Content parsing
β”‚       └── prompt_templates.py  # LLM prompts
β”œβ”€β”€ assets/                      # Demo images and resources
β”œβ”€β”€ config/                      # Configuration files
β”œβ”€β”€ example_report/              # Sample research reports
β”œβ”€β”€ llm_responses/              # Cached LLM responses
β”œβ”€β”€ scripts/                    # Execution scripts
β”œβ”€β”€ tests/                      # Test files
β”œβ”€β”€ requirements.txt            # Dependencies
β”œβ”€β”€ deploy.sh                   # Deployment script
└── README.md                   # Project documentation

πŸ”§ Configuration

Environment Variables

You need to configure the following environment variables in your .env file:

# Required: LLM API Configuration
DEEPSEEK_API_KEY=your_deepseek_api_key_here

# Required: Search API Configuration  
JINA_API_KEY=your_jina_api_key_here

# Optional: Local Model Server (if using local deployment)
VLLM_SERVER_URL=http://localhost:8000

# Optional: Content Filtering (requires LLM key for intelligent filtering)
ENABLE_CONTENT_FILTERING=true
FILTERING_MODEL=deepseek-chat  # or your preferred model
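
How the application consumes these variables is an implementation detail, but a minimal sketch, assuming the common python-dotenv pattern (pip install python-dotenv; not confirmed from the source), looks like this:

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

deepseek_key = os.environ["DEEPSEEK_API_KEY"]   # required
jina_key = os.environ["JINA_API_KEY"]           # required
vllm_url = os.getenv("VLLM_SERVER_URL", "http://localhost:8000")  # optional
filtering = os.getenv("ENABLE_CONTENT_FILTERING", "false").lower() == "true"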

πŸ” Jina Search Setup

DeepResearchScribe uses the Jina Search API for comprehensive web searches with advanced filtering capabilities:

  1. Get Jina API Key: Sign up at Jina AI and obtain your API key
  2. Configure Filtering: For intelligent content filtering, you need to provide an LLM API key (DeepSeek, OpenAI, etc.)
  3. Search Parameters: The system automatically optimizes search queries and applies relevance filtering

# Example search configuration
SEARCH_CONFIG = {
    'max_results': 20,
    'enable_reranking': True,
    'content_filtering': True,
    'relevance_threshold': 0.7
}
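
For reference, a raw Jina Search request, independent of this project's wrapper in src/core/search_tool.py, can be sketched as below. The endpoint and response shape follow Jina's public search API; verify both against the current Jina documentation before relying on them.

import os
from urllib.parse import quote

import requests

query = "climate change impact on agriculture"
resp = requests.get(
    "https://s.jina.ai/" + quote(query),
    headers={
        "Authorization": f"Bearer {os.environ['JINA_API_KEY']}",
        "Accept": "application/json",  # request structured JSON instead of plain text
    },
    timeout=30,
)
resp.raise_for_status()
for hit in resp.json().get("data", []):  # each hit typically carries title/url/content
    print(hit.get("title"), "->", hit.get("url"))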

πŸ–₯️ Local Model Deployment

For enhanced privacy and control, you can deploy local LLM models using the provided deployment script. We recommend using Qwen3 series models for optimal performance and reliability.

🌟 Recommended Models: Qwen3 Series

We recommend the following Qwen3 models based on your hardware capabilities:

Model            Parameters   VRAM Required   Use Case
Qwen/Qwen3-32B   32B          64GB+           Best performance, research servers
Qwen/Qwen3-14B   14B          28GB+           Balanced performance, mid-range GPUs
Qwen/Qwen3-8B    8B           16GB+           Good performance, consumer GPUs

Why Qwen3 Series?

  • Superior Reasoning: Excellent performance in multi-step reasoning and complex analysis
  • Research Optimized: Specifically tuned for research and analytical tasks
  • Multilingual Support: Strong capabilities in both English and Chinese content analysis
  • Long Context: Support for extended context windows (up to 131K tokens)
  • Open Source: Fully open source with permissive licensing for commercial use

Using the Deployment Script

The deploy.sh script includes automated local model deployment with Qwen3:

# Make the script executable
chmod +x deploy.sh

# Run deployment (includes Qwen3 model server setup)
./deploy.sh

Manual Local Model Setup

For custom local model deployment with Qwen3 series:

# Install LMDeploy for model serving
pip install lmdeploy

# Deploy Qwen3-32B model (recommended for servers)
lmdeploy serve api_server Qwen/Qwen3-32B \
    --model-name qwen3-32b \
    --session-len 131000 \
    --server-port 8000 \
    --max-batch-size 1 \
    --cache-max-entry-count 0.7 \
    --tp 4

# Deploy Qwen3-14B model (recommended for mid-range setups)
lmdeploy serve api_server Qwen/Qwen3-14B \
    --model-name qwen3-14b \
    --session-len 131000 \
    --server-port 8000 \
    --max-batch-size 2 \
    --cache-max-entry-count 0.8 \
    --tp 2

# Deploy Qwen3-8B model (recommended for consumer GPUs)
lmdeploy serve api_server Qwen/Qwen3-8B \
    --model-name qwen3-8b \
    --session-len 131000 \
    --server-port 8000 \
    --max-batch-size 4 \
    --cache-max-entry-count 0.9 \
    --tp 1

Configuration for Local Models

Update your .env file for local model usage with Qwen3:

# Local model configuration
USE_LOCAL_MODEL=true
VLLM_SERVER_URL=http://localhost:8000

# Choose your Qwen3 model
LOCAL_MODEL_NAME=qwen3-32b  # or qwen3-14b, qwen3-8b

# Optional: Model-specific settings
MODEL_MAX_TOKENS=131000
MODEL_TEMPERATURE=0.7
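
Because LMDeploy's api_server exposes an OpenAI-compatible endpoint, a quick smoke test of the local server can use the standard openai client (pip install openai); the model name must match the --model-name passed above:

# Smoke test against the local LMDeploy server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # local server ignores the key
reply = client.chat.completions.create(
    model="qwen3-32b",  # must match --model-name
    messages=[{"role": "user", "content": "Summarize the benefits of local LLM deployment."}],
    temperature=0.7,
)
print(reply.choices[0].message.content)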

πŸ“ˆ Usage Examples

Basic Research

from src.core.researcher import DeepResearcher

# Initialize researcher
researcher = DeepResearcher()

# Conduct research
result = researcher.run("Climate change impact on agriculture")

# Access results
print("Final Report:", result['final_report'])
print("Search History:", result['search_history'])
print("Key Findings:", result['key_insights'])

Advanced Configuration

# Custom configuration
config = {
    'max_search_rounds': 10,
    'min_sources_per_topic': 5,
    'report_depth': 'comprehensive',
    'enable_citations': True
}

researcher = DeepResearcher(config=config)
result = researcher.run("Quantum computing applications", config=config)

Batch Processing

# Process multiple research topics
topics = [
    "Renewable energy trends 2024",
    "Artificial intelligence ethics",
    "Space exploration technologies"
]

results = []
for topic in topics:
    result = researcher.run(topic)
    results.append(result)

🌐 Web Interface Features

Main Interface

  • Topic Input: Enter your research query in natural language
  • Configuration Panel: Adjust search parameters and model settings
  • Progress Tracking: Real-time progress updates during research
  • Results Display: Structured presentation of findings

Search Analysis Interface

  • Source Tracking: Monitor search sources and credibility
  • Content Preview: Preview relevant content before integration
  • Relevance Scoring: AI-powered relevance assessment (a minimal sketch follows this list)
  • Citation Management: Automatic citation generation and tracking
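
As a rough illustration of that relevance scoring, a score-and-threshold pass might look like the sketch below; the score callable is a hypothetical stand-in for the project's LLM-based scorer, and the default threshold mirrors relevance_threshold from the search configuration shown earlier.

from typing import Callable, Dict, List

def filter_by_relevance(
    results: List[Dict],
    score: Callable[[str, str], float],  # e.g. an LLM judging query/content fit on 0-1
    query: str,
    threshold: float = 0.7,              # cf. relevance_threshold in SEARCH_CONFIG
) -> List[Dict]:
    """Keep results scoring at or above the threshold, best first."""
    kept = []
    for result in results:
        result["relevance"] = score(query, result.get("content", ""))
        if result["relevance"] >= threshold:
            kept.append(result)
    # Highest-scoring sources first, mirroring re-ranking
    return sorted(kept, key=lambda r: r["relevance"], reverse=True)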

πŸ› Troubleshooting

Common Issues

  1. API Key Errors

    • Verify your API keys are correctly set in .env
    • Check API quotas and rate limits
    • Ensure network connectivity
  2. Local Model Issues

    • Check GPU memory availability
    • Verify model path and permissions
    • Monitor server logs for errors
  3. Search Quality Issues

    • Adjust relevance thresholds
    • Enable content filtering
    • Refine search queries
  4. Performance Optimization

    • Use local models for faster processing
    • Enable response caching (see the sketch after this list)
    • Adjust batch sizes for your hardware
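
On the caching point, the llm_responses/ directory suggests responses are persisted to disk. A minimal sketch of that idea, keyed by a hash of the prompt (illustrative only; the project's actual scheme may differ):

import hashlib
import json
from pathlib import Path
from typing import Callable

CACHE_DIR = Path("llm_responses")

def cached_call(prompt: str, call_llm: Callable[[str], str]) -> str:
    """Return a cached response if this exact prompt was seen before."""
    CACHE_DIR.mkdir(exist_ok=True)
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = CACHE_DIR / f"{key}.json"
    if path.exists():                    # cache hit: skip the API call
        return json.loads(path.read_text())["response"]
    response = call_llm(prompt)          # cache miss: call the model and store the result
    path.write_text(json.dumps({"prompt": prompt, "response": response}))
    return response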

🀝 Contributing

We welcome contributions! Please follow these steps:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Development Setup

# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
python -m pytest tests/

# Format code
black src/ scripts/ tests/

# Lint code
flake8 src/ scripts/ tests/

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

πŸ“ž Contact

For questions or suggestions, please:

  • Submit an issue on GitHub
  • Contact the maintainer directly
  • Join our community discussions

πŸ“š Related Resources

⭐ Star History

If you find this project useful, please consider giving it a star on GitHub!


Note: This tool is designed for research and educational purposes. Always verify information from multiple sources and apply critical thinking to the generated reports.
