A research assistant prototype that helps you quickly find, rank, and summarize scientific papers on a given topic using retrieval-augmented generation (RAG) with OpenAI GPT models.
The system fetches paper abstracts, creates embeddings, performs similarity search (FAISS), reranks results with a cross-encoder, and summarizes relevant papers for your query. Includes a simple Streamlit UI for interactive use.
-
Search scientific papers by topic and preview abstracts.
-
Semantic ranking using FAISS embeddings.
-
cross-encoder reranking for improved relevance.
-
Chunking support for both abstracts and full-text papers (PDFs).
-
Summarization of top relevant papers with an LLM (OpenAI GPT).
-
Streamlit-based interactive UI:
- Enter a topic
- Search abstracts
- View summaries
- Copy or download summaries
- Clone the repository:
git clone https://github.com/spiridonoff/research-assistant.git
cd mini-research-assistant- Create a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows- Install dependencies:
pip install -r requirements.txt- Set your OpenAI API key:
export OPENAI_API_KEY="your_api_key_here" # Linux/Mac
setx OPENAI_API_KEY "your_api_key_here" # WindowsRun the Streamlit app:
./run.shor manually:
export PYTHONPATH=src
streamlit run src/app/main.pyWorkflow:
- Enter a topic to fetch related papers.
- Preview the first few abstracts.
- Enter a research query to search across abstracts.
- View ranked results and summaries.
- Copy or download the summaries for further use.
src/
├─ app/
│ └─ main.py # Streamlit UI
├─ rag/
│ ├─ io/
│ | ├─ fetch_abs.py
│ │ ├─ fetch_papers.py
│ │ └─ text_utils.py
│ ├─ index/
│ │ ├─ build_index_abs.py
│ │ ├─ build_index_paper.py
│ │ ├─ search_abs.py
│ │ └─ search_paper.py
│ ├─ pipelines/
│ │ └─ summarizer.py
├─ config.py # API keys and configuration
run.sh # Launcher script with PYTHONPATH
requirements.txt
- Add selection & download of specific papers.
- Integrate OLMo or other open source LLMs for research summaries.
- Improve prompt design for better summaries.
- Extend UI for follow-up questions using conversational LLM.
- This project is intended as a mini prototype / learning project.
- Designed to be modular: abstracts search, embedding, FAISS indexing, reranking, summarization, and UI can be extended independently.
- OpenAI API usage may incur costs depending on your queries.
This project is licensed under the MIT License. MIT License – feel free to reuse and modify.