Tinker is a full-stack application designed to explore advanced concepts in Retrieval-Augmented Generation (RAG). It features a two-stage, multi-hop RAG pipeline that leverages a hybrid of search techniques (keyword, semantic, and graph-based) to answer complex questions about a large corpus of data—in this case, a subset of Wikipedia.
The system is composed of three main parts:
- AI Service (`ai/`): A Python backend built with Haystack and FastAPI that performs data ingestion, indexing, and the complex two-stage question-answering.
- Backend (`be/`): A lightweight Go WebSocket server that acts as a bridge between the frontend and the AI service.
- Frontend (`fe/`): A responsive React/TypeScript single-page application that provides the user interface for chatting with the AI and viewing information on a map.
The architecture is designed to decouple the user interface from the heavy-lifting AI processing. The Go backend serves as a simple message broker, while the Python service encapsulates all the complex RAG logic and database interactions.
```mermaid
graph TD
subgraph User Browser
A[Frontend - React/Vite]
end
subgraph Backend Services
B[Backend - Go WebSocket Server]
C[AI Service - Python/FastAPI]
end
subgraph Data Stores
D[Redis]
E[Elasticsearch]
F[Weaviate]
G[Neo4j]
end
A -- "WebSocket Connection" --> B
B -- "HTTP POST /ask" --> C
C -- "Read/Write (Tracking)" --> D
C -- "Read/Write (Keyword Search, Chunks)" --> E
C -- "Read/Write (Vector Search)" --> F
C -- "Read/Write (Graph Traversal, Hierarchy)" --> G
C -- "Response" --> B
B -- "WebSocket Message" --> A
style A fill:#F9F,stroke:#333,stroke-width:2px
style B fill:#bbf,stroke:#333,stroke-width:2px
style C fill:#f96,stroke:#333,stroke-width:2px
```
There are two primary workflows in the system: the one-time Data Indexing Workflow and the runtime Question Answering Workflow.
The Data Indexing Workflow downloads, chunks, and indexes Wikipedia content into the various databases. Its entry point is `ai/wiki/index/main.py`.
Execution Flow:
- Initialization (`main.py`):
  - Sets up logging and initializes resources: Redis, Weaviate, Elasticsearch, and a Neo4j driver.
  - Reads configuration from `config.yml`, including the root category (e.g., "Dinosaurs"), data filepaths, and exclusion keywords.
- Download (`Downloader`):
  - The `fetch_wiki_data` function is called recursively.
  - For a given category, it uses the Wikipedia API to get a list of member pages and subcategories (`get_wiki_category_members`).
  - It filters out unwanted pages/categories based on `exclude_keywords`.
  - It downloads the HTML content of each page (`download_page`).
  - It uses Redis to keep track of already downloaded pages and categories to avoid redundant work.
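
  As a rough illustration of this step (the function names come from the project, but the bodies, Redis key names, and API details below are simplified assumptions, not the actual code):

  ```python
  import requests
  import redis

  WIKI_API = "https://en.wikipedia.org/w/api.php"
  r = redis.Redis()

  def get_wiki_category_members(category: str) -> list[dict]:
      """List member pages and subcategories of a Wikipedia category via the MediaWiki API."""
      params = {"action": "query", "list": "categorymembers",
                "cmtitle": f"Category:{category}", "cmlimit": "500", "format": "json"}
      resp = requests.get(WIKI_API, params=params, timeout=30)
      resp.raise_for_status()
      return resp.json()["query"]["categorymembers"]

  def fetch_wiki_data(category: str, exclude_keywords: list[str]) -> None:
      """Recursively walk a category tree, skipping anything already tracked in Redis."""
      if r.sismember("seen:categories", category):
          return
      r.sadd("seen:categories", category)
      for member in get_wiki_category_members(category):
          title = member["title"]
          if any(kw.lower() in title.lower() for kw in exclude_keywords):
              continue  # filtered out by the exclusion keywords
          if member["ns"] == 14:  # namespace 14 = subcategory
              fetch_wiki_data(title.removeprefix("Category:"), exclude_keywords)
          elif not r.sismember("seen:pages", title):
              # download_page(title) would fetch and save the page HTML here
              r.sadd("seen:pages", title)
  ```
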
- Chunk (`Chunker`):
  - The `chunk_wiki_data` function processes the downloaded HTML files.
  - It uses a custom Haystack component, `WikiPageChunker` (`ai/lib/wiki/index/chunk/wiki_page_chunker.py`). This key component:
    - Parses the HTML, splitting it into text chunks (based on `<p>` and `<li>` tags).
    - Simultaneously creates a JSON object representing the page's structural hierarchy (e.g., Page -> H2 Section -> H3 Section).
    - Assigns a unique UUID to each chunk.
  - The chunked documents and the hierarchy JSON are saved to the `.metadata/chunk` directory.
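
  To give a feel for the chunker's two outputs, here is a heavily simplified stand-in for `WikiPageChunker` written against the Haystack 2.x custom-component API; the hierarchy shape shown is illustrative, not the project's exact schema:

  ```python
  import uuid
  from haystack import Document, component

  @component
  class WikiPageChunkerSketch:
      """Simplified stand-in for WikiPageChunker; the real component lives in
      ai/lib/wiki/index/chunk/wiki_page_chunker.py."""

      @component.output_types(documents=list[Document], hierarchy=dict)
      def run(self, html: str, title: str):
          documents: list[Document] = []
          # Illustrative hierarchy shape, e.g.
          # {"Dinosaur": {"Paleobiology": {"Size": {"chunks": ["<uuid>", ...]}}}}
          hierarchy: dict = {title: {}}
          # A real implementation walks <h2>/<h3>/<h4> headings and <p>/<li> tags,
          # emitting one Document per text block and recording where it sits:
          chunk_id = str(uuid.uuid4())
          documents.append(Document(id=chunk_id, content="(chunk text)", meta={"title": title}))
          hierarchy[title].setdefault("chunks", []).append(chunk_id)
          return {"documents": documents, "hierarchy": hierarchy}
  ```
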
- Index (`Indexer`):
  - The `index_wiki_data` function orchestrates writing the chunked data to the three databases.
  - Elasticsearch: The raw text chunks are written to Elasticsearch for fast, full-text (BM25) search via `ElasticsearchDocumentStore`.
  - Weaviate:
    - An `OpenAIDocumentEmbedder` creates vector embeddings for each chunk.
    - The chunks, now enriched with embeddings, are written to Weaviate for semantic/vector search via `WeaviateDocumentStore`.
  - Neo4j:
    - `Neo4jPageGraphCreator` reads the hierarchy JSON file for each page and translates this structure into a detailed graph, creating `Page`, `Section` (with `h2`, `h3`, `h4` labels), and `Chunk` nodes. It connects them with relationships like `:HAS_SECTION`, `:NEXT_SECTION`, `:FIRST_CHUNK`, and `:NEXT_CHUNK` to preserve order and structure.
    - `Neo4jCategoryGraphCreator` then adds another layer, connecting `Category` nodes to their respective `Page` and sub-`Category` nodes.
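
  For the Neo4j layer, the kind of Cypher involved looks roughly like this (a sketch using the official Python driver; node properties, batching, and connection details are placeholders, with the relationship names taken from the description above):

  ```python
  from neo4j import GraphDatabase

  # Placeholder connection details; the real values come from config.yml / .env.
  driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

  def write_section(page_title: str, section_name: str, chunk_uuids: list[str]) -> None:
      """Create one Page -> Section -> Chunk path, preserving chunk order."""
      with driver.session() as session:
          session.run(
              "MERGE (p:Page {title: $page}) "
              "MERGE (s:Section:h2 {name: $section, page: $page}) "
              "MERGE (p)-[:HAS_SECTION]->(s)",
              page=page_title, section=section_name,
          )
          previous = None
          for i, chunk_uuid in enumerate(chunk_uuids):
              session.run("MERGE (:Chunk {uuid: $uuid})", uuid=chunk_uuid)
              if i == 0:
                  session.run(
                      "MATCH (s:Section {name: $section, page: $page}), (c:Chunk {uuid: $uuid}) "
                      "MERGE (s)-[:FIRST_CHUNK]->(c)",
                      section=section_name, page=page_title, uuid=chunk_uuid,
                  )
              else:
                  session.run(
                      "MATCH (a:Chunk {uuid: $prev}), (b:Chunk {uuid: $uuid}) "
                      "MERGE (a)-[:NEXT_CHUNK]->(b)",
                      prev=previous, uuid=chunk_uuid,
                  )
              previous = chunk_uuid
  ```
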
The Question Answering Workflow is the core two-stage RAG pipeline, orchestrated by `ai/wiki/rag/main.py` and implemented in `ai/wiki/rag/question_answer_async.py`.
```mermaid
graph TD
%% User Interaction
subgraph User Interaction
U[User asks question]
end
%% AI RAG Pipeline
subgraph AI RAG Pipeline
Start[Start]
%% Stage 1
subgraph Stage 1 Hybrid Retrieval and Initial Answer
P1_Embed[Text Embedder]
P1_Weaviate[Weaviate Embedding Retriever]
P1_ES[Elasticsearch BM25 Retriever]
P1_Join[Document Joiner using Rank Fusion]
P1_Prompt[Phase One Prompt Builder]
P1_LLM[LLM Phase One QA]
P1_Decision[Need More Context Decision]
end
%% Stage 2
subgraph Stage 2 Graph Powered Context Expansion
P2_HierBuilder[Wiki Hierarchy Builder using Neo4j]
P2_LLM_Hier[LLM Predict Hierarchy Path]
P2_CtxCreator[Wiki Context Creator using Elasticsearch]
P2_Prompt[Phase Two Prompt Builder]
P2_LLM_Final[LLM Final QA]
end
FinalAnswer[Final Answer with References]
end
%% Flow Connections
U --> Start
Start -->|Question| P1_Embed
Start -->|Question| P1_ES
P1_Embed --> P1_Weaviate
P1_Weaviate -->|Docs| P1_Join
P1_ES -->|Docs| P1_Join
P1_Join -->|Grounding Docs| P1_Prompt
P1_Prompt --> P1_LLM
P1_LLM --> P1_Decision
P1_Decision -->|No| FinalAnswer
P1_Decision -->|Yes| P2_HierBuilder
P1_Join -->|Grounding Docs| P2_HierBuilder
P2_HierBuilder -->|Page Hierarchies| P2_LLM_Hier
Start -->|Question| P2_LLM_Hier
P2_LLM_Hier -->|Predicted Paths| P2_CtxCreator
P2_HierBuilder -->|Chunk Hierarchies| P2_CtxCreator
P2_CtxCreator -->|Expanded Context| P2_Prompt
Start -->|Question| P2_Prompt
P2_Prompt --> P2_LLM_Final
P2_LLM_Final --> FinalAnswer
%% Link Styles
linkStyle 10 stroke:#f00,stroke-width:2px,color:red
linkStyle 11 stroke:#0f0,stroke-width:2px,color:green
```
Execution Flow:
- Request (`fe/web/src/components/Argonk.tsx` -> `be/.../main.go` -> `ai/.../main.py`):
  - The user types a question in the `Argonk` chat interface and clicks send.
  - The React component sends the prompt over a WebSocket to the Go backend.
  - The Go backend forwards the question via an HTTP POST request to the FastAPI endpoint `/ask` on the AI service.
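
  For debugging, the AI service can be exercised directly, bypassing the frontend and the Go backend. The request body shape below is an assumption; check the `/ask` endpoint in `ai/wiki/rag/main.py` for the actual schema:

  ```python
  import requests

  resp = requests.post(
      "http://localhost:8000/ask",
      json={"question": "How large did the biggest dinosaurs grow?"},  # hypothetical field name
      timeout=120,
  )
  print(resp.json())
  ```
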
- Stage 1: Hybrid Retrieval:
  - The `ask` function in `question_answer_async.py` is invoked.
  - The question is simultaneously sent to two retrievers:
    - `OpenAITextEmbedder` -> `WeaviateEmbeddingRetriever`: for semantic search.
    - `ElasticsearchBM25Retriever`: for keyword search.
  - `DocumentJoiner`: The results from both retrievers are merged and re-ranked using reciprocal rank fusion to get the best initial set of document chunks (the grounding docs).
  - `PromptBuilder` & `OpenAIGenerator` (Phase 1 LLM): The grounding docs are formatted into a prompt (using the `phase_1_qa.py` template) and sent to the LLM. The prompt explicitly asks the LLM to answer the question and to determine whether the context is sufficient, returning a structured JSON response conforming to the `Phase1QA` Pydantic model.
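
  In Haystack 2.x terms, the hybrid-retrieval stage is wired up roughly like this (a sketch, not the project's exact pipeline definition; hosts and ports are placeholders):

  ```python
  from haystack import Pipeline
  from haystack.components.embedders import OpenAITextEmbedder
  from haystack.components.joiners import DocumentJoiner
  from haystack_integrations.components.retrievers.elasticsearch import ElasticsearchBM25Retriever
  from haystack_integrations.components.retrievers.weaviate import WeaviateEmbeddingRetriever
  from haystack_integrations.document_stores.elasticsearch import ElasticsearchDocumentStore
  from haystack_integrations.document_stores.weaviate import WeaviateDocumentStore

  es_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
  weaviate_store = WeaviateDocumentStore(url="http://localhost:8088")  # port is illustrative

  pipeline = Pipeline()
  pipeline.add_component("embedder", OpenAITextEmbedder())
  pipeline.add_component("semantic", WeaviateEmbeddingRetriever(document_store=weaviate_store))
  pipeline.add_component("keyword", ElasticsearchBM25Retriever(document_store=es_store))
  pipeline.add_component("joiner", DocumentJoiner(join_mode="reciprocal_rank_fusion"))

  pipeline.connect("embedder.embedding", "semantic.query_embedding")
  pipeline.connect("semantic.documents", "joiner.documents")
  pipeline.connect("keyword.documents", "joiner.documents")

  question = "How large did the biggest dinosaurs grow?"
  result = pipeline.run({"embedder": {"text": question}, "keyword": {"query": question}})
  grounding_docs = result["joiner"]["documents"]
  ```
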
- Decision Point:
  - The system parses the LLM's JSON response.
  - If `need_more_context` is `false`, the answer is considered complete and the workflow jumps to the end, formatting the response with references.
  - If `need_more_context` is `true`, the system proceeds to Stage 2.
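
  The `Phase1QA` model has roughly this shape; apart from `need_more_context`, the field names here are guesses based on the flow described above (see `rag/models/` for the real definitions):

  ```python
  from pydantic import BaseModel

  class Phase1QA(BaseModel):
      """Illustrative shape of the Phase 1 structured response."""
      answer: str                    # the LLM's draft answer
      need_more_context: bool        # True -> proceed to Stage 2
      document_ids: list[str] = []   # chunk IDs cited by the answer (field name is a guess)

  # The generator's JSON reply is validated before the decision is made:
  phase1 = Phase1QA.model_validate_json(llm_reply_text)  # llm_reply_text: raw JSON from the LLM
  if phase1.need_more_context:
      ...  # run Stage 2; otherwise format the answer with references
  ```
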
- Stage 2: Graph-Powered Context Expansion:
  - `WikiHierarchyBuilder`: Takes the grounding docs from Stage 1. For each unique page title in those docs, it queries Neo4j to reconstruct the full section hierarchy of that page.
  - `PromptBuilder` & `OpenAIGenerator` (Hierarchy Predictor LLM): The reconstructed hierarchies and the original question are sent to the LLM (using the `phase_2_hierarchy.py` template). This LLM's sole job is to predict which path within the hierarchy (e.g., `["Dinosaur", "Paleobiology", "Size"]`) is most likely to contain the answer. It returns this path in a structured JSON format (the `HierarchyPathData` model).
  - `WikiContextCreator`:
    - Takes the predicted path(s) from the LLM.
    - For each path, it navigates the corresponding `chunks_hierarchy` to find all the chunk UUIDs under that path.
    - It then fetches the full content of these chunks from Elasticsearch (using `document_store.filter_documents`).
    - It assembles these chunks into a single, structurally coherent block of text, creating a highly relevant, expanded context.
  - `PromptBuilder` & `OpenAIGenerator` (Final QA LLM): The expanded context is used to build a final prompt (using the `phase_2_qa.py` template). The LLM generates the final, comprehensive answer from this rich context, again providing document IDs for citation.
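
  Conceptually, `WikiContextCreator` does something like the following (simplified; the hierarchy schema, the exact filter field, and the `chunks_hierarchy`/`document_store` objects are assumptions for illustration):

  ```python
  def collect_chunk_ids(chunks_hierarchy: dict, path: list[str]) -> list[str]:
      """Walk the nested hierarchy along the predicted path and gather chunk UUIDs."""
      node = chunks_hierarchy
      for key in path:
          node = node[key]
      ids: list[str] = []

      def gather(n: dict) -> None:
          ids.extend(n.get("chunks", []))
          for child in n.values():
              if isinstance(child, dict):
                  gather(child)

      gather(node)
      return ids

  chunk_ids = collect_chunk_ids(chunks_hierarchy, ["Dinosaur", "Paleobiology", "Size"])
  docs = document_store.filter_documents(
      filters={"field": "id", "operator": "in", "value": chunk_ids}  # Haystack 2.x filter syntax
  )
  docs.sort(key=lambda d: chunk_ids.index(d.id))  # keep the page's original reading order
  expanded_context = "\n\n".join(d.content for d in docs)
  ```
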
- Response (`ai/...` -> `be/...` -> `fe/...`):
  - `build_answer_with_reference`: The final JSON answer is processed. The document IDs are used to look up the metadata (title, h2, h3, etc.) of the source chunks.
  - The final payload, containing the answer text and a list of structured references, is sent back to the Go backend, which relays it to the frontend via WebSocket.
  - `Argonk.tsx`: The `onmessage` handler parses the response. It uses the reference metadata to construct clickable Wikipedia URLs (`https://en.wikipedia.org/wiki/Page_Title#Section_Name`; see the sketch below) and displays the answer and references to the user.
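
The reference links follow the standard Wikipedia anchor convention. The actual logic lives in `Argonk.tsx`; it is shown here in Python for brevity, with assumed metadata field names:

```python
def reference_url(meta: dict) -> str:
    """Build a clickable Wikipedia URL from a chunk's metadata (title, h2, h3, ...)."""
    page = meta["title"].replace(" ", "_")
    section = meta.get("h3") or meta.get("h2")
    if section:
        return f"https://en.wikipedia.org/wiki/{page}#{section.replace(' ', '_')}"
    return f"https://en.wikipedia.org/wiki/{page}"

print(reference_url({"title": "Dinosaur", "h2": "Paleobiology", "h3": "Size"}))
# -> https://en.wikipedia.org/wiki/Dinosaur#Size
```
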
- `wiki/index/`: The data ingestion and indexing pipeline.
  - `main.py`: Entry point for the entire indexing process.
  - `downloader.py`: Handles downloading Wikipedia pages.
  - `chunker.py`: Orchestrates the chunking process.
  - `indexer.py`: Orchestrates writing data to the three databases.
- `wiki/rag/`: The question-answering RAG pipeline.
  - `main.py`: FastAPI entry point; initializes pipelines and resources.
  - `question_answer_async.py`: Core asynchronous implementation of the two-stage RAG logic.
- `lib/wiki/`: Reusable components and helpers for the `ai/` service.
  - `index/chunk/`: Logic for chunking. `wiki_page_chunker.py` is the custom Haystack component.
  - `index/download/`: Helpers for the downloader.
  - `index/graph/`: `page_graph_creator.py` and `category_graph_creator.py` contain the logic for building the Neo4j graph.
  - `rag/components/`: Custom Haystack components for the RAG pipeline, such as `wiki_hierarchy_builder.py` and `wiki_context_creator.py`.
  - `rag/helpers/`: Utility functions for RAG, e.g., processing hierarchies.
  - `rag/models/`: Pydantic models (`Phase1QA`, `Phase2QA`, etc.) for structured LLM input/output.
  - `rag/pipelines/`: Defines the Haystack `Pipeline` objects for the hybrid and graph stages.
  - `rag/templates/`: Contains the large, detailed prompt templates for the LLMs.
- `config.yml`: Central configuration for filepaths, model names, API endpoints, etc.
- `docker-compose.databases.yml`: Defines the four backing data services.
- `chat/cmd/ws/main.go`: A single-file application that:
  - Creates a WebSocket server at `/ws`.
  - Listens for incoming messages from the frontend.
  - Forwards these messages as HTTP POST requests to the Python AI service.
  - Receives the HTTP response and sends it back to the frontend over the WebSocket.
- `src/`:
  - `App.tsx`: The main layout component, splitting the screen between `Knogra` and `Argonk`.
  - `components/Argonk.tsx`: The main chat interface. It manages the WebSocket connection, handles user input, displays the conversation history, and, crucially, parses the AI's response to create clickable reference links.
  - `main.tsx`: The entry point for the React application.
- Prerequisites:
  - Docker and Docker Compose
  - Python 3.10+ and pip
  - Go 1.23+
  - Node.js and npm/yarn
- Configuration:
  - Navigate to the `ai/` directory.
  - Copy `.env.sample` to `.env` and fill in your `NEO4J_PASSWORD` and `OPENAI_API_KEY`.
  - Open `ai/config.yml` and verify that the filepaths under `index.filepath`, `rag.log_filepath`, etc., match your system's directory structure.
  - Open `ai/docker-compose.databases.yml` and ensure the `device` paths for the volumes point to valid directories on your host machine where you want to store the database data.
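
  At a minimum, the resulting `.env` contains these two variables (values are placeholders):

  ```sh
  NEO4J_PASSWORD=your-neo4j-password
  OPENAI_API_KEY=sk-...
  ```
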
- Launch Databases: From the `ai/` directory, start all data services:

  ```sh
  docker compose -f docker-compose.databases.yml up -d
  ```
- Install AI Dependencies: From the `ai/` directory, create a virtual environment and install packages:

  ```sh
  python -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt  # (Assuming a requirements.txt exists, otherwise install from imports)
  ```
- Install Backend Dependencies: From the `be/chat/` directory:

  ```sh
  go mod tidy
  ```
- Install Frontend Dependencies: From the `fe/web/` directory:

  ```sh
  npm install
  ```
You need to run each part of the stack in a separate terminal.
- Run the Indexing Pipeline (One-time setup): From the `ai/` directory (with the virtual environment activated):

  ```sh
  python wiki/index/main.py
  ```

  This will take a significant amount of time, as it downloads, chunks, and indexes thousands of pages.
- Start the AI RAG Service: From the `ai/` directory (with the virtual environment activated):

  ```sh
  fastapi dev wiki/rag/main.py
  ```

  This will start the FastAPI server, typically on `http://localhost:8000`.
- Start the Go WebSocket Backend: From the `be/chat/` directory:

  ```sh
  go run ./cmd/ws/main.go
  ```

  This will start the WebSocket server, typically on `localhost:8080`.
- Start the React Frontend: From the `fe/web/` directory:

  ```sh
  npm run dev
  ```

  This will start the Vite development server and open the application in your browser, typically at `http://localhost:5173`. You can now interact with the chat interface.