This project is a lightweight local chatbot built using llama.cpp with a Gradio web interface.
It supports multiple open-source language models, allows switching between them instantly, and includes full chat-history management — all running fully offline on your own machine.
- TinyLlama-1.1B-Chat (Q8_0) — ultra-fast and lightweight
- Mistral-7B-Instruct (Q2_K) — stronger reasoning with low memory usage
- DeepSeek-R1-Qwen3-8B (Q4_K_XL) — deeper answers with slower speed
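All three are quantized GGUF files run locally through llama.cpp. As a rough illustration (assuming the llama-cpp-python bindings; the model path and parameters are placeholders, not the project's actual configuration), loading one of them looks like this:

```python
# Minimal sketch of loading a quantized GGUF model with llama-cpp-python.
# Path and parameters are placeholders; download.sh decides the real layout.
from llama_cpp import Llama

llm = Llama(
    model_path="models/tinyllama-1.1b-chat-v1.0.Q8_0.gguf",
    n_ctx=2048,    # context window size
    n_threads=4,   # CPU threads to use
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(reply["choices"][0]["message"]["content"])
```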
- Clean, responsive browser interface
- One-click model switching
- Smooth message display
- Download current conversation as JSON
- Upload & load past chat histories
- All files are saved inside the history/ directory (see the sketch after this list)
- No API calls, no network dependency
- Ideal for private, offline, or on-device use
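For reference, saving and loading a conversation under history/ can be as simple as dumping the message list to JSON; the function and file names below are illustrative assumptions, not the project's actual helpers:

```python
# Illustrative sketch of chat-history persistence; names are assumptions,
# not the actual helpers in chatbot.py.
import json
from pathlib import Path

HISTORY_DIR = Path("history")
HISTORY_DIR.mkdir(exist_ok=True)

def save_history(messages, name="chat.json"):
    """Write a list of {"role": ..., "content": ...} messages to history/."""
    (HISTORY_DIR / name).write_text(json.dumps(messages, indent=2))

def load_history(name="chat.json"):
    """Load a previously saved conversation back into memory."""
    return json.loads((HISTORY_DIR / name).read_text())
```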
mini_chatbot/
├── app.py # Main launcher for the Gradio UI
├── chatbot.py # Backend logic + llama.cpp wrapper
├── download.sh # Downloads all model files
├── requirements.txt # Python dependencies
├── history/ # Stored chat histories (JSON)
└── ui/ # UI helper components
- Clone the repository
git clone https://github.com/your-username/mini_chatbot.git
cd mini_chatbot
- (Optional) Create a virtual environment
python -m venv .venv
source .venv/bin/activate
- Install dependencies
pip install -r requirements.txt
- Download the models using the included script:
chmod +x download.sh
./download.sh
This fetches all required GGUF model files into the proper folders.
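If you prefer to fetch a single model manually instead, huggingface_hub can download individual GGUF files; the repo and file names below are placeholders, since download.sh defines the actual sources:

```python
# Manual alternative to download.sh; repo_id and filename are placeholders.
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q8_0.gguf",
    local_dir="models",
)
```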
Start the Gradio UI:
python app.py
Then open:
http://127.0.0.1:7860
You can now:
✔ Select a model
✔ Chat normally
✔ Save / load chat history
✔ Switch models mid-session
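For context, the sketch below shows one way a Gradio chat function can wrap a llama.cpp model; it is illustrative only, and app.py's real wiring, function names, and model paths may differ:

```python
# Illustrative Gradio + llama.cpp wiring (assumes a recent Gradio version
# and the llama-cpp-python bindings); not the project's actual app.py.
import gradio as gr
from llama_cpp import Llama

llm = Llama(model_path="models/mistral-7b-instruct.Q2_K.gguf", n_ctx=2048)  # placeholder path

def respond(message, history):
    # With type="messages", history is a list of {"role", "content"} dicts.
    messages = [{"role": m["role"], "content": m["content"]} for m in history]
    messages.append({"role": "user", "content": message})
    out = llm.create_chat_completion(messages=messages, max_tokens=256)
    return out["choices"][0]["message"]["content"]

gr.ChatInterface(respond, type="messages").launch(server_name="127.0.0.1", server_port=7860)
```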
- TinyLlama is best for speed and quick replies.
- Mistral and DeepSeek-R1-Qwen3 produce better quality and depth, but take more processing time, especially on CPU.
- History files are standard JSON and can be edited directly.
- All computation is local, making the app suitable for private or offline applications.



