A production-ready notes application with audio transcription, built with:
- Frontend: Vite + React + TypeScript + Tailwind CSS
- Backend: Flask + OpenAI gpt-4o-mini-transcribe (API-based)
- Python tooling: uv (fast package manager from Astral)
- Deployment-friendly: no GPU required, pure API calls
Complete documentation: see the `docs/` folder:
- Environment Setup - OpenAI API keys, configuration
- Audio clips (S3) - Optional audio storage + upload/playback flow
- Technical Specification - AI categorization architecture
- Documentation Index - Full documentation catalog
- Node.js 20.19+ or 22.12+ (Vite requirements)
- Python 3.11+
- uv (install with `curl -LsSf https://astral.sh/uv/install.sh | sh`)
- OpenAI API key (for transcription + AI categorization) - see environment setup
```bash
cd backend

# Install dependencies (creates .venv and installs from pyproject.toml)
uv sync

# Create .env file with your OpenAI API key
echo "OPENAI_API_KEY=sk-your-key-here" > .env

# Optional: enable audio clips (S3-backed)
# echo "AUDIO_CLIPS_ENABLED=true" >> .env
# echo "S3_BUCKET=your-bucket-name" >> .env

# Start Flask server on http://localhost:5001
./run.sh
```

First run: no model download needed! Transcription uses the OpenAI API.
Open a new terminal:
```bash
cd frontend

# Install Node dependencies
npm install

# Start Vite dev server on http://localhost:5173
npm run dev
```

Visit http://localhost:5173 and record or upload audio (MP3, WAV, WebM, M4A, etc.) to see the transcription.
```
chisos/
├── README.md              # This file
├── .gitignore
├── Makefile               # Optional: run both servers with 'make dev'
│
├── backend/               # Flask + OpenAI API
│   ├── pyproject.toml     # uv project definition
│   ├── uv.lock            # Dependency lock file (auto-generated)
│   ├── run.sh             # Dev server launcher
│   ├── wsgi.py            # WSGI entry point
│   └── app/
│       ├── __init__.py    # Flask app factory with CORS
│       ├── asr.py         # OpenAI transcription API client
│       ├── routes.py      # REST API endpoints (11 total)
│       └── services/      # AI categorization + storage
│
└── frontend/              # Vite + React + TS + Tailwind
    ├── package.json
    ├── vite.config.ts
    ├── tailwind.config.ts
    ├── postcss.config.js
    ├── tsconfig.json
    ├── index.html
    ├── .env.local         # API URL: VITE_API_URL=http://localhost:5001
    └── src/
        ├── main.tsx
        ├── App.tsx
        ├── index.css      # Tailwind directives
        └── components/
            └── AudioUploader.tsx  # Upload UI + transcription display
```
- Transcription (`app/asr.py`; see the sketch after this list):
  - Uses the OpenAI gpt-4o-mini-transcribe API
  - Supports MP3, WAV, WebM, M4A, MP4, MPEG, MPGA (up to 25 MB)
  - No local model download required
  - Fast, reliable, API-based transcription
- AI Categorization (`app/services/ai_categorizer.py`):
  - Uses OpenAI gpt-4o-mini for semantic analysis
  - Generates folder paths, filenames, and tags automatically
  - Structured JSON outputs for reliability
- Storage (`app/services/storage.py`):
  - SQLite database with FTS5 full-text search
  - Database-only storage (no file system)
  - CRUD operations, folder hierarchy, tag management
- REST API (`app/routes.py`):
  - Endpoints for transcription, notes, folders, tags, search, and Ask Notes
  - Returns JSON responses with comprehensive metadata
- CORS: enabled via `flask-cors` so the Vite dev server (`:5173`) can call Flask (`:5001`)
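
For orientation, here is a minimal sketch of what the transcription call boils down to; `transcribe_file` is a hypothetical helper, not the project's actual `app/asr.py`:

```python
# Minimal transcription sketch using the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def transcribe_file(path: str) -> str:
    """Send one audio file (up to 25 MB) to gpt-4o-mini-transcribe."""
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="gpt-4o-mini-transcribe",
            file=audio,
        )
    return result.text
```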
- Vite: fast dev server with HMR
- React + TypeScript: type-safe components
- Tailwind CSS: utility-first styling
- AudioUploader component: file upload → POST to `/api/transcribe` → display transcript + metadata
Request:
- Multipart form-data with a `file` field, OR
- Raw audio bytes in the request body
Response:

```json
{
  "text": "transcribed speech text",
  "meta": {
    "device": "openai-api",
    "model": "gpt-4o-mini-transcribe",
    "language": "en",
    "duration": 3.45
  },
  "categorization": {
    "note_id": "abc123",
    "folder_path": "Ideas/Product",
    "filename": "new_feature_idea.txt",
    "tags": ["product", "feature"],
    "confidence": 0.95,
    "reasoning": "This appears to be a product feature idea..."
  }
}
```

Error:
```json
{
  "error": "error message"
}
```
Response:

```json
{
  "status": "ok"
}
```

Ask a natural-language question about your notes. The backend creates a structured query plan and performs hybrid retrieval (filters + full-text search + embeddings) before generating a markdown answer with sources.
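
A client call might look like this sketch; the `/api/ask` route and the `question`/`answer` field names are assumptions, since the exact contract isn't spelled out in this README:

```python
# Hypothetical Ask Notes request; route and field names are assumed.
import requests

resp = requests.post(
    "http://localhost:5001/api/ask",  # assumed route name
    json={"question": "What product ideas did I record last week?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["answer"])  # assumed field: markdown answer with sources
```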
Backend:

```bash
cd backend

# Install/update dependencies
uv sync

# Add a new dependency
uv add <package-name>

# Run Flask dev server
./run.sh

# Or manually:
uv run flask run --host 0.0.0.0 --port 5001
```

Frontend:

```bash
cd frontend

# Install dependencies
npm install

# Dev server (http://localhost:5173)
npm run dev

# Production build
npm run build

# Preview production build
npm run preview

# Lint
npm run lint
```

Both at once (requires the optional Makefile):

```bash
# Start both backend and frontend
make dev
```

- No GPU required: pure API-based transcription
- No model downloads: everything runs via the OpenAI API
- Lightweight: only ~30 Python packages (vs. 166 with local models)
- Platform-agnostic: works on any OS with Python 3.11+
- Easy scaling: the API handles all compute; just scale your Flask app

Check API usage via the `meta.model` field in responses.
"OPENAI_API_KEY is required"
- Create
backend/.envfile with your API key - Get key from: https://platform.openai.com/api-keys
- See environment setup
Ask Notes returns 500
- Verify
OPENAI_API_KEYis set - Ensure your OpenAI project has access to the configured models:
OPENAI_MODEL(default:gpt-4o-mini)OPENAI_EMBEDDING_MODEL(default:text-embedding-3-small)
Transcription fails with 401 error
- Check your API key is valid
- Ensure you have credits/billing set up on OpenAI
Transcription too slow
- OpenAI API typically responds in 1-3 seconds
- Check your internet connection
- Verify API status: https://status.openai.com/
CORS errors
- Ensure backend is running on
:5001 - Check
frontend/.env.localhasVITE_API_URL=http://localhost:5001
Build errors
- Delete
node_modules/and runnpm installagain - Verify Node version:
node -v(should be 20.19+ or 22.12+)
Audio recording/upload fails
- Check browser console for errors
- Ensure microphone permissions are granted
- OpenAI supports: MP3, MP4, MPEG, MPGA, M4A, WAV, WebM (max 25MB)
- CORS: restrict origins in production by editing `app/__init__.py` (see the sketch below this list)
- File size limits: add max file size checks in `routes.py`
- Rate limiting: use `flask-limiter`
- Authentication: add API keys or OAuth
- Streaming: implement chunked upload + streaming transcription (the OpenAI API supports `stream=True`; see the model notes below)
- Deployment: use Gunicorn/uWSGI instead of the Flask dev server:

```bash
uv add gunicorn
uv run gunicorn -w 4 -b 0.0.0.0:5001 wsgi:app
```
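
A minimal sketch of the CORS restriction, assuming an app factory in `app/__init__.py` (the `create_app` name and the example origin are illustrative, not project code):

```python
# Lock CORS down to the deployed frontend origin instead of allowing "*".
from flask import Flask
from flask_cors import CORS

def create_app() -> Flask:
    app = Flask(__name__)
    # Only this origin may call /api/* routes; adjust to your domain.
    CORS(app, resources={r"/api/*": {"origins": ["https://notes.example.com"]}})
    return app
```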
- Build for production: `npm run build` (output in `frontend/dist/`)
- Environment variables: set `VITE_API_URL` to the production backend URL
- Static hosting: deploy `dist/` to Vercel, Netlify, or Cloudflare Pages
- API proxy: configure a Vite proxy in production or use Nginx
- Cost monitoring: track OpenAI API usage on the dashboard
- Model upgrade: switch to `gpt-4o-transcribe` for higher quality (more expensive)
- Diarization: use `gpt-4o-transcribe-diarize` for speaker labels
- Streaming: enable `stream=True` for real-time transcription
- Prompting: add custom prompts to improve accuracy for specific domains (see the sketch after this list)
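
A sketch combining the upgrade, prompting, and streaming options above; parameter support varies by model, and the file name and prompt text are illustrative, so check the OpenAI speech-to-text docs before relying on this:

```python
# Stream a higher-quality transcription with a domain-vocabulary prompt.
from openai import OpenAI

client = OpenAI()

with open("meeting.m4a", "rb") as audio:
    events = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",                # higher-quality model
        file=audio,
        prompt="Vocabulary: Chisos, FTS5, Vite",  # domain hints
        stream=True,                              # incremental results
    )
    for event in events:
        if event.type == "transcript.text.delta":
            print(event.delta, end="", flush=True)  # partial text as it arrives
```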
| Component | Link |
|---|---|
| OpenAI Transcription | Speech to Text Docs |
| GPT-4o-mini | Model Docs |
| TanStack Query | React Query Docs |
| SQLite FTS5 | Full-Text Search |
| Vite | Getting Started |
| Tailwind CSS | Vite Setup |
| Flask | Quickstart |
| uv | Installation |
- ✅ Vite frontend with Tailwind for recording/uploading audio
- ✅ OpenAI API transcription (gpt-4o-mini-transcribe)
- ✅ AI-powered categorization (gpt-4o-mini)
- ✅ SQLite database storage with FTS5 search
- ✅ 11 REST API endpoints for full CRUD
- ✅ Split-pane layout with folder navigation
- ✅ TanStack Query for state management
- ✅ Keyboard navigation and accessibility
- ✅ CORS configured for local dev
- ✅ Deployment-ready (no GPU required)
- Speaker diarization (gpt-4o-transcribe-diarize)
- Streaming transcription with WebSocket
- Note editing UI with inline updates
- Bulk operations (move, delete, export)
- Export as Markdown/PDF
- User authentication & multi-user support
- PostgreSQL for production
- Docker Compose setup
- CI/CD pipeline (GitHub Actions)
- Real-time collaboration
This is a proof-of-concept template.
Built with ❤️ for easy deployment