Transform PDFs into structured data with AI-powered extraction.
PDFlow is a full-stack PDF extraction tool that uses multimodal AI to extract and structure content from PDF documents. Whether you need documentation in Markdown, data in JSON, or reports in HTML, PDFlow offers extraction through a web UI, a CLI, and AI agent integration.
View Full Documentation | Quick Start | API Reference
- PDF Upload: Intuitive drag-and-drop PDF upload interface
- CLI Support: Headless PDF processing via command-line interface for automation
- Image Conversion: Converts PDF pages to WebP images using pdftocairo
- AI Extraction: Uses Google Gemini 2.0 Flash multimodal AI for intelligent extraction
- Multiple Formats: Export results in Markdown, MDX, JSON, XML, YAML, HTML, or CSV
- Visual Progress: 4-step visual tracker (Upload → Convert → Extract → Done) with real-time updates
- Rich Previews: Rendered Markdown previews with syntax highlighting and formatting
- Threaded Output: View results as they complete with real-time streaming
- Dark Mode: Beautiful dark mode support with localStorage persistence
- Minimal Design: Clean black/white/grey aesthetic inspired by shadcn/ui
- Responsive Design: Mobile-friendly interface with TailwindCSS 4
- Type Safety: Full TypeScript implementation with Zod validation
- Session Storage API Keys: Secure API key management in browser session storage
- Docker Support: Multi-stage builds with production-ready containers
- MCP Server: Model Context Protocol integration for AI agents (Claude, etc.)
- REST API: Complete API for custom integrations
- Security Features: File validation, command injection prevention, containerization
- Full Documentation Site: Interactive documentation with comprehensive guides and examples
- Comprehensive Logging: Dual-output logging system with file persistence, Docker integration, and advanced filtering
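The file validation mentioned in the feature list is not described in detail here. As a minimal sketch of what such a check could look like, an upload can be rejected unless it begins with the `%PDF-` header and stays under a size limit. The 50 MB limit and the function name are assumptions for illustration, not values taken from PDFlow's source.

```typescript
// Hypothetical upload check; the size limit and names are assumptions.
const MAX_PDF_BYTES = 50 * 1024 * 1024;

// ASCII codes for "%PDF-", the header a well-formed PDF file starts with.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

type ValidationResult = { ok: true } | { ok: false; reason: string };

function validatePdfUpload(bytes: Uint8Array): ValidationResult {
  if (bytes.length === 0) {
    return { ok: false, reason: "empty file" };
  }
  if (bytes.length > MAX_PDF_BYTES) {
    return { ok: false, reason: "file too large" };
  }
  // Compare the first five bytes against the expected header.
  const hasMagic = PDF_MAGIC.every((code, i) => bytes[i] === code);
  return hasMagic ? { ok: true } : { ok: false, reason: "missing %PDF- header" };
}
```

Checking content rather than the file extension means a renamed ZIP or image is caught before it reaches the conversion step.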
| Layer | Technology |
|---|---|
| Frontend | Next.js 16.0.1, React 19, TailwindCSS 4, Framer Motion |
| Rendering | React Markdown, Rehype Highlight |
| State | Zustand |
| Validation | Zod |
| Templates | Handlebars |
| AI Model | Google Gemini 2.0 Flash Exp (multimodal) |
| AI SDK | Vercel AI SDK |
| Backend | TypeScript + Next.js API Routes |
| PDF Processing | pdftocairo (poppler-utils) |
| Storage | Local filesystem (uploads, outputs) |
- Node.js 20+ (required for Next.js 16)
- npm or yarn
- pdftocairo (poppler-utils)
- Google Gemini API key
Ubuntu/Debian:
sudo apt-get install poppler-utils
macOS:
brew install poppler
Windows: Download and install poppler for Windows and add it to PATH.
# Set your API key
export GEMINI_API_KEY="your-api-key-here"
# Build and start with Docker Compose (includes proper user permissions)
USER_ID=$(id -u) GROUP_ID=$(id -g) docker-compose build
USER_ID=$(id -u) GROUP_ID=$(id -g) docker-compose up -d
# Access at http://localhost:3535
Note: Building with USER_ID and GROUP_ID ensures the container user matches your host user, preventing permission issues with mounted volumes.
For complete Docker documentation, see Docker Deployment Guide
- Clone the repository and install dependencies:
git clone https://github.com/traves-theberge/pdflow.git
cd pdflow
npm install
- Run the development server:
npm run dev
- Open the application:
- Navigate to http://localhost:3001
- Click the settings gear icon in the top right
- Enter your Google Gemini API key
- Click "Save API Key"
Your API key is stored securely in your browser's session storage and is never sent to any server except Google's Gemini API.
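As a rough illustration of session-storage key handling, the sketch below wraps the three Web Storage calls involved. The storage key name `gemini_api_key` and the helper names are hypothetical; PDFlow's actual names are not documented here. The storage object is passed in so the same code runs against `window.sessionStorage` in a browser or an in-memory stand-in elsewhere.

```typescript
// Minimal subset of the Web Storage API used by this sketch.
// In a browser, window.sessionStorage satisfies this interface.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical storage key; not taken from PDFlow's source.
const API_KEY_SLOT = "gemini_api_key";

function saveApiKey(store: KeyValueStore, key: string): void {
  const trimmed = key.trim();
  if (trimmed === "") {
    throw new Error("API key must not be empty");
  }
  store.setItem(API_KEY_SLOT, trimmed);
}

function loadApiKey(store: KeyValueStore): string | null {
  return store.getItem(API_KEY_SLOT);
}

function clearApiKey(store: KeyValueStore): void {
  store.removeItem(API_KEY_SLOT);
}

// In-memory implementation for tests or non-browser environments.
function createMemoryStore(): KeyValueStore {
  const data = new Map<string, string>();
  return {
    getItem: (k: string) => data.get(k) ?? null,
    setItem: (k: string, v: string) => {
      data.set(k, v);
    },
    removeItem: (k: string) => {
      data.delete(k);
    },
  };
}
```

Because session storage is scoped to a browser tab, a key saved this way disappears when the tab is closed.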
- Configure API Key: Enter your Gemini API key in Settings (first time only)
- Select Output Format: Choose from Markdown, MDX, JSON, XML, YAML, HTML, or CSV
- Upload a PDF: Drag and drop or click to select a PDF file
- Processing: The app automatically converts PDF to WebP images and extracts data using AI
- View Results: See extracted content in real-time as pages complete
- Download: Export individual pages or download all pages combined
For complete web interface guide, see Web Usage Documentation
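The conversion step in the workflow above relies on pdftocairo. As a hedged sketch, the function below builds an argument list for rasterizing pages with pdftocairo's `-png`, `-r`, `-f`, and `-l` options. pdftocairo writes PNG, JPEG, TIFF, PDF, PS, EPS, or SVG rather than WebP, so a WebP pipeline needs a second encoding step that this sketch does not cover. The function name, the option names on the TypeScript side, and the 150 DPI default are assumptions, not PDFlow's actual conversion script.

```typescript
interface RasterizeOptions {
  pdfPath: string;
  outputPrefix: string; // pdftocairo appends the page number and ".png"
  dpi?: number;
  firstPage?: number;
  lastPage?: number;
}

// Build an argument vector (not a shell string) for pdftocairo.
function buildPdftocairoArgs(opts: RasterizeOptions): string[] {
  const dpi = opts.dpi ?? 150; // assumed default resolution
  if (!Number.isInteger(dpi) || dpi <= 0) {
    throw new Error("dpi must be a positive integer");
  }
  // Reject paths that pdftocairo would parse as options.
  for (const p of [opts.pdfPath, opts.outputPrefix]) {
    if (p === "" || p.startsWith("-")) {
      throw new Error(`unsafe path: ${p}`);
    }
  }
  const args = ["-png", "-r", String(dpi)];
  if (opts.firstPage !== undefined) args.push("-f", String(opts.firstPage));
  if (opts.lastPage !== undefined) args.push("-l", String(opts.lastPage));
  args.push(opts.pdfPath, opts.outputPrefix);
  return args;
}
```

Passing the resulting array to Node's `child_process.execFile("pdftocairo", args)` avoids a shell, so characters in a file name are never interpreted as shell syntax.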
PDFlow includes a command-line interface for headless PDF processing without the web UI.
Extract PDF to structured data:
npm run pdflow -- extract <pdf-file> [options]
Options:
- -f, --format <format>: Output format (markdown|json|xml|yaml|html|mdx|csv) [default: markdown]
- -o, --output <directory>: Output directory [default: ./outputs]
- -k, --api-key <key>: Gemini API key (or set GEMINI_API_KEY env var)
- -a, --aggregate: Aggregate all pages into a single file
- -v, --verbose: Show verbose output
Examples:
# Extract PDF to markdown
npm run pdflow -- extract document.pdf -f markdown -o ./results
# Extract to JSON with aggregation
npm run pdflow -- extract document.pdf -f json -a
# Extract with custom API key
npm run pdflow -- extract document.pdf -k YOUR_API_KEY
# Extract with verbose output
npm run pdflow -- extract document.pdf -v
Validate Gemini API key:
npm run pdflow -- validate-key
# or
npm run pdflow -- validate-key -k YOUR_API_KEY
Generate MCP configuration:
# Generate config for VS Code
npm run pdflow -- mcp-config --tool vscode
# Generate for Claude Desktop
npm run pdflow -- mcp-config --tool claude-desktop
# Generate for Cursor
npm run pdflow -- mcp-config --tool cursor
# Generate for Claude Code
npm run pdflow -- mcp-config --tool claude-code
# Use development server (port 3001)
npm run pdflow -- mcp-config --dev
# Use custom URL (e.g., Tailscale)
npm run pdflow -- mcp-config --url http://100.64.0.2:3535
CLI Output: The CLI creates a session directory in your output folder with:
- Individual page files (e.g., page-1.md, page-2.md)
- Metadata files (e.g., page-1.meta.json)
- Aggregated file (if the -a flag is used, e.g., full.markdown)
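A script that post-processes a session directory needs to tell these files apart. The sketch below classifies file names using the `page-N.ext` and `page-N.meta.json` patterns from the examples above; anything else, including the aggregated file, falls into an "other" bucket. The type and function names are hypothetical, and the patterns are inferred from the example names rather than from a documented naming contract.

```typescript
type OutputFile =
  | { kind: "page"; page: number; extension: string }
  | { kind: "metadata"; page: number }
  | { kind: "other"; name: string };

function classifyOutputFile(name: string): OutputFile {
  // Metadata files are checked first because they also start with "page-".
  const meta = /^page-(\d+)\.meta\.json$/.exec(name);
  if (meta) {
    return { kind: "metadata", page: Number(meta[1]) };
  }
  const page = /^page-(\d+)\.([A-Za-z0-9]+)$/.exec(name);
  if (page) {
    return { kind: "page", page: Number(page[1]), extension: page[2] ?? "" };
  }
  return { kind: "other", name };
}

// Page numbers present in a directory listing, in numeric order.
function pageNumbers(names: string[]): number[] {
  const pages: number[] = [];
  for (const n of names) {
    const f = classifyOutputFile(n);
    if (f.kind === "page") pages.push(f.page);
  }
  return pages.sort((a, b) => a - b);
}
```

Sorting numerically matters here: a plain string sort would place `page-10.md` before `page-2.md`.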
For complete CLI documentation, see CLI Usage Guide
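For automation, the extract command can be assembled programmatically instead of by string concatenation. The sketch below maps a typed options object onto the documented `-f`, `-o`, `-a`, and `-v` flags. The `ExtractOptions` type and `buildExtractArgs` function are hypothetical helpers, not part of PDFlow.

```typescript
type OutputFormat = "markdown" | "json" | "xml" | "yaml" | "html" | "mdx" | "csv";

interface ExtractOptions {
  pdfFile: string;
  format?: OutputFormat;
  outputDir?: string;
  aggregate?: boolean;
  verbose?: boolean;
}

// Returns the arguments to pass to `npm`, mirroring
// `npm run pdflow -- extract <pdf-file> [options]`.
function buildExtractArgs(opts: ExtractOptions): string[] {
  const args = ["run", "pdflow", "--", "extract", opts.pdfFile];
  if (opts.format) args.push("-f", opts.format);
  if (opts.outputDir) args.push("-o", opts.outputDir);
  if (opts.aggregate) args.push("-a");
  if (opts.verbose) args.push("-v");
  return args;
}
```

The sketch leaves out `-k` on purpose and assumes the key is supplied through the GEMINI_API_KEY environment variable, since command-line arguments are often visible to other processes on the same machine.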
/src
/app
/api
/upload
route.ts # PDF upload endpoint
/process
route.ts # Processing endpoint with progress
/outputs/[sessionId]/[filename]
route.ts # Output file serving
/settings
/validate-key
route.ts # API key validation
/components
UploadForm.tsx # File upload component
ProgressBar.tsx # Progress tracking with polling
EnhancedOutputViewer.tsx # Real-time threaded output display
Settings.tsx # Settings modal with API key management
/utils
gemini-extractor.ts # Gemini AI extraction logic
aggregator.ts # Output aggregation
prompt-builder.ts # Dynamic prompt generation
/store
useAppStore.ts # Zustand state management
page.tsx # Main page with dark mode
layout.tsx # Root layout
globals.css # Global styles
/cli
pdflow.ts # CLI entry point
pdf-processor.ts # Headless PDF processing logic
/templates
/formats
markdown_format.hbs # Markdown extraction template
mdx_format.hbs # MDX extraction template
json_format.hbs # JSON extraction template
xml_format.hbs # XML extraction template
yaml_format.hbs # YAML extraction template
html_format.hbs # HTML extraction template
csv_format.hbs # CSV extraction template
/scripts
convert-to-webp.sh # PDF to WebP conversion script
/docs
CLI_USAGE.md # Complete CLI documentation
/public
PDFlow_Logo.png # Logo (icon only)
PDFlow_Logo_W_Text.png # Logo with text
/uploads # Temporary upload storage (gitignored)
/outputs # Processed output files (gitignored)
/test-cli-outputs # CLI test outputs (gitignored)
Uploads a PDF file and converts it to WebP images.
Request: multipart/form-data
file: PDF file
Response:
{
"success": true,
"sessionId": "session_1234567890_abc123",
"pageCount": 5,
"message": "Successfully uploaded and converted PDF to 5 pages"
}
Starts processing a session or aggregates results.
Request:
{
"sessionId": "session_1234567890_abc123",
"format": "markdown",
"aggregate": true
}
Response:
{
"sessionId": "session_1234567890_abc123",
"status": "completed",
"totalPages": 5,
"processedPages": 5,
"aggregate": {
"format": "markdown",
"totalPages": 5,
"createdAt": "2024-01-01T00:00:00.000Z"
}
}
Gets processing progress for a session.
Response:
{
"sessionId": "session_1234567890_abc123",
"status": "processing",
"totalPages": 5,
"processedPages": 3,
"processingTime": "15.23s"
}
| Variable | Description | Required |
|---|---|---|
| GEMINI_API_KEY | Google Gemini API key (can be set via UI) | Optional* |
| PORT | Server port (defaults to 3000) | No |
| NODE_ENV | Node environment | No |
*The API key can be set in the application UI via Settings. If set in .env.local, it will be used as a fallback.
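For custom integrations, the request and response shapes shown in the API examples earlier can be captured as types. The sketch below builds a processing request and validates a progress payload without performing any network I/O. The `/api/process` path is inferred from the route layout in the project structure, and all type and function names are hypothetical.

```typescript
interface ProcessRequest {
  sessionId: string;
  format: string;
  aggregate?: boolean;
}

interface ProgressResponse {
  sessionId: string;
  status: string; // "processing" and "completed" appear in the examples
  totalPages: number;
  processedPages: number;
  processingTime: string | undefined;
}

interface HttpRequest {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

function buildProcessRequest(baseUrl: string, req: ProcessRequest): HttpRequest {
  return {
    method: "POST",
    // Assumed path; trailing slashes on the base URL are dropped.
    url: `${baseUrl.replace(/\/+$/, "")}/api/process`,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  };
}

function parseProgress(json: string): ProgressResponse {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("expected a JSON object");
  }
  const { sessionId, status, totalPages, processedPages, processingTime } =
    data as Record<string, unknown>;
  if (
    typeof sessionId !== "string" ||
    typeof status !== "string" ||
    typeof totalPages !== "number" ||
    typeof processedPages !== "number"
  ) {
    throw new Error("unexpected progress payload");
  }
  return {
    sessionId,
    status,
    totalPages,
    processedPages,
    processingTime: typeof processingTime === "string" ? processingTime : undefined,
  };
}

// Whole-number percentage for a progress bar; 0 when no pages are known yet.
function percentComplete(p: ProgressResponse): number {
  return p.totalPages === 0 ? 0 : Math.round((p.processedPages / p.totalPages) * 100);
}
```

Keeping request construction and response parsing separate from the transport lets the same helpers work with `fetch`, a test double, or any other HTTP client.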
- npm run dev - Start development server
- npm run build - Build for production
- npm run start - Start production server
- npm run lint - Run ESLint
- New Output Formats: Add to aggregator.ts and update the format selector
- Custom Processing: Modify gemini-extractor.ts for different extraction prompts
- UI Components: Add to /src/app/components and import in page.tsx
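To make the first extension point more concrete, here is one way a per-format aggregation registry could be organized. This is a sketch under assumptions: the real aggregator.ts may be structured quite differently, and the separator, the JSON envelope, and every name below are invented for illustration.

```typescript
type Aggregator = (pages: string[]) => string;

// Hypothetical registry mapping a format name to a function that merges pages.
const aggregators = new Map<string, Aggregator>([
  // Markdown pages are joined with a horizontal rule between them.
  ["markdown", (pages: string[]) => pages.map((p) => p.trim()).join("\n\n---\n\n") + "\n"],
  // JSON pages are parsed and wrapped in a single object.
  [
    "json",
    (pages: string[]) =>
      JSON.stringify({ pages: pages.map((p) => JSON.parse(p) as unknown) }, null, 2),
  ],
]);

function registerFormat(name: string, fn: Aggregator): void {
  if (aggregators.has(name)) {
    throw new Error(`format already registered: ${name}`);
  }
  aggregators.set(name, fn);
}

function aggregate(format: string, pages: string[]): string {
  const fn = aggregators.get(format);
  if (!fn) {
    throw new Error(`unsupported format: ${format}`);
  }
  return fn(pages);
}
```

With a registry like this, adding a format is a single `registerFormat` call, and an unknown format fails loudly instead of producing an empty file.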
- Push to GitHub
- Connect repository to Vercel
- Add GEMINI_API_KEY as an environment variable
- Deploy
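# Node.js 20+ is required for Next.js 16 (see prerequisites)
FROM node:20-alpine
# pdftocairo is provided by poppler-utils
RUN apk add --no-cache poppler-utils
WORKDIR /app
COPY package*.json ./
# Install all dependencies; the build step typically needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]
PDFlow v0.5.0+ includes comprehensive logging for debugging and monitoring: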
# Watch logs in real-time
./scripts/view-logs.sh --follow
# Show only errors
./scripts/view-logs.sh --errors
# Filter by session ID
./scripts/view-logs.sh --session session_123
# View Docker logs
docker logs -f pdflow
Logs are stored in:
- Host: ./logs/pdflow-YYYY-MM-DD.log
- Container: /app/logs/pdflow-YYYY-MM-DD.log
- Docker: docker logs pdflow
Control logging via environment variables:
LOG_LEVEL=info # debug|info|warn|error|critical
ENABLE_FILE_LOGGING=true # Enable file-based logs
LOG_RETENTION_DAYS=7 # Days to keep logs
For complete logging documentation, see docs/LOGGING.md
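The sketch below shows how settings like these are commonly interpreted: a message is emitted when its level is at or above the configured threshold, and a dated log file is eligible for cleanup once it is older than the retention window. It assumes the levels are ordered by severity as listed (debug lowest, critical highest) and that `info` is the fallback; PDFlow's actual logger may behave differently, and all function names are hypothetical.

```typescript
const LOG_LEVELS = ["debug", "info", "warn", "error", "critical"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

// Parse a LOG_LEVEL value, falling back when it is missing or unrecognized.
function parseLogLevel(value: string | undefined, fallback: LogLevel = "info"): LogLevel {
  const v = (value ?? "").toLowerCase();
  return LOG_LEVELS.indexOf(v as LogLevel) >= 0 ? (v as LogLevel) : fallback;
}

// A message passes the filter when it is at least as severe as the threshold.
function shouldLog(messageLevel: LogLevel, threshold: LogLevel): boolean {
  return LOG_LEVELS.indexOf(messageLevel) >= LOG_LEVELS.indexOf(threshold);
}

// True when a file named pdflow-YYYY-MM-DD.log is older than retentionDays.
function isExpired(fileName: string, retentionDays: number, now: Date): boolean {
  const m = /^pdflow-(\d{4})-(\d{2})-(\d{2})\.log$/.exec(fileName);
  if (!m) return false; // not a dated log file, so leave it alone
  const fileDay = Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
  const today = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const ageDays = (today - fileDay) / (24 * 60 * 60 * 1000);
  return ageDays > retentionDays;
}
```

Working in whole UTC days keeps the retention check independent of the time of day at which cleanup runs.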
- "pdftocairo not found"
  - Install poppler-utils (see prerequisites)
- "Gemini API key not found"
  - Check that the .env.local file exists and contains a valid API key, or that a key has been entered in Settings
- "PDF conversion failed"
  - Ensure the PDF is not password-protected
  - Check file size limits
  - Check logs: ./scripts/view-logs.sh --errors
- "Processing stuck at 0%"
  - Check the browser console for errors
  - Verify API endpoints are responding
  - Review logs: ./scripts/view-logs.sh --follow
- "Script exited with code 1"
  - Check detailed error logs: grep "Script failed" logs/pdflow-*.log
  - Verify ImageMagick and poppler-utils are installed
  - Review stderr output in logs for specific error messages
# Find errors in today's logs
./scripts/view-logs.sh --today --errors
# Search for specific errors
grep "ERROR" logs/pdflow-*.log
# View session timeline
grep "session_YOUR_SESSION_ID" logs/pdflow-*.log
# Check Docker logs
docker logs --tail 100 pdflow
MIT License - see LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For issues and questions:
- Open an issue on GitHub
- Check the troubleshooting section above
