PDFlow

Transform PDFs into structured data with AI-powered extraction.

PDFlow is a modern, full-stack PDF extraction tool that leverages multimodal AI to intelligently extract and structure content from PDF documents. Whether you need documentation in Markdown, data in JSON, or reports in HTML, PDFlow delivers accurate extraction through a web UI, a CLI, and AI agent integration.


Features

Core Features

PDF Upload: Intuitive drag-and-drop PDF upload interface
CLI Support: Headless PDF processing via command-line interface for automation
Image Conversion: Converts PDF pages to WebP images using pdftocairo
AI Extraction: Uses Google Gemini 2.0 Flash multimodal AI for intelligent extraction
Multiple Formats: Export results in Markdown, MDX, JSON, XML, YAML, HTML, or CSV
Visual Progress: 4-step visual tracker (Upload → Convert → Extract → Done) with real-time updates
Rich Previews: Rendered Markdown previews with syntax highlighting and formatting
Threaded Output: View results as they complete with real-time streaming
Dark Mode: Beautiful dark mode support with localStorage persistence
Minimal Design: Clean black/white/grey aesthetic inspired by shadcn/ui
Responsive Design: Mobile-friendly interface with TailwindCSS 4
Type Safety: Full TypeScript implementation with Zod validation
Session Storage API Keys: Secure API key management in browser session storage

New: Deployment & AI Integration

🐳 Docker Support: Multi-stage builds with production-ready containers
🤖 MCP Server: Model Context Protocol integration for AI agents (Claude, etc.)
📡 REST API: Complete API for custom integrations
🔐 Security Features: File validation, command injection prevention, containerization
📚 Full Documentation Site: Interactive documentation with comprehensive guides and examples
📋 Comprehensive Logging: Dual-output logging system with file persistence, Docker integration, and advanced filtering

Tech Stack

Layer	Technology
Frontend	Next.js 16.0.1, React 19, TailwindCSS 4, Framer Motion
Rendering	React Markdown, Rehype Highlight
State	Zustand
Validation	Zod
Templates	Handlebars
AI Model	Google Gemini 2.0 Flash Exp (multimodal)
AI SDK	Vercel AI SDK
Backend	TypeScript + Next.js API Routes
PDF Processing	pdftocairo (poppler-utils)
Storage	Local filesystem (uploads, outputs)

Prerequisites

Node.js 20+ (required for Next.js 16)
npm or yarn
pdftocairo (poppler-utils)
Google Gemini API key

Installing pdftocairo

Ubuntu/Debian:

sudo apt-get install poppler-utils

macOS:

brew install poppler

Windows: Download and install poppler for Windows, then add the directory containing pdftocairo.exe to your PATH.

Setup

Option 1: Docker (Recommended)

# Set your API key
export GEMINI_API_KEY="your-api-key-here"

# Build and start with Docker Compose (includes proper user permissions)
USER_ID=$(id -u) GROUP_ID=$(id -g) docker-compose build
USER_ID=$(id -u) GROUP_ID=$(id -g) docker-compose up -d

# Access at http://localhost:3535

Note: Building with USER_ID and GROUP_ID ensures the container user matches your host user, preventing permission issues with mounted volumes.

📦 For complete Docker documentation, see Docker Deployment Guide

Option 2: Local Development

Clone the repository and install dependencies:

git clone https://github.com/traves-theberge/pdflow.git
cd pdflow
npm install

Run the development server:

npm run dev

Open the application:
- Navigate to http://localhost:3001
- Click the settings gear icon in the top right
- Enter your Google Gemini API key
- Click "Save API Key"

Your API key is stored securely in your browser's session storage and is never sent to any server except Google's Gemini API.

Usage

Web Interface

Configure API Key: Enter your Gemini API key in Settings (first time only)
Select Output Format: Choose from Markdown, MDX, JSON, XML, YAML, HTML, or CSV
Upload a PDF: Drag and drop or click to select a PDF file
Processing: The app automatically converts the PDF to WebP images and extracts data using AI
View Results: See extracted content in real-time as pages complete
Download: Export individual pages or download all pages combined

📚 For the complete web interface guide, see the Web Usage Documentation

CLI (Headless Mode)

PDFlow includes a command-line interface for headless PDF processing without the web UI.

Extract PDF to structured data:

npm run pdflow -- extract <pdf-file> [options]

Options:

-f, --format <format>: Output format (markdown|json|xml|yaml|html|mdx|csv) [default: markdown]
-o, --output <directory>: Output directory [default: ./outputs]
-k, --api-key <key>: Gemini API key (or set GEMINI_API_KEY env var)
-a, --aggregate: Aggregate all pages into a single file
-v, --verbose: Show verbose output
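The options above can be read as a typed configuration with defaults. The sketch below is illustrative only: `OUTPUT_FORMATS`, `CliOptions`, and `resolveCliOptions` are hypothetical names, not PDFlow's actual CLI code, and it assumes `-k/--api-key` takes precedence over `GEMINI_API_KEY`.

```typescript
// Hypothetical model of the documented `extract` options; not PDFlow's actual source.
const OUTPUT_FORMATS = ["markdown", "json", "xml", "yaml", "html", "mdx", "csv"] as const;
type OutputFormat = (typeof OUTPUT_FORMATS)[number];

interface CliOptions {
  format: OutputFormat;
  output: string;
  apiKey: string | undefined;
  aggregate: boolean;
  verbose: boolean;
}

// Type guard: narrows an arbitrary string to one of the supported formats.
function isOutputFormat(value: string): value is OutputFormat {
  return (OUTPUT_FORMATS as readonly string[]).includes(value);
}

// Applies the documented defaults; assumes -k/--api-key overrides GEMINI_API_KEY.
function resolveCliOptions(
  raw: { format?: string; output?: string; apiKey?: string; aggregate?: boolean; verbose?: boolean },
  env: Record<string, string | undefined>,
): CliOptions {
  const format = raw.format ?? "markdown";
  if (!isOutputFormat(format)) {
    throw new Error(`Unsupported format "${format}". Expected one of: ${OUTPUT_FORMATS.join(", ")}`);
  }
  return {
    format,
    output: raw.output ?? "./outputs",
    apiKey: raw.apiKey ?? env.GEMINI_API_KEY,
    aggregate: raw.aggregate ?? false,
    verbose: raw.verbose ?? false,
  };
}
```

For example, `resolveCliOptions({ format: "json", aggregate: true }, process.env)` would model `extract document.pdf -f json -a` with the key taken from the environment.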

Examples:

# Extract PDF to markdown
npm run pdflow -- extract document.pdf -f markdown -o ./results

# Extract to JSON with aggregation
npm run pdflow -- extract document.pdf -f json -a

# Extract with custom API key
npm run pdflow -- extract document.pdf -k YOUR_API_KEY

# Extract with verbose output
npm run pdflow -- extract document.pdf -v

Validate Gemini API key:

npm run pdflow -- validate-key
# or
npm run pdflow -- validate-key -k YOUR_API_KEY

Generate MCP configuration:

# Generate config for VS Code
npm run pdflow -- mcp-config --tool vscode

# Generate for Claude Desktop
npm run pdflow -- mcp-config --tool claude-desktop

# Generate for Cursor
npm run pdflow -- mcp-config --tool cursor

# Generate for Claude Code
npm run pdflow -- mcp-config --tool claude-code

# Use development server (port 3001)
npm run pdflow -- mcp-config --dev

# Use custom URL (e.g., Tailscale)
npm run pdflow -- mcp-config --url http://100.64.0.2:3535

CLI Output: The CLI creates a session directory in your output folder with:

Individual page files (e.g., page-1.md, page-2.md)
Metadata files (e.g., page-1.meta.json)
Aggregated file (if -a flag is used, e.g., full.markdown)
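Because page files are numbered (`page-1.md`, `page-2.md`, and so on), a plain lexicographic sort would place `page-10.md` before `page-2.md`. If you post-process a session directory yourself, a numeric sort keeps pages in document order. `sortPageFiles` is a hypothetical helper that assumes the `page-<n>.<ext>` naming shown above and skips the `.meta.json` files.

```typescript
// Hypothetical helper: orders CLI page outputs (page-1.md, page-2.md, ...) numerically.
// Assumes the page-<n>.<ext> naming described above. Names with more than one dot after
// the page number (e.g. page-1.meta.json) and non-page files (e.g. full.markdown) are excluded.
function sortPageFiles(fileNames: string[]): string[] {
  const pagePattern = /^page-(\d+)\.[^.]+$/;
  return fileNames
    .map((name) => ({ name, match: pagePattern.exec(name) }))
    .filter((entry): entry is { name: string; match: RegExpExecArray } => entry.match !== null)
    .sort((a, b) => Number(a.match[1]) - Number(b.match[1]))
    .map((entry) => entry.name);
}
```

In Node.js this could be applied to `fs.readdirSync(sessionDir)` before concatenating or inspecting pages.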

📚 For complete CLI documentation, see CLI Usage Guide

Project Structure

/src
  /app
    /api
      /upload
        route.ts                      # PDF upload endpoint
      /process
        route.ts                      # Processing endpoint with progress
      /outputs/[sessionId]/[filename]
        route.ts                      # Output file serving
      /settings
        /validate-key
          route.ts                    # API key validation
    /components
      UploadForm.tsx                  # File upload component
      ProgressBar.tsx                 # Progress tracking with polling
      EnhancedOutputViewer.tsx        # Real-time threaded output display
      Settings.tsx                    # Settings modal with API key management
    /utils
      gemini-extractor.ts             # Gemini AI extraction logic
      aggregator.ts                   # Output aggregation
      prompt-builder.ts               # Dynamic prompt generation
    /store
      useAppStore.ts                  # Zustand state management
    page.tsx                          # Main page with dark mode
    layout.tsx                        # Root layout
    globals.css                       # Global styles
  /cli
    pdflow.ts                         # CLI entry point
    pdf-processor.ts                  # Headless PDF processing logic
/templates
  /formats
    markdown_format.hbs               # Markdown extraction template
    mdx_format.hbs                    # MDX extraction template
    json_format.hbs                   # JSON extraction template
    xml_format.hbs                    # XML extraction template
    yaml_format.hbs                   # YAML extraction template
    html_format.hbs                   # HTML extraction template
    csv_format.hbs                    # CSV extraction template
/scripts
  convert-to-webp.sh                  # PDF to WebP conversion script
/docs
  CLI_USAGE.md                        # Complete CLI documentation
/public
  PDFlow_Logo.png                     # Logo (icon only)
  PDFlow_Logo_W_Text.png              # Logo with text
/uploads                              # Temporary upload storage (gitignored)
/outputs                              # Processed output files (gitignored)
/test-cli-outputs                     # CLI test outputs (gitignored)

API Endpoints

POST /api/upload

Uploads a PDF file and converts it to WebP images.

Request: multipart/form-data

file: PDF file

Response:

{
  "success": true,
  "sessionId": "session_1234567890_abc123",
  "pageCount": 5,
  "message": "Successfully uploaded and converted PDF to 5 pages"
}
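A minimal TypeScript client for this endpoint, assuming Node.js 18+ (global `fetch`, `FormData`, `Blob`, and `Response`) and the `file` field shown above. `uploadPdf` and `UploadResponse` are illustrative names; the response type mirrors the example JSON, and the `fetchImpl` parameter exists only so the function can be exercised without a running server.

```typescript
// Shape of the documented POST /api/upload response.
interface UploadResponse {
  success: boolean;
  sessionId: string;
  pageCount: number;
  message: string;
}

// Illustrative client: sends the PDF as multipart/form-data in a field named "file".
async function uploadPdf(
  baseUrl: string,
  pdf: Blob,
  fileName: string,
  fetchImpl: typeof fetch = fetch,
): Promise<UploadResponse> {
  const form = new FormData();
  form.append("file", pdf, fileName);

  // Content-Type is not set manually: fetch adds the multipart boundary itself.
  const res = await fetchImpl(`${baseUrl}/api/upload`, { method: "POST", body: form });
  if (!res.ok) {
    throw new Error(`Upload failed: HTTP ${res.status}`);
  }
  const data = (await res.json()) as UploadResponse;
  if (!data.success) {
    throw new Error(`Upload rejected: ${data.message}`);
  }
  return data;
}
```

In Node.js 20+, `fs.openAsBlob("document.pdf")` returns a `Blob` that can be passed as `pdf`; the returned `sessionId` is what the processing endpoints expect.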

POST /api/process

Starts processing a session or aggregates results.

Request:

{
  "sessionId": "session_1234567890_abc123",
  "format": "markdown",
  "aggregate": true
}

Response:

{
  "sessionId": "session_1234567890_abc123",
  "status": "completed",
  "totalPages": 5,
  "processedPages": 5,
  "aggregate": {
    "format": "markdown",
    "totalPages": 5,
    "createdAt": "2024-01-01T00:00:00.000Z"
  }
}
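The request and response above map directly onto TypeScript types. This sketch assumes the JSON body shown and Node.js 18+ `fetch`; `ProcessRequest`, `ProcessResponse`, and `startProcessing` are illustrative names, `status` is kept as a plain string because only `"processing"` and `"completed"` appear in this README, and fields beyond the example response are not modeled.

```typescript
type OutputFormat = "markdown" | "mdx" | "json" | "xml" | "yaml" | "html" | "csv";

interface ProcessRequest {
  sessionId: string;
  format: OutputFormat;
  aggregate?: boolean;
}

// Mirrors the example response; `aggregate` is assumed to appear only when requested.
interface ProcessResponse {
  sessionId: string;
  status: string;
  totalPages: number;
  processedPages: number;
  aggregate?: { format: OutputFormat; totalPages: number; createdAt: string };
}

// Illustrative client for POST /api/process with a JSON body.
async function startProcessing(
  baseUrl: string,
  request: ProcessRequest,
  fetchImpl: typeof fetch = fetch,
): Promise<ProcessResponse> {
  const res = await fetchImpl(`${baseUrl}/api/process`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!res.ok) {
    throw new Error(`Process request failed: HTTP ${res.status}`);
  }
  return (await res.json()) as ProcessResponse;
}
```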

GET /api/process?sessionId=

Gets processing progress for a session.

Response:

{
  "sessionId": "session_1234567890_abc123",
  "status": "processing",
  "totalPages": 5,
  "processedPages": 3,
  "processingTime": "15.23s"
}
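Clients can poll this endpoint until processing finishes. A minimal sketch, assuming `status` eventually becomes `"completed"` (as in the POST response above); any other status simply keeps the loop polling until the timeout. `waitForCompletion`, the 1-second interval, and the 5-minute timeout are illustrative choices, not PDFlow defaults.

```typescript
interface ProgressResponse {
  sessionId: string;
  status: string;
  totalPages: number;
  processedPages: number;
  processingTime?: string;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls GET /api/process?sessionId=<id> until status is "completed" or the timeout elapses.
async function waitForCompletion(
  baseUrl: string,
  sessionId: string,
  options: { intervalMs?: number; timeoutMs?: number; onProgress?: (p: ProgressResponse) => void } = {},
  fetchImpl: typeof fetch = fetch,
): Promise<ProgressResponse> {
  const { intervalMs = 1000, timeoutMs = 300000, onProgress } = options;
  const deadline = Date.now() + timeoutMs;
  const url = `${baseUrl}/api/process?sessionId=${encodeURIComponent(sessionId)}`;

  while (true) {
    const res = await fetchImpl(url);
    if (!res.ok) throw new Error(`Progress request failed: HTTP ${res.status}`);
    const progress = (await res.json()) as ProgressResponse;
    onProgress?.(progress); // e.g. update a progress bar with processedPages / totalPages
    if (progress.status === "completed") return progress;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out after ${progress.processedPages}/${progress.totalPages} pages`);
    }
    await sleep(intervalMs);
  }
}
```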

Environment Variables

Variable	Description	Required
`GEMINI_API_KEY`	Google Gemini API key (can be set via UI)	Optional*
`PORT`	Server port (defaults to 3000)	No
`NODE_ENV`	Node environment	No

*The API key can be set in the application UI via Settings. If set in .env.local, it will be used as a fallback.
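The precedence described above (a key saved in Settings first, then `GEMINI_API_KEY` from the environment as a fallback) fits in a small function. This is a sketch with a hypothetical name, `resolveApiKey`; it assumes blank values count as unset.

```typescript
// Hypothetical sketch of the documented precedence: UI key first, env var as fallback.
function resolveApiKey(
  uiKey: string | null | undefined,
  env: Record<string, string | undefined>,
): string {
  for (const candidate of [uiKey, env.GEMINI_API_KEY]) {
    const trimmed = candidate?.trim();
    if (trimmed) return trimmed; // first non-blank value wins
  }
  throw new Error("Gemini API key not found: set it in Settings or define GEMINI_API_KEY");
}
```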

Development

Available Scripts

npm run dev - Start development server
npm run build - Build for production
npm run start - Start production server
npm run lint - Run ESLint

Adding New Features

New Output Formats: Add to aggregator.ts and update the format selector
Custom Processing: Modify gemini-extractor.ts for different extraction prompts
UI Components: Add to /src/app/components and import in page.tsx

Deployment

Vercel

Push to GitHub
Connect repository to Vercel
Add GEMINI_API_KEY as environment variable
Deploy

Docker

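# Next.js 16 requires Node.js 20+
FROM node:20-alpine
# pdftocairo comes from poppler-utils; ImageMagick is assumed from the
# Troubleshooting notes (check scripts/convert-to-webp.sh for the exact tools)
RUN apk add --no-cache poppler-utils imagemagick
WORKDIR /app
COPY package*.json ./
# Install all dependencies: `npm run build` needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
EXPOSE 3000
CMD ["npm", "start"]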

Logging & Monitoring

PDFlow v0.5.0+ includes comprehensive logging for debugging and monitoring:

Viewing Logs

# Watch logs in real-time
./scripts/view-logs.sh --follow

# Show only errors
./scripts/view-logs.sh --errors

# Filter by session ID
./scripts/view-logs.sh --session session_123

# View Docker logs
docker logs -f pdflow

Log Files

Logs are stored in:

Host: ./logs/pdflow-YYYY-MM-DD.log
Container: /app/logs/pdflow-YYYY-MM-DD.log
Docker: docker logs pdflow

Configuration

Control logging via environment variables:

LOG_LEVEL=info              # debug|info|warn|error|critical
ENABLE_FILE_LOGGING=true    # Enable file-based logs
LOG_RETENTION_DAYS=7        # Days to keep logs
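The sketch below models how these three settings could interact: messages below `LOG_LEVEL` are dropped, file names follow the `pdflow-YYYY-MM-DD.log` pattern listed under Log Files, and files older than `LOG_RETENTION_DAYS` become candidates for deletion. It is a hypothetical model for reasoning about the configuration, not PDFlow's logger; the helper names, the use of UTC dates, and the fallback to `info` for unknown levels are assumptions.

```typescript
const LOG_LEVELS = ["debug", "info", "warn", "error", "critical"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

// Unknown or missing LOG_LEVEL values fall back to "info" (assumption).
function parseLogLevel(value: string | undefined): LogLevel {
  const normalized = (value ?? "").toLowerCase();
  return (LOG_LEVELS as readonly string[]).includes(normalized) ? (normalized as LogLevel) : "info";
}

// A message is emitted when its level is at or above the configured threshold.
function shouldLog(messageLevel: LogLevel, threshold: LogLevel): boolean {
  return LOG_LEVELS.indexOf(messageLevel) >= LOG_LEVELS.indexOf(threshold);
}

// Builds pdflow-YYYY-MM-DD.log from the UTC date (assumption).
function logFileName(date: Date): string {
  return `pdflow-${date.toISOString().slice(0, 10)}.log`;
}

// True when the file's date is more than `retentionDays` whole days before `now`.
function isExpired(fileName: string, now: Date, retentionDays: number): boolean {
  const match = /^pdflow-(\d{4}-\d{2}-\d{2})\.log$/.exec(fileName);
  if (!match) return false; // ignore files that do not follow the naming pattern
  const fileDay = Date.parse(`${match[1]}T00:00:00Z`);
  const today = Date.parse(`${now.toISOString().slice(0, 10)}T00:00:00Z`);
  const ageDays = (today - fileDay) / (24 * 60 * 60 * 1000);
  return ageDays > retentionDays;
}
```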

📋 For complete logging documentation, see docs/LOGGING.md

Troubleshooting

Common Issues

"pdftocairo not found"
- Install poppler-utils (see prerequisites)
"Gemini API key not found"
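- Enter the key in Settings, or check that .env.local exists and contains a valid GEMINI_API_KEY
- For the CLI, pass -k or set the GEMINI_API_KEY environment variable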
"PDF conversion failed"
- Ensure PDF is not password-protected
- Check file size limits
- Check logs: ./scripts/view-logs.sh --errors
"Processing stuck at 0%"
- Check browser console for errors
- Verify API endpoints are responding
- Review logs: ./scripts/view-logs.sh --follow
"Script exited with code 1"
- Check detailed error logs: grep "Script failed" logs/pdflow-*.log
- Verify ImageMagick and poppler-utils are installed
- Review stderr output in logs for specific error messages

Debugging with Logs

# Find errors in today's logs
./scripts/view-logs.sh --today --errors

# Search for specific errors
grep "ERROR" logs/pdflow-*.log

# View session timeline
grep "session_YOUR_SESSION_ID" logs/pdflow-*.log

# Check Docker logs
docker logs --tail 100 pdflow

License

MIT License - see LICENSE file for details.

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests if applicable
Submit a pull request

Support

For issues and questions:

Open an issue on GitHub
Check the troubleshooting section above
