Extract and analyze bank or payment transaction data from PDF statements — an all-in-one CLI tool.
The Finance Parser reads PDFs (GPay, Canara Bank, etc.), extracts structured transaction details, and exports them to CSV or JSON for easy analysis or integration.
- ⚙️ Multi-bank support (GPay, Canara, and extendable to others)
- 📄 Smart PDF parsing using Camelot / pdfplumber
- 🧩 CLI tool for easy automation
- 🧹 Data normalization & cleaning
- 📊 Exports to CSV and JSON
- 🔒 Fully offline — no external APIs required
finance-parser/
├── src/
│   └── finance_parser/
│       ├── __init__.py
│       ├── __main__.py       # CLI entry point
│       ├── main.py           # Core logic
│       ├── canara_parser.py  # Bank-specific parsers
│       ├── gpay_parser.py
│       └── utils/            # Shared helpers
├── media/
│   └── sample_statement.pdf  # Example input
├── output/
│   ├── transactions.csv
│   └── transactions.json
├── pyproject.toml            # Build system & CLI entry config
├── requirements.txt
└── README.md
git clone https://github.com/ibnu-umer/finance-parser.git
cd finance-parser
pip install -r requirements.txt

Place your bank statement (e.g., GPay, Canara) inside the media/ folder.
python -m finance_parser --file "media/canara_statement.pdf" --type canara --format csv

Or, if installed as a package:

finance-parser --file "media/canara_statement.pdf" --type canara --format csv

| Flag | Description | Example |
|---|---|---|
| `-f`, `--file` | Path to PDF file | `--file media/canara_statement.pdf` |
| `-t`, `--type` | Bank/statement type (`gpay`, `canara`, etc.) | `--type canara` |
| `-o`, `--output` | Output folder | `--output output/` |
| `--format` | Output format (`csv`, `json`, or `both`) | `--format both` |
| `-p`, `--privacy` | Processing mode (`raw`, `clean`, or `masked`) | `--privacy clean` |
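The flags in the table above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the project's actual CLI code: the function name `build_parser`, the default values, and which flags are marked required are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="finance-parser",
        description="Extract transactions from PDF statements.",
    )
    # Requiring --file and --type is an assumption; the README does not say.
    parser.add_argument("-f", "--file", required=True, help="Path to PDF file")
    parser.add_argument(
        "-t", "--type", required=True, choices=["gpay", "canara"],
        help="Bank/statement type",
    )
    # The defaults below are assumed for illustration only.
    parser.add_argument("-o", "--output", default="output/", help="Output folder")
    parser.add_argument(
        "--format", choices=["csv", "json", "both"], default="csv",
        help="Output format",
    )
    parser.add_argument(
        "-p", "--privacy", choices=["raw", "clean", "masked"], default="raw",
        help="Processing mode",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--file", "media/canara_statement.pdf", "--type", "canara",
     "--format", "both", "--privacy", "masked"]
)
print(args.file, args.type, args.format, args.privacy, args.output)
```

Restricting `--type`, `--format`, and `--privacy` with `choices` lets `argparse` reject unsupported values before any PDF is opened.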
Example:
finance-parser --file media/canara_statement.pdf --type canara --format both --privacy masked

- Detects and reads statement text using Camelot or pdfplumber.
- Chooses the correct parser based on `--type`.
- Extracts structured transaction data (date, description, debit/credit, balance, etc.).
- Applies normalization, masking, or cleaning if requested.
- Outputs the data in CSV or JSON formats.
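The parser-selection step above can be sketched as a simple lookup table keyed by the `--type` value. The names `PARSERS`, `select_parser`, `parse_canara`, and `parse_gpay` are hypothetical, and the parser bodies are placeholders; the real modules (`canara_parser.py`, `gpay_parser.py`) may be organized differently.

```python
from typing import Callable, Dict, List

Transaction = Dict[str, str]
ParserFn = Callable[[str], List[Transaction]]


def parse_canara(pdf_path: str) -> List[Transaction]:
    # Placeholder: a real parser would read the PDF with Camelot or pdfplumber.
    return []


def parse_gpay(pdf_path: str) -> List[Transaction]:
    # Placeholder, as above.
    return []


# Registry mapping each supported --type value to its parser function.
PARSERS: Dict[str, ParserFn] = {
    "canara": parse_canara,
    "gpay": parse_gpay,
}


def select_parser(statement_type: str) -> ParserFn:
    """Return the parser registered for a statement type (case-insensitive)."""
    try:
        return PARSERS[statement_type.lower()]
    except KeyError:
        supported = ", ".join(sorted(PARSERS))
        raise ValueError(
            f"Unsupported statement type {statement_type!r}; "
            f"expected one of: {supported}"
        ) from None


parser_fn = select_parser("canara")
rows = parser_fn("media/canara_statement.pdf")
```

With a registry like this, supporting another bank would mean adding one function and one dictionary entry.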
- camelot-py / pdfplumber – PDF parsing
- pandas – Data manipulation
- argparse – Command-line interface
- re – Regex-based parsing
Install manually if needed:
pip install camelot-py pdfplumber pandas

GPay output fields:
- `date` – Transaction date
- `time` – Transaction time
- `type` – Credit/Debit
- `payee` – Counterparty / Payee name
- `txn_id` – UPI Transaction ID
- `account` – Account
- `amount` – Transaction amount

Canara Bank output fields:
- `date` – Transaction date
- `time` – Transaction time
- `txn_type` – Credit/Debit
- `mode` – UPI, NEFT, IMPS, etc.
- `txn_id` – Transaction ID (for UPI/IMPS)
- `bank_code` – 4-letter bank code
- `payee` – Counterparty / Payee name
- `upi_id` – UPI ID if available
- `amount` – Transaction amount
- `balance` – Account balance after transaction
- `cheque_no` – Cheque number if present
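To illustrate how records with these fields might be written out, here is a minimal sketch using only the standard library. It is not the project's exporter: the helper name `export_transactions` is hypothetical, the sample record contains invented placeholder values, and only field names common to both lists above are used. The output file names follow the `output/` folder shown in the project layout.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def export_transactions(rows: List[Dict[str, str]], out_dir: str,
                        fmt: str = "both") -> List[Path]:
    """Write rows to transactions.csv and/or transactions.json (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    if fmt in ("csv", "both"):
        csv_path = out / "transactions.csv"
        # Column order is taken from the first record.
        fieldnames = list(rows[0].keys()) if rows else []
        with csv_path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(csv_path)
    if fmt in ("json", "both"):
        json_path = out / "transactions.json"
        json_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
        written.append(json_path)
    return written


# Invented placeholder record, not real statement data.
sample = [{
    "date": "2024-01-05",
    "time": "10:15",
    "payee": "Example Store",
    "txn_id": "000000000000",
    "amount": "250.00",
}]

with tempfile.TemporaryDirectory() as tmp:
    paths = export_transactions(sample, tmp, fmt="both")
    print([p.name for p in paths])
```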
Some transaction fields contain sensitive information. These are handled differently depending on the output mode.
Sensitive fields by statement type:
- Canara Bank: `upi_id`, `txn_id`, `cheque_no`
- GPay: `txn_id`

Output modes (`--privacy`):
- Raw
  - All columns are included.
  - Sensitive fields are not masked.
- Masked
  - All columns are included.
  - Sensitive fields are masked (UPI IDs, transaction IDs, and cheque numbers are partially hidden).
- Clean
  - All sensitive fields are dropped from the output.
  - Only non-sensitive columns remain.
This ensures privacy while maintaining flexibility for analysis.
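The three modes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's implementation: the names `SENSITIVE_FIELDS`, `mask_value`, and `apply_privacy` are hypothetical, and the masking rule (keep the last four characters) is an assumption, since the README only says sensitive values are partially hidden.

```python
from typing import Dict, List, Tuple

# Sensitive fields per statement type, as listed in this section.
SENSITIVE_FIELDS: Dict[str, Tuple[str, ...]] = {
    "canara": ("upi_id", "txn_id", "cheque_no"),
    "gpay": ("txn_id",),
}


def mask_value(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters (assumed masking rule)."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def apply_privacy(rows: List[Dict[str, str]], statement_type: str,
                  mode: str) -> List[Dict[str, str]]:
    """Return copies of rows processed according to the privacy mode."""
    sensitive = SENSITIVE_FIELDS[statement_type]
    if mode == "raw":
        # All columns kept, nothing masked.
        return [dict(row) for row in rows]
    if mode == "masked":
        # All columns kept; non-empty sensitive values are partially hidden.
        return [
            {k: mask_value(v) if k in sensitive and v else v for k, v in row.items()}
            for row in rows
        ]
    if mode == "clean":
        # Sensitive columns dropped entirely.
        return [
            {k: v for k, v in row.items() if k not in sensitive}
            for row in rows
        ]
    raise ValueError(f"Unknown privacy mode: {mode!r}")


# Invented placeholder record, not real statement data.
row = {"date": "2024-01-05", "txn_id": "123456789012", "amount": "250.00"}
print(apply_privacy([row], "gpay", "masked"))
print(apply_privacy([row], "gpay", "clean"))
```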