🧾 Finance Parser

Extract and analyze bank or payment transaction data from PDF statements — an all-in-one CLI tool.
The Finance Parser reads PDFs (GPay, Canara Bank, etc.), extracts structured transaction details, and exports them to CSV or JSON for easy analysis or integration.

🚀 Features

⚙️ Multi-bank support (GPay, Canara, and extendable to others)
📄 Smart PDF parsing using Camelot / pdfplumber
🧩 CLI tool for easy automation
🧹 Data normalization & cleaning
📊 Exports to CSV and JSON
🔒 Fully offline — no external APIs required

🏗️ Project Structure

finance-parser/
├── src/
│   └── finance_parser/
│       ├── __init__.py
│       ├── __main__.py             # CLI entry point
│       ├── main.py                 # Core logic
│       ├── canara_parser.py        # Bank-specific parsers
│       ├── gpay_parser.py
│       └── utils/                  # Shared helpers
├── media/
│   └── sample_statement.pdf        # Example input
├── output/
│   ├── transactions.csv
│   └── transactions.json
├── pyproject.toml                  # Build system & CLI entry config
├── requirements.txt
└── README.md

⚙️ Setup

1️⃣ Clone the repo

git clone https://github.com/ibnu-umer/finance-parser.git
cd finance-parser

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Add your statement PDF

Place your bank statement (e.g., GPay, Canara) inside the media/ folder.

🧩 Usage

Basic Command

python -m finance_parser --file "media/canara_statement.pdf" --type canara --format csv

Or, if installed as a package:

finance-parser --file "media/canara_statement.pdf" --type canara --format csv

⚙️ CLI Options

Flag	Description	Example
`-f`, `--file`	Path to PDF file	`--file media/canara_statement.pdf`
`-t`, `--type`	Bank/statement type (`gpay`, `canara`, etc.)	`--type canara`
`-o`, `--output`	Output folder	`--output output/`
`--format`	Output format (`csv`, `json`, or `both`)	`--format both`
`-p`, `--privacy`	Processing mode (`raw`, `clean`, or `masked`)	`--privacy clean`

Example:

finance-parser --file media/canara_statement.pdf --type canara --format both --privacy masked

🧠 How It Works

Detects and reads statement text using Camelot or pdfplumber.
Chooses the correct parser based on --type.
Extracts structured transaction data (date, description, debit/credit, balance, etc.).
Applies normalization, masking, or cleaning if requested.
Outputs the data in CSV or JSON formats.

🧰 Dependencies

camelot-py / pdfplumber – PDF parsing
pandas – Data manipulation
argparse – Command-line interface
re – Regex-based parsing

Install manually if needed:

pip install camelot-py pdfplumber pandas

🧼 Output

GPay

date – Transaction date
time – Transaction time
type – Credit/Debit
payee – Counterparty / Payee name
txn_id – UPI Transaction ID
account – Account
amount – Transaction amount

Canara

date – Transaction date
time – Transaction time
txn_type – Credit/Debit
mode – UPI, NEFT, IMPS, etc.
txn_id – Transaction ID (for UPI/IMPS)
bank_code – 4-letter bank code
payee – Counterparty / Payee name
upi_id – UPI ID if available
amount – Transaction amount
balance – Account balance after transaction
cheque_no – Cheque number if present

🥧 Sensitive Fields

Some transaction fields contain sensitive information. These are handled differently depending on the output mode.

Sensitive Fields by Source

Canara Bank
- upi_id
- txn_id
- cheque_no
GPay
- txn_id

Output Modes

Raw
- All columns are included.
- Sensitive fields are not masked.
Masked
- All columns are included.
- Sensitive fields are masked (partial hiding of UPI IDs, txn IDs, cheque numbers).
Clean
- All sensitive fields are dropped from the output.
- Only non-sensitive columns remain.

This ensures privacy while maintaining flexibility for analysis.

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
src/finance_parser		src/finance_parser
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🧾 Finance Parser

🚀 Features

🏗️ Project Structure

⚙️ Setup

1️⃣ Clone the repo

2️⃣ Install dependencies

3️⃣ Add your statement PDF

🧩 Usage

Basic Command

⚙️ CLI Options

🧠 How It Works

🧰 Dependencies

🧼 Output

GPay

Canara

🥧 Sensitive Fields

Sensitive Fields by Source

Output Modes

About

Uh oh!

Releases

Packages

Languages

ibnu-umer/finance-parser

Folders and files

Latest commit

History

Repository files navigation

🧾 Finance Parser

🚀 Features

🏗️ Project Structure

⚙️ Setup

1️⃣ Clone the repo

2️⃣ Install dependencies

3️⃣ Add your statement PDF

🧩 Usage

Basic Command

⚙️ CLI Options

🧠 How It Works

🧰 Dependencies

🧼 Output

GPay

Canara

🥧 Sensitive Fields

Sensitive Fields by Source

Output Modes

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages