GitHub - JackRipper01/PDF-to-TXT: 📄 An efficient Python script for automated, recursive PDF to TXT conversion. Intelligently processes nested folders, mirroring the original structure for clean output. Demonstrates practical scripting, file system management, and automation.

📄 Automated PDF to TXT Converter: Efficient Document Text Extraction

This project is a practical Python script designed for automated, recursive conversion of PDF documents into plain text files. It intelligently navigates deeply nested folder structures, mirroring the original directory layout to ensure organized output of the .txt files.

It showcases practical scripting, recursive file system management, and automation skills.

More:

Key Features & Technical Details:

Recursive File System Traversal: Automatically scans through complex and deeply nested directory structures to locate all PDF files, regardless of their location.
Batch PDF to TXT Conversion: Efficiently converts multiple PDF documents into their corresponding plain text files.
Preserves Directory Structure: Recreates the exact original folder hierarchy for the converted .txt files, ensuring intuitive organization and easy retrieval.
Simple Python Utility: A straightforward, single-script solution for automating common document processing tasks and making PDF content accessible.
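The traversal and mirroring described in the features above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function names `convert_tree` and `extract_text_pypdf2`, the `input_root`/`output_root` parameters, and the use of `pathlib` are assumptions made for the example. It also assumes PyPDF2 2.0 or later, where `PdfReader` is available.

```python
from pathlib import Path
from typing import Callable, List


def extract_text_pypdf2(pdf_path: Path) -> str:
    """Extract text from every page of a PDF using PyPDF2."""
    # Imported lazily so the traversal logic can be used without PyPDF2.
    from PyPDF2 import PdfReader  # assumes PyPDF2 >= 2.0

    reader = PdfReader(str(pdf_path))
    # extract_text() can return an empty value for pages without a text layer.
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def convert_tree(
    input_root: Path,
    output_root: Path,
    extract: Callable[[Path], str] = extract_text_pypdf2,
) -> List[Path]:
    """Convert every *.pdf under input_root, mirroring folders in output_root.

    Hypothetical helper for illustration; returns the .txt paths written.
    """
    written = []
    # rglob walks all nested subfolders; the pattern is case-sensitive on
    # most Linux file systems, so "*.PDF" files would need a second pass.
    for pdf_path in sorted(input_root.rglob("*.pdf")):
        relative = pdf_path.relative_to(input_root)
        txt_path = (output_root / relative).with_suffix(".txt")
        # Recreate the original folder hierarchy before writing.
        txt_path.parent.mkdir(parents=True, exist_ok=True)
        txt_path.write_text(extract(pdf_path), encoding="utf-8")
        written.append(txt_path)
    return written
```

Calling `convert_tree(Path("papers"), Path("papers_txt"))` would write, for example, `papers_txt/a/b/deep.txt` for `papers/a/b/deep.pdf`; both folder names here are placeholders.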

This utility demonstrates strong command of Python scripting for automation, file system manipulation, and practical data transformation.

Instructions:

Provide your PDF files; it does not matter if they sit inside folders nested many levels deep. Running the .py file mirrors the same folder structure, with every PDF file converted to a TXT file.

Note: The References and Acknowledgement sections are deleted from the output, because this tool was written to convert PDF papers to TXT.
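One simple way to drop those sections is to cut the extracted text at the first line that consists only of such a heading. The sketch below is an assumption about how this could be done, not the repository's confirmed method: the name `strip_trailing_sections` and the heading pattern are hypothetical, and the approach also discards anything that follows the heading (such as an appendix).

```python
import re

# A heading line: optional section number, then "References" or a spelling
# variant of "Acknowledgement(s)", alone on its line (assumed pattern).
_HEADING = re.compile(
    r"^\s*(?:\d+\.?\s+)?(?:references|acknowledge?ments?)\s*:?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_trailing_sections(text: str) -> str:
    """Truncate text at the first References/Acknowledgement heading."""
    match = _HEADING.search(text)
    if match is None:
        return text  # no such heading found; leave the text unchanged
    return text[: match.start()].rstrip() + "\n"
```

Because the pattern must match a whole line, a sentence that merely mentions "references" in running text is left alone.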

Instructions:

Install the dependency with pip install PyPDF2, or with pip install -r requirements.txt

Run the .py script inside the src folder.
