Audio Transcription, Slide Integration & LaTeX Notes Generation
Transform your audio recordings into structured, professional LaTeX notes instantly.
AudioTTo is a powerful Python application designed to streamline the process of creating study notes. It takes audio recordings (lectures, meetings, etc.) and optionally PDF slides, then uses advanced AI to generate comprehensive LaTeX documents.
- 🎙️ Local Transcription: Uses Faster-Whisper for fast, accurate, and private audio transcription.
- ✂️ Efficient Processing: Automatically chunks audio for parallel processing, maximizing CPU usage.
- 🧠 AI-Powered Notes: Leverages Google Gemini AI to synthesize transcripts into structured LaTeX notes.
- 🖼️ Visual Integration: Extracts images from PDF slides and embeds them directly into the notes where relevant.
- 🚀 Modern UI: Includes a user-friendly web interface for easy drag-and-drop operation.
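To give a rough idea of how the chunked, parallel transcription described above can work, here is a minimal sketch using Faster-Whisper. The model size, compute settings, and chunking strategy are illustrative assumptions, not AudioTTo's actual implementation:

# Sketch: transcribe pre-split audio chunks in parallel with Faster-Whisper.
# Model size ("base"), compute type, and worker count are illustrative assumptions.
from concurrent.futures import ProcessPoolExecutor
from faster_whisper import WhisperModel

def transcribe_chunk(chunk_path: str) -> str:
    # Each worker process loads its own model instance.
    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(chunk_path)
    return " ".join(segment.text.strip() for segment in segments)

def transcribe_chunks(chunk_paths: list[str], threads: int = 4) -> str:
    # Fan the chunks out across worker processes, then stitch the partial transcripts in order.
    with ProcessPoolExecutor(max_workers=threads) as pool:
        return "\n".join(pool.map(transcribe_chunk, chunk_paths))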
Before you begin, ensure you have the following:
- Python 3.12 (recommended) or higher; only required if running from source.
- A LaTeX Distribution installed and added to your PATH (required for PDF compilation). You can download it manually or use the included helper scripts:
  - Windows: MiKTeX (Recommended) or TeX Live
    - Alternative: Run Install_MiKTeX.bat included in the folder.
  - macOS: MacTeX
    - Alternative: Run install_deps_mac.sh included in the folder (requires Homebrew).
  - Linux: TeX Live
    - Alternative: Run install_deps_linux.sh included in the folder, or run sudo apt install texlive.
- A Google Gemini API Key. You can get one from Google AI Studio.
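If you want to confirm both prerequisites before running anything, a small self-check like the following can help. This snippet is purely illustrative and not part of AudioTTo:

# Quick prerequisite check (illustrative only, not part of AudioTTo):
# verifies a LaTeX compiler is on PATH and a Gemini API key is set.
import os
import shutil

if shutil.which("pdflatex") is None:
    print("No pdflatex found on PATH - install MiKTeX, TeX Live, or MacTeX first.")
else:
    print("LaTeX compiler found:", shutil.which("pdflatex"))

if not os.getenv("GEMINI_API_KEY"):
    print("GEMINI_API_KEY is not set - add it to your environment or a .env file.")
else:
    print("Gemini API key detected.")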
If you downloaded the standalone executable:
- Download the latest version from the Releases page.
- Prerequisites: You still need a working LaTeX distribution installed (see Prerequisites above).
- Run:
  - Windows: Double-click AudioTTo.exe.
  - macOS: Double-click AudioTTo.app. Note: If you see a security warning, go to System Settings > Privacy & Security and allow the app.
  - Linux: Open a terminal in the folder and run ./AudioTTo (ensure it has execution permissions: chmod +x AudioTTo).
Note: On Linux, you need to open the browser manually at http://127.0.0.1:8000, or Ctrl+click the link in the terminal after you run AudioTTo.
- Clone the repository (or download the source files):
  git clone https://github.com/Manumarzo/AudioTTo.git
  cd AudioTTo
- Install dependencies:
  pip install -r requirements.txt
AudioTTo provides both a modern Web GUI and a classic CLI.
The easiest way to use AudioTTo.
- Launch the application:
python gui_app.py
- Interact: A window will open automatically (or go to http://localhost:8000).
- Configure: Click the Settings (⚙️) button to enter your Gemini API Key.
- Process:
- Drag & drop your Audio file.
- (Optional) Drag & drop your Slides (PDF).
- Click Start Processing.
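For the curious: when slides are provided, their pages are rendered to images before being embedded in the notes. Below is a minimal sketch of that kind of extraction, assuming PyMuPDF; the library and settings AudioTTo actually uses may differ:

# Sketch: render selected PDF pages to PNG images (assumes PyMuPDF; illustrative only).
import fitz  # PyMuPDF

def extract_slide_images(pdf_path: str, first: int = 1, last: int | None = None) -> list[str]:
    paths = []
    with fitz.open(pdf_path) as doc:
        last = last or doc.page_count
        for page_number in range(first - 1, last):        # pages are 0-indexed internally
            pixmap = doc[page_number].get_pixmap(dpi=150)  # render the page as a bitmap
            out_path = f"slide_{page_number + 1}.png"
            pixmap.save(out_path)
            paths.append(out_path)
    return paths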
For automation or headless environments.
Set your API Key first:
Create a file named .env in the root directory of the project. Open it with a text editor and add your API Key:
GEMINI_API_KEY = your_actual_api_key_here
Run the script:
# Basic transcription
python AudioTTo.py lecture.wav
# With specific threads
python AudioTTo.py lecture.wav --threads 4
# With slides
python AudioTTo.py lecture.wav --slides slides.pdf
# With specific slide pages
python AudioTTo.py lecture.wav --slides slides.pdf --pages 1-15
# With slides and specific threads
python AudioTTo.py lecture.wav --slides slides.pdf --threads 4
All generated files are organized in the output/ directory:
output/
└── [Audio_Filename]/
├── [Audio_Filename]_trascrizione.txt # Raw text transcript
├── [Audio_Filename]_appunti.tex # Generated LaTeX source
└── [Audio_Filename]_appunti.pdf # Final compiled PDF
🧹 Intermediate files (chunks, noisy audio, logs) are automatically cleaned up.
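For reference, the notes-generation and compilation steps conceptually resemble the sketch below: the transcript is sent to Gemini, the returned LaTeX source is written to disk, and the .tex file is compiled to PDF. The prompt, model name, and pdflatex flags are assumptions for illustration; AudioTTo's actual pipeline may differ:

# Sketch: turn a transcript into LaTeX with Gemini, then compile it to PDF.
# Model name, prompt, and compiler flags are illustrative assumptions.
import pathlib
import subprocess

import google.generativeai as genai

def generate_notes(transcript: str, api_key: str, tex_path: str = "appunti.tex") -> str:
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-1.5-flash")  # example model choice
    prompt = (
        "Turn the following lecture transcript into well-structured LaTeX notes "
        "(a complete document, compilable with pdflatex):\n\n" + transcript
    )
    response = model.generate_content(prompt)
    pathlib.Path(tex_path).write_text(response.text, encoding="utf-8")
    return tex_path

def compile_pdf(tex_path: str) -> None:
    # -interaction=nonstopmode keeps pdflatex from stopping on minor errors.
    subprocess.run(["pdflatex", "-interaction=nonstopmode", tex_path], check=True)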
Contributions are welcome! Feel free to open issues or submit pull requests to improve AudioTTo.
If you find AudioTTo useful and want to support its development, consider making a small donation! Your support helps keep the project alive and improving.
This project is licensed under the MIT License.
Developed with ❤️ by Manumarzo


