A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Updated Oct 2, 2025 (Python)
A ComfyUI custom node integration for multi-engine, multi-language Text-to-Speech and Voice Conversion. Supports RVC, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2, and Microsoft VibeVoice, with unlimited text length, SRT timing, character support, and many audio tools.
Soundstorm is a cutting-edge AI-powered audio manipulation application designed to provide a rich yet simplified experience for sound designers, algorithmic composers, and experimental audio enthusiasts. From sample pack creation and algorithmic composition to AI text-to-audio and onscreen ChatGPT, Soundstorm is a sonic powerhouse.
Real-Time Deepfake Pipeline
Music Generation Using Deep Learning🎶🎵
AI Voice Agents: Exploring the Next Generation of Human-Machine Interaction! 🎙️🤖🎧
AudioInsight is a web application that processes audio, generates transcriptions, and allows users to ask questions about the related audio.
An approach to Andrej Karpathy's LLM challenge, as outlined here: https://twitter.com/karpathy/status/1760740503614836917
Maya Voice AI is an open-source project that demonstrates the Maya1 model, capable of generating realistic voice audio from text input with rich emotional and descriptive control. This repository provides a demo for text-to-speech synthesis using advanced language models and the SNAC codec, focusing on high-quality audio at 24kHz.
Professional Yocto BSP Layer for Dynamic Devices Edge Computing Platforms - AI Audio Processing, E-Ink Displays, Power Management, Wireless Connectivity, i.MX8MM/i.MX93 Support
AI Audio Framework 🎵
A GPU-accelerated Python application that converts PDF and TXT documents into high-quality MP4 audio files using WhisperSpeech technology.
A project attempting to generate and extract features from music to make comparisons with popular artists, and examine where and with what demographics those artists are popular in order to craft a DIY marketing solution for aspiring artists.
Acoustic Space Analyzer AI Pro is a professional acoustic analysis tool that leverages artificial intelligence to generate optimized DSP processing chains for any acoustic environment. This innovative application combines real-time spectral analysis, 3D spatial scanning, and AI-powered audio processing to deliver precise acoustic corrections.
Open source AI speech generation solution
ComfyUI custom nodes for the Dia2 TTS model — generate speech, timestamps, and captions directly inside ComfyUI.
🎤 Generate TTS audio and captions easily within ComfyUI, supporting multiple speakers and various caption formats for flexible content creation.
This repository implements Unsupervised Domain Adaptation using Gradient Reversal Layer with PaSST feature extractors for cross-device acoustic scene classification on DCASE TAU 2020 dataset.