I build conversational AI and real-time voice experiences that let people talk to software as naturally as they talk to each other. I focus on connecting LLMs, TTS, and real-time communication to create both technical demos and production-ready products.
- Building streaming ASR → LLM → TTS pipelines with interruptible playback for low-latency, natural dialogues
- Working with human-like voice synthesis, fine-tuning prosody and emotional cues to make speech sound authentic and expressive
- Delivering voice experiences via WebRTC with VAD, barge-in support, and chunked synthesis for responsive interactions
- Engineering production-grade TTS systems: distributed task scheduling, voice dubbing platforms, and TTS ecosystem products
- Providing TTS-as-a-Service solutions and exploring adjacent voice technology applications
- Designing conversation-first products with memory, tool calling, fallback handling, and action visualization
- Fast iteration cycles: Write scenarios → Build prototypes → Interview users → Validate with data
- Tracking what matters: conversation success rate, speech naturalness, and tool reliability
- Conversational AI & Natural Language Processing
- Voice interaction, TTS, and speech technologies
- Real-time communication & WebRTC
- Large Language Models, AI agents, and tool use
- Low-latency voice processing and synthesis techniques
- Multi-modal interactions that mix speech, UI, and actions
- Evaluation frameworks for dialogue/voice quality
- Cloud-native deployment of real-time AI pipelines
- Voice-first conversational products and demos
- TTS/ASR pipelines integrated with LLM agents
- Real-time communication solutions and WebRTC tooling
- AI-powered customer service and co-pilot scenarios
- TTS engineering projects: voice dubbing platforms, distributed synthesis systems, and TTS ecosystem solutions
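
As a rough illustration of the interruptible playback and barge-in mentioned in the pipeline bullets above, here is a minimal sketch, not the production implementation. It assumes the reply is already split into chunks and replaces both synthesis/playback and voice activity detection with timers; the names `speak`, `run_turn`, and `barge_in_after` are hypothetical and exist only for this example.

```python
import asyncio


async def speak(chunks, played, chunk_seconds=0.05):
    """Play reply chunks one at a time (hypothetical stand-in for chunked TTS)."""
    for chunk in chunks:
        # Stand-in for synthesizing and playing one audio chunk.
        await asyncio.sleep(chunk_seconds)
        played.append(chunk)


async def run_turn(chunks, barge_in_after=None):
    """Play a reply and cancel it if the simulated VAD reports user speech.

    barge_in_after: seconds until the simulated VAD fires, or None for no barge-in.
    Returns (chunks actually played, whether playback was interrupted).
    """
    played = []
    playback = asyncio.create_task(speak(chunks, played))

    if barge_in_after is None:
        await playback
        return played, False

    # Assumption: a real system would wait on a VAD event here, not a timer.
    await asyncio.sleep(barge_in_after)
    interrupted = not playback.done()
    playback.cancel()  # Cancellation takes effect at the playback task's next await.
    try:
        await playback
    except asyncio.CancelledError:
        pass
    return played, interrupted


if __name__ == "__main__":
    reply = ["Sure,", "I", "can", "help", "you", "with", "that", "right", "away", "."]
    played, interrupted = asyncio.run(run_turn(reply, barge_in_after=0.12))
    print(f"interrupted={interrupted}, chunks played={len(played)} of {len(reply)}")
```

Because the reply is played chunk by chunk, cancelling the task stops speech at the next chunk boundary instead of after the full utterance, which is the point of combining chunked synthesis with barge-in.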
Interested in TTS services or voice AI collaboration? Feel free to reach out!
- Email: chicogong@tencent.com
- Twitter: https://x.com/chicogongx
- LinkedIn:
I believe the future of human-computer interaction lies in natural, conversational interfaces that understand not just what we say, but how we say it and why we say it.

