@brittain9

Summary

Adds a new Inline Timestamps text format option for speech-to-text output, allowing timestamps to be embedded directly within transcribed text as an alternative to SRT subtitle format.

Closes #222

Screenshots

[Screenshots: timestamps, timestamp2]

Note the TTS output at the bottom, which strips timestamps matching the current timestamp template.

Motivation

When transcribing audio (podcasts, meetings, interviews), I want timestamps inline with text for:

  • Easier human readability vs. SRT's rigid block format
  • LLM post-processing (summarization, Q&A) where inline context is more natural
  • Quick reference without the overhead of subtitle parsing

Changes

New Settings (Settings → Speech to Text)

| Setting | Description |
| --- | --- |
| Text format | New "Inline Timestamps" option |
| Timestamp template | Customizable format with {hh}, {mm}, {ss}, {ms}, {text} tokens |
| Minimum interval | Minimum time between inserted timestamps; prevents timestamp spam |

Example output: [00:05] Hello world [00:12] This is a test
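
To illustrate how a template and a minimum interval could combine to produce the example output above, here is a minimal sketch. It is not the PR's code: `Segment`, `apply_template`, and `format_inline` are hypothetical names, and the sketch assumes millisecond timestamps, that `{mm}` and `{ss}` wrap at 60, and that the interval state is passed in by the caller so it survives across batches.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical segment type: start time in milliseconds plus recognized text.
struct Segment {
    std::int64_t start_ms;
    std::string text;
};

// Replaces every occurrence of `token` in `s` with `value`.
static void replace_all(std::string& s, const std::string& token, const std::string& value) {
    for (std::size_t pos = 0; (pos = s.find(token, pos)) != std::string::npos; pos += value.size()) {
        s.replace(pos, token.size(), value);
    }
}

// Zero-pads `value` to at least `width` digits.
static std::string pad(std::int64_t value, int width) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%0*lld", width, static_cast<long long>(value));
    return buf;
}

// Expands a template such as "[{mm}:{ss}] {text}" for a single segment.
// Assumption: {mm} and {ss} wrap at 60 even when {hh} is not in the template.
std::string apply_template(std::string tmpl, const Segment& seg) {
    const std::int64_t total_s = seg.start_ms / 1000;
    replace_all(tmpl, "{hh}", pad(total_s / 3600, 2));
    replace_all(tmpl, "{mm}", pad((total_s / 60) % 60, 2));
    replace_all(tmpl, "{ss}", pad(total_s % 60, 2));
    replace_all(tmpl, "{ms}", pad(seg.start_ms % 1000, 3));
    replace_all(tmpl, "{text}", seg.text);  // last, so tokens inside the text stay literal
    return tmpl;
}

// Joins segments with spaces. A segment is stamped only if at least
// `min_interval_ms` have passed since the last stamped segment.
// `last_stamp_ms` carries that state across batches (-1 = nothing stamped yet).
std::string format_inline(const std::vector<Segment>& segments, const std::string& tmpl,
                          std::int64_t min_interval_ms, std::int64_t& last_stamp_ms) {
    std::string out;
    for (const Segment& seg : segments) {
        const bool stamp = last_stamp_ms < 0 || seg.start_ms - last_stamp_ms >= min_interval_ms;
        if (!out.empty()) out += ' ';
        if (stamp) {
            out += apply_template(tmpl, seg);
            last_stamp_ms = seg.start_ms;
        } else {
            out += seg.text;
        }
    }
    return out;
}
```

With the template `[{mm}:{ss}] {text}` and a 5000 ms interval, segments starting at 5 s, 7 s, and 12 s would get stamps on the first and third only, since the second falls inside the interval.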

Implementation

  • text_tools.cpp: Core functions for formatting, regex compilation, and stripping timestamps
  • STT Engines: Integrated into Vosk, Whisper, FasterWhisper, April, and DeepSpeech
  • TTS Integration: Auto-strips inline timestamps before speaking (seamless text-to-speech of transcribed content)
  • Bug fix: Corrected text format resetting to "Plain Text" when clearing notepad
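
The regex compilation and stripping steps listed above could look roughly like the following sketch. It is an illustration under stated assumptions, not the code in text_tools.cpp: `fragment_to_pattern`, `timestamp_pattern`, and `strip_timestamps` are hypothetical names, time tokens are assumed to match one or more digits, and template whitespace is matched loosely as `\s*`.

```cpp
#include <initializer_list>
#include <regex>
#include <string>

// Turns a template fragment such as "[{mm}:{ss}] " into regex source:
// time tokens become \d+, spaces become \s*, regex metacharacters are escaped.
static std::string fragment_to_pattern(const std::string& fragment) {
    static const std::string tokens[] = {"{hh}", "{mm}", "{ss}", "{ms}"};
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string pattern;
    for (std::size_t i = 0; i < fragment.size();) {
        bool matched = false;
        for (const std::string& tok : tokens) {
            if (fragment.compare(i, tok.size(), tok) == 0) {
                pattern += R"(\d+)";
                i += tok.size();
                matched = true;
                break;
            }
        }
        if (matched) continue;
        const char c = fragment[i++];
        if (c == ' ' || c == '\t') {
            pattern += R"(\s*)";
        } else {
            if (special.find(c) != std::string::npos) pattern += '\\';
            pattern += c;
        }
    }
    return pattern;
}

// Builds a regex source matching the timestamp part(s) of `tmpl`, i.e. the
// fragments before and after {text}. Returns "" if no fragment has a time token.
std::string timestamp_pattern(const std::string& tmpl) {
    const std::string text_token = "{text}";
    const std::size_t pos = tmpl.find(text_token);
    const std::string before = tmpl.substr(0, pos);  // whole template if {text} is absent
    const std::string after =
        pos == std::string::npos ? std::string() : tmpl.substr(pos + text_token.size());
    std::string pattern;
    for (const std::string& part : {before, after}) {
        const std::string p = fragment_to_pattern(part);
        if (p.find(R"(\d+)") == std::string::npos) continue;  // no time token: never strip
        if (!pattern.empty()) pattern += '|';
        pattern += p;
    }
    return pattern;
}

// Removes every timestamp that matches `tmpl` from `text`, e.g. before TTS.
std::string strip_timestamps(const std::string& text, const std::string& tmpl) {
    const std::string pattern = timestamp_pattern(tmpl);
    if (pattern.empty()) return text;
    return std::regex_replace(text, std::regex(pattern), "");
}
```

Skipping fragments that contain no time token is a deliberate choice in this sketch: otherwise a template such as `{text} ` would compile to a whitespace-only pattern and strip every space from the text.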

Tests

  • Unit tests covering format_segments_inline, compile_inline_timestamp_regex, strip_inline_timestamps
  • Edge cases: interval state across batches, complex template formats, whitespace handling

Testing Done

  • Unit tests pass (text_tools_test)
  • Manual testing with the Vosk, April, and Whisper engines
  • TTS strips timestamps before speaking
  • Settings UI functional with presets and custom templates

- Implement configurable timestamp templates ({hh}, {mm}, {ss}, {text})
- Support for all STT engines
- Auto-strip timestamps during TTS playback
- Fix bug where clearing text reset the format to Plain Text
@mkiol
Owner

mkiol commented Jan 20, 2026

Sorry for the late reply. It looks fantastic :)

I'm a bit busy at the moment and need a few more days to look at the code and test it. Thank you for your understanding.



Development

Successfully merging this pull request may close these issues.

Feature Request: Add timestamps to speech to text from audio file

2 participants