
How Dictation Works

TalkWriter transforms your voice into polished, ready-to-use text through a multi-stage pipeline. This page explains each step so you understand exactly what happens between pressing the Fn key and seeing text appear.


The Dictation Pipeline

When you dictate, your voice passes through four stages:

🎤 Voice Input → 📝 Speech-to-Text → ✨ AI Polish → 📋 Paste

Each stage builds on the previous one. Here is what happens at each step:


Stage 1: Voice Input

What happens: Your microphone captures your voice and TalkWriter streams the audio data to the cloud in real time.

  • Your Mac's built-in microphone, an external USB mic, or a Bluetooth headset captures audio.
  • TalkWriter streams audio as you speak. It does not wait until you finish.
  • The pill overlay shows an animated waveform to confirm audio is being detected.

Tip: For the best results, speak clearly and keep your microphone 6-12 inches from your mouth. See Microphone Best Practices for detailed setup tips.
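
To make "streams audio as you speak" concrete, here is a minimal sketch of splitting captured audio into small chunks that can be sent one at a time instead of as a single upload at the end. The sample rate, sample width, chunk length, and function names are illustrative assumptions, not TalkWriter's actual settings or code.

```python
from typing import Iterator

# Assumed audio format for illustration only (not TalkWriter's real settings).
SAMPLE_RATE_HZ = 16_000   # samples per second
BYTES_PER_SAMPLE = 2      # 16-bit mono PCM
CHUNK_MS = 100            # length of each streamed chunk


def chunk_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     chunk_ms: int = CHUNK_MS) -> int:
    """Number of bytes in one chunk of audio of the given duration."""
    return sample_rate_hz * bytes_per_sample * chunk_ms // 1000


def stream_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield the audio piece by piece, the way a streaming client would."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# One second of silence stands in for microphone input.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(stream_chunks(one_second, chunk_size_bytes()))
print(len(chunks), len(chunks[0]))  # prints "10 3200"
```

Under these assumptions, each chunk carries a tenth of a second of audio, so recognition can begin long before you stop speaking.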


Stage 2: Speech-to-Text (Soniox STT)

What happens: A professional-grade speech recognition engine (Soniox) converts your audio into raw text.

  • Soniox processes your audio stream in real time with low latency.
  • It supports 100+ languages and can handle accents, fast speech, and technical vocabulary.
  • The raw output is unformatted: it has no punctuation or capitalization fixes, and filler words are left in.

Example raw output:

hey um i wanted to follow up on our meeting from yesterday i think the project timeline looks good but uh we might need to push the design review back a week


Stage 3: AI Polish

What happens: TalkWriter's AI engine cleans up the raw transcription and produces natural, well-formatted text.

AI Polish performs these transformations:

| Transformation | Before | After |
|---|---|---|
| Remove filler words | "um", "uh", "like", "you know" | Removed |
| Add punctuation | "hello how are you" | "Hello, how are you?" |
| Fix capitalization | "i went to new york" | "I went to New York" |
| Format numbers | "twenty five dollars" | "$25" |
| Clean sentence structure | "so basically the thing is that" | Direct phrasing |

Example polished output:

Hey, I wanted to follow up on our meeting from yesterday. I think the project timeline looks good, but we might need to push the design review back a week.
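
For intuition only, the sketch below applies a few of these transformations with simple rules: it drops "um" and "uh", capitalizes the pronoun "I" and the first letter, and adds a closing period. This is a toy stand-in written for this page, not how AI Polish is implemented; the function name `toy_polish` is hypothetical. It also shows where rules fall short: it cannot decide where sentences end or where commas belong, and it leaves "like" and "you know" alone because they are often real words.

```python
import re

# Only unambiguous fillers; "like" and "you know" are often legitimate words.
FILLERS = re.compile(r"\b(?:um|uh)\b\s*", re.IGNORECASE)


def toy_polish(raw: str) -> str:
    """Rule-based approximation of a few polish steps (illustration only)."""
    text = FILLERS.sub("", raw)               # remove filler words
    text = re.sub(r"\bi\b", "I", text)        # capitalize the pronoun "I"
    text = re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace
    if text:
        text = text[0].upper() + text[1:]     # capitalize the first letter
        if text[-1] not in ".!?":
            text += "."                       # add closing punctuation
    return text


print(toy_polish("hey um i wanted to follow up"))
# prints "Hey I wanted to follow up."
```

Note the missing comma after "Hey" compared with the polished example above: placing punctuation inside a sentence requires understanding the text, which simple pattern matching does not provide.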

Pro Feature

TalkTone adds an extra layer after AI Polish. If you have Pro, your text is rewritten to match the writing style you select (Professional, Casual, Academic, and so on).


Stage 4: Paste

What happens: The polished text is inserted at your cursor position in whatever app is active.

  • TalkWriter simulates a keyboard paste action using macOS Accessibility.
  • Text appears wherever your cursor was when you started dictating.
  • The pill overlay briefly shows a checkmark to confirm the paste.

Pipeline Summary

| Stage | Engine | Where / when it runs | Speed |
|---|---|---|---|
| Voice Input | Your microphone | Locally on your Mac | Instant |
| Speech-to-Text | Soniox (cloud) | Real-time streaming | ~200 ms latency |
| AI Polish | TalkWriter AI (cloud) | After speech ends | ~500 ms to 1 s |
| Paste | macOS Accessibility | Locally on your Mac | Instant |

Note: Total time from releasing the Fn key to seeing text: typically under 2 seconds for short dictations. Longer passages may take slightly more time for AI processing.
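
As a rough check on the "under 2 seconds" figure, the per-stage numbers from the summary table can be added up. The sketch below treats "Instant" as 0 ms and sums the stages one after another; both are simplifying assumptions, and network conditions will shift the real numbers.

```python
# (low, high) latency estimates in milliseconds, taken from the summary table.
# "Instant" stages are approximated as 0 ms (an assumption).
STAGE_LATENCY_MS = {
    "Voice Input": (0, 0),
    "Speech-to-Text": (200, 200),
    "AI Polish": (500, 1000),
    "Paste": (0, 0),
}


def total_latency_ms(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the best-case and worst-case estimates across all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = total_latency_ms(STAGE_LATENCY_MS)
print(f"{low}-{high} ms")  # prints "700-1200 ms"
```

That gives roughly 0.7 to 1.2 seconds for a short dictation, which leaves headroom under the 2-second figure for network variation.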


Frequently Asked Questions

Can I skip AI Polish and get raw transcription?
Yes. Toggle AI Polish off in Settings > AI Polish. You will get the unformatted Soniox output directly.

Is my audio stored on the server?
Audio is streamed for real-time processing and is not permanently stored. See our privacy policy for details.

Why does TalkWriter need the internet?
Both the speech-to-text engine (Soniox) and AI Polish run in the cloud. An internet connection is required for all dictation.

What happens if my internet drops mid-dictation?
TalkWriter will show an error on the pill overlay. Any audio captured before the disconnection may still be processed, but results are not guaranteed.


Was this helpful? Let us know at support@talkwriter.ai