How Dictation Works
TalkWriter transforms your voice into polished, ready-to-use text through a multi-stage pipeline. This page explains each step so you understand exactly what happens between pressing the Fn key and seeing text appear.
The Dictation Pipeline
When you dictate, your voice passes through four stages:
🎤 Voice Input → 📝 Speech-to-Text → ✨ AI Polish → 📋 Paste
Each stage builds on the previous one. Here is what happens at each step:
Stage 1: Voice Input
What happens: Your microphone captures your voice and TalkWriter streams the audio data to the cloud in real time.
- Your Mac's built-in microphone, an external USB mic, or a Bluetooth headset captures audio.
- TalkWriter streams audio as you speak. It does not wait until you finish.
- The pill overlay shows an animated waveform to confirm audio is being detected.
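The streaming behavior described above can be pictured as sending small fixed-size chunks while audio is still being captured, rather than uploading one file at the end. The Python sketch below is illustrative only: the sample rate, chunk length, and the `send` callback are assumptions made for the example, not details of TalkWriter's actual implementation.

```python
from typing import Callable, Iterable, Iterator

# Assumed values for illustration: 16 kHz, 16-bit mono audio in 100 ms chunks.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHUNK_MS = 100
CHUNK_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHUNK_MS // 1000  # 3200 bytes


def chunk_audio(pcm_source: Iterable[bytes], chunk_bytes: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Yield fixed-size chunks as soon as enough audio has been captured."""
    buffer = bytearray()
    for piece in pcm_source:
        buffer.extend(piece)
        while len(buffer) >= chunk_bytes:
            yield bytes(buffer[:chunk_bytes])
            del buffer[:chunk_bytes]
    if buffer:
        yield bytes(buffer)  # flush the final partial chunk


def stream(pcm_source: Iterable[bytes], send: Callable[[bytes], None]) -> int:
    """Send each chunk immediately (hypothetical uplink); return the chunk count."""
    sent = 0
    for chunk in chunk_audio(pcm_source):
        send(chunk)
        sent += 1
    return sent
```

Because `chunk_audio` is a generator, each chunk is handed to `send` before the next piece of audio is read, which is what lets transcription begin while you are still speaking.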
For the best results, speak clearly and keep your microphone 6 to 12 inches (about 15 to 30 cm) from your mouth. See Microphone Best Practices for detailed setup tips.
Stage 2: Speech-to-Text (Soniox STT)
What happens: A professional-grade speech recognition engine (Soniox) converts your audio into raw text.
- Soniox processes your audio stream in real time with low latency.
- It supports 100+ languages and can handle accents, fast speech, and technical vocabulary.
- The raw output is unformatted: it has no punctuation or capitalization corrections, and filler words are left in.
Example raw output:
hey um i wanted to follow up on our meeting from yesterday i think the project timeline looks good but uh we might need to push the design review back a week
Stage 3: AI Polish
What happens: TalkWriter's AI engine cleans up the raw transcription and produces natural, well-formatted text.
AI Polish performs these transformations:
| Transformation | Before | After |
|---|---|---|
| Remove filler words | "um", "uh", "like", "you know" | Removed |
| Add punctuation | "hello how are you" | "Hello, how are you?" |
| Fix capitalization | "i went to new york" | "I went to New York" |
| Format numbers | "twenty five dollars" | "$25" |
| Clean sentence structure | "so basically the thing is that" | Direct phrasing |
Example polished output:
Hey, I wanted to follow up on our meeting from yesterday. I think the project timeline looks good, but we might need to push the design review back a week.
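To make the transformations concrete, here is a deliberately simple rule-based sketch in Python. It is not how AI Polish works (that step is AI-driven and runs in the cloud); it only imitates filler removal and basic capitalization. The function name `toy_polish` and the filler list are hypothetical.

```python
import re

# Only unambiguous fillers. Words such as "like" need context, so this toy skips them.
FILLERS = ("um", "uh")
FILLER_RE = re.compile(r"\b(?:" + "|".join(FILLERS) + r")\b", re.IGNORECASE)


def toy_polish(raw: str) -> str:
    """Remove fillers, fix the pronoun "I", capitalize, and end with a period."""
    text = FILLER_RE.sub("", raw)
    text = re.sub(r"\s+", " ", text).strip()  # collapse gaps left by removed words
    text = re.sub(r"\bi\b", "I", text)        # standalone "i" becomes "I"
    if text:
        text = text[0].upper() + text[1:]
        if text[-1] not in ".!?":
            text += "."
    return text


print(toy_polish("hey um i wanted to follow up uh on our meeting"))
# prints: Hey I wanted to follow up on our meeting.
```

Fixed rules like these cannot decide where commas and sentence breaks belong or tell when "like" is a filler, which is why this stage relies on an AI model instead.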
TalkTone, a Pro feature, adds an extra layer after AI Polish: it rewrites your text to match the writing style you select (Professional, Casual, Academic, etc.).
Stage 4: Paste
What happens: The polished text is inserted at your cursor position in whatever app is active.
- TalkWriter simulates a keyboard paste action using macOS Accessibility.
- Text appears wherever your cursor was when you started dictating.
- The pill overlay briefly shows a checkmark to confirm the paste.
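One common way to simulate a paste on macOS is to place the text on the clipboard and then send a synthetic Cmd+V to the frontmost app. The Python sketch below shows that general technique using the standard `pbcopy` and `osascript` tools. It is an assumption about the approach, not TalkWriter's confirmed implementation, and the function names are hypothetical. The process sending the keystroke needs Accessibility permission.

```python
import subprocess

# AppleScript that asks System Events to press Cmd+V in the frontmost app.
PASTE_SCRIPT = 'tell application "System Events" to keystroke "v" using command down'


def build_paste_commands() -> list[list[str]]:
    """Return the two commands this sketch runs: copy to clipboard, then paste."""
    return [["pbcopy"], ["osascript", "-e", PASTE_SCRIPT]]


def paste_text(text: str) -> None:
    """Put text on the clipboard and simulate Cmd+V (macOS only)."""
    copy_cmd, paste_cmd = build_paste_commands()
    subprocess.run(copy_cmd, input=text.encode("utf-8"), check=True)
    subprocess.run(paste_cmd, check=True)
```

Note that this simple version overwrites whatever was on the clipboard; a production app would likely save and restore the previous contents.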
Pipeline Summary
| Stage | Engine | Where and when it runs | Typical latency |
|---|---|---|---|
| Voice Input | Your microphone | Locally on your Mac | Instant |
| Speech-to-Text | Soniox (cloud) | Real-time streaming | ~200 ms |
| AI Polish | TalkWriter AI (cloud) | After speech ends | ~500 ms to 1 s |
| Paste | macOS Accessibility | Locally on your Mac | Instant |
Total time from releasing the Fn key to seeing text: typically under 2 seconds for short dictations. Longer passages may take slightly more time for AI processing.
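As a rough check on that figure, the per-stage numbers from the summary table can be added up. The Python sketch below treats "Instant" stages as 0 ms and ignores network variability, both of which are simplifying assumptions.

```python
# Per-stage latency ranges (low, high) in milliseconds, taken from the summary table.
# Stages listed as "Instant" are treated as 0 ms for this estimate.
STAGE_LATENCY_MS = {
    "Speech-to-Text": (200, 200),
    "AI Polish": (500, 1000),
    "Paste": (0, 0),
}


def total_latency_ms(stages: dict) -> tuple:
    """Sum the low and high ends of each stage's latency range."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = total_latency_ms(STAGE_LATENCY_MS)
print(f"Estimated delay after releasing Fn: {low}-{high} ms")
# prints: Estimated delay after releasing Fn: 700-1200 ms
```

The estimate of 0.7 to 1.2 seconds leaves headroom under the 2-second figure for network delays and longer passages.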
Frequently Asked Questions
Can I skip AI Polish and get raw transcription?
Yes. Toggle AI Polish off in Settings > AI Polish. You will get the unformatted Soniox output directly.
Is my audio stored on the server?
Audio is streamed for real-time processing and is not permanently stored. See our privacy policy for details.
Why does TalkWriter need the internet?
Both the speech-to-text engine (Soniox) and AI Polish run in the cloud. An internet connection is required for all dictation.
What happens if my internet drops mid-dictation?
TalkWriter will show an error on the pill overlay. Any audio captured before the disconnection may still be processed, but results are not guaranteed.
Was this helpful? Let us know at support@talkwriter.ai