Introducing Voxtyper: a Chrome and Firefox extension for dictation with automatic punctuation
TL;DR: I write for a living, and the dictation tools I tried kept making me edit: they misheard words, made me say "comma" and "colon" out loud, crawled one word at a time, hung and never returned my text, or ignored Firefox. So I built Voxtyper. This is why, and how it works: accurate, sentence-at-a-time transcription that punctuates and capitalizes for you, output that never rewrites your words, a backend with redundancy so you reliably get your text, speed from running on Cloudflare's edge, and controls built for one hand (I map it to a button on my mouse). It runs in Chrome and Firefox on any computer, never stores your audio, and is free for up to 60 minutes a month (20 without an account), no credit card needed.
Why I built it
Every dictation tool I tried let me down in a different way. All I wanted was to talk and have my words appear, accurately, in the browser. In order of how much they cost me, here is what I kept running into:
- Accuracy I could not trust. The browser extensions I tried got words wrong often enough that I was constantly stopping to fix the transcript. If you have to edit every few lines, dictation is not saving you anything.
- No automatic punctuation. Saying "comma," "colon," and "period" out loud, over and over, slowed me down and meant I was narrating punctuation instead of writing.
- Google Docs crawling one word at a time. It transcribed a word, paused, transcribed the next, and still made me speak the punctuation. It was painfully slow.
- Tools that never returned my text. I would speak several sentences, then watch it sit there, waiting and waiting, and never hand the transcript back, so I had to say the whole thing over again.
- Nothing reliable on Firefox, and a Chrome Web Store crowded with extensions that constantly glitched, malfunctioned, and lost my transcriptions.
I built Voxtyper to fix every one of these, starting with accuracy.
Accurate, and it punctuates for you
Accuracy came first, because it is what everything else depends on. Voxtyper takes a whole sentence at a time and works it out in context, rather than guessing word by word. That is what lets it get homophones, names, and numbers right, instead of leaving you to catch the mistakes.
From that same context it punctuates and capitalizes for you, so you speak the way you normally would and never say "comma" out loud. And it drops the text in with the spacing already correct: no missing space, no stray space before a paragraph, none of the cleanup other tools kept forcing on me. The point is to dictate and keep moving, not dictate and then edit.
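To make the spacing rule concrete, here is a minimal sketch in TypeScript. It is an illustration only, not Voxtyper's actual code: the function name `insertWithSpacing`, the `InsertResult` shape, and the exact rules (a leading space unless the caret follows whitespace or an opening bracket, a trailing space only when a letter or digit follows) are all assumptions.

```typescript
// Hypothetical helper: joins dictated text into existing field text at the caret.
interface InsertResult {
  text: string;  // full field text after the insertion
  caret: number; // caret position just after the inserted words
}

function insertWithSpacing(existing: string, caret: number, dictated: string): InsertResult {
  const before = existing.slice(0, caret);
  const after = existing.slice(caret);
  const piece = dictated.trim();
  if (piece.length === 0) {
    // Nothing usable was dictated: leave the field untouched.
    return { text: existing, caret: caret };
  }
  const prev = before.charAt(before.length - 1); // "" at the start of the field
  const next = after.charAt(0);                  // "" at the end of the field
  // Assumed rule: add a leading space unless we are at the start of the field,
  // right after whitespace (including a paragraph break), or after an opening bracket.
  const needsLeading = prev !== "" && !/[\s(\[{]/.test(prev);
  // Assumed rule: add a trailing space only when a letter or digit follows directly.
  const needsTrailing = next !== "" && /[A-Za-z0-9]/.test(next);
  const inserted = (needsLeading ? " " : "") + piece + (needsTrailing ? " " : "");
  return { text: before + inserted + after, caret: before.length + inserted.length };
}
```

Called with the field text `"Hello world."`, a caret at the end, and the dictated text `"This is new."`, the sketch adds exactly one space between the two sentences. At the start of a new paragraph it adds none.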
Your words stay yours
I write for a living, and the one thing I could not compromise on was this: the words have to be mine. I did not want a language model quietly rewriting, paraphrasing, or "improving" what I said. When my work depends on the exact words, "helpful" rewriting is the last thing I want. So Voxtyper transcribes what you actually said, adds punctuation and capitalization, and stops there. It does not rewrite you.
You actually get your text back
The tools that hung and lost my dictation were the ones that made me give up, so reliability is built in. On the backend, our speech-to-text engine runs as multiple instances, so if one instance stalls or fails, your audio is retried on another instead of dying. The result is simple: you get your transcript back, not a spinner that waits forever and returns nothing.
- Long dictations are not cut off. You can talk for a while without the recording quietly giving up partway through.
- If something does fail, you can retry it without saying everything again. A network blip does not cost you the paragraph.
- Silent or empty clips are dropped with no charge, so a stray tap never inserts random punctuation or counts against you.
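The retry-on-another-instance idea can be sketched in a few lines of TypeScript. This is a simplified illustration, not the real backend: `EngineInstance`, `FailoverResult`, and `transcribeWithFailover` are hypothetical names, and the calls are kept synchronous for brevity, whereas real network calls would be asynchronous and would also need a timeout to detect a stall.

```typescript
// Hypothetical shape of one engine instance; `transcribe` throws on a stall or failure.
interface EngineInstance {
  name: string;
  transcribe: (audio: number[]) => string;
}

interface FailoverResult {
  text: string;     // the transcript that came back
  servedBy: string; // name of the instance that answered
  attempts: number; // how many instances were tried, including the successful one
}

function transcribeWithFailover(instances: EngineInstance[], audio: number[]): FailoverResult {
  const errors: string[] = [];
  for (let i = 0; i < instances.length; i++) {
    const instance = instances[i];
    try {
      const text = instance.transcribe(audio);
      return { text: text, servedBy: instance.name, attempts: i + 1 };
    } catch (err) {
      // Remember why this instance failed, then move on to the next one
      // with the same audio, so the speaker never has to repeat themselves.
      errors.push(instance.name + ": " + String(err));
    }
  }
  // Only reached when every instance failed: report it instead of hanging.
  throw new Error("all instances failed (" + errors.join("; ") + ")");
}
```

The point of the design is that the audio is held until some instance returns a transcript. If none does, the caller gets an explicit error it can show or retry rather than an endless spinner.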
Fast, by design
To write at the speed I think, dictation cannot lag, so speed is engineered rather than hoped for. The part your browser talks to runs on Cloudflare's edge network, which means the server answering you is physically close to you, often milliseconds away, rather than across the country. Behind that, our engine runs as several instances, and Voxtyper continuously measures how fast and reliable each instance is, so your audio goes to whichever instance is currently quickest and healthiest. A built-in timing system tracks how long each transcription actually takes, so "fast" is something we measure, not assume.
Your browser talks to the nearest Cloudflare edge, which routes your audio to whichever instance of our engine is currently fastest and most reliable.
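One common way to do this kind of routing is to keep a moving average of each instance's latency and success rate and pick the quickest instance that still counts as healthy. The TypeScript below is a sketch of that general technique, not Voxtyper's real router: the names `InstanceStats`, `recordResult`, and `pickInstance`, the smoothing weight `ALPHA`, and the `MIN_SUCCESS_RATE` threshold are all assumptions.

```typescript
interface InstanceStats {
  name: string;
  avgLatencyMs: number; // exponentially weighted moving average of response time
  successRate: number;  // moving average of outcomes, where 1 means it always succeeds
}

const ALPHA = 0.2;            // weight given to the newest sample (assumed value)
const MIN_SUCCESS_RATE = 0.8; // below this an instance is treated as unhealthy (assumed value)

// Fold one finished transcription into an instance's running statistics.
function recordResult(stats: InstanceStats, latencyMs: number, ok: boolean): InstanceStats {
  return {
    name: stats.name,
    avgLatencyMs: (1 - ALPHA) * stats.avgLatencyMs + ALPHA * latencyMs,
    successRate: (1 - ALPHA) * stats.successRate + ALPHA * (ok ? 1 : 0),
  };
}

// Pick the fastest healthy instance. If none is healthy, fall back to the
// fastest of all of them rather than refusing the request.
function pickInstance(all: InstanceStats[]): InstanceStats {
  if (all.length === 0) {
    throw new Error("no instances configured");
  }
  const healthy = all.filter((s) => s.successRate >= MIN_SUCCESS_RATE);
  const pool = healthy.length > 0 ? healthy : all;
  return pool.reduce((best, s) => (s.avgLatencyMs < best.avgLatencyMs ? s : best));
}
```

Because the averages are weighted toward recent samples, an instance that starts failing drops out of rotation after a couple of bad results and comes back once it has recovered.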
In my own use, working this way is roughly five times my old writing speed, and about twice as fast as the word-by-word dictation I used before. That is one writer's estimate, mine, not a lab benchmark, but the gain is real enough that I do not type first drafts anymore. Speaking is simply faster than typing in general, too: a Stanford study measured voice input at about three times the speed of a phone keyboard, with 20% fewer errors.
It lands in the field you actually picked
Getting dictated text to land correctly in every kind of web editor, a plain comment box, a rich text editor, even the canvas that Google Docs draws its document on, is harder than it sounds, and it is where a lot of tools quietly fall down. A big share of the work went into exactly that, so your words end up in the field you focused, instead of nowhere.
One tool on any computer, in any browser
I wanted one dictation tool that behaved the same everywhere, no matter the machine or the browser. Voxtyper runs in Chrome and Firefox from one shared codebase, and because the heavy lifting happens on the edge rather than on your computer, a low-end laptop gets the same result as a high-end desktop. I use it on an old MacBook Pro and on a new desktop, and the experience is identical. Install it once per browser and your dictation comes with you.
Because it is a browser extension, there is no separate desktop software to install and maintain, and it updates itself through the browser. You never have to think about keeping it current the way you would with a desktop app. A desktop version for dictating into native apps is on the roadmap.
How to use it, and how I use it
There are three ways to start and stop dictation, and they all toggle: one press starts recording, the next stops it and sends automatically.
- Ctrl + Space. The default keyboard shortcut, and you can rebind it to whatever you like.
- The on-page button. Tap it with your finger (or click it with your mouse). On a touchscreen laptop, that means you can dictate with your thumb while lying in bed and scrolling with one hand, with no keyboard at all.
- A button on your mouse. Map the shortcut to a spare mouse button and you control everything with one hand, without leaving the page.
The mouse is how I actually use it, and it is the move that turns Voxtyper into a daily driver. With dictation mapped to a spare button, I never touch the keyboard to write a draft: click, talk, click, and the text is in.
I map dictation to a spare button on my mouse, so one click starts recording and the next stops and sends. Ctrl + Space and the on-page button work just as well; the mouse is what makes it effortless.
You can also have each result copied straight to your clipboard, so you can paste it anywhere on your system, not just into the field you dictated into.
A status indicator you can read
The thing that made other tools maddening was not knowing the state: is it recording, did it send, is it stuck, did it fail? So the small button on the page doubles as a status display, and every stage has its own shape and color, so a glance tells you exactly where you are:
- Recording: a red oscilloscope wave. It traces your actual voice, redrawn many times a second, so it moves as you speak and sits as a flat line in silence. That live, analog wave is the unmistakable "it is hearing you right now."
- Transcribing: a yellow square wave. When you stop, the red analog wave hands off to a scrolling yellow digital one, a deliberately different shape and color so you can never confuse "still recording" with "now working." Yellow means it is busy, not stuck.
- Done: a green check. Green is used for nothing else, so a green check means one thing only: your text went in.
- Cancelled: a red X. Press Escape and a red X draws in, instantly telling you it was aborted and nothing was inserted. If a transcription genuinely fails, it tells you with a short message instead of failing silently.
The colors are reserved on purpose, red for live or aborted, yellow for working, green for done, so you are never guessing what happened. It all stays out of the way, too: the animations live inside the little indicator and nothing pops out over the page, the check and X are drawn over the mic so it still reads as the same button, hints are small and dismissible, and you can hide the indicator entirely and drive everything from the keyboard or your mouse button. The first time you use it, it shows you how, instead of leaving you to figure it out the way the other tools left me guessing.
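The stages and their reserved colors amount to a small state machine, sketched here in TypeScript. This illustrates the behavior described above and is not the extension's actual source: the type and function names are hypothetical. It also makes three assumptions of its own: Escape cancels during both recording and transcribing, the idle state shows a plain mic, and a failure shows a message with no reserved color.

```typescript
type DictationState = "idle" | "recording" | "transcribing" | "done" | "cancelled" | "failed";
type DictationEvent = "toggle" | "escape" | "transcript" | "error";

interface Indicator {
  color: string | null; // null where the article reserves no color
  shape: string;
}

// One entry per state, so no state can be left without a visible indicator.
const INDICATORS: Record<DictationState, Indicator> = {
  idle: { color: null, shape: "mic" },               // assumed resting look
  recording: { color: "red", shape: "oscilloscope wave" },
  transcribing: { color: "yellow", shape: "square wave" },
  done: { color: "green", shape: "check" },
  cancelled: { color: "red", shape: "X" },
  failed: { color: null, shape: "short message" },   // color is an assumption
};

function nextState(state: DictationState, event: DictationEvent): DictationState {
  switch (state) {
    case "recording":
      if (event === "toggle") return "transcribing"; // second press stops and sends
      if (event === "escape") return "cancelled";    // aborted, nothing inserted
      return state;
    case "transcribing":
      if (event === "transcript") return "done";
      if (event === "error") return "failed";
      if (event === "escape") return "cancelled";    // assumed to be allowed here too
      return state;
    default:
      // idle, done, cancelled, failed: the next press starts a new recording.
      return event === "toggle" ? "recording" : state;
  }
}
```

Because every state maps to exactly one entry in `INDICATORS`, a "still recording" wave can never be shown while the engine is already transcribing.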
The honest part
Voxtyper is new, so I would rather you judge it by how it behaves than by a star rating it has not earned yet. It is browser-only for now, and the free tier is metered (20 minutes a month, or 60 minutes signed in). Your audio is never stored, it types what you said without changing it, and it is free to try, so you can make up your own mind in about a minute.
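The arithmetic behind the meter is simple enough to sketch in TypeScript. Only the two limits (20 and 60 minutes) and the rule that silent clips are not charged come from this post. The names `Clip` and `remainingFreeSeconds`, and the idea of counting per-clip seconds, are assumptions about how such a meter might work, not a description of the real billing code.

```typescript
const FREE_MINUTES_ANONYMOUS = 20; // monthly free tier without an account
const FREE_MINUTES_SIGNED_IN = 60; // monthly free tier when signed in

// Hypothetical record of one dictation clip.
interface Clip {
  seconds: number; // length of the recording
  silent: boolean; // true when the clip was silent or empty
}

function remainingFreeSeconds(signedIn: boolean, clipsThisMonth: Clip[]): number {
  const allowance = (signedIn ? FREE_MINUTES_SIGNED_IN : FREE_MINUTES_ANONYMOUS) * 60;
  let used = 0;
  for (const clip of clipsThisMonth) {
    // Silent or empty clips are dropped with no charge.
    if (!clip.silent) {
      used += clip.seconds;
    }
  }
  // Never report a negative balance.
  return Math.max(0, allowance - used);
}
```

For example, 90 seconds of speech plus a 30-second silent clip leaves an anonymous user with 1,110 of their 1,200 free seconds.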
Frequently asked questions
What is Voxtyper?
A browser extension for dictation. Click into any web text field, press a shortcut, a mapped mouse button, or the on-page button, speak, and it inserts accurate, punctuated, capitalized text. It works in Chrome and Firefox and does not rewrite your words.
Is Voxtyper free?
Yes. The free tier gives you 20 minutes a month without an account, or 60 minutes a month signed in, no credit card required. If you need more than that, there is a plan for unlimited usage.
Does it work on any computer and browser?
Yes, in Chrome and Firefox from one codebase. Because the heavy work happens on the edge, an old laptop gets the same result as a new desktop.
Does it change or rewrite my words?
No. It transcribes what you said and adds punctuation and capitalization, nothing more. The words stay yours.
Is my audio stored?
No. Your audio is transcribed and then discarded; it is never written to disk or kept, and silent clips are dropped with no charge.
Try it and see if it does for you what it did for me: speak, and get finished text.