Votelli: Free Local Voice-to-Text for Mac (Works Offline)

I dictate a lot.

Quick notes, prompts to an LLM, the first messy draft of an email.

Typing all of that is slower than just talking (I blame bad typing habits picked up 10 or so years ago), so I went looking for a voice-to-text app on the Mac. No subscription, and no audio shipped off to a server somewhere.

Voxtype was the best candidate, and it was great while I was running Omarchy full time. Mac is another story. Voxtype just added Mac support and it’s, to say the least, not all there yet.

So I built my own for Mac. It’s called Votelli. It’s ultra lightweight, local, and it’s free.

What is Votelli?

Votelli is a push-and-hold voice-to-text app for macOS. You hold a key, you talk, you let go, and the text appears wherever your cursor is. That’s it.

It lives in the menu bar, not the Dock. No window to manage, no app to switch to. A small microphone icon sits up in the top right and tells you whether it’s idle, recording, or transcribing. The rest of the time you forget it’s there.

The part I care about most: it runs a small AI model locally. The transcription happens on your Mac, on the GPU, with nothing leaving the machine. Turn off your Wi-Fi and it still works. No account, no API key, no monthly bill.

Why not just pay for Wispr Flow?

There are paid dictation apps that do this well. Wispr Flow is the obvious one. They’re polished, and if you live in them all day they may be worth the money.

But for what I needed, paying a subscription to talk to my own computer felt backwards. I wanted three things:

Free, with no account
Local, so my voice isn’t uploaded anywhere
Dead simple, with one key and no menus to think about

Votelli is the version of that I actually wanted to use. It does one job. It does it without asking for a credit card or an internet connection.

How it works

Under the hood it’s Whisper running through whisper.cpp, with the base.en model bundled right into the app. When you hold the key, Votelli captures your mic, hands the audio to the model on the GPU via Metal, and types the result out at your cursor using synthesized key events. That last detail matters more than it sounds: because it types instead of pasting, your clipboard is never touched.

Hold key  →  record mic  →  local Whisper model (Metal GPU)  →  text at your cursor
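To make that flow concrete, here’s a minimal sketch of the hold, record, transcribe, type loop as a tiny state machine. It is written in Python purely for illustration and is not Votelli’s actual code: the PushToTalk class, its method names, and the transcribe and type_text callbacks are all hypothetical stand-ins for the real mic capture, the local Whisper call, and the synthesized key events.

```python
from enum import Enum
from typing import Callable, List


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"


class PushToTalk:
    """Toy model of the hold -> record -> transcribe -> type loop."""

    def __init__(self,
                 transcribe: Callable[[List[float]], str],
                 type_text: Callable[[str], None]) -> None:
        self.state = State.IDLE
        self._transcribe = transcribe  # hypothetical stand-in for the local Whisper call
        self._type_text = type_text    # hypothetical stand-in for synthesized key events
        self._samples: List[float] = []

    def key_down(self) -> None:
        # Start a fresh recording only from idle; ignore key repeats while held.
        if self.state is State.IDLE:
            self._samples = []
            self.state = State.RECORDING

    def on_audio(self, chunk: List[float]) -> None:
        # Mic chunks are kept only while the key is held.
        if self.state is State.RECORDING:
            self._samples.extend(chunk)

    def key_up(self) -> None:
        if self.state is not State.RECORDING:
            return
        self.state = State.TRANSCRIBING
        try:
            text = self._transcribe(self._samples).strip()
            if text:
                # Typed at the cursor, so the clipboard is never involved.
                self._type_text(text)
        finally:
            self.state = State.IDLE


# Usage with fake callbacks in place of a real model and keyboard.
typed: List[str] = []
ptt = PushToTalk(
    transcribe=lambda samples: f" heard {len(samples)} samples ",
    type_text=typed.append,
)
ptt.on_audio([0.0])        # ignored: the key is not held yet
ptt.key_down()
ptt.on_audio([0.1, 0.2])
ptt.on_audio([0.3])
ptt.key_up()
print(typed)
```

The three states line up with what the menu bar icon reports (idle, recording, transcribing). Audio that arrives while the key is up is simply dropped, which is the whole appeal of push-to-talk: nothing is listened to unless you are holding the key.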

The model is small on purpose. That’s the trade-off that lets it run almost instantly on your own machine instead of in a data center. For everyday dictation (notes, prompts, messages) it’s plenty accurate, and it’s fast because there’s no network round trip.

[Figure: How Votelli turns held-key speech into typed text locally on a Mac]

What I actually use it for

The reason push-to-talk fits so well is that it pairs naturally with talking to LLMs. I hold the key, ramble out a prompt to Claude or ChatGPT, let go, and it’s typed. Speaking a paragraph is much faster than typing one, and you tend to give the model more context when you’re talking instead of pecking at the keyboard.

Outside of that, it’s just good for the small stuff: jotting a note, replying to a message, getting a rough draft down before I clean it up. Anywhere there’s a text field, Votelli works, because it types into whatever app has focus.

The other half of the setup is the mic. I keep a boundary mic flat on my desk so it’s always ready the moment I hold the key, no boom arm to swing over. I wrote about why that form factor beats the alternatives for dictation in my AC-44 desk mic post.

Installing it

Votelli is Mac-only, and it needs an Apple Silicon Mac running macOS 13 or later. Here’s the short version:

Download the latest Votelli-<version>.dmg from the Releases page.
Open the DMG and drag Votelli.app into your Applications folder.
Double-click it. Because the app is self-signed and not notarized, macOS will warn you that it can’t verify the app. Click Done, then open System Settings → Privacy & Security, scroll to the Security section, and click Open Anyway.
On first launch, Votelli opens its own Preferences window and walks you through the permissions it needs, showing live status for each one as you grant it.

Those permissions are worth a quick word, because a voice app asking for them is fair to question:

Permission	Why it’s needed
Microphone	to hear you while you hold the key
Input Monitoring	to detect the push-to-talk key being held
Accessibility	to type the transcribed text into other apps
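The table above also works as a checklist. Here’s a small sketch, in Python and purely illustrative, of how a first-launch screen might work out which permissions are still missing. The Permission type and the missing_permissions helper are hypothetical names, and the set of granted permissions is passed in by hand rather than queried from macOS.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Permission:
    name: str
    reason: str


# The three permissions from the table, with the reason each is needed.
REQUIRED = (
    Permission("Microphone", "hear you while you hold the key"),
    Permission("Input Monitoring", "detect the push-to-talk key being held"),
    Permission("Accessibility", "type the transcribed text into other apps"),
)


def missing_permissions(granted: Set[str]) -> List[str]:
    """Return the names of required permissions not yet granted, in table order."""
    return [p.name for p in REQUIRED if p.name not in granted]


# Example: only the microphone has been granted so far.
print(missing_permissions({"Microphone"}))
```

Dictation can only work once that list is empty: without Input Monitoring the key press is never seen, and without Accessibility the text has nowhere to go.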

Everything else (the Whisper model, the Metal GPU shaders) is bundled in the app, so there’s nothing else to download or configure.

Using it

Click into any text field. Hold your hotkey, speak, release. A small waveform HUD rises and falls while you hold the key so you know it’s listening, and the text types in a beat after you let go.

The default key is Right Option (⌥). If that’s not comfortable, open Preferences → Push-to-talk key, click the button, and press the key you want. It has to be a modifier (⌥ ⌘ ⌃ ⇧ or Fn) so it doesn’t collide with normal typing. You can also flip on Start at login so it’s always ready.
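The modifier-only rule is easy to express in code. This is a hedged sketch in Python, not how Votelli actually validates the key: the is_valid_hotkey function and the plain-English key names are assumptions made for the example.

```python
# Modifier keys allowed as the push-to-talk key (⌥ ⌘ ⌃ ⇧ or Fn).
MODIFIERS = {"option", "command", "control", "shift", "fn"}


def is_valid_hotkey(key_name: str) -> bool:
    """Accept a single modifier key, optionally prefixed with 'left' or 'right'."""
    words = key_name.lower().split()
    if words and words[0] in ("left", "right"):
        words = words[1:]
    return len(words) == 1 and words[0] in MODIFIERS


for candidate in ("Right Option", "Fn", "A", "Space"):
    print(candidate, is_valid_hotkey(candidate))
```

A modifier held on its own produces no characters, so it can be held down mid-sentence without leaving stray letters in the text field. A letter or the space bar would type into whatever you were dictating into.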

It’s open source

Votelli is MIT licensed, and the code is on GitHub. If you want to read exactly what it does with your microphone, you can. If you want to build it from source instead of using the self-signed DMG, the README walks through it. And if you just want to dictate without paying for the privilege, download the DMG and you’re done in about two minutes.

That’s really the whole pitch. Free, local, offline, one key. The simplicity is the point.