Ghost Pepper: Local Speech-to-Text for macOS Users

Most dictation apps on macOS want to send your voice to the cloud. Ghost Pepper refuses.

Ghost Pepper is an open-source, fully local speech-to-text application for macOS that processes everything (transcription, filler-word cleanup, meeting summaries) directly on your Apple Silicon chip. No cloud APIs. No data leaving your machine. No subscription fees. It just hit v2.1.2 with over a thousand GitHub stars, and it is one of the most privacy-respecting voice tools available on the Mac today.

Hold, Speak, Release, Done

The core interaction is dead simple: hold the Control key to record, release to transcribe. The transcribed text is automatically pasted into whatever text field has focus. No app switching, no copy-paste dance, no waiting for a server round trip.
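The hold-to-record interaction can be pictured as a two-state controller. The sketch below is illustrative only and is not taken from Ghost Pepper's source: the `PushToTalk` class and the `transcribe` and `paste` callbacks are hypothetical names standing in for the app's real hotkey, transcription, and paste logic.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PushToTalk:
    """Hold-to-record controller: key down starts capture, key up transcribes and pastes."""

    transcribe: Callable[[List[bytes]], str]  # hypothetical: buffered audio -> text
    paste: Callable[[str], None]              # hypothetical: insert text at the cursor
    recording: bool = False
    chunks: List[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        # Ignore key-repeat events while the key is already held.
        if not self.recording:
            self.recording = True
            self.chunks = []

    def audio(self, chunk: bytes) -> None:
        # Audio is buffered only while the hotkey is held.
        if self.recording:
            self.chunks.append(chunk)

    def key_up(self) -> None:
        # Releasing the key ends the recording and hands the buffer to transcription.
        if not self.recording:
            return
        self.recording = False
        text = self.transcribe(self.chunks)
        if text:
            self.paste(text)
```

Because transcription only starts in `key_up`, nothing is pasted until the key is released, which matches the non-streaming behavior described later in the article.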

Behind the scenes, Ghost Pepper runs a two-stage pipeline:

1. Speech recognition: WhisperKit converts audio to raw text on-device.
2. LLM cleanup: a local Qwen 3.5 model removes filler words ("um," "uh," "like") and corrects self-interruptions.

Both stages execute entirely on your Mac's Neural Engine and GPU. The cleanup step typically takes 1–2 seconds with the default 0.8B parameter model.
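As a rough sketch of how the two stages compose, the snippet below chains a recognizer and a cleanup step. Everything here is an assumption made for illustration: `dictate` and `recognize` are hypothetical names, and `naive_cleanup` is a simple regex filter standing in for the LLM stage, not how Ghost Pepper actually cleans text.

```python
import re
from typing import Callable

# Stand-in for the LLM cleanup stage: a naive regex filter (an assumption for
# illustration, not Ghost Pepper's method). It only drops standalone "um"/"uh"/"er".
FILLERS = re.compile(r"\b(?:um+|uh+|er+)\b[,.]?\s*", re.IGNORECASE)


def naive_cleanup(raw: str) -> str:
    """Remove obvious filler words, collapse whitespace, capitalize the first letter."""
    text = FILLERS.sub("", raw)
    text = re.sub(r"\s{2,}", " ", text).strip()
    return text[:1].upper() + text[1:]


def dictate(
    audio: bytes,
    recognize: Callable[[bytes], str],          # stage 1: audio -> raw transcript
    cleanup: Callable[[str], str] = naive_cleanup,  # stage 2: raw -> cleaned text
) -> str:
    """Run the two stages in order and return the cleaned transcript."""
    raw = recognize(audio)
    return cleanup(raw)
```

The regex deliberately leaves "like" alone, because it cannot tell filler from a legitimate use of the word. That kind of context-dependent judgment is the reason to use a language model for the second stage.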

Pick Your Model Stack

Ghost Pepper doesn't lock you into a single transcription engine. You choose based on your speed, accuracy, and language needs:

Model	Size	Best For
Whisper tiny.en	~75 MB	Fastest option, English only
Whisper small.en	~466 MB	Default — best English accuracy
Whisper small	~466 MB	Multilingual support
Parakeet v3	~1.4 GB	25 languages via FluidAudio
Qwen3-ASR 0.6B	~900 MB	50+ languages (macOS 15+)

For text cleanup, three Qwen 3.5 variants are available: the 0.8B model (~535 MB, 1–2s latency), the 2B model (~1.3 GB, 4–5s), and the 4B model (~2.8 GB, 5–7s). Models download automatically from Hugging Face on first use and are cached locally.
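To make the trade-offs concrete, here is a small sketch that filters the models listed above by language needs, macOS version, and download size, and picks a cleanup model by latency budget. The sizes and latencies are copied from the figures above (GB converted at 1 GB = 1000 MB, latency taken as the upper end of each range); the function names and the selection rules are assumptions for illustration, not part of the app.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Transcriber:
    name: str
    size_mb: int        # approximate download size from the article
    multilingual: bool
    min_macos: int = 14  # the app itself requires macOS 14


TRANSCRIBERS = [
    Transcriber("Whisper tiny.en", 75, multilingual=False),
    Transcriber("Whisper small.en", 466, multilingual=False),
    Transcriber("Whisper small", 466, multilingual=True),
    Transcriber("Parakeet v3", 1400, multilingual=True),
    Transcriber("Qwen3-ASR 0.6B", 900, multilingual=True, min_macos=15),
]

# (name, approximate size in MB, upper end of the quoted latency range in seconds)
CLEANERS = [("Qwen 3.5 0.8B", 535, 2), ("Qwen 3.5 2B", 1300, 5), ("Qwen 3.5 4B", 2800, 7)]


def candidates(multilingual: bool, macos_major: int, max_mb: int) -> List[str]:
    """Names of transcription models that satisfy the constraints, smallest first."""
    fits = [
        t for t in TRANSCRIBERS
        if t.size_mb <= max_mb
        and t.min_macos <= macos_major
        and (t.multilingual or not multilingual)
    ]
    return [t.name for t in sorted(fits, key=lambda t: t.size_mb)]


def largest_cleaner_within(latency_s: float) -> Optional[str]:
    """Largest cleanup model whose quoted worst-case latency fits the budget."""
    ok = [c for c in CLEANERS if c[2] <= latency_s]
    return max(ok, key=lambda c: c[1])[0] if ok else None
```

For example, `candidates(multilingual=True, macos_major=14, max_mb=1000)` leaves only Whisper small, since Parakeet v3 exceeds the size limit and Qwen3-ASR needs macOS 15.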

Meeting Transcription That Stays on Your Machine

Beyond quick dictation, Ghost Pepper captures entire calls, recording audio while generating transcripts, notes, and AI-powered summaries saved as Markdown files. This makes it a genuine alternative to cloud-based meeting transcription services like Otter.ai or Fireflies, minus the privacy trade-off.

The app uses AVAudioEngine for microphone input and ScreenCaptureKit for system audio capture, so it can transcribe both sides of a video call without routing audio through a third-party server.

Privacy You Can Actually Verify

Most apps claim to be private. Ghost Pepper ships a PRIVACY_AUDIT.md file in its repository documenting exactly which features were audited and confirmed as local-only. The audit covers speech-to-text, text cleanup, audio recording, meeting transcription, summary generation, OCR (via Apple's Vision framework), and file storage.

Transcriptions aren't written to disk by default. Debug logs exist only in memory during runtime.

There are optional cloud integrations (Zo AI chat, Trello, and Granola), but they require user-supplied API keys and remain disabled by default. Once its models are downloaded, the core transcription pipeline makes no network requests.

For enterprise deployment, MDM administrators can pre-approve accessibility permissions using PPPC payloads with bundle ID com.github.matthartman.ghostpepper and Team ID BBVMGXR9AY.
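For administrators who want a starting point, the sketch below builds a PPPC configuration profile with Python's standard `plistlib`. The bundle ID and Team ID come from the paragraph above; the code requirement string, the organization identifier, and the display name are assumptions. In particular, confirm the real designated requirement by running `codesign -d -r-` against the installed app before deploying anything like this.

```python
import plistlib
import uuid

BUNDLE_ID = "com.github.matthartman.ghostpepper"
TEAM_ID = "BBVMGXR9AY"

# Assumed code requirement built from the two IDs above. Verify it against the
# output of `codesign -d -r- <path to the app>` before using it in production.
CODE_REQUIREMENT = (
    f'identifier "{BUNDLE_ID}" and anchor apple generic '
    f'and certificate leaf[subject.OU] = "{TEAM_ID}"'
)


def pppc_profile(org_identifier: str) -> bytes:
    """Return an XML .mobileconfig that pre-approves Accessibility for the app."""
    tcc_payload = {
        "PayloadType": "com.apple.TCC.configuration-profile-policy",
        "PayloadVersion": 1,
        "PayloadIdentifier": f"{org_identifier}.tcc",
        "PayloadUUID": str(uuid.uuid4()).upper(),
        "Services": {
            "Accessibility": [
                {
                    "Identifier": BUNDLE_ID,
                    "IdentifierType": "bundleID",
                    "CodeRequirement": CODE_REQUIREMENT,
                    "Authorization": "Allow",
                }
            ]
        },
    }
    profile = {
        "PayloadType": "Configuration",
        "PayloadVersion": 1,
        "PayloadScope": "System",
        "PayloadIdentifier": org_identifier,  # hypothetical, chosen by the admin
        "PayloadUUID": str(uuid.uuid4()).upper(),
        "PayloadDisplayName": "Ghost Pepper accessibility approval",
        "PayloadContent": [tcc_payload],
    }
    return plistlib.dumps(profile)
```

Writing the returned bytes to a `.mobileconfig` file gives a profile that can be uploaded to an MDM server. PPPC payloads generally take effect only when delivered through MDM, so installing the file by hand is not expected to grant the permission.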

The Competitive Landscape Is Crowded — But Ghost Pepper Stands Out

The macOS dictation space is saturated. As one Hacker News commenter put it: "This thread is a support group for people who have each independently built the same macOS speech-to-text app." Commercial options like SuperWhisper and Wispr Flow exist, and Apple's built-in dictation has improved significantly.

Ghost Pepper's differentiator is the combination of full local processing, open-source transparency, and zero cost. SuperWhisper and Wispr Flow charge subscription fees. Apple's dictation lacks the LLM cleanup stage that removes filler words. And none of the commercial options ship a verifiable privacy audit.

The project isn't without rough edges. Users have reported occasional LLM cleanup misfires, where the model reinterprets what was said rather than just cleaning it up, and transcription isn't streamed live (you speak, then wait for results). Developer Matt Hartman has been actively addressing feedback, fixing a microphone permission bug shortly after it was reported.

System Requirements

macOS 14.0 (Sonoma) or later
Apple Silicon (M1 or newer) — Intel Macs are not supported
Microphone and Accessibility permissions
~540 MB minimum disk space for default models

Installation is straightforward: download the DMG from GitHub releases, drag the app to Applications, and grant permissions. You can also build from source with Xcode. The codebase is 97.8% Swift and is released under the MIT license.

The Bottom Line

Ghost Pepper solves a real problem that most voice tools ignore: how to get fast, accurate dictation and meeting transcription without surrendering your audio to someone else's servers. It won't match the polish of commercial alternatives, and it requires Apple Silicon. But for developers, privacy-conscious professionals, and anyone tired of paying subscriptions for basic dictation, it's the most honest option on macOS right now. The code is open, the audit is published, and your voice stays on your machine.

