What it is
A desktop application that runs OpenAI's Whisper speech recognition model locally on Windows. Users select an audio file, choose a Whisper model size and CPU/GPU device, and get a transcript back, with no API call, no upload, no recurring cost.
Why I built it
I wanted high-quality transcription for personal use without sending audio to a paid API. Running Whisper locally is a solved problem at the command line, but the friction is non-trivial: model downloads, CUDA setup, CLI invocation. I built the GUI to reduce that friction to a single click for myself and anyone I shared the binary with.
How it works
- Frontend: a PyQt6 GUI with a file picker, model selector (tiny / base / small / medium / large), CPU/GPU toggle, and progress indicator
- Backend: OpenAI Whisper running locally via PyTorch, with CUDA acceleration when available
- Output: raw transcript view, optional Markdown conversion with live preview, export to .txt or .md
- Distribution: PyInstaller bundles Python + Whisper + PyTorch into a single Windows executable that runs on any Windows machine without Python installed
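The backend flow above can be sketched with the openai-whisper package. This is a minimal illustration, not the app's actual code: the function names (`pick_device`, `transcribe_file`) and the default model size are my own placeholders; the `whisper.load_model` / `model.transcribe` calls are the library's real API.

```python
def pick_device(prefer_gpu: bool) -> str:
    """Map the GUI's CPU/GPU toggle to a torch device string."""
    try:
        import torch
        if prefer_gpu and torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"


def transcribe_file(path: str, model_size: str = "small", prefer_gpu: bool = True) -> str:
    """Load a Whisper model locally and return the transcript text."""
    import whisper  # openai-whisper; downloads model weights on first use
    model = whisper.load_model(model_size, device=pick_device(prefer_gpu))
    result = model.transcribe(path)
    return result["text"]
```

In the GUI this would run on a worker thread so the PyQt6 event loop stays responsive while the model loads and transcribes.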
What I learned
- Packaging Python desktop apps with native dependencies is its own discipline: PyInstaller spec files, hidden imports, and the dance of getting CUDA libraries into the bundle.
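A build command in that spirit might look like the sketch below. It is illustrative only, not the project's actual build script: `app.py` is a placeholder entry point, and a real build of this app would likely need a full spec file to locate the CUDA DLLs.

```shell
# --collect-data pulls Whisper's bundled data files (mel filter banks,
# tokenizer assets) into the executable; --hidden-import covers modules
# PyInstaller's static analysis can miss; --windowed suppresses the console.
pyinstaller --onefile --windowed \
    --collect-data whisper \
    --hidden-import torch \
    app.py
```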
- Larger Whisper models are worth it when you have a GPU. The accuracy lift on noisy audio is significant; the latency hit is acceptable when transcription isn't real-time.