AI Tooling · 2026

YouTube Video Transcriber

A modular Python CLI and Streamlit GUI that downloads YouTube videos and turns them into text transcripts and .srt subtitles, using either a local Whisper model or the OpenAI API.

Python
Whisper
OpenAI API
yt-dlp
FFmpeg
Streamlit


Role: Design · Build
Timeline: 2026 · Mar — Jun

The problem

Getting the text out of a YouTube video (a talk, an interview, a lecture) usually means relying on unreliable auto-captions or paying a transcription service by the minute. I could not find a simple tool that covered the whole flow and also let you choose between cloud-quality transcription and fully offline, cost-free transcription.

The approach

I built it as a modular Python package: yt-dlp handles the download, FFmpeg the audio extraction, and the transcription engine is pluggable — a locally run Whisper model (offline, GPU-accelerated) or the OpenAI Whisper API. Optional dependency groups mean you install only what you use, and both the CLI and the Streamlit GUI sit on the same core.
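
One way to picture the pluggable engine is a small interface plus a registry that both front ends query by name. This is a minimal sketch, not the project's actual code: `Segment`, `Engine`, `register`, `get_engine` and the stand-in `EchoEngine` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Protocol


@dataclass
class Segment:
    """One timed piece of transcript (hypothetical data shape)."""
    start: float
    end: float
    text: str


class Engine(Protocol):
    """Anything with this method can act as a transcription engine."""
    def transcribe(self, audio_path: Path) -> List[Segment]: ...


# Registry mapping an engine name to a zero-argument factory.
_ENGINES: Dict[str, Callable[[], Engine]] = {}


def register(name: str):
    """Decorator that adds an engine class or factory to the registry."""
    def decorator(factory):
        _ENGINES[name] = factory
        return factory
    return decorator


def get_engine(name: str) -> Engine:
    """Build the engine registered under `name`, or fail with a clear message."""
    try:
        factory = _ENGINES[name]
    except KeyError:
        raise ValueError(
            f"unknown engine {name!r}; choose from {sorted(_ENGINES)}"
        ) from None
    return factory()


@register("echo")
class EchoEngine:
    """Stand-in engine for the demo; a real one would call Whisper or the API."""
    def transcribe(self, audio_path: Path) -> List[Segment]:
        return [Segment(0.0, 1.0, f"(transcript of {audio_path.name})")]
```

With this shape, the CLI and the GUI would only need `get_engine(name).transcribe(path)`, and adding an engine would not touch either front end.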

How it works

01

Fetch

yt-dlp downloads the video or audio-only stream, with format selection, browser-cookie support, and local files as an alternative input.
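
The fetch step could look roughly like the sketch below, which uses yt-dlp's Python API (`YoutubeDL`, `extract_info`, `prepare_filename`) with its `format`, `outtmpl` and `cookiesfrombrowser` options. The function names `build_ydl_opts` and `fetch`, the output template and the format strings are my assumptions here, not the project's confirmed settings.

```python
from pathlib import Path
from typing import Optional


def build_ydl_opts(out_dir: Path, audio_only: bool = True,
                   cookies_browser: Optional[str] = None) -> dict:
    """Assemble yt-dlp options (hypothetical helper)."""
    opts = {
        # Format selection: audio-only stream, or best video plus audio.
        "format": "bestaudio/best" if audio_only else "bestvideo+bestaudio/best",
        "outtmpl": str(out_dir / "%(id)s.%(ext)s"),
        "quiet": True,
    }
    if cookies_browser:
        # yt-dlp expects a tuple whose first item is the browser name.
        opts["cookiesfrombrowser"] = (cookies_browser,)
    return opts


def fetch(source: str, out_dir: Path, **kwargs) -> Path:
    """Return a local media path, downloading only when `source` is not a file."""
    local = Path(source)
    if local.is_file():
        return local  # local files are accepted as an alternative input

    import yt_dlp  # imported lazily so local-file runs do not need it

    with yt_dlp.YoutubeDL(build_ydl_opts(out_dir, **kwargs)) as ydl:
        info = ydl.extract_info(source, download=True)
        # Note: the final name can differ if post-processing changes the extension.
        return Path(ydl.prepare_filename(info))
```

Keeping the option building separate from the download makes the format and cookie choices easy to check without touching the network.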

02

Extract

FFmpeg extracts and converts the audio, automatically re-encoding to MP3 when a file exceeds the OpenAI API's upload size limit (25 MB at the time of writing).
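
The size check and re-encode can be sketched as follows. The 25 MB threshold, the 64 kbit/s mono settings and the helper names (`needs_reencode`, `mp3_command`, `prepare_for_api`) are assumptions for illustration; the FFmpeg flags themselves (`-vn`, `-ac`, `-c:a libmp3lame`, `-b:a`) are standard.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed upload limit for the API; adjust if the provider changes it.
API_LIMIT_BYTES = 25 * 1024 * 1024


def needs_reencode(audio: Path, limit: int = API_LIMIT_BYTES) -> bool:
    """True when the file is larger than the allowed upload size."""
    return audio.stat().st_size > limit


def mp3_command(src: Path, dst: Path, bitrate: str = "64k") -> List[str]:
    """Build an FFmpeg command that drops video and encodes mono MP3."""
    return [
        "ffmpeg", "-y",          # overwrite the output without asking
        "-i", str(src),
        "-vn",                   # ignore any video stream
        "-ac", "1",              # mono is enough for speech
        "-c:a", "libmp3lame",
        "-b:a", bitrate,
        str(dst),
    ]


def prepare_for_api(audio: Path, limit: int = API_LIMIT_BYTES) -> Path:
    """Return the original file if it fits, otherwise a smaller MP3 copy."""
    if not needs_reencode(audio, limit):
        return audio
    dst = audio.with_name(audio.stem + ".small.mp3")
    subprocess.run(mp3_command(audio, dst), check=True)
    return dst
```

A very long recording can still exceed the limit after re-encoding, so a fuller version would probably re-check the size or split the audio.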

03

Transcribe

The pluggable engine either runs a lazily loaded local Whisper model or calls the OpenAI Whisper API; the user picks the trade-off per run.
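
Lazy loading here means the model weights are read only when the first transcription is requested, so starting the tool or choosing the API engine costs nothing. A minimal sketch, assuming the `openai-whisper` package (`whisper.load_model` and `model.transcribe`, which returns a dict with a `segments` list); the class name and the injectable `loader` parameter are hypothetical.

```python
from typing import Any, Callable, Optional


class LocalWhisperEngine:
    """Loads the Whisper model on first use instead of at construction."""

    def __init__(self, model_name: str = "base",
                 loader: Optional[Callable[[str], Any]] = None) -> None:
        self.model_name = model_name
        self._loader = loader      # injectable for tests; defaults to whisper
        self._model: Any = None    # nothing is loaded yet

    @property
    def model(self) -> Any:
        if self._model is None:
            loader = self._loader
            if loader is None:
                import whisper     # heavy import deferred until needed
                loader = whisper.load_model
            self._model = loader(self.model_name)
        return self._model

    def transcribe(self, audio_path: str) -> list:
        """Return the timed segments reported by the model."""
        result = self.model.transcribe(audio_path)
        return result["segments"]
```

Constructing `LocalWhisperEngine("small")` is then instant; the load happens once, on the first `transcribe` call, and later calls reuse the same model.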

04

Export

Transcripts are saved as plain text and .srt subtitles; files that already exist are detected and skipped, so nothing is processed twice.
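
The export step comes down to two small pieces: formatting timed segments as SubRip (.srt) entries, and refusing to overwrite a file that already exists. The sketch below assumes segments are dicts with `start`, `end` and `text` keys (the shape Whisper reports); the function names are illustrative only.

```python
from pathlib import Path
from typing import Callable, Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render seconds as the HH:MM:SS,mmm form used by .srt files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Mapping]) -> str:
    """Number each segment and join the entries with blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        span = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def write_if_missing(path: Path, render: Callable[[], str]) -> bool:
    """Write the file only if it is absent; return True when work was done."""
    if path.exists():
        return False           # already processed, skip
    path.write_text(render(), encoding="utf-8")
    return True
```

Passing `render` as a callable means the expensive step (transcription) never runs when the output is already on disk, which is what makes re-runs cheap.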

Stack

Language: Python 3.9+
Download: yt-dlp + FFmpeg
Engines: Local Whisper · OpenAI Whisper API
Interface: CLI + Streamlit GUI
Packaging: Modular extras (local / api / gui)
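
With optional extras, the core has to cope with packages that are not installed. One possible approach is to probe for them at runtime, as in this sketch; the mapping from extra name to module (`whisper`, `openai`, `streamlit`) is my assumption about what each group would pull in.

```python
from importlib.util import find_spec
from typing import Callable, Dict, List

# Hypothetical mapping: extra name -> top-level module it would provide.
EXTRA_MODULES: Dict[str, str] = {
    "local": "whisper",
    "api": "openai",
    "gui": "streamlit",
}


def _module_available(module: str) -> bool:
    """True when the module can be imported, without importing it."""
    return find_spec(module) is not None


def installed_extras(
    is_available: Callable[[str], bool] = _module_available,
) -> List[str]:
    """List the extras whose module is present in this environment."""
    return [extra for extra, module in EXTRA_MODULES.items()
            if is_available(module)]
```

A front end could use the result to offer only the engines that will actually work, and to print an install hint for the rest.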

What I learned

Optional dependency groups keep a tool light: video download, local inference and the API client install independently.
Idempotent file handling — skip whatever already exists — turns a slow media pipeline into something you can re-run without thinking.
Handling real-world media (long videos, size limits, odd formats) is where most of the engineering actually lives.

Outcome

One tool covers download, audio extraction, transcription and subtitle generation end to end.
Fully offline transcription with a local Whisper model, or faster cloud runs via the OpenAI API — chosen per run.
Ships as an installable package with a CLI and a Streamlit GUI built on the same core.
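
To illustrate what "built on the same core" can mean in practice, here is a sketch of a thin CLI layer over a shared function. The program name, the flags (`--engine`, `--output-dir`) and the placeholder `run` function are hypothetical and are not the tool's documented interface.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Command-line surface only; no pipeline logic lives here."""
    parser = argparse.ArgumentParser(
        prog="transcriber",
        description="Transcribe a YouTube URL or a local media file.",
    )
    parser.add_argument("source", help="YouTube URL or path to a local file")
    parser.add_argument("--engine", choices=["local", "api"], default="local",
                        help="transcription engine to use for this run")
    parser.add_argument("--output-dir", default="transcripts",
                        help="directory for the .txt and .srt files")
    return parser


def run(source: str, engine: str, output_dir: str) -> dict:
    """Placeholder for the shared core that a GUI would call as well."""
    return {"source": source, "engine": engine, "output_dir": output_dir}


def main(argv: Optional[List[str]] = None) -> dict:
    """Parse arguments and hand them to the core unchanged."""
    args = build_parser().parse_args(argv)
    return run(args.source, args.engine, args.output_dir)
```

A Streamlit page would collect the same three values from widgets and call `run` directly, so behavior stays identical across both interfaces.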
