I watched a 60-minute tech talk. I had the key insights in 34 seconds.

Last week I discovered a CLI tool that changed how I consume video content. It extracts transcripts from YouTube, podcasts, and web pages — then summarizes them using AI.

The tool is called Summarize. It's open source, runs locally, and works with any OpenAI-compatible model.

Here's what happened when I tested it.


A 60-minute AI lecture in 34 seconds

Andrej Karpathy's "Intro to Large Language Models" is a must-watch for anyone in tech. It's also an hour long. Not everyone has that time.

I ran:

summarize "https://youtube.com/watch?v=zjkBMFhNj_g" --length medium

34 seconds later:

This talk explains what a large language model is, how one is made, and why they behave the way they do. At core an LLM is a neural network trained to predict the next token. Pre-training compresses large swaths of web text into weights using massive GPU clusters (~6,000 GPUs for ~12 days, ≈$2M), then fine-tuning on high-quality human Q&A examples converts a generic internet text sampler into a helpful assistant.

The presenter surveys current capabilities: predictable scaling laws, richer tool use, multimodality, longer deliberation modes, and task customization. He also covers security challenges: jailbreaks, prompt-injection, and data-poisoning attacks.

That's the entire hour distilled into something I can read in 2 minutes. The full summary included specific technical details, timestamps, and even quoted phrases from the talk.


A cooking video turned into a complete recipe

Someone shared a Beef Bourguignon video. I don't want to watch 10 minutes of cooking footage. I want the recipe.

summarize "https://youtu.be/fVvYTWAHoBQ" --length xl

37 seconds. I got:

  • All three cuts of beef used (short rib, shin, chuck) and why each matters
  • The full ingredient list
  • Step-by-step technique with cooking times
  • Pro tips like "don't over-flour" and "take time for caramelization"

I reformatted it into a proper recipe note. Total time from video URL to saved recipe: under 2 minutes.


Staying current with tech without watching everything

New JavaScript runtime dropped? Framework update? Conference talk everyone's discussing?

I ran Summarize on a Bun 1.0 video:

summarize "https://youtube.com/watch?v=dWqNgzZwVJQ" --length medium

33 seconds:

Bun 1.0 is a fast, all-in-one JavaScript runtime combining a runtime, bundler, test runner and package manager while staying Node-compatible. It uses JavaScriptCore instead of V8 for faster startup times. Key claims: TypeScript hello-world runs faster, package manager up to 25× faster than npm, bundled SQLite engine, zero-config TypeScript/JSX.

Now I know what Bun is about. If I want to dig deeper, I can watch the video. But for a quick assessment? This is enough.


What it actually does

Summarize works by:

  1. Extracting transcripts — YouTube captions, podcast RSS feeds, or Whisper transcription for audio/video without captions
  2. Sending to an LLM — OpenAI, Anthropic, Google, or local models
  3. Returning structured output — with timestamps, word counts, and cost estimates

The CLI shows you exactly what happened:

34s · 59m 48s YouTube · 12k words · $0.0076 · openai/gpt-5-mini

That's 34 seconds of processing time for a source video just under 60 minutes long, about 12,000 words transcribed, at a cost of $0.0076 (less than one cent).
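
To put those numbers in perspective, here is a rough back-of-the-envelope calculation. The figures are copied from the status line above; the variable names are my own, and "12k words" is a rounded value, so treat the per-word cost as approximate rather than exact.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope math on the status line shown above.
# All variable names are illustrative; only the numbers come from the CLI output.
export LC_ALL=C                       # keep "." as the decimal separator in awk

processing_s=34                       # "34s"
video_s=$(( 59 * 60 + 48 ))           # "59m 48s" converted to seconds (3588)
words=12000                           # "12k words" (rounded by the CLI)
cost_usd=0.0076                       # "$0.0076"

# Bash only does integer arithmetic, so the divisions are handed to awk.
speedup=$(awk -v v="$video_s" -v p="$processing_s" 'BEGIN { printf "%.0f", v / p }')
cost_per_1k=$(awk -v c="$cost_usd" -v w="$words" 'BEGIN { printf "%.5f", c / w * 1000 }')

echo "Video length:         ${video_s}s"
echo "Speed-up vs. viewing: ${speedup}x"         # roughly 106x
echo "Cost per 1k words:    \$${cost_per_1k}"    # roughly $0.00063
```

In other words, this run was about a hundred times faster than watching the talk, and it cost a small fraction of a cent per thousand words.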


When this is useful

  • Research — quickly scan talks, interviews, podcasts for relevant content
  • Learning — get the key points before deciding to watch in full
  • Content creation — extract recipes, tutorials, how-tos from video format
  • Staying current — keep up with tech news without the time investment
  • Accessibility — turn audio/video content into readable text

When to still watch the video

Summarize doesn't replace watching. Some things need the full experience:

  • Entertainment (the summary of "Never Gonna Give You Up" was accurate but missed the point)
  • Visual demonstrations where seeing matters
  • Content where tone and delivery are important
  • Anything you genuinely want to enjoy

This is a tool for extraction, not replacement.


Two ways to use it: CLI or browser extension

The tool comes in two flavors from the same repo.

Option 1: CLI (what I use)

# Install
npm i -g @steipete/summarize

# Set your OpenAI key
export OPENAI_API_KEY="your-key"

# Run
summarize "https://youtube.com/watch?v=..." --length medium

Length options: short (900 chars), medium (1,800), long (~4,200), xl, xxl.

Add --extract to get just the transcript without summarization.
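
If you keep a backlog of links, the same command can be wrapped in a small loop. This is a minimal sketch, not something shipped with the tool: summarize_list is a hypothetical helper name, the file layout is my own choice, and it assumes summarize writes its result to stdout, which I have not verified against the documentation. The only flag it uses is --length, as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize every URL listed in a text file,
# writing one output file per URL.
# Assumption: `summarize` prints its result to stdout.

summarize_list() {
  local list_file=$1 out_dir=$2 length=${3:-medium}
  local n=0 url
  mkdir -p "$out_dir"

  # Read the list line by line; the extra test handles a missing final newline.
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines and comment lines starting with '#'.
    case $url in ''|'#'*) continue ;; esac
    n=$((n + 1))
    # Redirect stdin so the command cannot swallow the rest of the list.
    if ! summarize "$url" --length "$length" > "$out_dir/summary-$n.txt" < /dev/null; then
      echo "failed: $url" >&2
    fi
  done < "$list_file"

  echo "$n URL(s) processed"
}
```

Calling summarize_list urls.txt summaries long would read one URL per line from urls.txt and write summary-1.txt, summary-2.txt, and so on into the summaries directory. The third argument is optional and falls back to medium.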

Option 2: Chrome extension (for non-terminal people)

There's also a Chrome Side Panel extension (with a Firefox version) that adds a one-click summarize button to your browser.

Features:

  • Summarize current tab with one click
  • Streaming markdown output in a sidebar
  • YouTube slide extraction with OCR
  • Chat interface for follow-up questions

The extension talks to a local daemon (installed via summarize daemon install) so your API keys stay on your machine. Same engine, different interface.


The workflow change

Before: bookmark video, forget about it, never watch it.

After: run summarize, get the gist in 30 seconds, save notes if useful, move on.

I'm processing more content in less time. Not everything needs deep attention. Some things just need extraction.


The tool is open source: github.com/steipete/summarize