Skip to content

genesis-kb/transcription_engine

Repository files navigation

Bitcoin Transcription Engine

A transcription pipeline for Bitcoin conference talks, podcasts, and technical content. Ingests YouTube videos, transcribes audio via multiple STT providers, corrects transcripts with LLM, extracts metadata, generates summaries, and stores everything in PostgreSQL.

Architecture

YouTube Video URL
       |
  [Preprocess] --> Download video, extract audio (FFmpeg)
       |
  [Transcribe] --> STT (Whisper / Deepgram / SmallestAI)
       |
  [Metadata Extraction] --> Gemini LLM (speakers, conference, topics)
       |
  [Correction] --> Gemini LLM (fix ASR errors, technical terms)
       |
  [Summarization] --> Gemini LLM (structured summary)
       |
  [Postprocess] --> Export to Markdown, save to PostgreSQL

Automated Ingestion Pipeline

youtube_channels (DB)
       |
  [ChannelScanner] --> YouTube Data API v3, discover new videos
       |
  [ContentClassifier] --> Gemini LLM, filter technical content
       |
  [IngestionService] --> Queue approved videos for transcription

STT Providers

Provider Type Best For
Whisper Local (OpenAI) Offline, privacy-sensitive
Deepgram Cloud API Fast, accurate, diarization
SmallestAI Cloud API Multi-speaker, emotion detection

LLM Services (Gemini)

All LLM services use google-genai SDK with gemini-3-flash-preview and include retry logic with exponential backoff for 503/429 errors.

Service Purpose Chunk Size
MetadataExtractor Extract speakers, conference, topics from video metadata Single call
CorrectionService Fix ASR errors, technical terminology 5000 chars/chunk
SummarizerService Generate structured summaries 30000 chars/chunk
ContentClassifier Classify videos as technical/non-technical Single call

Tech Stack

  • Backend: FastAPI (Python)
  • Database: AWS RDS PostgreSQL
  • Deployment: AWS EC2 (t3.small)
  • Frontend: React + TypeScript + Vite (separate repo)
  • Frontend Backend: Express.js (reads from RDS)
  • Frontend Hosting: GitHub Pages
  • HTTPS: Cloudflare Tunnel
  • Linting: Ruff

Setup

Prerequisites

  • Python 3.10+
  • FFmpeg
  • uv (recommended) or pip

Installation

# Clone the repo
git clone https://github.com/staru09/transcription_engine.git
cd transcription_engine

# Create venv and install deps
uv venv
uv pip install -r requirements.txt

# Or with pip
python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows
pip install -r requirements.txt

Configuration

cp env.example .env

Required environment variables:

Variable Purpose
DATABASE_URL PostgreSQL connection string (local Docker or AWS RDS)
GOOGLE_API_KEY Gemini API for correction, summarization, classification, metadata extraction
YOUTUBE_API_KEY YouTube Data API v3 for channel scanning
DEEPGRAM_API_KEY Deepgram STT (if using Deepgram)
SMALLEST_API_KEY SmallestAI STT (if using SmallestAI)

Optional:

Variable Purpose
TRANSCRIPTION_SERVER_URL Override transcription server URL (default: http://localhost:8000)

Database

You can run PostgreSQL either locally via Docker or against AWS RDS. The app picks whichever DATABASE_URL points at.

Option A — Local PostgreSQL via Docker (no AWS needed):

# Start just the postgres container
docker compose up -d postgres

# Point the app at it (already the default in env.example)
# DATABASE_URL=postgresql://bitcoin:bitcoin@localhost:5432/transcription_engine

# Create the schema
tstbtc db init

Data persists in the postgres_data Docker volume. To wipe it: docker compose down -v.

Option B — AWS RDS (or any remote Postgres):

Set DATABASE_URL in .env to your RDS connection string, then run tstbtc db init once to create tables.

Check connectivity anytime with tstbtc db check.

Pipeline settings

Pipeline settings are in config.ini:

[DEFAULT]
deepgram = True
diarize = True
summarize = False
llm_provider = google
llm_correction_model = gemini-3-flash-preview
llm_summary_model = gemini-3-flash-preview
smallestai = False
classification_model = gemini-3-flash-preview
classification_min_duration = 600
classification_max_duration = 3000

Usage

The project provides both a CLI (tstbtc / transcriber.py) and an HTTP API. The CLI auto-starts the server in the background, so for most workflows you only need the CLI.

CLI (recommended)

Transcribe a YouTube video with Deepgram:

tstbtc transcribe "https://www.youtube.com/watch?v=VIDEO_ID" \
  --deepgram \
  --diarize \
  --markdown \
  --summarize \
  --correct \
  --llm-provider google \
  --loc "tabconf" \
  --username "your_name"

With SmallestAI instead:

tstbtc transcribe "https://www.youtube.com/watch?v=VIDEO_ID" \
  --smallestai \
  --diarize \
  --markdown \
  --summarize \
  --correct \
  --llm-provider google \
  --loc "tabconf" \
  --username "your_name"

With Whisper (local, no cloud STT key required) — omit both --deepgram and --smallestai:

tstbtc transcribe "https://www.youtube.com/watch?v=VIDEO_ID" \
  --markdown \
  --username "your_name"

Other useful commands:

# Server management
tstbtc server start              # start FastAPI server in background
tstbtc server status             # check if running
tstbtc server stop               # stop (required before switching STT provider)
tstbtc server logs --follow      # tail logs live

# Transcribe a local audio file
tstbtc transcribe "/path/to/file.mp3" --deepgram --username "your_name"

# See all flags
tstbtc transcribe --help

Note on switching services: the server caches the transcription service (Deepgram/Whisper/SmallestAI) after the first request. To switch between them, run tstbtc server stop before the next transcription.

Output locations:

What Where
Markdown transcript local_models/<loc>/<slug>.md
Raw STT output + metadata metadata/<loc>/<slug>/
Database row transcripts table (local Postgres or AWS RDS)

HTTP API (advanced)

If you want to call the server directly (e.g. from another service):

# Start the server manually
python -m uvicorn server:app --host 0.0.0.0 --port 8000

# Queue a video
curl -X POST http://localhost:8000/transcription/add_to_queue/ \
  -F "source=https://www.youtube.com/watch?v=VIDEO_ID" \
  -F "loc=tabconf" \
  -F "username=your_username" \
  -F "smallestai=true" \
  -F "diarize=true" \
  -F "markdown=true" \
  -F "correct=true" \
  -F "summarize=true" \
  -F "llm_provider=google"

# Start processing
curl -X POST http://localhost:8000/transcription/start/

# Check queue status
curl http://localhost:8000/transcription/queue/

Channel Scanner (Automated Ingestion)

Scan a YouTube channel for new videos, classify them, and queue for transcription:

# Seed channels
python -m scripts.seed_channels

# Scan, classify, and queue via API
curl -X POST http://localhost:8000/ingestion/scan
curl -X POST http://localhost:8000/ingestion/classify
curl -X POST http://localhost:8000/ingestion/queue

API Endpoints

Transcription

Method Endpoint Description
POST /transcription/add_to_queue/ Add a video to the transcription queue
POST /transcription/start/ Start processing the queue
GET /transcription/queue/ View current queue status
GET /transcription/corrected/ Get corrected transcripts
GET /transcription/summaries/ Get summaries

Database (PostgreSQL)

Method Endpoint Description
GET /transcription/db/transcripts/ All transcripts from DB
GET /transcription/db/transcripts/{id} Single transcript by ID
GET /transcription/db/corrected/ Corrected transcripts from DB
GET /transcription/db/summaries/ Summaries from DB

Ingestion

Method Endpoint Description
POST /ingestion/scan Scan channels for new videos
POST /ingestion/classify Classify pending videos
POST /ingestion/queue Queue approved videos

Project Structure

app/
  config.py              # Settings, env vars, config.ini
  transcript.py          # Transcript data model
  transcription.py       # Pipeline orchestrator
  media_processor.py     # Audio/video download and conversion
  services/
    correction.py        # LLM transcript correction (Gemini)
    summarizer.py        # LLM summarization (Gemini)
    metadata_extractor.py # LLM metadata extraction (Gemini)
    content_classifier.py # LLM content classification (Gemini)
    channel_scanner.py   # YouTube channel scanning
    ingestion_service.py # Automated ingestion pipeline
    database_service.py  # SQLAlchemy ORM for PostgreSQL
    smallestai.py        # SmallestAI STT provider
    deepgram.py          # Deepgram STT provider
routes/
  transcription.py       # Transcription API routes
  ingestion.py           # Ingestion API routes
scripts/
  scan_tabconf.py        # Standalone channel scanner
  generate_audio.py      # TTS audio generation
  seed_channels.py       # Seed YouTube channels in DB
server.py                # FastAPI app entry point
config.ini               # Pipeline configuration

Contributors can join our discord server here

Acknowledgements

This project is a fork of tstbtc, built by the Bitcoin Transcripts team. Their work on creating an open-source transcription pipeline for Bitcoin technical content made this project possible. We've extended the original engine with LLM-powered operations but the core transcription architecture and the vision of making Bitcoin knowledge accessible to everyone comes from their efforts. Thank you to the Bitcoin Transcripts contributors for building and maintaining this foundation.

License

MIT License. See LICENSE.

About

A system for processing audio-visual content related to bitcoin

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages